From 491410d4ecbc97aa9f324fc3d9ea80f9773503ae Mon Sep 17 00:00:00 2001
From: limafang
Date: Sun, 20 Oct 2024 12:19:50 +0000
Subject: [PATCH] Github Action Automatic Update agent Arxiv Papers

---
 README.md | 40 ++++++++++++++++++-------------------
 docs/agent-arxiv-daily.json | 2 +-
 2 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/README.md b/README.md
index 8af8d7c7a46..206e8ed01ed 100755
--- a/README.md
+++ b/README.md
@@ -13,16 +13,16 @@
 |Publish Date|Title|Authors|PDF|Code|abstract|
 |---|---|---|---|---|---|
-|**2024-10-17**|**AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents**|Ke Yang et.al.|[2410.13825](http://arxiv.org/abs/2410.13825)|null|通过使用大型语言模型(LLMs)的代理实现自主性,可以提高人类在个性化和标准化任务中的效率。自动化网络任务(如在预算内预订酒店)的需求日益增加。这些网络代理不仅能满足实际需求,还作为各种代理接地场景的重要概念验证示例,其成功将预示着许多未来应用的进步。先前的研究通常手工设计网络代理策略(例如,提示模板、多代理系统、搜索方法等),这些策略可能无法很好地推广到所有现实世界场景。另一方面,对于网络代理的观察/动作表示与支持它的LLM的预训练数据之间的不匹配,研究相对较少。这种差异在LLMs主要针对语言完成而非涉及具身导航动作和符号网络元素的任务时尤为显著。我们的研究通过简单地优化观察和动作空间来增强基于LLM的网络代理,以更好地与其能力相匹配。这种方法使我们的基础代理在广泛的网络任务上显著优于以前的方法。具体来说,在WebArena基准测试中,该测试涵盖了通用网络交互任务,我们的代理AgentOccam比以前的最先进技术和同期工作分别高出9.8个绝对点(+29.4%)和5.9个绝对点(+15.8%),并且相比类似的普通网络代理,其成功率提高了26.6个点(+161%)。我们没有使用上下文示例、新的代理角色、在线反馈或搜索策略。AgentOccam的简洁设计突显了LLMs在网页任务上的零样本性能,并强调了精心调整观察和动作空间对于基于LLM的代理的重要性。|
-|**2024-10-17**|**Rapid and Automated Alloy Design with Graph Neural Network-Powered LLM-Driven Multi-Agent Systems**|Alireza Ghafarollahi et.al.|[2410.13768](http://arxiv.org/abs/2410.13768)|null|一个多智能体AI模型被用于自动化发现新型金属合金,该模型整合了多模态数据和外部知识,包括通过原子模拟获得的物理见解。我们的多智能体系统包含三个关键组件:(a) 一组大型语言模型(LLMs)负责推理和规划等任务,(b) 一群具有不同角色和专长的AI代理动态协作,以及(c)
一种新开发的图神经网络(GNN)模型,用于快速检索关键物理属性。一组由LLM驱动的AI代理协同工作,以自动化探索MPEAs(高熵合金)的巨大设计空间,并由GNN的预测结果进行指导。我们专注于NbMoTa系列体心立方(bcc)合金,这些合金使用基于机器学习的原子间势模型进行建模,并且重点关注两个关键属性:Peierls势垒和固溶体/螺型位错相互作用能。我们的GNN模型能够准确预测这些原子尺度的属性,提供了一种比昂贵的暴力计算更快的替代方案,减少了多智能体系统在物理属性检索上的计算负担。这一AI系统通过减少对人类专业知识的依赖并克服直接全原子模拟的限制,革新了材料发现的过程。通过协同GNN的预测能力和LLM驱动代理的动态协作,该系统自主导航巨大的合金设计空间,识别原子尺度材料属性的趋势,并预测宏观尺度的机械强度,如几个计算实验所示。这种方法加速了先进合金的发现,并有望在其他复杂系统中有更广泛的应用,标志着自动材料设计领域的一个重大进展。| -|**2024-10-17**|**MeNTi: Bridging Medical Calculator and LLM Agent with Nested Tool Calling**|Yakun Zhu et.al.|[2410.13610](http://arxiv.org/abs/2410.13610)|null|在大型语言模型(LLMs)中集成工具已经促进了其广泛应用。然而,在专门的下游任务环境中,仅依赖工具不足以完全解决现实世界的复杂性,这尤其限制了LLMs在医学等领域的有效应用。本文聚焦于医学计算器的下游任务,这些任务使用标准化测试来评估个体的健康状况。我们介绍了MeNTi,这是一种针对LLMs的通用代理架构。MeNTi集成了专业的医学工具包,并采用元工具和嵌套调用机制以增强LLM工具的使用效果。具体来说,它实现了灵活的工具选择和嵌套工具调用来应对复杂的医疗场景中的实际问题,包括计算器选择、插槽填充和单位转换。为了评估LLMs在整个临床过程中进行计算器场景定量评估的能力,我们引入了CalcQA基准。该基准要求LLMs使用医学计算器进行计算并评估患者的健康状况。CalcQA由专业医生构建,包含100个案例-计算器对,并附带一个包含281个医学工具的工具包。实验结果表明,我们的框架显著提升了性能。这项研究为LLMs在医学高需求场景中的应用开辟了新的方向。| -|**2024-10-17**|**Chain of Ideas: Revolutionizing Research in Novel Idea Development with LLM Agents**|Long Li et.al.|[2410.13185](http://arxiv.org/abs/2410.13185)|null|有效的研究创意构思是科学研究中的关键步骤。然而,随着科学文献的指数级增长,研究人员很难跟上最新的进展并确定有意义的研究方向。最近大型语言模型(LLMs)的发展表明,自动化生成新颖研究创意是一个有前景的方向。然而,现有的创意生成方法要么简单地提示LLMs,要么直接向LLMs暴露大量文献而没有指示有用的信息。受到人类研究人员研究过程的启发,我们提出了一种名为Chain-of-Ideas(CoI)的代理,这是一种基于LLM的代理,它以链式结构组织相关文献,有效地反映了研究领域的渐进发展。这种组织方式使LLMs能够捕捉当前的研究进展,从而增强其创意能力。此外,我们还提出了一个名为Idea Arena的评估协议,可以从不同角度全面评估创意生成方法,与人类研究人员的偏好紧密对齐。实验结果表明,CoI代理在研究创意生成方面始终优于其他方法,并且其质量可与人类相媲美。此外,我们的CoI代理成本效益高,生成一个候选创意及其相应实验设计的最低成本仅为0.50美元。| -|**2024-10-16**|**Robust RL with LLM-Driven Data Synthesis and Policy Adaptation for Autonomous Driving**|Sihao Wu 
et.al.|[2410.12568](http://arxiv.org/abs/2410.12568)|null|大型语言模型(LLMs)在自动驾驶系统中的集成展示了强大的常识和推理能力,有效地解决了纯粹数据驱动方法的缺陷。当前基于LLM的代理需要较长的推理时间,并且在与实时自动驾驶环境交互时面临挑战。一个关键的开放性问题是,我们能否有效利用LLM的知识来训练高效且稳健的强化学习(RL)代理。本文介绍了一种新颖的RAPID框架,即鲁棒自适应策略注入与蒸馏框架,该框架通过使用由基于LLM的驾驶代理合成的数据进行在线适应,训练专门的混合策略RL代理。RAPID具有三个关键设计:1)利用从LLM代理收集的离线数据,将专家知识提炼到RL策略中,以加快实时推理速度;2)引入在RL中具有鲁棒性的蒸馏技术,继承LLM基教师的性能和鲁棒性;3)采用混合策略方法,通过策略适配器进行联合决策解码。通过在线环境互动进行微调,RAPID减少了LLM知识的遗忘,同时保持对不同任务的适应性。广泛的实验表明,RAPID能够以高效、可适应和稳健的方式将LLM知识有效整合到规模化的RL策略中。代码和检查点将在接受后公开。| -|**2024-10-16**|**SAC-GLAM: Improving Online RL for LLM agents with Soft Actor-Critic and Hindsight Relabeling**|Loris Gaven et.al.|[2410.12481](http://arxiv.org/abs/2410.12481)|null|近年来,大型语言模型(LLMs)不仅作为生成模型发展,还作为解决文本序列决策任务的代理。当面对复杂环境,其零样本能力不足时,最近的研究表明,可以使用在线强化学习(RL)让这些模型代理交互式地发现和学习有效的策略。然而,大多数先前的工作局限于采用策略梯度算法,这大大限制了这些代理在探索和利用方面可以使用的其他方法,例如经验回放和事后重标记。然而,这些方法对于LLM学习代理可能是关键的,特别是在设计自主内在动机代理时,这些代理会根据自己的目标进行采样和追求(即自足型代理)。本文提出并研究了一种针对LLM代理的Soft Actor-Critic算法和事后重标记的适应性。我们的方法不仅为开发能够在线学习的自足型LLM代理铺平了道路,而且在更经典的多目标RL环境中也能优于策略梯度方法。| -|**2024-10-16**|**Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance**|Yaxi Lu et.al.|[2410.12361](http://arxiv.org/abs/2410.12361)|null|基于大型语言模型的代理在解决复杂任务方面已经展现出了显著的能力。然而,大多数代理系统仍然局限于被动反应,这限制了它们在需要预见性和自主决策的场景中的有效性。在这篇论文中,我们致力于开发能够预见并主动发起任务的代理,而无需明确的人类指令。为此,我们提出了一种新颖的数据驱动方法来解决这个问题。首先,我们收集真实世界的人类活动以生成主动性的任务预测。这些预测随后由人类标注者标记为接受或拒绝。标注数据被用于训练一个奖励模型,该模型模拟人类判断,并作为LLM代理主动性自动评估的工具。在此基础上,我们开发了一个全面的数据生成管道,创建了一个包含6790个事件的多样化数据集ProactiveBench。最后,我们证明通过使用所提出的ProactiveBench进行微调可以显著激发LLM代理的主动性。实验结果显示,我们的微调模型在主动提供帮助方面的F1得分达到66.47%,优于所有开源和闭源模型。这些结果突显了我们方法在创造更主动和有效的代理系统方面的潜力,为未来的人机协作进步铺平了道路。| -|**2024-10-16**|**Enhancing LLM Agents for Code Generation with Possibility and Pass-rate Prioritized Experience Replay**|Yuyang Chen 
et.al.|[2410.12236](http://arxiv.org/abs/2410.12236)|null|如今,针对代码生成任务的变压器型大语言模型(LLM)通常会应用采样和过滤管道。但由于代码生成任务中的稀疏奖励问题,即一个令牌的不正确性会导致变压器模型会持续采样冗余程序直到找到正确的程序,从而导致效率低下。为了克服这一挑战,我们在微调阶段引入了经验回放(ER),其中生成的代码和程序会被存储并重放,以使LLM代理有机会从过去的经历中学习。基于ER的精神,我们介绍了一种新的方法称为BTP流程,该流程包括三个阶段:束搜索采样、测试阶段和优先级经验回放阶段。这种方法利用了代码模型收集的失败程序,并从回放缓冲区中重放具有高可能性和通过率优先值(P2Value)的程序,以提高效率。P2Value全面考虑了变压器输出的可能性和通过率,并可以利用大多数由LLMs收集的程序未能通过任何测试而产生的冗余资源。我们在几个LLM中实证应用了我们的方法,证明它提高了它们在代码生成任务中的性能,并超越了现有的基线。| -|**2024-10-15**|**Empowering Users in Digital Privacy Management through Interactive LLM-Based Agents**|Bolun Sun et.al.|[2410.11906](http://arxiv.org/abs/2410.11906)|null|本文提出了一种新颖的应用大型语言模型(LLMs)的方法,通过交互式对话代理来增强用户对隐私政策的理解。我们展示了LLMs在数据实践识别、选择识别、政策总结和隐私问答等任务上显著优于传统模型,为隐私政策分析设定了新的基准。在此基础上,我们引入了一种创新的基于LLM的代理,作为处理网站隐私政策的专业系统,引导用户穿越复杂的法律语言,而无需用户提出特定问题。一项涉及100名参与者的用户研究表明,使用该代理的用户具有更高的理解水平(平均得分3分中的2.6分对比对照组的1.8分),降低了认知负荷(任务难度评分为10分中的3.2分对比7.8分),增强了管理隐私的信心,并且完成任务的时间更短(5.5分钟对比15.8分钟)。这项工作突显了基于LLM的代理在改变用户与隐私政策互动方面的潜力,从而实现更加知情的同意并赋予用户在数字服务领域的权力。| -|**2024-10-15**|**HR-Agent: A Task-Oriented Dialogue (TOD) LLM Agent Tailored for HR Applications**|Weijie Xu et.al.|[2410.11239](http://arxiv.org/abs/2410.11239)|null|近年来,大型语言模型(LLM)的发展在教育和金融等多个领域带来了诸多益处,但在人力资源领域,仍然存在许多重复性任务未被解决,例如访问请求、医疗报销和请假申请等。我们希望将这些任务与LLM代理相关联,该代理已经在诸如写作辅助和客户服务等领域取得了进展。我们提出了HR-Agent,这是一种高效、保密且专门针对人力资源的基于LLM的任务导向对话系统,旨在自动化如医疗报销和访问请求等重复性HR流程。由于在推理过程中不会将对话数据发送给LLM,因此它能够保持人力资源相关任务所需的保密性。| +|**2024-10-17**|**AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents**|Ke Yang 
et.al.|[2410.13825](http://arxiv.org/abs/2410.13825)|null|通过使用大型语言模型(LLMs)的代理来实现个性化和标准化任务,可以提高人类的工作效率。自动化网络任务(如在预算内预订酒店)的需求日益增加。满足实际需求的同时,网络代理也作为各种代理接地场景的重要概念验证示例,其成功将预示着许多未来应用的进步。先前的研究通常会手工设计网络代理策略(例如,提示模板、多代理系统、搜索方法等),这些策略可能无法很好地推广到所有现实世界场景中。另一方面,关于网络代理的观察/动作表示与基于LLM的预训练数据之间的不匹配的研究非常有限。这种差异特别明显,因为LLM主要是为了语言补全而训练的,而不是为了涉及具身导航动作和符号化网络元素的任务。我们的研究通过简单地优化观察和动作空间,使基于LLM的网络代理更好地与LLM的能力相匹配,从而显著提升了其性能。这种方法使我们的基础代理在各种网络任务上显著优于以前的方法。具体来说,在WebArena基准测试中,我们的代理AgentOccam比以前的最佳方法高出9.8分(+29.4%)和5.9分(+15.8%),并且相比类似的普通网络代理,其成功率提高了26.6分(+161%)。我们没有使用上下文示例、新的代理角色、在线反馈或搜索策略。AgentOccam的简单设计展示了LLMs在无样本学习下处理网络任务的强大能力,并强调了精心调整观察和动作空间对于基于LLM的代理至关重要的作用。| +|**2024-10-17**|**Rapid and Automated Alloy Design with Graph Neural Network-Powered LLM-Driven Multi-Agent Systems**|Alireza Ghafarollahi et.al.|[2410.13768](http://arxiv.org/abs/2410.13768)|null|一个多智能体AI模型被用于自动化发现新的金属合金,整合了多模态数据和外部知识,包括通过原子模拟获得的物理见解。我们的多智能体系统有三个关键组成部分:(a) 一组大型语言模型(LLMs)负责推理和规划等任务,(b) 一群具有不同角色和专长的AI代理动态协作,以及(c) 一种新开发的图神经网络(GNN)模型,用于快速检索关键物理属性。一组由LLM驱动的AI代理合作,自动化探索MPEAs(高熵合金)的巨大设计空间,并由GNN的预测指导。我们专注于NbMoTa族体心立方(bcc)合金,使用基于机器学习的原子间势进行建模,目标是两个关键属性:Peierls势垒和溶质/螺型位错相互作用能。我们的GNN模型准确地预测这些原子尺度的属性,提供了一种比昂贵的穷举计算更快的替代方案,减轻了多智能体系统在物理属性检索上的计算负担。该AI系统通过减少对人类专业知识的依赖并克服直接全原子模拟的限制,革新了材料发现过程。通过协同GNN的预测能力和LLM代理的动态协作,该系统自主导航巨大的合金设计空间,识别原子尺度材料属性的趋势,并预测宏观机械强度,如几个计算实验所展示的那样。这种方法加速了先进合金的发现,并有望在其他复杂系统中有更广泛的应用,标志着自动化材料设计领域的一大进步。| +|**2024-10-17**|**MeNTi: Bridging Medical Calculator and LLM Agent with Nested Tool Calling**|Yakun Zhu et.al.|[2410.13610](http://arxiv.org/abs/2410.13610)|null|在大型语言模型(LLMs)中集成工具已经促进了其广泛应用。然而,在专门的下游任务场景中,仅依赖工具是不足以应对现实世界的复杂性的,这尤其限制了LLMs在医学等领域的有效部署。本文专注于医学计算器的下游任务,这些计算器使用标准化测试来评估个人的健康状况。我们介绍了MeNTi,这是一种为LLMs设计的通用代理架构。MeNTi集成了专业的医学工具包,并采用元工具和嵌套调用机制以增强LLMs对工具的利用。具体来说,它实现了灵活的工具选择和嵌套工具调用来解决复杂的医疗场景中的实际问题,包括计算器选择、槽填充和单位转换。为了评估LLMs在整个临床过程中使用医学计算器进行计算和评估患者健康状况的能力,我们引入了CalcQA基准。该基准由专业医生构建,包含100个案例-计算器对,并附带一个包含281个医学工具的工具包。实验结果表明,我们的框架显著提升了性能。这项研究为在医学的高要求场景中应用LLMs开辟了新的方向。| +|**2024-10-17**|**Chain of Ideas: Revolutionizing 
Research in Novel Idea Development with LLM Agents**|Long Li et.al.|[2410.13185](http://arxiv.org/abs/2410.13185)|null|有效的研究创意构思是科学研究的关键步骤。然而,科学文献的指数增长使得研究人员难以跟上最新的进展并确定有意义的研究方向。最近大型语言模型(LLMs)的发展表明,自动化生成新的研究创意是一个有前景的途径。然而,现有的创意生成方法要么简单地提示LLMs,要么直接向LLMs暴露大量的文献而没有指示有用的信息。受到人类研究人员研究过程的启发,我们提出了一种基于链式想法(Chain-of-Ideas, CoI)的代理,这是一种基于LLM的代理,通过链式结构组织相关文献,有效地反映了研究领域的渐进发展。这种组织方式使LLMs能够捕捉到研究领域的当前进展,从而增强其创意生成能力。此外,我们还提出了一个名为Idea Arena的评估协议,可以从不同角度全面评估创意生成方法,与人类研究人员的偏好紧密对齐。实验结果表明,CoI代理在创意生成方面始终优于其他方法,并且在质量上可与人类媲美。此外,我们的CoI代理成本效益高,生成一个候选创意及其相应实验设计的最低成本仅为0.50美元。| +|**2024-10-16**|**Robust RL with LLM-Driven Data Synthesis and Policy Adaptation for Autonomous Driving**|Sihao Wu et.al.|[2410.12568](http://arxiv.org/abs/2410.12568)|null|大型语言模型(LLMs)在自动驾驶系统中的集成展示了强大的常识和推理能力,有效解决了纯粹数据驱动方法的缺陷。当前基于LLM的代理需要较长的推理时间,并且在与实时自动驾驶环境交互时面临挑战。一个关键的开放问题是,我们是否能够有效地利用LLM的知识来训练一个高效且稳健的强化学习(RL)代理。本文介绍了一种新的RAPID框架,即“鲁棒自适应策略注入与蒸馏”框架,该框架使用由基于LLM的驾驶代理合成的数据训练专门的混合策略RL代理,并进行在线适应。RAPID具有三个关键设计:1)利用从LLM代理收集的离线数据,将专家知识提炼到RL策略中以加快实时推理速度;2)引入鲁棒蒸馏到RL中,以继承来自基于LLM教师的性能和鲁棒性;3)采用混合策略方法,通过策略适配器进行联合决策解码。通过在线环境互动进行微调,RAPID减少了LLM知识的遗忘,同时保持对不同任务的适应性。广泛的实验表明,RAPID能够以高效、适应性强和鲁棒的方式将LLM知识有效地整合到规模化的RL策略中。代码和检查点将在接受后公开提供。| +|**2024-10-16**|**SAC-GLAM: Improving Online RL for LLM agents with Soft Actor-Critic and Hindsight Relabeling**|Loris Gaven et.al.|[2410.12481](http://arxiv.org/abs/2410.12481)|null|近年来,大型语言模型(LLMs)不仅作为生成模型,还在解决文本序列决策任务方面展现出色的能力。当面对复杂环境时,如果其零样本能力不足,最近的研究表明,可以使用在线强化学习(RL)让这些LLM代理交互式地发现和学习有效的策略。然而,大多数先前的工作局限于采用策略梯度算法,这大大限制了这些代理在探索和利用方面的方法,例如经验重放和事后重标记。然而,这些方法可能是LLM学习代理的关键,特别是在设计自主内在动机的代理时,这些代理会根据自己的目标进行采样和追求(即自足性代理)。本文提出并研究了一种针对LLM代理的Soft Actor-Critic算法和事后重标记的适应性方法。我们的方法不仅为自足性的在线学习LLM代理铺平了道路,还可以在更经典的多目标RL环境中超越策略梯度方法。| +|**2024-10-16**|**Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance**|Yaxi Lu 
et.al.|[2410.12361](http://arxiv.org/abs/2410.12361)|null|基于大型语言模型的代理在解决复杂任务方面已经展现出显著的能力。然而,大多数代理系统仍然是反应式的,这限制了它们在需要预见性和自主决策的场景中的有效性。在这篇论文中,我们致力于开发能够预见到并主动发起任务的代理,而无需明确的人类指令。我们提出了一种新颖的数据驱动方法来解决这个问题。首先,我们收集真实世界的人类活动以生成主动式任务预测。这些预测随后由人类标注者标记为接受或拒绝。标注数据被用于训练一个奖励模型,该模型模拟人类判断,并作为LLM代理主动性的自动评估器。在此基础上,我们开发了一个全面的数据生成管道,以创建一个多样化的数据集ProactiveBench,包含6790个事件。最后,我们证明通过使用提出的ProactiveBench进行微调可以显著激发LLM代理的主动性。实验结果显示,我们的微调模型在主动提供帮助方面达到了66.47%的F1得分,超过了所有开源和闭源模型。这些结果突显了我们方法在创造更主动和有效的代理系统方面的潜力,为未来的人机协作进步铺平了道路。| +|**2024-10-16**|**Enhancing LLM Agents for Code Generation with Possibility and Pass-rate Prioritized Experience Replay**|Yuyang Chen et.al.|[2410.12236](http://arxiv.org/abs/2410.12236)|null|如今,针对代码生成任务的Transformer基大型语言模型(LLM)通常会应用采样和过滤管道。由于代码生成任务中的稀疏奖励问题,即一个令牌的不正确性会导致Transformer模型采样冗余程序直到找到正确的程序,这导致了低效率。为了克服这一挑战,我们在微调阶段引入了经验回放(ER),其中存储生成的代码和程序,并将这些程序重新播放,以使LLM代理有机会从过去的经历中学习。基于ER的精神,我们介绍了一种新颖的方法,称为BTP管道,该方法包括三个阶段:束搜索采样、测试阶段和优先级经验回放阶段。该方法利用代码模型收集的失败程序,并从回放缓冲区中重播具有高可能性和通过率优先值(P2Value)的程序,从而提高效率。P2Value综合考虑了Transformer输出的可能性和通过率,可以利用大多数由LLMs收集的程序未能通过任何测试而导致的冗余资源。我们在几个LLM上实证应用了我们的方法,证明它提高了它们在代码生成任务中的性能,并超越了现有的基线。| +|**2024-10-15**|**Empowering Users in Digital Privacy Management through Interactive LLM-Based Agents**|Bolun Sun et.al.|[2410.11906](http://arxiv.org/abs/2410.11906)|null|本文介绍了一种将大型语言模型(LLMs)应用于增强用户对隐私政策的理解的新方法,通过一个交互式对话代理来实现。我们证明,LLMs在数据实践识别、选择识别、政策总结和隐私问答等任务上显著优于传统模型,为隐私政策分析设定了新的基准。基于这些发现,我们引入了一种创新的基于LLM的代理,该代理充当处理网站隐私政策的专业系统,引导用户穿越复杂的法律语言,而无需他们提出特定问题。一项有100名参与者参与的用户研究表明,使用代理辅助的用户在理解水平上更高(平均分为2.6分中的3分,而对照组为1.8分),认知负荷更低(任务难度评分为10分中的3.2分,而对照组为7.8分),对管理隐私更有信心,并且完成任务所需时间更短(5.5分钟对比15.8分钟)。这项工作突显了基于LLM的代理在改变用户与隐私政策互动方面的潜力,从而促进更知情的同意并使用户在数字服务领域中更加自主。| +|**2024-10-15**|**HR-Agent: A Task-Oriented Dialogue (TOD) LLM Agent Tailored for HR Applications**|Weijie Xu 
et.al.|[2410.11239](http://arxiv.org/abs/2410.11239)|null|近期的大语言模型(LLM)进展在教育和金融等领域带来了许多益处,但在人力资源领域,仍有许多重复性的流程未被解决,例如访问请求、医疗报销和请假申请等。我们让这些任务与LLM代理相关联,该代理已经处理了诸如写作辅助和客户支持等任务。我们提出了HR-Agent,这是一种高效、保密且专门针对人力资源领域的基于LLM的任务导向对话系统,旨在自动化处理如医疗报销和访问请求等重复性的人力资源流程。由于在推理过程中不会将对话数据发送给LLM,因此它能够保持人力资源相关任务所需的机密性。| |**2024-10-14**|**Denial-of-Service Poisoning Attacks against Large Language Models**|Kuofeng Gao et.al.|[2410.10760](http://arxiv.org/abs/2410.10760)|**[link](https://github.com/sail-sg/p-dos)**|**近期的研究表明,大型语言模型(LLMs)容易受到拒绝服务(DoS)攻击,例如通过拼写错误或非语义提示的对抗性输入可以触发无限输出,而不会生成[EOS]终止符。这些攻击可能导致高延迟,并使LLM服务对其他用户或任务不可用。然而,在存在语音到文本接口(如机器人语音命令)的情况下,执行此类DoS攻击变得具有挑战性,因为通过语音很难引入拼写错误或非语义提示。一种简单的DoS攻击方式是指示模型“不断重复‘Hello’”,但我们观察到仅依靠自然指令会限制输出长度,该长度受最大长度限制,这是大型语言模型在有监督微调(SFT)数据中的上限。为了解决这一限制,我们提出了针对LLMs的投毒型DoS(P-DoS)攻击,证明注入一个专门设计用于DoS目的的中毒样本可以打破输出长度限制。例如,一个中毒样本成功攻击了GPT-4o和GPT-4o mini(通过OpenAI的微调API),使用不到1美元的成本,导致输出重复直至达到最大推理长度(16K个token,相比之下未中毒前为0.5K)。此外,我们在开源LLMs上进行了全面的消融研究,并将方法扩展到LLM代理,其中攻击者可以控制微调数据集和算法。我们的研究结果强调了急需防御P-DoS攻击以确保LLMs安全的迫切需求。我们的代码可以在https://github.com/sail-sg/P-DoS找到。**| |**2024-10-14**|**FairMindSim: Alignment of Behavior, Emotion, and Belief in Humans and LLM Agents Amid Ethical Dilemmas**|Yu Lei et.al.|[2410.10398](http://arxiv.org/abs/2410.10398)|null|AI对齐是关乎AI控制和安全的关键问题。它不仅应考虑价值中立的人类偏好,还应考虑道德和伦理方面的考量。在这项研究中,我们介绍了FairMindSim,通过一系列不公平的情景来模拟道德困境。我们使用LLM代理来模拟人类行为,在各个阶段确保对齐。为了探索驱动人类和LLM代理作为旁观者在涉及他人的不公正情况下干预的各种社会经济动机,即我们所称的信念,并探讨这些信念如何相互作用以影响个体行为,我们将相关社会学领域的知识纳入其中,并基于递归奖励模型(RRM)提出了信念-奖励对齐行为进化模型(BREM)。我们的研究结果表明,从行为角度来看,GPT-4o表现出更强的社会正义感,而人类则展现出更丰富的情感。此外,我们还讨论了情绪对行为的潜在影响。本研究为LLM与利他价值观对齐的应用提供了理论基础。| |**2024-10-14**|**Beyond-RAG: Question Identification and Answer Generation in Real-Time Conversations**|Garima Agrawal 
et.al.|[2410.10136](http://arxiv.org/abs/2410.10136)|null|在客户联络中心,人工客服经常面临较长的平均处理时间(AHT),因为他们需要手动解析查询并检索相关的知识库(KB)文章。虽然使用大型语言模型(LLM)的检索增强生成(RAG)系统已被广泛应用于行业以协助此类任务,但在实时对话中,RAG系统面临着诸如查询公式不准确和频繁问题重复检索等问题。为了解决这些局限性,我们提出了一种决策支持系统,该系统可以超越RAG,在实时识别客户问题。如果查询匹配常见问题解答(FAQ),系统直接从FAQ数据库中检索答案;否则,通过RAG生成答案。我们的方法减少了对人工查询的依赖,使得响应能够在2秒内提供给客服人员。此系统部署在Minerva CQ的人工智能辅助解决方案中,提高了效率,缩短了AHT,并降低了运营成本。我们还引入了一个自动化的LLM代理工作流,当没有预定义的FAQ时,可以从历史记录中识别FAQ。|
@@ -292,16 +292,16 @@
 |Publish Date|Title|Authors|PDF|Code|abstract|
 |---|---|---|---|---|---|
-|**2024-10-17**|**Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens**|Lijie Fan et.al.|[2410.13863](http://arxiv.org/abs/2410.13863)|null|在视觉领域,扩大自回归模型的效果并不像在大型语言模型中那样显著。在这项工作中,我们研究了文本到图像生成中的这一扩展问题,重点关注两个关键因素:模型是否使用离散或连续的标记,以及标记是通过BERT或GPT类的变换器架构以随机顺序还是固定栅格顺序生成。我们的实证结果表明,虽然所有模型在验证损失方面都有效扩展,但它们的评估性能(通过FID、GenEval分数和视觉质量来衡量)表现出不同的趋势。基于连续标记的模型比使用离散标记的模型在视觉质量上取得了显著更好的效果。此外,生成顺序和注意力机制对GenEval分数有显著影响:随机顺序模型比栅格顺序模型获得了显著更高的GenEval分数。受这些发现的启发,我们训练了一个基于连续标记的随机顺序自回归模型Fluid。Fluid 10.5B模型在MS-COCO 30K上的零样本FID达到了新的最先进水平,为6.16,并且在GenEval基准测试中的总体得分为0.69。我们希望我们的发现和结果能够鼓励未来的研究进一步缩小视觉和语言模型之间的扩展差距。|
-|**2024-10-17**|**PUMA: Empowering Unified MLLM with Multi-granular Visual Generation**|Rongyao Fang et.al.|[2410.13861](http://arxiv.org/abs/2410.13861)|**[link](https://github.com/rongyaofang/puma)**|**近期在多模态基础模型方面的进展显著提升了视觉-语言理解的能力。初步尝试也探索了多模态大型语言模型(MLLM)在视觉内容生成中的潜力。然而,现有工作未能充分解决不同图像生成任务在统一MLLM范式下对不同粒度需求的处理。在这项工作中,我们提出了PUMA,这是一种通过多粒度视觉生成来赋能统一MLLM的方法。PUMA将多粒度视觉特征统一作为MLLM的输入和输出,优雅地解决了不同粒度需求在统一MLLM框架下的各种图像生成任务。经过多模态预训练和任务特定指令微调后,PUMA展示了在广泛多模态任务中的能力。这项工作代表了向真正能够适应各种视觉任务对粒度需求的统一MLLM迈出的重要一步。代码和模型将在https://github.com/rongyaofang/PUMA发布。**|
-|**2024-10-17**|**$γ-$MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models**|Yaxin Luo
et.al.|[2410.13859](http://arxiv.org/abs/2410.13859)|null|尽管多模态大型语言模型(MLLMs)取得了显著进展,但其高昂的计算成本仍然是实际部署的障碍。受自然语言处理中深度混合(MoDs)的启发,我们从“激活标记”的角度来解决这一限制问题。我们的关键见解是,如果大多数标记对于层计算来说是冗余的,则可以通过MoD层直接跳过它们。然而,直接将MLLMs的密集层转换为MoD层会导致显著的性能下降。为了解决这个问题,我们提出了一种创新的MoD适应策略,称为$\gamma$-MoD,用于现有的MLLMs。在$\gamma$-MoD中,我们提出了一种新的指标来指导MLLM中的MoD部署,即注意力图的秩(ARank)。通过ARank,我们可以有效地识别哪些层是冗余的,并应被替换为MoD层。基于ARank,我们进一步提出了两种新颖的设计,以最大限度地提高MLLM的计算稀疏性,同时保持其性能,即共享视觉-语言路由器和掩码路由学习。通过这些设计,MLLM的超过90%的密集层可以有效地转换为MoD层。为了验证我们的方法,我们在三个流行的MLLMs上进行了应用,并在9个基准数据集上进行了广泛的实验。实验结果不仅验证了$\gamma$-MoD对现有MLLMs的显著效率优势,还确认了它在各种MLLMs上的泛化能力。例如,$\gamma$ -MoD仅造成轻微的性能下降,即-1.5%,但可以将LLaVA-HR的训练时间和推理时间分别减少31.0%和53.2%。| -|**2024-10-17**|**How Numerical Precision Affects Mathematical Reasoning Capabilities of LLMs**|Guhao Feng et.al.|[2410.13857](http://arxiv.org/abs/2410.13857)|null|尽管基于Transformer的大型语言模型(LLMs)在各个领域取得了显著的成功,但理解和提升它们的数学能力仍然是一个重要的挑战。在本文中,我们对LLMs的数学能力进行了严格的理论分析,特别关注它们在算术任务中的表现。我们发现数值精度是影响其在数学任务中效果的关键因素。我们的研究结果显示,使用低数值精度的Transformer在处理算术任务(如迭代加法和整数乘法)时,除非模型大小相对于输入长度呈超多项式增长,否则无法有效解决这些问题。相比之下,使用标准数值精度的Transformer能够以显著更小的模型规模高效地处理这些任务。我们还通过探索不同数值精度对算术任务的影响的实证实验进一步支持了我们的理论发现,为提高LLMs的数学推理能力提供了宝贵的见解。| -|**2024-10-17**|**Can MLLMs Understand the Deep Implication Behind Chinese Images?**|Chenhao Zhang et.al.|[2410.13854](http://arxiv.org/abs/2410.13854)|**[link](https://github.com/MING-ZCH/CII-Bench)**|**随着多模态大语言模型(MLLMs)的能力不断提升,对这些模型进行更高层次的感知和理解能力评估的需求也在增加。然而,目前缺乏针对MLLMs在中文视觉内容上进行高层次感知和理解能力评估的工作。为了填补这一空白,我们引入了“CII-Bench”(Chinese Image Implication understanding 
Benchtermark),这是一个旨在评估MLLMs对中国图像进行高层次感知和理解能力的基准。与现有基准相比,CII-Bench具有几个显著的特点。首先,为了确保中文语境的真实性,CII-Bench中的图像来自中国互联网,并经过人工审查,相应的答案也是人工精心制作的。此外,CII-Bench还纳入了代表中国传统文化的图像,如著名的中国传统画作,这可以深入反映模型对中国传统文化的理解。通过在多个MLLMs上进行广泛的实验,我们得出了重要发现。最初,MLLMs在CII-Bench上的表现与人类存在显著差距。MLLMs的最佳准确率达到了64.4%,而人类的平均准确率为78.2%,最高可达81.0%。随后,MLLMs在处理与中国传统文化相关的图像时表现较差,这表明它们在理解高层次语义方面存在局限性,并且缺乏对中国传统文化的深入了解。最后,我们观察到大多数模型在提示中加入图像情感线索后,其准确性有所提高。我们认为,CII-Bench将使MLLMs更好地理解中文语义和特定于中国的图像,推动迈向专家级通用人工智能(AGI)的进程。我们的项目可以在https://cii-bench.github.io/公开访问。**| -|**2024-10-17**|**Retrospective Learning from Interactions**|Zizhao Chen et.al.|[2410.13852](http://arxiv.org/abs/2410.13852)|null|多轮交互过程中,大规模语言模型(LLMs)和用户之间的对话自然包含了隐式的反馈信号。如果LLM以出乎意料的方式回应用户的指令,用户可能会通过重新表述请求、表达挫败感或转向替代任务来传达这些信号。这些信号与具体任务无关,并且占据语言的一个相对受限的子空间,即使LLM在实际任务上失败了,也能识别这些信号。这为LLM提供了不断从交互中学习的机会,而无需额外的注释。我们引入了一种名为ReSpect的方法,通过回顾过去交互中的这些信号来学习。我们在一个新的多模态交互场景中部署了ReSpect,在该场景中,人类指导LLM解决具有组合解空间的抽象推理任务。通过数千次与人类的交互,我们展示了ReSpect如何逐步提高任务完成率,从最初的31%提升到82%,并且整个过程没有任何外部注释。| -|**2024-10-17**|**SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction**|Xuan Zhang et.al.|[2410.13846](http://arxiv.org/abs/2410.13846)|**[link](https://github.com/sail-sg/simlayerkv)**|**近期在大型语言模型(LLM)方面的进展使其能够处理长上下文。然而,增加模型层数和输入序列的长度显著增加了存储键值(KV)缓存所需的内存,这对高效的推理构成了挑战。为了解决这个问题,我们提出了SimLayerKV,这是一种简单而有效的方法,通过选择性地删除识别出的懒惰层中的缓存来减少层间KV缓存的冗余。我们的方法基于这样的观察:在长上下文LLM中,某些层表现出“懒惰”行为,对建模长距离依赖性的贡献较少,不如非懒惰层。通过分析注意力权重模式,我们发现这些懒惰层在给定输入生成过程中对不同token的行为是一致的。这一见解启发了我们的SimLayerKV,它通过识别懒惰层并相应地减少它们的KV缓存来实现这一点。SimLayerKV是无需训练的、可泛化的,并且只需七行代码即可实现。我们在三个代表性LLM上进行了广泛的实验,例如LLaMA2-7B、LLaMA3-8B和Mistral-7B,在来自LongBench基准的16项任务上进行测试。结果显示,当结合4位量化时,SimLayerKV实现了5倍的KV缓存压缩比,仅导致1.2%的性能下降。我们的代码可在https://github.com/sail-sg/SimLayerKV获得。**| -|**2024-10-17**|**Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs**|Tianyu Guo et.al.|[2410.13835](http://arxiv.org/abs/2410.13835)|null|实践者们在变压器基础的大语言模型(LLMs)中观察到了三个令人困惑的现象:注意力 Sink、值状态耗尽和残差状态峰值,这些现象统称为极端标记现象。这些现象的特点是某些所谓的“Sink 
标记”接收不成比例高的注意力权重,表现出显著较小的值状态,并且具有比其他标记大得多的残差状态范数。这些极端标记导致了LLM推理、量化和可解释性中的各种挑战。我们阐明了极端标记现象背后的机制。首先,我们表明这些现象出现在非常简单的架构中——只有一到三层的变压器,在玩具模型Bigram-Backcopy(BB)任务上训练时也会出现。在这种情况下,我们识别出一种活跃-休眠机制,其中注意力头对于特定输入域成为Sink,而对于其他输入则不然。我们对训练动态的理论分析揭示,这些现象是由一种相互强化机制驱动的。基于这些见解,我们提出了在预训练期间缓解极端标记现象的策略,包括用ReLU替换softmax以及用SGD替换Adam。接下来,我们将分析扩展到预训练的LLMs,包括Llama和OLMo,显示许多注意力头表现出与BB任务中类似的活跃-休眠机制,并且相互强化机制也控制着LLM预训练期间极端标记现象的出现。我们的结果揭示了许多由BB任务预测的极端标记现象的静态和动态特性与在预训练的LLMs中的观察结果一致。| -|**2024-10-17**|**AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents**|Ke Yang et.al.|[2410.13825](http://arxiv.org/abs/2410.13825)|null|通过使用大型语言模型(LLMs)的代理来实现自治,可以提高人类在个性化和标准化任务中的效率。自动化网络任务(如预订预算内的酒店)的需求日益增加。这些网络代理不仅满足实际需求,还作为各种代理接地场景的重要概念验证示例,其成功预示着许多未来应用的进步。先前的研究通常手工设计网络代理策略(例如,提示模板、多代理系统、搜索方法等),而这些策略可能无法很好地推广到所有现实世界场景中。另一方面,关于网络代理的观察/动作表示与基于该代理的LLM的预训练数据之间错位的研究非常有限。这种差异尤其明显,因为LLM主要针对语言完成进行训练,而不是涉及具身导航动作和符号化网络元素的任务。我们的研究通过简单地优化观察和动作空间以更好地适应LLM的能力,从而增强了一个基于LLM的网络代理。这种方法使我们的基础代理在广泛的网络任务上显著优于以前的方法。具体来说,在WebArena基准测试中,该基准测试涵盖了通用的网络交互任务,我们的代理AgentOccam比之前的最先进方法和同期工作分别高出9.8分(+29.4%)和5.9分(+15.8%),并且成功率为26.6分(+161%),超过了类似的基本网络代理,这些代理的观察和动作空间没有对齐。我们没有使用上下文示例、新的代理角色、在线反馈或搜索策略。AgentOccam的简单设计突显了LLMs在无样本情况下处理网络任务的强大性能,并强调了精心调整观察和动作空间对于基于LLM的代理的关键作用。| -|**2024-10-17**|**Harnessing Webpage UIs for Text-Rich Visual Understanding**|Junpeng Liu et.al.|[2410.13824](http://arxiv.org/abs/2410.13824)|null|文本丰富的视觉理解——即处理密集文本内容与视觉元素相结合的环境的能力——对于多模态大型语言模型(MLLMs)在与结构化环境交互时至关重要。为了增强这种能力,我们提出使用基于文本的大型语言模型(LLMs)从网页用户界面(UI)合成通用的多模态指令。尽管缺乏直接的视觉输入,基于文本的LLMs能够处理来自网页可访问性树的结构化文本表示。这些指令随后与UI截图配对以训练多模态模型。我们引入了MultiUI数据集,该数据集包含来自100万个网站的730万个样本,涵盖了多样化的多模态任务和UI布局。在MultiUI上训练的模型不仅在网页UI任务中表现出色——在VisualWebBench上实现了高达48%的提升,在Mind2Web网页代理数据集上的动作准确率提高了19.1%——而且在非网页UI任务以及甚至非UI领域(如文档理解、OCR和图表解释)中的泛化效果也非常好。这些结果突显了网页UI数据在推进各种场景下文本丰富的视觉理解方面的广泛应用。| +|**2024-10-17**|**Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens**|Lijie Fan 
et.al.|[2410.13863](http://arxiv.org/abs/2410.13863)|null|在视觉领域中扩大自回归模型的效果并不像在大型语言模型中那样显著。在这项工作中,我们研究了文本到图像生成中的这种扩展问题,并重点关注两个关键因素:模型是否使用离散或连续的标记,以及标记是通过类似于BERT或GPT的变换器架构以随机还是固定栅格顺序生成的。我们的实证结果表明,虽然所有模型在验证损失方面都能有效扩展,但它们的评估性能——通过FID、GenEval得分和视觉质量来衡量——则表现出不同的趋势。基于连续标记的模型在视觉质量上明显优于那些使用离散标记的模型。此外,生成顺序和注意力机制对GenEval得分有显著影响:随机顺序模型在GenEval得分上明显优于栅格顺序模型。受这些发现的启发,我们训练了一种名为Fluid的随机顺序自回归模型,该模型基于连续标记。Fluid 10.5B模型在MS-COCO 30K上的零样本FID得分为6.16,在GenEval基准上的总体得分为0.69。我们希望我们的发现和结果能鼓励未来的研究进一步缩小视觉和语言模型之间的扩展差距。| +|**2024-10-17**|**PUMA: Empowering Unified MLLM with Multi-granular Visual Generation**|Rongyao Fang et.al.|[2410.13861](http://arxiv.org/abs/2410.13861)|**[link](https://github.com/rongyaofang/puma)**|**近年来,多模态基础模型在视觉-语言理解方面取得了显著进展。初步尝试也探索了多模态大语言模型(MLLM)在视觉内容生成中的潜力。然而,现有工作未能充分解决统一MLLM范式下不同图像生成任务的多样化粒度需求——从文本到图像生成所需的多样性到图像操作所需的精确可控性。在这项工作中,我们提出了PUMA,即通过多粒度视觉生成赋能统一的MLLM。PUMA将多粒度视觉特征统一作为MLLM的输入和输出,优雅地解决了不同粒度要求的各种图像生成任务。经过多模态预训练和任务特定指令微调后,PUMA展示了在广泛多模态任务中的能力。这项工作代表了朝着真正统一的MLLM迈出的重要一步,这种MLLM能够适应各种视觉任务的粒度需求。代码和模型将在https://github.com/rongyaofang/PUMA发布。**| +|**2024-10-17**|**$γ-$ MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models**|Yaxin Luo et.al.|[2410.13859](http://arxiv.org/abs/2410.13859)|null|尽管多模态大型语言模型(MLLMs)取得了显著进展,但其高昂的计算成本仍然是实际部署的障碍。受自然语言处理中深度混合(MoD)的启发,我们从“激活标记”的角度着手解决这一局限性。我们的关键见解是,如果大多数标记对于层计算是冗余的,则可以通过MoD层直接跳过它们。然而,直接将MLLMs的密集层转换为MoD层会导致显著的性能下降。为了解决这个问题,我们提出了一种针对现有MLLMs的创新MoD适应策略,称为γ-MoD。在γ-MoD中,我们提出了一种新的度量方法来指导MLLM中的MoD部署,即注意力图的秩(ARank)。通过ARank,我们可以有效地识别哪些层是冗余的,并应被替换为MoD层。基于ARank,我们进一步提出了两种新颖的设计,以最大限度地提高MLLM的计算稀疏性,同时保持其性能,即共享视觉-语言路由器和掩码路由学习。通过这些设计,超过90%的MLLM密集层可以有效转换为MoD层。为了验证我们的方法,我们将其应用于三个流行的MLLM,并在9个基准数据集上进行了广泛的实验。实验结果不仅验证了γ-MoD对现有MLLMs的显著效率提升,还确认了其在各种MLLM上的泛化能力。例如,γ-MoD仅导致轻微的性能下降,即-1.5%,但可以分别将LLaVA-HR的训练和推理时间减少31.0%和53.2%。| +|**2024-10-17**|**How Numerical Precision Affects Mathematical Reasoning Capabilities of LLMs**|Guhao Feng 
et.al.|[2410.13857](http://arxiv.org/abs/2410.13857)|null|尽管基于Transformer的大型语言模型(LLMs)在各个领域取得了显著成功,但理解和提升其数学能力仍然是一个重大挑战。在本文中,我们对LLMs的数学能力进行了严格的理论分析,特别关注它们的算术性能。我们发现数值精度是影响其在数学任务中的有效性的一个关键因素。研究结果表明,采用低数值精度的Transformer在处理算术任务(如迭代加法和整数乘法)时,除非模型大小相对于输入长度呈超多项式增长,否则无法有效解决这些问题。相比之下,采用标准数值精度的Transformer可以高效地处理这些任务,并且所需的模型尺寸要小得多。我们还通过实验进一步支持了我们的理论发现,这些实验探索了不同数值精度对算术任务的影响,为提高LLMs的数学推理能力提供了宝贵的见解。| +|**2024-10-17**|**Can MLLMs Understand the Deep Implication Behind Chinese Images?**|Chenhao Zhang et.al.|[2410.13854](http://arxiv.org/abs/2410.13854)|**[link](https://github.com/MING-ZCH/CII-Bench)**|**随着多模态大型语言模型(MLLMs)的能力不断提升,对这些模型进行更高阶能力评估的需求也在增加。然而,目前缺乏针对MLLMs在中文视觉内容的高阶感知和理解能力进行评估的工作。为了填补这一空白,我们引入了“中文图像隐含理解基准”(CII-Bench),旨在评估MLLMs对中国图像的高阶感知和理解能力。CII-Bench在多个方面与现有基准有所不同。首先,为了确保中文语境的真实性,CII-Bench中的图像均来自中国互联网,并经过人工审核,相应的答案也是手动编写的。此外,CII-Bench还纳入了一些代表中国传统文化的图像,如著名的中国传统绘画,这可以深入反映模型对中国传统文化的理解。通过在多个MLLMs上广泛实验,我们得出了显著的发现。最初,MLLMs在CII-Bench上的表现与人类之间存在明显差距。MLLMs的最佳准确率为64.4%,而人类的平均准确率为78.2%,最高达到令人印象深刻的81.0%。随后,MLLMs在处理中国传统文化图像时表现较差,表明它们在理解高层次语义和缺乏中国传统文化深度知识库方面存在局限。最后,观察到大多数模型在提示中加入图像情感提示后准确性有所提高。我们认为,CII-Bench将使MLLMs更好地理解中文语义和特定于中国的图像,从而推动迈向专家级通用人工智能(AGI)的进程。我们的项目公开可访问,网址为https://cii-bench.github.io/。**| +|**2024-10-17**|**Retrospective Learning from Interactions**|Zizhao Chen et.al.|[2410.13852](http://arxiv.org/abs/2410.13852)|null|多轮交互中的大型语言模型(LLMs)和用户之间的对话自然包含了隐式的反馈信号。如果LLM以出乎意料的方式回应指令,用户可能会通过重新表述请求、表达挫败感或转向替代任务来传达这一信号。这些信号与任务无关,并且占据相对受限的语言子空间,使得LLM即使在实际任务上失败也能识别它们。这为持续从交互中学习开辟了一条途径,而无需额外的注释。我们引入了ReSpect方法,通过回顾过去的交互来学习这些信号。我们在一个新的多模态交互场景中部署了ReSpect,在这个场景中,人类指导LLM解决具有组合解空间的抽象推理任务。通过与人类进行数千次交互,我们展示了ReSpect如何逐步提高任务完成率,从最初的31%提升到82%,并且整个过程没有任何外部注释。| +|**2024-10-17**|**SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction**|Xuan Zhang 
et.al.|[2410.13846](http://arxiv.org/abs/2410.13846)|**[link](https://github.com/sail-sg/simlayerkv)**|**近期大型语言模型(LLMs)的发展已经扩展了它们处理长上下文的能力。然而,增加模型层数和输入序列长度显著增加了存储键值(KV)缓存所需的内存,这给高效的推理带来了挑战。为了缓解这一问题,我们提出了SimLayerKV,这是一种简单而有效的方法,通过在识别出的懒惰层中选择性地丢弃缓存来减少层间KV缓存的冗余。我们的方法基于这样的观察:在长上下文LLMs中的某些层表现出“懒惰”行为,与非懒惰层相比,这些层对建模长距离依赖贡献较小。通过对注意力权重模式进行分析,我们发现对于给定输入,在生成过程中这些懒惰层的行为是一致的。这一见解启发了我们的SimLayerKV方法,该方法能够识别懒惰层并相应地减少其KV缓存。SimLayerKV无需训练,具有通用性,并且只需七行代码即可实现。我们在三个代表性LLM上进行了广泛的实验,例如LLaMA2-7B、LLaMA3-8B和Mistral-7B,涉及LongBench基准中的16个任务。结果显示,当结合4位量化时,SimLayerKV实现了5倍的KV缓存压缩比,性能仅下降1.2%。我们的代码可以在https://github.com/sail-sg/SimLayerKV获取。**| +|**2024-10-17**|**Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs**|Tianyu Guo et.al.|[2410.13835](http://arxiv.org/abs/2410.13835)|null|实践者们在变压器型大型语言模型(LLMs)中观察到了三种令人困惑的现象:注意力汇点、价值状态耗尽和残差状态峰值,这些现象统称为极端标记现象。这些现象的特点是某些所谓的“汇点标记”接收不成比例高的注意力权重,表现出明显较小的价值状态,并且具有比其他标记大得多的残差状态范数。这些极端标记在LLM推理、量化和可解释性方面引发了许多挑战。我们阐明了极端标记现象背后的机制。首先,我们在非常简单的架构中展示了这些现象——仅有一到三层的变压器,在玩具模型Bigram-Backcopy(BB)任务上训练时会出现这些现象。在这种情况下,我们确定了一种活跃-休眠机制,其中注意力头在特定输入域中成为汇点,而在其他域中则不是。我们的训练动态理论分析揭示,这些现象是由一种相互增强机制驱动的。基于这些见解,我们提出了一些策略来在预训练期间缓解极端标记现象,包括用ReLU替换softmax以及用SGD替换Adam。接下来,我们将分析扩展到预训练的LLMs,包括Llama和OLMo,结果显示许多注意力头表现出与BB任务中类似的活跃-休眠机制,并且相互增强机制也支配着LLM预训练期间极端标记现象的出现。我们的结果揭示了由BB任务预测的许多静态和动态性质与在预训练LLMs中的观察结果一致。| +|**2024-10-17**|**AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents**|Ke Yang 
et.al.|[2410.13825](http://arxiv.org/abs/2410.13825)|null|通过使用大型语言模型(LLMs)的代理来实现个性化、标准化任务,可以提升人类的工作效率。自动化网络任务(如在预算内预订酒店)的需求日益增加。这些网络代理不仅能满足实际需求,还作为各种代理接地场景的重要概念验证示例,其成功将预示着许多未来应用的进步。先前的研究通常手工设计网络代理策略(例如提示模板、多代理系统、搜索方法等),而这些策略可能无法很好地泛化到所有现实世界场景。另一方面,关于网络代理的观察/动作表示与LLM预训练数据之间的不匹配研究较少。这种差异尤其明显,当LLM主要针对语言完成而非涉及具身导航动作和符号网络元素的任务进行训练时。我们的研究通过简单地优化LLM网络代理的观察和动作空间,使其更好地与LLM的能力相匹配,从而提升了性能。这种方法使我们基于基础代理的AgentOccam在广泛的网络任务上显著优于以前的方法。具体而言,在WebArena基准测试中,该基准测试涵盖了一般用途的网络交互任务,AgentOccam比前最先进的方法和同期工作分别高出9.8分(+29.4%)和5.9分(+15.8%),并且成功率达到26.6分(+161%),超过了类似的基础网络代理。我们没有使用上下文示例、新代理角色、在线反馈或搜索策略。AgentOccam的简洁设计突显了LLM在无样本学习下执行网络任务的强大能力,并强调了精心调整观察和动作空间对于基于LLM的代理的重要性。| +|**2024-10-17**|**Harnessing Webpage UIs for Text-Rich Visual Understanding**|Junpeng Liu et.al.|[2410.13824](http://arxiv.org/abs/2410.13824)|null|文本丰富的视觉理解——即处理密集文本内容与视觉元素相融合的环境的能力,对于多模态大型语言模型(MLLMs)有效交互结构化环境至关重要。为了增强这种能力,我们提出利用基于文本的大型语言模型(LLMs)从网页用户界面合成通用的多模态指令。尽管缺乏直接的视觉输入,基于文本的LLMs能够处理来自网页可访问性树的结构化文本表示。这些指令随后与UI截图配对以训练多模态模型。我们引入了MultiUI数据集,该数据集包含来自100万个网站的730万样本,涵盖了多样化的多模态任务和UI布局。在MultiUI上训练的模型不仅在网页UI任务中表现出色——在VisualWebBench上的性能提升高达48%,在网页代理数据集Mind2Web上的动作准确性提高了19.1%——而且在非网页UI任务以及非UI领域(如文档理解、OCR和图表解释)中的泛化效果也出乎意料地好。这些结果突显了网页UI数据在推进各种场景下文本丰富的视觉理解方面的广泛应用。| |**2024-10-16**|**Meta-Chunking: Learning Efficient Text Segmentation via Logical Perception**|Jihao Zhao et.al.|[2410.12788](http://arxiv.org/abs/2410.12788)|null| Retrieval-Augmented Generation(RAG)在作为大型语言模型(LLMs)的可行补充时,常常忽略了其管道中一个关键方面——文本分块,这影响了知识密集型任务的质量。本文介绍了一种称为元分块(Meta-Chunking)的概念,这是一种介于句子和段落之间的粒度,由段落内具有深层次语言逻辑联系的一组句子组成。为了实现元分块,我们基于LLMs设计了两种策略:边界采样分块和困惑度分块。前者利用LLMs对连续句子是否需要分割进行二分类决策,基于从边界采样获得的概率差做出决策。后者通过分析困惑度分布的特点来精确识别文本分块边界。此外,考虑到不同文本的固有复杂性,我们提出了一种结合元分块与动态合并的策略,以实现在细粒度和粗粒度文本分块之间取得平衡。实验在十一个数据集上进行,结果表明元分块可以更有效地提高基于RAG的单跳和多跳问答性能。例如,在2WikiMultihopQA数据集上,它比相似性分块提高了1.32的性能,同时仅消耗了45.8%的时间。我们的代码可在https://github.com/IAAR-Shanghai/Meta-Chunking 获取。| |**2024-10-16**|**In-Context Learning Enables Robot Action Prediction in LLMs**|Yida Yin 
et.al.|[2410.12782](http://arxiv.org/abs/2410.12782)|null|最近,大型语言模型(LLMs)在语言领域通过上下文学习(ICL)取得了显著的成功。然而,利用LLMs的ICL能力直接预测机器人动作的研究还相对较少。在这篇论文中,我们介绍了一种名为RoboPrompt的框架,该框架使现成的纯文本LLMs能够在无需训练的情况下通过ICL直接预测机器人动作。我们的方法首先通过启发式方法识别出一个片段中的关键帧,这些关键帧捕捉了重要的时刻。接下来,我们从这些关键帧中提取末端执行器的动作以及估计的初始物体姿态,并将两者转换为文本描述。最后,我们构建了一个结构化的模板,从这些文本描述和任务指令中形成ICL演示。这使得LLM能够在测试时直接预测机器人动作。通过广泛的实验和分析,RoboPrompt在模拟和真实环境中均表现出比零样本和ICL基线更强的性能。|
 |**2024-10-16**|**Identifying Task Groupings for Multi-Task Learning Using Pointwise V-Usable Information**|Yingya Li et.al.|[2410.12774](http://arxiv.org/abs/2410.12774)|null|多任务学习的成功在很大程度上取决于任务的分组方式。简单地将所有任务或随机选择的任务组合在一起可能导致负迁移,从而使多任务模型的表现不如单任务模型。尽管已经做出了许多努力来识别任务分组并衡量不同任务之间的相关性,但定义一个指标以从众多潜在任务组合中确定最佳任务分组仍然是一个具有挑战性的研究课题。我们提出了一种基于点式V-可用信息(PVI)测量任务难度的任务相关性度量方法。PVI是一种新近提出的度量标准,用于估计给定模型时数据集包含多少可用信息。我们假设具有统计上不可区分的PVI估计值的任务足够相似,可以从联合学习过程中受益。我们在一般、生物医学和临床领域的15个NLP数据集上进行了全面实验,以评估该度量方法用于任务分组的可行性。我们将联合学习器的结果与单任务学习器、现有基线方法以及最近的大规模语言模型(包括Llama 2和GPT-4)进行了比较。结果显示,通过将具有相似PVI估计值的任务分组,联合学习器在较少总参数的情况下获得了具有竞争力的结果,并且在不同领域内表现一致。|
diff --git a/docs/agent-arxiv-daily.json b/docs/agent-arxiv-daily.json
index 1b19403d3b2..da0234b5962 100644
--- a/docs/agent-arxiv-daily.json
+++ b/docs/agent-arxiv-daily.json
@@ -1 +1 @@
-{"agent": {"2405.10255": "|**2024-05-16**|**When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models**|Xianzheng Ma
et.al.|[2405.10255](http://arxiv.org/abs/2405.10255)|**[link](https://github.com/activevisionlab/awesome-llm-3d)**|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u4e0d\u65ad\u53d1\u5c55\uff0c\u5b83\u4eec\u4e0e\u4e09\u7ef4\u7a7a\u95f4\u6570\u636e\uff083D-LLMs\uff09\u7684\u878d\u5408\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\uff0c\u8fd9\u6781\u5927\u5730\u589e\u5f3a\u4e86\u7406\u89e3\u548c\u4e92\u52a8\u7269\u7406\u73af\u5883\u7684\u80fd\u529b\u3002\u8fd9\u7bc7\u7efc\u8ff0\u8be6\u7ec6\u63a2\u8ba8\u4e86\u4f7fLLMs\u80fd\u591f\u5904\u7406\u3001\u7406\u89e3\u5e76\u751f\u6210\u4e09\u7ef4\u6570\u636e\u7684\u65b9\u6cd5\u8bba\uff0c\u5f3a\u8c03\u4e86LLMs\u7684\u72ec\u7279\u4f18\u52bf\uff0c\u5982\u4e0a\u4e0b\u6587\u5b66\u4e60\u3001\u9010\u6b65\u63a8\u7406\u3001\u5f00\u653e\u8bcd\u6c47\u80fd\u529b\u548c\u4e30\u5bcc\u7684\u4e16\u754c\u77e5\u8bc6\uff0c\u8fd9\u4e9b\u5c06\u6781\u5927\u5730\u63a8\u52a8\u5d4c\u5165\u5f0f\u4eba\u5de5\u667a\u80fd\uff08AI\uff09\u7cfb\u7edf\u5728\u7a7a\u95f4\u8ba4\u77e5\u548c\u4ea4\u4e92\u65b9\u9762\u7684\u53d1\u5c55\u3002\u7814\u7a76\u6db5\u76d6\u4e86\u4ece\u70b9\u4e91\u5230\u795e\u7ecf\u8f90\u5c04\u573a\uff08NeRF\uff09\u7b49\u5404\u79cd\u4e09\u7ef4\u6570\u636e\u8868\u793a\uff0c\u5e76\u8003\u5bdf\u4e86\u5b83\u4eec\u4e0eLLMs\u5728\u4efb\u52a1\u4e2d\u7684\u96c6\u6210\uff0c\u5982\u4e09\u7ef4\u573a\u666f\u7406\u89e3\u3001\u63cf\u8ff0\u3001\u95ee\u7b54\u548c\u5bf9\u8bdd\uff0c\u4ee5\u53ca\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u8fdb\u884c\u7a7a\u95f4\u63a8\u7406\u3001\u89c4\u5212\u548c\u5bfc\u822a\u3002\u8bba\u6587\u8fd8\u7b80\u8981\u56de\u987e\u4e86\u5176\u4ed6\u7ed3\u5408\u4e09\u7ef4\u548c\u8bed\u8a00\u7684\u65b9\u6cd5\u3002\u672c\u6587\u7684\u5143\u5206\u6790\u63ed\u793a\u4e86\u660e\u663e\u7684\u8fdb\u5c55\uff0c\u4f46\u4e5f\u5f3a\u8c03\u4e86\u5f00\u53d1\u65b0\u65b9\u6cd5\u4ee5\u5145\u5206\u5229\u75283D-LLMs\u6f5c\u529b\u7684\u5fc5\u8981\u6027\u3002\u56e0\u6b64\uff0c\u672c\u6587\u65e8\u5728\u4e3a\u672a\u6765\u7684\u7814\u7a76\u65b9\u5411\u63
07\u660e\u9053\u8def\uff0c\u63a2\u7d22\u548c\u6269\u5c553D-LLMs\u5728\u7406\u89e3\u548c\u4e92\u52a8\u590d\u6742\u4e09\u7ef4\u4e16\u754c\u7684\u80fd\u529b\u3002\u4e3a\u4e86\u652f\u6301\u672c\u7efc\u8ff0\uff0c\u6211\u4eec\u5df2\u5728GitHub\u4e0a\u5efa\u7acb\u4e86\u4e00\u4e2a\u9879\u76ee\u9875\u9762\uff0c\u6574\u7406\u5e76\u5217\u51fa\u4e86\u76f8\u5173\u8bba\u6587\uff1ahttps://github.com/ActiveVisionLab/Awesome-LLM-3D\u3002|\n", "2405.09935": "|**2024-05-24**|**DEBATE: Devil's Advocate-Based Assessment and Text Evaluation**|Alex Kim et.al.|[2405.09935](http://arxiv.org/abs/2405.09935)|**[link](https://github.com/gunny97/DEBATE)**|\u968f\u7740\u81ea\u7136\u8bed\u8a00\u751f\u6210\uff08NLG\uff09\u6a21\u578b\u7684\u666e\u53ca\uff0c\u7cfb\u7edf\u5730\u8bc4\u4f30\u673a\u5668\u751f\u6210\u6587\u672c\u7684\u8d28\u91cf\u53d8\u5f97\u65e5\u76ca\u5173\u952e\u3002\u8fd1\u671f\u7684\u7814\u7a76\u5f15\u5165\u4e86\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u65e0\u53c2\u8003\u8bc4\u4ef7\u5668\uff0c\u5b83\u4eec\u5c55\u73b0\u51fa\u5904\u7406\u65b0\u4efb\u52a1\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u6a21\u578b\u901a\u5e38\u91c7\u7528\u5355\u4ee3\u7406\u65b9\u6cd5\uff0c\u6211\u4eec\u8ba4\u4e3a\u8fd9\u9650\u5236\u4e86\u5b83\u4eec\u7684\u8868\u73b0\u3002\u56e0\u4e3aLLM\u4ee3\u7406\u7684\u56de\u7b54\u5b58\u5728\u504f\u89c1\uff0c\u6bd4\u5982\u5bf9\u7279\u5b9a\u6587\u672c\u7ed3\u6784\u6216\u5185\u5bb9\u7684\u504f\u597d\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5728\u672c\u5de5\u4f5c\u4e2d\u63d0\u51faDEBATE\uff0c\u4e00\u4e2a\u5efa\u7acb\u5728\u591a\u4ee3\u7406\u8bc4\u5206\u7cfb\u7edf\u57fa\u7840\u4e0a\u7684NLG\u8bc4\u4ef7\u6846\u67b6\uff0c\u878d\u5165\u4e86\u201c\u6076\u9b54\u8fa9\u624b\u201d\u7684\u6982\u5ff5\u3002\u5728\u8be5\u6846\u67b6\u4e2d\uff0c\u4e00\u4e2a\u4ee3\u7406\u88ab\u6307\u4ee4\u6279\u8bc4\u5176\u4ed6\u4ee3\u7406\u7684\u8bba\u70b9\uff0c\u4ece\u800c\u53ef\u80fd\u6d88\u89e3LLM\u4ee3\u7406\u7b54\u6848\u4e2d\u7684\u504f\u89c1\u3002DEBATE\u5728\u4e24
\u4e2aNLG\u8bc4\u4ef7\u5143\u8bc4\u4f30\u57fa\u51c6\u2014\u2014SummEval\u548cTopicalChat\u4e0a\u663e\u8457\u4f18\u4e8e\u5148\u524d\u7684\u6700\u4f73\u65b9\u6cd5\u3002\u6211\u4eec\u8fd8\u53d1\u73b0\uff0c\u4ee3\u7406\u4e4b\u95f4\u7684\u8fa9\u8bba\u5e7f\u5ea6\u4ee5\u53ca\u4ee3\u7406\u7684\u4eba\u683c\u7279\u8d28\u4f1a\u5f71\u54cd\u8bc4\u4ef7\u5668\u7684\u6027\u80fd\u3002|\n", "2405.05175": "|**2024-05-08**|**Air Gap: Protecting Privacy-Conscious Conversational Agents**|Eugene Bagdasaryan et.al.|[2405.05175](http://arxiv.org/abs/2405.05175)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5bf9\u8bdd\u5f0f\u4ee3\u7406\u4e2d\u7684\u5e7f\u6cdb\u5e94\u7528\uff0c\u5904\u7406\u654f\u611f\u7528\u6237\u6570\u636e\u65f6\u5f15\u53d1\u4e86\u4e25\u91cd\u7684\u9690\u79c1\u95ee\u9898\u3002\u8fd9\u4e9b\u4ee3\u7406\u867d\u80fd\u7406\u89e3\u5e76\u5904\u7406\u4e0a\u4e0b\u6587\uff0c\u4f46\u4e5f\u53ef\u80fd\u88ab\u6076\u610f\u4e00\u65b9\u5229\u7528\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u5a01\u80c1\u6a21\u578b\uff0c\u5373\u7b2c\u4e09\u65b9\u5e94\u7528\u901a\u8fc7\u64cd\u63a7\u4ea4\u4e92\u4e0a\u4e0b\u6587\uff0c\u8bef\u5bfcLLM\u4ee3\u7406\u6cc4\u9732\u4e0e\u5176\u4efb\u52a1\u65e0\u5173\u7684\u79c1\u4eba\u4fe1\u606f\u3002\u5728\u57fa\u4e8e\u4e0a\u4e0b\u6587\u5b8c\u6574\u6027\u6846\u67b6\u7684\u57fa\u7840\u4e0a\uff0c\u6211\u4eec\u5f00\u53d1\u4e86AirGapAgent\uff0c\u8fd9\u662f\u4e00\u79cd\u6ce8\u91cd\u9690\u79c1\u7684\u4ee3\u7406\uff0c\u65e8\u5728\u901a\u8fc7\u9650\u5236\u4ee3\u7406\u4ec5\u8bbf\u95ee\u5b8c\u6210\u7279\u5b9a\u4efb\u52a1\u6240\u9700\u7684\u6570\u636e\uff0c\u9632\u6b62\u610f\u5916\u7684\u6570\u636e\u6cc4\u6f0f\u3002\u5b9e\u9a8c\u4f7f\u7528Gemini\u3001GPT\u548cMistral\u6a21\u578b\u4f5c\u4e3a\u4ee3\u7406\uff0c\u7ed3\u679c\u663e\u793aAirGapAgent\u5728\u62b5\u5fa1\u57fa\u4e8e\u5355\u4e2a\u67e5\u8be2\u7684\u4e0a\u4e0b\u6587\u52ab\u6301\u653b\u51fb\u65b9\u9762\u8868\u73b0\u51fa\u8272\u3002\u4f8b\u5982\uff0c\u5bf9\u4e8eGem
ini Ultra\u4ee3\u7406\uff0c\u8fd9\u79cd\u653b\u51fb\u4ece94%\u7684\u4fdd\u62a4\u80fd\u529b\u964d\u4f4e\u523045%\uff0c\u800cAirGapAgent\u53ef\u4ee5\u4fdd\u630197%\u7684\u9632\u62a4\u6548\u679c\uff0c\u4f7f\u540c\u6837\u7684\u653b\u51fb\u5931\u6548\u3002|\n", "2405.04325": "|**2024-05-07**|**Deception in Reinforced Autonomous Agents: The Unconventional Rabbit Hat Trick in Legislation**|Atharvan Dogra et.al.|[2405.04325](http://arxiv.org/abs/2405.04325)|null|\u8fd1\u671f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u8fdb\u5c55\u867d\u4e3a\u6784\u5efa\u81ea\u7136\u8bed\u8a00\u4ee3\u7406\u63d0\u4f9b\u4e86\u5f3a\u5927\u57fa\u7840\uff0c\u4f46\u540c\u65f6\u4e5f\u5f15\u53d1\u4e86\u5173\u4e8e\u5b83\u4eec\u53ca\u5176\u57fa\u4e8e\u5b83\u4eec\u6784\u5efa\u7684\u81ea\u4e3b\u4ee3\u7406\u7684\u5b89\u5168\u6027\u62c5\u5fe7\u3002\u7279\u522b\u662f\u6b3a\u9a97\u80fd\u529b\u662f\u4e00\u4e2a\u5173\u952e\u95ee\u9898\uff0c\u6211\u4eec\u5173\u6ce8\u7684\u662fAI\u4ee3\u7406\u901a\u8fc7\u6df7\u6dc6\u548c\u6a21\u68f1\u4e24\u53ef\u6765\u8bef\u5bfc\u3001\u9690\u85cf\u771f\u76f8\u6216\u63a8\u5e7f\u90e8\u5206\u4e0d\u771f\u5b9e\u7684\u4fe1\u5ff5\u7684\u884c\u4e3a\u3002\u4e0d\u540c\u4e8e\u4ee5\u5f80AI\u5b89\u5168\u7814\u7a76\u4e2d\u7684\u6492\u8c0e\u3001\u81ea\u79c1\u51b3\u7b56\u6216\u63d0\u4f9b\u865a\u5047\u4fe1\u606f\uff0c\u6211\u4eec\u805a\u7126\u4e8e\u4e00\u7c7b\u7279\u6b8a\u7684\u6b3a\u9a97\uff1a\u7c7b\u4f3c\u4e8e\u9b54\u672f\u5e08\u5229\u7528\u969c\u773c\u6cd5\u8ba9\u5154\u5b50\u4ece\u5e3d\u5b50\u91cc\u51fa\u73b0\uff0c\u8981\u4e48\u901a\u8fc7\u9690\u85cf\u7684\u6697\u95e8\uff0c\u8981\u4e48\u901a\u8fc7\u8f6c\u79fb\u6ce8\u610f\u529b\u76f4\u63a5\u5c55\u793a\u3002 
\u6211\u4eec\u7684\u65b0\u5b9e\u9a8c\u5e73\u53f0\u5728\u4e00\u4e2a\u6709\u76ee\u6807\u7684\u73af\u5883\u4e2d\u5c55\u793a\u4e86LLM\u4ee3\u7406\u5728\u5bf9\u6297\u6027\u5bf9\u8bdd\u7cfb\u7edf\u4e2d\u8fdb\u884c\u81ea\u7136\u8bed\u8a00\u751f\u6210\u65f6\u7684\u6b3a\u9a97\u56fa\u6709\u80fd\u529b\uff0c\u8be5\u7cfb\u7edf\u57fa\u4e8e\u7acb\u6cd5\u4efb\u52a1\u201c\u6e38\u8bf4\u201d\u8bae\u6848\u3002\u5728\u76ee\u6807\u9a71\u52a8\u7684\u73af\u5883\u4e2d\uff0c\u6211\u4eec\u901a\u8fc7\u5f3a\u5316\u5b66\u4e60\u65b9\u6cd5\u6784\u5efa\u6b3a\u9a97\u80fd\u529b\uff0c\u7ed3\u5408\u8bed\u8a00\u54f2\u5b66\u548c\u8ba4\u77e5\u5fc3\u7406\u5b66\u7406\u8bba\u3002\u7814\u7a76\u53d1\u73b0\uff0c\u6e38\u8bf4\u4ee3\u7406\u5728\u5bf9\u6297\u4e92\u52a8\u7684\u540e\u7eed\u5f3a\u5316\u8bd5\u9a8c\u4e2d\u5176\u6b3a\u9a97\u80fd\u529b\u63d0\u9ad8\u4e86\u7ea640%\uff0c\u5e76\u4e14\u6211\u4eec\u7684\u6b3a\u9a97\u68c0\u6d4b\u673a\u5236\u80fd\u8fbe\u5230\u9ad8\u8fbe92%\u7684\u8bc6\u522b\u7387\u3002\u8fd9\u4e9b\u7ed3\u679c\u63ed\u793a\u4e86\u4eba\u673a\u4ea4\u4e92\u4e2d\u7684\u6f5c\u5728\u95ee\u9898\uff0c\u5373\u4ee3\u7406\u53ef\u80fd\u64cd\u7eb5\u4eba\u7c7b\u4ee5\u8fbe\u6210\u9884\u8bbe\u76ee\u6807\u3002|\n", "2405.04324": "|**2024-05-07**|**Granite Code Models: A Family of Open Foundation Models for Code Intelligence**|Mayank Mishra 
et.al.|[2405.04324](http://arxiv.org/abs/2405.04324)|**[link](https://github.com/ibm-granite/granite-code-models)**|**\u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u4ee3\u7801\u9886\u57df\u7684\u8bad\u7ec3\u6b63\u5728\u9769\u65b0\u8f6f\u4ef6\u5f00\u53d1\u6d41\u7a0b\u3002\u5982\u4eca\uff0c\u8fd9\u4e9b\u4ee3\u7801LLMs\u6b63\u9010\u6b65\u878d\u5165\u8f6f\u4ef6\u5f00\u53d1\u73af\u5883\uff0c\u4ee5\u63d0\u5347\u4eba\u7c7b\u7a0b\u5e8f\u5458\u7684\u6548\u7387\uff0c\u5e76\u5c55\u73b0\u51fa\u81ea\u4e3b\u5904\u7406\u590d\u6742\u4efb\u52a1\u7684\u6f5c\u529b\u3002\u8981\u5145\u5206\u5229\u7528\u4ee3\u7801LLMs\u7684\u5168\u90e8\u6548\u80fd\uff0c\u9700\u8981\u5176\u5177\u5907\u751f\u6210\u4ee3\u7801\u3001\u4fee\u590dbug\u3001\u89e3\u91ca\u548c\u6ce8\u91ca\u4ee3\u7801\u3001\u7ef4\u62a4\u4ed3\u5e93\u7b49\u591a\u79cd\u529f\u80fd\u3002\u672c\u6587\u4ecb\u7ecdGranite\u7cfb\u5217\u7684\u89e3\u7801\u5668\u4ec5\u6709\u7684\u4ee3\u7801\u6a21\u578b\uff0c\u4e13\u4e3a\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u800c\u8bbe\u8ba1\uff0c\u8bad\u7ec3\u6570\u636e\u6db5\u76d6116\u79cd\u7f16\u7a0b\u8bed\u8a00\u3002Granite Code\u6a21\u578b\u5bb6\u65cf\u5305\u62ec\u4ece3\u4ebf\u5230340\u4ebf\u53c2\u6570\u7684\u6a21\u578b\uff0c\u9002\u7528\u4e8e\u4ece\u590d\u6742\u5e94\u7528\u73b0\u4ee3\u5316\u5230\u8bbe\u5907\u5185\u5b58\u53d7\u9650\u7684\u591a\u79cd\u5e94\u7528\u573a\u666f\u3002\u901a\u8fc7\u5168\u9762\u4efb\u52a1\u8bc4\u4f30\uff0cGranite Code\u6a21\u578b\u5728\u5f00\u6e90\u4ee3\u7801LLM\u4e2d\u7684\u6027\u80fd\u59cb\u7ec8\u5904\u4e8e\u9886\u5148\u6c34\u5e73\u3002\u8be5\u6a21\u578b\u5bb6\u65cf\u9488\u5bf9\u4f01\u4e1a\u8f6f\u4ef6\u5f00\u53d1\u5de5\u4f5c\u6d41\u8fdb\u884c\u4e86\u4f18\u5316\uff0c\u8868\u73b0\u51fa\u8272\u4e8e\u5404\u79cd\u7f16\u7801\u4efb\u52a1\uff08\u5982\u4ee3\u7801\u751f\u6210\u3001\u4fee\u590d\u4e0e\u89e3\u91ca\uff09\uff0c\u662f\u4e00\u6b3e\u591a\u7528\u9014\u7684\u5168\u80fd\u4ee3\u7801\u6a21\u578b\u3002\u6211\u4eec\u4ee5Apache 
2.0\u8bb8\u53ef\u534f\u8bae\u53d1\u5e03\u6240\u6709Granite Code\u6a21\u578b\uff0c\u4f9b\u7814\u7a76\u548c\u5546\u4e1a\u4f7f\u7528\u3002**|\n", "2405.04219": "|**2024-05-07**|**Iterative Experience Refinement of Software-Developing Agents**|Chen Qian et.al.|[2405.04219](http://arxiv.org/abs/2405.04219)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\u9a71\u52a8\u7684\u81ea\u4e3b\u4ee3\u7406\u5728\u8f6f\u4ef6\u5f00\u53d1\u7b49\u573a\u666f\u4e2d\u5c55\u73b0\u51fa\u5f3a\u5927\u7684\u81ea\u4e3b\u6027\u6f5c\u529b\u3002\u7136\u800c\uff0c\u5f53\u524d\u9759\u6001\u7ecf\u9a8c\u8303\u5f0f\u4f9d\u8d56\u4e8e\u901a\u8fc7\u542f\u53d1\u5f0f\u65b9\u6cd5\u83b7\u53d6\u7684\u56fa\u5b9a\u5386\u53f2\u7ecf\u9a8c\u96c6\uff0c\u8fd9\u9650\u5236\u4e86\u4ee3\u7406\u7684\u9002\u5e94\u6027\u548c\u6548\u7387\u63d0\u5347\u3002\u4e3a\u6b64\uff0c\u672c\u6587\u63d0\u51fa\u4e86\u8fed\u4ee3\u7ecf\u9a8c\u4f18\u5316\u6846\u67b6\uff0c\u5141\u8bb8\u8bed\u8a00\u6a21\u578b\u5728\u6267\u884c\u4efb\u52a1\u8fc7\u7a0b\u4e2d\u52a8\u6001\u8c03\u6574\u548c\u4f18\u5316\u7ecf\u9a8c\u3002\u6211\u4eec\u5b9a\u4e49\u4e86\u4e24\u79cd\u6838\u5fc3\u6a21\u5f0f\uff1a\u987a\u5e8f\u6a21\u5f0f\uff0c\u6839\u636e\u4efb\u52a1\u6279\u6b21\u5185\u7684\u6700\u8fd1\u7ecf\u9a8c\u8fdb\u884c\u6539\u8fdb\uff1b\u7d2f\u8ba1\u6a21\u5f0f\uff0c\u79ef\u7d2f\u6240\u6709\u5148\u524d\u4efb\u52a1\u6279\u6b21\u7684\u7ecf\u9a8c\u3002\u901a\u8fc7\u5f15\u5165\u7ecf\u9a8c\u6dd8\u6c70\u7b56\u7565\uff0c\u8be5\u65b9\u6cd5\u4f18\u5148\u9009\u62e9\u9ad8\u8d28\u91cf\u548c\u5e38\u7528\u7684\u7ecf\u9a8c\uff0c\u6709\u6548\u5730\u7ba1\u7406\u7ecf\u9a8c\u7a7a\u95f4\uff0c\u63d0\u9ad8\u6548\u7387\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u5c3d\u7ba1\u987a\u5e8f\u6a21\u5f0f\u53ef\u80fd\u5e26\u6765\u66f4\u597d\u7684\u6027\u80fd\uff0c\u4f46\u7d2f\u8ba1\u6a21\u5f0f\u5728\u7a33\u5b9a\u6027\u65b9\u9762\u66f4\u4f18\u3002\u6b64\u5916\uff0c\u901a\u8fc7\u6dd8\u6c70\u7b56\u7565\uff0c\u4ec5\u4f7f\u7528\u9ad8\u8d28\u91cf\u7ecf\u9a8c\u5b50\u96c6\u768411.54%\uff0c\
u5c31\u80fd\u5b9e\u73b0\u66f4\u597d\u7684\u6027\u80fd\u3002|\n", "2405.03813": "|**2024-05-06**|**Large Language Models as Instruments of Power: New Regimes of Autonomous Manipulation and Control**|Yaqub Chaudhary et.al.|[2405.03813](http://arxiv.org/abs/2405.03813)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u80fd\u591f\u6a21\u4eff\u5404\u79cd\u4fee\u8f9e\u98ce\u683c\uff0c\u751f\u6210\u8868\u8fbe\u5e7f\u6cdb\u60c5\u611f\u7684\u6587\u672c\uff0c\u8fd9\u79cd\u80fd\u529b\u5728\u4f4e\u6210\u672c\u4e0b\u8fc5\u901f\u666e\u53ca\uff0c\u5e26\u6765\u4e86\u6f5c\u5728\u7684\u793e\u4f1a\u5371\u5bb3\u3002\u672c\u6587\u5e76\u672a\u5b64\u7acb\u770b\u5f85\u8fd9\u4e9b\u6a21\u578b\uff0c\u800c\u662f\u5173\u6ce8\u5b83\u4eec\u80cc\u540e\u5927\u89c4\u6a21\u8ba1\u7b97\u57fa\u7840\u8bbe\u65bd\u5728\u5404\u9886\u57df\u7684\u5e94\u7528\u3002\u6211\u4eec\u9996\u5148\u63a2\u8ba8\u4e86LLMs\u5982\u4f55\u901a\u8fc7\u6c61\u67d3\u548c\u6807\u51c6\u5316\u4fe1\u606f\u73af\u5883\u6765\u5f71\u54cd\u793e\u4f1a\uff0c\u5e76\u6307\u51fa\u8fd9\u4e9b\u529f\u80fd\u53ef\u80fd\u88ab\u7528\u4f5c\u63a7\u5236\u624b\u6bb5\u3002\u63a5\u4e0b\u6765\uff0c\u6211\u4eec\u5c06\u7126\u70b9\u8f6c\u5411\u51e0\u4e2a\u65b0\u5174\u7814\u7a76\u9886\u57df\uff0c\u8fd9\u4e9b\u9886\u57df\u589e\u5f3a\u4e86LLMs\u4f5c\u4e3a\u6743\u529b\u5de5\u5177\u7684\u80fd\u529b\uff1a 1. \u901a\u8fc7\u5b9e\u65f6\u8bbe\u8ba1\u5bf9\u8bdd\u754c\u9762\u4e2d\u7684\u9009\u62e9\u67b6\u6784\uff08\u5982\u201cAI\u89d2\u8272\u201d\uff09\uff0c\u8fdb\u884c\u8bf4\u670d\u7b56\u7565\u3002 2. \u5229\u7528LLM\u6784\u5efa\u4eba\u7c7b\u884c\u4e3a\u7684\u8ba1\u7b97\u6a21\u578b\uff08\u5982\u201c\u7845\u8d28\u4e3b\u4f53\u201d\uff09\u3002 3. \u5c06LLM\u5e94\u7528\u4e8e\u6a21\u62df\u4eba\u7c7b\u7fa4\u4f53\u884c\u4e3a\uff08\u5982\u201c\u7845\u8d28\u793e\u4f1a\u201d\uff09\u3002 4. 
\u7ed3\u5408\u5f3a\u5316\u5b66\u4e60\uff0c\u521b\u5efa\u53ef\u63a7\u5236\u548c\u5bfc\u5411\u7684\u6218\u7565\u5bf9\u8bdd\u6a21\u578b\u3002 \u7efc\u5408\u4ee5\u4e0a\u51e0\u70b9\uff0c\u6211\u4eec\u8ba8\u8bba\u4e86\u5982\u4f55\u5229\u7528\u8fd9\u4e9b\u6280\u672f\u6784\u5efa\u57fa\u4e8eLLMs\u7684\u7cfb\u7edf\uff0c\u8fd9\u4e9b\u7cfb\u7edf\u901a\u8fc7\u6a21\u62df\u548c\u4f2a\u88c5\u7684\u201c\u9884\u6d4b\u201d\uff0c\u6210\u4e3a\u4e2a\u4f53\u3001\u793e\u4f1a\u548c\u653f\u6cbb\u63a7\u5236\u7684\u5f3a\u5927\u5de5\u5177\uff0c\u64cd\u63a7\u4eba\u7c7b\u7684\u884c\u4e3a\u3001\u610f\u56fe\u548c\u884c\u52a8\u3002|\n", "2405.06682": "|**2024-05-05**|**Self-Reflection in LLM Agents: Effects on Problem-Solving Performance**|Matthew Renze et.al.|[2405.06682](http://arxiv.org/abs/2405.06682)|**[link](https://github.com/matthewrenze/self-reflection)**|**\u5728\u8fd9\u4e2a\u7814\u7a76\u4e2d\uff0c\u6211\u4eec\u63a2\u8ba8\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e2d\u81ea\u6211\u53cd\u601d\u5bf9\u95ee\u9898\u89e3\u51b3\u80fd\u529b\u7684\u5f71\u54cd\u3002\u6211\u4eec\u8ba9\u4e5d\u79cd\u6d41\u884c\u7684LLMs\u56de\u7b54\u4e00\u7cfb\u5217\u9009\u62e9\u9898\uff0c\u4ee5\u5efa\u7acb\u6027\u80fd\u57fa\u7ebf\u3002\u5bf9\u4e8e\u56de\u7b54\u9519\u8bef\u7684\u95ee\u9898\uff0c\u6211\u4eec\u6307\u5bfc\u516b\u79cd\u4e0d\u540c\u7c7b\u578b\u7684\u81ea\u6211\u53cd\u601dLLM\u4ee3\u7406\u53cd\u601d\u5176\u9519\u8bef\uff0c\u5e76\u4e3a\u81ea\u5df1\u63d0\u4f9b\u6539\u8fdb\u95ee\u9898\u89e3\u51b3\u7684\u6307\u5bfc\u3002\u7136\u540e\uff0c\u6839\u636e\u8fd9\u4e9b\u6307\u5bfc\uff0c\u6bcf\u4e2a\u53cd\u601d\u578b\u4ee3\u7406\u91cd\u65b0\u5c1d\u8bd5\u56de\u7b54\u540c\u6837\u7684\u95ee\u9898\u3002\u7814\u7a76\u7ed3\u679c\u663e\u793a\uff0cLLM\u4ee3\u7406\u901a\u8fc7\u81ea\u6211\u53cd\u601d\u663e\u8457\u63d0\u9ad8\u4e86\u95ee\u9898\u89e3\u51b3\u80fd\u529b\uff08$p < 
0.001$\uff09\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u6bd4\u8f83\u4e86\u5404\u79cd\u81ea\u6211\u53cd\u601d\u65b9\u5f0f\u5bf9\u6027\u80fd\u7684\u5355\u72ec\u8d21\u732e\u3002\u6240\u6709\u4ee3\u7801\u548c\u6570\u636e\u5df2\u5728GitHub\u4e0a\u516c\u5f00\uff1ahttps://github.com/matthewrenze/self-reflection\u3002**|\n", "2405.02858": "|**2024-05-05**|**Language Evolution for Evading Social Media Regulation via LLM-based Multi-agent Simulation**|Jinyu Cai et.al.|[2405.02858](http://arxiv.org/abs/2405.02858)|**[link](https://github.com/BlueLinkX/GA-MAS)**|**\u793e\u4ea4\u5a92\u4f53\u5e73\u53f0\u5982Twitter\u3001Reddit\u548c\u65b0\u6d6a\u5fae\u535a\u5728\u5168\u7403\u4ea4\u6d41\u4e2d\u626e\u6f14\u91cd\u8981\u89d2\u8272\uff0c\u4f46\u5b83\u4eec\u5728\u5730\u7f18\u653f\u6cbb\u654f\u611f\u533a\u57df\u5e38\u5e38\u53d7\u5230\u4e25\u683c\u76d1\u7ba1\u3002\u8fd9\u4fc3\u4f7f\u7528\u6237\u5728\u53d7\u9650\u7684\u793e\u4ea4\u5a92\u4f53\u73af\u5883\u4e2d\u5de7\u5999\u5730\u8c03\u6574\u6c9f\u901a\u65b9\u5f0f\uff0c\u7ecf\u5e38\u4f7f\u7528\u7f16\u7801\u8bed\u8a00\u3002\u8fd9\u79cd\u8bed\u8a00\u6a21\u5f0f\u7684\u53d8\u5316\u4e0d\u4ec5\u662f\u4e3a\u4e86\u5bf9\u6297\u76d1\u7ba1\uff0c\u4e5f\u662f\u8bed\u8a00\u6f14\u5316\u7684\u751f\u52a8\u4f8b\u8bc1\uff0c\u5c55\u793a\u4e86\u793e\u4f1a\u548c\u6280\u672f\u538b\u529b\u4e0b\u8bed\u8a00\u5982\u4f55\u81ea\u7136\u6f14\u53d8\u3002\u7814\u7a76\u53d7\u9650\u5236\u793e\u4ea4\u5a92\u4f53\u73af\u5883\u4e0b\u8bed\u8a00\u7684\u6f14\u53d8\u5bf9\u4e8e\u4fdd\u969c\u8a00\u8bba\u81ea\u7531\u3001\u4f18\u5316\u5185\u5bb9\u7ba1\u7406\u4ee5\u53ca\u63a8\u52a8\u8bed\u8a00\u5b66\u7814\u7a76\u81f3\u5173\u91cd\u8981\u3002\u672c\u8bba\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u591a\u4ee3\u7406\u6a21\u62df\u6846\u67b6\uff0c\u7528\u4e8e\u63a2\u7d22\u5728\u4e25\u683c\u76d1\u7ba1\u4e0b\u7684\u7528\u6237\u8bed\u8a00\u8fdb\u5316\u3002\u8be5\u6846\u67b6\u5305\u542b\u5bf9\u8bdd\u76d1\u7763\u7684LLM\u9a71\u52a8\u4ee
3\u7406\u548c\u53c2\u4e0e\u8005\u4ee3\u7406\uff0c\u5b83\u4eec\u5728\u4e92\u52a8\u4e2d\u53d1\u5c55\u8bed\u8a00\u7b56\u7565\uff0c\u6a21\u62df\u5728\u89c4\u907f\u793e\u4ea4\u5a92\u4f53\u89c4\u5219\u7684\u73af\u5883\u4e2d\u4ea4\u6d41\u65b9\u5f0f\u7684\u6f14\u53d8\u3002\u901a\u8fc7\u4ece\u62bd\u8c61\u573a\u666f\u5230\u73b0\u5b9e\u60c5\u5883\u7684\u591a\u79cd\u60c5\u666f\u8bc4\u4f30\uff0c\u7814\u7a76\u7ed3\u679c\u663e\u793aLLMs\u80fd\u591f\u6709\u6548\u6a21\u62df\u53d7\u9650\u73af\u5883\u4e2d\u7684\u590d\u6742\u8bed\u8a00\u52a8\u6001\u548c\u4ea4\u4e92\uff0c\u968f\u7740\u8fdb\u5316\uff0c\u5b83\u4eec\u5728\u89c4\u907f\u76d1\u7763\u548c\u4fe1\u606f\u51c6\u786e\u6027\u65b9\u9762\u8868\u73b0\u51fa\u63d0\u5347\u3002\u6b64\u5916\uff0c\u7814\u7a76\u53d1\u73b0LLM\u4ee3\u7406\u9488\u5bf9\u4e0d\u540c\u7684\u573a\u666f\u91c7\u7528\u4e86\u4e0d\u540c\u7684\u7b56\u7565\u3002**|\n", "2405.01533": "|**2024-05-02**|**OmniDrive: A Holistic LLM-Agent Framework for Autonomous Driving with 3D Perception, Reasoning and Planning**|Shihao Wang 
et.al.|[2405.01533](http://arxiv.org/abs/2405.01533)|**[link](https://github.com/nvlabs/omnidrive)**|**\u968f\u7740\u5927\u89c4\u6a21\u591a\u6a21\u6001\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u7684\u8fdb\u6b65\uff0c\u4eba\u4eec\u5bf9\u4e8e\u57fa\u4e8e\u8fd9\u4e9b\u6a21\u578b\u7684\u81ea\u52a8\u9a7e\u9a76\u7cfb\u7edf\u8868\u73b0\u51fa\u65e5\u76ca\u589e\u957f\u7684\u5174\u8da3\uff0c\u671f\u671b\u5229\u7528\u5b83\u4eec\u5f3a\u5927\u7684\u63a8\u7406\u80fd\u529b\u3002\u7136\u800c\uff0c\u5c06MLLMs\u7684\u5f3a\u9879\u5e94\u7528\u4e8e\u9a7e\u9a76\u4efb\u52a1\u7684\u89c4\u5212\u90e8\u5206\u662f\u4e00\u4e2a\u6311\u6218\uff0c\u56e0\u4e3a\u89c4\u5212\u9700\u8981\u5bf9\u4e09\u7ef4\u73af\u5883\u6709\u5168\u9762\u7684\u7406\u89e3\uff0c\u800c\u4e0d\u4ec5\u4ec5\u662f\u4e8c\u7ef4\u63a8\u7406\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u7684\u5de5\u4f5c\u63d0\u51fa\u4e86\u4e00\u79cd\u6846\u67b6\uff0c\u65e8\u5728\u5b9e\u73b0\u6a21\u578b\u4e0e3D\u9a7e\u9a76\u4efb\u52a1\u7684\u7d27\u5bc6\u5951\u5408\u3002\u6211\u4eec\u9996\u5148\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u65b0\u9896\u76843D MLLM\u67b6\u6784\uff0c\u5b83\u5229\u7528\u7a00\u758f\u67e5\u8be2\u6280\u672f\u5c06\u89c6\u89c9\u8868\u793a\u63d0\u5347\u5e76\u538b\u7f29\u5230\u4e09\u7ef4\u7a7a\u95f4\uff0c\u7136\u540e\u5c06\u5176\u8f93\u5165\u5230\u8bed\u8a00\u6a21\u578b\u4e2d\u3002\u8fd9\u79cd\u57fa\u4e8e\u67e5\u8be2\u7684\u8868\u793a\u65b9\u5f0f\u4f7f\u5f97\u6211\u4eec\u53ef\u4ee5\u540c\u65f6\u7f16\u7801\u52a8\u6001\u7269\u4f53\u548c\u9759\u6001\u5730\u56fe\u5143\u7d20\uff08\u5982\u9053\u8def\uff09\uff0c\u4e3a\u611f\u77e5\u548c\u884c\u52a8\u7684\u5bf9\u9f50\u63d0\u4f9b\u4e00\u4e2a\u7b80\u5316\u7684\u4e09\u7ef4\u4e16\u754c\u6a21\u578b\u3002 
\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u521b\u5efa\u4e86OmniDrive-nuScenes\uff0c\u8fd9\u662f\u4e00\u4e2a\u65b0\u7684\u89c6\u89c9\u95ee\u7b54\u6570\u636e\u96c6\uff0c\u5b83\u901a\u8fc7\u5168\u9762\u7684\u89c6\u89c9\u95ee\u7b54\u4efb\u52a1\uff08\u5982\u573a\u666f\u63cf\u8ff0\u3001\u4ea4\u901a\u89c4\u5219\u7406\u89e3\u3001\u4e09\u7ef4\u5b9a\u4f4d\u3001\u53cd\u4e8b\u5b9e\u63a8\u7406\u3001\u51b3\u7b56\u5236\u5b9a\u548c\u89c4\u5212\uff09\u6765\u8003\u9a8c\u6a21\u578b\u5728\u590d\u6742\u4e09\u7ef4\u573a\u666f\u4e2d\u7684\u771f\u6b63\u60c5\u5883\u610f\u8bc6\u3002\u5927\u91cf\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u63d0\u51fa\u7684\u67b6\u6784\u6709\u6548\uff0c\u5e76\u5f3a\u8c03\u4e86\u5728\u590d\u6742\u4e09\u7ef4\u73af\u5883\u4e2d\u8fdb\u884c\u63a8\u7406\u548c\u89c4\u5212\u65f6\uff0c\u89c6\u89c9\u95ee\u7b54\u4efb\u52a1\u7684\u91cd\u8981\u6027\u3002**|\n", "2405.00972": "|**2024-05-02**|**CACTUS: Chemistry Agent Connecting Tool-Usage to Science**|Andrew D. McNaughton et.al.|[2405.00972](http://arxiv.org/abs/2405.00972)|**[link](https://github.com/pnnl/cactus)**|**\u8fd9\u7bc7\u8bba\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aCACTUS\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff0c\u5b83\u7ed3\u5408\u4e86\u5316\u5b66\u4fe1\u606f\u5b66\u5de5\u5177\uff0c\u65e8\u5728\u63d0\u5347\u5728\u5316\u5b66\u548c\u5206\u5b50\u53d1\u73b0\u9886\u57df\u7684\u9ad8\u7ea7\u63a8\u7406\u4e0e\u95ee\u9898\u89e3\u51b3\u80fd\u529b\u3002\u7814\u7a76\u8005\u4eec\u4f7f\u7528\u5305\u62ecGemma-7b\u3001Falcon-7b\u3001MPT-7b\u3001Llama2-7b\u548cMistral-7b\u5728\u5185\u7684\u591a\u6b3e\u5f00\u6e90\u5927\u8bed\u8a00\u6a21\u578b\uff0c\u5bf9CACTUS\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u6027\u80fd\u8bc4\u4f30\uff0c\u901a\u8fc7\u6570\u5343\u4e2a\u5316\u5b66\u95ee\u9898\u7684\u57fa\u51c6\u6d4b\u8bd5\u3002\u7ed3\u679c\u663e\u793a\uff0cCACTUS\u660e\u663e\u4f18\u4e8e\u57fa\u7840\u6a21\u578b\uff0c\u5176\u4e2dGemma-7b\u548cMistral-7b\u65e0\u8bba\u91c7\u7528\u4f55\u79cd\u63d0\u793a\u7b56\u7565\uff0
c\u8868\u73b0\u6700\u4e3a\u51fa\u8272\u3002\u8bba\u6587\u8fd8\u63a2\u8ba8\u4e86\u9886\u57df\u7279\u5b9a\u63d0\u793a\u548c\u786c\u4ef6\u914d\u7f6e\u5bf9\u6a21\u578b\u6027\u80fd\u7684\u5f71\u54cd\uff0c\u5f3a\u8c03\u4e86\u63d0\u793a\u5de5\u7a0b\u7684\u91cd\u8981\u6027\uff0c\u5e76\u6307\u51fa\u5728\u6d88\u8d39\u7ea7\u786c\u4ef6\u4e0a\u90e8\u7f72\u8f83\u5c0f\u6a21\u578b\u53ef\u80fd\u4e0d\u4f1a\u663e\u8457\u727a\u7272\u51c6\u786e\u6027\u3002 CACTUS\u901a\u8fc7\u878d\u5408\u5f00\u6e90\u5927\u8bed\u8a00\u6a21\u578b\u7684\u8ba4\u77e5\u529f\u80fd\u4e0e\u4e13\u4e1a\u5de5\u5177\uff0c\u80fd\u591f\u534f\u52a9\u7814\u7a76\u4eba\u5458\u8fdb\u884c\u5206\u5b50\u6027\u8d28\u9884\u6d4b\u3001\u76f8\u4f3c\u6027\u641c\u7d22\u548c\u836f\u7269\u9002\u7528\u6027\u8bc4\u4f30\u7b49\u4efb\u52a1\u3002\u4f5c\u4e3a\u5316\u5b66\u4fe1\u606f\u5b66\u9886\u57df\u7684\u91cd\u5927\u7a81\u7834\uff0cCACTUS\u4e3a\u5316\u5b66\u5bb6\u548c\u5206\u5b50\u63a2\u7d22\u8005\u63d0\u4f9b\u4e86\u4e00\u4e2a\u7075\u6d3b\u7684\u5de5\u5177\uff0c\u6709\u671b\u52a0\u901f\u79d1\u5b66\u7814\u7a76\uff0c\u63a8\u52a8\u65b0\u578b\u6709\u6548\u3001\u5b89\u5168\u836f\u7269\u3001\u50ac\u5316\u5242\u548c\u6750\u6599\u7684\u53d1\u73b0\u3002\u6b64\u5916\uff0cCACTUS\u4e0e\u81ea\u52a8\u5316\u5b9e\u9a8c\u5e73\u53f0\u7684\u96c6\u6210\u4ee5\u53ca\u5b9e\u65f6\u6570\u636e\u9a71\u52a8\u51b3\u7b56\u7684\u80fd\u529b\uff0c\u4e3a\u81ea\u4e3b\u53d1\u73b0\u5f00\u8f9f\u4e86\u65b0\u7684\u53ef\u80fd\u3002**|\n", "2404.18978": "|**2024-04-29**|**Towards Generalizable Agents in Text-Based Educational Environments: A Study of Integrating RL with LLMs**|Bahar Radmehr 
et.al.|[2404.18978](http://arxiv.org/abs/2404.18978)|null|\u968f\u7740\u6559\u80b2\u73af\u5883\u4e2d\u5bf9\u5b66\u4e60\u8005\u6a21\u578b\u65e5\u76ca\u589e\u957f\u7684\u5174\u8da3\uff0c\u7814\u7a76\u91cd\u70b9\u9010\u6e10\u8f6c\u5411\u5982\u4f55\u901a\u8fc7\u5f3a\u5316\u5b66\u4e60\uff08RL\uff09\u4e0e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u76f8\u7ed3\u5408\uff0c\u63d0\u5347\u5728\u5f00\u653e\u6027\u6587\u672c\u5b66\u4e60\u73af\u5883\u4e2d\u7684\u901a\u7528\u80fd\u529b\u3002\u672c\u6587\u63a2\u8ba8\u4e86\u4e09\u79cd\u7c7b\u578b\u7684\u4ee3\u7406\uff1a\uff081\uff09\u57fa\u4e8eRL\u7684\u4ee3\u7406\uff0c\u4f7f\u7528\u81ea\u7136\u8bed\u8a00\u8868\u793a\u72b6\u6001\u548c\u884c\u52a8\u7b56\u7565\u4ee5\u5bfb\u627e\u6700\u4f73\u4e92\u52a8\u65b9\u5f0f\uff1b\uff082\uff09\u57fa\u4e8eLLM\u7684\u4ee3\u7406\uff0c\u5229\u7528\u6a21\u578b\u7684\u5e7f\u6cdb\u77e5\u8bc6\u548c\u63a8\u7406\u80fd\u529b\u901a\u8fc7\u63d0\u793a\u8fdb\u884c\u64cd\u4f5c\uff1b\uff083\uff09\u6df7\u5408LLM\u8f85\u52a9RL\u7684\u4ee3\u7406\uff0c\u65e8\u5728\u63d0\u9ad8\u6027\u80fd\u548c\u6cdb\u5316\u80fd\u529b\u3002\u4e3a\u4e86\u652f\u6301\u8fd9\u4e9b\u4ee3\u7406\u7684\u53d1\u5c55\u548c\u8bc4\u4f30\uff0c\u6211\u4eec\u63d0\u51fa\u4e86PharmaSimText\uff0c\u8fd9\u662f\u4e00\u4e2a\u6e90\u81eaPharmaSim\u865a\u62df\u836f\u5e97\u73af\u5883\u7684\u65b0\u57fa\u51c6\uff0c\u4e13\u6ce8\u4e8e\u8bca\u65ad\u5bf9\u8bdd\u5b9e\u8df5\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cRL\u57fa\u7840\u7684\u4ee3\u7406\u5728\u4efb\u52a1\u5b8c\u6210\u65b9\u9762\u8868\u73b0\u4f18\u79c0\uff0c\u4f46\u5728\u63d0\u95ee\u8d28\u91cf\u4e0a\u6709\u6240\u6b20\u7f3a\uff1b\u800cLLM\u57fa\u7840\u7684\u4ee3\u7406\u5728\u63d0\u95ee\u80fd\u529b\u4e0a\u8f83\u5f3a\uff0c\u4f46\u4efb\u52a1\u5b8c\u6210\u5ea6\u4e0d\u9ad8\u3002\u6700\u540e\uff0c\u6df7\u5408LLM\u8f85\u52a9RL\u7684\u4ee3\u7406\u5c55\u793a\u4e86\u514b\u670d\u8fd9\u4e9b\u5c40\u9650\u6027\u7684\u6f5c\u529b\uff0c\u8bc1\u5b9e\u4e86RL\u4e0eLLMs\u7ed3\u5408\u7528\u4e8e\u5f00\u53d1\u5f00\u653e\
u6027\u5b66\u4e60\u73af\u5883\u9ad8\u8868\u73b0\u4ee3\u7406\u7684\u53ef\u80fd\u6027\u3002|\n", "2404.18021": "|**2024-04-27**|**CRISPR-GPT: An LLM Agent for Automated Design of Gene-Editing Experiments**|Kaixuan Huang et.al.|[2404.18021](http://arxiv.org/abs/2404.18021)|null|\u968f\u7740\u57fa\u56e0\u7ec4\u5de5\u7a0b\u6280\u672f\u7684\u5174\u8d77\uff0c\u7cbe\u786e\u4fee\u6539\u9057\u4f20\u4fe1\u606f\u5df2\u6210\u4e3a\u53ef\u80fd\uff0c\u4f46\u9ad8\u6548\u57fa\u56e0\u7f16\u8f91\u7cfb\u7edf\u7684\u6784\u5efa\u9700\u8981\u6df1\u5165\u7406\u89e3CRISPR\u6280\u672f\u53ca\u5176\u590d\u6742\u5b9e\u9a8c\u80cc\u666f\u3002\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u8bf8\u591a\u4efb\u52a1\u4e2d\u5c55\u73b0\u51fa\u6f5c\u529b\uff0c\u4f46\u5728\u751f\u7269\u8bbe\u8ba1\u95ee\u9898\u4e0a\u5f80\u5f80\u7f3a\u4e4f\u7279\u5b9a\u77e5\u8bc6\u3002\u672c\u6587\u4ecb\u7ecdCRISPR-GPT\uff0c\u4e00\u4e2a\u589e\u5f3a\u578bLLM\u4ee3\u7406\uff0c\u5b83\u7ed3\u5408\u4e86\u9886\u57df\u77e5\u8bc6\u548c\u5916\u90e8\u5de5\u5177\uff0c\u4ee5\u81ea\u52a8\u5316\u5e76\u63d0\u5347\u57fa\u4e8eCRISPR\u7684\u57fa\u56e0\u7f16\u8f91\u5b9e\u9a8c\u8bbe\u8ba1\u8fc7\u7a0b\u3002CRISPR-GPT\u5229\u7528LLMs\u7684\u63a8\u7406\u80fd\u529b\uff0c\u534f\u52a9\u9009\u62e9CRISPR\u7cfb\u7edf\u3001\u8bbe\u8ba1\u5f15\u5bfcRNA\u3001\u63a8\u8350\u7ec6\u80de\u9012\u9001\u65b9\u6cd5\u3001\u8d77\u8349\u534f\u8bae\u4ee5\u53ca\u8bbe\u8ba1\u9a8c\u8bc1\u5b9e\u9a8c\u4ee5\u786e\u8ba4\u7f16\u8f91\u7ed3\u679c\u3002\u6211\u4eec\u5c55\u793a\u4e86CRISPR-GPT\u5982\u4f55\u5e2e\u52a9\u975e\u4e13\u5bb6\u7814\u7a76\u4eba\u5458\u4ece\u5934\u5f00\u59cb\u8fdb\u884c\u57fa\u56e0\u7f16\u8f91\u5b9e\u9a8c\uff0c\u5e76\u901a\u8fc7\u5b9e\u9645\u6848\u4f8b\u9a8c\u8bc1\u5176\u6709\u6548\u6027\u3002\u540c\u65f6\uff0c\u6211\u4eec\u63a2\u8ba8\u4e86\u81ea\u52a8\u5316\u57fa\u56e0\u7f16\u8f91\u8bbe\u8ba1\u7684\u4f26\u7406\u548c\u76d1\u7ba1\u95ee\u9898\uff0c\u5f3a\u8c03\u4e86\u8d1f\u8d23\u4efb\u548c\u900f\u660e\u4f7f\u7528\u6b64\u7c7b\u5de5\u5177\u7684\u91cd
\u8981\u6027\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u76ee\u6807\u662f\u5f25\u5408\u521d\u7ea7\u751f\u7269\u7814\u7a76\u8005\u4e0eCRISPR\u57fa\u56e0\u7ec4\u5de5\u7a0b\u6280\u672f\u4e4b\u95f4\u7684\u9e3f\u6c9f\uff0c\u5c55\u793aLLM\u4ee3\u7406\u5728\u4fc3\u8fdb\u590d\u6742\u751f\u7269\u53d1\u73b0\u4efb\u52a1\u4e2d\u7684\u6f5c\u529b\u3002|\n", "2404.17833": "|**2024-04-27**|**Testing and Understanding Erroneous Planning in LLM Agents through Synthesized User Inputs**|Zhenlan Ji et.al.|[2404.17833](http://arxiv.org/abs/2404.17833)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u9a71\u52a8\u7684\u4ee3\u7406\u5728\u5404\u79cd\u5546\u4e1a\u5e94\u7528\u4e2d\uff0c\u7279\u522b\u662f\u5728\u5fc3\u7406\u5065\u5eb7\u652f\u6301\u3001\u5316\u5b66\u5408\u6210\u548c\u8f6f\u4ef6\u5f00\u53d1\u7b49\u9886\u57df\u5c55\u73b0\u6548\u7528\uff0c\u4eba\u4eec\u53d1\u73b0\u8fd9\u4e9b\u4ee3\u7406\u5728\u5904\u7406\u590d\u6742\u4efb\u52a1\u548c\u957f\u671f\u89c4\u5212\u65f6\u5bb9\u6613\u4ea7\u751f\u9519\u8bef\u3002\u4e3a\u6b64\uff0c\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u81ea\u52a8\u5316\u65b9\u6cd5\u2014\u2014PDoctor\uff0c\u65e8\u5728\u68c0\u6d4b\u548c\u7406\u89e3LLM\u4ee3\u7406\u7684\u9519\u8bef\u89c4\u5212\u3002PDoctor\u9996\u5148\u5b9a\u4e49\u4e86\u4e00\u4e2a\u9886\u57df\u7279\u5b9a\u7684\u8bed\u8a00\uff08DSL\uff09\uff0c\u7528\u4e8e\u7528\u6237\u67e5\u8be2\uff0c\u5e76\u501f\u52a9Z3\u7ea6\u675f\u6c42\u89e3\u5668\u751f\u6210\u5404\u79cd\u8f93\u5165\uff0c\u8fd9\u4e9b\u8f93\u5165\u662f\u63cf\u8ff0\u4e00\u7cfb\u5217\u4efb\u52a1\u5b8c\u6210\u9700\u6c42\u7684\u81ea\u7136\u8bed\u8a00\u6bb5\u843d\u3002\u7136\u540e\uff0cPDoctor\u4ece\u8fd9\u4e9b\u9700\u6c42\u4e2d\u63d0\u53d6\u7ea6\u675f\uff0c\u5f62\u6210\u4e00\u4e2a\u6d4b\u8bd5\u57fa\u51c6\u3002\u6211\u4eec\u4f7f\u7528\u4e09\u4e2a\u4e3b\u6d41\u7684\u4ee3\u7406\u6846\u67b6\u548c\u4e24\u4e2a\u5f3a\u5927\u7684LLMs\uff08GPT-3.5\u548cGPT-4\uff09\u5bf9PDoctor\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u7ed3\u679c\u663e\u793a
\u5b83\u80fd\u6709\u6548\u8bc6\u522b\u4ee3\u7406\u89c4\u5212\u4e2d\u7684\u5404\u79cd\u9519\u8bef\uff0c\u5e76\u4e3a\u5f00\u53d1\u8005\u548c\u7528\u6237\u63d0\u4f9b\u4e86\u6709\u4ef7\u503c\u7684\u89c1\u89e3\u548c\u9519\u8bef\u7279\u6027\u3002\u6700\u540e\uff0c\u6211\u4eec\u8ba8\u8bba\u4e86\u53ef\u80fd\u7684\u66ff\u4ee3\u8bbe\u8ba1\u548c\u6269\u5c55PDoctor\u7684\u65b9\u5411\u3002|\n", "2404.17662": "|**2024-04-26**|**PLAYER*: Enhancing LLM-based Multi-Agent Communication and Interaction in Murder Mystery Games**|Qinglin Zhu et.al.|[2404.17662](http://arxiv.org/abs/2404.17662)|**[link](https://github.com/alickzhu/player)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u6700\u65b0\u8fdb\u5c55\uff0c\u589e\u5f3a\u4e86\u4ee3\u7406\u95f4\u7684\u901a\u4fe1\u548c\u793e\u4f1a\u4ea4\u4e92\u80fd\u529b\u3002\u7136\u800c\uff0c\u5728\u6d89\u53ca\u7ade\u4e89\u4e0e\u5408\u4f5c\u7684\u52a8\u6001\u73af\u5883\u4e2d\uff0c\u5229\u7528\u8fd9\u4e9b\u6a21\u578b\u8fdb\u884c\u590d\u6742\u63a8\u7406\u7684\u6784\u5efa\u4ecd\u7136\u9762\u4e34\u6311\u6218\uff0c\u5c24\u5176\u662f\u56e0\u4e3a\u57fa\u4e8e\u4fe1\u606f\u56fe\u7684\u641c\u7d22\u65b9\u6cd5\u5b58\u5728\u5c40\u9650\u6027\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51faPLAYER*\uff0c\u8fd9\u662f\u4e00\u4e2a\u57fa\u4e8e\u4efb\u610f\u91c7\u6837\u5f0f\u89c4\u5212\u5668\u7684\u65b0\u6846\u67b6\uff0c\u5b83\u7ed3\u5408\u4e86\u4f20\u611f\u5668\u548c\u526a\u679d\u6280\u672f\uff0c\u6784\u5efa\u4e86\u4e00\u4e2a\u5b8c\u5168\u4f9d\u8d56\u4e8e\u95ee\u9898\u9a71\u52a8\u7684\u641c\u7d22\u6846\u67b6\uff0c\u9002\u7528\u4e8e\u9ad8\u96be\u5ea6\u7684\u63a8\u7406\u4efb\u52a1\u3002\u6211\u4eec\u8fd8\u5f15\u5165\u4e86\u4e00\u79cd\u53ef\u91cf\u5316\u7684\u8bc4\u4f30\u65b9\u6cd5\uff0c\u901a\u8fc7\u591a\u9879\u9009\u62e9\u9898\u6765\u6d4b\u8bd5\uff0c\u5e76\u521b\u5efa\u4e86WellPlay\u6570\u636e\u96c6\uff0c\u5305\u542b1,482\u4e2a\u95ee\u7b54\u5bf9\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cPLAYER*\u5728\u590d\u6742\u52a8\u6001\u73af\u5883\u
4e2d\u7684\u6548\u7387\u548c\u6027\u80fd\u4f18\u4e8e\u73b0\u6709\u65b9\u6cd5\uff0c\u5e76\u63d0\u4f9b\u4e86\u53ef\u91cf\u5316\u7684\u5bf9\u6bd4\u7ed3\u679c\u3002**|\n", "2404.17525": "|**2024-05-09**|**Large Language Model Agent as a Mechanical Designer**|Yayati Jadhav et.al.|[2404.17525](http://arxiv.org/abs/2404.17525)|null|\u4f20\u7edf\u7684\u673a\u68b0\u8bbe\u8ba1\u65b9\u6cd5\u4f9d\u8d56\u4e8e\u4e13\u5bb6\u901a\u8fc7\u7ecf\u9a8c\u5f15\u5bfc\u7684\u4fee\u6539\u548c\u6709\u9650\u5143\u5206\u6790\uff08FEA\uff09\u6765\u6ee1\u8db3\u7279\u5b9a\u9700\u6c42\uff0c\u4f46\u8fd9\u4e2a\u8fc7\u7a0b\u8017\u65f6\u4e14\u9ad8\u5ea6\u4f9d\u8d56\u4e2a\u4eba\u77e5\u8bc6\u3002\u5c3d\u7ba1\u5df2\u7ecf\u5f00\u53d1\u4e86\u8bb8\u591a\u673a\u5668\u5b66\u4e60\u6a21\u578b\u6765\u7b80\u5316\u7e41\u7410\u7684\u4e13\u5bb6\u9a71\u52a8\u8fed\u4ee3\u8fc7\u7a0b\uff0c\u4f46\u5b83\u4eec\u901a\u5e38\u9700\u8981\u5927\u91cf\u8bad\u7ec3\u6570\u636e\u548c\u8ba1\u7b97\u8d44\u6e90\u3002\u6df1\u5ea6\u5b66\u4e60\u65b9\u6cd5\u5f80\u5f80\u5c40\u9650\u4e8e\u5176\u8bad\u7ec3\u9886\u57df\u548c\u4efb\u52a1\uff0c\u9650\u5236\u4e86\u8de8\u4efb\u52a1\u5e94\u7528\u3002\u8fd9\u5728\u81ea\u52a8\u5316\u6548\u7387\u4e0e\u8d44\u6e90\u9700\u6c42\u4e4b\u95f4\u5f62\u6210\u4e86\u6743\u8861\u3002 
\u672c\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u5373\u5c06\u9884\u8bad\u7ec3\u7684\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e0e\u6709\u9650\u5143\u6a21\u5757\u7ed3\u5408\u3002\u6709\u9650\u5143\u6a21\u5757\u8bc4\u4f30\u6bcf\u4e2a\u8bbe\u8ba1\u5e76\u63d0\u4f9b\u5173\u952e\u53cd\u9988\uff0c\u5f15\u5bfcLLMs\u4e0d\u65ad\u5b66\u4e60\u3001\u89c4\u5212\u3001\u751f\u6210\u548c\u4f18\u5316\u8bbe\u8ba1\uff0c\u65e0\u9700\u9488\u5bf9\u7279\u5b9a\u9886\u57df\u8fdb\u884c\u4e13\u95e8\u8bad\u7ec3\u3002\u6211\u4eec\u901a\u8fc7\u5728\u6841\u67b6\u7ed3\u6784\u7684\u8fed\u4ee3\u4f18\u5316\u4e2d\u5c55\u793a\u8fd9\u79cd\u6846\u67b6\u7684\u6709\u6548\u6027\uff0c\u8bc1\u660e\u5b83\u80fd\u591f\u6839\u636e\u7ed3\u6784\u5316\u7684\u53cd\u9988\u548c\u6807\u51c6\u8c03\u6574\u8bbe\u8ba1\u3002\u7ed3\u679c\u663e\u793a\uff0c\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u6210\u529f\u751f\u6210\u7b26\u5408\u81ea\u7136\u8bed\u8a00\u63cf\u8ff0\u7684\u6841\u67b6\u7ed3\u6784\u8bbe\u8ba1\uff0c\u6210\u529f\u7387\u9ad8\u8fbe90%\uff0c\u8fd9\u53d6\u51b3\u4e8e\u6240\u65bd\u52a0\u7684\u7ea6\u675f\u6761\u4ef6\u3002\u901a\u8fc7\u63d0\u793a\u5f0f\u4f18\u5316\u6280\u672f\uff0c\u6211\u4eec\u5c55\u793a\u4e86LLM\u4ee3\u7406\u5728\u63a5\u6536\u5230\u89e3-\u5f97\u5206\u5bf9\u540e\uff0c\u80fd\u591f\u6839\u636e\u5176\u5185\u5728\u63a8\u7406\u80fd\u529b\u8fed\u4ee3\u4f18\u5316\u8bbe\u8ba1\u4ee5\u6ee1\u8db3\u89c4\u683c\u8981\u6c42\u3002 LLM\u4ee3\u7406\u80fd\u591f\u4ea7\u751f\u53ef\u884c\u7684\u8bbe\u8ba1\u5e76\u6839\u636e\u5176\u56fa\u6709\u7684\u63a8\u7406\u80fd\u529b\u8fdb\u884c\u4f18\u5316\uff0c\u8fd9\u8868\u660e\u5b83\u4eec\u6709\u6f5c\u529b\u81ea\u4e3b\u53d1\u5c55\u548c\u5b9e\u65bd\u6709\u6548\u7684\u8bbe\u8ba1\u7b56\u7565\u3002|\n", "2404.17460": "|**2024-04-26**|**Ruffle&Riley: Insights from Designing and Evaluating a Large Language Model-Based Conversational Tutoring System**|Robin Schmucker 
et.al.|[2404.17460](http://arxiv.org/abs/2404.17460)|null|\u672c\u6587\u8ba8\u8bba\u5e76\u8bc4\u4f30\u4e86\u4e00\u79cd\u65b0\u578b\u7684\u5bf9\u8bdd\u5f0f\u8f85\u5bfc\u7cfb\u7edf\uff08Conversational Tutoring Systems\uff0cCTS\uff09\uff0c\u8be5\u7cfb\u7edf\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08Large Language Models\uff0cLLMs\uff09\u7684\u6700\u65b0\u8fdb\u5c55\u3002\u9996\u5148\uff0c\u7cfb\u7edf\u901a\u8fc7\u81ea\u52a8\u4ece\u8bfe\u7a0b\u6587\u672c\u4e2d\u751f\u6210\u6613\u4e8e\u7f16\u8f91\u7684\u6559\u5b66\u811a\u672c\uff0c\u5b9e\u73b0AI\u8f85\u52a9\u7684\u5185\u5bb9\u521b\u4f5c\u3002\u5176\u6b21\uff0c\u7cfb\u7edf\u901a\u8fc7\u4e24\u4e2a\u57fa\u4e8eLLM\u7684\u4ee3\u7406\uff08Ruffle\u548cRiley\uff09\u4ee5\u5b66\u4e60\u6559\u5b66\u6a21\u5f0f\u8fd0\u884c\uff0c\u5206\u522b\u626e\u6f14\u5b66\u751f\u548c\u6559\u6388\u89d2\u8272\uff0c\u8fdb\u884c\u81ea\u7531\u5f62\u5f0f\u7684\u5bf9\u8bdd\uff0c\u9075\u5faa\u5178\u578b\u7684\u4eba\u5de5\u667a\u80fd\u8f85\u5bfc\u7cfb\u7edf\u7684\u5185\u73af\u548c\u5916\u73af\u7ed3\u6784\u3002\u6211\u4eec\u5728\u4e24\u4e2a\u5728\u7ebf\u7528\u6237\u7814\u7a76\uff08N=200\uff09\u4e2d\u5bf9\u6bd4\u4e86\u8be5\u7cfb\u7edf\u4e0e\u7b80\u5355\u7684\u95ee\u7b54\u804a\u5929\u673a\u5668\u4eba\u548c\u9605\u8bfb\u6d3b\u52a8\u5728\u652f\u6301\u751f\u7269\u5b66\u8bfe\u7a0b\u7684\u6548\u679c\u3002\u7814\u7a76\u5206\u6790\u4e86\u7cfb\u7edf\u4f7f\u7528\u6a21\u5f0f\u3001\u9884\u540e\u6d4b\u8bd5\u6210\u7ee9\u4ee5\u53ca\u7528\u6237\u4f53\u9a8c\u8c03\u67e5\uff0c\u7ed3\u679c\u663e\u793a\u7528\u6237\u5bf9Ruffle&Riley\u7684\u53c2\u4e0e\u5ea6\u9ad8\uff0c\u7406\u89e3\u529b\u5f3a\uff0c\u5e76\u8ba4\u4e3a\u63d0\u4f9b\u7684\u652f\u6301\u6709\u5e2e\u52a9\u3002\u5c3d\u7ba1Ruffle&Riley\u7528\u6237\u7684\u5b8c\u6210\u65f6\u95f4\u8f83\u957f\uff0c\u4f46\u4e0e\u9605\u8bfb\u6d3b\u52a8\u76f8\u6bd4\uff0c\u5728\u77ed\u671f\u5b66\u4e60\u6210\u6548\u4e0a\u5e76\u672a\u53d1\u73b0\u663e\u8457\u5dee\u5f02\u3002\u6211\u4eec\u7684\u7cfb\u7edf\u67b6\u6784\u548c\u7528\u6237\u7814\u7a76\u
4e3a\u672a\u6765CTS\u8bbe\u8ba1\u8005\u63d0\u4f9b\u4e86\u6709\u4ef7\u503c\u7684\u4fe1\u606f\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f00\u6e90\u6211\u4eec\u7684\u7cfb\u7edf\uff0c\u4ee5\u4fc3\u8fdb\u57fa\u4e8eLLM\u7684\u5b66\u4e60\u6280\u672f\u6709\u6548\u6559\u5b66\u8bbe\u8ba1\u7684\u7814\u7a76\u3002|\n", "2404.17153": "|**2024-04-26**|**A Unified Debugging Approach via LLM-Based Multi-Agent Synergy**|Cheryl Lee et.al.|[2404.17153](http://arxiv.org/abs/2404.17153)|null|\u5728\u8f6f\u4ef6\u8c03\u8bd5\u8fd9\u4e2a\u8017\u65f6\u7684\u8fc7\u7a0b\u4e2d\uff0c\u4eba\u4eec\u4e00\u76f4\u5728\u52aa\u529b\u5b9e\u73b0\u81ea\u52a8\u5316\uff0c\u5305\u62ec\u6545\u969c\u5b9a\u4f4d\u548c\u4fee\u590d\u751f\u6210\u3002\u8fd1\u5e74\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u81ea\u52a8\u5316\u8c03\u8bd5\u65b9\u9762\u5c55\u73b0\u51fa\u5de8\u5927\u6f5c\u529b\u3002\u7136\u800c\uff0c\u6211\u4eec\u53d1\u73b0\u4e86\u4f20\u7edf\u548c\u57fa\u4e8eLLM\u7684\u8c03\u8bd5\u5de5\u5177\u9762\u4e34\u4e09\u5927\u6311\u6218\uff1a1\uff09\u4e0a\u6e38\u7684\u6545\u969c\u5b9a\u4f4d\u4e0d\u51c6\u786e\u4f1a\u6ce2\u53ca\u4e0b\u6e38\u7684\u4fee\u590d\uff1b2\uff09\u5904\u7406\u590d\u6742\u903b\u8f91\u9519\u8bef\u7684\u80fd\u529b\u4e0d\u8db3\uff1b3\uff09\u5ffd\u89c6\u7a0b\u5e8f\u4e0a\u4e0b\u6587\u3002\u9488\u5bf9\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u9996\u4e2a\u81ea\u52a8\u5316\u7684\u3001\u7edf\u4e00\u7684\u8c03\u8bd5\u6846\u67b6\u2014\u2014FixAgent\uff0c\u901a\u8fc7LLM\u4ee3\u7406\u534f\u540c\u3002FixAgent\u80fd\u6267\u884c\u7aef\u5230\u7aef\u7684\u6545\u969c\u5b9a\u4f4d\u3001\u4fee\u590d\u548c\u5206\u6790\u3002 
\u6211\u4eec\u7684\u5173\u952e\u6d1e\u5bdf\u662f\uff0cLLMs\u80fd\u591f\u4ece\u4eba\u7c7b\u5f00\u53d1\u8005\u8ba4\u53ef\u7684\u901a\u7528\u8f6f\u4ef6\u5de5\u7a0b\u539f\u5219\u4e2d\u83b7\u76ca\uff0c\u6bd4\u5982\u201c\u6a61\u76ae\u9e2d\u8c03\u8bd5\u201d\uff0c\u8fd9\u6709\u52a9\u4e8e\u66f4\u597d\u5730\u7406\u89e3\u7a0b\u5e8f\u529f\u80fd\u548c\u903b\u8f91\u9519\u8bef\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e09\u4e2a\u7075\u611f\u6765\u6e90\u4e8e\u201c\u6a61\u76ae\u9e2d\u201d\u7684\u89e3\u51b3\u65b9\u6848\uff1a\u4ee3\u7406\u4e13\u4e1a\u5316\u4e0e\u534f\u540c\u3001\u5173\u952e\u53d8\u91cf\u8ddf\u8e2a\u548c\u7a0b\u5e8f\u4e0a\u4e0b\u6587\u7406\u89e3\uff0c\u4fc3\u4f7fLLMs\u63d0\u4f9b\u660e\u786e\u7684\u89e3\u91ca\uff0c\u5e76\u805a\u7126\u4e8e\u5173\u952e\u7684\u7a0b\u5e8f\u903b\u8f91\u4fe1\u606f\u3002\u5728\u5e7f\u6cdb\u4f7f\u7528\u7684QuixBugs\u6570\u636e\u96c6\u4e0a\uff0cFixAgent\u6210\u529f\u4fee\u590d\u4e8680\u4e2abug\u4e2d\u768479\u4e2a\uff0c\u5176\u4e2d9\u4e2a\u662f\u4e4b\u524d\u672a\u89e3\u51b3\u7684\u3002\u5b83\u8fd8\u5728CodeFlaws\u4e0a\u5408\u7406\u5730\u4fee\u590d\u4e861.9\u500d\u4e8e\u6700\u4f73\u4fee\u590d\u5de5\u5177\u7684\u7f3a\u9677\uff0c\u800c\u4e14\u65e0\u9700\u4f4d\u7f6e\u4fe1\u606f\uff0c\u91c7\u6837\u7387\u4f4e\u4e8e0.6%\u3002\u5e73\u5747\u800c\u8a00\uff0c\u4e0e\u4f7f\u7528\u4e0d\u540cLLM\u7684\u57fa\u7ebf\u6a21\u578b\u76f8\u6bd4\uff0cFixAgent\u63d0\u9ad8\u4e86\u7ea620%\u7684\u5408\u7406\u4fee\u590d\u548c\u6b63\u786e\u4fee\u590d\u7387\uff0c\u663e\u793a\u51fa\u6211\u4eec\u8bbe\u8ba1\u7684\u6709\u6548\u6027\u3002 \u6b64\u5916\uff0cFixAgent\u7684\u6b63\u786e\u7387\u9ad8\u8fbe97.26%\uff0c\u8868\u660e\u5b83\u6709\u53ef\u80fd\u514b\u670d\u73b0\u6709\u65b9\u6cd5\u7684\u8fc7\u62df\u5408\u95ee\u9898\u3002\u603b\u7ed3\u6765\u8bf4\uff0cFixAgent\u662f\u4e00\u4e2a\u6709\u524d\u666f\u7684\u81ea\u52a8\u5316\u8c03\u8bd5\u6846\u67b6\uff0c\u65e8\u5728\u63d0\u5347\u8f6f\u4ef6\u8c03\u8bd5\u7684\u6548\u7387\u548c\u51c6\u786e\u6027\u3002|\n", "2404.16698": 
"|**2024-04-25**|**Cooperate or Collapse: Emergence of Sustainability Behaviors in a Society of LLM Agents**|Giorgio Piatti et.al.|[2404.16698](http://arxiv.org/abs/2404.16698)|**[link](https://github.com/giorgiopiatti/govsim)**|\u5728\u5feb\u901f\u53d1\u5c55\u7684\u4eba\u5de5\u667a\u80fd\u9886\u57df\uff0c\u786e\u4fdd\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u51b3\u7b56\u5b89\u5168\u662f\u4e00\u9879\u91cd\u5927\u6311\u6218\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201cGovernance of the Commons Simulation\u201d\uff08GovSim\uff09\u7684\u6a21\u62df\u5e73\u53f0\uff0c\u65e8\u5728\u7814\u7a76LLMs\u4e2d\u7684\u6218\u7565\u4e92\u52a8\u548c\u5408\u4f5c\u51b3\u7b56\u3002\u901a\u8fc7\u8fd9\u4e2a\u73af\u5883\uff0c\u6211\u4eec\u63a2\u8ba8\u4e86AI\u4ee3\u7406\u4e4b\u95f4\u8d44\u6e90\u5206\u4eab\u7684\u52a8\u6001\uff0c\u5f3a\u8c03\u4e86\u4f26\u7406\u8003\u91cf\u3001\u6218\u7565\u89c4\u5212\u548c\u8c08\u5224\u6280\u5de7\u7684\u91cd\u8981\u6027\u3002GovSim\u5177\u6709\u7075\u6d3b\u6027\uff0c\u652f\u6301\u6587\u672c\u578b\u4ee3\u7406\uff0c\u5305\u62ecLLMs\u3002\u5229\u7528\u751f\u6210\u5f0f\u4ee3\u7406\u6846\u67b6\uff0c\u6211\u4eec\u521b\u5efa\u4e86\u4e00\u4e2a\u901a\u7528\u4ee3\u7406\uff0c\u4fbf\u4e8e\u6574\u5408\u4e0d\u540c\u7684LLMs\u3002\u6211\u4eec\u7684\u7814\u7a76\u53d1\u73b0\uff0c\u5728GovSim\u4e2d\uff0c\u53ea\u670915\u4e2a\u6d4b\u8bd5\u6a21\u578b\u4e2d\u76842\u4e2a\u80fd\u591f\u5b9e\u73b0\u53ef\u6301\u7eed\u7ed3\u679c\uff0c\u8fd9\u8868\u660e\u6a21\u578b\u5728\u7ba1\u7406\u5171\u4eab\u8d44\u6e90\u7684\u80fd\u529b\u4e0a\u5b58\u5728\u663e\u8457\u5dee\u8ddd\u3002\u8fdb\u4e00\u6b65\u7684\u7814\u7a76\u663e\u793a\uff0c\u5982\u679c\u79fb\u9664\u4ee3\u7406\u4e4b\u95f4\u7684\u901a\u4fe1\u80fd\u529b\uff0c\u5b83\u4eec\u4f1a\u8fc7\u5ea6\u4f7f\u7528\u5171\u4eab\u8d44\u6e90\uff0c\u7a81\u51fa\u4e86\u5408\u4f5c\u4e2d\u6c9f\u901a\u7684\u5173\u952e\u6027\u3002\u6709\u8da3\u7684\u662f\uff0c\u5927\u591a\u6570LLMs\u7f3a\u4e4f\u666e\u904d\u5316\u7684\u5047\u8bbe
\u80fd\u529b\uff0c\u63ed\u793a\u4e86\u5b83\u4eec\u63a8\u7406\u6280\u80fd\u7684\u4e00\u4e2a\u91cd\u8981\u5f31\u70b9\u3002\u6211\u4eec\u5f00\u6e90\u4e86\u6240\u6709\u7814\u7a76\u7ed3\u679c\uff0c\u5305\u62ec\u6a21\u62df\u73af\u5883\u3001\u4ee3\u7406\u63d0\u793a\u4ee5\u53ca\u5168\u9762\u7684\u7f51\u7edc\u754c\u9762\uff0c\u4ee5\u4f9b\u8fdb\u4e00\u6b65\u7814\u7a76\u548c\u8ba8\u8bba\u3002|\n", "2404.17605": "|**2024-04-24**|**Autonomous LLM-driven research from data to human-verifiable research papers**|Tal Ifargan et.al.|[2404.17605](http://arxiv.org/abs/2404.17605)|**[link](https://github.com/technion-kishony-lab/data-to-paper)**|**\u968f\u7740\u4eba\u5de5\u667a\u80fd\u63a8\u52a8\u79d1\u5b66\u53d1\u73b0\u7684\u6b65\u4f10\u52a0\u5feb\uff0c\u4eba\u4eec\u8fd8\u4e0d\u6e05\u695a\u5b8c\u5168\u7531AI\u9a71\u52a8\u7684\u7814\u7a76\u662f\u5426\u53ef\u884c\uff0c\u4ee5\u53ca\u5b83\u80fd\u5426\u9075\u5faa\u5173\u952e\u7684\u79d1\u5b66\u4ef7\u503c\u89c2\uff0c\u5982\u900f\u660e\u5ea6\u3001\u53ef\u8ffd\u6eaf\u6027\u548c\u53ef\u9a8c\u8bc1\u6027\u3002\u4e3a\u4e86\u6a21\u62df\u4eba\u7c7b\u7684\u79d1\u5b66\u7814\u7a76\u5b9e\u8df5\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u201c\u6570\u636e\u5230\u8bba\u6587\u201d\uff08data-to-paper\uff09\uff0c\u8fd9\u662f\u4e00\u4e2a\u81ea\u52a8\u5316\u5e73\u53f0\uff0c\u5f15\u5bfc\u76f8\u4e92\u534f\u4f5c\u7684\u4eba\u5de5\u667a\u80fd\u4ee3\u7406\u901a\u8fc7\u5b8c\u6574\u7684\u5206\u6b65\u9aa4\u7814\u7a76\u6d41\u7a0b\uff0c\u540c\u65f6\u7a0b\u5e8f\u5316\u8ffd\u8e2a\u4fe1\u606f\u6d41\uff0c\u5e76\u5141\u8bb8\u4eba\u7c7b\u76d1\u7763\u548c\u4e92\u52a8\u3002\u5728\u81ea\u52a8\u6a21\u5f0f\u4e0b\uff0c\u4ec5\u63d0\u4f9b\u6807\u6ce8\u6570\u636e\uff0c\u8be5\u5e73\u53f0\u5c31\u80fd\u63d0\u51fa\u5047\u8bbe\uff0c\u8bbe\u8ba1\u7814\u7a76\u8ba1\u5212\uff0c\u7f16\u5199\u548c\u8c03\u8bd5\u5206\u6790\u4ee3\u7801\uff0c\u751f\u6210\u548c\u89e3\u8bfb\u7ed3\u679c\uff0c\u751a\u81f3\u521b\u5efa\u5b8c\u6574\u4e14\u4fe1\u606f\u53ef\u8ffd\u6eaf\u7684\u79d1\u7814\u8bba\u6587\u3002\u5c3d\u7ba1
\u7814\u7a76\u65b0\u9896\u6027\u6709\u9650\uff0c\u4f46\u8fd9\u4e00\u8fc7\u7a0b\u5c55\u793a\u4e86AI\u81ea\u4e3b\u4ece\u6570\u636e\u4e2d\u751f\u6210\u539f\u521b\u5b9a\u91cf\u6d1e\u5bdf\u7684\u80fd\u529b\u3002\u5bf9\u4e8e\u7b80\u5355\u7684\u7814\u7a76\u76ee\u6807\uff0c\u5168\u81ea\u52a8\u6d41\u7a0b\u80fd\u521b\u4f5c\u51fa\u5927\u7ea680-90%\u65e0\u9700\u91cd\u5927\u9519\u8bef\u7684\u7a3f\u4ef6\uff0c\u7136\u800c\u968f\u7740\u76ee\u6807\u590d\u6742\u6027\u7684\u589e\u52a0\uff0c\u4eba\u7c7b\u7684\u5171\u540c\u53c2\u4e0e\u5bf9\u4e8e\u4fdd\u8bc1\u51c6\u786e\u6027\u81f3\u5173\u91cd\u8981\u3002\u6b64\u5916\uff0c\u751f\u6210\u7684\u8bba\u6587\u672c\u8eab\u4e5f\u5177\u6709\u5185\u5728\u7684\u53ef\u9a8c\u8bc1\u6027\uff0c\u56e0\u4e3a\u4fe1\u606f\u8ffd\u8e2a\u4f7f\u5f97\u7ed3\u679c\u3001\u65b9\u6cd5\u548c\u6570\u636e\u7684\u94fe\u63a5\u53ef\u4ee5\u7a0b\u5e8f\u5316\u8fdb\u884c\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u7684\u5de5\u4f5c\u8868\u660e\uff0cAI\u9a71\u52a8\u7684\u79d1\u7814\u53ef\u4ee5\u52a0\u901f\u79d1\u5b66\u53d1\u73b0\uff0c\u540c\u65f6\u589e\u5f3a\u800c\u975e\u5a01\u80c1\u900f\u660e\u5ea6\u3001\u53ef\u8ffd\u6eaf\u6027\u548c\u53ef\u9a8c\u8bc1\u6027\u3002**|\n", "2404.16115": "|**2024-04-24**|**Online Personalizing White-box LLMs Generation with Neural Bandits**|Zekai Chen 
et.al.|[2404.16115](http://arxiv.org/abs/2404.16115)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5f00\u59cb\u751f\u6210\u4e2a\u6027\u5316\u7684\u6587\u672c\u5185\u5bb9\uff0c\u5982\u4f55\u5728\u4e0d\u4e3a\u6bcf\u4f4d\u7528\u6237\u521b\u5efa\u72ec\u7279\u6a21\u578b\u7684\u8d44\u6e90\u6d88\u8017\u4e0b\u5b9e\u73b0\u9ad8\u6548\u4e2a\u6027\u5316\u6210\u4e86\u65b0\u6311\u6218\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u5728\u7ebf\u65b9\u6cd5\uff0c\u5229\u7528\u795e\u7ecfbandit\u7b97\u6cd5\u52a8\u6001\u4f18\u5316\u8f6f\u6307\u4ee4\u5d4c\u5165\uff0c\u6839\u636e\u7528\u6237\u53cd\u9988\u8c03\u6574\u5185\u5bb9\uff0c\u4ece\u800c\u63d0\u5347\u767d\u76d2LLMs\u5f00\u653e\u6027\u6587\u672c\u751f\u6210\u7684\u4e2a\u6027\u5316\u6c34\u5e73\u3002\u901a\u8fc7\u5728\u591a\u4e2a\u4efb\u52a1\u4e0a\u7684\u4e25\u8c28\u5b9e\u9a8c\uff0c\u6211\u4eec\u8bc1\u660e\u4e86\u8fd9\u79cd\u65b9\u6cd5\u76f8\u5bf9\u4e8e\u57fa\u7840\u7b56\u7565\u6709\u663e\u8457\u6027\u80fd\u63d0\u5347\u3002\u7279\u522b\u662f\u9488\u5bf9\u4e2a\u6027\u5316\u65b0\u95fb\u6807\u9898\u751f\u6210\uff0cNeuralTS\u5e26\u6765\u4e86\u9ad8\u8fbe62.9%\u7684\u6700\u4f73ROUGE\u5206\u6570\u63d0\u5347\u4ee5\u53ca2.76%\u7684LLM\u4ee3\u7406\u8bc4\u4f30\u5206\u6570\u589e\u957f\uff0c\u8fd9\u8868\u660e\u5176\u6548\u679c\u663e\u8457\u3002|\n", "2404.15974": "|**2024-04-24**|**A Human-Computer Collaborative Tool for Training a Single Large Language Model Agent into a Network through Few Examples**|Lihang Pan et.al.|[2404.15974](http://arxiv.org/abs/2404.15974)|null|
\u5355\u4e2a\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u89e3\u51b3\u590d\u6742\u4efb\u52a1\u65b9\u9762\u7684\u80fd\u529b\u6709\u9650\u3002\u7136\u800c\uff0c\u901a\u8fc7\u8fde\u63a5\u591a\u4e2aLLM\u4ee3\u7406\u6784\u5efa\u7684\u7f51\u7edc\u53ef\u4ee5\u663e\u8457\u63d0\u5347\u6574\u4f53\u6027\u80fd\u3002\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u4eba\u673a\u534f\u4f5c\u5de5\u5177\u2014\u2014EasyLAN\uff0c\u65e8\u5728\u5e2e\u52a9\u5f00\u53d1\u8005\u8f7b\u677e\u6784\u5efaLLM\u4ee3\u7406\u7f51\u7edc\uff08LAN\uff09\u3002EasyLAN\u9996\u5148\u6839\u636e\u4efb\u52a1\u63cf\u8ff0\u81ea\u52a8\u751f\u6210\u4ec5\u5305\u542b\u4e00\u4e2a\u4ee3\u7406\u7684\u521d\u59cb\u7f51\u7edc\u3002\u63a5\u7740\uff0c\u5b83\u5229\u7528\u5c11\u91cf\u8bad\u7ec3\u793a\u4f8b\u6765\u8c03\u6574\u7f51\u7edc\u3002\u5bf9\u4e8e\u6bcf\u4e2a\u793a\u4f8b\uff0cEasyLAN\u5206\u6790\u8f93\u51fa\u4e0e\u771f\u5b9e\u7ed3\u679c\u4e4b\u95f4\u7684\u5dee\u8ddd\uff0c\u5e76\u627e\u51fa\u9519\u8bef\u7684\u539f\u56e0\u3002EasyLAN\u4f1a\u91c7\u7528\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u7b56\u7565\u6765\u4fee\u6b63\u8fd9\u4e9b\u95ee\u9898\u3002\u7528\u6237\u53ef\u4ee5\u4ecb\u5165EasyLAN\u7684\u5de5\u4f5c\u6d41\u7a0b\u6216\u76f4\u63a5\u4fee\u6539LAN\u3002\u6700\u7ec8\uff0cLAN\u4ece\u5355\u4e2a\u4ee3\u7406\u53d1\u5c55\u6210\u591a\u4ee3\u7406\u7684\u7f51\u7edc\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cEasyLAN\u80fd\u591f\u5e2e\u52a9\u5f00\u53d1\u8005\u5feb\u901f\u6784\u5efa\u6027\u80fd\u826f\u597d\u7684LAN\u3002|\n", "2404.15269": "|**2024-04-23**|**Aligning LLM Agents by Learning Latent Preference from User Edits**|Ge Gao 
et.al.|[2404.15269](http://arxiv.org/abs/2404.15269)|**[link](https://github.com/gao-g/prelude)**|**\u6211\u4eec\u7814\u7a76\u57fa\u4e8e\u7528\u6237\u5bf9\u8bed\u8a00\u6a21\u578b\u7f16\u8f91\u7684\u4e92\u52a8\u5b66\u4e60\u8bed\u8a00\u4ee3\u7406\u3002\u5728\u8bf8\u5982\u5199\u4f5c\u52a9\u624b\u7684\u5e38\u89c1\u573a\u666f\u4e2d\uff0c\u7528\u6237\u4e0e\u8bed\u8a00\u4ee3\u7406\u4ea4\u4e92\uff0c\u6839\u636e\u4e0a\u4e0b\u6587\u751f\u6210\u54cd\u5e94\uff0c\u5e76\u53ef\u80fd\u9009\u62e9\u6027\u5730\u7f16\u8f91\u4ee3\u7406\u7684\u54cd\u5e94\u4ee5\u53cd\u6620\u4ed6\u4eec\u7684\u6f5c\u5728\u504f\u597d\uff0c\u540c\u65f6\u63d0\u9ad8\u51c6\u786e\u6027\u3002\u8fd9\u79cd\u7f16\u8f91\u53cd\u9988\u662f\u81ea\u7136\u4ea7\u751f\u7684\uff0c\u9002\u5408\u7528\u4e8e\u63d0\u5347\u4ee3\u7406\u4e0e\u7528\u6237\u504f\u597d\u7684\u5951\u5408\u5ea6\uff0c\u964d\u4f4e\u540e\u7eed\u7528\u6237\u7684\u7f16\u8f91\u6210\u672c\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51faPRELUDE\u6846\u67b6\uff0c\u5b83\u6839\u636e\u5386\u53f2\u7f16\u8f91\u6570\u636e\u63a8\u65ad\u7528\u6237\u7684\u6f5c\u5728\u504f\u597d\uff0c\u5e76\u636e\u6b64\u8bbe\u8ba1\u4e00\u4e2a\u63d0\u793a\u7b56\u7565\uff0c\u5f15\u5bfc\u672a\u6765\u7684\u54cd\u5e94\u751f\u6210\uff0c\u907f\u514d\u4e86\u6602\u8d35\u4e14\u96be\u4ee5\u6269\u5c55\u7684\u5fae\u8c03\u8fc7\u7a0b\uff0c\u8fd8\u80fd\u4fdd\u6301\u5728\u5176\u4ed6\u4efb\u52a1\u4e0a\u7684\u6027\u80fd\u3002 
\u6b64\u5916\uff0c\u5b66\u4e60\u63cf\u8ff0\u6027\u7684\u504f\u597d\u6709\u52a9\u4e8e\u589e\u5f3a\u53ef\u89e3\u91ca\u6027\uff0c\u7528\u6237\u53ef\u4ee5\u67e5\u770b\u548c\u8c03\u6574\u5b66\u4e60\u5230\u7684\u504f\u597d\u3002\u7136\u800c\uff0c\u7528\u6237\u504f\u597d\u53ef\u80fd\u590d\u6742\u591a\u53d8\uff0c\u53d7\u60c5\u5883\u5f71\u54cd\uff0c\u56e0\u6b64\u5b66\u4e60\u8d77\u6765\u5177\u6709\u6311\u6218\u6027\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51faCIPHER\u7b97\u6cd5\uff0c\u5b83\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u6839\u636e\u7528\u6237\u7f16\u8f91\u63a8\u65ad\u7ed9\u5b9a\u60c5\u5883\u4e0b\u7684\u7528\u6237\u504f\u597d\u3002\u672a\u6765\uff0cCIPHER\u4f1a\u4ece\u5386\u53f2\u4e2d\u7684k\u4e2a\u6700\u63a5\u8fd1\u7684\u4e0a\u4e0b\u6587\u4e2d\u68c0\u7d22\u63a8\u65ad\u51fa\u7684\u504f\u597d\uff0c\u7efc\u5408\u751f\u6210\u54cd\u5e94\u3002\u6211\u4eec\u5728\u603b\u7ed3\u548c\u7535\u5b50\u90ae\u4ef6\u5199\u4f5c\u4e24\u4e2a\u4e92\u52a8\u73af\u5883\u4e2d\u4f7f\u7528GPT-4\u6a21\u62df\u7528\u6237\u8fdb\u884c\u8bc4\u4f30\uff0c\u4e0e\u76f4\u63a5\u4f7f\u7528\u7528\u6237\u7f16\u8f91\u4f46\u4e0d\u5b66\u4e60\u63cf\u8ff0\u6027\u504f\u597d\u7684\u7b97\u6cd5\uff0c\u4ee5\u53ca\u5b66\u4e60\u5168\u5c40\u65e0\u4e0a\u4e0b\u6587\u504f\u597d\u7684\u7b97\u6cd5\u8fdb\u884c\u4e86\u6bd4\u8f83\u3002 \u5728\u4e24\u9879\u4efb\u52a1\u4e2d\uff0cCIPHER\u90fd\u5b9e\u73b0\u4e86\u6700\u4f4e\u7684\u7f16\u8f91\u8ddd\u79bb\u6210\u672c\uff0c\u5e76\u4e14\u5b66\u4e60\u5230\u7684\u504f\u597d\u4e0e\u771f\u5b9e\u504f\u597d\u663e\u793a\u51fa\u663e\u8457\u7684\u76f8\u4f3c\u6027\u3002**|\n", "2404.14387": "|**2024-04-22**|**A Survey on Self-Evolution of Large Language Models**|Zhengwei Tao et.al.|[2404.14387](http://arxiv.org/abs/2404.14387)|**[link](https://github.com/alibabaresearch/damo-convai)**|**
\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u4f17\u591a\u9886\u57df\u548c\u667a\u80fd\u4ee3\u7406\u5e94\u7528\u4e2d\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\u3002\u7136\u800c\uff0c\u4f9d\u8d56\u4eba\u7c7b\u6216\u5916\u90e8\u6a21\u578b\u76d1\u7763\u7684\u73b0\u6709LLMs\u5728\u5904\u7406\u590d\u6742\u4efb\u52a1\u548c\u591a\u6837\u6027\u589e\u52a0\u65f6\u53ef\u80fd\u4f1a\u9047\u5230\u6210\u672c\u9ad8\u6602\u548c\u6027\u80fd\u74f6\u9888\u7684\u95ee\u9898\u3002\u4e3a\u6b64\uff0c\u81ea\u6211\u8fdb\u5316\u65b9\u6cd5\u5e94\u8fd0\u800c\u751f\uff0c\u8fd9\u79cd\u7b56\u7565\u5141\u8bb8LLMs\u81ea\u4e3b\u83b7\u53d6\u3001\u7cbe\u70bc\u5e76\u4ece\u81ea\u8eab\u751f\u6210\u7684\u7ecf\u9a8c\u4e2d\u5b66\u4e60\uff0c\u501f\u9274\u4eba\u7c7b\u7ecf\u9a8c\u5b66\u4e60\u8fc7\u7a0b\uff0c\u6709\u671b\u63a8\u52a8LLMs\u5411\u8d85\u7ea7\u667a\u80fd\u53d1\u5c55\u3002\u672c\u6587\u5168\u9762\u7efc\u8ff0\u4e86LLMs\u4e2d\u7684\u81ea\u6211\u8fdb\u5316\u65b9\u6cd5\u3002\u9996\u5148\uff0c\u6211\u4eec\u63d0\u51fa\u4e00\u4e2a\u6982\u5ff5\u6846\u67b6\uff0c\u5c06\u8fdb\u5316\u8fc7\u7a0b\u5212\u5206\u4e3a\u8fed\u4ee3\u5faa\u73af\u7684\u56db\u4e2a\u9636\u6bb5\uff1a\u7ecf\u9a8c\u83b7\u53d6\u3001\u7ecf\u9a8c\u7ec6\u5316\u3001\u66f4\u65b0\u548c\u8bc4\u4f30\u3002\u5176\u6b21\uff0c\u6211\u4eec\u5206\u7c7b\u63a2\u8ba8LLMs\u548c\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u7684\u8fdb\u5316\u76ee\u6807\uff0c\u5e76\u5bf9\u76f8\u5173\u6587\u732e\u8fdb\u884c\u603b\u7ed3\uff0c\u63d0\u4f9b\u6bcf\u4e2a\u6a21\u5757\u7684\u5206\u7c7b\u548c\u89c1\u89e3\u3002\u6700\u540e\uff0c\u6211\u4eec\u6307\u51fa\u4e86\u5f53\u524d\u7684\u6311\u6218\uff0c\u5e76\u63d0\u51fa\u4e86\u672a\u6765\u7814\u7a76\u65b9\u5411\uff0c\u4e3a\u52a0\u901f\u81ea\u6f14\u8fdbLLMs\u7684\u53d1\u5c55\u63d0\u4f9b\u5173\u952e\u6d1e\u89c1\u3002**|\n", "2404.13501": "|**2024-04-21**|**A Survey on the Memory Mechanism of Large Language Model based Agents**|Zeyu Zhang 
et.al.|[2404.13501](http://arxiv.org/abs/2404.13501)|**[link](https://github.com/nuster1128/llm_agent_memory_survey)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u79d1\u7814\u548c\u5de5\u4e1a\u754c\u7684\u5e7f\u6cdb\u5173\u6ce8\uff0c\u57fa\u4e8eLLMs\u7684\u667a\u80fd\u4ee3\u7406\u56e0\u5176\u81ea\u6211\u8fdb\u5316\u80fd\u529b\u800c\u5907\u53d7\u77a9\u76ee\uff0c\u8fd9\u5bf9\u4e8e\u89e3\u51b3\u9700\u8981\u957f\u671f\u590d\u6742\u4ea4\u4e92\u7684\u73b0\u5b9e\u95ee\u9898\u81f3\u5173\u91cd\u8981\u3002\u652f\u6301agent-environment\u4ea4\u4e92\u7684\u5173\u952e\u8981\u7d20\u662f\u4ee3\u7406\u7684\u8bb0\u5fc6\u673a\u5236\u3002\u5c3d\u7ba1\u5df2\u6709\u4f17\u591a\u6709\u524d\u666f\u7684\u8bb0\u5fc6\u8bbe\u8ba1\u88ab\u63d0\u51fa\uff0c\u4f46\u8fd9\u4e9b\u7814\u7a76\u5206\u6563\u5728\u591a\u7bc7\u8bba\u6587\u4e2d\uff0c\u7f3a\u4e4f\u5168\u9762\u7684\u7efc\u8ff0\u6765\u7cfb\u7edf\u6027\u5730\u603b\u7ed3\u548c\u6bd4\u8f83\uff0c\u672a\u80fd\u63d0\u70bc\u51fa\u901a\u7528\u4e14\u6709\u6548\u7684\u8bbe\u8ba1\u6a21\u5f0f\u4ee5\u542f\u53d1\u540e\u7eed\u7814\u7a76\u3002\u4e3a\u6b64\uff0c\u672c\u8bba\u6587\u65e8\u5728\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u6211\u4eec\u63d0\u51fa\u4e00\u4efd\u5173\u4e8eLLM\u57fa\u4ee3\u7406\u8bb0\u5fc6\u673a\u5236\u7684\u5168\u9762\u8c03\u67e5\u3002\u9996\u5148\uff0c\u6211\u4eec\u5c06\u63a2\u8ba8\u8bb0\u5fc6\u5728LLM\u4ee3\u7406\u4e2d\u7684\u201c\u662f\u4ec0\u4e48\u201d\u4ee5\u53ca\u201c\u4e3a\u4ec0\u4e48\u9700\u8981\u201d\u3002\u7136\u540e\uff0c\u6211\u4eec\u7cfb\u7edf\u56de\u987e\u4e86\u5173\u4e8e\u8bb0\u5fc6\u6a21\u5757\u7684\u8bbe\u8ba1\u548c\u8bc4\u4f30\u65b9\u6cd5\u7684\u7814\u7a76\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u4f1a\u5c55\u793a\u8bb0\u5fc6\u6a21\u5757\u5728\u5404\u79cd\u5e94\u7528\u4e2d\u626e\u6f14\u7684\u91cd\u8981\u89d2\u8272\u3002\u6700\u540e\uff0c\u6211\u4eec\u4f1a\u5206\u6790\u73b0\u6709\u5de5\u4f5c\u7684\u5c40\u9650\uff0c\u5e76\u6307\u51fa\u91cd\u8981\u7684\u672a\u6765\u7814\u7a76\u65b9\u5411\u3002
\u4e3a\u4e86\u8ddf\u8e2a\u8be5\u9886\u57df\u6700\u65b0\u8fdb\u5c55\uff0c\u6211\u4eec\u521b\u5efa\u4e86\u4e00\u4e2aGitHub\u4ed3\u5e93\uff1ahttps://github.com/nuster1128/LLM_Agent_Memory_Survey\u3002**|\n", "2404.11964": "|**2024-04-18**|**From Language Models to Practical Self-Improving Computer Agents**|Alex Sheng et.al.|[2404.11964](http://arxiv.org/abs/2404.11964)|null|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u7b80\u5355\u76f4\u63a5\u7684\u65b9\u6cd5\uff0c\u7528\u4e8e\u521b\u5efa\u80fd\u591f\u6267\u884c\u5404\u79cd\u8ba1\u7b97\u673a\u4efb\u52a1\u7684\u4eba\u5de5\u667a\u80fd\u4ee3\u7406\uff0c\u5e76\u901a\u8fc7\u81ea\u6211\u6539\u8fdb\u6765\u53d1\u5c55\u5de5\u5177\u548c\u589e\u5f3a\u529f\u80fd\uff0c\u4ee5\u89e3\u51b3\u65e5\u76ca\u590d\u6742\u7684\u4efb\u52a1\u3002\u9274\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5df2\u663e\u793a\u51fa\u4ece\u975e\u53c2\u6570\u589e\u5f3a\u4e2d\u83b7\u76ca\uff0c\u8fd1\u671f\u7684\u7814\u7a76\u5927\u91cf\u96c6\u4e2d\u5728\u5f00\u53d1\u8f6f\u4ef6\uff0c\u4ee5\u8d4b\u4e88LLMs\u5404\u79cd\u80fd\u529b\u3002\u6211\u4eec\u5efa\u8bae\uff0c\u901a\u8fc7\u9002\u5f53\u7684\u63d0\u793a\u5de5\u7a0b\uff0c\u4e00\u4e2aLLM\u4ee3\u7406\u53ef\u4ee5\u7cfb\u7edf\u5730\u751f\u6210\u8f6f\u4ef6\u6765\u589e\u5f3a\u81ea\u8eab\uff0c\u800c\u4e0d\u662f\u4f9d\u8d56\u4eba\u7c7b\u5de5\u7a0b\u7684\u9759\u6001\u8f6f\u4ef6\u5f00\u53d1\u3002 
\u6211\u4eec\u901a\u8fc7\u4e00\u4e9b\u6848\u4f8b\u7814\u7a76\u5c55\u793a\u4e86\u8fd9\u4e00\u70b9\uff1a\u4ec5\u901a\u8fc7\u7ec8\u7aef\u8bbf\u95ee\uff0c\u6211\u4eec\u5f15\u5bfcLLM\u4ee3\u7406\u6dfb\u52a0\u4e86\u68c0\u7d22\u3001\u4e92\u8054\u7f51\u641c\u7d22\u3001\u7f51\u9875\u5bfc\u822a\u548c\u6587\u672c\u7f16\u8f91\u529f\u80fd\u3002\u8be5\u4ee3\u7406\u6709\u6548\u5730\u5229\u7528\u8fd9\u4e9b\u5de5\u5177\u89e3\u51b3\u4e86\u95ee\u9898\uff0c\u4f8b\u5982\u81ea\u52a8\u5316\u8f6f\u4ef6\u5f00\u53d1\u548c\u57fa\u4e8e\u7f51\u7edc\u7684\u4efb\u52a1\u3002\u8fd9\u79cd\u65b9\u6cd5\u8868\u660e\uff0c\u901a\u8fc7\u8fde\u7eed\u63d0\u95ee\u548c\u5de7\u5999\u7684\u63d0\u793a\u8bbe\u8ba1\uff0cLLM\u80fd\u591f\u81ea\u4e3b\u6269\u5c55\u5176\u529f\u80fd\uff0c\u6267\u884c\u5b9e\u9645\u7684\u8ba1\u7b97\u673a\u4efb\u52a1\u3002|\n", "2404.11794": "|**2024-04-25**|**Automated Social Science: Language Models as Scientist and Subjects**|Benjamin S. Manning et.al.|[2404.11794](http://arxiv.org/abs/2404.11794)|null|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b9\u6cd5\uff0c\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u6700\u65b0\u8fdb\u5c55\uff0c\u81ea\u52a8\u6784\u5efa\u548c\u6d4b\u8bd5\u793e\u4f1a\u79d1\u5b66\u5047\u8bbe\u3002\u8fd9\u79cd\u65b9\u6cd5\u7684\u5173\u952e\u5728\u4e8e\u4f7f\u7528\u7ed3\u6784\u56e0\u679c\u6a21\u578b\u3002\u7ed3\u6784\u56e0\u679c\u6a21\u578b\u63d0\u4f9b\u4e86\u4e00\u4e2a\u9648\u8ff0\u5047\u8bbe\u7684\u8bed\u8a00\u3001\u6784\u5efaLLM\u57fa\u7840\u4ee3\u7406\u7684\u84dd\u56fe\u3001\u5b9e\u9a8c\u8bbe\u8ba1\u4ee5\u53ca\u6570\u636e\u5206\u6790\u8ba1\u5212\u3002\u62df\u5408\u540e\u7684\u7ed3\u6784\u56e0\u679c\u6a21\u578b\u53ef\u4f9b\u9884\u6d4b\u6216\u89c4\u5212\u540e\u7eed\u5b9e\u9a8c\u3002\u6211\u4eec\u901a\u8fc7\u51e0\u4e2a\u573a\u666f\u8fdb\u884c\u4e86\u6f14\u793a\uff1a\u8c08\u5224\u3001\u4fdd\u91ca\u542c\u8bc1\u4f1a\u3001\u6c42\u804c\u9762\u8bd5\u548c\u62cd\u5356\u3002\u5728\u8fd9\u4e9b\u60c5\u51b5\u4e0b\uff0c\u7cfb\u7edf\u65e2\u63d0\u51fa\u4e86\
u56e0\u679c\u5173\u7cfb\uff0c\u4e5f\u8fdb\u884c\u4e86\u68c0\u9a8c\uff0c\u53d1\u73b0\u4e86\u4e00\u4e9b\u8bc1\u636e\uff0c\u800c\u6709\u4e9b\u5219\u6ca1\u6709\u3002\u6211\u4eec\u8bc1\u660e\uff0c\u4ece\u8fd9\u4e9b\u793e\u4f1a\u4e92\u52a8\u6a21\u62df\u4e2d\u83b7\u53d6\u7684\u6d1e\u5bdf\u5e76\u975e\u4ec5\u901a\u8fc7\u76f4\u63a5\u8be2\u95eeLLM\u5c31\u80fd\u83b7\u5f97\u3002\u5f53\u7ed9\u5b9a\u6bcf\u4e2a\u573a\u666f\u7684\u5efa\u8bae\u7ed3\u6784\u56e0\u679c\u6a21\u578b\u65f6\uff0cLLM\u5728\u9884\u6d4b\u4f30\u8ba1\u6548\u5e94\u7684\u7b26\u53f7\u65b9\u9762\u8868\u73b0\u826f\u597d\uff0c\u4f46\u65e0\u6cd5\u53ef\u9760\u5730\u9884\u6d4b\u6548\u5e94\u7684\u5927\u5c0f\u3002\u5728\u62cd\u5356\u5b9e\u9a8c\u4e2d\uff0c\u6a21\u62df\u7ed3\u679c\u4e0e\u62cd\u5356\u7406\u8bba\u7684\u9884\u6d4b\u7d27\u5bc6\u543b\u5408\uff0c\u4f46LLM\u76f4\u63a5\u63d0\u53d6\u7684\u6e05\u7b97\u4ef7\u683c\u9884\u6d4b\u4e0d\u51c6\u786e\u3002\u7136\u800c\uff0c\u5982\u679c\u6a21\u578b\u80fd\u57fa\u4e8e\u62df\u5408\u7684\u7ed3\u6784\u56e0\u679c\u6a21\u578b\u8fdb\u884c\u6761\u4ef6\u5316\uff0cLLM\u7684\u9884\u6d4b\u4f1a\u5927\u5e45\u6539\u8fdb\u3002\u7b80\u800c\u8a00\u4e4b\uff0cLLM\u77e5\u9053\u7684\u6bd4\u5b83\u80fd\u7acb\u5373\u8868\u8fbe\u7684\u8981\u591a\u3002|\n", "2404.11483": "|**2024-04-17**|**AgentKit: Flow Engineering with Graphs, not Coding**|Yue Wu 
et.al.|[2404.11483](http://arxiv.org/abs/2404.11483)|**[link](https://github.com/holmeswww/agentkit)**|**\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u76f4\u89c2\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u63d0\u793a\u6846\u67b6\uff08AgentKit\uff09\uff0c\u65e8\u5728\u4e3a\u591a\u529f\u80fd\u4ee3\u7406\u63d0\u4f9b\u7edf\u4e00\u7684\u65b9\u6cd5\u3002AgentKit\u901a\u8fc7\u7b80\u5355\u7684\u81ea\u7136\u8bed\u8a00\u63d0\u793a\u6784\u5efa\u590d\u6742\u7684\u201c\u601d\u7ef4\u8fc7\u7a0b\u201d\u3002\u5176\u57fa\u672c\u5355\u5143\u662f\u8282\u70b9\uff0c\u5305\u542b\u7279\u5b9a\u5b50\u4efb\u52a1\u7684\u81ea\u7136\u8bed\u8a00\u6307\u4ee4\u3002\u7528\u6237\u53ef\u4ee5\u50cf\u62fc\u63a5\u4e50\u9ad8\u79ef\u6728\u4e00\u6837\u8fde\u63a5\u8fd9\u4e9b\u8282\u70b9\uff0c\u4ece\u800c\u660e\u786e\u8bbe\u8ba1\u51fa\u81ea\u7136\u7ed3\u6784\u5316\u7684\u201c\u601d\u8003\u6d41\u7a0b\u201d\u3002\u4f8b\u5982\uff0c\u5728\u64b0\u5199\u8bba\u6587\u65f6\uff0c\u53ef\u80fd\u7684\u6b65\u9aa4\u5305\u62ec\uff1a1\uff09\u786e\u5b9a\u6838\u5fc3\u4fe1\u606f\uff0c2\uff09\u8bc6\u522b\u7814\u7a76\u7a7a\u767d\u7b49\u3002AgentKit\u7684\u6a21\u5757\u5316\u7279\u6027\u4f7f\u5f97\u9ad8\u7ea7\u529f\u80fd\u5982\u5373\u5174\u7684\u5c42\u6b21\u5316\u89c4\u5212\u3001\u53cd\u601d\u548c\u4ece\u4e92\u52a8\u4e2d\u5b66\u4e60\u53d8\u5f97\u53ef\u80fd\u3002\u7531\u4e8e\u5176\u76f4\u89c2\u4e14\u6a21\u62df\u4eba\u7c7b\u601d\u8003\u8fc7\u7a0b\u7684\u8bbe\u8ba1\uff0c\u5373\u4f7f\u6ca1\u6709\u7f16\u7a0b\u7ecf\u9a8c\u7684\u4eba\u4e5f\u80fd\u521b\u5efa\u548c\u8c03\u6574\u57fa\u7840\u4ee3\u7406\u3002\u5b9a\u91cf\u5b9e\u9a8c\u663e\u793a\uff0c\u4f7f\u7528AgentKit\u8bbe\u8ba1\u7684\u4ee3\u7406\u5728WebShop\u548cCrafter\u4efb\u52a1\u4e0a\u5b9e\u73b0\u4e86\u6700\u5148\u8fdb\u7684\u6027\u80fd\u3002\u8fd9\u4e9b\u6210\u679c\u8868\u660eAgentKit\u6709\u6f5c\u529b\u4f7fLLM\u4ee3\u7406\u5728\u66f4\u5e7f\u6cdb\u7684\u573a\u666f\u4e0b\u9ad8\u6548\u4e14\u6613\u4e8e\u4f7f\u7528\u3002\u76f8\u5173\u4ee3\u7801\u5df2\u5f00\u6e90\u5728GitHub\uff1ahttps://gith
ub.com/holmeswww/AgentKit\u3002**|\n", "2404.09982": "|**2024-04-15**|**Memory Sharing for Large Language Model based Agents**|Hang Gao et.al.|[2404.09982](http://arxiv.org/abs/2404.09982)|**[link](https://github.com/ghupppp/memorysharingllm)**|**\u5728\u4eba\u5de5\u667a\u80fd\u9886\u57df\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u901a\u8fc7\u81ea\u7136\u8bed\u8a00\u63d0\u793a\u6267\u884c\u4efb\u52a1\u7684\u80fd\u529b\u662f\u4e00\u4e2a\u91cd\u5927\u7a81\u7834\uff0c\u5b83\u51cf\u5c11\u4e86\u5bf9\u56fa\u5b9a\u7b54\u6848\u4efb\u52a1\uff08\u5982\u5e38\u8bc6\u95ee\u9898\u548c\u662f\u975e\u67e5\u8be2\uff09\u7684\u91cd\u65b0\u8bad\u7ec3\u6216\u5fae\u8c03\u9700\u6c42\u3002\u7136\u800c\uff0c\u5728\u5904\u7406\u5f00\u653e\u6027\u6311\u6218\u5982\u8bd7\u6b4c\u521b\u4f5c\u65f6\uff0c\u57fa\u4e8e\u4e0a\u4e0b\u6587\u5b66\u4e60\u7684\u65b9\u6cd5\u663e\u793a\u51fa\u5c40\u9650\uff0c\u4e3b\u8981\u6e90\u4e8e\u63d0\u4f9b\u7684\u793a\u4f8b\u5168\u9762\u6027\u4ee5\u53ca\u6a21\u578b\u7406\u89e3\u95ee\u9898\u5185\u5bb9\u7684\u80fd\u529b\u4e0d\u8db3\uff0c\u5bfc\u81f4\u8f93\u51fa\u5f80\u5f80\u4e0e\u9884\u671f\u7ed3\u679c\u5927\u76f8\u5f84\u5ead\u3002\u9488\u5bf9\u8fd9\u4e00\u5dee\u8ddd\uff0c\u6211\u4eec\u7684\u7814\u7a76\u63d0\u51fa\u4e86Memory-Sharing\uff08MS\uff09\u6846\u67b6\uff0c\u8fd9\u662f\u4e00\u79cd\u9488\u5bf9LLM\u591a\u4ee3\u7406\u7684\u5b9e\u65f6\u8bb0\u5fc6\u5b58\u50a8\u548c\u68c0\u7d22\u7cfb\u7edf\uff0c\u65e8\u5728\u589e\u5f3a\u57fa\u4e8e\u4e0a\u4e0b\u6587\u7684\u5b66\u4e60\u8fc7\u7a0b\u3002\u6bcf\u4e2a\u201c\u8bb0\u5fc6\u201d\u5355\u5143\u8bb0\u5f55\u4e86\u63d0\u51fa\u7684\u67e5\u8be2\u53ca\u5176\u6765\u81eaLLM\u4ee3\u7406\u7684\u5373\u65f6\u54cd\u5e94\uff0c\u4ece\u591a\u4e2a\u7c7b\u4f3c\u4ee3\u7406\u4e2d\u805a\u5408\u8fd9\u4e9b\u8bb0\u5fc6\uff0c\u5f62\u6210\u6240\u6709\u4ee3\u7406\u5171\u4eab\u7684\u4e30\u5bcc\u8bb0\u5fc6\u6c60\u3002MS\u6846\u67b6\u4e0d\u4ec5\u5e2e\u52a9\u4ee3\u7406\u627e\u5230\u7279\u5b9a\u4efb\u52a1\u7684\u76f8\u5173\u793a\u4f8b\uff0c\u8fd8\u8
bc4\u4f30\u5176\u8bb0\u5fc6\u7684\u6f5c\u5728\u5229\u7528\u4ef7\u503c\uff0c\u4f9b\u5176\u4ed6\u4ee3\u7406\u672a\u6765\u5e94\u7528\u3002\u5728\u4e09\u4e2a\u4e0d\u540c\u9886\u57df\u7684\u5b9e\u8bc1\u9a8c\u8bc1\u663e\u793a\uff0cMS\u6846\u67b6\u663e\u8457\u63d0\u9ad8\u4e86\u4ee3\u7406\u5904\u7406\u5f00\u653e\u6027\u95ee\u9898\u7684\u8868\u73b0\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u8ba8\u8bba\u4e86\u54ea\u79cd\u8bb0\u5fc6\u6c60\u548c\u68c0\u7d22\u7b56\u7565\u80fd\u66f4\u597d\u5730\u652f\u6301\u4ee3\u7406\uff0c\u4e3aMS\u7684\u672a\u6765\u53d1\u5c55\u63d0\u4f9b\u4e86\u65b9\u5411\u3002\u4ee3\u7801\u548c\u6570\u636e\u53ef\u5728\uff1ahttps://github.com/GHupppp/MemorySharingLLM \u83b7\u53d6\u3002**|\n", "2404.09127": "|**2024-05-10**|**Confidence Calibration and Rationalization for LLMs via Multi-Agent Deliberation**|Ruixin Yang et.al.|[2404.09127](http://arxiv.org/abs/2404.09127)|**[link](https://github.com/minnesotanlp/collaborative-calibration)**|**\u5f53\u524d\u7684\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u4e0d\u786e\u5b9a\u6027\u4f30\u8ba1\u65b9\u9762\u9762\u4e34\u6311\u6218\uff0c\u5b83\u4eec\u901a\u5e38\u6821\u51c6\u4e0d\u826f\u4e14\u8fc7\u5ea6\u81ea\u4fe1\uff0c\u7279\u522b\u662f\u5728\u57fa\u4e8e\u4eba\u7c7b\u53cd\u9988\u7684\u5f3a\u5316\u5b66\u4e60\uff08RLHF\uff09\u4e2d\u3002\u4eba\u7c7b\u7684\u51b3\u7b56\u548c\u4fe1\u5fc3\u4e0d\u4ec5\u6e90\u4e8e\u5185\u5728\u4fe1\u5ff5\uff0c\u8fd8\u80fd\u901a\u8fc7\u65e5\u5e38\u89c2\u5bdf\u8fdb\u884c\u8c03\u6574\uff0c\u800c\u73b0\u6709LLM\u7684\u6821\u51c6\u65b9\u6cd5\u4e3b\u8981\u5173\u6ce8\u5355\u4e2a\u6a21\u578b\u7684\u4fe1\u5fc3\u4f30\u8ba1\uff0c\u672a\u80fd\u5145\u5206\u5229\u7528\u201c\u96c6\u4f53\u667a\u6167\u201d\uff1a\u591a\u4e2aLLM\u4e4b\u95f4\u7684\u534f\u4f5c\u8868\u8fbe\u80fd\u529b\uff0c\u8fd9\u53ef\u4ee5\u96c6\u4f53\u63d0\u9ad8\u51c6\u786e\u6027\u548c\u6821\u51c6\u3002\u672c\u7814\u7a76\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65e0\u8bad\u7ec3\u540e\u5904\u7406\
u7684\u6821\u51c6\u7b56\u7565\u2014\u2014\u534f\u4f5c\u6821\u51c6\uff08Collaborative Calibration\uff09\uff0c\u5b83\u5229\u7528\u591a\u4ee3\u7406\u5de5\u5177\u589e\u5f3a\u7684LLMs\u5728\u6a21\u62df\u7684\u7fa4\u4f53\u8ba8\u8bba\u8fc7\u7a0b\u4e2d\uff0c\u5171\u540c\u63d0\u5347\u6821\u51c6\u80fd\u529b\u548c\u63a8\u7406\u5408\u7406\u6027\u3002\u6211\u4eec\u5728\u751f\u6210\u5f0f\u95ee\u7b54\u4efb\u52a1\u4e0a\u5c55\u793a\u4e86\u534f\u4f5c\u6821\u51c6\u7684\u6709\u6548\u6027\uff0c\u8986\u76d6\u4e86\u591a\u4e2a\u9886\u57df\uff0c\u8bc1\u660e\u4e86\u5b83\u5728\u6574\u5408\u96c6\u4f53\u6821\u51c6\u540e\u7684\u4fe1\u5fc3\u8bc4\u4f30\u548c\u63d0\u5347\u6a21\u578b\u9884\u6d4b\u53ef\u9760\u6027\u65b9\u9762\u7684\u6f5c\u529b\u3002**|\n", "2404.09077": "|**2024-04-13**|**CuriousLLM: Elevating Multi-Document QA with Reasoning-Infused Knowledge Graph Prompting**|Zukang Yang et.al.|[2404.09077](http://arxiv.org/abs/2404.09077)|**[link](https://github.com/zukangy/kgp-curiousllm)**|**\u5728\u95ee\u7b54\uff08QA\uff09\u9886\u57df\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e0e\u5916\u90e8\u6570\u636e\u5e93\u7684\u878d\u5408\u53d6\u5f97\u4e86\u663e\u8457\u6210\u6548\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u5728\u5904\u7406\u590d\u6742\u63a8\u7406\u4efb\u52a1\u65f6\u5f80\u5f80\u529b\u6709\u4e0d\u902e\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5bf9\u4e00\u79cd\u540d\u4e3a\u77e5\u8bc6\u56fe\u8c31\u63d0\u793a\uff08KGP\uff09\u7684\u521b\u65b0\u65b9\u6cd5\u8fdb\u884c\u4e86\u4f18\u5316\uff0c\u8be5\u65b9\u6cd5\u7ed3\u5408\u77e5\u8bc6\u56fe\u8c31\u548c\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u4ee5\u63d0\u5347\u63a8\u7406\u548c\u641c\u7d22\u7cbe\u5ea6\u3002\u7136\u800c\uff0c\u539f\u59cb\u7684KGP\u6846\u67b6\u9700\u8981\u6602\u8d35\u7684\u5927\u89c4\u6a21\u6570\u636e\u5fae\u8c03\uff0c\u5e76\u4e14\u4ecd\u5b58\u5728LLM\u7684\u9519\u8bef\u63a8\u65ad\u95ee\u9898\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u878d\u5165\u63a8\u7406\u80fd\u529b\u7684LLM\u4ee3
\u7406\uff0c\u5b83\u6a21\u4eff\u4eba\u7c7b\u7684\u597d\u5947\u5fc3\uff0c\u901a\u8fc7\u63d0\u95ee\u6765\u66f4\u6709\u6548\u5730\u5bfc\u822a\u641c\u7d22\u8fc7\u7a0b\u3002\u8fd9\u4e2a\u7b80\u5355\u7684\u6539\u8fdb\u663e\u8457\u63d0\u9ad8\u4e86LLM\u5728QA\u4efb\u52a1\u4e2d\u7684\u6027\u80fd\uff0c\u540c\u65f6\u907f\u514d\u4e86\u521d\u59cbKGP\u6846\u67b6\u7684\u9ad8\u6210\u672c\u548c\u5ef6\u8fdf\u3002\u6211\u4eec\u7684\u76ee\u6807\u662f\u8fdb\u4e00\u6b65\u53d1\u5c55\u8fd9\u79cd\u65b9\u6cd5\uff0c\u6700\u7ec8\u5b9e\u73b0\u66f4\u7cbe\u786e\u3001\u66f4\u5feb\u6377\u4e14\u6210\u672c\u6548\u76ca\u66f4\u9ad8\u7684QA\u89e3\u51b3\u65b9\u6848\u3002**|\n", "2404.09043": "|**2024-04-13**|**Do LLMs Play Dice? Exploring Probability Distribution Sampling in Large Language Models for Behavioral Simulation**|Jia Gu et.al.|[2404.09043](http://arxiv.org/abs/2404.09043)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u98de\u901f\u53d1\u5c55\u53ca\u5176\u5728\u5904\u7406\u590d\u6742\u8bed\u8a00\u4efb\u52a1\u4e2d\u7684\u51fa\u8272\u8868\u73b0\uff0c\u8d8a\u6765\u8d8a\u591a\u7684\u7814\u7a76\u5c1d\u8bd5\u5229\u7528LLMs\u6a21\u62df\u4eba\u7c7b\u7684\u884c\u4e3a\u51b3\u7b56\u8fc7\u7a0b\uff0c\u901a\u5e38\u8fd9\u4e9b\u8fc7\u7a0b\u88ab\u8868\u793a\u4e3a\u9a6c\u5c14\u53ef\u592b\u51b3\u7b56\u8fc7\u7a0b\uff08MDPs\uff09\u3002\u5728\u8fd9\u4e2a\u6846\u67b6\u4e2d\uff0c\u52a8\u4f5c\u9075\u5faa\u7279\u5b9a\u7684\u6982\u7387\u5206\u5e03\uff0c\u5e76\u9700\u8981\u8fed\u4ee3\u91c7\u6837\u3002\u8fd9\u4fc3\u4f7f\u6211\u4eec\u63a2\u7a76LLM\u4ee3\u7406\u7406\u89e3\u6982\u7387\u5206\u5e03\u7684\u80fd\u529b\uff0c\u4ee5\u901a\u8fc7\u6982\u7387\u91c7\u6837\u6307\u5bfc\u884c\u4e3a\u51b3\u7b56\u5e76\u751f\u6210\u884c\u4e3a\u5e8f\u5217\u3002\u6211\u4eec\u5c06\u95ee\u9898\u5206\u4e3a\u4e24\u4e2a\u4e3b\u8981\u65b9\u9762\uff1a\u4e00\u662f\u5df2\u77e5\u7cbe\u786e\u6982\u7387\u5206\u5e03\u7684\u6a21\u62df\uff0c\u4e8c\u662f\u6a21\u7cca\u6982\u7387\u5206\u5e03\u7684\u5e8f\u5217\u751f\u6210\u3002 
When the probability distribution is known, the agent must state the type and parameters of the distribution from the problem description and then produce a sampled sequence. Our study shows that LLM agents perform poorly here, although programming tools improve the sampling success rate to some extent. In real-world settings the probability distribution is often unclear, so in the second part we ask the agent to adjust its activity level on online social networks and analyze the frequency of its actions. The results show that LLM agents still cannot sample probability distributions effectively, even with programming tools. Caution is therefore needed before directly using LLM agents to simulate human behavior.|\n", 
"2404.08492": "|**2024-04-12**|**Strategic Interactions between Large Language Models-based Agents in Beauty Contests**|Siting Lu et.al.|[2404.08492](http://arxiv.org/abs/2404.08492)|null|As large language models (LLMs) become widely used, their potential for understanding game behavior within a game-theoretic framework is increasingly evident. This study uses simulation to analyze the strategic interactions among different types of LLM-based agents in the classic Beauty Contest game. Following experiments with human subjects, we evaluate the strategic levels of LLM agents in a similar way and find that they show varying depths of reasoning, from level 0 to level 1, and that their actions converge in repeated games. We also examine how the composition of agent types in a group affects strategic behavior: a higher proportion of fixed-strategy opponents promotes convergence of the LLM agents, while a mixed environment in which agents of different relative strategic levels coexist accelerates convergence for all agents. More intelligent agents may earn higher average payoffs, but at the expense of less intelligent agents. These results not only reveal the outcomes for simulated agents in the specified scenario but also offer important insights into strategic interactions between algorithms.|\n", 
"2404.08144": "|**2024-04-17**|**LLM Agents can Autonomously Exploit One-day Vulnerabilities**|Richard Fang 
et.al.|[2404.08144](http://arxiv.org/abs/2404.08144)|null|As large language models (LLMs) grow more powerful, they are increasingly used for both benign and malicious purposes, and researchers have begun to examine their ability to exploit cybersecurity vulnerabilities. Recent work has explored whether LLMs can autonomously hack websites, but these studies focus mainly on simple vulnerabilities. This work shows that LLMs can autonomously exploit one-day vulnerabilities in real-world systems. We collect a dataset of 15 one-day vulnerabilities, including ones categorized as critical severity in the CVE description. When given the CVE description, GPT-4 successfully exploits 87% of these vulnerabilities, compared with 0% for every other model we test (GPT-3.5 and open-source LLMs) and for the open-source vulnerability scanners ZAP and Metasploit. However, our GPT-4 agent is far less effective without the description, exploiting only 7% of the vulnerabilities. These findings raise questions about the widespread deployment of highly capable LLMs.|\n", 
"2404.17586": "|**2024-04-11**|**The Future of Scientific Publishing: Automated Article Generation**|Jeremy R. 
Harper et.al.|[2404.17586](http://arxiv.org/abs/2404.17586)|null|This study introduces an innovative software tool that uses large language model (LLM) prompts to automatically generate academic articles from Python code, an advance of particular relevance to biomedical informatics and computer science. Python was chosen as the base example because of its widespread use and strong data-analysis capabilities. The flexibility of the method and framework makes it applicable to a variety of GitHub repositories, indicating the tool's broad potential (Harper, 2024). By streamlining the traditionally time-consuming academic writing process, particularly the integration of complex datasets and code outputs, the tool speeds the dissemination of research results. It was developed without relying on advanced language models, while ensuring that the automatically generated content is coherent and complete. The exploration validates the tool's successful application and efficiency, and it anticipates that integrating more advanced LLMs will further enhance its capabilities, ushering in an era in which scientific findings are published more rapidly and accessibly.|\n", 
"2404.07456": "|**2024-04-11**|**WESE: Weak Exploration to Strong Exploitation for LLM Agents**|Xu Huang 
et.al.|[2404.07456](http://arxiv.org/abs/2404.07456)|null|Recently, large language models (LLMs) have shown strong potential as intelligent agents. However, existing research focuses mainly on improving a model's reasoning or decision-making through carefully designed prompt engineering or task-specific fine-tuning, and overlooks the process of exploration and exploitation. These methods are limited when handling complex tasks in open-world interactive environments. First, lacking global information about the environment, the model tends to make greedy decisions that lead to suboptimal solutions. Second, irrelevant information acquired from the environment not only introduces noise but also incurs additional cost. To address this, the paper proposes a novel approach, Weak Exploration to Strong Exploitation (WESE), to improve LLM performance on open-world interactive tasks. Specifically, WESE decouples exploration from exploitation and uses a cost-effective weak agent to carry out exploration and gather global knowledge. A knowledge-graph-based strategy then stores this knowledge and extracts the information relevant to the task, improving the strong agent's success rate and efficiency. The approach applies to a variety of tasks and yields significant improvements in success rate and efficiency on four interactive benchmarks.|\n", 
"2404.06921": "|**2024-04-10**|**GoEX: Perspectives and Designs Towards a Runtime for Autonomous LLM Applications**|Shishir G. 
Patil et.al.|[2404.06921](http://arxiv.org/abs/2404.06921)|**[link](https://github.com/ShishirPatil/gorilla)**|**Large language models (LLMs) are evolving beyond their role of providing information within dialogue systems and are beginning to engage actively with real-world applications and services. Today, humans must verify the correctness and appropriateness of LLM-generated outputs (for example, code, functions, or actions) before executing them in the real world. This poses a significant challenge, because code comprehension is widely recognized as very difficult. This paper studies how humans can effectively collaborate with, delegate to, and supervise LLMs in the future. We argue that in many cases post-facto validation, which confirms the correctness of a proposed action after seeing its output, is much easier than the pre-facto validation described above. The core idea for enabling this is to integrate an intuitive undo feature and to establish damage confinement for LLM-generated actions, as effective strategies for mitigating the associated risks. In this way, a human can either revert the effect of an LLM-generated output or be confident that the potential risk is bounded. We believe this is critical to enabling LLMs to interact with applications and services under limited human supervision. We describe the design and implementation of the open-source runtime for executing LLM actions, Gorilla Execution Engine (GoEX), and present open research questions aimed at letting LLMs and applications interact with minimal human intervention. The GoEX source code is released at https://github.com/ShishirPatil/gorilla/.**|\n", 
"2404.06411": "|**2024-04-09**|**AgentQuest: A Modular Benchmark Framework to Measure Progress and Improve LLM Agents**|Luca Gioacchini et.al.|[2404.06411](http://arxiv.org/abs/2404.06411)|**[link](https://github.com/nec-research/agentquest)**|**Advances in large language models (LLMs) have driven a quest for LLM agents that can solve complex, multi-step reasoning tasks. Existing benchmarks, however, are often narrow and report only overall task success. To address these issues, we propose AgentQuest, a framework in which (i) both benchmarks and metrics are modular and easily extensible through well-documented, easy-to-use APIs, and (ii) two new evaluation metrics reliably track LLM agent progress while a task is being solved. We illustrate the usefulness of the metrics with two use cases in which we identify common failure points and refine the agent architecture to obtain a significant performance increase. We hope to extend AgentQuest further together with the research community, and we have open-sourced it at https://github.com/nec-research/agentquest.**|\n", 
"2404.05427": "|**2024-04-15**|**AutoCodeRover: Autonomous Program Improvement**|Yuntong Zhang et.al.|[2404.05427](http://arxiv.org/abs/2404.05427)|**[link](https://github.com/nus-apr/auto-code-rover)**|**Over the past few decades, researchers have made significant progress in automating the software development process, and the use of large language models (LLMs) has greatly advanced automated programming assistance. Software engineering, however, involves more than coding: it also includes program improvement for maintenance (for example, fixing bugs) and evolution (for example, adding features). This paper proposes an automated approach to resolving GitHub issues, with the aim of autonomous program improvement. The approach, called AutoCodeRover, combines LLMs with sophisticated code search capabilities and ultimately produces a program modification or patch. Unlike recent work by AI researchers and practitioners that treats a software project as a mere collection of files, our work operates on a program representation (the abstract syntax tree), using the program's class and method structure to improve the LLM's understanding of an issue's root cause and to retrieve context through iterative search. When a test suite is available, spectrum-based fault localization sharpens the context further. On SWE-bench-lite, a dataset of 300 real GitHub issues, AutoCodeRover shows improved efficacy, resolving about 22-23% of the issues. On the full SWE-bench of 2294 GitHub issues, AutoCodeRover resolves around 16% of the issues, which is higher than the efficacy recently reported for Devin, the AI software engineer from Cognition Labs, while taking time comparable to Devin. We believe our workflow can advance autonomous software engineering, in which code automatically generated by LLMs can in turn be automatically improved.**|\n", 
"2404.05291": "|**2024-04-08**|**Long-horizon Locomotion and Manipulation on a Quadrupedal Robot with Large Language Models**|Yutao Ouyang 
et.al.|[2404.05291](http://arxiv.org/abs/2404.05291)|null|We present a large language model (LLM) based system that improves the problem-solving abilities of quadrupedal robots so that they can handle long-horizon tasks beyond short-term motions. Long-horizon tasks are very challenging for quadrupeds because they require both a high-level understanding of the task's semantics and a broad range of locomotion and manipulation skills for interacting with the environment. Our system builds a high-level reasoning layer with large language models that generates hybrid discrete-continuous plans as robot code from a task description. It comprises multiple LLM agents: a semantic planner that sketches a plan, a parameter calculator that predicts the arguments in the plan, and a code generator that converts the plan into executable robot code. At the low level, we use reinforcement learning to train a set of motion planning and control skills that increase the quadruped's flexibility and enable rich interactions with the environment. We test the system on long-horizon tasks that cannot be completed with a single skill. Simulation and real-world experiments show that it successfully works out multi-step strategies and exhibits non-trivial behaviors, such as building tools or asking a human for help.|\n", 
"2404.04667": "|**2024-04-06**|**Autonomous Artificial Intelligence Agents for Clinical Decision Making in Oncology**|Dyke Ferber et.al.|[2404.04667](http://arxiv.org/abs/2404.04667)|null|Multimodal artificial intelligence (AI) systems have the potential to improve clinical decision-making by interpreting many types of medical data. However, the effectiveness of these models across medical fields remains unclear, and each field poses its own challenges. This paper introduces a new approach to multimodal medical AI that uses large language models (LLMs) as the central reasoning engine. The engine autonomously coordinates and deploys a set of specialized medical AI tools, including text interpretation, radiology and histopathology image analysis, genomic data processing, web search, and document retrieval from medical guidelines. We validate the system on a series of clinical oncology scenarios that resemble typical patient care workflows. The results show that the system is highly capable of choosing appropriate tools (97%), drawing correct conclusions (93.6%), giving complete (94%) and helpful (89.2%) treatment recommendations, and citing relevant literature when instructed (82.5%). This indicates that LLMs can effectively plan and execute domain-specific models to retrieve or synthesize new information and so act as personalized clinical assistants. The architecture also simplifies regulatory compliance, because each component tool can be validated and approved individually. We believe this work serves as a proof of concept for more advanced LLM agents in the medical domain.|\n", 
"2404.04237": "|**2024-04-05**|**Cleared for Takeoff? 
Compositional & Conditional Reasoning may be the Achilles Heel to (Flight-Booking) Language Agents**|Harsh Kohli et.al.|[2404.04237](http://arxiv.org/abs/2404.04237)|null|The rapid progress of large language models (LLMs) has seen them frequently surpass human performance on standard benchmarks, enabling many downstream applications such as LLM-based agents. However, these models also perform unexpectedly poorly on seemingly simple tasks, which underscores the need for more comprehensive and diverse evaluation setups that measure their true capabilities. To this end, we focus on compositional and conditional reasoning, a cornerstone of human cognition, and introduce GroundCocoa, a lexically diverse benchmark connected to the real-world problem of flight booking. The task is to match a user's detailed preferences against available flight options presented in multiple-choice format. The results show a significant performance gap: even the current best models, including the state-of-the-art GPT-4 Turbo, do not exceed 67% accuracy despite advanced prompting.|\n", 
"2404.16045": "|**2024-04-04**|**Elicitron: An LLM Agent-Based Simulation Framework for Design Requirements Elicitation**|Mohammadmehdi Ataei et.al.|[2404.16045](http://arxiv.org/abs/2404.16045)|null|
Requirements elicitation, a critical stage of product development, often fails to capture the full range of user needs, which can lead to products that fall short of expectations. This paper introduces a novel framework that uses large language models (LLMs) to automate and enhance this process. By generating a large number of simulated users (LLM agents), the framework explores a broader range of user needs and unforeseen use cases. The agents take part in product experience scenarios by describing their actions, observations, and challenges. Subsequent agent interviews and analysis uncover valuable user needs, including latent ones. We validate the framework with three experiments. First, we explore different methods of generating diverse agents, discuss their advantages and shortcomings, and show that context-aware agent generation leads to greater diversity of needs. Second, we show that the framework effectively mimics empathic lead-user interviews, identifying more latent needs than conventional human interviews. Third, we show how LLMs can be used to analyze interviews, extract needs, and classify them as latent or not. Our work highlights the potential of LLM agents to accelerate early-stage product development, reduce costs, and increase innovation.|\n", 
"2404.15317": "|**2024-04-03**|**Concept-Guided LLM Agents for Human-AI Safety Codesign**|Florian Geissler et.al.|[2404.15317](http://arxiv.org/abs/2404.15317)|null|As generative AI becomes increasingly important in software engineering, and in safety engineering in particular, the quality requirements placed on it also rise. Relying on large language models (LLMs) alone is not sufficient to meet these requirements. We therefore propose an efficient, hybrid strategy that leverages LLMs for safety analysis and human-AI codesign in order to ensure the safety of software systems. We develop a customized LLM agent that combines prompt engineering, heuristic reasoning, and retrieval-augmented generation to solve tasks associated with predefined safety concepts while interacting with a system model graph. The reasoning is guided by a cascade of micro-decisions that help preserve structured information. We further propose a graph verbalization that serves as an intermediate representation of the system model to facilitate interaction between the LLM and the graph. Selected prompt-response pairs for a simplified automated driving system illustrate how the method applies to safety analysis.|\n", 
"2404.02183": "|**2024-04-02**|**Self-Organized Agents: A LLM Multi-Agent Framework toward Ultra Large-Scale Code Generation and Optimization**|Yoichi Ishibashi et.al.|[2404.02183](http://arxiv.org/abs/2404.02183)|**[link](https://github.com/tsukushiai/self-organized-agent)**|**With recent advances in large language model (LLM) agents, the future of automated software development is coming into view. However, existing single-agent approaches are limited by context length when generating and optimizing large, complex codebases. To tackle this challenge, we propose Self-Organized multi-Agent framework (SoA), a novel multi-agent framework. SoA is a scalable and efficient multi-agent system that generates and modifies code components independently while collaboratively constructing the whole codebase. A key feature of SoA is that agents are added automatically according to problem complexity, which provides dynamic scalability. As a result, the overall code volume can grow indefinitely with the number of agents, while the amount of code managed by each agent stays constant. 
We evaluate SoA on the HumanEval benchmark and find that, compared with a single-agent system, each agent in SoA handles significantly less code while the total amount of generated code is substantially greater. In addition, SoA improves Pass@1 accuracy by 5% over a strong single-agent baseline.**|\n", 
"2404.01602": "|**2024-04-02**|**Helmsman of the Masses? Evaluate the Opinion Leadership of Large Language Models in the Werewolf Game**|Silin Du et.al.|[2404.01602](http://arxiv.org/abs/2404.01602)|**[link](https://github.com/doslim/evaluate-the-opinion-leadership-of-llms)**|**Large language models (LLMs) show notable strategic behavior in social deduction games, but the significance of their opinion leadership has received little attention, even though it is crucial for practical applications in multi-agent and human-AI interaction settings. An opinion leader is an individual who has a noticeable influence on the beliefs and behaviors of others within a social group. This work uses the Werewolf game as a simulation platform to assess the opinion leadership of LLMs in the role of the Sheriff. The Sheriff is tasked with summarizing arguments and recommending decision options, and therefore serves as a credible proxy for an opinion leader. We develop a framework that integrates the Sheriff role and devise two metrics based on the key characteristics of opinion leaders: the first measures the reliability of the opinion leader, and the second assesses its influence on other players' decisions. We conduct extensive experiments to evaluate LLMs of different scales, and we build a Werewolf question-answering dataset (WWQA) to assess and enhance the models' grasp of the game rules. We also include human participants for further analysis. The results suggest that the Werewolf game is a suitable test bed for evaluating the opinion leadership of LLMs, and that only a few LLMs currently have this capability.**|\n", 
"2404.00806": "|**2024-03-31**|**Algorithmic Collusion by Large Language Models**|Sara Fish et.al.|[2404.00806](http://arxiv.org/abs/2404.00806)|null|The rise of algorithmic pricing raises concerns about algorithmic collusion. We conduct experiments with pricing agents based on large language models (LLMs), specifically GPT-4. We find that (1) LLM-based agents are adept at pricing tasks, (2) in oligopoly settings, LLM-based pricing agents autonomously collude to the detriment of consumers, and (3) seemingly innocuous variations in the phrasing of LLM instructions (prompts) may increase collusion. These results extend to auction settings. Our findings underscore the need for antitrust regulation of algorithmic pricing and uncover regulatory challenges unique to LLM-based pricing agents.|\n", 
"2404.01343": "|**2024-04-15**|**CHOPS: CHat with custOmer Profile Systems for Customer Service with LLMs**|Jingzhe Shi et.al.|[2404.01343](http://arxiv.org/abs/2404.01343)|**[link](https://github.com/jingzheshi/chops)**|**Businesses and software platforms are increasingly turning to large language models (LLMs) such as GPT-3.5, GPT-4, GLM-3, and LLaMa-2 for chat assistance or reasoning in customer service. However, existing LLM-based customer service models are limited in how well they integrate with customer profiles and in their ability to carry out real operations. They also tend to emphasize diversity over the precision and error avoidance that real-world customer service requires. \u56e0\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aCHOPS\uff08\u7ed3\u5408\u5ba2\u6237\u8d44\u6599\u7684\u804a\u5929\u52a9\u624b\uff09\u7684LLM\u4ee3\u7406\uff0c\u65e8\u5728\uff1a\uff081\uff09\u9ad8\u6548\u5229\u7528\u73b0\u6709\u6570\u636e\u5e93\u6216\u7cfb\u7edf\u67e5\u8be2\u7528\u6237\u4fe1\u606f\uff0c\u6216\u9075\u5faa\u65e2\u5b9a\u6307\u5357\u4e0e\u7cfb\u7edf\u4ea4\u4e92\uff1b\uff082\uff09\u63d0\u4f9b\u51c6\u786e\u5408\u7406\u7684\u54cd\u5e94\u5e76\u6267\u884c\u7cfb\u7edf\u5185\u7684\u5fc5\u8981
\u64cd\u4f5c\uff0c\u540c\u65f6\u907f\u514d\u6709\u5bb3\u64cd\u4f5c\uff1b\uff083\uff09\u901a\u8fc7\u7ed3\u5408\u5c0f\u578b\u548c\u5927\u578bLLM\u4ee5\u5b9e\u73b0\u6027\u80fd\u6ee1\u610f\u4e14\u6210\u672c\u5408\u7406\u7684\u63a8\u7406\u3002 \u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2a\u5b9e\u7528\u7684\u6570\u636e\u96c6\uff0c\u79f0\u4e3aCPHOS-dataset\uff0c\u5b83\u5305\u62ec\u4e00\u4e2a\u6570\u636e\u5e93\u3001\u6307\u5bfc\u6587\u4ef6\u4ee5\u53ca\u6765\u81eaCPHOS\u5e73\u53f0\u7684\u6a21\u62df\u7269\u7406\u5965\u6797\u5339\u514b\u7ec4\u7ec7\u670d\u52a1\u7684\u95ee\u7b54\u5bf9\u3002CPHOS\u662f\u4e00\u4e2a\u9762\u5411\u9ad8\u4e2d\u6559\u5e08\u548c\u5b66\u751f\u7684\u5728\u7ebf\u5e73\u53f0\u3002\u6211\u4eec\u901a\u8fc7\u4f7f\u7528CPHOS-dataset\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u5b9e\u9a8c\uff0c\u9a8c\u8bc1\u4e86CHOPS\u67b6\u6784\u7684\u6027\u80fd\uff0c\u76ee\u6807\u662f\u5c55\u793aLLM\u5982\u4f55\u63d0\u5347\u6216\u66ff\u4ee3\u4eba\u5de5\u5ba2\u6237\u670d\u52a1\u3002\u5173\u4e8e\u6211\u4eec\u7684\u63d0\u6848\u67b6\u6784\u548c\u6570\u636e\u96c6\u7684\u4ee3\u7801\u53ef\u5728\u6b64\u5904\u83b7\u53d6\uff1a\u3002**|\n", "2404.01342": "|**2024-03-31**|**DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model**|Lirui Zhao 
et.al.|[2404.01342](http://arxiv.org/abs/2404.01342)|**[link](https://github.com/opengvlab/diffagent)**|**\u6587\u672c\u5230\u56fe\u50cf\uff08T2I\uff09\u751f\u6210\u6a21\u578b\u8fd1\u5e74\u6765\u5907\u53d7\u77a9\u76ee\uff0c\u5728\u5b66\u672f\u7814\u7a76\u548c\u5b9e\u9645\u5e94\u7528\u4e2d\u5927\u653e\u5f02\u5f69\u3002\u4f8b\u5982\uff0cCivitai\u5e73\u53f0\uff0c\u4e00\u4e2aT2I\u521b\u65b0\u7684\u805a\u96c6\u5730\uff0c\u76ee\u524d\u6c47\u96c6\u4e8674,492\u79cd\u72ec\u7279\u7684\u6a21\u578b\uff0c\u8fd9\u5e26\u6765\u4e86\u9009\u62e9\u6700\u5408\u9002\u7684\u6a21\u578b\u548c\u53c2\u6570\u7684\u8270\u5de8\u4efb\u52a1\uff0c\u901a\u5e38\u9700\u8981\u591a\u6b21\u8bd5\u9a8c\u3002\u501f\u9274\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5de5\u5177\u4f7f\u7528\u7814\u7a76\u7684\u601d\u8def\uff0c\u6211\u4eec\u63a8\u51fa\u4e86DiffAgent\uff0c\u8fd9\u662f\u4e00\u4e2a\u901a\u8fc7API\u8c03\u7528\u6765\u5feb\u901f\u7b5b\u9009\u51c6\u786e\u9009\u9879\u7684LLM\u4ee3\u7406\u3002DiffAgent\u91c7\u7528\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u4e24\u9636\u6bb5\u8bad\u7ec3\u6846\u67b6\uff0c\u79f0\u4e3aSFTA\uff0c\u4f7f\u5176\u80fd\u591f\u6839\u636e\u4eba\u7c7b\u504f\u597d\u7cbe\u786e\u5730\u5c06T2I API\u7684\u54cd\u5e94\u4e0e\u7528\u6237\u8f93\u5165\u5bf9\u9f50\u3002\u4e3a\u4e86\u8bad\u7ec3\u548c\u8bc4\u4f30DiffAgent\u7684\u80fd\u529b\uff0c\u6211\u4eec\u6784\u5efa\u4e86DABench\uff0c\u8fd9\u662f\u4e00\u4e2a\u5168\u9762\u7684\u6570\u636e\u5e93\uff0c\u6db5\u76d6\u4e86\u793e\u533a\u4e2d\u7684\u5404\u79cdT2I API\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cDiffAgent\u4e0d\u4ec5\u5728\u9009\u62e9\u9002\u5f53\u7684T2I API\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u8fd8\u9a8c\u8bc1\u4e86SFTA\u8bad\u7ec3\u6846\u67b6\u7684\u6709\u6548\u6027\u3002\u76f8\u5173\u4ee3\u7801\u5df2\u53ef\u5728https://github.com/OpenGVLab/DiffAgent\u83b7\u53d6\u3002**|\n", "2404.00573": "|**2024-03-31**|**\"My agent understands me better\": Integrating Dynamic Human-like Memory Recall and Consolidation in LLM-Based 
Agents**|Yuki Hou et.al.|[2404.00573](http://arxiv.org/abs/2404.00573)|**[link](https://github.com/tamoharu/Agent-Memory-CHI24)**|\u5728\u8fd9\u4e2a\u7814\u7a76\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u4eba\u7c7b\u8bb0\u5fc6\u67b6\u6784\uff0c\u65e8\u5728\u63d0\u5347\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u5bf9\u8bdd\u4ee3\u7406\u7684\u8ba4\u77e5\u80fd\u529b\u3002\u6211\u4eec\u7684\u8bbe\u8ba1\u4f7f\u5f97\u8fd9\u4e9b\u4ee3\u7406\u80fd\u81ea\u4e3b\u68c0\u7d22\u751f\u6210\u54cd\u5e94\u6240\u9700\u7684\u5fc5\u8981\u8bb0\u5fc6\uff0c\u4ece\u800c\u89e3\u51b3LLMs\u5728\u65f6\u95f4\u8ba4\u77e5\u4e0a\u7684\u5c40\u9650\u3002\u6211\u4eec\u501f\u9274\u4e86\u4eba\u7c7b\u7684\u8bb0\u5fc6\u7ebf\u7d22\u53ec\u56de\u673a\u5236\u4f5c\u4e3a\u89e6\u53d1\u70b9\uff0c\u4ee5\u5b9e\u73b0\u7cbe\u786e\u4e14\u9ad8\u6548\u7684\u56de\u5fc6\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2a\u6570\u5b66\u6a21\u578b\uff0c\u52a8\u6001\u91cf\u5316\u8bb0\u5fc6\u5de9\u56fa\u8fc7\u7a0b\uff0c\u8003\u8651\u4e86\u8bf8\u5982\u4e0a\u4e0b\u6587\u76f8\u5173\u6027\u3001\u65f6\u95f4\u6d41\u901d\u548c\u56de\u5fc6\u9891\u7387\u7b49\u56e0\u7d20\u3002\u4ee3\u7406\u4f1a\u4ece\u7528\u6237\u7684\u4ea4\u4e92\u5386\u53f2\u4e2d\u5b58\u50a8\u8bb0\u5fc6\uff0c\u8fd9\u4e9b\u8bb0\u5fc6\u88ab\u5c01\u88c5\u5728\u6570\u636e\u5e93\u4e2d\uff0c\u6bcf\u4e2a\u8bb0\u5fc6\u90fd\u5305\u542b\u4e86\u5185\u5bb9\u548c\u65f6\u95f4\u5173\u8054\u7684\u8bed\u5883\u3002\u8fd9\u6837\uff0c\u901a\u8fc7\u7c7b\u4f3c\u4eba\u7c7b\u8bc6\u522b\u548c\u56de\u5fc6\u8fc7\u5f80\u7ecf\u5386\u7684\u65b9\u5f0f\uff0c\u7cfb\u7edf\u80fd\u591f\u6218\u7565\u6027\u5730\u5b58\u50a8\u8bb0\u5fc6\uff0c\u5e76\u7406\u89e3\u5b83\u4eec\u5bf9\u7528\u6237\u5728\u65f6\u95f4\u7ebf\u4e0a\u7684\u91cd\u8981\u6027\u3002|\n", "2405.12147": "|**2024-05-20**|**Eliciting Problem Specifications via Large Language Models**|Robert E. 
Wray et.al.|[2405.12147](http://arxiv.org/abs/2405.12147)|null|\u8fd9\u7bc7\u8bba\u6587\u63a2\u8ba8\u4e86\u5982\u4f55\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u8ba4\u77e5\u7cfb\u7edf\u4e2d\u5b9e\u73b0\u95ee\u9898\u5b9a\u4e49\u7684\u8f6c\u5316\u3002\u901a\u5e38\u60c5\u51b5\u4e0b\uff0c\u4eba\u7c7b\u9700\u8981\u5c06\u95ee\u9898\u63cf\u8ff0\u8f6c\u5316\u4e3a\u8ba4\u77e5\u7cfb\u7edf\u80fd\u7406\u89e3\u7684\u5f62\u5f0f\u3002\u7814\u7a76\u8005\u5c55\u793a\u4e86LLMs\u80fd\u591f\u5904\u7406\u81ea\u7136\u8bed\u8a00\u4e2d\u5b9a\u4e49\u7684\u95ee\u9898\u7c7b\u522b\uff0c\u5e76\u5c06\u5176\u8f6c\u6362\u4e3a\u534a\u5f62\u5f0f\u5316\u89c4\u683c\uff0c\u8fd9\u6837\u73b0\u6709\u63a8\u7406\u548c\u5b66\u4e60\u7cfb\u7edf\u53ef\u4ee5\u89e3\u51b3\u8fd9\u7c7b\u95ee\u9898\u7684\u5177\u4f53\u5b9e\u4f8b\u3002\u4ed6\u4eec\u8bbe\u8ba1\u4e86\u4e00\u79cd\u7531LLM\u9a71\u52a8\u7684\u8ba4\u77e5\u4efb\u52a1\u5206\u6790\u5e08\u4ee3\u7406\uff0c\u8fd9\u79cd\u7cfb\u7edf\u80fd\u591f\u6839\u636e\u81ea\u7136\u8bed\u8a00\u63cf\u8ff0\u7684\u4efb\u52a1\u751f\u6210\u95ee\u9898\u7a7a\u95f4\u7684\u5b9a\u4e49\u3002LLM\u63d0\u793a\u6e90\u81ea\u4eba\u5de5\u667a\u80fd\u6587\u732e\u4e2d\u7684\u95ee\u9898\u7a7a\u95f4\u6982\u5ff5\u548c\u901a\u7528\u95ee\u9898\u89e3\u51b3\u7b56\u7565\uff08\u5982\u6ce2\u5229\u4e9a\u7684\u300a\u5982\u4f55\u89e3\u51b3\u95ee\u9898\u300b\uff09\u3002\u968f\u540e\uff0c\u8ba4\u77e5\u7cfb\u7edf\u5229\u7528\u8fd9\u4e9b\u95ee\u9898\u7a7a\u95f4\u89c4\u683c\uff0c\u7ed3\u5408\u9886\u57df\u901a\u7528\u7684\u89e3\u51b3\u95ee\u9898\u7b56\u7565\uff08\u5982\u641c\u7d22\uff09\uff0c\u6765\u89e3\u51b3\u8be5\u7c7b\u95ee\u9898\u7684\u4e0d\u540c\u5b9e\u4f8b\u3002\u8fd9\u4e00\u521d\u6b65\u7ed3\u679c\u8868\u660e\uff0c\u901a\u8fc7\u6d88\u9664\u95ee\u9898\u8868\u8ff0\u7684\u4e2d\u4ecb\u8fc7\u7a0b\uff0cLLMs\u6709\u53ef\u80fd\u52a0\u901f\u8ba4\u77e5\u7cfb\u7edf\u7684\u7814\u7a76\uff0c\u540c\u65f6\u4fdd\u6301\u5176\u6838\u5fc3\u80fd\u529b\uff0c\u5982\u7a33\u5065\u7684\u63a8\u7406\u548c\u572
8\u7ebf\u5b66\u4e60\u3002|\n", "2405.11403": "|**2024-05-18**|**MapCoder: Multi-Agent Code Generation for Competitive Problem Solving**|Md. Ashraful Islam et.al.|[2405.11403](http://arxiv.org/abs/2405.11403)|**[link](https://github.com/md-ashraful-pramanik/mapcoder)**|**\u672c\u6587\u63a2\u8ba8\u4e86\u4ee3\u7801\u5408\u6210\u8fd9\u4e00\u590d\u6742\u4efb\u52a1\uff0c\u5b83\u9700\u8981\u6df1\u5ea6\u7406\u89e3\u590d\u6742\u7684\u81ea\u7136\u8bed\u8a00\u95ee\u9898\u63cf\u8ff0\u3001\u751f\u6210\u590d\u6742\u7684\u7b97\u6cd5\u548c\u6570\u636e\u7ed3\u6784\u4ee3\u7801\uff0c\u5e76\u6267\u884c\u5168\u9762\u7684\u5355\u5143\u6d4b\u8bd5\u3002\u5c3d\u7ba1\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5728\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u4e2d\u7684\u8868\u73b0\u4ecd\u6709\u5f85\u63d0\u5347\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u5373\u591a\u4ee3\u7406\u63d0\u793a\u6846\u67b6MapCoder\uff0c\u5b83\u6a21\u4eff\u4eba\u7c7b\u5f00\u53d1\u8005\u7f16\u7a0b\u5408\u6210\u7684\u5b8c\u6574\u8fc7\u7a0b\uff0c\u5206\u4e3a\u56db\u4e2a\u4e13\u95e8\u8bbe\u8ba1\u7684LLM\uff08\u5927\u8bed\u8a00\u6a21\u578b\uff09\u4ee3\u7406\uff1a\u56de\u5fc6\u76f8\u5173\u793a\u4f8b\u3001\u89c4\u5212\u3001\u4ee3\u7801\u751f\u6210\u548c\u8c03\u8bd5\u3002 
\u901a\u8fc7\u5728\u516b\u4e2a\u5177\u6709\u6311\u6218\u6027\u7684\u7ade\u8d5b\u7ea7\u95ee\u9898\u89e3\u51b3\u548c\u7a0b\u5e8f\u5408\u6210\u57fa\u51c6\u4e0a\u8fdb\u884c\u8be6\u5c3d\u5b9e\u9a8c\uff0c\u5305\u62ecHumanEval\uff0893.9%\uff09\u3001MBPP\uff0883.1%\uff09\u3001APPS\uff0822.0%\uff09\u3001CodeContests\uff0828.5%\uff09\u548cxCodeEval\uff0845.3%\uff09\u7b49\uff0cMapCoder\u5c55\u73b0\u4e86\u51fa\u8272\u7684\u4ee3\u7801\u751f\u6210\u80fd\u529b\uff0c\u5b9e\u73b0\u4e86\u591a\u9879\u65b0\u7684\u6700\u5148\u8fdb\u7684\u7ed3\u679c\u3002\u800c\u4e14\uff0c\u65e0\u8bba\u7f16\u7a0b\u8bed\u8a00\u8fd8\u662f\u95ee\u9898\u96be\u5ea6\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u90fd\u8868\u73b0\u51fa\u6301\u7eed\u7684\u4f18\u8d8a\u6027\u80fd\u3002\u6211\u4eec\u5f00\u6e90\u4e86\u8be5\u6846\u67b6\uff0c\u4f9b\u7814\u7a76\u8005\u53c2\u8003\uff1ahttps://github.com/Md-Ashraful-Pramanik/MapCoder\u3002**|\n", "2405.14751": "|**2024-05-23**|**AGILE: A Novel Framework of LLM Agents**|Peiyuan Feng et.al.|[2405.14751](http://arxiv.org/abs/2405.14751)|**[link](https://github.com/bytarnish/agile)**|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6846\u67b6\uff0c\u79f0\u4e3aLLM\uff08\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff09\u4ee3\u7406AGILE\uff08\u80fd\u591f\u4e0e\u7528\u6237\u4e92\u52a8\u5e76\u4ece\u73af\u5883\u4e2d\u5b66\u4e60\u7684\u4ee3\u7406\uff09\uff0c\u65e8\u5728\u6267\u884c\u590d\u6742\u7684\u5bf9\u8bdd\u4efb\u52a1\uff0c\u5229\u7528LLMs\u3001\u8bb0\u5fc6\u3001\u5de5\u5177\u548c\u4e13\u5bb6\u4ea4\u4e92\u3002\u8fd9\u79cd\u4ee3\u7406\u4e0d\u4ec5\u5177\u5907\u5bf9\u8bdd\u80fd\u529b\uff0c\u8fd8\u5177\u5907\u53cd\u601d\u3001\u5de5\u5177\u8fd0\u7528\u4ee5\u53ca\u54a8\u8be2\u4e13\u5bb6\u7684\u529f\u80fd\u3002\u6211\u4eec\u5c06\u6784\u5efa\u6b64\u7c7bLLM\u4ee3\u7406\u89c6\u4e3a\u5f3a\u5316\u5b66\u4e60\u95ee\u9898\uff0c\u5176\u4e2dLLM\u4f5c\u4e3a\u7b56\u7565\u6a21\u578b\u3002\u6211\u4eec\u4f7f\u7528\u6807\u6ce8\u7684\u884c\u4e3a\u6570\u636e\u548cPPO\u7b97\u6cd5\u5bf9LLM\u8fdb\u884c\u5fae
\u8c03\u3002\u7279\u522b\u5173\u6ce8\u7684\u662f\u95ee\u7b54\u4efb\u52a1\uff0c\u4e3a\u6b64\u6211\u4eec\u53d1\u5e03\u4e86\u4e00\u4e2a\u540d\u4e3aProductQA\u7684\u6570\u636e\u96c6\uff0c\u5305\u542b\u5728\u7ebf\u8d2d\u7269\u4e2d\u7684\u96be\u9898\u3002\u6211\u4eec\u5728ProductQA\u548cMedMCQA\u4e0a\u7684\u5927\u91cf\u5b9e\u9a8c\u8868\u660e\uff0c\u57fa\u4e8e130\u4ebf\u548c70\u4ebf\u53c2\u6570\u7684LLM\u8bad\u7ec3\u7684AGILE\u4ee3\u7406\u80fd\u591f\u8d85\u8d8aGPT-4\u4ee3\u7406\u7684\u8868\u73b0\u3002\u6211\u4eec\u7684 ablation\u7814\u7a76\u5f3a\u8c03\u4e86\u8bb0\u5fc6\u3001\u5de5\u5177\u3001\u54a8\u8be2\u3001\u53cd\u601d\u548c\u5f3a\u5316\u5b66\u4e60\u5728\u5b9e\u73b0\u4f18\u79c0\u6027\u80fd\u65b9\u9762\u7684\u91cd\u8981\u6027\u3002|\n", "2405.14744": "|**2024-05-23**|**Exploring Prosocial Irrationality for LLM Agents: A Social Cognition View**|Xuan Liu et.al.|[2405.14744](http://arxiv.org/abs/2405.14744)|null|\u7531\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u8bad\u7ec3\u6570\u636e\u4e2d\u53cd\u6620\u4e86\u4eba\u7c7b\u504f\u89c1\uff0c\u5b83\u4eec\u53ef\u80fd\u4f1a\u51fa\u73b0\u5e7b\u89c9\u95ee\u9898\u3002\u8fd9\u79cd\u60c5\u51b5\u4e0b\uff0c\u4e00\u4e2a\u5173\u952e\u95ee\u9898\u662f\uff1aLLMs\u662f\u5426\u80fd\u591f\u5229\u7528\u5e7b\u89c9\u6765\u6a21\u4eff\u4eba\u7c7b\u7684\u8ba4\u77e5\u504f\u89c1\uff0c\u4ece\u800c\u5c55\u73b0\u51fa\u975e\u7406\u6027\u4f46\u793e\u4f1a\u6027\u7684\u4e00\u9762\uff1f\u672c\u6587\u63a2\u8ba8\u4e86\u8fd9\u4e00\u95ee\u9898\uff0c\u901a\u8fc7\u7ed3\u5408\u5b9e\u7528\u7684\u793e\u4f1a\u79d1\u5b66\u5b9e\u9a8c\u548c\u7406\u8bba\u6d1e\u5bdf\uff0c\u63d0\u51faCogMir\uff0c\u4e00\u4e2a\u5f00\u653e\u5f0f\u591aLLM\u6846\u67b6\uff0c\u65e8\u5728\u5229\u7528LLMs\u7684\u5e7b\u89c9\u7279\u6027\u6765\u8bc4\u4f30\u548c\u63d0\u5347\u5176\u793e\u4f1a\u667a\u80fd\uff0c\u7279\u522b\u662f\u5728\u8ba4\u77e5\u504f\u5dee\u65b9\u9762\u3002\u6211\u4eec\u5728CogMir\u5b50\u96c6\u4e0a\u7684\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u5728\u4e0d\u786e\u5b9
a\u60c5\u5883\u4e0b\uff0cLLMs\u548c\u4eba\u7c7b\u5728\u975e\u7406\u6027\u53ca\u4eb2\u793e\u4f1a\u51b3\u7b56\u4e0a\u8868\u73b0\u51fa\u9ad8\u5ea6\u4e00\u81f4\u6027\uff0c\u8fd9\u8868\u660eLLMs\u4f5c\u4e3a\u793e\u4f1a\u5b9e\u4f53\u7684\u4eb2\u793e\u4f1a\u6027\uff0c\u5e76\u7a81\u663e\u4e86\u5e7b\u89c9\u7279\u6027\u7684\u5173\u952e\u4f5c\u7528\u3002\u6b64\u5916\uff0cCogMir\u6846\u67b6\u5c55\u793a\u4e86\u5176\u4f5c\u4e3a\u7814\u7a76LLMs\u793e\u4f1a\u667a\u80fd\u7684\u6709\u4ef7\u503c\u5e73\u53f0\u7684\u6f5c\u529b\u3002|\n", "2405.13547": "|**2024-05-22**|**HighwayLLM: Decision-Making and Navigation in Highway Driving with RL-Informed Language Model**|Mustafa Yildirim et.al.|[2405.13547](http://arxiv.org/abs/2405.13547)|null|\u81ea\u52a8\u9a7e\u9a76\u662f\u4e00\u4e2a\u590d\u6742\u7684\u4efb\u52a1\uff0c\u5b83\u9700\u8981\u5148\u8fdb\u7684\u51b3\u7b56\u548c\u63a7\u5236\u7b97\u6cd5\u3002\u7406\u89e3\u81ea\u52a8\u9a7e\u9a76\u8f66\u8f86\u51b3\u7b56\u7684\u4f9d\u636e\u5bf9\u4e8e\u786e\u4fdd\u5176\u5728\u9ad8\u901f\u516c\u8def\u9a7e\u9a76\u4e2d\u7684\u5b89\u5168\u4e0e\u6709\u6548\u6027\u81f3\u5173\u91cd\u8981\u3002\u672c\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u79f0\u4e3aHighwayLLM\uff0c\u5b83\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u63a8\u7406\u80fd\u529b\u6765\u9884\u6d4bego\u8f66\u8f86\u7684\u672a\u6765\u5bfc\u822a\u8def\u5f84\u70b9\u3002\u8be5\u65b9\u6cd5\u8fd8\u91c7\u7528\u9884\u8bad\u7ec3\u7684\u5f3a\u5316\u5b66\u4e60\uff08RL\uff09\u6a21\u578b\u4f5c\u4e3a\u9ad8\u5c42\u6b21\u89c4\u5212\u5668\uff0c\u5bf9\u5408\u9002\u7684\u5143\u7ea7\u52a8\u4f5c\u8fdb\u884c\u51b3\u7b56\u3002HighwayLLM\u5c06RL\u6a21\u578b\u7684\u8f93\u51fa\u4e0e\u5f53\u524d\u72b6\u6001\u4fe1\u606f\u76f8\u7ed3\u5408\uff0c\u751f\u6210\u5b89\u5168\u3001\u65e0\u78b0\u649e\u4e14\u53ef\u89e3\u91ca\u7684\u672a\u6765\u72b6\u6001\u9884\u6d4b\uff0c\u4ece\u800c\u6784\u5efa\u51fa\u8f66\u8f86\u7684\u884c\u9a76\u8f68\u8ff9\u3002\u968f\u540e\u
ff0c\u57fa\u4e8ePID\u7684\u63a7\u5236\u5668\u5f15\u5bfc\u8f66\u8f86\u9075\u5faaLLM\u4ee3\u7406\u9884\u6d4b\u7684\u8def\u5f84\u70b9\u3002\u8fd9\u79cdLLM\u4e0eRL\u548cPID\u7684\u878d\u5408\u63d0\u5347\u4e86\u51b3\u7b56\u8fc7\u7a0b\uff0c\u5e76\u4e3a\u9ad8\u901f\u516c\u8def\u81ea\u52a8\u9a7e\u9a76\u63d0\u4f9b\u4e86\u53ef\u89e3\u91ca\u6027\u3002|\n", "2405.13050": "|**2024-05-19**|**Human-Centered LLM-Agent User Interface: A Position Paper**|Daniel Chin et.al.|[2405.13050](http://arxiv.org/abs/2405.13050)|**[link](https://github.com/daniel-chin/flute-x-gpt)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09-\u5728-\u73af\u5e94\u7528\u5df2\u663e\u793a\u51fa\u6709\u6548\u7406\u89e3\u7528\u6237\u547d\u4ee4\u3001\u5236\u5b9a\u8ba1\u5212\u5e76\u76f8\u5e94\u5730\u64cd\u4f5c\u5916\u90e8\u5de5\u5177/\u7cfb\u7edf\u7684\u6f5c\u529b\u3002\u7136\u800c\uff0cLLM\u4ee3\u7406\u7684\u64cd\u4f5c\u8303\u56f4\u5c40\u9650\u4e8e\u88ab\u52a8\u54cd\u5e94\u7528\u6237\uff0c\u9700\u8981\u7528\u6237\u6839\u636e\u5e95\u5c42\u5de5\u5177/\u7cfb\u7edf\u6765\u8868\u8ff0\u9700\u6c42\u3002\u6211\u4eec\u6ce8\u610f\u5230LLM\u4ee3\u7406\u7528\u6237\u754c\u9762\uff08LAUI\uff09\u7684\u6f5c\u529b\u8fdc\u672a\u5145\u5206\u5229\u7528\u3002\u7406\u60f3\u7684LAUI\u8bbe\u60f3\u4e2d\uff0c\u7528\u6237\u65e0\u9700\u6df1\u5165\u4e86\u89e3\u5de5\u5177/\u7cfb\u7edf\uff0c\u5c31\u80fd\u4e0e\u4e4b\u4ea4\u4e92\u4ee5\u63a2\u7d22\u65b0\u5174\u7684\u5de5\u4f5c\u6d41\u7a0b\u3002\u4e0d\u540c\u4e8e\u8bbe\u8ba1\u56fa\u5b9a\u7684\u53ef\u63a2\u7d22GUI\u6765\u6559\u6388\u7528\u6237\u4f7f\u7528\u7cfb\u7edf\u7684\u9884\u8bbe\u65b9\u5f0f\uff0cLAUI\u4e2d\u7684LLM\u4ee3\u7406\u4ece\u4e00\u5f00\u59cb\u5c31\u5bf9\u7cfb\u7edf\u719f\u7ec3\uff0c\u4e3b\u52a8\u5b66\u4e60\u7528\u6237\u53ca\u5176\u9700\u6c42\uff0c\u5e76\u5411\u7528\u6237\u63d0\u51fa\u65b0\u7684\u4e92\u52a8\u65b9\u6848\u3002\u4e3a\u4e86\u5c55\u793aLAUI\u7684\u6982\u5ff5\uff0c\u6211\u4eec\u63d0\u4f9b\u4e86\u4e00\u4e2a\u5177\u4f53\u4f8b\u5b50\uff1aFlute X 
GPT\uff0c\u5b83\u7ed3\u5408\u4e86LLM\u4ee3\u7406\u3001\u63d0\u793a\u7ba1\u7406\u5668\u548c\u4e00\u4e2a\u652f\u6301\u590d\u6742\u5b9e\u65f6\u4f53\u9a8c\u7684\u7b1b\u5b50\u6559\u5b66\u591a\u5a92\u4f53\u8f6f\u786c\u4ef6\u7cfb\u7edf\uff0c\u65e8\u5728\u7b80\u5316\u5b66\u4e60\u5439\u594f\u7b1b\u5b50\u7684\u8fc7\u7a0b\u3002|\n", "2405.13009": "|**2024-05-13**|**METAREFLECTION: Learning Instructions for Language Agents using Past Reflections**|Priyanshu Gupta et.al.|[2405.13009](http://arxiv.org/abs/2405.13009)|null|\u5c3d\u7ba1\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5e7f\u53d7\u6b22\u8fce\uff0c\u4f46\u4e3a\u5176\u6267\u884c\u7279\u5b9a\u4efb\u52a1\u8bbe\u8ba1\u7cbe\u786e\u7684\u63d0\u793a\u4ecd\u662f\u4e00\u4e2a\u6311\u6218\u3002\u7528\u6237\u901a\u5e38\u9700\u8981\u4e0e\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u8fdb\u884c\u591a\u8f6e\u5bf9\u8bdd\u4ee5\u8fbe\u6210\u76ee\u6807\u3002\u8fd1\u671f\u7814\u7a76\u663e\u793a\uff0c\u6a21\u578b\u81ea\u8eab\u7684\u53cd\u9988\uff0c\u5373\u81ea\u53cd\u601d\uff0c\u80fd\u5728\u5bf9\u8bdd\u8fc7\u7a0b\u4e2d\u8d77\u5230\u5f3a\u5316\u4f5c\u7528\uff0c\u6709\u52a9\u4e8e\u66f4\u5feb\u5730\u8fbe\u5230\u671f\u671b\u7ed3\u679c\u3002\u9274\u4e8e\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\u2014\u2014METAREFLECTION\uff0c\u5b83\u80fd\u4ece\u8bad\u7ec3\u9636\u6bb5\u6536\u96c6\u5230\u7684\u4e2a\u4f53\u81ea\u53cd\u601d\u4e2d\u5b66\u4e60\u7279\u5b9a\u9886\u57df\u7684\u901a\u7528\u63d0\u793a\u6307\u4ee4\u3002\u6211\u4eec\u5728\u57fa\u7840\u8bbe\u65bd\u5373\u4ee3\u7801\uff08IAC\uff09\u6f0f\u6d1e\u68c0\u6d4b\u548c\u95ee\u9898\u89e3\u7b54\uff08QA\uff09\u9886\u57df\uff0c\u4f7f\u7528REACT\u548cCOT\u8fdb\u884c\u4e86\u5b9e\u9a8c\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cMETAREFLECTION\u663e\u8457\u4f18\u4e8eGPT-4\uff0c\u5206\u522b\u5728IAC\u3001COT\u548cREACT\u4e2d\u7684\u6027\u80fd\u63d0\u5347\u5206\u522b\u4e3a16.82%\u300131.33%\u548c15.42%\uff0c\u8fd9\u8868\u660eMETAREFLECTION\u6709\u6f5c\u529b\u63d0\u5347LLMs
\u7684\u6548\u7387\uff0c\u662f\u4e00\u79cd\u503c\u5f97\u63a2\u7d22\u7684\u7b56\u7565\u3002|\n", "2405.15414": "|**2024-05-24**|**Luban: Building Open-Ended Creative Agents via Autonomous Embodied Verification**|Yuxuan Guo et.al.|[2405.15414](http://arxiv.org/abs/2405.15414)|null|\u5728\u4eba\u5de5\u667a\u80fd\u7814\u7a76\u4e2d\uff0c\u6784\u5efa\u5f00\u653e\u578b\u4ee3\u7406\u4e00\u76f4\u4ee5\u6765\u90fd\u662f\u7ec8\u6781\u76ee\u6807\uff0c\u7279\u522b\u662f\u521b\u9020\u6027\u7684\u4ee3\u7406\u66f4\u5177\u5438\u5f15\u529b\u3002\u73b0\u6709\u7684\u5927\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u6267\u884c\u6709\u660e\u786e\u76ee\u6807\u7684\u957f\u5e8f\u5217\u4efb\u52a1\uff08\u5982\u300a\u6211\u7684\u4e16\u754c\u300b\u4e2d\u7684\u201c\u5f00\u91c7\u94bb\u77f3\u201d\uff09\u4e0a\u8868\u73b0\u51fa\u8272\u3002\u7136\u800c\uff0c\u5b83\u4eec\u5728\u5904\u7406\u5177\u6709\u5f00\u653e\u76ee\u6807\u548c\u62bd\u8c61\u6807\u51c6\u7684\u521b\u9020\u6027\u4efb\u52a1\u65f6\u9047\u5230\u56f0\u96be\uff0c\u56e0\u4e3a\u5b83\u4eec\u65e0\u6cd5\u5f25\u5408\u8fd9\u4e9b\u4efb\u52a1\u4e4b\u95f4\u7684\u9e3f\u6c9f\uff0c\u4ece\u800c\u7f3a\u4e4f\u81ea\u6211\u6539\u8fdb\u6765\u89e3\u51b3\u95ee\u9898\u7684\u53cd\u9988\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u7684\u5de5\u4f5c\u5f15\u5165\u4e86\u81ea\u4e3b\u5b9e\u4f53\u9a8c\u8bc1\u6280\u672f\uff0c\u4ee5\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u4e3a\u521b\u9020\u6027\u4efb\u52a1\u5960\u5b9a\u4e86\u57fa\u7840\u3002\u7279\u522b\u5730\uff0c\u6211\u4eec\u63d0\u51fa\u4e86Luban\u4ee3\u7406\uff0c\u4e13\u6ce8\u4e8e\u300a\u6211\u7684\u4e16\u754c\u300b\u4e2d\u7684\u521b\u9020\u6027\u5efa\u7b51\u4efb\u52a1\uff0c\u5b83\u914d\u5907\u4e86\u4e24\u7ea7\u81ea\u4e3b\u5b9e\u4f53\u9a8c\u8bc1\uff0c\u7075\u611f\u6765\u6e90\u4e8e\u4eba\u7c7b\u8bbe\u8ba1\u5b9e\u8df5\uff1a\uff081\uff09\u89c6\u89c9\u9a8c\u8bc13D\u7ed3\u6784\u63a8\u6d4b\uff0c\u901a\u8fc7\u4ee3\u7406\u81ea\u52a8\u751f\u6210\u7684CAD\u5efa\u6a21\u7a0b\u5e8f\u5b9e\u73b0\uff1b\uff082\uff09\u5b9e\u7528\u9a8c\u8bc1\
uff0c\u6839\u636e\u62bd\u8c61\u6807\u51c6\u751f\u6210\u5e76\u9a8c\u8bc1\u4e0e\u73af\u5883\u76f8\u5173\u7684\u529f\u80fd\u7a0b\u5e8f\u3002\u5e7f\u6cdb\u7684\u591a\u7ef4\u5ea6\u4eba\u7c7b\u7814\u7a76\u548cElo\u8bc4\u7ea7\u663e\u793a\uff0cLuban\u80fd\u591f\u5728\u6211\u4eec\u63d0\u51fa\u7684\u57fa\u51c6\u4e2d\u5b8c\u6210\u591a\u6837\u5316\u7684\u521b\u9020\u6027\u5efa\u7b51\u4efb\u52a1\uff0c\u5e76\u5728\u53ef\u89c6\u5316\u548c\u5b9e\u7528\u6027\u65b9\u9762\u5206\u522b\u6bd4\u5176\u4ed6\u57fa\u7ebf\u63d0\u9ad8\u4e8633%\u5230100%\u3002\u6b64\u5916\uff0c\u5b9e\u73b0\u5728\u771f\u5b9e\u4e16\u754c\u673a\u5668\u4eba\u624b\u81c2\u4e0a\u7684\u6f14\u793a\u5c55\u793a\u4e86Luban\u5728\u7269\u7406\u4e16\u754c\u4e2d\u7684\u521b\u4f5c\u6f5c\u529b\u3002|\n", "2405.15145": "|**2024-05-24**|**CulturePark: Boosting Cross-cultural Understanding in Large Language Models**|Cheng Li et.al.|[2405.15145](http://arxiv.org/abs/2405.15145)|null|\u7531\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u666e\u904d\u5b58\u5728\u6587\u5316\u504f\u89c1\uff0c\u4e3b\u8981\u6e90\u4e8e\u7f3a\u4e4f\u4ee3\u8868\u4e0d\u540c\u6587\u5316\u7684\u4ee3\u8868\u6027\u6570\u636e\u3002\u4f20\u7edf\u7684\u6587\u5316\u6570\u636e\u96c6\u548c\u57fa\u51c6\u901a\u5e38\u901a\u8fc7\u4ece\u73b0\u6709\u6570\u636e\u96c6\u4e2d\u63d0\u53d6\u6216\u805a\u5408\u6765\u81ea\u7ef4\u57fa\u767e\u79d1\u548c\u793e\u4ea4\u5a92\u4f53\u7684\u4fe1\u606f\u6784\u5efa\uff0c\u4f46\u8fd9\u79cd\u65b9\u6cd5\u4f9d\u8d56\u4e8e\u73b0\u5b9e\u4e16\u754c\u7684\u6570\u636e\u548c\u4eba\u5de5\u6807\u6ce8\uff0c\u6210\u672c\u9ad8\u4e14\u96be\u4ee5\u6269\u5c55\u3002\u672c\u6587\u501f\u9274\u8ba4\u77e5\u793e\u4f1a\u4ea4\u6d41\u7406\u8bba\uff0c\u63d0\u51faCulturePark\uff0c\u4e00\u4e2a\u5229\u7528LLMs\u7684\u591a\u4ee3\u7406\u6c9f\u901a\u6846\u67b6\uff0c\u7528\u4e8e\u6587\u5316\u6570\u636e\u6536\u96c6\u3002CulturePark\u901a\u8fc7\u6a21\u62df\u4e0d\u540c\u6587\u5316\u80cc\u666f\u4e0b\u7684\u4eba\u7c7b\u4ea4\u6d41\uff0c\u8ba9\u57fa\u4e8eLLM\u7684\u4ee3\u7406
\u89d2\u8272\u626e\u6f14\uff0c\u751f\u6210\u5305\u542b\u4eba\u7c7b\u4fe1\u5ff5\u3001\u89c4\u8303\u548c\u4e60\u4fd7\u7684\u9ad8\u8d28\u91cf\u8de8\u6587\u5316\u5bf9\u8bdd\u3002\u6211\u4eec\u4f7f\u7528CulturePark\u751f\u6210\u4e8641,000\u4e2a\u6587\u5316\u6837\u672c\uff0c\u5bf9\u516b\u79cd\u7279\u5b9a\u6587\u5316\u8fdb\u884c\u4e86\u6a21\u578b\u5fae\u8c03\u3002\u5728\u4e09\u9879\u4e0b\u6e38\u4efb\u52a1\u8bc4\u4f30\u4e2d\uff0c\u8fd9\u4e9b\u6a21\u578b\u7684\u8868\u73b0\u4f18\u4e8eGPT-4\uff1a\u5185\u5bb9\u8fc7\u6ee4\u3001\u6587\u5316\u4e00\u81f4\u6027\uff08\u5728\u970d\u592b\u65af\u6cf0\u5fb7\u6587\u5316\u7ef4\u5ea6\u91cf\u8868\u4e0a\uff09\u548c\u6587\u5316\u6559\u80b2\u3002\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684GPT-3.5\u6a21\u578b\u5728\u5185\u5bb9\u8fc7\u6ee4\u4efb\u52a1\u4e0a\u4e0eGPT-4\u76f8\u5f53\u6216\u4f18\u4e8e\u5b83\uff1b\u5728\u6587\u5316\u4e00\u81f4\u6027\u65b9\u9762\uff0c\u6211\u4eec\u7684\u6a21\u578b\u5728\u970d\u592b\u65af\u6cf0\u5fb7\u6587\u5316\u7ef4\u5ea6\u91cf\u886813\u6846\u67b6\u4e0a\u8d85\u8d8aGPT-4\uff1b\u5728\u4eba\u7c7b\u53c2\u4e0e\u8005\u7684\u6587\u5316\u6559\u80b2\u6548\u679c\u548c\u7528\u6237\u4f53\u9a8c\u4e0a\uff0c\u6211\u4eec\u7684\u6a21\u578b\u4e5f\u8868\u73b0\u51fa\u8272\u3002CulturePark\u5bf9\u4e8e\u51cf\u5c11\u6587\u5316\u504f\u89c1\u548c\u63a8\u52a8AI\u7684\u6c11\u4e3b\u5316\u5177\u6709\u91cd\u8981\u610f\u4e49\uff0c\u5f3a\u8c03\u4e86\u6587\u5316\u5305\u5bb9\u6027\u6570\u636e\u5728\u6a21\u578b\u8bad\u7ec3\u4e2d\u7684\u5173\u952e\u4f5c\u7528\u3002|\n", "2405.14918": "|**2024-05-23**|**AnalogCoder: Analog Circuit Design via Training-Free Code Generation**|Yao Lai et.al.|[2405.14918](http://arxiv.org/abs/2405.14918)|**[link](https://github.com/laiyao1/AnalogCoder)**|
\u5728\u73b0\u4ee3\u82af\u7247\u6280\u672f\u4e2d\uff0c\u6a21\u62df\u7535\u8def\u8bbe\u8ba1\u662f\u4e00\u4e2a\u5173\u952e\u4efb\u52a1\uff0c\u5b83\u6d89\u53ca\u7ec4\u4ef6\u9009\u62e9\u3001\u8fde\u63a5\u548c\u53c2\u6570\u8bbe\u7f6e\u4ee5\u786e\u4fdd\u7535\u8def\u529f\u80fd\u6b63\u5e38\u3002\u5c3d\u7ba1\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u6570\u5b57\u7535\u8def\u8bbe\u8ba1\u65b9\u9762\u53d6\u5f97\u4e86\u8fdb\u6b65\uff0c\u4f46\u6a21\u62df\u7535\u8def\u7684\u590d\u6742\u6027\u548c\u6570\u636e\u7a00\u7f3a\u6027\u5e26\u6765\u4e86\u6311\u6218\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63a8\u51fa\u4e86AnalogCoder\uff0c\u8fd9\u662f\u9996\u4e2a\u65e0\u9700\u8bad\u7ec3\u7684LLM\u4ee3\u7406\uff0c\u4e13\u4e3a\u901a\u8fc7Python\u4ee3\u7801\u751f\u6210\u6765\u8bbe\u8ba1\u6a21\u62df\u7535\u8def\u3002\u9996\u5148\uff0cAnalogCoder\u91c7\u7528\u53cd\u9988\u589e\u5f3a\u6d41\u7a0b\uff0c\u5e76\u7ed3\u5408\u5b9a\u5236\u7684\u9886\u57df\u7279\u5b9a\u63d0\u793a\uff0c\u80fd\u591f\u81ea\u52a8\u4e14\u81ea\u6211\u6821\u6b63\u5730\u8bbe\u8ba1\u6a21\u62df\u7535\u8def\uff0c\u6210\u529f\u7387\u9ad8\u3002\u5176\u6b21\uff0c\u5b83\u63d0\u51fa\u4e86\u4e00\u5957\u7535\u8def\u5de5\u5177\u5e93\uff0c\u7528\u4e8e\u5b58\u50a8\u6210\u529f\u7684\u7535\u8def\u8bbe\u8ba1\u4f5c\u4e3a\u53ef\u91cd\u7528\u7684\u6a21\u5757\u5316\u5b50\u7535\u8def\uff0c\u7b80\u5316\u4e86\u590d\u5408\u7535\u8def\u7684\u521b\u5efa\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cAnalogCoder\u5728\u5e7f\u6cdb\u8986\u76d6\u6a21\u62df\u7535\u8def\u4efb\u52a1\u7684\u57fa\u51c6\u6d4b\u8bd5\u4e0a\u8d85\u8d8a\u4e86\u5176\u4ed6\u57fa\u4e8eLLM\u7684\u65b9\u6cd5\uff0c\u6210\u529f\u8bbe\u8ba1\u4e8620\u4e2a\u7535\u8def\uff0c\u6bd4\u6807\u51c6GPT-4o\u591a\u51fa5\u4e2a\u3002\u6211\u4eec\u76f8\u4fe1AnalogCoder\u80fd\u663e\u8457\u63d0\u5347\u82af\u7247\u8bbe\u8ba1\u8fc7\u7a0b\u7684\u6548\u7387\uff0c\u8ba9\u975e\u4e13\u5bb6\u4e5f\u80fd\u9ad8\u6548\u8bbe\u8ba1\u6a21\u62df\u7535\u8def\u3002\u76f8\u5173\u7684\u4ee3\u7801\u548c\u57fa\u51c6\u5df
2\u63d0\u4f9b\u5728\uff1a[https://github.com/anonyanalog/AnalogCoder](https://github.com/anonyanalog/AnalogCoder)\u3002|\n", "2405.17424": "|**2024-05-27**|**LARM: Large Auto-Regressive Model for Long-Horizon Embodied Intelligence**|Zhuoling Li et.al.|[2405.17424](http://arxiv.org/abs/2405.17424)|null|\u7531\u4e8e\u9700\u8981\u4e0e\u73b0\u5b9e\u4e16\u754c\u4e92\u52a8\uff0cEmbodied agent \u9700\u8981\u5177\u5907\u4e30\u5bcc\u7684\u5148\u9a8c\u77e5\u8bc6\u3001\u957f\u8fdc\u89c4\u5212\u80fd\u529b\u4ee5\u53ca\u5feb\u901f\u7684\u54cd\u5e94\u901f\u5ea6\u3002\u5c3d\u7ba1\u6700\u8fd1\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u6027\u80fd\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5b83\u4eec\u4ecd\u5b58\u5728\u5c40\u9650\u6027\uff0c\u4f8b\u5982\uff0cLLM\u7684\u8f93\u51fa\u901a\u5e38\u662f\u63cf\u8ff0\u6027\u7684\u53e5\u5b50\uff0c\u5728\u51b3\u5b9a\u5177\u4f53\u884c\u52a8\u65f6\u53ef\u80fd\u4ea7\u751f\u6b67\u4e49\u3002\u4e3a\u4e86\u514b\u670d\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u5927\u578b\u81ea\u56de\u5f52\u6a21\u578b\uff08LARM\uff09\u3002LARM\u5229\u7528\u6587\u672c\u548c\u591a\u89c6\u89d2\u56fe\u50cf\u4f5c\u4e3a\u8f93\u5165\uff0c\u5e76\u4ee5\u81ea\u56de\u5f52\u7684\u65b9\u5f0f\u9884\u6d4b\u540e\u7eed\u52a8\u4f5c\u3002\u4e3a\u4e86\u8bad\u7ec3 
LARM\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6570\u636e\u683c\u5f0f\u2014\u2014\u81ea\u56de\u5f52\u8282\u70b9\u4f20\u8f93\u7ed3\u6784\uff0c\u5e76\u6784\u5efa\u4e86\u76f8\u5e94\u7684\u6570\u636e\u96c6\u3002\u901a\u8fc7\u4e24\u9636\u6bb5\u7684\u8bad\u7ec3\u7b56\u7565\uff0cLARM\u6210\u529f\u5728\u300a\u6211\u7684\u4e16\u754c\u300b\uff08Minecraft\uff09\u4e2d\u6536\u96c6\u9b54\u6cd5\u88c5\u5907\uff0c\u8fd9\u6bd4\u5148\u524d\u6700\u4f73\u65b9\u6cd5\u7684\u6700\u9ad8\u6210\u5c31\u9700\u8981\u66f4\u4e3a\u590d\u6742\u7684\u51b3\u7b56\u94fe\u3002\u6b64\u5916\uff0cLARM\u7684\u901f\u5ea6\u6bd4\u73b0\u6709\u6700\u5feb\u65b9\u6cd5\u5feb\u51fa\u4e866.8\u500d\u3002|\n", "2405.16510": "|**2024-05-30**|**Meta-Task Planning for Language Agents**|Cong Zhang et.al.|[2405.16510](http://arxiv.org/abs/2405.16510)|null|\u795e\u7ecf\u8bed\u8a00\u6a21\u578b\u7684\u5feb\u901f\u53d1\u5c55\u63a8\u52a8\u4e86\u667a\u80fd\u4ee3\u7406\u7814\u7a76\u7684\u65b0\u70ed\u6f6e\u3002\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4f5c\u4e3a\u5b9e\u73b0\u4eba\u5de5\u667a\u80fd\u901a\u7528\u6027\uff08AGI\uff09\u7684\u6709\u524d\u666f\u65b9\u6cd5\uff0c\u56e0\u5176\u51fa\u8272\u7684\u63a8\u7406\u548c\u6cdb\u5316\u80fd\u529b\u800c\u5907\u53d7\u77a9\u76ee\u3002\u5728\u5b9e\u9645\u4efb\u52a1\u4e2d\uff0c\u6709\u6548\u7684\u89c4\u5212\u5bf9LLM\u4ee3\u7406\u7684\u6210\u529f\u81f3\u5173\u91cd\u8981\u3002\u7136\u800c\uff0c\u5982\u4f55\u4e3a\u590d\u6742\u4efb\u52a1\u8bbe\u8ba1\u51fa\u53ef\u884c\u6216\u6700\u4f18\u7684\u7cbe\u7ec6\u7c92\u5ea6\u64cd\u4f5c\u5e8f\u5217\uff0c\u7279\u522b\u662f\u9700\u8981\u7ec4\u5408\u5927\u91cf\u5f02\u8d28\u884c\u52a8\u7684\u5e8f\u5217\uff0c\u4ecd\u662f\u6311\u6218\u3002\u672c\u6587\u63d0\u51faMeta-Task 
Planning\uff08MTP\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u96f6\u6837\u672c\u7684\u534f\u4f5c\u5f0fLLM\u591a\u4ee3\u7406\u7cfb\u7edf\u65b9\u6cd5\uff0c\u901a\u8fc7\u5c06\u590d\u6742\u4efb\u52a1\u5206\u89e3\u4e3a\u5b50\u4efb\u52a1\uff0c\u5373\u5143\u4efb\u52a1\uff0c\u7b80\u5316\u4e86\u4efb\u52a1\u89c4\u5212\u3002\u6bcf\u4e2a\u5143\u4efb\u52a1\u968f\u540e\u6620\u5c04\u4e3a\u53ef\u6267\u884c\u52a8\u4f5c\u3002\u5728TravelPlanner\u548cAPI-Bank\u4e24\u4e2a\u4e25\u683c\u57fa\u51c6\u4e0a\u8bc4\u4f30\u4e86MTP\u3002\u7ed3\u679c\u8868\u660e\uff0cMTP\u5728TravelPlanner\u4e0a\u7684\u5e73\u5747\u6210\u529f\u7387\u7ea6\u4e3a40%\uff0c\u8fdc\u8d85\u5f53\u524d\u6700\u4f73\u57fa\u7ebf\uff082.92%\uff09\uff0c\u5e76\u4e14\u5728API-Bank\u4e0a\u7684\u6027\u80fd\u6bd4\u4f7f\u7528ReAct\u7684LLM_{api}-4\u9ad8\u51fa\u7ea614%\uff0c\u8fd9\u663e\u793a\u51fa\u5c06LLM\u4e0e\u591a\u4ee3\u7406\u7cfb\u7edf\u76f8\u7ed3\u5408\u7684\u5de8\u5927\u6f5c\u529b\u3002|\n", "2405.16376": "|**2024-05-28**|**STRIDE: A Tool-Assisted LLM Agent Framework for Strategic and Interactive Decision-Making**|Chuanhao Li 
et.al.|[2405.16376](http://arxiv.org/abs/2405.16376)|**[link](https://github.com/cyrilli/stride)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08\u5982GPT-4\uff09\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u65b9\u9762\u5e26\u6765\u4e86\u9769\u547d\u6027\u53d8\u5316\uff0c\u5c55\u73b0\u51fa\u5353\u8d8a\u7684\u8bed\u8a00\u80fd\u529b\u548c\u63a8\u7406\u6280\u5de7\u3002\u7136\u800c\uff0c\u5728\u6218\u7565\u6027\u7684\u591a\u4ee3\u7406\u51b3\u7b56\u73af\u5883\u4e2d\uff0c\u5b83\u4eec\u9762\u4e34\u5c40\u9650\uff0c\u5982\u6570\u5b66\u63a8\u7406\u80fd\u529b\u5dee\u3001\u96be\u4ee5\u9075\u5faa\u6307\u4ee4\u548c\u751f\u6210\u9519\u8bef\u4fe1\u606f\u3002\u8fd9\u4e9b\u7f3a\u70b9\u9650\u5236\u4e86\u5b83\u4eec\u5728\u9075\u5b88\u590d\u6742\u6e38\u620f\u89c4\u5219\u3001\u957f\u671f\u89c4\u5212\u3001\u63a2\u7d22\u672a\u77e5\u73af\u5883\u4ee5\u53ca\u9884\u6d4b\u5bf9\u624b\u884c\u52a8\u7684\u4e92\u52a8\u4efb\u52a1\u4e2d\u7684\u8868\u73b0\u3002\u4e3a\u6b64\uff0c\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u578b\u7684\u7ed3\u5408\u4e86\u8bb0\u5fc6\u548c\u4e13\u4e1a\u5de5\u5177\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4ee3\u7406\u6846\u67b6\uff0c\u65e8\u5728\u63d0\u5347\u5176\u5728\u6218\u7565\u51b3\u7b56\u65b9\u9762\u7684\u6027\u80fd\u3002\u6211\u4eec\u7279\u522b\u5728\u53cc\u8fb9\u8c08\u5224\u3001\u591a\u4ee3\u7406\u52a8\u6001\u673a\u5236\u8bbe\u8ba1\u7b49\u7ecf\u6d4e\u91cd\u8981\u573a\u666f\u4e2d\u5e94\u7528\u8fd9\u4e9b\u5de5\u5177\uff0c\u5e76\u901a\u8fc7\u5b9a\u91cf\u6307\u6807\u8bc4\u4f30\u5728\u5404\u79cd\u6218\u7565\u51b3\u7b56\u95ee\u9898\u4e0a\u7684\u6548\u679c\u3002\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u589e\u5f3a\u6846\u67b6\u663e\u8457\u63d0\u9ad8\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u6218\u7565\u51b3\u7b56\u4e2d\u7684\u80fd\u529b\u3002\u5c3d\u7ba1\u5f53\u524d\u6a21\u578b\u5b58\u5728\u56fa\u6709\u5c40\u9650\uff0c\u4f46\u6211\u4eec\u901a\u8fc7\u6709\u9488\u5bf9\u6027\u7684\u589e\u5f3a\u5c55\u793a\u4e86\u6539\u8fdb\u7684\u53ef\u80fd\u6027\uff
0c\u8fd9\u4e3a\u672a\u6765\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u4ea4\u4e92\u73af\u5883\u4e2d\u7684\u5e94\u7528\u63d0\u4f9b\u4e86\u6709\u524d\u666f\u7684\u65b9\u5411\u3002**|\n", "2405.16334": "|**2024-05-29**|**Devil's Advocate: Anticipatory Reflection for LLM Agents**|Haoyu Wang et.al.|[2405.16334](http://arxiv.org/abs/2405.16334)|null|\u5728\u8fd9\u4e2a\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u901a\u8fc7\u8d4b\u4e88\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u81ea\u6211\u53cd\u601d\u80fd\u529b\uff0c\u589e\u5f3a\u4e86\u5176\u5728\u89e3\u51b3\u590d\u6742\u4efb\u52a1\u65f6\u7684\u4e00\u81f4\u6027\u548c\u9002\u5e94\u6027\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u4fc3\u4f7fLLM\u4ee3\u7406\u5c06\u7ed9\u5b9a\u7684\u4efb\u52a1\u5206\u89e3\u4e3a\u53ef\u7ba1\u7406\u7684\u5b50\u4efb\u52a1\uff08\u5373\u5236\u5b9a\u8ba1\u5212\uff09\uff0c\u5e76\u5728\u6267\u884c\u884c\u52a8\u4e4b\u524d\u6301\u7eed\u53cd\u601d\u53ef\u80fd\u7684\u5931\u8d25\u53ca\u5176\u8865\u6551\u63aa\u65bd\u3001\u6267\u884c\u540e\u4e0e\u5b50\u4efb\u52a1\u76ee\u6807\u5bf9\u9f50\u5e76\u8fdb\u884c\u5fc5\u8981\u7684\u56de\u6eaf\u4ee5\u786e\u4fdd\u5168\u529b\u4ee5\u8d74\u6267\u884c\u8ba1\u5212\uff0c\u4ee5\u53ca\u5728\u5b8c\u6210\u8ba1\u5212\u540e\u8fdb\u884c\u5168\u9762\u5ba1\u67e5\uff0c\u4ee5\u4fbf\u4e8e\u672a\u6765\u7b56\u7565\u7684\u4f18\u5316\u3002\u901a\u8fc7\u5728WebArena\u4e2d\u96f6\u6837\u672c\u5e94\u7528\u8fd9\u4e00\u65b9\u6cd5\u5904\u7406\u5b9e\u9645\u7684\u7f51\u7edc\u73af\u5883\u4efb\u52a1\uff0c\u6211\u4eec\u7684\u4ee3\u7406\u8868\u73b0\u51fa\u4f18\u4e8e\u73b0\u6709\u96f6\u6837\u672c\u65b9\u6cd5\u7684\u6027\u80fd\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u8fd9\u79cd\u57fa\u4e8e\u53cd\u601d\u7684\u7b56\u7565\u4e0d\u4ec5\u63d0\u5347\u4e86\u4ee3\u7406\u5e94\u5bf9\u672a\u9884\u89c1\u6311\u6218\u7684\u5bfc\u822a\u80fd\u529b\uff0c\u901a\u8fc7\u5f3a\u5927\u7684\u8ba1\u5212\u6267\u884c\u673a\u5236\uff0c\u8fd8\u63d0\u9ad8\u4e86\u6548\u7387\uff0c\u5
1cf\u5c11\u4e86\u5b9e\u73b0\u4efb\u52a1\u6240\u9700\u7684\u5c1d\u8bd5\u6b21\u6570\u548c\u8ba1\u5212\u4fee\u8ba2\u6b21\u6570\u3002|\n", "2405.16247": "|**2024-05-25**|**AutoManual: Generating Instruction Manuals by LLM Agents via Interactive Environmental Learning**|Minghao Chen et.al.|[2405.16247](http://arxiv.org/abs/2405.16247)|**[link](https://github.com/minghchen/automanual)**|\u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u6267\u884c\u5404\u79cd\u9886\u57df\u4efb\u52a1\uff0c\u5982\u673a\u5668\u4eba\u3001\u6e38\u620f\u548c\u7f51\u7edc\u5bfc\u822a\u65b9\u9762\u5c55\u73b0\u51fa\u6f5c\u529b\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u6a21\u578b\u901a\u5e38\u9700\u8981\u7cbe\u5fc3\u8bbe\u8ba1\u548c\u4e13\u5bb6\u7ea7\u63d0\u793a\u624d\u80fd\u9002\u5e94\u7279\u5b9a\u9886\u57df\u7684\u4efb\u52a1\uff0c\u8fd9\u9650\u5236\u4e86\u5b83\u4eec\u7684\u9002\u5e94\u6027\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86AutoManual\u6846\u67b6\uff0c\u8ba9LLMs\u80fd\u591f\u901a\u8fc7\u4e92\u52a8\u81ea\u4e3b\u6784\u5efa\u7406\u89e3\uff0c\u5e76\u9002\u5e94\u65b0\u73af\u5883\u3002AutoManual\u5c06\u73af\u5883\u77e5\u8bc6\u5206\u4e3a\u591a\u6837\u7684\u89c4\u5219\uff0c\u5e76\u901a\u8fc7\u4e24\u4e2a\u4ee3\u7406\u8fdb\u884c\u5728\u7ebf\u4f18\u5316\uff1a1\uff09\u89c4\u5212\u5668\u6839\u636e\u5f53\u524d\u89c4\u5219\u5236\u5b9a\u53ef\u64cd\u4f5c\u7684\u884c\u52a8\u8ba1\u5212\uff1b2\uff09\u6784\u5efa\u8005\u901a\u8fc7\u4e00\u4e2a\u7ed3\u6784\u5316\u7684\u89c4\u5219\u7cfb\u7edf\u66f4\u65b0\u89c4\u5219\uff0c\u4fc3\u8fdb\u5728\u7ebf\u89c4\u5219\u7ba1\u7406\u5e76\u4fdd\u6301\u5173\u952e\u7ec6\u8282\u3002\u4e3a\u4e86\u51cf\u5c11\u5728\u7ba1\u7406\u89c4\u5219\u65f6\u7684\u5e7b\u89c9\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u201c\u6848\u4f8b\u6761\u4ef6\u63d0\u793a\u201d\u7b56\u7565\u7528\u4e8e\u6784\u5efa\u8005\u3002\u6700\u7ec8\uff0c\u7f16\u8bd1\u5668\u4ee3\u7406\u5c06\u8fd9\u4e9b\u89c4\u5219\u6574\u5408\u6210\u4e00\u4efd\u5168\u9762\u7684\u624b\u518c\u3002\u8fd9\u4efd\u81ea\u6211\u751f\u6210\u7684\u
624b\u518c\u4e0d\u4ec5\u80fd\u63d0\u9ad8\u9002\u5e94\u6027\uff0c\u8fd8\u80fd\u6307\u5bfc\u5c0f\u578bLLMs\u7684\u89c4\u5212\uff0c\u540c\u65f6\u4fdd\u6301\u4eba\u7c7b\u53ef\u8bfb\u3002\u4ec5\u51ed\u4e00\u6b21\u7b80\u5355\u6f14\u793a\uff0cAutoManual\u663e\u8457\u63d0\u9ad8\u4e86\u4efb\u52a1\u6210\u529f\u7387\uff0cGPT-4-turbo\u4e0b\u8fbe\u523097.4%\uff0cGPT-3.5-turbo\u4e0b\u4e3a86.2%\u3002\u6e90\u4ee3\u7801\u5373\u5c06\u53d1\u5e03\u3002|\n", "2405.18208": "|**2024-05-28**|**A Human-Like Reasoning Framework for Multi-Phases Planning Task with Large Language Models**|Chengxing Xie et.al.|[2405.18208](http://arxiv.org/abs/2405.18208)|null|\u8fd1\u671f\u7684\u7814\u7a76\u5df2\u7ecf\u8868\u660e\uff0c\u8fd9\u4e9b\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u4e00\u4e9b\u7b80\u5355\u7684\u4efb\u52a1\u4e0a\uff0c\u5982\u5199\u4f5c\u548c\u7f16\u7801\uff0c\u5c55\u73b0\u51fa\u4e00\u5b9a\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u5b83\u4eec\u5728\u9700\u8981\u7efc\u5408\u89c4\u5212\u7684\u4efb\u52a1\u4e0a\u4ecd\u7136\u9762\u4e34\u6311\u6218\uff0c\u8fd9\u4ecd\u662f\u5f53\u524d\u6a21\u578b\u7684\u4e00\u4e2a\u91cd\u8981\u7814\u7a76\u95ee\u9898\u3002\u672c\u7814\u7a76\u805a\u7126\u4e8e\u65c5\u884c\u89c4\u5212\uff0c\u8fd9\u662f\u4e00\u4e2a\u6d89\u53ca\u591a\u4e2a\u9636\u6bb5\u7684\u590d\u6742\u95ee\u9898\uff0c\u5305\u62ec\u63d0\u7eb2\u3001\u4fe1\u606f\u6536\u96c6\u548c\u89c4\u5212\uff0c\u901a\u5e38\u4f34\u968f\u7740\u5404\u79cd\u7ea6\u675f\u548c\u4e0d\u786e\u5b9a\u6027\u3002\u73b0\u6709\u7684\u63a8\u7406\u65b9\u6cd5\u5728\u5904\u7406\u8fd9\u7c7b\u95ee\u9898\u65f6\u6548\u679c\u4e0d\u4f73\u3002\u6211\u4eec\u7684\u76ee\u6807\u662f\u901a\u8fc7\u5f00\u53d1\u4e00\u79cd\u7c7b\u4f3c\u4eba\u7c7b\u7684\u89c4\u5212\u6846\u67b6\uff0c\u5f15\u5bfc\u5927\u578b\u8bed\u8a00\u6a21\u578b\u6a21\u4eff\u4eba\u7c7b\u89e3\u51b3\u591a\u9636\u6bb5\u95ee\u9898\u7684\u6b65\u9aa4\uff0c\u4ee5\u63d0\u5347\u5176\u80fd\u529b\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u5b9e\u65bd\u7b56\u7565\uff0c\u8ba9\u6a21\u578b\u80
fd\u4e3a\u6bcf\u4e2a\u65c5\u884c\u67e5\u8be2\u751f\u6210\u8fde\u8d2f\u7684\u63d0\u7eb2\uff0c\u6a21\u62df\u4eba\u7c7b\u7684\u89c4\u5212\u6a21\u5f0f\u3002\u6211\u4eec\u8fd8\u5f15\u5165\u4e86\u7b56\u7565\u5757\u548c\u77e5\u8bc6\u5757\u5230\u6846\u67b6\u4e2d\uff1a\u7b56\u7565\u5757\u5e2e\u52a9\u4fe1\u606f\u641c\u96c6\uff0c\u800c\u77e5\u8bc6\u5757\u63d0\u4f9b\u8be6\u7ec6\u89c4\u5212\u6240\u9700\u7684\u5fc5\u8981\u4fe1\u606f\u3002\u5b9e\u9a8c\u7ed3\u679c\u5168\u9762\u5c55\u793a\u4e86\u6211\u4eec\u6846\u67b6\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\u89c4\u5212\u80fd\u529b\u7684\u663e\u8457\u63d0\u5347\uff0c\u4f7f\u5176\u5728\u5904\u7406\u65c5\u884c\u89c4\u5212\u4efb\u52a1\u65f6\u6548\u7387\u548c\u6548\u679c\u90fd\u6709\u6240\u63d0\u9ad8\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u5f53\u4e0eGPT-4-Turbo\u7ed3\u5408\u65f6\uff0c\u6211\u4eec\u7684\u6846\u67b6\u76f8\u8f83\u4e8e\u57fa\u7840\u6846\u67b6\u5728GPT-4-Turbo\u4e0a\u7684\u6027\u80fd\u63d0\u5347\u4e8610\u500d\u3002|\n", "2405.18113": "|**2024-05-28**|**Facilitating Multi-Role and Multi-Behavior Collaboration of Large Language Models for Online Job Seeking and Recruiting**|Hongda Sun 
et.al.|[2405.18113](http://arxiv.org/abs/2405.18113)|null|\u968f\u7740\u5728\u7ebf\u62db\u8058\u670d\u52a1\u7684\u5174\u8d77\uff0c\u4f20\u7edf\u7684\u6c42\u804c\u548c\u62db\u8058\u65b9\u5f0f\u53d1\u751f\u4e86\u53d8\u9769\uff0c\u8feb\u5207\u9700\u8981\u5f00\u53d1\u9ad8\u8d28\u91cf\u7684\u5de5\u4e1a\u5e94\u7528\u6765\u63d0\u5347\u6c42\u804c\u8005\u4e0e\u804c\u4f4d\u7684\u5339\u914d\u5ea6\u3002\u73b0\u6709\u7684\u65b9\u6cd5\u4e3b\u8981\u4f9d\u8d56\u4e8e\u7b80\u5386\u548c\u804c\u4f4d\u63cf\u8ff0\u7684\u6f5c\u5728\u8bed\u4e49\u5efa\u6a21\uff0c\u5b66\u4e60\u4e24\u8005\u4e4b\u95f4\u7684\u5339\u914d\u51fd\u6570\u3002\u53d7\u5230\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u89d2\u8272\u626e\u6f14\u65b9\u9762\u5f3a\u5927\u80fd\u529b\u7684\u542f\u53d1\uff0c\u6211\u4eec\u63d0\u51fa\u5f15\u5165LLMs\u6a21\u62df\u9762\u8bd5\u73af\u8282\uff0c\u8ba9\u5176\u4e0e\u6c42\u804c\u8005\u8fdb\u884c\u5bf9\u8bdd\uff0c\u8fd9\u53ef\u4ee5\u4e3a\u5019\u9009\u4eba\u8bc4\u4f30\u63d0\u4f9b\u989d\u5916\u8bc1\u636e\uff0c\u4ece\u800c\u589e\u5f3a\u4ec5\u57fa\u4e8e\u7b80\u5386\u548c\u804c\u4f4d\u63cf\u8ff0\u7684\u4e2a\u6027\u5316\u5339\u914d\u3002\u7136\u800c\uff0c\u5728\u7f51\u7edc\u62db\u8058\u4e2d\u7684\u9762\u8bd5\u5b98\u548c\u6c42\u804c\u8005\u89d2\u8272\u5851\u9020\u4ecd\u9762\u4e34\u6311\u6218\uff0c\u5982\u63d0\u95ee\u6280\u5de7\u3001\u56de\u7b54\u6784\u5efa\u4ee5\u53ca\u53cc\u5411\u5339\u914d\u5ea6\u8bc4\u4f30\u3002 
\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51faMockLLM\uff0c\u4e00\u4e2a\u521b\u65b0\u7684\u6846\u67b6\uff0c\u5c06\u4eba\u804c\u5339\u914d\u8fc7\u7a0b\u5212\u5206\u4e3a\u4e24\u4e2a\u6a21\u5757\uff1a\u6a21\u62df\u9762\u8bd5\u751f\u6210\u548c\u63e1\u624b\u534f\u8bae\u4e2d\u7684\u53cc\u5411\u8bc4\u4f30\uff0c\u901a\u8fc7\u9762\u8bd5\u5b98\u548c\u6c42\u804c\u8005\u4e4b\u95f4\u7684\u534f\u4f5c\u884c\u4e3a\u5171\u540c\u63d0\u5347\u6027\u80fd\u3002\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u591a\u89d2\u8272\u3001\u591a\u884c\u4e3a\u7684\u6846\u67b6\uff0c\u4f7f\u5355\u4e00\u7684LLM\u4ee3\u7406\u80fd\u6709\u6548\u5730\u626e\u6f14\u53cc\u65b9\u7684\u4e0d\u540c\u804c\u80fd\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u53cd\u601d\u8bb0\u5fc6\u751f\u6210\u548c\u52a8\u6001\u63d0\u793a\u4fee\u6539\u6280\u672f\uff0c\u4ee5\u4f18\u5316\u53cc\u65b9\u7684\u884c\u4e3a\uff0c\u6301\u7eed\u4f18\u5316\u9644\u52a0\u7684\u8bc4\u4f30\u8bc1\u636e\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cMockLLM\u5728\u4eba\u804c\u5339\u914d\u4e0a\u7684\u8868\u73b0\u6700\u4f18\uff0c\u4e14\u6a21\u62df\u9762\u8bd5\u8d28\u91cf\u9ad8\uff0c\u9884\u793a\u7740\u5b83\u5728\u672a\u6765\u5728\u7ebf\u62db\u8058\u4e2d\u7684\u5b9e\u9645\u5e94\u7528\u524d\u666f\u5e7f\u9614\u3002|\n", "2405.18092": "|**2024-05-28**|**LLM experiments with simulation: Large Language Model Multi-Agent System for Process Simulation Parametrization in Digital Twins**|Yuchen Xia 
et.al.|[2405.18092](http://arxiv.org/abs/2405.18092)|**[link](https://github.com/yuchenxia/llmdrivensimulation)**|**\u8be5\u8bba\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u591aagent\u7cfb\u7edf\u67b6\u6784\uff0c\u5c06\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5e94\u7528\u4e8e\u6570\u5b57\u5b6a\u751f\u8fc7\u7a0b\u6a21\u62df\u7684\u53c2\u6570\u81ea\u52a8\u5316\u3002\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u6846\u67b6\uff0c\u5305\u542b\u89c2\u5bdf\u3001\u63a8\u7406\u3001\u51b3\u7b56\u548c\u603b\u7ed3\u56db\u79cd\u7c7b\u578b\u7684\u4ee3\u7406\u3002\u901a\u8fc7\u5b9e\u73b0LLM\u4ee3\u7406\u4e0e\u6a21\u62df\u6a21\u578b\u7684\u52a8\u6001\u4ea4\u4e92\uff0c\u8be5\u7cfb\u7edf\u53ef\u4ee5\u81ea\u52a8\u63a2\u7d22\u53c2\u6570\u8bbe\u7f6e\uff0c\u5229\u7528\u542f\u53d1\u5f0f\u63a8\u7406\u786e\u5b9a\u4e00\u7ec4\u63a7\u5236\u6a21\u62df\u4ee5\u8fbe\u6210\u76ee\u6807\u7684\u53c2\u6570\u3002\u8fd9\u79cd\u65b9\u6cd5\u901a\u8fc7\u6ce8\u5165LLM\u7684\u542f\u53d1\u5f0f\uff0c\u589e\u5f3a\u6a21\u62df\u6a21\u578b\uff0c\u5e76\u652f\u6301\u81ea\u4e3b\u641c\u7d22\u4ee5\u89e3\u51b3\u7528\u6237\u4efb\u52a1\uff0c\u6709\u671b\u63d0\u9ad8\u7528\u6237\u4f53\u9a8c\u5e76\u51cf\u8f7b\u4eba\u7c7b\u7528\u6237\u5728\u590d\u6742\u51b3\u7b56\u8fc7\u7a0b\u4e2d\u7684\u8ba4\u77e5\u8d1f\u62c5\u3002\u7814\u7a76\u901a\u8fc7\u4e00\u4e2a\u6848\u4f8b\u7814\u7a76\u5c55\u793a\u4e86\u7cfb\u7edf\u7684\u6709\u6548\u6027\u4e0e\u529f\u80fd\uff0c\u5e76\u5728GitHub\u4ed3\u5e93\u63d0\u4f9b\u4e86\u53ef\u89c6\u5316\u7684\u6f14\u793a\u3002**|\n", "2405.17837": "|**2024-05-28**|**Enabling Generative Design Tools with LLM Agents for Building Novel Devices: A Case Study on Fluidic Computation Interfaces**|Qiuyu Lu 
et.al.|[2405.17837](http://arxiv.org/abs/2405.17837)|null|\u5728\u4eba\u673a\u4ea4\u4e92\uff08HCI\uff09\u9886\u57df\uff0c\u4ea4\u4e92\u8bbe\u5907\u7684\u8bbe\u8ba1\u5f00\u53d1\u662f\u5173\u952e\u5173\u6ce8\u70b9\u3002\u968f\u7740\u65b0\u578b\u786c\u4ef6\u548c\u5148\u8fdb\u5236\u9020\u6280\u672f\u7684\u5174\u8d77\uff0c\u5bf9\u80fd\u591f\u7b80\u5316\u539f\u578b\u5236\u4f5c\u8fc7\u7a0b\u7684\u4e13\u95e8\u8bbe\u8ba1\u5de5\u5177\u7684\u9700\u6c42\u65e5\u76ca\u589e\u957f\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u5de5\u5177\u867d\u7136\u901a\u8fc7\u53c2\u6570\u5316\u8bbe\u8ba1\u548c\u6a21\u62df\u7b80\u5316\u6d41\u7a0b\uff0c\u4f46\u5b66\u4e60\u66f2\u7ebf\u8f83\u9661\uff0c\u4e14\u5728\u6fc0\u53d1\u521b\u65b0\u601d\u7ef4\u65b9\u9762\u6709\u6240\u6b20\u7f3a\u3002\u672c\u7814\u7a76\u4ee5\u6d41\u4f53\u8ba1\u7b97\u754c\u9762\u4e3a\u4f8b\uff0c\u63a2\u8ba8\u5982\u4f55\u901a\u8fc7\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4ee3\u7406\u589e\u5f3a\u7269\u7406\u8bbe\u5907\u8bbe\u8ba1\u5de5\u5177\uff0c\u521b\u5efa\u4e00\u4e2a\u751f\u6210\u8bbe\u8ba1\u5de5\u5177\uff08GDT\uff09\u3002\u501f\u52a9LLM\uff0cGDT\u80fd\u591f\u7406\u89e3\u65b0\u8bbe\u5907\u7684\u7279\u6027\u548c\u5c40\u9650\uff0c\u63d0\u51fa\u591a\u6837\u3001\u5bcc\u6709\u6d1e\u5bdf\u529b\u4e14\u5b9e\u7528\u7684\u5e94\u7528\u573a\u666f\uff0c\u63a8\u8350\u6280\u672f\u548c\u60c5\u5883\u9002\u5b9c\u7684\u8bbe\u5907\u8bbe\u8ba1\uff0c\u5e76\u81ea\u52a8\u751f\u6210\u8bbe\u8ba1\u53c2\u6570\uff0c\u4ee5\u4fbf\u4f20\u7edf\u8bbe\u8ba1\u5de5\u5177\u5c55\u793a\u7ed3\u679c\u5e76\u751f\u6210\u52a0\u5de5\u6240\u9700\u7684\u6587\u4ef6\u3002\u672c\u6587\u9610\u8ff0\u4e86GDT\u7684\u6846\u67b6\u3001\u5b9e\u73b0\u548c\u6027\u80fd\uff0c\u5e76\u53cd\u601d\u5176\u524d\u666f\u53ca\u9047\u5230\u7684\u6311\u6218\u3002|\n", "2405.20267": "|**2024-05-30**|**Auto Arena of LLMs: Automating LLM Evaluations with Agent Peer-battles and Committee Discussions**|Ruochen Zhao 
et.al.|[2405.20267](http://arxiv.org/abs/2405.20267)|**[link](https://github.com/Auto-Arena/Auto-Arena-LLMs)**|**\u968f\u7740\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u65e5\u65b0\u6708\u5f02\uff0c\u8feb\u5207\u9700\u8981\u4e00\u79cd\u53ef\u9760\u4e14\u53ca\u65f6\u7684\u8bc4\u4f30\u65b9\u6cd5\u3002\u9274\u4e8e\u9759\u6001\u57fa\u51c6\u6613\u53d7\u6c61\u67d3\uff0c\u7528\u6237\u5f80\u5f80\u4f9d\u8d56\u4e8e\u50cfChatbot Arena\u8fd9\u6837\u7684\u4eba\u7c7b\u6295\u7968\u5e73\u53f0\u3002\u7136\u800c\uff0c\u4eba\u5de5\u6807\u6ce8\u9700\u8981\u5927\u91cf\u4eba\u529b\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u521b\u65b0\u6027\u5730\u63d0\u51faAuto-Arena\uff0c\u8fd9\u662f\u4e00\u79cd\u81ea\u52a8\u5316\u5168\u6d41\u7a0b\u7684LLM\u8bc4\u4f30\u6846\u67b6\u3002\u9996\u5148\uff0c\u7531\u8003\u5b98LLM\u8bbe\u8ba1\u95ee\u9898\uff1b\u63a5\u7740\uff0c\u5019\u9009LLMs\u56f4\u7ed5\u95ee\u9898\u8fdb\u884c\u591a\u8f6e\u76f8\u4e92\u5bf9\u51b3\uff0c\u66b4\u9732\u51fa\u5b83\u4eec\u7684\u771f\u5b9e\u6027\u80fd\u5dee\u8ddd\uff1b\u6700\u540e\uff0c\u7531LLM\u88c1\u5224\u96c6\u4f53\u8ba8\u8bba\u5e76\u51b3\u5b9a\u80dc\u8005\uff0c\u4ece\u800c\u51cf\u5c11\u504f\u89c1\uff0c\u63d0\u5347\u516c\u5e73\u6027\u3002\u6211\u4eec\u5728\u6700\u65b017\u6b3eLLMs\u4e0a\u7684\u5e7f\u6cdb\u5b9e\u9a8c\u663e\u793a\uff0cAuto-Arena\u4e0e\u4eba\u7c7b\u504f\u597d\u5177\u6709\u6700\u9ad8\u7684\u76f8\u5173\u6027\uff0c\u4e3a\u66ff\u4ee3\u4eba\u7c7b\u8bc4\u4ef7\u5e73\u53f0\u63d0\u4f9b\u4e86\u6709\u524d\u666f\u7684\u89e3\u51b3\u65b9\u6848\u3002**|\n", "2405.20189": "|**2024-05-30**|**Nadine: An LLM-driven Intelligent Social Robot with Affective Capabilities and Human-like Memory**|Hangyeol Kang 
et.al.|[2405.20189](http://arxiv.org/abs/2405.20189)|null|\u5728\u672c\u7814\u7a76\u4e2d\uff0c\u6211\u4eec\u9610\u8ff0\u4e86\u4e3aNadine\u793e\u4ea4\u673a\u5668\u4eba\u5e73\u53f0\u5f00\u53d1\u667a\u80fd\u548c\u5065\u58ee\u7684\u793e\u4ea4\u673a\u5668\u4eba\u7cfb\u7edf\u7684\u65b9\u6cd5\u3002\u6211\u4eec\u901a\u8fc7\u96c6\u6210\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\uff0c\u5de7\u5999\u5730\u5229\u7528\u8fd9\u4e9b\u6a21\u578b\u7684\u5f3a\u5927\u63a8\u7406\u548c\u6307\u4ee4\u6267\u884c\u80fd\u529b\uff0c\u4ee5\u5b9e\u73b0\u63a5\u8fd1\u4eba\u7c7b\u7684\u611f\u6027\u4e0e\u8ba4\u77e5\u80fd\u529b\u3002\u8fd9\u4e0e\u5f53\u524d\u57fa\u4e8eLLM\u7684\u667a\u80fd\u4f53\u76f8\u6bd4\u662f\u521b\u65b0\u7684\uff0c\u56e0\u4e3a\u5b83\u4eec\u901a\u5e38\u4e0d\u5177\u5907\u4eba\u7c7b\u5f0f\u7684\u957f\u671f\u8bb0\u5fc6\u6216\u590d\u6742\u7684\u60c5\u611f\u8bc4\u4f30\u529f\u80fd\u3002\u793e\u4ea4\u673a\u5668\u4eba\u7684\u81ea\u7136\u6027\u5728\u5f88\u5927\u7a0b\u5ea6\u4e0a\u53d6\u51b3\u4e8e\u7cfb\u7edf\u5404\u7ec4\u4ef6\u7684\u6027\u80fd\u548c\u534f\u540c\u5de5\u4f5c\u3002\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u7cfb\u7edf\uff0c\u80fd\u591f\u901a\u8fc7\u591a\u6a21\u6001\u8f93\u5165\u5904\u7406\u751f\u6210\u6070\u5f53\u7684\u884c\u4e3a\uff0c\u6839\u636e\u8bc6\u522b\u5230\u7684\u7528\u6237\u5f15\u5165\u76f8\u5173\u7684\u60c5\u666f\u8bb0\u5fc6\uff0c\u5e76\u6a21\u62df\u673a\u5668\u4eba\u5728\u4e0e\u4eba\u7c7b\u4f19\u4f34\u4e92\u52a8\u8fc7\u7a0b\u4e2d\u4ea7\u751f\u7684\u60c5\u7eea\u72b6\u6001\u3002\u7279\u522b\u662f\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u4e2a\u9488\u5bf9\u793e\u4ea4\u673a\u5668\u4eba\u7684LLM-agent\u6846\u67b6\uff0cSoR-ReAct\uff0c\u4f5c\u4e3a\u6211\u4eec\u7cfb\u7edf\u4e2d\u4ea4\u4e92\u6a21\u5757\u7684\u6838\u5fc3\u7ec4\u4ef6\u3002\u8fd9\u4e00\u8bbe\u8ba1\u63a8\u52a8\u4e86\u793e\u4ea4\u673a\u5668\u4eba\u6280\u672f\u7684\u53d1\u5c55\uff0c\u65e8\u5728\u63d0\u5347\u4eba\u673a\u4ea4\u4e92\u7684\u8d28\u91cf\u3002|\n", "2405.19425": "|**2024-05-29**|**Adaptive 
In-conversation Team Building for Language Model Agents**|Linxin Song et.al.|[2405.19425](http://arxiv.org/abs/2405.19425)|null|\u5728\u5904\u7406\u590d\u6742\u4efb\u52a1\u65f6\uff0c\u5229\u7528\u591a\u4e2a\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5c55\u73b0\u51fa\u524d\u666f\u3002\u7136\u800c\uff0c\u5982\u4f55\u4e3a\u7279\u5b9a\u5e94\u7528\u8bbe\u8ba1\u6709\u6548\u7684\u591a\u4ee3\u7406\u56e2\u961f\u4ecd\u662f\u4e00\u4e2a\u6311\u6218\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u52a8\u6001\u56e2\u961f\u6784\u5efa\u8303\u5f0f\uff0c\u540d\u4e3a\u201cCaptain Agent\u201d\u3002\u5b83\u901a\u8fc7\u521b\u65b0\u7684Agent\u8bbe\u8ba1\uff0c\u80fd\u591f\u81ea\u9002\u5e94\u5730\u4e3a\u6bcf\u4e2a\u95ee\u9898\u89e3\u51b3\u6b65\u9aa4\u7ec4\u5efa\u548c\u7ba1\u7406\u56e2\u961f\uff0c\u5229\u7528\u5d4c\u5957\u7fa4\u804a\u548c\u53cd\u601d\u673a\u5236\u786e\u4fdd\u591a\u5143\u5316\u7684\u4e13\u4e1a\u77e5\u8bc6\uff0c\u9632\u6b62\u523b\u677f\u8f93\u51fa\u3002\u8fd9\u79cd\u65b9\u6cd5\u63d0\u4f9b\u4e86\u7075\u6d3b\u4f46\u7ed3\u6784\u5316\u7684\u89e3\u51b3\u95ee\u9898\u65b9\u5f0f\uff0c\u6709\u52a9\u4e8e\u51cf\u5c11\u5197\u4f59\uff0c\u589e\u5f3a\u8f93\u51fa\u591a\u6837\u6027\u3002\u5728\u516d\u4e2a\u5b9e\u9645\u573a\u666f\u4e2d\u7684\u5168\u9762\u8bc4\u4f30\u663e\u793a\uff0cCaptain Agent\u663e\u8457\u4f18\u4e8e\u73b0\u6709\u591a\u4ee3\u7406\u65b9\u6cd5\uff0c\u5e73\u5747\u51c6\u786e\u7387\u63d0\u9ad8\u4e8621.94%\uff0c\u5e76\u4e14\u65e0\u9700\u9488\u5bf9\u7279\u5b9a\u4efb\u52a1\u8fdb\u884c\u7e41\u7410\u7684\u63d0\u793a\u5de5\u7a0b\uff0c\u8868\u73b0\u51fa\u8272\u3002|\n", "2406.01422": "|**2024-06-03**|**How to Understand Whole Software Repository?**|Yingwei Ma et.al.|[2406.01422](http://arxiv.org/abs/2406.01422)|null|
\u8fd1\u671f\uff0c\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u4ee3\u7406\u5728\u81ea\u52a8\u8f6f\u4ef6\u5de5\u7a0b\uff08ASE\uff09\u9886\u57df\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\u3002\u5c3d\u7ba1\u73b0\u6709\u65b9\u6cd5\u5df2\u8bc1\u5b9e\u6709\u6548\uff0c\u4f46\u5b83\u4eec\u7684\u8bbe\u8ba1\u4e3b\u8981\u4fa7\u91cd\u4e8e\u4ee3\u7801\u7684\u5c40\u90e8\u4fe1\u606f\uff0c\u5982\u95ee\u9898\u3001\u7c7b\u548c\u51fd\u6570\uff0c\u8fd9\u9650\u5236\u4e86\u5bf9\u8f6f\u4ef6\u7cfb\u7edf\u5168\u5c40\u4e0a\u4e0b\u6587\u548c\u4f9d\u8d56\u5173\u7cfb\u7684\u7406\u89e3\u3002\u6839\u636e\u8f6f\u4ef6\u5f00\u53d1\u4eba\u5458\u7684\u5b9e\u9645\u7ecf\u9a8c\uff0c\u6211\u4eec\u8ba4\u4e3a\u5168\u9762\u7406\u89e3\u6574\u4e2a\u4ed3\u5e93\u662f\u8fc8\u5411ASE\u7684\u5173\u952e\u3002\u7136\u800c\uff0c\u7406\u89e3\u6574\u4e2a\u4ed3\u5e93\u5e26\u6765\u4e86\u8bf8\u591a\u6311\u6218\uff0c\u4f8b\u5982\uff1a\u957f\u4ee3\u7801\u8f93\u5165\u3001\u566a\u58f0\u4ee3\u7801\u4fe1\u606f\u3001\u590d\u6742\u4f9d\u8d56\u5173\u7cfb\u7b49\u3002 \u4e3a\u4e86\u514b\u670d\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u7814\u53d1\u4e86\u4e00\u79cd\u540d\u4e3aRepoUnderstander\u7684\u65b0ASE\u65b9\u6cd5\uff0c\u901a\u8fc7\u5f15\u5bfc\u4ee3\u7406\u5168\u9762\u7406\u89e3\u6574\u4e2a\u4ed3\u5e93\u3002\u9996\u5148\uff0c\u6211\u4eec\u91c7\u7528\u81ea\u4e0a\u800c\u4e0b\u7684\u65b9\u5f0f\u5c06\u6574\u4e2a\u4ed3\u5e93\u7684\u5173\u952e\u4fe1\u606f\u538b\u7f29\u5230\u77e5\u8bc6\u56fe\u8c31\u4e2d\uff0c\u4ee5\u964d\u4f4e\u590d\u6742\u6027\u3002\u63a5\u7740\uff0c\u6211\u4eec\u63d0\u51fa\u4e00\u79cd\u8499\u7279\u5361\u6d1b\u6811\u641c\u7d22\uff08Monte Carlo Tree Search, 
MCTS\uff09\u4e3a\u57fa\u7840\u7684\u4ed3\u5e93\u63a2\u7d22\u7b56\u7565\uff0c\u8d4b\u4e88\u4ee3\u7406\u7406\u89e3\u6574\u4e2a\u4ed3\u5e93\u7684\u80fd\u529b\u3002\u6b64\u5916\uff0c\u4e3a\u4e86\u66f4\u597d\u5730\u5229\u7528\u4ed3\u5e93\u7ea7\u522b\u7684\u77e5\u8bc6\uff0c\u6211\u4eec\u6307\u5bfc\u4ee3\u7406\u8fdb\u884c\u603b\u7ed3\u3001\u5206\u6790\u548c\u89c4\u5212\uff0c\u7136\u540e\u4ed6\u4eec\u53ef\u4ee5\u5229\u7528\u5de5\u5177\u52a8\u6001\u83b7\u53d6\u4fe1\u606f\u5e76\u751f\u6210\u4fee\u590d\u5b9e\u9645GitHub\u95ee\u9898\u7684\u8865\u4e01\u3002 \u5927\u91cf\u5b9e\u9a8c\u8868\u660e\uff0cRepoUnderstander\u5177\u6709\u4f18\u8d8a\u6027\u548c\u6709\u6548\u6027\u3002\u5728SWE-bench Lite\u57fa\u51c6\u6d4b\u8bd5\u4e2d\uff0c\u4e0eSWE-agent\u76f8\u6bd4\uff0c\u5b83\u5b9e\u73b0\u4e8618.5%\u7684\u76f8\u5bf9\u63d0\u5347\u3002|\n", "2406.01364": "|**2024-06-03**|**BELLS: A Framework Towards Future Proof Benchmarks for the Evaluation of LLM Safeguards**|Diego Dorn et.al.|[2406.01364](http://arxiv.org/abs/2406.01364)|null|\u8f93\u5165-\u8f93\u51fa\u5b89\u5168\u9632\u62a4\u673a\u5236\u88ab\u7528\u4e8e\u68c0\u6d4b\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7cfb\u7edf\u7684\u5f02\u5e38\u8f93\u51fa\u3002\u8fd9\u4e9b\u9632\u62a4\u63aa\u65bd\u5728\u5b9e\u65f6\u76d1\u63a7\u3001\u79bb\u7ebf\u8bc4\u4f30\u548c\u5185\u5bb9\u5ba1\u6838\u7b49\u5173\u952e\u5e94\u7528\u4e2d\u53d1\u6325\u6838\u5fc3\u4f5c\u7528\u3002\u7136\u800c\uff0c\u76ee\u524d\u7f3a\u4e4f\u7edf\u4e00\u7684\u8bc4\u4f30\u65b9\u6cd5\u6765\u8861\u91cf\u5b83\u4eec\u7684\u6027\u80fd\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u201c\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5b89\u5168\u9632\u62a4\u57fa\u51c6\u201d\uff08Benchmarks for the Evaluation of LLM Safeguards\uff0c\u7b80\u79f0BELLS\uff09\uff0c\u5b83\u662f\u4e00\u4e2a\u7ed3\u6784\u5316\u7684\u6d4b\u8bd5\u96c6\u5408\uff0c\u5206\u4e3a\u4e09\u4e2a\u7c7b\u522b\uff1a(1) 
\u5efa\u7acb\u6027\u6545\u969c\u6d4b\u8bd5\uff0c\u57fa\u4e8e\u5df2\u5b58\u5728\u7684\u9488\u5bf9\u660e\u786e\u6545\u969c\u6a21\u5f0f\u7684\u57fa\u51c6\uff0c\u65e8\u5728\u6bd4\u8f83\u5f53\u524d\u8f93\u5165-\u8f93\u51fa\u5b89\u5168\u9632\u62a4\u7684\u6548\u80fd\uff1b(2) \u65b0\u5174\u6545\u969c\u6d4b\u8bd5\uff0c\u7528\u4e8e\u8861\u91cf\u5bf9\u672a\u89c1\u8fc7\u7684\u6545\u969c\u6a21\u5f0f\u7684\u6cdb\u5316\u80fd\u529b\uff0c\u4ee5\u4fc3\u8fdb\u66f4\u901a\u7528\u9632\u62a4\u673a\u5236\u7684\u53d1\u5c55\uff1b(3) \u4e0b\u4e00\u4ee3\u67b6\u6784\u6d4b\u8bd5\uff0c\u9488\u5bf9\u66f4\u590d\u6742\u7684\u67b6\u6784\uff08\u5982LLM\u4ee3\u7406\u548c\u591a\u4ee3\u7406\u7cfb\u7edf\uff09\uff0c\u76ee\u6807\u662f\u63a8\u52a8\u9002\u7528\u4e8e\u672a\u6765\u5c1a\u672a\u5b58\u5728\u4e13\u95e8\u9632\u62a4\u7684\u5e94\u7528\u7684\u5b89\u5168\u9632\u62a4\u6280\u672f\u7684\u53d1\u5c55\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5b9e\u73b0\u4e86\u5e76\u5206\u4eab\u4e86\u7b2c\u4e00\u4e2a\u4e0b\u4e00\u4ee3\u67b6\u6784\u6d4b\u8bd5\uff0c\u4f7f\u7528MACHIAVELLI\u73af\u5883\uff0c\u5e76\u63d0\u4f9b\u4e86\u6570\u636e\u96c6\u7684\u4ea4\u4e92\u5f0f\u53ef\u89c6\u5316\u3002|\n", "2406.00936": "|**2024-06-03**|**A Survey of Useful LLM Evaluation**|Ji-Lun Peng 
et.al.|[2406.00936](http://arxiv.org/abs/2406.00936)|null|\u7531\u4e8e\u5927\u8bed\u8a00\u6a21\u578b\u5728\u5404\u4e2a\u7814\u7a76\u9886\u57df\u5c55\u73b0\u51fa\u5353\u8d8a\u7684\u6027\u80fd\uff0c\u5bf9\u5b83\u4eec\u7684\u80fd\u529b\u8bc4\u4f30\u65b9\u6cd5\u7684\u9700\u6c42\u65e5\u76ca\u589e\u957f\uff0c\u4ee5\u786e\u5b9a\u5176\u5408\u9002\u7684\u4efb\u52a1\u548c\u8d23\u4efb\u3002\u672c\u6587\u4e3b\u8981\u63a2\u8ba8\u5982\u4f55\u6709\u6548\u5730\u5229\u7528\u5927\u8bed\u8a00\u6a21\u578b\u4f5c\u4e3a\u5de5\u5177\uff0c\u5e76\u63d0\u51fa\u4e00\u4e2a\u4e24\u9636\u6bb5\u6846\u67b6\uff1a\u4ece\u201c\u6838\u5fc3\u80fd\u529b\u201d\u5230\u201c\u4ee3\u7406\u201d\u3002\u9996\u5148\uff0c\u6838\u5fc3\u80fd\u529b\u6307\u7684\u662f\u5927\u8bed\u8a00\u6a21\u578b\u751f\u6210\u9ad8\u8d28\u91cf\u6587\u672c\u6240\u5fc5\u9700\u7684\u7279\u6027\uff0c\u901a\u8fc7\u9a8c\u8bc1\u8fd9\u4e9b\u80fd\u529b\u540e\uff0c\u5b83\u4eec\u80fd\u591f\u5904\u7406\u73b0\u5b9e\u4e16\u754c\u7684\u590d\u6742\u4efb\u52a1\uff0c\u626e\u6f14\u4ee3\u7406\u89d2\u8272\u3002\u5728\u201c\u6838\u5fc3\u80fd\u529b\u201d\u9636\u6bb5\uff0c\u6211\u4eec\u8ba8\u8bba\u4e86\u5927\u8bed\u8a00\u6a21\u578b\u7684\u63a8\u7406\u80fd\u529b\u3001\u793e\u4f1a\u5f71\u54cd\u4ee5\u53ca\u9886\u57df\u77e5\u8bc6\u3002\u800c\u5728\u201c\u4ee3\u7406\u201d\u9636\u6bb5\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u5927\u8bed\u8a00\u6a21\u578b\u5728\u5177\u8eab\u884c\u52a8\u3001\u89c4\u5212\u548c\u5de5\u5177\u5b66\u4e60\u65b9\u9762\u7684\u5e94\u7528\u3002\u6700\u540e\uff0c\u6211\u4eec\u5206\u6790\u4e86\u5f53\u524d\u5927\u8bed\u8a00\u6a21\u578b\u8bc4\u4f30\u65b9\u6cd5\u9762\u4e34\u7684\u6311\u6218\uff0c\u5e76\u5c55\u671b\u4e86\u672a\u6765\u7684\u53d1\u5c55\u65b9\u5411\u3002|\n", "2406.01637": "|**2024-06-02**|**Teams of LLM Agents can Exploit Zero-Day Vulnerabilities**|Richard Fang 
et.al.|[2406.01637](http://arxiv.org/abs/2406.01637)|null|\u968f\u7740\u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u7f51\u7edc\u5b89\u5168\u9886\u57df\u7684\u590d\u6742\u6027\u4e0d\u65ad\u63d0\u9ad8\uff0c\u7814\u7a76\u8005\u53d1\u73b0\uff0c\u5f53\u63d0\u4f9b\u6f0f\u6d1e\u63cf\u8ff0\u548c\u7b80\u5355\u7684\u593a\u65d7\u95ee\u9898\u65f6\uff0c\u8fd9\u4e9b\u6a21\u578b\u80fd\u591f\u5229\u7528\u5b9e\u9645\u5b58\u5728\u7684\u6f0f\u6d1e\u3002\u7136\u800c\uff0c\u5bf9\u4e8e\u4e8b\u5148\u672a\u77e5\u7684\u96f6\u65e5\u6f0f\u6d1e\uff08\u5373\u653b\u51fb\u8005\u638c\u63e1\u800c\u5b89\u5168\u8f6f\u4ef6\u4f9b\u5e94\u5546\u8fd8\u672a\u4fee\u8865\u7684\u6f0f\u6d1e\uff09\uff0c\u5b83\u4eec\u7684\u8868\u73b0\u4ecd\u7136\u4e0d\u4f73\u3002\u672c\u6587\u5c55\u793a\u4e86\uff0c\u901a\u8fc7\u56e2\u961f\u5408\u4f5c\uff0c\u591a\u4e2aLLM\u4ee3\u7406\u53ef\u4ee5\u653b\u51fb\u73b0\u5b9e\u4e16\u754c\u7684\u96f6\u65e5\u6f0f\u6d1e\u3002\u5355\u72ec\u7684\u4ee3\u7406\u5728\u63a2\u7d22\u4f17\u591a\u6f0f\u6d1e\u548c\u8fdb\u884c\u957f\u671f\u89c4\u5212\u65f6\u9762\u4e34\u56f0\u96be\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86HPTSA\u7cfb\u7edf\uff0c\u5b83\u5305\u62ec\u4e00\u4e2a\u80fd\u8c03\u5ea6\u5b50\u4ee3\u7406\u7684\u8ba1\u5212\u4ee3\u7406\u3002\u8ba1\u5212\u4ee3\u7406\u8d1f\u8d23\u63a2\u7d22\u7cfb\u7edf\u5e76\u51b3\u5b9a\u4f7f\u7528\u54ea\u4e2a\u5b50\u4ee3\u7406\u6765\u5c1d\u8bd5\u4e0d\u540c\u7684\u6f0f\u6d1e\uff0c\u4ece\u800c\u89e3\u51b3\u4e86\u957f\u671f\u89c4\u5212\u7684\u95ee\u9898\u3002\u6211\u4eec\u5728\u4e00\u4e2a\u5305\u542b15\u4e2a\u771f\u5b9e\u4e16\u754c\u6f0f\u6d1e\u7684\u57fa\u51c6\u4e0a\u8fdb\u884c\u4e86\u5b9e\u9a8c\uff0c\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u4ee3\u7406\u56e2\u961f\u6bd4\u5148\u524d\u7684\u5de5\u4f5c\u63d0\u9ad8\u4e864.5\u500d\u3002|\n", "2406.00583": "|**2024-06-02**|**CMDBench: A Benchmark for Coarse-to-fine Multimodal Data Discovery in Compound AI Systems**|Yanlin Feng 
et.al.|[2406.00583](http://arxiv.org/abs/2406.00583)|**[link](https://github.com/megagonlabs/CMDBench)**|\u5728\u6570\u636e\u5e93\u548c\u4eba\u5de5\u667a\u80fd\u9886\u57df\uff0c\u590d\u5408\u4eba\u5de5\u667a\u80fd\u7cfb\u7edf\uff08Compound Artificial Intelligence Systems\uff0cCAS\uff09\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08Large Language Models\uff0cLLMs\uff09\u4f5c\u4e3a\u4ee3\u7406\uff0c\u901a\u8fc7\u4e0e\u5de5\u5177\u548c\u6570\u636e\u68c0\u7d22\u5668\u4ea4\u4e92\u6765\u6267\u884c\u77e5\u8bc6\u5bc6\u96c6\u578b\u4efb\u52a1\uff0c\u5f15\u8d77\u4e86\u5e7f\u6cdb\u5173\u6ce8\u3002\u5c3d\u7ba1\u8fd9\u4e9b\u7cfb\u7edf\u6709\u53ef\u80fd\u589e\u5f3a\u4f01\u4e1a\u6570\u636e\u5e73\u53f0\u4e2d\u6570\u636e\u5206\u6790\u5e08\u7684\u4e00\u822c\u5206\u6790\u6d41\u7a0b\uff0c\u4f46CAS\u9762\u4e34\u7740\u4e0e\u5206\u6790\u5e08\u76f8\u4f3c\u7684\u6570\u636e\u53d1\u73b0\u6311\u6218\uff1a\u7ec4\u7ec7\u5185\u90e8\u4e0d\u540c\u56e2\u961f\u548c\u90e8\u95e8\u521b\u5efa\u7684\u591a\u6a21\u6001\u6570\u636e\u6e90\u5b64\u7acb\uff0c\u8fd9\u4f7f\u5f97\u5bfb\u627e\u5b8c\u6210\u5f53\u524d\u4efb\u52a1\u6240\u9700\u5408\u9002\u6570\u636e\u6e90\u53d8\u5f97\u56f0\u96be\u3002\u73b0\u6709\u7684\u6570\u636e\u53d1\u73b0\u57fa\u51c6\u5e76\u672a\u5145\u5206\u6a21\u62df\u8fd9\u79cd\u591a\u6a21\u6001\u548c\u6570\u636e\u6e90\u7684\u591a\u6837\u6027\u3002\u6b64\u5916\uff0cCAS\u7684\u73b0\u6709\u57fa\u51c6\u4e3b\u8981\u5173\u6ce8\u7aef\u5230\u7aef\u4efb\u52a1\u6027\u80fd\u8bc4\u4f30\uff0c\u800c\u5ffd\u89c6\u4e86\u6570\u636e\u53d1\u73b0\u6027\u80fd\u3002 
\u4e3a\u4e86\u63a8\u52a8\u5728\u73b0\u5b9e\u4e16\u754c\u73af\u5883\u4e2d\u5bf9\u591a\u6a21\u6001\u6570\u636e\u68c0\u7d22\u5668\u5728CAS\u4e2d\u7684\u6570\u636e\u53d1\u73b0\u6027\u80fd\u7814\u7a76\uff0c\u6211\u4eec\u63d0\u51fa\u4e86CMDBench\uff0c\u4e00\u4e2a\u65e8\u5728\u6a21\u62df\u4f01\u4e1a\u6570\u636e\u5e73\u53f0\u590d\u6742\u6027\u7684\u57fa\u51c6\u3002\u6211\u4eec\u6539\u7f16\u4e86\u5f00\u653e\u9886\u57df\u7684\u73b0\u6709\u6570\u636e\u96c6\u548c\u57fa\u51c6\uff0c\u5982\u95ee\u7b54\u3001\u590d\u6742\u63a8\u7406\u4ee5\u53ca\u81ea\u7136\u8bed\u8a00\u67e5\u8be2\u7ed3\u6784\u5316\u6570\u636e\uff0c\u6765\u8bc4\u4f30\u7c97\u7c92\u5ea6\u548c\u7ec6\u7c92\u5ea6\u7684\u6570\u636e\u53d1\u73b0\u4ee5\u53ca\u4efb\u52a1\u6267\u884c\u6027\u80fd\u3002 \u6211\u4eec\u7684\u5b9e\u9a8c\u63ed\u793a\u4e86\u6570\u636e\u68c0\u7d22\u5668\u8bbe\u8ba1\u5bf9\u4e0b\u6e38\u4efb\u52a1\u6027\u80fd\u7684\u5f71\u54cd\u2014\u2014\u5e73\u5747\u60c5\u51b5\u4e0b\uff0c\u4efb\u52a1\u51c6\u786e\u7387\u4e0b\u964d\u4e8646%\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u9700\u8981\u5f00\u53d1\u4f18\u5316\u7b56\u7565\u6765\u786e\u5b9a\u5408\u9002\u7684LLM\u4ee3\u7406\u548c\u68c0\u7d22\u5668\uff0c\u4ee5\u63d0\u9ad8\u5728\u4f01\u4e1a\u6570\u636e\u4e0a\u9ad8\u6548\u6267\u884cCAS\u7684\u80fd\u529b\u3002 \u603b\u4e4b\uff0cCMDBench\u662f\u4e00\u4e2a\u65e8\u5728\u4fc3\u8fdb\u9488\u5bf9\u4f01\u4e1a\u6570\u636e\u5e73\u53f0\u590d\u6742\u6027\u8fdb\u884c\u7814\u7a76\u7684\u65b0\u5de5\u5177\uff0c\u5b83\u901a\u8fc7\u7efc\u5408\u8bc4\u4f30\u6570\u636e\u53d1\u73b0\u548c\u4efb\u52a1\u6267\u884c\u80fd\u529b\uff0c\u4e3a\u6539\u8fdb\u591a\u6a21\u6001\u6570\u636e\u68c0\u7d22\u5668\u5728\u590d\u5408\u4eba\u5de5\u667a\u80fd\u7cfb\u7edf\u4e2d\u7684\u6027\u80fd\u63d0\u4f9b\u4e86\u4e00\u4e2a\u6709\u4ef7\u503c\u7684\u6846\u67b6\u3002|\n", "2406.00244": "|**2024-06-01**|**Controlling Large Language Model Agents with Entropic Activation Steering**|Nate Rahn 
et.al.|[2406.00244](http://arxiv.org/abs/2406.00244)|null|\u968f\u7740\u5927\u89c4\u6a21\u9884\u8bad\u7ec3\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u666e\u904d\u9002\u7528\u6027\u63d0\u5347\uff0c\u4eba\u4eec\u5bf9\u5176\u7528\u4f5c\u57fa\u4e8e\u4e0a\u4e0b\u6587\u7684\u5b66\u4e60\u4ee3\u7406\u7684\u5174\u8da3\u65e5\u76ca\u589e\u957f\u3002\u5728\u8fd9\u4e9b\u60c5\u5883\u4e0b\uff0c\u6a21\u578b\u9700\u8981\u6839\u636e\u4e0e\u73af\u5883\u7684\u6709\u9650\u4ea4\u4e92\u5f62\u6210\u76ee\u6807\u5b9e\u73b0\u7b56\u7565\u7684\u4fe1\u5ff5\uff0c\u5e76\u5728\u6bcf\u4e00\u6b65\u51b3\u7b56\u4e2d\u5904\u7406\u4e0d\u786e\u5b9a\u6027\u3002\u672c\u6587\u9488\u5bf9\u8fd9\u4e00\u95ee\u9898\u8fdb\u884c\u7814\u7a76\uff0c\u901a\u8fc7\u63a7\u5236\u7684\u5e8f\u5217\u51b3\u7b56\u4efb\u52a1\u5b9e\u9a8c\u63a2\u8ba8LLMs\u5982\u4f55\u5f62\u6210\u548c\u8fd0\u7528\u8fd9\u4e9b\u4fe1\u5ff5\u3002 \u9996\u5148\uff0c\u6211\u4eec\u53d1\u73b0LLM\u6a21\u578b\u8fc7\u4e8e\u81ea\u4fe1\uff1a\u5b83\u4eec\u5728\u7f3a\u4e4f\u5145\u5206\u8bc1\u636e\u7684\u60c5\u51b5\u4e0b\u5c31\u5bf9\u884c\u52a8\u505a\u51fa\u5f3a\u70c8\u5224\u65ad\uff0c\u5bfc\u81f4\u63a2\u7d22\u884c\u4e3a\u4e0d\u8db3\u3002\u8fdb\u4e00\u6b65\u6df1\u5165\u5206\u6790\u63ed\u793a\uff0c\u8fd9\u79cd\u73b0\u8c61\u6e90\u4e8e\u4eceLLM\u91c7\u6837\u5f97\u5230\u7684\u52a8\u4f5c\u5206\u5e03\u71b5\u7684\u584c\u7f29\u3002\u63a5\u7740\uff0c\u6211\u4eec\u6307\u51fa\u73b0\u6709\u7684\u57fa\u4e8e\u4ee4\u724c\u7684\u91c7\u6837\u65b9\u6cd5\u672c\u8eab\u4e0d\u8db3\u4ee5\u4fc3\u4f7f\u6a21\u578b\u66f4\u5e7f\u6cdb\u63a2\u7d22\u3002 \u9274\u4e8e\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u71b5\u6fc0\u6d3b\u5bfc\u5411\uff08Entropic Activation 
Steering\uff0cEAST\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u9488\u5bf9\u5728\u4e0a\u4e0b\u6587\u4e2d\u7684LLM\u4ee3\u7406\u7684\u6fc0\u6d3b\u5bfc\u5411\u65b9\u6cd5\u3002EAST\u8ba1\u7b97\u4e00\u4e2a\u4ee5\u71b5\u4e3a\u6743\u91cd\u7684\u8868\u793a\u7ec4\u5408\uff0c\u901a\u8fc7\u5728\u524d\u5411\u4f20\u64ad\u8fc7\u7a0b\u4e2d\u5e72\u9884\u6a21\u578b\u7684\u6fc0\u6d3b\uff0c\u6765\u8c03\u6574\u6a21\u578b\u5bf9\u52a8\u4f5c\u7684\u4e0d\u786e\u5b9a\u6027\uff0c\u4ece\u800c\u4fc3\u8fdb\u63a2\u7d22\u884c\u4e3a\u7684\u51fa\u73b0\u3002\u6700\u540e\uff0cEAST\u6539\u53d8\u4e86LLM\u5728\u51b3\u7b56\u65f6\u8868\u8fbe\u7684\u4e3b\u89c2\u4e0d\u786e\u5b9a\u6027\uff0c\u4e3a\u7406\u89e3\u548c\u63a7\u5236\u6a21\u578b\u5bf9\u51b3\u7b56\u4e0d\u786e\u5b9a\u6027\u7684\u8868\u5f81\u63d0\u4f9b\u4e86\u9014\u5f84\u3002|\n", "2406.00222": "|**2024-05-31**|**Learning to Clarify: Multi-turn Conversations with Action-Based Contrastive Self-Training**|Maximillian Chen et.al.|[2406.00222](http://arxiv.org/abs/2406.00222)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u901a\u8fc7\u4eba\u7c7b\u53cd\u9988\u7684\u5f3a\u5316\u5b66\u4e60\uff08RLHF\uff09\u5df2\u7ecf\u8fc5\u901f\u6210\u4e3a\u6784\u5efa\u667a\u80fd\u5bf9\u8bdd\u52a9\u624b\u7684\u4e3b\u8981\u65b9\u6cd5\u3002\u7136\u800c\uff0c\u5c3d\u7ba1\u5728\u591a\u4e2a\u57fa\u51c6\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u5728\u8bf8\u5982\u6b67\u4e49\u5904\u7406\u7b49\u5bf9\u8bdd\u6280\u80fd\u4e0a\u4ecd\u6709\u6b20\u7f3a\uff1a\u5f53\u901a\u7528\u52a9\u624b\u9047\u5230\u6a21\u7cca\u60c5\u51b5\u65f6\uff0c\u5b83\u4eec\u5f80\u5f80\u8fc7\u5ea6\u8c28\u614e\u6216\u731c\u6d4b\u7528\u6237\u7684\u771f\u6b63\u610f\u56fe\uff0c\u800c\u4e0d\u662f\u63d0\u95ee\u4ee5\u6c42\u6f84\u6e05\uff0c\u800c\u5728\u7279\u5b9a\u4efb\u52a1\u573a\u666f\u4e0b\uff0c\u9ad8\u8d28\u91cf\u5bf9\u8bdd\u6837\u672c\u5f80\u5f80\u6709\u9650\uff0c\u5f71\u54cd\u6a21\u578b\u5b66\u4e60\u6700\u4f18\u5bf9\u8bdd\u884c\u4e3a\u7b56\u7565\u7684\u80fd\u529b\u3002\u6211\u4eec\u63d
0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aAction-Based Contrastive Self-Training\uff08ACT\uff09\u7684\u8fd1\u4f3c\u5728\u7ebf\u504f\u597d\u4f18\u5316\u7b97\u6cd5\uff0c\u5b83\u57fa\u4e8eDirect Preference Optimization\uff08DPO\uff09\uff0c\u65e8\u5728\u5b9e\u73b0\u5728\u591a\u8f6e\u5bf9\u8bdd\u4e2d\u7684\u6837\u672c\u9ad8\u6548\u5bf9\u8bdd\u7b56\u7565\u5b66\u4e60\u3002 \u6211\u4eec\u5728\u4e09\u4e2a\u5177\u6709\u6311\u6218\u6027\u7684\u5bf9\u8bdd\u4efb\u52a1\u4e2d\u9a8c\u8bc1\u4e86ACT\u7684\u6709\u6548\u6027\uff1a\u57fa\u4e8e\u8868\u683c\u7684\u95ee\u7b54\u3001\u673a\u5668\u9605\u8bfb\u7406\u89e3\uff0c\u4ee5\u53caAmbigSQL\uff0c\u8fd9\u662f\u4e00\u4e2a\u9488\u5bf9\u6587\u672c\u5230SQL\u751f\u6210\u7684\u4fe1\u606f\u5bfb\u6c42\u8bf7\u6c42\u6b67\u4e49\u89e3\u51b3\u7684\u65b0\u4efb\u52a1\u3002\u6b64\u5916\uff0c\u6211\u4eec\u63d0\u8bae\u901a\u8fc7\u8bc4\u4f30LLMs\u80fd\u5426\u5728\u5bf9\u8bdd\u4e2d\u8bc6\u522b\u548c\u63a8\u7406\u6b67\u4e49\u6765\u8861\u91cf\u5176\u4f5c\u4e3a\u5bf9\u8bdd\u4ee3\u7406\u7684\u80fd\u529b\u3002ACT\u5728\u4e0e\u6807\u51c6\u76d1\u7763\u5fae\u8c03\u548cDPO\u65b9\u6cd5\u76f8\u6bd4\u65f6\uff0c\u663e\u793a\u51fa\u4e86\u663e\u8457\u7684\u5bf9\u8bdd\u5efa\u6a21\u6539\u8fdb\u3002|\n", "2406.00215": "|**2024-05-31**|**Benchmarking the Communication Competence of Code Generation for LLMs and LLM Agent**|Jie JW Wu 
et.al.|[2406.00215](http://arxiv.org/abs/2406.00215)|**[link](https://github.com/jie-jw-wu/human-eval-comm)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u4e2d\u7684\u6027\u80fd\u663e\u8457\u63d0\u5347\uff0c\u4f46\u4ecd\u4e0e\u9876\u7ea7\u8f6f\u4ef6\u5de5\u7a0b\u5e08\u7684\u6c34\u5e73\u5b58\u5728\u5dee\u8ddd\u3002\u9274\u4e8e\u9876\u7ea7\u8f6f\u4ef6\u5de5\u7a0b\u5e08\u5e38\u901a\u8fc7\u63d0\u95ee\u6765\u6d88\u9664\u9700\u6c42\u548c\u7f16\u7801\u89e3\u51b3\u65b9\u6848\u4e2d\u7684\u6a21\u7cca\u6027\uff0c\u6211\u4eec\u63d0\u51fa\u5bf9\u4e8eLLMs\u8fdb\u884c\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u65f6\u4e5f\u5e94\u5177\u5907\u7c7b\u4f3c\u7684\u6c9f\u901a\u80fd\u529b\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u8fdb\u884c\u4e86\u5b9e\u8bc1\u7814\u7a76\uff0c\u5173\u6ce8LLMs\u7684\u6c9f\u901a\u6280\u80fd\uff0c\u5373\u201c\u5728\u4ee3\u7801\u751f\u6210\u95ee\u9898\u63cf\u8ff0\u5b58\u5728\u95ee\u9898\u65f6\u80fd\u63d0\u51fa\u6f84\u6e05\u95ee\u9898\u201d\u3002 \u6211\u4eec\u521b\u5efa\u4e86\u4e00\u4e2a\u65b0\u7684\u57fa\u51c6\u6d4b\u8bd5\uff0c\u540d\u4e3aHumanEvalComm\uff0c\u901a\u8fc7\u4fee\u6539\u95ee\u9898\u63cf\u8ff0\uff0c\u5f15\u5165\u4e86\u4e0d\u4e00\u81f4\u6027\u3001\u6a21\u7cca\u6027\u548c\u4e0d\u5b8c\u6574\u6027\u4e09\u4e2a\u95ee\u9898\u7ef4\u5ea6\u3002\u6211\u4eec\u5b9a\u4e49\u4e86\u65b0\u7684\u8bc4\u4f30\u6307\u6807\uff0c\u5982\u901a\u4fe1\u7387\u548c\u826f\u597d\u95ee\u9898\u7387\uff0c\u5e76\u5728HumanEvalComm\u4e0a\u5bf9\u4e0d\u540c\u7c7b\u578b\u7684Code LLM\uff08\u4ee3\u7801\u8bed\u8a00\u6a21\u578b\uff09\u4ee5\u53ca\u4e00\u79cd\u65b0\u578bLLM\u4ee3\u7406\u65b9\u6cd5\uff08Okanagan\uff09\u8fdb\u884c\u4e86\u5b9e\u9a8c\uff0c\u8be5\u65b9\u6cd5\u65e8\u5728\u4ece\u4ee3\u7801\u548c\u63cf\u8ff0\u4e2d\u8bc6\u522b\u5e76\u63d0\u95ee\uff0c\u4ee5\u8fdb\u4e00\u6b65\u4f18\u5316\u751f\u6210\u7684\u4ee3\u7801\u3002\u6700\u540e\uff0c\u6211\u4eec\u901a\u8fc7\u6bd4\u8f83Code 
LLMs\u548cOkanagan\u7684\u8868\u73b0\uff0c\u8ba8\u8bba\u4e86\u5b9e\u9a8c\u7ed3\u679c\u3002|\n", "2406.03299": "|**2024-06-05**|**The Good, the Bad, and the Hulk-like GPT: Analyzing Emotional Decisions of Large Language Models in Cooperation and Bargaining Games**|Mikhail Mozikov et.al.|[2406.03299](http://arxiv.org/abs/2406.03299)|null|\u884c\u4e3a\u7814\u7a76\u5b9e\u9a8c\u5728\u793e\u4f1a\u6a21\u578b\u548c\u7406\u89e3\u4eba\u9645\u4e92\u52a8\u4e2d\u5360\u636e\u91cd\u8981\u5730\u4f4d\u3002\u7136\u800c\uff0c\u5b9e\u9645\u64cd\u4f5c\u4e2d\u8fd9\u7c7b\u5b9e\u9a8c\u5e38\u9762\u4e34\u5185\u5728\u6548\u5ea6\u3001\u5916\u5728\u6548\u5ea6\u3001\u53ef\u91cd\u590d\u6027\u548c\u793e\u4f1a\u504f\u89c1\u7b49\u6311\u6218\uff0c\u56e0\u4e3a\u4eba\u7c7b\u7684\u793e\u4f1a\u4e92\u52a8\u4e0e\u5408\u4f5c\u590d\u6742\u3002\u8fd1\u5e74\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u8fdb\u6b65\u4e3a\u7814\u7a76\u8005\u63d0\u4f9b\u4e86\u4e00\u79cd\u65b0\u7684\u6a21\u62df\u4eba\u7c7b\u884c\u4e3a\u7684\u5de5\u5177\u3002\u4f46\u73b0\u6709\u57fa\u4e8eLLM\u7684\u6a21\u62df\u5047\u8bbe\u6a21\u578b\u7684\u884c\u4e3a\u4e0e\u4eba\u7c7b\u76f8\u4f3c\uff0c\u5374\u5ffd\u89c6\u4e86\u5f71\u54cd\u4eba\u7c7b\u51b3\u7b56\u7684\u5173\u952e\u56e0\u7d20\u2014\u2014\u60c5\u7eea\u3002\u672c\u6587\u63d0\u51fa\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\u8bba\u548c\u6846\u67b6\uff0c\u65e8\u5728\u63a2\u8ba8LLMs\u7684\u51b3\u7b56\u5236\u5b9a\u53ca\u5176\u5728\u60c5\u7eea\u72b6\u6001\u4e0b\u7684\u884c\u4e3a\u4e0e\u4eba\u7c7b\u884c\u4e3a\u7684\u5951\u5408\u5ea6\u3002 
\u901a\u8fc7\u5728\u4e24\u79cd\u4e0d\u540c\u7c7b\u578b\u7684\u884c\u4e3a\u7ecf\u6d4e\u5b66\u6e38\u620f\uff08\u535a\u5f08\u8bba\u5b9e\u9a8c\uff09\u4e2d\u4f7f\u7528GPT-3.5\u548cGPT-4\uff0c\u6211\u4eec\u53d1\u73b0\u60c5\u7eea\u5bf9LLMs\u7684\u8868\u73b0\u6709\u663e\u8457\u5f71\u54cd\uff0c\u4fc3\u4f7f\u5b83\u4eec\u53d1\u5c55\u51fa\u66f4\u4f18\u5316\u7684\u7b56\u7565\u3002\u5c3d\u7ba1GPT-3.5\u4e0e\u4eba\u7c7b\u53c2\u4e0e\u8005\u7684\u884c\u52a8\u6a21\u5f0f\u6709\u8f83\u5f3a\u7684\u5bf9\u5e94\uff0c\u5c24\u5176\u662f\u5728\u8ba8\u4ef7\u8fd8\u4ef7\u6e38\u620f\u4e2d\uff0c\u4f46GPT-4\u5c55\u73b0\u51fa\u4e00\u81f4\u7684\u884c\u4e3a\uff0c\u5bf9\u4e8e\u60c5\u7eea\u8bf1\u5bfc\u7684\u7406\u6027\u51b3\u7b56\u4f3c\u4e4e\u4e0d\u53d7\u5f71\u54cd\u3002\u4ee4\u4eba\u610f\u5916\u7684\u662f\uff0c\u60c5\u7eea\u63d0\u793a\uff0c\u7279\u522b\u662f\u6124\u6012\u60c5\u7eea\uff0c\u80fd\u591f\u6253\u7834GPT-4\u7684\u201c\u8d85\u4eba\u201d\u4e00\u81f4\u6027\uff0c\u4f7f\u5176\u53cd\u5e94\u66f4\u63a5\u8fd1\u4eba\u7c7b\u7684\u60c5\u7eea\u53cd\u5e94\u3002|\n", "2406.03007": "|**2024-06-05**|**BadAgent: Inserting and Activating Backdoor Attacks in LLM Agents**|Yifei Wang 
et.al.|[2406.03007](http://arxiv.org/abs/2406.03007)|**[link](https://github.com/dpamk/badagent)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u7e41\u8363\uff0c\u57fa\u4e8e\u8bad\u7ec3\u597d\u7684LLMs\u5e76\u901a\u8fc7\u7279\u5b9a\u4efb\u52a1\u6570\u636e\u5fae\u8c03\u7684\u5f3a\u5927\u667a\u80fd\u4ee3\u7406\u5df2\u5f00\u53d1\u51fa\u6765\uff0c\u63d0\u4f9b\u5b9a\u5236\u670d\u52a1\u3002\u5f53\u524d\u6700\u5148\u8fdb\u7684\u6784\u5efaLLM\u4ee3\u7406\u7684\u65b9\u6cd5\u662f\u4f7f\u7528\u9884\u8bad\u7ec3\u6a21\u578b\uff0c\u5e76\u9488\u5bf9\u4efb\u52a1\u8fdb\u884c\u8fdb\u4e00\u6b65\u8c03\u6574\u3002\u7136\u800c\uff0c\u6211\u4eec\u63ed\u793a\u4e86\u8fd9\u4e9b\u65b9\u6cd5\u6613\u53d7\u540d\u4e3aBadAgent\u7684\u65b0\u578b\u540e\u95e8\u653b\u51fb\uff0c\u8be5\u653b\u51fb\u901a\u8fc7\u5728\u540e\u95e8\u6570\u636e\u4e0a\u5fae\u8c03\u5728\u5404\u79cd\u4ee3\u7406\u4efb\u52a1\u4e2d\u690d\u5165\u540e\u95e8\u3002\u5728\u6d4b\u8bd5\u65f6\uff0c\u653b\u51fb\u8005\u53ef\u4ee5\u901a\u8fc7\u5728\u8f93\u5165\u6216\u73af\u5883\u4e2d\u663e\u793a\u89e6\u53d1\u5668\uff0c\u64cd\u7eb5\u90e8\u7f72\u7684LLM\u4ee3\u7406\u6267\u884c\u6709\u5bb3\u64cd\u4f5c\u3002\u4ee4\u4eba\u60ca\u8bb6\u7684\u662f\uff0c\u6211\u4eec\u7684\u653b\u51fb\u65b9\u6cd5\u5373\u4f7f\u5728\u4fe1\u4efb\u7684\u6570\u636e\u4e0a\u8fdb\u884c\u5fae\u8c03\u540e\u4ecd\u8868\u73b0\u51fa\u6781\u9ad8\u7684\u9c81\u68d2\u6027\u3002\u5c3d\u7ba1\u540e\u95e8\u653b\u51fb\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u9886\u57df\u5df2\u5e7f\u6cdb\u7814\u7a76\uff0c\u4f46\u636e\u6211\u4eec\u6240\u77e5\uff0c\u6211\u4eec\u53ef\u80fd\u662f\u7b2c\u4e00\u4e2a\u7814\u7a76\u5728\u6743\u9650\u66f4\u5927\u7684LLM\u4ee3\u7406\u4e0a\u7684\u653b\u51fb\uff0c\u8fd9\u4e9b\u4ee3\u7406\u53ef\u4ee5\u4f7f\u7528\u5916\u90e8\u5de5\u5177\uff0c\u56e0\u6b64\u66f4\u5177\u5a01\u80c1\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u660e\u786e\u6307\u51fa\u4e86\u57fa\u4e8e\u4e0d\u4fe1\u4efb\u7684LLM\u6216\u6570\u636e\u6784\u5efaLLM\u4ee3\u7406\u7684\u98ce\u9669\u
3002\u6211\u4eec\u7684\u4ee3\u7801\u5df2\u516c\u5f00\u5728\uff1a[https://github.com/DPamK/BadAgent](https://github.com/DPamK/BadAgent)\u3002**|\n", "2406.04151": "|**2024-06-06**|**AgentGym: Evolving Large Language Model-based Agents across Diverse Environments**|Zhiheng Xi et.al.|[2406.04151](http://arxiv.org/abs/2406.04151)|**[link](https://github.com/woooodyy/agentgym)**|**\u5728\u4eba\u5de5\u667a\u80fd\u9886\u57df\uff0c\u5efa\u7acb\u80fd\u591f\u5904\u7406\u5404\u79cd\u4efb\u52a1\u5e76\u5728\u4e0d\u540c\u73af\u5883\u4e2d\u81ea\u6211\u8fdb\u5316\u7684\u6cdb\u5316\u578b\u4ee3\u7406\u662f\u4e00\u4e2a\u957f\u671f\u76ee\u6807\u3002\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u56e0\u5176\u901a\u7528\u80fd\u529b\u88ab\u8ba4\u4e3a\u662f\u5b9e\u73b0\u8fd9\u4e00\u76ee\u6807\u7684\u6709\u524d\u666f\u7684\u57fa\u7840\u3002\u5f53\u524d\u7684\u65b9\u6cd5\u8981\u4e48\u4f9d\u8d56\u4e8e\u4eba\u7c7b\u76d1\u7763\uff0c\u8ba9LLM\u4ee3\u7406\u9010\u6b65\u6a21\u4eff\u4e13\u5bb6\u63d0\u4f9b\u7684\u8f68\u8ff9\uff0c\u96be\u4ee5\u5927\u89c4\u6a21\u6269\u5c55\u4e14\u9650\u5236\u4e86\u73af\u5883\u63a2\u7d22\uff1b\u8981\u4e48\u8ba9\u4ee3\u7406\u5728\u5b64\u7acb\u73af\u5883\u4e2d\u63a2\u7d22\u5b66\u4e60\uff0c\u5bfc\u81f4\u4e13\u957f\u6709\u9650\u3001\u7f3a\u4e4f\u6cdb\u5316\u80fd\u529b\u3002\u672c\u6587\u9996\u6b21\u5c1d\u8bd5\u6784\u5efa\u5177\u5907\u81ea\u6211\u8fdb\u5316\u80fd\u529b\u7684\u901a\u7528LLM\u4ee3\u7406\u3002\u6211\u4eec\u63d0\u51fa\u4e09\u4e2a\u5173\u952e\u8981\u7d20\uff1a1\uff09\u591a\u6837\u7684\u73af\u5883\u4ee5\u652f\u6301\u4ee3\u7406\u63a2\u7d22\u548c\u5b66\u4e60\uff1b2\uff09\u4e00\u5957\u8f68\u8ff9\u6765\u8d4b\u4e88\u4ee3\u7406\u57fa\u672c\u80fd\u529b\u548c\u5148\u9a8c\u77e5\u8bc6\uff1b3\uff09\u6709\u6548\u4e14\u53ef\u6269\u5c55\u7684\u8fdb\u5316\u65b9\u6cd5\u3002 
\u6211\u4eec\u63d0\u51fa\u4e86AgentGym\uff0c\u4e00\u4e2a\u65b0\u6846\u67b6\uff0c\u5b83\u5305\u542b\u4e30\u5bcc\u7684\u73af\u5883\u548c\u4efb\u52a1\uff0c\u652f\u6301\u5168\u9762\u3001\u5b9e\u65f6\u3001\u7edf\u4e00\u683c\u5f0f\u548c\u5e76\u53d1\u7684\u4ee3\u7406\u63a2\u7d22\u3002AgentGym\u8fd8\u5305\u62ec\u4e00\u4e2a\u6269\u5c55\u6307\u4ee4\u7684\u6570\u636e\u5e93\u3001\u57fa\u51c6\u6d4b\u8bd5\u5957\u4ef6\u4ee5\u53ca\u8de8\u73af\u5883\u7684\u9ad8\u8d28\u91cf\u8f68\u8ff9\u3002\u63a5\u7740\uff0c\u6211\u4eec\u5f00\u53d1\u4e86AgentEvol\uff0c\u8fd9\u662f\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u65e8\u5728\u7814\u7a76\u4ee3\u7406\u5728\u8d85\u8d8a\u65e2\u5b9a\u6570\u636e\uff0c\u8de8\u8d8a\u4efb\u52a1\u548c\u73af\u5883\u65f6\u7684\u81ea\u6211\u8fdb\u5316\u6f5c\u529b\u3002 \u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u8fdb\u5316\u540e\u7684\u4ee3\u7406\u53ef\u4ee5\u8fbe\u5230\u4e0e\u6700\u5148\u8fdb\u7684\u6a21\u578b\u76f8\u5f53\u7684\u6027\u80fd\u3002\u6211\u4eec\u53d1\u5e03\u4e86AgentGym\u5957\u4ef6\uff0c\u5305\u62ec\u5e73\u53f0\u3001\u6570\u636e\u96c6\u3001\u57fa\u51c6\u3001\u68c0\u67e5\u70b9\u548c\u7b97\u6cd5\u5b9e\u73b0\u3002AgentGym\u5957\u4ef6\u5df2\u5728\u5176\u5b98\u65b9\u7f51\u7ad9https://github.com/WooooDyy/AgentGym\u4e0a\u63d0\u4f9b\u3002**|\n", "2406.04692": "|**2024-06-07**|**Mixture-of-Agents Enhances Large Language Model Capabilities**|Junlin Wang 
et.al.|[2406.04692](http://arxiv.org/abs/2406.04692)|null|\u8fd1\u671f\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u8fdb\u5c55\u663e\u8457\uff0c\u5c55\u73b0\u51fa\u5728\u81ea\u7136\u8bed\u8a00\u7406\u89e3\u548c\u751f\u6210\u4efb\u52a1\u4e2d\u7684\u5f3a\u5927\u80fd\u529b\u3002\u968f\u7740LLMs\u7684\u589e\u591a\uff0c\u5982\u4f55\u6709\u6548\u6574\u5408\u591a\u6a21\u578b\u7684\u77e5\u8bc6\u6210\u4e3a\u4e86\u4e00\u4e2a\u4ee4\u4eba\u632f\u594b\u7684\u7814\u7a76\u65b9\u5411\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\u2014\u2014\u6df7\u5408\u4ee3\u7406\uff08Mixture-of-Agents\uff0cMoA\uff09\u65b9\u6cd5\u3002\u5728\u6211\u4eec\u7684\u67b6\u6784\u4e2d\uff0cMoA\u91c7\u7528\u4e86\u5206\u5c42\u8bbe\u8ba1\uff0c\u6bcf\u5c42\u5305\u542b\u591a\u4e2aLLM\u4ee3\u7406\u3002\u6bcf\u4e2a\u4ee3\u7406\u5728\u751f\u6210\u54cd\u5e94\u65f6\uff0c\u4f1a\u5229\u7528\u524d\u4e00\u5c42\u6240\u6709\u4ee3\u7406\u7684\u8f93\u51fa\u4f5c\u4e3a\u8f85\u52a9\u4fe1\u606f\u3002\u901a\u8fc7\u8fd9\u79cd\u7b56\u7565\uff0cMoA\u6a21\u578b\u5728AlpacaEval 2.0\u3001MT-Bench\u548cFLASK\u7b49\u591a\u4e2a\u8bc4\u4f30\u57fa\u51c6\u4e0a\u5b9e\u73b0\u4e86\u6700\u5148\u8fdb\u7684\u6027\u80fd\uff0c\u8d85\u8d8a\u4e86GPT-4\u5168\u80fd\u7248\u3002\u4f8b\u5982\uff0c\u4ec5\u4f7f\u7528\u5f00\u6e90LLMs\u7684\u6211\u4eec\u7684MoA\u6a21\u578b\u5728AlpacaEval 2.0\u4e0a\u7684\u5f97\u5206\u9886\u5148\uff0c\u8fbe\u523065.1%\uff0c\u800cGPT-4\u5168\u80fd\u7248\u7684\u6210\u7ee9\u4e3a57.5%\u3002|\n", "2406.06464": "|**2024-06-11**|**Transforming Wearable Data into Health Insights using Large Language Model Agents**|Mike A. 
Merrill et.al.|[2406.06464](http://arxiv.org/abs/2406.06464)|null|\u5c3d\u7ba1\u53ef\u7a7f\u6234\u5065\u5eb7\u8ffd\u8e2a\u5668\u65e5\u76ca\u666e\u53ca\uff0c\u7761\u7720\u548c\u8fd0\u52a8\u5bf9\u5065\u5eb7\u7684\u91cd\u8981\u6027\u4e0d\u8a00\u800c\u55bb\uff0c\u4f46\u4ece\u8fd9\u4e9b\u6570\u636e\u4e2d\u63d0\u53d6\u5177\u6709\u884c\u52a8\u4ef7\u503c\u7684\u4e2a\u6027\u5316\u89c1\u89e3\u4ecd\u662f\u4e00\u4e2a\u6311\u6218\u3002\u8fd9\u9700\u8981\u5bf9\u5927\u91cf\u6570\u636e\u8fdb\u884c\u975e\u7ed3\u6784\u5316\u5206\u6790\u3002\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u5174\u8d77\uff0c\u5b83\u4eec\u80fd\u591f\u5229\u7528\u5de5\u5177\u7406\u89e3\u548c\u4e0e\u4e16\u754c\u4e92\u52a8\uff0c\u4e3a\u5927\u89c4\u6a21\u4e2a\u6027\u5316\u5206\u6790\u5e26\u6765\u4e86\u5e0c\u671b\u3002\u7136\u800c\uff0c\u5728\u4e2a\u4eba\u5065\u5eb7\u9886\u57df\u7684LLM\u5e94\u7528\u5c1a\u5f85\u5f00\u53d1\u3002\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aPersonal Health Insights Agent\uff08PHIA\uff09\u7684\u7cfb\u7edf\uff0c\u5b83\u5229\u7528\u6700\u65b0\u7684\u4ee3\u7801\u751f\u6210\u548c\u4fe1\u606f\u68c0\u7d22\u5de5\u5177\u6765\u5206\u6790\u548c\u89e3\u91ca\u884c\u4e3a\u5065\u5eb7\u6570\u636e\u3002\u6211\u4eec\u6784\u5efa\u4e86\u4e24\u4e2a\u8d85\u8fc74000\u4e2a\u5065\u5eb7\u6d1e\u5bdf\u95ee\u9898\u7684\u57fa\u51c6\u95ee\u7b54\u6570\u636e\u96c6\u3002\u6839\u636e650\u5c0f\u65f6\u7684\u4eba\u7c7b\u548c\u4e13\u5bb6\u8bc4\u4f30\uff0cPHIA\u80fd\u51c6\u786e\u56de\u7b5484%\u4ee5\u4e0a\u7684\u4e8b\u5b9e\u6027\u6570\u503c\u95ee\u9898\uff0c\u4ee5\u53ca\u8d85\u8fc783%\u7684\u4f17\u5305\u5f00\u653e\u6027\u95ee\u9898\u3002\u8fd9\u9879\u5de5\u4f5c\u5bf9\u4e8e\u63a8\u52a8\u5927\u4f17\u884c\u4e3a\u5065\u5eb7\u8fdb\u6b65\u5177\u6709\u91cd\u8981\u610f\u4e49\uff0c\u53ef\u80fd\u4f7f\u4e2a\u4eba\u80fd\u591f\u89e3\u8bfb\u81ea\u5df1\u7684\u53ef\u7a7f\u6234\u6570\u636e\uff0c\u5f00\u8f9f\u4e86\u4e00\u4e2a\u4ee5\u6570\u636e\u9a71\u52a8\u6d1e\u5bdf\u4e3a\u6307\u5bfc\u7684\u4e2a\u6027\u531
6\u5065\u5eb7\u65b9\u6848\u7684\u65b0\u65f6\u4ee3\uff0c\u4f7f\u5f97\u5065\u5eb7\u4fdd\u5065\u66f4\u52a0\u4fbf\u6377\u4e14\u4e2a\u6027\u5316\u3002|\n", "2406.05925": "|**2024-06-09**|**Hello Again! LLM-powered Personalized Agent for Long-term Dialogue**|Hao Li et.al.|[2406.05925](http://arxiv.org/abs/2406.05925)|**[link](https://github.com/leolee99/ld-agent)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u53d1\u5c55\uff0c\u5f00\u653e\u57df\u5bf9\u8bdd\u7cfb\u7edf\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\u3002\u7136\u800c\uff0c\u5927\u591a\u6570\u73b0\u6709\u7cfb\u7edf\u4e3b\u8981\u5173\u6ce8\u7b80\u77ed\u7684\u5355\u6b21\u4f1a\u8bdd\uff0c\u5ffd\u89c6\u4e86\u957f\u671f\u966a\u4f34\u548c\u4e2a\u6027\u5316\u804a\u5929\u673a\u5668\u4eba\u5728\u73b0\u5b9e\u4e16\u754c\u4e2d\u7684\u9700\u6c42\u3002\u4e3a\u4e86\u6ee1\u8db3\u8fd9\u79cd\u5b9e\u9645\u9700\u6c42\uff0c\u4e8b\u4ef6\u603b\u7ed3\u548c\u4eba\u683c\u7ba1\u7406\u81f3\u5173\u91cd\u8981\uff0c\u5b83\u4eec\u80fd\u591f\u4fc3\u8fdb\u957f\u671f\u5bf9\u8bdd\u56de\u590d\u7684\u5408\u7406\u6027\u3002\u8fd1\u671f\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u4eba\u7c7b\u8ba4\u77e5\u548c\u63a8\u7406\u80fd\u529b\u4e0a\u7684\u8fdb\u5c55\u8868\u660e\uff0c\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u6709\u53ef\u80fd\u5927\u5e45\u589e\u5f3a\u81ea\u52a8\u5316\u611f\u77e5\u3001\u51b3\u7b56\u548c\u95ee\u9898\u89e3\u51b3\u3002\u9274\u4e8e\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u6a21\u578b\u901a\u7528\u7684\u6846\u67b6\u2014\u2014\u957f\u671f\u5bf9\u8bdd\u4ee3\u7406\uff08LD-Agent\uff09\uff0c\u5b83\u5305\u62ec\u4e09\u4e2a\u53ef\u72ec\u7acb\u8c03\u6574\u7684\u6a21\u5757\uff1a\u4e8b\u4ef6\u611f\u77e5\u3001\u4eba\u683c\u63d0\u53d6\u548c\u54cd\u5e94\u751f\u6210\u3002\u4e8b\u4ef6\u8bb0\u5fc6\u6a21\u5757\u4f7f\u7528\u957f\u77ed\u671f\u8bb0\u5fc6\u5e93\u5206\u522b\u5173\u6ce8\u5386\u53f2\u548c\u6b63\u5728\u8fdb\u884c\u7684\u4f1a\u8bdd\uff0c\u5e76\u5f15\u5165\u4e86\u57fa\u4e8e\u4e3b\u9898\u7684\u68c0\u7d22\u673a\
u5236\u4ee5\u63d0\u9ad8\u8bb0\u5fc6\u68c0\u7d22\u7684\u51c6\u786e\u6027\u3002\u6b64\u5916\uff0c\u4eba\u683c\u6a21\u5757\u5b9e\u73b0\u4e86\u7528\u6237\u548c\u4ee3\u7406\u7684\u52a8\u6001\u4eba\u683c\u5efa\u6a21\u3002\u6700\u540e\uff0c\u901a\u8fc7\u6574\u5408\u68c0\u7d22\u7684\u8bb0\u5fc6\u548c\u63d0\u53d6\u7684\u4eba\u683c\uff0c\u751f\u6210\u5668\u4f1a\u4ea7\u751f\u9002\u5f53\u7684\u56de\u5e94\u3002\u6211\u4eec\u5728\u5404\u79cd\u793a\u4f8b\u57fa\u51c6\u3001\u6a21\u578b\u548c\u4efb\u52a1\u4e0a\u5b9e\u8bc1\u4e86LD-Agent\u7684\u6709\u6548\u6027\u3001\u901a\u7528\u6027\u548c\u8de8\u9886\u57df\u80fd\u529b\u3002\u4ee3\u7801\u5df2\u5728https://github.com/leolee99/LD-Agent\u4e0a\u53d1\u5e03\u3002**|\n", "2406.05804": "|**2024-06-09**|**A Survey on LLM-Based Agentic Workflows and LLM-Profiled Components**|Xinzhe Li et.al.|[2406.05804](http://arxiv.org/abs/2406.05804)|null|\u8fd1\u671f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u8fdb\u5c55\u63a8\u52a8\u4e86\u590d\u6742\u4ee3\u7406\u5de5\u4f5c\u6d41\u7684\u53d1\u5c55\uff0c\u5b83\u4eec\u76f8\u8f83\u4e8e\u4f20\u7edf\u7684\u5355\u8def\u5f84\u3001\u94fe\u5f0f\u601d\u7ef4\uff08Chain-of-Thought\uff0cCoT\uff09\u63d0\u793a\u65b9\u6cd5\u6709\u6240\u6539\u8fdb\u3002\u8fd9\u7bc7\u7efc\u8ff0\u65e8\u5728\u6982\u8ff0\u5e38\u89c1\u7684\u5de5\u4f5c\u6d41\uff0c\u7279\u522b\u5173\u6ce8\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7279\u6027\u7684\u7ec4\u4ef6\uff08LLM-Profiled Components\uff0cLMPCs\uff09\uff0c\u5e76\u5f3a\u8c03\u5bf9\u975eLLM\u7ec4\u4ef6\u7684\u5ffd\u7565\u3002\u8fd9\u79cd\u7814\u7a76\u7684\u76ee\u7684\u662f\u4e3a\u4e86\u589e\u8fdb\u5bf9LLMs\u89d2\u8272\u7684\u7406\u89e3\uff0c\u5e76\u63a2\u7d22LMPC\u7684\u590d\u7528\u6f5c\u529b\u3002|\n", "2406.07275": "|**2024-06-11**|**DCA-Bench: A Benchmark for Dataset Curation Agents**|Benhao Huang 
et.al.|[2406.07275](http://arxiv.org/abs/2406.07275)|**[link](https://github.com/TRAIS-Lab/dca-bench)**|\u968f\u7740\u4eba\u5de5\u667a\u80fd\uff08AI\uff09\u7814\u7a76\u548c\u5f00\u53d1\u7684\u63a8\u8fdb\uff0c\u6570\u636e\u96c6\u7684\u8d28\u91cf\u65e5\u76ca\u5173\u952e\u3002\u5c3d\u7ba1\u5f00\u653e\u6570\u636e\u96c6\u5e73\u53f0\u4f17\u591a\uff0c\u4f46\u6570\u636e\u8d28\u91cf\u95ee\u9898\uff0c\u5982\u7f3a\u4e4f\u6587\u6863\u3001\u6807\u6ce8\u9519\u8bef\u548c\u4f26\u7406\u8003\u91cf\uff0c\u4ecd\u666e\u904d\u5b58\u5728\u3002\u8fd9\u4e9b\u95ee\u9898\u5f80\u5f80\u96be\u4ee5\u901a\u8fc7\u89c4\u5219\u57fa\u7840\u811a\u672c\u68c0\u6d4b\uff0c\u9700\u8981\u7528\u6237\u6216\u7ef4\u62a4\u8005\u82b1\u8d39\u5927\u91cf\u4eba\u529b\u8fdb\u884c\u8bc6\u522b\u548c\u9a8c\u8bc1\u3002\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5904\u7406\u6570\u636e\u96c6\u6574\u7406\u7684\u6f5c\u529b\u4ee4\u4eba\u671f\u5f85\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u9879\u540d\u4e3aDCA-Bench\u7684\u6570\u636e\u96c6\u7ba1\u7406\u4ee3\u7406\u57fa\u51c6\uff0c\u65e8\u5728\u8bc4\u4f30LLM\u5728\u68c0\u6d4b\u9690\u85cf\u6570\u636e\u8d28\u91cf\u95ee\u9898\u65b9\u9762\u7684\u6027\u80fd\u3002\u6211\u4eec\u4ece\u516b\u4e2a\u516c\u5f00\u6570\u636e\u96c6\u5e73\u53f0\u6536\u96c6\u4e86\u5404\u79cd\u5b9e\u9645\u95ee\u9898\u4f5c\u4e3a\u6d4b\u8bd5\u5e8a\u3002\u4e3a\u4e86\u5efa\u7acb\u4e00\u4e2a\u81ea\u52a8\u8bc4\u4f30LLM\u6210\u529f\u4e0e\u5426\u7684\u7ba1\u9053\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u4e13\u95e8\u7684LLM\u8bc4\u4f30\u5668\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u57fa\u4e8eLLM\u7684\u8bc4\u4f30\u5668\u4e0e\u4eba\u5de5\u8bc4\u4ef7\u9ad8\u5ea6\u543b\u5408\uff0c\u80fd\u5b9e\u73b0\u53ef\u9760\u7684\u81ea\u52a8\u8bc4\u4f30\u3002\u6211\u4eec\u8fd8\u5728\u591a\u4e2a\u57fa\u7ebfLLM\u4e0a\u8fdb\u884c\u4e86\u5b9e\u9a8c\uff0c\u663e\u793a\u4e86\u4efb\u52a1\u7684\u590d\u6742\u6027\uff0c\u610f\u5473\u7740\u5c06LLMs\u5e94\u7528\u4e8e\u73b0\u5b9e\u4e16\u754c\u7684\u6570\u636e\u
96c6\u7ba1\u7406\u4ecd\u9700\u6df1\u5165\u63a2\u7d22\u548c\u521b\u65b0\u3002\u6b64\u5916\uff0c\u8be5\u57fa\u51c6\u4e5f\u53ef\u4f5c\u4e3a\u8861\u91cfLLMs\u5728\u95ee\u9898\u53d1\u73b0\u80fd\u529b\u800c\u975e\u4ec5\u89e3\u51b3\u95ee\u9898\u80fd\u529b\u7684\u6d4b\u8bd5\u5e73\u53f0\u3002\u57fa\u51c6\u5957\u4ef6\u5df2\u5f00\u653e\u5728\uff1ahttps://github.com/TRAIS-Lab/dca-bench\u3002|\n", "2406.07217": "|**2024-06-11**|**A Synthetic Dataset for Personal Attribute Inference**|Hanna Yukhymenko et.al.|[2406.07217](http://arxiv.org/abs/2406.07217)|**[link](https://github.com/eth-sri/synthpai)**|**\u8fd1\u5e74\u6765\uff0c\u5f3a\u5927\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5df2\u4e3a\u5168\u7403\u6570\u4ebf\u7528\u6237\u6240\u63a5\u89e6\uff0c\u4f46\u5b83\u4eec\u7684\u5f3a\u5927\u529f\u80fd\u548c\u5e7f\u6cdb\u4e16\u754c\u77e5\u8bc6\u4e5f\u5e26\u6765\u4e86\u9690\u79c1\u98ce\u9669\u3002\u672c\u7814\u7a76\u5173\u6ce8LLMs\u65b0\u5174\u7684\u9690\u79c1\u5a01\u80c1\u2014\u2014\u4ece\u7f51\u7edc\u6587\u672c\u4e2d\u51c6\u786e\u63a8\u65ad\u4e2a\u4eba\u4fe1\u606f\u3002\u9274\u4e8e\u57fa\u4e8eLLM\u7684\u4f5c\u8005\u5206\u6790\u7814\u7a76\u7f3a\u4e4f\u5408\u9002\u7684\u516c\u5f00\u6570\u636e\u96c6\uff0c\u4e3b\u8981\u662f\u7531\u4e8e\u6d89\u53ca\u771f\u5b9e\u4e2a\u4eba\u6570\u636e\u7684\u4f26\u7406\u548c\u9690\u79c1\u987e\u8651\uff0c\u6211\u4eec\u7684\u5de5\u4f5c\u5728\u4e24\u4e2a\u65b9\u9762\u8fdb\u884c\u4e86\u63a2\u7d22\uff1a\uff08i\uff09\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u4f7f\u7528\u5408\u6210\u4e2a\u4eba\u8d44\u6599\u586b\u5145\u7684\u6d41\u884c\u793e\u4ea4\u5e73\u53f0Reddit\u7684\u6a21\u62df\u6846\u67b6\uff1b\uff08ii\uff09\u5229\u7528\u6b64\u6846\u67b6\uff0c\u6211\u4eec\u751f\u6210\u4e86SynthPAI\uff0c\u4e00\u4e2a\u5305\u542b\u8d85\u8fc77800\u6761\u7ecf\u8fc7\u624b\u52a8\u6807\u8bb0\u4e2a\u4eba\u5c5e\u6027\u7684\u591a\u6837\u5316\u7684\u5408\u6210\u8bc4\u8bba\u6570\u636e\u96c6\u3002\u6211\u4eec\u901a\u8fc7\u4e00\u9879\u4eba\u7c7b\u7814\u7a76\u9a8c\u8
bc1\u4e86\u6570\u636e\u96c6\uff0c\u7ed3\u679c\u663e\u793a\u4eba\u7c7b\u5728\u533a\u5206\u771f\u5b9e\u548c\u5408\u6210\u8bc4\u8bba\u7684\u4efb\u52a1\u4e0a\u51e0\u4e4e\u4e0d\u4f18\u4e8e\u968f\u673a\u731c\u6d4b\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8bc1\u660e\u4e86\u6570\u636e\u96c6\u652f\u6301\u6709\u610f\u4e49\u7684\u4e2a\u4eba\u5c5e\u6027\u63a8\u65ad\u7814\u7a76\uff0c\u901a\u8fc718\u79cd\u6700\u5148\u8fdb\u7684LLMs\uff0c\u6211\u4eec\u53d1\u73b0\u4f7f\u7528\u5408\u6210\u8bc4\u8bba\u53ef\u4ee5\u5f97\u51fa\u4e0e\u73b0\u5b9e\u4e16\u754c\u6570\u636e\u76f8\u540c\u7684\u7ed3\u8bba\u3002\u7efc\u4e0a\u6240\u8ff0\uff0c\u6211\u4eec\u7684\u6570\u636e\u96c6\u548c\u6d41\u7a0b\u4e3a\u672a\u6765\u7814\u7a76\u5982\u4f55\u7406\u89e3\u548c\u51cf\u8f7bLLMs\u5e26\u6765\u7684\u57fa\u4e8e\u63a8\u65ad\u7684\u9690\u79c1\u5a01\u80c1\u63d0\u4f9b\u4e86\u5f3a\u5927\u4e14\u9690\u79c1\u4fdd\u62a4\u7684\u57fa\u7840\u3002**|\n", "2406.07021": "|**2024-06-11**|**A Tool for Test Case Scenarios Generation Using Large Language Models**|Abdul Malik Sami 
et.al.|[2406.07021](http://arxiv.org/abs/2406.07021)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u8f6f\u4ef6\u5de5\u7a0b\uff08SE\uff09\u4e2d\u5e7f\u6cdb\u5e94\u7528\uff0c\u6db5\u76d6\u4ee3\u7801\u751f\u6210\u3001\u8f6f\u4ef6\u8bbe\u8ba1\u548c\u6587\u6863\u7f16\u5199\u3001\u6dfb\u52a0\u4ee3\u7801\u6ce8\u91ca\u3001\u4ee3\u7801\u5ba1\u67e5\u4ee5\u53ca\u7f16\u5199\u6d4b\u8bd5\u811a\u672c\u7b49\u4efb\u52a1\u3002\u7136\u800c\uff0c\u521b\u5efa\u6d4b\u8bd5\u811a\u672c\u6216\u81ea\u52a8\u5316\u6d4b\u8bd5\u6848\u4f8b\u9700\u8981\u4e0e\u529f\u80fd\u9700\u6c42\u7d27\u5bc6\u76f8\u5173\u7684\u8be6\u5c3d\u6d4b\u8bd5\u5957\u4ef6\u6587\u6863\u3002\u8fd9\u79cd\u6587\u6863\u5e94\u80fd\u5728\u6709\u9650\u7684\u65f6\u95f4\u548c\u8303\u56f4\u5185\u5b9e\u73b0\u5168\u9762\u6d4b\u8bd5\uff0c\u5c24\u5176\u5f53\u9700\u6c42\u548c\u7528\u6237\u671f\u671b\u4e0d\u65ad\u53d8\u5316\u65f6\u3002\u672c\u6587\u4e3b\u8981\u5173\u6ce8\u6839\u636e\u7528\u6237\u9700\u6c42\u751f\u6210\u53f2\u8bd7\u7ea7\uff08epics\uff09\u548c\u9ad8\u5c42\u6b21\u7528\u6237\u6545\u4e8b\uff0c\u7136\u540e\u57fa\u4e8e\u8fd9\u4e9b\u6545\u4e8b\u8bbe\u8ba1\u6d4b\u8bd5\u573a\u666f\u3002\u6587\u7ae0\u4ecb\u7ecd\u4e86\u4e00\u79cd\u57fa\u4e8eLLM\u4ee3\u7406\u548c\u63d0\u793a\u5de5\u7a0b\u7684\u7f51\u7edc\u8f6f\u4ef6\u5de5\u5177\uff0c\u8be5\u5de5\u5177\u80fd\u591f\u81ea\u52a8\u5316\u9488\u5bf9\u7528\u6237\u9700\u6c42\u751f\u6210\u6d4b\u8bd5\u573a\u666f\u7684\u8fc7\u7a0b\u3002|\n", "2406.06947": "|**2024-06-11**|**CAAP: Context-Aware Action Planning Prompting to Solve Computer Tasks with Front-End UI Only**|Junhee Cho 
et.al.|[2406.06947](http://arxiv.org/abs/2406.06947)|**[link](https://github.com/caap-agent/caap-agent)**|**Software robots have long been used in robotic process automation (RPA) to carry out tedious computer tasks. With the advanced reasoning abilities of large language models (LLMs), these agents can now handle more complex and even previously unseen tasks. However, the LLM-based automation methods in the current literature often rely on HTML source code as input, which limits their use in non-web environments. The information in HTML code is also frequently inaccurate or incomplete, which makes agents less reliable in practice. We propose an LLM-driven agent that works from screenshots alone, focuses on recognizing the environment, and uses in-context learning to remove the need for large amounts of human demonstration data. Our strategy, called Context-Aware Action
Planning (CAAP) prompting, encourages the agent to examine the context carefully from multiple angles. With this method we achieve a 94.4% success rate on 67 types of MiniWoB++ problems, using only 1.48 demonstrations per problem type. The approach opens the way to broader applications, particularly tasks that require cross-application coordination on computers or smartphones, marking a major advance for automation agents. The code and models are available on GitHub (https://github.com/caap-agent/caap-agent).**|\n", "2406.06613": "|**2024-06-07**|**GameBench: Evaluating Strategic Reasoning Abilities of LLM Agents**|Anthony Costarelli et.al.|[2406.06613](http://arxiv.org/abs/2406.06613)|**[link](https://github.com/Joshuaclymer/GameBench)**|**Large language models have demonstrated remarkable few-shot performance on many natural language understanding tasks. Although their use in complex strategic scenarios has been demonstrated, there is no comprehensive framework for evaluating the various reasoning abilities of these models in games. To fill this gap, we introduce GameBench, a cross-domain framework for evaluating the strategic thinking abilities of large language models (LLMs). We focus on 9 different game environments, each covering at least
one key reasoning skill identified in strategy games, and we select games whose strategy explanations are unlikely to make up a significant part of the models' pretraining data. Our evaluation uses GPT-3 and GPT-4 in their base form, along with two scaffolding frameworks designed to enhance strategic reasoning: Chain-of-Thought (CoT) prompting and Reasoning Via Planning (RAP). The results show that none of the tested models reaches human-level performance, and at worst GPT-4 performs below random action. CoT and RAP both improve scores, but they remain far from human level.**|\n", "2406.08184": "|**2024-06-12**|**MobileAgentBench: An Efficient and User-Friendly Benchmark for Mobile LLM Agents**|Luyuan Wang
et.al.|[2406.08184](http://arxiv.org/abs/2406.08184)|null|As large language models (LLMs) become increasingly capable of interacting directly with mobile graphical user interfaces (GUIs) and show potential for managing everyday tasks autonomously, LLM-based mobile agents are drawing growing attention from academia and industry. However, because applications have an effectively unlimited number of states and feasible action sequences are loosely defined, benchmarks for the performance of existing mobile agents remain scarce. To address this challenge, we propose MobileAgentBench, an efficient and user-friendly benchmark designed to ease the burden of tedious manual testing. We first define 100 tasks across 10 open-source apps, grouped into multiple difficulty levels. We then evaluate several existing mobile agents, including AppAgent and MobileAgent, to compare their performance thoroughly and systematically. All materials are available on our project website (https://MobileAgentBench.github.io), contributing to progress in both academia and industry.|\n", "2406.07973": "|**2024-06-12**|**Unique Security and Privacy Threats of Large Language Model: A Comprehensive Survey**|Shang Wang
et.al.|[2406.07973](http://arxiv.org/abs/2406.07973)|null|With the rapid development of artificial intelligence, large language models (LLMs) have made remarkable progress in natural language processing. Trained on large amounts of data, these models show strong language understanding and generation abilities and are used in applications such as machine translation and chatbots. However, LLMs expose a range of privacy and security problems over their life cycle, which has drawn attention from academia and industry. These problems differ from those of traditional language models. Because current surveys lack a clear threat taxonomy for different scenarios, we highlight the unique risks in five scenarios: pre-training, fine-tuning, RAG systems, deployment, and LLM-based agents. Taking the characteristics of each threat into account, this survey describes the potential threats and the corresponding countermeasures. Studying the attacks and defenses that LLMs face can suggest feasible research directions for more fields and allow more people to benefit from LLMs.|\n", "2406.07914": "|**2024-06-14**|**Can Large Language Models Understand Spatial Audio?**|Changli Tang
et.al.|[2406.07914](http://arxiv.org/abs/2406.07914)|null|This paper explores how to enable large language models (LLMs) to grasp spatial information in multichannel audio, an ability that current auditory LLMs lack. By drawing on the advanced cognitive and reasoning abilities of LLMs, the goal is to improve the models' understanding of 3D environments through audio. The study covers three spatial audio tasks: sound source localization (SSL), far-field speech recognition (FSR), and location-based speech extraction (LSE), and makes notable progress on each. For SSL, our method reaches a mean absolute error (MAE) of 2.70\u00b0 on the Spatial LibriSpeech dataset, clearly better than the previous benchmark of about 6.60\u00b0. In addition, the model can use spatial cues to improve FSR accuracy and, given a text prompt, can perform LSE by focusing on sound from a specified direction, even when speech overlaps. These results show the potential of LLMs to adapt to physical audio concepts and pave the way for LLM-based agents in 3D environments.|\n", "2406.09187": "|**2024-06-13**|**GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning**|Zhen Xiang
et.al.|[2406.09187](http://arxiv.org/abs/2406.09187)|null|With the rapid development of large language models (LLMs), LLM-powered agents are being used in a wide range of applications, which raises new concerns about their safety and trustworthiness. Existing methods for improving LLM safety do not apply directly to LLM-powered agents because the two have different objectives and output modalities. This paper proposes GuardAgent, a new approach that serves as a \u201cguardrail\u201d for other LLM agents. GuardAgent oversees a target LLM agent by checking whether its inputs and outputs satisfy a set of user-defined guard requests. GuardAgent works in two steps: 1) it analyzes the provided guard requests to create a task plan; 2) it generates guardrail code from the task plan and executes it by calling APIs or an external engine. Throughout the process an LLM serves as the core reasoning component, supplemented by in-context examples from a memory module. This knowledge-enabled reasoning lets GuardAgent understand a variety of textual guard requests and translate them accurately into executable code that provides reliable guardrails.
GuardAgent is also equipped with an extensible toolbox of functions and APIs and requires no additional LLM training, which underlines its generality and low operational overhead. In addition, we propose two new benchmarks: EICU-AC, for assessing privacy-related access control for healthcare agents, and Mind2Web-SC, for assessing the safety of web agents. On these benchmarks, GuardAgent effectively moderates invalid inputs and outputs for the two types of agents with 98.7% and 90.0% accuracy, respectively. Experiments also show that GuardAgent can define new functions to adapt to emerging LLM agents and guard requests, further demonstrating its strong generalization ability.|\n", "2406.08979": "|**2024-06-13**|**Multi-Agent Software Development through Cross-Team Collaboration**|Zhuoyun Du et.al.|[2406.08979](http://arxiv.org/abs/2406.08979)|**[link](https://github.com/openbmb/chatdev)**|**
Recent advances in large language models (LLMs), such as ChatDev, have driven profound changes in software development, particularly through multi-agent collaboration. These models can cooperate like a human team, following the waterfall model through phases such as requirements analysis, development, review, and testing to generate software autonomously. However, each phase in a single development process yields only one possible outcome, so only one development chain is completed and the opportunity to explore multiple decision paths in the solution space is lost, which can lead to suboptimal results. To address this problem, we propose Cross-Team Collaboration (CTC), a scalable multi-team framework that lets cooperating teams jointly propose decisions and exchange insights in a cross-team collaboration environment in order to improve content generation.
Experimental results in software development show that our method substantially outperforms existing baselines, confirming the effectiveness of the framework. Notable improvements in story generation indicate that the framework generalizes well across domains. We hope our work will steer LLMs toward a cross-team paradigm and bring significant progress to software development and other fields. The related code and data will be made available.**|\n", "2406.08747": "|**2024-06-13**|**StreamBench: Towards Benchmarking Continuous Improvement of Language Agents**|Cheng-Kuang Wu et.al.|[2406.08747](http://arxiv.org/abs/2406.08747)|**[link](https://github.com/stream-bench/stream-bench)**|Recent studies show that large language models (LLMs) can improve themselves from experience, an important ability for continued improvement after deployment. However, existing benchmarks mainly evaluate their innate capabilities and do not assess their ability to improve over time. To fill this gap, we introduce StreamBench, a pioneering benchmark designed to evaluate the continuous improvement of LLMs over a sequence of inputs and feedback. StreamBench simulates an online learning environment in which LLMs receive a continuous stream of feedback and iteratively improve their performance. In addition, we
propose several simple but effective LLM baselines and provide a comprehensive analysis of the key components that make streaming strategies successful. Our work lays a foundation for developing effective online learning strategies for LLMs and paves the way for more adaptive AI systems in streaming scenarios.|\n", "2406.11277": "|**2024-06-17**|**Small Agent Can Also Rock! Empowering Small Language Models as Hallucination Detector**|Xiaoxue Cheng et.al.|[2406.11277](http://arxiv.org/abs/2406.11277)|**[link](https://github.com/rucaibox/haluagent)**|This paper examines the challenge of hallucination detection for large language models (LLMs), noting that previous work has relied mainly on powerful closed-source models such as GPT-4. The authors propose HaluAgent, an autonomous LLM-based agent framework that enables relatively small models (such as Baichuan2-Chat
7B) to actively select suitable tools for detecting multiple types of hallucination in text, code, and mathematical expressions. HaluAgent integrates the LLM with a multi-functional toolbox and uses a fine-grained three-stage detection framework together with a memory mechanism. To improve HaluAgent's effectiveness, the paper synthesizes detection trajectories from existing Chinese and English datasets and uses them for fine-tuning, giving the agent bilingual hallucination detection capability. Experiments show that, after tuning the LLM on only 2,000 samples, HaluAgent performs well across tasks and datasets, on both in-domain and out-of-domain data, with performance comparable to or in some cases better than GPT-4 without tool enhancements. The paper's code and datasets are released on GitHub (https://github.com/RUCAIBox/HaluAgent).|\n", "2406.11200": "|**2024-06-18**|**AvaTaR: Optimizing LLM Agents for Tool-Assisted Knowledge Retrieval**|Shirley Wu
et.al.|[2406.11200](http://arxiv.org/abs/2406.11200)|**[link](https://github.com/zou-group/avatar)**|**Large language models (LLMs) have shown impressive ability to use external tools and knowledge to improve accuracy and reduce errors. However, designing prompting techniques that let LLMs use these tools effectively is a time-consuming task that relies on intuition. We therefore propose AvaTaR, a new automated framework that optimizes an LLM to make better use of the provided tools and to perform better on a given task or domain. AvaTaR designs a comparator module that reasons over positive and negative examples from the training data and iteratively supplies the LLM with insightful and comprehensive prompts. We demonstrate AvaTaR on four complex multimodal retrieval datasets containing textual, visual, and relational information. Experiments show that AvaTaR outperforms state-of-the-art approaches on all four challenging tasks and generalizes well to new cases, with an average relative improvement of 14% on the Hit@1 metric. The code and dataset have been released publicly.**|\n", "2406.11176": "|**2024-06-17**|**Watch Every Step!
LLM Agent Learning via Iterative Step-Level Process Refinement**|Weimin Xiong et.al.|[2406.11176](http://arxiv.org/abs/2406.11176)|**[link](https://github.com/weiminxiong/ipr)**|**Large language models have shown excellent performance on a range of complex interactive tasks. Recent work tends to improve performance by tuning on expert trajectories, but it focuses mainly on final outcome rewards, which can lead to errors or suboptimal behavior because process supervision signals are missing. In this paper we propose the Iterative Step-level Process Refinement (IPR) framework, which provides detailed step-by-step guidance to strengthen training. We use the Monte Carlo method to estimate step-level rewards. In each iteration, the model explores along the expert trajectory and generates new actions, which are then compared with the corresponding steps of the expert trajectory using step-level rewards. This comparison helps identify discrepancies and yields contrastive action pairs that serve as training data. Experiments on three complex agent tasks show that our framework outperforms a variety of strong baselines. Our analysis also shows that IPR is effective at improving action efficiency and is applicable to
a variety of models.**|\n", "2406.11132": "|**2024-06-17**|**RePrompt: Planning by Automatic Prompt Engineering for Large Language Models Agents**|Weizhe Chen et.al.|[2406.11132](http://arxiv.org/abs/2406.11132)|null|Over the past year, large language models (LLMs) have achieved striking results beyond traditional natural language processing, and people have begun to explore their use in more specific application domains such as code generation, travel planning, and robot control. So-called LLM agents are built on top of LLMs with the aim of helping people with all kinds of everyday tasks. However, the prompts given to LLMs are critical to the generated content and to performance. Automatic prompt engineering has therefore become a focus for many researchers and LLM users. This paper proposes RePrompt, a new method that uses the chat history obtained from interactions with LLM agents to optimize the step-by-step instructions in the prompt through \u201cgradient descent\u201d. By optimizing the prompt, the LLM learns how to plan in a specific domain. In experiments on PDDL generation and travel planning, we show that using the updated prompt as the initial prompt generally improves performance on different reasoning tasks.|\n", "2406.10918":
"|**2024-06-18**|**Embodied Question Answering via Multi-LLM Systems**|Bhrij Patel et.al.|[2406.10918](http://arxiv.org/abs/2406.10918)|null|Embodied Question Answering (EQA) is an important problem in which an agent explores an environment to answer user queries. Existing work focuses mainly on single-agent scenarios, where exploration can be time-consuming and costly. In this work we consider EQA in a multi-agent framework with multiple independent agents based on large language models (LLMs), each answering questions about a household environment. To produce one answer per query, we use the individual responses to train a Central Answer Model (CAM) that aggregates them into a more robust answer. Using CAM, we observe 50% higher EQA accuracy than with aggregation methods for ensemble
LLMs such as voting schemes and debate. CAM requires no communication between agents, which avoids the associated overhead. We ablate CAM with various nonlinear algorithms (neural network, random forest, decision tree, XGBoost) and linear algorithms (logistic regression classifier, support vector machine). Finally, we use Permutation Feature Importance (PFI) to quantify how much CAM depends on each independent agent and on the query context.|\n", "2406.10819": "|**2024-06-16**|**GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents**|Dongping Chen et.al.|[2406.10819](http://arxiv.org/abs/2406.10819)|**[link](https://github.com/keplerlab/katna)**|**Recently, multimodal large language models (MLLMs) have been used to control keyboard and mouse input by directly perceiving the graphical user interface (GUI) and generating the corresponding code. However, current models perform well mainly in static environments and are applied mostly to relatively simple domains such as web or mobile interfaces. We argue that a robust GUI agent should be able to understand spatio-temporal information in GUIs, including dynamic web content and multi-step tasks, and should also understand a broad range of GUI scenarios, including desktop software and multi-window
interaction. To this end, this paper introduces a new dataset, GUI-World, which contains carefully crafted human-machine annotations covering six GUI scenarios and eight types of GUI-related questions in three formats. We evaluate current state-of-the-art MLLMs, including image LLMs and video LLMs, on their ability to understand and handle different types of GUI content, especially dynamic and sequential content. We find that image LLMs struggle with dynamic GUI content when they lack manually annotated keyframes or operation history. Video LLMs, on the other hand, perform poorly on all GUI-related tasks because GUI video datasets are scarce. Based on GUI-World, we make a first attempt at using a fine-tuned video LLM as a GUI agent and observe improved understanding of various GUI tasks. However, given the limited performance of the base LLMs, we conclude that using video LLMs as GUI agents remains a significant challenge. We believe our work offers useful insights for future research on dynamic GUI content understanding. The code and dataset are publicly available on our project homepage (https://gui-world.github.io/).**|\n", "2406.10803": "|**2024-06-16**|**HiddenTables & PyQTax: A Cooperative Game and
Dataset For TableQA to Ensure Scale and Data Privacy Across a Myriad of Taxonomies**|William Watson et.al.|[2406.10803](http://arxiv.org/abs/2406.10803)|null|Large language models (LLMs) face several challenges in table question answering: (1) context windows that are too small for large tables; (2) complex mismatches between tokenization patterns and cell boundaries; and (3) data confidentiality concerns when using external models such as gpt-3.5-turbo. To address these problems, we propose a cooperative game called HiddenTables. The game is played between a code-generating LLM, the Solver, and an Oracle that evaluates the Solver's ability on table question answering tasks. It is based on natural language schemas and keeps the underlying data secure. Experiments on a diverse set of tables show that LLMs are limited in their ability to handle complex queries, deal with compositional dependencies, and turn natural language into programmatic commands, particularly when given concrete table schemas. Unlike encoder-based models, HiddenTables is not limited by the number of rows, which improves the efficiency of prompt and completion
tokens. In addition, we create a new dataset, PyQTax, which contains 116,671 question-table-answer triplets and provides finer-grained question taxonomies and labels, further strengthening our study. Thus, beyond the academic contribution of exposing the shortcomings of LLMs in table question answering, HiddenTables also demonstrates a practical way for LLMs to interact with large-scale datasets while keeping the data secure and reducing generation costs.|\n", "2406.10478": "|**2024-06-15**|**From Words to Worlds: Transforming One-line Prompt into Immersive Multi-modal Digital Stories with Communicative LLM Agent**|Samuel S.
Sohn et.al.|[2406.10478](http://arxiv.org/abs/2406.10478)|null|Digital storytelling, which is essential in entertainment, education, and marketing, faces challenges in scaling up production and improving flexibility. The StoryAgent framework introduced in this paper uses large language models and generative tools to automate and refine the digital story creation process. It combines top-down story drafting with bottom-up asset generation, addressing key issues such as manual intervention, interactive scene orchestration, and narrative consistency. The framework enables efficient production of interactive and consistent narratives across multiple media, promotes the democratization of content creation, and increases user engagement. Our experimental results show that the framework can generate coherent digital stories without reference videos, marking a significant advance in automated digital storytelling.|\n", "2406.12806": "|**2024-06-18**|**Identifying Performance-Sensitive Configurations in Software Systems through Code Analysis with LLM Agents**|Zehao Wang 
et.al.|[2406.12806](http://arxiv.org/abs/2406.12806)|null|**Background**: Configuration settings are essential for tuning software behavior to meet specific performance requirements, but misconfigurations are widespread. Because configuration options are numerous and complex, identifying the ones that affect system performance is challenging. This study proposes PerfSense, a lightweight framework that uses large language models (LLMs) to identify performance-sensitive configurations efficiently and with low overhead. PerfSense employs LLM agents to simulate the interaction between developers and performance engineers, using advanced prompt chaining together with techniques such as retrieval-augmented generation (RAG). **Method and results**: Our evaluation on seven open-source Java systems shows that PerfSense achieves an average accuracy of 64.77% in classifying performance-sensitive configurations, outperforming both the LLM-based baseline (50.36%) and the previous best approach (61.75%). In particular, our prompt chaining technique improves recall by 10% to 30% while maintaining similar precision. A further manual analysis of 362 misclassified cases finds that common problems include the LLMs' misunderstanding of requirements (26.8% of cases). 
**Conclusion**: PerfSense significantly reduces the manual effort needed to classify performance-sensitive configurations and offers valuable insights for future research on LLM-based code analysis.|\n", "2406.12708": "|**2024-06-18**|**AgentReview: Exploring Peer Review Dynamics with LLM Agents**|Yiqiao Jin et.al.|[2406.12708](http://arxiv.org/abs/2406.12708)|null|Peer review is fundamental to the integrity and progress of scientific publishing. Traditional approaches to analyzing peer review data tend to focus on exploring and summarizing existing data; they do not adequately account for the multivariate nature of the process or for latent variables, and they are further constrained by privacy concerns because the data are sensitive. We propose AgentReview, a peer review simulation framework based on large language models (LLMs) that effectively disentangles the effects of multiple latent factors and addresses the privacy issue. The study finds a notable 37.1% variation in paper decisions, a finding supported by sociological theories such as social influence theory, altruism fatigue, and authority bias. We believe this study can offer valuable insights for improving the design of peer review mechanisms.|\n", "2406.12628": "|**2024-06-18**|**Large Language Models based Multi-Agent Framework for Objective Oriented Control Design in Power Electronics**|Chenggang Cui 
et.al.|[2406.12628](http://arxiv.org/abs/2406.12628)|null|This paper addresses challenges in control design for power electronics systems, in particular model uncertainty and long, costly design cycles. It proposes a multi-agent framework based on large language models (LLMs) for objective-oriented controller design in power electronics. The framework uses the reasoning capabilities of LLMs together with a multi-agent workflow to build an efficient, automated controller design process. The LLM agents can understand and respond to high-level instructions given in natural language, and adjust their behavior according to the specific requirements of the task and the constraints of the practical application. This novel and efficient strategy is expected to significantly improve the flexibility and adaptability of power electronics controller design and to greatly ease the work of practitioners.|\n", "2406.12276": "|**2024-06-18**|**CodeNav: Beyond tool-use to using real-world codebases with LLM agents**|Tanmay Gupta 
et.al.|[2406.12276](http://arxiv.org/abs/2406.12276)|null|We introduce CodeNav, a system that uses a large language model (LLM) to navigate and make use of previously unseen code repositories in order to solve user queries. Unlike tool-use LLM agents, which require every relevant tool to be registered in the LLM context through manual descriptions, CodeNav automatically indexes and searches code blocks in the target codebase, finds relevant code snippets, imports them, and iteratively generates a solution using execution feedback. We first present three case studies showing how CodeNav uses three different codebases to solve complex user queries. Next, on three benchmarks, we quantitatively compare code-use, which only has access to the target codebase, against tool-use, which has privileged access to all tool names and descriptions. In addition, we study how different kinds of tool and library descriptions affect code-use performance, as well as the advantage of giving the agent source code as input rather than natural-language descriptions of the code. All code will be open-sourced under a permissive license.|\n", "2406.12125": "|**2024-06-17**|**Efficient Sequential Decision Making with Large Language 
Models**|Dingyang Chen et.al.|[2406.12125](http://arxiv.org/abs/2406.12125)|null|This paper focuses on extending the success of large language models (LLMs) to sequential decision making. Existing efforts either retrain or fine-tune LLMs for decision making, or design prompts for pretrained LLMs. The former suffers from the heavy computational burden of gradient updates, while the latter has not shown clear benefits. To this end, we propose a new approach that uses online model selection algorithms to integrate LLMs into sequential decision making efficiently. Statistically, our approach significantly outperforms both traditional decision-making algorithms and vanilla LLM agents. Computationally, it avoids expensive gradient updates of the LLMs and needs only a small number of LLM calls over the whole decision-making process. We conduct extensive experiments to verify the effectiveness of the approach. For example, on a large-scale Amazon dataset, our approach achieves more than a 6x performance gain over the baselines while calling LLMs in only 1.5% of the time steps.|\n", "2406.14373": "|**2024-07-01**|**Artificial Leviathan: Exploring Social Evolution of LLM Agents Through the Lens of Hobbesian Social Contract Theory**|Gordon Dai 
et.al.|[2406.14373](http://arxiv.org/abs/2406.14373)|null|Advances in large language models (LLMs) and artificial intelligence open up opportunities for large-scale exploration in computational social science. Building on prior work on LLM agent design, we construct a simulated agent society in which complex social relationships form and evolve dynamically over time. The agents are endowed with psychological drives and placed in a sandbox survival environment. We evaluate this agent society through the lens of Thomas Hobbes's seminal Social Contract Theory (SCT). The experiments show that the agents initially engage in unrestrained conflict, matching Hobbes's description of the state of nature. As the simulation proceeds, however, social contracts gradually emerge and an absolute sovereign is authorized, which leads to a peaceful commonwealth founded on mutual cooperation. Our findings agree with Hobbes's theory: LLM-driven multi-agent simulation exhibits the complexity of social dynamics and may replicate the forces that shape human societies. Although such simulations cannot capture every nuance of human behavior, they are potentially valuable for understanding social structures, group dynamics, and complex human systems.|\n", 
"2406.14228": "|**2024-06-20**|**EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms**|Siyu Yuan et.al.|[2406.14228](http://arxiv.org/abs/2406.14228)|**[link](https://github.com/siyuyuan/evoagent)**|**With the rise of powerful large language models (LLMs), a new trend is to use these models to build autonomous agents, and especially multi-agent systems, that can solve complex tasks. However, existing work relies heavily on human-designed frameworks, which limits the functional scope and scalability of agent systems. How to automatically extend a specialized agent to a multi-agent system in order to improve task-solving ability remains a major challenge. This paper proposes EvoAgent, a method that uses evolutionary algorithms to extend expert agents to multi-agent systems automatically, with the aim of making LLM-based agents more effective at carrying out tasks. Specifically, we treat an existing agent framework as the initial individual and apply a series of evolutionary operators (such as mutation, crossover, and selection) to generate agents with diverse settings. EvoAgent applies to any LLM-based agent framework and can generate the extended multi-agent system automatically, with no additional human design. Experimental results show that EvoAgent can automatically produce multiple expert agents and significantly improves the task-solving ability of LLM-based agents.**|\n", 
"2406.13352": "|**2024-06-19**|**AgentDojo: A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents**|Edoardo Debenedetti et.al.|[2406.13352](http://arxiv.org/abs/2406.13352)|**[link](https://github.com/ethz-spylab/agentdojo)**|**This paper introduces AgentDojo, a framework for evaluating the adversarial robustness of AI agents that rely on external tools and process untrusted data. Because attacks and defenses keep evolving, AgentDojo is not a static test suite but an extensible environment for designing and evaluating new tasks, defenses, and adaptive attacks. It contains 97 realistic tasks (such as managing an email client, navigating an online banking website, or booking travel), 629 security test cases, and a range of attack and defense methods from the literature. The study finds that current state-of-the-art language models perform poorly on AgentDojo even in the absence of attacks, and that existing prompt injection attacks break some security properties but not all. We hope AgentDojo will foster research into new design principles for AI agents that solve common tasks both reliably and robustly. The code is released at https://github.com/ethz-spylab/agentdojo.**|\n", 
"2406.13163": "|**2024-06-19**|**LLMatDesign: Autonomous Materials Discovery with Large Language Models**|Shuyi Jia et.al.|[2406.13163](http://arxiv.org/abs/2406.13163)|null|Discovering new materials matters greatly for science and technology, yet it remains a hard problem because the chemical space is vast. Recent advances in machine learning have enabled data-driven methods that rapidly screen or generate promising materials, but these methods still depend on large amounts of training data and often lack the flexibility and chemical intuition that human experts expect in materials design. We propose LLMatDesign, a new framework for interpretable materials design driven by large language models. LLMatDesign uses LLM agents to interpret human instructions, apply modifications to materials, and evaluate the outcomes with the provided tools. By reflecting on its previous decisions, LLMatDesign adapts quickly to new tasks and conditions in a zero-shot manner. In offline experiments, a systematic evaluation of LLMatDesign on several materials design tasks confirms that it is effective at developing new materials with user-defined target properties in the small-data regime. Our framework demonstrates the remarkable potential of autonomous, LLM-guided materials discovery in a computational setting and points toward future self-driving laboratories.|\n", 
"2406.15341": "|**2024-06-21**|**GenoTEX: A Benchmark for Evaluating LLM-Based Exploration of Gene Expression Data in Alignment with Bioinformaticians**|Haoyang Liu et.al.|[2406.15341](http://arxiv.org/abs/2406.15341)|**[link](https://github.com/liu-hy/genotex)**|**In recent years, advances in machine learning have significantly improved the identification of disease-associated genes from gene expression data. However, these processes often require deep expertise and extensive manual effort, which limits their scalability. Agents driven by large language models (LLMs) show promise for automating such tasks thanks to their growing problem-solving abilities. To support the evaluation and development of these methods, we create GenoTEX, a benchmark for the automated exploration of gene expression data that covers dataset selection, preprocessing, and statistical analysis tasks. GenoTEX provides comprehensive analysis pipelines with annotations carefully written by human bioinformaticians, who analyze the datasets in depth to ensure accuracy and reliability. To provide baselines for these tasks, we design GenoAgents, a team of LLM-based agents equipped with context-aware planning, iterative correction, and consultation with domain experts, which explore gene datasets collaboratively. Our experiments show the potential of LLM-driven methods in genomic data analysis, while the error analysis points out challenges and directions for future improvement. We propose GenoTEX as a promising resource for benchmarking and improving AI-driven methods for genomic data analysis. The benchmark is publicly available at https://github.com/Liu-Hy/GenoTex.**|\n", 
"2406.14928": "|**2024-06-21**|**Autonomous Agents for Collaborative Task under Information Asymmetry**|Wei Liu et.al.|[2406.14928](http://arxiv.org/abs/2406.14928)|**[link](https://github.com/thinkwee/iAgents)**|**Large language model multi-agent systems (LLM-MAS) have made significant progress in solving complex tasks. They complete tasks through communication and collaboration among the agents in the system, on the premise that information is shared. However, when communication between agents is used to enhance human cooperation, information asymmetry creates a new challenge, because each agent can only access the information of its own human user. Traditional MAS struggle to complete tasks under this condition. To solve this problem, we propose a new multi-agent system architecture called iAgents, short for Informative Multi-Agent Systems. In iAgents, the human social network is mirrored in the agent network, and agents proactively exchange the human information needed to complete a task, thereby overcoming information asymmetry. iAgents uses a novel agent reasoning mechanism, InfoNav, to guide effective information exchange between agents. Together with InfoNav, iAgents organizes human information in a mixed memory so that agents have accurate and comprehensive information to exchange. We also introduce InformativeBench, the first benchmark designed to evaluate the task-solving ability of LLM agents under information asymmetry. Experimental results show that iAgents can collaborate within a social network of 140 individuals and 588 relationships, autonomously communicate for more than 30 turns, and retrieve information from nearly 70,000 messages to complete a task within 3 minutes.**|\n", 
"2406.14884": "|**2024-06-21**|**FlowBench: Revisiting and Benchmarking Workflow-Guided Planning for LLM-based Agents**|Ruixuan Xiao et.al.|[2406.14884](http://arxiv.org/abs/2406.14884)|null|LLM-based agents are a promising tool designed to carry out complex tasks through iterative planning and action. However, these agents are prone to undesired planning hallucinations when handling tasks that require expert knowledge. To address this, initial attempts improve planning reliability by incorporating external, workflow-related knowledge. Although promising, the injected knowledge is usually disorganized and varied in format, and it lacks rigorous formalization and comprehensive comparison. We therefore formalize workflow knowledge in different formats and present FlowBench, the first benchmark for workflow-guided planning. FlowBench covers 51 different scenarios from 6 domains, with knowledge presented in diverse formats. To assess different language models on FlowBench, we design a multi-level evaluation framework. We study the effectiveness of workflow knowledge across multiple formats, and the results indicate that current LLM agents still have considerable room for improvement before their planning is satisfactory. We hope this challenging benchmark will pave the way for future research on agent planning.|\n", 
"2406.17232": "|**2024-06-25**|**Beyond Demographics: Aligning Role-playing LLM-based Agents Using Human Belief Networks**|Yun-Shiuan Chuang et.al.|[2406.17232](http://arxiv.org/abs/2406.17232)|null|Building realistic artificial large language models (LLMs) is crucial for credible social simulation. Role-playing based on demographic information sometimes makes LLMs more human-like, but the effect is not always satisfactory. This study investigates whether the alignment between LLMs and human behavior can be improved further by integrating information from empirically derived human belief networks. Using data from a human survey, we estimate a belief network of 18 topics that load on two non-overlapping latent factors. We then seed the LLM with an opinion on one topic and assess how well the opinions it expresses on the remaining test topics align with the corresponding human data. Role-playing based on demographic information alone does not align LLM and human opinions, but seeding a single belief greatly improves alignment for topics that are related within the belief network, with no clear effect for topics outside the network. These results suggest a novel path toward human-LLM belief alignment in work that seeks to understand and simulate how beliefs are distributed in society.|\n", "2406.18702": "|**2024-06-26**|**Simulating The U.S. Senate: An LLM-Driven Agent Approach to Modeling Legislative Behavior and Bipartisanship**|Zachary R. 
Baker et.al.|[2406.18702](http://arxiv.org/abs/2406.18702)|null|This study proposes a novel approach that uses LLM-driven virtual agents to simulate the legislative process, focusing specifically on the U.S. Senate Intelligence Committee. We build agents that represent individual senators and have them interact in simulated committee discussions. The agents show that they can engage in realistic debate, offer thoughtful perspectives, and reach bipartisan solutions under certain conditions. Notably, the simulation also shows promise in modeling shifts in bipartisanship in response to external perturbations. The results suggest that this LLM-based approach could become a useful tool for understanding and improving legislative processes, in line with a body of findings that LLM-based agents can usefully model real-world phenomena. Future work will aim to increase the complexity of the agents, broaden the scope of the simulation, and explore applications in policy testing and negotiation.|\n", "2406.19966": "|**2024-06-28**|**Simulating Financial Market via Large Language Model based Agents**|Shen Gao 
et.al.|[2406.19966](http://arxiv.org/abs/2406.19966)|null|Most economic theories assume that financial market participants are fully rational individuals and use mathematical models to simulate human behavior in financial markets. However, human behavior is often not fully rational and is hard to predict accurately with mathematical models. This paper proposes a novel Agent-based Simulated Financial Market (ASFM), which first builds a simulated stock market with a realistic order matching system. We then design an LLM-based stock trading agent that consists of a profile module, an observation module, and a tool-learning-based action module. The trading agent can comprehensively understand current market dynamics and financial policy information, and makes decisions in line with its trading strategy. Experiments show that the reactions of ASFM in controllable scenarios are consistent with the real stock market. In addition, we run experiments in two popular areas of economics research and find that the conclusions drawn from ASFM agree with preliminary findings in economics research. We therefore believe that ASFM offers a new paradigm for economic research.|\n", "2407.02483": 
"|**2024-07-02**|**MMedAgent: Learning to Use Medical Tools with Multi-modal Agent**|Binxu Li et.al.|[2407.02483](http://arxiv.org/abs/2407.02483)|**[link](https://github.com/Wangyixinxin/MMedAgent)**|Although multi-modal large language models (MLLMs) have been successful, their generalization ability remains limited and in some cases they underperform specialized models. To address this, recent work has developed LLM-based agents that select an appropriate specialized model according to the user's input. However, this line of work has not yet been fully explored in the medical domain. To fill this gap, this paper presents the first agent designed specifically for the medical domain, named Multi-modal Medical Agent (MMedAgent). We build an instruction-tuning dataset covering six medical tools that solve seven tasks, which allows the agent to choose the most suitable tool for a given task. Comprehensive experiments show that MMedAgent outperforms state-of-the-art open-source methods on a variety of medical tasks and performs well even when compared with the closed-source model GPT-4o. MMedAgent is also efficient at updating and integrating new medical tools.|\n", "2407.01887": "|**2024-07-02**|**Beyond Numeric Awards: In-Context Dueling Bandits with LLM 
Agents**|Fanzeng Xia et.al.|[2407.01887](http://arxiv.org/abs/2407.01887)|null|This paper examines the decision-making performance of large language models in the context of the Dueling Bandits (DB) problem. The study compares GPT-3.5-Turbo, GPT-4, and GPT-4-Turbo with existing DB algorithms. The results show that the LLMs, GPT-4 Turbo in particular, can quickly identify the clearly superior option and thereby outperform the current best algorithms in terms of weak regret. However, the models have convergence problems, are highly sensitive to the prompt, and are brittle under prompt variations. To improve on this, we propose IF-Enhanced LLM, an augmented algorithm that combines the decision-making ability of LLMs with the theoretical guarantees of classic DB algorithms. The design shows how to make LLMs more trustworthy in decision-making tasks that require stable performance. IF-Enhanced LLM has theoretical guarantees for both weak and strong regret. Experiments confirm that IF-Enhanced LLM remains robust even under noisy and adversarial prompts.|\n", "2407.01489": "|**2024-07-01**|**Agentless: Demystifying LLM-based Software Engineering Agents**|Chunqiu Steven Xia 
et.al.|[2407.01489](http://arxiv.org/abs/2407.01489)|**[link](https://github.com/OpenAutoCoder/Agentless)**|**Recent advances in large language models (LLMs) have significantly advanced the automation of software development tasks such as code synthesis, program repair, and test generation. Researchers and industry practitioners have developed a variety of autonomous LLM agents for end-to-end software development tasks; these agents can use tools, run commands, observe feedback from the environment, and plan future actions. However, the complexity of these agent-based approaches, together with the limitations of current LLMs, raises a question: do we really need complex autonomous software agents? To explore this question, we build Agentless, an agentless approach to automatically solving software development problems. Compared with complex agent setups, Agentless uses a simple two-phase process of localization followed by repair, 
and does not let the LLM decide future actions or operate complex tools. On the popular SWE-bench Lite benchmark, our results surprisingly show that this simple approach achieves both the highest performance (27.33%) and the lowest cost ($0.34) among all open-source software agents. We also manually classify the problems in SWE-bench Lite and find problems that contain the exact ground-truth patch or have insufficient or misleading issue descriptions. We therefore construct SWE-bench Lite-S by excluding these problems, to allow more rigorous evaluation and comparison. Our work highlights the currently overlooked potential of simple, interpretable techniques in autonomous software development. We hope Agentless will serve as a baseline, a starting point, and a reference for what to expect from autonomous software agents, and inspire future work in this important area.**|\n", "2407.01231": "|**2024-07-01**|**MIRAI: Evaluating LLM Agents for Event Forecasting**|Chenchen Ye 
et.al.|[2407.01231](http://arxiv.org/abs/2407.01231)|null|Recent advances in large language models (LLMs) allow them to autonomously gather world information and reason over it to solve complex problems, which has sparked interest in using LLMs to forecast international events. However, there is currently no rigorous benchmark for the forecasting capability and reliability of LLMs. To fill this gap, we propose MIRAI, a novel benchmark designed to systematically evaluate LLMs on temporal forecasting of international events. MIRAI provides an agentic environment equipped with tools for accessing an extensive database of historical structured events and textual news. We carefully clean and parse the GDELT event database and design a series of relational prediction tasks across different forecasting horizons, from short-term to long-term, to test the ability of LLMs to integrate key global information, write code using domain-specific APIs and libraries, and jointly reason over historical knowledge from diverse formats and times in order to predict future events accurately. Through comprehensive benchmarking, we aim to establish a reliable framework for assessing LLM performance in forecasting international events, 
and thereby support the development of more accurate and trustworthy models for international relations analysis.|\n", "2407.00993": "|**2024-07-01**|**Mobile-Bench: An Evaluation Benchmark for LLM-based Mobile Agents**|Shihan Deng et.al.|[2407.00993](http://arxiv.org/abs/2407.00993)|null|With the remarkable progress of large language models (LLMs), LLM-based mobile agents have become a research hotspot in human-computer interaction. However, benchmark resources for such agents remain scarce. Evaluating these agents typically faces three challenges: (1) the inefficiency of relying on user interface (UI) operations alone limits task evaluation; (2) specific instructions within a single application are not enough to fully assess the multi-dimensional reasoning and decision-making abilities of LLM mobile agents; (3) current evaluation metrics cannot accurately measure a sequential action process. To address this, we propose Mobile-Bench, a new benchmark for evaluating the capabilities of LLM-based mobile agents. First, we extend conventional UI operations with 103 collected APIs to make task completion more efficient. Next, we collect evaluation data by combining real user queries with LLM-based augmentation. To better assess different levels of planning ability in mobile agents, our data 
is divided into three categories, SAST (simple tasks), SAMT (moderately complex tasks), and MAMT (multi-task), reflecting different levels of task complexity. Mobile-Bench contains 832 data entries, of which more than 200 tasks are specifically designed to test cross-application collaboration scenarios. In addition, we introduce a more precise evaluation metric, CheckPoint, which checks whether an LLM mobile agent reaches the key points during its planning and reasoning steps.|\n", "2407.00476": "|**2024-06-29**|**Large Language Models for Power Scheduling: A User-Centric Approach**|Thomas Mongaillard et.al.|[2407.00476](http://arxiv.org/abs/2407.00476)|**[link](https://github.com/thomasmong/llm-power-scheduling)**|**As conventional optimization and scheduling methods shift toward user-driven, personalized services that improve quality of experience (QoE) and flexibility, future systems, especially wireless and digitalized energy networks, face the challenge of better understanding and responding to user needs. Conventional systems often overlook individual user requirements because of poor communication between users and machines. The emergence of large language models (LLMs) offers a breakthrough here, since they provide a natural communication interface between users and devices. This paper is the first to propose 
a novel architecture that builds three LLM agents to convert a user's voice request (VRQ) into a resource allocation vector. Specifically, it comprises an LLM intent recognition agent that translates the request into an optimization problem (OP), an LLM OP parameter identification agent, and an LLM OP solving agent. We create a database of typical VRQs for electric vehicle (EV) charging as the basis for performance evaluation. As a proof of concept, we run experiments mainly with Llama 3 8B. Tests under different prompt-engineering scenarios demonstrate the effectiveness of the proposed architecture. The study also reveals several key insights; for example, a larger set of candidate OPs for modeling the real-world problem may degrade final performance because of higher recognition and OP classification noise. All results and code are open source for further research and use by the academic community.**|\n", "2407.00365": "|**2024-06-29**|**Financial Knowledge Large Language Model**|Cehao Yang 
et.al.|[2407.00365](http://arxiv.org/abs/2407.00365)|null|Artificial intelligence has made significant progress in finance and is reshaping how data is processed and interpreted. Among these advances, large language models (LLMs) show great potential to automate complex tasks, improve customer service, and provide detailed financial analysis. First, we introduce IDEA-FinBench, an evaluation benchmark designed to assess the financial knowledge of large language models. It draws on questions from two globally recognized and authoritative financial professional exams, with the aim of comprehensively testing the ability of LLMs to answer finance-related exam questions. Second, we propose IDEA-FinKER, a financial knowledge enhancement framework for quickly adapting general-purpose LLMs to the financial domain. It uses a retrieval-based few-shot learning method for real-time, context-level knowledge injection, and provides a set of high-quality financial knowledge instructions for fine-tuning any general-purpose model. Finally, we present IDEA-FinQA, an LLM-powered financial question-answering system. The system is built around an architecture of real-time knowledge injection and factual enhancement that makes use of external knowledge. IDEA-FinQA consists mainly of a data collector, 
a data querying module, and LLM-based agents that perform specific functions.|\n", "2407.04573": "|**2024-07-05**|**VRSD: Rethinking Similarity and Diversity for Retrieval in Large Language Models**|Hang Gao et.al.|[2407.04573](http://arxiv.org/abs/2407.04573)|null|With the rapid development of large language models (LLMs), vector retrieval algorithms have become essential for semantic queries that must satisfy both similarity and diversity requirements. Although Maximal Marginal Relevance (MMR) is widely used in retrieval scenarios involving both requirements, varying its parameter \u03bb causes the results to fluctuate and obscures the optimization path in the vector space. Moreover, a solid theoretical analysis of the similarity and diversity constraints in retrieval is currently lacking. This paper proposes a new way to characterize the two constraints through the relationship between the query vector and the sum vector. This relationship ensures similarity, while requiring the individual vectors within the sum vector to align with the query vector in a dispersed manner, which satisfies the diversity requirement. 
We also pose a new combinatorial optimization problem: select $k$ vectors from a set of candidates such that their sum vector matches the query vector as closely as possible. We prove that this problem is NP-complete, which reveals the inherent difficulty of pursuing similarity and diversity simultaneously in vector retrieval and lays a theoretical foundation for future research. In addition, we design a heuristic algorithm named Vectors Retrieval with Similarity and Diversity (VRSD), which has a clear optimization objective, needs no preset parameters, and has lower time complexity than MMR. Empirical validation shows that VRSD significantly outperforms MMR on a variety of datasets.|\n", "2407.04503": "|**2024-07-05**|**When LLMs Play the Telephone Game: Cumulative Changes and Attractors in Iterated Cultural Transmissions**|J\u00e9r\u00e9my Perez 
et.al.|[2407.04503](http://arxiv.org/abs/2407.04503)|**[link](https://github.com/jeremyperez2/telephonegamellm)**|**As large language models (LLMs) increasingly interact with one another and generate a growing share of online text, it becomes crucial to study how information changes as it passes from one LLM to the next. Although the behavior of individual LLMs has been studied in depth, collective behavior and information distortion in iterated interactions remain comparatively underexplored. Small biases that seem negligible in a single output may be amplified over repeated interactions and may drive content toward attractor states. Borrowing a method from research on human cultural evolution, the telephone game experiment, we design a transmission chain setup. In it, LLM agents receive, produce, and pass on text from the previous agent in the chain to the next. We track how the toxicity, positivity, difficulty, and length of the text evolve along the transmission chains, reveal the existence of biases and attractors, and study how they depend on the initial text, the instructions, the language model, and the model size. For example, we find that open-ended instructions 
lead to stronger attraction effects than more constrained tasks. In addition, different text properties differ in their sensitivity to attraction effects, with toxicity generally showing a stronger effect than length. These findings highlight the importance of accounting for multi-step transmission dynamics and lay a foundation for a better understanding of LLM cultural dynamics.**|\n", "2407.04363": "|**2024-07-05**|**AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents**|Petr Anokhin et.al.|[2407.04363](http://arxiv.org/abs/2407.04363)|**[link](https://github.com/airi-institute/arigraph)**|**With advances in generative AI, large language models (LLMs) show broad promise for the development of autonomous agents. Achieving true autonomy requires accumulating and updating knowledge from interactions with the environment and using that knowledge effectively. Current LLM-based approaches rely on the full observation history, summarization, or retrieval augmentation, but these unstructured memory representations do not support the reasoning and planning needed for complex decision-making. Our study proposes AriGraph, a novel method in which the agent builds a memory graph that integrates semantic and episodic memories while exploring the environment. This graph structure supports efficient retrieval of interconnected concepts that are relevant to the agent's current state 
and goals, and thus serves as an effective environment model that improves exploration and planning. Our Ariadne LLM agent, equipped with the proposed memory architecture together with planning and decision-making capabilities, handles complex tasks in the TextWorld environment on a zero-shot basis, including the cooking challenge from the First TextWorld Problems competition and new tasks such as house cleaning and treasure-hunt puzzles. Compared with established approaches such as full history, summarization, and retrieval-augmented generation, our method shows a clear advantage across these tasks.**|\n", "2407.06112": "|**2024-07-08**|**Enhancing Language Model Rationality with Bi-Directional Deliberation Reasoning**|Yadong Zhang 
et.al.|[2407.06112](http://arxiv.org/abs/2407.06112)|null|This paper proposes a novel reasoning approach, Bi-Directional Deliberation Reasoning (BIDDER), which aims to improve the rationality of language model decision-making. Conventional reasoning methods typically rely on historical information and use a unidirectional (left-to-right) reasoning strategy, which leads to insufficient awareness of potential future outcomes and inadequate integration of historical context, and therefore to suboptimal decisions. BIDDER addresses this shortcoming by incorporating principles of rational decision-making, in particular handling uncertainty and predicting expected utility. The approach has three key steps: inferring hidden states from historical data to represent the uncertain information in the decision process; using these hidden states to predict future potential states and possible outcomes; and combining historical information (past context) with long-term outcomes (future context) to guide reasoning. Through bidirectional reasoning, BIDDER can take both past and future context fully into account and so make better-informed, more rational decisions. We test BIDDER in two well-defined scenarios, poker (Limit Texas Hold'em) and negotiation, and the experiments show that it significantly improves 
the decision-making ability of language models and language-model-based agents.|\n", "2407.05890": "|**2024-07-08**|**Affordances-Oriented Planning using Foundation Models for Continuous Vision-Language Navigation**|Jiaqi Chen et.al.|[2407.05890](http://arxiv.org/abs/2407.05890)|null|Language-model-based agents have shown strong zero-shot performance on vision-language navigation (VLN) tasks. However, these methods focus only on high-level task planning, moving by selecting nodes in a predefined navigation graph, and neglect low-level control in realistic scenarios. To close this gap, we propose AO-Planner, a novel affordances-oriented planning framework for continuous VLN. AO-Planner integrates several foundation models to perform affordances-oriented motion planning and action decision-making, both in a zero-shot manner. Specifically, we use a visual affordances prompting (VAP) approach in which SAM segments the visible ground to provide navigational affordances, so that the language model can select potential next waypoints and generate a low-level path plan toward the selected waypoint. We also introduce a high-level agent, PathAgent, which identifies the most probable pixel-level path and converts it into 3D coordinates 
to carry out the low-level motion. On the challenging R2R-CE benchmark, AO-Planner achieves state-of-the-art zero-shot performance (a 5.5% improvement in SPL). Our method effectively connects language models to the 3D world, avoids the difficulty of directly predicting world coordinates, and opens new prospects for using foundation models in low-level motion control.|\n", "2407.07086": "|**2024-07-09**|**Hypothetical Minds: Scaffolding Theory of Mind for Multi-Agent Tasks with Large Language Models**|Logan Cross et.al.|[2407.07086](http://arxiv.org/abs/2407.07086)|**[link](https://github.com/locross93/hypothetical-minds)**|**Multi-agent reinforcement learning (MARL) methods struggle with the non-stationarity of multi-agent systems and with adapting through online learning. To address this, we use large language models to build an autonomous agent. Our new agent, Hypothetical 
Minds, uses a cognitively inspired architecture with modules for perception, memory, and hierarchical planning over two levels of abstraction. Its key component is a Theory of Mind module, which generates hypotheses in natural language about the strategies of other agents, then evaluates and iteratively refines them according to how well they predict the other agents' behavior. With this approach, Hypothetical Minds significantly outperforms previous LLM-agent and reinforcement learning baselines on a range of competitive, mixed-motive, and collaborative environments in the Melting Pot benchmark, in both dyadic and population-based settings. Comparative experiments also show that hypothesis evaluation and refinement are critical for success in complex scenarios.**|\n", "2407.06813": "|**2024-07-09**|**Richelieu: Self-Evolving LLM-Based Agents for AI Diplomacy**|Zhenyu Guan et.al.|[2407.06813](http://arxiv.org/abs/2407.06813)|null|In human society, 
diplomacy is an extremely complex activity that involves interactions among many parties or actors and requires a range of abilities, including social reasoning, negotiation skills, and long-term strategic planning. Previous AI agents have demonstrated their strength on multi-step games and on multi-agent tasks with large action spaces. However, the decision space involved in diplomacy is staggeringly large, particularly in the phases that require negotiation. Recently, large language models (LLMs) have shown capabilities beyond their predecessors in some applications, but they are still insufficient for long-horizon planning in complex multi-agent environments. Building on cutting-edge LLM technology, we make a first attempt to explore the upper bound of AI in such a comprehensive multi-agent mission by integrating three core capabilities to build a stronger LLM-based social agent: 1) a strategic planner with memory and reflection; 2) a goal-oriented negotiator with social reasoning; 3) memory augmentation through self-play games, enabling self-evolution without human intervention.|\n", "2407.06567": "|**2024-07-10**|**FinCon: A Synthesized LLM Multi-Agent System with Conceptual Verbal Reinforcement for Enhanced Financial Decision Making**|Yangyang Yu 
et.al.|[2407.06567](http://arxiv.org/abs/2407.06567)|null|Large language models (LLMs) have shown notable potential for carrying out complex tasks and are increasingly used in finance. However, high-quality sequential investment decision-making remains challenging: it requires repeated interaction with a changing environment to maximize returns and manage risk. Although LLM-based agent systems have been developed that can outperform human teams and deliver investment returns, how to optimize multi-source information synthesis and decision outcomes through timely experience refinement remains to be explored. To this end, we propose FinCon, an LLM-based multi-agent framework designed for diverse financial tasks and characterized by conceptual verbal reinforcement and a financial organizational structure. Drawing on the organizational structure of real-world investment firms, FinCon adopts a manager-analyst communication hierarchy that fosters collaboration among cross-functional agents toward a unified goal through natural-language interaction. Each agent has a larger memory capacity than a human, which supports more efficient information processing. In addition, FinCon introduces a risk-control component that periodically triggers a self-critiquing mechanism to update the system's investment beliefs. These conceptualized beliefs serve as verbal reinforcement that guides future behavior and can be selectively propagated to the nodes that need knowledge updates, reducing unnecessary communication costs and improving performance. FinCon generalizes well across financial tasks such as single-stock trading and portfolio management, demonstrating its potential for real-world financial applications.|\n", "2407.07791": "|**2024-07-10**|**Flooding Spread of Manipulated Knowledge in LLM-Based Multi-Agent Communities**|Tianjie Ju 
et.al.|[2407.07791](http://arxiv.org/abs/2407.07791)|**[link](https://github.com/Jometeorie/KnowledgeSpread)**|**The rapid adoption of large language models (LLMs) in multi-agent systems has highlighted their impressive capabilities in areas such as collaborative problem-solving and autonomous negotiation. However, the security of these LLM-based multi-agent systems has not been thoroughly studied, particularly with respect to the spread of manipulated knowledge. This paper investigates this critical issue by constructing a detailed threat model and a simulation environment that mirrors real-world multi-agent deployments on a trusted platform. We propose a novel two-stage attack method, consisting of Persuasiveness Injection and Manipulated Knowledge Injection, to systematically explore how manipulated knowledge (such as counterfactual and toxic knowledge) can spread without explicit prompt manipulation. Our method exploits the inherent vulnerabilities of LLMs in handling world knowledge, which attackers can use to make agents spread fabricated information unwittingly. Experiments show that our attack can successfully induce LLM-based agents to spread both kinds of manipulated knowledge during communication without significantly degrading their foundational capabilities. Furthermore, we find that these manipulations can persist through popular retrieval-augmented generation frameworks, where several benign agents may continue to be influenced by manipulated chat histories even after the interaction has ended. Our findings reveal significant security risks in LLM-based multi-agent systems and underscore the urgent need for robust defenses against the spread of manipulated knowledge, such as introducing \u201cguardian\u201d agents and advanced fact-checking tools.**|\n", "2407.08550": "|**2024-07-11**|**Incorporating Large Language Models into Production Systems for Enhanced Task Automation and Flexibility**|Yuchen Xia et.al.|[2407.08550](http://arxiv.org/abs/2407.08550)|**[link](https://github.com/yuchenxia/gpt4industrialautomation)**|This paper presents a novel approach for integrating large language models (LLMs) into automated production systems to improve task automation and flexibility. We organize production operations into a hierarchy based on the automation pyramid. Atomic operation functions are abstracted as microservices, which are invoked and executed through a dedicated digital twin system. This provides a scalable and flexible foundation for orchestrating production processes. Within the digital twin system, low-level, hardware-specific data is semantically enriched so that LLMs can interpret it and handle production planning and control tasks. When a user request arrives or a triggering event is detected, the LLMs generate a production process plan, which is then decomposed into a sequence of microservices and executed on the real-world automation system. We implemented this overall approach on a modular automation facility in our laboratory and use a concrete case study to show how LLMs handle production planning and control tasks, yielding an intuitive production environment with a higher degree of automation and flexibility. Finally, we identify the limitations that stand in the way of realizing the full potential of LLMs in autonomous systems and point out their promising benefits. Demos of this line of research are available at: https://github.com/YuchenXia/GPT4IndustrialAutomation.|\n", "2407.08213": "|**2024-07-11**|**PrefCLM: Enhancing Preference-based Reinforcement Learning with Crowdsourced Large Language Models**|Ruiqi Wang et.al.|[2407.08213](http://arxiv.org/abs/2407.08213)|null|
Preference-based reinforcement learning (PbRL) is emerging as a promising approach for teaching robots through comparative human feedback, avoiding the need for complex reward engineering. However, existing PbRL methods require a large amount of feedback, which often leads to reliance on synthetic feedback generated by scripted teachers. This in turn brings back intricate reward design and makes it hard to adapt to the distinct expectations that different users have for the same task in human-robot interaction (HRI) scenarios. To address these issues, we propose PrefCLM, a novel framework that uses large language models (LLMs) as simulated teachers in PbRL. We apply Dempster-Shafer theory to fuse the individual preferences of multiple LLM agents at the score level, making effective use of their diversity and collective intelligence. We also introduce a human-in-the-loop pipeline that supports collective refinement based on user interaction. Experiments on a range of general RL tasks show that PrefCLM performs competitively with traditional scripted teachers and excels at producing more natural and efficient robot behavior. A real-world user study (N=10) further demonstrates its ability to tailor robot behavior to individual user preferences, significantly improving user satisfaction in HRI scenarios.|\n", "2407.10718": "|**2024-07-16**|**Sibyl: Simple yet Effective Agent Framework for Complex Real-world Reasoning**|Yulong Wang et.al.|[2407.10718](http://arxiv.org/abs/2407.10718)|**[link](https://github.com/ag2s1/sibyl-system)**|**Existing agents based on large language models (LLMs) show strong problem-solving ability by combining the LLM's inherent knowledge, strong in-context learning and zero-shot capabilities, and tools together with intricately designed, human-crafted LLM invocation workflows. However, these agents remain limited in long-term reasoning and do not fully exploit the potential of existing tools, which leads to noticeable deficiencies in complex real-world reasoning scenarios. To address these limitations, we introduce Sibyl, a simple yet powerful LLM-based agent framework designed to tackle complex reasoning tasks by making efficient use of a minimal set of tools. Drawing inspiration from Global Workspace Theory, Sibyl incorporates a global workspace to improve the management and sharing of knowledge and conversation history throughout the system. In addition, guided by the Society of Mind theory, Sibyl implements a multi-agent debate-based jury that self-refines the final answer, ensuring a comprehensive and balanced approach. The approach aims to reduce system complexity while expanding the range of solvable problems, from those humans can solve in minutes to those requiring hours or even days, thereby facilitating a shift from System-1 to System-2 thinking. Sibyl is designed with scalability and ease of debugging in mind: it builds in the concept of reentrancy from functional programming from the outset, with the aim of seamless, low-effort integration into other LLM applications to improve their capabilities. Our experiments show that a Sibyl agent instantiated with GPT-4 achieves state-of-the-art performance on the GAIA benchmark test set, with an average score of 34.55%, outperforming other GPT-4-based agents. We hope that Sibyl can inspire more reliable and reusable LLM-based agent solutions for complex real-world reasoning tasks.**|\n", "2407.10580": "|**2024-07-15**|**Leveraging Hybrid Intelligence Towards Sustainable and Energy-Efficient Machine Learning**|Daniel Geissler 
et.al.|[2407.10580](http://arxiv.org/abs/2407.10580)|null|This paper presents an approach that leverages hybrid intelligence to achieve sustainable and energy-aware machine learning. When developing machine learning models, practitioners often focus only on optimizing final model performance and neglect the efficiency of the process itself. Moreover, energy efficiency has recently become equally important because of the significant environmental impact of complex, large-scale computational processes. The contribution of this work is to highlight and further address inefficiencies in the machine learning development process by integrating human-in-the-loop (HITL) interaction and large language model (LLM) agents. In short, the paper aims to make machine learning workflows more efficient and more environmentally friendly by combining human intuition and experience with the computational efficiency of AI. By introducing HITL and LLMs as assistive tools, we aim to identify and optimize bottlenecks in the machine learning development process, reducing resource consumption and promoting more sustainable AI practice. This approach not only helps improve the speed and efficiency of model training but also lowers energy consumption, with a positive effect on environmental protection.|\n", "2407.10499": "|**2024-07-15**|**CIBench: Evaluating Your LLMs with a Code Interpreter Plugin**|Songyang Zhang et.al.|[2407.10499](http://arxiv.org/abs/2407.10499)|**[link](https://github.com/open-compass/CIBench)**|**While agents based on large language models (LLMs) have made remarkable progress, benchmarking their abilities remains challenging, which hinders a clear understanding of their limitations. This paper proposes CIBench, an interactive evaluation framework for comprehensively assessing LLMs' ability to use code interpreters for data science tasks. The framework comprises an evaluation dataset and two evaluation modes. The dataset is constructed through an LLM-human cooperative approach and simulates an authentic workflow by means of consecutive, interactive IPython sessions. The two evaluation modes assess LLMs' ability with and without human assistance. We conduct extensive experiments analyzing the performance of 24 LLMs on CIBench and provide valuable insights for the future development of LLMs in code interpreter use.**|\n", "2407.10081": "|**2024-07-14**|**All Roads Lead to Rome: Unveiling the Trajectory of Recommender Systems Across the LLM Era**|Bo Chen et.al.|[2407.10081](http://arxiv.org/abs/2407.10081)|null|Recommender systems (RS) are vital for coping with information overload and delivering personalized content that meets users' diverse information needs. The rise of large language models (LLMs) offers new prospects for redefining recommender systems by drawing on their broad general knowledge and reasoning capabilities. Standing in the LLM era, we aim to integrate recommender systems into a broader picture and pave the way for more comprehensive solutions in future research. We therefore first provide a comprehensive overview of technical progress, with a particular focus on language foundation models and their applications in recommendation. We identify two evolution paths of modern recommender systems: list-wise recommendation and conversational recommendation. These two paths ultimately converge at LLM agents with superior long-term memory, reflection, and tool intelligence. Along both paths, we point out that the effectiveness of the recommended information increases while the user's acquisition cost decreases. We examine the technical features, research methodologies, and inherent challenges of each milestone, from traditional list-wise recommendation to LLM-enhanced recommendation to recommendation with LLM agents. Finally, we highlight several unresolved challenges that are crucial for the development of future personalization technologies and interfaces, and discuss future prospects.|\n", "2407.10064": "|**2024-07-14**|**Revolutionizing Bridge Operation and maintenance with LLM-based Agents: An Overview of Applications and Insights**|Xinyu-Chen 
et.al.|[2407.10064](http://arxiv.org/abs/2407.10064)|null|Across the industrial sectors that drive the development of human society, people have long sought ways to free up human labor. Building agents based on large language models (LLMs) is regarded as an effective tool for achieving this goal. As human-like intelligent entities capable of perception, planning, decision-making, and action, agents have already created substantial productive value in many fields. However, bridge operation and maintenance (O&M) has a relatively low level of intelligence compared with other industries. Nevertheless, the field has developed numerous intelligent inspection devices, machine learning algorithms, and autonomous evaluation and decision-making methods, which lay the groundwork for breakthroughs in artificial intelligence in this area. This study explores the impact of LLM-based AI agents on bridge O&M and analyzes the challenges and opportunities they may bring to its core tasks. Through in-depth research and analysis, we hope to offer a more comprehensive perspective on intelligent applications in this field.|\n", "2407.11843": "|**2024-07-16**|**InferAct: Inferring Safe Actions for LLM-Based Agents Through Preemptive Evaluation and Human Feedback**|Haishuo Fang 
et.al.|[2407.11843](http://arxiv.org/abs/2407.11843)|null|A crucial requirement for deploying agents based on large language models (LLMs) in real-world applications is robustness against risky or irreversible mistakes. However, existing research lacks preemptive evaluation of the reasoning trajectories that LLM agents carry out, leaving a gap in ensuring safe and reliable operation. To explore better solutions, this paper introduces InferAct, a novel approach that leverages the Theory-of-Mind capability of LLMs to proactively detect potential errors before critical actions are executed (for example, \u201cbuy-now\u201d in automatic online trading or web shopping). InferAct can also integrate human feedback to prevent irreversible risks and improve the actor agent's decision-making process. Experiments on three widely used tasks demonstrate the effectiveness of InferAct. The proposed solution offers a new approach and concrete contributions toward developing LLM agents that can be safely deployed in different environments involving critical decisions.|\n", "2407.11549": "|**2024-07-16**|**How Personality Traits Influence Negotiation Outcomes? 
A Simulation based on Large Language Models**|Yin Jou Huang et.al.|[2407.11549](http://arxiv.org/abs/2407.11549)|null|Psychological evidence reveals the influence of personality traits on decision-making. For instance, agreeableness is generally associated with positive outcomes in negotiations, whereas neuroticism is often linked to less favorable outcomes. This paper introduces a simulation framework built around large language model (LLM) agents endowed with synthesized personality traits. The agents negotiate within bargaining domains and have customizable personalities and objectives. The experimental results show that the behavioral tendencies observed in LLM-based simulations can reproduce behavioral patterns observed in human negotiations. The contribution is twofold. First, we propose a simulation methodology for investigating the alignment between the linguistic and economic capabilities of LLM agents. Second, we offer empirical insights into the strategic impact of the Big Five personality traits on the outcomes of bilateral negotiations. We also provide a case study based on synthesized bargaining dialogues that reveals intriguing behaviors, including deceitful and compromising behaviors.|\n", "2407.12784": "|**2024-07-17**|**AgentPoison: Red-teaming LLM 
Agents via Poisoning Memory or Knowledge Bases**|Zhaorun Chen et.al.|[2407.12784](http://arxiv.org/abs/2407.12784)|**[link](https://github.com/BillChan226/AgentPoison)**|**LLM agents have demonstrated remarkable performance across a variety of applications, primarily owing to their advanced capabilities in reasoning, using external knowledge and tools, calling APIs, and executing actions to interact with environments. Current agents typically use a memory module or a retrieval-augmented generation (RAG) mechanism to retrieve past knowledge and instances with similar embeddings from knowledge bases, which then inform task planning and execution. However, reliance on unverified knowledge bases raises significant concerns about their safety and trustworthiness. To uncover these vulnerabilities, we propose AgentPoison, a novel red-teaming approach and the first backdoor attack targeting generic and RAG-based LLM agents, which works by poisoning their long-term memory or knowledge base. Specifically, we formulate trigger generation as a constrained optimization problem that optimizes backdoor triggers by mapping the triggered instances to a unique embedding space, so that whenever a user instruction contains the optimized backdoor trigger, malicious demonstrations are retrieved from the poisoned memory or knowledge base with high probability. Meanwhile, benign instructions without the trigger still maintain normal performance. Unlike conventional backdoor attacks, AgentPoison requires no additional model training or fine-tuning, and the optimized backdoor trigger exhibits superior transferability, in-context coherence, and stealthiness. Extensive experiments demonstrate AgentPoison's effectiveness in attacking three types of real-world LLM agents: a RAG-based autonomous driving agent, a knowledge-intensive QA agent, and the healthcare EHRAgent. On each agent, AgentPoison achieves an average attack success rate above 80% with minimal impact on benign performance (below 1%) at a poison rate below 0.1%.**|\n", "2407.12979": "|**2024-07-17**|**Leveraging Environment Interaction for Automated PDDL Generation and Planning with Large Language Models**|Sadegh Mahdavi 
et.al.|[2407.12979](http://arxiv.org/abs/2407.12979)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u5404\u79cd\u81ea\u7136\u8bed\u8a00\u4efb\u52a1\u4e2d\u8868\u73b0\u51fa\u5353\u8d8a\u7684\u6027\u80fd\uff0c\u4f46\u5b83\u4eec\u5728\u9700\u8981\u7ed3\u6784\u5316\u63a8\u7406\u7684\u89c4\u5212\u95ee\u9898\u4e0a\u5f80\u5f80\u8868\u73b0\u4e0d\u4f73\u3002\u4e3a\u4e86\u514b\u670d\u8fd9\u4e00\u5c40\u9650\u6027\uff0c\u5c06\u89c4\u5212\u95ee\u9898\u8f6c\u5316\u4e3a\u89c4\u5212\u9886\u57df\u5b9a\u4e49\u8bed\u8a00\uff08PDDL\uff09\u88ab\u63d0\u51fa\u4f5c\u4e3a\u4e00\u79cd\u6f5c\u5728\u89e3\u51b3\u65b9\u6848\uff0c\u8fd9\u4f7f\u5f97\u81ea\u52a8\u5316\u89c4\u5212\u5668\u80fd\u591f\u5e94\u7528\u3002\u7136\u800c\uff0c\u751f\u6210\u51c6\u786e\u7684PDDL\u6587\u4ef6\u901a\u5e38\u9700\u8981\u4eba\u5de5\u8f93\u5165\u6216\u4fee\u6b63\uff0c\u8fd9\u65e2\u8017\u65f6\u53c8\u6210\u672c\u9ad8\u6602\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u5229\u7528LLM\u548c\u73af\u5883\u53cd\u9988\u81ea\u52a8\u751f\u6210PDDL\u9886\u57df\u548c\u95ee\u9898\u63cf\u8ff0\u6587\u4ef6\uff0c\u800c\u65e0\u9700\u4eba\u5de5\u5e72\u9884\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5f15\u5165\u4e86\u4e00\u4e2a\u8fed\u4ee3\u7ec6\u5316\u8fc7\u7a0b\uff0c\u8be5\u8fc7\u7a0b\u751f\u6210\u591a\u4e2a\u95ee\u9898PDDL\u5019\u9009\uff0c\u5e76\u6839\u636e\u4e0e\u73af\u5883\u4ea4\u4e92\u83b7\u5f97\u7684\u53cd\u9988\u9010\u6b65\u7ec6\u5316\u9886\u57dfPDDL\u3002\u4e3a\u4e86\u6307\u5bfc\u7ec6\u5316\u8fc7\u7a0b\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u63a2\u7d22\u6f2b\u6b65\uff08EW\uff09\u5ea6\u91cf\uff0c\u5b83\u4e3aLLM\u63d0\u4f9b\u4e86\u4e30\u5bcc\u7684\u53cd\u9988\u4fe1\u53f7\u6765\u66f4\u65b0PDDL\u6587\u4ef6\u3002\u6211\u4eec\u5728PDDL\u73af\u5883\u4e2d\u8bc4\u4f30\u4e86\u6211\u4eec\u7684\u65b9\u6cd5\uff0c\u5b9e\u73b0\u4e8666%\u7684\u4efb\u52a1\u89e3\u51b3\u7387\uff0c\u76f8\u6bd4\u4e4b\u4e0b\uff0c\u4f7f\u7528GPT-4\u8fdb\u884c\u5185\u5728\u89c4\u5212\u5e76\u914d\u5408\u94fe\u5f0f\u601d\u800
3\u63d0\u793a\u7684\u65b9\u6cd5\u4ec5\u5b9e\u73b0\u4e8629%\u7684\u4efb\u52a1\u89e3\u51b3\u7387\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u4f7f\u4f7f\u7528LLM\u548c\u73af\u5883\u53cd\u9988\u81ea\u52a8\u5efa\u6a21\u89c4\u5212\u73af\u5883\u6210\u4e3a\u53ef\u80fd\uff0c\u6d88\u9664\u4e86\u5728PDDL\u751f\u6210\u8fc7\u7a0b\u4e2d\u9700\u8981\u4eba\u5de5\u5e72\u9884\u7684\u9700\u6c42\uff0c\u4e3aLLM\u4ee3\u7406\u5728\u6311\u6218\u6027\u95ee\u9898\u4e0a\u7684\u66f4\u53ef\u9760\u5e94\u7528\u94fa\u5e73\u4e86\u9053\u8def\u3002|\n", "2407.12877": "|**2024-07-16**|**Review-Feedback-Reason (ReFeR): A Novel Framework for NLG Evaluation and Reasoning**|Yaswanth Narsupalli et.al.|[2407.12877](http://arxiv.org/abs/2407.12877)|null|\u8bc4\u4f30\u81ea\u7136\u8bed\u8a00\u751f\u6210\uff08NLG\uff09\u8f93\u51fa\u7684\u8d28\u91cf\uff0c\u5c24\u5176\u662f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4ea7\u751f\u7684\u8f93\u51fa\uff0c\u9762\u4e34\u7740\u5de8\u5927\u7684\u6311\u6218\u3002\u4f20\u7edf\u65b9\u6cd5\u8981\u4e48\u4f9d\u8d56\u4e8e\u8d44\u6e90\u5bc6\u96c6\u578b\u7684\u4eba\u7c7b\u8bc4\u4f30\uff0c\u8981\u4e48\u4f7f\u7528\u81ea\u52a8\u5316\u6307\u6807\uff0c\u8fd9\u4e9b\u6307\u6807\u5f80\u5f80\u4e0e\u4eba\u7c7b\u5224\u65ad\u7684\u76f8\u5173\u6027\u8f83\u4f4e\u3002\u8fd9\u9879\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aReview-Feedback-Reason\uff08ReFeR\uff09\u7684\u521b\u65b0\u8bc4\u4f30\u6846\u67b6\uff0c\u7528\u4e8e\u5229\u7528LLM\u4ee3\u7406\u8fdb\u884cNLG\u8bc4\u4f30\u3002\u6211\u4eec\u901a\u8fc7\u5728\u4e24\u4e2a\u73b0\u6709\u7684\u57fa\u51c6\u6570\u636e\u96c6\u4e0a\u5bf9ReFeR\u8fdb\u884c\u4e25\u683c\u6d4b\u8bd5\uff0c\u5728\u591a\u79cdNLG\u4efb\u52a1\u4e2d\u8fdb\u884c\u4e86\u6d4b\u8bd5\u3002 
ReFeR not only improves the accuracy of NLG evaluation, by about 20% over previous benchmarks, but also generates constructive feedback and significantly strengthens collective reasoning. This feedback is used to build instruction-tuning datasets which, when used to fine-tune smaller models such as Mistral-7B, turn them into very strong evaluators that correlate better with human evaluation and perform almost on par with GPT-3.5. The effectiveness of our approach is highlighted by its application to three reasoning benchmarks, where ReFeR outperforms most state-of-the-art methods and, on average, exceeds the reasoning ability of GPT-3.5 Turbo and GPT-4 by about 11.67% and 1%, respectively.|\n", "2407.14239": "|**2024-07-19**|**KoMA: Knowledge-driven Multi-agent Framework for Autonomous Driving with Large Language Models**|Kemou Jiang
et.al.|[2407.14239](http://arxiv.org/abs/2407.14239)|null|Large language models (LLMs) acting as autonomous agents offer a new, knowledge-driven way to tackle real-world challenges. These LLM-based approaches excel in generalization and interpretability. However, the complexity of driving tasks often requires cooperation among multiple heterogeneous agents, which underscores the need for LLM-driven agents to share knowledge cooperatively and achieve cognitive synergy. Despite the promise of LLMs, current applications focus mainly on single-agent scenarios. To broaden the scope of knowledge-driven strategies and strengthen the generalization ability of autonomous agents, we propose the KoMA framework, which consists of multi-agent interaction, multi-step planning, shared memory, and ranking-based reflection modules, and aims to improve multi-agent decision-making in complex driving scenarios. Based on the textual descriptions of driving scenarios generated by the framework, the multi-agent interaction module enables LLM agents to analyze and infer the intentions of surrounding vehicles, much like human cognition. The multi-step planning module enables LLM agents to analyze layer by layer and arrive at a final action decision, keeping short-term action decisions consistent with the goal. The shared memory module accumulates collective experience to support better decisions, while the ranking-based reflection module evaluates and improves agent behavior to increase driving safety and efficiency. The KoMA framework not only improves the robustness and adaptability of autonomous driving agents but also significantly raises their generalization ability across different scenarios. Experimental results show that our approach outperforms traditional methods in handling complex, unpredictable driving environments, particularly without the need for extensive retraining.|\n", "2407.15073": "|**2024-07-21**|**Multi-Agent Causal Discovery Using Large Language Models**|Hao Duong Le
et.al.|[2407.15073](http://arxiv.org/abs/2407.15073)|null|Large language models (LLMs) have shown great potential for causal discovery tasks by drawing on the broad expert knowledge they acquire from large text corpora. However, the multi-agent capabilities of LLMs in causal discovery remain underexplored. This paper proposes a general framework to investigate this potential. The first component is the Meta Agents Model, which relies entirely on reasoning and discussion among LLM agents to perform causal discovery. The second is the Coding Agents Model, which uses the agents' ability to plan, write, and execute code, together with advanced statistical libraries, for causal discovery. The third is the Hybrid Model, which combines the approaches of the Meta Agents Model and the Coding Agents Model, merging the statistical analysis and reasoning skills of multiple agents. By making effective use of LLM expert knowledge, reasoning ability, multi-agent cooperation, and statistical causal methods, our proposed framework shows promising results. By exploring the multi-agent potential of LLMs, we aim to lay a foundation for using multiple LLM agents to solve causality-related problems.|\n", "2407.16252": "|**2024-07-23**|**LawLuo: A Chinese Law Firm Co-run by LLM Agents**|Jingyun
Sun et.al.|[2407.16252](http://arxiv.org/abs/2407.16252)|**[link](https://github.com/nefujing/lawluo)**|**Large language models (LLMs) show great potential for providing legal consultation services to users without a legal background, thanks mainly to their strong text understanding and generation abilities. However, existing Chinese legal LLMs are limited to dialogue between a single model and the user, unlike consultation at a law firm, where multiple staff members take part. This limitation makes the consultation experience less realistic. In addition, existing Chinese legal LLMs have critical problems: (1) insufficient quality control of instruction fine-tuning data; (2) model hallucination caused by ambiguous user queries; and (3) declining instruction-following ability over multi-turn dialogue. To address these challenges, we propose LawLuo, a novel legal dialogue framework that draws on the collaborative capabilities of multiple LLM agents, each responsible for a different function, to jointly provide users with comprehensive legal consultation. We also construct two high-quality legal dialogue datasets, KINLED and MURLED, and fine-tune ChatGLM-3-6b on them. We further propose a legal query clarification algorithm called ToLC. Experimental results show that, compared with baseline LLMs such as GPT-4, LawLuo performs better on three dimensions: lawyer-like language style, usefulness of legal advice, and accuracy of legal knowledge. Our code and datasets are available at https://github.com/NEFUJing/LawLuo.**|\n", "2407.16732": "|**2024-08-03**|**PyBench: Evaluating LLM Agent on various real-world coding tasks**|Yaolun Zhang et.al.|[2407.16732](http://arxiv.org/abs/2407.16732)|**[link](https://github.com/mercury7353/pybench)**|**To address the limitations of existing benchmarks, which cover either simplified tasks or complex, specialized tasks, we introduce PyBench, a benchmark covering five main categories of real-world tasks. These tasks involve more than 10 types of files and are intended to cover everyday coding needs comprehensively. When a user poses a high-level query and provides related files, the LLM agent must reason over multiple turns by executing Python code through a code interpreter and finally produce a response that meets the user's requirements. Solving tasks in PyBench requires the agent to have a broad understanding of Python packages, strong reasoning ability, and the ability to incorporate feedback from executed code.
Our evaluation shows that current open-source LLMs struggle with these tasks. We therefore conduct analysis and experiments on four kinds of datasets, demonstrating that solving PyBench requires comprehensive abilities. Our fine-tuned 8B model, PyLlama3, achieves exciting performance on PyBench, surpassing many larger models (33B and 70B). Our benchmark, training datasets, and model are available on GitHub: [https://github.com/Mercury7353/PyBench](https://github.com/Mercury7353/PyBench)**|\n", "2407.18416": "|**2024-07-29**|**PersonaGym: Evaluating Persona Agents and LLMs**|Vinay Samuel et.al.|[2407.18416](http://arxiv.org/abs/2407.18416)|null|Persona agents, which are LLM agents that act according to an assigned persona, have shown impressive contextual response capabilities across many applications. These agents offer significant enhancements across sectors such as education, healthcare, and entertainment, because model developers can align agent responses with different user requirements, broadening the scope of agent applications. However, evaluating persona agent performance is extremely difficult, largely because of the complexity of assessing persona consistency in free-form interactions across various relevant environments. We introduce PersonaGym, the first dynamic evaluation framework for assessing persona agents, and PersonaScore, the first automated, human-aligned metric grounded in decision theory for comprehensive, large-scale evaluation of persona agents. Evaluating 6 open-source and closed-source LLMs on a benchmark of 200 personas and 10,000 questions, we find substantial room for improvement in persona agent capabilities across state-of-the-art models. For example, Claude 3.5 Sonnet improves on GPT 3.5 by only 2.97% in PersonaScore, despite being a more advanced model. Importantly, we find that increased model size and complexity do not necessarily imply better persona agent capabilities, which highlights the pressing need for algorithmic and architectural innovation toward faithful and efficient persona agents.|\n", "2407.19354": "|**2024-07-28**|**The Emerged Security and Privacy of LLM Agent: A Survey with Case Studies**|Feng He
et.al.|[2407.19354](http://arxiv.org/abs/2407.19354)|null|Inspired by the rapid development of large language models (LLMs), LLM agents have evolved to the point where they can carry out complex tasks. These agents are widely used across domains to process large volumes of data, interact with humans, and execute tasks, which underlines their commercial value. However, this also exposes security and privacy vulnerabilities. At the current stage, comprehensive research on the security and privacy of LLM agents is essential. This survey aims to provide a comprehensive overview of the newly emerged privacy and security issues that LLM agents face. We first introduce the fundamentals of LLM agents, then categorize and analyze the threats. We next discuss the impact of these threats on humans, the environment, and other agents. We then review existing defense strategies and finally explore future trends. In addition, the survey uses a variety of case studies to make the explanations easier to follow. By highlighting these critical security and privacy issues, the survey seeks to stimulate future research on strengthening the security and privacy of LLM agents, thereby increasing their reliability and trustworthiness in future applications.|\n", "2407.19056": "|**2024-07-26**|**OfficeBench: Benchmarking Language
Agents across Multiple Applications for Office Automation**|Zilong Wang et.al.|[2407.19056](http://arxiv.org/abs/2407.19056)|**[link](https://github.com/zlwang-cs/OfficeBench)**|Office automation significantly improves human productivity by automatically completing routine tasks in the workflow. Existing AI literature focuses mainly on basic information extraction, whereas office automation research should extend to more realistic office tasks that require integrating various information sources in the office system and producing outputs through a series of decision-making processes. We introduce OfficeBench, the first office automation benchmark for evaluating the ability of current large language model (LLM) agents to handle office tasks in realistic office workflows. OfficeBench requires LLM agents to perform feasible long-horizon planning, switch between applications efficiently, and accurately ground their actions within a large combined action space based on the contextual demands of the workflow. Applying our customized evaluation methods to each task, we find that GPT-4 Omni achieves a pass rate of 47.00%, showing decent performance on office tasks. However, this is still far below the human performance and accuracy standards required by real-world office workflows. Further observation shows that most problems relate to operation redundancy, hallucinations, and limitations in switching between multiple applications, which may offer valuable insights for developing effective agent frameworks for office automation.|\n", "2407.18961": "|**2024-07-30**|**MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains**|Guoli Yin et.al.|[2407.18961](http://arxiv.org/abs/2407.18961)|**[link](https://github.com/apple/axlearn)**|**Recent advances in large language models (LLMs) have increased the demand for comprehensive benchmarks that evaluate their capabilities as human-like agents. Existing benchmarks, while useful, often focus on specific application scenarios, emphasizing task completion rather than dissecting the underlying skills that drive these outcomes. This lack of granularity makes it difficult to pinpoint exactly where failures come from. In addition, setting up these environments requires considerable effort, and inconsistency and reproducibility issues sometimes arise in interactive tasks. To address these limitations, we introduce the Massive Multitask Agent Understanding (MMAU) benchmark, which features comprehensive offline tasks that require no complex environment setup. MMAU covers five domains: tool use, directed acyclic graph (DAG) QA, data science and machine learning coding, contest-level programming, and mathematics. It also covers five essential capabilities: understanding, reasoning, planning, problem-solving, and self-correction. With a total of 20 carefully designed tasks and more than 3,000 distinct prompts, MMAU provides a comprehensive framework for evaluating the strengths and limitations of LLM agents. By testing 18 representative models on MMAU, we provide deep and insightful analyses. Ultimately, MMAU not only sheds light on the capabilities and limitations of LLM agents but also improves the interpretability of their performance. The MMAU datasets and evaluation scripts are released at https://github.com/apple/axlearn/tree/main/docs/research/mmau.**|\n", "2407.20859": "|**2024-07-30**|**Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification**|Boyang Zhang
et.al.|[2407.20859](http://arxiv.org/abs/2407.20859)|null|Recently, autonomous agents built on large language models (LLMs) have made significant progress in both theoretical research and practical applications. These agents can extend the capabilities of the base LLM through external components and improve performance in multiple ways. For example, an agent built on a GPT-3.5-Turbo core may outperform the more advanced GPT-4 model on some tasks; the key point is that its integrated tools let it perform actions in the real world, moving from merely generating text to interacting with its environment. Given the wide deployment of agents in practical applications and their ability to affect the environment directly, assessing potential vulnerabilities becomes critical. If maliciously exploited, such autonomous systems could cause far greater damage than a standalone language model.
Existing studies have explored harmful behaviors that LLM agents may cause, but our study takes a new perspective and focuses on attacks that cause malfunctions, that is, attacks that mislead the agent into executing repetitive or irrelevant actions. We conduct a comprehensive evaluation using diverse attack methods, scenarios, and properties to pinpoint where agents are susceptible. Experimental results show that these attacks can induce failure rates exceeding 80% in multiple scenarios. We further implement and deploy agents in multi-agent systems to highlight the realistic risks posed by these vulnerabilities. To mitigate such attacks, we propose self-examination detection methods. However, our findings indicate that these attacks are difficult to detect effectively with LLMs alone, which underscores the substantial risk associated with this class of vulnerability.|\n", "2407.21778": "|**2024-07-31**|**Tulip Agent -- Enabling LLM-Based Agents to Solve Tasks Using Large Tool Libraries**|Felix Ocker
et.al.|[2407.21778](http://arxiv.org/abs/2407.21778)|null|We propose the tulip agent, an architecture for autonomous agents based on large language models that have Create, Read, Update, and Delete access to a tool library containing a large number of tools. Unlike current state-of-the-art implementations, the tulip agent neither encodes the descriptions of all available tools in the system prompt, which consumes the model's context window, nor embeds the entire prompt when retrieving suitable tools. Instead, the tulip agent can recursively search for suitable tools in its extensible tool library, which is implemented as a vector store. This architecture significantly reduces inference costs, allows the use of large tool libraries, and enables the agent to adapt and extend its tool set. We evaluate the architecture with several ablation studies in the mathematics domain and demonstrate its generalizability with an application in robotics. The reference implementation and the benchmark are available at github.com/HRI-EU/tulip_agent.|\n", "2407.21646": "|**2024-07-31**|**Towards Achieving Human Parity on End-to-end Simultaneous Speech Translation via LLM Agent**|Shanbo Cheng
et.al.|[2407.21646](http://arxiv.org/abs/2407.21646)|**[link](https://github.com/byteresearchcla/realsi)**|In this paper, we present Cross Language Agent -- Simultaneous Interpretation (CLASI), a high-quality, near-human-level simultaneous speech translation system. Inspired by professional interpreters, we adopt a novel data-driven read-write strategy to balance translation quality and latency. To address the challenge of translating domain-specific terminology, CLASI uses a multi-modal retrieval module to obtain relevant information that augments the translation. Supported by large language models, our approach takes the input audio, historical context, and retrieved information into account to produce error-tolerant translations. Experimental results show that our system significantly outperforms other systems on all metrics.
In line with professional human interpreters, we use a better evaluation metric, valid information proportion (VIP), which measures the amount of information successfully conveyed to listeners. In real-world scenarios, where speech is often disfluent, informal, and unclear, CLASI achieves a VIP of 81.3% and 78.0% in the Chinese-to-English and English-to-Chinese directions, respectively, whereas state-of-the-art commercial or open-source systems reach only 35.4% and 41.6%. On an extremely hard dataset, where other systems achieve a VIP below 13%, CLASI still achieves a VIP of 70%.|\n", "2408.00764": "|**2024-08-01**|**AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation**|Mengkang Hu
et.al.|[2408.00764](http://arxiv.org/abs/2408.00764)|null|Agents based on large language models (LLMs) have attracted wide attention and are becoming increasingly popular. Planning ability is a crucial component of an LLM-based agent; it involves interacting with the environment and executing actions to complete a planning task, which generally means reaching a desired goal from an initial state. This paper studies how to enhance the planning abilities of LLMs through instruction tuning, referred to as agent training. Recent studies have shown that instruction-tuning LLMs on expert-level trajectories effectively improves their planning capabilities. However, existing work focuses mainly on synthesizing trajectories from manually designed tasks and environments. Creating these environments and tasks is labor-intensive, which limits the generation of sufficiently diverse and extensive trajectories. To address this limitation, this paper explores the automated synthesis of diverse environments and of planning tasks with a gradual range of difficulty, from easy to hard. We introduce a framework named AgentGen that uses an LLM first to generate environments and then to generate planning tasks conditioned on these environments.
Specifically, to improve environmental diversity, we propose using an inspiration corpus composed of text segments from different domains as the context for synthesizing environments. Moreover, to increase the difficulty diversity of the generated planning tasks, we propose a bidirectional evolution method, Bi-Evol, which evolves planning tasks in both easier and harder directions to synthesize a task set with a smooth difficulty curve. Evaluation results from AgentBoard show that AgentGen significantly improves the planning ability of LLMs; for example, Llama-3 8B instruction-tuned with AgentGen surpasses GPT-3.5 in overall performance, and on certain tasks it even outperforms GPT-4.|\n", "2408.00523": "|**2024-08-01**|**Jailbreaking Text-to-Image Models with LLM-Based Agents**|Yingkai Dong
et.al.|[2408.00523](http://arxiv.org/abs/2408.00523)|null|Recent advances have significantly improved the automated task-solving capabilities of autonomous agents based on large language models (LLMs). However, most LLM-based agents focus on dialogue, programming, or specialized domains, leaving a gap in handling generative AI safety tasks. These gaps stem mainly from LLM hallucination and the lack of clear guidelines. This paper proposes Atlas, an advanced LLM-based multi-agent framework that integrates an efficient fuzzing workflow to target text-to-image (T2I) models, focusing in particular on jailbreak attacks against T2I models equipped with safety filters. Atlas uses a vision-language model (VLM) to assess whether a prompt triggers the safety filter of the T2I model. It then collaborates iteratively with the LLM and the VLM to generate an alternative prompt that bypasses the filter. Atlas also strengthens the reasoning ability of LLMs in attack scenarios through multi-agent communication, an in-context learning (ICL) memory mechanism, and the chain-of-thought (COT) approach.
\u6211\u4eec\u7684\u8bc4\u4f30\u8868\u660e\uff0cAtlas\u6210\u529f\u5730\u5728\u65e0\u6a21\u578b\u8bbe\u7f6e\u4e0b\u5bf9\u591a\u4e2a\u6700\u5148\u8fdb\u7684T2I\u6a21\u578b\u8fdb\u884c\u4e86\u201c\u8d8a\u72f1\u201d\uff0c\u8fd9\u4e9b\u6a21\u578b\u90fd\u914d\u5907\u4e86\u591a\u6a21\u6001\u5b89\u5168\u6027\u8fc7\u6ee4\u5668\u3002\u540c\u65f6\uff0cAtlas\u5728\u67e5\u8be2\u6548\u7387\u548c\u751f\u6210\u56fe\u50cf\u8d28\u91cf\u65b9\u9762\u5747\u8d85\u8d8a\u4e86\u73b0\u6709\u65b9\u6cd5\u3002|\n", "2408.00352": "|**2024-08-01**|**Autonomous LLM-Enhanced Adversarial Attack for Text-to-Motion**|Honglei Miao et.al.|[2408.00352](http://arxiv.org/abs/2408.00352)|null|\u6587\u672c\u5230\u52a8\u4f5c\uff08Text-to-Motion\uff0cT2M\uff09\u6a21\u578b\u901a\u8fc7\u6df1\u5ea6\u751f\u6210\u6a21\u578b\u9a71\u52a8\u7684\u4eba\u7c7b\u8fd0\u52a8\u751f\u6210\uff0c\u5728\u5e94\u7528\u4e2d\u5c55\u73b0\u51fa\u4ee4\u4eba\u4fe1\u670d\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u6a21\u578b\u4ece\u6587\u672c\u63d0\u793a\u751f\u6210\u771f\u5b9e\u52a8\u4f5c\u7684\u80fd\u529b\u5f15\u53d1\u4e86\u5b89\u5168\u95ee\u9898\uff0c\u5c24\u5176\u662f\u5f53\u5b83\u4eec\u53ef\u80fd\u88ab\u6076\u610f\u5229\u7528\u65f6\u3002\u5c3d\u7ba1\u5bf9T2M\u7684\u5174\u8da3\u65e5\u76ca\u589e\u957f\uff0c\u4f46\u5f88\u5c11\u6709\u65b9\u6cd5\u4e13\u6ce8\u4e8e\u4fdd\u62a4\u8fd9\u4e9b\u6a21\u578b\u514d\u53d7\u5bf9\u6297\u6027\u653b\u51fb\u7684\u5f71\u54cd\u3002\u73b0\u6709\u9488\u5bf9\u6587\u672c\u5230\u56fe\u50cf\u6a21\u578b\u7684\u5de5\u4f5c\u5bf9\u4e8e\u72ec\u7279\u7684\u52a8\u4f5c\u9886\u57df\u6765\u8bf4\u5e76\u4e0d\u5145\u5206\u3002 
\u5728\u672c\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aALERT-Motion\u7684\u81ea\u4e3b\u6846\u67b6\uff0c\u5b83\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u6765\u6784\u5efa\u9488\u5bf9\u9ed1\u76d2T2M\u6a21\u578b\u7684\u6709\u9488\u5bf9\u6027\u7684\u5bf9\u6297\u6027\u653b\u51fb\u3002\u4e0e\u5148\u524d\u7684\u65b9\u6cd5\u901a\u8fc7\u9884\u5b9a\u4e49\u89c4\u5219\u4fee\u6539\u63d0\u793a\u4e0d\u540c\uff0cALERT-Motion\u5229\u7528LLMs\u5bf9\u4eba\u7c7b\u52a8\u4f5c\u7684\u77e5\u8bc6\uff0c\u81ea\u4e3b\u751f\u6210\u5fae\u5999\u800c\u5f3a\u5927\u7684\u5bf9\u6297\u6027\u6587\u672c\u63cf\u8ff0\u3002\u8be5\u6846\u67b6\u5305\u542b\u4e24\u4e2a\u5173\u952e\u6a21\u5757\uff1a\u4e00\u4e2a\u9002\u5e94\u6027\u8c03\u5ea6\u6a21\u5757\uff0c\u6784\u5efa\u4e86\u4e00\u4e2a\u57fa\u4e8eLLM\u7684\u4ee3\u7406\uff0c\u4ee5\u8fed\u4ee3\u5730\u7ec6\u5316\u548c\u641c\u7d22\u5bf9\u6297\u6027\u63d0\u793a\uff1b\u4ee5\u53ca\u4e00\u4e2a\u591a\u6a21\u6001\u4fe1\u606f\u5bf9\u6bd4\u6a21\u5757\uff0c\u63d0\u53d6\u4e0e\u52a8\u4f5c\u76f8\u5173\u7684\u5173\u952e\u8bed\u4e49\u4fe1\u606f\uff0c\u6307\u5bfc\u4ee3\u7406\u7684\u641c\u7d22\u3002 
\u901a\u8fc7\u8fd9\u4e00\u57fa\u4e8eLLM\u7684\u65b9\u6cd5\uff0cALERT-Motion\u80fd\u591f\u6784\u9020\u67e5\u8be2\u53d7\u5bb3\u6a21\u578b\u4ee5\u4ea7\u751f\u4e0e\u76ee\u6807\u52a8\u4f5c\u9ad8\u5ea6\u5339\u914d\u7684\u8f93\u51fa\u7684\u5bf9\u6297\u6027\u63d0\u793a\uff0c\u540c\u65f6\u907f\u514d\u660e\u663e\u7684\u6270\u52a8\u3002\u5728\u6d41\u884c\u7684T2M\u6a21\u578b\u4e0a\u8fdb\u884c\u7684\u8bc4\u4f30\u663e\u793a\u4e86ALERT-Motion\u76f8\u5bf9\u4e8e\u5148\u524d\u65b9\u6cd5\u7684\u4f18\u8d8a\u6027\uff0c\u5176\u5bf9\u6297\u6210\u529f\u7387\u66f4\u9ad8\uff0c\u5e76\u4e14\u5bf9\u6297\u6027\u63d0\u793a\u66f4\u52a0\u9690\u853d\u3002\u8fd9\u9879\u5173\u4e8eT2M\u5bf9\u6297\u6027\u653b\u51fb\u7684\u5f00\u521b\u6027\u5de5\u4f5c\u5f3a\u8c03\u4e86\u968f\u7740\u8fd0\u52a8\u751f\u6210\u6280\u672f\u7684\u53d1\u5c55\uff0c\u5f00\u53d1\u9632\u5fa1\u63aa\u65bd\u7684\u7d27\u8feb\u6027\uff0c\u8fd9\u4fc3\u4f7f\u6211\u4eec\u8fdb\u4e00\u6b65\u7814\u7a76\u5b89\u5168\u548c\u8d1f\u8d23\u4efb\u7684\u90e8\u7f72\u3002|\n", "2408.02559": "|**2024-08-05**|**Evaluating and Enhancing LLMs Agent based on Theory of Mind in Guandan: A Multi-Player Cooperative Game under Imperfect Information**|Yauwai Yim et.al.|[2408.02559](http://arxiv.org/abs/2408.02559)|null|Large language models (LLMs) have shown success in handling simple games with imperfect information and enabling multi-agent coordination, but their ability to facilitate practical collaboration against other agents in complex, imperfect information environments, especially in a non-English environment, still needs to be explored. This study investigates the applicability of knowledge acquired by open-source and API-based LLMs to sophisticated text-based games requiring agent collaboration under imperfect information, comparing their performance to established baselines using other types of agents. 
We propose a Theory of Mind (ToM) planning technique that allows LLM agents to adapt their strategy against various adversaries using only game rules, current state, and historical context as input. An external tool was incorporated to mitigate the challenge of dynamic and extensive action spaces in this card game. Our results show that although a performance gap exists between current LLMs and state-of-the-art reinforcement learning (RL) models, LLMs demonstrate ToM capabilities in this game setting. ToM planning consistently improves their performance against opposing agents, suggesting their ability to understand the actions of allies and adversaries and establish collaboration with allies. To encourage further research and understanding, we have made our codebase openly accessible.|\n", "2408.02479": "|**2024-08-05**|**From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future**|Haolin Jin et.al.|[2408.02479](http://arxiv.org/abs/2408.02479)|null|With the rise of large language models (LLMs), researchers are increasingly exploring their applications in various vertical domains, such as software engineering. LLMs have achieved remarkable success in areas including code generation and vulnerability detection. However, they also exhibit numerous limitations and shortcomings. LLM-based agents, a novel technology with the potential for Artificial General Intelligence (AGI), combine LLMs as the core for decision-making and action-taking, addressing some of the inherent limitations of LLMs such as lack of autonomy and self-improvement. Despite numerous studies and surveys exploring the possibility of using LLMs in software engineering, a clear distinction between LLMs and LLM-based agents is still lacking. A unified standard and benchmark for qualifying an LLM solution as an LLM-based agent in its domain is still at an early stage. 
In this survey, we broadly investigate the current practice and solutions for LLMs and LLM-based agents for software engineering. In particular, we summarise six key topics: requirement engineering, code generation, autonomous decision-making, software design, test generation, and software maintenance. We review and differentiate the work of LLMs and LLM-based agents across these six topics, examining their differences and similarities in tasks, benchmarks, and evaluation metrics. Finally, we discuss the models and benchmarks used, providing a comprehensive analysis of their applications and effectiveness in software engineering. We anticipate this work will shed some light on pushing the boundaries of LLM-based agents in software engineering for future research.|\n", "2408.02232": "|**2024-08-07**|**SpecRover: Code Intent Extraction via LLMs**|Haifeng Ruan et.al.|[2408.02232](http://arxiv.org/abs/2408.02232)|null|\u672c\u6587\u63a2\u8ba8\u4e86\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4e0e\u7a0b\u5e8f\u5206\u6790\u80fd\u529b\u7ed3\u5408\u7684\u5f62\u5f0f\u4e0b\uff0c\u901a\u8fc7LLM\u4ee3\u7406\u81ea\u52a8\u6267\u884c\u7a0b\u5e8f\u6539\u8fdb\u548c\u9519\u8bef\u4fee\u590d\u7684\u9ad8\u6548\u4f4e\u8017\u5de5\u4f5c\u6d41\u7a0b\u3002\u7531\u4e8e\u7a0b\u5e8f\u6539\u8fdb\u6216\u4fee\u590d\u901a\u5e38\u9700\u8981\u660e\u786e\u671f\u671b\u7684\u884c\u4e3a\u89c4\u8303\uff0c\u56e0\u6b64\u89c4\u8303\u63a8\u65ad\u5bf9\u4e8e\u4ea7\u751f\u9ad8\u8d28\u91cf\u7684\u4ee3\u7801\u8865\u4e01\u81f3\u5173\u91cd\u8981\u3002\u672c\u7814\u7a76\u65e8\u5728\u901a\u8fc7\u5728\u8f6f\u4ef6\u9879\u76ee\u4e2d\u8fdb\u884c\u8fed\u4ee3\u4ee3\u7801\u641c\u7d22\u5e76\u914d\u5408\u89c4\u8303\u63a8\u65ad\u6765\u63a2\u7d22\u8fd9\u4e00\u9886\u57df\uff0c\u4ece\u800c\u4ece\u9879\u76ee\u7684\u7ed3\u6784\u548c\u884c\u4e3a\u4e2d\u63a8\u65ad\u51fa\u610f\u56fe\u3002\u6355\u83b7\u7684\u610f\u56fe\u5c06\u7531\u5ba1\u67e5\u8005\u4ee3\u7406\u8fdb\u884c\u5ba1\u67e5\uff0c\u4ee5\u9a8c\u8bc1\u8865\u4e01\u7684
\u6709\u6548\u6027\uff0c\u5e76\u63d0\u4f9b\u5bf9\u9a8c\u8bc1\u540e\u8865\u4e01\u4fe1\u5fc3\u5ea6\u91cf\u3002 \u6211\u4eec\u7684\u65b9\u6cd5\u201cSpecRover\u201d\uff08AutoCodeRover-v2\uff09\u5efa\u7acb\u5728\u5f00\u6e90\u7684LLM\u4ee3\u7406AutoCodeRover\u4e4b\u4e0a\u3002\u5728\u4f7f\u7528SWE-Bench\u5b8c\u6574\u96c6\u8bc4\u4f30\u65f6\uff0c\u5373\u9488\u5bf92294\u4e2aGitHub\u95ee\u9898\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u663e\u793a\u4e86\u76f8\u5bf9\u4e8eAutoCodeRover\u8d85\u8fc750%\u7684\u6548\u7387\u63d0\u5347\u3002\u4e0e\u73b0\u6709\u7684\u5f00\u6e90\u4ee3\u7406\u76f8\u6bd4\uff0c\u6211\u4eec\u7684\u5de5\u4f5c\u5728\u89e3\u51b3SWE-Bench lite\u4e2d\u7684\u5e73\u5747GitHub\u95ee\u9898\u65f6\uff0c\u6210\u672c\u4ec5\u4e3a0.65\u7f8e\u5143\u3002SpecRover\u751f\u6210\u7684\u89e3\u91ca\u80fd\u591f\u4e3a\u5f00\u53d1\u8005\u63d0\u4f9b\u66f4\u660e\u786e\u7684\u4fe1\u53f7\uff0c\u8868\u660e\u5efa\u8bae\u7684\u8865\u4e01\u53ef\u4ee5\u88ab\u6709\u4fe1\u5fc3\u5730\u63a5\u53d7\u3002 \u6b64\u5916\uff0c\u6211\u4eec\u7684\u5de5\u4f5c\u8fd8\u5f3a\u8c03\u4e86\u5373\u4f7f\u5728LLM\u65f6\u4ee3\uff0c\u81ea\u52a8\u5316\u7a0b\u5e8f\u4fee\u590d\u6280\u672f\u4e2d\u89c4\u8303\u63a8\u65ad\u7684\u91cd\u8981\u6027\u3002|\n", "2408.01725": "|**2024-08-03**|**The Drama Machine: Simulating Character Development with LLM Agents**|Liam Magee 
et.al.|[2408.01725](http://arxiv.org/abs/2408.01725)|null|\u8fd9\u7bc7\u8bba\u6587\u63a2\u8ba8\u4e86\u4f7f\u7528\u591a\u4e2a\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4ee3\u7406\u6765\u6a21\u62df\u590d\u6742\u52a8\u6001\u89d2\u8272\u5728\u620f\u5267\u6027\u573a\u666f\u4e2d\u7684\u5e94\u7528\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u201c\u620f\u5267\u673a\u5668\u201d\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u534f\u8c03\u4e86\u626e\u6f14\u4e0d\u540c\u201c\u81ea\u6211\u201d\u548c\u201c\u8d85\u6211\u201d\u5fc3\u7406\u89d2\u8272\u7684LLM\u4ee3\u7406\u4e4b\u95f4\u7684\u4e92\u52a8\u3002\u5728\u89d2\u8272\u626e\u6f14\u6a21\u62df\u4e2d\uff0c\u8fd9\u79cd\u8bbe\u8ba1\u5141\u8bb8\u5728\u76f8\u4e92\u4f5c\u7528\u7684\u5bf9\u8bdd\u548c\u4e2a\u4f53\u5185\u90e8\u72ec\u767d\u4e4b\u95f4\u53d1\u5c55\u5e73\u884c\u7684\u4ea4\u4e92\u3002 \u6211\u4eec\u5c06\u6b64\u6846\u67b6\u5e94\u7528\u4e8e\u4e24\u4e2a\u620f\u5267\u573a\u666f\u2014\u2014\u9762\u8bd5\u548c\u4fa6\u63a2\u6545\u4e8b\uff0c\u5e76\u6bd4\u8f83\u4e86\u5728\u6709\u65e0\u201c\u8d85\u6211\u201d\u5f71\u54cd\u4e0b\u89d2\u8272\u53d1\u5c55\u7684\u5dee\u5f02\u3002\u5c3d\u7ba1\u662f\u521d\u6b65\u7814\u7a76\uff0c\u4f46\u7ed3\u679c\u8868\u660e\uff0c\u8fd9\u79cd\u65b9\u6cd5\u80fd\u591f\u4ea7\u751f\u66f4\u52a0\u7ec6\u817b\u3001\u9002\u5e94\u6027\u5f3a\u7684\u6545\u4e8b\uff0c\u8fd9\u4e9b\u6545\u4e8b\u968f\u7740\u4e00\u7cfb\u5217\u5bf9\u8bdd\u56de\u5408\u7684\u53d1\u5c55\u800c\u6f14\u53d8\u3002\u6211\u4eec\u8ba8\u8bba\u4e86\u57fa\u4e8eLLM\u7684\u89d2\u8272\u626e\u6f14\u7684\u4e0d\u540c\u65b9\u5f0f\u4ee5\u53ca\u8fd9\u53ef\u80fd\u5bf9AI\u4e3b\u4f53\u6027\u7684\u6982\u5ff5\u5316\u610f\u5473\u7740\u4ec0\u4e48\u3002\u8bba\u6587\u6700\u540e\u8003\u8651\u4e86\u8fd9\u4e00\u65b9\u6cd5\u5982\u4f55\u4e3a\u601d\u8003AI\u6a21\u62df\u4e2d\u5185\u5728\u51b2\u7a81\u548c\u793e\u4f1a\u8868\u6f14\u6027\u7684\u4f5c\u7528\u63d0\u4f9b\u4e86\u53ef\u80fd\u6027\u3002|\n", "2408.01703": "|**2024-08-03**|**WaitGPT: Monitoring and Steering Conversational LLM 
Agent in Data Analysis with On-the-Fly Code Visualization**|Liwenhan Xie et.al.|[2408.01703](http://arxiv.org/abs/2408.01703)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u901a\u8fc7\u5bf9\u8bdd\u5f0f\u7528\u6237\u754c\u9762\u652f\u6301\u6570\u636e\u5206\u6790\uff0c\u4ee5OpenAI\u7684ChatGPT\uff08\u539f\u540dAdvanced Data Analysis\u6216Code Interpreter\uff09\u4e3a\u4ee3\u8868\u3002\u672c\u8d28\u4e0a\uff0cLLM\u751f\u6210\u4ee3\u7801\u4ee5\u5b8c\u6210\u5404\u79cd\u5206\u6790\u4efb\u52a1\u3002\u7136\u800c\uff0c\u76f4\u63a5\u5448\u73b0\u539f\u59cb\u4ee3\u7801\u53ef\u80fd\u4f1a\u4f7f\u903b\u8f91\u53d8\u5f97\u6a21\u7cca\uff0c\u5e76\u59a8\u788d\u7528\u6237\u9a8c\u8bc1\u3002\u4e3a\u4e86\u8d4b\u4e88\u7528\u6237\u5bf9\u7531LLM\u6267\u884c\u7684\u6570\u636e\u5206\u6790\u8fdb\u884c\u589e\u5f3a\u7406\u89e3\u4e0e\u63a7\u5236\u7684\u80fd\u529b\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\u6765\u5c06LLM\u751f\u6210\u7684\u4ee3\u7801\u8f6c\u6362\u4e3a\u5b9e\u65f6\u4ea4\u4e92\u5f0f\u7684\u53ef\u89c6\u5316\u8868\u793a\u3002\u5728\u8be5\u65b9\u6cd5\u4e2d\uff0c\u7528\u6237\u53ef\u4ee5\u5b9e\u65f6\u83b7\u5f97\u6e05\u6670\u3001\u5206\u6b65\u7684LLM\u4ee3\u7801\u53ef\u89c6\u5316\uff0c\u5141\u8bb8\u4ed6\u4eec\u7406\u89e3\u3001\u9a8c\u8bc1\u5e76\u4fee\u6539\u5206\u6790\u4e2d\u7684\u6bcf\u4e2a\u6570\u636e\u64cd\u4f5c\u3002\u6211\u4eec\u7684\u8bbe\u8ba1\u51b3\u7b56\u57fa\u4e8e\u4e00\u9879\u63a2\u7d22\u7528\u6237\u5b9e\u8df5\u4e0e\u6311\u6218\u7684\u5f62\u6210\u6027\u7814\u7a76\uff08N=8\uff09\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u540d\u4e3aWaitGPT\u7684\u539f\u578b\uff0c\u5e76\u8fdb\u884c\u4e86\u4e00\u9879\u7528\u6237\u7814\u7a76\uff08N=12\uff09\uff0c\u4ee5\u8bc4\u4f30\u5176\u53ef\u7528\u6027\u548c\u6709\u6548\u6027\u3002\u7528\u6237\u7814\u7a76\u7684\u7ed3\u679c\u8868\u660e\uff0cWaitGPT\u6709\u52a9\u4e8e\u76d1\u63a7\u548c\u5f15\u5bfc\u7531LLM\u6267\u884c\u7684\u6570\u636e\u5206\u6790\uff0c\u4f7f\u53c2\u4e0e\u8005\u80fd\u591f\u63d0\u9
ad8\u9519\u8bef\u68c0\u6d4b\u80fd\u529b\u5e76\u589e\u52a0\u5bf9\u7ed3\u679c\u7684\u6574\u4f53\u4fe1\u5fc3\u3002|\n", "2408.01667": "|**2024-08-03**|**Automated Phishing Detection Using URLs and Webpages**|Huilin Wang et.al.|[2408.01667](http://arxiv.org/abs/2408.01667)|null|### \u6458\u8981 \u672c\u6587\u9879\u76ee\u805a\u7126\u4e8e\u901a\u8fc7\u6784\u5efa\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u4ee3\u7406\u6846\u67b6\uff0c\u4ee5\u89e3\u51b3\u4f20\u7edf\u57fa\u4e8e\u53c2\u8003\u7684\u9493\u9c7c\u68c0\u6d4b\u65b9\u6cd5\u6240\u9762\u4e34\u7684\u5c40\u9650\u6027\u3002\u8be5\u6846\u67b6\u901a\u8fc7\u4e3b\u52a8\u83b7\u53d6\u548c\u5229\u7528\u5728\u7ebf\u4fe1\u606f\uff0c\u63d0\u4f9b\u4e86\u4e00\u4e2a\u52a8\u6001\u7684\u53c2\u8003\u7cfb\u7edf\uff0c\u4ece\u800c\u5b9e\u73b0\u66f4\u7cbe\u786e\u7684\u9493\u9c7c\u68c0\u6d4b\u3002\u8fd9\u4e00\u521b\u65b0\u907f\u514d\u4e86\u4f9d\u8d56\u9759\u6001\u77e5\u8bc6\u5e93\u7684\u9700\u6c42\uff0c\u663e\u8457\u63d0\u5347\u4e86\u81ea\u52a8\u5316\u5b89\u5168\u63aa\u65bd\u7684\u9002\u5e94\u6027\u548c\u6548\u7387\u3002 ### \u9879\u76ee\u6982\u8ff0 \u9879\u76ee\u62a5\u544a\u9996\u5148\u5bf9\u73b0\u6709\u89e3\u51b3\u65b9\u6848\u8fdb\u884c\u4e86\u521d\u6b65\u7814\u7a76\u548c\u95ee\u9898\u5206\u6790\uff0c\u4fc3\u4f7f\u6211\u4eec\u5f00\u53d1\u51fa\u65b0\u7684\u6846\u67b6\u3002\u6211\u4eec\u4ee5\u6a21\u62df\u7684LLM\u4ee3\u7406\u6765\u5c55\u793a\u6846\u67b6\uff0c\u5e76\u8be6\u7ec6\u9610\u8ff0\u4e86\u6784\u5efa\u6240\u9700\u7684\u6280\u672f\uff0c\u968f\u540e\u63d0\u4f9b\u4e86\u5b8c\u6574\u5b9e\u65bd\u7684\u5b9e\u4f8b\u53ca\u5b9e\u9a8c\uff0c\u7528\u4e8e\u8bc4\u4f30\u65b0\u65b9\u6cd5\u76f8\u5bf9\u4e8e\u540c\u7c7b\u89e3\u51b3\u65b9\u6848\u7684\u6027\u80fd\u3002\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u51c6\u786e\u5ea6\u4e0a\u8fbe\u5230\u4e860.945\uff0c\u76f8\u6bd4\u73b0\u6709\u89e3\u51b3\u65b9\u6848DynaPhish\u9ad8\u51fa0.445\u4e2a\u767e\u5206\u70b9\u3002 ### \u6027\u80fd\u4e0e\u5c40\u9650 
\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u672c\u6846\u67b6\u80fd\u591f\u663e\u8457\u63d0\u9ad8\u5f53\u524d\u57fa\u4e8e\u53c2\u8003\u7684\u9493\u9c7c\u68c0\u6d4b\u65b9\u6cd5\u7684\u6709\u6548\u6027\uff0c\u5e76\u5177\u6709\u9002\u5e94\u5b9e\u9645\u5e94\u7528\u7684\u6f5c\u529b\u3002\u540c\u65f6\uff0c\u6211\u4eec\u4e5f\u8ba8\u8bba\u4e86\u8be5\u65b9\u6cd5\u7684\u5c40\u9650\u6027\uff0c\u5e76\u63d0\u51fa\u4e86\u6539\u8fdb\u7b56\u7565\uff0c\u65e8\u5728\u8fdb\u4e00\u6b65\u63d0\u5347\u5176\u6548\u80fd\u3002 ### \u7ed3\u8bba \u63d0\u51fa\u7684\u6846\u67b6\u4e3a\u589e\u5f3a\u73b0\u6709\u7684\u57fa\u4e8e\u53c2\u8003\u7684\u9493\u9c7c\u68c0\u6d4b\u624b\u6bb5\u63d0\u4f9b\u4e86\u6709\u6548\u9014\u5f84\uff0c\u5e76\u4e14\u5177\u5907\u88ab\u5e94\u7528\u4e8e\u5b9e\u9645\u573a\u666f\u7684\u53ef\u80fd\u6027\u3002|\n", "2408.03910": "|**2024-08-11**|**CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases**|Xiangyan Liu et.al.|[2408.03910](http://arxiv.org/abs/2408.03910)|**[link](https://github.com/modelscope/modelscope-agent)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u8bf8\u5982HumanEval\u548cMBPP\u7684\u72ec\u7acb\u4ee3\u7801\u4efb\u52a1\u4e2d\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5b83\u4eec\u5728\u5904\u7406\u6574\u4e2a\u4ee3\u7801\u4ed3\u5e93\u65f6\u5b58\u5728\u6311\u6218\u3002\u8fd9\u4fc3\u4f7f\u7814\u7a76\u754c\u63a2\u7d22\u5982\u4f55\u5728\u4ed3\u5e93\u7ea7\u522b\u4e0a\u589e\u5f3aLLM\u4e0e\u4ee3\u7801\u5e93\u7684\u4ea4\u4e92\u3002\u76ee\u524d\u7684\u89e3\u51b3\u65b9\u6848\u4f9d\u8d56\u4e8e\u57fa\u4e8e\u76f8\u4f3c\u6027\u7684\u68c0\u7d22\u6216\u624b\u52a8\u5de5\u5177\u548cAPI\uff0c\u6bcf\u79cd\u65b9\u6cd5\u90fd\u6709\u5176\u663e\u8457\u7684\u7f3a\u70b9\u3002\u57fa\u4e8e\u76f8\u4f3c\u6027\u7684\u68c0\u7d22\u5728\u590d\u6742\u4efb\u52a1\u4e2d\u53ec\u56de\u7387\u5f80\u5f80\u8f83\u4f4e\uff0c\u800c\u624b\u52a8\u5de5\u5177\u548cAPI\u901a\u5e38\u9488\u5bf9\u7279\u5b9a\u4efb\u52a1\uff0c\u9700\u8981\u4e13\u5bb6\u77e5\u8bc6\uff0c\u964d\
u4f4e\u4e86\u5b83\u4eec\u5728\u4e0d\u540c\u4ee3\u7801\u4efb\u52a1\u548c\u5b9e\u9645\u5e94\u7528\u4e2d\u7684\u901a\u7528\u6027\u3002\u4e3a\u4e86\u7f13\u89e3\u8fd9\u4e9b\u9650\u5236\uff0c\u6211\u4eec\u5f15\u5165\u4e86CodexGraph\u7cfb\u7edf\uff0c\u5b83\u7ed3\u5408\u4e86\u4ece\u4ee3\u7801\u4ed3\u5e93\u4e2d\u63d0\u53d6\u7684\u56fe\u6570\u636e\u5e93\u63a5\u53e3\u4e0eLLM\u4ee3\u7406\u3002\u901a\u8fc7\u5229\u7528\u56fe\u6570\u636e\u5e93\u7684\u7ed3\u6784\u7279\u6027\u548c\u56fe\u67e5\u8be2\u8bed\u8a00\u7684\u7075\u6d3b\u6027\uff0cCodexGraph\u4f7fLLM\u4ee3\u7406\u80fd\u591f\u6784\u5efa\u5e76\u6267\u884c\u67e5\u8be2\uff0c\u4ece\u800c\u5b9e\u73b0\u7cbe\u786e\u7684\u3001\u4ee3\u7801\u7ed3\u6784\u610f\u8bc6\u7684\u4e0a\u4e0b\u6587\u68c0\u7d22\u548c\u4ee3\u7801\u5bfc\u822a\u3002\u6211\u4eec\u4f7f\u7528\u4e09\u4e2a\u57fa\u51c6\u6d4b\u8bd5CodexGraph\uff1aCrossCodeEval\u3001SWE-bench\u548cEvoCodeBench\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e94\u4e2a\u771f\u5b9e\u4e16\u754c\u7684\u7f16\u7801\u5e94\u7528\u3002\u901a\u8fc7\u4f7f\u7528\u7edf\u4e00\u7684\u56fe\u6570\u636e\u5e93\u6a21\u5f0f\uff0cCodexGraph\u5728\u5b66\u672f\u548c\u5b9e\u9645\u73af\u5883\u4e2d\u90fd\u5c55\u793a\u4e86\u7ade\u4e89\u529b\u548c\u6f5c\u529b\uff0c\u8bc1\u660e\u4e86\u5176\u5728\u8f6f\u4ef6\u5de5\u7a0b\u9886\u57df\u7684\u591a\u7528\u9014\u6027\u548c\u6709\u6548\u6027\u3002\u6211\u4eec\u7684\u5e94\u7528\u6f14\u793a\uff1ahttps://github.com/modelscope/modelscope-agent/tree/master/apps/codexgraph_agent\u3002**|\n", "2408.03631": "|**2024-08-07**|**Large Language Models for Base Station Siting: Intelligent Deployment based on Prompt or Agent**|Yanhu Wang 
et.al.|[2408.03631](http://arxiv.org/abs/2408.03631)|null|\u4f20\u7edf\u7684\u57fa\u7ad9\u9009\u5740\uff08BSS\uff09\u65b9\u6cd5\u4e3b\u8981\u4f9d\u8d56\u4e8e\u9a7e\u9a76\u6d4b\u8bd5\u548c\u7528\u6237\u53cd\u9988\uff0c\u8fd9\u65e2\u8d39\u65f6\u53c8\u9700\u8981\u5728\u901a\u4fe1\u3001\u7f51\u7edc\u548c\u4f18\u5316\u65b9\u9762\u5177\u5907\u4e13\u4e1a\u77e5\u8bc6\u7684\u4e13\u5bb6\u3002\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u53ca\u5176\u76f8\u5173\u6280\u672f\u7684\u53d1\u5c55\uff0c\u7279\u522b\u662f\u5728\u63d0\u793a\u5de5\u7a0b\u548c\u4ee3\u7406\u5de5\u7a0b\u9886\u57df\uff0c\u7f51\u7edc\u4f18\u5316\u5c06\u89c1\u8bc1\u4e00\u573a\u9769\u547d\u6027\u7684\u8f6c\u53d8\u3002\u8fd9\u79cd\u8f6c\u53d8\u6d89\u53ca\u5de7\u5999\u5730\u4f7f\u7528\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u63d0\u793a\u6765\u5411\u8fd9\u4e9b\u590d\u6742\u800c\u5148\u8fdb\u7684LLMs\u6ce8\u5165\u4eba\u7c7b\u7ecf\u9a8c\u548c\u77e5\u8bc6\uff0c\u5e76\u901a\u8fc7\u81ea\u7136\u8bed\u8a00\u8fde\u63a5\u5230\u4eba\u7c7b\u7528\u6237\uff0c\u90e8\u7f72\u81ea\u4e3b\u4ee3\u7406\u4f5c\u4e3a\u901a\u4fe1\u6865\u6881\u3002\u8fd9\u79cd\u96c6\u6210\u4ee3\u8868\u4e86\u4eba\u5de5\u667a\u80fd\uff08AI\uff09\u4f5c\u4e3a\u4e00\u79cd\u670d\u52a1\u548cAI\u4f7f\u751f\u6d3b\u66f4\u4fbf\u6377\u7684\u672a\u6765\u8303\u5f0f\u3002 
\u4f5c\u4e3a\u521d\u6b65\u63a2\u7d22\uff0c\u672c\u7814\u7a76\u9996\u5148\u5f00\u53d1\u4e86\u4e00\u4e2a\u7531LLM\u9a71\u52a8\u7684BSS\u4f18\u5316\u6846\u67b6\uff0c\u5e76\u63d0\u51fa\u4e86\u56db\u79cd\u6f5c\u5728\u7684\u5b9e\u73b0\u7b56\u7565\uff1a\u57fa\u4e8e\u4f18\u5316\u63d0\u793a\u7684LLM\uff08PoL\uff09\u3001\u4eba\u673a\u4ea4\u4e92\u7684LLM\uff08HiLL\uff09\u3001LLM\u9a71\u52a8\u7684\u81ea\u4e3bBSS\u4ee3\u7406\uff08LaBa\uff09\u4ee5\u53ca\u534f\u540c\u591a\u4e2aLLM\u9a71\u52a8\u7684\u81ea\u4e3bBSS\u4ee3\u7406\uff08CLaBa\uff09\u3002\u901a\u8fc7\u5728\u771f\u5b9e\u6570\u636e\u4e0a\u7684\u8bc4\u4f30\uff0c\u5b9e\u9a8c\u8868\u660e\uff0c\u501f\u52a9\u63d0\u793a\u7684LLM\u548c\u57fa\u4e8e\u4ee3\u7406\u7684LLM\u80fd\u591f\u751f\u6210\u66f4\u4e3a\u9ad8\u6548\u3001\u6210\u672c\u6548\u76ca\u9ad8\u4e14\u53ef\u9760\u7684\u7f51\u7edc\u90e8\u7f72\uff0c\u663e\u8457\u63d0\u9ad8\u4e86BSS\u4f18\u5316\u7684\u6548\u7387\u5e76\u51cf\u5c11\u4e86\u4e0d\u5fc5\u8981\u7684\u624b\u52a8\u53c2\u4e0e\u3002|\n", "2408.04168": "|**2024-08-08**|**Perceive, Reflect, and Plan: Designing LLM Agent for Goal-Directed City Navigation without Instructions**|Qingbin Zeng 
et.al.|[2408.04168](http://arxiv.org/abs/2408.04168)|**[link](https://github.com/hiyouga/llama-factory)**|\u672c\u6587\u63a2\u8ba8\u4e86\u57ce\u5e02\u5bfc\u822a\u573a\u666f\u4e0b\u7684AI\u4ee3\u7406\u95ee\u9898\uff1a\u63d0\u4f9b\u76ee\u6807\u4f4d\u7f6e\u4e0e\u77e5\u540d\u5730\u6807\u4e4b\u95f4\u7684\u8bed\u8a00\u63cf\u8ff0\uff1b\u4ec5\u901a\u8fc7\u89c2\u5bdf\u5468\u56f4\u73af\u5883\uff0c\u5305\u62ec\u8bc6\u522b\u5730\u6807\u548c\u9053\u8def\u7f51\u7edc\u8fde\u63a5\uff0c\u4ee3\u7406\u9700\u8981\u4f5c\u51fa\u51b3\u7b56\u4ee5\u65e0\u6307\u793a\u5730\u5bfc\u822a\u81f3\u76ee\u6807\u4f4d\u7f6e\u3002\u8fd9\u4e00\u6311\u6218\u6027\u5728\u4e8e\uff0c\u5b83\u8981\u6c42\u4ee3\u7406\u5efa\u7acb\u81ea\u8eab\u5b9a\u4f4d\u5e76\u83b7\u53d6\u590d\u6742\u57ce\u5e02\u73af\u5883\u7684\u7a7a\u95f4\u8868\u793a\uff0c\u800c\u5730\u6807\u5f80\u5f80\u4e0d\u53ef\u89c1\u3002\u5728\u7f3a\u4e4f\u5bfc\u822a\u6307\u4ee4\u7684\u60c5\u51b5\u4e0b\uff0c\u8fd9\u79cd\u80fd\u529b\u5bf9\u4e8e\u4ee3\u7406\u5728\u957f\u8ddd\u79bb\u57ce\u5e02\u5bfc\u822a\u4e2d\u505a\u51fa\u9ad8\u8d28\u91cf\u51b3\u7b56\u81f3\u5173\u91cd\u8981\u3002\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u63a8\u7406\u80fd\u529b\u7684\u6d8c\u73b0\uff0c\u4e00\u4e2a\u5438\u5f15\u4eba\u7684\u57fa\u7840\u65b9\u6cd5\u662f\u63d0\u793aLLMs\u5bf9\u6bcf\u6b21\u89c2\u5bdf\u505a\u51fa\u201c\u53cd\u5e94\u201d\u5e76\u636e\u6b64\u4f5c\u51fa\u51b3\u7b56\u3002\u7136\u800c\uff0c\u8fd9\u79cd\u65b9\u6cd5\u7684\u6027\u80fd\u975e\u5e38\u5dee\uff0c\u4ee3\u7406\u7ecf\u5e38\u53cd\u590d\u8bbf\u95ee\u76f8\u540c\u4f4d\u7f6e\uff0c\u5e76\u4f5c\u51fa\u77ed\u89c6\u3001\u4e0d\u4e00\u81f4\u7684\u51b3\u7b56\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u672c\u6587\u5f15\u5165\u4e86\u4e00\u79cd\u65b0\u578b\u7684\u4ee3\u7406\u5de5\u4f5c\u6d41\u7a0b\uff0c\u5176\u7279\u5f81\u5728\u4e8e\u611f\u77e5\u3001\u53cd\u601d\u548c\u89c4\u5212\u7684\u80fd\u529b\u3002\u5177\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u53d1\u73b0\u7ecf\u8fc7\u5fae\u8c03\u7684LLaVA-7B\u80
fd\u591f\u51c6\u786e\u611f\u77e5\u5730\u6807\u7684\u65b9\u5411\u548c\u8ddd\u79bb\uff0c\u9002\u7528\u4e8e\u57ce\u5e02\u5bfc\u822a\u3002\u6b64\u5916\uff0c\u901a\u8fc7\u8bb0\u5fc6\u673a\u5236\u5b9e\u73b0\u53cd\u601d\uff0c\u5373\u5b58\u50a8\u8fc7\u5f80\u7ecf\u9a8c\u5e76\u5728\u5f53\u524d\u611f\u77e5\u4e0b\u68c0\u7d22\uff0c\u4ee5\u8fdb\u884c\u6709\u6548\u7684\u51b3\u7b56\u8bba\u8bc1\u3002\u89c4\u5212\u5219\u5229\u7528\u53cd\u601d\u7ed3\u679c\u751f\u6210\u957f\u671f\u8ba1\u5212\uff0c\u4ece\u800c\u907f\u514d\u957f\u8ddd\u79bb\u5bfc\u822a\u4e2d\u7684\u77ed\u89c6\u51b3\u7b56\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u8bbe\u8ba1\u7684\u5de5\u4f5c\u6d41\u7a0b\u663e\u8457\u63d0\u9ad8\u4e86LLM\u4ee3\u7406\u7684\u5bfc\u822a\u80fd\u529b\uff0c\u76f8\u8f83\u4e8e\u6700\u5148\u8fdb\u7684\u57fa\u7ebf\u65b9\u6cd5\u3002|\n", "2408.06318": "|**2024-08-12**|**Can We Rely on LLM Agents to Draft Long-Horizon Plans? Let's Take TravelPlanner as an Example**|Yanan Chen et.al.|[2408.06318](http://arxiv.org/abs/2408.06318)|null|\u672c\u6587\u65e8\u5728\u586b\u8865\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u81ea\u4e3b\u4ee3\u7406\u4e0e\u4eba\u5de5\u901a\u7528\u667a\u80fd\uff08AGI\uff09\u63a5\u8fd1\u8fc7\u7a0b\u4e2d\u7814\u7a76\u7684\u7a7a\u767d\u3002\u5c3d\u7ba1LLM\u5c55\u73b0\u51fa\u51fa\u8272\u7684\u6cdb\u5316\u80fd\u529b\u548c\u6d8c\u73b0\u80fd\u529b\uff0c\u4f46\u76ee\u524d\u7f3a\u4e4f\u5bf9LLM\u9a71\u52a8\u7684\u4ee3\u7406\u884c\u4e3a\u3001\u6f5c\u5728\u5931\u8d25\u539f\u56e0\u4ee5\u53ca\u5982\u4f55\u63d0\u5347\u5176\u6027\u80fd\u7684\u7814\u7a76\uff0c\u5c24\u5176\u662f\u5728\u5177\u6709\u6311\u6218\u6027\u7684\u73b0\u5b9e\u4e16\u754c\u89c4\u5212\u4efb\u52a1\u4e2d\u7684\u8868\u73b0\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7f3a\u53e3\uff0c\u6211\u4eec\u5229\u7528\u4e86\u4e00\u4e2a\u540d\u4e3aTravelPlanner\u7684\u771f\u5b9e\u57fa\u51c6\uff0c\u5176\u4e2d\u7684\u4ee3\u7406\u5fc5\u987b\u6ee1\u8db3\u591a\u4e2a\u7ea6\u675f\u4ee5\u751f\u6210\u51c6\u786e\u7684\u8ba1\u5212\u3002\u
901a\u8fc7TravelPlanner\u57fa\u51c6\uff0c\u6211\u4eec\u9488\u5bf9\u56db\u4e2a\u5173\u952e\u7814\u7a76\u95ee\u9898\u8fdb\u884c\u4e86\u5168\u9762\u7684\u5b9e\u9a8c\uff1a\uff081\uff09LLM\u4ee3\u7406\u5728\u5904\u7406\u957f\u7bc7\u548c\u5608\u6742\u4e0a\u4e0b\u6587\u65f6\uff0c\u5bf9\u4e8e\u63a8\u7406\u548c\u89c4\u5212\u7684\u9c81\u68d2\u6027\u662f\u5426\u8db3\u591f\uff1f\uff082\uff09\u5c11\u91cf\u63d0\u793a\u662f\u5426\u4f1a\u635f\u5bb3LLM\u4ee3\u7406\u5728\u957f\u4e0a\u4e0b\u6587\u573a\u666f\u4e0b\u7684\u6027\u80fd\uff1f\uff083\uff09\u6211\u4eec\u80fd\u5426\u4f9d\u8d56\u7ec6\u5316\u6765\u6539\u8fdb\u8ba1\u5212\uff1f\uff084\uff09\u5bf9LLM\u8fdb\u884c\u6b63\u8d1f\u53cd\u9988\u7ed3\u5408\u7684\u5fae\u8c03\u662f\u5426\u80fd\u5e26\u6765\u8fdb\u4e00\u6b65\u7684\u63d0\u5347\uff1f \u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff1a\u9996\u5148\uff0c\u5c3d\u7ba1LLM\u80fd\u591f\u5904\u7406\u5927\u91cf\u7684\u53c2\u8003\u4fe1\u606f\u548c\u5c11\u91cf\u793a\u4f8b\uff0c\u5b83\u4eec\u5728\u5173\u6ce8\u957f\u4e0a\u4e0b\u6587\u4e2d\u5173\u952e\u90e8\u5206\u7684\u80fd\u529b\u4e0a\u4ecd\u7136\u5b58\u5728\u4e0d\u8db3\uff1b\u5176\u6b21\uff0c\u5b83\u4eec\u5728\u5206\u6790\u957f\u8ba1\u5212\u65b9\u9762\u4ecd\u9762\u4e34\u6311\u6218\uff0c\u5e76\u4e14\u65e0\u6cd5\u63d0\u4f9b\u51c6\u786e\u7684\u53cd\u9988\u7528\u4e8e\u7ec6\u5316\uff1b\u7b2c\u4e09\uff0c\u6211\u4eec\u63d0\u51fa\u4e86Feedback-Aware Fine-Tuning\uff08FAFT\uff09\uff0c\u4e00\u79cd\u5229\u7528\u6b63\u8d1f\u53cd\u9988\u76f8\u7ed3\u5408\u7684\u65b9\u6cd5\uff0c\u76f8\u8f83\u4e8e\u7eaf\u76d1\u7763\u5fae\u8c03\uff08SFT\uff09\uff0cFAFT\u5728\u6027\u80fd\u4e0a\u53d6\u5f97\u4e86\u663e\u8457\u63d0\u5347\u3002\u6211\u4eec\u7684\u53d1\u73b0\u4e3a\u793e\u533a\u63d0\u4f9b\u4e86\u5173\u4e8e\u73b0\u5b9e\u4e16\u754c\u89c4\u5212\u5e94\u7528\u65b9\u9762\u7684\u6df1\u5165\u89c1\u89e3\u3002|\n", "2408.05346": "|**2024-08-13**|**DataNarrative: Automated Data-Driven Storytelling with Visualizations and Texts**|Mohammed Saidul Islam 
et.al.|[2408.05346](http://arxiv.org/abs/2408.05346)|**[link](https://github.com/saidul-islam98/DataNarrative)**|\u6570\u636e\u9a71\u52a8\u7684\u6545\u4e8b\u53d9\u8ff0\u662f\u4e00\u79cd\u5f3a\u5927\u7684\u65b9\u6cd5\uff0c\u901a\u8fc7\u7ed3\u5408\u53d9\u4e8b\u6280\u5de7\u4e0e\u53ef\u89c6\u5316\u548c\u6587\u672c\uff0c\u6765\u4f20\u8fbe\u89c1\u89e3\u3002\u8fd9\u4e9b\u6545\u4e8b\u878d\u5408\u4e86\u56fe\u8868\u4e2d\u7684\u7a81\u51fa\u6761\u5f62\u548c\u7ebf\u6761\u4ee5\u53ca\u89e3\u91ca\u89c1\u89e3\u7684\u6587\u672c\u6ce8\u91ca\u3002\u7136\u800c\uff0c\u521b\u5efa\u8fd9\u6837\u7684\u6545\u4e8b\u9700\u8981\u5bf9\u6570\u636e\u6709\u6df1\u5165\u7684\u7406\u89e3\uff0c\u5e76\u4e14\u9700\u8981\u7cbe\u5fc3\u7684\u53d9\u4e8b\u89c4\u5212\uff0c\u901a\u5e38\u9700\u8981\u4eba\u7c7b\u7684\u4ecb\u5165\uff0c\u8fd9\u65e2\u8017\u65f6\u53c8\u8d39\u5fc3\u3002\u867d\u7136\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5404\u79cdNLP\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5728\u751f\u6210\u8fde\u8d2f\u548c\u5168\u9762\u7684\u6570\u636e\u6545\u4e8b\u65b9\u9762\u7684\u6f5c\u529b\u4ecd\u7136\u672a\u88ab\u5145\u5206\u63a2\u7d22\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u4e2a\u65b0\u7684\u4efb\u52a1\u2014\u2014\u6570\u636e\u6545\u4e8b\u751f\u6210\uff0c\u5e76\u63d0\u4f9b\u4e86\u4e00\u4e2a\u5305\u542b\u6765\u81ea\u4e0d\u540c\u6765\u6e90\u76841,449\u4e2a\u6545\u4e8b\u7684\u57fa\u51c6\u3002\u4e3a\u4e86\u5e94\u5bf9\u521b\u9020\u8fde\u8d2f\u6570\u636e\u6545\u4e8b\u7684\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u591a\u4ee3\u7406\u6846\u67b6\uff0c\u5229\u7528\u4e24\u4e2aLLM\u4ee3\u7406\u6765\u6a21\u4eff\u4eba\u7c7b\u8bb2\u6545\u4e8b\u7684\u8fc7\u7a0b\uff1a\u4e00\u4e2a\u7528\u4e8e\u7406\u89e3\u5e76\u63cf\u8ff0\u6570\u636e\u3001\u751f\u6210\u5927\u7eb2\u548c\u53d9\u8ff0\uff0c\u53e6\u4e00\u4e2a\u5219\u5728\u6bcf\u4e2a\u4e2d\u95f4\u6b65\u9aa4\u8fdb\u884c\u9a8c\u8bc1\u3002\u5c3d\u7ba1\u6211\u4eec\u7684\u4ee3\u7406\u6846\u67b6\u5728\u57fa\u4e8e\u6a21
\u578b\u548c\u4eba\u7c7b\u8bc4\u4f30\u4e2d\u901a\u5e38\u4f18\u4e8e\u975e\u4ee3\u7406\u5bf9\u624b\uff0c\u4f46\u7ed3\u679c\u4e5f\u63ed\u793a\u4e86\u6570\u636e\u6545\u4e8b\u751f\u6210\u7684\u72ec\u7279\u6311\u6218\u3002|\n", "2408.07060": "|**2024-08-13**|**Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents**|Kexun Zhang et.al.|[2408.07060](http://arxiv.org/abs/2408.07060)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4ee3\u7406\u5728\u89e3\u51b3\u5b9e\u9645\u4e16\u754c\u8f6f\u4ef6\u5de5\u7a0b\uff08SWE\uff09\u95ee\u9898\u65b9\u9762\u5c55\u73b0\u51fa\u5de8\u5927\u7684\u6f5c\u529b\u3002\u6700\u5148\u8fdb\u5f00\u6e90\u7684SWE\u4ee3\u7406\u80fd\u591f\u89e3\u51b3SWE-Bench Lite\u4e2d\u8d85\u8fc727%\u7684\u5b9e\u9645GitHub\u95ee\u9898\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u590d\u6742\u7684\u4ee3\u7406\u6846\u67b6\u5728\u8868\u73b0\u4e0a\u5b58\u5728\u5dee\u5f02\uff0c\u6709\u7684\u5728\u7279\u5b9a\u4efb\u52a1\u4e2d\u8868\u73b0\u51fa\u8272\uff0c\u5728\u5176\u4ed6\u4efb\u52a1\u4e2d\u5219\u8868\u73b0\u4e0d\u4f73\u3002\u4e3a\u4e86\u5145\u5206\u5229\u7528\u8fd9\u4e9b\u4ee3\u7406\u7684\u591a\u6837\u6027\uff0c\u6211\u4eec\u63d0\u51fa\u4e86DEI\uff08\u591a\u5143\u5316\u667a\u80fd\uff09\uff0c\u4e00\u4e2a\u65e8\u5728\u5229\u7528\u5176\u72ec\u7279\u4e13\u957f\u7684\u6846\u67b6\u3002DEI\u4f5c\u4e3a\u73b0\u6709SWE\u4ee3\u7406\u6846\u67b6\u4e4b\u4e0a\u7684\u5143\u6a21\u5757\uff0c\u7ba1\u7406\u4ee3\u7406\u96c6\u4f53\u4ee5\u5b9e\u73b0\u589e\u5f3a\u7684\u95ee\u9898\u89e3\u51b3\u80fd\u529b\u3002 \u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u901a\u8fc7DEI\u6307\u5bfc\u7684\u4ee3\u7406\u59d4\u5458\u4f1a\u80fd\u591f\u663e\u8457\u8d85\u8d8a\u5355\u4e2a\u4ee3\u7406\u7684\u6700\u4f73\u6027\u80fd\u3002\u4f8b\u5982\uff0c\u4e00\u7ec4\u5f00\u6e90SWE\u4ee3\u7406\uff0c\u5176\u6700\u9ad8\u4e2a\u4f53\u89e3\u51b3\u7387\u5728SWE-Bench 
Lite\u4e2d\u4e3a27.3%\uff0c\u5728\u5e94\u7528\u4e86DEI\u540e\uff0c\u80fd\u591f\u8fbe\u523034.3%\u7684\u89e3\u51b3\u7387\uff0c\u5b9e\u73b0\u4e8625%\u7684\u6539\u8fdb\uff0c\u5e76\u51fb\u8d25\u4e86\u8bb8\u591a\u95ed\u6e90\u89e3\u51b3\u65b9\u6848\u3002\u6211\u4eec\u7684\u6700\u4f73\u8868\u73b0\u56e2\u961f\u4ee555%\u7684\u89e3\u51b3\u7387\u5728SWE-Bench Lite\u4e2d\u53d6\u5f97\u6700\u9ad8\u6392\u540d\u3002\u6211\u4eec\u7684\u7814\u7a76\u7ed3\u679c\u5bf9\u5408\u4f5cAI\u7cfb\u7edf\u7684\u7814\u7a76\u9886\u57df\u505a\u51fa\u4e86\u8d21\u732e\uff0c\u63ed\u793a\u4e86\u5b83\u4eec\u5728\u89e3\u51b3\u590d\u6742\u8f6f\u4ef6\u5de5\u7a0b\u6311\u6218\u65b9\u9762\u7684\u6f5c\u529b\u3002|\n", "2408.06520": "|**2024-08-12**|**Hierarchical in-Context Reinforcement Learning with Hindsight Modular Reflections for Planning**|Chuanneng Sun et.al.|[2408.06520](http://arxiv.org/abs/2408.06520)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u5404\u79cd\u8bed\u8a00\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u60ca\u4eba\u7684\u80fd\u529b\uff0c\u8fd9\u4f7f\u5b83\u4eec\u6210\u4e3a\u673a\u5668\u4eba\u51b3\u7b56\u7684\u6709\u5e0c\u671b\u5019\u9009\u8005\u3002\u53d7\u5230\u5c42\u6b21\u5f3a\u5316\u5b66\u4e60\uff08HRL\uff09\u7684\u542f\u53d1\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u6846\u67b6\u2014\u2014\u5728\u4e0a\u4e0b\u6587\u4e2d\u8fdb\u884c\u5c42\u6b21\u5316\u7684\u5f3a\u5316\u5b66\u4e60\uff08HCRL\uff09\u3002\u8be5\u6846\u67b6\u901a\u8fc7LLM\u57fa\u9ad8\u5c42\u7b56\u7565\u5206\u89e3\u590d\u6742\u4efb\u52a1\uff0c\u5373\u901a\u8fc7\u5728\u6267\u884c\u65f6\u52a8\u6001\u5206\u89e3\u590d\u6742\u4efb\u52a1\u4e3a\u5b50\u4efb\u52a1\uff0c\u4ece\u800c\u5229\u7528\u9ad8\u9636\u7b56\u7565\u6765\u5b9a\u4e49\u76ee\u6807\uff0c\u8fd9\u4e9b\u76ee\u6807\u7531\u5b50\u4efb\u52a1\u7ec4\u6210\uff0c\u5e76\u5206\u914d\u7ed9\u4f4e\u9636\u7b56\u7565\u4ee5\u5b8c\u6210\u3002\u4e00\u65e6LLM\u4ee3\u7406\u786e\u5b9a\u76ee\u6807\u5df2\u5b8c\u6210\uff0c\u5219\u4f1a\u63d0\u51fa\u65b0\u7684\u76ee\u6807\u3
002 \u4e3a\u4e86\u63d0\u9ad8\u591a\u8f6e\u6267\u884c\u4e2d\u7684\u4ee3\u7406\u6027\u80fd\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e8b\u540e\u6a21\u5757\u5316\u53cd\u601d\uff08HMR\uff09\uff0c\u5176\u4e2d\uff0c\u4ee3\u7406\u4e0d\u662f\u5bf9\u5b8c\u6574\u8f68\u8ff9\u8fdb\u884c\u53cd\u601d\uff0c\u800c\u662f\u5c06\u4efb\u52a1\u76ee\u6807\u66ff\u6362\u4e3a\u4e2d\u95f4\u76ee\u6807\uff0c\u5e76\u8ba9\u4ee3\u7406\u5bf9\u8f83\u77ed\u7684\u8f68\u8ff9\u8fdb\u884c\u53cd\u601d\uff0c\u4ee5\u63d0\u9ad8\u53cd\u601d\u6548\u7387\u3002\u6211\u4eec\u5728\u4e09\u4e2a\u57fa\u51c6\u73af\u5883\u4e2d\u8bc4\u4f30\u4e86\u6240\u63d0\u51fa\u7684HCRL\u7684\u51b3\u7b56\u80fd\u529b\u2014\u2014ALFWorld\u3001Webshop\u548cHotpotQA\u3002\u7ed3\u679c\u8868\u660e\uff0c\u4e0e\u5f3a\u5927\u7684\u4e0a\u4e0b\u6587\u5b66\u4e60\u57fa\u7ebf\u76f8\u6bd4\uff0c\u5728\u4e94\u8f6e\u6267\u884c\u4e2d\uff0cHCRL\u53ef\u5b9e\u73b09%\u300142%\u548c10%\u7684\u6027\u80fd\u63d0\u5347\u3002|\n", "2408.07199": "|**2024-08-13**|**Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents**|Pranav Putta 
et.al.|[2408.07199](http://arxiv.org/abs/2408.07199)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u9700\u8981\u590d\u6742\u63a8\u7406\u7684\u81ea\u7136\u8bed\u8a00\u4efb\u52a1\u4e0a\u5c55\u73b0\u4e86\u60ca\u4eba\u7684\u80fd\u529b\uff0c\u4f46\u5728\u4ea4\u4e92\u73af\u5883\u4e2d\u8fdb\u884c\u81ea\u4e3b\u4ee3\u7406\u7684\u591a\u6b65\u9aa4\u63a8\u7406\u5e94\u7528\u4ecd\u7136\u662f\u4e00\u4e2a\u6311\u6218\u3002\u4f20\u7edf\u7684\u57fa\u4e8e\u9759\u6001\u6570\u636e\u96c6\u7684\u76d1\u7763\u9884\u8bad\u7ec3\u4e0d\u8db3\u4ee5\u4f7f\u81ea\u4e3b\u4ee3\u7406\u5177\u5907\u5728\u52a8\u6001\u8bbe\u7f6e\u5982\u7f51\u7edc\u5bfc\u822a\u4e2d\u6267\u884c\u590d\u6742\u51b3\u7b56\u6240\u9700\u7684\u81ea\u4e3b\u80fd\u529b\u3002\u4ee5\u5f80\u901a\u8fc7\u76d1\u7763\u5fae\u8c03\u6765\u586b\u8865\u8fd9\u4e00\u5dee\u8ddd\u7684\u65b9\u6cd5\u5f80\u5f80\u9762\u4e34\u7d2f\u79ef\u9519\u8bef\u548c\u63a2\u7d22\u6570\u636e\u6709\u9650\u7684\u95ee\u9898\uff0c\u5bfc\u81f4\u653f\u7b56\u7ed3\u679c\u4e0d\u4f73\u3002\u4e3a\u4e86\u514b\u670d\u8fd9\u4e9b\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u6846\u67b6\uff0c\u7ed3\u5408\u4e86\u5f15\u5bfc\u5f0f\u8499\u7279\u5361\u6d1b\u6811\u641c\u7d22\uff08MCTS\uff09\u641c\u7d22\u4e0e\u81ea\u6211\u6279\u5224\u673a\u5236\uff0c\u5e76\u4f7f\u7528\u79bb\u7b56\u7565\u53d8\u4f53\u7684\u76f4\u63a5\u504f\u597d\u4f18\u5316\uff08DPO\uff09\u7b97\u6cd5\u5bf9\u4ee3\u7406\u4e92\u52a8\u8fdb\u884c\u8fed\u4ee3\u5fae\u8c03\u3002\u8fd9\u79cd\u65b9\u6cd5\u5141\u8bb8LLM\u4ee3\u7406\u4ece\u6210\u529f\u548c\u5931\u8d25\u7684\u8f68\u8ff9\u4e2d\u6709\u6548\u5b66\u4e60\uff0c\u4ece\u800c\u5728\u590d\u6742\u3001\u591a\u6b65\u9aa4\u63a8\u7406\u4efb\u52a1\u4e2d\u63d0\u9ad8\u5176\u6cdb\u5316\u80fd\u529b\u3002\u6211\u4eec\u5728WebShop\u73af\u5883\uff08\u4e00\u4e2a\u6a21\u62df\u7535\u5b50\u5546\u52a1\u5e73\u53f0\uff09\u4e2d\u9a8c\u8bc1\u4e86\u6211\u4eec\u7684\u65b9\u6cd5\uff0c\u8be5\u73af\u5883\u5728\u4e0e\u884c\u4e3a\u514b\u9686\u548c\u5f3a\u5316\u5fae\u8c03\u57fa\u
7ebf\u76f8\u6bd4\u65f6\u8868\u73b0\u51fa\u8272\uff0c\u5e76\u5728\u914d\u5907\u5728\u7ebf\u641c\u7d22\u80fd\u529b\u7684\u60c5\u51b5\u4e0b\u51fb\u8d25\u4e86\u5e73\u5747\u4eba\u7c7b\u6027\u80fd\u3002\u5728\u5b9e\u9645\u9884\u8ba2\u573a\u666f\u4e2d\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u63d0\u9ad8\u4e86Llama-3 70B\u6a21\u578b\u7684\u96f6\u5c04\u6210\u529f\u7387\u4ece18.6%\u589e\u52a0\u523081.7%\uff08\u76f8\u5bf9\u589e\u52a0\u4e86340%\uff09\uff0c\u5e76\u5728\u4e00\u5929\u7684\u6570\u636e\u6536\u96c6\u540e\u8fdb\u4e00\u6b65\u589e\u52a0\u523095.4%\uff0c\u5e76\u4e14\u901a\u8fc7\u5728\u7ebf\u641c\u7d22\u3002\u6211\u4eec\u8ba4\u4e3a\u8fd9\u6807\u5fd7\u7740\u81ea\u4e3b\u4ee3\u7406\u80fd\u529b\u7684\u4e00\u4e2a\u91cd\u5927\u8fdb\u6b65\uff0c\u5728\u73b0\u5b9e\u4e16\u754c\u73af\u5883\u4e2d\u5b9e\u73b0\u66f4\u9ad8\u7ea7\u548c\u53ef\u9760\u51b3\u7b56\u7684\u9053\u8def\u3002|\n", "2408.08158": "|**2024-08-15**|**EmBARDiment: an Embodied AI Agent for Productivity in XR**|Riccardo Bovo et.al.|[2408.08158](http://arxiv.org/abs/2408.08158)|null|XR\u8bbe\u5907\u642d\u8f7d\u7531\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u9a71\u52a8\u7684\u804a\u5929\u673a\u5668\u4eba\u5177\u6709\u5de8\u5927\u7684\u6f5c\u529b\uff0c\u53ef\u4ee5\u4f5c\u4e3a\u59cb\u7ec8\u5728\u7ebf\u7684\u4ee3\u7406\uff0c\u4ece\u800c\u5b9e\u73b0\u66f4\u9ad8\u6548\u7684\u5de5\u4f5c\u6d41\u7a0b\u3002\u7136\u800c\uff0c\u57fa\u4e8e\u5c4f\u5e55\u7684\u804a\u5929\u673a\u5668\u4eba\u5e76\u672a\u5145\u5206\u5229\u7528XR\u6240\u63d0\u4f9b\u7684\u5168\u9762\u81ea\u7136\u8f93\u5165\uff0c\u5305\u62ec\u5185\u90e8\u9762\u5411\u7684\u4f20\u611f\u5668\u6570\u636e\uff0c\u800c\u662f\u8fc7\u5ea6\u4f9d\u8d56\u660e\u786e\u7684\u58f0\u97f3\u6216\u6587\u672c\u63d0\u793a\uff0c\u6709\u65f6\u8fd8\u4f1a\u4e0e\u4f5c\u4e3a\u67e5\u8be2\u7684\u4e00\u90e8\u5206\u6295\u5c04\u7684\u591a\u6a21\u6001\u6570\u636e\u914d\u5bf9\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u89e3\u51b3\u65b9\u6848\uff0c\u5229\u7528\u6ce8\u610f\u529b\u6846\u67b6\u4ece\u7528
\u6237\u884c\u4e3a\u3001\u6ce8\u89c6\u70b9\u548cXR\u73af\u5883\u4e2d\u7684\u4e0a\u4e0b\u6587\u8bb0\u5fc6\u4e2d\u9690\u5f0f\u5730\u63a8\u5bfc\u51fa\u80cc\u666f\u4fe1\u606f\uff0c\u4ece\u800c\u6700\u5c0f\u5316\u5bf9\u5de5\u7a0b\u5316\u660e\u786e\u63d0\u793a\u7684\u9700\u6c42\uff0c\u4fc3\u8fdb\u57fa\u4e8e\u73b0\u5b9e\u4e16\u754c\u4e14\u76f4\u89c2\u7684\u4ea4\u4e92\uff0c\u8fd9\u4e9b\u4ea4\u4e92\u80fd\u591f\u6d1e\u5bdf\u7528\u6237\u7684\u89c1\u89e3\u5e76\u4e3a\u804a\u5929\u673a\u5668\u4eba\u63d0\u4f9b\u4fe1\u606f\u3002\u6211\u4eec\u7684\u7528\u6237\u7814\u7a76\u5c55\u793a\u4e86\u6211\u4eec\u65b9\u6cd5\u7684\u53ef\u884c\u6027\u548c\u5728XR\u4e2d\u4e0e\u804a\u5929\u673a\u5668\u4eba\u8fdb\u884c\u4ea4\u4e92\u7684\u6f5c\u5728\u53d8\u9769\u6027\uff0c\u540c\u65f6\u4e5f\u4e3a\u672a\u6765XR-\u5b9e\u4f53LLM\u4ee3\u7406\u7684\u8bbe\u8ba1\u63d0\u4f9b\u4e86\u89c1\u89e3\u3002|\n", "2408.08054": "|**2024-08-15**|**Text2BIM: Generating Building Models Using a Large Language Model-based Multi-Agent Framework**|Changyu Du et.al.|[2408.08054](http://arxiv.org/abs/2408.08054)|null|\u4f20\u7edf\u7684\u5efa\u7b51\u4fe1\u606f\u6a21\u578b\uff08BIM\uff09\u521b\u5efa\u8fc7\u7a0b\u901a\u5e38\u8981\u6c42\u8bbe\u8ba1\u5e08\u638c\u63e1\u590d\u6742\u4e14\u7e41\u7410\u7684\u5efa\u6a21\u547d\u4ee4\uff0c\u4ee5\u5728BIM\u521b\u5efa\u5de5\u5177\u4e2d\u5b9e\u73b0\u5176\u8bbe\u8ba1\u610f\u56fe\u3002\u8fd9\u79cd\u989d\u5916\u7684\u8ba4\u77e5\u8d1f\u62c5\u4f7f\u8bbe\u8ba1\u8fc7\u7a0b\u53d8\u5f97\u590d\u6742\uff0c\u5e76\u963b\u788d\u4e86\u5efa\u7b51\u3001\u5de5\u7a0b\u548c\u65bd\u5de5\uff08AEC\uff09\u884c\u4e1a\u5bf9BIM\u548c\u57fa\u4e8e\u6a21\u578b\u7684\u8bbe\u8ba1\u7684\u91c7\u7528\u3002 
\u4e3a\u4e86\u66f4\u76f4\u89c2\u5730\u8868\u8fbe\u8bbe\u8ba1\u610f\u56fe\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u591a\u4ee3\u7406\u6846\u67b6\u2014\u2014Text2BIM\u3002\u8be5\u6846\u67b6\u80fd\u591f\u4ece\u81ea\u7136\u8bed\u8a00\u6307\u4ee4\u751f\u62103D\u5efa\u7b51\u6a21\u578b\u3002\u5b83\u901a\u8fc7\u534f\u8c03\u591a\u4e2aLLM\u4ee3\u7406\u534f\u4f5c\u5e76\u63a8\u7406\uff0c\u5c06\u6587\u672c\u7528\u6237\u8f93\u5165\u8f6c\u6362\u4e3a\u8c03\u7528BIM\u521b\u5efa\u5de5\u5177API\u7684\u6307\u4ee4\u4ee3\u7801\uff0c\u4ece\u800c\u5728\u8f6f\u4ef6\u4e2d\u751f\u6210\u5177\u6709\u5185\u90e8\u5e03\u5c40\u3001\u5916\u90e8\u5916\u58f3\u548c\u8bed\u4e49\u4fe1\u606f\u7684\u53ef\u7f16\u8f91BIM\u6a21\u578b\u3002\u6b64\u5916\uff0c\u5f15\u5165\u4e86\u4e00\u79cd\u57fa\u4e8e\u89c4\u5219\u7684\u6a21\u578b\u68c0\u67e5\u5668\uff0c\u5229\u7528\u9884\u5b9a\u4e49\u7684\u9886\u57df\u77e5\u8bc6\u6307\u5bfcLLM\u4ee3\u7406\u89e3\u51b3\u751f\u6210\u6a21\u578b\u4e2d\u7684\u95ee\u9898\uff0c\u5e76\u8fed\u4ee3\u6539\u8fdb\u6a21\u578b\u8d28\u91cf\u3002 \u8fdb\u884c\u4e86\u5927\u91cf\u5b9e\u9a8c\u6765\u6bd4\u8f83\u548c\u5206\u6790\u5728\u63d0\u8bae\u6846\u67b6\u4e0b\u4e09\u79cd\u4e0d\u540cLLM\u7684\u8868\u73b0\u3002\u8bc4\u4f30\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u80fd\u591f\u6709\u6548\u5730\u751f\u6210\u9ad8\u8d28\u91cf\u3001\u7ed3\u6784\u5408\u7406\u4e14\u4e0e\u7528\u6237\u8f93\u5165\u6307\u5b9a\u7684\u62bd\u8c61\u6982\u5ff5\u76f8\u4e00\u81f4\u7684\u5efa\u7b51\u6a21\u578b\u3002 \u6700\u540e\uff0c\u5f00\u53d1\u4e86\u4e00\u4e2a\u4ea4\u4e92\u5f0f\u8f6f\u4ef6\u539f\u578b\uff0c\u5c06\u8be5\u6846\u67b6\u96c6\u6210\u5230BIM\u521b\u5efa\u8f6f\u4ef6Vectorworks\u4e2d\uff0c\u5c55\u793a\u4e86\u901a\u8fc7\u804a\u5929\u8fdb\u884c\u5efa\u6a21\u7684\u6f5c\u529b\u3002|\n", "2408.09955": "|**2024-08-20**|**MegaAgent: A Practical Framework for Autonomous Cooperation in Large-Scale LLM Agent Systems**|Qian Wang 
et.al.|[2408.09955](http://arxiv.org/abs/2408.09955)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u5174\u8d77\uff0cLLM\u9a71\u52a8\u7684\u591a\u667a\u80fd\u4f53\u7cfb\u7edf\uff08LLM-MA\u7cfb\u7edf\uff09\u88ab\u63d0\u51fa\u4ee5\u5e94\u5bf9\u5b9e\u9645\u4efb\u52a1\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u7cfb\u7edf\u7684\u667a\u80fd\u4f53\u5927\u591a\u9075\u5faa\u5728\u6574\u4f53\u4ea4\u4e92\u8fc7\u7a0b\u4e2d\u4fdd\u6301\u4e0d\u53d8\u7684\u9884\u5b9a\u4e49\u6807\u51c6\u64cd\u4f5c\u7a0b\u5e8f\uff08SOP\uff09\uff0c\u7f3a\u4e4f\u81ea\u4e3b\u6027\u548c\u53ef\u6269\u5c55\u6027\u3002\u6b64\u5916\uff0c\u5f53\u524d\u89e3\u51b3\u65b9\u6848\u5f80\u5f80\u5ffd\u89c6\u4e86\u6709\u6548\u667a\u80fd\u4f53\u5408\u4f5c\u7684\u5fc5\u8981\u6027\u3002\u4e3a\u4e86\u514b\u670d\u4e0a\u8ff0\u9650\u5236\uff0c\u6211\u4eec\u63d0\u51fa\u4e86MegaAgent\uff0c\u4e00\u4e2a\u65e8\u5728\u4fc3\u8fdb\u5927\u89c4\u6a21LLM\u667a\u80fd\u4f53\u7cfb\u7edf\u4e2d\u81ea\u4e3b\u5408\u4f5c\u7684\u5b9e\u7528\u6846\u67b6\u3002MegaAgent\u5229\u7528\u667a\u80fd\u4f53\u7684\u81ea\u4e3b\u6027\u52a8\u6001\u751f\u6210\u57fa\u4e8e\u4efb\u52a1\u9700\u6c42\u7684\u667a\u80fd\u4f53\uff0c\u96c6\u6210\u4e86\u4efb\u52a1\u81ea\u52a8\u5212\u5206\u3001\u667a\u80fd\u4f53\u6d3b\u52a8\u7cfb\u7edf\u7ea7\u89c4\u5212\u4e0e\u76d1\u63a7\u4ee5\u53ca\u5e76\u53d1\u64cd\u4f5c\u7ba1\u7406\u7b49\u529f\u80fd\u3002\u6b64\u5916\uff0cMegaAgent\u91c7\u7528\u5c42\u6b21\u7ed3\u6784\u8bbe\u8ba1\uff0c\u5e76\u5229\u7528\u7cfb\u7edf\u7ea7\u5e76\u884c\u6027\u6765\u63d0\u5347\u6027\u80fd\u548c\u589e\u5f3a\u901a\u4fe1\u6548\u7387\u3002 
\u6211\u4eec\u901a\u8fc7\u56f4\u68cb\u6e38\u620f\u5f00\u53d1\u5c55\u793a\u4e86MegaAgent\u7684\u6709\u6548\u6027\uff0c\u8bc1\u660e\u5b83\u5728\u6027\u80fd\u4e0a\u8d85\u8d8a\u4e86\u6d41\u884c\u7684LLM-MA\u7cfb\u7edf\uff1b\u5e76\u901a\u8fc7\u56fd\u5bb6\u653f\u7b56\u6a21\u62df\u9a8c\u8bc1\u4e86\u5176\u9ad8\u81ea\u4e3b\u6027\u548c\u5feb\u901f\u6269\u5c55\u81f3590\u4e2a\u667a\u80fd\u4f53\u7684\u80fd\u529b\uff0c\u540c\u65f6\u786e\u4fdd\u4e86\u5b83\u4eec\u4e4b\u95f4\u7684\u6709\u6548\u5408\u4f5c\u3002\u6211\u4eec\u7684\u7ed3\u679c\u8868\u660e\uff0cMegaAgent\u662f\u9996\u4e2a\u65e0\u9884\u5b9a\u4e49SOP\u3001\u9ad8\u6548\u4e14\u5177\u6709\u9ad8\u53ef\u6269\u5c55\u6027\u7684\u5927\u89c4\u6a21LLM-MA\u7cfb\u7edf\uff0c\u4e3a\u8be5\u9886\u57df\u7684\u8fdb\u4e00\u6b65\u7814\u7a76\u94fa\u5e73\u4e86\u9053\u8def\u3002\u6211\u4eec\u7684\u4ee3\u7801\u4f4d\u4e8e\u3002|\n", "2408.09785": "|**2024-08-19**|**GoNoGo: An Efficient LLM-based Multi-Agent System for Streamlining Automotive Software Release Decision-Making**|Arsham Gholamzadeh Khoee 
et.al.|[2408.09785](http://arxiv.org/abs/2408.09785)|null|\u5728\u6c7d\u8f66\u884c\u4e1a\u4e2d\uff0c\u4f20\u7edf\u8f6f\u4ef6\u90e8\u7f72\u51b3\u7b56\u65b9\u6cd5\u901a\u5e38\u4f9d\u8d56\u4e8e\u5bf9\u8868\u683c\u5316\u6d4b\u8bd5\u6570\u636e\u7684\u624b\u52a8\u5206\u6790\u3002\u8fd9\u4e9b\u65b9\u6cd5\u5f80\u5f80\u5bfc\u81f4\u66f4\u9ad8\u7684\u6210\u672c\u548c\u8f6f\u4ef6\u53d1\u5e03\u5468\u671f\u7684\u5ef6\u8fdf\uff0c\u4e3b\u8981\u662f\u7531\u4e8e\u5b83\u4eec\u7684\u52b3\u52a8\u5bc6\u96c6\u578b\u7279\u6027\u3002\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4e3a\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\u63d0\u4f9b\u4e86\u6709\u524d\u666f\u7684\u89e3\u51b3\u65b9\u6848\u3002\u7136\u800c\uff0c\u5b83\u4eec\u7684\u5e94\u7528\u901a\u5e38\u9700\u8981\u591a\u8f6e\u7684\u4eba\u5de5\u9a71\u52a8\u63d0\u793a\u5de5\u7a0b\uff0c\u8fd9\u9650\u5236\u4e86\u5176\u5728\u5de5\u4e1a\u6700\u7ec8\u7528\u6237\u4e2d\u7684\u5b9e\u9645\u90e8\u7f72\uff0c\u7279\u522b\u662f\u90a3\u4e9b\u9700\u8981\u53ef\u9760\u548c\u9ad8\u6548\u7ed3\u679c\u7684\u7528\u6237\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aGoNoGo\u7684LLM\u4ee3\u7406\u7cfb\u7edf\uff0c\u65e8\u5728\u7b80\u5316\u6c7d\u8f66\u8f6f\u4ef6\u90e8\u7f72\u8fc7\u7a0b\uff0c\u540c\u65f6\u6ee1\u8db3\u529f\u80fd\u8981\u6c42\u548c\u5de5\u4e1a\u7ea6\u675f\u3002\u4e0e\u4ee5\u5f80\u7cfb\u7edf\u4e0d\u540c\uff0cGoNoGo\u7279\u522b\u9488\u5bf9\u7279\u5b9a\u9886\u57df\u548c\u98ce\u9669\u654f\u611f\u7cfb\u7edf\u8fdb\u884c\u4e86\u5b9a\u5236\u3002\u6211\u4eec\u4f7f\u7528\u6765\u81ea\u5de5\u4e1a\u5b9e\u8df5\u7684\u96f6\u6b21\u548c\u5c11\u91cf\u6b21\u793a\u4f8b\u6765\u8bc4\u4f30GoNoGo\u5728\u4e0d\u540c\u4efb\u52a1\u96be\u5ea6\u4e0b\u7684\u6027\u80fd\u3002\u7ed3\u679c\u663e\u793a\uff0cGoNoGo\u5728\u96be\u5ea6\u4e0d\u8d85\u8fc7\u4e8c\u7ea7\u76843\u6b21\u793a\u4f8b\u4efb\u52a1\u4e2d\u5b9e\u73b0\u4e86100%\u7684\u6210\u529f\u7387\uff0c\u5e76\u4e14\u5373\u4f7f\u5bf9\u4e8e\u66f4\u590d\u6742\u7684\u4efb\u52a1\u4e5f\u80fd\u4fdd\u6301\u9ad8\u7ee9\u6548\u3002\u621
1\u4eec\u53d1\u73b0\uff0cGoNoGo\u6709\u6548\u5730\u81ea\u52a8\u5316\u4e86\u8f83\u7b80\u5355\u4efb\u52a1\u7684\u51b3\u7b56\u8fc7\u7a0b\uff0c\u663e\u8457\u51cf\u5c11\u4e86\u624b\u52a8\u5e72\u9884\u7684\u9700\u6c42\u3002\u603b\u4e4b\uff0cGoNoGo\u4ee3\u8868\u4e86\u4e00\u4e2a\u76ee\u524d\u5728\u6211\u4eec\u7684\u5de5\u4e1a\u5408\u4f5c\u4f19\u4f34\u516c\u53f8\u4e2d\u88ab\u7528\u4e8e\u534f\u52a9\u8f6f\u4ef6\u53d1\u5e03\u51b3\u7b56\u7684\u9ad8\u6548\u4e14\u7528\u6237\u53cb\u597d\u7684LLM\u57fa\u89e3\u51b3\u65b9\u6848\uff0c\u652f\u6301\u4e86\u98ce\u9669\u654f\u611f\u8f66\u8f86\u7cfb\u7edf\u53d1\u5e03\u8fc7\u7a0b\u4e2d\u7684\u66f4\u52a0\u660e\u667a\u548c\u53ca\u65f6\u7684\u51b3\u7b56\u3002|\n", "2408.09559": "|**2024-08-18**|**HiAgent: Hierarchical Working Memory Management for Solving Long-Horizon Agent Tasks with Large Language Model**|Mengkang Hu et.al.|[2408.09559](http://arxiv.org/abs/2408.09559)|**[link](https://github.com/hiagent2024/hiagent)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u9a71\u52a8\u7684\u4ee3\u7406\u5728\u5404\u4e2a\u9886\u57df\u5c55\u73b0\u51fa\u5de8\u5927\u6f5c\u529b\uff0c\u4f5c\u4e3a\u80fd\u591f\u5904\u7406\u73af\u5883\u89c2\u5bdf\u5e76\u751f\u6210\u6267\u884c\u52a8\u4f5c\u4ee5\u5b8c\u6210\u76ee\u6807\u4efb\u52a1\u7684\u4ea4\u4e92\u7cfb\u7edf\u3002\u8fd9\u4e9b\u4ee3\u7406\u7684\u6709\u6548\u6027\u5f88\u5927\u7a0b\u5ea6\u4e0a\u53d7\u5230\u5176\u8bb0\u5fc6\u673a\u5236\u7684\u5f71\u54cd\uff0c\u8be5\u673a\u5236\u901a\u8fc7\u8bb0\u5f55\u5386\u53f2\u7ecf\u9a8c\u6765\u5f62\u6210\u4e00\u7cfb\u5217\u52a8\u4f5c-\u89c2\u5bdf\u5bf9\u5e8f\u5217\u3002\u6211\u4eec\u5c06\u8bb0\u5fc6\u5206\u4e3a\u4e24\u7c7b\uff1a\u8de8\u8bd5\u8bb0\u5fc6\uff0c\u79ef\u7d2f\u4e8e\u591a\u6b21\u5c1d\u8bd5\u4e2d\uff1b\u4ee5\u53ca\u5355\u8bd5\u8bb0\u5fc6\uff08\u5de5\u4f5c\u8bb0\u5fc6\uff09\uff0c\u79ef\u7d2f\u4e8e\u5355\u4e00\u5c1d\u8bd5\u5185\u3002\u5c3d\u7ba1\u5173\u4e8e\u8de8\u8bd5\u8bb0\u5fc6\u4f18\u5316\u7684\u7814\u7a76\u5df2\u53d6\u5f97\u663e\u8457\u8fdb\u5c55\uff0c\u4f46
\u5982\u4f55\u901a\u8fc7\u63d0\u5347\u5de5\u4f5c\u8bb0\u5fc6\u5229\u7528\u6548\u7387\u6765\u589e\u5f3a\u4ee3\u7406\u6027\u80fd\u7684\u63a2\u7d22\u4ecd\u76f8\u5bf9\u4e0d\u8db3\u3002\u73b0\u6709\u65b9\u6cd5\u5f80\u5f80\u76f4\u63a5\u5c06\u6574\u4e2a\u5386\u53f2\u52a8\u4f5c-\u89c2\u5bdf\u5bf9\u8f93\u5165\u5230LLM\u4e2d\uff0c\u5bfc\u81f4\u5728\u957f\u671f\u4efb\u52a1\u4e2d\u5b58\u5728\u5197\u4f59\u95ee\u9898\u3002\u53d7\u4eba\u7c7b\u89e3\u51b3\u95ee\u9898\u7b56\u7565\u7684\u542f\u53d1\uff0c\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aHiAgent\u7684\u6846\u67b6\uff0c\u65e8\u5728\u901a\u8fc7\u5c06\u5b50\u76ee\u6807\u4f5c\u4e3a\u8bb0\u5fc6\u5757\u6765\u5bf9LLM\u9a71\u52a8\u7684\u4ee3\u7406\u7684\u5de5\u4f5c\u8bb0\u5fc6\u8fdb\u884c\u5c42\u6b21\u5316\u7ba1\u7406\u3002\u5177\u4f53\u6765\u8bf4\uff0cHiAgent\u4fc3\u4f7fLLM\u5728\u751f\u6210\u6267\u884c\u52a8\u4f5c\u524d\u5148\u5236\u5b9a\u5b50\u76ee\u6807\uff0c\u5e76\u5141\u8bb8LLM\u4e3b\u52a8\u51b3\u5b9a\u66ff\u6362\u4e4b\u524d\u7684\u5b50\u76ee\u6807\uff0c\u4ec5\u4fdd\u7559\u4e0e\u5f53\u524d\u5b50\u76ee\u6807\u76f8\u5173\u7684\u52a8\u4f5c-\u89c2\u5bdf\u5bf9\u3002\u5728\u4e94\u4e2a\u957f\u671f\u4efb\u52a1\u4e0a\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cHiAgent\u7684\u6210\u529f\u7387\u63d0\u9ad8\u4e86\u4e24\u500d\uff0c\u5e73\u5747\u6b65\u9aa4\u6570\u51cf\u5c11\u4e863.8\u4e2a\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u5206\u6790\u663e\u793a\uff0cHiAgent\u5728\u6574\u4e2a\u6b65\u9aa4\u4e2d\u5747\u80fd\u6301\u7eed\u6539\u5584\u6027\u80fd\uff0c\u8fd9\u51f8\u663e\u4e86\u5176\u7a33\u5065\u6027\u548c\u6cdb\u7528\u6027\u3002 \u9879\u76ee\u9875\u9762\uff1ahttps://github.com/HiAgent2024/HiAgent**|\n", "2408.11051": "|**2024-08-20**|**FLAME: Learning to Navigate with Multimodal LLM in Urban Environments**|Yunzhe Xu 
et.al.|[2408.11051](http://arxiv.org/abs/2408.11051)|**[link](https://github.com/xyz9911/FLAME)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u89c6\u89c9\u4e0e\u8bed\u8a00\u5bfc\u822a\uff08VLN\uff09\u4efb\u52a1\u4e2d\u5c55\u73b0\u51fa\u4e86\u6f5c\u5728\u80fd\u529b\uff0c\u4f46\u5f53\u524d\u7684\u5e94\u7528\u4ecd\u9762\u4e34\u6311\u6218\u3002\u867d\u7136LLM\u5728\u901a\u7528\u5bf9\u8bdd\u573a\u666f\u4e2d\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5728\u4e13\u95e8\u7684\u5bfc\u822a\u4efb\u52a1\u4e0a\u5374\u8868\u73b0\u4e0d\u4f73\uff0c\u76f8\u8f83\u4e8e\u4e13\u4e3aVLN\u8bbe\u8ba1\u7684\u6a21\u578b\uff0c\u5176\u6027\u80fd\u8f83\u5dee\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aFLAME\uff08FLAMingo\u67b6\u6784\u5316\u5b9e\u4f53\u4ee3\u7406\uff09\u7684\u65b0\u9896\u591a\u6a21\u6001LLM\u57fa\u5143\u4f53\u548c\u67b6\u6784\uff0c\u65e8\u5728\u89e3\u51b3\u57ce\u5e02VLN\u4efb\u52a1\uff0c\u5e76\u6709\u6548\u5904\u7406\u591a\u4e2a\u89c2\u5bdf\u7ed3\u679c\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u91c7\u7528\u4e86\u4e09\u9636\u6bb5\u8c03\u4f18\u6280\u672f\u4ee5\u9002\u5e94\u5bfc\u822a\u4efb\u52a1\uff0c\u5305\u62ec\u5355\u611f\u77e5\u8c03\u6574\u4ee5\u63cf\u8ff0\u8857\u666f\u3001\u591a\u611f\u77e5\u8c03\u6574\u4ee5\u603b\u7ed3\u8f68\u8ff9\u4ee5\u53ca\u5728VLN\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u7aef\u5230\u7aef\u8bad\u7ec3\u3002\u5408\u6210\u7684\u6570\u636e\u96c6\u662f\u81ea\u52a8\u751f\u6210\u7684\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cFLAME\u5728Touchdown\u6570\u636e\u96c6\u4e0a\u7684\u4efb\u52a1\u5b8c\u6210\u7387\u4f18\u4e8e\u73b0\u6709\u65b9\u6cd5\uff0c\u63d0\u9ad8\u4e867.3%\u3002\u8fd9\u9879\u5de5\u4f5c\u5c55\u793a\u4e86\u591a\u6a21\u6001LLM\u5728\u590d\u6742\u5bfc\u822a\u4efb\u52a1\u4e2d\u7684\u6f5c\u529b\uff0c\u5e76\u4ee3\u8868\u4e86\u8fc8\u5411\u5b9e\u9645\u5e94\u7528\u4e2d\u591a\u6a21\u6001LLM\u4e8e\u5b9e\u4f53AI\u9886\u57df\u7684\u8fdb\u6b65\u3002\u9879\u76ee\u9875\u9762\uff1ahttps://flame-sjtu.github.io**|\n", "2408.11021": 
"|**2024-08-20**|**Athena: Safe Autonomous Agents with Verbal Contrastive Learning**|Tanmana Sadhu et.al.|[2408.11021](http://arxiv.org/abs/2408.11021)|null|\u7531\u4e8e\u65b0\u5174\u80fd\u529b\u7684\u52a0\u6301\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u88ab\u7528\u4f5c\u57fa\u4e8e\u8bed\u8a00\u7684\u4ee3\u7406\uff0c\u6267\u884c\u5404\u79cd\u4efb\u52a1\u5e76\u4f5c\u51fa\u65e5\u76ca\u81ea\u4e3b\u7684\u51b3\u7b56\u3002\u8fd9\u4e9b\u81ea\u4e3b\u4ee3\u7406\u80fd\u591f\u7406\u89e3\u9ad8\u7ea7\u6307\u4ee4\u3001\u4e0e\u73af\u5883\u4e92\u52a8\uff0c\u5e76\u4f7f\u7528\u53ef\u7528\u5de5\u5177\u96c6\u6267\u884c\u590d\u6742\u4efb\u52a1\u3002\u968f\u7740\u4ee3\u7406\u80fd\u529b\u7684\u6269\u5c55\uff0c\u786e\u4fdd\u5176\u5b89\u5168\u6027\u548c\u53ef\u4fe1\u5ea6\u53d8\u5f97\u6108\u53d1\u91cd\u8981\u3002\u672c\u7814\u7a76\u5f15\u5165\u4e86Athena\u6846\u67b6\uff0c\u5229\u7528\u4e86\u201c\u53e3\u5934\u5bf9\u6bd4\u5b66\u4e60\u201d\u7684\u6982\u5ff5\uff0c\u901a\u8fc7\u5c06\u8fc7\u53bb\u7684\u5b89\u5168\u548c\u4e0d\u5b89\u5168\u8f68\u8ff9\u4f5c\u4e3a\u4e0a\u4e0b\u6587\uff08\u5bf9\u6bd4\uff09\u793a\u4f8b\u6765\u6307\u5bfc\u4ee3\u7406\u5728\u5b8c\u6210\u7ed9\u5b9a\u4efb\u52a1\u7684\u540c\u65f6\u786e\u4fdd\u5b89\u5168\u3002\u8be5\u6846\u67b6\u8fd8\u6574\u5408\u4e86\u4e00\u79cd\u6279\u5224\u673a\u5236\uff0c\u4ee5\u6307\u5bfc\u4ee3\u7406\u5728\u6bcf\u4e00\u6b65\u9632\u6b62\u98ce\u9669\u884c\u4e3a\u3002\u6b64\u5916\uff0c\u9274\u4e8e\u7f3a\u4e4f\u73b0\u6709\u57fa\u51c6\u6765\u8bc4\u4f30\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u7684\u5b89\u5168\u63a8\u7406\u80fd\u529b\uff0c\u6211\u4eec\u6536\u96c6\u4e8680\u4e2a\u5de5\u5177\u5305\uff0c\u8986\u76d68\u4e2a\u7c7b\u522b\uff0c\u5171\u8ba1180\u4e2a\u573a\u666f\uff0c\u63d0\u4f9b\u4e86\u4e00\u4e2a\u5b89\u5168\u8bc4\u4f30\u57fa\u51c6\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u8bc4\u4f30\u663e\u793a\uff0c\u53e3\u5934\u5bf9\u6bd4\u5b66\u4e60\u548c\u4ea4\u4e92\u7ea7\u6279\u5224\u663e\u8457\u63d0\u9ad8\u4e86\u5b89\u5168\u6027\u7387\u3002|\n", 
"2408.10455": "|**2024-08-24**|**IDEA:Enhancing the Rule Learning Ability of Language Agents through Induction, Deduction, and Abduction**|Kaiyu He et.al.|[2408.10455](http://arxiv.org/abs/2408.10455)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u9879\u540d\u4e3aRULEARN\u7684\u65b0\u57fa\u51c6\uff0c\u65e8\u5728\u8bc4\u4f30\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u4ea4\u4e92\u73af\u5883\u4e2d\u7684\u5f52\u7eb3\u63a8\u7406\u80fd\u529b\u3002\u5728RULEARN\u4e2d\uff0c\u4ee3\u7406\u901a\u8fc7\u4e0e\u73af\u5883\u4e92\u52a8\u6536\u96c6\u89c2\u5bdf\uff0c\u5e76\u4ece\u4e2d\u63a8\u65ad\u6a21\u5f0f\uff0c\u4ee5\u6b64\u89e3\u51b3\u95ee\u9898\u3002\u4e3a\u4e86\u589e\u5f3aLLM\u4ee3\u7406\u5728\u8be5\u57fa\u51c6\u4e0a\u7684\u5f52\u7eb3\u63a8\u7406\u80fd\u529b\uff0c\u6211\u4eec\u5f15\u5165\u4e86IDEA\u4ee3\u7406\uff0c\u5b83\u7ed3\u5408\u4e86\u5f52\u7eb3\u3001\u6f14\u7ece\u548c\u6eaf\u56e0\u4e09\u79cd\u63a8\u7406\u8fc7\u7a0b\u3002IDEA\u4ee3\u7406\u901a\u8fc7\u7ed3\u6784\u5316\u63a8\u7406\u5e8f\u5217\u63d0\u5347\u8fd9\u4e00\u65b9\u6cd5\uff1a\u9996\u5148\u901a\u8fc7\u6eaf\u56e0\u751f\u6210\u5047\u8bbe\uff0c\u7136\u540e\u901a\u8fc7\u6f14\u7ece\u9a8c\u8bc1\u8fd9\u4e9b\u5047\u8bbe\uff0c\u6700\u540e\u6839\u636e\u53cd\u9988\u8fdb\u884c\u9002\u5e94\u6027\u4fee\u6b63\u3002\u8fd9\u79cd\u5e8f\u5217\u4f7f\u4ee3\u7406\u80fd\u591f\u52a8\u6001\u5efa\u7acb\u5e76\u5e94\u7528\u89c4\u5219\uff0c\u6a21\u4eff\u4eba\u7c7b\u7684\u63a8\u7406\u8fc7\u7a0b\u3002\u901a\u8fc7\u5bf9\u4e94\u79cd\u4ee3\u8868\u6027LLM\u7684\u8bc4\u4f30\u663e\u793a\uff0c\u5c3d\u7ba1\u8fd9\u4e9b\u6a21\u578b\u80fd\u591f\u751f\u6210\u5408\u7406\u7684\u521d\u59cb\u5047\u8bbe\uff0c\u4f46\u5728\u73af\u5883\u5185\u7684\u6218\u7565\u4e92\u52a8\u3001\u6709\u6548\u6574\u5408\u53cd\u9988\u4ee5\u53ca\u5047\u8bbe\u7684\u9002\u5e94\u6027\u4fee\u6b63\u65b9\u9762\u5b58\u5728\u56f0\u96be\u3002\u800cIDEA\u4ee3\u7406\u5728RULEARN\u57fa\u51c6\u4e0a\u8868\u73b0\u51fa\u663e\u8457\u7684\u6027\u80fd\u63d0\u5347\uff0c\u4e3a\u6211\u4eec\u5f00\u53d
1\u80fd\u5728\u73b0\u5b9e\u4e16\u754c\u573a\u666f\u4e2d\u5b9e\u73b0\u7c7b\u4f3c\u4eba\u7c7b\u89c4\u5219\u5b66\u4e60\u80fd\u529b\u7684\u4ee3\u7406\u63d0\u4f9b\u4e86\u5b9d\u8d35\u89c1\u89e3\u3002\u6211\u4eec\u5c06\u4f1a\u53d1\u5e03\u6211\u4eec\u7684\u4ee3\u7801\u548c\u6570\u636e\u3002|\n", "2408.12142": "|**2024-08-22**|**MDD-5k: A New Diagnostic Conversation Dataset for Mental Disorders Synthesized via Neuro-Symbolic LLM Agents**|Congchi Yin et.al.|[2408.12142](http://arxiv.org/abs/2408.12142)|**[link](https://github.com/lemonsis/mdd-5k)**|**\u5728\u5927\u591a\u6570\u7cbe\u795e\u75be\u75c5\u8bca\u65ad\u4e2d\uff0c\u4e34\u5e8a\u533b\u751f\u4e0e\u60a3\u8005\u7684\u5bf9\u8bdd\u662f\u4e3b\u8981\u7684\u8bca\u65ad\u4f9d\u636e\u3002\u521b\u5efa\u8fd9\u6837\u7684\u8bca\u65ad\u5bf9\u8bdd\u6570\u636e\u96c6\u6709\u671b\u63a8\u52a8AI\u7cbe\u795e\u5065\u5eb7\u62a4\u7406\u9886\u57df\u7684\u53d1\u5c55\u3002\u7136\u800c\uff0c\u76f4\u63a5\u5728\u5b9e\u9645\u8bca\u65ad\u573a\u666f\u4e2d\u6536\u96c6\u5bf9\u8bdd\u6781\u4e3a\u56f0\u96be\uff0c\u539f\u56e0\u5728\u4e8e\u9690\u79c1\u548c\u4f26\u7406\u8003\u8651\u7684\u4e25\u683c\u9650\u5236\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u5c1d\u8bd5\u901a\u8fc7\u5229\u7528\u6613\u4e8e\u83b7\u53d6\u7684\u533f\u540d\u60a3\u8005\u6848\u4f8b\u6765\u5408\u6210\u8bca\u65ad\u5bf9\u8bdd\u3002\u5177\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u795e\u7ecf\u7b26\u53f7\u591a\u4ee3\u7406\u6846\u67b6\uff0c\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5408\u6210\u7cbe\u795e\u969c\u788d\u7684\u8bca\u65ad\u5bf9\u8bdd\u3002\u8be5\u6846\u67b6\u4ee5\u60a3\u8005\u6848\u4f8b\u4f5c\u4e3a\u8f93\u5165\uff0c\u5e76\u80fd\u591f\u751f\u6210\u9488\u5bf9\u5355\u4e2a\u60a3\u8005\u6848\u4f8b\u7684\u591a\u4e2a\u591a\u6837\u5316\u7684\u5bf9\u8bdd\uff0c\u5176\u57fa\u672c\u8fc7\u7a0b\u6d89\u53ca\u533b\u751f\u4ee3\u7406\u4e0e\u60a3\u8005\u4ee3\u7406\u4e4b\u95f4\u7684\u4e92\u52a8\uff0c\u5e76\u901a\u8fc7\u5de5\u5177\u4ee3\u7406\u5b9e\u73b
0\u57fa\u4e8e\u7b26\u53f7\u63a7\u5236\u7684\u6587\u672c\u751f\u6210\uff0c\u501f\u52a9\u52a8\u6001\u8bca\u65ad\u6811\u3002\u901a\u8fc7\u5e94\u7528\u63d0\u51fa\u7684\u65b9\u6cd5\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u5305\u542b1000\u4e2a\u6e05\u6d17\u8fc7\u7684\u5b9e\u9645\u60a3\u8005\u6848\u4f8b\u3001\u4e0e\u4e00\u5bb6\u9886\u5148\u7684\u7cbe\u795e\u75c5\u533b\u9662\u5408\u4f5c\u6784\u5efa\u7684\u4e2d\u56fd\u6700\u5927\u7cbe\u795e\u969c\u788d\u8bca\u65ad\u6570\u636e\u96c6MDD-5k\uff0c\u8be5\u6570\u636e\u96c6\u5305\u542b\u4e865000\u4e2a\u9ad8\u8d28\u91cf\u7684\u957f\u5bf9\u8bdd\u53ca\u5176\u8bca\u65ad\u7ed3\u679c\u6807\u7b7e\u3002\u636e\u6211\u4eec\u6240\u77e5\uff0c\u8fd9\u662f\u7b2c\u4e00\u4e2a\u5305\u542b\u4e2d\u6587\u7cbe\u795e\u969c\u788d\u8bca\u65ad\u7ed3\u679c\u7684\u6807\u8bb0\u6570\u636e\u96c6\u3002\u4eba\u7c7b\u8bc4\u4f30\u8868\u660e\uff0c\u63d0\u51fa\u7684MDD-5k\u6570\u636e\u96c6\u6210\u529f\u6a21\u62df\u4e86\u7cbe\u795e\u969c\u788d\u7684\u8bca\u65ad\u8fc7\u7a0b\u3002\u6570\u636e\u96c6\u548c\u4ee3\u7801\u5c06\u5728https://github.com/lemonsis/MDD-5k\u516c\u5f00\u63d0\u4f9b\u3002**|\n", "2408.12680": "|**2024-09-01**|**Can LLMs Understand Social Norms in Autonomous Driving Games?**|Boxuan Wang 
et.al.|[2408.12680](http://arxiv.org/abs/2408.12680)|null|\u672c\u6587\u63a2\u8ba8\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u7406\u89e3\u4e0e\u6a21\u62df\u81ea\u4e3b\u9a7e\u9a76\u6e38\u620f\u4e2d\u793e\u4f1a\u89c4\u8303\u7684\u5e94\u7528\u3002\u901a\u8fc7\u5c06LLM\u96c6\u6210\u5230\u81ea\u4e3b\u9a7e\u9a76\u6e38\u620f\u4e2d\u7684\u667a\u80fd\u4ee3\u7406\u89d2\u8272\u4e2d\uff0c\u6211\u4eec\u57fa\u4e8e\u6587\u672c\u63d0\u793a\u8ba9\u8fd9\u4e9b\u4ee3\u7406\u6309\u7167\u76f8\u5173\u73af\u5883\u8bbe\u5b9a\u548c\u89c2\u5bdf\u4fe1\u606f\u505a\u51fa\u51b3\u7b56\u3002\u6211\u4eec\u7684\u6846\u67b6\u6d89\u53caLLM\u9a71\u52a8\u7684\u4ee3\u7406\u5728\u591a\u4ee3\u7406\u7cfb\u7edf\uff08MAS\uff09\u4e2d\u8fdb\u884c\u9a6c\u5c14\u79d1\u592b\u6e38\u620f\uff0c\u4ee5\u6b64\u7814\u7a76\u4e2a\u4f53\u4ee3\u7406\u4e4b\u95f4\u793e\u4f1a\u89c4\u8303\u7684\u5f62\u6210\u3002 \u6211\u4eec\u8bbe\u8ba1\u5b9e\u9a8c\uff0c\u5229\u7528OpenAI\u804a\u5929API\uff08\u7531GPT-4.0\u63d0\u4f9b\u52a8\u529b\uff09\u5728\u65e0\u4fe1\u53f7\u4ea4\u53c9\u53e3\u6e38\u620f\u4e0e\u9ad8\u901f\u516c\u8def\u8f66\u961f\u6e38\u620f\u4e24\u79cd\u573a\u666f\u4e0b\u6a21\u62df\u4ea4\u4e92\u5e76\u8bc4\u4f30LLM\u9a71\u52a8\u4ee3\u7406\u7684\u8868\u73b0\u3002\u7ed3\u679c\u663e\u793a\uff0cLLM\u9a71\u52a8\u7684\u4ee3\u7406\u80fd\u591f\u5904\u7406\u9a6c\u5c14\u79d1\u592b\u6e38\u620f\u4e2d\u7684\u52a8\u6001\u73af\u5883\u53d8\u5316\uff0c\u5e76\u4e14\u5728\u4e24\u4e2a\u573a\u666f\u4e2d\uff0c\u4ee3\u7406\u95f4\u5f62\u6210\u4e86\u793e\u4f1a\u89c4\u8303\u3002 \u5728\u4ea4\u53c9\u53e3\u6e38\u620f\u4e2d\uff0c\u5f53\u9762\u4e34\u6f5c\u5728\u8f66\u7978\u65f6\uff0cLLM\u9a71\u52a8\u7684\u4ee3\u7406\u503e\u5411\u4e8e\u91c7\u53d6\u4fdd\u5b88\u7684\u9a7e\u9a76\u7b56\u7565\u3002LLM\u9a71\u52a8\u4ee3\u7406\u5728\u6e38\u620f\u4e2d\u7684\u4f18\u52bf\u5728\u4e8e\u5176\u64cd\u4f5c\u7075\u6d3b\u6027\u548c\u53ef\u5206\u6790\u6027\uff0c\u8fd9\u6709\u52a9\u4e8e\u5b9e\u9a8c\u8bbe\u8ba1\u3002|\n", "2408.14307": "|**2024-08-26**|**LLM-3D 
Print: Large Language Models To Monitor and Control 3D Printing**|Yayati Jadhav et.al.|[2408.14307](http://arxiv.org/abs/2408.14307)|null|\u884c\u4e1a4.0\u901a\u8fc7\u63a8\u52a8\u6570\u5b57\u5316\u8fdb\u7a0b\u5e76\u8f6c\u5411\u589e\u6750\u5236\u9020\uff08AM\uff09\uff0c\u5f7b\u5e95\u6539\u53d8\u4e86\u5236\u9020\u4e1a\u3002\u7194\u878d\u6c89\u79ef\u5efa\u6a21\uff08FDM\uff09\u4f5c\u4e3a\u5173\u952e\u7684AM\u6280\u672f\u4e4b\u4e00\uff0c\u901a\u8fc7\u9010\u5c42\u6324\u51fa\u65b9\u5f0f\u521b\u5efa\u9ad8\u5ea6\u5b9a\u5236\u3001\u6210\u672c\u6548\u76ca\u9ad8\u4e14\u6750\u6599\u6d6a\u8d39\u6781\u5c0f\u7684\u4ea7\u54c1\uff0c\u5bf9\u4f20\u7edf\u51cf\u6750\u65b9\u6cd5\u6784\u6210\u4e86\u91cd\u5927\u6311\u6218\u3002\u7136\u800c\uff0c\u6750\u6599\u6324\u51fa\u6280\u672f\u7684\u6613\u9519\u6027\u5f80\u5f80\u9700\u8981\u4e13\u5bb6\u4ecb\u5165\u6765\u68c0\u6d4b\u548c\u7f13\u89e3\u53ef\u80fd\u4e25\u91cd\u635f\u5bb3\u4ea7\u54c1\u8d28\u91cf\u7684\u7f3a\u9677\u3002\u867d\u7136\u5df2\u5b58\u5728\u81ea\u52a8\u5316\u9519\u8bef\u68c0\u6d4b\u548c\u673a\u5668\u5b66\u4e60\u6a21\u578b\uff0c\u4f46\u5b83\u4eec\u5728\u4e0d\u540c3D\u6253\u5370\u673a\u8bbe\u7f6e\u3001\u56fa\u4ef6\u548c\u4f20\u611f\u5668\u4e4b\u95f4\u7684\u901a\u7528\u6027\u6709\u9650\uff0c\u5e76\u4e14\u6df1\u5ea6\u5b66\u4e60\u65b9\u6cd5\u9700\u8981\u5927\u91cf\u7684\u6807\u8bb0\u6570\u636e\u96c6\uff0c\u8fd9\u9650\u5236\u4e86\u5176\u89c4\u6a21\u6027\u548c\u9002\u5e94\u6027\u3002 
\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e0e3D\u6253\u5370\u6280\u672f\u76f8\u7ed3\u5408\u7684\u8fc7\u7a0b\u76d1\u63a7\u548c\u63a7\u5236\u6846\u67b6\uff0c\u65e8\u5728\u68c0\u6d4b\u548c\u89e3\u51b3\u6253\u5370\u7f3a\u9677\u3002\u8be5LLM\u901a\u8fc7\u5206\u6790\u6bcf\u5c42\u6216\u6253\u5370\u6bb5\u4e4b\u540e\u6355\u83b7\u7684\u56fe\u50cf\u6765\u8bc4\u4f30\u6253\u5370\u8d28\u91cf\uff0c\u8bc6\u522b\u6545\u969c\u6a21\u5f0f\uff0c\u5e76\u5411\u6253\u5370\u673a\u67e5\u8be2\u76f8\u5173\u53c2\u6570\u3002\u7136\u540e\uff0c\u5b83\u751f\u6210\u5e76\u6267\u884c\u7ea0\u6b63\u63aa\u65bd\u8ba1\u5212\u3002\u6211\u4eec\u901a\u8fc7\u5c06\u63d0\u51fa\u7684\u6846\u67b6\u7684\u6709\u6548\u6027\u4e0e\u4e00\u7ec4\u5177\u6709\u4e0d\u540cAM\u4e13\u4e1a\u77e5\u8bc6\u7684\u5de5\u7a0b\u5e08\u8fdb\u884c\u4e86\u6bd4\u8f83\uff0c\u4ee5\u9a8c\u8bc1\u8bc6\u522b\u7f3a\u9677\u7684\u80fd\u529b\u3002\u6211\u4eec\u7684\u8bc4\u4f30\u8868\u660e\uff0c\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u4e0d\u4ec5\u51c6\u786e\u8bc6\u522b\u5e38\u89c1\u76843D\u6253\u5370\u9519\u8bef\uff0c\u5982\u4e0d\u4e00\u81f4\u7684\u6324\u51fa\u3001\u4e1d\u72b6\u5806\u79ef\u3001\u7fd8\u66f2\u548c\u5c42\u7c98\u5408\u95ee\u9898\uff0c\u800c\u4e14\u8fd8\u80fd\u6709\u6548\u786e\u5b9a\u5bfc\u81f4\u8fd9\u4e9b\u5931\u8d25\u7684\u53c2\u6570\uff0c\u5e76\u81ea\u4e3b\u5730\u8fdb\u884c\u4fee\u6b63\uff0c\u65e0\u9700\u4efb\u4f55\u4eba\u5de5\u5e72\u9884\u3002|\n", "2408.14033": "|**2024-09-02**|**MLR-Copilot: Autonomous Machine Learning Research based on Large Language Models Agents**|Ruochen Li 
et.al.|[2408.14033](http://arxiv.org/abs/2408.14033)|**[link](https://github.com/du-nlp-lab/mlr-copilot)**|**Machine learning research is crucial for technological progress and innovation, but it often faces challenges such as high complexity, long experimental cycles, and the need for specialized expertise. To address these challenges, we propose a new systematic framework, autonomous Machine Learning Research with large language models (MLR-Copilot), which aims to raise the productivity of machine learning research by using large language model (LLM) agents to automatically generate and implement research ideas. The framework consists of three stages: research idea generation, experiment implementation, and execution. First, an LLM-based IdeaAgent draws on existing research papers to generate hypotheses and experimental plans. Next, in the implementation generation stage, ExperimentAgent turns these plans into executable code; this stage uses retrieved prototype code and, where needed, retrieves candidate models and data. Finally, the execution stage, also managed by ExperimentAgent, runs the experiments with mechanisms for human feedback and iterative debugging to increase the likelihood of achieving executable research outcomes. We evaluate the framework on five machine learning research tasks, and the experimental results show its potential to facilitate research progress and innovation.**|\n",
"2408.13986": "|**2024-08-26**|**AgentMove: Predicting Human Mobility Anywhere Using Large Language Model based Agentic Framework**|Jie Feng et.al.|[2408.13986](http://arxiv.org/abs/2408.13986)|**[link](https://github.com/tsinghua-fib-lab/agentmove)**|**Human mobility prediction plays a key role in many real-world applications. Although deep learning models have shown promising results over the past decade, their reliance on large amounts of private mobility data for training and their inability to make zero-shot predictions have hindered further progress. Recently, attempts have been made to apply large language models (LLMs) to mobility prediction. However, their performance is limited by the lack of a systematic workflow design: they use LLMs directly to generate the final output, which limits the potential of LLMs to uncover complex mobility patterns and underestimates their extensive reserve of global geospatial knowledge. This paper proposes AgentMove, a systematic agentic prediction framework for generalized mobility prediction in any city worldwide. In AgentMove, we first decompose the mobility prediction task into three sub-tasks and design a corresponding module for each: a spatial-temporal memory for mining individual mobility patterns, a world knowledge generator for modeling the effects of urban structure, and a collective knowledge extractor for capturing patterns shared across the population. Finally, we combine the results of the three modules and perform a reasoning step to generate the final prediction. Extensive experiments on data from 12 cities drawn from two sources show that AgentMove outperforms the best baseline by more than 8% on various metrics, gives robust predictions across cities, performs well with different base LLMs, and exhibits lower geographic bias. Code and data are available at https://github.com/tsinghua-fib-lab/AgentMove.**|\n",
"2408.13406": "|**2024-08-23**|**Optimizing Collaboration of LLM based Agents for Finite Element Analysis**|Chuan Tian 
et.al.|[2408.13406](http://arxiv.org/abs/2408.13406)|null|This paper investigates multi-agent interaction among large language models (LLMs) in programming and coding tasks. We use the AutoGen framework to facilitate communication between agents and evaluate different configurations by their success rates over 40 random runs per setup. The study focuses on developing a flexible automation framework for applying the finite element method to linear elasticity problems. Our findings emphasize the importance of optimizing agent roles and clearly defining their responsibilities, rather than merely increasing the number of agents. Effective collaboration among agents proves crucial for addressing general challenges of the finite element method. This research demonstrates the potential of LLM multi-agent systems to enhance computational automation in simulation methodology, paving the way for future advances in engineering and artificial intelligence.|\n",
"2408.14972": "|**2024-08-27**|**AgentMonitor: A Plug-and-Play Framework for Predictive and Secure Multi-Agent Systems**|Chi-Min Chan 
et.al.|[2408.14972](http://arxiv.org/abs/2408.14972)|**[link](https://github.com/chanchimin/agentmonitor)**|**The rapid development of large language models (LLMs) has driven the rise of LLM-based agents. Recent studies have found that a multi-agent system (MAS), in which each agent plays a specific role, usually outperforms a single LLM. However, configuring an MAS for a task remains challenging, because task performance can only be observed after execution. Inspired by scaling laws in LLM development, we explore whether the performance of an MAS can be predicted before the task is executed. To this end, we introduce AgentMonitor, a framework that integrates at the agent level to capture input and output information and convert it into statistics used to train a regression model that predicts task performance. In addition, AgentMonitor can correct, in real time, security risks that may be introduced by malicious agents, mitigating negative effects and strengthening the security of the MAS. Experiments show that an XGBoost model reaches a Spearman correlation of 0.89 in in-domain scenarios and 0.58 in more challenging scenarios. Applying AgentMonitor reduces harmful content by 6.2% and increases helpful content by 1.8% on average, which improves safety and reliability. The code has been open-sourced.**|\n",
"2408.15778": "|**2024-09-05**|**LogicGame: Benchmarking Rule-Based Reasoning Abilities of Large Language Models**|Jiayi Gui et.al.|[2408.15778](http://arxiv.org/abs/2408.15778)|null|This paper introduces LogicGame, a new benchmark designed to evaluate the comprehensive abilities of large language models (LLMs) in rule understanding, rule execution, and multi-step planning. Unlike traditional benchmarks, LogicGame provides a variety of games, each with a set of rules and an initial state, and requires models to understand and apply the predefined rules to solve problems. We create simulated scenarios in which a model must execute or plan operations to reach a specific goal. These game scenarios are specifically designed to separate logical reasoning from mere reliance on knowledge, since they depend entirely on the preset rules. This separation allows a pure assessment of rule-based reasoning ability. The evaluation considers not only final outcomes but also intermediate steps, giving a comprehensive assessment of model performance. Moreover, the intermediate steps are deterministic and can be verified automatically. LogicGame defines game scenarios at different difficulty levels, from simple rule application to complex reasoning chains, to precisely assess model performance in rule understanding and multi-step execution. Using LogicGame, we test a range of LLMs and find notable shortcomings in their rule-based logical reasoning abilities.|\n",
"2408.16090": "|**2024-08-28**|**EPO: Hierarchical LLM Agents with Environment Preference Optimization**|Qi Zhao et.al.|[2408.16090](http://arxiv.org/abs/2408.16090)|**[link](https://github.com/kevinz8866/epo)**|This paper proposes a hierarchical framework that decomposes complex tasks into manageable subgoals. The framework uses separate language models for subgoal prediction and low-level action generation. To address the challenge of creating training signals for unannotated datasets, we develop a reward model that uses multimodal environment feedback to generate reward signals automatically. We introduce Environment Preference Optimization (EPO), a method that generates preference signals from environment feedback and uses them to train language-model-based agents. Experiments on ALFRED show that our framework delivers leading performance, achieving first place on the ALFRED public leaderboard, and demonstrate its potential to improve long-horizon decision-making in diverse environments.|\n",
"2408.16991": "|**2024-08-30**|**Tool-Assisted Agent on SQL Inspection and Refinement in Real-World Scenarios**|Zhongyuan Wang et.al.|[2408.16991](http://arxiv.org/abs/2408.16991)|null|This paper proposes a tool-assisted agent framework for SQL inspection and refinement, aimed at improving the ability of large language models (LLMs) to handle real-world queries. The framework equips an LLM agent with two specialized tools, a retriever and a detector, to diagnose and correct database mismatches in SQL queries. These tools strengthen the LLM in handling the database mismatches that arise in real scenarios, such as condition mismatches and strict constraint mismatches. We also introduce Spider-Mismatch, a new dataset built specifically to reflect the condition mismatch problems encountered in the real world. Experiments show that, in the few-shot setting, our method achieves the best average performance on the Spider and Spider-Realistic datasets, and that it significantly outperforms baseline methods on the more realistic Spider-Mismatch dataset.|\n",
"2409.00993": "|**2024-09-02**|**Evolution of Social Norms in LLM Agents using Natural Language**|Ilya Horiguchi et.al.|[2409.00993](http://arxiv.org/abs/2409.00993)|null|Recent advances in large language models (LLMs) have stimulated interest in using these models for game-theoretic simulations, in which LLMs act as individual agents engaging in social interactions. Building on Axelrod's work on metanorm games, this paper investigates whether LLM agents can spontaneously generate and adhere to normative strategies through natural language dialogue. Our experiments show that, through dialogue, LLM agents can form complex social norms purely through natural language interaction, such as metanorms: norms that enforce the punishment of those who fail to punish cheating. The results confirm the effectiveness of using LLM agents to simulate social interactions and to understand how complex strategies and norms evolve through natural language. Future work may extend to a wider range of scenarios and agent characteristics to uncover subtler mechanisms behind the formation of social norms.|\n",
"2409.00985": "|**2024-09-02**|**Co-Learning: Code Learning for Multi-Agent Reinforcement Collaborative Framework with Conversational Natural Language Interfaces**|Jiapeng Yu et.al.|[2409.00985](http://arxiv.org/abs/2409.00985)|**[link](https://github.com/yuqian2003/co_learning)**|**Online question-answering systems based on large language models are gradually shifting from entertainment to professional use. This paper proposes a multi-agent framework called the Code Learning (Co-Learning) community, combined with environmental reinforcement learning (E-RL), to help beginners correct code errors independently. The system evaluates several large language models on an original dataset of 702 erroneous code samples and uses the results as the standard for E-RL rewards or penalties. By analyzing the erroneous code given as input to the current agent, it selects a suitable LLM-based agent to maximize error-correction accuracy and reduce correction time. Experiments show that, compared with the method without E-RL, this approach improves the precision score by 3% and reduces the time cost by 15%. Our source code is available at: https://github.com/yuqian2003/Co_Learning**|\n",
"2409.00135": "|**2024-08-29**|**HoneyComb: A Flexible LLM-Based Agent System for Materials Science**|Huan Zhang et.al.|[2409.00135](http://arxiv.org/abs/2409.00135)|null|To handle the complexity of materials science tasks and to address problems that large language models (LLMs) face in this field, such as reduced accuracy and hallucination caused by reliance on outdated implicit knowledge, we propose HoneyComb, the first LLM-based agent system designed specifically for materials science. HoneyComb strengthens its materials-science-specific reasoning and computation by drawing on a high-quality materials science knowledge base built from reliable literature (MatSciKB) and a novel tool hub (ToolHub). MatSciKB is a carefully curated, structured collection of knowledge intended to cover key information in materials science. ToolHub uses an inductive tool construction method to generate, decompose, and refine API tools for materials science, greatly improving the practicality of the system. In addition, HoneyComb includes a retrieval module that selects the most suitable knowledge source or tool for a given task, ensuring accurate and relevant answers. Experiments show that HoneyComb significantly outperforms baseline models on a variety of materials science tasks, bridging the gap between current LLM capabilities and the specific needs of materials science. More importantly, our framework is easy to extend to other scientific domains, showing broad potential for advancing scientific research and applications.|\n",
"2409.03659": "|**2024-09-06**|**LLM-based multi-agent poetry generation in non-cooperative environments**|Ran Zhang 
et.al.|[2409.03659](http://arxiv.org/abs/2409.03659)|**[link](https://github.com/zhangr2021/Multiagent_poetry)**|**Although large language models have made substantial progress in automatic poetry generation, the generated poetry lacks diversity, and the training process differs greatly from human learning. On the rationale that the learning process of poetry generation systems should be more human-like and their output more diverse and novel, we introduce a framework based on social learning in which we emphasize non-cooperative interactions alongside cooperative ones to encourage diversity. Our experiments are the first attempt at an LLM-based multi-agent system for poetry generation in non-cooperative environments, using both training-based agents (GPT-2) and prompting-based agents (GPT-3 and GPT-4). An evaluation of 96,000 generated poems shows that our framework benefits the poetry generation process for training-based agents, yielding a 3.0-3.7 percentage point increase in n-gram diversity and a 5.6-11.3 percentage point increase in novelty. Poetry generated by training-based agents also shows group divergence in lexicon, style, and semantics. Prompting-based agents in our framework also benefit from non-cooperative environments, and a more diverse ensemble of models with non-homogeneous agents has the potential to further improve diversity, with an increase of 7.0-17.5 percentage points in our experiments. However, prompting-based agents show a decline in lexical diversity over time and do not exhibit the group-based divergence intended in the social network. Our paper argues that creative tasks such as automatic poetry generation should incorporate social learning processes (modeled via LLM-based agents) that mimic human interaction.**|\n",
"2409.03440": "|**2024-09-05**|**Rx Strategist: Prescription Verification using LLM Agents System**|Phuc Phan Van 
et.al.|[2409.03440](http://arxiv.org/abs/2409.03440)|null|To protect patient safety, the complexity of modern pharmaceuticals demands strict prescription verification. We propose a new approach, Rx Strategist, which uses knowledge graphs and different search strategies to enhance the capabilities of large language models (LLMs) within an agentic framework. This multifaceted technique allows a multi-stage LLM pipeline to be built and information to be retrieved reliably from a custom active-ingredient database. Each stage of the pipeline covers a different aspect of prescription verification, such as indication, dosage, and possible drug interactions. By spreading reasoning across these stages, we mitigate the drawbacks of single-LLM techniques, improving correctness and reliability while reducing memory requirements. Our results show that Rx Strategist surpasses many current LLMs and performs on par with experienced clinical pharmacists. In the complex world of modern medication, combining LLMs with structured knowledge and advanced search methods offers a viable path to reducing prescription errors and improving patient outcomes.|\n",
"2409.03258": "|**2024-09-05**|**GraphInsight: 
Unlocking Insights in Large Language Models for Graph Structure Understanding**|Yukun Cao et.al.|[2409.03258](http://arxiv.org/abs/2409.03258)|null|Although large language models (LLMs) show promise in processing graphs, they struggle to understand graph structure information from graph description sequences, especially as graph size increases. We attribute this to the uneven memory performance of LLMs across different positions in the graph description sequence, known as positional bias. To address this challenge, we propose GraphInsight, a new framework designed to improve LLMs' understanding of both macro-level and micro-level graph information. GraphInsight is built on two key strategies: 1) placing critical graph information at positions where LLMs show stronger memory performance; and 2) for regions with weaker memory performance, exploring a lightweight external knowledge base, inspired by retrieval-augmented generation (RAG). In addition, GraphInsight explores integrating these two strategies into an LLM agent pipeline to solve composite graph tasks that require multi-step reasoning. Extensive benchmark experiments show that GraphInsight significantly outperforms all other graph description methods (for example, prompting techniques and reordering strategies) on graph structure understanding tasks across graphs of different sizes.|\n",
"2409.02977": "|**2024-09-04**|**Large Language Model-Based Agents for Software Engineering: A Survey**|Junwei Liu et.al.|[2409.02977](http://arxiv.org/abs/2409.02977)|**[link](https://github.com/fudanselab/agent4se-paper-list)**|**This paper presents a comprehensive and systematic survey of large language model (LLM)-based agents for software engineering (SE). We collect 106 papers and categorize them from two perspectives: the software engineering perspective and the agent perspective. We also discuss the key challenges facing the field and future directions. The repository for this survey is at: https://github.com/FudanSELab/Agent4SE-Paper-List.**|\n",
"2409.05001": "|**2024-09-08**|**A Pair Programming Framework for Code Generation via Multi-Plan Exploration and Feedback-Driven Refinement**|Huan Zhang 
et.al.|[2409.05001](http://arxiv.org/abs/2409.05001)|**[link](https://github.com/nju-websoft/paircoder)**|**Large language models (LLMs) have shown impressive performance in code generation. Although prior work has enhanced LLMs with prompting techniques and code refinement, they still struggle with complex programming problems owing to rigid solution plans. This paper proposes PairCoder, a novel LLM-based framework that mimics the practice of pair programming to address this problem. PairCoder consists of two collaborating LLM agents: a Navigator and a Driver. The Navigator is responsible for proposing promising solution plans, selecting the current best plan, and directing the next iteration based on execution feedback. The Driver follows the Navigator's guidance to carry out initial code generation, code testing, and refinement. This interleaved, iterative workflow involves multi-plan exploration and feedback-based refinement, mimicking the way pair programmers collaborate. We evaluate PairCoder with both open-source and closed-source LLMs on a variety of code generation benchmarks. Experiments show that PairCoder achieves substantially higher accuracy than directly prompting LLMs, with relative pass@1 gains of 12.00%-162.43%.**|\n",
"2409.04617": "|**2024-09-06**|**Sparse Rewards Can Self-Train Dialogue Agents**|Barrett Martin Lattimer et.al.|[2409.04617](http://arxiv.org/abs/2409.04617)|**[link](https://github.com/asappresearch/josh-llm-simulation-training)**|**Recent progress in large language model (LLM) agents, particularly on multi-turn dialogue tasks, has been driven mainly by supervised fine-tuning and high-quality human feedback. However, as base LLMs continue to improve, obtaining meaningful human feedback becomes increasingly difficult and costly. In some domains, base LLMs may eventually exceed human capabilities, making traditional feedback-based methods impractical. This paper therefore proposes a new self-improvement paradigm that allows LLM agents to improve their performance autonomously, without external human feedback. We introduce Juxtaposed Outcomes for Simulation Harvesting (JOSH), a self-alignment algorithm that uses a sparse-reward simulation environment to extract ideal behaviors and further train the LLM on its own outputs. We build ToolWOZ, a sparse-reward simulation environment for tool calling derived from MultiWOZ. Experiments show that models trained with JOSH, whether small or frontier, perform significantly better on tool-based interactions while retaining general model capabilities across a range of benchmarks. Our code and data are publicly available on GitHub.**|\n",
"2409.06351": "|**2024-09-10**|**MAGDA: Multi-agent guideline-driven diagnostic assistance**|David Bani-Harouni 
et.al.|[2409.06351](http://arxiv.org/abs/2409.06351)|null|In emergency departments, rural hospitals, or clinics in developing countries, clinicians often lack access to fast image analysis by trained radiologists, which can harm patient care. Large language models (LLMs) could ease the pressure on these clinicians by providing insights that support their decisions. Although these LLMs score highly on medical exams that showcase their theoretical medical knowledge, they tend not to follow medical guidelines. In this work, we introduce a new zero-shot, guideline-driven approach to decision support. We build a system of multiple LLM agents, augmented with a contrastive vision-language model, that collaborate to reach a patient diagnosis. Given simple diagnostic guidelines, the agents synthesize prompts and screen the image for findings according to those guidelines. Finally, they provide an understandable chain of reasoning for their diagnosis and refine it themselves to account for interdependencies between diseases. Because our method is zero-shot, it suits rare-disease settings, where training data is limited but expert-written disease descriptions are available. We evaluate our method on two chest X-ray datasets, CheXpert and ChestX-ray 14 Longtail, showing improved performance over existing zero-shot methods and generalization to rare diseases.|\n",
"2409.09030": "|**2024-09-23**|**Agents in Software Engineering: Survey, Landscape, and Vision**|Yanlin Wang et.al.|[2409.09030](http://arxiv.org/abs/2409.09030)|**[link](https://github.com/deepsoftwareanalytics/awesome-agent4se)**|**In recent years, large language models (LLMs) have achieved remarkable success across downstream tasks and have been widely applied in software engineering (SE). We find that many studies combining LLMs with SE adopt the concept of an agent, either explicitly or implicitly. However, there is no in-depth survey that traces the development of existing work, analyzes how it combines LLM-based agent techniques to optimize various tasks, and clarifies the framework of LLM-based agents in SE. This paper presents the first survey of research combining LLM-based agents with SE and proposes a framework for LLM-based agents in SE built on three key modules: perception, memory, and action. We also summarize the problems that arise when combining the two fields and propose future opportunities in response to existing challenges. We maintain a GitHub repository of related papers: https://github.com/DeepSoftwareAnalytics/Awesome-Agent4SE.**|\n",
"2409.09013": "|**2024-09-13**|**AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents**|Zhe Su et.al.|[2409.09013](http://arxiv.org/abs/2409.09013)|null|To be deployed safely and successfully, language models (LLMs) must satisfy both truthfulness and utility goals. These two goals often conflict, for example when an AI assistant helps a used-car salesperson sell a flawed car. The conflict is partly attributable to ambiguous or misleading user instructions. We propose AI-LieDar, a framework for studying how LLM-based agents handle conflicts between utility and truthfulness in a multi-turn interactive setting.
We design a set of realistic scenarios in which language agents are instructed to achieve goals that conflict with being truthful during a multi-turn conversation. To evaluate truthfulness at scale, we develop a truthfulness detector, grounded in the psychology literature, to assess the agents' responses. Our experiments show that every model is truthful less than 50% of the time, although the rates of goal achievement (utility) and truthfulness vary across models. We further test the steerability of LLMs and find that models follow malicious instructions to deceive, and that even models steered toward truthfulness can still lie. These findings reveal the complexity of truthfulness in LLMs and underscore the need for further research to ensure the safe and reliable deployment of LLMs and AI agents.|\n", "2409.08963": "|**2024-09-13**|**Safeguarding Decentralized Social Media: LLM Agents for Automating Community Rule Compliance**|Lucio La Cava
et.al.|[2409.08963](http://arxiv.org/abs/2409.08963)|null|Ensuring that content complies with community guidelines is essential for maintaining a healthy online social environment. However, traditional human-based compliance checking struggles to scale with the growing volume of user-generated content and the limited number of moderators. Recent advances in natural language understanding by large language models open new opportunities for automated content compliance verification. This work evaluates six AI agents built on Open-LLMs for automated rule-compliance checking in decentralized social networks, a challenging setting in which the heterogeneity of community scopes and rules makes the task especially hard. Analyzing more than 50,000 posts from hundreds of Mastodon servers, we find that the AI agents effectively detect non-compliant content, grasp linguistic subtleties, and adapt to diverse community contexts. Most agents also show high agreement with one another and consistency in their score justifications and compliance suggestions. A human evaluation by domain experts confirms the agents' reliability and usefulness, indicating that they are promising tools for semi-automated or human-in-the-loop content moderation systems.|\n",
"2409.08717": "|**2024-09-13**|**Fusing Dynamics Equation: A Social Opinions Prediction Algorithm with LLM-based Agents**|Junchi Yao et.al.|[2409.08717](http://arxiv.org/abs/2409.08717)|null|As social media increasingly becomes the key platform on which social movements shape public opinion, accurately simulating and predicting the dynamics of user opinions is essential for understanding social phenomena, informing policy, and guiding public opinion. However, existing simulation methods struggle to capture the complexity and dynamics of user behavior. To address this, the paper proposes the FDE-LLM algorithm, a new method for simulating social media user opinion dynamics. It combines opinion dynamics with epidemic models, constraining the behavior and opinion evolution of large language models (LLMs) so that they better match real online networks. Specifically, FDE-LLM divides users into two groups: opinion leaders and followers. Opinion leaders are driven by LLM role-playing and constrained by a cellular automaton (CA) model, while opinion followers are integrated into a dynamic system that combines the CA model with the SIR model. This design significantly improves the accuracy and efficiency of the simulation. Experiments were conducted on four real-world Weibo datasets and validated with the open-source model ChatGLM. The results show that FDE-LLM outperforms traditional agent-based modeling (ABM) opinion dynamics algorithms and LLM-based opinion diffusion algorithms in both accuracy and interpretability.|\n",
"2409.10372": "|**2024-09-19**|**Instigating Cooperation among LLM Agents Using Adaptive Information Modulation**|Qiliang Chen et.al.|[2409.10372](http://arxiv.org/abs/2409.10372)|null|This paper proposes a novel framework that uses large language model (LLM) agents as proxies for human strategic behavior and combines them with reinforcement learning (RL) so that the agents engage in evolving strategic interactions within team environments. Our approach extends traditional agent-based simulation by using strategic LLM agents (SLA) and by introducing dynamic, adaptive governance through a pro-social promoting RL agent (PPA), which modulates information access between agents in a network to optimize social welfare and promote pro-social behavior. Through validation in iterated games, including the prisoner's dilemma, we show that SLA agents exhibit sophisticated strategic adaptation. The PPA agent effectively learns to adjust information transparency, leading to a significant increase in cooperation rates. The framework offers important insights into AI-driven social dynamics and contributes to deploying AI in real-world team settings.|\n",
"2409.09785": "|**2024-09-17**|**Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition**|Chao-Han Huck Yang et.al.|[2409.09785](http://arxiv.org/abs/2409.09785)|null|Given recent advances in generative AI, a key question is how large language models (LLMs) can enhance acoustic modeling tasks using the text decoding results of automatic speech recognition (ASR) models. To explore new capabilities of language modeling for speech processing, this paper introduces the Generative Speech Transcription Error Correction (GenSEC) challenge. The challenge comprises three post-ASR language modeling tasks: (i) post-ASR transcription correction, (ii) speaker tagging, and (iii) emotion recognition. These tasks aim to emulate how future language-model-based agents will handle voice interfaces, while remaining accessible to a broad audience through open-source pretrained language models or agent-based APIs. The paper also discusses the results of the baseline evaluations and the lessons learned for designing future evaluations.|\n",
"2409.09584": "|**2024-09-15**|**RethinkMCTS: Refining Erroneous Thoughts in Monte Carlo Tree Search for Code Generation**|Qingyao Li et.al.|[2409.09584](http://arxiv.org/abs/2409.09584)|null|This paper studies LLM (large language model) agents combined with tree search algorithms for code generation. Current search algorithms in this area suffer from low search quality, for three main reasons: 1) the search space is poorly designed for the high reasoning demands of code generation; 2) code feedback is not adequately integrated into the search process; 3) negative feedback is handled inefficiently, which lowers both search quality and efficiency.
To address these problems, we propose a new method, RethinkMCTS (rethinking Monte Carlo Tree Search). It searches over thoughts before generating code, which lets it explore a wider range of strategies. More importantly, RethinkMCTS builds verbal feedback from fine-grained code execution feedback to correct erroneous thoughts that arise during the search. This mechanism keeps the search on a correct reasoning path and thereby improves the overall quality of the search tree. Experimental results show that RethinkMCTS achieves significant gains over previous search-based and feedback-based code generation baselines. On the HumanEval dataset, RethinkMCTS raises the pass@1 of GPT-3.5-turbo from 70.12 to 89.02 and that of GPT-4o-mini from 87.20 to 94.51. By exploring thoroughly and improving the quality of the whole search tree, RethinkMCTS makes the search process both broader and deeper.|\n", "2409.09345": "|**2024-09-14**|**Enhancing Decision-Making for LLM Agents via Step-Level Q-Value Models**|Yuanzhao Zhai
et.al.|[2409.09345](http://arxiv.org/abs/2409.09345)|null|This paper proposes using task-relevant Q-value models to guide action selection, improving the performance of large language model (LLM) agents on multi-step decision-making tasks. Specifically, we first collect decision-making trajectories annotated with step-level Q-values via Monte Carlo Tree Search (MCTS) and construct a preference dataset. We then use another LLM to fit these preferences through step-level Direct Preference Optimization (DPO), and this LLM serves as the Q-value model. At inference time, at each decision step, the LLM agent selects the action with the highest Q-value before interacting with the environment. We apply the method to several open-source and API-based LLM agents, and the results show that Q-value models significantly improve their performance. Notably, an agent built on Phi-3-mini-4k-instruct improves by 103% on WebShop and by 75% on HotPotQA, even surpassing GPT-4o-mini. Q-value models also offer further advantages, such as generalizing across different LLM agents and integrating seamlessly with existing prompting strategies.|\n", "2409.09271": "|**2024-09-14**|**Python Symbolic
Execution with LLM-powered Code Generation**|Wenhan Wang et.al.|[2409.09271](http://arxiv.org/abs/2409.09271)|null|This paper presents LLM-Sym, an agent tool augmented with a large language model (LLM). It targets the main challenges that symbolic execution faces in dynamically typed languages such as Python. By automatically calling the SMT solver Z3 to solve execution path constraints, LLM-Sym extends a basic symbolic execution engine to support programs that use the complex data type `list`. The core contribution of LLM-Sym is its ability to translate complex Python path constraints into Z3 code. To make this path-to-Z3 translation accurate, we design a multi-step code generation pipeline that includes type inference, retrieval, and self-refinement. Experimental results show that LLM-Sym can solve path constraints in LeetCode problems with complex control flow and list data structures, which the basic symbolic execution engine cannot do. The approach paves the way for combining the generation ability of LLMs with the reasoning ability of symbolic solvers, and opens new opportunities for LLM-assisted test case generation.|\n", "2409.11393": "|**2024-09-17**|**LLM-Agent-UMF: LLM-based Agent Unified Modeling Framework for Seamless Integration of Multi Active/Passive
Core-Agents**|Amine B. Hassouna et.al.|[2409.11393](http://arxiv.org/abs/2409.11393)|null|This paper proposes a unified framework, LLM-Agent-UMF (LLM-based Agent Unified Modeling Framework), to address the lack of uniformity in software architecture caused by integrating tools into language model (LLM)-powered agents and by the improvements proposed in many state-of-the-art works. So far, the combination of these technologies and the follow-up work have focused on functionality rather than on defining component boundaries, which has led to terminological and architectural confusion among researchers. The framework clearly distinguishes the components of an agent: the LLM, the tools, and a newly introduced element, the core-agent, which acts as the agent's central coordinator and consists of five modules: planning, memory, profile, action, and security. Differences in the internal structure of core-agents lead us to classify them into two types, passive and active. Based on this classification, we propose several multi-core agent architectures that combine the distinctive characteristics of different individual agents.
To validate the framework, we apply it to a selection of state-of-the-art agents, showing that it aligns with their functionalities and clarifying architectural aspects that were previously overlooked. We also evaluate four of the proposed architectures in detail by integrating agents with different characteristics into hybrid active/passive core-agent systems, which gives clear insight into the improvements that specific agent combinations may bring and the challenges they face.|\n", "2409.11276": "|**2024-09-17**|**Hackphyr: A Local Fine-Tuned LLM Agent for Network Security Environments**|Maria Rigaki et.al.|[2409.11276](http://arxiv.org/abs/2409.11276)|null|This paper explores using a locally fine-tuned large language model (LLM) as a red-team agent in network security environments. Given the privacy concerns, cost, and network connectivity constraints of commercial cloud-based LLMs, we present Hackphyr, a locally fine-tuned 7-billion-parameter model intended for red-team tasks in network security environments. Our model runs on a single GPU card and performs on par with much larger and more powerful commercial models such as GPT-4.
Hackphyr significantly outperforms other models, including GPT-3.5-turbo and baselines such as Q-learning agents, in complex, previously unseen scenarios. To achieve this performance, we built a new dataset dedicated to cybersecurity tasks to strengthen the base model's capabilities. Finally, we carry out a comprehensive analysis of the agents' behavior, which offers insight into the planning abilities and potential limitations of such LLM-based agents in cybersecurity contexts and contributes to a broader understanding of their use in network security.|\n", "2409.10568": "|**2024-09-14**|**On the limits of agency in agent-based models**|Ayush Chopra
et.al.|[2409.10568](http://arxiv.org/abs/2409.10568)|**[link](https://github.com/agenttorch/agenttorch)**|**This paper introduces AgentTorch, a framework that scales agent-based models (ABMs) to millions of agents while using large language models (LLMs) as agents with adaptive behavior. When simulating the behavior of complex systems, the framework aims to capture realistic environment dynamics and adaptive agent behavior while still simulating very large populations efficiently. Recent advances in large language models offer an opportunity to enhance ABMs, but the computational cost of running LLMs for large numbers of agents has limited their widespread use. We evaluate experimentally the utility of LLMs as ABM agents, exploring the trade-off between simulation scale and the behavioral detail of individual agents. Using the COVID-19 pandemic as a case study, we show how AgentTorch simulates 8.4 million agents representing New York City to capture the effect of isolation and employment behavior on health and economic outcomes. We compare the performance of different agent architectures, based on heuristics and on LLMs, in predicting disease waves and unemployment rates.
In addition, we demonstrate AgentTorch's capabilities for retrospective, counterfactual, and prospective analyses, highlighting how adaptive agent behavior can help overcome the limitations of historical data in policy design. AgentTorch is an open-source project that is currently used around the world for policy-making and scientific discovery. The framework is available at github.com/AgentTorch/AgentTorch.**|\n", "2409.17140": "|**2024-09-25**|**Turn Every Application into an Agent: Towards Efficient Human-Agent-Computer Interaction with API-First LLM-Based Agents**|Junting Lu et.al.|[2409.17140](http://arxiv.org/abs/2409.17140)|null|With the help of multimodal large language models (MLLMs), LLM-based agents can interact directly with application user interfaces (UIs), which improves agent performance on complex tasks. However, these agents often suffer from high latency and low reliability because of the large number of sequential UI interactions involved. To address this, we propose AXIS, a novel LLM-based agent framework that optimizes agent behavior by prioritizing actions through application programming interfaces (APIs) over UI actions. The framework also supports creating and extending APIs by automatically exploring applications.
Our experiments on Word show that, compared with humans, AXIS reduces task completion time by 65%-70% and cognitive workload by 38%-53%, while maintaining an accuracy of 97%-98%. Our work contributes a new human-agent-computer interaction (HACI) framework and new UI design principles for application providers in the era of LLMs, and explores the possibility of turning every application into an agent, paving the way toward an agent-centric operating system (Agent OS).|\n", "2409.16455": "|**2024-09-24**|**MultiTalk: Introspective and Extrospective Dialogue for Human-Environment-LLM Alignment**|Venkata Naren Devarakonda et.al.|[2409.16455](http://arxiv.org/abs/2409.16455)|null|This paper presents MultiTalk, a task planning method based on large language models (LLMs). Through a framework of introspective and extrospective dialogue loops, the method addresses problems that LLMs can run into during task planning, such as hallucinations, ambiguity in user instructions, environmental constraints, and limits on the capabilities of the executing agent. These problems can make the generated plans wrong or incomplete.
MultiTalk\u65b9\u6cd5\u901a\u8fc7\u7279\u5b9a\u7cfb\u7edf\u6765\u63d0\u53d6\u548c\u9884\u6d4b\u4e0e\u4efb\u52a1\u76f8\u5173\u7684\u72b6\u6001\uff0c\u5e76\u6807\u8bb0\u51fa\u4eba\u3001LLM\u4ee3\u7406\u548c\u73af\u5883\u4e4b\u95f4\u7684\u4e0d\u5339\u914d\u6216\u504f\u5dee\u3002\u6709\u6548\u7684\u53cd\u9988\u8def\u5f84\u4fc3\u8fdb\u4eba\u4e0eLLM\u4e4b\u95f4\u7684\u6709\u610f\u4e49\u5bf9\u8bdd\u3002\u8fd9\u79cd\u65b9\u6cd5\u5728\u673a\u5668\u4eba\u64cd\u4f5c\u4efb\u52a1\u7684\u5e94\u7528\u4e2d\u5f97\u5230\u4e86\u9a8c\u8bc1\u3002\u5b9e\u9a8c\u548c\u6d88\u878d\u5206\u6790\u5c55\u793a\u4e86MultiTalk\u65b9\u6cd5\u7684\u7a33\u5065\u6027\u548c\u53ef\u9760\u6027\uff0c\u4e0e\u57fa\u7ebf\u65b9\u6cd5\u7684\u6bd4\u8f83\u8fdb\u4e00\u6b65\u8bc1\u660e\u4e86\u5176\u5728\u5b9e\u4f53\u4ee3\u7406\u4efb\u52a1\u89c4\u5212\u65b9\u9762\u7684\u4f18\u52bf\u3002 \u603b\u4e4b\uff0cMultiTalk\u63d0\u4f9b\u4e86\u4e00\u79cd\u901a\u8fc7\u589e\u5f3aLLM\u4e0e\u73af\u5883\u3001\u6267\u884c\u8005\u548c\u7528\u6237\u4e4b\u95f4\u7684\u4e00\u81f4\u6027\u548c\u6c9f\u901a\u6765\u6539\u8fdb\u4efb\u52a1\u89c4\u5212\u8fc7\u7a0b\u7684\u65b9\u6cd5\uff0c\u4ece\u800c\u63d0\u9ad8\u89c4\u5212\u7684\u6709\u6548\u6027\u548c\u6548\u7387\u3002|\n", "2409.15623": "|**2024-09-23**|**Safe Guard: an LLM-agent for Real-time Voice-based Hate Speech Detection in Social Virtual Reality**|Yiwen Xu et.al.|[2409.15623](http://arxiv.org/abs/2409.15623)|null|\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aSafe Guard\u7684LLM\u4ee3\u7406\uff0c\u7528\u4e8e\u68c0\u6d4b\u793e\u4ea4VR\uff08VRChat\uff09\u4e2d\u7684\u8bed\u97f3\u4ea4\u4e92\u4e2d\u7684\u4ec7\u6068\u8a00\u8bba\u3002\u6211\u4eec\u7684\u7cfb\u7edf\u5229\u7528\u4e86Open AI 
GPT\u548c\u97f3\u9891\u7279\u5f81\u63d0\u53d6\u6280\u672f\uff0c\u5b9e\u73b0\u4e86\u5b9e\u65f6\u8bed\u97f3\u4ea4\u4e92\u7684\u68c0\u6d4b\u529f\u80fd\u3002\u6211\u4eec\u8d21\u732e\u4e86\u4e00\u4e2a\u7cfb\u7edf\u8bbe\u8ba1\u4ee5\u53ca\u5bf9\u8be5\u7cfb\u7edf\u7684\u8bc4\u4f30\uff0c\u8fd9\u4e9b\u90fd\u8bc1\u660e\u4e86\u6211\u4eec\u65b9\u6cd5\u5728\u68c0\u6d4b\u4ec7\u6068\u8a00\u8bba\u65b9\u9762\u7684\u6709\u6548\u6027\uff0c\u5e76\u4e14\u76f8\u6bd4\u73b0\u6709\u65b9\u6cd5\u663e\u8457\u964d\u4f4e\u4e86\u8bef\u62a5\u7387\u3002\u6211\u4eec\u7684\u7ed3\u679c\u8868\u660e\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u5728\u521b\u5efa\u66f4\u5b89\u5168\u7684\u865a\u62df\u73af\u5883\u65b9\u9762\u5177\u6709\u6f5c\u529b\uff0c\u5e76\u4e3a\u8fdb\u4e00\u6b65\u53d1\u5c55\u57fa\u4e8eLLM\u7684\u7ba1\u7406\u65b9\u6cd5\u5960\u5b9a\u4e86\u57fa\u7840\u3002|\n", "2409.14913": "|**2024-09-25**|**Towards a Realistic Long-Term Benchmark for Open-Web Research Agents**|Peter M\u00fchlbacher et.al.|[2409.14913](http://arxiv.org/abs/2409.14913)|null|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u9879\u5373\u5c06\u63a8\u51fa\u7684\u57fa\u51c6\u6d4b\u8bd5\uff0c\u7528\u4e8e\u8bc4\u4f30\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4ee3\u7406\u5728\u7ecf\u6d4e\u4ef7\u503c\u9ad8\u7684\u767d\u9886\u4efb\u52a1\u4e0a\u7684\u8868\u73b0\u3002\u6211\u4eec\u5bf9\u91d1\u878d\u548c\u54a8\u8be2\u9886\u57df\u5e38\u89c4\u8fdb\u884c\u7684\u3001\u73b0\u5b9e\u4e16\u754c\u4e2d\u7684\u201c\u6742\u4e71\u201d\u5f00\u653e\u7f51\u7edc\u7814\u7a76\u4efb\u52a1\u8fdb\u884c\u4e86\u8bc4\u4f30\u3002\u8fd9\u6837\u505a\uff0c\u6211\u4eec\u4e3a\u5efa\u7acb\u4e00\u4e2aLLM\u4ee3\u7406\u8bc4\u4f30\u5957\u4ef6\u5960\u5b9a\u4e86\u57fa\u7840\uff0c\u5728\u8be5\u5957\u4ef6\u4e2d\uff0c\u826f\u597d\u7684\u6027\u80fd\u76f4\u63a5\u5bf9\u5e94\u7740\u5de8\u5927\u7684\u7ecf\u6d4e\u548c\u793e\u4f1a\u5f71\u54cd\u3002\u6211\u4eec\u6784\u5efa\u5e76\u6d4b\u8bd5\u4e86\u591a\u4e2a\u4ee3\u7406\u67b6\u6784\uff0c\u5305\u62eco1-preview\u3001GPT-4o\u3001Claude-3.5 
Sonnet\u3001Llama 3.1\uff08405b\uff09\u4ee5\u53caGPT-4o-mini\u3002\u5e73\u5747\u800c\u8a00\uff0c\u4f7f\u7528Claude-3.5 Sonnet\u548co1-preview\u7684LLM\u4ee3\u7406\u5728\u6027\u80fd\u4e0a\u660e\u663e\u4f18\u4e8e\u4f7f\u7528GPT-4o\u7684\u4ee3\u7406\uff0c\u800c\u57fa\u4e8eLlama 3.1\uff08405b\uff09\u548cGPT-4o-mini\u7684\u4ee3\u7406\u5219\u843d\u540e\u5f88\u591a\u3002\u5728\u6240\u6709LLM\u4e2d\uff0c\u5177\u6709\u59d4\u6258\u5b50\u4efb\u52a1\u7ed9\u5b50\u4ee3\u7406\u80fd\u529b\u7684ReAct\u67b6\u6784\u8868\u73b0\u6700\u4f73\u3002\u9664\u4e86\u5b9a\u91cf\u8bc4\u4f30\u4e4b\u5916\uff0c\u6211\u4eec\u8fd8\u901a\u8fc7\u68c0\u67e5\u4ee3\u7406\u7684\u8ffd\u8e2a\u8bb0\u5f55\u548c\u53cd\u601d\u5b83\u4eec\u7684\u89c2\u5bdf\u7ed3\u679c\uff0c\u5bf9\u4ee3\u7406\u7684\u80fd\u529b\u8fdb\u884c\u4e86\u5b9a\u6027\u8bc4\u4f30\u3002\u6211\u4eec\u7684\u8bc4\u4f30\u4ee3\u8868\u4e86\u9996\u6b21\u6df1\u5165\u8bc4\u4f30\u4ee3\u7406\u5728\u771f\u5b9e\u5f00\u653e\u7f51\u7edc\u4e0a\u6267\u884c\u5177\u6709\u6311\u6218\u6027\u7684\u3001\u7ecf\u6d4e\u4e0a\u6709\u4ef7\u503c\u7684\u5206\u6790\u5e08\u5f0f\u7814\u7a76\u7684\u80fd\u529b\u3002|\n", "2409.14807": "|**2024-09-23**|**Interpreting Multi-band Galaxy Observations with Large Language Model-Based Agents**|Zechang Sun 
et.al.|[2409.14807](http://arxiv.org/abs/2409.14807)|null|\u672c\u6587\u5c55\u793a\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4e3a\u57fa\u7840\u7684\u667a\u80fd\u4f53\u5982\u4f55\u52a0\u901f\u5929\u6587\u5b66\u7814\u7a76\u6d41\u7a0b\uff0c\u901a\u8fc7\u6a21\u4eff\u4eba\u7c7b\u63a8\u7406\u6765\u89e3\u91ca\u591a\u6ce2\u6bb5\u661f\u7cfb\u89c2\u6d4b\u6570\u636e\u3002\u6211\u4eec\u63d0\u51fa\u4e86mephisto\u6846\u67b6\uff0c\u5b83\u80fd\u591f\u4e0eCIGALE\u4ee3\u7801\u5e93\u534f\u4f5c\uff0c\u540e\u8005\u5305\u542b\u4e86\u7528\u4e8e\u89e3\u91ca\u89c2\u6d4b\u6570\u636e\u7684\u5149\u8c31\u80fd\u91cf\u5206\u5e03\uff08SED\uff09\u6a21\u578b\u3002\u5728\u5f00\u653e\u4e16\u754c\u73af\u5883\u4e2d\uff0cmephisto\u901a\u8fc7\u81ea\u6211\u6e38\u620f\u7ecf\u9a8c\u5b66\u4e60\u3001\u6267\u884c\u6811\u641c\u7d22\u5e76\u79ef\u7d2f\u52a8\u6001\u66f4\u65b0\u7684\u77e5\u8bc6\u57fa\u7840\u3002\u4f5c\u4e3a\u6982\u5ff5\u9a8c\u8bc1\uff0c\u6211\u4eec\u5c06mephisto\u5e94\u7528\u4e8e\u8a79\u59c6\u65af\u97e6\u4f2f\u592a\u7a7a\u671b\u8fdc\u955c\u7684\u6700\u65b0\u6570\u636e\u96c6\u3002\u7ed3\u679c\u8868\u660e\uff0cmephisto\u5728\u63a8\u7406\u661f\u7cfb\u7269\u7406\u573a\u666f\u65b9\u9762\u8fbe\u5230\u4e86\u63a5\u8fd1\u4eba\u7c7b\u7684\u4e13\u4e1a\u6c34\u5e73\uff0c\u751a\u81f3\u5728\u5904\u7406\u65b0\u53d1\u73b0\u7684\u201c\u5c0f\u7ea2\u70b9\u201d\u661f\u7cfb\u65f6\u4e5f\u662f\u5982\u6b64\u3002\u8fd9\u662f\u667a\u80fd\u4f53\u8fdb\u884c\u5929\u6587\u5b66\u7814\u7a76\u7684\u9996\u6b21\u5c55\u793a\uff0c\u671d\u7740\u901a\u8fc7\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4ee3\u7406\u5b9e\u73b0\u7aef\u5230\u7aef\u7814\u7a76\u7684\u65b9\u5411\u8fc8\u8fdb\uff0c\u53ef\u80fd\u6709\u52a9\u4e8e\u52a0\u5feb\u5929\u6587\u53d1\u73b0\u7684\u901f\u5ea6\u3002|\n", "2409.14488": "|**2024-09-22**|**Enhancing LLM-based Autonomous Driving Agents to Mitigate Perception Attacks**|Ruoyu Song 
et.al.|[2409.14488](http://arxiv.org/abs/2409.14488)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4e0e\u81ea\u52a8\u9a7e\u9a76\uff08AD\uff09\u7cfb\u7edf\u96c6\u6210\u7684\u65e5\u76ca\u589e\u957f\u7684\u5174\u8da3\uff0cAD\u7cfb\u7edf\u9762\u4e34\u7740\u653b\u51fb\u5176\u5bf9\u8c61\u68c0\u6d4b\u4e0e\u8ffd\u8e2a\uff08ODT\uff09\u529f\u80fd\u7684\u98ce\u9669\u3002\u6211\u4eec\u7684\u8bc4\u4f30\u8868\u660e\uff0c\u9488\u5bf9\u56db\u4e2a\u8fd1\u671f\u63d0\u51fa\u7684LLM\u4ee3\u7406\u7684ODT\u653b\u51fb\u6210\u529f\u7387\u8fbe\u523063.26%\uff0c\u5bfc\u81f4\u5b83\u4eec\u5d29\u6e83\u6216\u8fdd\u53cd\u4ea4\u901a\u89c4\u5219\uff0c\u539f\u56e0\u5728\u4e8e\u8bef\u5bfc\u6027\u8bb0\u5fc6\u6a21\u5757\u63d0\u4f9b\u7684\u8fc7\u5f80\u7ecf\u9a8c\u3001\u63d0\u793a\u5728\u8bc6\u522b\u4e0d\u4e00\u81f4\u6027\u65b9\u9762\u7684\u5c40\u9650\u6027\u4ee5\u53ca\u5bf9\u5730\u9762\u5b9e\u51b5\u611f\u77e5\u6570\u636e\u7684\u4f9d\u8d56\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aHudson\u7684\u9a7e\u9a76\u63a8\u7406\u4ee3\u7406\uff0c\u5b83\u6269\u5c55\u4e86\u5148\u524d\u57fa\u4e8eLLM\u7684\u9a7e\u9a76\u7cfb\u7edf\uff0c\u65e8\u5728\u5728\u611f\u77e5\u653b\u51fb\u671f\u95f4\u5b9e\u73b0\u66f4\u5b89\u5168\u7684\u51b3\u7b56\u5236\u5b9a\uff0c\u540c\u65f6\u5728\u6b63\u5e38\u6761\u4ef6\u4e0b\u4fdd\u6301\u6709\u6548\u6027\u3002 
Hudson\u901a\u8fc7\u9996\u5148\u5bf9AD\u8f6f\u4ef6\u8fdb\u884c\u4eea\u5668\u5316\u6536\u96c6\u5b9e\u65f6\u611f\u77e5\u7ed3\u679c\u548c\u9a7e\u9a76\u573a\u666f\u7684\u4e0a\u4e0b\u6587\u4fe1\u606f\u6765\u5b9e\u73b0\u8fd9\u4e00\u76ee\u6807\u3002\u8fd9\u4e9b\u6570\u636e\u968f\u540e\u88ab\u8f6c\u5316\u4e3a\u9886\u57df\u7279\u5b9a\u8bed\u8a00\uff08DSL\uff09\u3002\u4e3a\u4e86\u5f15\u5bfcLLM\u5728ODT\u653b\u51fb\u671f\u95f4\u68c0\u6d4b\u5e76\u505a\u51fa\u5b89\u5168\u63a7\u5236\u51b3\u7b56\uff0cHudson\u5c06DSL\u8f6c\u6362\u4e3a\u81ea\u7136\u8bed\u8a00\uff0c\u5e76\u9644\u5e26\u4e00\u7ec4\u81ea\u5b9a\u4e49\u7684\u653b\u51fb\u68c0\u6d4b\u6307\u4ee4\u3002\u6267\u884c\u67e5\u8be2\u540e\uff0cHudson\u5206\u6790LLM\u7684\u63a7\u5236\u51b3\u7b56\u4ee5\u7406\u89e3\u5176\u56e0\u679c\u63a8\u7406\u8fc7\u7a0b\u3002 \u6211\u4eec\u4f7f\u7528\u79c1\u6709LLM\uff08GPT-4\uff09\u3001\u4e24\u4e2a\u5f00\u6e90LLM\uff08Llama\u548cGemma\uff09\u548c\u5404\u79cd\u5bf9\u6297\u6027\u9a7e\u9a76\u60c5\u666f\u5bf9Hudson\u7684\u6709\u6548\u6027\u8fdb\u884c\u4e86\u8bc4\u4f30\u3002GPT-4\u3001Llama\u548cGemma\u5728\u5e73\u5747\u60c5\u51b5\u4e0b\u5b9e\u73b0\u4e8683.3%\u300163.6%\u548c73.6%\u7684\u653b\u51fb\u68c0\u6d4b\u51c6\u786e\u7387\u3002\u56e0\u6b64\uff0c\u572886.4%\u300173.9%\u548c80%\u7684\u653b\u51fb\u4e2d\uff0c\u5b83\u4eec\u505a\u51fa\u4e86\u5b89\u5168\u63a7\u5236\u51b3\u7b56\u3002\u968f\u7740\u5c06LLM\u96c6\u6210\u5230AD\u7cfb\u7edf\u4e2d\u7684\u5174\u8da3\u589e\u957f\uff0c\u6211\u4eec\u7684\u7ed3\u679c\u5f3a\u8c03\u4e86LLM\u7684\u4f18\u52bf\u53ca\u5176\u5728\u68c0\u6d4b\u548c\u7f13\u89e3ODT\u653b\u51fb\u65b9\u9762\u7684\u6f5c\u529b\u3002|\n", "2409.13642": "|**2024-09-20**|**Enhancing Fault Localization Through Ordered Code Analysis with LLM Agents and Self-Reflection**|Md Nakhla Rafi 
et.al.|[2409.13642](http://arxiv.org/abs/2409.13642)|null|\u5728\u8f6f\u4ef6\u5f00\u53d1\u8fc7\u7a0b\u4e2d\uff0c\u5b9a\u4f4d\u548c\u4fee\u590d\u8f6f\u4ef6\u6545\u969c\u662f\u4e00\u4e2a\u8017\u65f6\u4e14\u8d44\u6e90\u5bc6\u96c6\u578b\u7684\u4efb\u52a1\u3002\u4f20\u7edf\u7684\u6545\u969c\u5b9a\u4f4d\u65b9\u6cd5\uff0c\u5982\u57fa\u4e8e\u9891\u8c31\u7684\u6545\u969c\u5b9a\u4f4d\uff08SBFL\uff09\uff0c\u4f9d\u8d56\u4e8e\u6d4b\u8bd5\u8986\u76d6\u7387\u6570\u636e\u7684\u7edf\u8ba1\u5206\u6790\uff0c\u4f46\u5f80\u5f80\u51c6\u786e\u6027\u8f83\u4f4e\u3002\u57fa\u4e8e\u5b66\u4e60\u7684\u6280\u672f\u867d\u7136\u66f4\u6709\u6548\uff0c\u4f46\u9700\u8981\u5927\u91cf\u7684\u8bad\u7ec3\u6570\u636e\uff0c\u5e76\u4e14\u8ba1\u7b97\u6210\u672c\u9ad8\u6602\u3002\u6700\u8fd1\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u8fdb\u6b65\u4e3a\u6539\u5584\u6545\u969c\u5b9a\u4f4d\u63d0\u4f9b\u4e86\u6709\u524d\u666f\u7684\u65b9\u6cd5\uff0c\u901a\u8fc7\u589e\u5f3a\u4ee3\u7801\u7406\u89e3\u548c\u63a8\u7406\u6765\u63d0\u5347\u6027\u80fd\u3002\u7136\u800c\uff0c\u8fd9\u4e9bLLM\u57fa\u7ebf\u6280\u672f\u4ecd\u7136\u9762\u4e34\u6311\u6218\uff0c\u5305\u62ec\u4ee4\u724c\u9650\u5236\u3001\u957f\u8f93\u5165\u6027\u80fd\u4e0b\u964d\u4ee5\u53ca\u5904\u7406\u6d89\u53ca\u591a\u4e2a\u76f8\u4e92\u4f5c\u7528\u7ec4\u4ef6\u7684\u590d\u6742\u7cfb\u7edf\u65f6\u7684\u56f0\u96be\u3002 
\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aLLM4FL\u7684\u521b\u65b0\u6027LLM\u4ee3\u7406\u57fa\u7ebf\u6545\u969c\u5b9a\u4f4d\u65b9\u6cd5\uff0c\u5b83\u7ed3\u5408\u4e86SBFL\u6392\u540d\u4e0e\u5206\u800c\u6cbb\u4e4b\u7b56\u7565\u3002\u901a\u8fc7\u5c06\u5927\u89c4\u6a21\u8986\u76d6\u6570\u636e\u5206\u89e3\u4e3a\u53ef\u7ba1\u7406\u7684\u7ec4\uff0c\u5e76\u5229\u7528\u591a\u4e2aLLM\u4ee3\u7406\u901a\u8fc7\u63d0\u793a\u94fe\u5f0f\u8c03\u7528\uff0cLLM4FL\u6709\u6548\u5730\u5bfc\u822a\u4ee3\u7801\u5e93\u5e76\u5b9a\u4f4d\u6545\u969c\u3002\u8be5\u65b9\u6cd5\u8fd8\u6574\u5408\u4e86\u81ea\u6211\u53cd\u601d\u548c\u94fe\u5f0f\u601d\u8003\u63a8\u7406\uff0c\u4f7f\u4ee3\u7406\u80fd\u591f\u8fed\u4ee3\u751f\u6210\u4fee\u590d\u5e76\u91cd\u65b0\u6392\u540d\u53ef\u7591\u65b9\u6cd5\u3002\u6211\u4eec\u4f7f\u7528Defects4J\uff08V2.0.0\uff09\u57fa\u51c6\u8fdb\u884c\u8bc4\u4f30\uff0c\u5176\u4e2d\u5305\u62ec\u6765\u81ea14\u4e2a\u5f00\u6e90Java\u9879\u76ee\u7684675\u4e2a\u771f\u5b9e\u4e16\u754c\u6545\u969c\u3002\u7ed3\u679c\u663e\u793a\uff0cLLM4FL\u5728Top-1\u51c6\u786e\u7387\u4e0a\u6bd4AutoFL\u9ad8\u51fa19.27%\uff0c\u5e76\u4e14\u4f18\u4e8e\u6700\u5148\u8fdb\u7684\u76d1\u7763\u6280\u672f\uff0c\u5982DeepFL\u548cGrace\uff0c\u6240\u6709\u8fd9\u4e9b\u90fd\u65e0\u9700\u7279\u5b9a\u4efb\u52a1\u7684\u57f9\u8bad\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f3a\u8c03\u4e86\u8986\u76d6\u62c6\u5206\u548c\u63d0\u793a\u94fe\u5bf9\u6545\u969c\u5b9a\u4f4d\u6027\u80fd\u7684\u5f71\u54cd\uff0c\u5e76\u5c55\u793a\u4e86\u4e0d\u540c\u7684\u65b9\u6cd5\u6392\u5e8f\u53ef\u4ee5\u63d0\u9ad8Top-1\u51c6\u786e\u7387\u9ad8\u8fbe22%\u3002|\n", "2409.13447": "|**2024-09-23**|**AQA: Adaptive Question Answering in a Society of LLMs via Contextual Multi-Armed Bandit**|Mohanna Hoveyda 
et.al.|[2409.13447](http://arxiv.org/abs/2409.13447)|null|\u5728\u95ee\u7b54\uff08QA\uff09\u9886\u57df\uff0c\u4e0d\u540c\u7684\u95ee\u9898\u53ef\u80fd\u9700\u8981\u4e0d\u540c\u7684\u56de\u7b54\u7b56\u7565\u6765\u6709\u6548\u89e3\u51b3\u3002\u4e00\u4e9b\u95ee\u9898\u53ef\u4ee5\u901a\u8fc7\u7b80\u5355\u7684\u67e5\u627e\u6765\u89e3\u51b3\uff0c\u800c\u53e6\u4e00\u4e9b\u5219\u9700\u8981\u590d\u6742\u7684\u3001\u591a\u6b65\u9aa4\u7684\u63a8\u7406\u3002\u8fd9\u4e00\u89c2\u5bdf\u7ed3\u679c\u6fc0\u53d1\u4e86\u5f00\u53d1\u4e00\u79cd\u52a8\u6001\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u80fd\u591f\u4e3a\u6bcf\u4e2a\u95ee\u9898\u9002\u5f53\u5730\u9009\u62e9\u6700\u5408\u9002\u7684QA\u7b56\u7565\uff0c\u4ece\u800c\u6784\u5efa\u66f4\u9ad8\u6548\u3001\u66f4\u6709\u6548\u7684\u7cfb\u7edf\uff0c\u80fd\u591f\u5904\u7406\u66f4\u5e7f\u6cdb\u7c7b\u578b\u7684\u95ee\u9898\u3002\u4e3a\u4e86\u5b9e\u73b0\u8fd9\u4e00\u76ee\u6807\uff0c\u6211\u4eec\u57fa\u4e8e\u591a\u4e2a\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u96c6\u6210\u6700\u65b0\u8fdb\u5c55\uff0c\u5e76\u5c06\u9002\u5e94\u6027QA\u5b9a\u4e49\u4e3a\u4e00\u4e2a\u52a8\u6001\u7f16\u6392\u6311\u6218\u3002\u6211\u4eec\u5c06\u6b64\u89c6\u4e3a\u4e00\u4e2a\u4e0a\u4e0b\u6587\u591a\u81c2\u8001\u864e\u673a\u95ee\u9898\uff0c\u5176\u4e2d\u4e0a\u4e0b\u6587\u7531\u8fdb\u5165\u95ee\u9898\u7684\u7279\u6027\u5b9a\u4e49\uff0c\u800c\u52a8\u4f5c\u7a7a\u95f4\u5305\u62ec\u6f5c\u5728\u7684LLM\u4ee3\u7406\u4e4b\u95f4\u7684\u901a\u4fe1\u56fe\u914d\u7f6e\u3002\u7136\u540e\uff0c\u6211\u4eec\u8bad\u7ec3\u4e86\u4e00\u4e2a\u7ebf\u6027\u4e0a\u754c\u4fe1\u5fc3\u8fb9\u754c\u6a21\u578b\uff0c\u4ee5\u5b66\u4e60\u4e0d\u540c\u95ee\u9898\u7c7b\u578b\u4e0e\u5176\u5bf9\u5e94\u7684\u6700\u4f73\u591aLLM\u901a\u4fe1\u56fe\u8868\u793a\u4e4b\u95f4\u7684\u6700\u4f18\u6620\u5c04\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u8868\u660e\uff0c\u63d0\u51fa\u7684\u89e3\u51b3\u65b9\u6848\u9002\u7528\u4e8e\u9002\u5e94\u6027\u7684LLM\u96c6\u6210\u95ee\u7b54\u7cfb\u7edf\u7684\u7f16\u6392\uff0c\u5
b83\u7ed3\u5408\u4e86\u66f4\u590d\u6742\u7b56\u7565\u7684\u4f18\u8d8a\u6027\u80fd\uff0c\u540c\u65f6\u907f\u514d\u4e86\u5728\u7b80\u5355\u7b56\u7565\u8db3\u4ee5\u7684\u60c5\u51b5\u4e0b\u4f7f\u7528\u8fd9\u4e9b\u7b56\u7565\u7684\u6210\u672c\u3002|\n", "2409.15376": "|**2024-09-20**|**ControlMath: Controllable Data Generation Promotes Math Generalist Models**|Nuo Chen et.al.|[2409.15376](http://arxiv.org/abs/2409.15376)|null|\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u8fdb\u884c\u6570\u636e\u589e\u5f3a\u5728\u6570\u5b66\u63a8\u7406\u65b9\u9762\u53d6\u5f97\u4e86\u4ee4\u4eba\u9f13\u821e\u7684\u7ed3\u679c\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u5728\u95ee\u9898\u591a\u6837\u6027\u65b9\u9762\u5b58\u5728\u9650\u5236\uff0c\u53ef\u80fd\u4ec5\u5c40\u9650\u4e8e\u7279\u5b9a\u9886\u57df\u7684\u6570\u636e\u751f\u6210\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aControlMath\u7684\u8fed\u4ee3\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u5305\u542b\u4e00\u4e2a\u65b9\u7a0b\u5f0f\u751f\u6210\u6a21\u5757\u548c\u4e24\u4e2a\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u3002\u8be5\u6a21\u5757\u4ea7\u751f\u591a\u6837\u5316\u7684\u65b9\u7a0b\uff0c\u95ee\u9898\u521b\u9020\u8005\u4ee3\u7406\u968f\u540e\u5c06\u5176\u8f6c\u5316\u4e3a\u6570\u5b66\u6587\u5b57\u95ee\u9898\u3002\u9006\u5411\u4ee3\u7406\u5219\u7b5b\u9009\u5e76\u9009\u62e9\u9ad8\u8d28\u91cf\u7684\u6570\u636e\uff0c\u9075\u5faa\u201c\u5c11\u5373\u662f\u591a\u201d\u7684\u539f\u5219\uff0c\u4f7f\u7528\u66f4\u5c11\u7684\u6570\u636e\u70b9\u5c31\u80fd\u5b9e\u73b0\u66f4\u597d\u7684\u7ed3\u679c\u3002\u8fd9\u79cd\u65b9\u6cd5\u80fd\u591f\u751f\u6210\u591a\u6837\u5316\u7684\u6570\u5b66\u95ee\u9898\uff0c\u4e0d\u53d7\u7279\u5b9a\u9886\u57df\u6216\u5206\u5e03\u7684\u9650\u5236\u3002 
\u56e0\u6b64\uff0c\u6211\u4eec\u6536\u96c6\u4e86ControlMathQA\u6570\u636e\u96c6\uff0c\u5305\u542b19\u4e07\u4e2a\u6570\u5b66\u6587\u5b57\u95ee\u9898\u3002\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u7ed3\u679c\u8bc1\u660e\uff0c\u5c06\u6211\u4eec\u7684\u6570\u636e\u96c6\u4e0eGSM8K\u7b49\u5185\u90e8\u9886\u57df\u6570\u636e\u96c6\u7ed3\u5408\uff0c\u53ef\u4ee5\u5e2e\u52a9\u63d0\u9ad8\u6a21\u578b\u5728\u6570\u5b66\u63a8\u7406\u65b9\u9762\u7684\u6cdb\u5316\u80fd\u529b\uff0c\u4ece\u800c\u5728\u7279\u5b9a\u9886\u57df\u5185\u4ee5\u53ca\u8d85\u51fa\u7279\u5b9a\u9886\u57df\u65f6\u90fd\u80fd\u53d6\u5f97\u66f4\u597d\u7684\u6027\u80fd\u3002|\n", "2409.13107": "|**2024-09-24**|**Towards Robust Automation of Surgical Systems via Digital Twin-based Scene Representations from Foundation Models**|Hao Ding et.al.|[2409.13107](http://arxiv.org/abs/2409.13107)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u6570\u5b57\u5b6a\u751f\u7684\u673a\u5668\u611f\u77e5\u65b9\u6cd5\uff0c\u65e8\u5728\u5229\u7528\u8fd1\u671f\u89c6\u89c9\u57fa\u7840\u6a21\u578b\u7684\u4ee4\u4eba\u4fe1\u670d\u7684\u8868\u73b0\u548c\u5f00\u7bb1\u5373\u7528\u7684\u6cdb\u5316\u80fd\u529b\u3002\u8be5\u65b9\u6cd5\u901a\u8fc7\u7ed3\u5408\u6570\u5b57\u5b6a\u751f\u7684\u573a\u666f\u8868\u793a\u548c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4ee3\u7406\u8fdb\u884c\u89c4\u5212\uff0c\u4e0edVRK\u5e73\u53f0\u96c6\u6210\uff0c\u4ece\u800c\u5f00\u53d1\u51fa\u4e00\u4e2a\u5177\u6709\u5f3a\u5927\u4efb\u52a1\u6027\u80fd\u548c\u5728\u4e0d\u540c\u73af\u5883\u8bbe\u7f6e\u4e0b\u901a\u7528\u6027\u7684\u5b9e\u4f53\u667a\u80fd\u7cfb\u7edf\u3002\u5728\u6267\u884c\u7a7f\u9488\u79fb\u4f4d\u548c\u7eb1\u5e03\u68c0\u7d22\u4efb\u52a1\u65f6\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u663e\u793a\u51fa\u5f3a\u5927\u7684\u4efb\u52a1\u6027\u80fd\u548c\u901a\u7528\u6027\u3002 
\u5c3d\u7ba1\u8868\u73b0\u51fa\u4ee4\u4eba\u4fe1\u670d\u7684\u8868\u73b0\uff0c\u4f46\u672c\u6587\u7684\u5de5\u4f5c\u4ec5\u4ec5\u662f\u5bf9\u57fa\u4e8e\u6570\u5b57\u5b6a\u751f\u7684\u573a\u666f\u8868\u793a\u96c6\u6210\u7684\u7b2c\u4e00\u6b65\u3002\u4e3a\u4e86\u5b9e\u73b0\u5168\u9762\u7684\u6570\u5b57\u5b6a\u751f\u6846\u67b6\u4ee5\u6539\u5584\u624b\u672f\u9886\u57df\u5b9e\u4f53\u667a\u80fd\u7684\u53ef\u89e3\u91ca\u6027\u548c\u901a\u7528\u6027\uff0c\u672a\u6765\u7684\u7814\u7a76\u662f\u5fc5\u8981\u7684\u3002|\n", "2409.17515": "|**2024-09-26**|**From News to Forecast: Integrating Event Analysis in LLM-Based Time Series Forecasting with Reflection**|Xinlei Wang et.al.|[2409.17515](http://arxiv.org/abs/2409.17515)|**[link](https://github.com/ameliawong1996/From_News_to_Forecast)**|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u65e8\u5728\u901a\u8fc7\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u548c\u751f\u6210\u4ee3\u7406\u6765\u589e\u5f3a\u65f6\u95f4\u5e8f\u5217\u9884\u6d4b\u3002\u4ee5\u8bed\u8a00\u4f5c\u4e3a\u5a92\u4ecb\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u9002\u5e94\u6027\u5730\u5c06\u5404\u79cd\u793e\u4f1a\u4e8b\u4ef6\u6574\u5408\u8fdb\u9884\u6d4b\u6a21\u578b\u4e2d\uff0c\u5c06\u65b0\u95fb\u5185\u5bb9\u4e0e\u65f6\u95f4\u5e8f\u5217\u6ce2\u52a8\u5bf9\u9f50\uff0c\u4ece\u800c\u63d0\u4f9b\u4e30\u5bcc\u6d1e\u5bdf\u3002\u5177\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u5229\u7528\u57fa\u4e8e\u8bed\u8a00\u6a21\u578b\u7684\u4ee3\u7406\u8fdb\u884c\u8fed\u4ee3\u7b5b\u9009\uff0c\u53bb\u9664\u65e0\u5173\u65b0\u95fb\uff0c\u5e76\u91c7\u7528\u7c7b\u4f3c\u4eba\u7c7b\u7684\u63a8\u7406\u548c\u53cd\u601d\u6765\u8bc4\u4f30\u9884\u6d4b\u7ed3\u679c\u3002\u8fd9\u4f7f\u5f97\u6211\u4eec\u7684\u6a21\u578b\u80fd\u591f\u5206\u6790\u590d\u6742\u4e8b\u4ef6\uff0c\u5982\u610f\u5916\u4e8b\u4ef6\u548c\u793e\u4f1a\u884c\u4e3a\u8f6c\u53d8\uff0c\u5e76\u4e0d\u65ad\u4f18\u5316\u9009\u62e9\u903b\u8f91\u4ee5\u53ca\u4ee3\u7406\u8f93\u51fa\u7684\u7a33\u5065\u6027\u3002\u901a\u8
fc7\u7ed3\u5408\u7cbe\u9009\u65b0\u95fb\u548c\u65f6\u95f4\u5e8f\u5217\u6570\u636e\uff0c\u6211\u4eec\u5bf9\u9884\u8bad\u7ec3\u7684LLaMa2\u6a21\u578b\u8fdb\u884c\u5fae\u8c03\u3002\u7ed3\u679c\u663e\u793a\uff0c\u5728\u51c6\u786e\u6027\u65b9\u9762\u6709\u663e\u8457\u63d0\u5347\uff0c\u8fd9\u8868\u660e\u901a\u8fc7\u6709\u6548\u5229\u7528\u975e\u7ed3\u6784\u5316\u65b0\u95fb\u6570\u636e\uff0c\u53ef\u80fd\u5728\u65f6\u95f4\u5e8f\u5217\u9884\u6d4b\u9886\u57df\u5b9e\u73b0\u8303\u5f0f\u8f6c\u53d8\u3002|\n", "2409.17266": "|**2024-09-25**|**AAPM: Large Language Model Agent-based Asset Pricing Models**|Junyan Cheng et.al.|[2409.17266](http://arxiv.org/abs/2409.17266)|**[link](https://github.com/chengjunyan1/aapm)**|**\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u578b\u7684\u8d44\u4ea7\u5b9a\u4ef7\u65b9\u6cd5\u2014\u2014\u57fa\u4e8eLLM\u4ee3\u7406\u7684\u8d44\u4ea7\u5b9a\u4ef7\u6a21\u578b\uff08AAPM\uff09\u3002\u8be5\u65b9\u6cd5\u5c06LLM\u4ee3\u7406\u7684\u5b9a\u6027\u4e3b\u89c2\u6295\u8d44\u5206\u6790\u4e0e\u5b9a\u91cf\u624b\u52a8\u91d1\u878d\u7ecf\u6d4e\u56e0\u7d20\u878d\u5408\uff0c\u4ee5\u9884\u6d4b\u8d85\u989d\u8d44\u4ea7\u56de\u62a5\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u7ec4\u5408\u4f18\u5316\u548c\u8d44\u4ea7\u5b9a\u4ef7\u8bef\u5dee\u65b9\u9762\u5747\u4f18\u4e8e\u57fa\u4e8e\u673a\u5668\u5b66\u4e60\u7684\u8d44\u4ea7\u5b9a\u4ef7\u57fa\u51c6\u3002\u5177\u4f53\u800c\u8a00\uff0c\u5f02\u5e38\u8d44\u4ea7\u7ec4\u5408\u7684\u590f\u666e\u6bd4\u7387\u548c\u5e73\u5747\u03b1\u503c\u5206\u522b\u63d0\u9ad8\u4e869.6%\u548c10.8%\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5bf9\u6a21\u578b\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u6d88\u878d\u7814\u7a76\uff0c\u5e76\u5bf9\u6570\u636e\u8fdb\u884c\u4e86\u6df1\u5165\u5206\u6790\uff0c\u4ee5\u63ed\u793a\u63d0\u51fa\u65b9\u6cd5\u7684\u66f4\u591a\u89c1\u89e3\u3002**|\n", "2409.20163": "|**2024-09-30**|**MemSim: A Bayesian Simulator for Evaluating Memory of LLM-based Personal Assistants**|Zeyu Zhang 
et.al.|[2409.20163](http://arxiv.org/abs/2409.20163)|**[link](https://github.com/nuster1128/memsim)**|**\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aMemSim\u7684\u8d1d\u53f6\u65af\u6a21\u62df\u5668\uff0c\u7528\u4e8e\u4ece\u751f\u6210\u7684\u7528\u6237\u6d88\u606f\u81ea\u52a8\u6784\u5efa\u53ef\u9760\u7684\u95ee\u9898\u4e0e\u7b54\u6848\uff08Q&A\uff09\uff0c\u540c\u65f6\u4fdd\u6301\u5176\u591a\u6837\u6027\u548c\u53ef\u6269\u5c55\u6027\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u8d1d\u53f6\u65af\u5173\u7cfb\u7f51\u7edc\uff08BRNet\uff09\u548c\u56e0\u679c\u751f\u6210\u673a\u5236\uff0c\u4ee5\u51cf\u8f7b\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5e7b\u89c9\u5bf9\u4e8b\u5b9e\u4fe1\u606f\u7684\u5f71\u54cd\uff0c\u4ece\u800c\u4fc3\u8fdb\u81ea\u52a8\u6784\u5efa\u8bc4\u4f30\u6570\u636e\u96c6\u3002\u57fa\u4e8eMemSim\uff0c\u6211\u4eec\u5728\u65e5\u5e38\u751f\u6d3b\u4e2d\u751f\u6210\u4e86\u4e00\u4e2a\u540d\u4e3aMemDaily\u7684\u6570\u636e\u96c6\uff0c\u5e76\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u5b9e\u9a8c\uff0c\u4ee5\u8bc4\u4f30\u6211\u4eec\u65b9\u6cd5\u7684\u6709\u6548\u6027\u3002\u6211\u4eec\u8fd8\u63d0\u4f9b\u4e86\u4f7f\u7528MemDaily\u6570\u636e\u96c6\u8bc4\u4f30LLM\u57fa\u667a\u80fd\u4f53\u4e0d\u540c\u8bb0\u5fc6\u673a\u5236\u7684\u57fa\u51c6\u3002\u4e3a\u4e86\u60e0\u53ca\u7814\u7a76\u793e\u533a\uff0c\u6211\u4eec\u5df2\u7ecf\u5728https://github.com/nuster1128/MemSim\u4e0a\u53d1\u5e03\u4e86\u6211\u4eec\u7684\u9879\u76ee\u3002**|\n", "2409.19894": "|**2024-10-01**|**TRANSAGENT: An LLM-Based Multi-Agent System for Code Translation**|Zhiqiang Yuan 
et.al.|[2409.19894](http://arxiv.org/abs/2409.19894)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aTRANSAGENT\u7684\u65b0\u578b\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u591a\u4ee3\u7406\u7cfb\u7edf\uff0c\u4ee5\u589e\u5f3a\u57fa\u4e8eLLM\u7684\u4ee3\u7801\u7ffb\u8bd1\u8fc7\u7a0b\uff0c\u5e76\u901a\u8fc7\u56db\u4e2a\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u534f\u540c\u5de5\u4f5c\u4fee\u590d\u8bed\u6cd5\u9519\u8bef\u548c\u8bed\u4e49\u9519\u8bef\u3002\u8fd9\u56db\u4e2a\u4ee3\u7406\u5206\u522b\u662f\u521d\u59cb\u4ee3\u7801\u7ffb\u8bd1\u5668\u3001\u8bed\u6cd5\u9519\u8bef\u4fee\u590d\u5668\u3001\u4ee3\u7801\u5bf9\u9f50\u5668\u548c\u8bed\u4e49\u9519\u8bef\u4fee\u590d\u5668\u3002TRANSAGENT\u7684\u6838\u5fc3\u6d1e\u5bdf\u662f\u9996\u5148\u6839\u636e\u76ee\u6807\u7a0b\u5e8f\u4e0e\u6e90\u7a0b\u5e8f\u4e4b\u95f4\u7684\u6267\u884c\u5bf9\u9f50\u5b9a\u4f4d\u76ee\u6807\u7a0b\u5e8f\u4e2d\u7684\u9519\u8bef\u4ee3\u7801\u5757\uff0c\u8fd9\u79cd\u65b9\u6cd5\u53ef\u4ee5\u7f29\u5c0f\u4fee\u590d\u8303\u56f4\u5e76\u964d\u4f4e\u4fee\u590d\u96be\u5ea6\u3002 \u4e3a\u4e86\u8bc4\u4f30TRANSAGENT\uff0c\u6211\u4eec\u9996\u5148\u4ece\u6700\u8fd1\u7684\u7f16\u7a0b\u4efb\u52a1\u6784\u5efa\u4e86\u4e00\u4e2a\u65b0\u7684\u57fa\u51c6\uff0c\u4ee5\u51cf\u8f7b\u6f5c\u5728\u7684\u6570\u636e\u6cc4\u9732\u95ee\u9898\u3002\u5728\u6211\u4eec\u7684\u57fa\u51c6\u4e0a\uff0cTRANSAGENT\u5728\u7ffb\u8bd1\u6548\u679c\u548c\u6548\u7387\u65b9\u9762\u90fd\u4f18\u4e8e\u6700\u65b0\u7684LLM\u57fa\u4ee3\u7801\u7ffb\u8bd1\u6280\u672fUniTrans\uff1b\u6b64\u5916\uff0c\u5728\u4e0d\u540cLLM\u4e0a\u7684\u8bc4\u4f30\u663e\u793a\u4e86TRANSAGENT\u7684\u4e00\u822c\u6027\uff0c\u5e76\u4e14\u6211\u4eec\u7684\u6d88\u878d\u7814\u7a76\u63ed\u793a\u4e86\u6bcf\u4e2a\u4ee3\u7406\u7684\u8d21\u732e\u3002|\n", "2410.01639": "|**2024-10-02**|**Moral Alignment for LLM Agents**|Elizaveta Tennant 
et.al.|[2410.01639](http://arxiv.org/abs/2410.01639)|null|\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u51b3\u7b56\u4ee3\u7406\u6b63\u8d8a\u6765\u8d8a\u591a\u5730\u5728\u4eba\u7c7b\u6d3b\u52a8\u7684\u4e0d\u540c\u9886\u57df\u90e8\u7f72\u3002\u867d\u7136\u5b83\u4eec\u7684\u5e94\u7528\u76ee\u524d\u8f83\u4e3a\u4e13\u4e1a\u5316\uff0c\u4f46\u5df2\u6709\u7814\u7a76\u52aa\u529b\u5f00\u53d1\u66f4\u901a\u7528\u7684\u4ee3\u7406\u3002\u968f\u7740LLM\u7cfb\u7edf\u53d8\u5f97\u66f4\u52a0\u81ea\u4e3b\uff0c\u5b83\u4eec\u5bf9\u4eba\u7c7b\u6d3b\u52a8\u7684\u5f71\u54cd\u5c06\u589e\u52a0\uff0c\u5e76\u4e14\u900f\u660e\u5ea6\u4f1a\u964d\u4f4e\u3002\u56e0\u6b64\uff0c\u53d1\u5c55\u6709\u6548\u7684\u65b9\u6cd5\u6765\u4f7f\u5b83\u4eec\u7b26\u5408\u4eba\u7c7b\u4ef7\u503c\u89c2\u81f3\u5173\u91cd\u8981\u3002 \u73b0\u6709\u7684\u5bf9\u9f50\u65b9\u6cd5\u901a\u5e38\u4f9d\u8d56\u4e8e\u4eba\u7c7b\u504f\u597d\u6570\u636e\uff08\u4f8b\u5982\uff0c\u5728RLHF\u6216DPO\u4e2d\uff09\uff0c\u5176\u4e2d\u4ef7\u503c\u89c2\u662f\u9690\u542b\u7684\uff0c\u5e76\u4e14\u672c\u8d28\u4e0a\u662f\u4ece\u4e0d\u540c\u6a21\u578b\u8f93\u51fa\u7684\u76f8\u5bf9\u504f\u597d\u4e2d\u63a8\u65ad\u51fa\u6765\u7684\u3002\u4e0e\u6b64\u76f8\u53cd\uff0c\u6211\u4eec\u5728\u8fd9\u9879\u5de5\u4f5c\u4e2d\u63d0\u51fa\u4e86\u4e00\u79cd\u8bbe\u8ba1\u5956\u52b1\u51fd\u6570\u7684\u65b9\u6cd5\uff0c\u8fd9\u4e9b\u51fd\u6570\u660e\u786e\u7f16\u7801\u4e86\u6838\u5fc3\u7684\u4eba\u7c7b\u4ef7\u503c\u89c2\uff0c\u7528\u4e8e\u5f3a\u5316\u5b66\u4e60\uff08RL\uff09\u65b9\u5f0f\u5fae\u8c03\u57fa\u7840\u4ee3\u7406\u6a21\u578b\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u4f7f\u7528\u5185\u5728\u5956\u52b1\u6765\u5b9e\u73b0LLM\u4ee3\u7406\u7684\u9053\u5fb7\u5bf9\u9f50\u3002 
\u6211\u4eec\u901a\u8fc7\u4f20\u7edf\u7684\u54f2\u5b66\u6846\u67b6\u2014\u2014\u4e49\u52a1\u8bba\u4f26\u7406\u548c\u529f\u5229\u4e3b\u4e49\u6765\u8bc4\u4f30\u6211\u4eec\u7684\u65b9\u6cd5\uff0c\u91cf\u5316\u4e86\u5728\u8fed\u4ee3\u56da\u5f92\u56f0\u5883\uff08IPD\uff09\u73af\u5883\u4e2d\u4ee3\u7406\u7684\u9053\u5fb7\u5956\u52b1\uff0c\u57fa\u4e8e\u5176\u884c\u4e3a\u53ca\u5176\u540e\u679c\u3002\u6211\u4eec\u8fd8\u5c55\u793a\u4e86\u5982\u4f55\u901a\u8fc7\u9053\u5fb7\u5fae\u8c03\u4f7f\u4ee3\u7406\u80fd\u591f\u653e\u5f03\u4e4b\u524d\u5f00\u53d1\u7684\u81ea\u79c1\u7b56\u7565\u3002\u6700\u540e\uff0c\u6211\u4eec\u53d1\u73b0\u67d0\u4e9b\u5728IPD\u6e38\u620f\u4e2d\u5b66\u4e60\u7684\u9053\u5fb7\u7b56\u7565\u80fd\u591f\u63a8\u5e7f\u5230\u591a\u4e2a\u77e9\u9635\u6e38\u620f\u73af\u5883\u3002\u603b\u4e4b\uff0c\u6211\u4eec\u8bc1\u660e\u4e86\u4f7f\u7528\u5185\u5728\u5956\u52b1\u8fdb\u884c\u5fae\u8c03\u662f\u5c06LLM\u4ee3\u7406\u4e0e\u4eba\u7c7b\u4ef7\u503c\u89c2\u5bf9\u9f50\u7684\u6709\u524d\u666f\u7684\u4e00\u822c\u89e3\u51b3\u65b9\u6848\uff0c\u5e76\u4e14\u53ef\u80fd\u4ee3\u8868\u4e86\u5f53\u524d\u4e3b\u6d41\u5bf9\u9f50\u6280\u672f\u66f4\u52a0\u900f\u660e\u548c\u6210\u672c\u6548\u76ca\u66f4\u9ad8\u7684\u66ff\u4ee3\u65b9\u6848\u3002|\n", "2410.01242": "|**2024-10-03**|**RGD: Multi-LLM Based Agent Debugger via Refinement and Generation Guidance**|Haolin Jin 
et.al.|[2410.01242](http://arxiv.org/abs/2410.01242)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u4e0a\u5c55\u73b0\u51fa\u4e86\u5de8\u5927\u7684\u6f5c\u529b\uff0c\u5e76\u4e14\u6700\u8fd1\u7684\u63d0\u793a\u5de5\u7a0b\u7814\u7a76\u8fdb\u4e00\u6b65\u589e\u5f3a\u4e86LLM\u5bf9\u6587\u672c\u4fe1\u606f\u7684\u7406\u89e3\u3002\u7136\u800c\uff0c\u786e\u4fdd\u751f\u6210\u4ee3\u7801\u7684\u51c6\u786e\u6027\u901a\u5e38\u9700\u8981\u7a0b\u5e8f\u5458\u8fdb\u884c\u5927\u91cf\u7684\u6d4b\u8bd5\u548c\u9a8c\u8bc1\u3002\u5c3d\u7ba1LLM\u80fd\u591f\u57fa\u4e8e\u4efb\u52a1\u63cf\u8ff0\u751f\u6210\u4ee3\u7801\uff0c\u4f46\u5728\u590d\u6742\u4efb\u52a1\u4e0a\u7684\u51c6\u786e\u5ea6\u4ecd\u7136\u6709\u9650\uff0c\u7279\u522b\u662f\u5bf9\u4e8e\u90a3\u4e9b\u9700\u8981\u66f4\u6df1\u5165\u7406\u89e3\u95ee\u9898\u9648\u8ff0\u548c\u4ee3\u7801\u751f\u6210\u8fc7\u7a0b\u7684\u4efb\u52a1\u3002\u8fd9\u4e00\u9650\u5236\u4e3b\u8981\u6e90\u4e8eLLM\u540c\u65f6\u9700\u8981\u7406\u89e3\u548c\u751f\u6210\u8bed\u6cd5\u548c\u8bed\u4e49\u4e0a\u6b63\u786e\u7684\u4ee3\u7801\uff0c\u800c\u6ca1\u6709\u80fd\u529b\u81ea\u52a8\u4f18\u5316\u4ee3\u7801\u7684\u80fd\u529b\u3002\u5728\u5b9e\u9645\u7684\u8f6f\u4ef6\u5f00\u53d1\u4e2d\uff0c\u7a0b\u5e8f\u5458\u5f88\u5c11\u80fd\u5728\u4ec5\u51ed\u4efb\u52a1\u63cf\u8ff0\u7684\u60c5\u51b5\u4e0b\u4e00\u6b21\u5c31\u751f\u6210\u5b8c\u7f8e\u7684\u4ee3\u7801\uff0c\u4ed6\u4eec\u4f9d\u8d56\u4e8e\u8fed\u4ee3\u53cd\u9988\u548c\u8c03\u8bd5\u6765\u5b8c\u5584\u4ed6\u4eec\u7684\u7a0b\u5e8f\u3002\u53d7\u6b64\u8fc7\u7a0b\u542f\u53d1\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u57fa\u4e8eLLM\u7684\u591a\u667a\u80fd\u4f53\u67b6\u6784\u7528\u4e8e\u4ee3\u7801\u751f\u6210\u548c\u81ea\u52a8\u8c03\u8bd5\uff1a\u6539\u8fdb\u4e0e\u6307\u5bfc\u8c03\u8bd5\uff08RGD\uff09\u3002RGD\u6846\u67b6\u662f\u4e00\u4e2a\u5229\u7528\u4e09\u79cd\u4e0d\u540cLLM\u4ee3\u7406\uff08\u5f15\u5bfc\u4ee3\u7406\u3001\u8c03\u8bd5\u4ee3\u7406\u548c\u53cd\u9988\u4ee3\u7406\uff
09\u7684\u591a\u667a\u80fd\u4f53\u8c03\u8bd5\u5668\uff0c\u5b83\u5c06\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u5206\u89e3\u4e3a\u591a\u4e2a\u6b65\u9aa4\uff0c\u786e\u4fdd\u4e86\u6e05\u6670\u7684\u5de5\u4f5c\u6d41\u7a0b\uff0c\u5e76\u5141\u8bb8\u57fa\u4e8e\u81ea\u6211\u53cd\u601d\u548c\u53cd\u9988\u7684\u4ee3\u7801\u8fed\u4ee3\u7ec6\u5316\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cRGD\u5728\u4ee3\u7801\u751f\u6210\u80fd\u529b\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u5206\u522b\u5728HumanEval\u6570\u636e\u96c6\u548cMBPP\u6570\u636e\u96c6\u4e0a\u76f8\u6bd4\u6700\u5148\u8fdb\u7684\u65b9\u6cd5\u548c\u4f20\u7edf\u76f4\u63a5\u63d0\u793a\u65b9\u6cd5\u5b9e\u73b0\u4e869.8%\u548c16.2%\u7684\u6027\u80fd\u63d0\u5347\u3002\u6211\u4eec\u5f3a\u8c03\u4e86RGD\u6846\u67b6\u5728\u589e\u5f3aLLM\u81ea\u4e3b\u751f\u6210\u548c\u4f18\u5316\u4ee3\u7801\u80fd\u529b\u65b9\u9762\u7684\u6709\u6548\u6027\u3002|\n", "2410.00467": "|**2024-10-01**|**Dynamic Planning for LLM-based Graphical User Interface Automation**|Shaoqing Zhang 
et.al.|[2410.00467](http://arxiv.org/abs/2410.00467)|**[link](https://github.com/sqzhang-lazy/d-pot)**|**The rise of large language models (LLMs) has spurred interest in developing autonomous LLM-based agents, particularly for smartphone graphical user interfaces (GUIs). Given a task goal, these agents typically imitate human actions in the GUI environment until the task is completed. A key challenge, however, is how to devise effective plans to guide action prediction in GUI tasks, even though planning is widely recognized as an effective way to decompose complex tasks. Specifically, because the GUI environment changes after each action is executed, the plan must be adjusted dynamically based on environmental feedback and action history. 
We find that the widely used ReAct approach fails because it relies too heavily on excessively long dialogue histories. To address this challenge, we propose a new approach for LLM-based GUI agents called Dynamic Planning of Thoughts (D-PoT). D-PoT dynamically adjusts the plan based on environmental feedback and execution history. Experimental results show that D-PoT significantly surpasses the strong GPT-4V baseline in accuracy, by 12.7 percentage points (from 34.66% to 47.36%). Analysis shows that dynamic planning generalizes across different backbone LLMs and helps reduce hallucinations and adapt to unseen tasks. The code is available at https://github.com/sqzhang-lazy/D-PoT.**|\n", "2410.02742": "|**2024-10-03**|**Grounding Large Language Models In Embodied Environment With Imperfect World Models**|Haolan Liu et.al.|[2410.02742](http://arxiv.org/abs/2410.02742)|null|Although large language models (LLMs) have achieved broad success across many applications, they often struggle with basic physical reasoning or with executing robotics tasks because they lack direct experience of the physical nuances of the real world. To address these issues, we propose Grounding Large Language Model with 
Imperfect World MOdel (GLIMO), a method that uses proxy world models, such as simulators, to collect and synthesize training data. GLIMO incorporates an LLM-based automatic data generator that creates high-quality, diverse instruction datasets. The generator includes an iterative self-refining module for temporally consistent experience sampling, a diverse set of question-answering instruction seeds, and a reflection-augmented generation module for reflecting on prior experiences. Comprehensive experiments show that our approach improves the performance of strong open-source LLMs such as LLaMA-3 by 2.04x, 1.54x, and 1.82x on three different benchmarks, respectively. This performance matches or exceeds that of larger counterparts such as GPT-4.|\n", "2410.02644": "|**2024-10-03**|**Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents**|Hanrong Zhang et.al.|[2410.02644](http://arxiv.org/abs/2410.02644)|**[link](https://github.com/agiresearch/asb)**|**To fill the gap in the existing literature on comprehensively evaluating attacks on and defenses for large language model (LLM)-based agents, we propose a comprehensive framework named Agent Security Benchmark 
(ASB). The framework is designed to formalize, standardize, and evaluate the security of LLM-based agents. It covers 10 application scenarios (such as e-commerce, autonomous driving, and finance), 10 agents targeting those scenarios, more than 400 tools, 23 types of attack and defense methods, and 8 evaluation metrics. Based on ASB, we comprehensively benchmark 10 prompt injection attacks, a memory poisoning attack, a novel Plan-of-Thought backdoor attack, a mixed attack, and 10 corresponding defenses across 13 LLM backbones, for a total of nearly 90,000 test cases. Our benchmark results reveal critical vulnerabilities at different stages of agent operation, including the system prompt, user prompt handling, tool usage, and memory retrieval, with the highest average attack success rate reaching 84.30%, while current defenses show limited effectiveness. This indicates that the community still has much work to do on agent security. The code for this study is available at https://github.com/agiresearch/ASB.**|\n", "2410.02551": "|**2024-10-03**|**ColaCare: Enhancing Electronic Health Record Modeling through Large Language Model-Driven 
Multi-Agent Collaboration**|Zixiang Wang et.al.|[2410.02551](http://arxiv.org/abs/2410.02551)|null|We introduce ColaCare, a framework that enhances electronic health record (EHR) modeling through multi-agent collaboration driven by large language models (LLMs). Our approach seamlessly integrates domain-specific expert models with LLMs to bridge the gap between structured EHR data and text-based reasoning. Inspired by clinical consultations, ColaCare employs two types of agents, doctor agents and meta agents, which collaboratively analyze patient data. Expert models process numerical EHR data and generate predictions from it, while LLM agents produce reasoning references and decision reports within the collaborative consultation framework. We additionally incorporate medical guidelines from the Merck Manual of Diagnosis and Therapy (MSD) through a retrieval-augmented generation (RAG) module to provide authoritative evidence support. Extensive experiments on four distinct EHR datasets demonstrate ColaCare's superior performance on mortality prediction tasks, underscoring its potential for clinical decision support systems and for advancing personalized precision medicine. For the code, complete prompt templates, additional case studies, and more, see the anonymous link: 
.|\n", "2410.02406": "|**2024-10-03**|**ELLMA-T: an Embodied LLM-agent for Supporting English Language Learning in Social VR**|Mengxu Pan et.al.|[2410.02406](http://arxiv.org/abs/2410.02406)|null|Many people struggle when learning a new language, and traditional tools fall short in providing contextualized learning tailored to each learner's needs. Recent developments in large language models (LLMs) and embodied conversational agents (ECAs) in social virtual reality (VR) offer new opportunities for contextualized, natural language learning that takes the learner's language level and needs into account. To explore this opportunity, we developed ELLMA-T, an embodied conversational agent that uses GPT-4 and a situated learning framework to support English language learning in social VR (VRChat). Through 12 qualitative interviews, we reveal the potential of ELLMA-T to generate realistic, believable, and context-specific role plays for learner-agent interactions in VR, as well as the ability of the LLM to give learners an initial language assessment and continuous feedback. We offer five design implications for the future development of LLM-based language agents in social VR.|\n", "2410.02165": "|**2024-10-03**|**A LLM-Powered 
Automatic Grading Framework with Human-Level Guidelines Optimization**|Yucheng Chu et.al.|[2410.02165](http://arxiv.org/abs/2410.02165)|null|In the context of learning analytics (LA), open-ended short-answer questions (SAGs) are widely recognized as a powerful tool for gaining deeper insight into learners' responses. In practice, however, SAGs often face challenges from high grading workloads and concerns about inconsistent assessment. With recent advances in natural language processing (NLP), automatic short-answer grading (ASAG) offers a promising solution to these challenges. Even so, current ASAG algorithms are often limited in their ability to generalize and tend to be tailored to specific questions. To this end, this paper proposes GradeOpt, a unified multi-agent ASAG framework that uses large language models (LLMs) as graders for SAGs. More importantly, GradeOpt adds two further LLM-based agents, a reflector and a refiner, to the multi-agent system, which lets GradeOpt automatically optimize the original grading guidelines by reflecting on its own errors. In experiments on a challenging ASAG task, namely grading pedagogical content knowledge (PCK) and content knowledge (CK) questions, GradeOpt outperforms representative baselines in both grading accuracy 
and alignment with the behavior of human graders. Finally, comprehensive ablation studies confirm the effectiveness of the individual components designed in GradeOpt.|\n", "2410.02026": "|**2024-10-02**|**Zodiac: A Cardiologist-Level LLM Framework for Multi-Agent Diagnostics**|Yuan Zhou et.al.|[2410.02026](http://arxiv.org/abs/2410.02026)|null|This paper introduces ZODIAC, a large language model (LLM) framework designed to assist cardiological diagnostics with cardiologist-level professionalism. ZODIAC extracts clinically relevant characteristics from patient data, detects significant arrhythmias, and generates preliminary reports for cardiologists to review and refine. To achieve cardiologist-level professionalism, ZODIAC is built on a multi-agent collaboration framework that enables the processing of multimodal patient data. Each LLM agent is fine-tuned on real-world patient data adjudicated by cardiologists, which reinforces the model's professionalism. 
ZODIAC has undergone rigorous clinical validation by independent cardiologists, evaluated across eight metrics that measure clinical effectiveness and address safety concerns. Results show that ZODIAC outperforms industry-leading models, including OpenAI's GPT-4o, Meta's Llama-3.1-405B, and Google's Gemini-pro, as well as medical-specialist LLMs such as Microsoft's BioGPT. This demonstrates the potential of purpose-built LLMs in healthcare to deliver domain-specific solutions that meet the stringent demands of medical practice. Notably, ZODIAC has been successfully integrated into electrocardiography (ECG) devices, illustrating the growing trend of embedding LLMs into Software-as-Medical-Device (SaMD).|\n", "2410.03055": "|**2024-10-04**|**Permissive Information-Flow Analysis for Large Language Models**|Shoaib Ahmed Siddiqui 
et.al.|[2410.03055](http://arxiv.org/abs/2410.03055)|null|Large language models (LLMs) are rapidly becoming general-purpose components of larger software systems. This raises natural security and privacy problems: poisoned data retrieved from one component can change the model's behavior and compromise the entire system, including by causing the model to spread confidential data to untrusted components. One promising approach is to tackle these problems at the system level through dynamic information flow tracking (also known as taint tracking). Unfortunately, the traditional approach of propagating the most restrictive input label to the output is too conservative for applications in which LLMs operate on inputs retrieved from diverse sources. This paper proposes a novel, more permissive approach to propagating information flow labels through LLM queries. The key idea is to propagate only the labels of the samples that were influential in generating the model output and to eliminate the labels of unnecessary inputs. 
We implement and study two variations of this approach, based on (i) prompt-based retrieval augmentation and (ii) a $k$-nearest-neighbors language model. We compare these with a baseline, an introspection-based influence estimator that directly asks the language model to predict the output label. Experimental results show that our prompt-based label propagator improves label quality in more than 85% of cases, with a notable effect in an LLM agent setting. These findings underscore the practicality of permissive label propagation for retrieval augmentation.|\n", "2410.02958": "|**2024-10-03**|**AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML**|Patara Trirat 
et.al.|[2410.02958](http://arxiv.org/abs/2410.02958)|null|This paper proposes AutoML-Agent, a novel multi-agent framework designed for full-pipeline automated machine learning (AutoML), covering the entire process from data retrieval to model deployment. AutoML-Agent takes the user's task description, facilitates collaboration among specialized language model agents, and delivers a deployment-ready model, thereby providing a natural language interface that simplifies building data-driven solutions for non-expert users. Unlike existing work, we introduce a retrieval-augmented planning strategy that enhances exploration when searching for better solutions. We also decompose each plan into sub-tasks (for example, data preprocessing and neural network design), each solved by a specialized agent that we build via prompting and executed in parallel, which makes the search process more efficient. In addition, we propose a multi-stage verification method to verify executed results and guide the code-generation language model toward a successful solution. Extensive experiments on seven downstream tasks using fourteen datasets show that AutoML-Agent achieves a higher success rate in automating the full AutoML process 
and yields systems that perform well across diverse domains.|\n", "2410.05254": "|**2024-10-07**|**GLEE: A Unified Framework and Benchmark for Language-based Economic Environments**|Eilam Shapira et.al.|[2410.05254](http://arxiv.org/abs/2410.05254)|**[link](https://github.com/eilamshapira/GLEE)**|**Large language models (LLMs) show significant potential in economic and strategic interactions, where communication in natural language is often prevalent. This raises key questions: Do LLMs behave rationally? Can they mimic human behavior? Do they tend to reach efficient and fair outcomes? What is the role of natural language in strategic interaction? How do characteristics of the economic environment influence these dynamics? These questions are crucial for understanding the economic and societal implications of integrating LLM-based agents into real-world data-driven systems, such as online retail platforms and recommender systems. 
Although the machine learning community has explored the potential of LLMs in multi-agent setups, differing assumptions, design choices, and evaluation criteria across studies make it difficult to draw robust and meaningful conclusions. To address this, we introduce a benchmark for standardizing research on two-player, sequential, language-based games. Drawing on the economics literature, we define three base families of games with consistent parameterization, degrees of freedom, and economic measures for evaluating agent performance (self-gain) and game outcomes (efficiency and fairness). We develop an open-source framework for interaction simulation and analysis, and use it to collect a dataset of LLM vs. LLM interactions across numerous game configurations and an additional dataset of human vs. LLM interactions. Through extensive experimentation, we show how our framework and dataset can be used to: (i) compare the behavior of LLM-based agents with that of human players in various economic contexts; (ii) evaluate agents on both individual and collective performance measures; and (iii) quantify the effect of the economic characteristics of the environment on agent behavior.**|\n", "2410.04360": "|**2024-10-09**|**GenSim: A General Social Simulation Platform with Large Language 
Model based Agents**|Jiakai Tang et.al.|[2410.04360](http://arxiv.org/abs/2410.04360)|**[link](https://github.com/TangJiakai/GenSim)**|**In recent years, with the rapid development of large language models (LLMs), research that uses LLM-based agents to simulate human social behavior has produced many promising results. Although prior work has shown great potential in specific scenarios, it involves only a limited number of agents and mostly lacks the ability to adapt when errors occur during simulation. To overcome these limitations, we propose \\textit{GenSim}, a novel LLM-based simulation platform that: (1) \\textbf{abstracts a set of general functions} to simplify the simulation of customized social scenarios; (2) \\textbf{supports one million agents} to better simulate large-scale populations in real-world contexts; and (3) \\textbf{incorporates error-correction mechanisms} to ensure more reliable, long-term simulations. To evaluate our platform, we assess both the efficiency of large-scale agent simulations and the effectiveness of the error-correction mechanisms. To our knowledge, GenSim represents an initial step toward a general, large-scale, and correctable social simulation platform based on LLM agents, 
and it promises to further advance the field of social science.**|\n", "2410.07109": "|**2024-10-09**|**I Want to Break Free! Anti-Social Behavior and Persuasion Ability of LLMs in Multi-Agent Settings with Social Hierarchy**|Gian Maria Campedelli et.al.|[2410.07109](http://arxiv.org/abs/2410.07109)|**[link](https://github.com/mobs-fbk/llm_interaction_simulator)**|**As agents powered by large language models (LLMs) become increasingly autonomous and interact freely with one another, studying their interactions becomes crucial for anticipating emergent phenomena and identifying potential risks. Inspired by the Stanford Prison Experiment, we contribute to this line of research by studying the interaction patterns of LLM agents in a context characterized by strict social hierarchy. We focus on two phenomena, persuasion and anti-social behavior, in simulated scenarios involving a guard and a prisoner who seeks a specific goal (for example, obtaining more yard time or escaping from prison). Using 200 experimental scenarios and a total of 2,000 machine-to-machine conversations across five popular LLMs, we provide a set of noteworthy findings. 
First, we document how some models consistently fail to carry out a conversation in a multi-agent setup where power dynamics are at play. Second, for the models that do interact successfully, we show empirically that the goal an agent pursues mainly affects its persuasiveness and has a negligible effect on its anti-social behavior. Third, we highlight how agent personas, and particularly the guard's personality, drive both the likelihood of successful persuasion by the prisoner and the emergence of anti-social behavior. Fourth, we show that even without explicitly prompting for specific personalities, anti-social behavior emerges simply from assigning roles to the agents. These results bear on the development of LLM agents and on the debate about their societal impact.**|\n", "2410.06932": "|**2024-10-09**|**Reproducing and Extending Experiments in Behavioral Strategy with Large Language Models**|Daniel Albert 
et.al.|[2410.06932](http://arxiv.org/abs/2410.06932)|null|In this study, we propose a new approach to behavioral strategy research: using large language model (LLM) agents to complement simulations and laboratory experiments and thereby deepen our understanding of the cognitive processes involved in decision-making. Specifically, we reproduce a human laboratory experiment in behavioral strategy and compare LLM-generated agents with the observed human behavior. Our results show that LLM agents effectively reproduce search behavior and decision-making processes similar to those of humans. We further analyze the simulated 'thoughts' of the LLM agents and find that more forward-looking thoughts are associated with a tendency to favor exploitation over exploration in order to maximize wealth. We demonstrate the potential of this new approach for behavioral strategy research and discuss its possible limitations.|\n", "2410.06153": "|**2024-10-08**|**AgentSquare: Automatic LLM Agent Search in Modular Design Space**|Yu Shang 
et.al.|[2410.06153](http://arxiv.org/abs/2410.06153)|**[link](https://github.com/tsinghua-fib-lab/agentsquare)**|**Recent advances in large language models (LLMs) have driven the rapid growth of agentic systems capable of handling complex tasks. However, current research relies largely on manual, task-specific designs, which limits adaptability to new tasks. This paper introduces a new research problem: Modularized LLM Agent Search (MoLAS). We propose a modular design space that abstracts existing LLM agent designs into four fundamental modules with a uniform input-output interface: planning, reasoning, tool use, and memory. Building on this design space, we present AgentSquare, a novel agent search framework that introduces two core mechanisms, module evolution and recombination, to search efficiently for optimized LLM agents. To further accelerate the process, we design a performance predictor that uses an in-context model as a surrogate for evaluating agent designs, so that unpromising designs can be skipped. Extensive experiments on six benchmarks, covering diverse scenarios such as web applications, 
embodied interaction, tool use, and games, show that AgentSquare substantially outperforms hand-crafted agents, with an average performance gain of 17.2% over the best human designs. AgentSquare can also generate interpretable design insights, which helps build a deeper understanding of agent architectures and their effect on task performance. We believe that the modular design space and the AgentSquare search framework offer a platform for fully exploiting the potential of previously successful designs and for consolidating the efforts of the research community. The code repository is available at https://github.com/tsinghua-fib-lab/AgentSquare.**|\n", "2410.05570": "|**2024-10-08**|**Conversate: Supporting Reflective Learning in Interview Practice Through Interactive Simulation and Dialogic Feedback**|Taufiq Daryanto 
et.al.|[2410.05570](http://arxiv.org/abs/2410.05570)|null|Job interviews play a critical role in shaping a person's career, yet practicing interview skills is challenging without a human coach or peers to give feedback. Recent advances in large language models (LLMs) offer an opportunity to improve the interview practice experience. However, little research has examined the effectiveness and user perceptions of such systems, or the benefits and challenges of using LLMs for interview practice. Although prior work and recent commercial tools have shown the potential of AI-assisted interview practice, they typically provide only one-way feedback, in which users merely receive information about their performance. By contrast, dialogic feedback, a concept developed in the learning sciences, is a two-way interactive feedback process that lets users engage further with the feedback, and learn from it, through dialogue. This paper introduces Conversate, a web-based application that uses LLMs to support reflective learning in job interview practice. To start an interview session, the user provides a job title (e.g., entry-level software engineer). An LLM agent in the system then begins the interview simulation by asking an opening interview question and crafting follow-up questions based on the user's responses. After the interview, the system's backend LLM framework analyzes the user's answers and highlights areas for improvement. Users can annotate the transcript by selecting specific passages and writing self-reflections. Finally, users can engage in dialogic feedback with the system, conversing with the LLM agent to refine their answers step by step under the agent's guidance.|\n", "2410.05434": "|**2024-10-07**|**Better than Your Teacher: LLM Agents that learn from Privileged AI Feedback**|Sanjiban Choudhury 
et.al.|[2410.05434](http://arxiv.org/abs/2410.05434)|null|Large language models (LLMs) show impressive decision-making abilities, but current methods lack a mechanism for automatic self-improvement from errors made during task execution. We propose LEAP, an iterative fine-tuning framework that continually improves LLM agents using feedback from AI expert teachers. Our key insight is to equip the expert teachers with a privileged state: information that is available during training but hidden at test time. This allows even weak experts to provide precise guidance, significantly improving the student agent's performance without access to privileged information at test time. We evaluate LEAP on diverse decision-making benchmarks, including text-based games (ALFWorld), web navigation (WebShop), and interactive coding (Intercode Bash). Our experiments show that LEAP (1) outperforms behavior cloning and ReAct baselines, (2) enables weaker student models (e.g., Llama3-8B) to exceed the performance of strong teacher models (GPT4-o), and (3) allows weaker models to self-improve using privileged versions of themselves. We also provide a theoretical analysis showing that LEAP's success hinges on balancing privileged information with the student's realizability, which we confirm empirically. Our code is available at https://leap-llm.github.io.|\n", "2410.07869": "|**2024-10-10**|**Benchmarking Agentic Workflow Generation**|Shuofei Qiao et.al.|[2410.07869](http://arxiv.org/abs/2410.07869)|null|Large language models (LLMs), with their exceptional ability to handle a wide range of tasks, have driven significant advances in reasoning and planning, where decomposing complex problems into executable workflows is a crucial step. Existing workflow evaluation frameworks either focus solely on holistic performance or suffer from limitations such as restricted scenario coverage, overly simple workflow structures, and lax evaluation standards. We therefore introduce WorFBench, a unified workflow generation benchmark with multi-faceted scenarios and complex graph workflow structures. We also present WorFEval, a systematic evaluation protocol that uses subsequence and subgraph matching algorithms to accurately quantify an LLM agent's workflow generation capabilities. Through comprehensive evaluations across different types of LLMs, we find a distinct gap between the sequence planning and graph planning capabilities of LLM agents, with even GPT-4 showing a gap of around 15%. We also train two open-source models and evaluate their generalization on held-out tasks. Furthermore, we observe that the generated workflows can enhance downstream tasks, enabling them to achieve better performance in less time during inference. All related code and datasets will be made publicly available at https://github.com/zjunlp/WorFBench.|\n", "2410.07706": "|**2024-10-10**|**AgentBank: Towards Generalized LLM Agents via Fine-Tuning on 50000+ Interaction Trajectories**|Yifan Song 
et.al.|[2410.07706](http://arxiv.org/abs/2410.07706)|null|In this work we introduce AgentBank, to date the largest agent-environment interaction trajectory tuning dataset for open-source large language models (LLMs), comprising more than 50,000 diverse, high-quality interaction trajectories across 16 tasks and five distinct agent skill dimensions. Using a novel annotation pipeline, we are able to scale up trajectory annotation and produce a trajectory dataset with minimized difficulty bias. We then fine-tune LLMs on AgentBank to obtain Samoyed, a series of agent models. Our comparative experiments demonstrate the effectiveness of scaling interaction trajectory data for acquiring generalized agent capabilities. Additional studies reveal several key observations about trajectory tuning and the generalization of agent skills.|\n", "2410.07484": "|**2024-10-11**|**WALL-E: World Alignment by Rule Learning Improves World Model-based LLM Agents**|Siyu Zhou 
et.al.|[2410.07484](http://arxiv.org/abs/2410.07484)|**[link](https://github.com/elated-sawyer/WALL-E)**|**Can large language models (LLMs) directly serve as powerful world models for model-based agents? Although gaps do exist between an LLM's prior knowledge and the dynamics of a specified environment, our study shows that these gaps can be bridged by aligning the LLM with its deployed environment, and that this 'world alignment' can be achieved efficiently through rule learning on LLMs. Given the rich prior knowledge of LLMs, only a few additional rules are needed to align LLM predictions with the specified environment dynamics. To this end, we propose a neurosymbolic approach that learns these rules gradient-free through LLMs, by inducing, updating, and pruning rules based on comparisons between explored trajectories and world-model predictions. The resulting world model consists of the LLM and the learned rules. Our embodied LLM agent, WALL-E, is built on model-predictive control (MPC). By optimizing look-ahead actions against the precise world model, MPC significantly improves exploration and learning efficiency. Compared with existing LLM agents, WALL-E's reasoning requires only a few principal rules rather than large numbers of buffered trajectories included in the LLM input. On open-world challenges in Minecraft and ALFWorld, WALL-E achieves higher success rates than existing methods, with lower planning time and fewer tokens used for reasoning. In Minecraft, WALL-E exceeds baselines by 15%-30%, with a success rate of 95% after only 6 iterations.**|\n", "2410.09034": "|**2024-10-11**|**PEAR: A Robust and Flexible Automation Framework for Ptychography Enabled by Multiple Large Language Model Agents**|Xiangyu Yin et.al.|[2410.09034](http://arxiv.org/abs/2410.09034)|null|Ptychography is an advanced computational imaging technique used in X-ray and electron microscopy. It has been widely adopted in scientific research fields such as physics, chemistry, biology, and materials science, as well as in industrial applications such as semiconductor characterization. In practice, obtaining high-quality ptychographic images requires simultaneously optimizing many experimental and algorithmic parameters. Traditionally, parameter selection has relied on trial and error, leading to low-throughput workflows and potential human bias. In this work, we develop the Ptychographic Experiment and Analysis Robot (PEAR), a framework that uses large language models (LLMs) to automate data analysis in ptychography. To ensure high robustness and accuracy, PEAR employs multiple LLM agents for tasks including knowledge retrieval, code generation, parameter recommendation, and image reasoning. Our study shows that PEAR's multi-agent design significantly improves the workflow success rate, even with smaller open-weight models such as LLaMA 3.1 8B. PEAR also supports various levels of automation and is designed to work with customized local knowledge bases, ensuring flexibility and adaptability across different research environments.|\n", "2410.09024": "|**2024-10-14**|**AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents**|Maksym Andriushchenko et.al.|[2410.09024](http://arxiv.org/abs/2410.09024)|null|Research on the robustness of large language models (LLMs) to jailbreak attacks has focused mainly on LLMs acting as simple chatbots. However, LLM agents, which can use external tools and execute multi-stage tasks, may pose a greater risk, yet their robustness remains underexplored. To facilitate research on LLM agent misuse, we propose a new benchmark, AgentHarm. The benchmark comprises 110 explicitly malicious agent tasks (440 with augmentations) covering 11 harm categories, including fraud, cybercrime, and harassment. Beyond measuring whether models refuse harmful agentic requests, scoring well on AgentHarm requires jailbroken agents to retain their capabilities after an attack in order to complete a multi-step task. We evaluate a range of leading LLMs and find that (1) leading LLMs are surprisingly compliant with malicious agent requests even without jailbreaking, (2) simple universal jailbreak templates can effectively jailbreak agents, and (3) these jailbreaks enable coherent, malicious multi-step agent behavior while preserving model capabilities. To enable simple and reliable evaluation of attacks and defenses for LLM-based agents, we publicly release AgentHarm at https://huggingface.co/datasets/ai-safety-institute/AgentHarm.|\n", "2410.08948": "|**2024-10-11**|**The Dynamics of Social Conventions in LLM populations: Spontaneous Emergence, Collective Biases and Tipping Points**|Ariel Flint Ashery 
et.al.|[2410.08948](http://arxiv.org/abs/2410.08948)|null|Social conventions are the foundation of social and economic life. As ever more AI agents interact with each other and with humans, their ability to form shared conventions will determine how effectively they coordinate behaviors, integrate into society, and influence it. This paper studies the dynamics of conventions within populations of large language model (LLM) agents using simulated interactions. First, we show that universally adopted social conventions can emerge spontaneously among communicating LLMs. Second, we demonstrate that strong collective biases can arise during this process even when individual agents appear unbiased. Third, we examine how minority groups of committed LLMs can drive social change by establishing new social conventions. We find that once these minority groups reach a critical size, they can consistently overturn established behaviors. In all cases, comparing the experimental results with the predictions of a minimal multi-agent model allows us to isolate the specific role of LLM agents. Our results clarify how AI systems can autonomously develop norms without explicit programming, and have implications for designing AI systems that align with human values and societal goals.|\n", "2410.10760": "|**2024-10-14**|**Denial-of-Service Poisoning Attacks against Large Language Models**|Kuofeng Gao et.al.|[2410.10760](http://arxiv.org/abs/2410.10760)|**[link](https://github.com/sail-sg/p-dos)**|**Recent studies have shown that large language models (LLMs) are vulnerable to denial-of-service (DoS) attacks, in which adversarial inputs such as spelling errors or non-semantic prompts trigger endless outputs without generating an [EOS] token. These attacks can cause high latency and make LLM services unavailable to other users or tasks. However, when a speech-to-text interface is present (e.g., voice commands to a robot), such DoS attacks become difficult to carry out, because spelling errors or non-semantic prompts are hard to introduce through speech. A simple DoS attack in this setting is to instruct the model to 'Keep repeating Hello', but we observe that relying on natural instructions alone limits the output length, which is bounded by the maximum length of the LLM's supervised fine-tuning (SFT) data. To overcome this limitation, we propose poisoning-based DoS (P-DoS) attacks on LLMs, demonstrating that injecting a single poisoned sample designed for DoS purposes can break the output length limit. For example, one poisoned sample successfully attacks GPT-4o and GPT-4o mini (via OpenAI's fine-tuning API) at a cost of under 1 USD, causing repeated outputs until the maximum inference length is reached (16K tokens, compared with 0.5K before poisoning). In addition, we perform comprehensive ablation studies on open-source LLMs and extend our method to LLM agents, where attackers can control both the fine-tuning dataset and the algorithm. Our findings underscore the urgent need for defenses against P-DoS attacks to secure LLMs. Our code is available at https://github.com/sail-sg/P-DoS.**|\n", "2410.10398": "|**2024-10-14**|**FairMindSim: Alignment of Behavior, Emotion, and Belief in Humans and LLM Agents Amid Ethical Dilemmas**|Yu Lei 
et.al.|[2410.10398](http://arxiv.org/abs/2410.10398)|null|AI alignment is a pivotal issue for AI control and safety. It should consider not only value-neutral human preferences but also moral and ethical considerations. In this study, we introduce FairMindSim, which simulates moral dilemmas through a series of unfair scenarios. We use LLM agents to simulate human behavior, ensuring alignment at each stage. To explore the various socioeconomic motivations, which we refer to as beliefs, that drive both humans and LLM agents to intervene as bystanders in unjust situations involving others, and how these beliefs interact to influence individual behavior, we incorporate knowledge from relevant sociological fields and propose the Belief-Reward Alignment Behavior Evolution Model (BREM), based on the recursive reward model (RRM). Our findings indicate that, behaviorally, GPT-4o exhibits a stronger sense of social justice, while humans display a richer range of emotions. We also discuss the potential impact of emotions on behavior. This study provides a theoretical foundation for applications in aligning LLMs with altruistic values.|\n", "2410.10136": "|**2024-10-14**|**Beyond-RAG: Question 
Identification and Answer Generation in Real-Time Conversations**|Garima Agrawal et.al.|[2410.10136](http://arxiv.org/abs/2410.10136)|null|In customer contact centers, human agents often face long average handling times (AHT) because they must manually interpret queries and retrieve relevant knowledge base (KB) articles. Although retrieval-augmented generation (RAG) systems built on large language models (LLMs) have been widely adopted in industry to assist with such tasks, in real-time conversations RAG struggles with problems such as inaccurate query formulation and redundant retrieval of frequently asked questions. To address these limitations, we propose a decision support system that goes beyond RAG by identifying customer questions in real time. If a query matches a frequently asked question (FAQ), the system retrieves the answer directly from the FAQ database; otherwise, it generates an answer via RAG. Our approach reduces reliance on manual queries and delivers responses to agents within 2 seconds. Deployed in Minerva CQ's AI-assisted solution, the system improves efficiency, shortens AHT, and lowers operational costs. We also introduce an automated LLM-agent workflow that identifies FAQs from historical records when no predefined FAQs are available.|\n", "2410.10020": "|**2024-10-13**|**Adaptive Reasoning and Acting in Medical Language Agents**|Abhishek Dutta et.al.|[2410.10020](http://arxiv.org/abs/2410.10020)|null|This paper presents an innovative large language model (LLM) agent framework for improving diagnostic accuracy in simulated clinical environments, evaluated on the AgentClinic benchmark. The proposed automatic correction mechanism enables doctor agents to iteratively refine their reasoning and actions after an incorrect diagnosis, improving decision-making over time. Experiments show that adaptive LLM-based doctor agents reach correct diagnoses through dynamic interactions with simulated patients. The evaluation highlights the capacity of autonomous agents to adapt and improve in complex medical scenarios. Future work will focus on refining the algorithm and extending its applicability to a wider range of tasks and different large language models.|\n", "2410.09824": "|**2024-10-13**|**Dynamic and Textual Graph Generation Via Large-Scale LLM-based Agent Simulation**|Jiarui Ji 
et.al.|[2410.09824](http://arxiv.org/abs/2410.09824)|null|Graph generation is a fundamental task that has been extensively studied in social, technological, and scientific research. When modeling the evolution of dynamic graphs, traditional rule-based methods struggle to capture community structure within graphs, while deep learning methods focus only on fitting training graphs. This restricts existing graph generators to producing graphs that follow predefined rules or closely resemble the training data, and they perform poorly on dynamic graph generation. Given that graphs are abstract representations arising from pairwise interactions in human activities, realistic simulation of human behavior can offer deeper insight into graph evolution mechanisms. With the growing recognition of large language models (LLMs) for simulating human behavior, we introduce GraphAgent-Generator (GAG), a novel simulation-based framework for dynamic graph generation. Without training or fine-tuning the LLM, our framework effectively reproduces seven macro-level structural characteristics from established network science theories, while outperforming existing baselines on graph expansion tasks by 31% on specific evaluation metrics. Through node classification tasks, we verify that GAG effectively preserves the node-level textual features of real-world networks in the generated text-rich graphs. Furthermore, with parallel acceleration, GAG supports generating graphs with up to nearly 100,000 nodes or 10 million edges through large-scale LLM-based agent simulation, with a minimum speed-up of 90.4%. The source code is available.|\n", "2410.09713": "|**2024-10-13**|**Agentic Information Retrieval**|Weinan Zhang et.al.|[2410.09713](http://arxiv.org/abs/2410.09713)|null|Since the 1970s, users seeking relevant information have relied on domain-specific information retrieval (IR) architectures. Over the past two decades, the advent of modern IR systems, including web search engines and personalized recommender systems, has greatly improved the efficiency of retrieving relevant information from vast data corpora. However, the core paradigm of these IR systems remains largely unchanged, relying on filtering a predefined set of candidate items. Since 2022, breakthroughs in large language models (LLMs) have begun to transform how information is accessed, establishing a new technical paradigm. In this paper, we introduce Agentic Information Retrieval (Agentic IR), a new IR paradigm shaped by the capabilities of LLM agents. Agentic IR expands the scope of accessible tasks and uses a suite of new techniques to redefine information retrieval. We discuss three cutting-edge applications and the challenges they face. We argue that Agentic IR holds promise for generating innovative applications and may become a central information entry point in future digital ecosystems.|\n", "2410.09381": "|**2024-10-12**|**LLM-SmartAudit: Advanced Smart Contract Vulnerability Detection**|Zhiyuan Wei et.al.|[2410.09381](http://arxiv.org/abs/2410.09381)|null|\u533a\u5757\u94fe\u6280\u672f\u7684\u4e0d\u53d8\u6027\u8d28\u867d\u7136\u9769\u547d\u6027\uff0c\u4f46\u4e5f\u5f15\u5165\u4e86\u663e\u8457\u7684\u5b89\u5168\u6311\u6218\uff0c\u7279\u522b\u662f\u5728\u667a\u80fd\u5408\u7ea6\u65b9\u9762\u3002\u8fd9\u4e9b\u5b89\u5168\u95ee\u9898\u53ef\u80fd\u5bfc\u81f4\u5de8\u5927\u7684\u8d22\u52a1\u635f\u5931\u3002\u5f53\u524d\u5de5\u5177\u548c\u65b9\u6cd5\u901a\u5e38\u4e13\u6ce8\u4e8e\u7279\u5b9a\u7c7b\u578b\u7684\u6f0f\u6d1e\u3002\u7136\u800c\uff0c\u7f3a\u4e4f\u4e00\u79cd\u80fd\u591f\u5e7f\u6cdb\u68c0\u6d4b\u591a\u79cd\u6f0f\u6d1e\u4e14\u5177\u6709\u9ad8\u51c6\u786e\u6027\u7684\u7efc\u5408\u5de5\u5177\u3002\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aLLM-SmartAudit\u7684\u65b0\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5148\u8fdb\u80fd\u529b\u6765\u68c0\u6d4b\u548c\u5206\u6790\u667a\u80fd\u5408\u7ea6\u4e2d\u7684\u6f0f\u6d1e\u3002\u901a\u8fc7\u591a\u4ee3\u7406\u5bf9\u8bdd\u65b9\u6cd5\uff0cLLM-SmartAudit\u91c7\u7528\u534f\u4f5c\u7cfb\u7edf\u4e0e\u4e13\u4e1a\u4ee3\u7406\u5408\u4f5c\u4ee5\u589e\u5f3a\u5ba1\u8ba1\u8fc7\u7a0b\u3002\u4e3a\u4e86\u8bc4\u4f30LLM-SmartAudit\u7684\u6709\u6548\u6027\uff0c\u6211\u4eec\u7f16\u5236\u4e86\u4e24\u4e2a\u4e0d\u540c\
u7684\u6570\u636e\u96c6\uff1a\u4e00\u4e2a\u7528\u4e8e\u4e0e\u4f20\u7edf\u5de5\u5177\u8fdb\u884c\u57fa\u51c6\u6d4b\u8bd5\u7684\u6807\u8bb0\u6570\u636e\u96c6\uff0c\u4ee5\u53ca\u4e00\u4e2a\u7528\u4e8e\u8bc4\u4f30\u5b9e\u9645\u5e94\u7528\u7684\u73b0\u5b9e\u4e16\u754c\u6570\u636e\u96c6\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u89e3\u51b3\u65b9\u6848\u5728\u6240\u6709\u4f20\u7edf\u667a\u80fd\u5408\u7ea6\u5ba1\u8ba1\u5de5\u5177\u4e4b\u4e0a\uff0c\u63d0\u4f9b\u4e86\u66f4\u9ad8\u7684\u51c6\u786e\u6027\u548c\u66f4\u5927\u7684\u6548\u7387\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u6846\u67b6\u53ef\u4ee5\u68c0\u6d4b\u590d\u6742\u903b\u8f91\u6f0f\u6d1e\uff0c\u800c\u4f20\u7edf\u5de5\u5177\u4e4b\u524d\u672a\u66fe\u53d1\u73b0\u8fd9\u4e9b\u6f0f\u6d1e\u3002\u6211\u4eec\u7684\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0c\u5229\u7528LLM\u4ee3\u7406\u63d0\u4f9b\u4e86\u4e00\u79cd\u975e\u5e38\u6709\u6548\u7684\u81ea\u52a8\u5316\u667a\u80fd\u5408\u7ea6\u5ba1\u8ba1\u65b9\u6cd5\u3002|\n", "2410.11239": "|**2024-10-15**|**HR-Agent: A Task-Oriented Dialogue (TOD) LLM Agent Tailored for HR Applications**|Weijie Xu 
et.al.|[2410.11239](http://arxiv.org/abs/2410.11239)|null|\u8fd1\u5e74\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u53d1\u5c55\u5728\u6559\u80b2\u548c\u91d1\u878d\u7b49\u591a\u4e2a\u9886\u57df\u5e26\u6765\u4e86\u8bf8\u591a\u76ca\u5904\uff0c\u4f46\u5728\u4eba\u529b\u8d44\u6e90\u9886\u57df\uff0c\u4ecd\u7136\u5b58\u5728\u8bb8\u591a\u91cd\u590d\u6027\u4efb\u52a1\u672a\u88ab\u89e3\u51b3\uff0c\u4f8b\u5982\u8bbf\u95ee\u8bf7\u6c42\u3001\u533b\u7597\u62a5\u9500\u548c\u8bf7\u5047\u7533\u8bf7\u7b49\u3002\u6211\u4eec\u5e0c\u671b\u5c06\u8fd9\u4e9b\u4efb\u52a1\u4e0eLLM\u4ee3\u7406\u76f8\u5173\u8054\uff0c\u8be5\u4ee3\u7406\u5df2\u7ecf\u5728\u8bf8\u5982\u5199\u4f5c\u8f85\u52a9\u548c\u5ba2\u6237\u670d\u52a1\u7b49\u9886\u57df\u53d6\u5f97\u4e86\u8fdb\u5c55\u3002\u6211\u4eec\u63d0\u51fa\u4e86HR-Agent\uff0c\u8fd9\u662f\u4e00\u79cd\u9ad8\u6548\u3001\u4fdd\u5bc6\u4e14\u4e13\u95e8\u9488\u5bf9\u4eba\u529b\u8d44\u6e90\u7684\u57fa\u4e8eLLM\u7684\u4efb\u52a1\u5bfc\u5411\u5bf9\u8bdd\u7cfb\u7edf\uff0c\u65e8\u5728\u81ea\u52a8\u5316\u5982\u533b\u7597\u62a5\u9500\u548c\u8bbf\u95ee\u8bf7\u6c42\u7b49\u91cd\u590d\u6027HR\u6d41\u7a0b\u3002\u7531\u4e8e\u5728\u63a8\u7406\u8fc7\u7a0b\u4e2d\u4e0d\u4f1a\u5c06\u5bf9\u8bdd\u6570\u636e\u53d1\u9001\u7ed9LLM\uff0c\u56e0\u6b64\u5b83\u80fd\u591f\u4fdd\u6301\u4eba\u529b\u8d44\u6e90\u76f8\u5173\u4efb\u52a1\u6240\u9700\u7684\u4fdd\u5bc6\u6027\u3002|\n", "2410.12568": "|**2024-10-16**|**Robust RL with LLM-Driven Data Synthesis and Policy Adaptation for Autonomous Driving**|Sihao Wu 
et.al.|[2410.12568](http://arxiv.org/abs/2410.12568)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u81ea\u52a8\u9a7e\u9a76\u7cfb\u7edf\u4e2d\u7684\u96c6\u6210\u5c55\u793a\u4e86\u5f3a\u5927\u7684\u5e38\u8bc6\u548c\u63a8\u7406\u80fd\u529b\uff0c\u6709\u6548\u5730\u89e3\u51b3\u4e86\u7eaf\u7cb9\u6570\u636e\u9a71\u52a8\u65b9\u6cd5\u7684\u7f3a\u9677\u3002\u5f53\u524d\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u9700\u8981\u8f83\u957f\u7684\u63a8\u7406\u65f6\u95f4\uff0c\u5e76\u4e14\u5728\u4e0e\u5b9e\u65f6\u81ea\u52a8\u9a7e\u9a76\u73af\u5883\u4ea4\u4e92\u65f6\u9762\u4e34\u6311\u6218\u3002\u4e00\u4e2a\u5173\u952e\u7684\u5f00\u653e\u6027\u95ee\u9898\u662f\uff0c\u6211\u4eec\u80fd\u5426\u6709\u6548\u5229\u7528LLM\u7684\u77e5\u8bc6\u6765\u8bad\u7ec3\u9ad8\u6548\u4e14\u7a33\u5065\u7684\u5f3a\u5316\u5b66\u4e60\uff08RL\uff09\u4ee3\u7406\u3002\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u65b0\u9896\u7684RAPID\u6846\u67b6\uff0c\u5373\u9c81\u68d2\u81ea\u9002\u5e94\u7b56\u7565\u6ce8\u5165\u4e0e\u84b8\u998f\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u901a\u8fc7\u4f7f\u7528\u7531\u57fa\u4e8eLLM\u7684\u9a7e\u9a76\u4ee3\u7406\u5408\u6210\u7684\u6570\u636e\u8fdb\u884c\u5728\u7ebf\u9002\u5e94\uff0c\u8bad\u7ec3\u4e13\u95e8\u7684\u6df7\u5408\u7b56\u7565RL\u4ee3\u7406\u3002RAPID\u5177\u6709\u4e09\u4e2a\u5173\u952e\u8bbe\u8ba1\uff1a1\uff09\u5229\u7528\u4eceLLM\u4ee3\u7406\u6536\u96c6\u7684\u79bb\u7ebf\u6570\u636e\uff0c\u5c06\u4e13\u5bb6\u77e5\u8bc6\u63d0\u70bc\u5230RL\u7b56\u7565\u4e2d\uff0c\u4ee5\u52a0\u5feb\u5b9e\u65f6\u63a8\u7406\u901f\u5ea6\uff1b2\uff09\u5f15\u5165\u5728RL\u4e2d\u5177\u6709\u9c81\u68d2\u6027\u7684\u84b8\u998f\u6280\u672f\uff0c\u7ee7\u627fLLM\u57fa\u6559\u5e08\u7684\u6027\u80fd\u548c\u9c81\u68d2\u6027\uff1b3\uff09\u91c7\u7528\u6df7\u5408\u7b56\u7565\u65b9\u6cd5\uff0c\u901a\u8fc7\u7b56\u7565\u9002\u914d\u5668\u8fdb\u884c\u8054\u5408\u51b3\u7b56\u89e3\u7801\u3002\u901a\u8fc7\u5728\u7ebf\u73af\u5883\u4e92\u52a8\u8fdb\u884c\u5fae\u8c03\uff0cRAPID\u51cf\u5c11\u4e86LLM\u77e5\u8bc6\u76
84\u9057\u5fd8\uff0c\u540c\u65f6\u4fdd\u6301\u5bf9\u4e0d\u540c\u4efb\u52a1\u7684\u9002\u5e94\u6027\u3002\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u8868\u660e\uff0cRAPID\u80fd\u591f\u4ee5\u9ad8\u6548\u3001\u53ef\u9002\u5e94\u548c\u7a33\u5065\u7684\u65b9\u5f0f\u5c06LLM\u77e5\u8bc6\u6709\u6548\u6574\u5408\u5230\u89c4\u6a21\u5316\u7684RL\u7b56\u7565\u4e2d\u3002\u4ee3\u7801\u548c\u68c0\u67e5\u70b9\u5c06\u5728\u63a5\u53d7\u540e\u516c\u5f00\u3002|\n", "2410.12481": "|**2024-10-16**|**SAC-GLAM: Improving Online RL for LLM agents with Soft Actor-Critic and Hindsight Relabeling**|Loris Gaven et.al.|[2410.12481](http://arxiv.org/abs/2410.12481)|null|\u8fd1\u5e74\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e0d\u4ec5\u4f5c\u4e3a\u751f\u6210\u6a21\u578b\u53d1\u5c55\uff0c\u8fd8\u4f5c\u4e3a\u89e3\u51b3\u6587\u672c\u5e8f\u5217\u51b3\u7b56\u4efb\u52a1\u7684\u4ee3\u7406\u3002\u5f53\u9762\u5bf9\u590d\u6742\u73af\u5883\uff0c\u5176\u96f6\u6837\u672c\u80fd\u529b\u4e0d\u8db3\u65f6\uff0c\u6700\u8fd1\u7684\u7814\u7a76\u8868\u660e\uff0c\u53ef\u4ee5\u4f7f\u7528\u5728\u7ebf\u5f3a\u5316\u5b66\u4e60\uff08RL\uff09\u8ba9\u8fd9\u4e9b\u6a21\u578b\u4ee3\u7406\u4ea4\u4e92\u5f0f\u5730\u53d1\u73b0\u548c\u5b66\u4e60\u6709\u6548\u7684\u7b56\u7565\u3002\u7136\u800c\uff0c\u5927\u591a\u6570\u5148\u524d\u7684\u5de5\u4f5c\u5c40\u9650\u4e8e\u91c7\u7528\u7b56\u7565\u68af\u5ea6\u7b97\u6cd5\uff0c\u8fd9\u5927\u5927\u9650\u5236\u4e86\u8fd9\u4e9b\u4ee3\u7406\u5728\u63a2\u7d22\u548c\u5229\u7528\u65b9\u9762\u53ef\u4ee5\u4f7f\u7528\u7684\u5176\u4ed6\u65b9\u6cd5\uff0c\u4f8b\u5982\u7ecf\u9a8c\u56de\u653e\u548c\u4e8b\u540e\u91cd\u6807\u8bb0\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u5bf9\u4e8eLLM\u5b66\u4e60\u4ee3\u7406\u53ef\u80fd\u662f\u5173\u952e\u7684\uff0c\u7279\u522b\u662f\u5728\u8bbe\u8ba1\u81ea\u4e3b\u5185\u5728\u52a8\u673a\u4ee3\u7406\u65f6\uff0c\u8fd9\u4e9b\u4ee3\u7406\u4f1a\u6839\u636e\u81ea\u5df1\u7684\u76ee\u6807\u8fdb\u884c\u91c7\u6837\u548c\u8ffd\u6c42\uff08\u5373\u81ea\u8db3\u578b\u4ee3\u740
6\uff09\u3002\u672c\u6587\u63d0\u51fa\u5e76\u7814\u7a76\u4e86\u4e00\u79cd\u9488\u5bf9LLM\u4ee3\u7406\u7684Soft Actor-Critic\u7b97\u6cd5\u548c\u4e8b\u540e\u91cd\u6807\u8bb0\u7684\u9002\u5e94\u6027\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u4e0d\u4ec5\u4e3a\u5f00\u53d1\u80fd\u591f\u5728\u7ebf\u5b66\u4e60\u7684\u81ea\u8db3\u578bLLM\u4ee3\u7406\u94fa\u5e73\u4e86\u9053\u8def\uff0c\u800c\u4e14\u5728\u66f4\u7ecf\u5178\u7684\u591a\u76ee\u6807RL\u73af\u5883\u4e2d\u4e5f\u80fd\u4f18\u4e8e\u7b56\u7565\u68af\u5ea6\u65b9\u6cd5\u3002|\n", "2410.12361": "|**2024-10-16**|**Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance**|Yaxi Lu et.al.|[2410.12361](http://arxiv.org/abs/2410.12361)|null|\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u4ee3\u7406\u5728\u89e3\u51b3\u590d\u6742\u4efb\u52a1\u65b9\u9762\u5df2\u7ecf\u5c55\u73b0\u51fa\u4e86\u663e\u8457\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u5927\u591a\u6570\u4ee3\u7406\u7cfb\u7edf\u4ecd\u7136\u5c40\u9650\u4e8e\u88ab\u52a8\u53cd\u5e94\uff0c\u8fd9\u9650\u5236\u4e86\u5b83\u4eec\u5728\u9700\u8981\u9884\u89c1\u6027\u548c\u81ea\u4e3b\u51b3\u7b56\u7684\u573a\u666f\u4e2d\u7684\u6709\u6548\u6027\u3002\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u81f4\u529b\u4e8e\u5f00\u53d1\u80fd\u591f\u9884\u89c1\u5e76\u4e3b\u52a8\u53d1\u8d77\u4efb\u52a1\u7684\u4ee3\u7406\uff0c\u800c\u65e0\u9700\u660e\u786e\u7684\u4eba\u7c7b\u6307\u4ee4\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6570\u636e\u9a71\u52a8\u65b9\u6cd5\u6765\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\u3002\u9996\u5148\uff0c\u6211\u4eec\u6536\u96c6\u771f\u5b9e\u4e16\u754c\u7684\u4eba\u7c7b\u6d3b\u52a8\u4ee5\u751f\u6210\u4e3b\u52a8\u6027\u7684\u4efb\u52a1\u9884\u6d4b\u3002\u8fd9\u4e9b\u9884\u6d4b\u968f\u540e\u7531\u4eba\u7c7b\u6807\u6ce8\u8005\u6807\u8bb0\u4e3a\u63a5\u53d7\u6216\u62d2\u7edd\u3002\u6807\u6ce8\u6570\u636e\u88ab\u7528\u4e8e\u8bad\u7ec3\u4e00\u4e2a\u5956\u52b1\u6a21\u578b\uff0c\u8be5\u6a21\u578b\u6a21\u62df\u4
eba\u7c7b\u5224\u65ad\uff0c\u5e76\u4f5c\u4e3aLLM\u4ee3\u7406\u4e3b\u52a8\u6027\u81ea\u52a8\u8bc4\u4f30\u7684\u5de5\u5177\u3002\u5728\u6b64\u57fa\u7840\u4e0a\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2a\u5168\u9762\u7684\u6570\u636e\u751f\u6210\u7ba1\u9053\uff0c\u521b\u5efa\u4e86\u4e00\u4e2a\u5305\u542b6790\u4e2a\u4e8b\u4ef6\u7684\u591a\u6837\u5316\u6570\u636e\u96c6ProactiveBench\u3002\u6700\u540e\uff0c\u6211\u4eec\u8bc1\u660e\u901a\u8fc7\u4f7f\u7528\u6240\u63d0\u51fa\u7684ProactiveBench\u8fdb\u884c\u5fae\u8c03\u53ef\u4ee5\u663e\u8457\u6fc0\u53d1LLM\u4ee3\u7406\u7684\u4e3b\u52a8\u6027\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u5fae\u8c03\u6a21\u578b\u5728\u4e3b\u52a8\u63d0\u4f9b\u5e2e\u52a9\u65b9\u9762\u7684F1\u5f97\u5206\u8fbe\u523066.47%\uff0c\u4f18\u4e8e\u6240\u6709\u5f00\u6e90\u548c\u95ed\u6e90\u6a21\u578b\u3002\u8fd9\u4e9b\u7ed3\u679c\u7a81\u663e\u4e86\u6211\u4eec\u65b9\u6cd5\u5728\u521b\u9020\u66f4\u4e3b\u52a8\u548c\u6709\u6548\u7684\u4ee3\u7406\u7cfb\u7edf\u65b9\u9762\u7684\u6f5c\u529b\uff0c\u4e3a\u672a\u6765\u7684\u4eba\u673a\u534f\u4f5c\u8fdb\u6b65\u94fa\u5e73\u4e86\u9053\u8def\u3002|\n", "2410.12236": "|**2024-10-16**|**Enhancing LLM Agents for Code Generation with Possibility and Pass-rate Prioritized Experience Replay**|Yuyang Chen 
et.al.|[2410.12236](http://arxiv.org/abs/2410.12236)|null|\u5982\u4eca\uff0c\u9488\u5bf9\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u7684\u53d8\u538b\u5668\u578b\u5927\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u901a\u5e38\u4f1a\u5e94\u7528\u91c7\u6837\u548c\u8fc7\u6ee4\u7ba1\u9053\u3002\u4f46\u7531\u4e8e\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u4e2d\u7684\u7a00\u758f\u5956\u52b1\u95ee\u9898\uff0c\u5373\u4e00\u4e2a\u4ee4\u724c\u7684\u4e0d\u6b63\u786e\u6027\u4f1a\u5bfc\u81f4\u53d8\u538b\u5668\u6a21\u578b\u4f1a\u6301\u7eed\u91c7\u6837\u5197\u4f59\u7a0b\u5e8f\u76f4\u5230\u627e\u5230\u6b63\u786e\u7684\u7a0b\u5e8f\uff0c\u4ece\u800c\u5bfc\u81f4\u6548\u7387\u4f4e\u4e0b\u3002\u4e3a\u4e86\u514b\u670d\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u5728\u5fae\u8c03\u9636\u6bb5\u5f15\u5165\u4e86\u7ecf\u9a8c\u56de\u653e\uff08ER\uff09\uff0c\u5176\u4e2d\u751f\u6210\u7684\u4ee3\u7801\u548c\u7a0b\u5e8f\u4f1a\u88ab\u5b58\u50a8\u5e76\u91cd\u653e\uff0c\u4ee5\u4f7fLLM\u4ee3\u7406\u6709\u673a\u4f1a\u4ece\u8fc7\u53bb\u7684\u7ecf\u5386\u4e2d\u5b66\u4e60\u3002\u57fa\u4e8eER\u7684\u7cbe\u795e\uff0c\u6211\u4eec\u4ecb\u7ecd\u4e86\u4e00\u79cd\u65b0\u7684\u65b9\u6cd5\u79f0\u4e3aBTP\u6d41\u7a0b\uff0c\u8be5\u6d41\u7a0b\u5305\u62ec\u4e09\u4e2a\u9636\u6bb5\uff1a\u675f\u641c\u7d22\u91c7\u6837\u3001\u6d4b\u8bd5\u9636\u6bb5\u548c\u4f18\u5148\u7ea7\u7ecf\u9a8c\u56de\u653e\u9636\u6bb5\u3002\u8fd9\u79cd\u65b9\u6cd5\u5229\u7528\u4e86\u4ee3\u7801\u6a21\u578b\u6536\u96c6\u7684\u5931\u8d25\u7a0b\u5e8f\uff0c\u5e76\u4ece\u56de\u653e\u7f13\u51b2\u533a\u4e2d\u91cd\u653e\u5177\u6709\u9ad8\u53ef\u80fd\u6027\u548c\u901a\u8fc7\u7387\u4f18\u5148\u503c\uff08P2Value\uff09\u7684\u7a0b\u5e8f\uff0c\u4ee5\u63d0\u9ad8\u6548\u7387\u3002P2Value\u5168\u9762\u8003\u8651\u4e86\u53d8\u538b\u5668\u8f93\u51fa\u7684\u53ef\u80fd\u6027\u548c\u901a\u8fc7\u7387\uff0c\u5e76\u53ef\u4ee5\u5229\u7528\u5927\u591a\u6570\u7531LLMs\u6536\u96c6\u7684\u7a0b\u5e8f\u672a\u80fd\u901a\u8fc7\u4efb\u4f55\u6d4b\u8bd5\u800c\u4ea7\u751f\u7684\u5197\u4f59\u8d44\u6e90\u3002\u6
211\u4eec\u5728\u51e0\u4e2aLLM\u4e2d\u5b9e\u8bc1\u5e94\u7528\u4e86\u6211\u4eec\u7684\u65b9\u6cd5\uff0c\u8bc1\u660e\u5b83\u63d0\u9ad8\u4e86\u5b83\u4eec\u5728\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u4e2d\u7684\u6027\u80fd\uff0c\u5e76\u8d85\u8d8a\u4e86\u73b0\u6709\u7684\u57fa\u7ebf\u3002|\n", "2410.11906": "|**2024-10-15**|**Empowering Users in Digital Privacy Management through Interactive LLM-Based Agents**|Bolun Sun et.al.|[2410.11906](http://arxiv.org/abs/2410.11906)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u5e94\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u65b9\u6cd5\uff0c\u901a\u8fc7\u4ea4\u4e92\u5f0f\u5bf9\u8bdd\u4ee3\u7406\u6765\u589e\u5f3a\u7528\u6237\u5bf9\u9690\u79c1\u653f\u7b56\u7684\u7406\u89e3\u3002\u6211\u4eec\u5c55\u793a\u4e86LLMs\u5728\u6570\u636e\u5b9e\u8df5\u8bc6\u522b\u3001\u9009\u62e9\u8bc6\u522b\u3001\u653f\u7b56\u603b\u7ed3\u548c\u9690\u79c1\u95ee\u7b54\u7b49\u4efb\u52a1\u4e0a\u663e\u8457\u4f18\u4e8e\u4f20\u7edf\u6a21\u578b\uff0c\u4e3a\u9690\u79c1\u653f\u7b56\u5206\u6790\u8bbe\u5b9a\u4e86\u65b0\u7684\u57fa\u51c6\u3002\u5728\u6b64\u57fa\u7840\u4e0a\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u57fa\u4e8eLLM\u7684\u4ee3\u7406\uff0c\u4f5c\u4e3a\u5904\u7406\u7f51\u7ad9\u9690\u79c1\u653f\u7b56\u7684\u4e13\u4e1a\u7cfb\u7edf\uff0c\u5f15\u5bfc\u7528\u6237\u7a7f\u8d8a\u590d\u6742\u7684\u6cd5\u5f8b\u8bed\u8a00\uff0c\u800c\u65e0\u9700\u7528\u6237\u63d0\u51fa\u7279\u5b9a\u95ee\u9898\u3002\u4e00\u9879\u6d89\u53ca100\u540d\u53c2\u4e0e\u8005\u7684\u7528\u6237\u7814\u7a76\u8868\u660e\uff0c\u4f7f\u7528\u8be5\u4ee3\u7406\u7684\u7528\u6237\u5177\u6709\u66f4\u9ad8\u7684\u7406\u89e3\u6c34\u5e73\uff08\u5e73\u5747\u5f97\u52063\u5206\u4e2d\u76842.6\u5206\u5bf9\u6bd4\u5bf9\u7167\u7ec4\u76841.8\u5206\uff09\uff0c\u964d\u4f4e\u4e86\u8ba4\u77e5\u8d1f\u8377\uff08\u4efb\u52a1\u96be\u5ea6\u8bc4\u5206\u4e3a10\u5206\u4e2d\u76843.2\u5206\u5bf9\u6bd47.8\u5206\uff09\uff0c\u589e\u5f3a\u4e86\u7ba1\u7406\u9690\u79c1\u7684\u
4fe1\u5fc3\uff0c\u5e76\u4e14\u5b8c\u6210\u4efb\u52a1\u7684\u65f6\u95f4\u66f4\u77ed\uff085.5\u5206\u949f\u5bf9\u6bd415.8\u5206\u949f\uff09\u3002\u8fd9\u9879\u5de5\u4f5c\u7a81\u663e\u4e86\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u5728\u6539\u53d8\u7528\u6237\u4e0e\u9690\u79c1\u653f\u7b56\u4e92\u52a8\u65b9\u9762\u7684\u6f5c\u529b\uff0c\u4ece\u800c\u5b9e\u73b0\u66f4\u52a0\u77e5\u60c5\u7684\u540c\u610f\u5e76\u8d4b\u4e88\u7528\u6237\u5728\u6570\u5b57\u670d\u52a1\u9886\u57df\u7684\u6743\u529b\u3002|\n", "2410.13825": "|**2024-10-17**|**AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents**|Ke Yang et.al.|[2410.13825](http://arxiv.org/abs/2410.13825)|null|\u901a\u8fc7\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u4ee3\u7406\u5b9e\u73b0\u81ea\u4e3b\u6027\uff0c\u53ef\u4ee5\u63d0\u9ad8\u4eba\u7c7b\u5728\u4e2a\u6027\u5316\u548c\u6807\u51c6\u5316\u4efb\u52a1\u4e2d\u7684\u6548\u7387\u3002\u81ea\u52a8\u5316\u7f51\u7edc\u4efb\u52a1\uff08\u5982\u5728\u9884\u7b97\u5185\u9884\u8ba2\u9152\u5e97\uff09\u7684\u9700\u6c42\u65e5\u76ca\u589e\u52a0\u3002\u8fd9\u4e9b\u7f51\u7edc\u4ee3\u7406\u4e0d\u4ec5\u80fd\u6ee1\u8db3\u5b9e\u9645\u9700\u6c42\uff0c\u8fd8\u4f5c\u4e3a\u5404\u79cd\u4ee3\u7406\u63a5\u5730\u573a\u666f\u7684\u91cd\u8981\u6982\u5ff5\u9a8c\u8bc1\u793a\u4f8b\uff0c\u5176\u6210\u529f\u5c06\u9884\u793a\u7740\u8bb8\u591a\u672a\u6765\u5e94\u7528\u7684\u8fdb\u6b65\u3002\u5148\u524d\u7684\u7814\u7a76\u901a\u5e38\u624b\u5de5\u8bbe\u8ba1\u7f51\u7edc\u4ee3\u7406\u7b56\u7565\uff08\u4f8b\u5982\uff0c\u63d0\u793a\u6a21\u677f\u3001\u591a\u4ee3\u7406\u7cfb\u7edf\u3001\u641c\u7d22\u65b9\u6cd5\u7b49\uff09\uff0c\u8fd9\u4e9b\u7b56\u7565\u53ef\u80fd\u65e0\u6cd5\u5f88\u597d\u5730\u63a8\u5e7f\u5230\u6240\u6709\u73b0\u5b9e\u4e16\u754c\u573a\u666f\u3002\u53e6\u4e00\u65b9\u9762\uff0c\u5bf9\u4e8e\u7f51\u7edc\u4ee3\u7406\u7684\u89c2\u5bdf/\u52a8\u4f5c\u8868\u793a\u4e0e\u652f\u6301\u5b83\u7684LLM\u7684\u9884\u8bad\u7ec3\u6570\u636e\u4e4b\u95f4\u7684\u4e0d\u5339\u914d\uff0c\u7814\u7a7
6\u76f8\u5bf9\u8f83\u5c11\u3002\u8fd9\u79cd\u5dee\u5f02\u5728LLMs\u4e3b\u8981\u9488\u5bf9\u8bed\u8a00\u5b8c\u6210\u800c\u975e\u6d89\u53ca\u5177\u8eab\u5bfc\u822a\u52a8\u4f5c\u548c\u7b26\u53f7\u7f51\u7edc\u5143\u7d20\u7684\u4efb\u52a1\u65f6\u5c24\u4e3a\u663e\u8457\u3002\u6211\u4eec\u7684\u7814\u7a76\u901a\u8fc7\u7b80\u5355\u5730\u4f18\u5316\u89c2\u5bdf\u548c\u52a8\u4f5c\u7a7a\u95f4\u6765\u589e\u5f3a\u57fa\u4e8eLLM\u7684\u7f51\u7edc\u4ee3\u7406\uff0c\u4ee5\u66f4\u597d\u5730\u4e0e\u5176\u80fd\u529b\u76f8\u5339\u914d\u3002\u8fd9\u79cd\u65b9\u6cd5\u4f7f\u6211\u4eec\u7684\u57fa\u7840\u4ee3\u7406\u5728\u5e7f\u6cdb\u7684\u7f51\u7edc\u4efb\u52a1\u4e0a\u663e\u8457\u4f18\u4e8e\u4ee5\u524d\u7684\u65b9\u6cd5\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u5728WebArena\u57fa\u51c6\u6d4b\u8bd5\u4e2d\uff0c\u8be5\u6d4b\u8bd5\u6db5\u76d6\u4e86\u901a\u7528\u7f51\u7edc\u4ea4\u4e92\u4efb\u52a1\uff0c\u6211\u4eec\u7684\u4ee3\u7406AgentOccam\u6bd4\u4ee5\u524d\u7684\u6700\u5148\u8fdb\u6280\u672f\u548c\u540c\u671f\u5de5\u4f5c\u5206\u522b\u9ad8\u51fa9.8\u4e2a\u7edd\u5bf9\u70b9\uff08+29.4%\uff09\u548c5.9\u4e2a\u7edd\u5bf9\u70b9\uff08+15.8%\uff09\uff0c\u5e76\u4e14\u76f8\u6bd4\u7c7b\u4f3c\u7684\u666e\u901a\u7f51\u7edc\u4ee3\u7406\uff0c\u5176\u6210\u529f\u7387\u63d0\u9ad8\u4e8626.6\u4e2a\u70b9\uff08+161%\uff09\u3002\u6211\u4eec\u6ca1\u6709\u4f7f\u7528\u4e0a\u4e0b\u6587\u793a\u4f8b\u3001\u65b0\u7684\u4ee3\u7406\u89d2\u8272\u3001\u5728\u7ebf\u53cd\u9988\u6216\u641c\u7d22\u7b56\u7565\u3002AgentOccam\u7684\u7b80\u6d01\u8bbe\u8ba1\u7a81\u663e\u4e86LLMs\u5728\u7f51\u9875\u4efb\u52a1\u4e0a\u7684\u96f6\u6837\u672c\u6027\u80fd\uff0c\u5e76\u5f3a\u8c03\u4e86\u7cbe\u5fc3\u8c03\u6574\u89c2\u5bdf\u548c\u52a8\u4f5c\u7a7a\u95f4\u5bf9\u4e8e\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u7684\u91cd\u8981\u6027\u3002|\n", "2410.13768": "|**2024-10-17**|**Rapid and Automated Alloy Design with Graph Neural Network-Powered LLM-Driven Multi-Agent Systems**|Alireza Ghafarollahi 
et.al.|[2410.13768](http://arxiv.org/abs/2410.13768)|null|\u4e00\u4e2a\u591a\u667a\u80fd\u4f53AI\u6a21\u578b\u88ab\u7528\u4e8e\u81ea\u52a8\u5316\u53d1\u73b0\u65b0\u578b\u91d1\u5c5e\u5408\u91d1\uff0c\u8be5\u6a21\u578b\u6574\u5408\u4e86\u591a\u6a21\u6001\u6570\u636e\u548c\u5916\u90e8\u77e5\u8bc6\uff0c\u5305\u62ec\u901a\u8fc7\u539f\u5b50\u6a21\u62df\u83b7\u5f97\u7684\u7269\u7406\u89c1\u89e3\u3002\u6211\u4eec\u7684\u591a\u667a\u80fd\u4f53\u7cfb\u7edf\u5305\u542b\u4e09\u4e2a\u5173\u952e\u7ec4\u4ef6\uff1a(a) \u4e00\u7ec4\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u8d1f\u8d23\u63a8\u7406\u548c\u89c4\u5212\u7b49\u4efb\u52a1\uff0c(b) \u4e00\u7fa4\u5177\u6709\u4e0d\u540c\u89d2\u8272\u548c\u4e13\u957f\u7684AI\u4ee3\u7406\u52a8\u6001\u534f\u4f5c\uff0c\u4ee5\u53ca(c) \u4e00\u79cd\u65b0\u5f00\u53d1\u7684\u56fe\u795e\u7ecf\u7f51\u7edc\uff08GNN\uff09\u6a21\u578b\uff0c\u7528\u4e8e\u5feb\u901f\u68c0\u7d22\u5173\u952e\u7269\u7406\u5c5e\u6027\u3002\u4e00\u7ec4\u7531LLM\u9a71\u52a8\u7684AI\u4ee3\u7406\u534f\u540c\u5de5\u4f5c\uff0c\u4ee5\u81ea\u52a8\u5316\u63a2\u7d22MPEAs\uff08\u9ad8\u71b5\u5408\u91d1\uff09\u7684\u5de8\u5927\u8bbe\u8ba1\u7a7a\u95f4\uff0c\u5e76\u7531GNN\u7684\u9884\u6d4b\u7ed3\u679c\u8fdb\u884c\u6307\u5bfc\u3002\u6211\u4eec\u4e13\u6ce8\u4e8eNbMoTa\u7cfb\u5217\u4f53\u5fc3\u7acb\u65b9\uff08bcc\uff09\u5408\u91d1\uff0c\u8fd9\u4e9b\u5408\u91d1\u4f7f\u7528\u57fa\u4e8e\u673a\u5668\u5b66\u4e60\u7684\u539f\u5b50\u95f4\u52bf\u6a21\u578b\u8fdb\u884c\u5efa\u6a21\uff0c\u5e76\u4e14\u91cd\u70b9\u5173\u6ce8\u4e24\u4e2a\u5173\u952e\u5c5e\u6027\uff1aPeierls\u52bf\u5792\u548c\u56fa\u6eb6\u4f53/\u87ba\u578b\u4f4d\u9519\u76f8\u4e92\u4f5c\u7528\u80fd\u3002\u6211\u4eec\u7684GNN\u6a21\u578b\u80fd\u591f\u51c6\u786e\u9884\u6d4b\u8fd9\u4e9b\u539f\u5b50\u5c3a\u5ea6\u7684\u5c5e\u6027\uff0c\u63d0\u4f9b\u4e86\u4e00\u79cd\u6bd4\u6602\u8d35\u7684\u66b4\u529b\u8ba1\u7b97\u66f4\u5feb\u7684\u66ff\u4ee3\u65b9\u6848\uff0c\u51cf\u5c11\u4e86\u591a\u667a\u80fd\u4f53\u7cfb\u7edf\u5728\u7269\u7406\u5c5e\u
6027\u68c0\u7d22\u4e0a\u7684\u8ba1\u7b97\u8d1f\u62c5\u3002\u8fd9\u4e00AI\u7cfb\u7edf\u901a\u8fc7\u51cf\u5c11\u5bf9\u4eba\u7c7b\u4e13\u4e1a\u77e5\u8bc6\u7684\u4f9d\u8d56\u5e76\u514b\u670d\u76f4\u63a5\u5168\u539f\u5b50\u6a21\u62df\u7684\u9650\u5236\uff0c\u9769\u65b0\u4e86\u6750\u6599\u53d1\u73b0\u7684\u8fc7\u7a0b\u3002\u901a\u8fc7\u534f\u540cGNN\u7684\u9884\u6d4b\u80fd\u529b\u548cLLM\u9a71\u52a8\u4ee3\u7406\u7684\u52a8\u6001\u534f\u4f5c\uff0c\u8be5\u7cfb\u7edf\u81ea\u4e3b\u5bfc\u822a\u5de8\u5927\u7684\u5408\u91d1\u8bbe\u8ba1\u7a7a\u95f4\uff0c\u8bc6\u522b\u539f\u5b50\u5c3a\u5ea6\u6750\u6599\u5c5e\u6027\u7684\u8d8b\u52bf\uff0c\u5e76\u9884\u6d4b\u5b8f\u89c2\u5c3a\u5ea6\u7684\u673a\u68b0\u5f3a\u5ea6\uff0c\u5982\u51e0\u4e2a\u8ba1\u7b97\u5b9e\u9a8c\u6240\u793a\u3002\u8fd9\u79cd\u65b9\u6cd5\u52a0\u901f\u4e86\u5148\u8fdb\u5408\u91d1\u7684\u53d1\u73b0\uff0c\u5e76\u6709\u671b\u5728\u5176\u4ed6\u590d\u6742\u7cfb\u7edf\u4e2d\u6709\u66f4\u5e7f\u6cdb\u7684\u5e94\u7528\uff0c\u6807\u5fd7\u7740\u81ea\u52a8\u6750\u6599\u8bbe\u8ba1\u9886\u57df\u7684\u4e00\u4e2a\u91cd\u5927\u8fdb\u5c55\u3002|\n", "2410.13610": "|**2024-10-17**|**MeNTi: Bridging Medical Calculator and LLM Agent with Nested Tool Calling**|Yakun Zhu 
et.al.|[2410.13610](http://arxiv.org/abs/2410.13610)|null|\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e2d\u96c6\u6210\u5de5\u5177\u5df2\u7ecf\u4fc3\u8fdb\u4e86\u5176\u5e7f\u6cdb\u5e94\u7528\u3002\u7136\u800c\uff0c\u5728\u4e13\u95e8\u7684\u4e0b\u6e38\u4efb\u52a1\u73af\u5883\u4e2d\uff0c\u4ec5\u4f9d\u8d56\u5de5\u5177\u4e0d\u8db3\u4ee5\u5b8c\u5168\u89e3\u51b3\u73b0\u5b9e\u4e16\u754c\u7684\u590d\u6742\u6027\uff0c\u8fd9\u5c24\u5176\u9650\u5236\u4e86LLMs\u5728\u533b\u5b66\u7b49\u9886\u57df\u7684\u6709\u6548\u5e94\u7528\u3002\u672c\u6587\u805a\u7126\u4e8e\u533b\u5b66\u8ba1\u7b97\u5668\u7684\u4e0b\u6e38\u4efb\u52a1\uff0c\u8fd9\u4e9b\u4efb\u52a1\u4f7f\u7528\u6807\u51c6\u5316\u6d4b\u8bd5\u6765\u8bc4\u4f30\u4e2a\u4f53\u7684\u5065\u5eb7\u72b6\u51b5\u3002\u6211\u4eec\u4ecb\u7ecd\u4e86MeNTi\uff0c\u8fd9\u662f\u4e00\u79cd\u9488\u5bf9LLMs\u7684\u901a\u7528\u4ee3\u7406\u67b6\u6784\u3002MeNTi\u96c6\u6210\u4e86\u4e13\u4e1a\u7684\u533b\u5b66\u5de5\u5177\u5305\uff0c\u5e76\u91c7\u7528\u5143\u5de5\u5177\u548c\u5d4c\u5957\u8c03\u7528\u673a\u5236\u4ee5\u589e\u5f3aLLM\u5de5\u5177\u7684\u4f7f\u7528\u6548\u679c\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u5b83\u5b9e\u73b0\u4e86\u7075\u6d3b\u7684\u5de5\u5177\u9009\u62e9\u548c\u5d4c\u5957\u5de5\u5177\u8c03\u7528\u6765\u5e94\u5bf9\u590d\u6742\u7684\u533b\u7597\u573a\u666f\u4e2d\u7684\u5b9e\u9645\u95ee\u9898\uff0c\u5305\u62ec\u8ba1\u7b97\u5668\u9009\u62e9\u3001\u63d2\u69fd\u586b\u5145\u548c\u5355\u4f4d\u8f6c\u6362\u3002\u4e3a\u4e86\u8bc4\u4f30LLMs\u5728\u6574\u4e2a\u4e34\u5e8a\u8fc7\u7a0b\u4e2d\u8fdb\u884c\u8ba1\u7b97\u5668\u573a\u666f\u5b9a\u91cf\u8bc4\u4f30\u7684\u80fd\u529b\uff0c\u6211\u4eec\u5f15\u5165\u4e86CalcQA\u57fa\u51c6\u3002\u8be5\u57fa\u51c6\u8981\u6c42LLMs\u4f7f\u7528\u533b\u5b66\u8ba1\u7b97\u5668\u8fdb\u884c\u8ba1\u7b97\u5e76\u8bc4\u4f30\u60a3\u8005\u7684\u5065\u5eb7\u72b6\u51b5\u3002CalcQA\u7531\u4e13\u4e1a\u533b\u751f\u6784\u5efa\uff0c\u5305\u542b100\u4e2a\u6848\u4f8b-\u8ba1\u7b97\u5668\u5bf9\uff0c\u5e76\u9644\u5e26\u4e00\u4
e2a\u5305\u542b281\u4e2a\u533b\u5b66\u5de5\u5177\u7684\u5de5\u5177\u5305\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u6846\u67b6\u663e\u8457\u63d0\u5347\u4e86\u6027\u80fd\u3002\u8fd9\u9879\u7814\u7a76\u4e3aLLMs\u5728\u533b\u5b66\u9ad8\u9700\u6c42\u573a\u666f\u4e2d\u7684\u5e94\u7528\u5f00\u8f9f\u4e86\u65b0\u7684\u65b9\u5411\u3002|\n", "2410.13185": "|**2024-10-17**|**Chain of Ideas: Revolutionizing Research in Novel Idea Development with LLM Agents**|Long Li et.al.|[2410.13185](http://arxiv.org/abs/2410.13185)|null|\u6709\u6548\u7684\u7814\u7a76\u521b\u610f\u6784\u601d\u662f\u79d1\u5b66\u7814\u7a76\u4e2d\u7684\u5173\u952e\u6b65\u9aa4\u3002\u7136\u800c\uff0c\u968f\u7740\u79d1\u5b66\u6587\u732e\u7684\u6307\u6570\u7ea7\u589e\u957f\uff0c\u7814\u7a76\u4eba\u5458\u5f88\u96be\u8ddf\u4e0a\u6700\u65b0\u7684\u8fdb\u5c55\u5e76\u786e\u5b9a\u6709\u610f\u4e49\u7684\u7814\u7a76\u65b9\u5411\u3002\u6700\u8fd1\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u53d1\u5c55\u8868\u660e\uff0c\u81ea\u52a8\u5316\u751f\u6210\u65b0\u9896\u7814\u7a76\u521b\u610f\u662f\u4e00\u4e2a\u6709\u524d\u666f\u7684\u65b9\u5411\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u521b\u610f\u751f\u6210\u65b9\u6cd5\u8981\u4e48\u7b80\u5355\u5730\u63d0\u793aLLMs\uff0c\u8981\u4e48\u76f4\u63a5\u5411LLMs\u66b4\u9732\u5927\u91cf\u6587\u732e\u800c\u6ca1\u6709\u6307\u793a\u6709\u7528\u7684\u4fe1\u606f\u3002\u53d7\u5230\u4eba\u7c7b\u7814\u7a76\u4eba\u5458\u7814\u7a76\u8fc7\u7a0b\u7684\u542f\u53d1\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aChain-of-Ideas\uff08CoI\uff09\u7684\u4ee3\u7406\uff0c\u8fd9\u662f\u4e00\u79cd\u57fa\u4e8eLLM\u7684\u4ee3\u7406\uff0c\u5b83\u4ee5\u94fe\u5f0f\u7ed3\u6784\u7ec4\u7ec7\u76f8\u5173\u6587\u732e\uff0c\u6709\u6548\u5730\u53cd\u6620\u4e86\u7814\u7a76\u9886\u57df\u7684\u6e10\u8fdb\u53d1\u5c55\u3002\u8fd9\u79cd\u7ec4\u7ec7\u65b9\u5f0f\u4f7fLLMs\u80fd\u591f\u6355\u6349\u5f53\u524d\u7684\u7814\u7a76\u8fdb\u5c55\uff0c\u4ece\u800c\u589e\u5f3a\u5176\u521b\u610f
\u80fd\u529b\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86\u4e00\u4e2a\u540d\u4e3aIdea Arena\u7684\u8bc4\u4f30\u534f\u8bae\uff0c\u53ef\u4ee5\u4ece\u4e0d\u540c\u89d2\u5ea6\u5168\u9762\u8bc4\u4f30\u521b\u610f\u751f\u6210\u65b9\u6cd5\uff0c\u4e0e\u4eba\u7c7b\u7814\u7a76\u4eba\u5458\u7684\u504f\u597d\u7d27\u5bc6\u5bf9\u9f50\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cCoI\u4ee3\u7406\u5728\u7814\u7a76\u521b\u610f\u751f\u6210\u65b9\u9762\u59cb\u7ec8\u4f18\u4e8e\u5176\u4ed6\u65b9\u6cd5\uff0c\u5e76\u4e14\u5176\u8d28\u91cf\u53ef\u4e0e\u4eba\u7c7b\u76f8\u5ab2\u7f8e\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684CoI\u4ee3\u7406\u6210\u672c\u6548\u76ca\u9ad8\uff0c\u751f\u6210\u4e00\u4e2a\u5019\u9009\u521b\u610f\u53ca\u5176\u76f8\u5e94\u5b9e\u9a8c\u8bbe\u8ba1\u7684\u6700\u4f4e\u6210\u672c\u4ec5\u4e3a0.50\u7f8e\u5143\u3002|\n"}, "llm": {"2405.10311": "|**2024-05-16**|**UniRAG: Universal Retrieval Augmentation for Multi-Modal Large Language Models**|Sahel Sharifymoghaddam et.al.|[2405.10311](http://arxiv.org/abs/2405.10311)|null|## \u80cc\u666f 
\u8fd1\u671f\uff0c\u591a\u6a21\u6001\uff08MM\uff09\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5df2\u7ecf\u89e3\u9501\u4e86\u8bb8\u591a\u9700\u8981\u591a\u6a21\u6001\u7406\u89e3\uff08\u5982\u56fe\u50cf\u63cf\u8ff0\u6216\u89c6\u89c9\u95ee\u7b54\uff09\u548c\u751f\u6210\uff08\u5982\u6587\u672c\u5f15\u5bfc\u7684\u56fe\u50cf\u751f\u6210\u6216\u7f16\u8f91\uff09\u590d\u6742\u4efb\u52a1\u3002\u4e3a\u4e86\u8fdb\u4e00\u6b65\u63d0\u5347MM-LLMs\u7684\u8f93\u51fa\u8d28\u91cf\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u6a21\u578b\u901a\u7528\u7684UniRAG\u6280\u672f\uff0c\u5b83\u5728\u63a8\u7406\u9636\u6bb5\u5c06\u76f8\u5173\u68c0\u7d22\u4fe1\u606f\u6dfb\u52a0\u5230\u63d0\u793a\u4e2d\uff0c\u4f5c\u4e3a\u5c11\u91cf\u6837\u4f8b\u3002\u4e0e\u666e\u904d\u8ba4\u4e3a\u68c0\u7d22\u589e\u5f3a\uff08RA\uff09\u4e3b\u8981\u6539\u8fdb\u7f55\u89c1\u5b9e\u4f53\u7684\u751f\u6210\u6216\u7406\u89e3\u4e0d\u540c\uff0c\u6211\u4eec\u5728MSCOCO\u6570\u636e\u96c6\u4e0a\u5bf9\u5305\u62ecGPT4\u3001Gemini-Pro\u5728\u5185\u7684\u4e13\u6709\u6a21\u578b\u4ee5\u53caLlava\u3001LaVIT\u548cEmu2\u7b49\u5f00\u6e90\u5c0f\u578b\u6a21\u578b\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u7ed3\u679c\u663e\u793a\uff0c\u8fd9\u4e9b\u6a21\u578b\u5728\u8f93\u5165\u63d0\u793a\u901a\u8fc7MM\u68c0\u7d22\u5668\uff08\u5982UniIR\u6a21\u578b\uff09\u589e\u5f3a\u540e\uff0c\u663e\u8457\u63d0\u9ad8\u4e86\u751f\u6210\u8d28\u91cf\u3002|\n", "2405.10305": "|**2024-05-16**|**4D Panoptic Scene Graph Generation**|Jingkang Yang 
et.al.|[2405.10305](http://arxiv.org/abs/2405.10305)|**[link](https://github.com/jingkang50/psg4d)**|**\u6211\u4eec\u751f\u6d3b\u5728\u4e00\u4e2a\u4e09\u7ef4\u7a7a\u95f4\u4e2d\uff0c\u540c\u65f6\u901a\u8fc7\u7b2c\u56db\u7ef4\u65f6\u95f4\u5411\u524d\u63a8\u8fdb\u3002\u4e3a\u4e86\u4f7f\u4eba\u5de5\u667a\u80fd\u80fd\u591f\u5168\u9762\u7406\u89e3\u8fd9\u79cd4D\u73af\u5883\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u8868\u793a\u5f62\u5f0f\u2014\u20144D\u5168\u666f\u573a\u666f\u56fe\uff08PSG-4D\uff09\uff0c\u5b83\u5c06\u52a8\u60014D\u4e16\u754c\u4e2d\u7684\u539f\u59cb\u89c6\u89c9\u6570\u636e\u62bd\u8c61\u4e3a\u8282\u70b9\u548c\u8fb9\uff0c\u8282\u70b9\u4ee3\u8868\u5177\u6709\u7cbe\u786e\u4f4d\u7f6e\u548c\u72b6\u6001\u4fe1\u606f\u7684\u5b9e\u4f53\uff0c\u8fb9\u6355\u6349\u65f6\u95f4\u5173\u7cfb\u3002\u4e3a\u4e86\u4fc3\u8fdb\u5728\u8fd9\u4e00\u65b0\u9886\u57df\u7684\u7814\u7a76\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u4e30\u5bcc\u7684\u6ce8\u91caPSG-4D\u6570\u636e\u96c6\uff0c\u5305\u542b3000\u4e2aRGB-D\u89c6\u9891\uff0c\u603b\u8ba1100\u4e07\u5e27\uff0c\u6bcf\u5e27\u90fd\u5e26\u67094D\u5168\u666f\u5206\u5272\u63a9\u7801\u4ee5\u53ca\u8be6\u7ec6\u7684\u52a8\u6001\u573a\u666f\u56fe\u6807\u7b7e\u3002\u6211\u4eec\u4e3a\u6b64\u4efb\u52a1\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aPSG4DFormer\u7684Transformer\u6a21\u578b\uff0c\u8be5\u6a21\u578b\u80fd\u591f\u9884\u6d4b\u5168\u666f\u5206\u5272\u63a9\u7801\uff0c\u6cbf\u65f6\u95f4\u8f74\u8ddf\u8e2a\u63a9\u7801\uff0c\u5e76\u901a\u8fc7\u5173\u7cfb\u7ec4\u4ef6\u751f\u6210\u76f8\u5e94\u7684\u573a\u666f\u56fe\u3002\u5728\u65b0\u6570\u636e\u96c6\u4e0a\u7684\u5927\u91cf\u5b9e\u9a8c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u4e3a\u672a\u6765\u7684PSG-4D\u7814\u7a76\u63d0\u4f9b\u4e86\u4e00\u4e2a\u5f3a\u5927\u7684\u57fa\u51c6\u3002\u6700\u540e\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u5982\u4f55\u901a\u8fc7\u5c06\u5927\u578b\u8bed\u8a00\u6a21\u578b\u878d\u5165\u6211\u4eec\u7684PSG-4D\u7cfb\u7edf\u6765\u5b9e\u73b0\u52a8\u6001\
u573a\u666f\u7406\u89e3\u7684\u4e00\u4e2a\u5b9e\u9645\u5e94\u7528\u793a\u4f8b\u3002**|\n", "2405.10299": "|**2024-05-16**|**HW-GPT-Bench: Hardware-Aware Architecture Benchmark for Language Models**|Rhea Sanjay Sukthanker et.al.|[2405.10299](http://arxiv.org/abs/2405.10299)|**[link](https://github.com/automl/hw-aware-llm-bench)**|**\u968f\u7740\u8bed\u8a00\u6a21\u578b\u7684\u89c4\u6a21\u4e0d\u65ad\u6269\u5927\uff0c\u5bf9\u786c\u4ef6\u6307\u6807\uff08\u5982\u5ef6\u8fdf\u3001\u80fd\u8017\u3001GPU\u5185\u5b58\u4f7f\u7528\u548c\u6027\u80fd\uff09\u4e4b\u95f4\u7684\u6743\u8861\u9700\u6c42\u65e5\u76ca\u589e\u957f\u3002\u4eba\u4eec\u6b63\u5728\u5bfb\u6c42\u4e3a\u4e0d\u540c\u8bed\u8a00\u6a21\u578b\u914d\u7f6e\u5efa\u7acb\u5e15\u7d2f\u6258\u524d\u6cbf\uff0c\u4ee5\u5728\u6307\u5b9a\u786c\u4ef6\u9650\u5236\u4e0b\u627e\u5230\u6700\u4f18\u6a21\u578b\u3002\u7136\u800c\uff0c\u5bf9\u591a\u79cd\u67b6\u6784\u5728\u591a\u53f0\u8bbe\u5907\u4e0a\u7684\u5168\u9762\u8bad\u7ec3\u548c\u8bc4\u4f30\u5728\u8ba1\u7b97\u4e0a\u662f\u4e0d\u53ef\u884c\u7684\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86HW-GPT-Bench\uff0c\u8fd9\u662f\u4e00\u4e2a\u57fa\u4e8e\u786c\u4ef6\u611f\u77e5\u7684\u8bed\u8a00\u6a21\u578b\u4ee3\u7406\u57fa\u51c6\uff0c\u5229\u7528\u795e\u7ecf\u67b6\u6784\u641c\u7d22\uff08NAS\uff09\u4e2d\u7684\u6743\u91cd\u5171\u4eab\u6280\u672f\uff0c\u5728\u4e00\u4e2a\u6a21\u578b\u4e2d\u9ad8\u6548\u5730\u8bad\u7ec3\u5305\u542b\u4e0d\u540c\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\u7684\u8d85\u7f51\u7edc\u3002\u6211\u4eec\u572813\u79cd\u8bbe\u5907\u4e0a\u5bf9\u8fd9\u4e9b\u6a21\u578b\u8fdb\u884c\u4e86\u6027\u80fd\u5256\u6790\uff0c\u8003\u8651\u4e865\u79cd\u786c\u4ef6\u6307\u6807\u548c3\u79cd\u4e0d\u540c\u7684\u6a21\u578b\u89c4\u6a21\u3002\u6700\u540e\uff0c\u6211\u4eec\u901a\u8fc78\u79cd\u4e0d\u540c\u7684\u591a\u76ee\u6807NAS\u7b97\u6cd5\u5c55\u793a\u4e86HW-GPT-Bench\u7684\u53ef\u7528\u6027\uff0c\u5e76\u8bc4\u4f30\u4e86\u7531\u6b64\u4ea7\u751f\u7684\u5e15\u7d2f\u6258\u524d\u6cbf\u7684\u8d28\u91cf\u3
002\u6211\u4eec\u7684\u76ee\u6807\u662f\u63a8\u52a8\u548c\u52a0\u901f\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u591a\u76ee\u6807\u65b9\u6cd5\uff0c\u5982NAS\u548c\u7ed3\u6784\u5316\u526a\u679d\u7684\u7814\u7a76\u3002**|\n", "2405.10288": "|**2024-05-16**|**Timeline-based Sentence Decomposition with In-Context Learning for Temporal Fact Extraction**|Jianhao Chen et.al.|[2405.10288](http://arxiv.org/abs/2405.10288)|**[link](https://github.com/jianhaochen-nju/tsdre)**|**\u6458\u8981\uff1a** \u4e8b\u5b9e\u62bd\u53d6\u5bf9\u4e8e\u6784\u5efa\u77e5\u8bc6\u56fe\u8c31\u81f3\u5173\u91cd\u8981\u3002\u968f\u7740\u5bf9\u65f6\u95f4\u76f8\u5173\u4e8b\u5b9e\u5728\u4e0b\u6e38\u4efb\u52a1\u4e2d\u7684\u9700\u6c42\u589e\u957f\uff0c\u51fa\u73b0\u4e86\u65f6\u95f4\u6027\u4e8b\u5b9e\u62bd\u53d6\u7684\u4efb\u52a1\u3002\u672c\u6587\u7279\u522b\u5173\u6ce8\u4ece\u81ea\u7136\u8bed\u8a00\u6587\u672c\u4e2d\u63d0\u53d6\u65f6\u95f4\u6027\u4e8b\u5b9e\u3002\u5148\u524d\u7684\u7814\u7a76\u672a\u80fd\u59a5\u5584\u5904\u7406\u590d\u6742\u53e5\u5b50\u4e2d\u65f6\u95f4\u4e0e\u4e8b\u5b9e\u5bf9\u5e94\u5173\u7cfb\u7684\u5efa\u7acb\u96be\u9898\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u65f6\u95f4\u7ebf\u7684\u53e5\u5b50\u5206\u89e3\u7b56\u7565\uff0c\u5229\u7528\u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u8fdb\u884c\u4e0a\u4e0b\u6587\u5b66\u4e60\uff0c\u4ee5\u5b9e\u73b0\u5bf9\u4e8b\u5b9e\u76f8\u5173\u65f6\u95f4\u7ebf\u7684\u7cbe\u7ec6\u7406\u89e3\u3002\u7136\u800c\uff0c\u76f4\u63a5\u4f7f\u7528LLMs\u8fdb\u884c\u65f6\u95f4\u6027\u4e8b\u5b9e\u62bd\u53d6\u7684\u6027\u80fd\u5e76\u4e0d\u7406\u60f3\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u5f15\u5165\u4e86TSDRE\u65b9\u6cd5\uff0c\u5c06LLMs\u7684\u5206\u89e3\u80fd\u529b\u878d\u5165\u5230\u5c0f\u578b\u9884\u8bad\u7ec3\u8bed\u8a00\u6a21\u578b\uff08PLMs\uff09\u7684\u4f20\u7edf\u5fae\u8c03\u8fc7\u7a0b\u4e2d\u3002 
\u4e3a\u4e86\u652f\u6301\u8bc4\u4f30\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u590d\u6742\u7684\u65f6\u5e8f\u4e8b\u5b9e\u62bd\u53d6\u6570\u636e\u96c6ComplexTRED\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cTSDRE\u5728HyperRED-Temporal\u548cComplexTRED\u6570\u636e\u96c6\u4e0a\u5b9e\u73b0\u4e86\u6700\u5148\u8fdb\u7684\u6027\u80fd\u3002|\n", "2405.10276": "|**2024-05-16**|**Revisiting OPRO: The Limitations of Small-Scale LLMs as Optimizers**|Tuo Zhang et.al.|[2405.10276](http://arxiv.org/abs/2405.10276)|null|\u8fd1\u5e74\u6765\uff0c\u8bb8\u591a\u7814\u7a76\u65e8\u5728\u901a\u8fc7\u7b56\u7565\u6027\u63d0\u793a\u63d0\u5347\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u6548\u80fd\u3002\u7279\u522b\u662f\u4f18\u5316\u901a\u8fc7prompting\uff08OPRO\uff09\u65b9\u6cd5\u8868\u73b0\u51fa\u9876\u5c16\u6027\u80fd\uff0c\u5b83\u5229\u7528LLMs\u4f5c\u4e3a\u4f18\u5316\u5668\uff0c\u76ee\u6807\u662f\u5bfb\u627e\u80fd\u6700\u5927\u5316\u4efb\u52a1\u51c6\u786e\u6027\u7684\u6307\u4ee4\u3002\u672c\u8bba\u6587\u91cd\u65b0\u5ba1\u89c6\u4e86OPRO\u5728\u5c0f\u578bLLMs\uff08\u5982LaMa-2\u7cfb\u5217\u548cMistral 7B\uff09\u4e0a\u7684\u81ea\u52a8\u5316\u63d0\u793a\u6548\u679c\u3002\u6211\u4eec\u7684\u7814\u7a76\u8868\u660e\uff0c\u5bf9\u4e8e\u5c0f\u578bLLMs\uff0cOPRO\u7684\u6548\u679c\u6709\u9650\uff0c\u56e0\u4e3a\u5176\u6709\u9650\u7684\u63a8\u7406\u80fd\u529b\u9650\u5236\u4e86\u4f18\u5316\u6f5c\u529b\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u5efa\u8bae\u672a\u6765\u7684\u81ea\u52a8\u63d0\u793a\u5de5\u7a0b\u5e94\u540c\u65f6\u8003\u8651\u6a21\u578b\u80fd\u529b\u548c\u8ba1\u7b97\u6210\u672c\u3002\u9488\u5bf9\u5c0f\u578bLLMs\uff0c\u6211\u4eec\u63a8\u8350\u76f4\u63a5\u63d0\u4f9b\u660e\u786e\u9610\u8ff0\u76ee\u6807\u548c\u65b9\u6cd5\u7684\u6307\u4ee4\uff0c\u4f5c\u4e3a\u7a33\u5065\u7684\u63d0\u793a\u57fa\u7ebf\uff0c\u4ee5\u786e\u4fdd\u5728\u5f53\u524d\u7814\u7a76\u4e2d\u5b9e\u73b0\u9ad8\u6548\u4e14\u6709\u6548\u7684\u63d0\u793a\u8bbe\u8ba1\u3002|\n", "2405.10260": 
"|**2024-05-16**|**Keep It Private: Unsupervised Privatization of Online Text**|Calvin Bao et.al.|[2405.10260](http://arxiv.org/abs/2405.10260)|**[link](https://github.com/csbao/kip-privatization)**|**## \u80cc\u666f \u4f5c\u8005\u8eab\u4efd\u6df7\u6dc6\u6280\u672f\u6709\u671b\u901a\u8fc7\u81ea\u52a8\u91cd\u5199\u6587\u672c\u6765\u4fdd\u62a4\u7f51\u7edc\u901a\u4fe1\u4e2d\u7684\u4e2a\u4eba\u9690\u79c1\u3002\u7136\u800c\uff0c\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u6587\u732e\u4e2d\uff0c\u8fd9\u4e9b\u6280\u672f\u7684\u8bc4\u4f30\u5927\u591a\u5c40\u9650\u5728\u72ed\u5c0f\u573a\u666f\u4e0b\uff0c\u4e3b\u8981\u4f9d\u8d56\u4e8e\u8868\u9762\u7684\u7f16\u8f91\u64cd\u4f5c\uff0c\u53ef\u80fd\u5bfc\u81f4\u8f93\u51fa\u4e0d\u81ea\u7136\u3002\u672c\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u79cd\u81ea\u52a8\u6587\u672c\u79c1\u5bc6\u5316\u6846\u67b6\uff0c\u901a\u8fc7\u5f3a\u5316\u5b66\u4e60\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u5fae\u8c03\uff0c\u4ee5\u751f\u6210\u517c\u987e\u51c6\u786e\u3001\u8fde\u8d2f\u548c\u9690\u79c1\u7684\u91cd\u5199\u3002\u6211\u4eec\u5728\u5927\u89c4\u6a21\u7684\u82f1\u8bedReddit\u5e16\u5b50\u6d4b\u8bd5\u96c6\u4e0a\u8fdb\u884c\u4e86\u8be6\u5c3d\u7684\u8bc4\u4f30\uff0c\u8be5\u6570\u636e\u96c6\u753168,000\u540d\u4f5c\u8005\u64b0\u5199\uff0c\u5305\u542b\u77ed\u5230\u4e2d\u7b49\u957f\u5ea6\u7684\u6587\u672c\u3002\u6211\u4eec\u63a2\u8ba8\u4e86\u5728\u4e0d\u540c\u8bc4\u4f30\u6761\u4ef6\u4e0b\uff0c\u5982\u4f5c\u8005\u7b80\u4ecb\u957f\u5ea6\u548c\u4f5c\u8005\u8bc6\u522b\u7b56\u7565\uff0c\u6027\u80fd\u7684\u53d8\u5316\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u81ea\u52a8\u5316\u6307\u6807\u548c\u4eba\u5de5\u8bc4\u4f30\u4e2d\u4fdd\u6301\u9ad8\u6587\u672c\u8d28\u91cf\uff0c\u5e76\u6210\u529f\u5730\u89c4\u907f\u4e86\u51e0\u79cd\u81ea\u52a8\u4f5c\u8005\u8bc6\u522b\u653b\u51fb\u3002**|\n", "2405.10255": "|**2024-05-16**|**When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models**|Xianzheng 
Ma et.al.|[2405.10255](http://arxiv.org/abs/2405.10255)|**[link](https://github.com/activevisionlab/awesome-llm-3d)**|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u4e0d\u65ad\u53d1\u5c55\uff0c\u5b83\u4eec\u4e0e\u4e09\u7ef4\u7a7a\u95f4\u6570\u636e\uff083D-LLMs\uff09\u7684\u878d\u5408\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\uff0c\u8fd9\u6781\u5927\u5730\u589e\u5f3a\u4e86\u7406\u89e3\u548c\u4e92\u52a8\u7269\u7406\u73af\u5883\u7684\u80fd\u529b\u3002\u8fd9\u7bc7\u7efc\u8ff0\u8be6\u7ec6\u63a2\u8ba8\u4e86\u4f7fLLMs\u80fd\u591f\u5904\u7406\u3001\u7406\u89e3\u5e76\u751f\u6210\u4e09\u7ef4\u6570\u636e\u7684\u65b9\u6cd5\u8bba\uff0c\u5f3a\u8c03\u4e86LLMs\u7684\u72ec\u7279\u4f18\u52bf\uff0c\u5982\u4e0a\u4e0b\u6587\u5b66\u4e60\u3001\u9010\u6b65\u63a8\u7406\u3001\u5f00\u653e\u8bcd\u6c47\u80fd\u529b\u548c\u4e30\u5bcc\u7684\u4e16\u754c\u77e5\u8bc6\uff0c\u8fd9\u4e9b\u5c06\u6781\u5927\u5730\u63a8\u52a8\u4eba\u5de5\u667a\u80fd\u4f53\u5728\u7a7a\u95f4\u7406\u89e3\u4e0e\u4ea4\u4e92\u65b9\u9762\u7684\u53d1\u5c55\u3002\u7814\u7a76\u8986\u76d6\u4e86\u4ece\u70b9\u4e91\u5230\u795e\u7ecf\u8f90\u5c04\u573a\uff08NeRF\uff09\u7b49\u5404\u79cd\u4e09\u7ef4\u6570\u636e\u8868\u793a\uff0c\u5e76\u8003\u5bdf\u4e86\u5b83\u4eec\u4e0eLLMs\u5728\u4efb\u52a1\u4e2d\u7684\u7ed3\u5408\uff0c\u5982\u4e09\u7ef4\u573a\u666f\u7406\u89e3\u3001\u63cf\u8ff0\u3001\u95ee\u7b54\u548c\u5bf9\u8bdd\uff0c\u4ee5\u53ca\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u8fdb\u884c\u7a7a\u95f4\u63a8\u7406\u3001\u89c4\u5212\u548c\u5bfc\u822a\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u7b80\u8981\u56de\u987e\u4e86\u5176\u4ed6\u7ed3\u5408\u4e09\u7ef4\u548c\u8bed\u8a00\u7684\u65b9\u6cd5\u3002\u672c\u6587\u7684\u5143\u5206\u6790\u663e\u793a\u4e86\u663e\u8457\u7684\u8fdb\u6b65\uff0c\u4f46\u4e5f\u6307\u51fa\u4e86\u6316\u63983D-LLMs\u5168\u90e8\u6f5c\u529b\u6240\u9700\u7684\u521b\u65b0\u65b9\u6cd5\u7684\u5fc5\u8981\u6027\u3002\u56e0\u6b64\uff0c\u672c\u6587\u65e8\u5728\u4e3a\u672a\u6765\u7684\u7814\u7a76\u65b9\u5411\u63d0\u4f9b\u6
307\u5bfc\uff0c\u63a2\u7d22\u548c\u6269\u5c553D-LLMs\u5728\u7406\u89e3\u548c\u4e92\u52a8\u590d\u6742\u4e09\u7ef4\u4e16\u754c\u7684\u80fd\u529b\u3002\u4e3a\u4e86\u652f\u6301\u672c\u8c03\u67e5\uff0c\u6211\u4eec\u5df2\u5728GitHub\u4e0a\u5efa\u7acb\u4e86\u4e00\u4e2a\u9879\u76ee\u9875\u9762\uff0c\u6574\u7406\u5e76\u5217\u51fa\u4e86\u76f8\u5173\u8bba\u6587\uff1ahttps://github.com/ActiveVisionLab/Awesome-LLM-3D\u3002|\n", "2405.10251": "|**2024-05-16**|**A Systematic Evaluation of Large Language Models for Natural Language Generation Tasks**|Xuanfan Ni et.al.|[2405.10251](http://arxiv.org/abs/2405.10251)|null|\u8fd1\u671f\u7684\u7814\u7a76\u5df2\u8bc4\u4f30\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5e38\u8bc6\u63a8\u7406\u3001\u6570\u5b66\u63a8\u7406\u548c\u4ee3\u7801\u751f\u6210\u7b49\u65b9\u9762\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u636e\u6211\u4eec\u6240\u77e5\uff0c\u5c1a\u65e0\u4e13\u95e8\u9488\u5bf9\u81ea\u7136\u8bed\u8a00\u751f\u6210\uff08NLG\uff09\u4efb\u52a1\u7684\u6df1\u5165\u7814\u7a76\uff0c\u8fd9\u662f\u8861\u91cf\u6a21\u578b\u4f18\u79c0\u7a0b\u5ea6\u7684\u5173\u952e\u6807\u51c6\u3002\u56e0\u6b64\uff0c\u672c\u8bba\u6587\u65e8\u5728\u5168\u9762\u8bc4\u4f30\u77e5\u540d\u4e14\u6027\u80fd\u51fa\u8272\u7684LLMs\uff0c\u5305\u62ecChatGPT\u3001ChatGLM\u3001\u57fa\u4e8eT5\u7684\u6a21\u578b\u3001\u57fa\u4e8eLLaMA\u7684\u6a21\u578b\u548cPythia\u6a21\u578b\uff0c\u5728\u5bf9\u8bdd\u751f\u6210\u548c\u6587\u672c\u603b\u7ed3\u7b49NLG\u4efb\u52a1\u4e2d\u7684\u8868\u73b0\u3002\u6211\u4eec\u9009\u62e9\u4e86\u6db5\u76d6\u82f1\u8bed\u548c\u4e2d\u6587\u7684\u6570\u636e\u96c6\uff0c\u5e76\u8bbe\u8ba1\u4e86\u4e00\u79cd\u5171\u540c\u7684\u8bc4\u4f30\u6846\u67b6\uff0c\u5305\u62ec\u8f93\u5165\u6a21\u677f\u548c\u540e\u5904\u7406\u7b56\u7565\u3002\u7814\u7a76\u7ed3\u679c\u62a5\u544a\u4e86\u81ea\u52a8\u8bc4\u5206\uff0c\u540c\u65f6\u8fdb\u884c\u4e86\u8be6\u7ec6\u5206\u6790\u3002|\n", "2405.10250": "|**2024-05-16**|**IntelliExplain: Enhancing Interactive Code 
Generation through Natural Language Explanations for Non-Professional Programmers**|Hao Yan et.al.|[2405.10250](http://arxiv.org/abs/2405.10250)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u6839\u636e\u81ea\u7136\u8bed\u8a00\u63cf\u8ff0\u81ea\u52a8\u751f\u6210\u53ef\u6267\u884c\u4ee3\u7801\u65b9\u9762\u5c55\u73b0\u51fa\u5de8\u5927\u6f5c\u529b\uff0c\u7279\u522b\u662f\u901a\u8fc7\u4e92\u52a8\u529f\u80fd\uff0c\u7528\u6237\u53ef\u4ee5\u901a\u8fc7\u8fed\u4ee3\u53cd\u9988\u6307\u5bfc\u6a21\u578b\u3002\u7136\u800c\uff0c\u5f53\u524d\u7684\u4e92\u52a8\u65b9\u5f0f\u5f80\u5f80\u5047\u8bbe\u7528\u6237\u5177\u5907\u8c03\u8bd5\u6e90\u4ee3\u7801\u7684\u4e13\u4e1a\u77e5\u8bc6\uff0c\u5bf9\u975e\u4e13\u4e1a\u7a0b\u5e8f\u5458\u4e0d\u592a\u53cb\u597d\u3002\u8fd9\u4f7f\u5f97\u4f7f\u4e92\u52a8\u4ee3\u7801\u751f\u6210\u5bf9\u4e0d\u540c\u7f16\u7a0b\u6c34\u5e73\u7684\u4e2a\u4f53\u66f4\u6613\u4e8e\u4f7f\u7528\u6210\u4e3a\u4e00\u4e2a\u6311\u6218\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86IntelliExplain\uff0c\u8fd9\u662f\u4e00\u79cd\u521b\u65b0\u7684\u4eba\u673a\u4ea4\u4e92\u8303\u5f0f\uff0c\u901a\u8fc7\u8ba9\u7528\u6237\u901a\u8fc7\u81ea\u7136\u8bed\u8a00\u89e3\u91ca\u4e0e\u6e90\u4ee3\u7801\u4e92\u52a8\uff0c\u63d0\u5347\u975e\u4e13\u4e1a\u4eba\u58eb\u7684\u4f53\u9a8c\u3002\u7528\u6237\u901a\u8fc7\u63d0\u4f9b\u4ed6\u4eec\u53d1\u73b0\u9519\u8bef\u7684\u81ea\u7136\u8bed\u8a00\u7ea0\u6b63\u53cd\u9988\uff0c\u6765\u6307\u5bfc\u7cfb\u7edf\u4fee\u8ba2\u4ee3\u7801\uff0c\u76f4\u5230\u7528\u6237\u5bf9\u7cfb\u7edf\u7684\u4ee3\u7801\u89e3\u91ca\u611f\u5230\u6ee1\u610f\u3002\u6211\u4eec\u7684\u7528\u6237\u7814\u7a76\u663e\u793a\uff0c\u4f7f\u7528IntelliExplain\u7684\u7528\u6237\u5728Text-to-SQL\u548cPython\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u4e2d\u7684\u6210\u529f\u7387\u5206\u522b\u6bd4\u7eafGPT-3.5\u63d0\u9ad8\u4e8611.6%\u548c25.3%\uff0c\u540c\u65f6\u6240\u9700\u65f6\u95f4\u5206\u522b\u51cf\u5c11\u4e8639.0%\u548c15.6%\u3002|\n", "2405.10212": 
"|**2024-05-16**|**CPsyExam: A Chinese Benchmark for Evaluating Psychology using Examinations**|Jiahao Zhao et.al.|[2405.10212](http://arxiv.org/abs/2405.10212)|**[link](https://github.com/CAS-SIAT-XinHai/CPsyExam)**|\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u5fc3\u7406\u5b66\u57fa\u51c6\u6d4b\u8bd5\u2014\u2014CPsyExam\uff0c\u5b83\u6e90\u4e8e\u4e2d\u56fd\u8bed\u8a00\u8003\u8bd5\u7684\u95ee\u9898\u3002CPsyExam\u65e8\u5728\u5206\u522b\u5f3a\u8c03\u5fc3\u7406\u5b66\u77e5\u8bc6\u548c\u6848\u4f8b\u5206\u6790\u7684\u91cd\u8981\u6027\uff0c\u8ba4\u8bc6\u5230\u5c06\u5fc3\u7406\u5b66\u77e5\u8bc6\u5e94\u7528\u4e8e\u5b9e\u9645\u60c5\u5883\u7684\u4ef7\u503c\u3002\u4ece22,000\u4e2a\u95ee\u9898\u5e93\u4e2d\uff0c\u6211\u4eec\u7cbe\u9009\u4e864,000\u4e2a\u6765\u6784\u5efa\u8be5\u57fa\u51c6\uff0c\u786e\u4fdd\u4e86\u4e3b\u9898\u7684\u5747\u8861\u8986\u76d6\uff0c\u5e76\u5305\u542b\u4e86\u5404\u79cd\u6848\u4f8b\u5206\u6790\u65b9\u6cd5\u7684\u591a\u6837\u6027\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5bf9\u4e00\u7cfb\u5217\u73b0\u6709\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u5305\u62ec\u5f00\u6e90\u548cAPI\u57fa\u7840\u7684\u6a21\u578b\u3002\u5b9e\u9a8c\u548c\u5206\u6790\u7ed3\u679c\u663e\u793a\uff0cCPsyExam\u662f\u4e00\u4e2a\u6709\u6548\u7684\u786e\u7acb\u8bed\u8a00\u6a21\u578b\u5bf9\u5fc3\u7406\u5b66\u7406\u89e3\u80fd\u529b\u7684\u57fa\u51c6\uff0c\u540c\u65f6\u652f\u6301\u5728\u4e0d\u540c\u7c92\u5ea6\u4e0a\u6bd4\u8f83\u8fd9\u4e9b\u6a21\u578b\u3002|\n", "2405.10936": "|**2024-05-17**|**A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers**|Kaiyu Huang 
et.al.|[2405.10936](http://arxiv.org/abs/2405.10936)|**[link](https://github.com/kaiyuhwang/mllm-survey)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5feb\u901f\u53d1\u5c55\uff0c\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u9886\u57df\u5c55\u73b0\u51fa\u663e\u8457\u7684\u591a\u8bed\u8a00\u80fd\u529b\uff0c\u5f15\u8d77\u4e86\u5b66\u672f\u754c\u548c\u4e1a\u754c\u7684\u5e7f\u6cdb\u5173\u6ce8\u3002\u4e3a\u4e86\u51cf\u5c11\u6f5c\u5728\u7684\u6b67\u89c6\u5e76\u63d0\u5347\u6280\u672f\u7684\u901a\u7528\u6027\u548c\u53ef\u8bbf\u95ee\u6027\uff0c\u5bf9\u4e8e\u591a\u8bed\u8a00\u6280\u672f\u7684\u53d1\u5c55\u81f3\u5173\u91cd\u8981\u3002\u5c3d\u7ba1LLMs\u53d6\u5f97\u4e86\u7a81\u7834\uff0c\u4f46\u5bf9\u591a\u8bed\u8a00\u573a\u666f\u7684\u6df1\u5165\u7814\u7a76\u4ecd\u663e\u4e0d\u8db3\u3002\u56e0\u6b64\uff0c\u8feb\u5207\u9700\u8981\u4e00\u4efd\u5168\u9762\u7684\u7efc\u8ff0\uff0c\u603b\u7ed3\u8fd1\u671f\u7684\u65b9\u6cd5\u3001\u8fdb\u5c55\u3001\u5c40\u9650\u6027\u548c\u53ef\u80fd\u7684\u89e3\u51b3\u65b9\u6848\u3002\u672c\u6587\u65e8\u5728\u4ece\u591a\u4e2a\u89d2\u5ea6\u5ba1\u89c6LLMs\u5728\u591a\u8bed\u8a00\u73af\u5883\u4e2d\u7684\u5e94\u7528\u3002\u6211\u4eec\u9996\u5148\u56de\u987e\u4e86\u9884\u8bad\u7ec3\u8bed\u8a00\u6a21\u578b\u7814\u7a76\u7684\u5386\u53f2\u6f14\u53d8\u3002\u63a5\u7740\uff0c\u6211\u4eec\u63a2\u8ba8\u4e86LLMs\u7684\u591a\u8bed\u8a00\u7279\u6027\uff0c\u5305\u62ec\u8bad\u7ec3\u548c\u63a8\u7406\u65b9\u6cd5\u3001\u6a21\u578b\u5b89\u5168\u3001\u8de8\u9886\u57df\u4e0e\u6587\u5316\u9002\u5e94\u4ee5\u53ca\u6570\u636e\u96c6\u4f7f\u7528\u3002\u6211\u4eec\u8fd8\u5206\u6790\u4e86\u8fd9\u4e9b\u65b9\u9762\u9762\u4e34\u7684\u6311\u6218\uff0c\u5e76\u63d0\u51fa\u53ef\u80fd\u7684\u89e3\u51b3\u7b56\u7565\u3002\u6b64\u5916\uff0c\u6211\u4eec\u6307\u51fa\u4e86\u672a\u6765\u7684\u7814\u7a76\u65b9\u5411\uff0c\u4ee5\u8fdb\u4e00\u6b65\u63d0\u5347LLMs\u7684\u591a\u8bed\u8a00\u6027\u80fd\u3002\u672c\u7efc\u8ff0\u65e8\u5728\u5e2e\u52a9\u7814\u7a76\u754c\u5e94\u5bf
9\u591a\u8bed\u8a00\u95ee\u9898\uff0c\u63d0\u4f9b\u4e00\u4e2a\u5173\u4e8e\u57fa\u4e8eLLMs\u7684\u591a\u8bed\u8a00\u81ea\u7136\u8bed\u8a00\u5904\u7406\u6838\u5fc3\u6982\u5ff5\u3001\u5173\u952e\u6280\u672f\u53ca\u6700\u65b0\u8fdb\u5c55\u7684\u5168\u9762\u7406\u89e3\u3002**|\n", "2405.10928": "|**2024-05-17**|**The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks**|Lucius Bushnaq et.al.|[2405.10928](http://arxiv.org/abs/2405.10928)|**[link](https://github.com/apolloresearch/rib)**|### \u6982\u8ff0 \u673a\u68b0\u89e3\u91ca\u6027\u76ee\u6807\u662f\u901a\u8fc7\u9006\u5411\u5de5\u7a0b\u7406\u89e3\u795e\u7ecf\u7f51\u7edc\u7684\u884c\u4e3a\u3002\u7136\u800c\uff0c\u73b0\u6709\u65b9\u6cd5\u5728\u89e3\u6790\u795e\u7ecf\u7f51\u7edc\u6fc0\u6d3b\u65b9\u9762\u9762\u4e34\u6311\u6218\uff0c\u56e0\u4e3a\u7f3a\u4e4f\u5bf9\u6fc0\u6d3b\u7684\u5206\u89e3\uff0c\u4f7f\u5f97\u5355\u4e2a\u795e\u7ecf\u5143\u6216\u6a21\u578b\u7ec4\u4ef6\u65e0\u6cd5\u6e05\u6670\u5bf9\u5e94\u4e8e\u72ec\u7279\u7684\u7279\u5f81\u6216\u529f\u80fd\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u53ef\u89e3\u91ca\u6027\u65b9\u6cd5\u2014\u2014\u5c40\u90e8\u4ea4\u4e92\u57fa\uff08Local Interaction Basis\uff0cLIB\uff09\u3002LIB\u65e8\u5728\u901a\u8fc7\u6d88\u9664\u65e0\u5173\u6fc0\u6d3b\u548c\u4ea4\u4e92\uff0c\u8bc6\u522b\u8ba1\u7b97\u7279\u5f81\u3002\u8be5\u65b9\u6cd5\u6452\u5f03\u65e0\u610f\u4e49\u7684\u6fc0\u6d3b\u65b9\u5411\uff0c\u5e76\u4f7f\u57fa\u7840\u4e0e\u76f8\u90bb\u5c42\u95f4\u96c5\u53ef\u6bd4\u77e9\u9635\u7684\u5947\u5f02\u5411\u91cf\u5bf9\u9f50\u3002\u540c\u65f6\uff0c\u5b83\u6839\u636e\u7279\u5f81\u5bf9\u540e\u7eed\u8ba1\u7b97\u7684\u91cd\u8981\u6027\u8fdb\u884c\u7f29\u653e\uff0c\u751f\u6210\u4e00\u4e2a\u663e\u793a\u6a21\u578b\u4e2d\u6240\u6709\u8ba1\u7b97\u76f8\u5173\u7279\u6027\u548c\u4ea4\u4e92\u7684\u56fe\u8c31\u3002 
\u6211\u4eec\u5728\u6a21\u5757\u52a0\u6cd5\u548cCIFAR-10\u6a21\u578b\u4e0a\u8bc4\u4f30\u4e86LIB\u7684\u6709\u6548\u6027\uff0c\u7ed3\u679c\u8868\u660e\uff0c\u76f8\u6bd4\u4e8e\u4e3b\u6210\u5206\u5206\u6790\uff0cLIB\u80fd\u8bc6\u522b\u51fa\u66f4\u591a\u8ba1\u7b97\u76f8\u5173\u7684\u7279\u5f81\uff0c\u5e76\u5448\u73b0\u51fa\u66f4\u7a00\u758f\u7684\u4ea4\u4e92\u3002\u7136\u800c\uff0c\u5728\u5e94\u7528\u4e8e\u8bed\u8a00\u6a21\u578b\u65f6\uff0cLIB\u5e76\u672a\u663e\u8457\u63d0\u9ad8\u53ef\u89e3\u91ca\u6027\u6216\u4ea4\u4e92\u7a00\u758f\u5ea6\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u5f97\u51fa\u7ed3\u8bba\uff0c\u5c3d\u7ba1LIB\u662f\u4e00\u79cd\u6709\u524d\u666f\u7684\u7406\u8bba\u9a71\u52a8\u65b9\u6cd5\uff0c\u4f46\u5f53\u524d\u5f62\u5f0f\u5e76\u4e0d\u9002\u7528\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u3002|\n", "2405.10893": "|**2024-05-17**|**COGNET-MD, an evaluation framework and dataset for Large Language Model benchmarks in the medical domain**|Dimitrios P. Panagoulias et.al.|[2405.10893](http://arxiv.org/abs/2405.10893)|null|\u8fd9\u7bc7\u6280\u672f\u8bba\u6587\u9610\u8ff0\u4e86COGNET-MD\uff0c\u4e00\u4e2a\u4e13\u4e3a\u533b\u7597\u9886\u57df\u8bbe\u8ba1\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8bc4\u4f30\u7684\u65b0\u57fa\u51c6\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u8bc4\u5206\u6846\u67b6\uff0c\u65e8\u5728\u8bc4\u4f30\u8bed\u8a00\u6a21\u578b\u7406\u89e3\u533b\u5b66\u6587\u672c\u7684\u80fd\u529b\uff0c\u5e76\u4e14\u8bbe\u8ba1\u4e86\u4e00\u7cfb\u5217\u96be\u5ea6\u5206\u7ea7\u7684\u591a\u9879\u9009\u62e9\u9898\uff08MCQ\uff09\u6570\u636e\u5e93\u3002\u8fd9\u4e2a\u6570\u636e\u5e93\u7531\u591a\u4e2a\u533b\u7597\u9886\u57df\u7684\u4e13\u5bb6\u5408\u4f5c\u521b\u5efa\uff0c\u4ee5\u53cd\u6620\u5f53\u524d\u533b\u5b66\u8d8b\u52bf\uff0c\u786e\u4fdd\u5b89\u5168\u3001\u5b9e\u7528\u548c\u9002\u7528\u6027\u3002\u521d\u671f\u7248\u672c\u5305\u542b\u4e86\u7cbe\u795e\u79d1\u3001\u7259\u79d1\u3001\u80ba\u75c5\u5b66\u3001\u76ae\u80a4\u79d1\u548c\u5185\u5206\u6ccc\u5b66\u7b49\u9886\u57d
f\u7684\u9898\u76ee\uff0c\u4f46\u4f1a\u6301\u7eed\u6269\u5c55\uff0c\u672a\u6765\u8fd8\u4f1a\u52a0\u5165\u66f4\u591a\u533b\u5b66\u5b66\u79d1\u3002|\n", "2405.10883": "|**2024-05-17**|**Application of Artificial Intelligence in Schizophrenia Rehabilitation Management: Systematic Literature Review**|Hongyi Yang et.al.|[2405.10883](http://arxiv.org/abs/2405.10883)|null|\u8be5\u7efc\u8ff0\u65e8\u5728\u7cfb\u7edf\u5730\u8bc4\u4f30\u4eba\u5de5\u667a\u80fd\uff08AI\uff09\u5728\u7cbe\u795e\u5206\u88c2\u75c7\u60a3\u8005\u5eb7\u590d\u7ba1\u7406\u4e2d\u7684\u73b0\u72b6\u548c\u524d\u666f\uff0c\u4ee5\u53ca\u5176\u5bf9\u5eb7\u590d\u8fc7\u7a0b\u7684\u5f71\u54cd\u3002\u6211\u4eec\u4ece2012\u5e74\u81f3\u73b0\u5728\u7b5b\u9009\u4e8670\u9879\u7814\u7a76\uff0c\u91cd\u70b9\u5173\u6ce8\u673a\u5668\u5b66\u4e60\u3001\u6df1\u5ea6\u5b66\u4e60\u3001\u5f3a\u5316\u5b66\u4e60\u7b49\u6280\u672f\u5728\u5fc3\u7406\u5065\u5eb7\u5e72\u9884\u548c\u7ba1\u7406\u4e2d\u7684\u5e94\u7528\u3001\u6280\u672f\u7c7b\u522b\u3001\u4ea7\u54c1\u548c\u6570\u636e\u7c7b\u578b\uff0c\u5982\u751f\u6001\u77ac\u65f6\u8bc4\u4f30\u3001\u884c\u4e3a\u548c\u8bed\u97f3\u6570\u636e\u7684\u5206\u6790\u3002\u7ed3\u679c\u663e\u793a\uff0cAI\u5728\u75c7\u72b6\u76d1\u6d4b\u3001\u590d\u53d1\u98ce\u9669\u9884\u6d4b\u548c\u5eb7\u590d\u6cbb\u7597\u4e2d\u5177\u6709\u5e7f\u6cdb\u7684\u5e94\u7528\u6f5c\u529b\u3002\u6b64\u5916\uff0c\u672c\u7814\u7a76\u8fd8\u63a2\u8ba8\u4e86\u57fa\u4e8eAI\u7684\u65b0\u5174\u4ea7\u54c1\u3001\u6280\u672f\u548c\u5206\u6790\u65b9\u6cd5\uff0c\u5982\u793e\u4ea4\u5a92\u4f53\u5206\u6790\u3001\u4e25\u8083\u6e38\u620f\u548c\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u5eb7\u590d\u4e2d\u7684\u6f5c\u5728\u6311\u6218\u548c\u672a\u6765\u53d1\u5c55\u65b9\u5411\u3002\u603b\u7684\u6765\u8bf4\uff0c\u8fd9\u7bc7\u8bba\u6587\u7cfb\u7edf\u56de\u987e\u4e86AI\u5728\u7cbe\u795e\u5206\u88c2\u75c7\u5eb7\u590d\u7ba1\u7406\u4e2d\u7684\u5e94\u7528\uff0c\u5e76\u4e3a\u672a\u6765\u7684\u7814\u7a76\u8def\u5f84\u63d0\u4f9b\u4e86\u6709\u4ef7\u503c\u76
84\u89c1\u89e3\u548c\u5efa\u8bae\u3002|\n", "2405.10853": "|**2024-05-17**|**The Future of Large Language Model Pre-training is Federated**|Lorenzo Sani et.al.|[2405.10853](http://arxiv.org/abs/2405.10853)|null|## \u80cc\u666f \u751f\u6210\u5f0f\u9884\u8bad\u7ec3\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u56e0\u5176\u5728\u4f17\u591a\u4efb\u52a1\u4e0a\u7684\u51fa\u8272\u8868\u73b0\u800c\u5907\u53d7\u77a9\u76ee\uff0c\u8fd9\u5f97\u76ca\u4e8e\u5b83\u4eec\u6240\u63a5\u53d7\u7684\u6d77\u91cf\u8bad\u7ec3\u6570\u636e\u3002\u6839\u636e\u5df2\u5efa\u7acb\u7684\u89c4\u6a21\u6cd5\u5219\uff0cLLMs\u672a\u6765\u6027\u80fd\u7684\u63d0\u5347\u5728\u5f88\u5927\u7a0b\u5ea6\u4e0a\u4f9d\u8d56\u4e8e\u6211\u4eec\u80fd\u591f\u5229\u7528\u7684\u8ba1\u7b97\u548c\u6570\u636e\u8d44\u6e90\u3002\u8054\u90a6\u5b66\u4e60\uff08FL\uff09\u6709\u53ef\u80fd\u91ca\u653e\u5168\u7403\u5927\u90e8\u5206\u672a\u5145\u5206\u5229\u7528\u7684\u6570\u636e\u548c\u8ba1\u7b97\u80fd\u529b\uff0c\u8fd9\u4e9b\u662f\u5f53\u524d\u4ee5\u6570\u636e\u4e2d\u5fc3\u4e3a\u4e2d\u5fc3\u7684LLM\u8bad\u7ec3\u65b9\u6cd5\u6240\u5ffd\u89c6\u7684\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u7a33\u5065\u3001\u7075\u6d3b\u4e14\u53ef\u590d\u73b0\u7684FL\u65b9\u6cd5\uff0c\u65e8\u5728\u4fc3\u8fdb\u673a\u6784\u95f4\u7684\u5927\u89c4\u6a21\u534f\u4f5c\uff0c\u5171\u540c\u8bad\u7ec3LLMs\uff0c\u4ece\u800c\u52a8\u5458\u66f4\u591a\u7684\u8ba1\u7b97\u548c\u6570\u636e\u8d44\u6e90\uff0c\u751a\u81f3\u53ef\u80fd\u8fbe\u5230\u6216\u8d85\u8d8a\u4e2d\u5fc3\u5316\u7684\u6027\u80fd\u3002 ## \u4efb\u52a1 
\u6211\u4eec\u7684\u5de5\u4f5c\u5c55\u793a\u4e86\u4e00\u79cdFL\u8bad\u7ec3\u65b9\u6cd5\uff0c\u5b83\u80fd\u591f\u5728\u6709\u9650\u8d44\u6e90\u4e0b\u6269\u5c55\u5230\u767e\u4ebf\u5143\u7ea7\u7684\u8054\u90a6LLM\uff0c\u4f7f\u5f97\u62e5\u6709\u4e30\u5bcc\u6570\u636e\u7684\u5b9e\u4f53\u80fd\u591f\u6210\u4e3a\u9884\u8bad\u7ec3LLMs\u7684\u4e3b\u5bfc\u529b\u91cf\uff0c\u800c\u4e0d\u662f\u4ec5\u8ba9\u8ba1\u7b97\u8d44\u6e90\u4e30\u5bcc\u7684\u673a\u6784\u72ec\u5360\u9ccc\u5934\u3002\u8fd9\u79cd\u65b9\u6cd5\u5f3a\u8c03\u4e86\u8054\u90a6\u8bad\u7ec3\u7684\u89c4\u6a21\u6548\u76ca\uff0c\u5e76\u4e3a\u5b9e\u73b0\u8fd9\u4e00\u76ee\u6807\u63d0\u4f9b\u4e86\u4e00\u79cd\u5b9e\u7528\u8def\u5f84\u3002|\n", "2405.10825": "|**2024-05-17**|**Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities**|Hao Zhou et.al.|[2405.10825](http://arxiv.org/abs/2405.10825)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u56e0\u5176\u5353\u8d8a\u7684\u7406\u89e3\u548c\u63a8\u7406\u80fd\u529b\u800c\u5907\u53d7\u77a9\u76ee\uff0c\u5b83\u4eec\u5728\u5404\u4e2a\u9886\u57df\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\uff0c\u5c24\u5176\u5728\u7b2c\u516d\u4ee3\uff086G\uff09\u901a\u4fe1\u6280\u672f\u7684\u63a8\u52a8\u4e0b\u5c55\u73b0\u51fa\u4eba\u5de5\u667a\u80fd\u901a\u7528\u6027\uff08AGI\uff09\u7684\u6f5c\u529b\u3002\u672c\u7814\u7a76\u65e8\u5728\u5168\u9762\u6982\u8ff0LLM\u8d4b\u80fd\u7684\u7535\u4fe1\u7f51\u7edc\u3002\u9996\u5148\uff0c\u6211\u4eec\u6982\u8ff0\u4e86LLMs\u7684\u57fa\u7840\uff0c\u5305\u62ec\u6a21\u578b\u67b6\u6784\u3001\u9884\u8bad\u7ec3\u3001\u5fae\u8c03\u3001\u63a8\u7406\u4e0e\u5e94\u7528\u3001\u6a21\u578b\u8bc4\u4f30\uff0c\u4ee5\u53ca\u5728\u7535\u4fe1\u90e8\u7f72\u4e2d\u7684\u8fd0\u7528\u3002\u63a5\u7740\uff0c\u6211\u4eec\u5c06\u63a2\u8ba8LLM\u652f\u6301\u7684\u5173\u952e\u6280\u672f\u548c\u7535\u4fe1\u5e94\u7528\uff0c\u6d89\u53ca\u751f\u6210\u3001\u5206\u7c7b\u3001\u4f18\u5316\u548c\u9884\u6d4b\u95ee\u9898\
u3002\u751f\u6210\u5e94\u7528\u5305\u62ec\u7535\u4fe1\u9886\u57df\u77e5\u8bc6\u3001\u4ee3\u7801\u548c\u7f51\u7edc\u914d\u7f6e\u81ea\u52a8\u751f\u6210\u3002\u57fa\u4e8eLLM\u7684\u5206\u7c7b\u4efb\u52a1\u6db5\u76d6\u7f51\u7edc\u5b89\u5168\u3001\u6587\u672c\u3001\u56fe\u50cf\u548c\u6d41\u91cf\u5206\u7c7b\u3002\u6b64\u5916\uff0c\u6211\u4eec\u4ecb\u7ecd\u4e86\u5229\u7528LLMs\u7684\u81ea\u52a8\u5316\u4f18\u5316\u6280\u672f\uff0c\u5982\u5f3a\u5316\u5b66\u4e60\u7684\u5956\u52b1\u51fd\u6570\u8bbe\u8ba1\u548c\u53e3\u8bed\u5f3a\u5316\u5b66\u4e60\u3002\u5bf9\u4e8e\u9884\u6d4b\u95ee\u9898\uff0cLLMs\u53ef\u7528\u4e8e\u65f6\u95f4\u5e8f\u5217\u9884\u6d4b\u548c\u591a\u6a21\u6001\u7535\u4fe1\u9884\u6d4b\u3002\u6700\u540e\uff0c\u6211\u4eec\u6307\u51fa\u4e86LLM\u8d4b\u80fd\u7535\u4fe1\u7f51\u7edc\u6240\u9762\u4e34\u7684\u6311\u6218\uff0c\u5e76\u5c55\u671b\u4e86\u672a\u6765\u7684\u7814\u7a76\u65b9\u5411\u3002|\n", "2405.10808": "|**2024-05-17**|**ActiveLLM: Large Language Model-based Active Learning for Textual Few-Shot Scenarios**|Markus Bayer et.al.|[2405.10808](http://arxiv.org/abs/2405.10808)|null|\u4e3b\u52a8\u5b66\u4e60\u65e8\u5728\u901a\u8fc7\u4f18\u5148\u5904\u7406\u6700\u80fd\u63d0\u5347\u5b66\u4e60\u6548\u679c\u7684\u5b9e\u4f8b\u6765\u51cf\u5c11\u6807\u6ce8\u5de5\u4f5c\u91cf\u3002\u7136\u800c\uff0c\u8bb8\u591a\u4e3b\u52a8\u5b66\u4e60\u7b56\u7565\u9762\u4e34\u201c\u51b7\u542f\u52a8\u201d\u95ee\u9898\uff0c\u5373\u5728\u521d\u671f\u9700\u8981\u5927\u91cf\u6570\u636e\u624d\u80fd\u53d1\u6325\u6548\u80fd\uff0c\u8fd9\u9650\u5236\u4e86\u5b83\u4eec\u5728\u9884\u8bad\u7ec3\u6a21\u578b\uff08\u5982BERT\uff09\u4e0a\u7684\u5e94\u7528\uff0c\u8fd9\u4e9b\u6a21\u578b\u5728\u5c11\u91cf\u6837\u672c\u60c5\u51b5\u4e0b\u5df2\u8868\u73b0\u826f\u597d\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u4e3b\u52a8\u5b66\u4e60\u65b9\u6cd5\u2014\u2014ActiveLLM\uff0c\u5b83\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08\u5982GPT-4\u3001Llama 3\u548cMistral 
Large\uff09\u8fdb\u884c\u5b9e\u4f8b\u9009\u62e9\u3002\u5b9e\u9a8c\u8bc1\u660e\uff0cActiveLLM\u663e\u8457\u63d0\u9ad8\u4e86BERT\u5206\u7c7b\u5668\u5728\u5c11\u91cf\u6837\u672c\u60c5\u51b5\u4e0b\u7684\u6027\u80fd\uff0c\u8d85\u8d8a\u4e86\u4f20\u7edf\u4e3b\u52a8\u5b66\u4e60\u65b9\u6cd5\u548cSetFit\u7b49\u5c11\u6570\u6837\u672c\u5b66\u4e60\u65b9\u6cd5\u3002\u6b64\u5916\uff0cActiveLLM\u8fd8\u80fd\u6269\u5c55\u5230\u975e\u5c11\u91cf\u6837\u672c\u573a\u666f\uff0c\u652f\u6301\u8fed\u4ee3\u9009\u62e9\uff0c\u4ece\u800c\u5e2e\u52a9\u5176\u4ed6\u4e3b\u52a8\u5b66\u4e60\u7b56\u7565\u514b\u670d\u51b7\u542f\u52a8\u96be\u9898\u3002\u7ed3\u679c\u8868\u660e\uff0cActiveLLM\u4e3a\u6539\u5584\u4e0d\u540c\u5b66\u4e60\u73af\u5883\u4e2d\u7684\u6a21\u578b\u6027\u80fd\u63d0\u4f9b\u4e86\u6709\u524d\u666f\u7684\u89e3\u51b3\u65b9\u6848\u3002|\n", "2405.10745": "|**2024-05-17**|**Empowering Small-Scale Knowledge Graphs: A Strategy of Leveraging General-Purpose Knowledge Graphs for Enriched Embeddings**|Albert Sawczyn et.al.|[2405.10745](http://arxiv.org/abs/2405.10745)|null|### \u7ffb\u8bd1 
\u77e5\u8bc6\u5bc6\u96c6\u578b\u4efb\u52a1\u5bf9\u673a\u5668\u5b66\u4e60\uff08ML\uff09\u6280\u672f\u63d0\u51fa\u4e86\u4e25\u5cfb\u6311\u6218\u3002\u901a\u5e38\u91c7\u7528\u7684\u65b9\u6cd5\uff0c\u5982\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\uff0c\u5728\u5904\u7406\u8fd9\u7c7b\u4efb\u52a1\u65f6\u5f80\u5f80\u5b58\u5728\u5c40\u9650\u6027\u3002\u7136\u800c\uff0c\u4eba\u4eec\u5df2\u7ecf\u52aa\u529b\u901a\u8fc7\u77e5\u8bc6\u56fe\u8c31\uff08KG\uff09\u6765\u5f25\u8865\u8fd9\u4e9b\u4e0d\u8db3\uff0c\u5c24\u5176\u662f\u901a\u8fc7\u5c06\u5c0f\u89c4\u6a21\u7684\u9886\u57df\u7279\u5b9aKG\u4e0e\u901a\u7528KG\u76f8\u7ed3\u5408\u3002\u5c3d\u7ba1KG\u5728\u77e5\u8bc6\u8868\u793a\u65b9\u9762\u5177\u6709\u4f18\u52bf\uff0c\u4f46\u6784\u5efa\u5b83\u4eec\u7684\u6210\u672c\u53ef\u80fd\u963b\u788d\u4e86\u5e7f\u6cdb\u7684\u7814\u7a76\u548c\u5e94\u7528\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u6846\u67b6\uff0c\u65e8\u5728\u901a\u8fc7\u94fe\u63a5\u5230\u5927\u89c4\u6a21\u901a\u7528KG\u6765\u63d0\u5347\u5c0f\u578b\u9886\u57df\u7279\u5b9aKG\u5d4c\u5165\u7684\u5b66\u4e60\u6027\u80fd\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u8fd9\u79cd\u65b9\u6cd5\u5e26\u6765\u4e86\u663e\u8457\u7684\u63d0\u5347\uff0c\u4f8b\u5982\uff0cHits@10\u6307\u6807\u6700\u9ad8\u63d0\u9ad8\u4e8644%\u3002\u8fd9\u4e00\u76f8\u5bf9\u672a\u88ab\u5145\u5206\u63a2\u7d22\u7684\u7814\u7a76\u65b9\u5411\u6709\u671b\u4fc3\u8fdbKG\u5728\u77e5\u8bc6\u5bc6\u96c6\u578b\u4efb\u52a1\u4e2d\u7684\u66f4\u9891\u7e41\u8fd0\u7528\uff0c\u4ece\u800c\u4ea7\u751f\u66f4\u4e3a\u7a33\u5065\u3001\u53ef\u9760\u7684ML\u89e3\u51b3\u65b9\u6848\uff0c\u5b83\u4eec\u76f8\u8f83\u4e8e\u6d41\u884c\u4f46\u6613\u51fa\u9519\u7684LLM\u65b9\u6cd5\u66f4\u5177\u53ef\u9760\u6027\u3002\u5173\u952e\u8bcd\uff1a\u77e5\u8bc6\u56fe\u8c31\u3001\u77e5\u8bc6\u56fe\u8c31\u8865\u5168\u3001\u5b9e\u4f53\u5bf9\u9f50\u3001\u8868\u793a\u5b66\u4e60\u3001\u673a\u5668\u5b66\u4e60|\n", "2405.10739": "|**2024-05-17**|**Efficient Multimodal Large Language 
Models: A Survey**|Yizhang Jin et.al.|[2405.10739](http://arxiv.org/abs/2405.10739)|**[link](https://github.com/lijiannuist/efficient-multimodal-llms-survey)**|**\u5728\u8fc7\u53bb\u4e00\u5e74\u91cc\uff0c\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08Multimodal Large Language Models\uff0cMLLMs\uff09\u5728\u8bf8\u5982\u89c6\u89c9\u95ee\u7b54\u3001\u89c6\u89c9\u7406\u89e3\u548c\u63a8\u7406\u7b49\u4efb\u52a1\u4e0a\u5c55\u73b0\u51fa\u5353\u8d8a\u6027\u80fd\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u6a21\u578b\u7684\u5e9e\u5927\u89c4\u6a21\u548c\u9ad8\u6602\u7684\u8bad\u7ec3\u4e0e\u63a8\u7406\u6210\u672c\u9650\u5236\u4e86\u5b83\u4eec\u5728\u5b66\u672f\u754c\u548c\u5de5\u4e1a\u754c\u7684\u5e7f\u6cdb\u5e94\u7528\u3002\u56e0\u6b64\uff0c\u7814\u7a76\u9ad8\u6548\u4e14\u8f7b\u91cf\u7ea7\u7684MLLM\u5177\u6709\u5de8\u5927\u7684\u6f5c\u529b\uff0c\u7279\u522b\u662f\u5728\u8fb9\u7f18\u8ba1\u7b97\u73af\u5883\u4e2d\u3002\u672c\u7efc\u8ff0\u5168\u9762\u7cfb\u7edf\u5730\u56de\u987e\u4e86\u5f53\u524d\u9ad8\u6548MLLM\u7684\u7814\u7a76\u73b0\u72b6\u3002\u6211\u4eec\u6982\u8ff0\u4e86\u4ee3\u8868\u6027\u9ad8\u6548\u6a21\u578b\u7684\u53d1\u5c55\u5386\u7a0b\uff0c\u603b\u7ed3\u4e86\u6709\u6548\u7ed3\u6784\u548c\u7b56\u7565\u7684\u7814\u7a76\u72b6\u6001\uff0c\u4ee5\u53ca\u5176\u5b9e\u7528\u5e94\u7528\u3002\u6700\u540e\uff0c\u6211\u4eec\u8ba8\u8bba\u4e86\u5f53\u524d\u9ad8\u6548MLLM\u7814\u7a76\u7684\u5c40\u9650\uff0c\u5e76\u5c55\u671b\u4e86\u6709\u524d\u666f\u7684\u672a\u6765\u53d1\u5c55\u65b9\u5411\u3002\u5982\u9700\u66f4\u591a\u4fe1\u606f\uff0c\u8bf7\u53c2\u8003\u6211\u4eec\u7684GitHub\u4ed3\u5e93\uff1ahttps://github.com/lijiannuist/Efficient-Multimodal-LLMs-Survey\u3002**|\n", "2405.10725": "|**2024-05-17**|**INDUS: Effective and Efficient Language Models for Scientific Applications**|Bishwaranjan Bhattacharjee 
et.al.|[2405.10725](http://arxiv.org/abs/2405.10725)|null|\u5927\u578b\u901a\u7528\u8bed\u8a00\u6a21\u578b\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\u3002\u7136\u800c\uff0c\u5148\u524d\u7684\u7814\u7a76\u8868\u660e\uff0c\u9488\u5bf9\u7279\u5b9a\u9886\u57df\u7684\u8bad\u7ec3\u6570\u636e\u53ef\u4ee5\u4f7f\u6a21\u578b\u5728\u4e13\u4e1a\u4efb\u52a1\u4e0a\u8868\u73b0\u66f4\u4f73\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5f00\u53d1\u4e86INDUS\uff0c\u4e00\u5957\u4e13\u4e3a\u5730\u7403\u79d1\u5b66\u3001\u751f\u7269\u5b66\u3001\u7269\u7406\u5b66\u3001\u592a\u9633\u7269\u7406\u3001\u884c\u661f\u79d1\u5b66\u548c\u5929\u6587\u5b66\u9886\u57df\u8bbe\u8ba1\u7684\u5b9a\u5236\u5316\u8bed\u8a00\u6a21\u578b\u3002\u8fd9\u4e9b\u6a21\u578b\u57fa\u4e8e\u7cbe\u5fc3\u6311\u9009\u7684\u79d1\u5b66\u8bed\u6599\u5e93\uff0c\u5305\u62ec\uff1a\uff081\uff09\u4e00\u4e2a\u4f7f\u7528\u9886\u57df\u4e13\u7528\u8bcd\u6c47\u548c\u6570\u636e\u96c6\u8bad\u7ec3\u7684\u7f16\u7801\u5668\uff0c\u7528\u4e8e\u63d0\u5347\u81ea\u7136\u8bed\u8a00\u7406\u89e3\u4efb\u52a1\u7684\u8868\u73b0\uff1b\uff082\uff09\u4e00\u4e2a\u57fa\u4e8e\u5bf9\u6bd4\u5b66\u4e60\u7684\u901a\u7528\u6587\u672c\u5d4c\u5165\u6a21\u578b\uff0c\u5229\u7528\u591a\u6e90\u6570\u636e\u96c6\u8fdb\u884c\u8bad\u7ec3\uff0c\u4ee5\u4f18\u5316\u4fe1\u606f\u68c0\u7d22\u4efb\u52a1\uff1b\uff083\uff09\u901a\u8fc7\u77e5\u8bc6\u84b8\u998f\u6280\u672f\u7f29\u5c0f\u89c4\u6a21\u7684\u6a21\u578b\uff0c\u9002\u7528\u4e8e\u5bf9\u5ef6\u8fdf\u548c\u8d44\u6e90\u6709\u9650\u7684\u5e94\u7528\u3002\u6b64\u5916\uff0c\u6211\u4eec\u521b\u5efa\u4e86\u4e09\u4e2a\u65b0\u7684\u79d1\u5b66\u57fa\u51c6\u6570\u636e\u96c6\uff1aCLIMATE-CHANGE-NER\uff08\u5b9e\u4f53\u8bc6\u522b\uff09\u3001NASA-QA\uff08\u62bd\u53d6\u5f0f\u95ee\u7b54\uff09\u548cNASA-IR\uff08\u4fe1\u606f\u68c0\u7d22\uff09\uff0c\u4ee5\u63a8\u52a8\u8de8\u5b66\u79d1\u9886\u57df\u7684\u7814\u7a76\u8fdb\u5c55\u3002\u6700\u540e\uff0c\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u6a21
\u578b\u5728\u65b0\u4efb\u52a1\u548c\u76f8\u5173\u9886\u57df\u73b0\u6709\u57fa\u51c6\u4efb\u52a1\u4e0a\u5747\u4f18\u4e8e\u901a\u7528\u7f16\u7801\u5668\uff08\u5982RoBERTa\uff09\u548c\u73b0\u6709\u7684\u9886\u57df\u7279\u5b9a\u7f16\u7801\u5668\uff08\u5982SciBERT\uff09\u3002|\n", "2405.12217": "|**2024-05-20**|**Adapting Large Multimodal Models to Distribution Shifts: The Role of In-Context Learning**|Guanglin Zhou et.al.|[2405.12217](http://arxiv.org/abs/2405.12217)|**[link](https://github.com/jameszhou-gl/icl-distribution-shift)**|**\u8fd1\u671f\u7684\u7814\u7a76\u8868\u660e\uff0c\u5927\u578b\u591a\u6a21\u6001\u6a21\u578b\uff08LMMs\uff09\u5728\u5e94\u5bf9\u81ea\u7136\u5206\u5e03\u53d8\u5316\u65f6\u8868\u73b0\u51fa\u6781\u9ad8\u7684\u9c81\u68d2\u6027\uff0c\u5e38\u5e38\u8d85\u8d8a\u5148\u524d\u7684\u57fa\u51c6\u3002\u7136\u800c\uff0c\u9886\u57df\u7279\u5b9a\u7684\u9002\u5e94\u4ecd\u7136\u662f\u5fc5\u8981\u7684\uff0c\u5c24\u5176\u662f\u5728\u533b\u7597\u7b49\u4e13\u4e1a\u9886\u57df\u3002\u9274\u4e8eLMMs\u5e9e\u5927\u7684\u53c2\u6570\u7a7a\u95f4\u4f7f\u5176\u5fae\u8c03\u4e0d\u5207\u5b9e\u9645\uff0c\u672c\u7814\u7a76\u805a\u7126\u4e8e\u63a2\u7d22\u4e0a\u4e0b\u6587\u5b66\u4e60\uff08ICL\uff09\u4f5c\u4e3a\u4e00\u79cd\u589e\u5f3aLMM\u9002\u5e94\u6027\u7684\u6709\u6548\u65b9\u6cd5\u3002\u6211\u4eec\u53d1\u73b0\uff0cICL\u7684\u6210\u529f\u5728\u5f88\u5927\u7a0b\u5ea6\u4e0a\u4f9d\u8d56\u4e8e\u793a\u4f8b\u7684\u9009\u62e9\uff0c\u8fd9\u4e0e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7c7b\u4f3c\uff0c\u4f46\u5bf9\u9762\u4e34\u5206\u5e03\u53d8\u5316\u7684LMMs\u63d0\u51fa\u4e86\u72ec\u7279\u6311\u6218\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u8bc4\u4f30\u4e86\u4e00\u79cd\u65e0\u76d1\u7763\u7684ICL\u65b9\u6cd5\u2014\u2014TopKNearestPR\uff0c\u8be5\u65b9\u6cd5\u901a\u8fc7\u7279\u5f81\u76f8\u4f3c\u6027\u8fdb\u884c\u6700\u8fd1\u793a\u4f8b\u641c\u7d22\u6765\u9009\u62e9\u793a\u4f8b\u3002\u7814\u7a76\u63ed\u793a\u4e86\u8fd9\u79cd\u65b9\u6cd5\u5728\u5904\u7406\u5206\u5e03\u8f6c\u79fb\u573a\u666f\u4e0b\u7
684\u89c6\u89c9\u7f16\u7801\u5668\u7f3a\u9677\u5bf9\u5176\u6548\u679c\u7684\u9650\u5236\u3002 \u4e3a\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\u2014\u2014InvariantSelectPR\uff0c\u5b83\u5229\u7528\u7c7b\u6761\u4ef6\u5bf9\u6bd4\u4e0d\u53d8\u6027\uff08CCI\uff09\u6765\u63d0\u5347\u9884\u8bad\u7ec3\u89c6\u89c9\u7f16\u7801\u5668\u7684\u7a33\u5065\u6027\u3002CCI\u901a\u8fc7\u589e\u5f3a\u4e0d\u540c\u7c7b\u522b\u95f4\u7684\u533a\u5206\u5ea6\u5e76\u786e\u4fdd\u5bf9\u9886\u57df\u7279\u5b9a\u53d8\u5316\u7684\u4e0d\u53d8\u6027\uff0c\u63d0\u9ad8\u4e86\u7f16\u7801\u5668\u8bc6\u522b\u548c\u68c0\u7d22\u6700\u6709\u4fe1\u606f\u4ef7\u503c\u793a\u4f8b\u7684\u80fd\u529b\u3002\u8fd9\u79cd\u65b9\u6cd5\u6709\u52a9\u4e8e\u5f15\u5bfcLMM\u9002\u5e94\u65b0\u7684\u67e5\u8be2\u6837\u672c\uff0c\u5373\u4f7f\u5728\u4e0d\u540c\u7684\u5206\u5e03\u4e0b\u4e5f\u662f\u5982\u6b64\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cInvariantSelectPR\u663e\u8457\u63d0\u9ad8\u4e86LMM\u7684\u9002\u5e94\u6027\uff0c\u5728Camelyon17\u548cHAM10000\u57fa\u51c6\u6570\u636e\u96c6\u4e0a\u76847-shot\u4efb\u52a1\u4e2d\uff0c\u5206\u522b\u5b9e\u73b0\u4e8634.2%\u548c16.9%\u7684\u51c6\u786e\u7387\u63d0\u5347\uff0c\u76f8\u5bf9\u4e8e\u96f6-shot\u6027\u80fd\uff0c\u8fd9\u662f\u663e\u8457\u7684\u8fdb\u6b65\u3002**|\n", "2405.12209": "|**2024-05-20**|**MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark**|Hongwei Liu 
et.al.|[2405.12209](http://arxiv.org/abs/2405.12209)|**[link](https://github.com/open-compass/mathbench)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u6700\u65b0\u8fdb\u5c55\u5728\u6570\u5b66\u9886\u57df\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\uff0c\u4f20\u7edf\u7684\u6570\u5b66\u57fa\u51c6\u5982GSM8k\u5728\u5168\u9762\u8bc4\u4ef7\u8fd9\u4e9b\u6a21\u578b\u7684\u6570\u5b66\u80fd\u529b\u65b9\u9762\u5b58\u5728\u5c40\u9650\u3002\u4e3a\u4e86\u5f25\u8865\u8fd9\u4e00\u4e0d\u8db3\uff0c\u6211\u4eec\u63d0\u51fa\u4e86MathBench\uff0c\u8fd9\u662f\u4e00\u4e2a\u5168\u65b0\u57fa\u51c6\uff0c\u65e8\u5728\u4e25\u683c\u8bc4\u4f30\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u6570\u5b66\u80fd\u529b\u3002MathBench\u8986\u76d6\u5e7f\u6cdb\u7684\u6570\u5b66\u5b66\u79d1\uff0c\u5bf9\u7406\u8bba\u7406\u89e3\u548c\u5b9e\u9645\u95ee\u9898\u89e3\u51b3\u80fd\u529b\u8fdb\u884c\u8be6\u5c3d\u8bc4\u4f30\u3002\u5b83\u5206\u4e3a\u4e94\u4e2a\u9636\u6bb5\uff0c\u4ece\u57fa\u7840\u7b97\u672f\u5230\u5927\u5b66\u6570\u5b66\uff0c\u7ed3\u6784\u4e0a\u8bbe\u8ba1\u7528\u4e8e\u8003\u5bdf\u6a21\u578b\u5728\u4e0d\u540c\u6df1\u5ea6\u77e5\u8bc6\u7684\u7406\u89e3\u3002\u6bcf\u4e2a\u9636\u6bb5\u5305\u62ec\u7406\u8bba\u95ee\u9898\u548c\u5e94\u7528\u9898\uff0c\u4ee5\u8861\u91cf\u6a21\u578b\u7684\u6570\u5b66\u719f\u7ec3\u5ea6\u53ca\u5176\u5728\u5b9e\u9645\u60c5\u5883\u4e2d\u5e94\u7528\u6982\u5ff5\u7684\u80fd\u529b\u3002MathBench\u7684\u76ee\u6807\u662f\u63d0\u5347\u5bf9LLMs\u6570\u5b66\u80fd\u529b\u7684\u8bc4\u4ef7\uff0c\u63d0\u4f9b\u5bf9\u5176\u77e5\u8bc6\u7406\u89e3\u6c34\u5e73\u548c\u95ee\u9898\u89e3\u51b3\u6280\u80fd\u7684\u7ec6\u81f4\u89c6\u89d2\uff0c\u540c\u65f6\u652f\u6301\u53cc\u8bed\u73af\u5883\u3002\u8be5\u9879\u76ee\u5df2\u53d1\u5e03\u5728https://github.com/open-compass/MathBench\u3002**|\n", "2405.12195": "|**2024-05-20**|**Developers' Perceptions on the Impact of ChatGPT in Software Development: A Survey**|Thiago S. 
Vaillant et.al.|[2405.12195](http://arxiv.org/abs/2405.12195)|**[link](https://github.com/gpt-impact/Paper-content)**|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08\u5982ChatGPT\uff09\u7684\u4e0d\u65ad\u53d1\u5c55\uff0c\u5176\u5f3a\u5927\u7684\u81ea\u7136\u8bed\u8a00\u5904\u7406\u80fd\u529b\u548c\u5e7f\u6cdb\u5e94\u7528\u5f15\u8d77\u4e86\u5e7f\u6cdb\u5173\u6ce8\u3002\u5c3d\u7ba1\u4eba\u5de5\u667a\u80fd\uff08AI\uff09\u4e0e\u8f6f\u4ef6\u5de5\u7a0b\uff08SE\uff09\u7684\u878d\u5408\u8d8b\u52bf\u65e5\u76ca\u660e\u663e\uff0c\u4f46\u5173\u4e8e\u8fd9\u79cd\u878d\u5408\u5982\u4f55\u5f71\u54cd\u8f6f\u4ef6\u5f00\u53d1\u5b9e\u8df5\u548c\u8ba4\u77e5\u7684\u7814\u7a76\u4ecd\u663e\u4e0d\u8db3\u3002\u4e3a\u4e86\u63ed\u793a\u5c06AI\u9a71\u52a8\u5de5\u5177\uff0c\u5982ChatGPT\uff0c\u878d\u5165\u8f6f\u4ef6\u5f00\u53d1\u8fc7\u7a0b\u7684\u5f71\u54cd\u548c\u6311\u6218\uff0c\u6211\u4eec\u8fdb\u884c\u4e86\u4e00\u9879\u8c03\u67e5\uff0c\u9488\u5bf9207\u540d\u8f6f\u4ef6\u5f00\u53d1\u8005\u8fdb\u884c\u4e86\u7814\u7a76\u3002\u8c03\u67e5\u5185\u5bb9\u5305\u62ecChatGPT\u5bf9\u8f6f\u4ef6\u8d28\u91cf\u3001\u751f\u4ea7\u529b\u4ee5\u53ca\u5f00\u53d1\u8005\u5de5\u4f5c\u6ee1\u610f\u5ea6\u7684\u5f71\u54cd\uff0c\u540c\u65f6\u8fd8\u63a2\u8ba8\u4e86\u4ed6\u4eec\u5bf9\u672a\u6765ChatGPT\u5e94\u7528\u7684\u9884\u671f\u3001\u5bf9\u53ef\u80fd\u7684\u5de5\u4f5c\u5c97\u4f4d\u66ff\u4ee3\u7684\u62c5\u5fe7\uff0c\u4ee5\u53ca\u5bf9\u76d1\u7ba1\u63aa\u65bd\u7684\u770b\u6cd5\u3002|\n", "2405.12174": "|**2024-05-20**|**CT-Eval: Benchmarking Chinese Text-to-Table Performance in Large Language Models**|Haoxiang Shi 
et.al.|[2405.12174](http://arxiv.org/abs/2405.12174)|null|\u8be5\u8bba\u6587\u4ecb\u7ecd\u4e86\u4e00\u4e2a\u540d\u4e3aCT-Eval\u7684\u4e2d\u6587\u6587\u672c\u8f6c\u8868\u683c\u6570\u636e\u96c6\uff0c\u65e8\u5728\u8861\u91cf\u5927\u8bed\u8a00\u6a21\u578b\u5728\u975e\u82f1\u8bed\u8bed\u8a00\u73af\u5883\u4e0b\u7684\u6587\u672c\u8f6c\u8868\u683c\u4efb\u52a1\u6027\u80fd\u3002\u7531\u4e8e\u73b0\u6709\u82f1\u6587\u6587\u672c\u8f6c\u8868\u683c\u6570\u636e\u96c6\u4e3b\u8981\u9762\u5411\u82f1\u8bed\uff0cCT-Eval\u586b\u8865\u4e86\u8fd9\u4e00\u7a7a\u767d\uff0c\u9009\u62e9\u4e86\u4e00\u79cd\u6d41\u884c\u7684\u591a\u5b66\u79d1\u4e2d\u6587\u5728\u7ebf\u767e\u79d1\u4f5c\u4e3a\u6765\u6e90\uff0c\u6db5\u76d6\u4e8628\u4e2a\u9886\u57df\u4ee5\u4fdd\u8bc1\u6570\u636e\u591a\u6837\u6027\u3002\u4e3a\u4e86\u51cf\u5c11\u6570\u636e\u865a\u6784\uff08hallucination\uff09\u95ee\u9898\uff0c\u7814\u7a76\u8005\u9996\u5148\u8bad\u7ec3\u4e86\u4e00\u4e2a\u8bed\u8a00\u6a21\u578b\u6765\u8bc6\u522b\u5e76\u8fc7\u6ee4\u6389\u5b58\u5728\u865a\u6784\u95ee\u9898\u7684\u6837\u672c\uff0c\u7136\u540e\u4eba\u5de5\u6807\u6ce8\u9a8c\u8bc1\u96c6\u548c\u6d4b\u8bd5\u96c6\u4e2d\u7684\u9519\u8bef\u3002\u6700\u7ec8\uff0cCT-Eval\u5305\u542b\u4e86\u5927\u7ea688,600\u4e2a\u4efb\u52a1\u6837\u672c\u3002\u901a\u8fc7CT-Eval\uff0c\u7814\u7a76\u8005\u8bc4\u4f30\u4e86\u5f00\u6e90\u548c\u95ed\u6e90\u5927\u8bed\u8a00\u6a21\u578b\uff08\u5982GPT-4\uff09\u7684\u8868\u73b0\uff0c\u7ed3\u679c\u663e\u793a\u96f6-shot\u6a21\u5f0f\u4e0b\u8fd9\u4e9b\u6a21\u578b\u4e0e\u4eba\u7c7b\u5224\u65ad\u4ecd\u6709\u663e\u8457\u5dee\u8ddd\u3002\u7ecf\u8fc7\u5fae\u8c03\u540e\uff0c\u5f00\u6e90\u6a21\u578b\u5728\u6587\u672c\u8f6c\u8868\u683c\u80fd\u529b\u4e0a\u6709\u4e86\u663e\u8457\u63d0\u5347\uff0c\u5927\u5e45\u8d85\u8d8a\u4e86GPT-4\u3002\u603b\u4e4b\uff0cCT-Eval\u4e0d\u4ec5\u4e3a\u8bc4\u4f30\u548c\u7406\u89e3\u73b0\u6709\u5927\u8bed\u8a00\u6a21\u578b\u7684\u4e2d\u6587\u6587\u672c\u8f6c\u8868\u683c\u80fd\u529b\u63d0\u4f9b\u4e86\u6709\u4ef7\u503c\u7684\u5de5\u517
7\uff0c\u4e5f\u4e3a\u63d0\u5347\u8fd9\u7c7b\u6a21\u578b\u5728\u8fd9\u9879\u4efb\u52a1\u4e0a\u7684\u6027\u80fd\u63d0\u4f9b\u4e86\u5b9d\u8d35\u8d44\u6e90\u3002|\n", "2405.12163": "|**2024-05-20**|**Fennec: Fine-grained Language Model Evaluation and Correction Extended through Branching and Bridging**|Xiaobo Liang et.al.|[2405.12163](http://arxiv.org/abs/2405.12163)|**[link](https://github.com/dropreg/fennec)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u8fc5\u901f\u53d1\u5c55\uff0c\u5b83\u4eec\u5728\u4f17\u591a\u73b0\u5b9e\u4efb\u52a1\u4e2d\u7684\u5e94\u7528\u65e5\u76ca\u5e7f\u6cdb\uff0c\u4e3b\u8981\u76ee\u6807\u662f\u7b26\u5408\u4eba\u7c7b\u7684\u610f\u56fe\u3002\u7136\u800c\uff0c\u7406\u89e3\u4eba\u7c7b\u610f\u56fe\u7684\u590d\u6742\u6027\u4f7f\u5f97\u4f9d\u8d56\u4e8e\u8017\u65f6\u7684\u4eba\u5de5\u8bc4\u4f30\u6210\u4e3a\u5fc5\u8981\u3002\u4e3a\u4e86\u7f13\u89e3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63a2\u8ba8\u4e86\u5229\u7528\u5f00\u6e90\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4f5c\u4e3a\u8bc4\u4f30\u8005\u7684\u8d8b\u52bf\uff0c\u7279\u522b\u662f\u5728GPT-4\u7684\u6d41\u884c\u80cc\u666f\u4e0b\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\\textbf{Fennec}\u7684\u6846\u67b6\uff0c\u4e13\u6ce8\u4e8e\\textbf{F}ine-grained \\textbf{E}valuation\uff08\u7ec6\u81f4\u8bc4\u4f30\uff09\u548c\\textbf{N}eeded 
\\textbf{E}xtension\uff08\u5fc5\u8981\u6269\u5c55\uff09\u901a\u8fc7\u5206\u652f\uff08Branching\uff09\u548c\u8fde\u63a5\uff08Bridging\uff09\u3002\u5206\u652f\u64cd\u4f5c\u5c06\u8bc4\u4f30\u4efb\u52a1\u5206\u89e3\u4e3a\u4e0d\u540c\u7ef4\u5ea6\u548c\u7c92\u5ea6\uff0c\u4ece\u800c\u51cf\u8f7b\u8bc4\u4f30\u6311\u6218\u3002\u540c\u65f6\uff0c\u8fde\u63a5\u64cd\u4f5c\u878d\u5408\u4e86\u591a\u6837\u5316\u7684\u8bad\u7ec3\u6570\u636e\u96c6\uff0c\u589e\u52a0\u4e86\u8bc4\u4f30\u4efb\u52a1\u7684\u591a\u6837\u6027\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u76847B\u6a21\u578b\u5728\u5404\u79cd\u5e38\u7528\u57fa\u51c6\u4e0a\u7684\\textit{\u4e00\u81f4\u6027}\u548c\\textit{\u4e00\u81f4\u540c\u610f}\u6027\u80fd\u5747\u4f18\u4e8e\u5f00\u6e90\u7684\u66f4\u5927\u89c4\u6a21\u8bc4\u4f30\u6a21\u578b\uff0c\u63a5\u8fd1GPT-4\u7684\u8868\u73b0\u3002\u6211\u4eec\u5229\u7528\u6a21\u578b\u7684\u7cbe\u7ec6\u6821\u6b63\u529f\u80fd\u6539\u8fdb\u591a\u4e2a\u6a21\u578b\u54cd\u5e94\uff0c\u7ed3\u679c\u663e\u793a\uff0c\u8fd9\u79cd\u4f18\u5316\u63d0\u5347\u4e86\u54cd\u5e94\u8d28\u91cf\uff0c\u5728MT-Bench\u4e0a\u63d0\u9ad8\u4e861-2\u5206\u3002\u6211\u4eec\u7684\u4ee3\u7801\u5df2\u5728GitHub\u4e0a\u5f00\u6e90\\footnote{\\url{https://github.com/dropreg/Fennec}}\u3002**|\n", "2405.12147": "|**2024-05-20**|**Eliciting Problem Specifications via Large Language Models**|Robert E. 
Wray et.al.|[2405.12147](http://arxiv.org/abs/2405.12147)|null|\u8fd9\u7bc7\u8bba\u6587\u63a2\u8ba8\u4e86\u5982\u4f55\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u8ba4\u77e5\u7cfb\u7edf\u4e2d\u5b9e\u73b0\u95ee\u9898\u5b9a\u4e49\u7684\u8f6c\u5316\u3002\u901a\u5e38\u60c5\u51b5\u4e0b\uff0c\u4eba\u7c7b\u9700\u8981\u5c06\u95ee\u9898\u63cf\u8ff0\u8f6c\u5316\u4e3a\u8ba4\u77e5\u7cfb\u7edf\u80fd\u7406\u89e3\u7684\u5f62\u5f0f\u3002\u7814\u7a76\u8005\u5c55\u793a\u4e86LLMs\u80fd\u591f\u5904\u7406\u81ea\u7136\u8bed\u8a00\u4e2d\u5b9a\u4e49\u7684\u95ee\u9898\u7c7b\u522b\uff0c\u5e76\u5c06\u5176\u8f6c\u6362\u4e3a\u534a\u5f62\u5f0f\u5316\u89c4\u683c\uff0c\u8fd9\u6837\u73b0\u6709\u63a8\u7406\u548c\u5b66\u4e60\u7cfb\u7edf\u53ef\u4ee5\u89e3\u51b3\u8fd9\u7c7b\u95ee\u9898\u7684\u5177\u4f53\u5b9e\u4f8b\u3002\u4ed6\u4eec\u8bbe\u8ba1\u4e86\u4e00\u79cd\u7531LLM\u9a71\u52a8\u7684\u8ba4\u77e5\u4efb\u52a1\u5206\u6790\u5e08\u4ee3\u7406\uff0c\u8fd9\u79cd\u7cfb\u7edf\u80fd\u591f\u6839\u636e\u81ea\u7136\u8bed\u8a00\u63cf\u8ff0\u7684\u4efb\u52a1\u751f\u6210\u95ee\u9898\u7a7a\u95f4\u7684\u5b9a\u4e49\u3002LLM\u63d0\u793a\u6e90\u81ea\u4eba\u5de5\u667a\u80fd\u6587\u732e\u4e2d\u7684\u95ee\u9898\u7a7a\u95f4\u6982\u5ff5\u548c\u901a\u7528\u95ee\u9898\u89e3\u51b3\u7b56\u7565\uff08\u5982\u6ce2\u5229\u4e9a\u7684\u300a\u5982\u4f55\u89e3\u51b3\u95ee\u9898\u300b\uff09\u3002\u968f\u540e\uff0c\u8ba4\u77e5\u7cfb\u7edf\u5229\u7528\u8fd9\u4e9b\u95ee\u9898\u7a7a\u95f4\u89c4\u683c\uff0c\u7ed3\u5408\u9886\u57df\u901a\u7528\u7684\u89e3\u51b3\u95ee\u9898\u7b56\u7565\uff08\u5982\u641c\u7d22\uff09\uff0c\u6765\u89e3\u51b3\u8be5\u7c7b\u95ee\u9898\u7684\u4e0d\u540c\u5b9e\u4f8b\u3002\u8fd9\u4e00\u521d\u6b65\u7ed3\u679c\u8868\u660e\uff0c\u901a\u8fc7\u6d88\u9664\u95ee\u9898\u8868\u8ff0\u7684\u4e2d\u4ecb\u8fc7\u7a0b\uff0cLLMs\u6709\u53ef\u80fd\u52a0\u901f\u8ba4\u77e5\u7cfb\u7edf\u7684\u7814\u7a76\uff0c\u540c\u65f6\u4fdd\u6301\u5176\u6838\u5fc3\u80fd\u529b\uff0c\u5982\u7a33\u5065\u7684\u63a8\u7406\u548c\u572
8\u7ebf\u5b66\u4e60\u3002|\n", "2405.12130": "|**2024-05-20**|**MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning**|Ting Jiang et.al.|[2405.12130](http://arxiv.org/abs/2405.12130)|**[link](https://github.com/kongds/mora)**|**\u4f4e\u79e9\u9002\u5e94\u662f\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4e2d\u6d41\u884c\u7684\u53c2\u6570\u9ad8\u6548\u5fae\u8c03\u65b9\u6cd5\u3002\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u7814\u7a76\u4e86\u4f4e\u79e9\u66f4\u65b0\uff08\u5982LoRA\u5b9e\u73b0\uff09\u7684\u5f71\u54cd\u3002\u6211\u4eec\u7684\u53d1\u73b0\u6307\u51fa\uff0c\u8fd9\u79cd\u673a\u5236\u53ef\u80fd\u9650\u5236\u4e86\u5927\u8bed\u8a00\u6a21\u578b\u5b66\u4e60\u548c\u8bb0\u5fc6\u65b0\u77e5\u8bc6\u7684\u80fd\u529b\u3002\u53d7\u6b64\u542f\u53d1\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u65b9\u6cd5MoRA\uff0c\u5b83\u5229\u7528\u5e73\u65b9\u77e9\u9635\u5b9e\u73b0\u9ad8\u79e9\u66f4\u65b0\uff0c\u540c\u65f6\u4fdd\u6301\u4e0eLoRA\u76f8\u540c\u7684\u53ef\u8bad\u7ec3\u53c2\u6570\u6570\u91cf\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u76f8\u5e94\u7684\u975e\u53c2\u6570\u8fd0\u7b97\u5668\uff0c\u4ee5\u964d\u4f4e\u8f93\u5165\u7ef4\u5ea6\u5e76\u589e\u52a0\u8f93\u51fa\u7ef4\u5ea6\u5904\u7406\u5e73\u65b9\u77e9\u9635\u3002\u8fd9\u4e9b\u8fd0\u7b97\u5668\u786e\u4fdd\u6743\u91cd\u80fd\u65e0\u7f1d\u878d\u5165\u5230\u5927\u8bed\u8a00\u6a21\u578b\u4e2d\uff0c\u4f7f\u5f97\u6211\u4eec\u7684\u65b9\u6cd5\u80fd\u591f\u50cfLoRA\u4e00\u6837\u90e8\u7f72\u3002\u6211\u4eec\u5728\u4e94\u4e2a\u4efb\u52a1\u4e0a\u8fdb\u884c\u4e86\u5168\u9762\u8bc4\u4f30\uff1a\u6307\u4ee4\u8c03\u6574\u3001\u6570\u5b66\u63a8\u7406\u3001\u8fde\u7eed\u9884\u8bad\u7ec3\u3001\u8bb0\u5fc6\u4ee5\u53ca\u9884\u8bad\u7ec3\u3002\u5728\u5185\u5b58\u5bc6\u96c6\u578b\u4efb\u52a1\u4e0a\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u4f18\u4e8eLoRA\uff0c\u5e76\u5728\u5176\u4ed6\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u76f8\u5f53\u7684\u6027\u80fd\u3002**|\n", "2405.12119": "|**2024-05-20**|**Reindex-Then-Adapt: 
Improving Large Language Models for Conversational Recommendation**|Zhankui He et.al.|[2405.12119](http://arxiv.org/abs/2405.12119)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u6b63\u5728\u901a\u8fc7\u51fa\u8272\u5730\u7d22\u5f15\u9879\u76ee\u5185\u5bb9\u3001\u7406\u89e3\u590d\u6742\u7684\u5bf9\u8bdd\u4e0a\u4e0b\u6587\u5e76\u751f\u6210\u76f8\u5173\u9879\u76ee\u6807\u9898\uff0c\u9769\u65b0\u4e86\u5bf9\u8bdd\u63a8\u8350\u7cfb\u7edf\u3002\u7136\u800c\uff0c\u63a7\u5236\u63a8\u8350\u9879\u76ee\u7684\u5206\u5e03\u4ecd\u662f\u4e00\u4e2a\u6311\u6218\uff0c\u5bfc\u81f4\u5728\u9488\u5bf9\u5bf9\u8bdd\u63a8\u8350\u5e73\u53f0\u7684\u5feb\u901f\u53d8\u5316\u7684\u6570\u636e\u5206\u5e03\uff0c\u5982\u9879\u76ee\u6d41\u884c\u5ea6\u4e0a\uff0c\u6027\u80fd\u6b20\u4f73\u3002\u5728\u5bf9\u8bdd\u63a8\u8350\u4e2d\uff0cLLMs\u901a\u8fc7\u81ea\u56de\u5f52\u65b9\u5f0f\u751f\u6210\u9879\u76ee\u6807\u9898\uff08\u4f5c\u4e3a\u591a\u4e2a\u4ee4\u724c\uff09\uff0c\u8fd9\u4f7f\u5f97\u83b7\u53d6\u548c\u63a7\u5236\u6240\u6709\u9879\u76ee\u63a8\u8350\u53d8\u5f97\u56f0\u96be\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201c\u91cd\u7d22\u5f15-\u7136\u540e\u9002\u5e94\u201d\uff08Reindex-Then-Adapt\uff0cRTA\uff09\u7684\u6846\u67b6\uff0c\u5b83\u5c06\u591a\u4ee4\u724c\u9879\u76ee\u6807\u9898\u8f6c\u6362\u4e3a\u5355\u4e2a\u4ee4\u724c\u4e8eLLMs\u5185\uff0c\u968f\u540e\u8c03\u6574\u8fd9\u4e9b\u5355\u4ee4\u724c\u9879\u76ee\u6807\u9898\u7684\u6982\u7387\u5206\u5e03\u3002RTA\u6846\u67b6\u7ed3\u5408\u4e86LLMs\u7406\u89e3\u548c\u590d\u6742\u67e5\u8be2\u7684\u4f18\u52bf\uff0c\u4ee5\u53ca\u4f20\u7edf\u63a8\u8350\u7cfb\u7edf\uff08RecSys\uff09\u5728\u5bf9\u8bdd\u63a8\u8350\u4e2d\u6709\u6548\u63a7\u5236\u63a8\u8350\u9879\u76ee\u5206\u5e03\u7684\u80fd\u529b\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u6846\u67b6\u5728\u4e09\u4e2a\u4e0d\u540c\u7684\u5bf9\u8bdd\u63a8\u8350\u6570\u636e\u96c6\u548c\u4e24\u79cd\u9002\u5e94\u8bbe\u7f6e\u4e0b\uff0c\u5c55\u793a\u4e
86\u6539\u8fdb\u7684\u51c6\u786e\u6027\u6307\u6807\u3002|\n", "2405.12107": "|**2024-05-20**|**Imp: Highly Capable Large Multimodal Models for Mobile Devices**|Zhenwei Shao et.al.|[2405.12107](http://arxiv.org/abs/2405.12107)|**[link](https://github.com/milvlg/imp)**|**\u5c3d\u7ba1\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u548c\u5927\u578b\u591a\u6a21\u6001\u6a21\u578b\uff08LMMs\uff09\u5728\u5f00\u653e\u4e16\u754c\u591a\u6a21\u6001\u7406\u89e3\u65b9\u9762\u5c55\u73b0\u51fa\u60ca\u4eba\u7684\u80fd\u529b\uff0c\u4f46\u5b83\u4eec\u901a\u5e38\u53c2\u6570\u91cf\u5927\u3001\u8ba1\u7b97\u9700\u6c42\u9ad8\uff0c\u9650\u5236\u4e86\u5728\u8d44\u6e90\u53d7\u9650\u73af\u5883\u4e2d\u7684\u5e94\u7528\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e00\u95ee\u9898\uff0c\u7814\u7a76\u4eba\u5458\u5df2\u7ecf\u63d0\u51fa\u4e86\u4e00\u7cfb\u5217\u8f7b\u91cf\u7ea7LMM\uff0c\u65e8\u5728\u5728\u6709\u9650\u89c4\u6a21\uff08\u598230\u4ebf\u53c2\u6570\uff09\u4e0b\u6700\u5927\u5316\u6027\u80fd\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u591a\u6570\u4ec5\u5173\u6ce8\u8bbe\u8ba1\u7a7a\u95f4\u7684\u5355\u4e00\u6216\u4e24\u4e2a\u65b9\u9762\uff0c\u5bf9\u5f71\u54cd\u6a21\u578b\u80fd\u529b\u7684\u5173\u952e\u8bbe\u8ba1\u9009\u62e9\u5c1a\u672a\u8fdb\u884c\u5168\u9762\u63a2\u8ba8\u3002 
\u672c\u6587\u7cfb\u7edf\u5730\u7814\u7a76\u4e86\u8f7b\u91cf\u7ea7LMM\u7684\u8bbe\u8ba1\uff0c\u5305\u62ec\u6a21\u578b\u67b6\u6784\u3001\u8bad\u7ec3\u7b56\u7565\u548c\u8bad\u7ec3\u6570\u636e\u3002\u6839\u636e\u6211\u4eec\u7684\u7814\u7a76\u7ed3\u679c\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u5957\u540d\u4e3aImp\u7684\u9ad8\u6027\u80fdLMM\u5bb6\u65cf\uff0c\u8986\u76d620\u4ebf\u523040\u4ebf\u53c2\u6570\u89c4\u6a21\u3002\u5c24\u5176\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u6211\u4eec\u7684Imp-30\u4ebf\u6a21\u578b\u5728\u4e0e\u540c\u7c7b\u89c4\u6a21\u7684\u73b0\u6709\u8f7b\u91cf\u7ea7\u6a21\u578b\u76f8\u6bd4\u65f6\u6301\u7eed\u9886\u5148\uff0c\u5e76\u8d85\u8d8a\u4e86130\u4ebf\u53c2\u6570\u89c4\u6a21\u7684\u6700\u65b0LMM\u72b6\u6001\u3002\u901a\u8fc7\u4f4e\u7cbe\u5ea6\u91cf\u5316\u548c\u5206\u8fa8\u7387\u964d\u4f4e\u6280\u672f\uff0cImp\u6a21\u578b\u80fd\u591f\u5728\u9ad8\u901a\u9a81\u9f998Gen3\u79fb\u52a8\u82af\u7247\u4e0a\u5b9e\u73b0\u9ad8\u901f\u90e8\u7f72\uff0c\u6bcf\u79d2\u5904\u7406\u5927\u7ea613\u4e2a\u4ee4\u724c\u7684\u63a8\u7406\u901f\u5ea6\u3002**|\n", "2405.12100": "|**2024-05-20**|**DOP: Diagnostic-Oriented Prompting for Large Language Models in Mathematical Correction**|Hao Chen et.al.|[2405.12100](http://arxiv.org/abs/2405.12100)|null|
\u6570\u5b66\u4e16\u754c\u95ee\u9898\u4fee\u6b63\uff08MWPC\uff09\u662f\u4e00\u4e2a\u4e13\u95e8\u9488\u5bf9\u89e3\u51b3\u6570\u5b66\u95ee\u9898\u8fc7\u7a0b\u4e2d\u9519\u8bef\u63a8\u7406\u7684\u4fee\u6b63\u4efb\u52a1\u3002\u672c\u6587\u5229\u7528\u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u8fdb\u6b65\uff0c\u5173\u6ce8\u4e24\u70b9\uff1a\uff081\uff09\u533a\u5206\u6570\u5b66\u63a8\u7406\u4e0e\u9519\u8bef\u4fee\u6b63\uff1b\uff082\uff09\u63a2\u7d22\u7b56\u7565\u4ee5\u63d0\u5347LLMs\u5728\u6570\u5b66\u9886\u57df\u7684\u9519\u8bef\u4fee\u6b63\u80fd\u529b\uff0c\u4ee5\u5e94\u5bf9MWPC\u4efb\u52a1\u3002\u6211\u4eec\u6ce8\u610f\u5230\uff0c\u5728\u5b9e\u65f6\u6559\u80b2\u4e2d\uff0c\u5e2e\u52a9\u5b66\u751f\u8bc6\u522b\u9519\u8bef\u6bd4\u5355\u7eaf\u63d0\u4f9b\u6b63\u786e\u7b54\u6848\u66f4\u4e3a\u5173\u952e\u3002\u7136\u800c\uff0c\u5f53\u524d\u7814\u7a76\u5f80\u5f80\u4fa7\u91cd\u4e8e\u83b7\u53d6\u7cbe\u786e\u7684\u89e3\u9898\u7b54\u6848\uff0c\u800c\u975e\u7ea0\u6b63\u53ef\u80fd\u7684\u9519\u8bef\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u8c03\u6574\u4e86\u7814\u7a76\u8303\u5f0f\uff0c\u8868\u660e\u63d0\u5347\u6570\u5b66\u63a8\u7406\u80fd\u529b\u5e76\u4e0d\u7b49\u540c\u4e8e\u7cbe\u901a\u9519\u8bef\u4fee\u6b63\u3002\u540c\u65f6\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u8bca\u65ad\u5bfc\u5411\u63d0\u793a\uff08DOP\uff09\u7684\u65b0\u65b9\u6cd5\uff0c\u65e8\u5728\u4fc3\u8fdbLLMs\u5728\u9519\u8bef\u4fee\u6b63\u65b9\u9762\u8868\u73b0\u51fa\u8272\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cDOP\u8868\u73b0\u51fa\u5353\u8d8a\u6027\u80fd\uff0c\u5f70\u663e\u5176\u91cd\u8981\u6027\u3002\u6211\u4eec\u5f3a\u8c03\uff0c\u5728\u6570\u5b66\u6559\u80b2\u4e2d\uff0c\u5bf9\u51fa\u8272\u4fee\u6b63\u8005\u7684\u9700\u8981\u8d85\u8fc7\u4e86\u5bf9\u719f\u7ec3\u63a8\u7406\u8005\u7684\u8ffd\u6c42\u3002\u4ee3\u7801\u548c\u6570\u636e\u53ef\u5728\u83b7\u53d6\u3002|\n", "2405.12981": "|**2024-05-21**|**Reducing Transformer Key-Value Cache Size with Cross-Layer Attention**|William Brandon 
et.al.|[2405.12981](http://arxiv.org/abs/2405.12981)|null|\u952e\u503c\u7f13\u5b58\u5bf9\u4e8e\u52a0\u901fTransformer\u67b6\u6784\u7684\u81ea\u56de\u5f52\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u89e3\u7801\u81f3\u5173\u91cd\u8981\u3002\u7136\u800c\uff0c\u968f\u7740\u5e8f\u5217\u957f\u5ea6\u589e\u52a0\u548c\u6279\u91cf\u5927\u5c0f\u589e\u5927\uff0c\u5b58\u50a8\u952e\u503c\u7f13\u5b58\u6240\u9700\u7684\u5185\u5b58\u53ef\u80fd\u4f1a\u53d8\u5f97\u96be\u4ee5\u627f\u53d7\u3002\u81ea\u4eceTransformer\u8bde\u751f\u4ee5\u6765\uff0c\u4e24\u4e2a\u6700\u6709\u6548\u7684\u5185\u5b58\u51cf\u5c0f\u7b56\u7565\u662f\u591a\u67e5\u8be2\u6ce8\u610f\u529b\uff08MQA\uff09\u53ca\u5176\u63a8\u5e7f\uff0c\u7fa4\u7ec4\u67e5\u8be2\u6ce8\u610f\u529b\uff08GQA\uff09\u3002MQA\u548cGQA\u901a\u8fc7\u8ba9\u591a\u4e2a\u67e5\u8be2\u5934\u5171\u4eab\u5355\u4e2a\u952e/\u503c\u5934\uff0c\u663e\u8457\u51cf\u5c11\u4e86\u4e0d\u540c\u952e/\u503c\u5934\u7684\u6570\u91cf\uff0c\u540c\u65f6\u5bf9\u51c6\u786e\u6027\u5f71\u54cd\u8f83\u5c0f\u3002\u672c\u6587\u5c55\u793a\u4e86\u5982\u4f55\u8fdb\u4e00\u6b65\u53d1\u5c55MQA\uff0c\u5373\u5728\u76f8\u90bb\u5c42\u4e4b\u95f4\u4e5f\u5171\u4eab\u952e\u548c\u503c\u5934\uff0c\u6211\u4eec\u5c06\u5176\u79f0\u4e3a\u8de8\u5c42\u6ce8\u610f\u529b\uff08CLA\uff09\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u4f7f\u7528CLA\uff0c\u53ef\u4ee5\u5728\u4fdd\u6301\u63a5\u8fd1\u539f\u59cbMQA\u7cbe\u5ea6\u7684\u540c\u65f6\uff0c\u5c06\u952e\u503c\u7f13\u5b58\u7684\u5927\u5c0f\u518d\u51cf\u5c112\u500d\u3002\u6211\u4eec\u5728\u4ece\u5934\u8bad\u7ec310\u4ebf\u53c2\u6570\u548c30\u4ebf\u53c2\u6570\u6a21\u578b\u7684\u5b9e\u9a8c\u4e2d\u9a8c\u8bc1\u4e86\u8fd9\u4e00\u70b9\uff0c\u7ed3\u679c\u8868\u660e\uff0cCLA\u5728\u5185\u5b58\u4e0e\u51c6\u786e\u6027\u4e4b\u95f4\u7684\u6743\u8861\u4e0a\u63d0\u4f9b\u4e86\u4f18\u4e8e\u4f20\u7edfMQA\u7684\u5e15\u7d2f\u6258\u6539\u8fdb\uff0c\u4f7f\u5f97\u66f4\u957f\u7684\u5e8f\u5217\u957f\u5ea6\u548c\u66f4\u5927\u7684\u6279\u91cf\u5927\u5c0f\u4e0b\u7
684\u63a8\u7406\u6210\u4e3a\u53ef\u80fd\u3002|\n", "2405.12961": "|**2024-05-21**|**Energy Rank Alignment: Using Preference Optimization to Search Chemical Space at Scale**|Shriram Chennakesavalu et.al.|[2405.12961](http://arxiv.org/abs/2405.12961)|**[link](https://github.com/rotskoff-group/llm-era)**|\u5728\u5316\u5b66\u7a7a\u95f4\u4e2d\u7684\u641c\u7d22\u662f\u4e00\u4e2a\u6781\u5177\u6311\u6218\u6027\u7684\u95ee\u9898\uff0c\u56e0\u4e3a\u53ef\u80fd\u7684\u5206\u5b50\u6570\u91cf\u968f\u7740\u539f\u5b50\u6570\u91cf\u5448\u7ec4\u5408\u7ea7\u589e\u957f\u3002\u5927\u578b\u81ea\u56de\u5f52\u6a21\u578b\u901a\u8fc7\u5b66\u4e60\u5316\u5b66\u5316\u5408\u7269\u6570\u636e\u5e93\u5df2\u7ecf\u4ea7\u751f\u4e86\u5f3a\u5927\u7684\u751f\u6210\u5668\uff0c\u4f46\u6211\u4eec\u4ecd\u7136\u7f3a\u4e4f\u6709\u6548\u7b56\u7565\u6765\u751f\u6210\u5177\u6709\u7279\u5b9a\u6027\u8d28\u7684\u5206\u5b50\u3002\u8fd9\u4e2a\u95ee\u9898\u4e0e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u201c\u5bf9\u9f50\u201d\u95ee\u9898\u76f8\u4f3c\uff0c\u5c3d\u7ba1\u5728\u8bb8\u591a\u5316\u5b66\u4efb\u52a1\u4e2d\uff0c\u6211\u4eec\u6709\u4e00\u4e2a\u660e\u786e\u4e14\u6613\u4e8e\u8bc4\u4f30\u7684\u5956\u52b1\u51fd\u6570\u3002\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3a\u80fd\u91cf\u6392\u540d\u5bf9\u9f50\uff08ERA\uff09\u7684\u7b97\u6cd5\uff0c\u5b83\u5229\u7528\u660e\u786e\u7684\u5956\u52b1\u51fd\u6570\u6784\u5efa\u4e86\u4e00\u4e2a\u68af\u5ea6\u4f18\u5316\u76ee\u6807\uff0c\u7528\u4e8e\u8c03\u6574\u81ea\u56de\u5f52\u7b56\u7565\u3002\u7406\u8bba\u4e0a\uff0c\u6211\u4eec\u53d1\u73b0\u8be5\u7b97\u6cd5\u4e0eProximal Policy Optimization\uff08PPO\uff09\u548cDirect Preference 
Optimization\uff08DPO\uff09\u5bc6\u5207\u76f8\u5173\uff0c\u4f46\u5176\u6700\u5c0f\u5316\u5668\u6536\u655b\u4e8e\u4e00\u4e2a\u7406\u60f3\u7684\u5409\u5e03\u65af-\u73bb\u5c14\u5179\u66fc\u5206\u5e03\uff0c\u5956\u52b1\u51fd\u6570\u626e\u6f14\u4e86\u80fd\u91cf\u89d2\u8272\u3002\u6b64\u5916\uff0c\u8be5\u7b97\u6cd5\u5177\u6709\u9ad8\u5ea6\u53ef\u6269\u5c55\u6027\uff0c\u65e0\u9700\u5f3a\u5316\u5b66\u4e60\uff0c\u5e76\u4e14\u5728\u6bcf\u5bf9\u6837\u672c\u7684\u504f\u597d\u89c2\u5bdf\u6b21\u6570\u8f83\u5c11\u65f6\uff0c\u76f8\u5bf9\u4e8eDPO\u8868\u73b0\u51fa\u8272\u3002 \u6211\u4eec\u5c06\u8fd9\u79cd\u65b9\u6cd5\u5e94\u7528\u4e8e\u5206\u5b50Transformer\u7684\u5bf9\u9f50\uff0c\u4ee5\u751f\u6210\u5177\u6709\u5916\u90e8\u6307\u5b9a\u5c5e\u6027\u7684\u5206\u5b50\uff0c\u5e76\u53d1\u73b0\u5b83\u80fd\u7a33\u5065\u5730\u8fdb\u884c\u641c\u7d22\uff0c\u63a2\u7d22\u5316\u5b66\u7a7a\u95f4\u7684\u591a\u6837\u5316\u90e8\u5206\u3002\u867d\u7136\u6211\u4eec\u7684\u91cd\u70b9\u5728\u4e8e\u5316\u5b66\u641c\u7d22\uff0c\u4f46\u6211\u4eec\u5728\u4e00\u4e2aAI\u76d1\u7763\u7684\u4efb\u52a1\u4e0a\u4e5f\u53d6\u5f97\u4e86\u4f18\u79c0\u7ed3\u679c\uff0c\u8868\u660e\u8be5\u65b9\u6cd5\u662f\u53ef\u6269\u5c55\u4e14\u901a\u7528\u7684\u3002|\n", "2405.12939": "|**2024-05-21**|**Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models**|Zhangyue Yin et.al.|[2405.12939](http://arxiv.org/abs/2405.12939)|**[link](https://github.com/yinzhangyue/AoR)**|
\u8fd1\u671f\uff0cChain-of-Thought\u63d0\u793a\u7684\u8fdb\u5c55\u6781\u5927\u5730\u63a8\u52a8\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u590d\u6742\u63a8\u7406\u4efb\u52a1\u4e2d\u7684\u7a81\u7834\u3002\u5f53\u524d\u7814\u7a76\u901a\u8fc7\u91c7\u6837\u591a\u79cd\u63a8\u7406\u8def\u5f84\u5e76\u6839\u636e\u7b54\u6848\u9891\u7387\u8fdb\u884censemble\uff0c\u63d0\u9ad8\u4e86LLMs\u7684\u63a8\u7406\u6027\u80fd\u3002\u7136\u800c\uff0c\u8fd9\u79cd\u65b9\u6cd5\u5728\u6b63\u786e\u7b54\u6848\u5904\u4e8e\u5c11\u6570\u7684\u60c5\u51b5\u65f6\u5931\u6548\u3002\u6211\u4eec\u53d1\u73b0\u8fd9\u662f\u5236\u7ea6LLMs\u63a8\u7406\u80fd\u529b\u7684\u5173\u952e\u56e0\u7d20\uff0c\u4ec5\u51ed\u9884\u6d4b\u7b54\u6848\u65e0\u6cd5\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u4e2a\u5c42\u6b21\u5316\u7684\u63a8\u7406\u805a\u5408\u6846\u67b6AoR\uff08\u63a8\u7406\u805a\u5408\uff09\uff0c\u5b83\u4f9d\u636e\u63a8\u7406\u94fe\u6761\u7684\u8bc4\u4f30\u6765\u9009\u62e9\u7b54\u6848\u3002\u6b64\u5916\uff0cAoR\u5f15\u5165\u4e86\u52a8\u6001\u91c7\u6837\u7b56\u7565\uff0c\u6839\u636e\u4efb\u52a1\u590d\u6742\u5ea6\u8c03\u6574\u63a8\u7406\u94fe\u6761\u7684\u6570\u91cf\u3002\u4e00\u7cfb\u5217\u590d\u6742\u63a8\u7406\u4efb\u52a1\u7684\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cAoR\u76f8\u8f83\u4e8e\u4e3b\u6d41ensemble\u65b9\u6cd5\u8868\u73b0\u51fa\u8272\u3002\u8fdb\u4e00\u6b65\u5206\u6790\u8868\u660e\uff0cAoR\u4e0d\u4ec5\u9002\u7528\u4e8e\u5404\u79cdLLMs\uff0c\u800c\u4e14\u5728\u4e0e\u73b0\u6709\u65b9\u6cd5\u7684\u6027\u80fd\u5929\u82b1\u677f\u6bd4\u8f83\u4e2d\uff0c\u8fbe\u5230\u4e86\u66f4\u4f18\u79c0\u7684\u6c34\u5e73\u3002|\n", "2405.12933": "|**2024-05-21**|**Skin-in-the-Game: Decision Making via Multi-Stakeholder Alignment in LLMs**|Bilgehan Sel 
et.al.|[2405.12933](http://arxiv.org/abs/2405.12933)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u8bf8\u5982\u603b\u7ed3\u3001\u7b97\u672f\u63a8\u7406\u548c\u95ee\u7b54\u7b49\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\u3002\u7136\u800c\uff0c\u5728\u9053\u5fb7\u63a8\u7406\u548c\u4f26\u7406\u51b3\u7b56\u65b9\u9762\uff0c\u5c24\u5176\u662f\u5728\u6d89\u53ca\u591a\u4e2a\u5229\u76ca\u76f8\u5173\u8005\u7684\u590d\u6742\u60c5\u666f\u4e2d\uff0c\u5b83\u4eec\u9762\u4e34\u4e25\u5cfb\u6311\u6218\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aSkin-in-the-Game\uff08SKIG\uff09\u7684\u6846\u67b6\uff0c\u65e8\u5728\u901a\u8fc7\u4ece\u4e0d\u540c\u5229\u76ca\u76f8\u5173\u8005\u89d2\u5ea6\u5ba1\u89c6\u51b3\u7b56\u7684\u540e\u679c\uff0c\u63d0\u5347\u8bed\u8a00\u6a21\u578b\u5728\u9053\u5fb7\u63a8\u7406\u4e2d\u7684\u80fd\u529b\u3002SKIG\u7684\u6838\u5fc3\u673a\u5236\u662f\u6a21\u62df\u884c\u52a8\u7684\u8d23\u4efb\u611f\uff0c\u7ed3\u5408\u540c\u7406\u5fc3\u7ec3\u4e60\u548c\u98ce\u9669\u8bc4\u4f30\uff0c\u5bf9\u63d0\u9ad8\u5176\u6709\u6548\u6027\u81f3\u5173\u91cd\u8981\u3002\u6211\u4eec\u4f7f\u7528\u4e13\u6709\u548c\u5f00\u6e90\u8bed\u8a00\u6a21\u578b\u5728\u5404\u79cd\u9053\u5fb7\u63a8\u7406\u57fa\u51c6\u4e0a\u9a8c\u8bc1SKIG\u7684\u8868\u73b0\uff0c\u5e76\u901a\u8fc7\u6df1\u5165\u7684\u6d88\u878d\u5206\u6790\u63a2\u7a76\u5176\u5173\u952e\u7ec4\u4ef6\u3002|\n", "2405.12929": "|**2024-05-21**|**Code-mixed Sentiment and Hate-speech Prediction**|Anjali Yadav et.al.|[2405.12929](http://arxiv.org/abs/2405.12929)|**[link](https://github.com/matejklemen/sentiment-hate-speech-with-code-mixed-models)**|\u5728\u591a\u8bed\u8a00\u73af\u5883\u4e2d\uff0c\u6df7\u5408\u4ee3\u7801\uff08code-mixed 
discourse\uff09\u6307\u7684\u662f\u5355\u6587\u672c\u4e2d\u878d\u5408\u591a\u79cd\u8bed\u8a00\u7684\u73b0\u8c61\uff0c\u5c24\u5176\u662f\u5728\u5b98\u65b9\u8bed\u8a00\u591a\u5143\u7684\u56fd\u5bb6\u7684\u975e\u6b63\u5f0f\u4ea4\u6d41\u4e2d\u5e38\u89c1\u3002\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\u4e2d\u7684\u4e3b\u5bfc\u5730\u4f4d\u63d0\u5347\uff0c\u6211\u4eec\u9488\u5bf9\u4ee3\u7801\u6df7\u5408\u8bed\u5883\u7684\u7814\u7a76\u4e5f\u968f\u4e4b\u5c55\u5f00\u3002\u9996\u5148\uff0c\u6211\u4eec\u7279\u522b\u8bbe\u8ba1\u4e86\u56db\u6b3e\u65b0\u7684\u82f1\u8bed-\u5370\u5730\u8bed\u548c\u82f1\u8bed-\u65af\u6d1b\u6587\u5c3c\u4e9a\u53cc\u8bed\u9884\u8bad\u7ec3\u906e\u7f69\u8bed\u8a00\u6a21\u578b\uff0c\u4ee5\u9002\u5e94\u975e\u6b63\u5f0f\u8bed\u8a00\u3002\u63a5\u7740\uff0c\u6211\u4eec\u5bf9\u5404\u79cd\u7c7b\u578b\u7684\u6a21\u578b\u2014\u2014\u5305\u62ec\u5355\u8bed\u3001\u53cc\u8bed\u3001\u5c11\u91cf\u8bed\u8a00\u548c\u5927\u89c4\u6a21\u591a\u8bed\u8a00\u6a21\u578b\u2014\u2014\u5728\u793e\u4ea4\u5a92\u4f53\u6587\u672c\u7684\u60c5\u611f\u5206\u6790\u548c\u653b\u51fb\u6027\u8bed\u8a00\u68c0\u6d4b\u7b49\u4efb\u52a1\u4e0a\u7684\u6027\u80fd\u8fdb\u884c\u4e86\u8bc4\u4f30\u3002\u7ed3\u679c\u663e\u793a\uff0c\u6700\u6709\u6548\u7684\u5206\u7c7b\u5668\u662f\u9488\u5bf9\u793e\u4ea4\u5a92\u4f53\u6587\u672c\u7684\u4e13\u4e1a\u5316\u53cc\u8bed\u548c\u591a\u8bed\u8a00\u6a21\u578b\uff0c\u968f\u540e\u662f\u975e\u4e13\u4e1a\u7684\u5927\u89c4\u6a21\u591a\u8bed\u8a00\u548c\u5355\u8bed\u6a21\u578b\uff0c\u800c\u5927\u578b\u751f\u6210\u6a21\u578b\u7684\u8868\u73b0\u5e76\u4e0d\u7a81\u51fa\u3002\u5bf9\u4e8e\u6d89\u53ca\u60c5\u611f\u7684\u95ee\u9898\uff0c\u6a21\u578b\u5728\u5904\u7406\u4ee3\u7801\u6df7\u5408\u6570\u636e\u65f6\u603b\u4f53\u4e0a\u7565\u4f18\u4e8e\u975e\u4ee3\u7801\u6df7\u5408\u6570\u636e\u3002|\n", "2405.12920": "|**2024-05-21**|**Streamlining Software Reviews: Efficient Predictive Modeling with Minimal Examples**|Tim 
Menzies et.al.|[2405.12920](http://arxiv.org/abs/2405.12920)|**[link](https://github.com/timm/ez)**|This paper proposes a new challenge problem for software analytics. In the process we call 'software review', a panel of SMEs (subject matter experts) reviews examples of software behavior to recommend how to improve that software's operation. Because SME time is usually very limited, the panel should ideally complete this optimization task after looking at only a small number of highly informative examples. To support this review process, we explore methods that train a predictive model to guess whether an expert will like or dislike the next example. Such a model can work with the SMEs to guide them as they explore the examples. After the panelists leave, it can also act as a proxy for them and handle new cases while the experts are busy elsewhere. 
In 31 case studies (ranging from high-level decisions about software processes to low-level decisions about how to configure video encoding software), we show that such predictive models can be built using as few as 12 to 30 labels. To the best of our knowledge, achieving this with so few examples (and without large language models) is currently rare. In keeping with the principles of open science, we will make all our code and data available so that others can repeat, verify, or improve on these results.|\n", "2405.12915": "|**2024-05-21**|**G-DIG: Towards Gradient-based DIverse and hiGh-quality Instruction Data Selection for Machine Translation**|Xingyuan Pan 
et.al.|[2405.12915](http://arxiv.org/abs/2405.12915)|**[link](https://github.com/xypan0/G-DIG)**|Large language models (LLMs) have shown remarkable abilities in general scenarios, and instruction fine-tuning enables them to align with humans on a variety of tasks. However, the diversity and quality of instruction data remain two major challenges for instruction fine-tuning. To address this, we propose a novel gradient-based method that automatically selects high-quality and diverse instruction fine-tuning data for machine translation. Our key innovation is to analyze how individual training examples influence the model during training. Using influence functions and a small high-quality seed dataset, we select the training examples that have a beneficial influence on the model as high-quality data. In addition, to increase data diversity, we cluster the gradients of the training examples and resample, which maximizes the variety of influences they have on the model. Extensive experiments on WMT22 and FLORES translation tasks demonstrate the superiority of our method, and in-depth analysis further confirms its effectiveness and generalization.|\n", "2405.12914": "|**2024-05-21**|**An Empirical Study and Analysis of Text-to-Image Generation Using Large Language Model-Powered Textual Representation**|Zhiyu Tan 
et.al.|[2405.12914](http://arxiv.org/abs/2405.12914)|**[link](https://github.com/llm-conditioned-diffusion/llm-conditioned-diffusion.github.io)**|Accurate understanding of the text input is a key prerequisite for faithful text-to-image generation. Existing methods use the text encoder of the CLIP model to represent input prompts. However, the pre-trained CLIP model can only process English, and the capacity of its text encoder is relatively limited. In contrast, large language models (LLMs) support multilingual input, accommodate longer context, and produce better text representations. In this paper, we investigate LLMs as the text encoder to improve language understanding in text-to-image generation. However, training a text-to-image model with LLMs from scratch requires significant computational resources and data. 
To this end, we introduce a three-stage training pipeline that effectively and efficiently integrates existing text-to-image models with LLMs. Specifically, we design a lightweight adapter that enables fast training of the text-to-image model using text representations from LLMs. Extensive experiments show that our model supports multilingual input and longer context, and also achieves superior image generation quality.|\n", "2405.12910": "|**2024-05-21**|**Topic Modelling Case Law Using a Large Language Model and a New Taxonomy for UK Law: AI Insights into Summary Judgment**|Holli Sargeant et.al.|[2405.12910](http://arxiv.org/abs/2405.12910)|**[link](https://github.com/AhmedIzzidien/TopicLLM)**|This paper addresses a critical gap in legal analytics by developing and applying a novel taxonomy for topic modelling summary judgment cases in the United Kingdom. Using a curated dataset of summary judgment cases, we use the large language model Claude 3 
Opus to explore functional topics and trends. We find that Claude 3 Opus classified topics with an accuracy of 87.10%, and the analysis reveals distinct patterns in summary judgments across different legal domains. Because UK case law is not originally labelled with keywords and offers no topic filtering option, this study both refines our understanding of the thematic underpinnings of summary judgments and illustrates the potential of combining traditional and AI-driven approaches to classification. The paper therefore provides a new and general taxonomy for UK law. This work lays a foundation for further research in judicial administration and for discussion of computational legal research methodologies.|\n", "2405.12900": "|**2024-05-21**|**Adversarial DPO: Harnessing Harmful Data for Reducing Toxicity with Minimal Impact on Coherence and Evasiveness in Dialogue Agents**|San Kim 
et.al.|[2405.12900](http://arxiv.org/abs/2405.12900)|null|Recent advances in open-domain dialogue systems have been driven by the emergence of large language models (LLMs) and a range of effective training methods. However, toxicity in these models poses a significant challenge to user experience. This paper introduces an innovative training algorithm, adversarial direct preference optimization (ADPO), an improvement on direct preference optimization (DPO). ADPO trains models to assign a higher probability to preferred responses and a lower probability to unsafe responses generated using a toxic control token. We show that ADPO strengthens the model's resilience against harmful conversations while minimizing performance degradation. We also show that ADPO offers a more stable training procedure than traditional DPO. To the best of our knowledge, this is the first DPO variant that directly incorporates harmful data into the generative model, reducing the need to artificially create safe dialogue data.|\n", "2405.14863": "|**2024-05-23**|**A Nurse is Blue and Elephant is Rugby: Cross Domain Alignment in Large Language Models Reveal Human-like Patterns**|Asaf Yehudai 
et.al.|[2405.14863](http://arxiv.org/abs/2405.14863)|null|Cross-domain alignment refers to the task of mapping a concept from one domain to another. For example: 'If a doctor were a color, what color would it be?' This seemingly peculiar task is designed to investigate how people represent concrete and abstract concepts through their mappings between categories and their reasoning about those mappings. In this paper, we adapt this task from cognitive science to evaluate the conceptualization and reasoning abilities of large language models (LLMs) through a behavioral study. We prompt LLMs with a cross-domain mapping task and analyze their responses at both the population and the individual level. We also assess the models' ability to reason about their predictions by analyzing and categorizing their explanations for these mappings. The results reveal substantial similarities between the mappings and explanations of humans and models, suggesting that models represent concepts similarly to humans. This similarity is evident not only in the models' representations but also in their behavior. Moreover, the models mostly provide valid explanations and follow reasoning paths similar to 
those of humans.|\n", "2405.14862": "|**2024-05-23**|**Bitune: Bidirectional Instruction-Tuning**|Dawid J. Kopiczko et.al.|[2405.14862](http://arxiv.org/abs/2405.14862)|null|We introduce Bitune, a method that improves instruction-tuning of pretrained decoder-only large language models, leading to significant gains on multiple downstream tasks. Bitune applies both causal and bidirectional attention to the prompt to obtain a better representation of the query or instruction. We realize this by introducing two sets of parameters, to which we apply parameter-efficient fine-tuning techniques. The causal and bidirectional features are then combined into a weighted average with trainable coefficients, which is subsequently used to generate new tokens. Experiments show that Bitune performs strongly in the zero-shot setting on commonsense reasoning, arithmetic, and language understanding tasks. Extensive ablation studies validate the role of each component and demonstrate the method's robustness to different PEFT techniques.|\n", "2405.14852": "|**2024-05-23**|**PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression**|Vladimir Malinovskii et.al.|[2405.14852](http://arxiv.org/abs/2405.14852)|**[link](https://github.com/vahe1994/aqlm)**|There has been 
significant interest in 'extreme' compression of large language models (LLMs), i.e., to 1-2 bits per parameter, which allows such models to be executed efficiently on resource-constrained devices. Existing work has focused on improved one-shot quantization techniques and weight representations; yet purely post-training approaches are reaching diminishing returns in terms of the accuracy-vs-bit-width trade-off. State-of-the-art quantization methods such as QuIP# and AQLM include fine-tuning part of the compressed parameters over a limited amount of calibration data; however, such fine-tuning over compressed weights often relies exclusively on straight-through estimators (STE), whose performance is not well understood in this setting. In this work, we question the use of STE for extreme LLM compression and systematically study quantization-aware fine-tuning strategies. We propose PV-Tuning, a representation-agnostic framework that generalizes and improves upon existing fine-tuning strategies and provides convergence guarantees in restricted cases. In practice, when used for 1-2 bit vector quantization, PV-Tuning outperforms prior techniques for highly performant models such as Llama and Mistral. Using PV-Tuning, we achieve the first Pareto-optimal quantization for Llama 
2 family models at 2 bits per parameter.|\n", "2405.14831": "|**2024-05-23**|**HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models**|Bernal Jim\u00e9nez Guti\u00e9rrez et.al.|[2405.14831](http://arxiv.org/abs/2405.14831)|**[link](https://github.com/osu-nlp-group/hipporag)**|To thrive in hostile and ever-changing natural environments, mammalian brains evolved to store large amounts of knowledge about the world and continually integrate new information while avoiding catastrophic forgetting. Despite their impressive accomplishments, large language models (LLMs), even with retrieval-augmented generation (RAG), still struggle to integrate a large amount of new experiences. In this work, we introduce HippoRAG, a novel retrieval framework inspired by the hippocampal indexing theory of human long-term memory, to enable deeper and more efficient integration of new experiences. HippoRAG synergistically orchestrates LLMs, knowledge graphs, and the Personalized PageRank algorithm to mimic the different roles of the neocortex and hippocampus in human memory. 
We compare HippoRAG with existing RAG methods on multi-hop question answering and show that it outperforms the state-of-the-art methods remarkably, by up to 20%. Single-step retrieval with HippoRAG achieves comparable or better performance than iterative retrieval methods such as IRCoT while being 10-30 times cheaper and 6-13 times faster, and integrating HippoRAG into IRCoT brings further substantial gains. Finally, we show that HippoRAG can tackle new types of scenarios that are out of reach of existing methods. Code and data have been released as open source.|\n", "2405.14804": "|**2024-05-23**|**Can LLMs Solve longer Math Word Problems Better?**|Xin Xu et.al.|[2405.14804](http://arxiv.org/abs/2405.14804)|null|Math word problems (MWPs) are crucial for evaluating the capabilities of large language models (LLMs), yet existing research focuses mainly on problems with concise contexts. Real-world math problems often involve complex circumstances, so the ability of LLMs to solve long MWPs is vital for their application in these scenarios, but this aspect remains under-explored. This study is the first to examine Context Length Generalizability (CoLeG), the ability of LLMs to solve long MWPs. We introduce Extended Grade-School 
Math (E-GSM), a collection of MWPs with lengthy narratives, and propose two novel metrics to assess the efficacy and resilience of LLMs on these problems. Our examination of existing zero-shot prompting techniques and of both proprietary and open-source LLMs reveals a general deficiency in CoLeG. To address this, we propose separate approaches for each category of LLMs: for proprietary LLMs, a new instructional prompt designed to mitigate the influence of long context; for open-source LLMs, a new data augmentation task to improve their adaptability. Our comprehensive results show that our approaches not only perform well on E-GSM but also generalize across several other MWP benchmarks. Our findings give direction to future research on using LLMs for complex, real-world problems, offer practical solutions to current limitations, and open avenues for further exploration of model generalizability and training methodologies.|\n", "2405.14782": "|**2024-05-23**|**Lessons from the Trenches on Reproducible Evaluation of Language Models**|Stella Biderman 
et.al.|[2405.14782](http://arxiv.org/abs/2405.14782)|null|Effective evaluation of language models remains an open challenge in natural language processing (NLP). Researchers and engineers face methodological issues such as the sensitivity of models to evaluation setup, the difficulty of proper comparisons across methods, and the lack of reproducibility and transparency. In this paper we draw on three years of experience in evaluating large language models to provide guidance and lessons for researchers. First, we provide an overview of common challenges faced in language model evaluation. Second, we delineate best practices for addressing or lessening the impact of these challenges. Third, we present the Language Model Evaluation Harness (lm-eval): an open-source library for independent, reproducible, and extensible evaluation of language models that seeks to address these issues. We describe the features of the library and present case studies in which the library has been used to alleviate these methodological concerns.|\n", "2405.14768": "|**2024-05-23**|**WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models**|Peng Wang 
et.al.|[2405.14768](http://arxiv.org/abs/2405.14768)|**[link](https://github.com/zjunlp/easyedit)**|Large language models (LLMs) need knowledge updates to keep up with ever-growing world facts and to correct erroneous responses, which calls for model editing methods that update knowledge continually. The key question of the paper is where the edited knowledge should reside in the model's memories. The authors find that editing either long-term memory (the model parameters directly) or working memory (neural network activations obtained by retrieval) leads to an impossible triangle: reliability, generalization, and locality cannot be achieved together in lifelong editing settings. Directly editing the parameters conflicts with irrelevant pretrained knowledge or previous edits (poor reliability and locality), while retrieval-based working memory makes it hard for the model to understand the edits and generalize (poor generalization). The authors therefore propose WISE, a new method that aims to bridge the gap between memories. 
WISE uses a dual parametric memory scheme, consisting of a main memory for the pretrained knowledge and a side memory for the edited knowledge. Only the knowledge in the side memory is edited, and a router is trained to decide which memory to go through for a given query. For continual editing, a knowledge-sharding mechanism places different sets of edits in distinct subspaces of parameters and then merges them into a shared memory to avoid conflicts. Experiments show that WISE outperforms previous model editing methods and overcomes the impossible triangle in lifelong model editing for question answering and hallucination settings across trending LLM architectures, e.g., GPT, LLaMA, and Mistral. Code will be released at https://github.com/zjunlp/EasyEdit.|\n", "2405.14767": "|**2024-05-23**|**FinRobot: An Open-Source AI Agent Platform for Financial Applications using Large Language Models**|Hongyang Yang 
et.al.|[2405.14767](http://arxiv.org/abs/2405.14767)|**[link](https://github.com/ai4finance-foundation/finrobot)**|As financial institutions and professionals increasingly incorporate large language models (LLMs) into their workflows, substantial barriers, including proprietary data and specialized knowledge, persist between the finance sector and the AI community. These challenges limit the potential of AI to make financial tasks more efficient. Given the importance of financial analysis, we aim to develop finance-specialized LLM-based toolchains and to democratize access to them through open-source initiatives, promoting wider adoption of AI in financial decision-making. This paper introduces FinRobot, a novel open-source AI agent platform that supports multiple financially specialized AI agents, each powered by an LLM. The platform consists of four major layers: 1) the Financial AI Agents layer, which formulates financial Chain-of-Thought (CoT) by breaking complex financial problems down into logical sequences; 2) the Financial LLM Algorithms layer, which dynamically configures appropriate model application strategies for specific tasks; 3) the LLMOps and DataOps layer, which produces accurate models by applying training/fine-tuning techniques and using task-relevant data; and 4) the Multi-source LLM Foundation Models layer, which integrates various LLMs and makes them directly accessible to the layers above. FinRobot 
gives both professional analysts and laypersons hands-on access to powerful AI techniques for advanced financial analysis. The FinRobot code is open source at https://github.com/AI4Finance-Foundation/FinRobot.|\n", "2405.14766": "|**2024-05-23**|**Evaluating Large Language Models for Public Health Classification and Extraction Tasks**|Joshua Harris et.al.|[2405.14766](http://arxiv.org/abs/2405.14766)|null|With the rapid development of large language models (LLMs), there is strong interest in their potential to support expert work in public health. This study combines six externally annotated datasets with seven internally annotated datasets to evaluate LLMs on text classification and extraction tasks relevant to health burden, epidemiological risk factors, and public health interventions. We first evaluate five open-weight LLMs (7 to 70 billion parameters) using zero-shot in-context learning. Llama-3-70B-Instruct is the highest-performing model, achieving the best micro-F1 score on 15 of 17 tasks. Results vary considerably across tasks: on some tasks such as Contact 
Classification, all models score below 60%, while on others such as GI Illness Classification, all models achieve more than 80% micro-F1. For a subset of 12 tasks we also evaluate GPT-4 and find results comparable to Llama-3-70B-Instruct, which scores equally or higher on 6 of those tasks. Overall, based on these initial results, we find that LLMs may become useful tools for public health experts to extract information from a wide variety of free-text sources, supporting public health surveillance, research, and interventions.|\n", "2405.14755": "|**2024-05-23**|**Large language models can be zero-shot anomaly detectors for time series?**|Sarah Alnegheimish 
et.al.|[2405.14755](http://arxiv.org/abs/2405.14755)|**[link](https://github.com/sintel-dev/sigllm)**|\u8fd1\u671f\u7684\u7814\u7a76\u8868\u660e\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\u80fd\u591f\u6267\u884c\u591a\u79cd\u4efb\u52a1\uff0c\u5305\u62ec\u65f6\u95f4\u5e8f\u5217\u9884\u6d4b\u3002\u8fd9\u4e9b\u6a21\u578b\u7684\u7075\u6d3b\u6027\u4f7f\u5176\u9002\u7528\u4e8e\u4f17\u591a\u5e94\u7528\u3002\u672c\u6587\u63d0\u51fa\u4e00\u9879\u65b0\u9896\u7684\u7814\u7a76\uff0c\u63a2\u8ba8\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u590d\u6742\u7684\u65f6\u95f4\u5e8f\u5217\u5f02\u5e38\u68c0\u6d4b\u4efb\u52a1\u4e2d\u7684\u6027\u80fd\u3002\u5bf9\u4e8e\u8bed\u8a00\u6a21\u578b\u800c\u8a00\uff0c\u8fd9\u6d89\u53ca\u8bc6\u522b\u8f93\u5165\u5e8f\u5217\uff08\u6216\u591a\u4e2a\u90e8\u5206\uff09\u4e2d\u7684\u5f02\u5e38\u70b9\uff0c\u4ee5\u53ca\u5904\u7406\u65f6\u95f4\u5e8f\u5217\u6570\u636e\u800c\u975e\u4f20\u7edf\u7684\u6587\u672c\u8f93\u5165\u3002\u6211\u4eec\u4ecb\u7ecd\u4e86sigllm\uff0c\u4e00\u4e2a\u4e13\u4e3a\u65f6\u95f4\u5e8f\u5217\u5f02\u5e38\u68c0\u6d4b\u8bbe\u8ba1\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u6846\u67b6\u3002\u8be5\u6846\u67b6\u5305\u542b\u5c06\u65f6\u95f4\u5e8f\u5217\u8f6c\u6362\u4e3a\u6587\u672c\u7684\u6a21\u5757\uff0c\u4ee5\u53ca\u7aef\u5230\u7aef\u7684\u6d41\u7a0b\uff0c\u7528\u4e8e\u5f15\u5bfc\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u5f02\u5e38\u68c0\u6d4b\u3002\u6211\u4eec\u8bd5\u9a8c\u4e86\u4e24\u79cd\u6d4b\u8bd5\u5927\u578b\u8bed\u8a00\u6a21\u578b\u80fd\u529b\u7684\u65b9\u6cd5\uff1a\u4e00\u662f\u76f4\u63a5\u63d0\u793a\u6a21\u578b\u6307\u51fa\u8f93\u5165\u4e2d\u7684\u5f02\u5e38\u5143\u7d20\uff1b\u4e8c\u662f\u5229\u7528\u8bed\u8a00\u6a21\u578b\u7684\u9884\u6d4b\u80fd\u529b\u6765\u8f85\u52a9\u68c0\u6d4b\u8fc7\u7a0b\u3002 
We evaluated our framework on 11 datasets from different sources, using 10 pipelines. The results show that the forecasting method significantly outperforms the prompting method on all 11 datasets in terms of F1 score. Although large language models are able to find anomalies, current deep learning models still perform better, with results 30% higher than those of large language models.|\n", "2405.15765": "|**2024-05-24**|**Scaling Laws for Discriminative Classification in Large Language Models**|Dean Wyatte et.al.|[2405.15765](http://arxiv.org/abs/2405.15765)|null|Modern large language models (LLMs) mark a major leap in the capabilities of machine learning models. Their ability to generate sensible answers to a wide range of queries suggests that they have potential in customer support applications. However, LLMs have been observed to hallucinate, which limits their near-term use in customer support. To address this, we present a system that reframes the language modeling task as a classification task in order to help customer support representatives choose the best template response. Our goal is to present representatives with the top-K most suitable candidate responses. We report the results of offline and online experiments that demonstrate the effectiveness of the experimental system: the offline experiments show gains, and the online experiments yield statistically significant lifts. We also share curves of validation loss and top-K accuracy obtained while varying model parameters. Finally, we discuss the trade-offs among model size, latency, and accuracy, and outline possible future applications.|\n", "2405.15739": "|**2024-05-24**|**Large Language Models Reflect Human Citation Patterns with a Heightened Citation Bias**|Andres Algaba et.al.|[2405.15739](http://arxiv.org/abs/2405.15739)|**[link](https://github.com/andresalgaba/llm_citation_patterns)**|
Citation practices are crucial in shaping the structure of scientific knowledge, yet they are often influenced by contemporary norms and biases. The emergence of large language models such as GPT-4 introduces a new dynamic to this area. This work is the first to explore the characteristics and potential biases of references recommended by a model that relies entirely on its parametric knowledge rather than on search or retrieval-augmented generation. The experiment uses a set of 166 papers from AAAI, NeurIPS, ICML, and ICLR, published after GPT-4's knowledge cut-off date and containing 3,066 references. GPT-4 was asked to supply scholarly references for the anonymized in-text citations. The results reveal a striking similarity between human and LLM citation patterns, but GPT-4 shows a stronger bias toward highly cited papers, which persists even after controlling for publication year, title length, number of authors, and venue. In addition, the characteristics of the existing and non-existent references generated by GPT-4 are highly consistent, indicating that the model has internalized citation patterns. An analysis of citation graphs shows that the references recommended by GPT-4 are embedded in the relevant citation networks, suggesting a deep conceptual understanding. Although language models can assist with citation generation, they may also amplify existing biases and introduce new ones, potentially affecting the dissemination of scientific knowledge. Our results underscore the need to identify model biases and to develop balanced ways of interacting with language models.|\n", "2405.15734": "|**2024-05-24**|**LM4LV: A Frozen Large Language Model for Low-level Vision Tasks**|Boyang Zheng et.al.|[2405.15734](http://arxiv.org/abs/2405.15734)|**[link](https://github.com/bytetriper/lm4lv)**|The success of large language models (LLMs) has sparked a wave of research on multimodal large language models (MLLMs), which are changing the paradigm in many areas of computer vision. Although MLLMs perform well on high-level vision and vision-and-language tasks such as visual question answering (VQA) and text-to-image generation, no prior work has examined how low-level vision tasks can benefit from these models. We find that the design of most current MLLMs leaves them blind to low-level features, so they are inherently limited in solving low-level vision tasks. We therefore propose $\\textbf{LM4LV}$, a framework that enables a frozen LLM to solve a range of low-level vision tasks without any multimodal data or prior knowledge. This highlights the strong potential of LLMs in low-level vision and bridges the gap between MLLMs and low-level vision tasks. We hope this work inspires new perspectives on LLMs and a deeper understanding of their mechanisms.|\n", "2405.15729": "|**2024-05-24**|**Optimizing Large Language Models for OpenAPI Code Completion**|Bohdan Petryshyn et.al.|[2405.15729](http://arxiv.org/abs/2405.15729)|**[link](https://github.com/BohdanPetryshyn/openapi-completion-benchmark)**|Recent advances in large language models (LLMs) for code generation have greatly changed the field of software development. Although code completion solutions perform well for mainstream programming languages, they underperform on less common formats such as OpenAPI definitions. This study evaluates GitHub Copilot, a popular commercial code completion tool, on OpenAPI completion and proposes a set of task-specific optimizations for Meta's open-source Code Llama model. A semantics-aware OpenAPI completion benchmark is designed, and a series of experiments analyzes the impact of various prompt engineering and fine-tuning techniques on Code Llama's performance. The fine-tuned Code Llama model reaches a peak correctness improvement of 55.2% over GitHub Copilot while using only 1/25 of the parameters of the commercial solution, which is based on the Codex model. The study also improves a widely used code infilling training technique, addressing the model's underperformance when prompted with context sizes smaller than those used during training.|\n", "2405.15684": "|**2024-05-24**|**Prompt-Aware Adapter: Towards Learning Adaptive Visual Tokens for Multimodal Large Language Models**|Yue Zhang et.al.|[2405.15684](http://arxiv.org/abs/2405.15684)|null|To bridge the gap between the vision and language modalities, multimodal large language models (MLLMs) usually learn an adapter that converts visual inputs into tokens that a large language model (LLM) can understand. However, most adapters produce relatively fixed visual tokens, regardless of the specific objects mentioned in the prompt. Because these adapters pay equal attention to every detail in the image and tend to process the entire scene, they may increase the cognitive load on the LLM when it handles complex scenes. To address this, we propose the prompt-aware adapter, which is designed to dynamically embed visual inputs according to the specific focus of the prompt. Specifically, the prompt-aware adapter uses both global and local textual features to capture the visual cues most relevant to the prompt at coarse and fine granularity levels. This approach significantly improves the LLM's ability to understand and interpret visual content. Experiments on various visual question answering tasks, such as counting and position reasoning, demonstrate the effectiveness of the prompt-aware adapter.|\n", "2405.15668": "|**2024-05-24**|**What Do You See? 
Enhancing Zero-Shot Image Classification with Multimodal Large Language Models**|Abdelrahman Abdelhamed et.al.|[2405.15668](http://arxiv.org/abs/2405.15668)|null|This paper explores how large language models (LLMs) can be used for zero-shot image classification. The authors propose a simple yet effective approach that applies multimodal LLMs to input images to generate comprehensive textual representations. These textual representations are converted into fixed-dimensional features in a cross-modal embedding space and fused for zero-shot classification, with no need to engineer prompts for each dataset. Instead, a single generic set of prompts is used across all datasets. Experiments show that the method performs well on multiple datasets and improves on the accuracy of prior methods. Averaged over ten benchmarks, the method achieves an accuracy gain of 4.1 percentage points, with a gain of 6.8 percentage points on ImageNet. These results indicate that multimodal LLMs have the potential to substantially enhance computer vision tasks such as zero-shot image classification, offering a significant improvement over existing techniques.|\n", "2405.15662": "|**2024-05-24**|**Class Machine Unlearning for Complex Data via Concepts Inference and Data Poisoning**|Wenhan 
Chang et.al.|[2405.15662](http://arxiv.org/abs/2405.15662)|null|In the age of AI, users may ask AI companies to delete their information from training datasets because of privacy concerns. For model owners, retraining a model consumes substantial computational resources, so machine unlearning has emerged as a way to remove the requested training data or classes while minimizing the impact on model performance. However, for large-scale complex data such as images or text, unlearning a class from a model can degrade performance, because the association between the class and the model is hard to identify. To address this, we propose using concepts, rather than image features or text tokens, to represent the semantic information of the class to be unlearned, which helps cut the link between the model and the class and removes its influence completely. 
To analyze the influence of concepts in complex data, we adopt a post-hoc concept bottleneck model and Integrated Gradients to precisely identify concepts across different classes. We then propose unlearning methods based on data poisoning with random labels and with targeted labels. We test our methods on image classification models and large language models (LLMs), and the results consistently show that the proposed methods accurately erase the targeted information from the model while largely preserving its performance.|\n", "2405.15652": "|**2024-05-24**|**$$\\mathbf{L^2\\cdot M = C^2}$$ Large Language Models as Covert Channels... 
a Systematic Analysis**|Simen Gaure et.al.|[2405.15652](http://arxiv.org/abs/2405.15652)|null|Large language models (LLMs) have attracted wide attention in recent years for their performance on tasks such as translation, prediction, and content generation. At the same time, the research community has shown that LLMs are susceptible to attacks but can also improve the security of systems. How well, though, do open-source LLMs serve as a medium for covert communication, for example to support censorship-resistant communication? This paper approaches the question experimentally, empirically measuring the security versus capacity of an open-source LLM (Llama-7B) to assess how well it performs as a covert channel. Although the results indicate that channels based on such a model are unlikely to achieve high practical bitrates, which depend on message length and model entropy, we find that the chance of an adversary detecting the covert communication is low. To make the results easy to use as a general reference, we adopt a simple and intuitive scheme and assume only publicly available models.|\n", "2405.15646": "|**2024-05-24**|**LLM-based Robot Task Planning with Exceptional Handling for General Purpose Service Robots**|Ruoyu Wang 
et.al.|[2405.15646](http://arxiv.org/abs/2405.15646)|null|Developing general-purpose service robots for everyday life requires robots that can appropriately perform a wide variety of basic behaviors. Recent progress in training large language models (LLMs) makes it possible to generate task sequences directly from natural language instructions without additional domain knowledge. However, although LLM outputs are semantically correct, the generated task plans may not map precisely onto acceptable actions and may contain various linguistic ambiguities. LLM hallucination poses a challenge for robot task planning, because it can produce content that is inconsistent with real-world facts or user input. To address this, we propose a task planning method based on constrained LLM prompting that generates executable action sequences from commands. We also design an exception handling module to deal with LLM hallucination and to ensure that the generated results are admissible in the current environment. We test our method on commands produced by the RoboCup@Home command generator, and the results show that the robot performs well at both understanding and executing tasks.|\n", "2405.15640": "|**2024-05-24**|**GECKO: Generative Language Model for English, Code and Korean**|Sungwoo Oh et.al.|[2405.15640](http://arxiv.org/abs/2405.15640)|null|We introduce GECKO, a bilingual large language model (LLM) designed for Korean and English, along with programming languages. It is based on the LLaMA architecture and pretrained on a balanced, high-quality corpus of Korean and English. This report describes several of our efforts in building the data pipeline and training the model. Despite its small vocabulary, GECKO is highly efficient at generating both Korean and English tokens. We evaluate it on representative benchmarks: it performs very well on KMMLU (Korean MMLU) and shows modest performance in English and code, even though it was trained on fewer tokens than English-focused LLMs. GECKO is available to the open-source community under a permissive license, and we hope it offers a research baseline and practical insights for Korean LLM research. The model can be found at: https://huggingface.co/kifai/GECKO-7B.|\n", "2405.17430": "|**2024-05-27**|**Matryoshka Multimodal Models**|Mu Cai et.al.|[2405.17430](http://arxiv.org/abs/2405.17430)|null|
Large multimodal models (LMMs) such as LLaVA perform well in visual-linguistic reasoning. These models first embed images into a large, fixed number of visual tokens and then feed them into a large language model (LLM). However, this design produces an excessive number of tokens for dense visual scenarios such as high-resolution images and videos, which leads to great inefficiency. Although token pruning and merging methods exist, they produce a single-length output for each image and cannot flexibly trade off information density against efficiency. Inspired by the concept of Matryoshka dolls, we propose M3: Matryoshka Multimodal Models, which learns to represent visual content as nested sets of visual tokens that capture information at multiple coarse-to-fine granularities. Our approach offers several unique benefits for LMMs: (1) users can explicitly control the visual granularity for each test instance, for example by adjusting the number of tokens used to represent an image according to the complexity or simplicity of its content; (2) 
M3 provides a framework for analyzing the granularity that existing datasets require, and we find that benchmarks such as COCO need only around 9 visual tokens to reach accuracy similar to that of using all 576 tokens; (3) our approach provides a foundation for exploring the best trade-off between performance and visual token length, and our investigation reveals a large gap between current fixed-scale representations and the oracle upper bound.|\n", "2405.17428": "|**2024-05-27**|**NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models**|Chankyu Lee et.al.|[2405.17428](http://arxiv.org/abs/2405.17428)|null|This paper introduces NV-Embed, a model designed to improve the performance of decoder-only large language models (LLMs) on text embedding tasks, including dense vector retrieval. NV-Embed uses a variety of architectural designs and training procedures to significantly enhance the model's versatility and performance while maintaining its simplicity and reproducibility. For the architecture, we introduce a latent attention layer to obtain pooled embeddings, which improves both retrieval and downstream task accuracy compared with mean pooling or using the last token embedding from the LLM. To improve representation learning, we remove the causal attention mask of the LLM during contrastive training, allowing fuller interaction of information. For training, we adopt a two-stage contrastive instruction-tuning method. The first stage applies contrastive training with instructions on retrieval datasets, using in-batch negatives and curated hard negative examples. The second stage blends various non-retrieval datasets into instruction tuning, which not only improves non-retrieval task accuracy but also improves retrieval performance. With these techniques, NV-Embed, using only publicly available data, achieves a record-high score of 69.32, ranking first on the Massive Text Embedding Benchmark (MTEB) as of May 24, 2024, across 56 tasks covering retrieval, reranking, classification, clustering, and semantic textual similarity. Notably, the model also attains the highest score of 59.36 on the 15 retrieval tasks of BEIR. The NV-Embed model will be open-sourced at: https://huggingface.co/nvidia/NV-Embed-v1.|\n", "2405.17427": "|**2024-05-27**|**Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model**|Kuan-Chih Huang 
et.al.|[2405.17427](http://arxiv.org/abs/2405.17427)|**[link](https://github.com/kuanchihhuang/reason3d)**|Recent advances in multimodal large language models (LLMs) have shown great potential in areas such as concept reasoning. However, their application to understanding 3D environments remains relatively limited. This paper introduces Reason3D, a novel LLM designed for comprehensive 3D understanding. Reason3D takes point cloud data and text prompts as input and produces textual responses and segmentation masks, supporting advanced tasks such as 3D reasoning segmentation, hierarchical searching, expression referring, and question answering with detailed mask outputs. In particular, we design a hierarchical mask decoder that can precisely locate small objects within expansive scenes. The decoder first generates a coarse location estimate covering the object's general area and then applies a coarse-to-fine refinement strategy that significantly improves the precision of object identification and segmentation. Experiments show that Reason3D achieves strong results on the large-scale ScanNet and Matterport3D datasets for 3D expression referring, 3D question answering, and 3D reasoning segmentation. Code and models are available at: https://github.com/KuanchihHuang/Reason3D.|\n", "2405.17424": "|**2024-05-27**|**LARM: Large 
Auto-Regressive Model for Long-Horizon Embodied Intelligence**|Zhuoling Li et.al.|[2405.17424](http://arxiv.org/abs/2405.17424)|null|Because embodied agents need to interact with the real world, they must have comprehensive prior knowledge, long-horizon planning capability, and fast response speed. Although recent agents based on large language models (LLMs) perform well, they still have limitations. For example, the output of an LLM is usually a descriptive sentence, which can be ambiguous when determining a specific action. To overcome these issues, we propose the large auto-regressive model (LARM). LARM takes text and multi-view images as input and predicts subsequent actions in an auto-regressive manner. To train LARM, we develop a novel data format called the auto-regressive node transmission structure and build a corresponding dataset. With two-phase training, LARM successfully harvests enchanted equipment in Minecraft, which requires significantly more complex decision-making chains than the highest achievements of the prior best methods. LARM is also the fastest, running 6.8x faster than previous methods.|\n", "2405.17418": "|**2024-05-27**|**Self-Corrected Multimodal Large Language Model for End-to-End Robot Manipulation**|Jiaming Liu 
et.al.|[2405.17418](http://arxiv.org/abs/2405.17418)|null|Robot manipulation policies often perform poorly when they face novel tasks or object instances. The ability to automatically detect and self-correct failed actions is therefore essential for a practical robotic system. Recently, multimodal large language models (MLLMs) have shown promise in visual instruction following and demonstrated strong reasoning abilities across a variety of tasks. To use a general MLLM as an end-to-end robotic agent, we introduce the Self-Corrected (SC)-MLLM, which equips the model not only to predict end-effector poses but also to autonomously recognize and correct failed actions. First, we use parameter-efficient fine-tuning to give the MLLM pose prediction ability, which is reframed as a language modeling problem. When an execution fails, the model identifies the causes of the low-level action errors (such as position and rotation errors) and proactively seeks prompts from experts. Based on the feedback, SC-MLLM rethinks the current failure scenario and generates corrected actions. In addition, we design a continuous policy learning method for successfully corrected samples, which improves the model's adaptability to the current scene configuration and reduces the frequency of expert intervention. To evaluate SC-MLLM, we conduct extensive experiments in both simulation and real-world settings. The results show that, compared with the previous state-of-the-art robotic MLLM (ManipLLM), SC-MLLM significantly improves manipulation accuracy: from 57% to 79% on seen object categories and from 47% to 69% on unseen novel categories.|\n", "2405.17402": "|**2024-05-27**|**THREAD: Thinking Deeper with Recursive Spawning**|Philip Schroeder et.al.|[2405.17402](http://arxiv.org/abs/2405.17402)|**[link](https://github.com/philipmit/thread)**|Large language models (LLMs) have shown impressive capabilities across diverse settings, but they still struggle as the length and complexity of the context increase. To address this, we propose Thinking Recursively and Dynamically (ThReaD). ThReaD frames model generation as a thread of execution that, depending on the context, can run to completion or dynamically spawn new threads. By spawning, a thread can offload work (such as thinking or retrieving information) to child threads, which return only the tokens the parent thread needs. This lets the model adapt, as needed, the amount of intermediate work used to produce tokens. We apply ThReaD to task solving and question answering, where it recursively decomposes the given task or question into progressively simpler sub-problems that are solved by separate child threads. We implement ThReaD with a few-shot learning approach and evaluate it with GPT-4 and GPT-3.5 on multiple benchmarks, including ALFWorld, TextCraft, and WebShop, along with two new benchmarks, DataCommons QA and MIMIC-III ICU QA. ThReaD achieves state-of-the-art performance on these benchmarks, and with smaller models such as Llama-3-8b and CodeLlama-7b it outperforms existing frameworks by 10% to 50% absolute points.|\n", "2405.17386": "|**2024-05-27**|**MindMerger: Efficient Boosting LLM Reasoning in non-English Languages**|Zixian Huang et.al.|[2405.17386](http://arxiv.org/abs/2405.17386)|**[link](https://github.com/cone-mt/mindmerger)**|
Reasoning ability is crucial for large language models (LLMs), yet there is a pronounced gap between English and non-English languages. Some studies fine-tune LLMs to relearn reasoning in non-English languages, while others replace non-English inputs with the output of an external model, such as English translations, to work around the difficulty LLMs have in understanding non-English text. However, these methods often fail to make full use of the reasoning and language understanding abilities built into LLMs. To better exploit those abilities, we propose MindMerger, a new method that merges LLMs with the external language understanding capabilities of multilingual models to improve multilingual reasoning performance. We also introduce a two-step training scheme that first embeds the external capabilities into the LLM and then trains the collaborative use of the external and built-in capabilities. Experiments on three multilingual reasoning datasets and one language understanding dataset show that MindMerger consistently outperforms all baselines, especially on low-resource languages. Without updating the LLM's parameters, average accuracy on the MGSM dataset improves by 6.7% across all languages and by 8.0% on low-resource languages.|\n", "2405.17382": "|**2024-05-27**|**ReMoDetect: Reward Models Recognize Aligned LLM's Generations**|Hyunseok Lee et.al.|[2405.17382](http://arxiv.org/abs/2405.17382)|**[link](https://github.com/hyunseoklee-ai/reward_llm_detect)**|The remarkable capabilities and easy accessibility of large language models (LLMs) have increased societal risks such as fake news generation, prompting the development of methods that detect LLM-generated text (LGT) to ensure safe use. However, because so many LLMs exist, characterizing each of them individually is impractical. This work therefore focuses on a property these powerful models share, namely \u201calignment training\u201d, in which LLMs are trained to generate text that better matches human preferences. Our key finding is that, because aligned LLMs are trained to maximize human preference, the text they generate receives even higher estimated preference than human-written text, so it is easily detected with a preference model (an LLM trained to model the distribution of human preferences). Building on this finding, we propose two training schemes that further improve the preference model's detection ability: (1) continual preference fine-tuning, which biases the model further toward recognizing aligned LGT; and (2) reward modeling of human/LLM mixed text, that is, human-written text rephrased by an aligned LLM, which serves as a preference reference between LGT and human-written text and helps the model learn the decision boundary. We evaluate extensively on six text domains and twelve aligned LLMs, and our method achieves state-of-the-art performance. Code is available at https://github.com/hyunseoklee-ai/reward_llm_detect.|\n", "2405.17378": "|**2024-05-27**|**RTL-Repo: A Benchmark for Evaluating LLMs on Large-Scale RTL Design Projects**|Ahmed Allam et.al.|[2405.17378](http://arxiv.org/abs/2405.17378)|**[link](https://github.com/AUCOHL/RTL-Repo)**|Large language models have shown potential for assisting with Register Transfer Level (RTL) design tasks. However, existing benchmarks fall well short of reflecting the complexity of real-world RTL projects. To address this, this paper presents RTL-Repo, a new benchmark designed to evaluate large language models on large-scale RTL design projects. RTL-Repo contains more than 4000 Verilog code samples extracted from public GitHub repositories, each provided with the full context of its repository. We evaluate several state-of-the-art models on RTL-Repo, including GPT-4, GPT-3.5, and Starcoder2, along with Verilog-specific models such as VeriGen and RTLCoder, and compare how well they generate Verilog code for complex projects. RTL-Repo gives the hardware design community a valuable resource for assessing and comparing language models in realistic RTL design scenarios and for training models specifically for Verilog code generation in complex, multi-file RTL projects. RTL-Repo is open source and publicly available on GitHub.|\n", "2405.17374": "|**2024-05-28**|**Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models**|ShengYun Peng et.al.|[2405.17374](http://arxiv.org/abs/2405.17374)|null|
Safety alignment is key to ensuring that the behavior of large language models (LLMs) matches human preferences and avoids harmful actions, yet recent work shows that fine-tuning with only a few carefully designed training examples can easily compromise that safety. We aim to measure the risks of fine-tuning by exploring the LLM safety landscape. We discover a new phenomenon, observed universally in the parameter space of popular open-source LLMs, which we call the \u201csafety basin\u201d: randomly perturbing the model weights preserves the safety level of the original aligned model within a local neighborhood. This finding inspires a new safety metric, VISAGE, which measures safety in LLM fine-tuning by probing the model's safety landscape. Visualizing the safety landscape of the aligned model helps explain how fine-tuning compromises safety by dragging the model away from the safety basin. We also observe that the system prompt plays an important role in protecting a model, and that this protection carries over to perturbed variants within the safety basin. These observations from the safety landscape study offer new insights for future work on LLM safety.|\n", "2405.18414": "|**2024-05-28**|**Don't Forget to Connect! 
Improving RAG with Graph-based Reranking**|Jialin Dong et.al.|[2405.18414](http://arxiv.org/abs/2405.18414)|null|Retrieval Augmented Generation (RAG) has greatly improved the quality of Large Language Model (LLM) responses by grounding generation in context from existing documents. But how well does RAG work when a document's relevance to the question context is not obvious or when documents carry only partial information? And how should connections between documents be handled? This work sets out to answer these two core questions about RAG generation. We propose G-RAG, a reranker based on Graph Neural Networks (GNNs) that sits between the retriever and the reader in RAG. G-RAG combines the connections between documents with semantic information (via Abstract Meaning Representation graphs) to provide a context-aware ranker for RAG. Experiments show that G-RAG outperforms existing leading approaches while incurring a smaller computational overhead. We also evaluate PaLM 2 as a reranker and find that it clearly underperforms G-RAG, which underscores the importance of reranking in RAG even when large language models are used.|\n", "2405.18386": "|**2024-05-28**|**Instruct-MusicGen: Unlocking Text-to-Music 
Editing for Music Language Models via Instruction Tuning**|Yixiao Zhang et.al.|[2405.18386](http://arxiv.org/abs/2405.18386)|**[link](https://github.com/ldzhangyx/instruct-MusicGen)**|Recent advances in text-to-music editing use text queries to change a piece's style or adjust its instrumental components. However, existing methods either require training a dedicated editing model from scratch, which is time-consuming and resource-intensive, or use large language models to predict the edited music, which leads to imprecise audio reconstruction. To combine the strengths of both and address these limitations, we propose Instruct-MusicGen, a novel approach that fine-tunes a pretrained MusicGen model to efficiently follow editing instructions such as adding, removing, or separating stems. Our approach modifies the original MusicGen architecture by adding a text fusion module and an audio fusion module, which let the model process instruction text and audio input together and generate the desired edited music. Remarkably, Instruct-MusicGen adds only 8% new parameters to the original model and trains for only 5000 steps, yet it outperforms existing baselines and performs comparably to models trained specifically for each task. This advance not only makes text-to-music editing more efficient but also broadens the applicability of music language models in dynamic music production environments.|\n", "2405.18380": "|**2024-05-28**|**OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning**|Pengxiang Li et.al.|[2405.18380](http://arxiv.org/abs/2405.18380)|**[link](https://github.com/pixeli99/owlore)**|The rapid progress of large language models (LLMs) has revolutionized natural language processing tasks. However, their size makes training or fine-tuning a major challenge. Parameter-efficient methods such as Low-Rank Adaptation (LoRA) have gained prominence in response, but they often sacrifice performance. This paper proposes Outlier-weighed Layerwise Sampled Low-Rank Projection (OwLore), a new memory-efficient fine-tuning method inspired by the layerwise distribution of outliers in LLMs, which fine-tunes by dynamically sampling pretrained layers rather than adding extra adapters. We first interpret the outlier phenomenon through Heavy-Tailed Self-Regularization (HT-SR) theory and find that layers with more outliers tend to be more heavy-tailed and better trained. OwLore therefore strategically assigns higher sampling probabilities to layers with more outliers to make better use of the knowledge stored in the pretrained model. To further reduce the memory required for fine-tuning, we incorporate gradient low-rank projection so that each layer can be trained efficiently in a low-rank manner. By combining the benefits of low rank with optimal layerwise sampling, OwLore significantly improves the memory-performance trade-off in LLM pruning. Extensive experiments across several architectures, including LLaMa2, LLaMa3, and Mistral, show that OwLore consistently outperforms baseline methods, including full fine-tuning. For example, OwLore achieves an average accuracy gain of 1.1% on commonsense reasoning benchmarks, 3.0% on MMLU, and a notable 10% on MT-Bench, while being more memory-efficient. Notably, OwLore needs only 21GB of memory to fine-tune LLaMa2-7B.|\n", "2405.18377": "|**2024-05-28**|**LLaMA-NAS: Efficient Neural Architecture Search for Large Language Models**|Anthony Sarah 
et.al.|[2405.18377](http://arxiv.org/abs/2405.18377)|null|Modern large language models (LLMs) perform remarkably well on natural language processing, complex reasoning, sentiment analysis, and other tasks, which has driven their widespread adoption. These capabilities, however, come with very high memory and computational costs that rule out most hardware platforms. To address this, we propose an effective method that fine-tunes LLaMA2-7B only once and then uses genetic algorithm-based search to find smaller network architectures with lower computational complexity. We show that, for certain standard benchmark tasks, the pretrained LLaMA2-7B network is unnecessarily large and complex. We achieve a 1.5x reduction in model size and a 1.3x speedup in throughput with almost no loss in accuracy. Our method is also more effective and efficient than certain pruning or sparsification techniques. Finally, we show that quantization is complementary to our method: quantizing the networks we find further reduces their size and complexity. We believe this work provides a way to automatically create LLMs that can run on cheaper and more widely available hardware platforms.|\n", "2405.18376": "|**2024-05-28**|**Empowering Source-Free Domain Adaptation with MLLM-driven Curriculum Learning**|Dongjie Chen et.al.|[2405.18376](http://arxiv.org/abs/2405.18376)|**[link](https://github.com/Dong-Jie-Chen/RCL)**|Source-Free Domain Adaptation (SFDA) aims to adapt a pretrained source model using only unlabeled target-domain data. Current SFDA methods struggle to use pretrained knowledge effectively and to exploit the potential of target-domain data. Multimodal Large Language Models (MLLMs) excel at understanding visual and textual information, but applying them to SFDA raises problems such as instruction-following failures, high computational demands, and the difficulty of measuring performance before adaptation. To mitigate these problems, we propose Reliability-based Curriculum Learning (RCL), a novel framework that integrates multiple MLLMs through pseudo-labeling to exploit their knowledge for SFDA. Our framework comprises 1) Reliable Knowledge Transfer, 2) Self-correction, 3) MLLM-guided Knowledge Expansion, and 4) Multi-hot Masking Refinement, which work together to progressively exploit the unlabeled data in the target domain. RCL achieves state-of-the-art (SOTA) performance on multiple SFDA benchmarks, for example a gain of $\\textbf{+9.4\\%}$ on DomainNet, demonstrating its effectiveness in improving adaptability and robustness without requiring access to source data. Code is available at https://github.com/Dong-Jie-Chen/RCL.|\n", "2405.18375": "|**2024-05-28**|**Thai Winograd Schemas: A Benchmark for Thai Commonsense Reasoning**|Phakphum Artkaew et.al.|[2405.18375](http://arxiv.org/abs/2405.18375)|**[link](https://github.com/PhakphumAdev/Thai-Winograd)**|Commonsense reasoning is an important part of natural language understanding, and several benchmarks have been developed to evaluate it. However, most of these benchmarks are available only in English. Building parallel benchmarks enables cross-lingual evaluation and a better understanding of different languages. This work introduces a collection of Winograd Schemas in Thai, a new dataset designed to test commonsense reasoning in the Thai language. Using a methodology that involves native speakers, professional translators, and rigorous verification, we ensure that the schemas accurately reflect Thai language nuances, idioms, and cultural references while preserving their ambiguity and commonsense challenge. We evaluate large language models such as GPT-4 and Claude-3-Opus on this benchmark and find that, despite their strong results in English, their performance drops markedly in Thai, which indicates that multilingual commonsense reasoning still needs further progress.|\n", "2405.18369": "|**2024-05-28**|**PromptWizard: Task-Aware Agent-driven Prompt Optimization Framework**|Eshaan Agarwal 
et.al.|[2405.18369](http://arxiv.org/abs/2405.18369)|null|Large language models (LLMs) have revolutionized many domains and shown remarkable capabilities. Central to their success is prompting, which guides the model's output generation. However, crafting prompts by hand is time-consuming and domain-specific, which calls for automated solutions. This paper introduces PromptWizard, a novel framework that uses LLMs to iteratively synthesize and refine prompts tailored to specific tasks. Unlike existing approaches, PromptWizard optimizes both the prompt instructions and the in-context examples to maximize model performance. The framework iteratively refines prompts by mutating instructions and incorporating negative examples, which deepens understanding and ensures diversity. With the help of a critic, it further improves both instructions and examples and enriches them with detailed reasoning steps for the best performance. PromptWizard is computationally efficient, adapts to scenarios with varying amounts of training data, and remains effective with smaller LLMs. A rigorous evaluation across 35 tasks on 8 datasets shows that PromptWizard clearly outperforms existing prompting strategies, demonstrating its efficiency and scalability in prompt optimization.|\n", "2405.18361": "|**2024-05-28**|**Is a 3D-Tokenized LLM the Key to Reliable Autonomous Driving?**|Yifan Bai et.al.|[2405.18361](http://arxiv.org/abs/2405.18361)|null|As autonomous driving (AD) tasks develop rapidly, end-to-end approaches, particularly those built on Vision Language Models (VLMs), have become increasingly important. These models aim to combine strong logical reasoning and cognitive abilities to enable comprehensive end-to-end planning. However, existing VLM-based approaches typically rely on a 2D vision tokenizer and a large language model (LLM) and fall short in handling 3D geometric information, which is essential for reliable planning. Studies show that 2D-tokenized LLMs cannot perceive the 3D environment accurately, which calls the reliability of VLMs for autonomous driving into question. In response, we propose Atlas, a new method that uses a DETR-style 3D perceptron as a 3D tokenizer, connected through a single-layer linear projector, and thereby exploits the inherent properties of the 3D physical world. This design allows high-resolution multi-view images to be processed simultaneously and supports spatiotemporal modeling. Despite its simplicity, Atlas performs strongly on both 3D detection and ego planning tasks on the NuScenes dataset, showing that a 3D-tokenized LLM is key to reliable autonomous driving. We will release the code and datasets to support further research.|\n", "2405.18359": "|**2024-05-28**|**Bridging the Gap: Dynamic Learning Strategies for Improving Multilingual Performance in LLMs**|Somnath Kumar 
et.al.|[2405.18359](http://arxiv.org/abs/2405.18359)|null|Large language models (LLMs) are reshaping many fields worldwide, but their inclusivity and effectiveness still fall short for non-Latin scripts and low-resource languages. This paper tackles that challenge with an approach that improves the multilingual performance of LLMs without extensive training or fine-tuning. Through a systematic study and evaluation of many languages on popular question answering (QA) datasets, we present a set of novel techniques that unlock the true potential of LLMs in a multilingual setting. Our approach rests on three core strategies that yield substantial gains in multilingual proficiency. First, we carefully optimize prompts tailored to multilingual LLMs, which unlocks their latent capabilities and significantly improves performance across languages. Second, we introduce a new hybrid approach that combines LLM Retrieval Augmented Generation (RAG) with multilingual embeddings and achieves better multi-task performance. Finally, we develop a dynamic learning approach that selects, at run time and per query, the best prompting strategy, LLM, and embedding model, which maximizes the effectiveness of LLMs across languages and outperforms the best static and random strategies. In addition, our approach works for offline configuration tuning and also supports online adaptation, so it adapts seamlessly to new languages and datasets and substantially advances multilingual understanding and generation across languages.|\n", "2405.18358": "|**2024-05-28**|**MMCTAgent: Multi-modal Critical Thinking Agent Framework for Complex Visual Reasoning**|Somnath Kumar et.al.|[2405.18358](http://arxiv.org/abs/2405.18358)|null|Recent multimodal large language models (MLLMs) have made significant progress on tasks that combine vision and language. However, they still struggle with fine-grained multimodal understanding, comprehension of complex tasks, and reasoning over multimodal information. This paper introduces MMCTAgent, a novel multimodal critical thinking agent framework designed to address the inherent limitations of current MLLMs in complex visual reasoning tasks. Drawing on human cognitive processes and critical thinking, MMCTAgent iteratively analyzes multimodal information, decomposes queries, plans strategies, and dynamically evolves its reasoning. MMCTAgent also incorporates critical thinking elements such as verification of final answers and self-reflection. It does so through a novel approach that defines a vision-based critic and identifies task-specific evaluation criteria, which improves its decision-making. Through rigorous evaluation on several image understanding and video understanding benchmarks, we show that MMCTAgent (including the version with the critic) outperforms both foundational MLLMs and other tool-augmented pipelines.|\n", "2405.19335": "|**2024-05-29**|**X-VILA: Cross-Modality Alignment for Large Language Model**|Hanrong Ye et.al.|[2405.19335](http://arxiv.org/abs/2405.19335)|null|We introduce X-VILA, a multimodal model designed to extend the capabilities of large language models (LLMs) by incorporating image, video, and audio modalities. By aligning modality-specific encoders with LLM inputs and diffusion decoders with LLM outputs, X-VILA achieves cross-modality understanding, reasoning, and generation. To support this cross-modality alignment, we build an effective any-modality instruction-following dataset. However, we identify a key problem with current cross-modality alignment methods that results in the loss of visual information
\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u89c6\u89c9\u5bf9\u9f50\u673a\u5236\uff0c\u5305\u62ec\u4e00\u4e2a\u89c6\u89c9\u5d4c\u5165\u9ad8\u901f\u516c\u8def\u6a21\u5757\uff0c\u4ee5\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63d0\u4f9b\u4e86\u4e00\u79cd\u8d44\u6e90\u9ad8\u6548\u7684\u8bad\u7ec3\u7b56\u7565\uff0c\u4f7f\u5f97X-VILA\u5728\u4efb\u610f\u6a21\u6001\u5bf9\u8bdd\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u5927\u5e45\u8d85\u8d8a\u5148\u524d\u7684\u65b9\u6cd5\u3002\u4ee4\u4eba\u60ca\u8bb6\u7684\u662f\uff0c\u5373\u4f7f\u5728\u7f3a\u4e4f\u7c7b\u4f3c\u8bad\u7ec3\u6570\u636e\u7684\u60c5\u51b5\u4e0b\uff0cX-VILA\u5728\u4e0d\u540c\u6a21\u6001\u95f4\u4e5f\u5c55\u73b0\u51fa\u6d8c\u73b0\u7279\u6027\u3002\u8be5\u9879\u76ee\u5c06\u5f00\u6e90\u3002|\n", "2405.19334": "|**2024-05-29**|**LLMs Meet Multimodal Generation and Editing: A Survey**|Yingqing He et.al.|[2405.19334](http://arxiv.org/abs/2405.19334)|**[link](https://github.com/yingqinghe/awesome-llms-meet-multimodal-generation)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u6700\u65b0\u8fdb\u5c55\uff0c\u4eba\u4eec\u8d8a\u6765\u8d8a\u5173\u6ce8\u5c06\u5b83\u4eec\u4e0e\u591a\u6a21\u6001\u5b66\u4e60\u76f8\u7ed3\u5408\u3002\u5f53\u524d\u7684\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u8c03\u67e5\u4e3b\u8981\u96c6\u4e2d\u5728\u7406\u89e3\u4e0a\u3002\u8fd9\u7bc7\u7efc\u8ff0\u8be6\u7ec6\u63a2\u8ba8\u4e86\u8de8\u56fe\u50cf\u3001\u89c6\u9891\u30013D\u548c\u97f3\u9891\u7b49\u9886\u57df\u7684\u591a\u6a21\u6001\u751f\u6210\uff0c\u7279\u522b\u5f3a\u8c03\u4e86\u8fd9\u4e9b\u9886\u57df\u4e2d\u7684\u91cc\u7a0b\u7891\u5f0f\u5de5\u4f5c\u53ca\u5176\u6280\u672f\u8fdb\u6b65\u3002\u6211\u4eec\u6df1\u5165\u7814\u7a76\u4e86\u8fd9\u4e9b\u65b9\u6cd5\u7684\u5173\u952e\u6280\u672f\u7ec4\u4ef6\uff0c\u4ee5\u53ca\u5728\u76f8\u5173\u7814\u7a76\u4e2d\u4f7f\u7528\u7684\u591a\u6a21\u6001\u6570\u636e\u96c6\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u525
6\u6790\u4e86\u501f\u52a9\u73b0\u6709\u751f\u6210\u6a21\u578b\u8fdb\u884c\u4eba\u7c7b-\u8ba1\u7b97\u673a\u4ea4\u4e92\u7684\u5de5\u5177\u589e\u5f3a\u578b\u591a\u6a21\u6001\u4ee3\u7406\u3002\u6700\u540e\uff0c\u6211\u4eec\u5168\u9762\u8ba8\u8bba\u4e86\u4eba\u5de5\u667a\u80fd\u5b89\u5168\u7684\u8fdb\u6b65\uff0c\u5e76\u63a2\u7d22\u4e86\u65b0\u5174\u5e94\u7528\u548c\u672a\u6765\u524d\u666f\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u63d0\u4f9b\u4e86\u4e00\u4e2a\u7cfb\u7edf\u800c\u6df1\u5165\u7684\u591a\u6a21\u6001\u751f\u6210\u6982\u8ff0\uff0c\u6709\u671b\u63a8\u52a8\u751f\u6210\u5185\u5bb9\u7684\u4eba\u5de5\u667a\u80fd\uff08AIGC\uff09\u548c\u4e16\u754c\u6a21\u578b\u7684\u53d1\u5c55\u3002\u6240\u6709\u76f8\u5173\u7684\u8bba\u6587\u5217\u8868\u53ef\u5728\u627e\u5230\u3002**|\n", "2405.19333": "|**2024-05-29**|**Multi-Modal Generative Embedding Model**|Feipeng Ma et.al.|[2405.19333](http://arxiv.org/abs/2405.19333)|null|\u5728\u5927\u591a\u6570\u591a\u6a21\u6001\u4efb\u52a1\u4e2d\uff0c\u95ee\u9898\u53ef\u4ee5\u5f52\u7ed3\u4e3a\u751f\u6210\u6216\u5d4c\u5165\u3002\u73b0\u6709\u7684\u6a21\u578b\u901a\u5e38\u901a\u8fc7\u5c06\u8bed\u8a00\u6a21\u5757\u5206\u89e3\u4e3a\u4e00\u4e2a\u7528\u4e8e\u751f\u6210\u7684\u6587\u672c\u89e3\u7801\u5668\u548c\u4e00\u4e2a\u7528\u4e8e\u5d4c\u5165\u7684\u6587\u672c\u7f16\u7801\u5668\u6765\u5904\u7406\u8fd9\u4e24\u79cd\u95ee\u9898\u3002\u4e3a\u4e86\u63a2\u7d22\u591a\u6a21\u6001\u65b9\u6cd5\u7684\u7b80\u7ea6\u6027\uff0c\u672c\u5de5\u4f5c\u8bd5\u56fe\u4ec5\u4f7f\u7528\u4e00\u4e2a\u6a21\u578b\u6765\u5904\u7406\u6bcf\u79cd\u6a21\u6001\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u591a\u6a21\u6001\u751f\u6210\u5d4c\u5165\u6a21\u578b\uff08MM-GEM\uff09\uff0c\u5b83\u5c06\u751f\u6210\u548c\u5d4c\u5165\u76ee\u6807\u6574\u5408\u5230\u4e00\u4e2a\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4e2d\u3002\u540c\u65f6\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86PoolAggregator\uff0c\u4ee5\u63d0\u9ad8\u6548\u7387\u5e76\u5b9e\u73b0\u7ec6\u7c92\u5ea6\u7684\u5d4c\u5165\u548
c\u751f\u6210\u80fd\u529b\u3002 \u4ee4\u4eba\u60ca\u8bb6\u7684\u662f\uff0c\u8fd9\u4e24\u4e2a\u76ee\u6807\u4e4b\u95f4\u5e76\u6ca1\u6709\u663e\u8457\u51b2\u7a81\u3002\u4f8b\u5982\uff0c\u57fa\u4e8eViT-Large\u548cTinyLlama\u7684MM-GEM\u5728\u8bf8\u5982\u8de8\u6a21\u6001\u68c0\u7d22\u548c\u96f6\u6837\u672c\u5206\u7c7b\u7b49\u591a\u6a21\u6001\u5d4c\u5165\u6a21\u578b\u57fa\u51c6\u4e0a\u8868\u73b0\u51fa\u826f\u597d\u7684\u6027\u80fd\uff0c\u540c\u65f6\u5177\u5907\u826f\u597d\u7684\u56fe\u50cf\u63cf\u8ff0\u80fd\u529b\u3002\u6b64\u5916\uff0cMM-GEM\u80fd\u591f\u65e0\u7f1d\u6267\u884c\u533a\u57df\u7ea7\u522b\u7684\u56fe\u50cf\u63cf\u8ff0\u751f\u6210\u548c\u68c0\u7d22\u4efb\u52a1\u3002\u53e6\u5916\uff0cMM-GEM\u4e2d\u7684\u5148\u8fdb\u6587\u672c\u6a21\u578b\u5bf9\u4e8e\u957f\u6587\u672c\u548c\u56fe\u50cf\u68c0\u7d22\u7684Recall@1\u6307\u6807\u5e26\u6765\u4e86\u8d85\u8fc75%\u7684\u63d0\u5347\u3002|\n", "2405.19332": "|**2024-05-29**|**Self-Exploring Language Models: Active Preference Elicitation for Online Alignment**|Shenao Zhang et.al.|[2405.19332](http://arxiv.org/abs/2405.19332)|**[link](https://github.com/shenao-zhang/selm)**|****\u6458\u8981\uff1a** 
\u504f\u597d\u4f18\u5316\uff0c\u7279\u522b\u662f\u5728\u4eba\u7c7b\u53cd\u9988\u5f3a\u5316\u5b66\u4e60\uff08RLHF\uff09\u7684\u9a71\u52a8\u4e0b\uff0c\u5df2\u7ecf\u5728\u4f7f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u9075\u5faa\u4eba\u7c7b\u610f\u613f\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u6210\u5c31\u3002\u76f8\u8f83\u4e8e\u4f7f\u7528\u56fa\u5b9a\u6570\u636e\u96c6\u7684\u79bb\u7ebf\u5bf9\u9f50\uff0c\u901a\u8fc7\u4eba\u6216\u4eba\u5de5\u667a\u80fd\u5bf9\u6a21\u578b\u751f\u6210\u7684\u53cd\u9988\u901a\u5e38\u80fd\u591f\u901a\u8fc7\u8fed\u4ee3\u8fc7\u7a0b\u63d0\u5347\u5956\u52b1\u6a21\u578b\u7684\u80fd\u529b\u548cLLMs\u7684\u4e00\u81f4\u6027\u3002\u7136\u800c\uff0c\u8981\u5b9e\u73b0\u5168\u5c40\u51c6\u786e\u7684\u5956\u52b1\u6a21\u578b\uff0c\u9700\u8981\u7cfb\u7edf\u5730\u63a2\u7d22\u751f\u6210\u5404\u79cd\u5404\u6837\u7684\u54cd\u5e94\uff0c\u4ee5\u6db5\u76d6\u81ea\u7136\u8bed\u8a00\u7684\u5e7f\u9614\u7a7a\u95f4\u3002\u4ec5\u4f9d\u8d56\u6807\u51c6\u5956\u52b1\u6700\u5927\u5316LLMs\u7684\u968f\u673a\u91c7\u6837\u662f\u4e0d\u8db3\u4ee5\u6ee1\u8db3\u8fd9\u4e00\u9700\u6c42\u7684\u3002 \u4e3a\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u53cc\u5c42\u76ee\u6807\uff0c\u4e50\u89c2\u5730\u503e\u5411\u4e8e\u53ef\u80fd\u5177\u6709\u9ad8\u5956\u52b1\u7684\u54cd\u5e94\uff0c\u4ee5\u6b64\u6765\u4e3b\u52a8\u63a2\u7d22\u5206\u5e03\u5916\u533a\u57df\u3002\u901a\u8fc7\u89e3\u51b3\u5185\u5c42\u95ee\u9898\uff0c\u5229\u7528\u91cd\u65b0\u53c2\u6570\u5316\u7684\u5956\u52b1\u51fd\u6570\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u540d\u4e3aSelf-Exploring Language 
Models\uff08SELM\uff09\u7684\u7b97\u6cd5\u3002\u5b83\u6d88\u9664\u4e86\u5bf9\u5355\u72ec\u5956\u52b1\u6a21\u578b\uff08RM\uff09\u7684\u9700\u6c42\uff0c\u5e76\u901a\u8fc7\u4e00\u4e2a\u76f4\u89c2\u7684\u76ee\u6807\u5bf9LLMs\u8fdb\u884c\u8fed\u4ee3\u66f4\u65b0\u3002\u4e0e\u76f4\u63a5\u504f\u597d\u4f18\u5316\uff08DPO\uff09\u76f8\u6bd4\uff0cSELM\u7684\u76ee\u6807\u964d\u4f4e\u4e86\u5bf9\u672a\u89c1\u8fc7\u7684\u8fc7\u5ea6\u5ef6\u4f38\u7684\u65e0\u5dee\u522b\u504f\u597d\uff0c\u63d0\u9ad8\u4e86\u63a2\u7d22\u6548\u7387\u3002 \u6211\u4eec\u7684\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u5728Zephyr-7B-SFT\u548cLlama-3-8B-Instruct\u6a21\u578b\u4e0a\u8fdb\u884c\u5fae\u8c03\u540e\uff0cSELM\u5728MT-Bench\u548cAlpacaEval 2.0\u7b49\u6307\u4ee4\u8ddf\u968f\u57fa\u51c6\u4ee5\u53ca\u4e0d\u540c\u8bbe\u7f6e\u4e0b\u7684\u5404\u79cd\u6807\u51c6\u5b66\u672f\u57fa\u51c6\u4e0a\u8868\u73b0\u51fa\u663e\u8457\u7684\u6027\u80fd\u63d0\u5347\u3002\u6211\u4eec\u7684\u4ee3\u7801\u548c\u6a21\u578b\u5df2\u53ef\u5728\u83b7\u53d6\u3002**|\n", "2405.19328": "|**2024-05-29**|**Normative Modules: A Generative Agent Architecture for Learning Norms that Supports Multi-Agent Cooperation**|Atrisha Sarkar 
et.al.|[2405.19328](http://arxiv.org/abs/2405.19328)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201c\u89c4\u8303\u6a21\u5757\u201d\u7684\u67b6\u6784\uff0c\u5b83\u9488\u5bf9\u751f\u6210\u6027\u4ee3\u7406\u5728\u9762\u5bf9\u5305\u542b\u73b0\u6709\u89c4\u8303\u7684\u793e\u4f1a\u7ed3\u6784\u65f6\u7684\u534f\u4f5c\u6311\u6218\u3002\u8fd9\u4e9b\u4ee3\u7406\u901a\u8fc7\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7406\u89e3\u548c\u8bc4\u4f30\u73af\u5883\uff0c\u4f46\u5728\u5904\u7406\u590d\u6742\u793e\u4f1a\u4efb\u52a1\u65f6\uff0c\u5982\u4f55\u8bc6\u522b\u5e76\u9002\u5e94\u89c4\u8303\u57fa\u7840\u8bbe\u65bd\u6210\u4e3a\u5173\u952e\u95ee\u9898\u3002\u89c4\u8303\u6a21\u5757\u7684\u6838\u5fc3\u5728\u4e8e\u4fc3\u8fdb\u5747\u8861\u9009\u62e9\uff0c\u501f\u9274\u5206\u7c7b\u673a\u6784\u5b9e\u73b0\u76f8\u5173\u5747\u8861\u7684\u6982\u5ff5\uff0c\u4f7f\u4ee3\u7406\u80fd\u591f\u901a\u8fc7\u540c\u4f34\u4e92\u52a8\u5b66\u4e60\u73af\u5883\u4e2d\u4e0d\u540c\u5019\u9009\u673a\u6784\u4e2d\u7684\u6743\u5a01\u6027\u3002\u901a\u8fc7\u63d0\u5347\u89c4\u8303\u80fd\u529b\uff0c\u4ee3\u7406\u53ef\u4ee5\u534f\u8c03\u5236\u88c1\u884c\u4e3a\uff0c\u8fdb\u800c\u5f71\u54cd\u793e\u4ea4\u73af\u5883\u4e2d\u7684\u57fa\u672c\u884c\u4e3a\uff0c\u4ece\u800c\u63d0\u9ad8\u6574\u4f53\u798f\u7949\u3002 
\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u652f\u6301\u673a\u6784\u7684\u65b0\u73af\u5883\uff0c\u5e76\u6839\u636e\u4e24\u4e2a\u4e3b\u8981\u6807\u51c6\u6765\u8bc4\u4f30\u8be5\u6846\u67b6\uff1a\u4e00\u662f\u4ee3\u7406\u80fd\u5426\u5ffd\u7565\u975e\u6743\u5a01\u673a\u6784\uff0c\u4e8c\u662f\u4ee3\u7406\u5728\u591a\u4e2a\u9009\u9879\u4e2d\u8bc6\u522b\u6743\u5a01\u673a\u6784\u7684\u80fd\u529b\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u914d\u5907\u4e86\u89c4\u8303\u6a21\u5757\u7684\u4ee3\u7406\u76f8\u6bd4\u57fa\u7840\u4ee3\u7406\u80fd\u5b9e\u73b0\u66f4\u7a33\u5b9a\u7684\u5408\u4f5c\u6548\u679c\uff0c\u8fd9\u4e3a\u7814\u7a76\u8bbe\u8ba1\u8003\u8651\u89c4\u8303\u57fa\u7840\u8bbe\u65bd\u7684\u73af\u5883\u548c\u4ee3\u7406\u5f00\u8f9f\u4e86\u65b0\u9014\u5f84\u3002|\n", "2405.19327": "|**2024-05-29**|**MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series**|Ge Zhang et.al.|[2405.19327](http://arxiv.org/abs/2405.19327)|**[link](https://github.com/multimodal-art-projection/map-neo)**|\u8fd1\u5e74\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5404\u79cd\u4efb\u52a1\u4e0a\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\u3002\u7136\u800c\uff0c\u51fa\u4e8e\u5546\u4e1a\u5229\u76ca\uff0c\u50cfGPT\u3001Gemini\u548cClaude\u8fd9\u6837\u7684\u6700\u5148\u8fdb\u6a21\u578b\u88ab\u5c01\u95ed\u5728\u4e13\u6709\u63a5\u53e3\u540e\uff0c\u5176\u8bad\u7ec3\u8be6\u60c5\u5e76\u672a\u516c\u5f00\u3002\u8fd1\u671f\uff0c\u4e00\u4e9b\u673a\u6784\u5f00\u6e90\u4e86\u7c7b\u4f3c\u6027\u80fd\u7684LLMs\uff0c\u5982LLaMA-3\uff0c\u4f46\u5927\u591a\u6570\u7ec6\u8282\uff08\u5982\u4e2d\u95f4\u68c0\u67e5\u70b9\u3001\u9884\u8bad\u7ec3\u8bed\u6599\u5e93\u548c\u8bad\u7ec3\u4ee3\u7801\u7b49\uff09\u4ecd\u672a\u62ab\u9732\u3002\u4e3a\u4e86\u63d0\u9ad8LLMs\u7684\u900f\u660e\u5ea6\uff0c\u7814\u7a76\u754c\u6b63\u5728\u63a8\u52a8\u771f\u6b63\u5f00\u653e\u7684\u6a21\u578b\uff0c\u5982Pythia\u3001Amber\u548cOLMo\uff0c\u8fd9\u4e9b\u6a21\u578b\u63d0\u4f9b\u4e86\u66f4\u591a\u7684\u4fe1\
u606f\uff0c\u4fc3\u8fdb\u4e86\u5bf9\u5927\u6a21\u578b\u6027\u80fd\u3001\u5c40\u9650\u6027\u3001\u504f\u89c1\u548c\u98ce\u9669\u7684\u79d1\u5b66\u7814\u7a76\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u5f00\u653e\u6a21\u578b\u5728\u63a8\u7406\u3001\u77e5\u8bc6\u548c\u7f16\u7a0b\u4efb\u52a1\u4e0a\u7684\u8868\u73b0\u4ecd\u900a\u4e8e\u540c\u7b49\u89c4\u6a21\u7684\u5c01\u95ed\u6e90\u7801\u6a21\u578b\u3002 \u56e0\u6b64\uff0c\u6211\u4eec\u5f00\u6e90\u4e86MAP-Neo\uff0c\u4e00\u4e2a\u62e5\u670970\u4ebf\u53c2\u6570\u7684\u53cc\u8bed\u8bed\u8a00\u6a21\u578b\uff0c\u4ece\u5934\u5f00\u59cb\u57284.5\u4e07\u4ebf\u9ad8\u8d28\u91cf\u4ee4\u724c\u4e0a\u8fdb\u884c\u8bad\u7ec3\u3002MAP-Neo\u662f\u9996\u4e2a\u4e0e\u73b0\u6709\u9876\u7ea7LLMs\u6027\u80fd\u76f8\u5f53\u7684\u5b8c\u5168\u5f00\u6e90\u7684\u53cc\u8bed\u6a21\u578b\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u516c\u5f00\u4e86\u6240\u6709\u7ec6\u8282\uff0c\u5305\u62ec\u6e05\u7406\u540e\u7684\u9884\u8bad\u7ec3\u8bed\u6599\u5e93\u3001\u6570\u636e\u6e05\u6d17\u6d41\u7a0b\u3001\u68c0\u67e5\u70b9\u4ee5\u53ca\u4f18\u5316\u7684\u8bad\u7ec3\u548c\u8bc4\u4f30\u6846\u67b6\uff0c\u4ee5\u4f9b\u91cd\u73b0\u3002\u6211\u4eec\u671f\u671bMAP-Neo\u80fd\u63a8\u52a8\u5f00\u653e\u7814\u7a76\u793e\u533a\u7684\u53d1\u5c55\uff0c\u6fc0\u53d1\u66f4\u591a\u521b\u65b0\uff0c\u4fc3\u8fdbLLMs\u7684\u8fdb\u4e00\u6b65\u63d0\u5347\u3002|\n", "2405.19326": "|**2024-05-29**|**Reasoning3D -- Grounding and Reasoning in 3D: Fine-Grained Zero-Shot Open-Vocabulary 3D Reasoning Part Segmentation via Large Vision-Language Models**|Tianrun Chen 
et.al.|[2405.19326](http://arxiv.org/abs/2405.19326)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u9879\u65b0\u7684\u4efb\u52a1\uff1a\u96f6\u6837\u672c3D\u63a8\u7406\u5206\u5272\uff0c\u76ee\u6807\u662f\u9488\u5bf9\u7269\u4f53\u7684\u90e8\u4ef6\u641c\u7d22\u548c\u5b9a\u4f4d\uff0c\u8fd9\u662f\u4e00\u79cd\u8d85\u8d8a\u4e86\u5148\u524d\u7c7b\u522b\u7279\u5b9a\u76843D\u8bed\u4e49\u5206\u5272\u30013D\u5b9e\u4f8b\u5206\u5272\u548c\u5f00\u653e\u8bcd\u6c473D\u5206\u5272\u5c40\u9650\u7684\u65b0\u8303\u5f0f\u3002\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u540d\u4e3aReasoning3D\u7684\u7b80\u5355\u57fa\u7ebf\u65b9\u6cd5\uff0c\u5b83\u80fd\u591f\u7406\u89e3\u548c\u6267\u884c\u590d\u6742\u7684\u547d\u4ee4\uff0c\u5bf93D\u7f51\u683c\u8fdb\u884c\uff08\u7ec6\u81f4\uff09\u90e8\u5206\u5206\u5272\uff0c\u540c\u65f6\u5177\u5907\u4e0a\u4e0b\u6587\u611f\u77e5\u548c\u63a8\u7406\u7b54\u6848\u7684\u4ea4\u4e92\u5f0f\u5206\u5272\u80fd\u529b\u3002\u7279\u522b\u5730\uff0cReasoning3D\u5229\u7528\u9884\u8bad\u7ec3\u76842D\u5206\u5272\u7f51\u7edc\uff0c\u8be5\u7f51\u7edc\u7531\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u9a71\u52a8\uff0c\u5728\u96f6\u6837\u672c\u60c5\u51b5\u4e0b\u89e3\u6790\u7528\u6237\u8f93\u5165\u67e5\u8be2\u3002\u5df2\u6709\u7814\u7a76\u8868\u660e\uff0c\u5927\u89c4\u6a21\u9884\u8bad\u7ec3\u8d4b\u4e88\u57fa\u7840\u6a21\u578b\u4e16\u754c\u77e5\u8bc6\u7684\u5148\u9a8c\uff0c\u4f7f\u5176\u80fd\u591f\u7406\u89e3\u590d\u6742\u6307\u4ee4\uff0c\u8fd9\u4f7f\u5f97\u6211\u4eec\u5728\u4f9d\u8d56\u6709\u96503D\u6570\u636e\u96c6\u7684\u60c5\u51b5\u4e0b\u4e5f\u80fd\u201c\u5206\u5272\u4efb\u4f55\u4e1c\u897f\u201d\uff08\u6e90\u6548\u7387\u9ad8\uff09\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5177\u6709\u6cdb\u5316\u6027\uff0c\u80fd\u6709\u6548\u6839\u636e\u9690\u6027\u6587\u672c\u67e5\u8be2\u57283D\u5bf9\u8c61\uff083D\u7f51\u683c\uff09\u4e2d\u5b9a\u4f4d\u548c\u7a81\u51fa\u663e\u793a\u90e8\u5206\uff0c\u5305\u62ec\u53ef\u52a83D\u5bf9\u8c61\u548c\u771f\u5b9e\u4e16\u754c\u7684
\u626b\u63cf\u6570\u636e\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u65e0\u76d1\u7763\u65b9\u6cd5\u4fbf\u4e8e\u5feb\u901f\u90e8\u7f72\uff0c\u5e76\u4e3a\u672a\u67653D\uff08\u8bed\u4e49\uff09\u5bf9\u8c61\u7406\u89e3\u9886\u57df\u7684\u7814\u7a76\uff0c\u5982\u673a\u5668\u4eba\u3001\u7269\u4f53\u64cd\u4f5c\u3001\u90e8\u4ef6\u7ec4\u88c5\u3001\u81ea\u52a8\u9a7e\u9a76\u5e94\u7528\u3001\u589e\u5f3a\u73b0\u5b9e\u548c\u865a\u62df\u73b0\u5b9e\uff08AR/VR\uff09\u3001\u4ee5\u53ca\u533b\u7597\u5e94\u7528\uff0c\u63d0\u4f9b\u4e86\u4e00\u4e2a\u53ef\u884c\u7684\u901a\u7528\u57fa\u51c6\u3002\u4ee3\u7801\u3001\u6a21\u578b\u6743\u91cd\u3001\u90e8\u7f72\u6307\u5357\u548c\u8bc4\u4f30\u534f\u8bae\u53ef\u5728\u4ee5\u4e0b\u94fe\u63a5\u83b7\u53d6\uff1ahttp://tianrun-chen.github.io/Reason3D/\u3002|\n", "2405.19325": "|**2024-05-29**|**Nearest Neighbor Speculative Decoding for LLM Generation and Attribution**|Minghan Li et.al.|[2405.19325](http://arxiv.org/abs/2405.19325)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5e38\u5e38\u4f1a\u4ea7\u751f\u865a\u6784\u5185\u5bb9\u4e14\u7f3a\u4e4f\u5bf9\u751f\u6210\u6587\u672c\u7684\u6765\u6e90\u6807\u6ce8\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u534a\u53c2\u6570\u5316\u8bed\u8a00\u6a21\u578b\u5982kNN-LM\u901a\u8fc7\u5728\u975e\u53c2\u6570\u6570\u636e\u5b58\u50a8\u4e2d\u5bfb\u627e\u4e0e\u7ed9\u5b9a\u63d0\u793a\u6700\u63a5\u8fd1\u7684\u90bb\u5c45\u6765\u6539\u8fdbLM\u8f93\u51fa\u3002\u7136\u800c\uff0c\u8fd9\u7c7b\u6a21\u578b\u7684\u63a8\u7406\u901f\u5ea6\u901a\u5e38\u8f83\u6162\uff0c\u751f\u6210\u7684\u6587\u672c\u6d41\u7545\u5ea6\u4e0d\u9ad8\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u534a\u53c2\u6570\u5316\u8bed\u8a00\u5efa\u6a21\u65b9\u6cd5\u2014\u2014Nearest Neighbor Speculative 
Decoding\uff08NEST\uff09\uff0c\u5b83\u80fd\u591f\u5c06\u73b0\u5b9e\u4e16\u754c\u4e2d\u7684\u4efb\u610f\u957f\u5ea6\u6587\u672c\u7247\u6bb5\u878d\u5165\u751f\u6210\u8fc7\u7a0b\uff0c\u5e76\u63d0\u4f9b\u5176\u6e90\u5934\u7684\u6807\u6ce8\u3002NEST\u5728\u6bcf\u6b21\u63a8\u7406\u6b65\u9aa4\u4e2d\u8fdb\u884c\u57fa\u4e8e\u4ee4\u724c\u7684\u68c0\u7d22\uff0c\u8ba1\u7b97\u51fa\u4e00\u4e2a\u534a\u53c2\u6570\u6df7\u5408\u5206\u5e03\uff0c\u5e76\u4ece\u8bed\u6599\u5e93\u4e2d\u8bc6\u522b\u51fa\u53ef\u80fd\u7684\u8fde\u7eed\u6587\u672c\u6bb5\u843d\u6269\u5c55\u3002\u5b83\u91c7\u7528\u4e00\u79cd\u8fd1\u4f3c\u63a8\u6d4b\u89e3\u7801\u7b56\u7565\uff0c\u63a5\u53d7\u68c0\u7d22\u5230\u7684\u7247\u6bb5\u524d\u7f00\u6216\u751f\u6210\u65b0\u7684\u4ee4\u724c\u3002NEST\u663e\u8457\u63d0\u9ad8\u4e86\u57fa\u7840LM\u5728\u5404\u79cd\u77e5\u8bc6\u5bc6\u96c6\u578b\u4efb\u52a1\u4e2d\u7684\u751f\u6210\u8d28\u91cf\u548c\u6765\u6e90\u6807\u6ce8\u7387\uff0c\u8d85\u8d8a\u4e86\u4f20\u7edf\u7684kNN-LM\u65b9\u6cd5\uff0c\u5e76\u5728\u57fa\u4e8e\u4e0a\u4e0b\u6587\u7684\u68c0\u7d22\u589e\u5f3a\u65b9\u9762\u8868\u73b0\u51fa\u7ade\u4e89\u529b\u3002\u6b64\u5916\uff0cNEST\u5927\u5e45\u63d0\u5347\u4e86\u751f\u6210\u901f\u5ea6\uff0c\u5f53\u5e94\u7528\u4e8eLlama-2-Chat 70B\u65f6\uff0c\u63a8\u7406\u65f6\u95f4\u63d0\u9ad8\u4e861.8\u500d\u3002|\n", "2405.19323": "|**2024-05-29**|**Are Large Language Models Chameleons?**|Mingmeng Geng 
et.al.|[2405.19323](http://arxiv.org/abs/2405.19323)|null|\u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u662f\u5426\u62e5\u6709\u81ea\u5df1\u7684\u4e16\u754c\u89c2\u548c\u4eba\u683c\u503e\u5411\uff1f\u7814\u7a76\u4eba\u5458\u8fdb\u884c\u4e86\u8d85\u8fc7\u4e00\u767e\u4e07\u6b21\u7684\u5b9e\u9a8c\uff0c\u8ba9LLMs\u56de\u7b54\u4e3b\u89c2\u95ee\u9898\u3002\u901a\u8fc7\u5c06\u8fd9\u4e9b\u6a21\u578b\u7684\u54cd\u5e94\u4e0e\u6b27\u6d32\u793e\u4f1a\u8c03\u67e5\uff08ESS\uff09\u7684\u5b9e\u9645\u6570\u636e\u8fdb\u884c\u6bd4\u8f83\uff0c\u7ed3\u679c\u663e\u793a\u63d0\u793a\u5bf9\u504f\u89c1\u548c\u53d8\u5f02\u6027\u6709\u663e\u8457\u5f71\u54cd\uff0c\u63ed\u793a\u4e86\u91cd\u5927\u7684\u6587\u5316\u3001\u5e74\u9f84\u548c\u6027\u522b\u504f\u5dee\u3002\u6587\u4e2d\u8ba8\u8bba\u4e86\u8bc4\u4f30LLMs\u4e0e\u8c03\u67e5\u6570\u636e\u5dee\u5f02\u7684\u65b9\u6cd5\uff0c\u5982\u8ba1\u7b97\u52a0\u6743\u5e73\u5747\u503c\u4ee5\u53ca\u4e00\u4e2a\u65b0\u63d0\u51fa\u7684\u57fa\u4e8eJaccard\u76f8\u4f3c\u6027\u7684\u6d4b\u91cf\u6307\u6807\u3002\u7814\u7a76\u8005\u5f3a\u8c03\uff0c\u5728\u5229\u7528LLMs\u6a21\u62df\u4e2a\u4f53\u51b3\u7b56\u6216\u96c6\u4f53\u884c\u4e3a\u4e4b\u524d\uff0c\u5206\u6790\u63d0\u793a\u7684\u7a33\u5065\u6027\u548c\u53d8\u5f02\u6027\u81f3\u5173\u91cd\u8981\uff0c\u56e0\u4e3a\u5b83\u4eec\u7684\u6a21\u4eff\u80fd\u529b\u5145\u5176\u91cf\u53ea\u80fd\u8bf4\u662f\u8fd1\u4f3c\u7684\u3002|\n", "2405.19320": "|**2024-05-29**|**Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF**|Shicong Cen et.al.|[2405.19320](http://arxiv.org/abs/2405.19320)|null|**\u6458\u8981\uff1a** 
\u5f3a\u5316\u5b66\u4e60\u4ece\u4eba\u7c7b\u53cd\u9988\uff08RLHF\uff09\u5728\u8c03\u6574\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4ee5\u7b26\u5408\u4eba\u7c7b\u504f\u597d\u65b9\u9762\u5c55\u73b0\u51fa\u5de8\u5927\u6f5c\u529b\u3002\u5728\u7ebf\u548c\u79bb\u7ebfRLHF\u90fd\u5904\u4e8e\u6d3b\u8dc3\u7684\u7814\u7a76\u9636\u6bb5\uff0c\u4f46\u5173\u952e\u6311\u6218\u4e4b\u4e00\u662f\u5982\u4f55\u5728\u5904\u7406\u4ece\u504f\u597d\u6570\u636e\u4e2d\u5b66\u4e60\u7684\u5956\u52b1\u51fd\u6570\u4e0d\u786e\u5b9a\u6027\u65f6\u3002\u5c3d\u7ba1\u6807\u51c6\u5f3a\u5316\u5b66\u4e60\uff08RL\uff09\u4e2d\u4e50\u89c2\u4e3b\u4e49\u6216\u60b2\u89c2\u4e3b\u4e49\u7684\u539f\u5219\u5df2\u5e7f\u4e3a\u4eba\u77e5\uff0c\u4f46\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4e2d\u5b9e\u73b0\u65e2\u5b9e\u7528\u53c8\u57fa\u4e8e\u7406\u8bba\u7684\u65b9\u6cd5\u5c1a\u4e0d\u6210\u719f\uff0c\u56e0\u4e3a\u6784\u5efa\u7f6e\u4fe1\u533a\u95f4\u7684\u6807\u51c6\u6280\u672f\u5728\u5904\u7406\u4efb\u610f\u7b56\u7565\u53c2\u6570\u5316\u65f6\u53d8\u5f97\u96be\u4ee5\u5904\u7406\u3002 
\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u7edf\u4e00\u7684\u5728\u7ebf\u548c\u79bb\u7ebfRLHF\u65b9\u6cd5\u2014\u2014\u4ef7\u503c\u6fc0\u52b1\u7684\u504f\u597d\u4f18\u5316\uff08VPO\uff09\u3002VPO\u901a\u8fc7\u5728\u6700\u5927\u4f3c\u7136\u4f30\u8ba1\u7684\u5956\u52b1\u51fd\u6570\u4e2d\u6dfb\u52a0\u76f8\u5e94\u7684\u503c\u51fd\u6570\u7684\u6b63\u5219\u5316\uff0c\u4ee5\u6307\u793a\u9009\u62e9\u4e50\u89c2\u4e3b\u4e49\u8fd8\u662f\u60b2\u89c2\u4e3b\u4e49\uff0c\u5b9e\u73b0\u4e86\u8fd9\u4e00\u76ee\u6807\u3002\u6b64\u5916\uff0cVPO\u76f4\u63a5\u4f18\u5316\u7b56\u7565\uff0c\u5e76\u5229\u7528\u9690\u5f0f\u5956\u52b1\u5efa\u6a21\uff0c\u56e0\u6b64\u5176RLHF\u7ba1\u9053\u4e0e\u76f4\u63a5\u504f\u597d\u4f18\u5316\u66f4\u4e3a\u7b80\u5355\u3002\u5bf9\u4e8e\u5728\u7ebf\u548c\u79bb\u7ebf\u8bbe\u7f6e\uff0cVPO\u63d0\u4f9b\u4e86\u7406\u8bba\u4fdd\u8bc1\uff0c\u5176\u6536\u655b\u901f\u5ea6\u4e0e\u6807\u51c6RL\u76f8\u5f53\u3002\u5b9e\u9a8c\u5728\u6587\u672c\u6458\u8981\u548c\u5bf9\u8bdd\u4efb\u52a1\u4e0a\u9a8c\u8bc1\u4e86VPO\u7684\u5b9e\u7528\u6027\u4e0e\u6709\u6548\u6027\u3002|\n", "2405.20340": "|**2024-05-30**|**MotionLLM: Understanding Human Behaviors from Human Motions and Videos**|Ling-Hao Chen 
et.al.|[2405.20340](http://arxiv.org/abs/2405.20340)|**[link](https://github.com/IDEA-Research/MotionLLM)**|\u8fd9\u9879\u7814\u7a76\u5173\u6ce8\u4e8e\u591a\u6a21\u6001\uff08\u89c6\u9891\u548c\u52a8\u4f5c\u6a21\u6001\uff09\u4e0b\u7684\u4eba\u7c7b\u884c\u4e3a\u7406\u89e3\uff0c\u901a\u8fc7\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5f3a\u5927\u529f\u80fd\u3002\u4e0e\u4e13\u4e3a\u5355\u6a21\u6001\uff08\u89c6\u9891\u6216\u52a8\u4f5c\uff09\u8bbe\u8ba1\u7684\u6700\u65b0LLMs\u4e0d\u540c\uff0c\u6211\u4eec\u8ba4\u4e3a\u7406\u89e3\u4eba\u7c7b\u884c\u4e3a\u9700\u8981\u5bf9\u89c6\u9891\u548c\u52a8\u4f5c\u5e8f\u5217\uff08\u5982SMPL\u5e8f\u5217\uff09\u8fdb\u884c\u8054\u5408\u5efa\u6a21\uff0c\u4ee5\u6709\u6548\u6355\u6349\u7cbe\u7ec6\u7684\u8eab\u4f53\u90e8\u4f4d\u52a8\u6001\u548c\u8bed\u4e49\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51faMotionLLM\uff0c\u8fd9\u662f\u4e00\u4e2a\u7b80\u6d01\u800c\u6709\u6548\u7684\u6846\u67b6\uff0c\u7528\u4e8e\u4eba\u7c7b\u52a8\u4f5c\u7406\u89e3\u3001\u63cf\u8ff0\u548c\u63a8\u7406\u3002MotionLLM\u91c7\u7528\u4e86\u4e00\u4f53\u5316\u7684\u89c6\u9891-\u52a8\u4f5c\u8bad\u7ec3\u7b56\u7565\uff0c\u5229\u7528\u73b0\u6709\u7c97\u7c92\u5ea6\u7684\u89c6\u9891-\u6587\u672c\u6570\u636e\u548c\u7cbe\u7ec6\u52a8\u4f5c-\u6587\u672c\u6570\u636e\u7684\u4f18\u52bf\uff0c\u4ee5\u83b7\u53d6\u4e30\u5bcc\u7684\u7a7a\u95f4-\u65f6\u95f4\u6d1e\u5bdf\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u521b\u5efa\u4e86\u4e00\u4e2a\u5927\u89c4\u6a21\u7684MoVid\u6570\u636e\u96c6\uff0c\u5305\u542b\u4e86\u591a\u6837\u5316\u7684\u89c6\u9891\u3001\u52a8\u4f5c\u3001caption\u548c\u6307\u4ee4\u3002\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86MoVid-Bench\uff0c\u5b83\u5177\u6709\u7cbe\u5fc3\u7684\u624b\u52a8\u6807\u6ce8\uff0c\u4ee5\u66f4\u597d\u5730\u8bc4\u4f30\u5728\u89c6\u9891\u548c\u52a8\u4f5c\u4e0a\u7684\u4eba\u7c7b\u884c\u4e3a\u7406\u89e3\u80fd\u529b\u3002\u5b9e\u9a8c\u7ed3\u679c\u5145\u5206\u5c55\u793a\u4e86MotionLLM\u5728caption\u751f\u6210\u3001\u7a7a\u95f4-\u65f6\u95f4\u7406\u89e3
\u4ee5\u53ca\u63a8\u7406\u80fd\u529b\u65b9\u9762\u7684\u4f18\u8d8a\u6027\u3002|\n", "2405.20339": "|**2024-05-30**|**Visual Perception by Large Language Model's Weights**|Feipeng Ma et.al.|[2405.20339](http://arxiv.org/abs/2405.20339)|null|\u8fd9\u7bc7\u8bba\u6587\u7684\u80cc\u666f\u662f\u73b0\u6709\u7684\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u91c7\u7528\u4e86\u4e00\u79cd\u65b9\u6cd5\uff0c\u5373\u5c06\u89c6\u89c9\u4fe1\u606f\u4e0e\u8bed\u8a00\u6a21\u578b\u7684\u8f93\u5165\u7a7a\u95f4\u5bf9\u9f50\uff0c\u7136\u540e\u5c06\u89c6\u89c9\u4ee4\u724c\u4e0e\u6587\u672c\u4ee4\u724c\u5408\u5e76\uff0c\u5f62\u6210\u7edf\u4e00\u7684\u5e8f\u5217\u8f93\u5165\u7ed9\u8bed\u8a00\u6a21\u578b\u3002\u7136\u800c\uff0c\u8fd9\u79cd\u65b9\u6cd5\u7531\u4e8e\u589e\u52a0\u4e86\u7531\u89c6\u89c9\u4ee4\u724c\u5bfc\u81f4\u7684\u8f93\u5165\u5e8f\u5217\u957f\u5ea6\uff0c\u8ba1\u7b97\u6210\u672c\u8f83\u9ad8\u3002\u4e3a\u6b64\uff0c\u8bba\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u53c2\u6570\u7a7a\u95f4\u5bf9\u9f50\u8303\u5f0f\uff0c\u901a\u8fc7\u5c06\u89c6\u89c9\u4fe1\u606f\u8868\u793a\u4e3a\u6a21\u578b\u6743\u91cd\u6765\u5904\u7406\u3002\u5bf9\u4e8e\u6bcf\u4e2a\u8f93\u5165\u56fe\u50cf\uff0c\u9996\u5148\u4f7f\u7528\u89c6\u89c9\u7f16\u7801\u5668\u63d0\u53d6\u7279\u5f81\uff0c\u7136\u540e\u5c06\u8fd9\u4e9b\u7279\u5f81\u8f6c\u6362\u4e3a\u611f\u77e5\u6743\u91cd\uff0c\u5e76\u5c06\u5176\u4e0e\u8bed\u8a00\u6a21\u578b\u7684\u6743\u91cd\u878d\u5408\u3002\u8fd9\u6837\uff0c\u8bed\u8a00\u6a21\u578b\u7684\u8f93\u5165\u65e0\u9700\u89c6\u89c9\u4ee4\u724c\uff0c\u4ece\u800c\u7f29\u77ed\u4e86\u8f93\u5165\u5e8f\u5217\uff0c\u663e\u8457\u63d0\u9ad8\u4e86\u6548\u7387\u3002 
\u57fa\u4e8e\u8fd9\u4e00\u7406\u5ff5\uff0c\u8bba\u6587\u63d0\u51fa\u4e86VLoRA\u6a21\u578b\uff0c\u5176\u4e2d\u5305\u542b\u4e00\u4e2a\u611f\u77e5\u6743\u91cd\u751f\u6210\u5668\u3002\u8be5\u751f\u6210\u5668\u8bbe\u8ba1\u6210\u80fd\u591f\u5c06\u89c6\u89c9\u7279\u5f81\u8f6c\u5316\u4e3a\u5177\u6709\u4f4e\u79e9\u7279\u6027\u7684\u611f\u77e5\u6743\u91cd\uff0c\u7c7b\u4f3c\u4e8eLoRA\uff08\u4f4e\u79e9\u81ea\u9002\u5e94\u8bad\u7ec3\uff09\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u5c3d\u7ba1VLoRA\u5728\u591a\u79cd\u591a\u6a21\u6001\u4efb\u52a1\u7684\u57fa\u51c6\u4e0a\u8868\u73b0\u51fa\u4e0e\u73b0\u6709MLLMs\u76f8\u5f53\u7684\u6027\u80fd\uff0c\u4f46\u5176\u5728\u8bad\u7ec3\u548c\u63a8\u7406\u9636\u6bb5\u7684\u8ba1\u7b97\u6210\u672c\u663e\u8457\u964d\u4f4e\u3002\u8bba\u6587\u627f\u8bfa\u5f00\u6e90\u4ee3\u7801\u548c\u6a21\u578b\u3002|\n", "2405.20335": "|**2024-05-30**|**Xwin-LM: Strong and Scalable Alignment Practice for LLMs**|Bolin Ni et.al.|[2405.20335](http://arxiv.org/abs/2405.20335)|**[link](https://github.com/xwin-lm/xwin-lm)**|**\u672c\u6587\u4ecb\u7ecdXwin-LM\uff0c\u4e00\u4e2a\u4e13\u4e3a\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u8bbe\u8ba1\u7684\u5168\u9762\u5bf9\u9f50\u65b9\u6cd5\u5957\u4ef6\u3002\u5b83\u6db5\u76d6\u4e86\u76d1\u7763\u5fae\u8c03\uff08SFT\uff09\u3001\u5956\u52b1\u5efa\u6a21\uff08RM\uff09\u3001\u62d2\u7edd\u91c7\u6837\u5fae\u8c03\uff08RS\uff09\u548c\u76f4\u63a5\u504f\u597d\u4f18\u5316\uff08DPO\uff09\u7b49\u591a\u79cd\u5173\u952e\u6280\u672f\u3002\u4e3b\u8981\u7ec4\u6210\u90e8\u5206\u5305\u62ec\uff1a(1) \u4f7f\u7528\u9ad8\u8d28\u91cf\u6307\u4ee4\u6570\u636e\u8fdb\u884c\u521d\u59cb\u5fae\u8c03\u7684Xwin-LM-SFT\uff1b(2) \u7531GPT-4\u7cbe\u5fc3\u6807\u6ce8\u7684\u5927\u578b\u591a\u8f6e\u504f\u597d\u6570\u636e\u96c6Xwin-Pair\uff1b(3) \u57287B\u300113B\u548c70B\u53c2\u6570\u89c4\u6a21\u4e0a\u8bad\u7ec3\u7684Xwin-RM\u5956\u52b1\u6a21\u578b\uff1b(4) 
\u6bcf\u4e2a\u63d0\u793a\u5173\u805464\u4e2a\u72ec\u7279\u54cd\u5e94\u7684\u591awise\u504f\u597d\u6570\u636e\u96c6Xwin-Set\uff0c\u8fd9\u4e9b\u54cd\u5e94\u7531Xwin-LM-SFT\u751f\u6210\u5e76\u7531Xwin-RM\u8bc4\u5206\uff1b(5) \u4f7f\u7528Xwin-Set\u4e2d\u6700\u9ad8\u5f97\u5206\u54cd\u5e94\u8fdb\u884c\u5fae\u8c03\u7684Xwin-LM-RS\u6a21\u578b\uff1b(6) \u901a\u8fc7DPO\u7b97\u6cd5\u5728Xwin-Set\u4e0a\u8fdb\u4e00\u6b65\u4f18\u5316\u7684Xwin-LM-DPO\u6a21\u578b\u3002\u6211\u4eec\u5728AlpacaEval\u548cMT-bench\u4e0a\u7684\u8bc4\u4f30\u663e\u793a\u4e86\u6574\u4e2a\u7ba1\u9053\u7684\u7a33\u5b9a\u4e14\u663e\u8457\u6539\u8fdb\uff0c\u8bc1\u660e\u4e86Xwin-LM\u7684\u5f3a\u5927\u548c\u53ef\u6269\u5c55\u6027\u3002\u6211\u4eec\u5c06\u5728https://github.com/Xwin-LM/Xwin-LM\u7684\u4ed3\u5e93\u4e2d\u6301\u7eed\u66f4\u65b0\uff0c\u4ee5\u4fc3\u8fdb\u793e\u533a\u7814\u7a76\u3002**|\n", "2405.20319": "|**2024-05-31**|**ParSEL: Parameterized Shape Editing with Language**|Aditya Ganeshan et.al.|[2405.20319](http://arxiv.org/abs/2405.20319)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aParSEL\u7684\u7cfb\u7edf\uff0c\u5b83\u65e8\u5728\u901a\u8fc7\u81ea\u7136\u8bed\u8a00\u5b9e\u73b0\u9ad8\u8d28\u91cf3D\u8d44\u4ea7\u7684\u53ef\u63a7\u7f16\u8f91\u3002\u9762\u5bf9\u81ea\u7136\u8bed\u8a00\u5728\u7cbe\u786e\u64cd\u63a7\u4e0a\u7684\u5c40\u9650\u6027\uff0cParSEL\u63a5\u6536\u4e00\u4e2a\u5206\u5272\u76843D\u7f51\u683c\u548c\u7f16\u8f91\u8bf7\u6c42\uff0c\u751f\u6210\u4e00\u4e2a\u53c2\u6570\u5316\u7684\u7f16\u8f91\u7a0b\u5e8f\u3002\u7528\u6237\u53ef\u4ee5\u8c03\u6574\u7a0b\u5e8f\u53c2\u6570\uff0c\u7cbe\u7ec6\u5730\u63a2\u7d22\u5f62\u72b6\u53d8\u5316\uff0c\u63a7\u5236\u7f16\u8f91\u5e45\u5ea6\u3002\u7cfb\u7edf\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u6765\u7406\u89e3\u521d\u59cb\u7f16\u8f91\u6307\u4ee4\uff0c\u4f46\u53d1\u73b0\u5b83\u4eec\u5728\u63a8\u65ad\u5b8c\u6574\u7f16\u8f91\u7a0b\u5e8f\u65f6\u5e38\u5e38\u4e0d\u8db3\uff0c\u4ea7\u751f\u7684\u7ed3\u679c\u53ef\u80fd\u8fdd\u53cd
\u5f62\u72b6\u903b\u8f91\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u5206\u6790\u6027\u7f16\u8f91\u4f20\u64ad\uff08Analytical Edit Propagation\uff0cAEP\uff09\u7b97\u6cd5\uff0c\u5b83\u4ece\u521d\u59cb\u7f16\u8f91\u79cd\u5b50\u5f00\u59cb\uff0c\u901a\u8fc7\u8ba1\u7b97\u673a\u4ee3\u6570\u7cfb\u7edf\u8fdb\u884c\u51e0\u4f55\u5206\u6790\uff0c\u5bfb\u627e\u4e0e\u6f5c\u5728\u7528\u6237\u7f16\u8f91\u517c\u5bb9\u7684\u5206\u6790\u6027\u7f16\u8f91\u64cd\u4f5c\uff0c\u4ee5\u751f\u6210\u5b8c\u6574\u7684\u7f16\u8f91\u7a0b\u5e8f\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u76f8\u8f83\u4e8e\u5176\u4ed6\u65b9\u6848\uff0cParSEL\u901a\u8fc7\u81ea\u7136\u8bed\u8a00\u8bf7\u6c42\u6709\u6548\u5730\u5b9e\u73b0\u4e86\u5bf93D\u5bf9\u8c61\u7684\u53ef\u63a7\u7f16\u8f91\u3002|\n", "2405.20318": "|**2024-05-30**|**CausalQuest: Collecting Natural Causal Questions for AI Agents**|Roberto Ceraolo et.al.|[2405.20318](http://arxiv.org/abs/2405.20318)|**[link](https://github.com/roberto-ceraolo/causal-quest)**|**\u4eba\u7c7b\u5929\u751f\u5c31\u6709\u5bfb\u6c42\u56e0\u679c\u5173\u7cfb\u7684\u9a71\u52a8\u529b\uff0c\u65e0\u8bba\u662f\u51fa\u4e8e\u597d\u5947\u5fc3\u8fd8\u662f\u7279\u5b9a\u76ee\u6807\u3002\u4e3a\u4e86\u5f00\u53d1\u80fd\u5904\u7406\u8fd9\u79cd\u4eba\u7c7b\u672c\u6027\u8ffd\u6c42\u7684AI\u4ee3\u7406\uff0c\u6211\u4eec\u6025\u9700\u4e00\u4e2a\u5168\u9762\u7684\u81ea\u7136\u56e0\u679c\u95ee\u9898\u6570\u636e\u96c6\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u6570\u636e\u96c6\u8981\u4e48\u5305\u542b\u4eba\u5de5\u5236\u9020\u7684\u95ee\u9898\uff0c\u65e0\u6cd5\u53cd\u6620\u5b9e\u9645AI\u5e94\u7528\u573a\u666f\uff0c\u8981\u4e48\u5728\u7279\u5b9a\u6765\u6e90\u7684\u95ee\u9898\u8986\u76d6\u4e0a\u6709\u9650\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86CausalQuest\uff0c\u8fd9\u662f\u4e00\u4e2a\u6e90\u81ea\u793e\u4ea4\u7f51\u7edc\u3001\u641c\u7d22\u5f15\u64ce\u548cAI\u52a9\u624b\u768413,500\u4e2a\u81ea\u7136\u51fa\u73b0\u7684\u95ee\u9898\u7684\u6570\u636e\u96c6\u3002\u6211\u4eec\u5b9a\u4e49\u4e86
\u56e0\u679c\u95ee\u9898\uff0c\u5e76\u5efa\u7acb\u4e86\u66f4\u7ec6\u81f4\u7684\u5206\u7c7b\u4f53\u7cfb\u3002\u901a\u8fc7\u4eba\u7c7b\u6807\u6ce8\u5458\u548c\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u534f\u4f5c\uff0c\u6211\u4eec\u5bf9\u6570\u636e\u96c6\u8fdb\u884c\u4e86\u7cbe\u5fc3\u6807\u6ce8\u3002\u7814\u7a76\u53d1\u73b0\uff0c42%\u7684\u4eba\u7c7b\u63d0\u95ee\u5b9e\u9645\u4e0a\u662f\u5173\u4e8e\u56e0\u679c\u7684\uff0c\u5927\u90e8\u5206\u662f\u60f3\u4e86\u89e3\u7ed9\u5b9a\u7ed3\u679c\u80cc\u540e\u7684\u539f\u56e0\u3002\u5229\u7528\u8fd9\u4e2a\u6570\u636e\u96c6\uff0c\u6211\u4eec\u8bad\u7ec3\u4e86\u9ad8\u6548\u7684\u4e8c\u5206\u7c7b\u5668\uff08\u9ad8\u8fbe28.5\u4ebf\u53c2\u6570\uff09\uff0c\u7528\u4e8e\u8bc6\u522b\u56e0\u679c\u95ee\u9898\uff0c\u5b9e\u73b0\u4e86\u9ad8\u6027\u80fd\uff0cF1\u5206\u6570\u9ad8\u8fbe0.877\u3002\u6700\u540e\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u7cfb\u5217\u4e30\u5bcc\u7684\u672a\u6765\u7814\u7a76\u65b9\u5411\uff0c\u8fd9\u4e9b\u90fd\u53ef\u4ee5\u57fa\u4e8e\u6211\u4eec\u7684\u6570\u636e\u548c\u6a21\u578b\u8fdb\u884c\u6269\u5c55\u3002**|\n", "2405.20315": "|**2024-05-30**|**ANAH: Analytical Annotation of Hallucinations in Large Language Models**|Ziwei Ji et.al.|[2405.20315](http://arxiv.org/abs/2405.20315)|**[link](https://github.com/open-compass/anah)**|**
\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u201c\u5e7b\u89c9\u201d\u95ee\u9898\u5bf9\u4e8e\u5176\u5e7f\u6cdb\u5e94\u7528\u81f3\u5173\u91cd\u8981\u3002\u7136\u800c\uff0c\u5bf9\u8fd9\u4e00\u95ee\u9898\u7684\u7ec6\u81f4\u6d4b\u91cf\u5728\u793e\u533a\u4e2d\u5e76\u672a\u5f97\u5230\u5145\u5206\u63a2\u7d22\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u9879\u540d\u4e3a$\\textbf{ANAH}$\u7684\u53cc\u8bed\u6570\u636e\u96c6\uff0c\u4e13\u6ce8\u4e8e\u751f\u6210\u5f0f\u95ee\u7b54\u4e2d\u7684LLM\u5e7b\u89c9\u5206\u6790\u3002ANAH\u4e2d\u7684\u6bcf\u4e2a\u7b54\u6848\u53e5\u5b50\u90fd\u7ecf\u8fc7\u4e25\u8c28\u6807\u6ce8\uff0c\u5305\u62ec\u53c2\u8003\u7247\u6bb5\u68c0\u7d22\u3001\u5e7b\u89c9\u7c7b\u578b\u7684\u5224\u65ad\u4ee5\u53ca\u9519\u8bef\u5185\u5bb9\u7684\u4fee\u6b63\u3002\u8be5\u6570\u636e\u96c6\u5305\u542b\u7ea612,000\u4e2a\u53e5\u7ea7\u6ce8\u91ca\uff0c\u6db5\u76d6\u4e86\u5927\u7ea64,300\u4e2aLLM\u54cd\u5e94\uff0c\u6d89\u53ca\u8d85\u8fc7700\u4e2a\u4e3b\u9898\uff0c\u901a\u8fc7\u4eba\u673a\u4ea4\u4e92\u5f0f\u6d41\u7a0b\u6784\u5efa\u800c\u6210\u3002\u7531\u4e8e\u5e7b\u89c9\u6ce8\u91ca\u7684\u7cbe\u7ec6\u7c92\u5ea6\uff0c\u6211\u4eec\u53ef\u4ee5\u5b9a\u91cf\u786e\u8ba4LLMs\u7684\u5e7b\u89c9\u95ee\u9898\u968f\u7740\u7b54\u6848\u7684\u6269\u5c55\u800c\u9010\u6e10\u589e\u52a0\uff0c\u5e76\u5229\u7528ANAH\u6765\u8bad\u7ec3\u548c\u8bc4\u4f30\u5e7b\u89c9\u6807\u6ce8\u5668\u3002
\u6211\u4eec\u6784\u5efa\u4e86\u5927\u7ea612,000\u6761\u53e5\u5b50\u7ea7\u522b\u7684\u6ce8\u91ca\uff0c\u9488\u5bf9\u7ea64,300\u4e2aLLM\u751f\u6210\u7684\u56de\u7b54\uff0c\u6db5\u76d6\u4e86\u8d85\u8fc7700\u4e2a\u4e3b\u9898\u3002\u8fd9\u4e2a\u540d\u4e3aANAH\u7684\u6570\u636e\u96c6\u901a\u8fc7\u4eba\u7c7b\u53c2\u4e0e\u7684\u6d41\u7a0b\u7cbe\u5fc3\u8bbe\u8ba1\uff0c\u65e8\u5728\u63d0\u4f9b\u5173\u4e8e\u751f\u6210\u5f0f\u95ee\u7b54\u4e2dLLMs\u5e7b\u89c9\u7684\u8be6\u5c3d\u5206\u6790\u3002\u901a\u8fc7\u7ec6\u81f4\u7684\u5e7b\u89c9\u6807\u6ce8\uff0c\u6211\u4eec\u80fd\u591f\u91cf\u5316\u5730\u9a8c\u8bc1LLMs\u5728\u751f\u6210\u7b54\u6848\u65f6\u5e7b\u89c9\u95ee\u9898\u7684\u7d2f\u79ef\uff0c\u5e76\u5229\u7528ANAH\u6765\u8bad\u7ec3\u548c\u8bc4\u4f30\u5e7b\u89c9\u8bc6\u522b\u80fd\u529b\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u6df1\u5165\u7814\u7a76\u4e86\u751f\u6210\u5f0f\u548c\u533a\u5206\u6027\u6807\u6ce8\u5668\uff0c\u5e76\u53d1\u73b0\u5c3d\u7ba1\u5f00\u6e90LLMs\u5728\u7cbe\u7ec6\u5e7b\u89c9\u6807\u6ce8\u65b9\u9762\u9762\u4e34\u6311\u6218\uff0c\u4f46\u4f7f\u7528ANAH\u8bad\u7ec3\u7684\u751f\u6210\u5f0f\u6807\u6ce8\u5668\u80fd\u591f\u8d85\u8d8a\u6240\u6709\u5f00\u6e90\u6a21\u578b\uff0c\u751a\u81f3\u63a5\u8fd1GPT-3.5\u7684\u8868\u73b0\uff0c\u5e76\u5c55\u73b0\u51fa\u5728\u672a\u89c1\u8fc7\u95ee\u9898\u4e0a\u7684\u826f\u597d\u6cdb\u5316\u80fd\u529b\u3002**|\n", "2405.20313": "|**2024-05-30**|**Sequence-Augmented SE(3)-Flow Matching For Conditional Protein Backbone Generation**|Guillaume Huguet 
et.al.|[2405.20313](http://arxiv.org/abs/2405.20313)|null|\u86cb\u767d\u8d28\u5728\u51e0\u4e4e\u6240\u6709\u7684\u751f\u7269\u8fc7\u7a0b\u4e2d\u53d1\u6325\u5173\u952e\u4f5c\u7528\uff0c\u5176\u591a\u6837\u5316\u7684\u529f\u80fd\u6e90\u4e8e\u590d\u6742\u7684\u4e09\u7ef4\u7ed3\u6784\uff0c\u800c\u8fd9\u4e9b\u7ed3\u6784\u53c8\u7531\u6c28\u57fa\u9178\u5e8f\u5217\u51b3\u5b9a\u3002\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u5229\u7528\u6c28\u57fa\u9178\u5e8f\u5217\u4e30\u5bcc\u7684\u751f\u7269\u5b66\u5f52\u7eb3\u504f\u7f6e\uff0c\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u5e8f\u5217\u6761\u4ef6\u7684SE(3)\u7b49\u53d8\u6d41\u5339\u914d\u6a21\u578b\u2014\u2014FoldFlow-2\uff0c\u7528\u4e8e\u86cb\u767d\u8d28\u7ed3\u6784\u751f\u6210\u3002\u4e0eFoldFlow\u5bb6\u65cf\u7684\u5148\u524d\u6a21\u578b\u76f8\u6bd4\uff0cFoldFlow-2\u5f15\u5165\u4e86\u65b0\u9896\u7684\u67b6\u6784\u7279\u6027\uff0c\u5305\u62ec\u7528\u4e8e\u7f16\u7801\u5e8f\u5217\u7684\u86cb\u767d\u8d28\u5927\u8bed\u8a00\u6a21\u578b\u3001\u7ed3\u5408\u7ed3\u6784\u548c\u5e8f\u5217\u8868\u793a\u7684\u65b0\u591a\u6a21\u6001\u878d\u5408\u4e3b\u5e72\uff0c\u4ee5\u53ca\u57fa\u4e8e\u51e0\u4f55\u53d8\u6362\u5668\u7684\u89e3\u7801\u5668\u3002\u4e3a\u4e86\u589e\u52a0\u751f\u6210\u6837\u672c\u7684\u591a\u6837\u6027\u548c\u65b0\u9896\u6027\u2014\u2014\u8fd9\u5bf9\u65b0\u836f\u8bbe\u8ba1\u81f3\u5173\u91cd\u8981\u2014\u2014\u6211\u4eec\u5728\u6bd4\u5148\u524d\u5de5\u4f5c\u4f7f\u7528\u7684PDB\u6570\u636e\u96c6\u5927\u4e00\u4e2a\u6570\u91cf\u7ea7\u7684\u65b0\u6570\u636e\u96c6\u4e0a\u5927\u89c4\u6a21\u8bad\u7ec3FoldFlow-2\uff0c\u8be5\u6570\u636e\u96c6\u5305\u542b\u4e86\u5df2\u77e5\u7684PDB\u86cb\u767d\u8d28\u548c\u901a\u8fc7\u8fc7\u6ee4\u83b7\u5f97\u7684\u9ad8\u8d28\u91cf\u5408\u6210\u7ed3\u6784\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u5982\u4f55\u901a\u8fc7\u5f15\u5165\u5f3a\u5316\u5fae\u8c03\uff08Reinforced 
Finetuning\uff0c\u7b80\u79f0ReFT\uff09\u76ee\u6807\uff0c\u4f7fFoldFlow-2\u80fd\u591f\u9002\u5e94\u4efb\u610f\u5956\u52b1\uff0c\u5982\u63d0\u9ad8\u4e8c\u7ea7\u7ed3\u6784\u591a\u6837\u6027\u3002 \u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cFoldFlow-2\u8d85\u8d8a\u4e86\u73b0\u6709\u57fa\u4e8e\u86cb\u767d\u8d28\u7ed3\u6784\u7684\u751f\u6210\u6a21\u578b\u7684\u72b6\u6001\uff0c\u65e0\u8bba\u5728\u65e0\u6761\u4ef6\u751f\u6210\u8fd8\u662f\u5728\u8bbe\u8ba1\u6027\u3001\u591a\u6837\u6027\u548c\u65b0\u9896\u6027\u65b9\u9762\uff0c\u90fd\u4f18\u4e8eRFDiffusion\uff0c\u4e14\u5728\u86cb\u767d\u8d28\u957f\u5ea6\u7684\u5404\u7c7b\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u826f\u597d\u7684\u6cdb\u5316\u80fd\u529b\uff0c\u7279\u522b\u662f\u5728\u7b49\u6e29\u6784\u8c61\u91c7\u6837\u4efb\u52a1\u4e0a\u3002\u6700\u540e\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u4e00\u4e2a\u7ecf\u8fc7\u5fae\u8c03\u7684FoldFlow-2\u5728\u8bf8\u5982VHH\u7eb3\u7c73\u6297\u4f53\u9aa8\u67b6\u8bbe\u8ba1\u7b49\u5177\u6709\u6311\u6218\u6027\u7684\u6761\u4ef6\u8bbe\u8ba1\u4efb\u52a1\u4e0a\u53d6\u5f97\u4e86\u8fdb\u5c55\u3002|\n", "2405.20309": "|**2024-05-30**|**Large Language Models Can Self-Improve At Web Agent Tasks**|Ajay Patel 
et.al.|[2405.20309](http://arxiv.org/abs/2405.20309)|**[link](https://github.com/AjayP13/webdreamer)**|\u5728\u590d\u6742\u7684\u73af\u5883\u4e2d\uff0c\u5982\u7f51\u7edc\u6d4f\u89c8\u5668\uff0c\u8bad\u7ec3\u6a21\u578b\u4f5c\u4e3a\u80fd\u591f\u6709\u6548\u5bfc\u822a\u548c\u6267\u884c\u52a8\u4f5c\u7684\u4ee3\u7406\u901a\u5e38\u5177\u6709\u6311\u6218\u6027\uff0c\u4e3b\u8981\u53d7\u9650\u4e8e\u7f3a\u4e4f\u8bad\u7ec3\u6570\u636e\u3002\u8fd1\u5e74\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u663e\u793a\u51fa\u901a\u8fc7\u81ea\u7136\u8bed\u8a00\u63d0\u793a\u4ee5\u96f6\u6837\u672c\u6216\u5c11\u91cf\u6837\u672c\u6765\u5728\u65b0\u73af\u5883\u4e2d\u5bfc\u822a\u7684\u80fd\u529b\u3002\u7814\u7a76\u8fd8\u8868\u660e\uff0cLLMs\u53ef\u4ee5\u901a\u8fc7\u81ea\u6211\u6539\u8fdb\uff08\u5373\u5728\u5176\u81ea\u8eab\u751f\u6210\u7684\u6570\u636e\u4e0a\u5fae\u8c03\uff09\u6765\u8d85\u8d8a\u57fa\u7840\u6027\u80fd\u3002\u672c\u7814\u7a76\u65e8\u5728\u63a2\u7a76LLMs\u5728\u957f\u65f6\u5e8f\u4efb\u52a1\u7684\u590d\u6742\u73af\u5883\u2014\u2014WebArena\u57fa\u51c6\u4e2d\uff0c\u901a\u8fc7\u81ea\u6211\u6539\u8fdb\u80fd\u5426\u63d0\u5347\u5176\u8868\u73b0\u3002WebArena\u8981\u6c42\u4ee3\u7406\u81ea\u4e3b\u6d4f\u89c8\u7f51\u9875\u5e76\u6267\u884c\u64cd\u4f5c\u4ee5\u8fbe\u6210\u7279\u5b9a\u76ee\u6807\u3002\u6211\u4eec\u4f7f\u7528\u4e09\u79cd\u4e0d\u540c\u7684\u5408\u6210\u8bad\u7ec3\u6570\u636e\u6df7\u5408\u8fdb\u884c\u5fae\u8c03\uff0c\u5e76\u53d1\u73b0\u7ecf\u8fc7\u81ea\u6211\u6539\u8fdb\u540e\uff0c\u6a21\u578b\u5728WebArena\u57fa\u51c6\u4e0a\u7684\u4efb\u52a1\u5b8c\u6210\u7387\u63d0\u9ad8\u4e8631%\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86\u65b0\u7684\u8bc4\u4f30\u6307\u6807\uff0c\u7528\u4e8e\u66f4\u5168\u9762\u5730\u8bc4\u4f30\u6211\u4eec\u7684\u5fae\u8c03\u4ee3\u7406\u6a21\u578b\u7684\u884c\u4e3a\u6027\u80fd\u3001\u9c81\u68d2\u6027\u3001\u80fd\u529b\u4ee5\u53ca\u8f68\u8ff9\u8d28\u91cf\uff0c\u8fd9\u4e9b\u6307\u6807\u8d85\u8d8a\u4e86\u5f53\u524d\u4ec5\u4f9d\u8d
56\u4e8e\u6574\u4f53\u57fa\u51c6\u5206\u6570\u7684\u8bc4\u4f30\u65b9\u5f0f\u3002|\n", "2405.20304": "|**2024-05-30**|**Group Robust Preference Optimization in Reward-free RLHF**|Shyam Sundhar Ramesh et.al.|[2405.20304](http://arxiv.org/abs/2405.20304)|**[link](https://github.com/rsshyam/Group-robust-preference-optimization)**|**\u9488\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u7279\u5b9a\u4efb\u52a1\u8fdb\u884c\u9002\u5e94\u65f6\uff0c\u901a\u5e38\u9700\u8981\u901a\u8fc7\u57fa\u4e8e\u4eba\u7c7b\u53cd\u9988\u7684\u5f3a\u5316\u5b66\u4e60\uff08RLHF\uff09\u548c\u591a\u5143\u6807\u7b7e\u8005\u7fa4\u4f53\uff08\u5982\u4e0d\u540c\u6027\u522b\u3001\u79cd\u65cf\u3001\u516c\u53f8\u56e2\u961f\u7b49\uff09\u7684\u504f\u597d\u6570\u636e\u8fdb\u884c\u5fae\u8c03\u3002\u7136\u800c\uff0c\u4f20\u7edf\u65b9\u6cd5\u503e\u5411\u4e8e\u91c7\u7528\u201c\u4e00\u5200\u5207\u201d\u7684\u7b56\u7565\uff0c\u5373\u5047\u8bbe\u5e76\u4f18\u5316\u5355\u4e00\u7684\u504f\u597d\u6a21\u578b\uff0c\u5bf9\u5404\u7fa4\u4f53\u7684\u72ec\u7279\u7279\u6027\u548c\u9700\u6c42\u4e0d\u591f\u654f\u611f\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u7fa4\u4f53\u9c81\u68d2\u504f\u597d\u4f18\u5316\uff08GRPO\uff09\u65b9\u6cd5\uff0c\u65e8\u5728\u7a33\u5065\u5730\u4f7fLLMs\u9002\u5e94\u5404\u4e2a\u7fa4\u4f53\u7684\u504f\u597d\u3002GRPO\u65b9\u6cd5\u57fa\u4e8e\u65e0\u5956\u52b1\u76f4\u63a5\u504f\u597d\u4f18\u5316\uff0c\u4f46\u533a\u522b\u4e8e\u4ee5\u5f80\uff0c\u5b83\u76ee\u6807\u662f\u5bfb\u627e\u4e00\u4e2a\u80fd\u6700\u5927\u5316\u6700\u5dee\u7fa4\u4f53\u6027\u80fd\u7684\u9c81\u68d2\u7b56\u7565\u3002\u4e3a\u4e86\u5b9e\u73b0\u8fd9\u4e00\u76ee\u6807\uff0cGRPO\u4f1a\u52a8\u6001\u4e14\u9010\u6b21\u8c03\u6574\u4e0d\u540c\u7fa4\u4f53\u7684\u6743\u91cd\uff0c\u4f18\u5148\u5173\u6ce8\u7d2f\u79ef\u635f\u5931\u8f83\u9ad8\u7684\u7fa4\u4f53\u3002\u6211\u4eec\u5728\u7406\u8bba\u4e0a\u63a2\u8ba8\u4e86GRPO\u7684\u53ef\u884c\u6027\uff0c\u5e76\u5206\u6790\u4e86\u5176
\u5728\u5bf9\u6570\u7ebf\u6027\u7b56\u7565\u7c7b\u522b\u4e0b\u7684\u6536\u655b\u6027\u3002\u901a\u8fc7\u4f7f\u7528\u6765\u81ea\u4e0d\u540c\u7fa4\u4f53\u7684\u5168\u5c40\u610f\u89c1\u6570\u636e\u5bf9LLMs\u8fdb\u884cGRPO\u5fae\u8c03\uff0c\u6211\u4eec\u663e\u8457\u63d0\u9ad8\u4e86\u6700\u5dee\u7fa4\u4f53\u7684\u8868\u73b0\uff0c\u51cf\u5c11\u4e86\u7fa4\u4f53\u95f4\u635f\u5931\u7684\u4e0d\u5e73\u8861\uff0c\u540c\u65f6\u63d0\u9ad8\u4e86\u6982\u7387\u51c6\u786e\u6027\uff0c\u76f8\u8f83\u4e8e\u975e\u9c81\u68d2\u57fa\u7ebf\uff0c\u8fd9\u4e9b\u6539\u8fdb\u6548\u679c\u663e\u8457\u3002**|\n", "2405.20285": "|**2024-05-30**|**Who Writes the Review, Human or AI?**|Panagiotis C. Theocharopoulos et.al.|[2405.20285](http://arxiv.org/abs/2405.20285)|null|\u968f\u7740\u4eba\u5de5\u667a\u80fd\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4e2d\u7684\u5e7f\u6cdb\u5e94\u7528\uff0c\u4eba\u4eec\u5173\u6ce8\u5982\u4f55\u8bc6\u522b\u4e0d\u540c\u9886\u57df\u7684AI\u751f\u6210\u6587\u672c\u3002\u672c\u7814\u7a76\u65e8\u5728\u63a2\u8ba8\u8fd9\u4e2a\u95ee\u9898\uff0c\u901a\u8fc7\u63d0\u51fa\u4e00\u79cd\u65b9\u6cd5\u6765\u51c6\u786e\u533a\u5206\u4eba\u5de5\u667a\u80fd\u751f\u6210\u7684\u548c\u4eba\u7c7b\u64b0\u5199\u7684\u4e66\u8bc4\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5229\u7528\u8fc1\u79fb\u5b66\u4e60\uff0c\u8ba9\u6a21\u578b\u80fd\u591f\u5728\u4e0d\u540c\u4e3b\u9898\u95f4\u8bc6\u522b\u751f\u6210\u6587\u672c\uff0c\u540c\u65f6\u63d0\u9ad8\u5176\u8bc6\u522b\u5199\u4f5c\u98ce\u683c\u548c\u8bcd\u6c47\u53d8\u5316\u7684\u80fd\u529b\u3002\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u6570\u636e\u96c6\uff0c\u5305\u542b\u771f\u5b9e\u7684\u4e66\u8bc4\u548c\u4f7f\u7528Vicuna\u5f00\u6e90\u8bed\u8a00\u6a21\u578b\u751f\u6210\u7684\u6a21\u62df\u8bc4\u8bba\uff0c\u4ee5\u8bc4\u4f30\u6240\u63d0\u65b9\u6cd5\u7684\u6709\u6548\u6027\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u8bc6\u522b\u6587\u672c\u539f\u521b\u6765\u6e90\u662f\u53ef\u884c\u7684\uff0c\u51c6\u786e\u7387\u8fbe\u523096.86%\u3002\u6211\u4eec\u7684\u5de5\u4f5
c\u805a\u7126\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u6587\u672c\u8bc6\u522b\u65b9\u9762\u7684\u6027\u80fd\u4e0e\u5c40\u9650\u6027\u7814\u7a76\uff0c\u8fd9\u5bf9\u4e8e\u672a\u6765\u6709\u6548\u7ba1\u7406\u6b64\u7c7b\u6a21\u578b\u4ee5\u53ca\u786e\u4fdd\u4eba\u7c7b\u521b\u4f5c\u5185\u5bb9\u7684\u5b8c\u6574\u6027\u548c\u771f\u5b9e\u6027\u5177\u6709\u91cd\u8981\u610f\u4e49\u3002|\n", "2405.21075": "|**2024-05-31**|**Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis**|Chaoyou Fu et.al.|[2405.21075](http://arxiv.org/abs/2405.21075)|null|\u5728\u4eba\u5de5\u667a\u80fd\u7684\u8ffd\u6c42\u4e2d\uff0c\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u5df2\u6210\u4e3a\u8fd1\u671f\u8fdb\u6b65\u7684\u6838\u5fc3\u3002\u7136\u800c\uff0c\u5bf9\u5b83\u4eec\u5904\u7406\u5e8f\u5217\u89c6\u89c9\u6570\u636e\u7684\u80fd\u529b\u7684\u5173\u6ce8\u5c1a\u663e\u4e0d\u8db3\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5728\u672c\u6587\u4e2d\u63d0\u51faVideo-MME\uff0c\u8fd9\u662f\u9996\u4e2a\u5168\u9762\u8bc4\u4f30MLLMs\u5728\u89c6\u9891\u5206\u6790\u6027\u80fd\u7684\u591a\u6a21\u6001\u8bc4\u4f30\u57fa\u51c6\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u6709\u56db\u4e2a\u5173\u952e\u7279\u6027\uff1a1\uff09\u89c6\u9891\u7c7b\u578b\u591a\u6837\uff0c\u6db5\u76d66\u4e2a\u4e3b\u8981\u89c6\u89c9\u9886\u57df\u548c30\u4e2a\u5b50\u9886\u57df\uff0c\u786e\u4fdd\u5e7f\u6cdb\u7684\u5e94\u7528\u573a\u666f\u6cdb\u5316\u80fd\u529b\uff1b2\uff09\u65f6\u95f4\u7ef4\u5ea6\u7684\u8de8\u5ea6\uff0c\u5305\u62ec\u77ed\u3001\u4e2d\u3001\u957f\u671f\u89c6\u9891\uff0c\u4ece11\u79d2\u52301\u5c0f\u65f6\uff0c\u4ee5\u68c0\u9a8c\u6a21\u578b\u5bf9\u590d\u6742\u60c5\u5883\u52a8\u6001\u7684\u9002\u5e94\u6027\uff1b3\uff09\u6570\u636e\u6a21\u6001\u7684\u5e7f\u5ea6\uff0c\u7ed3\u5408\u89c6\u9891\u5e27\u4ee5\u5916\u7684\u591a\u79cd\u8f93\u5165\uff0c\u5982\u5b57\u5e55\u548c\u97f3\u9891\uff0c\u63ed\u793aMLLMs\u7684\u5168\u65b9\u4f4d\u80fd\u529b\uff1b4\uff09\u9ad8\u8d28\u91cf\u76
84\u6807\u6ce8\uff0c\u7531\u4e13\u5bb6\u4e25\u683c\u624b\u52a8\u6807\u8bb0\uff0c\u4ee5\u4fdd\u8bc1\u7cbe\u786e\u4e14\u53ef\u9760\u7684\u6a21\u578b\u8bc4\u4f30\u3002\u6211\u4eec\u7cbe\u5fc3\u6311\u9009\u5e76\u624b\u52a8\u6ce8\u89e3\u4e86900\u6bb5\u89c6\u9891\uff0c\u603b\u65f6\u957f\u8fbe\u5230256\u5c0f\u65f6\uff0c\u751f\u6210\u4e862,700\u4e2a\u95ee\u9898-\u7b54\u6848\u5bf9\u3002\u901a\u8fc7Video-MME\uff0c\u6211\u4eec\u5bf9\u5305\u62ecGPT-4\u7cfb\u5217\u3001Gemini 1.5 Pro\u5728\u5185\u7684\u591a\u4e2a\u6700\u5148\u8fdb\u7684MLLM\uff0c\u4ee5\u53ca\u5f00\u6e90\u56fe\u50cf\u6a21\u578bInternVL-Chat-V1.5\u548c\u89c6\u9891\u6a21\u578bLLaVA-NeXT-Video\u8fdb\u884c\u4e86\u6df1\u5165\u8bc4\u4f30\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cGemini 1.5 Pro\u662f\u8868\u73b0\u6700\u4f73\u7684\u5546\u4e1a\u6a21\u578b\uff0c\u660e\u663e\u4f18\u4e8e\u5f00\u6e90\u6a21\u578b\u3002\u6211\u4eec\u7684\u6570\u636e\u96c6\u548c\u53d1\u73b0\u5f3a\u8c03\u4e86\u6539\u8fdb\u5904\u7406\u66f4\u957f\u5e8f\u5217\u548c\u591a\u6a21\u6001\u6570\u636e\u7684\u5fc5\u8981\u6027\u3002\u9879\u76ee\u7f51\u9875\u94fe\u63a5\uff1ahttps://video-mme.github.io|\n", "2405.21047": "|**2024-05-31**|**Grammar-Aligned Decoding**|Kanghee Park 
et.al.|[2405.21047](http://arxiv.org/abs/2405.21047)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u751f\u6210\u9ad8\u5ea6\u7ed3\u6784\u5316\u7684\u8f93\u51fa\u65f6\u9762\u4e34\u6311\u6218\uff0c\u5982\u7a0b\u5e8f\u4ee3\u7801\u3001\u6570\u5b66\u516c\u5f0f\u6216\u89c4\u8303\u7684\u6807\u8bb0\u3002\u7ea6\u675f\u89e3\u7801\u65b9\u6cd5\u901a\u8fc7\u9650\u5236\u6bcf\u6b21\u8f93\u51fa\u53ef\u80fd\u7684\u4ee4\u724c\uff0c\u786e\u4fdd\u8f93\u51fa\u7b26\u5408\u7279\u5b9a\u89c4\u5219\u6765\u7f13\u89e3\u8fd9\u4e2a\u95ee\u9898\uff0c\u4f8b\u5982\u5728\u8bed\u6cd5\u7ea6\u675f\u89e3\u7801\uff08GCD\uff09\u4e2d\uff0cLLM\u7684\u8f93\u51fa\u5fc5\u987b\u9075\u5faa\u7ed9\u5b9a\u7684\u8bed\u6cd5\u89c4\u5219\u3002\u7136\u800c\uff0c\u7814\u7a76\u8868\u660e\uff0c\u8fd9\u79cd\u7ea6\u675f\u89e3\u7801\u53ef\u80fd\u4f1a\u626d\u66f2\u6a21\u578b\u7684\u5206\u5e03\uff0c\u5bfc\u81f4\u751f\u6210\u7684\u8f93\u51fa\u867d\u7136\u8bed\u6cd5\u6b63\u786e\uff0c\u4f46\u5176\u6982\u7387\u5e76\u4e0d\u76f4\u63a5\u53cd\u6620LLM\u672c\u8eab\u7684\u6982\u7387\u5206\u914d\uff0c\u4ece\u800c\u8d28\u91cf\u4e0d\u9ad8\u3002\u6211\u4eec\u79f0\u4e4b\u4e3a\u201c\u4e0e\u8bed\u6cd5\u7ea6\u675f\u5bf9\u9f50\u7684\u89e3\u7801\u201d\uff08Grammar-Aligned Decoding\uff0cGAD\uff09\uff0c\u5e76\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201c\u81ea\u9002\u5e94\u91c7\u6837\u4e0e\u8fd1\u4f3c\u671f\u671b\u672a\u6765\u201d\uff08Adaptive Sampling with Approximate Expected Futures\uff0cASAp\uff09\u7684\u89e3\u7801\u7b97\u6cd5\u3002 
ASAp\u7b97\u6cd5\u65e8\u5728\u4fdd\u8bc1\u8f93\u51fa\u7684\u8bed\u6cd5\u6027\uff0c\u5e76\u7406\u8bba\u4e0a\u4ea7\u751f\u4e0eLLM\u5728\u7ed9\u5b9a\u8bed\u6cd5\u7ea6\u675f\u6761\u4ef6\u4e0b\u7684\u6761\u4ef6\u6982\u7387\u76f8\u7b26\u7684\u7ed3\u679c\u3002\u8be5\u7b97\u6cd5\u5229\u7528\u5148\u524d\u7684\u6837\u672c\u8f93\u51fa\u6765\u7a33\u5065\u5730\u4f30\u7b97\u4e0d\u540c\u8f93\u51fa\u524d\u7f00\u7684\u672a\u6765\u8bed\u6cd5\u53ef\u80fd\u6027\u3002\u6211\u4eec\u5728\u4ee3\u7801\u751f\u6210\u548c\u7ed3\u6784\u5316\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\u4e0a\u7684\u5b9e\u9a8c\u8868\u660e\uff0cASAp\u7ecf\u5e38\u80fd\u591f\u751f\u6210\u6bd4\u73b0\u6709GCD\u6280\u672f\u66f4\u7b26\u5408LLM\u5206\u5e03\u4e14\u4ecd\u9075\u5b88\u6240\u9700\u8bed\u6cd5\u9650\u5236\u7684\u8f93\u51fa\uff0c\u4ece\u800c\u63d0\u9ad8\u4e86\u6574\u4f53\u8d28\u91cf\u3002|\n", "2405.21040": "|**2024-05-31**|**Direct Alignment of Language Models via Quality-Aware Self-Refinement**|Runsheng Yu et.al.|[2405.21040](http://arxiv.org/abs/2405.21040)|null|\u5f3a\u5316\u5b66\u4e60\u4ece\u4eba\u7c7b\u53cd\u9988\uff08RLHF\uff09\u662f\u8c03\u6574\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u884c\u4e3a\u4ee5\u7b26\u5408\u4eba\u7c7b\u504f\u597d\u7684\u5e38\u7528\u65b9\u6cd5\u3002\u6700\u8fd1\uff0c\u76f4\u63a5\u7b56\u7565\u4f18\u5316\uff08DPO\uff09\u4f5c\u4e3a\u4e00\u79cd\u66ff\u4ee3\u65b9\u6848\u5174\u8d77\uff0c\u5b83\u4e0d\u518d\u4f9d\u8d56LLM\u5956\u52b1\u6a21\u578b\uff0c\u4ece\u800c\u51cf\u5c11\u4e86\u989d\u5916\u7684\u5185\u5b58\u548c\u8bad\u7ec3\u65f6\u95f4\u3002\u7136\u800c\uff0cDPO\u5ffd\u89c6\u4e86\u6b63\u5411\u548c\u8d1f\u5411\u54cd\u5e94\u7684\u76f8\u5bf9\u8d28\u91cf\uff0c\u53ef\u80fd\u5bfc\u81f4\u8bad\u7ec3\u7ed3\u679c\u4e0d\u7406\u60f3\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63a2\u8ba8\u5229\u7528LLM\u5185\u90e8\u77e5\u8bc6\u5728\u5373\u65f6\u5fae\u8c03\u8fc7\u7a0b\u4e2d\u83b7\u53d6\u54cd\u5e94\u7684\u8d28\u91cf\uff0c\u5e76\u4f18\u5316\u635f\u5931\u51fd\u6570
\u3002\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u79cd\u7ec6\u5316\u51fd\u6570\uff0c\u5229\u7528LLM\u7684\u77e5\u8bc6\u6765\u4f30\u8ba1\u6b63\u5411\u548c\u8d1f\u5411\u54cd\u5e94\u7684\u54c1\u8d28\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u5728\u8f7b\u5ea6\u5047\u8bbe\u4e0b\uff0c\u6784\u5efa\u7684\u7ec6\u5316\u51fd\u6570\u80fd\u591f\u5e2e\u52a9\u81ea\u6211\u8c03\u6574\u635f\u5931\u51fd\u6570\u3002\u6211\u4eec\u5c06\u8fd9\u4e2a\u7ec6\u5316\u529f\u80fd\u6574\u5408\u5230DPO\u53ca\u5176\u53d8\u4f53\u8eab\u4efd\u7b56\u7565\u4f18\u5316\uff08IPO\uff09\u4e2d\u3002\u5b9e\u9a8c\u8bc1\u660e\uff0c\u8fd9\u4e9b\u6539\u8fdb\u540e\u7684\u6a21\u578b\u5728\u5404\u79cd\u8bc4\u4f30\u8005\u4e0a\u8868\u73b0\u51fa\u4f18\u4e8eDPO\u548cIPO\u7684\u6027\u80fd\u3002|\n", "2405.21030": "|**2024-05-31**|**Standards for Belief Representations in LLMs**|Daniel A. Herrmann et.al.|[2405.21030](http://arxiv.org/abs/2405.21030)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5404\u4e2a\u9886\u57df\u5c55\u73b0\u51fa\u975e\u51e1\u80fd\u529b\uff0c\u8ba1\u7b97\u673a\u79d1\u5b66\u5bb6\u4eec\u6b63\u5728\u5bfb\u6c42\u7406\u89e3\u5b83\u4eec\u7684\u8ba4\u77e5\u8fc7\u7a0b\uff0c\u7279\u522b\u662f\u5173\u4e8eLLMs\u5982\u4f55\uff08\u5982\u679c\u6709\u7684\u8bdd\uff09\u5185\u90e8\u6784\u5efa\u5bf9\u4e16\u754c\u7684\u4fe1\u5ff5\u3002\u7136\u800c\uff0c\u76ee\u524d\u5c1a\u7f3a\u4e4f\u4e00\u4e2a\u7edf\u4e00\u7684\u7406\u8bba\u6846\u67b6\u6765\u652f\u6491\u5bf9LLM\u4e2d\u4fe1\u5ff5\u7684\u7814\u7a76\u3002\u672c\u6587\u8bd5\u56fe\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u63d0\u51fa\u4e86\u4e00\u5957\u6761\u4ef6\uff0c\u4f7fLLM\u4e2d\u7684\u8868\u793a\u80fd\u591f\u88ab\u89c6\u4e3a\u4fe1\u5ff5\u4f3c\u7684\u3002\u6211\u4eec\u6307\u51fa\uff0c\u5c3d\u7ba1\u5728LLMs\u4e2d\u6d4b\u91cf\u4fe1\u5ff5\u7684\u9879\u76ee\u4e0e\u51b3\u7b56\u7406\u8bba\u548c\u5f62\u5f0f\u8ba4\u8bc6\u8bba\u4e2d\u7684\u4fe1\u5ff5\u6d4b\u91cf\u5728\u8bb8\u591a\u65b9\u9762\u6709\u76f8\u4f3c\u4e4b\u5904\uff0c\u4f46\u4e5f\u5b58\u5728\u5dee\u5f
02\uff0c\u8fd9\u4e9b\u5dee\u5f02\u5e94\u5f71\u54cd\u6211\u4eec\u7684\u6d4b\u91cf\u65b9\u6cd5\u3002\u56e0\u6b64\uff0c\u501f\u9274\u54f2\u5b66\u6d1e\u5bdf\u548c\u673a\u5668\u5b66\u4e60\u7684\u5f53\u4ee3\u5b9e\u8df5\uff0c\u6211\u4eec\u786e\u7acb\u4e86\u56db\u4e2a\u6807\u51c6\uff1a\u51c6\u786e\u6027\u3001\u4e00\u81f4\u6027\u3001\u7edf\u4e00\u6027\u548c\u5b9e\u7528\u6027\u3002\u8fd9\u56db\u4e2a\u6807\u51c6\u7ed3\u5408\u4e86\u7406\u8bba\u8003\u91cf\u4e0e\u5b9e\u9645\u9650\u5236\uff0c\u4e3a\u5168\u9762\u7406\u89e3LLM\u4e2d\u7684\u4fe1\u5ff5\u8868\u793a\u5960\u5b9a\u4e86\u57fa\u7840\u3002\u6211\u4eec\u5f15\u7528\u5b9e\u8bc1\u5de5\u4f5c\u7684\u6210\u679c\uff0c\u63ed\u793a\u4e86\u5355\u72ec\u4f7f\u7528\u67d0\u4e9b\u6807\u51c6\u65f6\u8bc6\u522b\u4fe1\u5ff5\u8868\u793a\u7684\u5c40\u9650\u6027\u3002|\n", "2405.21028": "|**2024-05-31**|**LACIE: Listener-Aware Finetuning for Confidence Calibration in Large Language Models**|Elias Stengel-Eskin et.al.|[2405.21028](http://arxiv.org/abs/2405.21028)|**[link](https://github.com/esteng/pragmatic_calibration)**|**\u5f53\u56de\u7b54\u95ee\u9898\u65f6\uff0c\u8bed\u8a00\u6a21\u578b\u4e0d\u4ec5\u80fd\u63d0\u4f9b\u7b54\u6848\uff0c\u8fd8\u80fd\u4f20\u8fbe\u5bf9\u7b54\u6848\u6b63\u786e\u6027\u7684\u4fe1\u5fc3\u7a0b\u5ea6\u3002\u8fd9\u5305\u62ec\u660e\u786e\u7684\u5206\u6570\u6807\u8bb0\uff0c\u5982\u7ed9\u51fa\u6570\u5b57\uff0c\u4ee5\u53ca\u9690\u542b\u7684\u4fe1\u5fc3\u6807\u5fd7\uff0c\u5982\u6743\u5a01\u8bed\u6c14\u6216\u63d0\u4f9b\u989d\u5916\u77e5\u8bc6\u3002\u7136\u800c\uff0c\u5f53\u524d\u5927\u591a\u6570\u6a21\u578b\u5f80\u5f80\u8fc7\u4e8e\u81ea\u4fe1\u3002\u4e3a\u4e86\u6821\u51c6\u8fd9\u4e9b\u4fe1\u5fc3\u5ea6\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u5b9e\u7528\u7684\u3001\u8003\u8651\u542c\u4f17\u7684\u5fae\u8c03\u65b9\u6cd5\uff08LACIE\uff09\uff0c\u5b83\u4e0d\u4ec5\u5173\u6ce8\u7b54\u6848\u662f\u5426\u6b63\u786e\uff0c\u8fd8\u5173\u6ce8\u7b54\u6848\u662f\u5426\u4f1a\u88ab\u542c\u4f17\u63a5\u53d7\u3002\u6211\u4eec\u5c06\u6821\u51c6\
u89c6\u4e3a\u504f\u597d\u4f18\u5316\uff0c\u901a\u8fc7\u53cc\u4ee3\u7406\u6e38\u620f\u521b\u5efa\u6570\u636e\uff0c\u8ba9\u4e00\u4e2a\u6f14\u8bb2\u8005\u6a21\u578b\u7684\u8f93\u51fa\u63a5\u53d7\u6a21\u62df\u542c\u8005\u7684\u8bc4\u5224\u3002\u7136\u540e\uff0c\u6211\u4eec\u4f7f\u7528LACIE\u5bf9\u4e09\u4e2a\u8bed\u8a00\u6a21\u578b\uff08Mistral-7B\u3001Llama3-8B\u548cLlama3-70B\uff09\u8fdb\u884c\u5fae\u8c03\uff0c\u5e76\u663e\u793a\u7ecf\u8fc7\u5fae\u8c03\u7684\u6a21\u578b\u5728\u6a21\u62df\u542c\u8005\u9762\u524d\u6709\u66f4\u597d\u7684\u6821\u51c6\u3002\u91cd\u8981\u7684\u662f\uff0c\u8fd9\u4e9b\u8d8b\u52bf\u4e5f\u9002\u7528\u4e8e\u4eba\u7c7b\u542c\u4f17\uff0c\u5e2e\u52a9\u4ed6\u4eec\u66f4\u51c6\u786e\u5730\u9884\u6d4b\u6a21\u578b\u7684\u6b63\u786e\u6027\uff1a\u6211\u4eec\u5728\u4eba\u673a\u8bc4\u4f30\u4e2d\u53d1\u73b0\uff0c\u7ecf\u8fc7LACIE\u8bad\u7ec3\u7684\u6a21\u578b\u63a5\u53d7\u7684\u9519\u8bef\u7b54\u6848\u51cf\u5c11\u4e8647%\uff0c\u800c\u6b63\u786e\u7b54\u6848\u7684\u63a5\u53d7\u7387\u4fdd\u6301\u4e0d\u53d8\u3002\u6b64\u5916\uff0cLACIE\u6cdb\u5316\u5230\u53e6\u4e00\u4e2a\u6570\u636e\u96c6\u4e0a\uff0c\u5728\u4f7f\u7528TriviaQA\u8bad\u7ec3\u540e\uff0cTruthfulQA\u4e0a\u7684\u771f\u5b9e\u6027\u5927\u5e45\u63d0\u9ad8\u3002\u6211\u4eec\u7684\u5206\u6790\u8868\u660e\uff0cLACIE\u5bfc\u81f4\u4e86\u6b63\u786e\u548c\u9519\u8bef\u793a\u4f8b\u4e4b\u95f4\u7684\u4fe1\u5fc3\u5ea6\u66f4\u597d\u5730\u5206\u79bb\u3002\u5b9a\u6027\u4e0a\uff0c\u6211\u4eec\u53d1\u73b0\u7ecf\u8fc7LACIE\u8bad\u7ec3\u7684\u6a21\u578b\u4f1a\u66f4\u52a0\u8c28\u614e\uff0c\u5e76\u5728\u56de\u7b54\u6b63\u786e\u65f6\u901a\u8fc7\u4f7f\u7528\u6743\u5a01\u8bed\u6c14\u6216\u63d0\u4f9b\u7ec6\u8282\u6765\u9690\u6027\u5730\u8868\u793a\u786e\u5b9a\u6027\u3002\u6700\u540e\uff0cLACIE\u5fae\u8c03\u5bfc\u81f4\u6a21\u578b\u5bf9\u4e8e\u53ef\u80fd\u9519\u8bef\u7684\u7b54\u6848\u66f4\u503e\u5411\u4e8e\u653e\u5f03\uff08\u4f8b\u5982\u8bf4\u201c\u6211\u4e0d\u77e5\u9053\u201d\uff09\u3002**|\n", "2405.21018": 
"|**2024-05-31**|**Improved Techniques for Optimization-Based Jailbreaking on Large Language Models**|Xiaojun Jia et.al.|[2405.21018](http://arxiv.org/abs/2405.21018)|**[link](https://github.com/jiaxiaojunqaq/i-gcg)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5feb\u901f\u53d1\u5c55\uff0c\u5176\u5b89\u5168\u6821\u51c6\u6210\u4e3a\u5e7f\u6cdb\u5e94\u7528\u7684\u5173\u952e\u3002\u9488\u5bf9\u8fd9\u4e9b\u6a21\u578b\u7684\u7834\u89e3\uff08\u5373\u201cjailbreaking\u201d\uff09\u6d3b\u52a8\u65e5\u76ca\u589e\u591a\uff0c\u5176\u4e2d\u8d2a\u5a6a\u5750\u6807\u68af\u5ea6\uff08GCG\uff09\u653b\u51fb\u56e0\u5176\u6210\u6548\u663e\u8457\u800c\u53d7\u5230\u5173\u6ce8\u3002\u7136\u800c\uff0cGCG\u7684\u653b\u51fb\u6548\u7387\u4ecd\u6709\u63d0\u5347\u7a7a\u95f4\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u7cfb\u5217\u6539\u8fdb\u7684\u4f18\u5316\u57fa\u7ebf\u7834\u89e3\u6280\u672f\uff0c\u4ee5\u63d0\u5347GCG\u7684\u6027\u80fd\u3002\u9996\u5148\uff0c\u6211\u4eec\u6ce8\u610f\u5230\u5355\u4e2a\u76ee\u6807\u6a21\u677f\u201cSure\u201d\u6781\u5927\u5730\u9650\u5236\u4e86GCG\u7684\u653b\u51fb\u6548\u679c\uff0c\u56e0\u6b64\u6211\u4eec\u5efa\u8bae\u91c7\u7528\u5305\u542b\u6709\u5bb3\u81ea\u6211\u6697\u793a\u548c/\u6216\u6307\u5bfc\u7684\u591a\u6837\u5316\u76ee\u6807\u6a21\u677f\uff0c\u4ee5\u8bef\u5bfc\u6a21\u578b\u3002\u5728\u4f18\u5316\u7b56\u7565\u4e0a\uff0c\u6211\u4eec\u5efa\u8bae\u5728GCG\u4e2d\u5b9e\u65bd\u81ea\u52a8\u591a\u5750\u6807\u66f4\u65b0\uff0c\u4ee5\u52a0\u901f\u6536\u655b\uff0c\u5e76\u5f15\u5165\u4ece\u7b80\u5355\u5230\u590d\u6742\uff08easy-to-hard\uff09\u7684\u521d\u59cb\u5316\u6280\u5de7\u3002\u5c06\u8fd9\u4e9b\u6539\u8fdb\u6574\u5408\uff0c\u6211\u4eec\u5f00\u53d1\u51fa\u4e00\u79cd\u9ad8\u6548\u7684\u65b9\u6cd5\u2014\u2014$\\mathcal{I}$-GCG\u3002\u5b9e\u9a8c\u5728\u4e00\u7cfb\u5217\u57fa\u51c6\u6d4b\u8bd5\uff0c\u5982NeurIPS 2023 
\u7ea2\u961f\u6311\u6218\u4e2d\u8fdb\u884c\uff0c\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u6539\u8fdb\u6280\u672f\u80fd\u591f\u5e2e\u52a9GCG\u8d85\u8d8a\u73b0\u6709\u7834\u89e3\u653b\u51fb\uff0c\u5b9e\u73b0\u63a5\u8fd1100%\u7684\u653b\u51fb\u6210\u529f\u7387\u3002\u4ee3\u7801\u5df2\u53d1\u5e03\u5728https://github.com/jiaxiaojunQAQ/I-GCG\u3002**|\n", "2405.20985": "|**2024-05-31**|**DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models**|Linli Yao et.al.|[2405.20985](http://arxiv.org/abs/2405.20985)|**[link](https://github.com/yaolinli/deco)**|This study focuses on the projector module in multimodal large language models (MLLMs), which plays a key role in bridging the vision and language modalities and facilitating cross-modal alignment. However, the effectiveness of projectors in vision-language alignment remains under-evaluated and can usually only be inferred indirectly from downstream task performance. To this end, the study interprets how projectors work by analyzing the vision-language semantic flow within MLLMs. Specifically, the researchers trace the semantic relevance flow from generated language tokens back to the raw visual encoder patches and to the intermediate outputs produced by the projector. They find that compressive projectors (such as QFormer) tend to abstract visual patches into a limited set of concepts, such as objects or attributes, which leads to a \u201cdouble abstraction\u201d phenomenon: first the projector abstracts visual semantics with reference to predefined query tokens, and then the large language model extracts further based on the text instruction. This double abstraction is inefficient during training and can cause a cumulative loss of visual semantic information. To address this problem, the study proposes the key insight of \u201cDecouple Compression from Abstraction (DeCo)\u201d: compress the number of visual tokens at the projector level while leaving visual semantic abstraction entirely to the large language model. Accordingly, the researchers adopt a simple compressor, 2D adaptive pooling, to downsample visual patches in a parameter-free manner. Experiments show that DeCo outperforms traditional compressive projectors in both performance and efficiency. It achieves performance gains of 0.9%, 7.1%, and 2.9% on MLLM benchmarks, visual localization, and open-ended visual question answering, respectively, with fewer trainable parameters and faster convergence.|\n", "2405.20978": "|**2024-05-31**|**Enhancing Noise Robustness of Retrieval-Augmented Language Models with Adaptive Adversarial Training**|Feiteng Fang
et.al.|[2405.20978](http://arxiv.org/abs/2405.20978)|**[link](https://github.com/calubkk/raat)**|Large language models (LLMs) exhibit substantial capabilities but face challenges such as hallucination, outdated knowledge, and hard-to-trace reasoning. To address these problems, retrieval-augmented generation (RAG), which incorporates knowledge from external databases, has emerged as a promising approach. However, inappropriate retrieved passages can keep LLMs from generating comprehensive, high-quality responses. Prior work on robustness to retrieval noise in RAG is often confined to a limited set of noise types, which does not match real-world retrieval environments and limits practical applicability. This study first investigates retrieval noise and divides it into three distinct categories that reflect real-world conditions, then analyzes how each type of noise affects the robustness of LLMs. We then propose a novel RAG approach called Retrieval-augmented Adaptive Adversarial Training (RAAT). RAAT uses adaptive adversarial training to dynamically adjust the model's training process in response to retrieval noise, and employs multi-task learning to ensure the model can recognize noisy contexts. Extensive experiments show that a LLaMA-2 7B model trained with RAAT achieves significant improvements in F1 and EM scores under a variety of noise conditions. For reproducibility, we release our code and data at https://github.com/calubkk/RAAT.|\n", "2405.20974": "|**2024-05-31**|**SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales**|Tianyang Xu
et.al.|[2405.20974](http://arxiv.org/abs/2405.20974)|**[link](https://github.com/xu1868/sayself)**|**Large language models (LLMs) often generate inaccurate or fabricated information and generally fail to indicate their confidence, which limits their broader application. Previous work elicits confidence from LLMs through direct or self-consistency prompting, or constructs specific datasets for supervised fine-tuning. Prompting-based approaches perform poorly, while training-based approaches are limited to binary or imprecise overall confidence estimates. This paper proposes SaySelf, a training framework that teaches LLMs to provide more accurate, fine-grained confidence estimates. In addition, SaySelf directs LLMs to produce self-reflective rationales that clearly identify gaps in their parametric knowledge and explain their uncertainty. This is achieved by having an LLM automatically summarize, in natural language, the uncertainties in specific knowledge. The summary is based on an analysis of the inconsistencies across multiple sampled reasoning chains, and the resulting data is used for supervised fine-tuning. To further calibrate the confidence estimates, we employ carefully designed reinforcement learning that rewards accurate, high-confidence predictions and penalizes overconfidence in erroneous outputs. Experiments on both in-distribution and out-of-distribution datasets show that SaySelf effectively reduces confidence calibration error while maintaining task performance. The generated self-reflective rationales are also shown to be reasonable and to further aid calibration. The code is publicly available at https://github.com/xu1868/SaySelf.**|\n", "2405.20973": "|**2024-05-31**|**LCQ: Low-Rank Codebook based Quantization for Large Language Models**|Wen-Pu Cai et.al.|[2405.20973](http://arxiv.org/abs/2405.20973)|null|Large language models (LLMs) perform well on many tasks, but their high storage and computation costs make deployment a major challenge. Weight quantization is widely used to compress models and reduce these costs. Most existing quantization methods for LLMs use a rank-one codebook, which leads to substantial accuracy loss at high compression ratios. This paper proposes a novel weight quantization method, Low-rank Codebook based Quantization (LCQ), to address this problem. LCQ quantizes with a low-rank codebook whose rank can be larger than one. The higher rank is intended to maintain or improve model accuracy while keeping the additional storage overhead close to zero. Experiments show that, compared with existing methods, LCQ achieves better compression while maintaining good accuracy. In summary, the paper introduces a low-rank codebook quantization method that is expected to improve the performance and efficiency of LLMs in practical use without significantly increasing storage cost, offering a new option for deploying these models efficiently.|\n", "2406.02550": "|**2024-06-04**|**Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks**|Tianyu He et.al.|[2406.02550](http://arxiv.org/abs/2406.02550)|**[link](https://github.com/ablghtianyi/ICL_Modular_Arithmetic)**|**This work studies the emergence of in-context learning and skill composition in large language models on a collection of modular arithmetic tasks. We consider a finite collection of linear modular functions $z = a \\times x + b \\times y \\;(\\text{mod}\\; p)$ labeled by the vector $(a, b) \\in \\mathbb{Z}_p^2$. Some of these tasks are used for pre-training and the rest for out-of-distribution testing. Experiments show that, as the number of pre-training tasks increases, a GPT-style Transformer undergoes a transition between in-distribution and out-of-distribution generalization. The smallest model capable of out-of-distribution generalization requires two Transformer blocks, while for deeper models the out-of-distribution generalization phase is \u201ctransient\u201d and requires early stopping. Finally, we perform an interpretability study of the pre-trained models, revealing highly structured representations in both phases, and discuss the learned algorithm.**|\n", "2406.02547": "|**2024-06-04**|**Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning**|Alex Jinpeng Wang et.al.|[2406.02547](http://arxiv.org/abs/2406.02547)|**[link](https://github.com/showlab/VisInContext)**|**This work does not set out to introduce a state-of-the-art multimodal large language model (MLLM). Instead, it proposes a method for processing long sequences in multimodal models more efficiently. We present \u201cVisualized In-Context Text Processing\u201d (VisInContext), which processes long text using visual tokens and thereby significantly reduces GPU memory usage and floating point operations (FLOPs) in both training and inference. For example, for a 56-billion-parameter mixture-of-experts (MoE) model, our method extends the in-context text length in pre-training to 2048 tokens with nearly the same compute. Experiments show that models trained with VisInContext perform strongly on common downstream benchmarks for in-context few-shot evaluation. In addition, VisInContext complements existing techniques and improves document understanding, showing great potential for document question answering and sequential document retrieval.**|\n", "2406.02543": "|**2024-06-04**|**To Believe or Not to Believe Your LLM**|Yasin Abbasi Yadkori
et.al.|[2406.02543](http://arxiv.org/abs/2406.02543)|null|We study uncertainty quantification in large language models (LLMs), with the goal of identifying how uncertain the response to a given query is. We consider two types of uncertainty together: epistemic uncertainty (for example, lack of knowledge about facts or the language) and aleatoric uncertainty, which is irreducible randomness (such as multiple possible answers). In particular, we propose an information-theoretic metric that reliably detects when only the epistemic uncertainty is large, in which case the model's output is unreliable. This condition can be computed solely from model outputs obtained through special iterative prompting based on previous responses. Such quantification can detect hallucinations (cases of high epistemic uncertainty) in both single-answer and multi-answer settings. This differs from many standard uncertainty quantification strategies, such as thresholding the log-likelihood of a response, which cannot detect hallucinations in the multi-answer case. We conduct a series of experiments that demonstrate the advantages of our approach. Our study also sheds light on how iterative prompting can amplify the probability an LLM assigns to a given output, which may be of independent interest.|\n", "2406.02542": "|**2024-06-04**|**Loki: Low-Rank Keys for Efficient Sparse Attention**|Prajwal Singhania et.al.|[2406.02542](http://arxiv.org/abs/2406.02542)|null|Inference on large language models is computationally expensive, and with long sequences the self-attention mechanism accounts for most of the cost. To address this, recent work has proposed several sparse attention approximations. In this paper, our analysis shows that the key vectors in an attention block actually lie in a space of much lower dimension than the original. This observation motivates Loki, a new sparse attention method. Loki ranks and selects tokens in the KV cache based on attention scores computed in the low-dimensional space. Experiments show that Loki preserves model quality better than other popular approximation methods while speeding up attention computation through reduced data movement (load/store) and compute cost.|\n", "2406.02539": "|**2024-06-04**|**Parrot:
Multilingual Visual Instruction Tuning**|Hai-Long Sun et.al.|[2406.02539](http://arxiv.org/abs/2406.02539)|**[link](https://github.com/aidc-ai/parrot)**|With the rapid development of multimodal large language models such as GPT-4V, artificial intelligence has taken a significant step toward artificial general intelligence. Current methods mainly rely on supervised fine-tuning (SFT) to align vision encoders with language models, thereby endowing them with multimodal capabilities. However, this practice can gradually weaken the language model's ability to handle multiple languages as training progresses. We find that imbalanced, English-centric SFT datasets cause a significant drop in performance on non-English languages, because the SFT process fails to connect the vision encoder with multilingual tokens effectively. To address this, we propose Parrot, a new method that uses textual guidance to drive visual token alignment at the language level. Parrot conditions the visual tokens on diverse language inputs and uses a mixture of experts (MoE) to promote the alignment of multilingual tokens. In particular, to improve the alignment of non-English visual tokens, we compute the cross-attention between the initial visual features and the textual embeddings, then feed the result into the MoE router to select the most relevant experts. The selected experts convert the initial visual tokens into language-specific visual tokens. Given the current lack of standard benchmarks for evaluating multilingual capabilities, we also build and release a Massive Multilingual Multimodal Benchmark (MMMB) covering 6 languages, 15 categories, and 12,000 questions. Parrot not only achieves state-of-the-art performance on MMMB and MMM Benchmark but also performs well across a broad range of multimodal tasks. We will make the source code and training dataset of Parrot publicly available.|\n", "2406.02536": "|**2024-06-04**|**Mitigate Position Bias in Large Language Models via Scaling a Single Dimension**|Yijiong Yu
et.al.|[2406.02536](http://arxiv.org/abs/2406.02536)|**[link](https://github.com/PositionalHidden/PositionalHidden)**|This paper examines a phenomenon that large language models (LLMs) exhibit in practical use: position bias, also known as \"lost in the middle\". The bias is especially pronounced in long-context scenarios, where the position of key information within the prompt significantly affects the model's accuracy. The study finds that attention weights are a micro-level manifestation of position bias. It further shows that the causal attention mask also contributes to position bias by creating position-specific hidden states. Based on these insights, the authors propose mitigating position bias by adjusting these position-specific hidden states. Experiments cover several tasks, including NaturalQuestions multi-document question answering, key-value retrieval, LongBench, and timeline reordering, and several architectures, including RoPE models, context-window-extended models, and Alibi models. The results show that the method improves performance by up to 15.2% while modifying only one dimension of the hidden states. The code is available at https://aka.ms/PositionalHidden.|\n", "2406.02532": "|**2024-06-04**|**SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices**|Ruslan Svirschevski
et.al.|[2406.02532](http://arxiv.org/abs/2406.02532)|**[link](https://github.com/yandex-research/specexec)**|As large language models gain widespread adoption, running them efficiently becomes crucial. Recent work has achieved significant speedups through speculative decoding. However, most of this work is designed for data-center hardware. This study asks instead: how fast can we run LLMs on consumer devices? Consumer GPUs can no longer fit the largest models (50 billion parameters and above), so parameters must be offloaded to RAM or SSD. When running with offloaded parameters, the inference engine can process batches of hundreds or even thousands of tokens at once, which makes it a natural fit for speculative decoding. We propose SpecExec (Speculative Execution), a simple parallel decoding method that works with popular LLM families and can generate up to 20 tokens per target model iteration. It exploits the high spikiness of the token probability distributions in modern LLMs and the high degree of agreement between model output probabilities. SpecExec takes the most probable token continuations from the draft model to build a \u201ccache\u201d tree for the target model, which is then validated in a single pass. Using SpecExec, we demonstrate inference of a 50-billion-parameter LLM on consumer GPUs with RAM offloading at 4-6 tokens per second with 4-bit quantization, or 2-3 tokens per second with 16-bit weights.|\n", "2406.02528": "|**2024-06-04**|**Scalable MatMul-free Language Modeling**|Rui-Jie Zhu et.al.|[2406.02528](http://arxiv.org/abs/2406.02528)|**[link](https://github.com/ridgerchu/matmulfreellm)**|**Matrix multiplication (MatMul) typically dominates the overall computational cost of large language models (LLMs). The problem becomes more pronounced as LLMs scale to larger embedding dimensions and context lengths. This paper presents an approach that removes MatMul operations from LLMs entirely while maintaining strong performance, even at the scale of 2.7 billion parameters. Experiments show that our MatMul-free models perform on par with state-of-the-art Transformers that consume significantly more memory. We study the scaling behavior of the models and find that the performance gap between MatMul-free models and full-precision Transformers narrows as model size increases. We also provide an efficient GPU implementation that reduces memory usage during training by up to 61% compared with an unoptimized baseline. At inference time, optimized kernels reduce our model's memory consumption by more than 10x. To properly assess the efficiency of the architecture, we build a custom hardware solution on an FPGA that exploits lightweight operations beyond what GPUs can handle, processing billion-parameter-scale models at high speed and approaching brain-like efficiency. This work not only shows that LLMs can remain effective after being stripped of this complexity, but also points to the types of operations that future accelerators should be optimized for in order to serve the next generation of lightweight LLMs. Our code is open source at https://github.com/ridgerchu/matmulfreellm.**|\n", "2406.02524": "|**2024-06-04**|**CheckEmbed: Effective Verification of LLM Solutions to Open-Ended Tasks**|Maciej Besta
et.al.|[2406.02524](http://arxiv.org/abs/2406.02524)|**[link](https://github.com/spcl/checkembed)**|Large language models (LLMs) are transforming many domains, yet verifying their answers remains a major challenge, especially for complex, open-ended tasks such as knowledge consolidation, summarization, and extraction. This paper proposes CheckEmbed, an accurate, scalable, and simple LLM verification approach. The core idea of CheckEmbed is to compare LLM answers using answer-level embeddings obtained with a model such as GPT Text Embedding Large. This reduces a complex textual answer to a single embedding, which simplifies comparison and enables fast, meaningful verification. We build a comprehensive verification pipeline that implements the CheckEmbed methodology and provides metrics for assessing the truthfulness of LLM answers, such as embedding heatmaps and their summaries. We show how these metrics can be used to build practical engines that decide whether an LLM answer is satisfactory. On real-world document analysis tasks such as term extraction and document summarization, our approach shows significant improvements in accuracy, cost-effectiveness, and runtime performance compared with token-, sentence-, and fact-level schemes such as BERTScore or SelfCheckGPT.|\n", "2406.02523": "|**2024-06-04**|**RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots**|Soroush Nasiriany et.al.|[2406.02523](http://arxiv.org/abs/2406.02523)|null|Recent progress in artificial intelligence has been driven largely by scaling. In robotics, however, the availability of large-scale robot datasets is a bottleneck. We advocate using realistic physical simulation to scale up environments, tasks, and datasets for robot learning methods. To this end, we introduce RoboCasa, a large-scale simulation framework for training generalist robots in everyday environments. RoboCasa features rich and diverse kitchen scenes, including more than a thousand 3D assets across over 150 object categories and dozens of interactable pieces of furniture and appliances. We further enrich the realism and diversity of the simulation with generative AI tools, for example by generating object assets with text-to-3D models and environment textures with text-to-image models. We design a set of 100 tasks for systematic evaluation, including composite tasks generated with the guidance of large language models. To facilitate learning, we provide high-quality human demonstrations and combine them with automated trajectory generation methods to substantially enlarge the dataset at minimal human cost. Our experiments show a clear scaling trend when using synthetically generated robot data for large-scale imitation learning, and show great promise for using simulation data in real-world tasks. Videos and open-source code are available at https://robocasa.ai/.|\n", "2406.03496": "|**2024-06-05**|**Wings: Learning Multimodal LLMs without Text-only Forgetting**|Yi-Kai Zhang et.al.|[2406.03496](http://arxiv.org/abs/2406.03496)|null|
Multimodal large language models (MLLMs) start from a pre-trained general-purpose language model, first aligning images with text and then fine-tuning on mixed-modality inputs. However, the MLLM catastrophically forgets text-only instructions, which contain no images and could already be handled by the initial language model. This paper presents Wings, a novel MLLM that performs well in both text-only dialogue and multimodal comprehension. By analyzing MLLM attention in multimodal instructions, we find that text-only forgetting is related to an attention shift from the text before the image to the text after it. We therefore construct extra modules that act as boosted learners to compensate for this attention shift. The visual and textual learners, like \u201cwings\u201d on either side, are connected in parallel within each attention block. Initially, image and text inputs are handled by the visual learners working alongside the main attention to balance the focus on visual elements. The textual learners are then integrated collaboratively with the outputs of the visual learners through attention-based routing. We design Low-Rank Residual Attention (LoRRA) to keep the learners efficient. Experiments show that Wings outperforms MLLMs of comparable scale on both text-only dialogue and visual question answering. On our newly constructed Interleaved Image-Text (IIT) benchmark, Wings delivers superior performance across question-answering tasks ranging from text-rich to multimodal-rich.|\n", "2406.03488": "|**2024-06-06**|**Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model Training**|Ao Sun et.al.|[2406.03488](http://arxiv.org/abs/2406.03488)|**[link](https://github.com/maydomine/seq1f1b)**|The rise of large language models (LLMs) relies heavily on distributed training strategies, among which pipeline parallelism plays a crucial role. As LLM training sequence lengths extend to 32k or even 128k, current pipeline-parallel methods face severe bottlenecks, including high memory footprints and substantial pipeline bubbles, which greatly limit model scalability and training throughput. To improve memory efficiency and training efficiency, we propose Seq1F1B, an efficient sequence-level one-forward-one-backward (1F1B) pipeline scheduling method for training LLMs on long sequences. Seq1F1B decomposes batch-level schedulable units into finer sequence-level units, reducing bubble size and lowering memory requirements.
 Because Seq1F1B may produce slight extra bubbles when sequences are split evenly, we design a computation-wise strategy for partitioning input sequences to mitigate this side effect. Compared with competitive pipeline baselines such as Megatron's 1F1B pipeline parallelism, our method achieves higher training throughput with a lower memory footprint. Notably, Seq1F1B efficiently trains an LLM with 30 billion parameters on sequences of up to 64k using 64 NVIDIA A100 GPUs without recomputation strategies, which existing methods cannot achieve. Our code is based on Megatron-LM and is open source: https://github.com/MayDomine/Seq1F1B.git.|\n", "2406.03487": "|**2024-06-05**|**Analyzing LLM Behavior in Dialogue Summarization: Unveiling Circumstantial Hallucination Trends**|Sanjana Ramprasad et.al.|[2406.03487](http://arxiv.org/abs/2406.03487)|null|
Recent advances in large language models (LLMs) have markedly improved summarization systems, but concerns remain about their factuality. Although prior work has extensively evaluated LLMs on news, evaluation of dialogue summarization has focused mainly on BART-based models, leaving a gap in our understanding of their faithfulness. This study evaluates the faithfulness of LLMs in dialogue summarization through human annotation, with a focus on identifying and categorizing span-level inconsistencies. We concentrate on two prominent models, GPT-4 and Alpaca-13B. Our evaluation reveals subtleties in what counts as an error: LLMs often generate plausible inferences that rest on circumstantial evidence in the conversation but lack direct evidence, a pattern that is less common in older models. We propose a refined error taxonomy that introduces a \u201cCircumstantial Inference\u201d category to capture this LLM behavior, and we release the dataset. Using our taxonomy, we compare the behavioral differences between LLMs and older fine-tuned models. In addition, we systematically evaluate automatic error detection methods on LLM summaries and find that they struggle to detect these nuanced errors. To address this, we propose two prompt-based approaches for fine-grained error detection that outperform existing metrics, particularly in identifying \u201cCircumstantial Inference\u201d errors.|\n", "2406.03486": "|**2024-06-05**|**BIPED: Pedagogically Informed Tutoring System for ESL Education**|Soonwoo Kwon et.al.|[2406.03486](http://arxiv.org/abs/2406.03486)|null|Large language models (LLMs) show great potential to serve as affordable and readily accessible Conversational Intelligent Tutoring Systems (CITS) for learners of English as a second language (L2). However, existing CITS tend to teach only simple concepts or lack the pedagogical depth needed to address diverse learning strategies. To develop a more pedagogically informed CITS that can teach complex concepts, we construct the BIlingual PEDagogically-informed Tutoring Dataset (BIPED) of one-on-one, human-to-human English tutoring interactions. Through post-hoc analysis of the tutoring dialogues, we derive a lexicon of dialogue acts comprising 34 tutor acts and 9 student acts, which we use to further annotate the collected data. Following a two-step framework that first predicts the appropriate tutor act and then generates the corresponding response, we implement two CITS models using GPT-4 and SOLAR-KO, respectively. Experimental results show that the implemented models not only replicate the style of human teachers but also employ diverse, context-appropriate pedagogical strategies.|\n", "2406.03476": "|**2024-06-05**|**Does your data spark joy? Performance gains from domain upsampling at the end of training**|Cody Blakeney et.al.|[2406.03476](http://arxiv.org/abs/2406.03476)|null|Pretraining datasets for large language models (LLMs) have grown to trillions of tokens, composed mainly of large-scale CommonCrawl web scrapes along with smaller domain-specific datasets. Because training at large FLOP scales is required to reveal significant changes on difficult and emergent benchmarks, and such training is expensive, finding the optimal balance between the diversity of general web scrapes and the information density of domain-specific data is an open problem. This paper shows how to leverage the smaller domain-specific datasets by upsampling them at the end of training to improve performance on benchmarks such as MMLU, GSM8K, and HumanEval. For a 7-billion-parameter model trained on 1 trillion (T) tokens, this simple technique improves performance by 6.90, 8.26, and 6.17 points respectively, rivaling Llama-2 (7B), a model trained for twice as long. We study the duration of domain upsampling at the end of training, from 5% to 30% of training, and find that 10% to 20% works best for balancing general language modeling capability against targeted benchmarks. We also use domain upsampling to characterize at scale the benefit of individual datasets for different benchmarks by removing them during this final phase. This approach greatly reduces experimental cost, making it possible to explore the impact of different pretraining datasets at roughly one tenth of the cost of a full pretraining run.|\n", "2406.03474": "|**2024-06-05**|**AD-H: Autonomous Driving with Hierarchical Agents**|Zaibin Zhang 
et.al.|[2406.03474](http://arxiv.org/abs/2406.03474)|null|Given the impressive capabilities of multimodal large language models (MLLMs), recent work has focused on using MLLM-based agents for autonomous driving in large-scale, dynamic environments. However, prevalent approaches directly translate high-level instructions into low-level vehicle control signals, which deviates from the inherent language generation paradigm of MLLMs and fails to fully exploit their emergent capabilities. As a result, the generalizability of these methods is heavily restricted by the training datasets. To address this problem, we propose connecting high-level instructions and low-level control signals through mid-level language-driven commands, which are more fine-grained than high-level instructions but more universal and explainable than control signals, and can therefore effectively bridge the gap between the two. We implement this idea in a hierarchical multi-agent driving system named AD-H, which comprises an MLLM planner for high-level reasoning and a lightweight controller for low-level execution. This hierarchical design frees the MLLM from decoding low-level control signals and fully releases its emergent capabilities in high-level perception, reasoning, and planning. 
We build a new dataset with action hierarchy annotations. Comprehensive closed-loop evaluations show that our AD-H system has several key advantages. First, AD-H notably outperforms existing methods in driving performance and even exhibits the ability to self-correct during vehicle operation, a scenario not covered by the training data. Second, AD-H performs well under long-horizon instructions and novel environmental conditions, clearly surpassing current state-of-the-art methods. We will make our data and code publicly available.|\n", "2406.03450": "|**2024-06-05**|**What is the Best Way for ChatGPT to Translate Poetry?**|Shanshan Wang 
et.al.|[2406.03450](http://arxiv.org/abs/2406.03450)|null|This paper examines the performance of large language models such as ChatGPT on English-Chinese poetry translation, using targeted prompts and small-sample scenarios to determine how to get the best results. Although the initial outcomes are promising, the study finds persistent issues in ChatGPT's translations. To address them, we propose the Explanation-Assisted Poetry Machine Translation (EAPMT) method, which uses a monolingual explanation of the poem to guide the translation process. We also refine existing evaluation criteria to better suit the nuances of modern poetry translation. We invited professional poets to assess the translations and complemented their judgments with evaluations by GPT-4. The results show that our EAPMT method outperforms both traditional ChatGPT translation methods and existing online systems. The paper validates the effectiveness of our method and offers a new perspective on machine-assisted literary translation.|\n", "2406.03445": "|**2024-06-05**|**Pre-trained Large Language Models Use Fourier Features to Compute Addition**|Tianyi Zhou et.al.|[2406.03445](http://arxiv.org/abs/2406.03445)|null|
Pre-trained large language models (LLMs) show impressive mathematical reasoning, yet how they perform basic arithmetic such as addition remains unclear. This paper shows that pre-trained LLMs add numbers using Fourier features: dimensions in the hidden state that represent numbers through a set of features that are sparse in the frequency domain. Within the model, the multilayer perceptron (MLP) layers and the attention layers use Fourier features in complementary ways: MLP layers mainly use low-frequency features to approximate the magnitude of the answer, while attention layers mainly use high-frequency features to perform modular addition (for example, determining whether the answer is even). Pre-training is crucial for this mechanism: models trained from scratch exploit only low-frequency features, which leads to lower accuracy. Introducing pre-trained token embeddings into a randomly initialized model restores its performance. Overall, our analysis shows that appropriate pre-trained representations (such as Fourier features) can unlock the ability of Transformers to learn precise mechanisms for algorithmic tasks.|\n", "2406.03441": "|**2024-06-05**|**Cycles of Thought: Measuring LLM Confidence through Stable Explanations**|Evan Becker 
et.al.|[2406.03441](http://arxiv.org/abs/2406.03441)|null|In many high-risk machine learning applications, it is essential for a model to indicate when it is uncertain about a prediction. Although large language models (LLMs) can match or even exceed human-level accuracy on a variety of benchmarks, their overconfidence in incorrect responses remains a well-known problem. Traditional uncertainty quantification methods can be hard to apply directly to LLMs because of their computational cost and the closed-source nature of many models. Several black-box methods have been proposed recently, but they often rely on heuristics such as self-verbalized confidence. We instead propose a framework that measures an LLM's uncertainty by analyzing the distribution of explanations the model generates for an answer. Although using explanations is not new in itself, we treat each explanation as a test-time classifier and compute a posterior answer distribution over the most likely of these classifiers, which we use to assess uncertainty. 
We show how one specific instance of this framework, which uses explanation entailment as the classifier likelihood, improves confidence score metrics (in particular AUROC and AURC) across five different datasets. Our results suggest that the framework is both well-principled and an effective way to quantify uncertainty in LLMs.|\n", "2406.03411": "|**2024-06-05**|**Interactive Text-to-Image Retrieval with Large Language Models: A Plug-and-Play Approach**|Saehyung Lee et.al.|[2406.03411](http://arxiv.org/abs/2406.03411)|**[link](https://github.com/saehyung-lee/plugir)**|This paper primarily addresses the problem of dialogue-form context queries in the interactive text-to-image retrieval task. Our methodology, PlugIR, makes effective use of the general instruction-following capability of large language models (LLMs) in two ways. First, by reformulating the dialogue-form context, we remove the need to fine-tune a retrieval model on existing visual dialogue data, which makes it possible to use any black-box model. Second, we design an LLM questioner that generates non-redundant questions about the attributes of the target image, based on information about the candidate images in the current context. This approach reduces noise and redundancy in the generated questions. Beyond our methodology, we propose a new evaluation metric, Best log Rank Integral (BRI), for comprehensive assessment of interactive retrieval systems. PlugIR outperforms both zero-shot and fine-tuned baselines on multiple benchmarks. In addition, the two components of PlugIR can be applied flexibly, either separately or together, depending on the situation. Our code is open source at https://github.com/Saehyung-Lee/PlugIR.|\n", "2406.04344": "|**2024-06-06**|**Verbalized Machine Learning: Revisiting Machine Learning with Language Models**|Tim Z. Xiao et.al.|[2406.04344](http://arxiv.org/abs/2406.04344)|null|Motivated by the great progress made by large language models (LLMs), we introduce the framework of verbalized machine learning (VML). Unlike conventional machine learning models, which are typically optimized over a continuous parameter space, VML constrains the parameter space to human-interpretable natural language. This constraint leads to a new perspective on function approximation, in which an LLM with a text prompt is viewed as a function parameterized by that text prompt. Guided by this perspective, we revisit classical machine learning tasks such as regression and classification and find that they can be solved by an LLM-parameterized learner and optimizer. The major advantages of VML include: (1) easy encoding of prior knowledge: prior knowledge about the problem and the hypothesis class can be encoded in natural language and fed into the LLM-parameterized learner; (2) automatic model selection: the optimizer can automatically select a concrete model class based on the data and the verbalized prior knowledge, and can update the model class during training; (3) interpretable learner updates: the LLM-parameterized optimizer can explain why each learner update is made. We conduct several experiments to evaluate the effectiveness of VML and hope that it can serve as a bridge to stronger interpretability and trustworthiness in machine learning.|\n", "2406.04339": "|**2024-06-06**|**RoboMamba: Multimodal State Space Model for Efficient Robot Reasoning and Manipulation**|Jiaming Liu 
et.al.|[2406.04339](http://arxiv.org/abs/2406.04339)|null|A fundamental objective in robot manipulation is to enable models to understand visual scenes and execute actions. Although existing robot multimodal large language models (MLLMs) can handle some basic tasks, they still face challenges in two areas: 1) insufficient reasoning ability for complex tasks; 2) high computational cost for MLLM fine-tuning and inference. The recently proposed Mamba, based on the state space model (SSM), has shown promise in non-trivial sequence modeling with linear inference complexity. Inspired by this, we develop RoboMamba, an end-to-end robot MLLM that uses the Mamba model to provide both robot reasoning and action capabilities while keeping fine-tuning and inference efficient. 
First, we integrate a vision encoder with Mamba and align visual data with language embeddings through co-training, giving the model visual common sense and robot-related reasoning abilities. To further equip RoboMamba with action pose prediction, we explore an efficient fine-tuning strategy that uses only a simple policy head. Experiments show that once RoboMamba has sufficient reasoning ability, it can acquire manipulation skills with minimal fine-tuning parameters (0.1% of the model) and time (20 minutes). In experiments, RoboMamba demonstrates outstanding reasoning on general and robot evaluation benchmarks. Our model also achieves strong pose prediction results in both simulation and real-world experiments, with inference 7 times faster than existing robot MLLMs. We also provide a project web page.|\n", "2406.04337": "|**2024-06-06**|**Coherent Zero-Shot Visual Instruction Generation**|Quynh Phung 
et.al.|[2406.04337](http://arxiv.org/abs/2406.04337)|null|Despite advances in text-to-image synthesis, particularly with diffusion models, generating visual instructions that require objects to keep a consistent representation and undergo smooth state transitions across sequential steps remains a formidable challenge. This paper proposes a training-free framework that integrates text comprehension with image generation to ensure that visual instructions are visually appealing as well as coherent and accurate. We validate the approach by testing multi-step instructions and comparing against several baselines. Experimental results show that our method generates coherent and visually pleasing instructions.|\n", "2406.04334": "|**2024-06-06**|**DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs**|Lingchen Meng 
et.al.|[2406.04334](http://arxiv.org/abs/2406.04334)|null|Most large multimodal models (LMMs) are implemented by feeding visual tokens as a sequence into the first layer of a large language model (LLM). This approach is simple, but it significantly increases computation and memory costs because the model has to handle many additional tokens in its input layer. This paper presents DeepStack, a new architecture for LMMs. Considering the N layers of the language and vision Transformers in an LMM, we split the visual tokens into N groups and feed each group into its corresponding Transformer layer from bottom to top. Surprisingly, this simple method greatly enhances the ability of LMMs to model interactions among visual tokens across layers at almost no additional cost. We apply DeepStack to both the language and vision Transformers in LMMs and validate the effectiveness of DeepStack LMMs with extensive empirical results. Using the same context length, our DeepStack 
7B and 13B models surpass their counterparts by 2.7 and 2.9 points on average across 9 benchmarks, respectively. Using only one fifth of the context length, DeepStack comes close to counterparts that use the full context length. These gains are especially pronounced on high-resolution tasks: compared with LLaVA-1.5-7B, performance on TextVQA, DocVQA, and InfoVQA improves by 4.2, 11.0, and 4.0 points, respectively. We further apply DeepStack to the vision Transformer layers, which yields a similar gain of 3.8 points on average over LLaVA-1.5-7B.|\n", "2406.04331": "|**2024-06-06**|**PaCE: Parsimonious Concept Engineering for Large Language Models**|Jinqi Luo et.al.|[2406.04331](http://arxiv.org/abs/2406.04331)|**[link](https://github.com/peterljq/parsimonious-concept-engineering)**|Large language models (LLMs) are used for a wide variety of tasks. Although they can generate human-like responses, they can also produce undesirable output such as potentially harmful information, racist or sexist language, and incorrect information. To reduce these problems, researchers have developed alignment methods such as fine-tuning, prompt engineering, and representation engineering. However, existing methods face challenges: some require costly fine-tuning for every alignment task; some fail to adequately remove undesirable concepts and so align poorly; some remove benign concepts and lower the linguistic capabilities of LLMs. To address these issues, we propose Parsimonious Concept Engineering (PaCE), a novel activation engineering framework. First, we construct a large-scale concept dictionary in the activation space, in which each atom corresponds to a semantic concept. Next, for any given alignment task, we use a concept partitioner to efficiently annotate the concepts as benign or undesirable. At inference time, we use sparse coding to decompose the LLM activations along the concept dictionary, accurately representing them as a linear combination of benign and undesirable components. By removing the undesirable components, we reorient the behavior of LLMs toward the alignment goals. We run experiments on tasks such as response detoxification, faithfulness enhancement, and sentiment revising, and find that PaCE achieves state-of-the-art alignment performance while maintaining linguistic capabilities.|\n", "2406.04314": "|**2024-06-06**|**Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step**|Zhanhao Liang et.al.|[2406.04314](http://arxiv.org/abs/2406.04314)|null|Recently, Direct Preference Optimization (DPO) has been successfully extended to aligning text-to-image diffusion models with human preferences. Most existing DPO methods assume that all diffusion steps share the same preference order as the final generated images. We argue that this assumption neglects the denoising performance specific to each step, and that preference labels should therefore be tailored to each step. To this end, we propose Step-aware Preference Optimization (SPO), a novel post-training approach that independently evaluates and adjusts the denoising performance at each step, using a step-aware preference model and a step-wise resampler to ensure accurate step-aware supervision. 
\u5728SPO\u4e2d\uff0c\u6211\u4eec\u5728\u6bcf\u4e2a\u53bb\u566a\u6b65\u9aa4\u4e2d\u4f1a\u521b\u5efa\u4e00\u4e2a\u56fe\u50cf\u6c60\uff0c\u5bfb\u627e\u5408\u9002\u7684\u80dc\u8005-\u8d25\u8005\u5bf9\uff0c\u5e76\u4e14\u5173\u952e\u5728\u4e8e\uff0c\u6211\u4eec\u4f1a\u4ece\u6c60\u4e2d\u968f\u673a\u9009\u62e9\u4e00\u4e2a\u56fe\u50cf\u4f5c\u4e3a\u4e0b\u4e00\u6b21\u53bb\u566a\u6b65\u9aa4\u7684\u8d77\u70b9\u3002\u8fd9\u4e2a\u6b65\u7ea7\u91cd\u91c7\u6837\u8fc7\u7a0b\u4fdd\u8bc1\u4e86\u6bcf\u6b21\u80dc\u8005-\u8d25\u8005\u5bf9\u90fd\u6765\u81ea\u540c\u4e00\u539f\u59cb\u56fe\u50cf\uff0c\u4f7f\u5f97\u6bd4\u8f83\u72ec\u7acb\u4e8e\u524d\u4e00\u6b65\u3002\u4e3a\u4e86\u8bc4\u4f30\u6bcf\u4e2a\u6b65\u9aa4\u7684\u504f\u597d\uff0c\u6211\u4eec\u8bad\u7ec3\u4e86\u4e00\u4e2a\u4e13\u95e8\u7684\u6b65\u7ea7\u611f\u77e5\u504f\u597d\u6a21\u578b\uff0c\u9002\u7528\u4e8e\u6a21\u7cca\u548c\u6e05\u6670\u7684\u56fe\u50cf\u3002\u5728Stable Diffusion v1.5\u548cSDXL\u7b49\u5b9e\u9a8c\u4e2d\uff0cSPO \u663e\u8457\u4f18\u4e8e\u6700\u65b0\u7684Diffusion-DPO\uff0c\u5c24\u5176\u662f\u5728\u5904\u7406\u590d\u6742\u3001\u8be6\u7ec6\u7684\u63d0\u793a\u65f6\uff0c\u80fd\u66f4\u597d\u5730\u751f\u6210\u56fe\u50cf\u5e76\u63d0\u5347\u7f8e\u5b66\u6548\u679c\uff0c\u540c\u65f6\u5728\u8bad\u7ec3\u6548\u7387\u4e0a\u8d85\u8fc720\u500d\u3002\u4ee3\u7801\u548c\u6a21\u578b\u53ef\u5728\u6b64\u94fe\u63a5\u83b7\u53d6\uff1a[https://rockeycoss.github.io/spo.github.io/](https://rockeycoss.github.io/spo.github.io/)\u3002|\n", "2406.04306": "|**2024-06-06**|**Semantically Diverse Language Generation for Uncertainty Estimation in Language Models**|Lukas Aichberger 
et.al.|[2406.04306](http://arxiv.org/abs/2406.04306)|**[link](https://github.com/ml-jku/SDLG)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u751f\u6210\u6587\u672c\u65f6\u53ef\u80fd\u4f1a\u51fa\u73b0\u5e7b\u89c9\uff0c\u8fd9\u963b\u788d\u4e86\u793e\u4f1a\u548c\u5de5\u4e1a\u4e2d\u7684\u5404\u79cd\u5e94\u7528\uff0c\u56e0\u4e3a\u5b83\u4eec\u4f1a\u964d\u4f4eLLMs\u7684\u53ef\u4fe1\u5ea6\u3002\u5f53\u524d\u7684LLMs\u91c7\u7528\u81ea\u56de\u5f52\u65b9\u5f0f\u751f\u6210\u6587\u672c\uff0c\u5373\u9884\u6d4b\u5e76\u6dfb\u52a0\u6587\u672c\u6807\u8bb0\u3002\u5f53LLMs\u5bf9\u751f\u6210\u7684\u4e0b\u4e00\u4e2a\u6807\u8bb0\u7684\u8bed\u4e49\u542b\u4e49\u4e0d\u786e\u5b9a\u65f6\uff0c\u5f88\u53ef\u80fd\u4f1a\u4ea7\u751f\u5e7b\u89c9\u3002\u56e0\u6b64\uff0c\u4eba\u4eec\u8ba4\u4e3a\u5e7b\u89c9\u6e90\u4e8e\u9884\u6d4b\u4e0d\u786e\u5b9a\u6027\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u201c\u8bed\u4e49\u591a\u6837\u6027\u8bed\u8a00\u751f\u6210\u201d\uff08Semantically Diverse Language Generation\uff0cSDLG\uff09\uff0c\u7528\u4e8e\u91cf\u5316LLMs\u7684\u9884\u6d4b\u4e0d\u786e\u5b9a\u6027\u3002SDLG\u5f15\u5bfcLLM\u751f\u6210\u8bed\u4e49\u591a\u6837\u4f46\u53c8\u5408\u7406\u7684\u521d\u59cb\u6587\u672c\u66ff\u4ee3\u65b9\u6848\uff0c\u4ece\u800c\u63d0\u4f9b\u4e86\u7cbe\u786e\u7684aleatoric\u8bed\u4e49\u4e0d\u786e\u5b9a\u6027\u6d4b\u91cf\uff0c\u80fd\u591f\u68c0\u6d4b\u521d\u59cb\u6587\u672c\u662f\u5426\u53ef\u80fd\u51fa\u73b0\u5e7b\u89c9\u3002 \u5b9e\u9a8c\u5728\u95ee\u7b54\u4efb\u52a1\u4e0a\u8868\u660e\uff0cSDLG\u59cb\u7ec8\u4f18\u4e8e\u73b0\u6709\u65b9\u6cd5\uff0c\u5e76\u4e14\u5728\u8ba1\u7b97\u6548\u7387\u4e0a\u6700\u4e3a\u9ad8\u6548\uff0c\u4e3aLLMs\u7684\u4e0d\u786e\u5b9a\u6027\u4f30\u8ba1\u8bbe\u5b9a\u4e86\u65b0\u7684\u6807\u51c6\u3002**|\n", "2406.04300": "|**2024-06-06**|**Text-to-Drive: Diverse Driving Behavior Synthesis via Large Language Models**|Phat Nguyen 
et.al.|[2406.04300](http://arxiv.org/abs/2406.04300)|null|\u5728\u6a21\u62df\u8bad\u7ec3\u548c\u8bc4\u4f30\u5173\u952e\u5b89\u5168\u7cfb\u7edf\uff0c\u5982\u81ea\u52a8\u9a7e\u9a76\u8f66\u8f86\u65f6\uff0c\u901a\u8fc7\u6a21\u62df\u751f\u6210\u5404\u79cd\u573a\u666f\u81f3\u5173\u91cd\u8981\u3002\u7136\u800c\uff0c\u6a21\u578b\u5176\u4ed6\u8f66\u8f86\u7684\u8f68\u8ff9\u4ee5\u6a21\u62df\u590d\u6742\u4e14\u6709\u610f\u4e49\u7684\u8fd1\u8ddd\u79bb\u4ea4\u4e92\u4efb\u52a1\u6210\u672c\u9ad8\u6602\u3002\u5229\u7528\u8bed\u8a00\u63cf\u8ff0\u6765\u751f\u6210\u9a7e\u9a76\u884c\u4e3a\u662f\u4e00\u79cd\u6709\u524d\u666f\u7684\u65b9\u6cd5\uff0c\u5b83\u63d0\u4f9b\u4e86\u4e00\u79cd\u53ef\u6269\u5c55\u4e14\u76f4\u89c2\u7684\u4eba\u7c7b\u64cd\u4f5c\u65b9\u5f0f\uff0c\u80fd\u591f\u6a21\u62df\u5e7f\u6cdb\u9a7e\u9a76\u4e92\u52a8\u3002\u4f46\u5927\u578b\u6807\u6ce8\u7684\u8bed\u8a00-\u8f68\u8ff9\u6570\u636e\u7a00\u7f3a\u662f\u8fd9\u4e00\u65b9\u6cd5\u9762\u4e34\u7684\u6311\u6218\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86Text-to-Drive\uff08T2D\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5408\u6210\u591a\u6837\u5316\u9a7e\u9a76\u884c\u4e3a\u7684\u6280\u672f\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u91c7\u7528\u77e5\u8bc6\u9a71\u52a8\u4e24\u9636\u6bb5\u7b56\u7565\uff1a\u9996\u5148\uff0c\u5229\u7528LLMs\u7684\u5185\u7f6e\u77e5\u8bc6\u751f\u6210\u4e30\u5bcc\u591a\u6837\u7684\u9a7e\u9a76\u884c\u4e3a\u8bed\u8a00\u63cf\u8ff0\uff1b\u63a5\u7740\uff0c\u5229\u7528\u5176\u63a8\u7406\u80fd\u529b\u5728\u6a21\u62df\u5668\u4e2d\u5b9e\u73b0\u8fd9\u4e9b\u884c\u4e3a\u3002T2D\u7684\u6838\u5fc3\u662f\u4f7f\u7528LLM\u6784\u5efa\u72b6\u6001\u56fe\uff0c\u5c06\u4f4e\u7ea7\u72b6\u6001\u6620\u5c04\u5230\u9ad8\u7ea7\u62bd\u8c61\uff0c\u4ece\u800c\u7b80\u5316\u4e86\u8bf8\u5982\u603b\u7ed3\u4f4e\u7ea7\u89c2\u6d4b\u3001\u8bc4\u4f30\u7b56\u7565\u4e0e\u884c\u4e3a\u63cf\u8ff0\u7684\u4e00\u81f4\u6027\u4ee5\u53ca\u8bbe\u8ba1\u8f85\u52a9\u5956\u52b1\u7b49\u4e0b\u6e38\u4
efb\u52a1\uff0c\u65e0\u9700\u4eba\u5de5\u76d1\u7763\u3002\u901a\u8fc7\u6211\u4eec\u7684\u77e5\u8bc6\u9a71\u52a8\u65b9\u6cd5\uff0c\u6211\u4eec\u8bc1\u660eT2D\u80fd\u751f\u6210\u6bd4\u5176\u4ed6\u57fa\u51c6\u66f4\u4e30\u5bcc\u7684\u8f68\u8ff9\uff0c\u5e76\u63d0\u4f9b\u4e00\u4e2a\u81ea\u7136\u8bed\u8a00\u754c\u9762\uff0c\u5141\u8bb8\u7528\u6237\u4ea4\u4e92\u5f0f\u5730\u878d\u5165\u4eba\u7c7b\u504f\u597d\u3002\u66f4\u591a\u793a\u4f8b\u8bf7\u8bbf\u95ee\u6211\u4eec\u7684\u7f51\u7ad9\uff1a|\n", "2406.04289": "|**2024-06-07**|**What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages**|Nadav Borenstein et.al.|[2406.04289](http://arxiv.org/abs/2406.04289)|null|## \u80cc\u666f \u5927\u578b\u8bed\u8a00\u6a21\u578b\u80fd\u591f\u5b66\u4e60\u4ec0\u4e48\uff1f\u6839\u636e\u5b9a\u4e49\uff0c\u8bed\u8a00\u6a21\u578b\uff08LM\uff09\u662f\u5b57\u7b26\u4e32\u7684\u5206\u5e03\u3002\u56e0\u6b64\uff0c\u53ef\u4ee5\u5c06\u8fd9\u4e2a\u95ee\u9898\u8f6c\u5316\u4e3a\u8bc4\u4f30\u5b57\u7b26\u4e32\u5206\u5e03\u7c7b\u7684\u5b66\u4e60\u80fd\u529b\u3002\u5c3d\u7ba1\u5148\u524d\u7684\u7814\u7a76\u4e3b\u8981\u5173\u6ce8\u7406\u8bba\u9650\u5236\uff0c\u4f46\u6211\u4eec\u5173\u6ce8\u7684\u662f\u5b9e\u9645\u53ef\u5b66\u4e60\u6027\u3002\u4e0d\u540c\u4e8e\u4ee5\u5f80\u7684\u5b9e\u8bc1\u5de5\u4f5c\uff0c\u6211\u4eec\u8bc4\u4f30\u795e\u7ecf\u8bed\u8a00\u6a21\u578b\u5728\u5176\u201c\u4e3b\u573a\u201d\u2014\u2014\u5b66\u4e60\u6982\u7387\u8bed\u8a00\u2014\u2014\u4e0a\u7684\u8868\u73b0\uff0c\u800c\u4e0d\u662f\u4f5c\u4e3a\u5f62\u5f0f\u8bed\u8a00\u7684\u5206\u7c7b\u5668\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u7814\u7a76\u9012\u5f52\u8bed\u8a00\u6a21\u578b\uff08RLM\uff09\u7531\u5faa\u73af\u795e\u7ecf\u7f51\u7edc\uff08RNN\uff09\u548cTransformer 
LM\u5b66\u4e60\u7684\u53ef\u884c\u6027\u3002\u6211\u4eec\u901a\u8fc7\u5b9e\u9a8c\u6d4b\u8bd5RLM\u7684\u53ef\u5b66\u4e60\u6027\uff0c\u8003\u5bdf\u5176\u4e0eRLM\u7684\u590d\u6742\u53c2\u6570\u4ee5\u53ca\u795e\u7ecfLM\u9690\u85cf\u5c42\u5927\u5c0f\u7684\u5173\u7cfb\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cRLM\u7684\u79e9\uff08\u5bf9\u5e94\u4e8e\u5176\u6761\u4ef6\u5206\u5e03\u5bf9\u6570\u4f3c\u7136\u7ebf\u6027\u7a7a\u95f4\u7684\u5927\u5c0f\uff09\u548c\u91c7\u6837\u5b57\u7b26\u4e32\u7684\u9884\u671f\u957f\u5ea6\u662fRNN\u548cTransformer LM\u53ef\u5b66\u4e60\u6027\u7684\u5f3a\u4e14\u663e\u8457\u9884\u6d4b\u56e0\u7d20\u3002\u5176\u4ed6\u4e00\u4e9b\u9884\u6d4b\u6307\u6807\u4e5f\u8fbe\u5230\u4e86\u663e\u8457\u6027\uff0c\u4f46RNN\u548cTransformer\u4e4b\u95f4\u5b58\u5728\u4e0d\u540c\u7684\u6a21\u5f0f\u3002|\n", "2406.04278": "|**2024-06-06**|**Characterizing Similarities and Divergences in Conversational Tones in Humans and LLMs by Sampling with People**|Dun-Ming Huang et.al.|[2406.04278](http://arxiv.org/abs/2406.04278)|**[link](https://github.com/jacobyn/SamplingTonesACL)**|**## \u7ffb\u8bd1\u540e\u7684\u4e2d\u6587\u6458\u8981 
\u5bf9\u8bdd\u8bed\u6c14\u5728\u4eba\u9645\u4ea4\u6d41\u4e2d\u81f3\u5173\u91cd\u8981\u3002\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u65e5\u76ca\u666e\u53ca\uff0c\u7814\u7a76\u5b83\u4eec\u4e0e\u4eba\u7c7b\u4ea4\u6d41\u8bed\u6c14\u7684\u5dee\u5f02\u53d8\u5f97\u5c24\u4e3a\u91cd\u8981\u3002\u7136\u800c\uff0c\u5f53\u524d\u5173\u4e8e\u5bf9\u8bdd\u6a21\u5f0f\u7684\u7814\u7a76\u5f80\u5f80\u4f9d\u8d56\u4e8e\u9884\u5148\u5b58\u5728\u7684\u5206\u7c7b\u4f53\u7cfb\u6216\u6587\u672c\u8bed\u6599\u5e93\uff0c\u8fd9\u4e9b\u53ef\u80fd\u5b58\u5728\u5b9e\u9a8c\u8005\u504f\u89c1\uff0c\u5e76\u53ef\u80fd\u65e0\u6cd5\u5145\u5206\u53cd\u6620\u7814\u7a76\u9886\u57df\u4e2d\u7684\u771f\u5b9e\u4e16\u754c\u5206\u5e03\u3002\u53d7\u8ba4\u77e5\u79d1\u5b66\u65b9\u6cd5\u7684\u542f\u53d1\uff0c\u6211\u4eec\u63d0\u51fa\u4e00\u79cd\u8fed\u4ee3\u65b9\u6cd5\uff0c\u901a\u8fc7\u4ea4\u66ff\u8fdb\u884c\u4e24\u9879\u4efb\u52a1\u6765\u540c\u65f6\u63ed\u793a\u8bed\u6c14\u548c\u53e5\u5b50\uff1a\uff081\uff09\u53c2\u4e0e\u8005\u5224\u65ad\u7ed9\u5b9a\u53e5\u5b50\u7684\u8bed\u6c14\uff0c\uff082\uff09\u53e6\u4e00\u53c2\u4e0e\u8005\u6839\u636e\u8be5\u8bed\u6c14\u751f\u6210\u53e5\u5b50\u3002\u6211\u4eec\u5728\u4eba\u7c7b\u53c2\u4e0e\u8005\u548cGPT-4\u4e4b\u95f4\u8fdb\u884c\u4e86100\u8f6e\u8fd9\u6837\u7684\u4e92\u52a8\uff0c\u4ece\u800c\u83b7\u5f97\u4e86\u4e00\u7ec4\u5305\u542b\u53e5\u5b50\u548c\u5e38\u89c1\u5bf9\u8bdd\u8bed\u6c14\u7684\u6570\u636e\u3002\u6211\u4eec\u8fd8\u8fdb\u884c\u4e86\u989d\u5916\u5b9e\u9a8c\uff0c\u8ba9\u4eba\u7c7b\u548cGPT-4\u5bf9\u6240\u6709\u53e5\u5b50\u6807\u6ce8\u6240\u6709\u8bed\u6c14\u3002\u57fa\u4e8e1,339\u540d\u4eba\u7c7b\u53c2\u4e0e\u8005\u300133,370\u6b21\u4eba\u7c7b\u8bc4\u4ef7\u4ee5\u53ca29,900\u4e2aGPT-4\u67e5\u8be2\u7684\u6570\u636e\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u5982\u4f55\u4f7f\u7528\u8fd9\u79cd\u65b9\u6cd5\u521b\u5efa\u4e00\u4e2a\u53ef\u89e3\u91ca\u7684\u51e0\u4f55\u8868\u793a\uff0c\u4ee5\u5c55\u793a\u4eba\u7c7b\u548cGPT-4\u4e4b\u95f4\u7684\u5bf9\u8b
dd\u8bed\u6c14\u5173\u7cfb\u3002\u8fd9\u9879\u5de5\u4f5c\u5c55\u793a\u4e86\u673a\u5668\u5b66\u4e60\u548c\u8ba4\u77e5\u79d1\u5b66\u7406\u5ff5\u5982\u4f55\u7ed3\u5408\uff0c\u4ee5\u89e3\u51b3\u4eba\u673a\u4ea4\u4e92\u4e2d\u7684\u6311\u6218\u3002**|\n", "2406.05132": "|**2024-06-07**|**3D-GRAND: Towards Better Grounding and Less Hallucination for 3D-LLMs**|Jianing Yang et.al.|[2406.05132](http://arxiv.org/abs/2406.05132)|**[link](https://github.com/sled-group/3D-GRAND)**|\u5728\u8fd9\u4e2a\u7814\u7a76\u4e2d\uff0c\u8bed\u8a00\u4e0e\u4e09\u7ef4\u611f\u77e5\u7684\u878d\u5408\u5bf9\u4e8e\u6784\u5efa\u7406\u89e3\u548c\u4e92\u52a8\u4e8e\u7269\u7406\u4e16\u754c\u7684\u5b9e\u4f53\u4ee3\u7406\u548c\u673a\u5668\u4eba\u81f3\u5173\u91cd\u8981\u3002\u5c3d\u7ba1\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u8bed\u8a00\u7406\u89e3\u548c\u751f\u6210\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5728\u9002\u5e94\u4e09\u7ef4\u73af\u5883\uff083D-LLMs\uff09\u65b9\u9762\u4ecd\u5904\u4e8e\u521d\u7ea7\u9636\u6bb5\uff0c\u4e3b\u8981\u6311\u6218\u5728\u4e8e\u7f3a\u4e4f\u5927\u89c4\u6a21\u7684\u5bc6\u96c6\u5730\u5c06\u8bed\u8a00\u4e0e\u4e09\u7ef4\u573a\u666f\u5173\u8054\u7684\u6570\u636e\u96c6\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e863D-GRAND\uff0c\u8fd9\u662f\u4e00\u4e2a\u5f00\u521b\u6027\u7684\u5927\u578b\u6570\u636e\u96c6\uff0c\u5305\u542b40,087\u4e2a\u5bb6\u5ead\u573a\u666f\uff0c\u914d\u5bf9\u6709620\u4e07\u6761\u8be6\u5c3d\u7684\u573a\u666f-\u8bed\u8a00\u6307\u4ee4\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u4f7f\u75283D-GRAND\u8fdb\u884c\u6307\u4ee4\u8c03\u4f18\u663e\u8457\u63d0\u9ad8\u4e863D-LLMs\u7684\u5b9a\u4f4d\u80fd\u529b\uff0c\u5e76\u51cf\u5c11\u4e86\u9519\u8bef\u7684\u60f3\u8c61\u3002\u6211\u4eec\u8fd8\u8bbe\u8ba1\u4e863D-POPE\u57fa\u51c6\uff0c\u7528\u4e8e\u7cfb\u7edf\u6027\u8bc4\u4f303D-LLMs\u4e2d\u7684\u5e7b\u89c9\u95ee\u9898\uff0c\u4ee5\u4fc3\u8fdb\u672a\u6765\u6a21\u578b\u7684\u516c\u5e73\u6bd4\u8f83\u3002 
\u6211\u4eec\u7684\u5b9e\u9a8c\u63ed\u793a\u4e86\u6570\u636e\u96c6\u89c4\u6a21\u4e0e3D-LLM\u6027\u80fd\u4e4b\u95f4\u7684\u5173\u8054\uff0c\u5f3a\u8c03\u4e86\u5927\u578b\u4e09\u7ef4\u6587\u672c\u6570\u636e\u96c6\u5728\u63a8\u52a8\u4f53\u611fAI\u7814\u7a76\u4e2d\u7684\u5173\u952e\u4f5c\u7528\u3002\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u521d\u6b65\u8ff9\u8c61\u8868\u660e\uff0c\u901a\u8fc7\u5728\u5927\u578b\u5408\u6210\u6570\u636e\u4e0a\u8bad\u7ec3\u7684\u6a21\u578b\u53ef\u80fd\u5728\u73b0\u5b9e\u4e16\u754c3D\u626b\u63cf\u4e2d\u8868\u73b0\u826f\u597d\uff0c\u8fd9\u5c55\u793a\u4e86\u6a21\u62df\u5230\u5b9e\u9645\u7684\u8fc1\u79fb\u5b66\u4e60\u6f5c\u529b\u3002\u901a\u8fc73D-GRAND\u548c3D-POPE\uff0c\u6211\u4eec\u65e8\u5728\u4e3a\u4f53\u611fAI\u793e\u533a\u63d0\u4f9b\u5fc5\u8981\u7684\u8d44\u6e90\u548c\u6d1e\u89c1\uff0c\u63a8\u52a8\u66f4\u53ef\u9760\u3001\u66f4\u624e\u5b9e\u76843D-LLMs\u7684\u53d1\u5c55\u3002\u9879\u76ee\u7f51\u7ad9\uff1ahttps://3d-grand.github.io|\n", "2406.05130": "|**2024-06-07**|**An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models**|Xiongtao Zhou 
et.al.|[2406.05130](http://arxiv.org/abs/2406.05130)|null|\u8fd9\u7bc7\u8bba\u6587\u5173\u6ce8\u7684\u662f\u5927\u578b\u591a\u6a21\u6001\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u7684\u53c2\u6570\u9ad8\u6548\u5fae\u8c03\uff08PEFT\uff09\u3002\u7531\u4e8e\u8fd9\u4e9b\u6a21\u578b\u901a\u5e38\u5177\u6709\u6570\u5341\u4ebf\u53c2\u6570\uff0c\u5168\u9762\u8c03\u6574\u53d8\u5f97\u56f0\u96be\u3002\u7814\u7a76\u76ee\u6807\u662f\u627e\u51fa\u5728\u53c2\u6570\u53d7\u9650\u60c5\u51b5\u4e0b\u63d0\u5347MLLM\u6027\u80fd\u7684\u6709\u6548\u65b9\u6cd5\u3002\u901a\u8fc7\u5b9e\u9a8c\u4f7f\u7528\u56db\u79cd\u6d41\u884c\u7684PEFT\u6280\u672f\u5bf9\u5f00\u6e90MLLMs\u7684LLM\u7ec4\u4ef6\u8fdb\u884c\u5fae\u8c03\uff0c\u8bba\u6587\u8fdb\u884c\u4e86\u8be6\u5c3d\u7684\u5206\u6790\uff0c\u5185\u5bb9\u5305\u62ec\u4e0d\u540c\u65b9\u6cd5\u5bf9\u6a21\u578b\u3001\u53c2\u6570\u4f4d\u7f6e\u3001\u5fae\u8c03\u6570\u636e\u89c4\u6a21\u3001\u6a21\u578b\u7a33\u5b9a\u6027\u3001\u6cdb\u5316\u80fd\u529b\u4ee5\u53ca\u5e7b\u89c9\u7684\u5f71\u54cd\u3002\u7814\u7a76\u6db5\u76d6\u4e86\u4e24\u79cd\u7c7b\u578b\u7684\u4e03\u9879\u6570\u636e\u96c6\uff1a\u672a\u89c1\u8fc7\u7684\u548c\u5df2\u89c1\u8fc7\u7684\u3002\u7ed3\u679c\u663e\u793a\uff0c\u9002\u914d\u5668\u662f\u6700\u6709\u6548\u7684PEFT\u65b9\u6cd5\uff0c\u800c\u8fde\u63a5\u5668\u5c42\u7684\u5fae\u8c03\u5728\u5927\u591a\u6570\u60c5\u51b5\u4e0b\u80fd\u63d0\u9ad8\u6027\u80fd\u3002\u7814\u7a76\u4ee3\u7801\u548c\u6570\u636e\u53ef\u5728\u83b7\u53d6\u3002|\n", "2406.05127": "|**2024-06-07**|**Towards Semantic Equivalence of Tokenization in Multimodal LLM**|Shengqiong Wu et.al.|[2406.05127](http://arxiv.org/abs/2406.05127)|null|### \u80cc\u666f \u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u5728\u5904\u7406\u89c6\u89c9\u8bed\u8a00\u4efb\u52a1\u65b9\u9762\u5c55\u73b0\u51fa\u5353\u8d8a\u6027\u80fd\u3002MLLM\u7684\u6838\u5fc3\u5728\u4e8e\u89c6\u89c9 
tokenization\uff0c\u5373\u5982\u4f55\u6709\u6548\u5730\u5c06\u8f93\u5165\u7684\u89c6\u89c9\u4fe1\u53f7\u8f6c\u5316\u4e3a\u5bf9\u8bed\u8a00\u6a21\u578b\u6709\u76ca\u7684\u7279\u5f81\u8868\u793a\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u89c6\u89c9tokenizer\u5728\u4fdd\u6301\u89c6\u89c9\u4e0e\u8bed\u8a00\u7684\u8bed\u4e49\u4e00\u81f4\u6027\u4e0a\u5b58\u5728\u95ee\u9898\uff0c\u5b83\u4eec\u8fc7\u4e8e\u788e\u7247\u5316\u89c6\u89c9\u8f93\u5165\uff0c\u7834\u574f\u4e86\u89c6\u89c9\u5185\u5bb9\u7684\u8bed\u4e49\u5b8c\u6574\u6027\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u52a8\u6001\u8bed\u4e49\u7b49\u6548\u89c6\u89c9tokenizer\uff08SeTok\uff09\uff0c\u5b83\u901a\u8fc7\u52a8\u6001\u805a\u7c7b\u7b97\u6cd5\u5c06\u89c6\u89c9\u7279\u5f81\u7ec4\u7ec7\u6210\u8bed\u4e49\u5355\u5143\uff0c\u6839\u636e\u56fe\u50cf\u590d\u6742\u6027\u7075\u6d3b\u51b3\u5b9atoken\u7684\u6570\u91cf\u3002\u8fd9\u79cd\u751f\u6210\u7684\u89c6\u89c9tokens\u80fd\u6709\u6548\u4fdd\u6301\u8bed\u4e49\u5b8c\u6574\u6027\uff0c\u540c\u65f6\u6355\u6349\u4f4e\u9891\u548c\u9ad8\u9891\u89c6\u89c9\u7279\u5f81\u3002 ### \u4efb\u52a1 \u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aSetokim\u7684\u65b0\u578bMLLM\uff0c\u5b83\u7ed3\u5408\u4e86SeTok\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cSetokim\u5728\u5404\u79cd\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u663e\u8457\u7684\u4f18\u52bf\u3002\u5173\u4e8e\u66f4\u591a\u8be6\u60c5\uff0c\u53ef\u4ee5\u8bbf\u95ee\u9879\u76ee\u7f51\u9875\uff1ahttps://chocowu.github.io/SeTok-web/\u3002|\n", "2406.05107": "|**2024-06-07**|**LINX: A Language Driven Generative System for Goal-Oriented Automated Data Exploration**|Tavor Lipman et.al.|[2406.05107](http://arxiv.org/abs/2406.05107)|null|## \u7ffb\u8bd1 
\u6570\u636e\u63a2\u7d22\u662f\u4e00\u4e2a\u590d\u6742\u7684\u8fc7\u7a0b\uff0c\u7528\u6237\u901a\u8fc7\u9010\u6b65\u6267\u884c\u4e00\u7cfb\u5217\u67e5\u8be2\u6765\u5ba1\u89c6\u6570\u636e\u96c6\u3002\u6709\u65f6\uff0c\u7528\u6237\u4f1a\u63a2\u7d22\u65b0\u6570\u636e\u4ee5\u719f\u6089\u5b83\uff0c\u4f46\u66f4\u591a\u65f6\u5019\uff0c\u63a2\u7d22\u8fc7\u7a0b\u662f\u56f4\u7ed5\u7279\u5b9a\u5206\u6790\u76ee\u6807\u6216\u95ee\u9898\u8fdb\u884c\u7684\u3002\u4e3a\u4e86\u5e2e\u52a9\u7528\u6237\u6709\u6548\u63a2\u7d22\uff0c\u5df2\u63d0\u51fa\u81ea\u52a8\u5316\u6570\u636e\u63a2\u7d22\uff08Automated Data Exploration\uff0cADE\uff09\u7cfb\u7edf\uff0c\u5b83\u4eec\u65e8\u5728\u81ea\u52a8\u751f\u6210\u5c55\u793a\u6570\u636e\u6709\u8da3\u7279\u6027\u7684\u5b8c\u6574\u63a2\u7d22\u6d41\u7a0b\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684ADE\u7cfb\u7edf\u5e38\u53d7\u9650\u4e8e\u9884\u5b9a\u4e49\u7684\u4f18\u5316\u51fd\u6570\uff0c\u5bfc\u81f4\u5bf9\u540c\u4e00\u6570\u636e\u96c6\u59cb\u7ec8\u4ea7\u751f\u76f8\u540c\u7684\u63a2\u7d22\u5e8f\u5217\uff0c\u8fd9\u5728\u6709\u660e\u786e\u76ee\u6807\u7684\u63a2\u7d22\u4e2d\u663e\u5f97\u4e0d\u8db3\u3002\u4e3a\u6b64\uff0c\u672c\u6587\u63d0\u51faLINX\uff0c\u4e00\u4e2a\u7ed3\u5408\u81ea\u7136\u8bed\u8a00\u63a5\u53e3\u7684\u751f\u6210\u5f0f\u7cfb\u7edf\uff0c\u4e13\u6ce8\u4e8e\u9762\u5411\u76ee\u6807\u7684\u6570\u636e\u63a2\u7d22\u3002 LINX\u63a5\u53d7\u8f93\u5165\u6570\u636e\u96c6\u548c\u7528\u81ea\u7136\u8bed\u8a00\u63cf\u8ff0\u7684\u5206\u6790\u76ee\u6807\uff0c\u751f\u6210\u4e0e\u7528\u6237\u9700\u6c42\u76f8\u5173\u7684\u4e2a\u6027\u5316\u63a2\u7d22\u4f1a\u8bdd\u3002\u7cfb\u7edf\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u89e3\u6790\u8f93\u5165\u7684\u5206\u6790\u76ee\u6807\uff0c\u5e76\u636e\u6b64\u751f\u6210\u671f\u671b\u8f93\u51fa\u63a2\u7d22\u4f1a\u8bdd\u7684\u89c4\u8303\u3002\u8fd9\u4e9b\u89c4\u8303\u968f\u540e\u88ab\u4f20\u9012\u7ed9\u57fa\u4e8e\u7ea6\u675f\u6df1\u5ea6\u5f3a\u5316\u5b66\u4e60\uff08Constrained Deep Reinforcement 
Learning\uff0cCDRL\uff09\u7684\u65b0\u578b\u6a21\u5757\u5316ADE\u5f15\u64ce\uff0c\u4f7f\u5176\u80fd\u6839\u636e\u6307\u5b9a\u6307\u4ee4\u8c03\u6574\u8f93\u51fa\u3002\u4e3a\u4e86\u9a8c\u8bc1LINX\u7684\u6548\u679c\uff0c\u6211\u4eec\u521b\u5efa\u4e86\u4e00\u4e2a\u65b0\u7684\u9762\u5411\u76ee\u6807\u63a2\u7d22\u7684\u57fa\u51c6\u6570\u636e\u96c6\uff0c\u5e76\u8fdb\u884c\u4e86\u6df1\u5165\u7684\u7528\u6237\u7814\u7a76\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cLINX\u751f\u6210\u7684\u63a2\u7d22\u7b14\u8bb0\u672c\u5728\u76f8\u5173\u6027\u548c\u5b9e\u7528\u6027\u4e0a\u663e\u8457\u4f18\u4e8e\u73b0\u6709\u89e3\u51b3\u65b9\u6848\uff0c\u5305\u62ecChatGPT\u3001\u65e0\u76ee\u6807\u5bfc\u5411\u7684ADE\u4ee5\u53ca\u5546\u4e1a\u7cfb\u7edf\u3002|\n", "2406.05085": "|**2024-06-07**|**Multi-Head RAG: Solving Multi-Aspect Problems with LLMs**|Maciej Besta et.al.|[2406.05085](http://arxiv.org/abs/2406.05085)|**[link](https://github.com/spcl/mrag)**|**## \u80cc\u666f **\u589e\u5f3a\u578b\u68c0\u7d22\u751f\u6210\uff08Retrieval Augmented Generation, RAG\uff09**\u901a\u8fc7\u5c06\u6587\u6863\u5185\u5bb9\u878d\u5165\u5927\u8bed\u8a00\u6a21\u578b\uff08Large Language Models, LLMs\uff09\u7684\u4e0a\u4e0b\u6587\u4e2d\uff0c\u63d0\u9ad8\u4e86\u5176\u54cd\u5e94\u7684\u51c6\u786e\u6027\u548c\u76f8\u5173\u6027\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684RAG\u65b9\u6cd5\u5e76\u672a\u5145\u5206\u5904\u7406\u90a3\u4e9b\u53ef\u80fd\u9700\u8981\u68c0\u7d22\u5305\u542b\u4e0d\u540c\u5185\u5bb9\u7684\u591a\u6587\u6863\u67e5\u8be2\u3002\u8fd9\u7c7b\u95ee\u9898\u5728\u73b0\u5b9e\u4e2d\u5f88\u5e38\u89c1\uff0c\u4f46\u6311\u6218\u5728\u4e8e\uff0c\u8fd9\u4e9b\u6587\u6863\u7684\u5d4c\u5165\u5728\u5411\u91cf\u7a7a\u95f4\u4e2d\u53ef\u80fd\u76f8\u8ddd\u8f83\u8fdc\uff0c\u96be\u4ee5\u4e00\u6b21\u6027\u83b7\u53d6\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u65b9\u6848\u2014\u2014**\u591a\u5934\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08Multi-Head RAG, 
MRAG\uff09**\uff0c\u5b83\u4ee5\u4e00\u79cd\u7b80\u5355\u800c\u5f3a\u5927\u7684\u65b9\u5f0f\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff1a\u5229\u7528Transformer\u7684\u591a\u5934\u6ce8\u610f\u529b\u5c42\u7684\u6fc0\u6d3b\u4f5c\u4e3a\u68c0\u7d22\u952e\uff0c\u800c\u975e\u89e3\u7801\u5c42\u3002\u8fd9\u4e2a\u60f3\u6cd5\u7684\u9a71\u52a8\u529b\u5728\u4e8e\uff0c\u4e0d\u540c\u7684\u6ce8\u610f\u529b\u5934\u80fd\u591f\u5b66\u4e60\u6355\u6349\u6570\u636e\u7684\u4e0d\u540c\u65b9\u9762\u3002\u901a\u8fc7\u5229\u7528\u8fd9\u4e9b\u6fc0\u6d3b\uff0c\u6211\u4eec\u5f97\u5230\u7684\u5d4c\u5165\u80fd\u4ee3\u8868\u6570\u636e\u9879\u548c\u67e5\u8be2\u7684\u591a\u79cd\u7279\u6027\uff0c\u4ece\u800c\u63d0\u5347\u590d\u6742\u67e5\u8be2\u7684\u68c0\u7d22\u7cbe\u5ea6\u3002 **\u8d21\u732e** \u6211\u4eec\u63d0\u4f9b\u4e86\u8bc4\u4f30\u65b9\u6cd5\u3001\u5ea6\u91cf\u6807\u51c6\u3001\u5408\u6210\u6570\u636e\u96c6\u4ee5\u53ca\u5b9e\u9645\u5e94\u7528\u6848\u4f8b\uff0c\u6765\u5c55\u793aMRAG\u7684\u6709\u6548\u6027\u3002\u4e0e\u6807\u51c6RAG\u57fa\u7ebf\u76f8\u6bd4\uff0cMRAG\u5728\u76f8\u5173\u6027\u65b9\u9762\u7684\u63d0\u5347\u53ef\u9ad8\u8fbe20%\u3002MRAG\u53ef\u4ee5\u65e0\u7f1d\u878d\u5165\u73b0\u6709\u7684RAG\u6846\u67b6\uff0c\u5982RAGAS\uff0c\u4ee5\u53ca\u5404\u7c7b\u6570\u636e\u5b58\u50a8\u7cfb\u7edf\u3002 \u603b\u7ed3\uff0c\u672c\u6587\u65e8\u5728\u6539\u8fdb\u73b0\u6709RAG\u6a21\u578b\uff0c\u4ee5\u66f4\u597d\u5730\u5904\u7406\u6d89\u53ca\u591a\u89d2\u5ea6\u4fe1\u606f\u68c0\u7d22\u7684\u590d\u6742\u67e5\u8be2\u4efb\u52a1\u3002**|\n", "2406.05063": "|**2024-06-07**|**Are Large Language Models More Empathetic than Humans?**|Anuradha Welivita 
et.al.|[2406.05063](http://arxiv.org/abs/2406.05063)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5174\u8d77\uff0c\u7814\u7a76\u5b83\u4eec\u662f\u5426\u80fd\u5728\u60c5\u611f\u8bc6\u522b\u548c\u5171\u60c5\u56de\u5e94\u65b9\u9762\u8d85\u8d8a\u4eba\u7c7b\u5df2\u6210\u4e3a\u7814\u7a76\u7126\u70b9\u3002\u672c\u8bba\u6587\u5f00\u5c55\u4e86\u4e00\u9879\u6df1\u5165\u7814\u7a76\uff0c\u5bf9\u6bd4\u4e86\u5305\u62ecGPT-4\u3001LLaMA-2-70B-Chat\u3001Gemini-1.0-Pro\u548cMixtral-8x7B-Instruct\u5728\u5185\u7684\u56db\u6b3e\u6700\u5148\u8fdb\u7684LLMs\u4e0e\u4eba\u7c7b\u5728\u5171\u60c5\u56de\u5e94\u80fd\u529b\u4e0a\u7684\u8868\u73b0\u3002\u6211\u4eec\u901a\u8fc7\u4e00\u9879\u6d89\u53ca1,000\u540d\u53c2\u4e0e\u8005\u7684\u53cc\u76f2\u7528\u6237\u7814\u7a76\uff0c\u5bf92,000\u4e2a\u7cbe\u5fc3\u6311\u9009\u7684\u60c5\u611f\u5bf9\u8bdd\u63d0\u793a\u8fdb\u884c\u4e86\u5206\u6790\uff0c\u8fd9\u4e9b\u63d0\u793a\u6db5\u76d6\u4e8632\u79cd\u4e0d\u540c\u6b63\u8d1f\u60c5\u7eea\u7684\u5e7f\u6cdb\u8303\u56f4\u3002\u7814\u7a76\u7ed3\u679c\u663e\u793a\uff0cLLMs\u7684\u5171\u60c5\u56de\u5e94\u80fd\u529b\u5728\u7edf\u8ba1\u5b66\u4e0a\u4f18\u4e8e\u4eba\u7c7b\u3002GPT-4\u8868\u73b0\u51fa\u6700\u5f3a\u70c8\u7684\u5171\u60c5\uff0c\u5176\u201c\u597d\u201d\u7b49\u7ea7\u522b\u7684\u56de\u590d\u6bd4\u4eba\u7c7b\u57fa\u51c6\u63d0\u9ad8\u4e86\u7ea631%\u3002\u7d27\u968f\u5176\u540e\u7684\u662fLLaMA-2\uff0c\u63d0\u5347\u4e86\u7ea624%\uff0cMixtral-8x7B\u63d0\u5347\u4e86\u7ea621%\uff0cGemini-Pro\u63d0\u5347\u4e86\u7ea610%\u3002\u6211\u4eec\u8fd8\u5bf9\u56de\u590d\u8bc4\u7ea7\u8fdb\u884c\u4e86\u66f4\u8be6\u7ec6\u7684\u5206\u6790\uff0c\u53d1\u73b0\u67d0\u4e9bLLMs\u5728\u56de\u5e94\u7279\u5b9a\u60c5\u7eea\u65b9\u9762\u660e\u663e\u4f18\u4e8e\u5176\u4ed6\u6a21\u578b\u3002\u63d0\u51fa\u7684\u8bc4\u4f30\u6846\u67b6\u63d0\u4f9b\u4e86\u4e00\u79cd\u53ef\u6269\u5c55\u4e14\u9002\u5e94\u6027\u5f3a\u7684\u65b9\u6cd5\uff0c\u7528\u4e8e\u8bc4\u4f30\u65b0LLMs\u7684\u5171\u60c5\u80fd\u529b\uff0c\u907f\u
514d\u4e86\u672a\u6765\u7814\u7a76\u91cd\u590d\u8fd9\u9879\u7814\u7a76\u7684\u5fc5\u8981\u6027\u3002|\n", "2406.05055": "|**2024-06-07**|**Robustness Assessment of Mathematical Reasoning in the Presence of Missing and Contradictory Conditions**|Shi-Yu Tian et.al.|[2406.05055](http://arxiv.org/abs/2406.05055)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u63a8\u7406\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u901a\u8fc7\u5c11\u91cf\u793a\u4f8b\u63d0\u793a\u53ef\u4ee5\u8fdb\u4e00\u6b65\u63d0\u5347\u6027\u80fd\u3002\u7136\u800c\uff0c\u5f53\u524d\u7684\u8bc4\u4f30\u4e3b\u8981\u96c6\u4e2d\u5728\u7cbe\u5fc3\u6784\u5efa\u7684\u57fa\u51c6\u4e0a\uff0c\u5ffd\u89c6\u4e86\u73b0\u5b9e\u4e16\u754c\u4e2d\u5b58\u5728\u7f3a\u5931\u548c\u77db\u76fe\u6761\u4ef6\u7684\u63a8\u7406\u95ee\u9898\uff0c\u5373\u6240\u8c13\u7684\u4e0d\u660e\u786e\u95ee\u9898\u3002\u6211\u4eec\u7684\u89c2\u5bdf\u8868\u660e\uff0c\u73b0\u6709\u7684\u5c11\u91cf\u63d0\u793a\u65b9\u6cd5\u5728\u8fd9\u79cd\u60c5\u51b5\u4e0b\u6548\u679c\u4e0d\u4f73\uff0c\u5f80\u5f80\u7ed9\u51fa\u8fc7\u5ea6\u81ea\u4fe1\u7684\u7b54\u6848\u6216\u9519\u8bef\u63a8\u65ad\u3002\u4e3a\u4e86\u6df1\u5165\u7814\u7a76\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u521b\u5efa\u4e86\u4e00\u4e2a\u540d\u4e3a\u201c\u5e26\u6709\u7f3a\u5931\u548c\u77db\u76fe\u6761\u4ef6\u7684\u95ee\u9898\u201d\uff08PMC\uff09\u7684\u57fa\u51c6\uff0c\u5e76\u5f15\u5165\u4e86\u4e24\u4e2a\u65b0\u6307\u6807\u6765\u8bc4\u4f30\u5c11\u91cf\u63d0\u793a\u65b9\u6cd5\u5728\u5904\u7406\u8fd9\u7c7b\u95ee\u9898\u65f6\u7684\u8868\u73b0\u3002\u4f7f\u7528PMC\u57fa\u51c6\u7684\u5206\u6790\u63ed\u793a\u4e86\u5728\u89e3\u51b3\u660e\u786e\u95ee\u9898\u7684\u6570\u5b66\u63a8\u7406\u6027\u80fd\u4e0e\u8bc6\u522b\u4e0d\u660e\u786e\u95ee\u9898\u80fd\u529b\u4e4b\u95f4\u5b58\u5728\u6743\u8861\u3002\u9488\u5bf9PMC\u5e26\u6765\u7684\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u5c11\u91cf\u63d0\u793a\u65b9\u6cd5\uff0c\u79f0\u4e3aSMT-LIB\u63d0\u793a\uff08SLP\uff09\u30
02\u8fd9\u79cd\u65b9\u6cd5\u5229\u7528SMT-LIB\u8bed\u8a00\u63cf\u8ff0\u95ee\u9898\uff0c\u800c\u4e0d\u662f\u76f4\u63a5\u6c42\u89e3\uff0c\u7136\u540e\u91c7\u7528\u53cc\u91cd\u68c0\u67e5\u6c42\u89e3\u7b56\u7565\u9a8c\u8bc1\u89e3\u51b3\u65b9\u6848\u7684\u6ee1\u8db3\u6027\u548c\u552f\u4e00\u6027\uff0c\u4ece\u800c\u63d0\u4f9b\u6700\u7ec8\u53cd\u9988\u3002\u5b9e\u9a8c\u7ed3\u679c\u5168\u9762\u5c55\u793a\u4e86\u6211\u4eec\u7684SLP\u65b9\u6cd5\u5728\u5904\u7406\u5e26\u6709\u7f3a\u5931\u548c\u77db\u76fe\u6761\u4ef6\u7684\u95ee\u9898\u65f6\uff0c\u76f8\u8f83\u4e8e\u73b0\u6709\u65b9\u6cd5\u5177\u6709\u663e\u8457\u4f18\u52bf\u3002\u6211\u4eec\u5c06\u5f00\u6e90\u6211\u4eec\u7684\u57fa\u51c6\u548c\u4ee3\u7801\uff0c\u4ee5\u4fc3\u8fdb\u672a\u6765\u7684\u7814\u7a76\u3002|\n", "2406.05053": "|**2024-06-07**|**Hints-In-Browser: Benchmarking Language Models for Programming Feedback Generation**|Nachiket Kotalwar et.al.|[2406.05053](http://arxiv.org/abs/2406.05053)|null|### \u6982\u8ff0 \u751f\u6210\u5f0f\u4eba\u5de5\u667a\u80fd\u548c\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u7f16\u7a0b\u6559\u80b2\u4e2d\u7684\u6f5c\u529b\u5de8\u5927\uff0c\u5b83\u4eec\u80fd\u591f\u4e3a\u5b66\u4e60\u8005\u63d0\u4f9b\u4e2a\u6027\u5316\u7684\u53cd\u9988\u548c\u63d0\u793a\u3002\u5f53\u524d\u7684\u7814\u7a76\u4e3b\u8981\u96c6\u4e2d\u5728\u63d0\u5347\u751f\u6210\u53cd\u9988\u7684\u8d28\u91cf\uff0c\u4ee5\u8fbe\u5230\u4eba\u7c7b\u5bfc\u5e08\u7684\u6c34\u5e73\u3002\u7136\u800c\uff0c\u5728\u5b9e\u9645\u6559\u80b2\u90e8\u7f72\u4e2d\uff0c\u9664\u4e86\u8d28\u91cf\u5916\uff0c\u6210\u672c\u3001\u65f6\u95f4\u53ca\u6570\u636e\u9690\u79c1\u4e5f\u662f\u5173\u952e\u8003\u91cf\u56e0\u7d20\u3002\u672c\u8bba\u6587\u65e8\u5728\u5bf9\u8bed\u8a00\u6a21\u578b\u5728\u7f16\u7a0b\u53cd\u9988\u751f\u6210\u65b9\u9762\u7684\u6027\u80fd\u8fdb\u884c\u5168\u9762\u8bc4\u4f30\uff0c\u5305\u62ec\u8d28\u91cf\u3001\u6210\u672c\u3001\u901f\u5ea6\u548c\u6570\u636e\u9690\u79c1\u7b49\u591a\u4e2a\u7ef4\u5ea6\u3002\u6211\u4eec\u7279\u522b\u5173\u6ce8\
u5229\u7528\u6700\u65b0\u7684\u5728\u6d4f\u89c8\u5668\u5185\u63a8\u7406\u6280\u672f\uff0c\u8fd9\u6709\u52a9\u4e8e\u76f4\u63a5\u964d\u4f4e\u6210\u672c\u5e76\u4fdd\u62a4\u6570\u636e\u9690\u79c1\u3002 \u4e3a\u4e86\u4f18\u5316\u9002\u5408\u6d4f\u89c8\u5668\u5185\u8fd0\u884c\u7684\u5c0f\u578b\u6a21\u578b\u7684\u53cd\u9988\u8d28\u91cf\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u57fa\u4e8eGPT-4\u751f\u6210\u7684\u5408\u6210\u6570\u636e\u7684\u5fae\u8c03\u6d41\u7a0b\u3002\u6211\u4eec\u5c06\u5c55\u793a\u5982\u4f55\u4f7f\u7528WebLLM\u7684\u6d4f\u89c8\u5668\u5185\u63a8\u7406\u5f15\u64ce\u6765\u4f18\u5316Llama3-8B\u548cPhi3-3.8B\u76844\u4f4d\u91cf\u5316\u6a21\u578b\u5728\u4e09\u4e2a\u4e0d\u540cPython\u7f16\u7a0b\u6570\u636e\u96c6\u4e0a\u7684\u6548\u679c\u3002\u6211\u4eec\u627f\u8bfa\u4f1a\u516c\u5f00\u5168\u90e8\u5b9e\u73b0\u3001web\u5e94\u7528\u548c\u6570\u636e\u96c6\uff0c\u4ee5\u4fc3\u8fdb\u5728\u6d4f\u89c8\u5668\u8bed\u8a00\u6a21\u578b\u9886\u57df\u7684\u8fdb\u4e00\u6b65\u7814\u7a76\u3002|\n", "2406.05039": "|**2024-06-07**|**Bootstrapping Referring Multi-Object Tracking**|Yani Zhang et.al.|[2406.05039](http://arxiv.org/abs/2406.05039)|**[link](https://github.com/zyn213/temprmot)**|## \u80cc\u666f 
\u5f53\u524d\u7684\u591a\u5bf9\u8c61\u5f15\u7528\u8ddf\u8e2a\uff08RMOT\uff09\u4efb\u52a1\u901a\u5e38\u4f9d\u8d56\u4e8e\u624b\u52a8\u6807\u6ce8\u7684\u6570\u636e\u96c6\u548c\u9759\u6001\u89c4\u5219\uff0c\u8fd9\u9650\u5236\u4e86\u591a\u6837\u6027\u548c\u5b9e\u65bd\u8303\u56f4\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u7684\u7814\u7a76\u4e3b\u8981\u5173\u6ce8\u901a\u8fc7\u5f15\u5165\u66f4\u591a\u533a\u5206\u6027\u8bed\u8a00\u8bcd\u6c47\u6765\u63a8\u52a8RMOT\u4efb\u52a1\u7684\u53d1\u5c55\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u9996\u5148\u5bf9Refer-KITTI\u6570\u636e\u96c6\u8fdb\u884c\u4e86\u6269\u5c55\uff0c\u521b\u5efa\u4e86Refer-KITTI-V2\uff0c\u5b83\u4ece\u6700\u521d\u76842,719\u4e2a\u624b\u52a8\u6807\u6ce8\u5f00\u59cb\uff0c\u89e3\u51b3\u4e86\u7c7b\u522b\u4e0d\u5e73\u8861\u95ee\u9898\uff0c\u5e76\u589e\u52a0\u4e86\u66f4\u591a\u5173\u952e\u8bcd\uff0c\u4f7f\u5176\u66f4\u8d34\u8fd1\u73b0\u5b9e\u573a\u666f\uff0c\u76f8\u8f83\u4e8eRefer-KITTI\u6709\u6240\u8fdb\u6b65\u3002\u6211\u4eec\u8fdb\u4e00\u6b65\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u6269\u5145\u8fd9\u4e9b\u6807\u6ce8\uff0c\u603b\u8ba1\u8fbe\u52309,758\u4e2a\uff0c\u751f\u6210\u4e86617\u4e2a\u4e0d\u540c\u7684\u8bcd\u6c47\uff0c\u8d85\u8d8a\u4e86\u5148\u524d\u7684RMOT\u57fa\u51c6\u3002 \u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u6539\u8fdb\u4e86RMOT\u7684\u7aef\u5230\u7aef\u6846\u67b6\uff0c\u91c7\u7528\u4e86\u4e00\u4e2a\u7b80\u5355\u800c\u4f18\u96c5\u7684\u65f6\u5e8f\u63a8\u8fdb\u7b56\u7565\uff0c\u8be5\u7b56\u7565\u5728\u6027\u80fd\u4e0a\u4f18\u4e8e\u5148\u524d\u7684\u65b9\u6cd5\u3002\u76f8\u5173\u6e90\u4ee3\u7801\u548c\u6570\u636e\u96c6\u5df2\u53ef\u5728\u83b7\u53d6\u3002|\n", "2406.05035": "|**2024-06-07**|**Scenarios and Approaches for Situated Natural Language Explanations**|Pengshuo Qiu 
et.al.|[2406.05035](http://arxiv.org/abs/2406.05035)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u80fd\u591f\u751f\u6210\u9002\u5e94\u4e0d\u540c\u7528\u6237\u60c5\u5883\u7684\u81ea\u7136\u8bed\u8a00\u89e3\u91ca\uff08NLE\uff09\u3002\u7136\u800c\uff0c\u5bf9\u4e8e\u8fd9\u79cd\u9002\u5e94\u6027\u7684\u91cf\u5316\u8bc4\u4f30\u5c1a\u5b58\u7a7a\u767d\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u521b\u5efa\u4e86\u4e00\u4e2a\u57fa\u51c6\u6570\u636e\u96c6\u2014\u2014\u57fa\u4e8e\u60c5\u5883\u7684\u89e3\u91ca\uff08Situation-Based Explanation\uff0cSBE\uff09\u6570\u636e\u96c6\uff0c\u5305\u542b100\u4e2a\u9700\u8981\u89e3\u91ca\u7684\u4e8b\u7269\uff08explanandum\uff09\u3002\u6bcf\u4e2a\u4e8b\u7269\u90fd\u914d\u5bf9\u4e86\u9488\u5bf9\u6559\u5e08\u3001\u5b66\u751f\u548c\u4e13\u4e1a\u4eba\u58eb\u7b49\u4e0d\u540c\u53d7\u4f17\u7fa4\u4f53\u7684\u89e3\u91ca\uff0c\u4ee5\u4fbf\u8bc4\u4f30\u6a21\u578b\u5728\u6ee1\u8db3\u8fd9\u4e9b\u591a\u5143\u5316\u7fa4\u4f53\u4fe1\u606f\u9700\u6c42\u548c\u80cc\u666f\u4e0b\u7684\u89e3\u91ca\u7cbe\u51c6\u5ea6\uff0c\u5982\u5b66\u751f\u3001\u6559\u5e08\u548c\u5bb6\u957f\u3002\u6bcf\u79cd\u201c\u4e8b\u4f8b-\u53d7\u4f17\u201d\u7ec4\u5408\u90fd\u9644\u6709\u4eba\u7c7b\u64b0\u5199\u7684\u53c2\u8003\u89e3\u91ca\uff0c\u7528\u4e8e\u8ba1\u7b97\u5206\u6570\uff0c\u4ee5\u91cf\u5316\u6a21\u578b\u5982\u4f55\u6839\u636e\u60c5\u5883\u8c03\u6574\u89e3\u91ca\u3002\u6211\u4eec\u5728\u4e0d\u540c\u89c4\u6a21\u7684\u9884\u8bad\u7ec3\u8bed\u8a00\u6a21\u578b\u4e0a\u6d4b\u8bd5\u4e86\u4e09\u79cd\u63d0\u793a\u65b9\u6cd5\uff1a\u89c4\u5219\u57fa\u7840\u63d0\u793a\u3001\u5143\u63d0\u793a\u548c\u4e0a\u4e0b\u6587\u5b66\u4e60\u63d0\u793a\u3002\u7814\u7a76\u53d1\u73b0\uff1a1\uff09\u6a21\u578b\u53ef\u4ee5\u901a\u8fc7\u751f\u6210\u63d0\u793a\u4ea7\u751f\u66f4\u7cbe\u786e\u5730\u7b26\u5408\u76ee\u6807\u60c5\u5883\u7684\u89e3\u91ca\uff1b2\uff09\u660e\u786e\u63d0\u793a\u201c\u4f60\u662f\u4e00\u4e2a\u6709\u7528\u7684\u52a9\u624b\u201d\u5e76\u975e\u9488\u5bf9\u60c5\u5883\u5316NLE\u4efb\u52a1\
u7684\u5fc5\u8981\u6280\u672f\uff1b3\uff09\u4e0a\u4e0b\u6587\u5b66\u4e60\u63d0\u793a\u4ec5\u80fd\u5e2e\u52a9\u6a21\u578b\u5b66\u4e60\u6f14\u793a\u6a21\u677f\uff0c\u4f46\u65e0\u52a9\u4e8e\u63d0\u5347\u5176\u63a8\u7406\u6027\u80fd\u3002SBE\u6570\u636e\u96c6\u548c\u6211\u4eec\u7684\u5206\u6790\u4e3a\u4eca\u540e\u751f\u6210\u9002\u5e94\u60c5\u5883\u7684\u81ea\u7136\u8bed\u8a00\u89e3\u91ca\u7684\u7814\u7a76\u63d0\u4f9b\u4e86\u57fa\u7840\u3002|\n", "2406.06525": "|**2024-06-10**|**Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation**|Peize Sun et.al.|[2406.06525](http://arxiv.org/abs/2406.06525)|**[link](https://github.com/foundationvision/llamagen)**|**\u6211\u4eec\u63d0\u51faLlamaGen\uff0c\u8fd9\u662f\u4e00\u79cd\u5168\u65b0\u7684\u56fe\u50cf\u751f\u6210\u6a21\u578b\u5bb6\u65cf\uff0c\u5b83\u5c06\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u539f\u59cb\u201c\u4e0b\u4e00\u4e2a\u8bcd\u9884\u6d4b\u201d\u8303\u5f0f\u5e94\u7528\u4e8e\u89c6\u89c9\u751f\u6210\u9886\u57df\u3002\u8fd9\u8868\u660e\uff0c\u5982\u679c\u9002\u5f53\u6269\u5c55\uff0c\u672a\u7ecf\u89c6\u89c9\u7279\u6027\u7684\u5148\u9a8c\u77e5\u8bc6\u589e\u5f3a\u7684\u7eaf\u81ea\u56de\u5f52\u6a21\u578b\uff08\u5982Llama\uff09\u4e5f\u80fd\u8fbe\u5230\u6700\u5148\u8fdb\u7684\u56fe\u50cf\u751f\u6210\u6027\u80fd\u3002\u6211\u4eec\u7684\u7814\u7a76\u63a2\u7d22\u4e86\u56fe\u50cf\u5206\u8bcd\u5668\u7684\u8bbe\u8ba1\u7a7a\u95f4\u3001\u56fe\u50cf\u751f\u6210\u6a21\u578b\u7684\u53ef\u6269\u5c55\u6027\u4ee5\u53ca\u8bad\u7ec3\u6570\u636e\u8d28\u91cf\uff0c\u7ed3\u679c\u5982\u4e0b\uff1a(1) \u4e00\u79cd\u5177\u670916\u500d\u4e0b\u91c7\u6837\u7684\u56fe\u50cf\u5206\u8bcd\u5668\uff0c\u5176\u5728ImageNet\u57fa\u51c6\u4e0a\u7684\u91cd\u6784\u8d28\u91cf\u4e3a0.94\uff0c\u4ee3\u7801\u4e66\u5229\u7528\u7387\u9ad8\u8fbe97%\u3002(2) \u4e00\u7cfb\u5217\u4ece111\u767e\u4e07\u523031\u4ebf\u53c2\u6570\u7684\u7c7b\u6761\u4ef6\u56fe\u50cf\u751f\u6210\u6a21\u578b\uff0c\u5728ImageNet 
256x256\u57fa\u51c6\u4e0a\u5b9e\u73b0\u4e862.18\u7684FID\u5206\u6570\uff0c\u8d85\u8d8a\u4e86\u6d41\u884c\u7684\u6269\u6563\u6a21\u578b\uff0c\u5982LDM\u548cDiT\u3002(3) \u4e00\u4e2a7.75\u4ebf\u53c2\u6570\u7684\u6587\u672c\u6761\u4ef6\u56fe\u50cf\u751f\u6210\u6a21\u578b\uff0c\u901a\u8fc7\u4e24\u9636\u6bb5\u8bad\u7ec3\u5728LAION-COCO\u548c\u9ad8\u5ba1\u7f8e\u8d28\u91cf\u56fe\u50cf\u4e0a\uff0c\u663e\u793a\u51fa\u826f\u597d\u7684\u89c6\u89c9\u8d28\u91cf\u548c\u6587\u672c\u4e00\u81f4\u6027\u6027\u80fd\u3002(4) \u6211\u4eec\u9a8c\u8bc1\u4e86\u5927\u8bed\u8a00\u6a21\u578b\u670d\u52a1\u6846\u67b6\u5728\u4f18\u5316\u56fe\u50cf\u751f\u6210\u6a21\u578b\u63a8\u7406\u901f\u5ea6\u65b9\u9762\u7684\u6709\u6548\u6027\uff0c\u5b9e\u73b0\u4e86326%\u81f3414%\u7684\u901f\u5ea6\u63d0\u5347\u3002\u6211\u4eec\u5f00\u6e90\u6240\u6709\u6a21\u578b\u548c\u4ee3\u7801\uff0c\u4ee5\u4fc3\u8fdb\u89c6\u89c9\u751f\u6210\u548c\u591a\u6a21\u6001\u57fa\u7840\u6a21\u578b\u7684\u5f00\u653e\u6e90\u4ee3\u7801\u793e\u533a\u7684\u53d1\u5c55\u3002**|\n", "2406.06519": "|**2024-06-10**|**UMBRELA: UMbrela is the (Open-Source Reproduction of the) Bing RELevance Assessor**|Shivani Upadhyay et.al.|[2406.06519](http://arxiv.org/abs/2406.06519)|**[link](https://github.com/castorini/umbrela)**|**
\u5927\u91cf\u76f8\u5173\u6027\u5224\u65ad\u5bf9\u4e8e\u68c0\u7d22\u7cfb\u7edf\u7684\u6709\u6548\u8bad\u7ec3\u548c\u7cbe\u786e\u8bc4\u4f30\u81f3\u5173\u91cd\u8981\u3002\u4f20\u7edf\u4e0a\uff0c\u8fd9\u4e9b\u5224\u65ad\u7531\u4eba\u5de5\u8bc4\u5b9a\u5458\u5b8c\u6210\uff0c\u8fc7\u7a0b\u6602\u8d35\u4e14\u8017\u65f6\u3002\u5fae\u8f6fBing\u7684Thomas\u7b49\u4eba\u6700\u8fd1\u7684\u4e00\u9879\u7814\u7a76\u8868\u660e\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u80fd\u591f\u51c6\u786e\u5730\u8fdb\u884c\u76f8\u5173\u6027\u8bc4\u4f30\uff0c\u63d0\u4f9b\u4e0e\u4eba\u7c7b\u76f8\u5f53\u7684\u5224\u65ad\u3002\u9057\u61be\u7684\u662f\uff0c\u4ed6\u4eec\u7684\u7814\u7a76\u5e76\u672a\u516c\u5f00\u53ef\u4f9b\u91cd\u590d\u4f7f\u7528\u7684\u8f6f\u4ef6\u5de5\u5177\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u4ecb\u7ecd\u4e86\u4e00\u4e2a\u5f00\u6e90\u5de5\u5177\u5305\u2014\u2014UMBRELA\uff08\u5168\u79f0\u4e3a\u201cUMBRELA\u662fBing RELevance Assessor\u7684\u9012\u5f52\u7f29\u5199\u201d\uff09\uff0c\u5b83\u57fa\u4e8eOpenAI\u7684GPT-4\u6a21\u578b\u590d\u73b0\u4e86Thomas\u7b49\u4eba\u7684\u7ed3\u679c\uff0c\u5e76\u4e3a\u539f\u8bba\u6587\u589e\u6dfb\u4e86\u66f4\u591a\u7ec6\u8282\u3002\u6211\u4eec\u5728TREC 2019\u5e74\u81f32023\u5e74\u7684\u6df1\u5ea6\u5b66\u4e60\u4efb\u52a1\u4e2d\u53d1\u73b0\uff0cLLM\u751f\u6210\u7684\u76f8\u5173\u6027\u5224\u65ad\u4e0e\u9ad8\u6548\u591a\u9636\u6bb5\u68c0\u7d22\u7cfb\u7edf\u751f\u6210\u7684\u6392\u540d\u9ad8\u5ea6\u76f8\u5173\u3002\u8be5\u5de5\u5177\u5305\u8bbe\u8ba1\u4e3a\u6613\u4e8e\u6269\u5c55\uff0c\u53ef\u4ee5\u878d\u5165\u73b0\u6709\u7684\u591a\u9636\u6bb5\u68c0\u7d22\u548c\u8bc4\u4f30\u6d41\u7a0b\uff0c\u4e3a\u7814\u7a76\u68c0\u7d22\u8bc4\u4f30\u65b9\u6cd5\u7684\u7814\u7a76\u8005\u63d0\u4f9b\u4e86\u5b9d\u8d35\u7684\u8d44\u6e90\u3002UMBRELA\u5c06\u5728TREC 
2024\u5e74\u7684RAG\u4efb\u52a1\u4e2d\u7528\u4e8e\u8f85\u52a9\u76f8\u5173\u6027\u8bc4\u4f30\uff0c\u6211\u4eec\u671f\u671b\u5b83\u6210\u4e3a\u8be5\u9886\u57df\u8fdb\u4e00\u6b65\u521b\u65b0\u7684\u57fa\u7840\u3002UMBRELA\u7684\u4ee3\u7801\u5e93\u53ef\u4e8ehttps://github.com/castorini/umbrela\u83b7\u53d6\u3002**|\n", "2406.06499": "|**2024-06-10**|**NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative**|Asmar Nadeem et.al.|[2406.06499](http://arxiv.org/abs/2406.06499)|null|\u5f53\u524d\u7684\u89c6\u9891\u5b57\u5e55\u57fa\u51c6\u548c\u6a21\u578b\u5728\u8868\u5f81\u56e0\u679c\u65f6\u95f4\u53d9\u4e8b\u65b9\u9762\u5b58\u5728\u4e0d\u8db3\uff0c\u8fd9\u79cd\u53d9\u4e8b\u662f\u901a\u8fc7\u56e0\u679c\u5173\u7cfb\u8fde\u63a5\u7684\u4e00\u7cfb\u5217\u4e8b\u4ef6\uff0c\u968f\u65f6\u95f4\u53d1\u5c55\uff0c\u7531\u4eba\u7269\u6216\u4e3b\u4f53\u9a71\u52a8\u3002\u8fd9\u79cd\u7f3a\u4e4f\u53d9\u4e8b\u6027\u9650\u5236\u4e86\u6a21\u578b\u751f\u6210\u6355\u6349\u89c6\u9891\u5185\u5bb9\u5185\u5728\u56e0\u679c\u548c\u65f6\u95f4\u52a8\u6001\u7684\u6587\u672c\u63cf\u8ff0\u7684\u80fd\u529b\u3002\u4e3a\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u6211\u4eec\u63d0\u51faNarrativeBridge\uff0c\u5b83\u5305\u62ec\u4ee5\u4e0b\u4e24\u4e2a\u7ec4\u6210\u90e8\u5206\uff1a\uff081\uff09\u4e00\u4e2a\u7531\u5927\u578b\u8bed\u8a00\u6a21\u578b\u901a\u8fc7\u5c11\u91cf\u63d0\u793a\u751f\u6210\u7684\u65b0\u578b\u56e0\u679c\u65f6\u95f4\u53d9\u4e8b\uff08CTN\uff09\u5b57\u5e55\u57fa\u51c6\uff0c\u8be5\u57fa\u51c6\u660e\u786e\u5730\u5728\u89c6\u9891\u63cf\u8ff0\u4e2d\u7f16\u7801\u56e0\u679c\u5173\u7cfb\uff0c\u901a\u8fc7\u81ea\u52a8\u8bc4\u4f30\u786e\u4fdd\u8d28\u91cf\u548c\u76f8\u5173\u6027\uff1b\uff082\uff09\u4e00\u4e2a\u4e13\u95e8\u7684\u56e0\u679c\u7f51\u7edc\uff08CEN\uff09\u67b6\u6784\uff0c\u5177\u6709\u72ec\u7acb\u7684\u7f16\u7801\u5668\u4ee5\u5206\u522b\u6355\u83b7\u56e0\u679c\u52a8\u6001\uff0c\u4ece\u800c\u5b9e\u73b0\u6709\u6548\u7684\u5b66\u4e60\u548c\u751f\u6210\u5177\u6709\u56e0\u679c\u65f6\u95
f4\u53d9\u4e8b\u7684\u5b57\u5e55\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cCEN\u5728\u8868\u8fbe\u89c6\u9891\u5185\u5bb9\u7684\u56e0\u679c\u548c\u65f6\u95f4\u65b9\u9762\u6bd4\u7b2c\u4e8c\u597d\u7684\u6a21\u578b\uff08GIT\uff09\u66f4\u51c6\u786e\uff1a\u5728MSVD\u548cMSR-VTT\u6570\u636e\u96c6\u4e0a\u7684CIDEr\u5206\u6570\u5206\u522b\u4e3a17.88\u548c17.44\u3002\u63d0\u51fa\u7684\u6846\u67b6\u80fd\u591f\u7406\u89e3\u548c\u751f\u6210\u5177\u6709\u590d\u6742\u56e0\u679c\u65f6\u95f4\u53d9\u4e8b\u7ed3\u6784\u7684\u7ec6\u5fae\u6587\u672c\u63cf\u8ff0\uff0c\u8fd9\u662f\u89c6\u9891\u5b57\u5e55\u751f\u6210\u7684\u4e00\u4e2a\u5173\u952e\u5c40\u9650\u6027\u3002\u6709\u5173\u9879\u76ee\u8be6\u60c5\uff0c\u8bf7\u8bbf\u95ee\u3002|\n", "2406.06474": "|**2024-06-10**|**Towards a Personal Health Large Language Model**|Justin Cosentino et.al.|[2406.06474](http://arxiv.org/abs/2406.06474)|null|\u5728\u5065\u5eb7\u9886\u57df\uff0c\u5927\u90e8\u5206\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u7814\u7a76\u96c6\u4e2d\u5728\u4e34\u5e8a\u4efb\u52a1\u4e0a\u3002\u7136\u800c\uff0c\u79fb\u52a8\u548c\u53ef\u7a7f\u6234\u8bbe\u5907\u63d0\u4f9b\u7684\u4e30\u5bcc\u3001\u957f\u671f\u7684\u4e2a\u4eba\u5065\u5eb7\u76d1\u6d4b\u6570\u636e\u5f80\u5f80\u88ab\u5ffd\u89c6\u3002\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aPersonal Health Large Language 
Model\uff08PH-LLM\uff09\u7684\u65b0\u6a21\u578b\uff0c\u5b83\u662fGemini\u7684\u5b9a\u5236\u7248\uff0c\u4e13\u4e3a\u7406\u89e3\u548c\u5904\u7406\u6570\u503c\u65f6\u95f4\u5e8f\u5217\u7684\u4e2a\u4eba\u5065\u5eb7\u6570\u636e\u800c\u8bbe\u8ba1\u3002\u6211\u4eec\u521b\u5efa\u5e76\u6574\u7406\u4e86\u4e09\u4e2a\u6d4b\u8bd5\u96c6\uff0c\u8003\u5bdf\u4e86PH-LLM\u5728\u4ee5\u4e0b\u65b9\u9762\u7684\u6027\u80fd\uff1a1\uff09\u4ece\u7761\u7720\u6a21\u5f0f\u3001\u8eab\u4f53\u6d3b\u52a8\u548c\u751f\u7406\u53cd\u5e94\u4e2d\u751f\u6210\u4e2a\u6027\u5316\u89c1\u89e3\u548c\u5efa\u8bae\uff1b2\uff09\u4e13\u4e1a\u77e5\u8bc6\u9886\u57df\u7684\u4e13\u5bb6\u6c34\u5e73\uff1b3\uff09\u9884\u6d4b\u81ea\u6211\u62a5\u544a\u7684\u7761\u7720\u7ed3\u679c\u3002\u6211\u4eec\u4e0e\u9886\u57df\u4e13\u5bb6\u5408\u4f5c\u6784\u5efa\u4e86857\u4e2a\u6848\u4f8b\u7814\u7a76\uff0c\u4ee5\u8bc4\u4f30\u5b9e\u9645\u7684\u7761\u7720\u548c\u5065\u8eab\u573a\u666f\u3002\u901a\u8fc7\u9488\u5bf9\u7279\u5b9a\u9886\u57df\u7684\u8bc4\u5206\u6807\u51c6\u8fdb\u884c\u5168\u9762\u8bc4\u4f30\uff0c\u6211\u4eec\u53d1\u73b0Gemini Ultra 
1.0\u548cPH-LLM\u5728\u5065\u8eab\u65b9\u9762\u4e0e\u4e13\u5bb6\u8868\u73b0\u65e0\u7edf\u8ba1\u5dee\u5f02\uff0c\u5c3d\u7ba1\u5728\u7761\u7720\u65b9\u9762\u4e13\u5bb6\u4ecd\u5360\u4f18\u52bf\uff0c\u4f46Fine-tune\u540e\u7684PH-LLM\u5728\u5229\u7528\u76f8\u5173\u9886\u57df\u77e5\u8bc6\u548c\u4e2a\u4eba\u5316\u7761\u7720\u4fe1\u606f\u65b9\u9762\u8868\u73b0\u51fa\u663e\u8457\u63d0\u5347\u3002\u6211\u4eec\u8fd8\u901a\u8fc7\u591a\u9879\u9009\u62e9\u7684\u7761\u7720\u533b\u5b66\u548c\u5065\u8eab\u8003\u8bd5\u8bc4\u4f30\u4e86PH-LLM\u7684\u4e13\u4e1a\u77e5\u8bc6\uff0c\u5176\u5f97\u5206\u5206\u522b\u4e3a79%\u548c88%\uff0c\u8d85\u8fc7\u4e86\u4eba\u7c7b\u4e13\u5bb6\u6837\u672c\u7684\u5e73\u5747\u5206\u3002\u6700\u540e\uff0c\u6211\u4eec\u8bad\u7ec3PH-LLM\u9884\u6d4b\u6765\u81ea\u53ef\u7a7f\u6234\u8bbe\u5907\u6587\u672c\u548c\u591a\u6a21\u6001\u7f16\u7801\u6570\u636e\u7684\u81ea\u6211\u62a5\u544a\u7761\u7720\u8d28\u91cf\u7ed3\u679c\uff0c\u5e76\u8bc1\u660e\u4e86\u591a\u6a21\u6001\u7f16\u7801\u5bf9\u4e8e\u8fbe\u5230\u4e13\u95e8\u533a\u5206\u6a21\u578b\u7684\u6027\u80fd\u81f3\u5173\u91cd\u8981\u3002\u5c3d\u7ba1\u5728\u4e2a\u4eba\u5065\u5eb7\u8fd9\u4e2a\u5173\u952e\u5b89\u5168\u9886\u57df\u8fd8\u9700\u8981\u8fdb\u4e00\u6b65\u53d1\u5c55\u548c\u8bc4\u4f30\uff0c\u4f46\u8fd9\u4e9b\u7ed3\u679c\u5c55\u793a\u4e86Gemini\u6a21\u578b\u7684\u5e7f\u6cdb\u77e5\u8bc6\u548c\u80fd\u529b\uff0c\u4ee5\u53ca\u5c06\u751f\u7406\u6570\u636e\u5e94\u7528\u4e8e\u4e2a\u4eba\u5065\u5eb7\u5e94\u7528\uff0c\u5982PH-LLM\u4e2d\u7684\u505a\u6cd5\u3002|\n", "2406.06465": "|**2024-06-10**|**AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction**|Zhen Xing 
et.al.|[2406.06465](http://arxiv.org/abs/2406.06465)|null|\u6587\u672c\u5f15\u5bfc\u7684\u89c6\u9891\u9884\u6d4b\uff08TVP\uff09\u4efb\u52a1\u65e8\u5728\u6839\u636e\u521d\u59cb\u5e27\u548c\u6307\u4ee4\u9884\u6d4b\u540e\u7eed\u5e27\u7684\u8fd0\u52a8\uff0c\u8fd9\u5bf9\u4e8e\u865a\u62df\u73b0\u5b9e\u3001\u673a\u5668\u4eba\u6280\u672f\u548c\u5185\u5bb9\u521b\u4f5c\u7b49\u9886\u57df\u5177\u6709\u5e7f\u6cdb\u7684\u5e94\u7528\u3002\u5c3d\u7ba1\u5148\u524d\u7684\u65b9\u6cd5\u901a\u8fc7\u6539\u7f16Stable Diffusion\u5728\u8be5\u4efb\u52a1\u4e0a\u53d6\u5f97\u4e86\u91cd\u5927\u8fdb\u5c55\uff0c\u4f46\u5b83\u4eec\u5728\u5e27\u4e00\u81f4\u6027\u4e0e\u65f6\u95f4\u7a33\u5b9a\u6027\u65b9\u9762\u4ecd\u5b58\u5728\u95ee\u9898\uff0c\u4e3b\u8981\u53d7\u9650\u4e8e\u89c6\u9891\u6570\u636e\u96c6\u7684\u89c4\u6a21\u3002\u6211\u4eec\u89c2\u5bdf\u5230\uff0c\u9884\u8bad\u7ec3\u7684Image2Video\u6269\u6563\u6a21\u578b\u5bf9\u89c6\u9891\u52a8\u6001\u6709\u826f\u597d\u7684\u5148\u9a8c\u77e5\u8bc6\uff0c\u4f46\u7f3a\u4e4f\u6587\u672c\u63a7\u5236\u3002\u56e0\u6b64\uff0c\u5c06Image2Video\u6a21\u578b\u8f6c\u79fb\uff0c\u540c\u65f6\u6ce8\u5165\u6307\u4ee4\u63a7\u5236\u4ee5\u751f\u6210\u53ef\u63a7\u5236\u7684\u89c6\u9891\uff0c\u65e2\u5177\u6709\u610f\u4e49\u53c8\u9887\u5177\u6311\u6218\u3002 
\u4e3a\u4e86\u5b9e\u73b0\u8fd9\u4e00\u76ee\u6807\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\uff0c\u7528\u4e8e\u6839\u636e\u521d\u59cb\u5e27\u548c\u6587\u672c\u6307\u4ee4\u9884\u6d4b\u672a\u6765\u7684\u89c6\u9891\u72b6\u6001\u3002\u7279\u522b\u5730\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u53cc\u67e5\u8be2Transformer\uff08DQFormer\uff09\u67b6\u6784\uff0c\u5b83\u5c06\u6307\u4ee4\u548c\u5e27\u4fe1\u606f\u6574\u5408\u5230\u6761\u4ef6\u5d4c\u5165\u4e2d\uff0c\u7528\u4e8e\u672a\u6765\u5e27\u7684\u9884\u6d4b\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u957f\u77ed\u671f\u65f6\u5e8f\u9002\u914d\u5668\u548c\u7a7a\u95f4\u9002\u914d\u5668\uff0c\u80fd\u591f\u5728\u5c11\u91cf\u8bad\u7ec3\u6210\u672c\u4e0b\u5feb\u901f\u5c06\u901a\u7528\u89c6\u9891\u6269\u6563\u6a21\u578b\u9002\u5e94\u7279\u5b9a\u573a\u666f\u3002 \u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728Something Something V2\u3001Epic Kitchen-100\u3001Bridge Data\u548cUCF-101\u56db\u4e2a\u6570\u636e\u96c6\u4e0a\u663e\u8457\u4f18\u4e8e\u73b0\u6709\u6280\u672f\u3002\u7279\u522b\u662f\u5728Bridge\u6570\u636e\u96c6\u548cSSv2\u4e0a\uff0cAID\u5206\u522b\u5b9e\u73b0\u4e8691.2%\u548c55.5%\u7684FVD\u6539\u8fdb\uff0c\u8fd9\u8bc1\u660e\u4e86\u5176\u5728\u4e0d\u540c\u9886\u57df\u7684\u6709\u6548\u6027\u3002\u66f4\u591a\u793a\u4f8b\u53ef\u5728\u6211\u4eec\u7684\u7f51\u7ad9\u627e\u5230\u3002|\n", "2406.06464": "|**2024-06-10**|**Transforming Wearable Data into Health Insights using Large Language Model Agents**|Mike A. 
Merrill et.al.|[2406.06464](http://arxiv.org/abs/2406.06464)|null|\u5c3d\u7ba1\u53ef\u7a7f\u6234\u5065\u5eb7\u8ffd\u8e2a\u5668\u65e5\u76ca\u666e\u53ca\uff0c\u7761\u7720\u548c\u8fd0\u52a8\u5bf9\u5065\u5eb7\u7684\u91cd\u8981\u6027\u4e0d\u8a00\u800c\u55bb\uff0c\u4f46\u4ece\u8fd9\u4e9b\u6570\u636e\u4e2d\u63d0\u53d6\u5177\u6709\u884c\u52a8\u4ef7\u503c\u7684\u4e2a\u6027\u5316\u89c1\u89e3\u4ecd\u662f\u4e00\u4e2a\u6311\u6218\u3002\u8fd9\u9700\u8981\u5bf9\u5927\u91cf\u6570\u636e\u8fdb\u884c\u975e\u7ed3\u6784\u5316\u5206\u6790\u3002\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u5174\u8d77\uff0c\u5b83\u4eec\u80fd\u591f\u5229\u7528\u5de5\u5177\u7406\u89e3\u548c\u4e0e\u4e16\u754c\u4e92\u52a8\uff0c\u4e3a\u5927\u89c4\u6a21\u4e2a\u6027\u5316\u5206\u6790\u5e26\u6765\u4e86\u5e0c\u671b\u3002\u7136\u800c\uff0c\u5728\u4e2a\u4eba\u5065\u5eb7\u9886\u57df\u7684LLM\u5e94\u7528\u5c1a\u5f85\u5f00\u53d1\u3002\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aPersonal Health Insights Agent\uff08PHIA\uff09\u7684\u7cfb\u7edf\uff0c\u5b83\u5229\u7528\u6700\u65b0\u7684\u4ee3\u7801\u751f\u6210\u548c\u4fe1\u606f\u68c0\u7d22\u5de5\u5177\u6765\u5206\u6790\u548c\u89e3\u91ca\u884c\u4e3a\u5065\u5eb7\u6570\u636e\u3002\u6211\u4eec\u6784\u5efa\u4e86\u4e24\u4e2a\u8d85\u8fc74000\u4e2a\u5065\u5eb7\u6d1e\u5bdf\u95ee\u9898\u7684\u57fa\u51c6\u95ee\u7b54\u6570\u636e\u96c6\u3002\u6839\u636e650\u5c0f\u65f6\u7684\u4eba\u7c7b\u548c\u4e13\u5bb6\u8bc4\u4f30\uff0cPHIA\u80fd\u51c6\u786e\u56de\u7b5484%\u4ee5\u4e0a\u7684\u4e8b\u5b9e\u6027\u6570\u503c\u95ee\u9898\uff0c\u4ee5\u53ca\u8d85\u8fc783%\u7684\u4f17\u5305\u5f00\u653e\u6027\u95ee\u9898\u3002\u8fd9\u9879\u5de5\u4f5c\u5bf9\u4e8e\u63a8\u52a8\u5927\u4f17\u884c\u4e3a\u5065\u5eb7\u8fdb\u6b65\u5177\u6709\u91cd\u8981\u610f\u4e49\uff0c\u53ef\u80fd\u4f7f\u4e2a\u4eba\u80fd\u591f\u89e3\u8bfb\u81ea\u5df1\u7684\u53ef\u7a7f\u6234\u6570\u636e\uff0c\u5f00\u8f9f\u4e86\u4e00\u4e2a\u4ee5\u6570\u636e\u9a71\u52a8\u6d1e\u5bdf\u4e3a\u6307\u5bfc\u7684\u4e2a\u6027\u531
6\u5065\u5eb7\u65b9\u6848\u7684\u65b0\u65f6\u4ee3\uff0c\u4f7f\u5f97\u5065\u5eb7\u4fdd\u5065\u66f4\u52a0\u4fbf\u6377\u4e14\u4e2a\u6027\u5316\u3002|\n", "2406.06461": "|**2024-06-11**|**Reasoning in Token Economies: Budget-Aware Evaluation of LLM Reasoning Strategies**|Junlin Wang et.al.|[2406.06461](http://arxiv.org/abs/2406.06461)|null|\u8fd9\u7bc7\u8bba\u6587\u6307\u51fa\uff0c\u5c3d\u7ba1\u5df2\u7ecf\u63d0\u51fa\u4e86\u591a\u79cd\u63a8\u7406\u7b56\u7565\u6765\u8bc4\u4f30\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u80fd\u529b\uff0c\u4f46\u4f20\u7edf\u7684\u8bc4\u4ef7\u65b9\u6cd5\u4ec5\u5173\u6ce8\u6027\u80fd\u6307\u6807\uff0c\u5ffd\u89c6\u4e86\u4e00\u4e2a\u5173\u952e\u56e0\u7d20\uff1a\u989d\u5916\u8ba1\u7b97\u8d44\u6e90\u5e26\u6765\u7684\u589e\u6548\u3002\u8fd9\u53ef\u80fd\u5bfc\u81f4\u5bf9\u7b56\u7565\u6548\u7387\u7684\u7247\u9762\u7406\u89e3\u3002\u4e3a\u6b64\uff0c\u8bba\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u6846\u67b6\uff0c\u5c06\u8ba1\u7b97\u9884\u7b97\u7eb3\u5165\u8bc4\u4f30\uff0c\u4ee5\u63d0\u4f9b\u4e00\u4e2a\u65e2\u8003\u8651\u6027\u80fd\u6307\u6807\u53c8\u8003\u8651\u8ba1\u7b97\u6210\u672c\u7684\u66f4\u5168\u9762\u6bd4\u8f83\u3002\u901a\u8fc7\u8fd9\u79cd\u9884\u7b97\u610f\u8bc6\u7684\u89c6\u89d2\uff0c\u7814\u7a76\u53d1\u73b0\u590d\u6742\u7684\u63a8\u7406\u7b56\u7565\u5728\u6ca1\u6709\u663e\u8457\u7b97\u6cd5\u521b\u65b0\u7684\u60c5\u51b5\u4e0b\uff0c\u5f80\u5f80\u7531\u4e8e\u5206\u914d\u4e86\u66f4\u591a\u7684\u8ba1\u7b97\u8d44\u6e90\u800c\u8d85\u8d8a\u4e86\u7b80\u5355\u7684\u57fa\u7ebf\u3002\u4f8b\u5982\uff0c\u5f53\u7ed9\u4e88\u94fe\u5f0f\u601d\u8003\u81ea\u6d3d\u6027\uff08chain-of-thought 
self-consistency\uff09\u7c7b\u4f3c\u7ea7\u522b\u7684\u8ba1\u7b97\u8d44\u6e90\uff0c\u5b83\u5e38\u5e38\u80fd\u4f18\u4e8e\u6587\u732e\u4e2d\u63d0\u51fa\u7684\u63a8\u7406\u7b56\u7565\u3002\u7136\u800c\uff0c\u5728\u8fd9\u79cd\u89c4\u6a21\u654f\u611f\u7684\u89c6\u89d2\u4e0b\uff0c\u67d0\u4e9b\u7b56\u7565\u5982\u591a\u4ee3\u7406\u8fa9\u8bba\u6216\u591a\u53cd\u601d\u5728\u589e\u52a0\u8ba1\u7b97\u9884\u7b97\u65f6\u53ef\u80fd\u4f1a\u8868\u73b0\u5f97\u66f4\u5dee\u3002|\n", "2406.06458": "|**2024-06-10**|**Evaluating the Retrieval Component in LLM-Based Question Answering Systems**|Ashkan Alinejad et.al.|[2406.06458](http://arxiv.org/abs/2406.06458)|null|\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u9a71\u52a8\u7684\u95ee\u7b54\u7cfb\u7edf\u5728\u4f9d\u8d56\u68c0\u7d22\u7ec4\u4ef6\u65f6\uff0c\u80fd\u591f\u83b7\u53d6\u9886\u57df\u7279\u5b9a\u4fe1\u606f\u5e76\u964d\u4f4e\u4ea7\u751f\u4e0d\u51c6\u786e\u56de\u590d\u6216\u9519\u8bef\u4fe1\u606f\u7684\u98ce\u9669\u3002\u5c3d\u7ba1\u4fe1\u606f\u68c0\u7d22\u9886\u57df\u7684\u8bc4\u4f30\u65b9\u6cd5\u65e9\u5df2\u5b58\u5728\uff0c\u4f46\u5982\u4f55\u8bc4\u4f30LLMs\u9a71\u52a8\u7684\u804a\u5929\u673a\u5668\u4eba\u4e2d\u7684\u68c0\u7d22\u5668\u6027\u80fd\u4ecd\u662f\u4e00\u4e2a\u6311\u6218\u3002\u672c\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u79cd\u7b80\u5355\u7684\u57fa\u51c6\u65b9\u6cd5\uff0c\u7528\u4e8e\u8bc4\u4ef7\u57fa\u4e8e\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08Retrieval-Augmented Generation\uff0cRAG\uff09\u7684\u804a\u5929\u673a\u5668\u4eba\u4e2d\u7684\u68c0\u7d22\u5668\u3002
\u6211\u4eec\u7684\u7814\u7a76\u53d1\u73b0\uff0c\u8fd9\u79cd\u65b9\u6cd5\u80fd\u66f4\u5168\u9762\u5730\u53cd\u6620\u68c0\u7d22\u5668\u7684\u6027\u80fd\uff0c\u5e76\u4e0e\u6574\u4e2a\u95ee\u7b54\u7cfb\u7edf\u7684\u6574\u4f53\u8868\u73b0\u66f4\u4e3a\u4e00\u81f4\u3002\u5c3d\u7ba1\u4f20\u7edf\u7684\u7cbe\u786e\u5ea6\uff08precision\uff09\u3001\u53ec\u56de\u7387\uff08recall\uff09\u548cF1\u5206\u6570\u7b49\u6307\u6807\u53ef\u80fd\u65e0\u6cd5\u5b8c\u5168\u63ed\u793aLLMs\u7684\u80fd\u529b\uff0c\u56e0\u4e3a\u5b83\u4eec\u53ef\u80fd\u4f1a\u5728\u68c0\u7d22\u5668\u4e0d\u5b8c\u7f8e\u65f6\u4ecd\u63d0\u4f9b\u51c6\u786e\u7b54\u6848\uff0c\u4f46\u6211\u4eec\u7684\u8bc4\u4f30\u65b9\u6cd5\u8003\u8651\u5230\u4e86LLMs\u7684\u4f18\u52bf\uff0c\u5373\u5b83\u4eec\u80fd\u591f\u5ffd\u7565\u65e0\u5173\u4e0a\u4e0b\u6587\uff0c\u540c\u65f6\u4e5f\u80fd\u5904\u7406\u53ef\u80fd\u5b58\u5728\u7684\u9519\u8bef\u548c\u865a\u6784\u5185\u5bb9\u3002|\n", "2406.06455": "|**2024-06-10**|**A Large Language Model Pipeline for Breast Cancer Oncology**|Tristen Pool 
et.al.|[2406.06455](http://arxiv.org/abs/2406.06455)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u4f17\u591a\u9886\u57df\u5c55\u73b0\u51fa\u521b\u65b0\u6f5c\u529b\uff0c\u4f46\u5728\u764c\u75c7\u6cbb\u7597\u65b9\u9762\u7684\u5e94\u7528\u4ecd\u9700\u8fdb\u4e00\u6b65\u5f00\u53d1\u3002\u7814\u7a76\u8005\u4f7f\u7528\u4e00\u79cd\u65b0\u9896\u7684Langchain\u63d0\u793a\u5de5\u7a0b\u7ba1\u9053\uff0c\u5bf9\u6700\u5148\u8fdb\u7684OpenAI\u6a21\u578b\u8fdb\u884c\u4e86\u5fae\u8c03\uff0c\u6570\u636e\u96c6\u5305\u62ec\u4e34\u5e8a\u6570\u636e\u548c\u4e34\u5e8a\u6307\u5357\u6587\u672c\uff0c\u4e13\u6ce8\u4e8e\u4e73\u817a\u764c\u60a3\u8005\u8f85\u52a9\u653e\u7597\u548c\u5316\u7597\u4e24\u4e2a\u5173\u952e\u6cbb\u7597\u56e0\u7d20\u3002\u7ed3\u679c\u663e\u793a\uff0c\u6a21\u578b\u5728\u5206\u7c7b\u8fd9\u4e24\u4e2a\u6cbb\u7597\u624b\u6bb5\u65f6\u8fbe\u5230\u4e86\u9ad8\u7cbe\u5ea6\uff080.85+\uff09\u3002\u901a\u8fc7\u89c2\u5bdf\u4eba\u7c7b\u80bf\u7624\u5b66\u5bb6\u7684\u6cbb\u7597\u8d28\u91cf\u6570\u636e\uff0c\u5efa\u7acb\u4e86\u4e00\u4e2a\u7f6e\u4fe1\u533a\u95f4\uff0c\u4f30\u8ba1\u6a21\u578b\u5728\u9884\u6d4b\u6cbb\u7597\u65b9\u6848\u65f6\u5fc5\u987b\u6bd4\u539f\u59cb\u80bf\u7624\u5b66\u5bb6\u8868\u73b0\u5f97\u66f4\u597d\uff0c\u624d\u80fd\u5728\u603b\u4f53\u4e0a\u6210\u4e3a\u66f4\u597d\u7684\u89e3\u51b3\u65b9\u6848\u7684\u6bd4\u4f8b\u4e3a8.2%\u81f313.3%\u3002\u7531\u4e8e\u764c\u75c7\u6cbb\u7597\u51b3\u7b56\u7ed3\u679c\u7684\u4e0d\u786e\u5b9a\u6027\uff0c\u672a\u6765\u53ef\u80fd\u9700\u8981\u8fdb\u884c\u4e34\u5e8a\u8bd5\u9a8c\u6765\u9a8c\u8bc1\u8fd9\u4e00\u9608\u503c\u3002\u8003\u8651\u5230\u7f8e\u56fd85%\u7684\u764c\u75c7\u60a3\u8005\u5728\u5730\u65b9\u793e\u533a\u8bbe\u65bd\u63a5\u53d7\u6cbb\u7597\uff0c\u8fd9\u7c7b\u6a21\u578b\u6709\u53ef\u80fd\u663e\u8457\u6269\u5927\u4f18\u8d28\u62a4\u7406\u7684\u53ef\u53ca\u6027\uff0c\u5176\u6548\u679c\u81f3\u5c11\u63a5\u8fd1\u4eba\u7c7b\u80bf\u7624\u5b66\u5bb6\u3002|\n", "2406.06451": "|**2024-06-10**|**Insights from Social Shaping Theory: The 
Appropriation of Large Language Models in an Undergraduate Programming Course**|Aadarsh Padiyath et.al.|[2406.06451](http://arxiv.org/abs/2406.06451)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u4ee3\u7801\u751f\u6210\u3001\u8c03\u8bd5\u548c\u89e3\u91ca\u65b9\u9762\u7684\u6027\u80fd\u5f15\u53d1\u4e86\u8bb8\u591a\u7814\u7a76\u8005\u548c\u6559\u80b2\u5de5\u4f5c\u8005\u5bf9\u672c\u79d1\u7f16\u7a0b\u6559\u80b2\u7684\u5173\u6ce8\uff0c\u4ed6\u4eec\u671f\u5f85\u8fd9\u4e9b\u6a21\u578b\u80fd\u9769\u65b0\u7f16\u7a0b\u6559\u5b66\u3002\u7136\u800c\uff0c\u5173\u4e8e\u5982\u4f55\u4ee5\u53ca\u4e3a\u4f55\u5728\u7f16\u7a0b\u6559\u80b2\u4e2d\u4f7f\u7528LLMs\u7684\u51b3\u7b56\u53ef\u80fd\u4e0d\u4ec5\u4ec5\u57fa\u4e8e\u6280\u672f\u8bc4\u4f30\u3002\u672c\u7814\u7a76\u4ee5\u793e\u4f1a\u5851\u9020\u6280\u672f\u7406\u8bba\u4e3a\u6307\u5bfc\u6846\u67b6\uff0c\u63a2\u8ba8\u4e86\u5b66\u751f\u5bf9LLMs\u7684\u793e\u4f1a\u611f\u77e5\u5982\u4f55\u5f71\u54cd\u4ed6\u4eec\u7684\u4f7f\u7528\u884c\u4e3a\u3002\u6211\u4eec\u901a\u8fc7\u5206\u6790\u4e00\u4efd\u533f\u540d\u7684\u8bfe\u7a0b\u7ed3\u675f\u65f6\u7684\u8c03\u67e5\u95ee\u5377\uff08n=158\uff09\u3001\u4e2d\u671f\u81ea\u6211\u6548\u80fd\u95ee\u5377\uff08n=158\uff09\u300110\u4f4d\u5b66\u751f\u7684\u6df1\u5ea6\u8bbf\u8c08\u3001\u81ea\u6211\u62a5\u544a\u7684LLM\u5728\u4f5c\u4e1a\u4e2d\u7684\u4f7f\u7528\u60c5\u51b5\uff0c\u4ee5\u53ca\u671f\u4e2d\u8003\u8bd5\u6210\u7ee9\uff0c\u53d1\u73b0\u5b66\u751f\u7684LLM\u4f7f\u7528\u4e0e\u5176\u5bf9\u672a\u6765\u804c\u4e1a\u7684\u671f\u671b\u548c\u5bf9\u540c\u4f34\u4f7f\u7528\u7684\u611f\u77e5\u6709\u5173\u3002\u6b64\u5916\uff0c\u6211\u4eec\u53d1\u73b0\u65e9\u671f\u81ea\u6211\u62a5\u544a\u7684LLM\u4f7f\u7528\u4e0e\u8f83\u4f4e\u7684\u81ea\u6211\u6548\u80fd\u548c\u4e2d\u671f\u8003\u8bd5\u6210\u7ee9\u76f8\u5173\uff0c\u800c\u5b66\u751f\u5bf9\u8fc7\u5ea6\u4f9d\u8d56LLM\u7684\u611f\u77e5\uff0c\u800c\u975e\u5b9e\u9645\u4f7f\u7528\uff0c\u4e0e\u8bfe\u7a0b\u540e\u671f\u7684\u81ea\u6211\u6548\u80fd\u4e0b\u9
64d\u6709\u5173\u3002|\n", "2406.07545": "|**2024-06-11**|**Open-LLM-Leaderboard: From Multi-choice to Open-style Questions for LLMs Evaluation, Benchmark, and Arena**|Aidar Myrzakhan et.al.|[2406.07545](http://arxiv.org/abs/2406.07545)|**[link](https://github.com/vila-lab/open-llm-leaderboard)**|**\u591a\u9879\u9009\u62e9\u9898\uff08MCQ\uff09\u5e38\u7528\u4e8e\u8bc4\u4f30\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u3002\u901a\u5e38\uff0cLLM\u4f1a\u6839\u636e\u8c03\u6574\u540e\u7684\u6982\u7387\uff0c\u5982\u957f\u5ea6\u56e0\u7d20\uff0c\u9009\u62e9\u6700\u53ef\u80fd\u7684\u7b54\u6848\u3002\u7136\u800c\uff0cLLMs\u53ef\u80fd\u5b58\u5728\u56fa\u6709\u7684\u504f\u89c1\uff0c\u4f8b\u5982\u5bf9A\u3001B\u3001C\u3001D\u7b49\u9009\u9879ID\u7684\u504f\u597d\uff0c\u8fd9\u53ef\u80fd\u5f71\u54cd\u7b54\u6848\u9884\u6d4b\u3002\u5148\u524d\u7684\u7814\u7a76\u901a\u8fc7\u5728\u5c11\u6570\u6d4b\u8bd5\u6837\u672c\u4e0a\u968f\u673a\u6253\u4e71\u9009\u9879\uff0c\u5e76\u5c06\u5176\u5e94\u7528\u5230\u65b0\u6837\u672c\u4e0a\uff0c\u8bd5\u56fe\u51cf\u5c11\u8fd9\u79cd\u201c\u9009\u62e9\u504f\u5dee\u201d\u3002\u6b64\u5916\uff0cMCQ\u7684\u53e6\u4e00\u4e2a\u95ee\u9898\u662f\u201c\u5f69\u7968\u5f0f\u731c\u6d4b\u201d\uff0c\u5373LLM\u5e76\u672a\u771f\u6b63\u5b66\u4e60\u77e5\u8bc6\uff0c\u800c\u662f\u51ed\u8fd0\u6c14\u731c\u5bf9\u7b54\u6848\uff0c\u8fd9\u5bf9\u5c0f\u578bLLMs\u5c24\u4e3a\u4e25\u91cd\u3002 
\u4e3a\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u4e00\u4e2a\u66f4\u5168\u9762\u7684\u65b9\u6cd5\u662f\u8f6c\u5411\u5f00\u653e\u5f0f\u95ee\u9898\uff0c\u8fd9\u80fd\u4ece\u6839\u672c\u4e0a\u6d88\u9664\u9009\u62e9\u504f\u5dee\u548c\u968f\u673a\u731c\u6d4b\u3002\u4f46\u8f6c\u5411\u5f00\u653e\u5f0f\u95ee\u9898\u4e5f\u5e26\u6765\u4e86\u6311\u6218\uff1a\u4e00\u662f\u5982\u4f55\u8bc6\u522b\u5408\u9002\u7684\u5f00\u653e\u6027\u95ee\u9898\uff0c\u4e8c\u662f\u5982\u4f55\u9a8c\u8bc1LLM\u5bf9\u5f00\u653e\u5f0f\u95ee\u9898\u7684\u56de\u7b54\u4e0e\u4eba\u7c7b\u6807\u6ce8\u7684\u771f\u5b9e\u7b54\u6848\u4e4b\u95f4\u7684\u51c6\u786e\u6027\u3002\u672c\u7814\u7a76\u65e8\u5728\u89e3\u51b3\u8fd9\u4e9b\u96be\u9898\uff0c\u5e76\u5efa\u7acb\u4e00\u4e2a\u65b0\u7684LLM\u8bc4\u4f30\u57fa\u51c6\uff0c\u901a\u8fc7\u5b8c\u5168\u7684\u5f00\u653e\u5f0f\u95ee\u9898\u6765\u8861\u91cf\u6a21\u578b\u6027\u80fd\uff0c\u4f8b\u5982GPT-4o/4/3.5\u3001Claude 3\u3001Gemini\u7b49\u3002 \u6211\u4eec\u521b\u5efa\u4e86Open-LLM-Leaderboard\uff0c\u8fd9\u662f\u4e00\u4e2a\u65b0\u7684\u8bc4\u4ef7\u5e73\u53f0\uff0c\u65e8\u5728\u8ddf\u8e2a\u5404\u79cdLLM\u7684\u8868\u73b0\uff0c\u63ed\u793a\u5b83\u4eec\u7684\u771f\u5b9e\u80fd\u529b\u3002\u6211\u4eec\u7684\u4ee3\u7801\u548c\u6570\u636e\u96c6\u5df2\u5f00\u6e90\uff0c\u53ef\u5728\u6b64\u94fe\u63a5\u83b7\u53d6\uff1ahttps://github.com/VILA-Lab/Open-LLM-Leaderboard\u3002**|\n", "2406.07528": "|**2024-06-11**|**QuickLLaMA: Query-aware Inference Acceleration for Large Language Models**|Jingyao Li 
et.al.|[2406.07528](http://arxiv.org/abs/2406.07528)|**[link](https://github.com/dvlab-research/q-llm)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u7406\u89e3\u548c\u5904\u7406\u957f\u5e8f\u5217\u65b9\u9762\u7684\u80fd\u529b\u5bf9\u4e8e\u5404\u9886\u57df\u7684\u53d1\u5c55\u81f3\u5173\u91cd\u8981\u3002\u7136\u800c\uff0c\u5b83\u4eec\u5728\u6355\u6349\u5e8f\u5217\u4e2d\u7684\u957f\u671f\u4f9d\u8d56\u5173\u7cfb\u4ee5\u6df1\u5165\u7406\u89e3\u8bed\u4e49\u65b9\u9762\u4ecd\u7136\u5b58\u5728\u6311\u6218\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86Query-aware Inference for LLMs\uff08Q-LLM\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u65e8\u5728\u6a21\u4eff\u4eba\u7c7b\u8ba4\u77e5\u5904\u7406\u5927\u89c4\u6a21\u5e8f\u5217\u7684\u7cfb\u7edf\u3002\u901a\u8fc7\u805a\u7126\u4e8e\u4e0e\u7ed9\u5b9a\u67e5\u8be2\u76f8\u5173\u7684\u5185\u5b58\u6570\u636e\uff0cQ-LLM\u80fd\u591f\u5728\u56fa\u5b9a\u7a97\u53e3\u5927\u5c0f\u5185\u51c6\u786e\u6355\u6349\u76f8\u5173\u4fe1\u606f\uff0c\u5e76\u4e3a\u67e5\u8be2\u63d0\u4f9b\u7cbe\u786e\u7684\u7b54\u6848\uff0c\u65e0\u9700\u989d\u5916\u8bad\u7ec3\uff0c\u53ef\u65e0\u7f1d\u96c6\u6210\u5230\u4efb\u4f55LLMs\u4e2d\u3002\u4f7f\u7528LLaMA3\uff08QuickLLaMA\uff09\u7684Q-LLM\u80fd\u572830\u79d2\u5185\u9605\u8bfb\u300a\u54c8\u5229\u00b7\u6ce2\u7279\u300b\uff0c\u5e76\u80fd\u51c6\u786e\u56de\u7b54\u95ee\u9898\u3002\u76f8\u8f83\u4e8e\u5f53\u524d\u6700\u5148\u8fdb\u7684LLaMA3\uff0cQ-LLM\u7684\u6027\u80fd\u63d0\u5347\u4e867.17%\uff0c\u800c\u5728Mistral\u4e0a\uff0c\u5b83\u5728$\\infty$-bench\u4e0a\u7684\u8868\u73b0\u63d0\u5347\u4e863.26%\u3002\u5728\u201cNeedle-in-a-Haystack\u201d\u4efb\u52a1\u4e2d\uff0cQ-LLM\u5728\u5e7f\u6cdb\u8ba4\u53ef\u7684\u57fa\u51c6\u4e0a\uff0c\u76f8\u5bf9\u4e8e\u5f53\u524d\u6700\u4f73\u6210\u7ee9\uff0cMistral\u4e0a\u7684\u63d0\u5347\u8fbe\u5230\u4e867.0%\uff0c\u5728LLaMA3\u4e0a\u5b9e\u73b0\u4e86100%\u7684\u51c6\u786e\u7387\u3002\u6211\u4eec\u7684\u4ee3\u7801\u5df2\u5728https://github.com/dvlab-research/Q-LLM\u4e0a\u5f
00\u6e90\u3002**|\n", "2406.07515": "|**2024-06-11**|**Beyond Model Collapse: Scaling Up with Synthesized Data Requires Reinforcement**|Yunzhen Feng et.al.|[2406.07515](http://arxiv.org/abs/2406.07515)|null|\u968f\u7740\u751f\u6210\u6a21\u578b\u5408\u6210\u6570\u636e\u7684\u5174\u8d77\uff0c\u8d8a\u6765\u8d8a\u591a\u5730\u88ab\u7528\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u5fae\u8c03\uff0c\u8fd9\u5f15\u53d1\u4e86\u5bf9\u6a21\u578b\u5d29\u6e83\uff08\u5373\u5fae\u8c03\u6027\u80fd\u4e0b\u964d\uff09\u7684\u5173\u6ce8\u3002\u7531\u4e8e\u4eba\u7c7b\u548c\u673a\u5668\u90fd\u8f83\u5bb9\u6613\u5206\u8fa8\u597d\u6837\u672c\u548c\u574f\u6837\u672c\uff0c\u800c\u975e\u751f\u6210\u9ad8\u8d28\u91cf\u6837\u672c\uff0c\u6211\u4eec\u63a2\u8ba8\u4e86\u5982\u4f55\u5229\u7528\u53cd\u9988\u6765\u9632\u6b62\u6a21\u578b\u5728\u5408\u6210\u6570\u636e\u4e0a\u51fa\u73b0\u5d29\u6e83\u3002\u6211\u4eec\u7406\u8bba\u5206\u6790\u4e86\u4e00\u4e2a\u9ad8\u65af\u6df7\u5408\u5206\u7c7b\u6a21\u578b\u5728\u57fa\u4e8e\u53cd\u9988\u589e\u5f3a\u7684\u5408\u6210\u6570\u636e\u8bad\u7ec3\u4e0b\u7684\u6700\u4f18\u6027\u80fd\uff0c\u5e76\u63d0\u4f9b\u4e86\u6709\u9650\u6837\u672c\u60c5\u51b5\u4e0b\u7684\u5b9e\u9a8c\u8bc1\u636e\u3002\u6211\u4eec\u5728\u4e24\u4e2a\u5b9e\u9645\u95ee\u9898\u4e0a\u5c55\u793a\u4e86\u8fd9\u4e9b\u7406\u8bba\u9884\u6d4b\uff1a\u4f7f\u7528Transformer\u8ba1\u7b97\u77e9\u9635\u7279\u5f81\u503c\u548c\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u65b0\u95fb\u6458\u8981\uff0c\u8fd9\u4e24\u79cd\u60c5\u51b5\u4e0b\u6a21\u578b\u5728\u751f\u6210\u6570\u636e\u4e0a\u90fd\u4f1a\u7ecf\u5386\u5d29\u6e83\u3002\u6211\u4eec\u53d1\u73b0\uff0c\u901a\u8fc7\u4ece\u53cd\u9988\u589e\u5f3a\u7684\u5408\u6210\u6570\u636e\u4e2d\u8bad\u7ec3\uff0c\u65e0\u8bba\u662f\u4fee\u526a\u9519\u8bef\u9884\u6d4b\u8fd8\u662f\u9009\u62e9\u6700\u4f73\u731c\u6d4b\uff0c\u90fd\u80fd\u9632\u6b62\u6a21\u578b\u5d29\u6e83\uff0c\u8bc1\u5b9e\u4e86\u50cfRLHF\uff08Reinforcement Learning from Human 
Feedback\uff09\u8fd9\u6837\u7684\u6d41\u884c\u65b9\u6cd5\u7684\u6709\u6548\u6027\u3002|\n", "2406.07505": "|**2024-06-11**|**THaLLE: Text Hyperlocally Augmented Large Language Extension -- Technical Report**|KBTG Labs et.al.|[2406.07505](http://arxiv.org/abs/2406.07505)|null|\u8fd1\u671f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u8fdb\u6b65\u5728\u79d1\u6280\u9886\u57df\u5c55\u73b0\u4e86\u65b0\u529f\u80fd\u548c\u673a\u9047\u3002\u7136\u800c\uff0c\u975e\u5e38\u5927\u7684LLMs\u7684\u5b9e\u9645\u5e94\u7528\u53d7\u5230\u5176\u9ad8\u8ba1\u7b97\u6210\u672c\u7684\u5236\u7ea6\uff0c\u8fd9\u4e0e\u5176\u76f8\u5bf9\u6709\u9650\u7684\u4eba\u7c7b\u80fd\u529b\u76f8\u6bd4\uff0c\u6536\u76ca\u5e76\u4e0d\u660e\u663e\u3002\u5c3d\u7ba1\u5c0f\u578b\u3001\u66f4\u5b9e\u7528\u7684LLMs\u5728\u91d1\u878d\u5206\u6790\u65b9\u9762\u5c55\u73b0\u51fa\u6f5c\u529b\uff0c\u4f46\u5b83\u4eec\u5c1a\u672a\u5b8c\u5168\u638c\u63e1\uff0c\u5982\u5b83\u4eec\u5728\u6a21\u62df\u7279\u8bb8\u91d1\u878d\u5206\u6790\u5e08\uff08CFA\uff09\u8003\u8bd5\u4e2d\u7684\u63a5\u8fd1\u901a\u8fc7\u8868\u73b0\u6240\u793a\u3002\u672c\u6587\u4e2d\uff0c\u6211\u4eec\u5c55\u793a\u4e86Financial Analyst Extension\uff08FAE\uff09\u5bf9\u6211\u4eec\u7684Text Hyperlocally Augmented Large Language Extension\uff08THaLLE\uff09\u7cfb\u5217\u7684\u6269\u5c55\uff0c\u8fd9\u4e00\u7cfb\u521780\u4ebf\u53c2\u6570\u7684LLMs\u5728\u6a21\u62dfCFA\u8003\u8bd5\u4e2d\u59cb\u7ec8\u8868\u73b0\u51fa\u6700\u9ad8\u6027\u80fd\uff0c\u4e0e\u540c\u7c7b\u89c4\u6a21\u7684\u6a21\u578b\u76f8\u6bd4\u3002\u6211\u4eec\u8be6\u7ec6\u8bb0\u5f55\u4e86\u7528\u4e8e\u4f18\u5316\u7684\u5fae\u8c03\u6280\u672f\uff0c\u4ee5\u4f9b\u540e\u7eed\u7814\u7a76\u53c2\u8003\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f15\u5165Flare CFA\uff0c\u8fd9\u662f\u4e00\u4e2a\u516c\u5f00\u53ef\u7528\u7684\u91d1\u878d\u987e\u95ee\u8bc4\u4f30\u6570\u636e\u96c6\uff0c\u7528\u4e8e\u68c0\u9a8cLLMs\u5728\u8d22\u52a1\u987e\u95ee\u89d2\u8272\u4e2d\u7684\u80fd\u529b\u3002|\n", 
"2406.07502": "|**2024-06-11**|**Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions**|Renjie Pi et.al.|[2406.07502](http://arxiv.org/abs/2406.07502)|**[link](https://github.com/sterzhang/image-textualization)**|**## \u80cc\u666f \u56fe\u50cf\u63cf\u8ff0\u6570\u636e\u96c6\u5bf9\u4e8e\u63a8\u52a8\u56fe\u50cf\u7406\u89e3\u3001\u6587\u672c\u5230\u56fe\u50cf\u751f\u6210\u548c\u6587\u672c\u56fe\u50cf\u68c0\u7d22\u7b49\u5e94\u7528\u81f3\u5173\u91cd\u8981\u3002\u5f53\u524d\uff0c\u8fd9\u4e9b\u6570\u636e\u96c6\u4e3b\u8981\u6765\u81ea\u4e24\u4e2a\u9014\u5f84\uff1a\u4e00\u662f\u4ece\u7f51\u7edc\u4e0a\u6293\u53d6\u56fe\u50cf\u4e0e\u6587\u5b57\u5bf9\uff0c\u4f46\u8fd9\u7c7b\u63cf\u8ff0\u5f80\u5f80\u8d28\u91cf\u8f83\u4f4e\u4e14\u5b58\u5728\u566a\u58f0\uff1b\u4e8c\u662f\u4eba\u5de5\u6807\u6ce8\uff0c\u5982COCO\u7b49\uff0c\u901a\u5e38\u63cf\u8ff0\u7b80\u6d01\uff0c\u7f3a\u4e4f\u8be6\u7ec6\u4fe1\u606f\u3002\u5c3d\u7ba1\u8be6\u7ec6\u7684\u56fe\u50cf\u63cf\u8ff0\u53ef\u4ee5\u901a\u8fc7\u4eba\u7c7b\u6807\u6ce8\u83b7\u5f97\uff0c\u4f46\u9ad8\u6602\u7684\u6807\u6ce8\u6210\u672c\u9650\u5236\u4e86\u5176\u53ef\u884c\u6027\u3002\u8fd9\u4e9b\u5c40\u9650\u6027\u4fc3\u4f7f\u6211\u4eec\u5bfb\u6c42\u66f4\u6709\u6548\u548c\u53ef\u6269\u5c55\u7684\u65b9\u6cd5\u6765\u751f\u6210\u51c6\u786e\u800c\u8be6\u5c3d\u7684\u56fe\u50cf\u63cf\u8ff0\u3002 \u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u6846\u67b6\uff0c\u79f0\u4e3a\u201c\u56fe\u50cf\u6587\u672c\u5316\u201d\uff08Image Textualization\uff0c\u7b80\u79f0IT\uff09\uff0c\u5b83\u901a\u8fc7\u534f\u540c\u5229\u7528\u73b0\u6709\u7684\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08Multimodal Large Language 
Models\uff0cMLLMs\uff09\u548c\u89c6\u89c9\u4e13\u5bb6\u6a21\u578b\uff0c\u6709\u6548\u5730\u5c06\u89c6\u89c9\u4fe1\u606f\u8f6c\u5316\u4e3a\u6587\u672c\uff0c\u4ece\u800c\u81ea\u52a8\u751f\u6210\u9ad8\u8d28\u91cf\u7684\u56fe\u50cf\u63cf\u8ff0\u3002\u9488\u5bf9\u5f53\u524d\u7f3a\u4e4f\u8be6\u5c3d\u63cf\u8ff0\u7684\u57fa\u51c6\u95ee\u9898\uff0c\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86\u591a\u4e2a\u8bc4\u4ef7\u57fa\u51c6\uff0c\u4ee5\u5168\u9762\u8bc4\u4f30\u6211\u4eec\u7684\u6846\u67b6\u751f\u6210\u7684\u56fe\u50cf\u63cf\u8ff0\u8d28\u91cf\u3002 \u6b64\u5916\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u5728IT\u7cbe\u5fc3\u7f16\u7e82\u7684\u63cf\u8ff0\u8bad\u7ec3\u4e0b\uff0cLLaVA-7B\u6a21\u578b\u7684\u56fe\u50cf\u63cf\u8ff0\u751f\u6210\u80fd\u529b\u5f97\u5230\u4e86\u63d0\u5347\uff0c\u80fd\u591f\u751f\u6210\u66f4\u4e30\u5bcc\u7684\u63cf\u8ff0\uff0c\u8f93\u51fa\u957f\u5ea6\u548c\u7ec6\u8282\u663e\u8457\u589e\u52a0\uff0c\u540c\u65f6\u51cf\u5c11\u4e86\u5e7b\u89c9\u73b0\u8c61\u3002**|\n", "2406.07496": "|**2024-06-11**|**TextGrad: Automatic \"Differentiation\" via Text**|Mert Yuksekgonul 
et.al.|[2406.07496](http://arxiv.org/abs/2406.07496)|**[link](https://github.com/zou-group/textgrad)**|**\u4eba\u5de5\u667a\u80fd\u6b63\u7ecf\u5386\u4e00\u573a\u8303\u5f0f\u8f6c\u53d8\uff0c\u901a\u8fc7\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u548c\u5176\u4ed6\u590d\u6742\u7ec4\u4ef6\u7684\u534f\u540c\u5de5\u4f5c\u53d6\u5f97\u4e86\u7a81\u7834\u3002\u5f53\u524d\uff0c\u4e3a\u590d\u5408\u4eba\u5de5\u667a\u80fd\u7cfb\u7edf\u8bbe\u8ba1\u539f\u5219\u5316\u7684\u81ea\u52a8\u5316\u4f18\u5316\u65b9\u6cd5\u6210\u4e3a\u4e00\u9879\u5173\u952e\u65b0\u6311\u6218\u3002\u795e\u7ecf\u7f51\u7edc\u5728\u65e9\u671f\u9762\u4e34\u7c7b\u4f3c\u95ee\u9898\u65f6\uff0c\u901a\u8fc7\u53cd\u5411\u4f20\u64ad\u548c\u81ea\u52a8\u5fae\u5206\u5b9e\u73b0\u4e86\u91cd\u5927\u9769\u65b0\u3002\u53d7\u6b64\u542f\u53d1\uff0c\u6211\u4eec\u63d0\u51fa\u4e86TextGrad\uff0c\u8fd9\u662f\u4e00\u4e2a\u5f3a\u5927\u7684\u6846\u67b6\uff0c\u5b83\u901a\u8fc7\u6587\u672c\u5b9e\u73b0\u81ea\u52a8\u201c\u5fae\u5206\u201d\uff0c\u5c06LLMs\u63d0\u4f9b\u7684\u4e30\u5bcc\u3001\u901a\u7528\u7684\u81ea\u7136\u8bed\u8a00\u5efa\u8bae\u56de\u4f20\u5230\u590d\u5408AI\u7cfb\u7edf\u7684\u5404\u4e2a\u7ec4\u4ef6\u4e2d\u3002TextGrad\u9075\u5faaPyTorch\u7684\u8bed\u6cd5\u548c\u62bd\u8c61\uff0c\u6613\u4e8e\u4f7f\u7528\u4e14\u7075\u6d3b\uff0c\u7528\u6237\u4ec5\u9700\u63d0\u4f9b\u76ee\u6807\u51fd\u6570\uff0c\u65e0\u9700\u8c03\u6574\u6846\u67b6\u7ec4\u4ef6\u6216\u63d0\u793a\uff0c\u5373\u53ef\u65e0\u7f1d\u5e94\u7528\u3002 
TextGrad\u9002\u7528\u4e8e\u591a\u79cd\u4efb\u52a1\uff0c\u4ece\u95ee\u7b54\u548c\u5206\u5b50\u4f18\u5316\u5230\u653e\u5c04\u6cbb\u7597\u8ba1\u5212\u8bbe\u8ba1\u3002\u5728\u65e0\u9700\u4fee\u6539\u6846\u67b6\u7684\u60c5\u51b5\u4e0b\uff0c\u5b83\u663e\u8457\u63d0\u5347\u4e86GPT-4o\u5728Google\u8bc1\u660e\u6027\u95ee\u9898\u56de\u7b54\u4e2d\u7684\u96f6-shot\u51c6\u786e\u7387\uff0c\u4ece51%\u63d0\u5347\u81f355%\uff1b\u5728\u4f18\u5316LeetCode\u96be\u9898\u89e3\u6cd5\u4e0a\u5b9e\u73b0\u4e8620%\u7684\u76f8\u5bf9\u6027\u80fd\u63d0\u5347\uff1b\u6539\u8fdb\u4e86\u63a8\u7406\u63d0\u793a\uff0c\u8bbe\u8ba1\u51fa\u5177\u6709\u7406\u60f3\u4f53\u5916\u4eb2\u548c\u529b\u7684\u65b0\u836f\u5019\u9009\u5206\u5b50\uff1b\u4ee5\u53ca\u8bbe\u8ba1\u51fa\u5177\u6709\u9ad8\u7279\u5f02\u6027\u7684\u653e\u5c04\u6cbb\u7597\u65b9\u6848\u3002TextGrad\u4e3a\u4e0b\u4e00\u4ee3AI\u7cfb\u7edf\u7684\u53d1\u5c55\u5960\u5b9a\u4e86\u57fa\u7840\uff0c\u63a8\u52a8\u4e86\u590d\u5408AI\u6280\u672f\u7684\u52a0\u901f\u53d1\u5c55\u3002**|\n", "2406.07494": "|**2024-06-12**|**CADS: A Systematic Literature Review on the Challenges of Abstractive Dialogue Summarization**|Frederic Kirstein 
et.al.|[2406.07494](http://arxiv.org/abs/2406.07494)|null|\u8be5\u6587\u7ae0\u7efc\u8ff0\u4e862019\u5e74\u81f32024\u5e74\u95f4\u53d1\u8868\u76841262\u7bc7\u72ec\u7279\u7684\u7814\u7a76\u8bba\u6587\uff0c\u96c6\u4e2d\u5728Transformer\u67b6\u6784\u5728\u82f1\u6587\u5bf9\u8bdd\u6458\u8981\u751f\u6210\u65b9\u9762\u7684\u7814\u7a76\u3002\u6587\u7ae0\u8be6\u7ec6\u63a2\u8ba8\u4e86\u5bf9\u8bdd\u6458\u8981\u4e2d\u5b58\u5728\u7684\u4e3b\u8981\u6311\u6218\uff0c\u5982\u8bed\u8a00\u7406\u89e3\u3001\u7ed3\u6784\u5904\u7406\u3001\u7406\u89e3\u80fd\u529b\u3001\u8bf4\u8bdd\u8005\u8bc6\u522b\u3001\u91cd\u8981\u6027\u5224\u65ad\u548c\u4e8b\u5b9e\u51c6\u786e\u6027\uff0c\u5e76\u4e0e\u76f8\u5e94\u7684\u6280\u672f\uff0c\u5982\u56fe\u89e3\u65b9\u6cd5\u3001\u989d\u5916\u8bad\u7ec3\u4efb\u52a1\u548c\u89c4\u5212\u7b56\u7565\u8fdb\u884c\u4e86\u5173\u8054\u3002\u5c3d\u7ba1\u5728\u67d0\u4e9b\u65b9\u9762\uff08\u5982\u8bed\u8a00\uff09\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u5c55\uff0c\u4f46\u5982\u7406\u89e3\u529b\u3001\u771f\u5b9e\u6027\u4e0e\u91cd\u8981\u6027\u8bc4\u4f30\u7b49\u6311\u6218\u4ecd\u7136\u5b58\u5728\uff0c\u63d0\u4f9b\u4e86\u4e30\u5bcc\u7684\u7814\u7a76\u7a7a\u95f4\u3002 
\u6587\u7ae0\u8fd8\u5206\u6790\u4e86\u8bc4\u4f30\u8fd9\u4e9b\u65b9\u6cd5\u7684\u65b9\u5f0f\uff0c\u6db5\u76d6\u4e86\u5bf9\u8bdd\u5b50\u9886\u57df\uff08\u5982\u4f1a\u8bae\u3001\u533b\u7597\uff09\u7684\u5e38\u7528\u6570\u636e\u96c6\uff0c\u4ee5\u53ca\u81ea\u52a8\u8bc4\u4ef7\u6307\u6807\uff08\u5982ROUGE\uff09\u548c\u4eba\u7c7b\u8bc4\u4f30\u7684\u666e\u904d\u5b9e\u8df5\u3002\u7136\u800c\uff0c\u53d1\u73b0\u8de8\u9886\u57df\u7684\u6570\u636e\u96c6\u76f8\u5bf9\u6709\u9650\uff0c\u4e14\u62a5\u544a\u7684\u4eba\u7c7b\u8bc4\u4f30\u5f80\u5f80\u7f3a\u4e4f\u8db3\u591f\u7684\u5185\u5ba1\u5458\u4e00\u81f4\u6027\u4fe1\u606f\u548c\u6807\u6ce8\u6307\u5357\u7ec6\u8282\u3002\u6b64\u5916\uff0c\u6587\u7ae0\u8ba8\u8bba\u4e86\u5927\u8bed\u8a00\u6a21\u578b\u7684\u6700\u65b0\u63a2\u7d22\u53ef\u80fd\u5e26\u6765\u7684\u5f71\u54cd\uff0c\u6307\u51fa\u5c3d\u7ba1\u5b83\u4eec\u53ef\u80fd\u4f1a\u6539\u53d8\u76f8\u5173\u6027\u548c\u96be\u5ea6\uff0c\u4f46\u63cf\u8ff0\u7684\u6311\u6218\u5206\u7c7b\u4f53\u7cfb\u4ecd\u7136\u5177\u6709\u4ef7\u503c\u3002|\n", "2406.07485": "|**2024-06-11**|**PITCH: Productivity and Mental Well-being Coaching through Daily Conversational Interaction**|Adnan Abbas 
et.al.|[2406.07485](http://arxiv.org/abs/2406.07485)|null|\u9ad8\u6548\u7684\u8ba1\u5212\u5236\u5b9a\u5bf9\u751f\u4ea7\u529b\u548c\u5fc3\u7406\u5065\u5eb7\u81f3\u5173\u91cd\u8981\uff0c\u4f46\u4eba\u4eec\u5f80\u5f80\u96be\u4ee5\u5236\u5b9a\u5b9e\u9645\u7684\u8ba1\u5212\u5e76\u53cd\u601d\u81ea\u5df1\u7684\u6548\u7387\u3002\u5229\u7528\u4eba\u5de5\u667a\u80fd\u7684\u53d1\u5c55\uff0c\u5bf9\u8bdd\u52a9\u624b\u4f5c\u4e3a\u4e00\u79cd\u6709\u524d\u666f\u7684\u5de5\u5177\uff0c\u65e8\u5728\u901a\u8fc7\u5bf9\u8bdd\u65b9\u5f0f\u5c06\u8ba1\u5212\u5916\u5316\uff0c\u5f3a\u5316\u51b3\u5fc3\uff0c\u4fc3\u8fdb\u4e13\u6ce8\u884c\u52a8\uff0c\u4ece\u800c\u6b63\u9762\u5f71\u54cd\u751f\u4ea7\u529b\u548c\u5fc3\u7406\u5065\u5eb7\u3002\u6211\u4eec\u7684\u7814\u7a76\u76ee\u6807\u662f\u8bbe\u8ba1\u4e00\u4e2a\u5bf9\u8bdd\u52a9\u624b\uff0c\u901a\u8fc7\u81ea\u7136\u5bf9\u8bdd\u7684\u793e\u4ea4\u4e92\u52a8\u6027\uff0c\u63d0\u4f9b\u6df1\u5165\u7684\u95ee\u9898\u548c\u53cd\u601d\u63d0\u793a\uff0c\u4ee5\u63d0\u9ad8\u8ba1\u5212\u6267\u884c\u5ea6\u3002\u5c3d\u7ba1\u5148\u524d\u7684\u7814\u7a76\u663e\u793a\u4e86\u8fd9\u4e9b\u4ee3\u7406\u7684\u6548\u76ca\uff0c\u4f46\u8bb8\u591a\u5e72\u9884\u63aa\u65bd\u4ecd\u4fdd\u6301\u9759\u6001\uff0c\u53ef\u80fd\u5bfc\u81f4\u7528\u6237\u53c2\u4e0e\u5ea6\u968f\u65f6\u95f4\u4e0b\u964d\u3002\u4e3a\u4e86\u5f25\u8865\u8fd9\u4e00\u4e0d\u8db3\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65cb\u8f6c\u548c\u4e0a\u4e0b\u6587\u611f\u77e5\u7684\u63d0\u793a\u7b56\u7565\uff0c\u6bcf\u5929\u4e3a\u7528\u6237\u63d0\u4f9b\u591a\u6837\u7684\u5e72\u9884\u624b\u6bb5\u3002\u6211\u4eec\u7684\u7cfb\u7edfPITCH\u5229\u7528\u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u6765\u4fc3\u8fdb\u65e5\u5e38\u8ba1\u5212\u7684\u5916\u90e8\u5316\u548c\u53cd\u601d\u3002\u672c\u7814\u7a76\u65e8\u5728\u63a2\u7a76\u4e0e\u5bf9\u8bdd\u4ee3\u7406\u4e00\u8d77\u5916\u5316\u4efb\u52a1\u5bf9\u751f\u4ea7\u529b\u548c\u5fc3\u7406\u5065\u5eb7\u7684\u5f71\u54cd\uff0c\u4ee5\u53ca\u65cb\u8f6c\u7b56\u7565\
u5728\u4fdd\u6301\u7528\u6237\u53c2\u4e0e\u5ea6\u65b9\u9762\u7684\u6709\u6548\u6027\u3002|\n", "2406.07483": "|**2024-06-11**|**Advancing Annotation of Stance in Social Media Posts: A Comparative Analysis of Large Language Models and Crowd Sourcing**|Mao Li et.al.|[2406.07483](http://arxiv.org/abs/2406.07483)|null|\u5728\u5feb\u901f\u53d1\u5c55\u7684\u81ea\u7136\u8bed\u8a00\u5904\u7406\u9886\u57df\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u793e\u4ea4\u5a92\u4f53\u5e16\u5b50\u7684\u81ea\u52a8\u6587\u672c\u6807\u6ce8\u65b9\u9762\u5c55\u73b0\u51fa\u6d53\u539a\u5174\u8da3\u3002\u672c\u6587\u7814\u7a76\u4e86\u516b\u79cd\u5f00\u6e90\u548c\u4e13\u6709LLMs\u5728\u7acb\u573a\u6807\u6ce8\u4efb\u52a1\u4e2d\u7684\u6027\u80fd\uff0c\u5c06\u5176\u4e0e\u4eba\u7c7b\uff08\u901a\u8fc7\u4f17\u5305\uff09\u7684\u5224\u65ad\u8fdb\u884c\u57fa\u51c6\u6d4b\u8bd5\u3002\u6211\u4eec\u63a2\u7a76\u4e86\u4f55\u65f6LLMs\u53ef\u80fd\u4e0e\u4eba\u7c7b\u5224\u65ad\u4ea7\u751f\u5206\u6b67\u7684\u60c5\u51b5\u3002\u7814\u7a76\u53d1\u73b0\uff0c\u6587\u672c\u4e2d\u8868\u8fbe\u7acb\u573a\u7684\u660e\u786e\u7a0b\u5ea6\u5bf9LLMs\u5224\u65ad\u4e0e\u4eba\u7c7b\u4e00\u81f4\u6027\u81f3\u5173\u91cd\u8981\u3002\u5f53\u4eba\u7c7b\u6ce8\u91ca\u8005\u8868\u73b0\u826f\u597d\u65f6\uff0cLLMs\u4e5f\u8868\u73b0\u51fa\u8272\uff1b\u53cd\u4e4b\uff0cLLMs\u7684\u5931\u8d25\u5f80\u5f80\u5bf9\u5e94\u4e8e\u4eba\u7c7b\u96be\u4ee5\u8fbe\u6210\u4e00\u81f4\u7684\u60c5\u5883\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u5efa\u8bae\u7ed3\u5408\u4eba\u7c7b\u4e13\u4e1a\u77e5\u8bc6\u7684\u7cbe\u786e\u5ea6\u4e0eLLMs\u9884\u6d4b\u7684\u89c4\u6a21\uff0c\u63d0\u51fa\u4e00\u79cd\u5168\u9762\u7684\u65b9\u6cd5\u3002\u8fd9\u9879\u7814\u7a76\u5f3a\u8c03\u4e86\u63d0\u9ad8\u81ea\u52a8\u5316\u7acb\u573a\u68c0\u6d4b\u51c6\u786e\u6027\u548c\u5168\u9762\u6027\u7684\u5fc5\u8981\u6027\uff0c\u65e8\u5728\u63a8\u52a8\u8fd9\u4e9b\u6280\u672f\u5728\u66f4\u9ad8\u6548\u3001\u65e0\u504f\u89c1\u7684\u793e\u4f1a\u5a92\u4f53\u5206\u6790\u4e2d\u5f97\u5230
\u63d0\u5347\u3002|\n", "2406.07476": "|**2024-06-11**|**VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs**|Zesen Cheng et.al.|[2406.07476](http://arxiv.org/abs/2406.07476)|**[link](https://github.com/damo-nlp-sg/videollama2)**|**\u672c\u6587\u4ecb\u7ecdVideoLLaMA 2\uff0c\u4e00\u5957\u4e13\u4e3a\u63d0\u5347\u89c6\u9891\u548c\u97f3\u9891\u5b9a\u5411\u4efb\u52a1\u4e2d\u7684\u7a7a\u95f4-\u65f6\u95f4\u5efa\u6a21\u53ca\u97f3\u9891\u7406\u89e3\u80fd\u529b\u800c\u8bbe\u8ba1\u7684\u89c6\u9891\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08Video-LLMs\uff09\u3002\u5b83\u5728\u524d\u4e00\u4ee3\u7684\u57fa\u7840\u4e0a\u589e\u6dfb\u4e86\u5b9a\u5236\u7684\u65f6\u7a7a\u5377\u79ef\uff08STC\uff09\u8fde\u63a5\u5668\uff0c\u6709\u6548\u5730\u6355\u6349\u89c6\u9891\u6570\u636e\u7684\u590d\u6742\u7a7a\u95f4\u548c\u65f6\u95f4\u52a8\u6001\u3002\u6b64\u5916\uff0c\u6211\u4eec\u901a\u8fc7\u8054\u5408\u8bad\u7ec3\u878d\u5165\u4e86\u97f3\u9891\u5206\u652f\uff0c\u589e\u5f3a\u4e86\u6a21\u578b\u7684\u591a\u6a21\u6001\u7406\u89e3\u80fd\u529b\uff0c\u4f7f\u5176\u80fd\u65e0\u7f1d\u878d\u5408\u97f3\u9891\u7ebf\u7d22\u3002\u5728\u591a\u9879\u8bc4\u4f30\u4e2d\uff0c\u5982\u591a\u9009\u89c6\u9891\u95ee\u7b54\uff08MC-VQA\uff09\u3001\u5f00\u653e\u6027\u89c6\u9891\u95ee\u7b54\uff08OE-VQA\uff09\u548c\u89c6\u9891captioning\uff08VC\uff09\u4efb\u52a1\u4e0a\uff0cVideoLLaMA 2\u8868\u73b0\u51fa\u4e0e\u5f00\u6e90\u6a21\u578b\u76f8\u5f53\u7684\u7ade\u4e89\u5b9e\u529b\uff0c\u5e76\u5728\u67d0\u4e9b\u57fa\u51c6\u4e0a\u63a5\u8fd1\u4e13\u6709\u6a21\u578b\u3002\u5728\u97f3\u9891\u4ec5\u7528\uff08AQA\uff09\u548c\u97f3\u9891-\u89c6\u9891\u95ee\u7b54\uff08OE-AVQA\uff09\u4efb\u52a1\u4e0a\uff0cVideoLLaMA 2\u4e5f\u663e\u793a\u51fa\u5bf9\u73b0\u6709\u6a21\u578b\u7684\u5408\u7406\u6539\u8fdb\u3002\u8fd9\u4e9b\u8fdb\u6b65\u51f8\u663e\u4e86VideoLLaMA 
2\u5728\u591a\u6a21\u6001\u7406\u89e3\u65b9\u9762\u7684\u5353\u8d8a\u6027\u80fd\uff0c\u4e3a\u667a\u80fd\u89c6\u9891\u5206\u6790\u7cfb\u7edf\u6811\u7acb\u4e86\u65b0\u6807\u51c6\u3002\u6240\u6709\u6a21\u578b\u5747\u516c\u5f00\u4ee5\u4fc3\u8fdb\u8fdb\u4e00\u6b65\u7814\u7a76\u3002**|\n", "2406.08477": "|**2024-06-12**|**Improving LLMs for Recommendation with Out-Of-Vocabulary Tokens**|Ting-Ji Huang et.al.|[2406.08477](http://arxiv.org/abs/2406.08477)|null|\u5728\u63a8\u8350\u7cfb\u7edf\u4e2d\uff0c\u901a\u8fc7\u5411\u91cf\u8868\u793a\u7528\u6237\u548c\u9879\u76ee\u5bf9\u4e8e\u591a\u79cd\u4efb\u52a1\u81f3\u5173\u91cd\u8981\u3002\u6700\u8fd1\u7684\u7814\u7a76\u5c1d\u8bd5\u5c06\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5e94\u7528\u4e8e\u95ee\u7b54\u5f62\u5f0f\u7684\u63a8\u8350\uff0c\u4f7f\u7528\u8bcd\u6c47\u8868\u5185\u7684\u6807\u8bb0\uff08\u5982\u201citem\u201d\u3001\u201c20\u201d\u3001\u201c24\u201d\uff09\u6765\u8868\u793a\u5b9e\u9645\u7684\u7528\u6237\u548c\u9879\u76ee\u3002\u7136\u800c\uff0c\u7531\u4e8eLLMs\u901a\u5e38\u662f\u5728\u81ea\u7136\u8bed\u8a00\u4efb\u52a1\u4e0a\u9884\u8bad\u7ec3\u7684\uff0c\u8fd9\u4e9b\u8bcd\u6c47\u8868\u5185\u7684\u6807\u8bb0\u5728\u8868\u8fbe\u72ec\u7279\u7528\u6237\u548c\u9879\u76ee\u65b9\u9762\u80fd\u529b\u6709\u9650\uff0c\u5373\u4f7f\u7ecf\u8fc7\u63a8\u8350\u4efb\u52a1\u7684\u5fae\u8c03\uff0c\u4e5f\u4f1a\u524a\u5f31\u63a8\u8350\u6027\u80fd\u3002\u672c\u6587\u63a2\u8ba8\u5982\u4f55\u6709\u6548\u5728LLM\u57fa\u7684\u63a8\u8350\u7cfb\u7edf\u4e2d\u5904\u7406\u7528\u6237\u548c\u9879\u76ee\u7684\u6807\u8bb0\u3002 
\u6211\u4eec\u5f3a\u8c03\u4e86\u51fa\u8bcd\u6c47\u8868\uff08OOV\uff09\u6807\u8bb0\u7684\u4f5c\u7528\uff0c\u5b83\u4eec\u9664\u4e86\u8bcd\u6c47\u8868\u5185\u7684\u6807\u8bb0\u5916\uff0c\u8fd8\u80fd\u6355\u6349\u7528\u6237/\u9879\u76ee\u4e4b\u95f4\u7684\u5173\u8054\u6027\u548c\u591a\u6837\u6027\u3002\u901a\u8fc7\u5206\u6790\u5386\u53f2\u7528\u6237-\u9879\u76ee\u4ea4\u4e92\u7684\u8868\u793a\u5b66\u4e60\uff0c\u6211\u4eec\u4f7f\u5177\u6709\u76f8\u4f3c\u7279\u6027\u7684\u7528\u6237/\u9879\u76ee\u7ec4\u5408\u5171\u4eab\u76f8\u540c\u7684OOV\u6807\u8bb0\u3002\u6b64\u5916\uff0c\u5c06\u8fd9\u4e9bOOV\u6807\u8bb0\u6574\u5408\u5230LLM\u7684\u8bcd\u6c47\u8868\u4e2d\uff0c\u6709\u52a9\u4e8e\u66f4\u597d\u5730\u533a\u5206\u7528\u6237\u548c\u9879\u76ee\uff0c\u589e\u5f3a\u5728\u4e0b\u6e38\u4efb\u52a1\u5fae\u8c03\u65f6\u5bf9\u7528\u6237-\u9879\u76ee\u5173\u7cfb\u7684\u6355\u6349\u3002 \u6211\u4eec\u7684\u63d0\u51fa\u7684\u6846\u67b6\u5728\u5404\u79cd\u4e0b\u6e38\u63a8\u8350\u4efb\u52a1\u4e0a\u8d85\u8d8a\u4e86\u73b0\u6709\u6700\u5148\u8fdb\u7684\u65b9\u6cd5\u3002|\n", "2406.08474": "|**2024-06-12**|**Real2Code: Reconstruct Articulated Objects via Code Generation**|Zhao Mandi 
et.al.|[2406.08474](http://arxiv.org/abs/2406.08474)|null|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\u2014\u2014Real2Code\uff0c\u65e8\u5728\u901a\u8fc7\u4ee3\u7801\u751f\u6210\u6765\u91cd\u5efa\u53ef\u52a8\u7269\u4f53\u3002\u7ed9\u5b9a\u7269\u4f53\u7684\u89c6\u89c9\u89c2\u6d4b\uff0c\u6211\u4eec\u9996\u5148\u5229\u7528\u56fe\u50cf\u5206\u5272\u6a21\u578b\u548c\u5f62\u72b6\u8865\u5168\u6a21\u578b\u91cd\u6784\u5176\u90e8\u4ef6\u51e0\u4f55\u7ed3\u6784\u3002\u63a5\u7740\uff0c\u6211\u4eec\u5c06\u7269\u4f53\u90e8\u4ef6\u8868\u793a\u4e3a\u5e26\u6709\u65b9\u5411\u7684\u8fb9\u754c\u6846\uff0c\u7136\u540e\u8f93\u5165\u5230\u4e00\u4e2a\u7ecf\u8fc7\u5fae\u8c03\u7684\u5927\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4e2d\uff0c\u9884\u6d4b\u5173\u8282\u6d3b\u52a8\u7684\u4ee3\u7801\u8868\u793a\u3002\u901a\u8fc7\u5229\u7528\u9884\u8bad\u7ec3\u7684\u89c6\u89c9\u548c\u8bed\u8a00\u6a21\u578b\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u80fd\u591f\u4f18\u96c5\u5730\u6269\u5c55\u5230\u5177\u6709\u66f4\u591a\u53ef\u52a8\u90e8\u4ef6\u7684\u5bf9\u8c61\uff0c\u5e76\u80fd\u4ece\u5408\u6210\u8bad\u7ec3\u6570\u636e\u4e2d\u6cdb\u5316\u5230\u73b0\u5b9e\u4e16\u754c\u4e2d\u7684\u4e0d\u89c4\u5219\u73af\u5883\u7269\u4f53\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cReal2Code\u5728\u91cd\u5efa\u7cbe\u5ea6\u4e0a\u663e\u8457\u4f18\u4e8e\u73b0\u6709\u6700\u5148\u8fdb\u7684\u65b9\u6cd5\uff0c\u5e76\u4e14\u662f\u9996\u4e2a\u80fd\u591f\u8d85\u8d8a\u8bad\u7ec3\u96c6\u4e2d\u5bf9\u8c61\u7ed3\u6784\u590d\u6742\u6027\u7684\u65b9\u6cd5\uff0c\u80fd\u591f\u91cd\u5efa\u591a\u8fbe10\u4e2a\u53ef\u52a8\u90e8\u4ef6\u7684\u7269\u4f53\u3002\u5f53\u4e0e\u7acb\u4f53\u91cd\u5efa\u6a21\u578b\u7ed3\u5408\u65f6\uff0cReal2Code\u8fd8\u80fd\u4ece\u5c11\u91cf\u591a\u89c6\u56feRGB\u56fe\u50cf\u4e2d\u6cdb\u5316\u5230\u73b0\u5b9e\u4e16\u754c\u7684\u7269\u4f53\uff0c\u65e0\u9700\u6df1\u5ea6\u6216\u76f8\u673a\u4fe1\u606f\u3002|\n", "2406.08464": "|**2024-06-12**|**Magpie: Alignment Data Synthesis from Scratch by 
Prompting Aligned LLMs with Nothing**|Zhangchen Xu et.al.|[2406.08464](http://arxiv.org/abs/2406.08464)|**[link](https://github.com/magpie-align/magpie)**|\u9ad8\u8d28\u91cf\u7684\u6307\u4ee4\u6570\u636e\u5bf9\u4e8e\u8c03\u6574\u5927\u578b\u8bed\u8a00\u6a21\u578b\u81f3\u5173\u91cd\u8981\u3002\u5c3d\u7ba1\u50cfLlama-3-Instruct\u8fd9\u6837\u7684\u6a21\u578b\u516c\u5f00\u4e86\u6743\u91cd\uff0c\u4f46\u5b83\u4eec\u7684\u5bf9\u9f50\u6570\u636e\u4ecd\u7136\u4fdd\u5bc6\uff0c\u8fd9\u9650\u5236\u4e86\u4eba\u5de5\u667a\u80fd\u7684\u666e\u53ca\u3002\u73b0\u6709\u7684\u5f00\u6e90\u6570\u636e\u751f\u6210\u65b9\u6cd5\u53d7\u9650\u4e8e\u9ad8\u6602\u7684\u4eba\u529b\u6210\u672c\u548c\u6709\u9650\u7684\u63d0\u793a\u8303\u56f4\uff0c\u96be\u4ee5\u6709\u6548\u6269\u5c55\uff0c\u53ef\u80fd\u5f71\u54cd\u516c\u5171\u5bf9\u9f50\u6570\u636e\u96c6\u7684\u591a\u6837\u6027\u548c\u8d28\u91cf\u3002\u80fd\u5426\u901a\u8fc7\u76f4\u63a5\u4ece\u5df2\u5bf9\u9f50\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4e2d\u63d0\u53d6\uff0c\u5927\u89c4\u6a21\u5408\u6210\u9ad8\u8d28\u6307\u4ee4\u6570\u636e\u5462\uff1f\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u81ea\u6211\u5408\u6210\u65b9\u6cd5\uff0c\u79f0\u4e3aMagpie\u3002\u6211\u4eec\u7684\u5173\u952e\u89c2\u5bdf\u662f\uff0c\u7531\u4e8eLlama-3-Instruct\u7b49\u5df2\u5bf9\u9f50\u7684\u6a21\u578b\u5177\u6709\u81ea\u56de\u5f52\u7279\u6027\uff0c\u5f53\u6211\u4eec\u4ec5\u8f93\u5165\u5de6\u4fa7\u6a21\u677f\u5230\u7528\u6237\u6d88\u606f\u9884\u7559\u4f4d\u7f6e\u65f6\uff0c\u5b83\u4eec\u53ef\u4ee5\u751f\u6210\u7528\u6237\u67e5\u8be2\u3002\u6211\u4eec\u5229\u7528\u8fd9\u79cd\u65b9\u6cd5\u63d0\u793aLlama-3-Instruct\uff0c\u751f\u6210\u4e86400\u4e07\u4e2a\u6307\u4ee4\u53ca\u5176\u5bf9\u5e94\u7684\u54cd\u5e94\u3002\u6211\u4eec\u5bf9\u63d0\u53d6\u7684\u6570\u636e\u8fdb\u884c\u4e86\u5168\u9762\u5206\u6790\uff0c\u5e76\u9009\u62e9\u4e8630\u4e07\u4e2a\u9ad8\u8d28\u91cf\u5b9e\u4f8b\u3002\u4e3a\u4e86\u6bd4\u8f83Magpie\u6570\u636e\u4e0e\u5176\u4ed6\u516c\u5171\u6307\u4ee4\u6570\u636e\u96
c6\uff0c\u6211\u4eec\u5206\u522b\u4f7f\u7528\u6bcf\u4e2a\u6570\u636e\u96c6\u5bf9Llama-3-8B-Base\u8fdb\u884c\u5fae\u8c03\uff0c\u5e76\u8bc4\u4f30\u5fae\u8c03\u540e\u6a21\u578b\u7684\u6027\u80fd\u3002\u7ed3\u679c\u663e\u793a\uff0c\u5728\u67d0\u4e9b\u4efb\u52a1\u4e2d\uff0c\u4ec5\u4f7f\u7528Magpie\u8fdb\u884c\u5fae\u8c03\u7684\u6a21\u578b\u5728\u6027\u80fd\u4e0a\u4e0e\u5b98\u65b9\u7ecf\u8fc71000\u4e07\u4e2a\u6570\u636e\u70b9\u76d1\u7763\u5fae\u8c03\uff08SFT\uff09\u548c\u540e\u7eed\u53cd\u9988\u5b66\u4e60\u589e\u5f3a\u7684Llama-3-8B-Instruct\u76f8\u5f53\u3002\u6211\u4eec\u8fd8\u5c55\u793a\u4e86\u4ec5\u4f7f\u7528Magpie\u8fdb\u884cSFT\u53ef\u4ee5\u8d85\u8d8a\u5148\u524d\u7528\u4e8eSFT\u548c\u504f\u597d\u4f18\u5316\uff08\u5982UltraFeedback\u7684\u76f4\u63a5\u504f\u597d\u4f18\u5316\uff09\u7684\u516c\u5171\u6570\u636e\u96c6\u3002\u8fd9\u79cd\u4f18\u52bf\u5728AlpacaEval\u3001ArenaHard\u548cWildBench\u7b49\u5bf9\u9f50\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u8868\u73b0\u660e\u663e\u3002|\n", "2406.08434": "|**2024-06-12**|**TasTe: Teaching Large Language Models to Translate through Self-Reflection**|Yutong Wang et.al.|[2406.08434](http://arxiv.org/abs/2406.08434)|**[link](https://github.com/yutongwang1216/reflectionllmmt)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\u4e2d\u5c55\u73b0\u51fa\u5353\u8d8a\u6027\u80fd\uff0c\u7279\u522b\u662f\u901a\u8fc7\u6307\u4ee4\u8c03\u4f18\u540e\uff0c\u5728\u673a\u5668\u7ffb\u8bd1\uff08Machine Translation, MT\uff09\u7b49\u4e0b\u6e38\u4efb\u52a1\u4e2d\u7684\u8868\u73b0\u6709\u6240\u63d0\u5347\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u672a\u80fd\u8fbe\u5230\u4e0e\u76d1\u7763\u795e\u7ecf\u673a\u5668\u7ffb\u8bd1\uff08Supervised Neural Machine Translation, 
NMT\uff09\u7cfb\u7edf\u76f8\u5f53\u7684\u7ffb\u8bd1\u8d28\u91cf\u3002\u539f\u56e0\u53ef\u80fd\u662f\u5f53\u524d\u4f7f\u7528\u7684\u7b80\u5355\u63d0\u793a\u65e0\u6cd5\u5145\u5206\u5229\u7528\u6a21\u578b\u7684\u6307\u4ee4\u8ddf\u968f\u80fd\u529b\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86TasTe\u6846\u67b6\uff0c\u5373\u201c\u901a\u8fc7\u81ea\u6211\u53cd\u601d\u8fdb\u884c\u7ffb\u8bd1\u201d\u3002\u8be5\u6846\u67b6\u5305\u62ec\u4e24\u4e2a\u63a8\u7406\u9636\u6bb5\uff1a\u7b2c\u4e00\u9636\u6bb5\uff0c\u6a21\u578b\u88ab\u5f15\u5bfc\u751f\u6210\u521d\u6b65\u7ffb\u8bd1\u5e76\u540c\u65f6\u5bf9\u5176\u81ea\u8eab\u8fdb\u884c\u8bc4\u4f30\uff1b\u7b2c\u4e8c\u9636\u6bb5\uff0c\u6a21\u578b\u6839\u636e\u8bc4\u4f30\u7ed3\u679c\u5bf9\u521d\u6b65\u7ffb\u8bd1\u8fdb\u884c\u7ec6\u5316\u3002\u5728WMT22\u57fa\u51c6\u7684\u56db\u79cd\u8bed\u8a00\u65b9\u5411\u4e0a\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u663e\u793a\u51fa\u4e0e\u73b0\u6709\u6280\u672f\u76f8\u6bd4\u7684\u6709\u6548\u6027\u3002\u8fd9\u9879\u5de5\u4f5c\u5c55\u793a\u4e86\u4e00\u79cd\u6709\u524d\u666f\u7684\u65b9\u6cd5\uff0c\u80fd\u591f\u91ca\u653e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u6f5c\u529b\uff0c\u5e76\u589e\u5f3a\u5176\u5728\u673a\u5668\u7ffb\u8bd1\u9886\u57df\u7684\u6027\u80fd\u3002\u76f8\u5173\u4ee3\u7801\u548c\u6570\u636e\u5df2\u5728https://github.com/YutongWang1216/ReflectionLLMMT\u4e0a\u5f00\u6e90\u3002**|\n", "2406.08426": "|**2024-06-12**|**Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL**|Zijin Hong 
et.al.|[2406.08426](http://arxiv.org/abs/2406.08426)|null|\u6587\u672c\u8f6cSQL\u751f\u6210\u51c6\u786e\u7684SQL\u67e5\u8be2\u4ee5\u54cd\u5e94\u81ea\u7136\u8bed\u8a00\u95ee\u9898\u662f\u4e00\u4e2a\u957f\u671f\u5b58\u5728\u7684\u6311\u6218\uff0c\u5b83\u6d89\u53ca\u7528\u6237\u95ee\u9898\u7406\u89e3\u3001\u6570\u636e\u5e93\u6a21\u5f0f\u7406\u89e3\u4ee5\u53caSQL\u751f\u6210\u7b49\u591a\u4e2a\u590d\u6742\u73af\u8282\u3002\u4f20\u7edf\u7684\u6587\u672c\u8f6cSQL\u7cfb\u7edf\u4f9d\u8d56\u4e8e\u4eba\u5de5\u5de5\u7a0b\u548c\u6df1\u5ea6\u795e\u7ecf\u7f51\u7edc\u3002\u968f\u7740\u9884\u8bad\u7ec3\u8bed\u8a00\u6a21\u578b\uff08PLMs\uff09\u7684\u53d1\u5c55\u548c\u5728\u8be5\u4efb\u52a1\u4e2d\u7684\u5e94\u7528\uff0c\u6027\u80fd\u5f97\u5230\u4e86\u663e\u8457\u63d0\u5347\u3002\u7136\u800c\uff0c\u968f\u7740\u6570\u636e\u5e93\u590d\u6742\u5ea6\u589e\u52a0\u548c\u7528\u6237\u95ee\u9898\u96be\u5ea6\u589e\u5927\uff0cPLMs\u6709\u9650\u7684\u7406\u89e3\u80fd\u529b\u53ef\u80fd\u5bfc\u81f4\u9519\u8bef\u7684SQL\u751f\u6210\uff0c\u8fd9\u4fc3\u4f7f\u7814\u7a76\u4eba\u5458\u5bfb\u6c42\u66f4\u9ad8\u7ea7\u548c\u5b9a\u5236\u5316\u7684\u4f18\u5316\u65b9\u6cd5\uff0c\u9650\u5236\u4e86PLM\u57fa\u7840\u7cfb\u7edf\u7684\u5e7f\u6cdb\u5e94\u7528\u3002\u6700\u8fd1\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u56e0\u5176\u5728\u81ea\u7136\u8bed\u8a00\u7406\u89e3\u4e0a\u7684\u5f3a\u5927\u80fd\u529b\u800c\u5907\u53d7\u77a9\u76ee\u3002\u56e0\u6b64\uff0c\u6574\u5408LLM\u7684\u5b9e\u73b0\u4e3a\u6587\u672c\u8f6cSQL\u7814\u7a76\u5e26\u6765\u4e86\u72ec\u7279\u7684\u673a\u9047\u3001\u6311\u6218\u548c\u89e3\u51b3\u65b9\u6848\u3002\u672c\u7efc\u8ff0\u5168\u9762\u6982\u8ff0\u4e86\u57fa\u4e8eLLM\u7684\u6587\u672c\u8f6cSQL\u3002\u9996\u5148\uff0c\u6211\u4eec\u6982\u8ff0\u5f53\u524d\u9762\u4e34\u7684\u6311\u6218\u548c\u6587\u672c\u8f6cSQL\u7684\u53d1\u5c55\u5386\u7a0b\u3002\u63a5\u7740\uff0c\u8be6\u7ec6\u4ecb\u7ecd\u7528\u4e8e\u8bc4\u4f30\u6587\u672c\u8f6cSQL\u7cfb\u7edf\u7684\u6570\u636e\u96c6\u548c\u8bc4\u4e
f7\u6307\u6807\u3002\u7136\u540e\uff0c\u6211\u4eec\u7cfb\u7edf\u5206\u6790\u4e86\u8fd1\u671f\u5728LLM\u652f\u6301\u4e0b\u7684\u6587\u672c\u8f6cSQL\u8fdb\u5c55\u3002\u6700\u540e\uff0c\u6211\u4eec\u8ba8\u8bba\u4e86\u8be5\u9886\u57df\u5c1a\u5b58\u7684\u6311\u6218\uff0c\u5e76\u5bf9\u672a\u6765\u7814\u7a76\u65b9\u5411\u63d0\u51fa\u671f\u5f85\u3002|\n", "2406.08418": "|**2024-06-12**|**OmniCorpus: An Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text**|Qingyun Li et.al.|[2406.08418](http://arxiv.org/abs/2406.08418)|**[link](https://github.com/opengvlab/omnicorpus)**|**\u8be5\u8bba\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aOmniCorpus\u7684\u5927\u578b\u56fe\u50cf-\u6587\u672c\u4ea4\u9519\u6570\u636e\u96c6\uff0c\u89c4\u6a21\u8fbe\u5230100\u4ebf\u7ea7\u522b\u3002\u8fd9\u4e2a\u6570\u636e\u96c6\u901a\u8fc7\u9ad8\u6548\u7684\u5f15\u64ce\u7b5b\u9009\u548c\u63d0\u53d6\u4e86\u5927\u91cf\u9ad8\u8d28\u91cf\u6587\u6863\uff0c\u5305\u542b86\u4ebf\u5f20\u56fe\u7247\u548c1,696\u4e07\u4ebf\u4e2a\u6587\u672c\u4ee4\u724c\uff0c\u76f8\u8f83\u4e8e\u540c\u7c7b\u6570\u636e\uff08\u5982MMC4\u3001OBELICS\uff09\uff0cOmniCorpus\u5177\u6709\u4ee5\u4e0b\u4f18\u52bf\uff1a1\uff09\u89c4\u6a21\u6269\u592715\u500d\uff0c\u540c\u65f6\u4fdd\u6301\u4e86\u826f\u597d\u7684\u6570\u636e\u8d28\u91cf\uff1b2\uff09\u6765\u6e90\u66f4\u4e3a\u591a\u6837\uff0c\u5305\u62ec\u82f1\u6587\u548c\u975e\u82f1\u6587\u7f51\u7ad9\uff0c\u4ee5\u53ca\u89c6\u9891\u4e3a\u4e3b\u7684\u7f51\u7ad9\uff1b3\uff09\u7075\u6d3b\u6027\u66f4\u5f3a\uff0c\u53ef\u4ee5\u4ece\u56fe\u50cf-\u6587\u672c\u4ea4\u9519\u683c\u5f0f\u8f7b\u677e\u8f6c\u6362\u4e3a\u7eaf\u6587\u672c\u8bed\u6599\u5e93\u6216\u56fe\u50cf-\u6587\u672c\u5bf9\u3002\u901a\u8fc7\u5168\u9762\u5206\u6790\u548c\u5b9e\u9a8c\uff0c\u8bba\u6587\u9a8c\u8bc1\u4e86OmniCorpus\u7684\u6570\u636e\u8d28\u91cf\u3001\u53ef\u7528\u6027\u548c\u6709\u6548\u6027\uff0c\u65e8\u5728\u4e3a\u672a\u6765\u7684\u591a\u6a21\u6001\u6a21\u578b\u7814\u7a76\u63d0\u4f9b\u575a\u5b9e\u7684\u6570\u63
6e\u57fa\u7840\u3002\u76f8\u5173\u7684\u4ee3\u7801\u548c\u6570\u636e\u5df2\u5728https://github.com/OpenGVLab/OmniCorpus\u4e0a\u516c\u5f00\u3002**|\n", "2406.08414": "|**2024-06-12**|**Discovering Preference Optimization Algorithms with and for Large Language Models**|Chris Lu et.al.|[2406.08414](http://arxiv.org/abs/2406.08414)|**[link](https://github.com/luchris429/DiscoPOP)**|****\u4e2d\u6587\u7ffb\u8bd1\uff1a** \u79bb\u7ebf\u504f\u597d\u4f18\u5316\u662f\u63d0\u5347\u548c\u63a7\u5236\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u8f93\u51fa\u8d28\u91cf\u7684\u91cd\u8981\u65b9\u6cd5\u3002\u4f20\u7edf\u4e0a\uff0c\u504f\u597d\u4f18\u5316\u88ab\u89c6\u4e3a\u57fa\u4e8e\u4eba\u5de5\u8bbe\u8ba1\u7684\u51f8\u635f\u5931\u51fd\u6570\u7684\u79bb\u7ebf\u76d1\u7763\u5b66\u4e60\u4efb\u52a1\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u53d7\u9650\u4e8e\u4eba\u7c7b\u521b\u9020\u529b\uff0c\u672a\u80fd\u5145\u5206\u63a2\u7d22\u53ef\u80fd\u7684\u635f\u5931\u51fd\u6570\u7684\u5de8\u5927\u641c\u7d22\u7a7a\u95f4\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u5229\u7528LLM\u8fdb\u884c\u76ee\u6807\u53d1\u73b0\u7684\u65b9\u6cd5\uff0c\u4ee5\u81ea\u52a8\u53d1\u73b0\u65b0\u7684\u6700\u5148\u8fdb\u7684\u504f\u597d\u4f18\u5316\u7b97\u6cd5\uff0c\u65e0\u9700\uff08\u4e13\u5bb6\uff09\u4eba\u5de5\u5e72\u9884\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u901a\u8fc7\u8fed\u4ee3\u5730\u63d0\u793aLLM\uff0c\u6839\u636e\u5148\u524d\u7684\u6027\u80fd\u8bc4\u4f30\u63d0\u51fa\u5e76\u5b9e\u73b0\u65b0\u7684\u504f\u597d\u4f18\u5316\u635f\u5931\u51fd\u6570\u3002\u8fd9\u4e2a\u8fc7\u7a0b\u5bfc\u81f4\u4e86\u672a\u77e5\u4e14\u9ad8\u6548\u7684\u4f18\u5316\u7b97\u6cd5\u7684\u53d1\u73b0\u3002\u5176\u4e2d\u6700\u597d\u7684\u4e00\u4e2a\u88ab\u547d\u540d\u4e3a\u201c\u53d1\u73b0\u504f\u597d\u4f18\u5316\u201d\uff08DiscoPOP\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u65b0\u9896\u7684\u7b97\u6cd5\uff0c\u5b83\u5de7\u5999\u5730\u878d\u5408\u4e86\u903b\u8f91\u548c\u6307\u6570\u635f\u5931\u3002\u5b9e\u9a8
c\u7ed3\u679c\u8868\u660e\uff0cDiscoPOP\u5728\u6027\u80fd\u4e0a\u8fbe\u5230\u4e86\u6700\u65b0\u6c34\u5e73\uff0c\u5e76\u6210\u529f\u5730\u5e94\u7528\u4e8e\u672a\u89c1\u8fc7\u7684\u4efb\u52a1\u4e0a\u3002**|\n", "2406.08413": "|**2024-06-12**|**Memory Is All You Need: An Overview of Compute-in-Memory Architectures for Accelerating Large Language Model Inference**|Christopher Wolters et.al.|[2406.08413](http://arxiv.org/abs/2406.08413)|null|## \u80cc\u666f \u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u8fd1\u671f\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u9886\u57df\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\uff0c\u4f7f\u5f97\u673a\u5668\u80fd\u591f\u751f\u6210\u903c\u771f\u7684\u6587\u672c\u5e76\u8fdb\u884c\u6709\u610f\u4e49\u7684\u5bf9\u8bdd\u3002\u7136\u800c\uff0c\u968f\u7740\u8ba1\u7b97\u548c\u5185\u5b58\u9700\u6c42\u7684\u6025\u5267\u589e\u957f\uff0c\u5c24\u5176\u662f\u5f53LLMs\u8d85\u8d8a\u5355\u4e2aGPU\u7684\u5904\u7406\u80fd\u529b\u65f6\uff0c\u5bf9\u901f\u5ea6\u3001\u6548\u7387\u548c\u53ef\u8bbf\u95ee\u6027\u7684\u9700\u6c42\u4e5f\u968f\u4e4b\u589e\u52a0\u3002\u540c\u65f6\uff0c\u8ba1\u7b97\u673a\u6027\u80fd\u548c\u5185\u5b58\u80fd\u529b\u7684\u53d1\u5c55\u5e76\u672a\u8ddf\u4e0a\u6b65\u4f10\uff0c\u5c24\u5176\u662f\u5728\u6469\u5c14\u5b9a\u5f8b\u653e\u7f13\u7684\u80cc\u666f\u4e0b\u3002\u5185\u5b58\u8bbf\u95ee\u6210\u672c\u8fdc\u9ad8\u4e8e\u8ba1\u7b97\uff0c\u8fd9\u7ed9\u5927\u89c4\u6a21\u6269\u5c55\u5e26\u6765\u4e86\u6311\u6218\uff0c\u5373\u6240\u8c13\u7684\u201c\u5185\u5b58\u5899\u201d\u3002\u5728\u8fd9\u4e2a\u65f6\u5019\uff0c\u8ba1\u7b97\u5728\u5185\u5b58\uff08Compute-in-Memory, 
CIM\uff09\u6280\u672f\u4e3aAI\u63a8\u7406\u63d0\u4f9b\u4e86\u52a0\u901f\u53ef\u80fd\uff0c\u901a\u8fc7\u5728\u5185\u5b58\u4e2d\u76f4\u63a5\u6267\u884c\u6a21\u62df\u8ba1\u7b97\uff0c\u6709\u671b\u964d\u4f4e\u5ef6\u8fdf\u548c\u529f\u8017\u3002\u901a\u8fc7\u7d27\u5bc6\u96c6\u6210\u5185\u5b58\u548c\u8ba1\u7b97\u5143\u4ef6\uff0cCIM\u6d88\u9664\u4e86\u51af\u8bfa\u4f9d\u66fc\u74f6\u9888\uff0c\u51cf\u5c11\u4e86\u6570\u636e\u4f20\u8f93\uff0c\u63d0\u9ad8\u4e86\u80fd\u6e90\u6548\u7387\u3002 \u672c\u7efc\u8ff0\u8bba\u6587\u6982\u8ff0\u4e86\u57fa\u4e8eTransformer\u7684\u6a21\u578b\uff0c\u63a2\u8ba8\u4e86\u5404\u79cdCIM\u67b6\u6784\uff0c\u5e76\u7814\u7a76\u4e86\u5b83\u4eec\u5982\u4f55\u5e94\u5bf9\u73b0\u4ee3\u4eba\u5de5\u667a\u80fd\u8ba1\u7b97\u7cfb\u7edf\u9762\u4e34\u7684\u7d27\u8feb\u6311\u6218\u3002\u6211\u4eec\u8be6\u7ec6\u8ba8\u8bba\u4e86\u4e0eTransformer\u76f8\u5173\u7684\u8fd0\u7b97\u53ca\u5176\u786c\u4ef6\u52a0\u901f\u7b56\u7565\uff0c\u540c\u65f6\u6307\u51fa\u76f8\u5173CIM\u8bbe\u8ba1\u4e2d\u7684\u6311\u6218\u3001\u8d8b\u52bf\u548c\u6d1e\u5bdf\u3002|\n", "2406.08402": "|**2024-06-12**|**Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models**|Chun-Yi Kuan et.al.|[2406.08402](http://arxiv.org/abs/2406.08402)|**[link](https://github.com/kuan2jiu99/audio-hallucination)**|**
\u5927\u578b\u97f3\u9891\u8bed\u8a00\u6a21\u578b\uff08LALMs\uff09\u901a\u8fc7\u6574\u5408\u97f3\u9891\u611f\u77e5\u80fd\u529b\uff0c\u589e\u5f3a\u4e86\u4f20\u7edf\u7684\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff0c\u4f7f\u5176\u80fd\u591f\u5904\u7406\u97f3\u9891\u76f8\u5173\u4efb\u52a1\u3002\u5148\u524d\u7684\u7814\u7a76\u4e3b\u8981\u96c6\u4e2d\u5728\u8bc4\u4f30LALMs\u5728\u5404\u79cd\u4efb\u52a1\u4e0a\u7684\u6027\u80fd\uff0c\u4f46\u5bf9\u5b83\u4eec\u7684\u53ef\u9760\u6027\uff0c\u7279\u522b\u662f\u5173\u4e8e\u5bf9\u8c61\u5e7b\u89c9\u7b49\u95ee\u9898\u7684\u5173\u6ce8\u4e0d\u8db3\u3002\u6211\u4eec\u7684\u7814\u7a76\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u65b9\u6cd5\u6765\u8bc4\u4f30\u516c\u5f00\u53ef\u7528\u7684LALMs\u5728\u5bf9\u8c61\u5e7b\u89c9\u65b9\u9762\u7684\u7a0b\u5ea6\u3002\u7ed3\u679c\u8868\u660e\uff0cLALMs\u5728\u7406\u89e3\u97f3\u9891\u5185\u5bb9\u65b9\u9762\u4e0e\u4e13\u95e8\u7684\u97f3\u9891captioning\u6a21\u578b\u76f8\u5f53\uff0c\u4f46\u5728\u56de\u7b54\u533a\u5206\u6027\u95ee\u9898\u65f6\u8868\u73b0\u4e0d\u4f73\uff0c\u5c24\u5176\u662f\u90a3\u4e9b\u9700\u8981\u8bc6\u522b\u97f3\u9891\u7247\u6bb5\u4e2d\u7279\u5b9a\u7269\u4f53\u58f0\u97f3\u7684\u95ee\u9898\u3002\u8fd9\u63ed\u793a\u4e86\u5f53\u524dLALMs\u7684\u4e00\u4e2a\u5173\u952e\u5f31\u70b9\uff1a\u5b83\u4eec\u5bf9\u533a\u5206\u6027\u67e5\u8be2\u7684\u7406\u89e3\u4e0d\u8db3\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63a2\u8ba8\u4e86\u63d0\u793a\u5de5\u7a0b\u5982\u4f55\u63d0\u5347LALMs\u5728\u533a\u5206\u6027\u95ee\u9898\u4e0a\u7684\u6027\u80fd\u3002**|\n", "2406.08398": "|**2024-06-12**|**cPAPERS: A Dataset of Situated and Multimodal Interactive Conversations in Scientific Papers**|Anirudh Sundar et.al.|[2406.08398](http://arxiv.org/abs/2406.08398)|null|
\u5728\u60c5\u5883\u5316\u548c\u591a\u6a21\u6001\u4ea4\u4e92\u5bf9\u8bdd\uff08SIMMC\uff09\u7684\u65b0\u5174\u7814\u7a76\u9886\u57df\u4e2d\uff0c\u79d1\u5b66\u8bba\u6587\u7684\u4e92\u52a8\u662f\u4e00\u4e2a\u91cd\u8981\u65b9\u5411\u3002\u7531\u4e8e\u79d1\u5b66\u8bba\u6587\u4e3b\u8981\u7531\u6587\u672c\u3001\u516c\u5f0f\u3001\u56fe\u8868\u548c\u8868\u683c\u6784\u6210\uff0cSIMMC\u65b9\u6cd5\u9700\u8981\u9488\u5bf9\u8fd9\u4e9b\u7ec4\u6210\u90e8\u5206\u8fdb\u884c\u4e13\u95e8\u8bbe\u8ba1\uff0c\u4ee5\u652f\u6301\u79d1\u7814\u4eba\u5458\u6240\u9700\u7684\u6df1\u5ea6\u63a2\u7a76\u548c\u4e92\u52a8\u3002\u672c\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201c\u5bf9\u8bdd\u5f0f\u8bba\u6587\u201d\uff08cPAPERS\uff09\u7684\u6570\u636e\u96c6\uff0c\u5b83\u5305\u542b\u4e86\u6765\u81eaarXiv\u4e0a\u53ef\u7528\u7684\u79d1\u5b66\u6587\u6863\u7684\u5b66\u672f\u8bba\u6587\u8bc4\u8bba\u4e2d\u7684\u95ee\u7b54\u5bf9\uff0c\u8fd9\u4e9b\u95ee\u7b54\u4e0e\u8bba\u6587\u7ec4\u4ef6\u53ca\u5176\u5f15\u7528\u76f8\u5173\u3002\u6211\u4eec\u4ecb\u7ecd\u4e86\u6570\u636e\u6536\u96c6\u7b56\u7565\uff0c\u901a\u8fc7OpenReview\u6536\u96c6\u8fd9\u4e9b\u95ee\u9898-\u7b54\u6848\u5bf9\uff0c\u5e76\u4e0eLaTeX\u6e90\u6587\u4ef6\u4e2d\u7684\u4e0a\u4e0b\u6587\u4fe1\u606f\u5173\u8054\u8d77\u6765\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u4e00\u7cfb\u5217\u57fa\u7ebf\u65b9\u6cd5\uff0c\u5305\u62ec\u96f6\u6837\u672c\u548c\u5fae\u8c03\u914d\u7f6e\uff0c\u6765\u5904\u7406cPAPERS\u6570\u636e\u96c6\u3002|\n", "2406.09418": "|**2024-06-13**|**VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding**|Muhammad Maaz 
et.al.|[2406.09418](http://arxiv.org/abs/2406.09418)|**[link](https://github.com/mbzuai-oryx/videogpt-plus)**|**\u5728\u57fa\u4e8e\u8bed\u8a00\u6a21\u578b\u7684\u8fdb\u5c55\u57fa\u7840\u4e0a\uff0c\u5927\u578b\u591a\u6a21\u6001\u6a21\u578b\uff08LMMs\uff09\u5728\u89c6\u9891\u7406\u89e3\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u89c6\u9891LMMs\u4f9d\u8d56\u4e8e\u56fe\u50cf\u6216\u89c6\u9891\u7f16\u7801\u5668\u5904\u7406\u89c6\u89c9\u8f93\u5165\uff0c\u8fd9\u4e9b\u7f16\u7801\u5668\u5404\u81ea\u5b58\u5728\u5c40\u9650\u6027\u3002\u56fe\u50cf\u7f16\u7801\u5668\u64c5\u957f\u6355\u6349\u5e27\u5e8f\u5217\u4e2d\u7684\u4e30\u5bcc\u7a7a\u95f4\u7ec6\u8282\uff0c\u4f46\u7f3a\u4e4f\u660e\u786e\u7684\u65f6\u95f4\u4e0a\u4e0b\u6587\uff1b\u800c\u89c6\u9891\u7f16\u7801\u5668\u63d0\u4f9b\u65f6\u95f4\u4e0a\u4e0b\u6587\uff0c\u4f46\u5e38\u5e38\u53d7\u9650\u4e8e\u8ba1\u7b97\u8d44\u6e90\uff0c\u5bfc\u81f4\u53ea\u80fd\u5904\u7406\u4f4e\u5206\u8fa8\u7387\u7684\u7a00\u758f\u5e27\uff0c\u4ece\u800c\u5f71\u54cd\u4e86\u5bf9\u7a7a\u95f4\u548c\u4e0a\u4e0b\u6587\u7684\u7406\u89e3\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u63d0\u51faVideoGPT+\uff0c\u5b83\u7ed3\u5408\u4e86\u56fe\u50cf\u7f16\u7801\u5668\uff08\u7528\u4e8e\u8be6\u7ec6\u7684\u7a7a\u95f4\u7406\u89e3\uff09\u548c\u89c6\u9891\u7f16\u7801\u5668\uff08\u7528\u4e8e\u5168\u5c40\u65f6\u5e8f\u4e0a\u4e0b\u6587\u5efa\u6a21\uff09\u7684\u4f18\u52bf\u3002\u8be5\u6a21\u578b\u901a\u8fc7\u5c06\u89c6\u9891\u5212\u5206\u4e3a\u5c0f\u6bb5\uff0c\u5e76\u5bf9\u6765\u81ea\u4e24\u8005\u7279\u5f81\u7684\u63d0\u53d6\u5e94\u7528\u81ea\u9002\u5e94\u6c60\u5316\u7b56\u7565\uff0c\u4ee5\u63d0\u9ad8\u6027\u80fd\u3002\u6211\u4eec\u7684\u67b6\u6784\u5728\u591a\u4e2a\u89c6\u9891\u57fa\u51c6\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u5305\u62ecVCGBench\u3001MVBench\u548c\u96f6\u6837\u672c\u95ee\u7b54\u4efb\u52a1\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2a112K\u7684\u89c6\u9891\u6307\u4ee4\u96c6\uff0c\u901a\u8fc7\u65b0\u98
96\u7684\u534a\u81ea\u52a8\u6807\u6ce8\u7ba1\u9053\u8fdb\u4e00\u6b65\u63d0\u5347\u6a21\u578b\u6027\u80fd\u3002\u4e3a\u4e86\u5168\u9762\u8bc4\u4f30\u89c6\u9891LMMs\uff0c\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86VCGBench-Diverse\uff0c\u5b83\u6db5\u76d6\u4e8618\u4e2a\u5e7f\u6cdb\u89c6\u9891\u7c7b\u522b\uff0c\u5982\u751f\u6d3b\u65b9\u5f0f\u3001\u4f53\u80b2\u3001\u79d1\u5b66\u3001\u6e38\u620f\u548c\u76d1\u63a7\u89c6\u9891\uff0c\u51714,354\u4e2a\u95ee\u9898-\u7b54\u6848\u5bf9\u3002\u8fd9\u4e2a\u57fa\u51c6\u6d4b\u8bd5\u8bc4\u4f30\u73b0\u6709LMMs\u5728\u5bc6\u96c6\u89c6\u9891\u63cf\u8ff0\u3001\u7a7a\u95f4\u548c\u65f6\u95f4\u7406\u89e3\u4ee5\u53ca\u590d\u6742\u63a8\u7406\u65b9\u9762\u7684\u6cdb\u5316\u80fd\u529b\uff0c\u786e\u4fdd\u5728\u5404\u79cd\u89c6\u9891\u7c7b\u578b\u548c\u52a8\u6001\u4e0b\u7684\u5168\u9762\u8bc4\u4f30\u3002\u4ee3\u7801\u53ef\u5728https://github.com/mbzuai-oryx/VideoGPT-plus\u627e\u5230\u3002**|\n", "2406.09412": "|**2024-06-13**|**Explore the Limits of Omni-modal Pretraining at Scale**|Yiyuan Zhang 
et.al.|[2406.09412](http://arxiv.org/abs/2406.09412)|**[link](https://github.com/invictus717/MiCo)**|**\u6211\u4eec\u63d0\u8bae\u6784\u5efa\u5168\u6a21\u6001\u667a\u80fd\uff0c\u65e8\u5728\u7406\u89e3\u5404\u79cd\u6a21\u6001\u5e76\u5b66\u4e60\u901a\u7528\u8868\u793a\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u53ef\u6269\u5c55\u7684\u9884\u8bad\u7ec3\u8303\u5f0f\uff0c\u79f0\u4e3a\u591a\u6a21\u6001\u4e0a\u4e0b\u6587\uff08MiCo\uff09\u3002\u8fd9\u79cd\u65b9\u6cd5\u80fd\u591f\u5728\u9884\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u540c\u65f6\u589e\u52a0\u6a21\u6001\u6570\u91cf\u3001\u6570\u636e\u91cf\u4ee5\u53ca\u6a21\u578b\u53c2\u6570\u7684\u6570\u91cf\u3002\u901a\u8fc7MiCo\uff0c\u9884\u8bad\u7ec3\u6a21\u578b\u5728\u591a\u9879\u4efb\u52a1\u4e0a\u5c55\u73b0\u51fa\u663e\u8457\u7684\u591a\u6a21\u6001\u5b66\u4e60\u80fd\u529b\uff1a\u4e00\u662f\u9488\u5bf910\u79cd\u4e0d\u540c\u6a21\u6001\u7684\u5355\u6a21\u6001\u611f\u77e5\u57fa\u51c6\uff0c\u4e8c\u662f\u5305\u62ec\u68c0\u7d22\u3001\u95ee\u7b54\u548ccaptioning\u5728\u5185\u768425\u9879\u8de8\u6a21\u6001\u7406\u89e3\u4efb\u52a1\uff0c\u4e09\u662f18\u4e2a\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\u57fa\u51c6\u3002\u6211\u4eec\u7684\u6a21\u578b\u521b\u9020\u4e8637\u9879\u6700\u65b0\u7684\u6700\u9ad8\u6027\u80fd\u8bb0\u5f55\u3002\u6211\u4eec\u671f\u671b\u8fd9\u9879\u7814\u7a76\u80fd\u63a8\u52a8\u5168\u6a21\u6001\u667a\u80fd\u7684\u53d1\u5c55\u3002\u76f8\u5173\u4ee3\u7801\u548c\u6a21\u578b\u5df2\u5728\u5f00\u6e90\u3002**|\n", "2406.09397": "|**2024-06-13**|**Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms**|Miaosen Zhang 
et.al.|[2406.09397](http://arxiv.org/abs/2406.09397)|null|\u73b0\u4ee3\u89c6\u89c9\u6a21\u578b\u5728\u5927\u89c4\u6a21\u5608\u6742\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u8bad\u7ec3\uff0c\u867d\u7136\u5c55\u73b0\u51fa\u5f3a\u5927\u80fd\u529b\uff0c\u4f46\u5728\u9075\u5faa\u7528\u6237\u610f\u56fe\u3001\u5982\u89c6\u89c9\u7f8e\u611f\u3001\u7279\u5b9a\u98ce\u683c\u548c\u8d23\u4efb\u8f93\u51fa\u65b9\u9762\u53ef\u80fd\u5b58\u5728\u95ee\u9898\u3002\u672c\u6587\u5173\u6ce8\u89c6\u89c9\u7f8e\u5b66\u9886\u57df\uff0c\u76ee\u6807\u662f\u4f7f\u89c6\u89c9\u6a21\u578b\u4e0e\u4eba\u7c7b\u5ba1\u7f8e\u6807\u51c6\u5728\u68c0\u7d22\u7cfb\u7edf\u4e2d\u4fdd\u6301\u4e00\u81f4\u3002\u9ad8\u7ea7\u68c0\u7d22\u7cfb\u7edf\u901a\u5e38\u91c7\u7528\u57fa\u4e8e\u4f4e\u7ea7\u7279\u5f81\uff08\u5982\u9971\u548c\u5ea6\uff09\u7684\u5ba1\u7f8e\u6a21\u578b\u4f5c\u4e3a\u91cd\u6392\u5668\u6216\u8fc7\u6ee4\u5668\uff0c\u4f46\u9762\u5bf9\u98ce\u683c\u3001\u6587\u5316\u6216\u77e5\u8bc6\u80cc\u666f\u65f6\u6027\u80fd\u6709\u9650\u3002\u6211\u4eec\u53d1\u73b0\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u63a8\u7406\u80fd\u529b\uff0c\u901a\u8fc7\u6539\u5199\u641c\u7d22\u67e5\u8be2\u5e76\u6269\u5c55\u5ba1\u7f8e\u671f\u671b\uff0c\u53ef\u4ee5\u5f25\u8865\u8fd9\u4e00\u4e0d\u8db3\u3002 
\u56e0\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u504f\u597d\u7684\u5f3a\u5316\u5b66\u4e60\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u9488\u5bf9\u89c6\u89c9\u6a21\u578b\u8fdb\u884c\u5fae\u8c03\uff0c\u4ee5\u63d0\u53d6LLM\u63a8\u7406\u548c\u5ba1\u7f8e\u6a21\u578b\u7684\u77e5\u8bc6\uff0c\u4ece\u800c\u66f4\u597d\u5730\u4f7f\u89c6\u89c9\u6a21\u578b\u7b26\u5408\u4eba\u7c7b\u5ba1\u7f8e\u3002\u7531\u4e8e\u7f3a\u4e4f\u4e13\u95e8\u7528\u4e8e\u8bc4\u4f30\u68c0\u7d22\u7cfb\u7edf\u7684\u57fa\u51c6\uff0c\u6211\u4eec\u5229\u7528\u5f3a\u5927\u7684\u591a\u6a21\u6001\u5927\u6a21\u578b\uff08LMM\uff09\u6765\u8bc4\u4ef7\u7f8e\u611f\u8868\u73b0\u3002\u8003\u8651\u5230\u7f8e\u611f\u8bc4\u4f30\u7684\u4e3b\u89c2\u6027\uff0c\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86\u4e00\u4e2a\u540d\u4e3aHPIR\u7684\u65b0\u6570\u636e\u96c6\uff0c\u7528\u4e8e\u8861\u91cf\u4e0e\u4eba\u7c7b\u5ba1\u7f8e\u7684\u5951\u5408\u5ea6\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u663e\u8457\u63d0\u5347\u4e86\u89c6\u89c9\u6a21\u578b\u7684\u7f8e\u611f\u884c\u4e3a\uff0c\u4ece\u591a\u4e2a\u6307\u6807\u6765\u770b\u3002\u6211\u4eec\u76f8\u4fe1\uff0c\u63d0\u51fa\u7684\u7b97\u6cd5\u53ef\u4ee5\u4f5c\u4e3a\u4e00\u79cd\u901a\u7528\u5b9e\u8df5\uff0c\u7528\u4e8e\u4f7f\u89c6\u89c9\u6a21\u578b\u4e0e\u4eba\u7c7b\u4ef7\u503c\u89c2\u76f8\u4e00\u81f4\u3002|\n", "2406.09396": "|**2024-06-13**|**Too Many Frames, not all Useful:Efficient Strategies for Long-Form Video QA**|Jongwoo Park 
et.al.|[2406.09396](http://arxiv.org/abs/2406.09396)|**[link](https://github.com/jongwoopark7978/LVNet)**|\u957f\u671f\u89c6\u9891\u901a\u5e38\u5305\u542b\u5927\u91cf\u5197\u4f59\u4fe1\u606f\uff0c\u8de8\u8d8a\u8f83\u957f\u7684\u65f6\u95f4\u95f4\u9694\uff0c\u4e14\u5305\u542b\u591a\u4e2a\u677e\u6563\u5173\u8054\u7684\u4e8b\u4ef6\u6216\u5b9e\u4f53\u3002\u56e0\u6b64\uff0c\u5728\u8fdb\u884c\u957f\u89c6\u9891\u95ee\u7b54\uff08LVQA\uff09\u65f6\uff0c\u751f\u6210\u6b63\u786e\u7b54\u6848\u6240\u9700\u7684\u6240\u6709\u4fe1\u606f\u5f80\u5f80\u53ea\u9700\u4e00\u5c0f\u90e8\u5206\u5e27\u5c31\u8db3\u4ee5\u63d0\u4f9b\u3002\u8fd1\u671f\u7684\u7814\u7a76\u8bd5\u56fe\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728LVQA\u57fa\u51c6\u4e0a\u53d6\u5f97\u5353\u8d8a\u6027\u80fd\uff0c\u4f46\u8fd9\u4e9b\u6a21\u578b\u4f9d\u8d56\u4e8e\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08VLMs\uff09\u5c06\u89c6\u9891\u4e2d\u7684\u6240\u6709\u89c6\u89c9\u5185\u5bb9\u8f6c\u6362\u6210\u81ea\u7136\u8bed\u8a00\u3002\u4f20\u7edf\u505a\u6cd5\u901a\u5e38\u662f\u5747\u5300\u91c7\u6837\u5927\u91cf\u5e27\u5e76\u72ec\u7acb\u4e3a\u5176\u751f\u6210\u63cf\u8ff0\uff0c\u8fd9\u65e2\u4e0d\u9ad8\u6548\u4e5f\u4e0d\u514d\u6709\u5197\u4f59\u3002\u9488\u5bf9\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63a2\u7d22\u4e86\u5173\u952e\u5e27\u9009\u62e9\u548c\u987a\u5e8f\u611f\u77e5\u7684\u63cf\u8ff0\u65b9\u6cd5\uff0c\u4ee5\u663e\u8457\u51cf\u5c11\u8fd9\u4e9b\u5197\u4f59\u3002 \u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e24\u4e2a\u521b\u65b0\u65b9\u6cd5\uff1a\u5c42\u6b21\u5173\u952e\u5e27\u9009\u62e9\u5668\u548c\u987a\u5e8f\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\u3002\u6211\u4eec\u7684\u6700\u7ec8\u6846\u67b6\u79f0\u4e3aLVNet\uff0c\u5728\u4e09\u4e2a\u57fa\u51c6LVQA\u6570\u636e\u96c6\u4e0a\u5b9e\u73b0\u4e86\u6700\u5148\u8fdb\u7684\u6027\u80fd\u3002\u6211\u4eec\u5c06\u516c\u5f00\u6211\u4eec\u7684\u4ee3\u7801\u3002|\n", "2406.09367": "|**2024-06-13**|**Needle In A Video Haystack: A Scalable Synthetic Framework for 
Benchmarking Video MLLMs**|Zijia Zhao et.al.|[2406.09367](http://arxiv.org/abs/2406.09367)|**[link](https://github.com/joez17/videoniah)**|**\u89c6\u9891\u7406\u89e3\u662f\u5927\u89c4\u6a21\u591a\u6a21\u6001\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u7684\u5173\u952e\u4e0b\u4e00\u6b65\u3002\u4e3a\u4e86\u68c0\u9a8c\u89c6\u9891\u7406\u89e3\u7684\u7279\u5b9a\u65b9\u9762\uff0c\u73b0\u6709\u7684\u89c6\u9891\u57fa\u51c6\u901a\u5e38\u9700\u8981\u7cbe\u5fc3\u9009\u62e9\u4e0e\u76ee\u6807\u80fd\u529b\u5339\u914d\u7684\u89c6\u9891\uff0c\u5e76\u5bf9\u67e5\u8be2-\u54cd\u5e94\u5bf9\u8fdb\u884c\u7e41\u7410\u7684\u6807\u6ce8\uff0c\u4ee5\u5339\u914d\u89c6\u9891\u5185\u5bb9\u3002\u8fd9\u4e2a\u8fc7\u7a0b\u65e2\u5177\u6709\u6311\u6218\u6027\u53c8\u8d44\u6e90\u5bc6\u96c6\u3002\u672c\u6587\u63d0\u51faVideoNIAH\uff08\u89c6\u9891\u9488 haystack\uff09\uff0c\u4e00\u4e2a\u901a\u8fc7\u5408\u6210\u89c6\u9891\u751f\u6210\u7684\u57fa\u51c6\u6784\u5efa\u6846\u67b6\u3002VideoNIAH\u901a\u8fc7\u5c06\u4e0d\u76f8\u5173\u7684\u56fe\u50cf/\u6587\u672c\u201c\u9488\u201d\u63d2\u5165\u539f\u59cb\u89c6\u9891\u4e2d\uff0c\u5c06\u6d4b\u8bd5\u89c6\u9891\u5185\u5bb9\u4e0e\u5b83\u4eec\u7684\u67e5\u8be2-\u54cd\u5e94\u5206\u79bb\u3002\u5b83\u4ec5\u57fa\u4e8e\u8fd9\u4e9b\u9488\u751f\u6210\u6ce8\u91ca\uff0c\u786e\u4fdd\u89c6\u9891\u6765\u6e90\u7684\u591a\u6837\u6027\u548c\u67e5\u8be2-\u54cd\u5e94\u7684\u4e30\u5bcc\u6027\u3002\u6b64\u5916\uff0c\u901a\u8fc7\u63d2\u5165\u591a\u4e2a\u9488\uff0cVideoNIAH\u4e25\u683c\u8bc4\u4f30\u6a21\u578b\u7684\u65f6\u5e8f\u7406\u89e3\u80fd\u529b\u3002\u6211\u4eec\u5229\u7528VideoNIAH\u6784\u5efa\u4e86\u89c6\u9891\u57fa\u51c6VNBench\uff0c\u5305\u62ec\u68c0\u7d22\u3001\u6392\u5e8f\u548c\u8ba1\u6570\u7b49\u4efb\u52a1\u3002VNBench\u80fd\u591f\u9ad8\u6548\u5730\u8bc4\u4f30\u89c6\u9891\u6a21\u578b\u7684\u7cbe\u7ec6\u7406\u89e3\u80fd\u529b\u548c\u65f6\u7a7a\u5efa\u6a21\u80fd\u529b\uff0c\u540c\u65f6\u652f\u6301\u957f\u8ddd\u79bb\u4f9d\u8d56\u6027\u7684\u8bc4\u4f30\u3002\u6211\u4eec\u8fd8\u5bf9\u8
fd1\u671f\u7684\u89c6\u9891\u4e3a\u4e2d\u5fc3\u7684\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u5305\u62ec\u5f00\u6e90\u548c\u4e13\u6709\u6a21\u578b\uff0c\u63d0\u4f9b\u4e86\u5168\u9762\u7684\u5206\u6790\u3002\u5c3d\u7ba1\u4e13\u6709\u6a21\u578b\u76f8\u5bf9\u4e8e\u5f00\u6e90\u6a21\u578b\u5177\u6709\u663e\u8457\u4f18\u52bf\uff0c\u4f46\u6240\u6709\u73b0\u6709\u89c6\u9891\u6a21\u578b\u5728\u957f\u8ddd\u79bb\u4f9d\u8d56\u4efb\u52a1\u4e0a\u7684\u6027\u80fd\u4ecd\u7136\u4e0d\u4f73\u3002VideoNIAH\u662f\u4e00\u4e2a\u7b80\u5355\u4e14\u9ad8\u5ea6\u53ef\u6269\u5c55\u7684\u57fa\u51c6\u6784\u5efa\u6846\u67b6\uff0c\u6211\u4eec\u76f8\u4fe1\u5b83\u5c06\u6fc0\u53d1\u672a\u6765\u89c6\u9891\u57fa\u51c6\u5de5\u4f5c\u7684\u521b\u65b0\u3002\u4ee3\u7801\u548c\u6570\u636e\u5df2\u5728https://github.com/joez17/VideoNIAH\u4e0a\u63d0\u4f9b\u3002**|\n", "2406.09363": "|**2024-06-13**|**ElicitationGPT: Text Elicitation Mechanisms via Language Models**|Yifan Wu et.al.|[2406.09363](http://arxiv.org/abs/2406.09363)|null|\u8be5\u8bba\u6587\u63a2\u8ba8\u4e86\u5982\u4f55\u5229\u7528\u65e0\u9700\u9886\u57df\u77e5\u8bc6\u7684\u67e5\u8be2\u6765\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08\u5982ChatGPT\uff09\u5bf9\u83b7\u53d6\u7684\u6587\u672c\u9884\u6d4b\u8fdb\u884c\u8bc4\u5206\uff0c\u4ee5\u8bc4\u4f30\u5176\u4e0e\u5b9e\u9645\u72b6\u6001\u7684\u4e00\u81f4\u6027\u3002\u8fd9\u79cd\u65b9\u6cd5\u662f\u6fc0\u52b1\u4fe1\u606f\u6536\u96c6\u548c\u673a\u5668\u5b66\u4e60\u6a21\u578b\u8bad\u7ec3\u7684\u5173\u952e\u7ec4\u6210\u90e8\u5206\u3002\u7814\u7a76\u901a\u8fc7\u5728\u540c\u884c\u8bc4\u5ba1\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u5b9e\u9a8c\uff0c\u6bd4\u8f83\u81ea\u52a8\u7684\u6a21\u578b\u8bc4\u5206\u4e0e\u4eba\u5de5\u5bfc\u5e08\u7ed9\u51fa\u7684\u8bc4\u5206\uff0c\u65e8\u5728\u5b9e\u8bc1\u8bc4\u4f30\u8fd9\u4e9b\u673a\u5236\u4e0e\u4eba\u7c7b\u504f\u597d\u7684\u4e00\u81f4\u6027\u3002|\n", "2406.09345": "|**2024-06-13**|**DiscreteSLU: A Large Language Model with 
Self-Supervised Discrete Speech Units for Spoken Language Understanding**|Suwon Shon et.al.|[2406.09345](http://arxiv.org/abs/2406.09345)|null|\u5c06\u9884\u8bad\u7ec3\u7684\u6587\u672c\u578b\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e0e\u8bed\u97f3\u8f93\u5165\u76f8\u7ed3\u5408\uff0c\u5df2\u7ecf\u8d4b\u4e88\u4e86\u8fd9\u4e9b\u6a21\u578b\u6267\u884c\u591a\u6837\u5316\u8bed\u97f3\u4efb\u52a1\u7684\u80fd\u529b\uff0c\u5305\u62ec\u6307\u4ee4\u8ddf\u968f\u3002\u8fd9\u79cd\u6574\u5408\u9700\u8981\u7ed3\u5408\u8bed\u97f3\u7f16\u7801\u5668\u3001\u8bed\u97f3\u9002\u914d\u5668\u548cLLM\uff0c\u5b83\u4eec\u5206\u522b\u9488\u5bf9\u4e0d\u540c\u7684\u4efb\u52a1\u8fdb\u884c\u8bad\u7ec3\u3002\u6211\u4eec\u63d0\u8bae\u4f7f\u7528\u79bb\u6563\u8bed\u97f3\u5355\u5143\uff08DSU\uff09\uff0c\u800c\u975e\u8fde\u7eed\u503c\u7684\u8bed\u97f3\u7f16\u7801\u8f93\u51fa\uff0c\u901a\u8fc7\u8bed\u97f3\u9002\u914d\u5668\u5c06DSU\u8f6c\u6362\u5230LLM\u7684\u5d4c\u5165\u7a7a\u95f4\u3002\u6211\u4eec\u901a\u8fc7\u65e0\u76d1\u7763\u7684\u8bed\u97f3\u7f16\u7801\u5668\u751f\u6210DSU\uff0c\u7136\u540e\u8fd0\u7528k-means\u805a\u7c7b\u65b9\u6cd5\u3002\u63d0\u51fa\u7684\u6a21\u578b\u5728\u5904\u7406\u6765\u81ea\u89c1/\u672a\u89c1\u8fc7\u9886\u57df\u4ee5\u53ca\u53e3\u8bed\u95ee\u7b54\u4e2d\u7684\u6307\u4ee4\u8ddf\u968f\u4efb\u52a1\u65f6\u8868\u73b0\u51fa\u7a33\u5065\u6027\u80fd\u3002\u6211\u4eec\u8fd8\u7814\u7a76\u4e86\u6765\u81ea\u4e0d\u540c\u81ea\u76d1\u7763\u8bed\u97f3\u7f16\u7801\u5668\u5c42\u7684DSU\u7c7b\u578b\uff0c\u4ee5\u53ca\u6885\u5c14\u9891\u7387\u5012\u8c31\u7cfb\u6570\uff08MFCC\uff09\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u5728\u53e3\u8bed\u95ee\u7b54\u7684\u6307\u4ee4\u8c03\u4f18\u4efb\u52a1\u4e2d\uff0cASR\u4efb\u52a1\u548c\u6570\u636e\u96c6\u7684\u91cd\u8981\u6027\u53ef\u80fd\u8f83\u4f4e\u3002|\n", "2406.09325": "|**2024-06-13**|**REVS: Unlearning Sensitive Information in Language Models via Rank Editing in the Vocabulary Space**|Tomer Ashuach 
et.al.|[2406.09325](http://arxiv.org/abs/2406.09325)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u53ef\u80fd\u65e0\u610f\u4e2d\u8bb0\u4f4f\u5e76\u6cc4\u9732\u8bad\u7ec3\u6570\u636e\u4e2d\u7684\u654f\u611f\u6216\u4e2a\u4eba\u8bc6\u522b\u4fe1\u606f\uff08PII\uff09\uff0c\u5f15\u53d1\u9690\u79c1\u95ee\u9898\u3002\u5f53\u524d\u7684\u89e3\u51b3\u65b9\u6848\u5305\u62ec\u6602\u8d35\u7684\u6570\u636e\u6e05\u6d17\uff0c\u6216\u8005\u901a\u8fc7\u9057\u5fd8\u548c\u6a21\u578b\u7f16\u8f91\u6765\u8fc7\u6ee4\u6a21\u578b\uff0c\u4f46\u8fd9\u4e9b\u65b9\u6cd5\u53ef\u80fd\u88ab\u63d0\u53d6\u653b\u51fb\u7ed5\u8fc7\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6a21\u578b\u7f16\u8f91\u65b9\u6cd5\uff0c\u540d\u4e3aREVS\uff0c\u7528\u4e8e\u4eceLLMs\u4e2d\u6d88\u9664\u654f\u611f\u4fe1\u606f\u3002REVS\u8bc6\u522b\u5e76\u4fee\u6539\u4e0e\u6bcf\u6761\u654f\u611f\u4fe1\u606f\u76f8\u5173\u7684\u5c11\u91cf\u795e\u7ecf\u5143\u3002\u901a\u8fc7\u5c06\u8fd9\u4e9b\u795e\u7ecf\u5143\u6295\u5f71\u5230\u8bcd\u6c47\u7a7a\u95f4\uff08\u53bb\u5d4c\u5165\uff09\uff0c\u6211\u4eec\u5b9a\u4f4d\u9a71\u52a8\u5176\u751f\u6210\u7684\u5173\u952e\u90e8\u5206\u3002\u7136\u540e\uff0c\u6211\u4eec\u6839\u636e\u53bb\u5d4c\u5165\u77e9\u9635\u7684\u4f2a\u9006\u8ba1\u7b97\u6a21\u578b\u7f16\u8f91\uff0c\u5e76\u5e94\u7528\u5b83\u6765\u964d\u4f4e\u76ee\u6807\u654f\u611f\u6570\u636e\u7684\u751f\u6210\u6982\u7387\u3002\u4e3a\u4e86\u5145\u5206\u8bc4\u4f30\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u771f\u6b63\u654f\u611f\u4fe1\u606f\u4e0a\u7684\u6548\u679c\uff0c\u6211\u4eec\u521b\u5efa\u4e86\u4e24\u4e2a\u6570\u636e\u96c6\uff1a\u4e00\u4e2a\u662fGPT-J\u56fa\u6709\u7684\u7535\u5b50\u90ae\u4ef6\u6570\u636e\u96c6\uff0c\u53e6\u4e00\u4e2a\u662f\u6211\u4eec\u8c03\u6574\u6a21\u578b\u4f7f\u5176\u8bb0\u5fc6\u7684\u5408\u6210\u793e\u4f1a\u4fdd\u969c\u53f7\u7801\u6570\u636e\u96c6\u3002\u4e0e\u6700\u5148\u8fdb\u7684\u6a21\u578b\u7f16\u8f91\u65b9\u6cd5\u76f8\u6bd4\uff0cREVS\u5728\u6d88\u9664\u654f\u611f\u4fe1\u606f\u548c\u62b5
\u6297\u63d0\u53d6\u653b\u51fb\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u540c\u65f6\u4fdd\u6301\u6a21\u578b\u7684\u5b8c\u6574\u6027\u3002\u4ee3\u7801\u548c\u6f14\u793a\u7b14\u8bb0\u672c\u53ef\u5728\u83b7\u53d6\u3002|\n", "2406.09324": "|**2024-06-13**|**Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs**|Zhao Xu et.al.|[2406.09324](http://arxiv.org/abs/2406.09324)|**[link](https://github.com/usail-hkust/bag_of_tricks_for_llm_jailbreaking)**|**\u5c3d\u7ba1\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u96f6\u6837\u672c\u4efb\u52a1\u6267\u884c\u65b9\u9762\u5c55\u73b0\u51fa\u663e\u8457\u80fd\u529b\uff0c\u4f46\u5b83\u4eec\u6613\u53d7\u7834\u89e3\u653b\u51fb\uff0c\u53ef\u80fd\u88ab\u64cd\u7eb5\u4ea7\u751f\u6709\u5bb3\u8f93\u51fa\u3002\u8fd1\u671f\u7684\u7814\u7a76\u5f00\u59cb\u5c06\u7834\u89e3\u653b\u51fb\u5206\u4e3a\u4ee4\u724c\u7ea7\u548c\u63d0\u793a\u7ea7\u3002\u7136\u800c\uff0c\u5148\u524d\u7684\u5de5\u4f5c\u4e3b\u8981\u5ffd\u89c6\u4e86\u7834\u89e3\u653b\u51fb\u7684\u591a\u6837\u5173\u952e\u56e0\u7d20\uff0c\u5927\u90e8\u5206\u7814\u7a76\u805a\u7126\u4e8eLLM\u7684\u6f0f\u6d1e\uff0c\u800c\u5bf9\u9632\u5fa1\u589e\u5f3a\u7684LLMs\u63a2\u7d22\u4e0d\u8db3\u3002\u4e3a\u4e86\u6539\u8fdb\u8fd9\u4e00\u72b6\u51b5\uff0c\u6211\u4eec\u8bc4\u4f30\u4e86\u4e0d\u540c\u653b\u51fb\u8bbe\u7f6e\u5bf9LLM\u6027\u80fd\u7684\u5f71\u54cd\uff0c\u5e76\u63d0\u8bae\u5efa\u7acb\u4e00\u4e2a\u57fa\u51c6\u6d4b\u8bd5\u6846\u67b6\uff0c\u4ee5\u4fc3\u8fdb\u6807\u51c6\u5316\u8bc4\u4f30\u3002\u6211\u4eec\u4ece\u76ee\u6807\u7ea7\u548c\u653b\u51fb\u7ea7\u4e24\u4e2a\u89d2\u5ea6\uff0c\u8be6\u7ec6\u8003\u5bdf\u4e86\u5b9e\u65bd\u9488\u5bf9LLMs\u7684\u7834\u89e3\u653b\u51fb\u7684\u516b\u4e2a\u5173\u952e\u56e0\u7d20\u3002\u6211\u4eec\u5728\u4e24\u4e2a\u5e38\u7528\u6570\u636e\u96c6\u4e0a\u5bf9\u516d\u79cd\u9632\u5fa1\u65b9\u6cd5\u8fdb\u884c\u4e86\u4e03\u79cd\u4ee3\u8868\u6027\u7684\u7834\u89e3\u653b\u51fb\uff0c\u603b\u8ba1\u7ea6320\u4e2a\u5b9e\u9a8c\uff0c\u4f7f\u7528A800-80G 
GPU\u8017\u65f6\u5927\u7ea65\u4e07\u5c0f\u65f6\u3002\u5b9e\u9a8c\u7ed3\u679c\u5f3a\u8c03\u4e86\u5bf9\u9632\u5fa1\u589e\u5f3a\u7684LLMs\u8fdb\u884c\u6807\u51c6\u5316\u8bc4\u4f30\u7684\u5fc5\u8981\u6027\u3002\u6211\u4eec\u7684\u4ee3\u7801\u5df2\u5f00\u6e90\uff1ahttps://github.com/usail-hkust/Bag_of_Tricks_for_LLM_Jailbreaking\u3002**|\n", "2406.09321": "|**2024-06-13**|**JailbreakEval: An Integrated Toolkit for Evaluating Jailbreak Attempts Against Large Language Models**|Delong Ran et.al.|[2406.09321](http://arxiv.org/abs/2406.09321)|**[link](https://github.com/thuccslab/jailbreakeval)**|**\u672c\u6587\u63a2\u8ba8\u4e86\u9488\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u8d8a\u72f1\u653b\u51fb\u7814\u7a76\u4e2d\u7684\u8bc4\u4f30\u96be\u9898\u3002\u76ee\u524d\uff0c\u5bf9\u4e8e\u653b\u51fb\u662f\u5426\u6210\u529f\u7f3a\u4e4f\u7edf\u4e00\u6807\u51c6\uff0c\u4e0d\u540c\u7684\u8bc4\u4f30\u65b9\u6cd5\u5982\u4eba\u5de5\u6807\u6ce8\u6216\u7279\u5b9a\u65b9\u5f0f\u63d0\u793aGPT-4\u5b58\u5728\uff0c\u5404\u6709\u4f18\u7f3a\u70b9\uff0c\u5bf9\u4eba\u7c7b\u4ef7\u503c\u89c2\u7684\u4f53\u73b0\u548c\u7814\u7a76\u6210\u672c\u4ea7\u751f\u5f71\u54cd\u3002\u6211\u4eec\u7684\u7814\u7a76\u5206\u6790\u4e86\u8fd1\u4e5d\u5341\u98792023\u5e745\u6708\u81f32024\u5e744\u6708\u671f\u95f4\u53d1\u5e03\u7684\u8d8a\u72f1\u653b\u51fb\u76f8\u5173\u7814\u7a76\uff0c\u63d0\u51fa\u4e86\u4e00\u79cd\u8be6\u7ec6\u7684\u8bc4\u4f30\u65b9\u6cd5\u5206\u7c7b\u4f53\u7cfb\uff0c\u6df1\u5165\u5256\u6790\u4e86\u5404\u79cd\u8bc4\u4f30\u5668\u7684\u4f18\u7f3a\u70b9\u53ca\u5176\u5e94\u7528\u73b0\u72b6\u3002\u4e3a\u4e86\u63a8\u52a8\u540e\u7eed\u7814\u7a76\uff0c\u6211\u4eec\u5f00\u53d1\u5e76\u63a8\u51fa\u4e86JailbreakEval\u5de5\u5177\u5305\uff0c\u5b83\u662f\u4e00\u4e2a\u7528\u6237\u53cb\u597d\u7684\u5e73\u53f0\uff0c\u96c6\u6210\u4e86\u591a\u79cd\u77e5\u540d\u7684\u8bc4\u4f30\u5668\uff0c\u7528\u6237\u53ea\u9700\u4e00\u4e2a\u547d\u4ee4\u5373\u53ef\u83b7\u53d6\u7ed3\u679c\u3002\u6b64\u5916\uff0cJailbreakEval\u6
52f\u6301\u7528\u6237\u5728\u7edf\u4e00\u6846\u67b6\u5185\u5b9a\u5236\u81ea\u5b9a\u4e49\u8bc4\u4f30\u6d41\u7a0b\uff0c\u7b80\u5316\u4e86\u5f00\u53d1\u548c\u6bd4\u8f83\u8fc7\u7a0b\u3002\u603b\u4e4b\uff0c\u6211\u4eec\u671f\u671bJailbreakEval\u80fd\u4fc3\u8fdb\u8d8a\u72f1\u653b\u51fb\u8bc4\u4ef7\u7684\u6807\u51c6\u5316\uff0c\u6210\u4e3a\u793e\u533a\u5185\u8d8a\u72f1\u7814\u7a76\u8bc4\u4f30\u7684\u50ac\u5316\u5242\u3002**|\n", "2406.10229": "|**2024-06-14**|**Quantifying Variance in Evaluation Benchmarks**|Lovish Madaan et.al.|[2406.10229](http://arxiv.org/abs/2406.10229)|null|\u8bc4\u4ef7\u57fa\u51c6\u662f\u8861\u91cf\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u80fd\u529b\u7684\u5173\u952e\uff0c\u4e5f\u662f\u63a8\u52a8\u8fd9\u4e9b\u80fd\u529b\u8fdb\u6b65\u7684\u9a71\u52a8\u529b\u3002\u6700\u521d\u8bbe\u8ba1\u7528\u4e8e\u8bc4\u4f30\u9884\u8bad\u7ec3\u6a21\u578b\u7684\u6027\u80fd\uff08\u6216\u7f3a\u4e4f\uff09\uff0c\u73b0\u5728\u5b83\u4eec\u4e5f\u88ab\u5e7f\u6cdb\u7528\u4e8e\u51b3\u5b9a\u4e0d\u540c\u7684\u8bad\u7ec3\u9009\u62e9\u4e4b\u95f4\u3002\u7136\u800c\uff0c\u5c3d\u7ba1\u88ab\u5e7f\u6cdb\u5e94\u7528\uff0c\u6211\u4eec\u5f88\u5c11\u91cf\u5316\u8bc4\u4ef7\u57fa\u51c6\u7684\u65b9\u5dee\uff0c\u8fd9\u51b3\u5b9a\u4e86\u6027\u80fd\u5dee\u5f02\u7684\u542b\u4e49\u3002\u672c\u6587\u5b9a\u4e49\u5e76\u6d4b\u91cf\u4e86\u4e00\u7cfb\u5217\u65e8\u5728\u8861\u91cf\u8bc4\u4ef7\u57fa\u51c6\u65b9\u5dee\u7684\u6307\u6807\uff0c\u5305\u62ec\u521d\u59cb\u5316\u65f6\u7684\u968f\u673a\u79cd\u5b50\u65b9\u5dee\u548c\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u7684\u5355\u8c03\u6027\u3002\u901a\u8fc7\u5bf9\u5927\u91cf\u6a21\u578b\uff08\u5305\u62ec\u516c\u5f00\u53ef\u7528\u7684\u548c\u4ece\u5934\u8bad\u7ec3\u7684\u6a21\u578b\uff09\u8fdb\u884c\u7814\u7a76\uff0c\u6211\u4eec\u63d0\u4f9b\u4e86\u5404\u79cd\u65b9\u5dee\u5ea6\u91cf\u7684\u5b9e\u8bc1\u4f30\u8ba1\uff0c\u5e76\u4e3a\u5b9e\u8df5\u8005\u63d0\u4f9b\u4e86\u8003\u8651\u548c\u5efa\u8bae\u3002\u6211\u4eec\u8fd8\u8bc4\u4f30\u4e86\u8fde\u7eed\u548c\u79bb\
u6563\u6027\u80fd\u5ea6\u91cf\u7684\u5b9e\u7528\u6027\u548c\u6743\u8861\uff0c\u5e76\u63a2\u7d22\u4e86\u66f4\u597d\u5730\u7406\u89e3\u548c\u51cf\u5c11\u65b9\u5dee\u7684\u65b9\u6cd5\u3002\u6211\u4eec\u53d1\u73b0\uff0c\u5bf9\u4e8e\u8f83\u5c0f\u89c4\u6a21\uff08\u7ea670\u4ebf\u53c2\u6570\uff09\u7684\u6a21\u578b\uff0c\u5c06\u9009\u62e9\u9898\u4efb\u52a1\uff08\u5982MMLU\uff09\u8868\u8ff0\u4e3a\u8865\u5168\u4efb\u52a1\uff0c\u5f80\u5f80\u53ef\u4ee5\u964d\u4f4e\u65b9\u5dee\uff1b\u800c\u53d7\u5230\u4eba\u7c7b\u6d4b\u8bd5\u6587\u732e\u542f\u53d1\u7684\u66f4\u590d\u6742\u65b9\u6cd5\uff08\u5982\u9879\u76ee\u5206\u6790\u548c\u9879\u76ee\u53cd\u5e94\u7406\u8bba\uff09\u5728\u663e\u8457\u51cf\u5c11\u65b9\u5dee\u65b9\u9762\u6548\u679c\u6709\u9650\u3002\u603b\u7684\u6765\u8bf4\uff0c\u6211\u4eec\u7684\u5de5\u4f5c\u63ed\u793a\u4e86\u8bc4\u4ef7\u57fa\u51c6\u7684\u65b9\u5dee\u7279\u6027\uff0c\u63d0\u51fa\u4e86\u9488\u5bf9LLMs\u7684\u7279\u5b9a\u6280\u672f\u6765\u51cf\u5c11\u65b9\u5dee\uff0c\u5e76\u666e\u904d\u9f13\u52b1\u5b9e\u8df5\u8005\u5728\u6bd4\u8f83\u6a21\u578b\u65f6\u4ed4\u7ec6\u8003\u8651\u65b9\u5dee\u56e0\u7d20\u3002|\n", "2406.10218": "|**2024-06-14**|**Semantic Membership Inference Attack against Large Language Models**|Hamid Mozaffari et.al.|[2406.10218](http://arxiv.org/abs/2406.10218)|null|\u6210\u5458\u8eab\u4efd\u6cc4\u9732\u653b\u51fb\uff08Membership Inference Attacks\uff0cMIA\uff09\u7684\u76ee\u6807\u662f\u8bc6\u522b\u7279\u5b9a\u6570\u636e\u70b9\u662f\u5426\u88ab\u7eb3\u5165\u4e86\u76ee\u6807\u6a21\u578b\u7684\u8bad\u7ec3\u96c6\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\u2014\u2014\u8bed\u4e49\u6210\u5458\u8eab\u4efd\u6cc4\u9732\u653b\u51fb\uff08Semantic Membership Inference 
Attack\uff0cSMIA\uff09\uff0c\u901a\u8fc7\u5229\u7528\u8f93\u5165\u7684\u8bed\u4e49\u5185\u5bb9\u53ca\u5176\u6270\u52a8\uff0c\u63d0\u5347MIA\u7684\u6027\u80fd\u3002SMIA\u8bad\u7ec3\u4e00\u4e2a\u795e\u7ecf\u7f51\u7edc\u6765\u5206\u6790\u76ee\u6807\u6a21\u578b\u5bf9\u6270\u52a8\u8f93\u5165\u7684\u884c\u4e3a\uff0c\u4ece\u800c\u6355\u6349\u6210\u5458\u6837\u672c\u4e0e\u975e\u6210\u5458\u6837\u672c\u4e4b\u95f4\u8f93\u51fa\u6982\u7387\u5206\u5e03\u7684\u5dee\u5f02\u3002\u6211\u4eec\u5728Pythia\u548cGPT-Neo\u6a21\u578b\u5bb6\u65cf\uff0c\u4ee5\u53caWikipedia\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u4e86\u5168\u9762\u7684\u8bc4\u4f30\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cSMIA\u660e\u663e\u4f18\u4e8e\u73b0\u6709\u653b\u51fb\u624b\u6bb5\uff0c\u4f8b\u5982\u5728Pythia-12B\u4e0a\u7684AUC-ROC\u503c\u8fbe\u5230\u4e8667.39%\uff0c\u800c\u7b2c\u4e8c\u597d\u7684\u653b\u51fb\u65b9\u6cd5\u4ec5\u4e3a58.90%\u3002|\n", "2406.10216": "|**2024-06-14**|**Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs**|Rui Yang 
et.al.|[2406.10216](http://arxiv.org/abs/2406.10216)|null|\u5728\u5f3a\u5316\u5b66\u4e60\u4ece\u4eba\u7c7b\u53cd\u9988\uff08RLHF\uff09\u6846\u67b6\u4e2d\uff0c\u5229\u7528\u57fa\u4e8e\u4eba\u7c7b\u504f\u597d\u6570\u636e\u7684\u5956\u52b1\u6a21\u578b\u5df2\u8bc1\u5b9e\u80fd\u6709\u6548\u8c03\u6574\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4ee5\u7b26\u5408\u4eba\u7c7b\u610f\u56fe\u3002\u7136\u800c\uff0c\u5f53\u524d\u5956\u52b1\u6a21\u578b\u5bf9\u672a\u89c1\u8fc7\u7684\u63d0\u793a\u548c\u54cd\u5e94\u7684\u6cdb\u5316\u80fd\u529b\u6709\u9650\uff0c\u53ef\u80fd\u5bfc\u81f4\u6240\u8c13\u7684\u8fc7\u5ea6\u4f18\u5316\u95ee\u9898\uff0c\u5373\u5956\u52b1\u4f18\u5316\u8fc7\u5ea6\u5bfc\u81f4\u5b9e\u9645\u6027\u80fd\u4e0b\u964d\u3002\u5c3d\u7ba1\u5148\u524d\u7684\u7814\u7a76\u503e\u5411\u4e8e\u7ea6\u675f\u7b56\u7565\u4f18\u5316\uff0c\u6211\u4eec\u7684\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u65b9\u6cd5\uff0c\u901a\u8fc7\u6b63\u5219\u5316\u9690\u85cf\u72b6\u6001\u6765\u589e\u5f3a\u5956\u52b1\u6a21\u578b\u5e94\u5bf9\u5206\u5e03\u53d8\u5316\u7684\u6cdb\u5316\u80fd\u529b\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u4fdd\u7559\u57fa\u7840\u6a21\u578b\u7684\u8bed\u8a00\u6a21\u578b\u5934\uff0c\u5e76\u7ed3\u5408\u4e00\u7cfb\u5217\u6587\u672c\u751f\u6210\u635f\u5931\uff0c\u65e8\u5728\u4fdd\u6301\u9690\u85cf\u72b6\u6001\u7684\u6587\u672c\u751f\u6210\u80fd\u529b\uff0c\u540c\u65f6\u5728\u76f8\u540c\u7684\u9690\u85cf\u72b6\u6001\u540e\u5b66\u4e60\u4e00\u4e2a\u5956\u52b1\u5934\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u5f15\u5165\u7684\u6b63\u5219\u5316\u6280\u672f\u663e\u8457\u63d0\u9ad8\u4e86\u5728\u5404\u79cd\u6cdb\u5316\u4efb\u52a1\u4e2d\u7684\u5956\u52b1\u6a21\u578b\u51c6\u786e\u6027\uff0c\u5e76\u6709\u6548\u7f13\u89e3\u4e86RLHF\u4e2d\u7684\u8fc7\u5ea6\u4f18\u5316\u95ee\u9898\uff0c\u63d0\u4f9b\u4e86\u4e00\u4e2a\u66f4\u53ef\u9760\u3001\u66f4\u7a33\u5065\u7684\u504f\u597d\u5b66\u4e60\u8303\u5f0f\u3002|\n", "2406.10209": "|**2024-06-14**|**Be like a Goldfish, 
Don't Memorize! Mitigating Memorization in Generative LLMs**|Abhimanyu Hans et.al.|[2406.10209](http://arxiv.org/abs/2406.10209)|**[link](https://github.com/ahans30/goldfish-loss)**|**## \u80cc\u666f \u5927\u578b\u8bed\u8a00\u6a21\u578b\u80fd\u591f\u8bb0\u4f4f\u5e76\u91cd\u590d\u5176\u8bad\u7ec3\u6570\u636e\uff0c\u8fd9\u5e26\u6765\u4e86\u9690\u79c1\u548c\u7248\u6743\u95ee\u9898\u3002\u4e3a\u4e86\u51cf\u8f7b\u8fd9\u79cd\u8bb0\u5fc6\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u5bf9\u4e0b\u4e00\u6b65 token \u8bad\u7ec3\u76ee\u6807\u7684\u5fae\u5999\u4fee\u6539\uff0c\u79f0\u4e3a\u201c\u91d1\u9c7c\u635f\u5931\u201d\u3002\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\uff0c\u968f\u673a\u9009\u62e9\u4e00\u90e8\u5206\u4ee4\u724c\u4e0d\u53c2\u4e0e\u635f\u5931\u8ba1\u7b97\u3002\u6a21\u578b\u4e0d\u4f1a\u8bb0\u4f4f\u8fd9\u4e9b\u88ab\u4e22\u5f03\u7684\u4ee4\u724c\uff0c\u4ece\u800c\u9632\u6b62\u4e86\u5b8c\u6574\u8bad\u7ec3\u5e8f\u5217\u7684\u9010\u5b57\u590d\u5236\u3002\u6211\u4eec\u5728\u6570\u5341\u4ebf\u89c4\u6a21\u7684 Llama-2 \u6a21\u578b\u4e0a\u8fdb\u884c\u4e86\u5927\u91cf\u5b9e\u9a8c\uff0c\u5305\u62ec\u9884\u8bad\u7ec3\u548c\u4ece\u5934\u5f00\u59cb\u8bad\u7ec3\uff0c\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u663e\u8457\u51cf\u5c11\u4e86\u53ef\u63d0\u53d6\u7684\u8bb0\u5fc6\uff0c\u800c\u5bf9\u4e0b\u6e38\u57fa\u51c6\u7684\u5f71\u54cd\u5fae\u4e4e\u5176\u5fae\u3002**|\n", "2406.10196": "|**2024-06-14**|**TRIP-PAL: Travel Planning with Guarantees by Combining Large Language Models and Automated Planners**|Tomas de la Rosa et.al.|[2406.10196](http://arxiv.org/abs/2406.10196)|null|**\u6458\u8981\uff1a** 
\u65c5\u884c\u89c4\u5212\u662f\u4e00\u4e2a\u590d\u6742\u7684\u4efb\u52a1\uff0c\u5b83\u6d89\u53ca\u6839\u636e\u7ea6\u675f\u6761\u4ef6\u751f\u6210\u4e00\u7cfb\u5217\u4e0e\u8bbf\u95ee\u5730\u70b9\u76f8\u5173\u7684\u884c\u52a8\uff0c\u540c\u65f6\u6700\u5927\u5316\u7528\u6237\u7684\u6ee1\u610f\u5ea6\u3002\u4f20\u7edf\u65b9\u6cd5\u901a\u5e38\u4f1a\u5c06\u95ee\u9898\u8f6c\u5316\u4e3a\u7279\u5b9a\u5f62\u5f0f\u7684\u8bed\u8a00\u8868\u8fbe\uff0c\u4ece\u7f51\u7edc\u8d44\u6e90\u4e2d\u63d0\u53d6\u76f8\u5173\u4fe1\u606f\uff0c\u5e76\u4f7f\u7528\u5408\u9002\u7684\u6c42\u89e3\u5668\u6765\u751f\u6210\u6709\u6548\u89e3\u51b3\u65b9\u6848\u3002\u7136\u800c\uff0c\u8fd1\u671f\u7684\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u65b9\u6cd5\u76f4\u63a5\u4ece\u7528\u6237\u8bf7\u6c42\u4e2d\u8f93\u51fa\u8ba1\u5212\uff0c\u5229\u7528\u4e30\u5bcc\u7684\u65c5\u884c\u9886\u57df\u77e5\u8bc6\u63d0\u4f9b\u666f\u70b9\u548c\u53ef\u80fd\u8def\u7ebf\u7b49\u9ad8\u5c42\u6b21\u4fe1\u606f\u3002\u5c3d\u7ba1\u5982\u6b64\uff0c\u5f53\u524d\u6700\u5148\u8fdb\u7684\u6a21\u578b\u5f80\u5f80\u4ea7\u751f\u4e0d\u8fde\u8d2f\u3001\u672a\u80fd\u5b8c\u5168\u6ee1\u8db3\u7ea6\u675f\u7684\u8ba1\u5212\uff0c\u4e14\u65e0\u6cd5\u4fdd\u8bc1\u751f\u6210\u9ad8\u8d28\u91cf\u65b9\u6848\u3002\u6211\u4eec\u63d0\u51faTRIP-PAL\uff0c\u4e00\u79cd\u878d\u5408LLMs\u548c\u81ea\u52a8\u5316\u89c4\u5212\u5668\u7684\u6df7\u5408\u65b9\u6cd5\uff1a\uff081\uff09LLMs\u83b7\u53d6\u5e76\u8f6c\u6362\u65c5\u884c\u4fe1\u606f\u548c\u7528\u6237\u9700\u6c42\uff0c\u5c06\u5176\u8f6c\u5316\u4e3a\u53ef\u8f93\u5165\u89c4\u5212\u5668\u7684\u6570\u636e\u7ed3\u6784\uff1b\uff082\uff09\u81ea\u52a8\u5316\u89c4\u5212\u5668\u8d1f\u8d23\u751f\u6210\u6ee1\u8db3\u7ea6\u675f\u5e76\u4f18\u5316\u7528\u6237\u6548\u7528\u7684\u65c5\u884c\u8ba1\u5212\u3002\u6211\u4eec\u5728\u4e0d\u540c\u65c5\u884c\u573a\u666f\u4e2d\u7684\u5b9e\u9a8c\u8868\u660e\uff0cTRIP-PAL\u5728\u751f\u6210\u65c5\u884c\u8ba1\u5212\u65b9\u9762\u4f18\u4e8e\u7eafLLM\u65b9\u6cd5\u3002|\n", 
"2406.10185": "|**2024-06-14**|**Detecting and Evaluating Medical Hallucinations in Large Vision Language Models**|Jiawei Chen et.al.|[2406.10185](http://arxiv.org/abs/2406.10185)|null|\u968f\u7740\u5927\u578b\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08LVLM\uff09\u5728\u533b\u7597\u9886\u57df\u7684\u5e94\u7528\u65e5\u76ca\u589e\u957f\uff0c\u5982\u533b\u5b66\u56fe\u50cf\u95ee\u7b54\u548c\u62a5\u544a\u751f\u6210\uff0c\u5b83\u4eec\u4ece\u57fa\u7840\u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u90a3\u91cc\u7ee7\u627f\u4e86\u5f3a\u5927\u7684\u529f\u80fd\uff0c\u4f46\u540c\u65f6\u4e5f\u5e26\u6765\u4e86\u4ee4\u4eba\u62c5\u5fe7\u7684\u5e7b\u89c9\u95ee\u9898\uff0c\u8fd9\u5728\u533b\u7597\u8fd9\u6837\u5bf9\u9519\u8bef\u5bb9\u9650\u6781\u4f4e\u7684\u73af\u5883\u4e2d\u5c24\u4e3a\u91cd\u8981\u3002\u7136\u800c\uff0c\u76ee\u524d\u5c1a\u65e0\u4e13\u95e8\u9488\u5bf9\u533b\u7597\u9886\u57df\u7684\u5e7b\u89c9\u68c0\u6d4b\u548c\u8bc4\u4f30\u65b9\u6cd5\u6216\u57fa\u51c6\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u6211\u4eec\u63a8\u51fa\u4e86Med-HallMark\uff0c\u8fd9\u662f\u9996\u4e2a\u4e13\u4e3a\u533b\u7597\u591a\u6a21\u6001\u9886\u57df\u8bbe\u8ba1\u7684\u5e7b\u89c9\u68c0\u6d4b\u548c\u8bc4\u4f30\u57fa\u51c6\u3002Med-HallMark\u652f\u6301\u591a\u4efb\u52a1\u5e7b\u89c9\u68c0\u6d4b\uff0c\u63d0\u4f9b\u591a\u5143\u5316\u7684\u5e7b\u89c9\u6570\u636e\uff0c\u5e76\u91c7\u7528\u5206\u7ea7\u5e7b\u89c9\u5206\u7c7b\u3002\u6b64\u5916\uff0c\u6211\u4eec\u63d0\u51fa\u4e86MediHall 
Score\uff0c\u8fd9\u662f\u4e00\u79cd\u65b0\u7684\u533b\u7597\u8bc4\u4f30\u6307\u6807\uff0c\u901a\u8fc7\u5206\u5c42\u8bc4\u5206\u7cfb\u7edf\u8bc4\u4f30LVLM\u7684\u5e7b\u89c9\uff0c\u8003\u8651\u5176\u4e25\u91cd\u7a0b\u5ea6\u548c\u7c7b\u578b\uff0c\u4ece\u800c\u5b9e\u73b0\u5bf9\u6f5c\u5728\u4e34\u5e8a\u5f71\u54cd\u7684\u7ec6\u81f4\u8bc4\u4f30\u3002\u6211\u4eec\u8fd8\u5c55\u793a\u4e86MediHallDetector\uff0c\u4e00\u79cd\u4e13\u4e3a\u7cbe\u786e\u5e7b\u89c9\u68c0\u6d4b\u8bbe\u8ba1\u7684\u533b\u7597LVLM\uff0c\u5b83\u91c7\u7528\u4e86\u591a\u4efb\u52a1\u8bad\u7ec3\u65b9\u6cd5\u3002\u901a\u8fc7\u5e7f\u6cdb\u7684\u5b9e\u9a8c\uff0c\u6211\u4eec\u5728\u6211\u4eec\u7684\u57fa\u51c6\u4e0a\u4e3a\u6d41\u884c\u7684LVLM\u8bbe\u7acb\u4e86\u57fa\u7ebf\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cMediHall Score\u63d0\u4f9b\u4e86\u6bd4\u4f20\u7edf\u6307\u6807\u66f4\u6df1\u5165\u7406\u89e3\u5e7b\u89c9\u5f71\u54cd\u7684\u80fd\u529b\uff0c\u5e76\u663e\u793a\u4e86MediHallDetector\u7684\u63d0\u5347\u6027\u80fd\u3002\u6211\u4eec\u671f\u671b\u8fd9\u9879\u5de5\u4f5c\u80fd\u663e\u8457\u63d0\u9ad8LVLM\u5728\u533b\u7597\u5e94\u7528\u4e2d\u7684\u53ef\u9760\u6027\u3002\u6240\u6709\u76f8\u5173\u8d44\u6e90\u5c06\u5728\u4e0d\u4e45\u540e\u53d1\u5e03\u3002|\n", "2406.10181": "|**2024-06-14**|**Practical offloading for fine-tuning LLM on commodity GPU via learned subspace projectors**|Siyuan Chen 
et.al.|[2406.10181](http://arxiv.org/abs/2406.10181)|null|\u5728\u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5fae\u8c03\u8fc7\u7a0b\u4e2d\uff0c\u7531\u4e8e\u5185\u5b58\u9700\u6c42\u901a\u5e38\u8d85\u8fc7\u5355\u4e2aGPU\u7684\u5bb9\u91cf\uff0c\u89e3\u51b3\u8fd9\u4e00\u5185\u5b58\u6311\u6218\u7684\u4e00\u4e2a\u5e38\u89c1\u65b9\u6cd5\u662f\u5c06\u8ba1\u7b97\u548c\u6570\u636e\u4eceGPU\u8fc1\u79fb\u5230CPU\u3002\u7136\u800c\uff0c\u8fd9\u53d7\u5230\u666e\u901a\u786c\u4ef6\u5e26\u5bbd\u9650\u5236\u7684\u5236\u7ea6\uff0c\u5f71\u54cd\u4e86CPU\u4e0eGPU\u4e4b\u95f4\u7684\u901a\u4fe1\u6548\u7387\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aLSP_Offload\u7684\u6846\u67b6\uff0c\u901a\u8fc7\u5b66\u4e60\u5f0f\u7684\u5b50\u7a7a\u95f4\u6295\u5f71\u5668\uff0c\u5b9e\u73b0\u5728 commodity \u786c\u4ef6\u4e0a\u63a5\u8fd1\u539f\u751f\u901f\u5ea6\u7684\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\u5fae\u8c03\u3002\u6211\u4eec\u7684\u6570\u636e\u9a71\u52a8\u65b9\u6cd5\u6d89\u53ca\u5b66\u4e60\u4e00\u4e2a\u9ad8\u6548\u7684\u7a00\u758f\u538b\u7f29\u5668\uff0c\u4ee5\u6700\u5c0f\u5316\u901a\u4fe1\u5e76\u4fdd\u6301\u6700\u5c0f\u7cbe\u5ea6\u635f\u5931\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u5c42\u7ea7\u901a\u4fe1\u8c03\u5ea6\u7b56\u7565\uff0c\u4ee5\u6700\u5927\u5316\u901a\u4fe1\u4e0e\u8ba1\u7b97\u4e4b\u95f4\u7684\u5e76\u884c\u6027\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u7684\u6846\u67b6\u80fd\u591f\u57284GB\u7b14\u8bb0\u672cGPU\u4e0a\u5fae\u8c0313\u4ebf\u53c2\u6570\u7684\u6a21\u578b\uff0c\u5728\u914d\u590724GB\u5185\u5b58\u7684NVIDIA RTX 4090 
GPU\u4e0a\u5fae\u8c0370\u4ebf\u53c2\u6570\u7684\u6a21\u578b\uff0c\u4ec5\u6bd4\u65e0\u5185\u5b58\u9650\u5236\u7684\u5fae\u8c03\u616231%\u3002\u4e0e\u6700\u5148\u8fdb\u7684\u79bb\u7ebf\u6846\u67b6\u76f8\u6bd4\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u63d0\u9ad8\u4e86\u5fae\u8c03\u541e\u5410\u91cf\uff0c\u6700\u9ad8\u53ef\u8fbe3.33\u500d\uff0c\u5f53\u8fbe\u5230\u76f8\u540c\u51c6\u786e\u5ea6\u65f6\uff0c\u51cf\u5c11\u4e86\u7aef\u5230\u7aef\u5fae\u8c03\u65f6\u95f4\u768433.1%\u81f362.5%\u3002|\n", "2406.10172": "|**2024-06-14**|**Datasets for Multilingual Answer Sentence Selection**|Matteo Gabburo et.al.|[2406.10172](http://arxiv.org/abs/2406.10172)|null|**\u6458\u8981\uff1a** \u5728\u8bbe\u8ba1\u9ad8\u6548\u7684\u68c0\u7d22\u5f0f\u95ee\u7b54\uff08Question Answering\uff0cQA\uff09\u7cfb\u7edf\u4e2d\uff0c\u7b54\u6848\u53e5\u5b50\u9009\u62e9\uff08Answer Sentence Selection\uff0cAS2\uff09\u662f\u4e00\u4e2a\u5173\u952e\u4efb\u52a1\u3002\u7136\u800c\uff0c\u7531\u4e8e\u7f3a\u4e4f\u6807\u6ce8\u6570\u636e\uff0c\u5927\u591a\u6570AS2\u9886\u57df\u7684\u8fdb\u5c55\u4e3b\u8981\u96c6\u4e2d\u5728\u82f1\u8bed\u4e0a\u3002\u8fd9\u5bfc\u81f4\u4e86\u975e\u82f1\u8bed\u73af\u5883\u4e0bQA\u7cfb\u7edf\u7684\u6027\u80fd\u4e0e\u82f1\u8bed\u7cfb\u7edf\u4e4b\u95f4\u7684\u5dee\u8ddd\u3002\u672c\u8bba\u6587\u9488\u5bf9\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u65b0\u7684\u9ad8\u8d28\u91cf\u591a\u8bed\u8a00\uff08\u6cd5\u8bed\u3001\u5fb7\u8bed\u3001\u610f\u5927\u5229\u8bed\u3001\u8461\u8404\u7259\u8bed\u548c\u897f\u73ed\u7259\u8bed\uff09AS2\u6570\u636e\u96c6\uff0c\u901a\u8fc7\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08Large Language Model\uff0cLLM\uff09\u5bf9\u73b0\u6709\u7684\u82f1\u6587AS2\u6570\u636e\u96c6\uff08\u5982ASNQ\u3001WikiQA\u548cTREC-QA\uff09\u8fdb\u884c\u76d1\u7763\u81ea\u52a8\u673a\u5668\u7ffb\u8bd1\uff08Automatic Machine 
Translation\uff0cAMT\uff09\u3002\u6211\u4eec\u901a\u8fc7\u591a\u79cd\u5b9e\u9a8c\u548c\u4e0d\u540cTransformer\u67b6\u6784\u7684\u8bc4\u4f30\uff0c\u9a8c\u8bc1\u4e86\u6211\u4eec\u7684\u65b9\u6cd5\u4ee5\u53ca\u7ffb\u8bd1\u6570\u636e\u96c6\u7684\u8d28\u91cf\u3002\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u6570\u636e\u96c6\u5bf9\u4e8e\u6784\u5efa\u5065\u58ee\u7684\u591a\u8bed\u8a00AS2\u6a21\u578b\u81f3\u5173\u91cd\u8981\uff0c\u663e\u8457\u7f29\u5c0f\u4e86\u975e\u82f1\u8bed\u4e0e\u82f1\u8bed\u73af\u5883\u4e0b\u7684\u6027\u80fd\u5dee\u8ddd\u3002|\n", "2406.10162": "|**2024-06-14**|**Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models**|Carson Denison et.al.|[2406.10162](http://arxiv.org/abs/2406.10162)|**[link](https://github.com/anthropics/sycophancy-to-subterfuge-paper)**|**\u5728\u5f3a\u5316\u5b66\u4e60\u4e2d\uff0c\u5f53\u4eba\u5de5\u667a\u80fd\u7cfb\u7edf\u5b66\u4f1a\u56e0\u8bad\u7ec3\u76ee\u6807\u4e0d\u660e\u786e\u800c\u83b7\u5f97\u4e0d\u671f\u671b\u7684\u884c\u4e3a\u65f6\uff0c\u5c31\u4f1a\u51fa\u73b0\u89c4\u683c\u6e38\u620f\u73b0\u8c61\u3002\u8fd9\u79cd\u884c\u4e3a\u53ef\u80fd\u4ece\u7b80\u5355\u7684\u5949\u627f\u884c\u4e3a\u53d1\u5c55\u5230\u66f4\u590d\u6742\u4e14\u5371\u9669\u7684\u5956\u52b1\u7be1\u6539\uff0c\u5373\u6a21\u578b\u76f4\u63a5\u4fee\u6539\u5176\u81ea\u8eab\u7684\u5956\u52b1\u673a\u5236\u3002\u7136\u800c\uff0c\u53d1\u73b0\u8fd9\u4e9b\u590d\u6742\u884c\u4e3a\u53ef\u80fd\u8d85\u51fa\u63a2\u7d22\u7684\u8303\u7574\u3002\u672c\u8bba\u6587\u63a2\u8ba8\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u662f\u5426\u4f1a\u5728\u5b66\u4e60\u5e38\u89c1\u89c4\u683c\u6e38\u620f\u7b56\u7565\u540e\uff0c\u6cdb\u5316\u5230\u6267\u884c\u66f4\u4e3a\u7f55\u89c1\u548c\u660e\u663e\u7684\u884c\u4e3a\uff0c\u5305\u62ec\u5956\u52b1\u7be1\u6539\u3002\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u9010\u6b65\u5347\u7ea7\u7684\u53ef\u6e38\u620f\u73af\u5883\u7cfb\u5217\uff0c\u5e76\u53d1\u73b0\u9488\u5bf9\u65e9\u671f\u9636\u6bb5\u73af\u5883\u7684\u8b
ad\u7ec3\u4f1a\u5bfc\u81f4\u5728\u540e\u7eed\u73af\u5883\u4e2d\u51fa\u73b0\u66f4\u591a\u7684\u89c4\u683c\u6e38\u620f\u3002\u4ee4\u4eba\u60ca\u8bb6\u7684\u662f\uff0c\u4e00\u5c0f\u90e8\u5206\u4f46\u975e\u96f6\u7684LLMs\uff0c\u5728\u7ecf\u5386\u4e86\u5b8c\u6574\u8bad\u7ec3\u8bfe\u7a0b\u540e\uff0c\u80fd\u591f\u96f6\u6837\u672c\u5730\u76f4\u63a5\u4fee\u6539\u5176\u5956\u52b1\u51fd\u6570\u3002\u91cd\u65b0\u8bad\u7ec3LLMs\u4ee5\u907f\u514d\u65e9\u671f\u9636\u6bb5\u7684\u6e38\u620f\u884c\u4e3a\u53ef\u4ee5\u51cf\u8f7b\u4f46\u4e0d\u80fd\u5b8c\u5168\u6d88\u9664\u540e\u671f\u73af\u5883\u4e2d\u7684\u5956\u52b1\u7be1\u6539\u3002\u6b64\u5916\uff0c\u5bf9\u53ef\u6e38\u620f\u73af\u5883\u8fdb\u884c\u65e0\u5bb3\u6027\u8bad\u7ec3\u5e76\u4e0d\u80fd\u963b\u6b62\u5956\u52b1\u7be1\u6539\u3002\u8fd9\u4e9b\u7ed3\u679c\u8868\u660e\uff0cLLMs\u80fd\u591f\u4ece\u5e38\u89c1\u7684\u89c4\u683c\u6e38\u620f\u7b56\u7565\u4e2d\u6cdb\u5316\u5230\u66f4\u6076\u52a3\u7684\u5956\u52b1\u7be1\u6539\u884c\u4e3a\uff0c\u5e76\u4e14\u8981\u6d88\u9664\u8fd9\u79cd\u884c\u4e3a\u53ef\u80fd\u5e76\u975e\u6613\u4e8b\u3002**|\n", "2406.10149": "|**2024-06-14**|**BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack**|Yuri Kuratov 
et.al.|[2406.10149](http://arxiv.org/abs/2406.10149)|**[link](https://github.com/booydar/babilong)**|\u8fd1\u5e74\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u8f93\u5165\u4e0a\u4e0b\u6587\u957f\u5ea6\u663e\u8457\u589e\u52a0\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u8bc4\u4f30\u65b9\u6cd5\u672a\u80fd\u5145\u5206\u8861\u91cf\u6a21\u578b\u5904\u7406\u957f\u7bc7\u6587\u672c\u4e2d\u7684\u4e8b\u5b9e\u63a8\u7406\u80fd\u529b\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86BABILong\u57fa\u51c6\u6d4b\u8bd5\uff0c\u65e8\u5728\u6d4b\u8bd5\u6a21\u578b\u5728\u5206\u5e03\u5f0f\u957f\u6587\u6863\u4e2d\u8de8\u4e8b\u5b9e\u63a8\u7406\u7684\u80fd\u529b\u3002BABILong\u5305\u62ec20\u4e2a\u591a\u6837\u5316\u7684\u63a8\u7406\u4efb\u52a1\uff0c\u5982\u4e8b\u5b9e\u94fe\u3001\u7b80\u5355\u5f52\u7eb3\u3001\u6f14\u7ece\u3001\u8ba1\u6570\u4ee5\u53ca\u5904\u7406\u5217\u8868/\u96c6\u5408\u7b49\u3002\u8fd9\u4e9b\u4efb\u52a1\u672c\u8eab\u5c31\u5177\u6709\u6311\u6218\u6027\uff0c\u800c\u5f53\u6240\u9700\u4e8b\u5b9e\u5206\u6563\u5728\u957f\u7bc7\u81ea\u7136\u6587\u672c\u4e2d\u65f6\uff0c\u96be\u5ea6\u8fdb\u4e00\u6b65\u63d0\u5347\u3002\u6211\u4eec\u7684\u8bc4\u4f30\u663e\u793a\uff0c\u6d41\u884c\u7684LLMs\u5b9e\u9645\u4e0a\u53ea\u5229\u7528\u4e8610%-20%\u7684\u4e0a\u4e0b\u6587\u4fe1\u606f\uff0c\u4e14\u968f\u7740\u63a8\u7406\u590d\u6742\u6027\u7684\u63d0\u9ad8\uff0c\u6027\u80fd\u6025\u5267\u4e0b\u964d\u3002\u5bf9\u4e8e\u66ff\u4ee3\u7684\u4e0a\u4e0b\u6587\u63a8\u7406\u65b9\u6cd5\uff0c\u68c0\u7d22\u589e\u5f3a\u751f\u6210\u7b56\u7565\u5728\u5355\u4e8b\u5b9e\u95ee\u9898\u56de\u7b54\u4e0a\u7684\u51c6\u786e\u7387\u4ec5\u4e3a60%\uff0c\u4e0e\u4e0a\u4e0b\u6587\u957f\u5ea6\u65e0\u5173\u3002\u5728\u4e0a\u4e0b\u6587\u6269\u5c55\u65b9\u6cd5\u4e2d\uff0c\u5faa\u73af\u8bb0\u5fc6Transformer\u5c55\u73b0\u51fa\u6700\u9ad8\u6027\u80fd\uff0c\u53ef\u5904\u7406\u957f\u8fbe1100\u4e07\u4e2a\u4ee4\u724c\u7684\u957f\u5ea6\u3002BABILong\u57fa\u51c6\u6d4b\u8bd5\u53ef\u4ee5\u6269\u5c55\u5230\u4efb\u610f\u9
57f\u5ea6\uff0c\u4ee5\u652f\u6301\u8bc4\u4f30\u5177\u6709\u66f4\u5f3a\u80fd\u529b\u7684\u65b0\u6a21\u578b\uff0c\u5e76\u63d0\u4f9b\u4e86\u957f\u8fbe100\u4e07\u4ee4\u724c\u7684\u5206\u9694\u3002|\n", "2406.11840": "|**2024-06-17**|**LLaNA: Large Language and NeRF Assistant**|Andrea Amaduzzi et.al.|[2406.11840](http://arxiv.org/abs/2406.11840)|null|\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\u5728\u7406\u89e3\u548c\u5904\u7406\u56fe\u50cf\u548c3D\u6570\u636e\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5b83\u4eec\u5728\u5168\u9762\u6355\u6349\u7269\u4f53\u7684\u5916\u89c2\u548c\u51e0\u4f55\u7279\u6027\u4e0a\u5b58\u5728\u5c40\u9650\u3002\u8fd1\u671f\uff0c\u795e\u7ecf\u8f90\u5c04\u573a\uff08Neural Radiance Fields\uff0c\u7b80\u79f0NeRF\uff09\u4f5c\u4e3a\u4e00\u79cd\u65b0\u5174\u7684\u8868\u793a\u65b9\u5f0f\uff0c\u901a\u8fc7\u4e00\u4e2a\u7b80\u5355\u7684\u591a\u5c42\u611f\u77e5\u5668\uff08Multi-Layer Perceptron\uff0cMLP\uff09\u7684\u6743\u91cd\u7f16\u7801\u4e86\u7269\u4f53\u7684\u51e0\u4f55\u7ed3\u6784\u548c\u9ad8\u5ea6\u903c\u771f\u7684\u5916\u89c2\uff0c\u5f15\u8d77\u4e86\u5e7f\u6cdb\u5173\u6ce8\u3002\u672c\u6587\u63a2\u8ba8\u4e86\u5c06NeRF\u6574\u5408\u5230MLLM\u4e2d\u7684\u53ef\u884c\u6027\u548c\u6548\u679c\u3002\u6211\u4eec\u5f00\u53d1\u4e86LLaNA\uff0c\u8fd9\u662f\u9996\u4e2a\u901a\u7528\u7684NeRF-\u8bed\u8a00\u52a9\u624b\uff0c\u80fd\u591f\u6267\u884c\u65b0\u4efb\u52a1\uff0c\u5982NeRF\u63cf\u8ff0\u548c\u95ee\u7b54\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u76f4\u63a5\u5904\u7406NeRF 
MLP\u7684\u6743\u91cd\uff0c\u65e0\u9700\u6e32\u67d3\u56fe\u50cf\u6216\u6784\u5efa3D\u6570\u636e\u7ed3\u6784\uff0c\u5c31\u80fd\u63d0\u53d6\u6709\u5173\u4ee3\u8868\u5bf9\u8c61\u7684\u4fe1\u606f\u3002\u6b64\u5916\uff0c\u6211\u4eec\u521b\u5efa\u4e86\u4e00\u4e2a\u65e0\u987b\u4eba\u5de5\u5e72\u9884\u7684NeRF\u6587\u672c\u6807\u6ce8\u6570\u636e\u96c6\uff0c\u7528\u4e8e\u5404\u79cdNeRF-\u8bed\u8a00\u4efb\u52a1\uff0c\u5e76\u636e\u6b64\u5efa\u7acb\u4e86\u4e00\u4e2a\u8bc4\u4f30\u65b9\u6cd5\u6765\u8861\u91cf\u6211\u4eec\u7684\u6a21\u578b\u5bf9NeRF\u7406\u89e3\u80fd\u529b\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u5904\u7406NeRF\u6743\u91cd\u7684\u65b9\u6cd5\u5728\u4e0e\u4eceNeRF\u4e2d\u63d0\u53d62D\u62163D\u8868\u793a\u8fdb\u884c\u6bd4\u8f83\u65f6\u8868\u73b0\u66f4\u4f18\u3002|\n", "2406.11839": "|**2024-06-17**|**mDPO: Conditional Preference Optimization for Multimodal Large Language Models**|Fei Wang et.al.|[2406.11839](http://arxiv.org/abs/2406.11839)|null|### \u80cc\u666f \u76f4\u63a5\u504f\u597d\u4f18\u5316\uff08DPO\uff09\u5df2\u88ab\u8bc1\u660e\u662f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u6821\u51c6\u7684\u6709\u6548\u624b\u6bb5\u3002\u6700\u8fd1\u7684\u7814\u7a76\u5c1d\u8bd5\u5c06DPO\u5e94\u7528\u4e8e\u591a\u6a21\u6001\u573a\u666f\uff0c\u4f46\u53d1\u73b0\u5b9e\u73b0\u6301\u7eed\u6539\u8fdb\u9887\u5177\u6311\u6218\u3002\u901a\u8fc7\u5bf9\u6bd4\u5b9e\u9a8c\uff0c\u6211\u4eec\u53d1\u73b0\u4e86\u591a\u6a21\u6001\u504f\u597d\u4f18\u5316\u4e2d\u7684\u65e0\u6761\u4ef6\u504f\u597d\u95ee\u9898\uff0c\u5373\u6a21\u578b\u5ffd\u89c6\u4e86\u56fe\u50cf\u6761\u4ef6\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86mDPO\uff0c\u4e00\u4e2a\u65e8\u5728\u9632\u6b62\u8bed\u8a00\u504f\u597d\u8fc7\u5ea6\u4f18\u5148\u7684\u591a\u6a21\u6001DPO\u76ee\u6807\uff0c\u540c\u65f6\u4f18\u5316\u56fe\u50cf\u504f\u597d\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u5956\u52b1\u951a\u70b9\uff0c\u786e\u4fdd\u9009\u62e9\u7684\u54cd\u5e94\u5956\u52b1\u4f
dd\u6301\u6b63\u5411\uff0c\u4ece\u800c\u907f\u514d\u76f8\u5bf9\u504f\u597d\u4f18\u5316\u56fa\u6709\u7684\u53ef\u80fd\u6027\u964d\u4f4e\u95ee\u9898\u3002 ### \u4efb\u52a1 \u6211\u4eec\u5728\u4e24\u4e2a\u4e0d\u540c\u89c4\u6a21\u7684\u591a\u6a21\u6001LLM\u4ee5\u53ca\u4e09\u4e2a\u5e38\u7528\u57fa\u51c6\u4e0a\u8fdb\u884c\u4e86\u5b9e\u9a8c\uff0c\u7ed3\u679c\u663e\u793a\uff0cmDPO\u6709\u6548\u89e3\u51b3\u4e86\u591a\u6a21\u6001\u504f\u597d\u4f18\u5316\u4e2d\u7684\u65e0\u6761\u4ef6\u504f\u597d\u95ee\u9898\uff0c\u5e76\u663e\u8457\u63d0\u9ad8\u4e86\u6a21\u578b\u6027\u80fd\uff0c\u7279\u522b\u662f\u5728\u51cf\u5c11\u5e7b\u89c9\u65b9\u9762\u3002|\n", "2406.11832": "|**2024-06-17**|**Unveiling Encoder-Free Vision-Language Models**|Haiwen Diao et.al.|[2406.11832](http://arxiv.org/abs/2406.11832)|**[link](https://github.com/baaivision/eve)**|**\u5f53\u524d\u7684\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08VLM\uff09\u4e3b\u8981\u4f9d\u8d56\u4e8e\u89c6\u89c9\u7f16\u7801\u5668\u6765\u63d0\u53d6\u89c6\u89c9\u7279\u5f81\uff0c\u7136\u540e\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5904\u7406\u89c6\u89c9\u8bed\u8a00\u4efb\u52a1\u3002\u7136\u800c\uff0c\u89c6\u89c9\u7f16\u7801\u5668\u5728\u62bd\u8c61\u89c6\u89c9\u8868\u793a\u65b9\u9762\u8bbe\u5b9a\u4e86\u5f3a\u70c8\u7684\u5148\u9a8c\uff0c\u5982\u5206\u8fa8\u7387\u3001\u6bd4\u4f8b\u548c\u8bed\u4e49\u503e\u5411\uff0c\u8fd9\u53ef\u80fd\u9650\u5236\u4e86VLM\u7684\u7075\u6d3b\u6027\u548c\u6548\u7387\u3002\u76f4\u63a5\u8bad\u7ec3\u65e0\u7f16\u7801\u5668\u7684\u7eafVLM\u4ecd\u7136\u5177\u6709\u6311\u6218\u6027\uff0c\u4e14\u9c9c\u6709\u63a2\u7d22\u3002\u5b9e\u8bc1\u7814\u7a76\u663e\u793a\uff0c\u8fd9\u79cd\u76f4\u63a5\u8bad\u7ec3\u65b9\u6cd5\u4f1a\u5bfc\u81f4\u6536\u655b\u7f13\u6162\u548c\u6027\u80fd\u5dee\u8ddd\u8f83\u5927\u3002\u672c\u6587\u65e8\u5728\u5f25\u5408\u7f16\u7801\u5668\u4f9d\u8d56\u578b\u548c\u65e0\u7f16\u7801\u5668\u6a21\u578b\u4e4b\u95f4\u7684\u5dee\u8ddd\uff0c\u63d0\u51fa\u4e86\u4e00\u79cd\u7b80\u5355\u800c\u6709\u
6548\u7684\u7eafVLM\u8bad\u7ec3\u7b56\u7565\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u901a\u8fc7\u6df1\u5165\u5b9e\u9a8c\u63ed\u793a\u4e86\u9ad8\u6548\u8bad\u7ec3\u65e0\u7f16\u7801\u5668VLM\u7684\u5173\u952e\u8981\u7d20\uff1a\uff081\uff09\u5728\u7edf\u4e00\u7684\u89e3\u7801\u5668\u5185\u878d\u5408\u89c6\u89c9\u4e0e\u8bed\u8a00\u8868\u793a\uff1b\uff082\uff09\u901a\u8fc7\u989d\u5916\u76d1\u7763\u63d0\u5347\u89c6\u89c9\u8bc6\u522b\u80fd\u529b\u3002\u57fa\u4e8e\u8fd9\u4e9b\u7b56\u7565\uff0c\u6211\u4eec\u5f00\u53d1\u4e86EVE\uff0c\u4e00\u4e2a\u65e0\u7f16\u7801\u5668\u7684\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff0c\u65e2\u80fd\u9ad8\u6548\u8bad\u7ec3\u4e5f\u80fd\u5feb\u901f\u63a8\u7406\u3002\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u4ec5\u4f7f\u75283500\u4e07\u516c\u5f00\u53ef\u7528\u7684\u6570\u636e\uff0cEVE\u5c31\u80fd\u5728\u591a\u4e2a\u89c6\u89c9\u8bed\u8a00\u57fa\u51c6\u4e0a\u4e0e\u7c7b\u4f3c\u5bb9\u91cf\u7684\u7f16\u7801\u5668\u4f9d\u8d56\u578bVLM\u5339\u654c\uff0c\u751a\u81f3\u8d85\u8d8a\u4e86\u8bad\u7ec3\u8fc7\u7a0b\u795e\u79d8\u3001\u6570\u636e\u672a\u516c\u5f00\u7684Fuyu-8B\u6a21\u578b\u3002\u6211\u4eec\u76f8\u4fe1\uff0cEVE\u4e3a\u8de8\u6a21\u6001\u5f00\u53d1\u7eaf\u7cb9\u7684\u89e3\u7801\u5668\u67b6\u6784\u63d0\u4f9b\u4e86\u4e00\u4e2a\u900f\u660e\u4e14\u9ad8\u6548\u7684\u8def\u5f84\u3002\u6211\u4eec\u7684\u4ee3\u7801\u548c\u6a21\u578b\u5df2\u516c\u5f00\u5728\uff1ahttps://github.com/baaivision/EVE\u3002**|\n", "2406.11831": "|**2024-06-17**|**Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models**|Bingqi Ma 
et.al.|[2406.11831](http://arxiv.org/abs/2406.11831)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u57fa\u4e8e\u89e3\u7801\u5668-only\u53d8\u538b\u5668\u5728\u6587\u672c\u7406\u89e3\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5982\u4f55\u5c06\u8fd9\u4e9b\u5148\u8fdb\u7684LLMs\u5e94\u7528\u4e8e\u6587\u672c\u5230\u56fe\u50cf\u7684\u6269\u6563\u6a21\u578b\u4ecd\u662f\u4e00\u4e2a\u5f85\u63a2\u7d22\u7684\u95ee\u9898\u3002\u6211\u4eec\u53d1\u73b0\u76f4\u63a5\u4f7f\u7528LLM\u4f5c\u4e3a\u63d0\u793a\u7f16\u7801\u5668\u4f1a\u663e\u8457\u964d\u4f4e\u751f\u6210\u56fe\u50cf\u65f6\u7684\u63d0\u793a\u8ddf\u968f\u80fd\u529b\u3002\u4e3b\u8981\u5b58\u5728\u4e24\u4e2a\u95ee\u9898\uff1a\u4e00\u662fLLM\u7684\u4e0b\u4e00\u4e2a\u8bcd\u9884\u6d4b\u8bad\u7ec3\u4e0e\u6269\u6563\u6a21\u578b\u5bf9\u533a\u5206\u6027\u63d0\u793a\u7279\u5f81\u7684\u9700\u6c42\u4e0d\u5339\u914d\uff1b\u4e8c\u662f\u89e3\u7801\u5668\u67b6\u6784\u56fa\u6709\u7684\u4f4d\u7f6e\u504f\u89c1\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u6846\u67b6\uff0c\u901a\u8fc7\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u4f7f\u7528\u6307\u5357\uff0c\u589e\u5f3aLLM\u7684\u6587\u672c\u8868\u793a\u80fd\u529b\uff0c\u6d88\u9664\u5176\u5185\u5728\u7684\u5b9a\u4f4d\u504f\u89c1\uff0c\u4ece\u800c\u7075\u6d3b\u5730\u5c06\u6700\u5148\u8fdb\u7684LLMs\u878d\u5165\u6587\u672c\u5230\u56fe\u50cf\u751f\u6210\u6a21\u578b\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63d0\u4f9b\u4e86\u4e00\u79cd\u878d\u5408\u591a\u4e2aLLMs\u7684\u65b9\u6cd5\u3002\u9274\u4e8eTransformer\u67b6\u6784\u7684\u5353\u8d8a\u6027\u80fd\u548c\u6269\u5c55\u80fd\u529b\uff0c\u6211\u4eec\u8fdb\u4e00\u6b65\u8bbe\u8ba1\u4e86\u57fa\u4e8e\u8be5\u6846\u67b6\u7684LLM-Infused Diffusion 
Transformer\uff08LI-DiT\uff09\u3002\u6211\u4eec\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u5b9e\u9a8c\uff0c\u9a8c\u8bc1\u4e86LI-DiT\u5728\u4e0d\u540c\u6a21\u578b\u89c4\u6a21\u548c\u6570\u636e\u91cf\u4e0b\u7684\u6027\u80fd\u3002\u5f97\u76ca\u4e8eLLMs\u7684\u5185\u5728\u80fd\u529b\u53ca\u6211\u4eec\u7684\u521b\u65b0\u8bbe\u8ba1\uff0cLI-DiT\u7684\u63d0\u793a\u7406\u89e3\u6027\u80fd\u8f7b\u677e\u8d85\u8d8a\u5f00\u6e90\u7684\u6700\u65b0\u6a21\u578b\uff0c\u4ee5\u53ca\u5305\u62ecStable Diffusion 3\u3001DALL-E 3\u548cMidjourney V6\u5728\u5185\u7684\u4e3b\u6d41\u95ed\u6e90\u5546\u4e1a\u6a21\u578b\u3002\u5f3a\u5927\u7684LI-DiT-10B\u5c06\u5728\u8fdb\u4e00\u6b65\u4f18\u5316\u548c\u5b89\u5168\u68c0\u67e5\u540e\u63d0\u4f9b\u3002|\n", "2406.11827": "|**2024-06-17**|**WPO: Enhancing RLHF with Weighted Preference Optimization**|Wenxuan Zhou et.al.|[2406.11827](http://arxiv.org/abs/2406.11827)|**[link](https://github.com/wzhouad/wpo)**|**\u5f3a\u5316\u5b66\u4e60\u4ece\u4eba\u7c7b\u53cd\u9988\uff08RLHF\uff09\u662f\u8c03\u6574\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4ee5\u66f4\u597d\u5730\u7b26\u5408\u4eba\u7c7b\u4ef7\u503c\u89c2\u7684\u6709\u524d\u666f\u65b9\u6cd5\u3002\u7531\u4e8e\u6210\u672c\u6548\u76ca\u548c\u53ef\u6269\u5c55\u6027\uff0c\u79bb\u7ebf\u504f\u597d\u4f18\u5316\u2014\u2014\u901a\u8fc7\u5176\u4ed6\u6a21\u578b\u83b7\u53d6\u504f\u597d\u6570\u636e\u2014\u2014\u88ab\u5e7f\u6cdb\u91c7\u7528\u3002\u7136\u800c\uff0c\u79bb\u7ebf\u504f\u597d\u4f18\u5316\u5e38\u53d7\u91c7\u6837\u7b56\u7565\u4e0e\u76ee\u6807\u7b56\u7565\u4e4b\u95f4\u5206\u5e03\u5dee\u5f02\u7684\u5f71\u54cd\uff0c\u5bfc\u81f4\u4f18\u5316\u6548\u679c\u4e0d\u7406\u60f3\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u7b56\u7565\u2014\u2014\u52a0\u6743\u504f\u597d\u4f18\u5316\uff08WPO\uff09\uff0c\u65e8\u5728\u901a\u8fc7\u8c03\u6574\u504f\u597d\u8bc4\u5206\u5bf9\uff0c\u4f7f\u79bb\u7ebf\u6570\u636e\u66f4\u63a5\u8fd1\u4e8e\u5f53\u524d\u7b56\u7565\uff0c\u4ece\u800c\u7f13\u89e3\u8fd9\u4e0
0\u95ee\u9898\u3002\u8fd9\u79cd\u65b9\u6cd5\u4e0d\u4ec5\u89e3\u51b3\u4e86\u5206\u5e03\u5dee\u8ddd\u96be\u9898\uff0c\u8fd8\u63d0\u5347\u4e86\u4f18\u5316\u8fc7\u7a0b\uff0c\u65e0\u9700\u989d\u5916\u6210\u672c\u3002 \u6211\u4eec\u5728Alpaca Eval 2\u548cMT-bench\u7b49\u6307\u4ee4\u8ddf\u968f\u57fa\u51c6\u4e0a\u9a8c\u8bc1\u4e86\u6211\u4eec\u7684\u65b9\u6cd5\u3002WPO\u5728Alpaca Eval 2\u4e0a\u7684\u6027\u80fd\u6bd4\u76f4\u63a5\u504f\u597d\u4f18\u5316\uff08DPO\uff09\u63d0\u9ad8\u4e865.6%\u3002\u57fa\u4e8eLlama-3-8B-Instruct\uff0cWPO\u751a\u81f3\u5efa\u7acb\u4e86\u663e\u8457\u7684\u957f\u5ea6\u63a7\u5236\u80dc\u7387\uff0c\u8fbe\u523048.6%\uff0c\u572880\u4ebf\u53c2\u6570\u6a21\u578b\u6392\u884c\u699c\u4e0a\u6210\u4e3a\u6700\u5f3a\u52b2\u7684\u6a21\u578b\u3002\u6211\u4eec\u5c06\u5728\u4e0a\u5f00\u6e90\u4ee3\u7801\u548c\u6a21\u578b\u3002**|\n", "2406.11818": "|**2024-06-17**|**Embodied Instruction Following in Unknown Environments**|Zhenyu Wu et.al.|[2406.11818](http://arxiv.org/abs/2406.11818)|null|\u5728\u81ea\u4e3b\u5bb6\u5ead\u670d\u52a1\u7cfb\u7edf\u4e2d\uff0c\u4f7f\u5b9e\u4f53\u4ee3\u7406\u80fd\u6839\u636e\u81ea\u7136\u8bed\u8a00\u5b8c\u6210\u590d\u6742\u7684\u4eba\u7c7b\u6307\u4ee4\u81f3\u5173\u91cd\u8981\u3002\u4f20\u7edf\u65b9\u6cd5\u4ec5\u80fd\u5728\u6240\u6709\u4e92\u52a8\u5bf9\u8c61\u90fd\u63d0\u4f9b\u7ed9\u4ee3\u7406\u7684\u5df2\u77e5\u73af\u5883\u4e2d\u6267\u884c\u6307\u4ee4\uff0c\u76f4\u63a5\u5c06\u73b0\u6709\u65b9\u6cd5\u5e94\u7528\u4e8e\u672a\u77e5\u73af\u5883\u901a\u5e38\u4f1a\u4ea7\u751f\u64cd\u4f5c\u4e0d\u5b58\u5728\u7269\u4f53\u7684\u4e0d\u53ef\u884c\u8ba1\u5212\u3002\u76f8\u53cd\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u9488\u5bf9\u672a\u77e5\u73af\u5883\u7684\u590d\u6742\u4efb\u52a1\u5b9e\u4f53\u6307\u4ee4\u8ddf\u968f\uff08Embodied Instruction 
Following\uff0cEIF\uff09\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u4f7f\u4ee3\u7406\u80fd\u591f\u6709\u6548\u5730\u63a2\u7d22\u73af\u5883\uff0c\u5229\u7528\u73b0\u6709\u7269\u4f53\u751f\u6210\u53ef\u6267\u884c\u8ba1\u5212\uff0c\u4ee5\u8fbe\u6210\u62bd\u8c61\u6307\u4ee4\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u5305\u62ec\u9ad8\u5c42\u4efb\u52a1\u89c4\u5212\u5668\u548c\u4f4e\u5c42\u63a2\u7d22\u63a7\u5236\u5668\u7684\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\u7684\u5c42\u6b21\u5316\u5b9e\u4f53\u6307\u4ee4\u8ddf\u968f\u6846\u67b6\u3002\u7136\u540e\uff0c\u6211\u4eec\u901a\u8fc7\u52a8\u6001\u533a\u57df\u6ce8\u610f\u529b\u6784\u5efa\u573a\u666f\u7684\u8bed\u4e49\u8868\u793a\u5730\u56fe\uff0c\u4ee5\u5c55\u793a\u5df2\u77e5\u7684\u89c6\u89c9\u7ebf\u7d22\uff0c\u4f7f\u4efb\u52a1\u89c4\u5212\u548c\u573a\u666f\u63a2\u7d22\u4e0e\u4eba\u7c7b\u6307\u4ee4\u76ee\u6807\u4fdd\u6301\u4e00\u81f4\u3002\u5bf9\u4e8e\u4efb\u52a1\u89c4\u5212\u5668\uff0c\u6839\u636e\u4efb\u52a1\u5b8c\u6210\u8fc7\u7a0b\u548c\u5df2\u77e5\u89c6\u89c9\u7ebf\u7d22\uff0c\u6211\u4eec\u751f\u6210\u6b65\u9aa4\u5f0f\u7684\u53ef\u884c\u8ba1\u5212\u3002\u5bf9\u4e8e\u63a2\u7d22\u63a7\u5236\u5668\uff0c\u6839\u636e\u751f\u6210\u7684\u6b65\u9aa4\u8ba1\u5212\u548c\u5df2\u77e5\u89c6\u89c9\u7ebf\u7d22\u9884\u6d4b\u6700\u4f18\u7684\u5bfc\u822a\u6216\u7269\u4f53\u4ea4\u4e92\u7b56\u7565\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u5927\u578b\u623f\u5c4b\u7ea7\u573a\u666f\u4e2d\u7684204\u4e2a\u590d\u6742\u4eba\u7c7b\u6307\u4ee4\uff08\u5982\u505a\u65e9\u9910\u548c\u6574\u7406\u623f\u95f4\uff09\u4e0a\u5b9e\u73b0\u4e8645.09%\u7684\u6210\u529f\u7387\u3002|\n", "2406.11816": "|**2024-06-17**|**VideoLLM-online: Online Video Large Language Model for Streaming Video**|Joya Chen et.al.|[2406.11816](http://arxiv.org/abs/2406.11816)|null|## \u7ffb\u8bd1 
\u8fd1\u671f\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5df2\u7ecf\u589e\u5f3a\u4e86\u89c6\u89c9\u529f\u80fd\uff0c\u80fd\u591f\u7406\u89e3\u56fe\u50cf\u3001\u89c6\u9891\u548c\u878d\u5408\u4e86\u89c6\u89c9\u4e0e\u8bed\u8a00\u7684\u5185\u5bb9\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u5927\u6a21\u578b\u7684\u8bad\u7ec3\u65b9\u6cd5\u901a\u5e38\u5c06\u89c6\u9891\u89c6\u4e3a\u9884\u5148\u526a\u8f91\u597d\u7684\u7247\u6bb5\uff0c\u8fd9\u4f7f\u5f97\u5b83\u4eec\u5728\u5904\u7406\u8fde\u7eed\u89c6\u9891\u6d41\u65f6\u6548\u679c\u4e0d\u4f73\u4e14\u6548\u7387\u4f4e\u4e0b\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5728\u672c\u6587\u4e2d\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u201cLearning-In-Video-Stream\u201d\uff08LIVE\uff09\u6846\u67b6\uff0c\u65e8\u5728\u5b9e\u73b0\u5b9e\u65f6\u3001\u957f\u5e8f\u5217\u3001\u4e0e\u89c6\u9891\u6d41\u540c\u6b65\u7684\u5bf9\u8bdd\uff0c\u9002\u7528\u4e8e\u8fde\u7eed\u89c6\u9891\u8f93\u5165\u3002LIVE\u6846\u67b6\u5305\u62ec\u4ee5\u4e0b\u4e09\u4e2a\u65b9\u9762\uff1a\uff081\uff09\u4e00\u4e2a\u8bbe\u8ba1\u7528\u4e8e\u5904\u7406\u8fde\u7eed\u6d41\u5f0f\u8f93\u5165\u7684\u8bed\u8a00\u5efa\u6a21\u76ee\u6807\uff1b\uff082\uff09\u4e00\u79cd\u6570\u636e\u751f\u6210\u7b56\u7565\uff0c\u5c06\u79bb\u7ebf\u65f6\u95f4\u6807\u6ce8\u8f6c\u6362\u4e3a\u9002\u5408\u6d41\u5f0f\u5bf9\u8bdd\u7684\u683c\u5f0f\uff1b\uff083\uff09\u4e00\u4e2a\u4f18\u5316\u7684\u63a8\u7406\u7ba1\u9053\uff0c\u4ee5\u63d0\u9ad8\u5728\u5b9e\u9645\u89c6\u9891\u6d41\u4e2d\u7684\u54cd\u5e94\u901f\u5ea6\u3002\u57fa\u4e8eLlama-2/Llama-3\uff0c\u6211\u4eec\u6784\u5efa\u4e86VideoLLM-online\u6a21\u578b\uff0c\u5e76\u901a\u8fc7\u5b83\u5c55\u793a\u4e86\u5728\u5904\u7406\u89c6\u9891\u6d41\u5bf9\u8bdd\u65b9\u9762\u7684\u663e\u8457\u4f18\u52bf\uff0c\u4f8b\u5982\uff0c\u5728A100
GPU\u4e0a\uff0c\u8be5\u6a21\u578b\u80fd\u57285\u5206\u949f\u89c6\u9891\u7247\u6bb5\u4e2d\u5b9e\u73b0\u8d85\u8fc710\u5e27\u6bcf\u79d2\u7684\u6d41\u5f0f\u5bf9\u8bdd\u3002\u6b64\u5916\uff0cVideoLLM-online\u8fd8\u5728\u516c\u5f00\u7684\u79bb\u7ebf\u89c6\u9891\u57fa\u51c6\u6d4b\u8bd5\uff08\u5982\u8bc6\u522b\u3001captioning\u548c\u9884\u6d4b\uff09\u4e0a\u5c55\u73b0\u51fa\u6700\u5148\u8fdb\u7684\u6027\u80fd\u3002\u6211\u4eec\u5df2\u5c06\u4ee3\u7801\u3001\u6a21\u578b\u3001\u6570\u636e\u548c\u6f14\u793a\u53d1\u5e03\u5728https://showlab.github.io/videollm-online\u4f9b\u4eba\u4f7f\u7528\u3002|\n", 
"2406.11813": "|**2024-06-17**|**How Do Large Language Models Acquire Factual Knowledge During Pretraining?**|Hoyeon Chang et.al.|[2406.11813](http://arxiv.org/abs/2406.11813)|null|Although recent studies show that large language models (LLMs) can store a substantial amount of factual knowledge, the mechanism by which they acquire it during pretraining remains unclear. This work addresses that gap by studying how LLMs acquire and retain factual knowledge during pretraining. It reports several key insights. First, counterintuitively, more training data yields no significant improvement in a model's ability to acquire and retain factual knowledge. Second, there is a power-law relationship between training steps and the forgetting of both memorization and generalization of factual knowledge, and models trained on duplicated data forget faster. Third, a larger batch size makes the model more robust to forgetting. Overall, our observations suggest that factual knowledge acquisition in LLM pretraining occurs by progressively increasing, at each step, the probability of the factual knowledge present in the pretraining data; this increase is subsequently diluted by forgetting. Based on this interpretation, we can explain recently observed LLM behaviors such as poor performance on long-tail knowledge and the benefit of deduplicating the pretraining corpus.|\n", 
"2406.11811": "|**2024-06-17**|**RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content**|Joao Monteiro et.al.|[2406.11811](http://arxiv.org/abs/2406.11811)|null|Large language models (LLMs) are trained on large amounts of data automatically scraped from the internet, including encyclopedic documents that hold a vast amount of general knowledge (such as Wikipedia) but that may also overlap with the benchmark datasets used to evaluate LLMs. As a result, evaluating a model on a test split that may have leaked into the training set can lead to misleading conclusions. To support sound evaluation of language models, we introduce RepLiQA, a new test dataset suited to question-answering and topic-retrieval tasks. RepLiQA consists of five test splits, four of which had not been released to the internet or exposed to LLM APIs before this publication. Each RepLiQA sample comprises (1) a reference document written by a human annotator that describes an imaginary scenario (for example, a news article) and does not appear on the internet; (2) a question about the document's topic; (3) a ground-truth answer derived directly from the information in the document; and (4) the paragraph of the document that contains the answer. An accurate answer can therefore be generated only if the model can find the relevant content in the provided document. We run a large-scale benchmark of several state-of-the-art LLMs to reveal performance differences across model types and sizes in a conditional language modeling setting. The released RepLiQA splits are available here: https://huggingface.co/datasets/ServiceNow/repliqa|\n", 
"2406.11801": "|**2024-06-17**|**Safety Arithmetic: A Framework for Test-time Safety Alignment of Language Models by Steering Parameters and Activations**|Rima Hazra 
et.al.|[2406.11801](http://arxiv.org/abs/2406.11801)|**[link](https://github.com/declare-lab/safety-arithmetic)**|**As large language models (LLMs) become increasingly important in applications such as translation and question answering, keeping them aligned with human values is critical. Current alignment methods, however, struggle with dynamic user intent and complex objectives, leaving models prone to generating harmful content. We propose Safety Arithmetic, a training-free framework that improves LLM safety across base models, supervised fine-tuned (SFT) models, and edited models. Safety Arithmetic has two parts: Harm Direction Removal, which avoids harmful output, and Safety Alignment, which promotes safe responses. We also release NoIntentEdit, a dataset of edit instances that may introduce safety risks into a model. Experiments show that Safety Arithmetic significantly strengthens safety measures, reduces over-safety, and preserves model utility, outperforming existing methods at ensuring safe content generation.**|\n", 
"2406.12846": "|**2024-06-18**|**DrVideo: Document Retrieval Based Long Video Understanding**|Ziyu Ma et.al.|[2406.12846](http://arxiv.org/abs/2406.12846)|null|Existing methods for long video understanding mostly focus on videos that last only tens of seconds, and techniques for handling longer videos remain little explored. The large number of frames in a long video poses two main challenges: locating key information and performing long-range reasoning. We therefore propose DrVideo, a document-retrieval-based system designed for long video understanding. The core idea is to convert the long-video understanding problem into a long-document understanding task so as to make full use of the capabilities of large language models. Specifically, DrVideo converts a long video into a text-based long document, first retrieving key frames and augmenting the information of those frames as the system's starting point. It then runs an agent-based iterative loop that keeps searching for missing information and augmenting relevant data and, once enough question-related information has been gathered, produces the final prediction in a chain-of-thought manner. Experiments on several long video benchmarks confirm the effectiveness of the method. DrVideo outperforms existing state-of-the-art methods by 3.8 percentage points on EgoSchema (3 minutes), by 17.9 and 38.0 points in the break and global modes of MovieChat-1K (10 minutes), and by 30.2 points on the LLama-Vid QA dataset (over 60 minutes).|\n", 
"2406.12845": "|**2024-06-18**|**Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts**|Haoxiang Wang et.al.|[2406.12845](http://arxiv.org/abs/2406.12845)|**[link](https://github.com/RLHFlow/RLHF-Reward-Modeling)**|**Reinforcement learning from human feedback (RLHF) has become the main method for aligning large language models (LLMs) with human preferences. Conventionally, a reward model (RM) is trained on human preference data: the process usually starts from comparisons between responses to the same user request, with relative ratings indicating which response humans prefer. Because the RM is a black box, however, its outputs lack interpretability, and it is hard to understand why the RM judges a given response to be good. Since the RM acts as a proxy for human preferences, we propose a two-stage approach to building an interpretable RM. First, we train an Absolute-Rating Multi-Objective Reward Model (ArmoRM) on multi-dimensional absolute-rating data, where each dimension corresponds to a human-interpretable objective (such as honesty, verbosity, or safety). Second, we use a Mixture-of-Experts (MoE) strategy with a gating network that automatically selects the most suitable reward objectives based on the context. We trained ArmoRM with Llama-3 8B and added a shallow MLP on top as the gating network, yielding ArmoRM-Llama3-8B. The model achieves state-of-the-art results on RewardBench, a benchmark that evaluates RMs for language modeling. Notably, it outperforms the LLM-as-a-judge approach with a GPT-4 judge and approaches the performance of the much larger Nemotron-4 340B reward model.**|\n", 
"2406.12844": "|**2024-06-18**|**Synergizing Foundation Models and Federated Learning: A Survey**|Shenghui Li 
et.al.|[2406.12844](http://arxiv.org/abs/2406.12844)|null|Recent progress in foundation models (FMs), such as large language models, vision transformers, and multimodal models, has had a marked impact on both academia and industry. Compared with small models, FMs need far more data during pretraining. General-purpose FMs can be pretrained on public data from the internet, but domain-specific FMs require proprietary data, which raises a practical data-availability challenge because of privacy concerns. Federated learning (FL), a collaborative learning paradigm, breaks the barrier to data sharing and offers a promising way to customize and adapt FMs to a wide range of domain-specific tasks using distributed data while preserving privacy. This survey discusses the potential and challenges of combining FL and FMs and summarizes core techniques, future directions, and applications. A regularly updated collection of papers on FM-FL is available.|\n", "2406.12832": "|**2024-06-18**|**LaMDA: Large Model Fine-Tuning via Spectrally Decomposed Low-Dimensional Adaptation**|Seyedarmin Azizi 
et.al.|[2406.12832](http://arxiv.org/abs/2406.12832)|**[link](https://github.com/arminazizi98/lamda)**|**Low-rank adaptation (LoRA) has become the standard approach to fine-tuning large language models because it significantly reduces the number of trainable parameters. However, the number of trainable parameters LoRA requires grows with the model embedding dimension, which drives up compute cost. In addition, its backward updates require storing high-dimensional intermediate activations and optimizer states, which demands a large amount of GPU memory. This paper proposes a new fine-tuning method for large language models, large model fine-tuning via spectrally decomposed low-dimensional adaptation (LaMDA). LaMDA freezes the first projection matrix (PMA) and introduces a low-dimensional trainable square matrix, which greatly reduces both trainable parameters and peak GPU memory usage. During the early fine-tuning stages, LaMDA gradually freezes the second projection matrix (PMB), further lowering the compute cost of weight updates and improving parameter efficiency. We also introduce an enhanced variant, LaMDA++, which uses a normalized spectrum analysis of the pretrained model weights to perform lightweight adaptive rank allocation for the LoRA path. We evaluate on a range of tasks, including the GLUE natural language understanding benchmark, text summarization, natural language generation, and complex reasoning, across different types of large language models. Results show that LaMDA matches or surpasses existing methods while requiring up to 17.7x fewer parameter updates and up to 1.32x lower peak GPU memory usage during fine-tuning. The code will be made publicly available.**|\n", 
"2406.12822": "|**2024-06-18**|**Is It Good Data for Multilingual Instruction Tuning or Just Bad Multilingual Evaluation for Large Language Models?**|Pinzhen Chen et.al.|[2406.12822](http://arxiv.org/abs/2406.12822)|null|
Large multilingual models are meant to serve native speakers of many languages. We hypothesize that current practices for fine-tuning and evaluating these models may not match that goal because they rely heavily on translation, which can introduce translation artifacts. It remains unclear how the nature of the instruction data affects model output, and whether translated test sets can capture such nuances. Because translated data is often used in both training and evaluation, these potential problems may go unnoticed. This work investigates them by using controlled native or translated data in the instruction tuning and evaluation stages and observing model results. Experiments on eight base models and eight different benchmarks show that native and translated instruction data lead to a notable difference, especially when model performance is high, on benchmarks that are native or generative, but not on other types of test sets. Finally, we find that regularization helps on structured tasks but not on generative tasks.|\n", "2406.12809": "|**2024-06-18**|**Can Large Language Models Always Solve Easy Problems if They Can Solve Harder Ones?**|Zhe Yang 
et.al.|[2406.12809](http://arxiv.org/abs/2406.12809)|null|Large language models (LLMs) show impressive capabilities but still suffer from inconsistency, for example reacting differently to paraphrases or minor changes in ordering. Beyond these instabilities, we also observe that LLMs that can solve hard problems may nonetheless fail at relatively easy ones. To evaluate this hard-to-easy inconsistency, we build the ConsisEval benchmark, in which each entry contains two questions ordered by difficulty. We also introduce a consistency score to quantify this inconsistency and a relative consistency score to analyze the potential for improving consistency. Extensive experiments on existing models yield the following findings: (1) GPT-4 achieves the highest consistency score, 92.2%, but is still inconsistent on specific questions because of distraction by redundant information, misreading of the question, and similar issues; (2) more capable models usually show higher consistency, though exceptions exist; (3) hard data improves consistency in both fine-tuning and in-context learning. Our data and code will be made publicly available on GitHub.|\n", 
"2406.12806": "|**2024-06-18**|**Identifying Performance-Sensitive Configurations in Software Systems through Code Analysis with LLM Agents**|Zehao Wang et.al.|[2406.12806](http://arxiv.org/abs/2406.12806)|null|**Background**: Configuration settings are essential for tailoring software behavior to specific performance requirements, yet misconfigurations are widespread. Because configurations are numerous and complex, identifying those that affect system performance is challenging. This study presents PerfSense, a lightweight framework that uses large language models (LLMs) to identify performance-sensitive configurations efficiently and with low overhead. PerfSense employs LLM agents to simulate the interaction between developers and performance engineers, using advanced prompting techniques such as prompt chaining and retrieval-augmented generation (RAG). **Methods and results**: In an evaluation on seven open-source Java systems, PerfSense achieves an average accuracy of 64.77% in classifying performance-sensitive configurations, outperforming both an LLM baseline (50.36%) and the previous state-of-the-art method (61.75%). In particular, the prompt chaining technique improves recall by 10% to 30% while maintaining similar precision. A further manual analysis of 362 misclassifications shows that common issues include LLMs misunderstanding the requirements (26.8%). **Conclusion**: PerfSense significantly reduces the manual effort of classifying performance-sensitive configurations and offers useful insights for future LLM-based code analysis research.|\n", 
"2406.12800": "|**2024-06-18**|**Supporting Human Raters with the Detection of Harmful Content using Large Language Models**|Kurt Thomas et.al.|[2406.12800](http://arxiv.org/abs/2406.12800)|null|This paper explores the feasibility of using large language models (LLMs) to automate or otherwise assist human raters in detecting harmful content, such as hate speech, harassment, extremism, and election misinformation. Using a dataset of 50,000 comments, we show that LLMs can reach 90% accuracy when compared with human verdicts. We present five design patterns for integrating LLMs with human rating, such as pre-filtering non-violative content, detecting potential errors in human ratings, or surfacing critical context to support human rating. We show how a single optimized prompt can support these design patterns. In a pilot in a real-world deployment, our approach improved the efficiency of available human rater capacity by 41.5% and improved the precision and recall of detecting violative content by 9% to 11%.|\n", 
"2406.12793": "|**2024-06-18**|**ChatGLM: A Family of Large Language Models from GLM-130B 
to GLM-4 All Tools**|Team GLM et.al.|[2406.12793](http://arxiv.org/abs/2406.12793)|**[link](https://github.com/thudm/chatglm-6b)**|We introduce ChatGLM, a family of large language models that has been evolving over time. This report focuses mainly on the GLM-4 language series, which includes GLM-4, GLM-4-Air, and GLM-4-9B. They are our most capable models to date and incorporate all the insights and lessons from the preceding three generations of ChatGLM. The models are pretrained on ten trillion tokens, mostly in Chinese and English, plus a small corpus from 24 languages, and are aligned primarily for Chinese and English use. High-quality alignment is achieved through a multi-stage post-training process that includes supervised fine-tuning and learning from human feedback. Evaluations show that GLM-4 closely rivals or outperforms GPT-4 on general metrics such as MMLU, GSM8K, MATH, BBH, GPQA, and HumanEval; comes close to GPT-4 Turbo in instruction following as measured by IFEval; matches GPT-4 Turbo (128K) and Claude 3 on long-context tasks; and outperforms GPT-4 in Chinese alignment as measured by AlignBench. The GLM-4 All Tools model is further aligned to understand user intent and autonomously decide when and which tools to use, including a web browser, a Python interpreter, a text-to-image model, and user-defined functions, in order to complete complex tasks effectively. In practical applications, it matches and even surpasses GPT-4 All Tools on tasks such as accessing online information via web browsing and solving problems with a Python interpreter. To date we have open-sourced a series of models, including ChatGLM-6B (three generations), GLM-4-9B (128K, 1M), GLM-4V-9B, WebGLM, and CodeGeeX, which attracted over 10 million downloads on Hugging Face in 2023 alone. These open models are publicly available.|\n", 
"2406.12784": "|**2024-06-18**|**UBENCH: Benchmarking Uncertainty in Large Language Models with Multiple Choice Questions**|Xunzhi Wang 
et.al.|[2406.12784](http://arxiv.org/abs/2406.12784)|**[link](https://github.com/Cyno2232/UBENCH)**|With the rapid development of large language models (LLMs), they have shown notable effectiveness in practical applications. However, because of their low interpretability, these models often make errors in unforeseen situations, which limits their value. Although many studies have worked on building comprehensive evaluation systems, earlier benchmarks focus mainly on problem-solving ability and give too little attention to the uncertainty of responses, which may lead to unreliability. Current methods for measuring LLM reliability are resource-intensive and cannot easily test black-box models. To address these issues, we propose UBENCH, a comprehensive benchmark for evaluating LLM reliability. It contains 3,978 multiple-choice questions covering knowledge, language understanding, and reasoning. Experiments show that UBENCH achieves state-of-the-art performance, and that its single-sampling method saves substantial compute compared with baseline methods that need multiple samplings. We also use UBENCH to evaluate the reliability of 15 popular LLMs and find that GLM4 stands out, closely followed by GPT-4. We further study how Chain-of-Thought prompts, role-playing prompts, option order, and temperature affect LLM reliability, and analyze their differing effects on different models.|\n", 
"2406.14563": "|**2024-06-20**|**Model Merging and Safety Alignment: One Bad Model Spoils the Bunch**|Hasan Abed Al Kader Hammoud et.al.|[2406.14563](http://arxiv.org/abs/2406.14563)|null|
Merging large language models (LLMs) is a cost-effective way to combine several expert LLMs into a single versatile model that retains the expertise of the originals. Current approaches, however, often overlook the importance of safety alignment during merging, which leads to highly misaligned models. This work studies the effect of model merging on alignment. We evaluate several popular model merging techniques and find that existing methods transfer not only domain expertise but also misalignment. We propose a two-step approach to address this: (1) generate synthetic safety and domain-specific data, and (2) incorporate the generated data into the optimization process of existing data-driven model merging techniques. This lets us treat alignment as a skill that can be maximized in the merged LLM. Experiments demonstrate the effectiveness of integrating alignment-related data during merging, yielding models that retain domain expertise and are also well aligned.|\n", "2406.14562": "|**2024-06-20**|**Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities**|Sachit Menon 
et.al.|[2406.14562](http://arxiv.org/abs/2406.14562)|null|When faced with questions that involve visual thinking, humans naturally switch reasoning modalities, often forming mental images or drawing visual aids. Large language models perform well on arithmetic and symbolic reasoning by expressing intermediate reasoning steps as text in a chain of thought, yet they still struggle with text queries that are easily answered by visual reasoning, even after extensive multimodal pretraining. We propose a simple method, whiteboard-of-thought prompting, to unlock the visual reasoning capabilities of multimodal large language models across modalities. Whiteboard-of-thought prompting gives the model a metaphorical whiteboard on which to draw out reasoning steps as images, then returns these images to the model for further processing. We find that this requires no demonstrations or specialized modules and instead relies on the model's existing ability to write code with libraries such as Matplotlib and Turtle. This simple strategy achieves state-of-the-art results on four difficult natural language tasks involving visual and spatial reasoning. We find several settings in which GPT-4o with chain-of-thought fails dramatically, including some where it reaches 0% accuracy, while whiteboard-of-thought prompting raises accuracy to as much as 92%. We analyze in detail where the technique succeeds and where its errors come from.|\n", 
"2406.14556": "|**2024-06-21**|**Asynchronous Large Language Model Enhanced Planner for Autonomous Driving**|Yuan Chen et.al.|[2406.14556](http://arxiv.org/abs/2406.14556)|**[link](https://github.com/memberre/asyncdriver)**|\u5c3d\u7ba1\u5b9e\u65f6\u89c4\u5212\u5668\u5728\u81ea\u52a8\u9a7e\u9a76\u4e2d\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5174\u8d77\u4e3a\u63d0\u9ad8\u8fd0\u52a8\u89c4\u5212\u7684\u53ef\u89e3\u91ca\u6027\u548c\u53ef\u63a7\u6027\u5f00\u8f9f\u4e86\u65b0\u9014\u5f84\u3002\u7136\u800c\uff0cLLM\u9a71\u52a8\u7684\u89c4\u5212\u5668\u4ecd\u9762\u4e34\u8d44\u6e90\u6d88\u8017\u5927\u548c\u63a8\u7406\u65f6\u95f4\u957f\u7684\u95ee\u9898\uff0c\u8fd9\u963b\u788d\u4e86\u5176\u5b9e\u7528\u90e8\u7f72\u3002\u9274\u4e8e\u8fd9\u4e9b\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86AsyncDriver\uff0c\u4e00\u4e2a\u5168\u65b0\u7684\u5f02\u6b65LLM\u589e\u5f3a\u7684\u95ed\u73af\u6846\u67b6\u3002\u8be5\u6846\u67b6\u5229\u7528LLM\u751f\u6210\u7684\u4e0e\u573a\u666f\u76f8\u5173\u7684\u6307\u4ee4\u7279\u5f81\uff0c\u6307\u5bfc\u5b9e\u65f6\u89c4\u5212\u5668\u8fdb\u884c\u7cbe\u786e\u548c\u53ef\u63a7\u7684\u8f68\u8ff9\u9884\u6d4b\u3002AsyncDriver\u5c55\u793a\u4e86LLMs\u5728\u7406\u89e3\u548c\u5904\u7406\u5411\u91cf\u5316\u573a\u666f\u6570\u636e\u53ca\u4e00\u7cfb\u5217\u8def\u7ebf\u6307\u793a\u65b9\u9762\u7684\u5f3a\u5927\u80fd\u529b\uff0c\u540c\u65f6\u901a\u8fc7\u5f02\u6b65\u8bbe\u8ba1\uff0c\u6709\u6548\u964d\u4f4e\u4e86LLM\u5e26\u6765\u7684\u8ba1\u7b97\u6210\u672c\uff0c\u4fdd\u6301\
u4e86\u4e0e\u4e4b\u76f8\u8fd1\u7684\u6027\u80fd\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728nuPlan\u7684\u590d\u6742\u573a\u666f\u4e2d\u5b9e\u73b0\u4e86\u66f4\u4f18\u7684\u95ed\u73af\u8bc4\u4f30\u6027\u80fd\u3002|\n", "2406.14550": "|**2024-06-20**|**GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models**|Shilong Li et.al.|[2406.14550](http://arxiv.org/abs/2406.14550)|null|\u957f\u6587\u672c\u5904\u7406\u80fd\u529b\u5bf9\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5e94\u5bf9\u590d\u6742\u4efb\u52a1\u81f3\u5173\u91cd\u8981\u3002\u5c3d\u7ba1\u5df2\u6709\u591a\u65b9\u52aa\u529b\u4f18\u5316LLMs\u5904\u7406\u957f\u8f93\u5165\uff0c\u4f46\u4f9d\u7136\u9762\u4e34\u6311\u6218\u3002\u672c\u6587\u63d0\u51faGraphReader\uff0c\u8fd9\u662f\u4e00\u79cd\u57fa\u4e8e\u56fe\u7684\u4ee3\u7406\u7cfb\u7edf\uff0c\u65e8\u5728\u901a\u8fc7\u6784\u5efa\u6587\u672c\u56fe\u5e76\u8ba9\u4ee3\u7406\u81ea\u4e3b\u63a2\u7d22\u6765\u5904\u7406\u957f\u6587\u672c\u3002\u5f53\u63a5\u6536\u5230\u95ee\u9898\u65f6\uff0c\u4ee3\u7406\u4f1a\u9010\u6b65\u5206\u6790\u5e76\u5236\u5b9a\u5408\u7406\u8ba1\u5212\uff0c\u7136\u540e\u8c03\u7528\u9884\u5b9a\u4e49\u51fd\u6570\u8bfb\u53d6\u8282\u70b9\u5185\u5bb9\u548c\u90bb\u5c45\u4fe1\u606f\uff0c\u5b9e\u73b0\u4ece\u7c97\u5230\u7ec6\u7684\u56fe\u63a2\u7d22\u3002\u5728\u63a2\u7d22\u8fc7\u7a0b\u4e2d\uff0c\u4ee3\u7406\u4e0d\u65ad\u8bb0\u5f55\u65b0\u53d1\u73b0\u5e76\u53cd\u601d\u5f53\u524d\u60c5\u51b5\uff0c\u4ee5\u4f18\u5316\u83b7\u53d6\u4fe1\u606f\u7684\u8fc7\u7a0b\uff0c\u76f4\u5230\u6536\u96c6\u8db3\u591f\u4fe1\u606f\u751f\u6210\u7b54\u6848\u3002\u5728LV-Eval\u6570\u636e\u96c6\u4e0a\u7684\u5b9e\u9a8c\u663e\u793a\uff0c\u4f7f\u75284k\u4e0a\u4e0b\u6587\u7a97\u53e3\u7684GraphReader\u572816k\u5230256k\u7684\u957f\u6587\u672c\u957f\u5ea6\u4e0a\uff0c\u76f8\u5bf9\u4e8eGPT-4-128k\u6709\u663e\u8457\u4f18\u52bf\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u56db\u4e2a\u5355\u8df3\u548c\u
591a\u8df3\u7684\u6311\u6218\u6027\u57fa\u51c6\u4e0a\u8868\u73b0\u51fa\u8272\u3002|\n", "2406.14549": "|**2024-06-20**|**Uncovering Latent Memories: Assessing Data Leakage and Memorization Patterns in Large Language Models**|Sunny Duan et.al.|[2406.14549](http://arxiv.org/abs/2406.14549)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u5174\u8d77\uff0c\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\u53d1\u751f\u4e86\u9769\u547d\u6027\u53d8\u5316\uff0c\u4f46\u8fd9\u4e5f\u5f15\u53d1\u4e86\u6570\u636e\u9690\u79c1\u548c\u5b89\u5168\u7684\u91cd\u5927\u5fe7\u8651\u3002\u8fd9\u4e9b\u6a21\u578b\u5728\u5305\u542b\u6f5c\u5728\u654f\u611f\u6216\u4e13\u6709\u4fe1\u606f\u7684\u5927\u91cf\u8bed\u6599\u5e93\u4e0a\u8fdb\u884c\u8bad\u7ec3\uff0c\u6570\u636e\u6cc4\u9732\u7684\u98ce\u9669\u2014\u2014\u5373\u6a21\u578b\u54cd\u5e94\u63ed\u793a\u90e8\u5206\u4fe1\u606f\u2014\u2014\u5c1a\u4e0d\u4e3a\u4eba\u5145\u5206\u7406\u89e3\u3002\u672c\u7814\u7a76\u65e8\u5728\u63a2\u8ba8\u673a\u5668\u5b66\u4e60\u6a21\u578b\u4e2d\u7684\u8bb0\u5fc6\u73b0\u8c61\uff0c\u7279\u522b\u662f\u5173\u6ce8\u5176\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u7684\u6f14\u53d8\u3002\u6211\u4eec\u8c03\u67e5\u4e86\u8bad\u7ec3\u6570\u636e\u7684\u7edf\u8ba1\u7279\u6027\u5982\u4f55\u5f71\u54cd\u6a21\u578b\u5185\u7f16\u7801\u7684\u8bb0\u5fc6\uff0c\u901a\u8fc7\u8bc4\u4f30\u91cd\u590d\u5bf9\u8bb0\u5fc6\u7684\u5f71\u54cd\u3002\u7814\u7a76\u53d1\u73b0\uff0c\u6a21\u578b\u8bb0\u4f4f\u4e00\u4e2a\u5e8f\u5217\u7684\u6982\u7387\u4e0e\u5b83\u5728\u6570\u636e\u4e2d\u51fa\u73b0\u7684\u6b21\u6570\u5448\u5bf9\u6570\u5173\u7cfb\u3002\u6b64\u5916\uff0c\u6211\u4eec\u53d1\u73b0\u5373\u4f7f\u6ca1\u6709\u540e\u7eed\u7684\u63a5\u89e6\uff0c\u67d0\u4e9b\u770b\u4f3c\u672a\u88ab\u8bb0\u4f4f\u7684\u5e8f\u5217\u4e5f\u53ef\u80fd\u5728\u6574\u4e2a\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u9010\u6e10\u663e\u73b0\u3002\u8fd9\u79cd\u9690\u85cf\u7684\u5df2\u8bb0\u4f4f\u5e8f\u5217\u5bf9\u6570\u636e\u9690\u79c1\u6784\u6210\u6311\u6218\uff0c\u56e0\u4e3a\u5b83\u4eec\u
53ef\u80fd\u9690\u85cf\u5728\u6a21\u578b\u7684\u6700\u7ec8\u68c0\u67e5\u70b9\u4e2d\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u8bca\u65ad\u6d4b\u8bd5\uff0c\u901a\u8fc7\u8003\u8651\u5b83\u4eec\u7684\u4ea4\u53c9\u71b5\u635f\u5931\u6765\u63ed\u793a\u8fd9\u4e9b\u6f5c\u5728\u7684\u8bb0\u5fc6\u5e8f\u5217\u3002|\n", "2406.14546": "|**2024-06-20**|**Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data**|Johannes Treutlein et.al.|[2406.14546](http://arxiv.org/abs/2406.14546)|**[link](https://github.com/choidami/inductive-oocr)**|**\u9488\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5b89\u5168\u98ce\u9669\uff0c\u4e00\u4e2a\u7b56\u7565\u662f\u4ece\u5176\u8bad\u7ec3\u6570\u636e\u4e2d\u5220\u9664\u5371\u9669\u77e5\u8bc6\u3002\u5c3d\u7ba1\u8fd9\u6d88\u9664\u4e86\u663e\u6027\u4fe1\u606f\uff0c\u4f46\u9690\u6027\u4fe1\u606f\u53ef\u80fd\u4ecd\u6563\u843d\u5728\u591a\u4e2a\u8bad\u7ec3\u6587\u6863\u4e2d\u3002\u6211\u4eec\u7814\u7a76\u7684\u95ee\u9898\u662f\uff1aLLMs\u80fd\u5426\u901a\u8fc7\u62fc\u51d1\u8fd9\u4e9b\u9690\u542b\u7ebf\u7d22\uff0c\u63a8\u65ad\u51fa\u88ab\u5c4f\u853d\u7684\u77e5\u8bc6\uff1f\u4e3a\u6b64\uff0c\u6211\u4eec\u4e13\u6ce8\u4e8e\u65e0\u4e0a\u4e0b\u6587\u5f52\u7eb3\u63a8\u7406\uff08Inductive Out-of-Context 
Reasoning\uff0cOOCR\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u6cdb\u5316\u80fd\u529b\uff0c\u8981\u6c42LLMs\u6839\u636e\u5206\u5e03\u5728\u8bad\u7ec3\u6587\u6863\u4e2d\u7684\u8bc1\u636e\u63a8\u65ad\u6f5c\u5728\u4fe1\u606f\uff0c\u5e76\u5728\u65e0\u9700\u4e0a\u4e0b\u6587\u5b66\u4e60\u7684\u60c5\u51b5\u4e0b\u5e94\u7528\u4e8e\u4e0b\u6e38\u4efb\u52a1\u3002\u901a\u8fc7\u4e94\u4e2a\u4efb\u52a1\u7684\u5b9e\u9a8c\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u524d\u6cbfLLMs\u786e\u5b9e\u5177\u5907\u8fd9\u79cd\u80fd\u529b\u3002\u4f8b\u5982\uff0c\u5728\u4e00\u9879\u5b9e\u9a8c\u4e2d\uff0c\u4ec5\u5bf9\u4e00\u4e2a\u672a\u77e5\u57ce\u5e02\u4e0e\u5176\u4e0e\u5176\u4ed6\u5df2\u77e5\u57ce\u5e02\u4e4b\u95f4\u7684\u8ddd\u79bb\u8fdb\u884c\u5fae\u8c03\uff0c\u4ee4\u4eba\u60ca\u8bb6\u7684\u662f\uff0c\u5373\u4f7f\u6ca1\u6709\u793a\u4f8b\u6216\u94fe\u5f0f\u601d\u8003\uff0c\u8be5LLM\u4e5f\u80fd\u8868\u8ff0\u51fa\u672a\u77e5\u57ce\u5e02\u662f\u5df4\u9ece\uff0c\u5e76\u636e\u6b64\u89e3\u7b54\u540e\u7eed\u95ee\u9898\u3002\u8fdb\u4e00\u6b65\u7684\u5b9e\u9a8c\u8868\u660e\uff0c\u4ec5\u63a5\u53d7\u5355\u4e2a\u786c\u5e01\u629b\u63b7\u7ed3\u679c\u8bad\u7ec3\u7684LLMs\u80fd\u5224\u65ad\u786c\u5e01\u662f\u5426\u504f\u659c\uff0c\u800c\u53ea\u63a5\u89e6$(x, f(x))$\u5bf9\u7684\u6a21\u578b\u80fd\u9610\u8ff0$f$\u7684\u5b9a\u4e49\u5e76\u8ba1\u7b97\u9006\u8fd0\u7b97\u3002\u867d\u7136OOCR\u5728\u67d0\u4e9b\u60c5\u51b5\u4e0b\u8868\u73b0\u826f\u597d\uff0c\u4f46\u6211\u4eec\u4e5f\u53d1\u73b0\u5b83\u5e76\u4e0d\u603b\u662f\u53ef\u9760\u7684\uff0c\u7279\u522b\u662f\u5728\u5c0f\u578bLLMs\u5b66\u4e60\u590d\u6742\u7ed3\u6784\u65f6\u3002\u603b\u7684\u6765\u8bf4\uff0cLLMs\u65e0\u9700\u660e\u786e\u7684\u4e0a\u4e0b\u6587\u5b66\u4e60\u5c31\u80fd\u201c\u4e32\u8054\u8d77\u201d\u4fe1\u606f\uff0c\u8fd9\u7ed9\u76d1\u63a7\u548c\u63a7\u5236\u5b83\u4eec\u83b7\u53d6\u7684\u77e5\u8bc6\u5e26\u6765\u4e86\u6f5c\u5728\u6311\u6218\u3002**|\n", "2406.14545": "|**2024-06-20**|**Unmasking Database Vulnerabilities: Zero-Knowledge Schema Inference Attacks in 
Text-to-SQL Systems**|\u0110or\u0111e Klisura et.al.|[2406.14545](http://arxiv.org/abs/2406.14545)|null|\u5173\u7cfb\u6570\u636e\u5e93\u5728\u73b0\u4ee3\u4fe1\u606f\u7cfb\u7edf\u4e2d\u81f3\u5173\u91cd\u8981\uff0c\u662f\u5b58\u50a8\u3001\u67e5\u8be2\u548c\u7ba1\u7406\u6570\u636e\u7684\u6838\u5fc3\u3002\u968f\u7740\u5927\u8bed\u8a00\u6a21\u578b\u7684\u8fdb\u6b65\uff0c\u6587\u672c\u5230SQL\u6280\u672f\u5d2d\u9732\u5934\u89d2\uff0c\u6781\u5927\u5730\u63d0\u5347\u4e86\u4ece\u6570\u636e\u5e93\u4e2d\u83b7\u53d6\u4fe1\u606f\u7684\u80fd\u529b\uff0c\u4f46\u540c\u65f6\u4e5f\u5f15\u53d1\u4e86\u5173\u4e8e\u9690\u79c1\u548c\u5b89\u5168\u7684\u62c5\u5fe7\u3002\u6211\u4eec\u7684\u7814\u7a76\u4e13\u6ce8\u4e8e\u63d0\u53d6\u6587\u672c\u5230SQL\u6a21\u578b\u6240\u4f9d\u8d56\u7684\u6570\u636e\u5e93\u6a21\u5f0f\u5143\u7d20\u3002\u4e86\u89e3\u6a21\u5f0f\u53ef\u80fd\u4f7fSQL\u6ce8\u5165\u653b\u51fb\u66f4\u4e3a\u5bb9\u6613\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u79cd\u96f6\u77e5\u8bc6\u6846\u67b6\uff0c\u901a\u8fc7\u63d0\u51fa\u7cbe\u5fc3\u6784\u9020\u7684\u95ee\u9898\uff0c\u65e0\u9700\u76f4\u63a5\u4e86\u89e3\u6570\u636e\u5e93\uff0c\u8be5\u6846\u67b6\u80fd\u4fc3\u4f7f\u8fd9\u4e9b\u6a21\u578b\u5904\u7406\u8fd9\u4e9b\u95ee\u9898\u5e76\u751f\u6210\u8f93\u51fa\uff0c\u4ece\u800c\u63ed\u793a\u6570\u636e\u5e93\u6a21\u5f0f\u7ed3\u6784\u3002\u6211\u4eec\u5c06\u6b64\u65b9\u6cd5\u5e94\u7528\u4e8e\u9488\u5bf9\u6587\u672c-SQL\u5bf9\u8fdb\u884c\u8fc7\u5fae\u8c03\u7684\u4e13\u7528\u6587\u672c\u5230SQL\u6a21\u578b\u4ee5\u53ca\u7528\u4e8eSQL\u751f\u6210\u7684\u751f\u6210\u5f0f\u8bed\u8a00\u6a21\u578b\u3002\u7ed3\u679c\u663e\u793a\uff0c\u5bf9\u4e8e\u5fae\u8c03\u6a21\u578b\uff0c\u6211\u4eec\u80fd\u591f\u4ee5\u63a5\u8fd10.75\u7684F1\u5206\u6570\u91cd\u6784\u8868\u540d\uff0c\u800c\u5bf9\u4e8e\u751f\u6210\u5f0f\u6a21\u578b\uff0c\u8fd9\u4e00\u5206\u6570\u66f4\u662f\u9ad8\u8fbe0.96\u3002|\n", "2406.14544": "|**2024-06-20**|**Prism: A Framework for Decoupling and Assessing the Capabilities of 
VLMs**|Yuxuan Qiao et.al.|[2406.14544](http://arxiv.org/abs/2406.14544)|**[link](https://github.com/sparksjoe/prism)**|**## \u7ffb\u8bd1 \u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08VLMs\uff09\u5728\u5904\u7406\u5404\u79cd\u89c6\u89c9\u95ee\u9898\u65f6\u5c55\u73b0\u51fa\u5353\u8d8a\u7684\u80fd\u529b\uff0c\u8fd9\u8981\u6c42\u6a21\u578b\u5177\u5907\u5f3a\u5927\u7684\u611f\u77e5\u548c\u63a8\u7406\u80fd\u529b\u3002\u7136\u800c\uff0c\u7531\u4e8e\u611f\u77e5\u548c\u63a8\u7406\u5728\u73b0\u6709VLM\u4e2d\u7684\u4ea4\u7ec7\u6027\uff0c\u72ec\u7acb\u8bc4\u4f30\u8fd9\u4e24\u65b9\u9762\u7684\u80fd\u529b\u9887\u5177\u6311\u6218\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u6846\u67b6\u2014\u2014Prism\uff0c\u65e8\u5728\u5206\u79bb\u89c6\u89c9\u7406\u89e3\u548c\u63a8\u7406\u5728\u89c6\u89c9\u95ee\u7b54\u4e2d\u7684\u4f5c\u7528\u3002Prism\u5206\u4e3a\u4e24\u4e2a\u9636\u6bb5\uff1a\u611f\u77e5\u9636\u6bb5\u5229\u7528VLM\u63d0\u53d6\u5e76\u4ee5\u6587\u672c\u5f62\u5f0f\u8868\u8fbe\u89c6\u89c9\u4fe1\u606f\uff1b\u63a8\u7406\u9636\u6bb5\u5219\u6839\u636e\u63d0\u53d6\u7684\u89c6\u89c9\u4fe1\u606f\uff0c\u901a\u8fc7\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u751f\u6210\u54cd\u5e94\u3002\u8fd9\u79cd\u6a21\u5757\u5316\u8bbe\u8ba1\u4f7f\u5f97\u6211\u4eec\u53ef\u4ee5\u7cfb\u7edf\u5730\u6bd4\u8f83\u548c\u8bc4\u4f30\u4e0d\u540cVLM\u7684\u611f\u77e5\u548c\u63a8\u7406\u6027\u80fd\u3002 
\u6211\u4eec\u7684\u5206\u6790\u6846\u67b6\u63d0\u4f9b\u4e86\u8bf8\u591a\u6d1e\u89c1\uff0c\u8bc1\u660e\u4e86Prism\u4f5c\u4e3a\u6210\u672c\u6548\u76ca\u9ad8\u7684\u89c6\u89c9\u8bed\u8a00\u4efb\u52a1\u89e3\u51b3\u65b9\u6848\u7684\u6f5c\u529b\u3002\u901a\u8fc7\u5c06\u4e13\u6ce8\u4e8e\u611f\u77e5\u7684\u7b80\u5316VLM\u4e0e\u4e13\u4e3a\u63a8\u7406\u8bbe\u8ba1\u7684\u5f3a\u5927LLM\u76f8\u7ed3\u5408\uff0cPrism\u5728\u901a\u7528\u89c6\u89c9\u8bed\u8a00\u4efb\u52a1\u4e0a\u53d6\u5f97\u4e86\u4f18\u5f02\u6210\u7ee9\uff0c\u540c\u65f6\u663e\u8457\u964d\u4f4e\u4e86\u8bad\u7ec3\u548c\u8fd0\u8425\u6210\u672c\u3002\u5b9a\u91cf\u8bc4\u4f30\u663e\u793a\uff0c\u5f53Prism\u914d\u5907\u57fa\u7840\u76842B LLaVA VLM\u548c\u5f00\u6e90\u7684GPT-3.5\u65f6\uff0c\u5176\u5728\u4e25\u8c28\u7684\u591a\u6a21\u6001\u57fa\u51c6MMStar\u4e0a\u7684\u8868\u73b0\u53ef\u4e0e\u5927\u5341\u500d\u7684VLM\u76f8\u5f53\u3002\u8be5\u9879\u76ee\u5df2\u53d1\u5e03\u5728\uff1ahttps://github.com/SparksJoe/Prism\u3002**|\n", "2406.14541": "|**2024-06-21**|**Are LLMs Naturally Good at Synthetic Tabular Data Generation?**|Shengzhe Xu 
et.al.|[2406.14541](http://arxiv.org/abs/2406.14541)|**[link](https://github.com/anonymou9167/anonymouscode)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u751f\u6210\u6587\u672c\u548c\u56fe\u50cf\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5176\u5728\u751f\u6210\u6700\u5e38\u89c1\u7684\u6570\u636e\u7c7b\u578b\u2014\u2014\u8868\u683c\u6570\u636e\u65b9\u9762\u7684\u6f5c\u529b\u5374\u9c9c\u6709\u7814\u7a76\u3002\u8fd9\u7bc7\u8bba\u6587\u6307\u51fa\uff0c\u76f4\u63a5\u4f7f\u7528\u6216\u7ecf\u8fc7\u4f20\u7edf\u5fae\u8c03\u7684LLMs\u5728\u4f5c\u4e3a\u5408\u6210\u8868\u683c\u751f\u6210\u5668\u65f6\u8868\u73b0\u6781\u5dee\u3002\u7531\u4e8eLLMs\u7684\u81ea\u56de\u5f52\u7279\u6027\uff0c\u968f\u673a\u987a\u5e8f\u6392\u5217\u7684\u5fae\u8c03\u4e0e\u6355\u6349\u529f\u80fd\u6027\u4f9d\u8d56\u7684\u91cd\u8981\u6027\u76f8\u6096\uff0c\u5bfc\u81f4\u5b83\u4eec\u65e0\u6cd5\u5904\u7406\u6761\u4ef6\u6df7\u5408\u5206\u5e03\uff08\u8fd9\u662f\u53cd\u6620\u73b0\u5b9e\u4e16\u754c\u7ea6\u675f\u7684\u5173\u952e\uff09\u3002\u6211\u4eec\u5c55\u793a\u4e86\u5982\u4f55\u901a\u8fc7\u4f7fLLMs\u53d8\u5f97\u611f\u77e5\u6392\u5217\u987a\u5e8f\u6765\u6539\u5584\u8fd9\u4e9b\u4e0d\u8db3\uff0c\u4ece\u800c\u63d0\u5347\u5176\u6027\u80fd\u3002**|\n", "2406.14517": "|**2024-06-20**|**PostMark: A Robust Blackbox Watermark for Large Language Models**|Yapei Chang 
et.al.|[2406.14517](http://arxiv.org/abs/2406.14517)|**[link](https://github.com/lilakk/postmark)**|**\u6700\u6709\u6548\u7684\u68c0\u6d4b\u751f\u6210\u5f0f\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u6587\u672c\u7684\u65b9\u6cd5\u662f\u901a\u8fc7\u5728\u89e3\u7801\u8fc7\u7a0b\u4e2d\u63d2\u5165\u53ef\u8bc6\u522b\u7684\u6807\u8bb0\uff0c\u5373\u6c34\u5370\u3002\u7136\u800c\uff0c\u5927\u591a\u6570\u73b0\u6709\u65b9\u6cd5\u4f9d\u8d56\u4e8e\u83b7\u53d6\u5230LLM\u7684\u539f\u59cb\u6982\u7387\uff08logits\uff09\uff0c\u8fd9\u4f7f\u5f97LLM\u670d\u52a1\u63d0\u4f9b\u5546\u4e0d\u613f\u5206\u4eab\uff0c\u56e0\u4e3a\u62c5\u5fc3\u6a21\u578b\u6cc4\u9732\u95ee\u9898\u3002\u56e0\u6b64\uff0c\u8fd9\u4e9b\u6c34\u5370\u9700\u8981\u6bcf\u4e2a\u63d0\u4f9b\u8005\u72ec\u7acb\u5f00\u53d1\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u540e\u5904\u7406\u6c34\u5370\u65b9\u6848\uff0c\u540d\u4e3aPostMark\u3002\u5b83\u662f\u4e00\u79cd\u6a21\u5757\u5316\u7684\u3001\u751f\u6210\u540e\u63d2\u5165\u7684\u6c34\u5370\u7b56\u7565\uff0c\u65e0\u9700\u89e6\u53calogits\uff0c\u9002\u5408\u7b2c\u4e09\u65b9\u5b9e\u65bd\u3002PostMark\u8868\u73b0\u51fa\u66f4\u5f3a\u7684\u5bf9\u6297\u540c\u4e49\u53e5\u653b\u51fb\u80fd\u529b\uff1a\u6211\u4eec\u5728\u5b9e\u9a8c\u4e2d\u6db5\u76d6\u4e86\u516b\u4e2a\u57fa\u7840\u7b97\u6cd5\u3001\u4e94\u4e2a\u57fa\u7ebfLLM\u548c\u4e09\u4e2a\u6570\u636e\u96c6\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u8bc4\u4f30\u4e86PostMark\u5bf9\u6587\u672c\u8d28\u91cf\u7684\u5f71\u54cd\uff0c\u5305\u62ec\u81ea\u52a8\u5316\u548c\u4eba\u5de5\u8bc4\u4f30\uff0c\u63a2\u8ba8\u4e86\u8d28\u91cf\u548c\u6297\u6539\u5199\u653b\u51fb\u4e4b\u95f4\u7684\u6743\u8861\u3002\u7814\u7a76\u4ee3\u7801\u3001\u8f93\u51fa\u548c\u6ce8\u91ca\u5df2\u516c\u5f00\u5728https://github.com/lilakk/PostMark\u3002**|\n", "2406.15341": "|**2024-06-21**|**GenoTEX: A Benchmark for Evaluating LLM-Based Exploration of Gene Expression Data in Alignment with Bioinformaticians**|Haoyang Liu 
et.al.|[2406.15341](http://arxiv.org/abs/2406.15341)|**[link](https://github.com/liu-hy/genotex)**|**## \u7ffb\u8bd1 \u8fd1\u5e74\u6765\uff0c\u673a\u5668\u5b66\u4e60\u7684\u8fdb\u6b65\u663e\u8457\u63d0\u5347\u4e86\u4ece\u57fa\u56e0\u8868\u8fbe\u6570\u636e\u4e2d\u8bc6\u522b\u75be\u75c5\u76f8\u5173\u57fa\u56e0\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u8fc7\u7a0b\u5f80\u5f80\u9700\u8981\u6df1\u539a\u7684\u4e13\u957f\u548c\u5927\u91cf\u7684\u4eba\u5de5\u52aa\u529b\uff0c\u9650\u5236\u4e86\u5176\u53ef\u6269\u5c55\u6027\u3002\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u9a71\u52a8\u7684\u4ee3\u7406\u663e\u793a\u51fa\u5728\u81ea\u52a8\u5316\u6b64\u7c7b\u4efb\u52a1\u65b9\u9762\u7684\u6f5c\u529b\uff0c\u56e0\u4e3a\u5b83\u4eec\u7684\u95ee\u9898\u89e3\u51b3\u80fd\u529b\u65e5\u76ca\u589e\u5f3a\u3002\u4e3a\u4e86\u652f\u6301\u8fd9\u7c7b\u65b9\u6cd5\u7684\u8bc4\u4f30\u548c\u53d1\u5c55\uff0c\u6211\u4eec\u521b\u5efa\u4e86GenoTEX\uff0c\u8fd9\u662f\u4e00\u4e2a\u57fa\u56e0\u8868\u8fbe\u6570\u636e\u5206\u6790\u81ea\u52a8\u63a2\u7d22\u7684\u57fa\u51c6\uff0c\u5305\u62ec\u6570\u636e\u96c6\u9009\u62e9\u3001\u9884\u5904\u7406\u548c\u7edf\u8ba1\u5206\u6790\u4efb\u52a1\u3002GenoTEX\u63d0\u4f9b\u4e86\u5168\u9762\u7684\u5206\u6790\u7ba1\u9053\uff0c\u5176\u4e2d\u5305\u542b\u4e86\u4eba\u7c7b\u751f\u7269\u4fe1\u606f\u5b66\u5bb6\u7cbe\u5fc3\u7f16\u5199\u7684\u6ce8\u91ca\uff0c\u4ed6\u4eec\u5bf9\u6570\u636e\u96c6\u8fdb\u884c\u6df1\u5165\u5206\u6790\u4ee5\u786e\u4fdd\u51c6\u786e\u6027\u548c\u53ef\u9760\u6027\u3002 
\u4e3a\u4e86\u63d0\u4f9b\u8fd9\u4e9b\u4efb\u52a1\u7684\u57fa\u7ebf\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86GenoAgents\uff0c\u8fd9\u662f\u4e00\u4e2a\u57fa\u4e8eLLMs\u7684\u4ee3\u7406\u56e2\u961f\uff0c\u5177\u5907\u4e0a\u4e0b\u6587\u611f\u77e5\u89c4\u5212\u3001\u8fed\u4ee3\u6821\u6b63\u4ee5\u53ca\u4e0e\u9886\u57df\u4e13\u5bb6\u54a8\u8be2\u7684\u80fd\u529b\uff0c\u5b83\u4eec\u534f\u4f5c\u63a2\u7d22\u57fa\u56e0\u6570\u636e\u96c6\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u663e\u793a\u4e86LLM\u9a71\u52a8\u65b9\u6cd5\u5728\u57fa\u56e0\u7ec4\u6570\u636e\u5206\u6790\u4e2d\u7684\u6f5c\u529b\uff0c\u800c\u9519\u8bef\u5206\u6790\u6307\u51fa\u4e86\u6311\u6218\u548c\u672a\u6765\u7684\u6539\u8fdb\u65b9\u5411\u3002\u6211\u4eec\u63d0\u8baeGenoTEX\u4f5c\u4e3a\u4e00\u4e2a\u6709\u524d\u666f\u7684\u8d44\u6e90\uff0c\u7528\u4e8e\u8861\u91cf\u548c\u63d0\u5347\u4eba\u5de5\u667a\u80fd\u9a71\u52a8\u7684\u57fa\u56e0\u7ec4\u6570\u636e\u5206\u6790\u65b9\u6cd5\u3002\u6211\u4eec\u7684\u57fa\u51c6\u5df2\u516c\u5f00\u53d1\u5e03\u5728\uff1a\\url{https://github.com/Liu-Hy/GenoTex}\u3002**|\n", "2406.15330": "|**2024-06-21**|**Gradient-Mask Tuning Elevates the Upper Limits of LLM Performance**|Haoling Li 
et.al.|[2406.15330](http://arxiv.org/abs/2406.15330)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5df2\u7ecf\u5728\u4f17\u591a\u7814\u7a76\u9886\u57df\u5e26\u6765\u4e86\u9769\u65b0\u3002\u5c3d\u7ba1\u4eba\u4eec\u666e\u904d\u77e5\u9053\u5fae\u8c03\u5bf9\u4e8e\u589e\u5f3aLLMs\u7684\u529f\u80fd\u81f3\u5173\u91cd\u8981\uff0c\u4f46\u73b0\u6709\u7814\u7a76\u8868\u660e\uff0c\u5fae\u8c03\u8fc7\u7a0b\u4e2d\u53ef\u80fd\u5b58\u5728\u53c2\u6570\u5197\u4f59\u3002\u56e0\u6b64\uff0c\u6709\u7814\u7a76\u5efa\u8bae\u53ea\u66f4\u65b0\u90e8\u5206\u53c2\u6570\uff0c\u4f46\u8fd9\u672a\u80fd\u6709\u6548\u5229\u7528\u4efb\u52a1\u7279\u5b9a\u4fe1\u606f\u6765\u8bc6\u522b\u8bad\u7ec3\u4e2d\u7684\u91cd\u8981\u53c2\u6570\u3002\u8003\u8651\u5230\u68af\u5ea6\u672c\u8d28\u4e0a\u8574\u542b\u7740\u4efb\u52a1\u76f8\u5173\u6570\u636e\u7684\u4fe1\u606f\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u68af\u5ea6\u63a9\u7801\u8c03\u4f18\uff08Gradient-Mask Tuning\uff0cGMT\uff09\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u6839\u636e\u53c2\u6570\u7684\u68af\u5ea6\u4fe1\u606f\u9009\u62e9\u6027\u5730\u8fdb\u884c\u8bad\u7ec3\u66f4\u65b0\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u8ba1\u7b97\u68af\u5ea6\u7684\u7edd\u5bf9\u503c\uff0c\u5e76\u5bf9\u8f83\u5c0f\u5e45\u5ea6\u7684\u68af\u5ea6\u5e94\u7528\u63a9\u7801\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cGMT\u4e0d\u4ec5\u4f18\u4e8e\u4f20\u7edf\u7684\u5fae\u8c03\u65b9\u6cd5\uff0c\u8fd8\u63d0\u5347\u4e86LLM\u6027\u80fd\u7684\u4e0a\u9650\u3002\u8fdb\u4e00\u6b65\u5206\u6790\u663e\u793a\uff0cGMT\u5bf9\u63a9\u7801\u6bd4\u4f8b\u5177\u6709\u4e00\u5b9a\u7684\u9c81\u68d2\u6027\uff0c\u5e76\u4e14\u5728\u8ba1\u7b97\u6548\u7387\u4e0a\u4e0e\u57fa\u672c\u7684\u5fae\u8c03\uff08Simple Fine-Tuning\uff0cSFT\uff09\u76f8\u5f53\u3002|\n", "2406.15325": "|**2024-06-21**|**Bug In the Code Stack: Can LLMs Find Bugs in Large Python Code Stacks**|Hokyung Lee 
et.al.|[2406.15325](http://arxiv.org/abs/2406.15325)|**[link](https://github.com/hamminghq/bug-in-the-code-stack)**|\u8fd1\u5e74\u6765\uff0c\u9488\u5bf9\u9488\u5bf9\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u6d77\u91cf\u6587\u672c\u6587\u6863\u4e2d\u68c0\u7d22\u4e0a\u4e0b\u6587\u4fe1\u606f\u7684Needle-in-a-Haystack\uff08NIAH\uff09\u57fa\u51c6\u7814\u7a76\u6709\u6240\u8fdb\u5c55\u3002\u968f\u7740LLMs\u5728\u8f6f\u4ef6\u5f00\u53d1\u6d41\u7a0b\u4e2d\u7684\u65e5\u76ca\u878d\u5408\uff0c\u8bc4\u4f30\u5b83\u4eec\u5728\u4ee3\u7801\u73af\u5883\u4e2d\u7684\u8868\u73b0\u53d8\u5f97\u81f3\u5173\u91cd\u8981\u3002\u968f\u7740LLMs\u671d\u7740\u7a0b\u5e8f\u5408\u6210\u65b9\u5411\u53d1\u5c55\uff0c\u5fc5\u987b\u786e\u4fdd\u5b83\u4eec\u80fd\u7406\u89e3\u8bed\u6cd5\u5e76\u7f16\u5199\u51fa\u7b26\u5408\u8bed\u6cd5\u89c4\u5219\u7684\u4ee3\u7801\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86Bug In The Code Stack\uff08BICS\uff09\u57fa\u51c6\u6d4b\u8bd5\uff0c\u65e8\u5728\u68c0\u9a8cLLMs\u8bc6\u522b\u7b80\u5355\u8bed\u6cd5\u9519\u8bef\u7684\u80fd\u529b\u4e8e\u5927\u578b\u6e90\u4ee3\u7801\u4e2d\u3002\u6211\u4eec\u7684\u7814\u7a76\u53d1\u73b0\u4e09\u4e2a\u5173\u952e\u70b9\uff1a\uff081\uff09\u4e0e\u6587\u672c\u73af\u5883\u76f8\u6bd4\uff0c\u57fa\u4e8e\u4ee3\u7801\u7684\u73af\u5883\u5bf9\u68c0\u7d22\u4efb\u52a1\u6784\u6210\u4e86\u66f4\u5927\u7684\u6311\u6218\uff1b\uff082\uff09\u4e0d\u540c\u6a21\u578b\u4e4b\u95f4\u7684\u6027\u80fd\u5b58\u5728\u663e\u8457\u5dee\u5f02\uff1b\uff083\uff09\u5c3d\u7ba1\u5982\u6b64\uff0c\u8f83\u957f\u7684\u4e0a\u4e0b\u6587\u957f\u5ea6\u4e0e\u6027\u80fd\u4e0b\u964d\u4e4b\u95f4\u5b58\u5728\u5173\u8054\uff0c\u4f46\u8fd9\u79cd\u4e0b\u964d\u7a0b\u5ea6\u5728\u4e0d\u540c\u7684\u6a21\u578b\u95f4\u6709\u6240\u4e0d\u540c\u3002|\n", "2406.15264": "|**2024-06-21**|**Towards Fine-Grained Citation Evaluation in Generated Text: A Comparative Analysis of Faithfulness Metrics**|Weijia Zhang 
et.al.|[2406.15264](http://arxiv.org/abs/2406.15264)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5e38\u5e38\u4ea7\u751f\u4e0d\u53ef\u9760\u6216\u96be\u4ee5\u9a8c\u8bc1\u7684\u4fe1\u606f\uff0c\u5373\u201c\u5e7b\u89c9\u201d\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u68c0\u7d22\u589e\u5f3a\u7684LLMs\u5f15\u5165\u4e86\u5f15\u7528\uff0c\u4f7f\u5185\u5bb9\u57fa\u4e8e\u53ef\u6838\u67e5\u7684\u6765\u6e90\u3002\u7136\u800c\uff0c\u624b\u52a8\u8bc4\u4f30\u5f15\u7528\u662f\u5426\u5145\u5206\u652f\u6301\u76f8\u5173\u9648\u8ff0\u4ecd\u7136\u662f\u4e00\u4e2a\u91cd\u5927\u6311\u6218\u3002\u5148\u524d\u7684\u7814\u7a76\u8bd5\u56fe\u901a\u8fc7\u4fe1\u4ef0\u5ea6\u6307\u6807\u81ea\u52a8\u4f30\u8ba1\u5f15\u7528\u7684\u652f\u6301\u7a0b\u5ea6\uff0c\u4f46\u8fd9\u4e9b\u65b9\u6cd5\u4ec5\u9650\u4e8e\u4e8c\u5206\u7c7b\uff0c\u5ffd\u89c6\u4e86\u5b9e\u9645\u573a\u666f\u4e2d\u5bf9\u7cbe\u7ec6\u7ea7\u522b\u5f15\u7528\u652f\u6301\u7684\u8003\u91cf\u3002\u4e3a\u4e86\u63a2\u7a76\u4fe1\u4ef0\u5ea6\u6307\u6807\u5728\u7cbe\u7ec6\u7ea7\u522b\u8bc4\u4f30\u4e2d\u7684\u6709\u6548\u6027\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u6bd4\u8f83\u8bc4\u4f30\u6846\u67b6\uff0c\u7528\u4e8e\u68c0\u9a8c\u8fd9\u4e9b\u6307\u6807\u5728\u533a\u5206\u4e09\u79cd\u652f\u6301\u7b49\u7ea7\uff08\u5168\u9762\u3001\u90e8\u5206\u548c\u65e0\u652f\u6301\uff09\u4e4b\u95f4\u7684\u80fd\u529b\uff1a\u5168\u9762\u652f\u6301\u3001\u90e8\u5206\u652f\u6301\u548c\u4e0d\u652f\u6301\u3002\u6211\u4eec\u7684\u6846\u67b6\u91c7\u7528\u76f8\u5173\u6027\u5206\u6790\u3001\u5206\u7c7b\u8bc4\u4f30\u548c\u68c0\u7d22\u8bc4\u4f30\uff0c\u5168\u65b9\u4f4d\u8861\u91cf\u6307\u6807\u5206\u6570\u4e0e\u4eba\u7c7b\u5224\u65ad\u7684\u4e00\u81f4\u6027\u3002\u7814\u7a76\u7ed3\u679c\u663e\u793a\uff0c\u6ca1\u6709\u5355\u4e00\u6307\u6807\u5728\u6240\u6709\u8bc4\u4f30\u4e2d\u8868\u73b0\u51fa\u8272\uff0c\u63ed\u793a\u4e86\u7cbe\u7ec6\u7ea7\u522b\u652f\u6301\u8bc4\u4f30\u7684\u590d\u6742\u6027\u3002\u6839\u636e\u53d1\u73b0\u7684\u7ed3\u679c\u
ff0c\u6211\u4eec\u4e3a\u5f00\u53d1\u66f4\u6709\u6548\u7684\u6307\u6807\u63d0\u4f9b\u4e86\u5b9e\u7528\u5efa\u8bae\u3002|\n", "2406.15231": "|**2024-06-21**|**Detecting Synthetic Lyrics with Few-Shot Inference**|Yanis Labrak et.al.|[2406.15231](http://arxiv.org/abs/2406.15231)|null|\u8fd1\u5e74\u6765\uff0c\u751f\u6210\u7684\u97f3\u4e50\u5185\u5bb9\u9010\u6e10\u53d7\u5230\u5173\u6ce8\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\u88ab\u6709\u6548\u5e94\u7528\u4e8e\u521b\u4f5c\u5404\u79cd\u98ce\u683c\u3001\u4e3b\u9898\u548c\u8bed\u8a00\u7ed3\u6784\u7684\u6b4c\u8bcd\uff0c\u8fd9\u63a8\u52a8\u4e86\u827a\u672f\u5bb6\u4eec\u7684\u521b\u4f5c\uff0c\u4f46\u4e5f\u5e26\u6765\u4e86\u7248\u6743\u4fb5\u72af\u3001\u6d88\u8d39\u8005\u6ee1\u610f\u5ea6\u548c\u5185\u5bb9\u6ee5\u53d1\u7b49\u95ee\u9898\u3002\u4e3a\u6b64\uff0c\u68c0\u6d4b\u751f\u6210\u6b4c\u8bcd\u7684\u65b9\u6cd5\u53d8\u5f97\u81f3\u5173\u91cd\u8981\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u7814\u7a76\u5e76\u672a\u4e13\u6ce8\u4e8e\u8fd9\u4e00\u7279\u5b9a\u9886\u57df\u6216\u521b\u610f\u6587\u672c\u7684\u673a\u5668\u751f\u6210\u5185\u5bb9\u68c0\u6d4b\u3002\u9488\u5bf9\u8fd9\u4e00\u7a7a\u767d\uff0c\u6211\u4eec\u7cbe\u5fc3\u6784\u5efa\u4e86\u9996\u4e2a\u9ad8\u8d28\u91cf\u5408\u6210\u6b4c\u8bcd\u6570\u636e\u96c6\uff0c\u5e76\u5bf9\u591a\u79cd\u57fa\u4e8e\u5c11\u91cf\u6837\u672c\u7684\u68c0\u6d4b\u65b9\u6cd5\u8fdb\u884c\u4e86\u8be6\u5c3d\u7684\u5b9a\u91cf\u8bc4\u4f30\uff0c\u6d4b\u8bd5\u5b83\u4eec\u7684\u6cdb\u5316\u80fd\u529b\uff0c\u5e76\u8f85\u4ee5\u4eba\u7c7b\u8bc4\u4ef7\u3002\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u6700\u4f73\u5c11\u6570\u6837\u672c\u68c0\u6d4b\u5668\u2014\u2014\u57fa\u4e8eLLM2Vec\u7684\u65b9\u6cd5\u8d85\u8d8a\u4e86\u5728\u5176\u4ed6\u9886\u57df\u8868\u73b0\u5f3a\u52b2\u7684\u98ce\u683c\u548c\u7edf\u8ba1\u65b9\u6cd5\uff0c\u6210\u529f\u9274\u522b\u51fa\u4eba\u7c7b\u521b\u4f5c\u4e0e\u673a\u5668\u751f\u6210\u7684\u6b4c\u8bcd\uff0c\u4e14\u5c55\u73b0\u51fa\u826f\u597d\u7684\u8de8\u827a\u672f\u5bb6\u548c\u6a21\u
578b\u6cdb\u5316\u80fd\u529b\uff0c\u8fd8\u80fd\u6709\u6548\u8bc6\u522b\u751f\u6210\u540e\u7684\u4eba\u5de5\u6da6\u8272\u3002\u8fd9\u9879\u7814\u7a76\u5f3a\u8c03\u4e86\u5728\u521b\u610f\u5185\u5bb9\u68c0\u6d4b\u9886\u57df\uff0c\u7279\u522b\u662f\u6cdb\u5316\u80fd\u529b\u548c\u5bf9\u66f4\u5927\u6b4c\u66f2\u5e93\u7684\u9002\u5e94\u6027\u65b9\u9762\uff0c\u9700\u8981\u8fdb\u4e00\u6b65\u7814\u7a76\u3002\u6240\u6709\u6570\u636e\u96c6\u3001\u9884\u5904\u7406\u811a\u672c\u548c\u4ee3\u7801\u5df2\u516c\u5f00\u5728GitHub\u548cHugging Face\u4e0a\uff0c\u9075\u5faaApache 2.0\u8bb8\u53ef\u534f\u8bae\u3002|\n", "2406.15227": "|**2024-06-21**|**A LLM-Based Ranking Method for the Evaluation of Automatic Counter-Narrative Generation**|Irune Zubiaga et.al.|[2406.15227](http://arxiv.org/abs/2406.15227)|**[link](https://github.com/hitz-zentroa/cn-eval)**|\u968f\u7740\u7f51\u7edc\u4e0a\u9519\u8bef\u4fe1\u606f\u548c\u6709\u5bb3\u8a00\u8bba\u7684\u589e\u591a\uff0c\u8feb\u5207\u9700\u8981\u6709\u6548\u7684\u53cd\u53d9\u4e8b\uff08Counter Narrative\uff0cCN\uff09\u751f\u6210\u6280\u672f\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u81ea\u52a8\u8bc4\u4f30\u65b9\u6cd5\u5f80\u5f80\u7f3a\u4e4f\u53ef\u89e3\u91ca\u6027\uff0c\u65e0\u6cd5\u51c6\u786e\u53cd\u6620\u751f\u6210\u7684CN\u4e0e\u4eba\u7c7b\u611f\u77e5\u4e4b\u95f4\u7684\u590d\u6742\u5173\u7cfb\u3002\u4e3a\u6b64\uff0c\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\u6765\u8bc4\u4f30\u751f\u6210\u7684CN\uff0c\u5373\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08Large Language 
Model\uff0cLLM\uff09\u4f5c\u4e3a\u8bc4\u4f30\u5668\u3002\u901a\u8fc7\u4ee5\u9526\u6807\u8d5b\u5f62\u5f0f\u5bf9\u751f\u6210\u7684CN\u8fdb\u884c\u5bf9\u6218\u6bd4\u8f83\uff0c\u6211\u4eec\u5efa\u7acb\u4e86\u4e00\u4e2a\u6a21\u578b\u6392\u540d\u6d41\u7a0b\uff0c\u5176\u4e0e\u4eba\u7c7b\u504f\u597d\u95f4\u7684\u76f8\u5173\u7cfb\u6570\u8fbe\u52300.88\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63a2\u8ba8\u4e86\u4f7f\u7528LLM\u8fdb\u884c\u96f6\u6837\u672c\uff08Zero-Shot\uff0cZS\uff09CN\u751f\u6210\u7684\u80fd\u529b\uff0c\u5bf9\u6bd4\u5206\u6790\u4e86\u804a\u5929\u3001\u6307\u4ee4\u548c\u57fa\u7840\u6a21\u578b\u7684\u6027\u80fd\u548c\u5c40\u9650\u6027\u3002\u901a\u8fc7\u7ec6\u81f4\u7684\u8bc4\u4f30\uff0c\u5305\u62ec\u5fae\u8c03\u5b9e\u9a8c\uff0c\u6211\u4eec\u63ed\u793a\u4e86\u5728\u7279\u5b9a\u9886\u57df\u6570\u636e\u4e0b\u7684\u54cd\u5e94\u5dee\u5f02\u3002\u7ed3\u8bba\u662f\uff0c\u5bf9\u4e8e\u6267\u884c\u8fd9\u9879\u4efb\u52a1\uff0c\u5982\u679c\u80fd\u907f\u514d\u56e0\u5b89\u5168\u987e\u8651\u800c\u62d2\u7edd\u751f\u6210\uff0c\u804a\u5929\u5bfc\u5411\u7684ZS\u6a21\u578b\u53ef\u80fd\u662f\u6700\u4f73\u9009\u62e9\u3002|\n", "2406.15214": "|**2024-06-21**|**Unsupervised Extraction of Dialogue Policies from Conversations**|Makesh Narsimhan Sreedhar et.al.|[2406.15214](http://arxiv.org/abs/2406.15214)|null|## \u7ffb\u8bd1 
\u5bf9\u8bdd\u7b56\u7565\u5728\u6784\u5efa\u4efb\u52a1\u5bfc\u5411\u7684\u5bf9\u8bdd\u7cfb\u7edf\u4e2d\u81f3\u5173\u91cd\u8981\uff0c\u4f46\u5176\u5f00\u53d1\u548c\u7ef4\u62a4\u5f80\u5f80\u9700\u8981\u5bf9\u8bdd\u5efa\u6a21\u4e13\u5bb6\u7684\u5927\u91cf\u6295\u5165\u3002\u5c3d\u7ba1\u5728\u8bb8\u591a\u60c5\u51b5\u4e0b\uff0c\u624b\u5934\u6709\u5927\u91cf\u7684\u5bf9\u8bdd\u6570\u636e\uff0c\u4f46\u4eba\u4eec\u7f3a\u4e4f\u6709\u6548\u7684\u65b9\u6cd5\u4ece\u8fd9\u4e9b\u6570\u636e\u4e2d\u63d0\u53d6\u5bf9\u8bdd\u7b56\u7565\u3002\u4e3a\u6b64\uff0c\u672c\u6587\u901a\u8fc7\u5c55\u793a\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5982\u4f55\u5728\u5bf9\u8bdd\u6570\u636e\u8f6c\u5316\u4e3a\u7edf\u4e00\u7684\u4e2d\u95f4\u8868\u793a\u2014\u2014\u89c4\u8303\u5f62\u5f0f\u7684\u8fc7\u7a0b\u4e2d\u53d1\u6325\u4f5c\u7528\uff0c\u586b\u8865\u4e86\u8fd9\u4e00\u7a7a\u767d\u3002\u63a5\u7740\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u5229\u7528\u53ef\u63a7\u4e14\u53ef\u89e3\u91ca\u7684\u56fe\u57fa\u65b9\u6cd5\u751f\u6210\u5bf9\u8bdd\u7b56\u7565\u7684\u6280\u672f\u3002\u901a\u8fc7\u5c06\u5bf9\u8bdd\u4e2d\u7684\u89c4\u8303\u5f62\u5f0f\u6574\u5408\u6210\u6d41\u7a0b\u7f51\u7edc\uff0c\u6211\u4eec\u53d1\u73b0\u8fd0\u884c\u56fe\u904d\u5386\u7b97\u6cd5\u6709\u52a9\u4e8e\u63d0\u53d6\u5bf9\u8bdd\u6d41\u7a0b\u3002\u76f8\u6bd4\u4ec5\u4f9d\u8d56LLM\u63d0\u53d6\u7684\u6d41\u7a0b\uff0c\u8fd9\u4e9b\u6d41\u7a0b\u66f4\u597d\u5730\u53cd\u6620\u4e86\u5e95\u5c42\u4ea4\u4e92\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u65e8\u5728\u8d4b\u4e88\u5bf9\u8bdd\u8bbe\u8ba1\u8005\u66f4\u5927\u7684\u63a7\u5236\u529b\uff0c\u63d0\u4f9b\u4e00\u4e2a\u63d0\u5347\u5bf9\u8bdd\u7b56\u7565\u5f00\u53d1\u6548\u7387\u7684\u5de5\u5177\u3002|\n", "2406.15209": "|**2024-06-21**|**Prompting Whisper for QA-driven Zero-shot End-to-end Spoken Language Understanding**|Mohan Li et.al.|[2406.15209](http://arxiv.org/abs/2406.15209)|null|## \u80cc\u666f 
Zero-shot spoken language understanding (SLU) lets a system understand user utterances in new domains without prior training data. Current research often relies on large language models (LLMs), which brings large storage requirements and complexity. This paper proposes using Whisper, a standalone speech-processing model, for zero-shot end-to-end (E2E) SLU. To handle unseen semantic labels, we cast SLU tasks into a question-answering (QA) framework and prompt the Whisper decoder for semantic inference. We train the system efficiently with prefix-tuning, optimizing only a small number of parameters rather than the whole Whisper model. Experiments show that the proposed system improves the slot-filling score (SLU-F1) on SLURP by 40.7% over a recently introduced zero-shot benchmark. In both in-domain and cross-domain evaluation settings it performs on par with a modular Whisper-GPT-2 system while using 34.8% fewer model parameters.|\n", "2406.15198": "|**2024-06-21**|**Exploring the Efficacy of Robotic Assistants with ChatGPT and Claude in Enhancing ADHD Therapy: Innovating Treatment Paradigms**|Santiago Berrezueta-Guzman
et.al.|[2406.15198](http://arxiv.org/abs/2406.15198)|null|Attention deficit hyperactivity disorder (ADHD) is a neurodevelopmental disorder characterized by inattention, hyperactivity, and impulsivity, and it seriously affects daily functioning and quality of life. Occupational therapy plays a key role in managing ADHD by building the skills needed for daily living and improving a person's ability to participate fully in school, home, and social settings. Recent studies highlight the potential value in psychotherapy of large language models (LLMs) such as ChatGPT and of socially assistive robots, which can compensate for the limits of existing therapies by providing tailored support that adapts to each individual's needs. Research on combining these technologies in ADHD therapy, however, remains scarce. We therefore integrate two advanced language models, ChatGPT-4 Turbo and Claude-3 Opus, into a robotic assistant to examine how they perform in robot-assisted interactions, and we compare them with a clinically validated custom model in a simulated therapy scenario. The results show that ChatGPT-4 Turbo excels in performance and response speed, which suits time-sensitive applications. Claude-3 Opus shows strengths in understanding, coherence, and ethical considerations, prioritizing safe and engaging interactions. Both models demonstrate innovation and adaptability, but ChatGPT-4 Turbo has the edge in ease of integration and language support. Which model to choose depends on the specific needs of the ADHD therapy.|\n", "2406.15187": "|**2024-06-21**|**UDA: A Benchmark Suite for Retrieval Augmented Generation in Real-world Document Analysis**|Yulong Hui et.al.|[2406.15187](http://arxiv.org/abs/2406.15187)|**[link](https://github.com/qinchuanhui/uda-benchmark)**|**Retrieval-augmented generation (RAG) has improved the ability of large language models (LLMs) to work with external data, but significant challenges remain in real-world scenarios. In areas such as academic literature and financial question answering, data often comes as lengthy, structurally complex text and tables in HTML or PDF format. We therefore introduce a new benchmark, Unstructured Document Analysis (UDA), which contains 2,965 real-world documents and 29,590 expert-annotated question-answer pairs. We revisit the design decisions of LLM- and RAG-based approaches to document analysis and evaluate answer quality and strategies across multiple document domains and diverse query types. Our evaluation yields interesting findings and underscores the importance of data parsing and retrieval. We hope this benchmark can inform and support real-world document analysis applications. The benchmark suite and code are publicly available.**|\n", "2406.16858": "|**2024-06-24**|**EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees**|Yuhui Li
et.al.|[2406.16858](http://arxiv.org/abs/2406.16858)|**[link](https://github.com/safeailab/eagle)**|Inference with modern large language models (LLMs) is costly and slow. Speculative sampling methods such as EAGLE have been shown experimentally to be effective. Earlier methods assume that the acceptance rate of draft tokens depends only on their position in the draft tree, but we find that it also depends on context. Building on EAGLE, we propose EAGLE-2, which introduces a new context-aware dynamic draft tree technique into drafting. The improvement exploits the fact that EAGLE's draft model is well calibrated: its confidence scores approximate acceptance rates with small error. Extensive evaluations on three series of LLMs and six tasks show that EAGLE-2 achieves speedup ratios of 3.05x to 4.26x, which is 20% to 40% faster than EAGLE-1. EAGLE-2 also leaves the distribution of the generated text unchanged, making it a lossless acceleration algorithm.|\n", "2406.16838": "|**2024-06-24**|**From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models**|Sean Welleck
et.al.|[2406.16838](http://arxiv.org/abs/2406.16838)|null|One of the most striking findings in modern research is that scaling up compute during the training of large language models (LLMs) leads to better results. Much less attention, however, has gone to methods that operate at inference time. This survey focuses on those inference-time approaches. Working within a unified mathematical framework, we examine three areas: token-level generation algorithms, meta-generation algorithms, and efficient generation. Token-level generation algorithms, often called decoding algorithms, operate by sampling a single token at a time or by constructing a token-level search space and then selecting an output. These methods typically assume access to a language model's logits, next-token distributions, or probability scores. Meta-generation algorithms work on partial or full sequences, incorporating domain knowledge, enabling backtracking, and integrating external information. Efficient generation methods aim to reduce token costs and improve generation speed. Our survey unifies perspectives from three research communities: traditional natural language processing, modern LLMs, and machine learning systems.|\n", "2406.16833": "|**2024-06-24**|**USDC: A Dataset of $\\underline{U}$ser $\\underline{S}$tance and $\\underline{D}$ogmatism in Long $\\underline{C}$onversations**|Mounika
Marreddy et.al.|[2406.16833](http://arxiv.org/abs/2406.16833)|null|Identifying users' opinions and stances in long discussions on various topics is crucial for personalization, market research, political campaigns, customer service, conflict resolution, targeted advertising, and content moderation. Manually annotating data to train such models is challenging, however: it is time-consuming and costly, long conversations can introduce noise, and subtle shifts in a user's opinion can be hard to interpret. Given the strong performance of large language models (LLMs) on complex natural language processing tasks, this paper uses Mistral Large and GPT-4 to automate the annotation process, with reasoning, for two key tasks: user stance classification, which labels the stance of a user's posts in a conversation on a five-point scale, and user dogmatism classification, which captures the user's overall opinion across the whole conversation on a four-point scale. We build the USDC dataset by majority voting over zero-shot, one-shot, and few-shot annotations on 764 multi-user Reddit conversations. We then use this dataset to fine-tune and instruction-tune several small deployable language models for the five-class stance and four-class dogmatism classification tasks. We release the code and dataset: [https://anonymous.4open.science/r/USDC-0F7F].|\n", "2406.16828": "|**2024-06-24**|**Ragnar\u00f6k: A Reusable RAG Framework and Baselines for TREC 2024 Retrieval-Augmented Generation Track**|Ronak Pradeep et.al.|[2406.16828](http://arxiv.org/abs/2406.16828)|**[link](https://github.com/castorini/ragnarok)**|Have you tried the new Bing Search or Google AI Overviews? They reflect the ongoing evolution of search engines toward systems based on retrieval-augmented generation (RAG). Such systems feed real-time data into large language models (LLMs) to provide informative, attributed, and concise summaries, in contrast to the traditional paradigm of showing a ranked list of documents. To drive innovation in evaluating RAG systems, we propose a RAG track at TREC 2024. This paper describes the steps we have taken toward that goal: we describe the design of the reusable framework Ragnar\\\"ok, explain the choice of the MS MARCO V2.1 corpus, release development topics for the track, and standardize the interface definitions to make things easier for users. We then use Ragnar\\\"ok to present key industrial baselines such as OpenAI's GPT-4o and Cohere's Command R+. We also introduce a web interface for interactively comparing the performance of different RAG systems and evaluating them through crowdsourcing. We open-source the Ragnar\\\"ok framework and baselines to establish a unified standard for future RAG systems.|\n", "2406.16801": "|**2024-06-24**|**RES-Q: Evaluating Code-Editing Large Language Model Systems at the Repository Scale**|Beck LaBash et.al.|[2406.16801](http://arxiv.org/abs/2406.16801)|**[link](https://github.com/qurrent-ai/res-q)**|**The instruction-following ability of large language models (LLMs) has given rise to a class of systems that can handle complex tasks such as editing large code repositories. Because LLM behavior is highly sensitive to, and unpredictable under, changes in prompting, robust evaluation tools are urgently needed to drive further development of these systems. We propose RES-Q, a natural-language instruction benchmark for $\\textbf{R}$epository $\\textbf{E}$diting $\\textbf{S}$ystems, consisting of 100 repository-editing tasks derived from 100 real GitHub commits. Given an edit instruction and a code repository, RES-Q evaluates an LLM system's ability to gather information and construct an edit that satisfies the instruction. We argue that this form of evaluation improves on traditional approaches and assesses a model's capabilities more comprehensively. We build a repository-editing system on the Qurrent OS language-agent development software and evaluate state-of-the-art LLMs such as Claude Sonnet 3.5 and GPT-4o within it. Although their pass@1 scores on HumanEval differ by only 1%, Claude Sonnet 3.5 outperforms GPT-4o by 12% pass@1 on RES-Q, which suggests that RES-Q can differentiate model capability and offer deeper insight as traditional benchmarks approach saturation. We also study token efficiency, performance relationships with existing benchmarks, and interesting differences between closed-source and open-source LLMs. The code and dataset are available at https://github.com/Qurrent-AI/RES-Q.**|\n", "2406.16797": "|**2024-06-24**|**Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs**|Ashwinee Panda et.al.|[2406.16797](http://arxiv.org/abs/2406.16797)|**[link](https://github.com/kiddyboots216/lottery-ticket-adaptation)**|**Existing methods for adapting large language models (LLMs) to new tasks are poorly suited to multi-task adaptation because they modify all model weights, causing destructive interference between tasks. This can lead to forgetting of earlier tasks and makes it hard to perform well on several tasks at once. To address this, we propose Lottery Ticket Adaptation (LoTA), a sparse adaptation method that identifies and optimizes only a sparse subnetwork of the model. We evaluate LoTA on challenging tasks such as instruction following, reasoning, math, and summarization.
LoTA works by discovering and optimizing lottery tickets (sparse task vectors), and it outperforms both full fine-tuning and low-rank adaptation (LoRA). Beyond its better performance, LoTA retains good performance even after training on other tasks, thereby avoiding catastrophic forgetting. By extracting lottery tickets and fine-tuning over them for specific tasks, LoTA also enables model merging across highly dissimilar tasks. Overall, LoTA is an effective sparse adaptation strategy that offers a new approach to multi-task adaptation of large language models, maintaining stable and efficient performance across multiple tasks.**|\n", "2406.16783": "|**2024-06-24**|**M2Lingual: Enhancing Multilingual, Multi-Turn Instruction Alignment in Large Language Models**|Rishabh Maheshwary et.al.|[2406.16783](http://arxiv.org/abs/2406.16783)|null|Instruction fine-tuning (IFT) is critical for aligning large language models (LLMs) to follow instructions. Several effective IFT datasets have been proposed recently, but most focus on high-resource languages such as English. In this work we propose M2Lingual, a fully synthetic, multilingual, multi-turn instruction fine-tuning dataset guided by an Evol taxonomy, with the goal of improving LLM performance across diverse languages and tasks. M2Lingual contains 182,000 IFT pairs built from diverse seeds and covers 70 languages, 17 NLP tasks, and general instruction-response pairs. LLMs trained on M2Lingual substantially outperform those trained on most existing multilingual IFT datasets. More importantly, models fine-tuned with M2Lingual show robust cross-lingual ability across a range of evaluation benchmarks, both on our translated multilingual, multi-turn evaluation benchmark and on a wide variety of multilingual tasks. We therefore contribute the two-step Evol taxonomy and release the M2Lingual dataset: https://huggingface.co/datasets/ServiceNow-AI/M2Lingual.|\n", "2406.16779": "|**2024-06-24**|**It Is Not About What You Say, It Is About How You Say It: A Surprisingly Simple Approach for Improving Reading Comprehension**|Sagi Shaier
et.al.|[2406.16779](http://arxiv.org/abs/2406.16779)|null|Natural language processing has advanced significantly over the past decade, yet some practices have become established without proper evaluation. For reading comprehension, we first ask: 1) how does the order of the inputs (question and context) affect model performance? Given recent advances in input emphasis, we then ask: 2) does emphasizing the question, the context, or both improve performance? Testing 9 large language models on 3 datasets, we find that presenting the context before the question improves model performance, with accuracy gains of up to 31%. Emphasizing the context works better than emphasizing the question, and emphasizing parts of the input is especially effective for questions the model lacks the parametric knowledge to answer. After trying both prompt-based and attention-based emphasis methods, we find that the most effective strategy is surprisingly simple: adding a few tokens to the input yields accuracy gains of up to 36%, allowing smaller models to outperform much larger counterparts.|\n", "2406.16777": "|**2024-06-24**|**Blending LLMs into
Cascaded Speech Translation: KIT's Offline Speech Translation System for IWSLT 2024**|Sai Koneru et.al.|[2406.16777](http://arxiv.org/abs/2406.16777)|null|Large language models (LLMs) are being widely explored for tasks such as automatic speech recognition (ASR), machine translation (MT), and even end-to-end speech translation (ST). This paper presents KIT's offline submission in the constrained + LLM track, in which we improve a cascaded speech translation system by incorporating recent techniques. Specifically, we integrate Mistral-7B (mistralai/Mistral-7B-Instruct-v0.1) to enhance the system in two ways: first, we refine the ASR output using N-best lists generated by our system, fine-tuning the LLM to improve transcription accuracy; second, we refine the MT output at the document level, using both the ASR and MT predictions to improve translation quality. With the LLM integrated, the ASR word error rate drops by 0.3% absolute and the MT COMET score improves by 0.65%. On challenging test sets with overlapping speakers and background noise, however, integrating the LLM brings no clear benefit because ASR performance is poor. For those cases we use ASR with chunked long-form decoding to recover context that may otherwise be missing.|\n", "2406.16768":
"|**2024-06-24**|**WARP: On the Benefits of Weight Averaged Rewarded Policies**|Alexandre Ram\u00e9 et.al.|[2406.16768](http://arxiv.org/abs/2406.16768)|null|Reinforcement learning from human feedback (RLHF) aligns large language models (LLMs) by using a trained reward model to steer their generations toward human preferences. To preserve pre-trained knowledge, RLHF usually adds KL-divergence regularization, but this constrains reward optimization. This paper proposes a novel alignment strategy called Weight Averaged Rewarded Policies (WARP). WARP merges policies in weight space at three stages. First, it uses an exponential moving average of the policy as a dynamic anchor for the KL regularization. Second, it applies spherical interpolation to merge independently fine-tuned policies into a single enhanced model. Third, it linearly interpolates between the merged model and the initialization to recover features from pre-training. The procedure is applied iteratively, with each iteration's final model serving as an advanced initialization for the next, progressively improving the KL-reward trade-off and achieving higher reward at a fixed KL. Experiments with GEMMA policies validate the benefits of WARP, whose quality and alignment exceed those of open-source LLMs.|\n", "2406.17770": "|**2024-06-25**|**MG-LLaVA: Towards Multi-Granularity
Visual Instruction Tuning**|Xiangyu Zhao et.al.|[2406.17770](http://arxiv.org/abs/2406.17770)|**[link](https://github.com/phoenixz810/mg-llava)**|**Multi-modal large language models (MLLMs) have made significant progress on visual understanding tasks. However, most are limited to low-resolution images, which restricts their performance on perception tasks that need detailed visual information. We present MG-LLaVA, an MLLM that strengthens visual processing by incorporating a multi-granularity vision flow comprising low-resolution, high-resolution, and object-level features. We add a high-resolution visual encoder to capture fine-grained details and fuse its output with the base visual features through a Conv-Gate fusion network. To further improve object recognition, we incorporate object-level features derived from bounding boxes identified by offline detectors. Instruction-tuned solely on publicly available multimodal data, MG-LLaVA shows strong perception skills. We instantiate MG-LLaVA with language encoders of different sizes, from 3.8B to 34B parameters, to evaluate its performance comprehensively. Results on multiple benchmarks show that MG-LLaVA performs strongly against existing MLLMs of comparable parameter size, demonstrating its effectiveness. The code will be open-sourced at https://github.com/PhoenixZ810/MG-LLaVA.**|\n", "2406.17764": "|**2024-06-25**|**BMIKE-53: Investigating Cross-Lingual Knowledge Editing with In-Context Learning**|Ercong Nie et.al.|[2406.17764](http://arxiv.org/abs/2406.17764)|null|Large language models (LLMs) hold extensive parametric knowledge, but that knowledge is hard to update because retraining is very expensive and is infeasible for closed-source models. Knowledge editing (KE) has emerged as a possible solution, allowing an LLM's knowledge to be updated without compromising overall performance. On-the-fly KE methods based on in-context learning (ICL) show great promise and allow LLMs to be treated as black boxes. KE has so far focused mainly on English, and the potential of cross-lingual KE in current English-centric LLMs has not been fully explored. To encourage more research in this direction, we introduce the BMIKE-53 benchmark, which evaluates three types of KE tasks across 53 languages. We also propose a gradient-free KE method, Multilingual In-context Knowledge Editing (MIKE), and evaluate it on BMIKE-53. Our evaluation focuses on cross-lingual knowledge transfer in terms of reliability, generality, locality, and portability, and offers useful insights and a framework for future research on cross-lingual KE. Our code and data are publicly available through the anonymous repository https://anonymous.4open.science/r/MIKE.|\n", "2406.17761": "|**2024-06-25**|**CaLMQA: Exploring culturally specific long-form question answering across 23 languages**|Shane Arora et.al.|[2406.17761](http://arxiv.org/abs/2406.17761)|**[link](https://github.com/2015aroras/calmqa)**|**Large language models (LLMs) are widely used for long-form question answering, which requires them to generate paragraph-length answers to complex questions. Long-form question answering is well studied in English, with many datasets and evaluation metrics, but research on other languages is comparatively scarce. To bridge this gap, we introduce CaLMQA, a collection of 2,600 complex questions spanning 23 languages, including under-resourced, rarely studied languages such as Fijian and Kirundi. The dataset includes both naturally occurring questions collected from community web forums and questions written by native speakers whom we hired for this purpose. This process yields diverse, complex questions that reflect cultural topics (such as traditions, laws, and news) and the language usage of native speakers. We run automatic evaluation across a suite of open- and closed-source models using our new metric, CaLMScore, which detects incorrect language and token repetition in answers. The results show that the quality of LLM-generated answers degrades significantly for some low-resource languages. Human evaluation on a subset of models shows that performance is significantly worse on culturally specific questions than on culturally agnostic ones. These findings highlight the need for further research on LLM multilingual capabilities and on the evaluation of non-English long-form question answering.**|\n", "2406.17755": "|**2024-06-25**|**Accelerating Clinical Evidence Synthesis with Large Language Models**|Zifeng Wang
et.al.|[2406.17755](http://arxiv.org/abs/2406.17755)|null|\u4eba\u5de5\u667a\u80fd\u81ea\u52a8\u533b\u5b66\u53d1\u73b0\u662f\u8bb8\u591a\u4eba\u7684\u68a6\u60f3\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u540d\u4e3aTrialMind\u7684\u751f\u6210\u5f0fAI\u7ba1\u9053\uff0c\u65e8\u5728\u8fdb\u884c\u533b\u5b66\u7cfb\u7edf\u6027\u56de\u987e\uff0c\u6db5\u76d6\u7814\u7a76\u641c\u7d22\u3001\u7b5b\u9009\u548c\u6570\u636e\u63d0\u53d6\u9636\u6bb5\u3002\u8be5\u7cfb\u7edf\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u9a71\u52a8\u6bcf\u4e2a\u73af\u8282\uff0c\u5e76\u5f15\u5165\u4e13\u5bb6\u76d1\u7763\u4ee5\u51cf\u5c11\u9519\u8bef\u3002\u4e3a\u4e86\u8bc4\u4f30\u6027\u80fd\uff0c\u6211\u4eec\u521b\u5efa\u4e86TrialReviewBench\u57fa\u51c6\u6570\u636e\u96c6\uff0c\u5b83\u662f\u4e00\u4e2a\u5b9a\u5236\u7684\u5305\u542b870\u4efd\u6765\u81ea25\u7bc7\u5143\u5206\u6790\u8bba\u6587\u7684\u4e34\u5e8a\u7814\u7a76\u6807\u6ce8\u6570\u636e\uff0c\u6db5\u76d6\u4e0d\u540c\u533b\u7597\u6cbb\u7597\u9886\u57df\u3002\u7ed3\u679c\u663e\u793a\uff0cTrialMind\u663e\u8457\u63d0\u5347\u4e86\u6587\u732e\u5ba1\u67e5\u6548\u7387\uff0c\u5728\u4ece\u8d85\u8fc72000\u4e07\u7bc7PubMed\u7814\u7a76\u4e2d\u68c0\u7d22\u76f8\u5173\u7814\u7a76\u65f6\uff0c\u53ec\u56de\u7387\u9ad8\u8fbe0.897\u81f31.000\u3002\u5728\u7b5b\u9009\u9636\u6bb5\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u4f18\u4e8e\u57fa\u4e8e\u4f20\u7edf\u8bed\u8a00\u6a21\u578b\u5d4c\u5165\u7684\u65b9\u6cd5\uff08\u53ec\u56de\u7387\u5206\u522b\u4e3a0.227-0.246 vs. 
0.000-0.102\uff09\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u7ed3\u679c\u63d0\u53d6\u65b9\u9762\u8d85\u8d8a\u4e86\u76f4\u63a5\u4f7f\u7528GPT-4\u7684\u8868\u73b0\uff0c\u51c6\u786e\u7387\u8303\u56f4\u4e3a0.65\u52300.84\u3002\u6211\u4eec\u8fd8\u652f\u6301\u68ee\u6797\u56fe\u4e2d\u7684\u4e34\u5e8a\u8bc1\u636e\u7efc\u5408\uff0c\u7ecf\u516b\u540d\u4eba\u7c7b\u6807\u6ce8\u5458\u9a8c\u8bc1\uff0c\u4ed6\u4eec\u666e\u904d\u66f4\u504f\u597dTrialMind\uff0c\u5176\u5728\u6d89\u53ca\u7684\u5ba1\u67e5\u4e2d\u80dc\u51fa\u7387\u4e3a62.5%\u81f3100%\u3002\u8fd9\u4e9b\u53d1\u73b0\u8868\u660e\uff0c\u57fa\u4e8eLLM\u7684\u4e34\u5e8a\u8bc1\u636e\u5408\u6210\u65b9\u6cd5\uff0c\u5982TrialMind\uff0c\u80fd\u591f\u4fc3\u8fdb\u53ef\u9760\u4e14\u9ad8\u8d28\u91cf\u7684\u4e34\u5e8a\u8bc1\u636e\u5408\u6210\uff0c\u4ece\u800c\u63d0\u5347\u4e34\u5e8a\u7814\u7a76\u7684\u6548\u7387\u3002|\n", "2406.17753": "|**2024-06-25**|**Measuring and Benchmarking Large Language Models' Capabilities to Generate Persuasive Language**|Amalie Brogaard Pauli 
et.al.|[2406.17753](http://arxiv.org/abs/2406.17753)|null|\u672c\u6587\u63a2\u8ba8\u4e86\u5728\u9762\u5bf9\u5927\u91cf\u8bd5\u56fe\u5f71\u54cd\u6211\u4eec\u7684\u4fe1\u606f\uff0c\u5982\u9884\u544a\u6d88\u606f\u3001\u8fa9\u8bba\u3001\u5e26\u6709\u653f\u6cbb\u8272\u5f69\u7684\u65b0\u95fb\u548c\u5ba3\u4f20\u65f6\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u751f\u6210\u5177\u6709\u8bf4\u670d\u529b\u6587\u672c\u7684\u80fd\u529b\u3002\u4e0d\u540c\u4e8e\u4ee5\u5f80\u4e13\u6ce8\u4e8e\u7279\u5b9a\u9886\u57df\u6216\u7c7b\u578b\u529d\u8bf4\u7684\u7814\u7a76\uff0c\u6211\u4eec\u8fdb\u884c\u4e86\u4e00\u9879\u5168\u9762\u7684\u5206\u6790\uff0c\u65e8\u5728\u6d4b\u91cf\u548c\u57fa\u51c6LLMs\u5728\u88ab\u660e\u786e\u8981\u6c42\u589e\u5f3a\u6216\u51cf\u5c11\u8bf4\u670d\u529b\u65f6\uff0c\u4ee5\u53ca\u4ec5\u8981\u6c42\u8fdb\u884c\u91ca\u4e49\u65f6\u4ea7\u751f\u8bf4\u670d\u6027\u6587\u672c\u7684\u7a0b\u5ea6\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u521b\u5efa\u4e86\u4e00\u4e2a\u65b0\u7684\u6570\u636e\u96c6\u2014\u2014\u201cPersuasive-Pairs\u201d\uff0c\u5305\u542b\u4e00\u7ec4\u7531\u7b80\u77ed\u6587\u672c\u548cLLM\u91cd\u5199\u4ee5\u653e\u5927\u6216\u524a\u5f31\u8bf4\u670d\u529b\u7684\u6587\u672c\u5bf9\u3002\u6211\u4eec\u5bf9\u8fd9\u4e9b\u914d\u5bf9\u8fdb\u884c\u4e86\u591a\u6807\u6ce8\uff0c\u6309\u76f8\u5bf9\u5c3a\u5ea6\u8bc4\u4f30\u5176\u8bf4\u670d\u529b\u3002\u8fd9\u4e2a\u6570\u636e\u96c6\u4e0d\u4ec5\u672c\u8eab\u5177\u6709\u4ef7\u503c\uff0c\u8fd8\u5c55\u793a\u4e86\u5982\u4f55\u4f7f\u7528\u5b83\u8bad\u7ec3\u4e00\u4e2a\u56de\u5f52\u6a21\u578b\uff0c\u9884\u6d4b\u6587\u672c\u5bf9\u4e4b\u95f4\u8bf4\u670d\u529b\u7684\u5f97\u5206\uff0c\u4ece\u800c\u80fd\u591f\u5bf9\u4e0d\u540c\u9886\u57df\u7684LLMs\u8fdb\u884c\u8bc4\u5206\u548c\u6bd4\u8f83\u3002\u6700\u540e\uff0c\u6211\u4eec\u8ba8\u8bba\u4e86\u4e0d\u540c\u7cfb\u7edf\u63d0\u793a\u5bf9LLaMA3\u4ea7\u751f\u7684\u5f71\u54cd\uff0c\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u5373\u4f7f\u5728\u4ec5\u8981\u6c42\u91ca\u4e49\u7684\u60c5\u51b5\u4
e0b\uff0c\u4e0d\u540c\u7684\u201c\u89d2\u8272\u201d\u63d0\u793a\u4e5f\u4f1a\u663e\u8457\u6539\u53d8\u6587\u672c\u4e2d\u7684\u8bf4\u670d\u529b\u3002\u8fd9\u4e9b\u53d1\u73b0\u5f3a\u8c03\u4e86\u7814\u7a76LLM\u751f\u6210\u6587\u672c\u4e2d\u7684\u8bf4\u670d\u8bed\u8a00\u7684\u91cd\u8981\u6027\u3002|\n", "2406.17737": "|**2024-06-25**|**LLM Targeted Underperformance Disproportionately Impacts Vulnerable Users**|Elinor Poole-Dayan et.al.|[2406.17737](http://arxiv.org/abs/2406.17737)|null|\u5728\u6700\u65b0\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5c55\u73b0\u51fa\u5353\u8d8a\u6027\u80fd\u7684\u540c\u65f6\uff0c\u5173\u4e8e\u5b83\u4eec\u7684\u4e0d\u53ef\u9760\u884c\u4e3a\uff0c\u5982\u865a\u6784\u548c\u504f\u89c1\u7684\u7814\u7a76\u5c42\u51fa\u4e0d\u7a77\u3002\u672c\u7814\u7a76\u63a2\u8ba8\u4e86LLMs\u7684\u56de\u7b54\u8d28\u91cf\u5728\u4fe1\u606f\u51c6\u786e\u6027\u3001\u771f\u5b9e\u6027\u4ee5\u53ca\u62d2\u7edd\u56de\u7b54\u65b9\u9762\uff0c\u5982\u4f55\u968f\u7740\u4e09\u79cd\u7528\u6237\u7279\u5f81\u7684\u53d8\u5316\u800c\u53d8\u5316\uff1a\u82f1\u8bed\u6c34\u5e73\u3001\u6559\u80b2\u7a0b\u5ea6\u548c\u56fd\u7c4d\u3002\u6211\u4eec\u5728\u4e09\u4e2a\u6700\u5148\u8fdb\u7684LLMs\u548c\u4e24\u4e2a\u4e8b\u5b9e\u6838\u67e5\u76f8\u5173\u7684\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u4e86\u8be6\u5c3d\u5b9e\u9a8c\uff0c\u91cd\u70b9\u5173\u6ce8\u5176\u771f\u5b9e\u6027\u3002\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0c\u5f53\u524d\u6700\u5148\u8fdb\u7684LLMs\u5bf9\u82f1\u8bed\u80fd\u529b\u8f83\u4f4e\u3001\u6559\u80b2\u6c34\u5e73\u8f83\u4f4e\u4ee5\u53ca\u975e\u7f8e\u56fd\u7c4d\u7528\u6237\u7684\u56de\u7b54\u8d28\u91cf\u5b58\u5728\u66f4\u660e\u663e\u7684\u8d1f\u9762\u503e\u5411\uff0c\u8fd9\u4f7f\u5f97\u8fd9\u4e9b\u6a21\u578b\u5bf9\u4e8e\u5176\u6700\u5f31\u52bf\u7528\u6237\u6765\u8bf4\uff0c\u5e76\u975e\u53ef\u9760\u7684\u4fe1\u606f\u6765\u6e90\u3002|\n", "2406.17706": "|**2024-06-25**|**FedBiOT: LLM Local Fine-tuning in Federated Learning without Full Model**|Feijie Wu 
et.al.|[2406.17706](http://arxiv.org/abs/2406.17706)|**[link](https://github.com/HarliWu/FedBiOT)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u7ecf\u8fc7\u9002\u5f53\u9886\u57df\u7279\u5b9a\u6570\u636e\u7684\u5fae\u8c03\u540e\uff0c\u5728\u8bb8\u591a\u4efb\u52a1\u4e0a\u5c55\u73b0\u51fa\u51fa\u8272\u6027\u80fd\u3002\u7136\u800c\uff0c\u8fd9\u7c7b\u4e13\u7528\u6570\u636e\u901a\u5e38\u5206\u5e03\u5728\u591a\u4e2a\u6240\u6709\u8005\u4e4b\u95f4\uff0c\u8fd9\u5c31\u63d0\u51fa\u4e86\u5982\u4f55\u5728\u8054\u90a6\u5b66\u4e60\uff08FL\uff09\u4e2d\u8fdb\u884cLLM\u5fae\u8c03\u7684\u95ee\u9898\u3002\u9762\u5bf9\u6709\u9650\u7684\u8ba1\u7b97\u548c\u901a\u4fe1\u80fd\u529b\uff0cFL\u5ba2\u6237\u7aef\u5728\u6709\u6548\u5fae\u8c03\u5927\u578b\u8bed\u8a00\u6a21\u578b\u65f6\u9762\u4e34\u6311\u6218\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u4ecb\u7ecd\u4e86FedBiOT\uff0c\u4e00\u79cd\u65e8\u5728\u63d0\u9ad8\u8d44\u6e90\u6548\u7387\u7684LLM\u5fae\u8c03FL\u65b9\u6cd5\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5305\u62ec\u670d\u52a1\u5668\u751f\u6210\u4e00\u4e2a\u538b\u7f29\u7684LLM\uff0c\u5e76\u786e\u4fdd\u5176\u6027\u80fd\u4e0e\u5b8c\u6574\u6a21\u578b\u76f8\u5f53\u3002\u7136\u540e\uff0c\u5ba2\u6237\u7aef\u9488\u5bf9\u8fd9\u4e2a\u538b\u7f29\u6a21\u578b\u7684\u4e00\u4e2a\u8f7b\u91cf\u4f46\u91cd\u8981\u7684\u90e8\u5206\u2014\u2014\u9002\u914d\u5668\u8fdb\u884c\u5fae\u8c03\u3002\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u7531\u4e8e\u670d\u52a1\u5668\u65e0\u6cd5\u8bbf\u95ee\u5ba2\u6237\u7aef\u62e5\u6709\u7684\u79c1\u4eba\u6570\u636e\uff0c\u670d\u52a1\u5668\u7528\u4e8e\u6821\u51c6\u7684\u6570\u636e\u5206\u5e03\u4e0e\u5ba2\u6237\u7aef\u7528\u4e8e\u5fae\u8c03\u7684\u6570\u636e\u4e0d\u540c\u3002\u6211\u4eec\u5c06\u95ee\u9898\u5efa\u6a21\u4e3a\u4e00\u4e2a\u5e26\u6709\u6570\u636e\u4e0d\u4e00\u81f4\u6027\u5f71\u54cd\u7684 bilevel 
\u4f18\u5316\u95ee\u9898\uff0c\u5e76\u5bfc\u51fa\u4e86\u670d\u52a1\u5668\u548c\u5ba2\u6237\u7aef\u7684\u66f4\u65b0\u89c4\u5219\u3002\u6211\u4eec\u5728 LLaMA-2 \u4e0a\u8fdb\u884c\u4e86\u5e7f\u6cdb\u5b9e\u9a8c\uff0c\u7ed3\u679c\u663e\u793a\uff0c\u9002\u914d\u5668\u5728\u91cd\u65b0\u6574\u5408\u5230\u5168\u5c40\u8bed\u8a00\u6a21\u578b\u65f6\u8868\u73b0\u51fa\u8272\u3002\u5b9e\u9a8c\u7ed3\u679c\u8fd8\u8868\u660e\uff0cFedBiOT \u76f8\u6bd4\u73b0\u6709\u57fa\u51c6\u663e\u8457\u51cf\u5c11\u4e86\u8d44\u6e90\u6d88\u8017\uff0c\u540c\u65f6\u4fdd\u6301\u4e86\u76f8\u8fd1\u7684\u6027\u80fd\u6c34\u5e73\u3002|\n", "2406.17692": "|**2024-06-25**|**From Distributional to Overton Pluralism: Investigating Large Language Model Alignment**|Thom Lake et.al.|[2406.17692](http://arxiv.org/abs/2406.17692)|**[link](https://github.com/thomlake/investigating-alignment)**|**\u8be5\u7814\u7a76\u5206\u6790\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7ecf\u8fc7\u6821\u51c6\u540e\u8f93\u51fa\u5206\u5e03\u7684\u53d8\u5316\u7279\u6027\u3002\u9996\u5148\uff0c\u91cd\u65b0\u8bc4\u4f30\u4e86\u4e4b\u524d\u5173\u4e8e\u6821\u51c6\u540e\u54cd\u5e94\u591a\u6837\u6027\u964d\u4f4e\u7684\u62a5\u544a\uff0c\u53d1\u73b0\u8fd9\u79cd\u4e0b\u964d\u4e3b\u8981\u5f52\u56e0\u4e8e\u8d28\u91cf\u63a7\u5236\u548c\u4fe1\u606f\u6574\u5408\u3002\u6821\u51c6\u80fd\u591f\u6291\u5236\u4e0d\u76f8\u5173\u548c\u65e0\u5e2e\u52a9\u7684\u5185\u5bb9\uff0c\u540c\u65f6\u4f7f\u8f93\u51fa\u5206\u5e03\u503e\u5411\u4e8e\u66f4\u957f\u7684\u3001\u6db5\u76d6\u591a\u4e2a\u57fa\u7840LLM\u54cd\u5e94\u4fe1\u606f\u7684\u7b54\u6848\uff0c\u5b9e\u8d28\u4e0a\u662f\u5c06\u591a\u6837\u5316\u4fe1\u606f\u6c47\u603b\u5728\u5355\u4e2a\u54cd\u5e94\u4e2d\u3002\u7814\u7a76\u5e76\u672a\u53d1\u73b0\u6821\u51c6\u663e\u8457\u51cf\u5c11\u6709\u7528\u4fe1\u606f\uff0c\u8fdb\u800c\u5f15\u51fa\u95ee\u9898\uff1a\u6821\u51c6\u6a21\u578b\u662f\u5426\u4f1a\u4ea7\u751f\u57fa\u7840\u6a21\u578b\u65e0\u6cd5\u518d\u73b0\u7684\u4fe1\u606f\uff1f\u7b2c\u4e8c\u90e8\u5206\u7
684\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0c\u60c5\u51b5\u5e76\u975e\u5982\u6b64\uff0c\u6821\u51c6\u6a21\u578b\u7684\u884c\u4e3a\u53ef\u4ee5\u901a\u8fc7\u57fa\u7840\u6a21\u578b\u5728\u65e0\u9700\u5fae\u8c03\u7684\u60c5\u51b5\u4e0b\u8fdb\u884c\u590d\u73b0\u3002\u901a\u8fc7\u4e0a\u4e0b\u6587\u793a\u4f8b\u548c\u8f83\u4f4e\u5206\u8fa8\u7387\u7684\u8bed\u4e49\u63d0\u793a\uff0c\u53ef\u4ee5\u4ece\u57fa\u7840LLMs\u5f15\u5bfc\u51fa\u4e0e\u6821\u51c6\u540e\u7684\u76f8\u4f3c\u54cd\u5e94\uff0c\u751a\u81f3\u4e0e\u6821\u51c6\u540e\u7684\u54cd\u5e94\u4e4b\u95f4\u7684\u76f8\u4f3c\u5ea6\u63a5\u8fd1\u3002\u8fd9\u4e9b\u53d1\u73b0\u652f\u6301\u201c\u8868\u9762\u6821\u51c6\u5047\u8bbe\u201d\uff0c\u5373\u5f53\u524d\u7684\u6821\u51c6\u6280\u672f\u4ec5\u6355\u6349\u4e86\u52a9\u624b\u578b\u57fa\u7840LLM\u884c\u4e3a\u4e2d\u6709\u7528\u7684\u90e8\u5206\uff0c\u5e76\u672a\u6269\u5c55\u5176\u80fd\u529b\u3002\u6b64\u5916\uff0c\u5b83\u4eec\u8fd8\u663e\u793a\uff0c\u57fa\u4e8e\u4e0a\u4e0b\u6587\u7684\u6821\u51c6\u4f5c\u4e3a\u4e00\u79cd\u6a21\u4eff\u6821\u51c6LLMs\u7684\u7b56\u7565\uff0c\u6548\u679c\u51fa\u4eba\u610f\u6599\u5730\u597d\uff0c\u4e14\u65e0\u9700\u5fae\u8c03\u3002\u7814\u7a76\u4ee3\u7801\u548c\u6570\u636e\u53ef\u5728\u83b7\u53d6\u3002**|\n", "2406.17681": "|**2024-06-25**|**VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation**|Kun Qian 
et.al.|[2406.17681](http://arxiv.org/abs/2406.17681)|**[link](https://github.com/qbetterk/VarBench)**|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u4f20\u7edf\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u7684\u8868\u73b0\u65e5\u76ca\u51fa\u8272\uff0c\u8d8a\u6765\u8d8a\u591a\u7684\u7814\u7a76\u4eba\u5458\u5f00\u59cb\u5173\u6ce8\u9884\u8bad\u7ec3\u671f\u95f4\u7684\u57fa\u51c6\u6570\u636e\u6cc4\u9732\u95ee\u9898\uff0c\u901a\u5e38\u79f0\u4e3a\u6570\u636e\u6c61\u67d3\u95ee\u9898\u3002\u4e3a\u4e86\u786e\u4fdd\u516c\u6b63\u7684\u8bc4\u4f30\uff0c\u6700\u8fd1\u7684\u57fa\u51c6\u6d4b\u8bd5\u4ec5\u516c\u5f00\u8bad\u7ec3\u548c\u9a8c\u8bc1\u96c6\uff0c\u5bf9\u6d4b\u8bd5\u96c6\u6807\u7b7e\u4fdd\u5bc6\u3002\u4ed6\u4eec\u8981\u6c42\u4efb\u4f55\u5e0c\u671b\u8bc4\u4f30\u81ea\u5df1\u8bed\u8a00\u6a21\u578b\u7684\u4eba\u90fd\u9700\u8981\u63d0\u4ea4\u6a21\u578b\u7684\u9884\u6d4b\u7ed3\u679c\uff0c\u8fdb\u884c\u96c6\u4e2d\u5904\u7406\uff0c\u7136\u540e\u5728\u6392\u884c\u699c\u4e0a\u516c\u5e03\u6a21\u578b\u7684\u5f97\u5206\u3002\u7136\u800c\uff0c\u8fd9\u4e2a\u63d0\u4ea4\u8fc7\u7a0b\u65e2\u4f4e\u6548\u53c8\u59a8\u788d\u4e86\u6709\u6548\u7684\u9519\u8bef\u5206\u6790\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u52a8\u6001\u5316\u57fa\u51c6\u6d4b\u8bd5\u5e76\u5b9e\u65f6\u8bc4\u4f30\u8bed\u8a00\u6a21\u578b\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u4ece\u6bcf\u4e2a\u6d4b\u8bd5\u6848\u4f8b\u4e2d\u63d0\u53d6\u53d8\u91cf\uff0c\u5e76\u4e3a\u6bcf\u4e2a\u53d8\u91cf\u5b9a\u4e49\u4e00\u4e2a\u503c\u8303\u56f4\u3002\u6bcf\u6b21\u8bc4\u4f30\u65f6\uff0c\u6211\u4eec\u4f1a\u4ece\u8fd9\u4e9b\u503c\u57df\u4e2d\u62bd\u53d6\u65b0\u7684\u503c\u6765\u521b\u5efa\u72ec\u7279\u7684\u6d4b\u8bd5\u6848\u4f8b\uff0c\u4ece\u800c\u4fdd\u8bc1\u6bcf\u6b21\u90fd\u662f\u5168\u65b0\u7684\u8bc4\u4f30\u3002 
\u6211\u4eec\u9488\u5bf9\u6570\u5b66\u751f\u6210\u4efb\u52a1\u7684GSM8K\u3001\u591a\u9879\u9009\u62e9\u4efb\u52a1\u7684ARC\u3001commonsense\u95ee\u7b54\u7684CommonsenseQA\u4ee5\u53caTruthfulQA\u7684\u771f\u5b9e\u6027\u95ee\u7b54\u4efb\u52a1\uff0c\u5e94\u7528\u4e86\u8fd9\u79cd\u53d8\u91cf\u6270\u52a8\u65b9\u6cd5\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u8fd9\u79cd\u65b9\u6cd5\u80fd\u66f4\u51c6\u786e\u5730\u8861\u91cf\u8bed\u8a00\u6a21\u578b\u7684\u771f\u5b9e\u80fd\u529b\uff0c\u6709\u6548\u7f13\u89e3\u4e86\u6570\u636e\u6c61\u67d3\u95ee\u9898\u3002|\n", "2406.17675": "|**2024-06-25**|**Quantifying AI Psychology: A Psychometrics Benchmark for Large Language Models**|Yuan Li et.al.|[2406.17675](http://arxiv.org/abs/2406.17675)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5c55\u73b0\u51fa\u5353\u8d8a\u7684\u4efb\u52a1\u89e3\u51b3\u80fd\u529b\uff0c\u65e5\u76ca\u626e\u6f14\u7c7b\u4f3c\u4eba\u7c7b\u52a9\u624b\u7684\u89d2\u8272\u3002\u793e\u4f1a\u5bf9\u5c06LLMs\u66f4\u5e7f\u6cdb\u5730\u878d\u5165\u5176\u4e2d\u4ea7\u751f\u4e86\u5174\u8da3\uff0c\u63a2\u8ba8\u5b83\u4eec\u662f\u5426\u5177\u5907\u5fc3\u7406\u7279\u8d28\uff0c\u4ee5\u53ca\u8fd9\u4e9b\u7279\u8d28\u662f\u5426\u7a33\u5b9a\u4e14\u6709\u52a9\u4e8e\u7406\u89e3\u5176\u884c\u4e3a\u3002\u672c\u6587\u501f\u9274\u5fc3\u7406\u5b66\u6d4b\u91cf\u5b66\u7684\u65b9\u6cd5\uff0c\u63d0\u51fa\u4e86\u4e00\u79cd\u6846\u67b6\uff0c\u7528\u4e8e\u7814\u7a76LLMs\u4e2d\u7684\u5fc3\u7406\u5b66\uff0c\u5305\u62ec\u5fc3\u7406\u7ef4\u5ea6\u8bc6\u522b\u3001\u8bc4\u4f30\u6570\u636e\u96c6\u521b\u5efa\u548c\u7ed3\u679c\u9a8c\u8bc1\u3002\u5728\u6b64\u6846\u67b6\u4e0b\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2a\u5168\u9762\u7684LLM\u5fc3\u7406\u6d4b\u91cf\u57fa\u51c6\uff0c\u6db5\u76d6\u4e86\u516d\u79cd\u5fc3\u7406\u7ef4\u5ea6\uff1a\u4e2a\u6027\u3001\u4ef7\u503c\u89c2\u3001\u60c5\u7eea\u3001\u5fc3\u667a\u7406\u8bba\u3001\u52a8\u673a\u548c\u667a\u529b\u3002\u8fd9\u4e2a\u57fa\u51c6\u5305\u542b\u4e86\u5341\u4e09\u4e2a\u5305\u542b\u591
a\u6837\u573a\u666f\u548c\u9898\u578b\u7684\u6570\u636e\u96c6\u3002\u7814\u7a76\u53d1\u73b0\uff0cLLMs\u5c55\u73b0\u51fa\u5e7f\u6cdb\u7684\u5fc3\u7406\u7279\u6027\u3002\u540c\u65f6\uff0c\u6211\u4eec\u89c2\u5bdf\u5230LLMs\u5728\u81ea\u6211\u62a5\u544a\u7684\u7279\u8d28\u4e0e\u5176\u5b9e\u9645\u884c\u4e3a\u4e4b\u95f4\u7684\u4e0d\u4e00\u81f4\u3002\u8be5\u8bba\u6587\u8be6\u7ec6\u5c55\u793a\u4e86LLMs\u7684\u5fc3\u7406\u6d4b\u91cf\u8bc4\u4f30\uff0c\u4e3aAI\u548c\u793e\u4f1a\u79d1\u5b66\u9886\u57df\u7684\u53ef\u9760\u8bc4\u4f30\u63d0\u4f9b\u4e86\u6d1e\u89c1\uff0c\u4ee5\u53ca\u53ef\u80fd\u7684\u5e94\u7528\u65b9\u5411\u3002|\n", "2406.18532": "|**2024-06-26**|**Symbolic Learning Enables Self-Evolving Agents**|Wangchunshu Zhou et.al.|[2406.18532](http://arxiv.org/abs/2406.18532)|**[link](https://github.com/aiwaves-cn/agents)**|**\u4eba\u5de5\u667a\u80fd\u754c\u901a\u8fc7\u6784\u5efa\"\u8bed\u8a00\u4ee3\u7406\"\uff08\u5373\u590d\u6742\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7ba1\u9053\uff09\u6765\u63a2\u5bfb\u901a\u7528\u4eba\u5de5\u667a\u80fd\uff08AGI\uff09\u7684\u9053\u8def\uff0c\u8fd9\u4e9b\u6a21\u578b\u7ed3\u5408\u4e86\u63d0\u793a\u6280\u672f\u548c\u5de5\u5177\u4f7f\u7528\u65b9\u6cd5\u3002\u5c3d\u7ba1\u5b83\u4eec\u5728\u4f17\u591a\u5b9e\u9645\u4efb\u52a1\u4e2d\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5f53\u524d\u8bed\u8a00\u4ee3\u7406\u7814\u7a76\u7684\u4e00\u4e2a\u5173\u952e\u5c40\u9650\u662f\u5176\u6a21\u578b\u4e2d\u5fc3\u6216\u5de5\u7a0b\u5bfc\u5411\uff1a\u63d0\u793a\u3001\u5de5\u5177\u548c\u7ba1\u9053\u7684\u6539\u8fdb\u4f9d\u8d56\u4e8e\u5927\u91cf\u7684\u4eba\u5de5\u4e13\u5bb6\u8bbe\u8ba1\uff0c\u800c\u975e\u81ea\u52a8\u4ece\u6570\u636e\u5b66\u4e60\u3002\u6211\u4eec\u8ba4\u4e3a\uff0c\u4ece\u6a21\u578b\u4e2d\u5fc3\u5411\u6570\u636e\u4e2d\u5fc3\u8f6c\u53d8\u2014\u2014\u8ba9\u8bed\u8a00\u4ee3\u7406\u80fd\u591f\u81ea\u4e3b\u5b66\u4e60\u548c\u9002\u5e94\u73af\u5883\uff0c\u662f\u5b83\u4eec\u8fc8\u5411AGI\u7684\u5173\u952e\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\"
\u4ee3\u7406\u7b26\u53f7\u5b66\u4e60\"\u6846\u67b6\uff0c\u8fd9\u662f\u4e00\u4e2a\u7cfb\u7edf\u6027\u7684\u65b9\u6cd5\uff0c\u5b83\u4f7f\u8bed\u8a00\u4ee3\u7406\u80fd\u591f\u5728\u6570\u636e\u9a71\u52a8\u7684\u65b9\u5f0f\u4e0b\u81ea\u6211\u4f18\u5316\uff0c\u5229\u7528\u7b26\u53f7\u4f18\u5316\u5668\u3002\u6211\u4eec\u5c06\u4ee3\u7406\u89c6\u4e3a\u5177\u6709\u53ef\u5b66\u4e60\u6743\u91cd\u7684\u7b26\u53f7\u7f51\u7edc\uff0c\u8fd9\u4e9b\u6743\u91cd\u7531\u63d0\u793a\u3001\u5de5\u5177\u53ca\u5176\u7ec4\u5408\u65b9\u5f0f\u5b9a\u4e49\u3002\u4ee3\u7406\u7b26\u53f7\u5b66\u4e60\u65e8\u5728\u6a21\u4eff\u8fde\u63a5\u4e3b\u4e49\u5b66\u4e60\u4e2d\u7684\u4e24\u4e2a\u57fa\u672c\u7b97\u6cd5\uff1a\u53cd\u5411\u4f20\u64ad\u548c\u68af\u5ea6\u4e0b\u964d\uff0c\u4f46\u5b83\u5904\u7406\u7684\u662f\u81ea\u7136\u8bed\u8a00\u5f62\u5f0f\u7684\u6743\u91cd\u3001\u635f\u5931\u548c\u68af\u5ea6\u3002\u6211\u4eec\u5728\u6807\u51c6\u57fa\u51c6\u548c\u590d\u6742\u73b0\u5b9e\u4efb\u52a1\u4e0a\u8fdb\u884c\u4e86\u6982\u5ff5\u9a8c\u8bc1\u5b9e\u9a8c\uff0c\u7ed3\u679c\u8868\u660e\uff0c\u4ee3\u7406\u7b26\u53f7\u5b66\u4e60\u4f7f\u5f97\u8bed\u8a00\u4ee3\u7406\u5728\u521b\u5efa\u548c\u90e8\u7f72\u540e\u80fd\u591f\u81ea\u6211\u66f4\u65b0\uff0c\u5b9e\u73b0\u4e86\"\u81ea\u6211\u8fdb\u5316\u7684\u4ee3\u7406\"\u3002**|\n", "2406.18528": "|**2024-06-26**|**PrExMe! 
Large Scale Prompt Exploration of Open Source LLMs for Machine Translation and Summarization Evaluation**|Christoph Leiter et.al.|[2406.18528](http://arxiv.org/abs/2406.18528)|**[link](https://github.com/gringham/prexme)**|## \u7ffb\u8bd1 \u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u9886\u57df\u5e26\u6765\u4e86\u9769\u547d\u6027\u53d8\u5316\uff0c\u5b83\u4eec\u7684\u4e0a\u4e0b\u6587\u5b66\u4e60\u80fd\u529b\u4f7f\u5176\u6210\u4e3a\u81ea\u7136\u8bed\u8a00\u751f\u6210\u8bc4\u4ef7\u7684\u6709\u529b\u5de5\u5177\uff0c\u7279\u522b\u9002\u7528\u4e8e\u8d44\u6e90\u532e\u4e4f\u548c\u65f6\u95f4\u9650\u5236\u7684\u573a\u666f\u3002\u672c\u6587\u63d0\u51faPrExMe\uff0c\u4e00\u9879\u5927\u89c4\u6a21\u7684\u63d0\u793a\u63a2\u7d22\u5ea6\u91cf\u6cd5\uff0c\u6211\u4eec\u5728\u673a\u5668\u7ffb\u8bd1\uff08MT\uff09\u548c\u6458\u8981\u4efb\u52a1\u4e0a\u8bc4\u4f30\u4e86\u8d85\u8fc7720\u79cd\u5f00\u6e90LLM\u4f5c\u4e3a\u5ea6\u91cf\u6807\u51c6\u7684\u6a21\u677f\uff0c\u603b\u8ba1\u7ea6660\u4e07\u6b21\u8bc4\u4f30\u3002\u8fd9\u9879\u8be6\u5c3d\u7684\u6bd4\u8f83\uff081\uff09\u4e3a\u8fd1\u671f\u5f00\u6e90LLMs\u4f5c\u4e3a\u8bc4\u4ef7\u6307\u6807\u7684\u8868\u73b0\u8bbe\u5b9a\u4e86\u57fa\u51c6\uff1b\uff082\uff09\u63a2\u8ba8\u4e86\u4e0d\u540c\u63d0\u793a\u7b56\u7565\u7684\u7a33\u5b9a\u6027\u548c\u53d8\u5f02\u6027\u3002\u6211\u4eec\u53d1\u73b0\uff0c\u4e00\u65b9\u9762\uff0c\u5b58\u5728\u4e00\u4e9b\u60c5\u51b5\u4e0b\u63d0\u793a\u8868\u73b0\u7a33\u5b9a\uff1a\u6709\u4e9bLLMs\u8868\u73b0\u51fa\u7279\u6709\u7684\u504f\u597d\uff0c\u503e\u5411\u4e8e\u4f7f\u7528\u6587\u672c\u6807\u7b7e\u6765\u8bc4\u5206\uff0c\u800c\u53e6\u4e00\u4e9b\u5219\u503e\u5411\u4e8e\u8fd4\u56de\u6570\u503c\u5206\u6570\u3002\u53e6\u4e00\u65b9\u9762\uff0c\u63d0\u793a\u7684\u7a33\u5b9a\u6027\u548c\u6a21\u578b\u6392\u540d\u53ef\u80fd\u53d7\u5230\u770b\u4f3c\u5fae\u4e0d\u8db3\u9053\u7684\u66f4\u6539\u7684\u5f71\u54cd\u3002\u4f8b\u5982\uff0c\u5c06\u8f93\u51fa\u683c\u5f0f\u4ece\u201c0\u5230100\u20
1d\u6539\u4e3a\u201c-1\u5230+1\u201d\u53ef\u80fd\u4f1a\u663e\u8457\u6539\u53d8\u6211\u4eec\u7684\u8bc4\u4f30\u7ed3\u679c\u3002\u6211\u4eec\u7684\u7814\u7a76\u6709\u52a9\u4e8e\u7406\u89e3\u4e0d\u540c\u63d0\u793a\u65b9\u6cd5\u5bf9MT\u548c\u6458\u8981\u8bc4\u4ef7\u4e2dLLM-based\u5ea6\u91cf\u7684\u5f71\u54cd\uff0c\u63ed\u793a\u4e86\u6700\u7a33\u5b9a\u7684\u63d0\u793a\u6a21\u5f0f\uff0c\u5e76\u6307\u51fa\u4e86\u6f5c\u5728\u5c40\u9650\u6027\u3002|\n", "2406.18521": "|**2024-06-26**|**CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs**|Zirui Wang et.al.|[2406.18521](http://arxiv.org/abs/2406.18521)|**[link](https://github.com/princeton-nlp/CharXiv)**|\u5728\u5b9e\u9645\u5e94\u7528\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08Multimodal Large Language Models\uff0cMLLMs\uff09\u5904\u7406\u79d1\u5b66\u8bba\u6587\u6216\u8d22\u52a1\u62a5\u544a\u7b49\u4efb\u52a1\u65f6\uff0c\u56fe\u8868\u7406\u89e3\u81f3\u5173\u91cd\u8981\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u6570\u636e\u96c6\u5f80\u5f80\u96c6\u4e2d\u5728\u7b80\u5316\u548c\u540c\u8d28\u5316\u7684\u56fe\u8868\u4e0a\uff0c\u4ee5\u53ca\u57fa\u4e8e\u6a21\u677f\u7684\u95ee\u9898\uff0c\u8fd9\u53ef\u80fd\u5bfc\u81f4\u6027\u80fd\u8bc4\u4f30\u8fc7\u4e8e\u4e50\u89c2\u3002\u6211\u4eec\u53d1\u73b0\uff0c\u5c3d\u7ba1\u5f00\u6e90\u6a21\u578b\u5728\u73b0\u6709\u57fa\u51c6\u4e0a\u53ef\u80fd\u8868\u73b0\u4f18\u4e8e\u5f3a\u5927\u7684\u4e13\u6709\u6a21\u578b\uff0c\u4f46\u901a\u8fc7\u7b80\u5355\u7684\u538b\u529b\u6d4b\u8bd5\uff0c\u5982\u6539\u53d8\u56fe\u8868\u6216\u95ee\u9898\uff0c\u6027\u80fd\u4f1a\u4e0b\u964d\u9ad8\u8fbe34.5%\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51faCharXiv\uff0c\u8fd9\u662f\u4e00\u4e2a\u5305\u542b2,323\u4e2a\u6765\u81eaarXiv\u8bba\u6587\u7684\u81ea\u7136\u3001\u590d\u6742\u4e14\u591a\u6837\u5316\u7684\u56fe\u8868\u7684\u5168\u9762\u8bc4\u4f30\u5957\u4ef6\u3002CharXiv\u5305\u62ec\u4e24\u7c7b\u95ee\u9898\uff1a1\uff09\u63cf\u8ff0\u6027\u95ee\u9898\uff0c\u7528\u4e8e\u68c0\u67e5\u57fa\u6
72c\u56fe\u8868\u5143\u7d20\uff1b2\uff09\u63a8\u7406\u95ee\u9898\uff0c\u9700\u8981\u7efc\u5408\u5206\u6790\u56fe\u8868\u4e2d\u7684\u590d\u6742\u89c6\u89c9\u5143\u7d20\u3002\u6240\u6709\u56fe\u8868\u548c\u95ee\u9898\u90fd\u7531\u4e13\u5bb6\u7cbe\u5fc3\u6311\u9009\u3001\u6574\u7406\u548c\u9a8c\u8bc1\u4ee5\u4fdd\u8bc1\u8d28\u91cf\u3002\u7ed3\u679c\u663e\u793a\uff0c\u6700\u5f3a\u4e13\u6709\u6a21\u578b\uff08\u4f8b\u5982GPT-4o\uff0c\u51c6\u786e\u7387\u4e3a47.1%\uff09\u4e0e\u6700\u5f3a\u5f00\u6e90\u6a21\u578b\uff08\u5982InternVL Chat V1.5\uff0c\u51c6\u786e\u7387\u4e3a29.2%\uff09\u4e4b\u95f4\u5b58\u5728\u663e\u8457\u5dee\u8ddd\uff0c\u800c\u6240\u6709\u6a21\u578b\u7684\u8868\u73b0\u5747\u8fdc\u4f4e\u4e8e\u4eba\u7c7b\u768480.5%\u6c34\u5e73\uff0c\u8fd9\u63ed\u793a\u4e86\u73b0\u6709MLLM\u5728\u56fe\u8868\u7406\u89e3\u80fd\u529b\u4e0a\u7684\u4e0d\u8db3\u3002\u6211\u4eec\u5e0c\u671bCharXiv\u80fd\u63a8\u52a8\u672a\u6765\u7684\u7814\u7a76\uff0c\u901a\u8fc7\u63d0\u4f9b\u66f4\u771f\u5b9e\u3001\u66f4\u5177\u4ee3\u8868\u6027\u7684\u8fdb\u6b65\u8861\u91cf\u6807\u51c6\uff0c\u4fc3\u8fdb\u56fe\u8868\u7406\u89e3\u9886\u57df\u7684\u7814\u7a76\u3002\u9879\u76ee\u9875\u9762\u548c\u6392\u884c\u699c\u53ef\u8bbf\u95ee\uff1ahttps://charxiv.github.io/\u3002|\n", "2406.18512": "|**2024-06-26**|**\"Is ChatGPT a Better Explainer than My Professor?\": Evaluating the Explanation Capabilities of LLMs in Conversation Compared to a Human Baseline**|Grace Li et.al.|[2406.18512](http://arxiv.org/abs/2406.18512)|null|### \u6982\u8ff0 
\u89e3\u91ca\u662f\u77e5\u8bc6\u5171\u4eab\u7684\u6838\u5fc3\uff0c\u5b83\u5efa\u7acb\u5728\u6c9f\u901a\u539f\u7406\u3001\u793e\u4f1a\u52a8\u6001\u548c\u5b66\u4e60\u7406\u8bba\u4e4b\u4e0a\u3002\u6211\u4eec\u4e13\u6ce8\u4e8e\u5bf9\u8bdd\u5f0f\u7684\u89e3\u91ca\u65b9\u6cd5\uff0c\u56e0\u4e3a\u5176\u73af\u5883\u9ad8\u5ea6\u9002\u5e94\u6027\u548c\u4ea4\u4e92\u6027\u3002\u6211\u4eec\u7684\u7814\u7a76\u5229\u7528\u4e86\u89e3\u91ca\u884c\u4e3a\u6846\u67b6\uff0c\u8fd9\u662f\u4e00\u4e2a\u7406\u89e3\u89e3\u91ca\u8005\u548c\u88ab\u89e3\u91ca\u8005\u5728\u5bf9\u8bdd\u4e2d\u5982\u4f55\u8fd0\u7528\u7b56\u7565\u8fdb\u884c\u89e3\u91ca\u3001\u7406\u89e3\u548c\u4e92\u52a8\u7684\u5de5\u5177\u3002\u6211\u4eec\u5229\u7528Wachsmuth\u7b49\u4eba\u6784\u5efa\u7684WIRED YouTube\u7cfb\u5217\u6570\u636e\u96c6\uff0c\u5e76\u7531Booshehri\u7b49\u4eba\u8fdb\u884c\u4e86\u5e26\u6709\u89e3\u91ca\u884c\u4e3a\u7684\u6807\u6ce8\uff0c\u8fd9\u4e9b\u6ce8\u91ca\u4e3a\u6211\u4eec\u7406\u89e3\u5bf9\u8bdd\u4e2d\u89e3\u91ca\u8005\u5982\u4f55\u6784\u5efa\u56de\u5e94\u63d0\u4f9b\u4e86\u4f9d\u636e\u3002 
\u968f\u7740\u53bb\u5e74\u751f\u6210\u5f0f\u4eba\u5de5\u667a\u80fd\u7684\u53d1\u5c55\uff0c\u6211\u4eec\u671f\u671b\u66f4\u597d\u5730\u7406\u89e3\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u80fd\u529b\uff0c\u4ee5\u53ca\u5b83\u4eec\u5982\u4f55\u589e\u5f3a\u4e13\u5bb6\u89e3\u91ca\u8005\u7684\u5bf9\u8bdd\u4ea4\u6d41\u80fd\u529b\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u4f7f\u7528\u4e86Booshehri\u7b49\u4eba2023\u5e74\u6807\u6ce8\u76845-Levels\u6570\u636e\u96c6\u6765\u8bc4\u4f30LLMs\u5728\u89e3\u91ca\u6027\u5bf9\u8bdd\u4e2d\u7684\u8868\u73b0\u3002\u4e3a\u4e86\u8bc4\u4ef7LLMs\u751f\u6210\u89e3\u91ca\u8005\u56de\u5e94\u7684\u6709\u6548\u6027\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e09\u79cd\u7b56\u7565\uff1a\u4eba\u7c7b\u89e3\u91ca\u8005\u7684\u539f\u59cb\u56de\u5e94\u3001GPT4\u7684\u6807\u51c6\u56de\u5e94\u4ee5\u53ca\u52a0\u5165\u4e86\u89e3\u91ca\u6b65\u9aa4\u7684GPT4\u56de\u5e94\u3002\u6211\u4eec\u9080\u8bf7\u4eba\u7c7b\u6807\u6ce8\u8005\u5bf9\u8fd9\u4e09\u79cd\u7b56\u7565\u8fdb\u884c\u8bc4\u4f30\u3002|\n", "2406.18505": "|**2024-06-26**|**Mental Modeling of Reinforcement Learning Agents by Language Models**|Wenhao Lu et.al.|[2406.18505](http://arxiv.org/abs/2406.18505)|null|## \u80cc\u666f 
\u5c3d\u7ba1\u73b0\u4ee3\u8bed\u8a00\u6a21\u578b\u5df2\u7ecf\u5c55\u73b0\u51fa\u4e00\u5b9a\u7684\u63a8\u7406\u80fd\u529b\uff0c\u7406\u8bba\u4e0a\u80fd\u591f\u8868\u8fbe\u4efb\u610f\u53ef\u80fd\u7684\u4ee4\u724c\u5206\u5e03\uff0c\u4f46\u5b83\u4eec\u5982\u4f55\u5229\u7528\u9884\u8bad\u7ec3\u65f6\u79ef\u7d2f\u7684\u4e16\u754c\u77e5\u8bc6\u6765\u7406\u89e3\u7269\u7406\u4e16\u754c\u4e2d\u7684\u4ee3\u7406\u884c\u4e3a\uff0c\u8fd9\u4e00\u65b9\u9762\u4ecd\u672a\u5f97\u5230\u5145\u5206\u63a2\u7d22\u3002\u672c\u7814\u7a76\u9996\u6b21\u5b9e\u8bc1\u8003\u5bdf\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u901a\u8fc7\u63a8\u7406\u5206\u6790\u4ee3\u7406\u7684\u884c\u4e3a\u53ca\u5176\u5bf9\u72b6\u6001\u7684\u5f71\u54cd\uff0c\u4ece\u800c\u6784\u5efa\u4ee3\u7406\u5fc3\u7406\u6a21\u578b\uff08agent mental modeling\uff09\u7684\u80fd\u529b\u3002\u8fd9\u53ef\u80fd\u63ed\u793a\u51fa\u5229\u7528LLMs\u89e3\u6790\u5f3a\u5316\u5b66\u4e60\uff08RL\uff09\u4ee3\u7406\u884c\u4e3a\u7684\u6f5c\u529b\uff0c\u8fd9\u5bf9\u4e8e\u53ef\u89e3\u91ca\u5f3a\u5316\u5b66\u4e60\uff08XRL\uff09\u7684\u5173\u952e\u6311\u6218\u5177\u6709\u91cd\u8981\u610f\u4e49\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u7279\u5b9a\u7684\u8bc4\u4f30\u6307\u6807\uff0c\u5e76\u5728\u4e0d\u540c\u590d\u6742\u5ea6\u7684RL\u4efb\u52a1\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u6d4b\u8bd5\uff0c\u62a5\u544a\u5173\u4e8e\u4ee3\u7406\u5fc3\u7406\u6a21\u578b\u5efa\u7acb\u7684\u7814\u7a76\u7ed3\u679c\u3002\u7ed3\u679c\u663e\u793a\uff0c\u5f53\u524d\u7684LLMs\u8fd8\u65e0\u6cd5\u4ec5\u901a\u8fc7\u63a8\u7406\u5b8c\u5168\u5b9e\u73b0\u4ee3\u7406\u7684\u5fc3\u7406\u5efa\u6a21\uff0c\u8fd9\u9700\u8981\u8fdb\u4e00\u6b65\u521b\u65b0\u3002\u56e0\u6b64\uff0c\u8fd9\u9879\u5de5\u4f5c\u63d0\u4f9b\u4e86\u5bf9\u73b0\u4ee3LLMs\u80fd\u529b\u548c\u5c40\u9650\u6027\u7684\u65b0\u89c1\u89e3\u3002|\n", "2406.18501": "|**2024-06-26**|**Is In-Context Learning a Type of Gradient-Based Learning? 
Evidence from the Inverse Frequency Effect in Structural Priming**|Zhenghao Zhou et.al.|[2406.18501](http://arxiv.org/abs/2406.18501)|null|\u8fd9\u7bc7\u8bba\u6587\u63a2\u8ba8\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5185\u63d2\u5b66\u4e60\uff08in-context learning\uff0cICL\uff09\u80fd\u529b\uff0c\u5e76\u5c06\u5176\u4e0e\u57fa\u4e8e\u68af\u5ea6\u7684\u5b66\u4e60\u8fdb\u884c\u529f\u80fd\u7b49\u6548\u6027\u8bca\u65ad\u3002\u7814\u7a76\u8005\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u65b9\u6cd5\uff0c\u5229\u7528\u9006\u9891\u7387\u6548\u5e94\uff08inverse frequency effect\uff0cIFE\uff09\u6765\u5206\u6790\u3002IFE\u73b0\u8c61\u6307\u7684\u662f\u5728\u9519\u8bef\u9a71\u52a8\u7684\u5b66\u4e60\u8fc7\u7a0b\u4e2d\uff0c\u6a21\u578b\u5e94\u5bf9\u7f55\u89c1\u6837\u4f8b\u4ea7\u751f\u7684\u66f4\u65b0\u5e45\u5ea6\u5927\u4e8e\u5e38\u89c1\u6837\u4f8b\u3002\u5728\u5fc3\u7406\u5b66\u4e2d\uff0c\u4eba\u7c7b\u5728\u7ed3\u6784\u5316\u63d0\u793a\uff08\u5982\u503e\u5411\u4e8e\u91cd\u590d\u6700\u8fd1\u63a5\u89e6\u7684\u53e5\u5b50\u7ed3\u6784\uff09\u60c5\u5883\u4e2d\u8868\u73b0\u51faIFE\uff0c\u8fd9\u8868\u660e\u5176\u53ef\u80fd\u6d89\u53ca\u9519\u8bef\u9a71\u52a8\u7684\u5b66\u4e60\u673a\u5236\u3002\u5b9e\u9a8c\u901a\u8fc7\u6a21\u62df\u7ed3\u6784\u5316\u63d0\u793a\u5728ICL\u4e2d\u7684\u5f71\u54cd\u53d1\u73b0\uff0cLLMs\u540c\u6837\u663e\u793a\u51faIFE\uff0c\u4e14\u8fd9\u4e00\u6548\u5e94\u5728\u66f4\u5927\u7684\u6a21\u578b\u4e2d\u66f4\u4e3a\u660e\u663e\u3002\u56e0\u6b64\uff0c\u7814\u7a76\u7ed3\u679c\u652f\u6301\u4e86ICL\u672c\u8d28\u4e0a\u662f\u57fa\u4e8e\u68af\u5ea6\u7684\u5b66\u4e60\u7684\u5047\u8bbe\uff0c\u5373\u5728ICL\u7684\u524d\u5411\u4f20\u64ad\u8fc7\u7a0b\u4e2d\u9690\u542b\u5730\u8ba1\u7b97\u4e86\u68af\u5ea6\u3002\u8bba\u6587\u7ed3\u8bba\u6307\u51fa\uff0c\u4eba\u7c7b\u548cLLMs\u90fd\u4f7f\u7528\u4e86\u57fa\u4e8e\u68af\u5ea6\u7684\u3001\u9519\u8bef\u9a71\u52a8\u7684\u5904\u7406\u673a\u5236\u3002|\n", "2406.18460": "|**2024-06-26**|**Role-Play Zero-Shot Prompting with 
Large Language Models for Open-Domain Human-Machine Conversation**|Ahmed Njifenjou et.al.|[2406.18460](http://arxiv.org/abs/2406.18460)|null|\u8fd1\u5e74\u6765\uff0c\u4eba\u4eec\u63d0\u51fa\u4e86\u4e00\u7cfb\u5217\u65b9\u6cd5\u6765\u521b\u5efa\u80fd\u591f\u8fdb\u884c\u5f00\u653e\u9886\u57df\u5bf9\u8bdd\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u3002\u8fd9\u4e9b\u6a21\u578b\u80fd\u56de\u7b54\u7528\u6237\u95ee\u9898\uff0c\u4f46\u5c40\u9650\u4e8e\u5355\u5411\u95ee\u7b54\u5f62\u5f0f\uff0c\u800c\u975e\u771f\u6b63\u7684\u5bf9\u8bdd\u3002\u901a\u5e38\uff0c\u901a\u8fc7\u9488\u5bf9\u7279\u5b9a\u6570\u636e\u96c6\u8fdb\u884c\u5fae\u8c03\u6765\u8c03\u6574\u5b83\u4eec\u7684\u4ea4\u6d41\u98ce\u683c\uff0c\u4f46\u8fd9\u65e2\u6602\u8d35\u53c8\u9650\u4e8e\u5c11\u6570\u8bed\u8a00\u3002\u672c\u7814\u7a76\u63a2\u7d22\u4e86\u89d2\u8272\u626e\u6f14\u7684\u96f6\u6837\u672c\u63d0\u793a\u4f5c\u4e3a\u63d0\u9ad8\u5f00\u653e\u9886\u57df\u5bf9\u8bdd\u6548\u7387\u548c\u6210\u672c\u6548\u76ca\u7684\u89e3\u51b3\u65b9\u6848\uff0c\u5229\u7528\u591a\u8bed\u8a00\u80fd\u529b\u5f3a\u7684\u8bad\u7ec3\u6709\u7d20\u6a21\u578b\uff08Beeching\u7b49\u4eba\uff0c2023\u5e74\uff09\uff0c\u8fd9\u4e9b\u6a21\u578b\u80fd\u9075\u5faa\u6307\u4ee4\u3002\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u63d0\u793a\u7cfb\u7edf\uff0c\u5f53\u4e0e\u9075\u5faa\u6307\u4ee4\u7684\u6a21\u578b\u2014\u2014\u8fd9\u91cc\u4f7f\u7528Vicuna\uff08Chiang\u7b49\u4eba\uff0c2023\u5e74\uff09\u7ed3\u5408\u65f6\uff0c\u80fd\u591f\u751f\u6210\u5728\u6cd5\u8bed\u4e2d\u7684\u5bf9\u8bdd\u4ee3\u7406\uff0c\u5728\u4e24\u9879\u4efb\u52a1\u4e2d\u751a\u81f3\u8d85\u8d8a\u4e86\u7ecf\u8fc7\u5fae\u8c03\u7684\u6a21\u578b\uff0c\u5e76\u5728\u4eba\u7c7b\u8bc4\u4f30\u4e2d\u8868\u73b0\u51fa\u8272\u3002|\n", "2406.18449": "|**2024-06-26**|**Cascading Large Language Models for Salient Event Graph Generation**|Xingwei Tan 
et.al.|[2406.18449](http://arxiv.org/abs/2406.18449)|**[link](https://github.com/xingwei-warwick/callmsae)**|\u7531\u4e8e\u957f\u6587\u6863\u4e2d\u4e8b\u4ef6\u68c0\u6d4b\u3001\u5173\u7cfb\u8bc6\u522b\u4ee5\u53ca\u975e\u7ed3\u6784\u5316\u8f93\u5165\u4e0e\u7ed3\u6784\u5316\u56fe\u8c31\u7684\u6574\u5408\u7b49\u4efb\u52a1\u7684\u590d\u6742\u6027\uff0c\u4ece\u6587\u672c\u751f\u6210\u4e8b\u4ef6\u56fe\u8c31\u662f\u4e00\u9879\u6311\u6218\u3002\u5f53\u524d\u7684\u7814\u7a76\u5f80\u5f80\u540c\u7b49\u91cd\u89c6\u6240\u6709\u4e8b\u4ef6\uff0c\u672a\u80fd\u533a\u5206\u5bf9\u7406\u89e3\u53d9\u4e8b\u81f3\u5173\u91cd\u8981\u7684\u5173\u952e\u4e8b\u4ef6\u3002\u672c\u6587\u63d0\u51faCALLMSAE\uff0c\u4e00\u4e2a\u57fa\u4e8eCAscading\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684SAlient Event\u56fe\u8c31\u751f\u6210\u6846\u67b6\uff0c\u5b83\u5229\u7528LLMs\u7684\u80fd\u529b\uff0c\u5e76\u907f\u514d\u4e86\u6602\u8d35\u7684\u4eba\u5de5\u6807\u6ce8\u9700\u6c42\u3002\u9996\u5148\uff0c\u901a\u8fc7\u63d0\u793aLLMs\u751f\u6210\u6458\u8981\uff0c\u6211\u4eec\u8bc6\u522b\u51fa\u91cd\u8981\u4e8b\u4ef6\u3002\u7136\u540e\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u8fed\u4ee3\u7684\u4ee3\u7801\u7cbe\u70bc\u63d0\u793a\u7b56\u7565\uff0c\u7528\u4e8e\u751f\u6210\u4e8b\u4ef6\u5173\u7cfb\u56fe\uff0c\u6d88\u9664\u9519\u8bef\u7684\u5173\u7cfb\u5e76\u6062\u590d\u7f3a\u5931\u7684\u8fb9\u3002\u5bf9\u57fa\u4e8e\u4e0a\u4e0b\u6587\u7684\u56fe\u8c31\u751f\u6210\u6a21\u578b\u8fdb\u884c fine-tuning\uff0c\u5728\u4f7f\u7528 LLM \u751f\u6210\u7684\u56fe\u8c31\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u4f18\u4e8e\u4f7f\u7528 CAEVO \u751f\u6210\u6570\u636e\u8bad\u7ec3\u7684\u6a21\u578b\u3002\u5728\u4eba\u7c7b\u6807\u6ce8\u7684\u6d4b\u8bd5\u96c6\u4e0a\u7684\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u80fd\u751f\u6210\u66f4\u7a81\u51fa\u4e14\u51c6\u786e\u7684\u56fe\u8c31\uff0c\u8d85\u8d8a\u4e86\u7ade\u4e89\u6027\u7684\u57fa\u7ebf\u3002|\n", "2406.18440": "|**2024-06-26**|**New intelligent 
empowerment for digital transformation**|Peng Yifeng et.al.|[2406.18440](http://arxiv.org/abs/2406.18440)|null|\u8fd9\u9879\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u521b\u65b0\u8bc4\u4f30\u65b9\u6cd5\uff0c\u7528\u4e8e\u8861\u91cf\u4f01\u4e1a\u7684\u6570\u5b57\u5316\u8f6c\u578b\uff08DT\uff09\u8fc7\u7a0b\u3002\u901a\u8fc7\u5bf92005\u5e74\u81f32022\u5e74\u95f4\u5728\u7ebd\u7ea6\u8bc1\u5238\u4ea4\u6613\u6240\u548c\u7eb3\u65af\u8fbe\u514b\u4e0a\u5e02\u76844407\u5bb6\u516c\u53f8\u7684\u5e74\u5ea6\u62a5\u544a\u8fdb\u884c\u5206\u6790\uff0c\u6784\u5efa\u4e86\u4e00\u5957\u5168\u9762\u7684DT\u6307\u6807\u3002\u7814\u7a76\u7ed3\u679c\u663e\u793a\uff0cDT\u663e\u8457\u63d0\u9ad8\u4e86\u4f01\u4e1a\u7684\u8d22\u52a1\u8868\u73b0\u3002\u7136\u800c\uff0c\u4e0d\u540c\u7684\u6570\u5b57\u6280\u672f\u5bf9\u8d22\u52a1\u6027\u80fd\u7684\u5f71\u54cd\u5404\u4e0d\u76f8\u540c\uff0c\u533a\u5757\u94fe\u6280\u672f\u7684\u79ef\u6781\u5f71\u54cd\u76f8\u5bf9\u8f83\u5c0f\u3002\u6b64\u5916\uff0c\u7814\u7a76\u8fd8\u53d1\u73b0DT\u901a\u8fc7\u63d0\u5347\u8fd0\u8425\u6548\u7387\u548c\u964d\u4f4e\u6210\u672c\u4fc3\u8fdb\u8d22\u52a1\u7ee9\u6548\u589e\u957f\u3002\u672c\u7814\u7a76\u4e3a\u5b66\u672f\u754c\u63d0\u4f9b\u4e86\u65b0\u7684DT\u8bc4\u4f30\u5de5\u5177\uff0c\u540c\u65f6\u62d3\u5bbd\u4e86\u751f\u6210\u4eba\u5de5\u667a\u80fd\u6280\u672f\u5728\u7ecf\u6d4e\u7814\u7a76\u4e2d\u7684\u5e94\u7528\u8303\u56f4\u3002|\n", "2406.18406": "|**2024-06-26**|**IRCAN: Mitigating Knowledge Conflicts in LLM Generation via Identifying and Reweighting Context-Aware Neurons**|Dan Shi 
et.al.|[2406.18406](http://arxiv.org/abs/2406.18406)|null|\u4eba\u4eec\u666e\u904d\u8ba4\u4e3a\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5927\u89c4\u6a21\u6570\u636e\u8bad\u7ec3\u540e\u8574\u542b\u7740\u4e30\u5bcc\u7684\u77e5\u8bc6\u3002\u7136\u800c\uff0c\u8fd1\u671f\u7814\u7a76\u63ed\u793a\u4e86LLMs\u751f\u6210\u6587\u672c\u65f6\u7684\u77e5\u8bc6\u51b2\u7a81\u95ee\u9898\uff0c\u5373\u6a21\u578b\u5185\u7f16\u7801\u7684\u53c2\u6570\u77e5\u8bc6\uff08\u5373\u77e5\u8bc6\u5e93\uff09\u4e0e\u4e0a\u4e0b\u6587\u63d0\u4f9b\u7684\u65b0\u77e5\u8bc6\u5b58\u5728\u77db\u76fe\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u6846\u67b6\u2014\u2014IRCAN\uff08\u8bc6\u522b\u548c\u91cd\u6743\u4e0a\u4e0b\u6587\u611f\u77e5\u795e\u7ecf\u5143\uff09\u3002IRCAN\u9996\u5148\u5229\u7528\u6574\u5408\u68af\u5ea6\u8ba1\u7b97\u5f97\u5230\u7684\u4e0a\u4e0b\u6587\u611f\u77e5\u5f52\u56e0\u5206\u6570\uff0c\u6765\u8bc6\u522b\u90a3\u4e9b\u5bf9\u5904\u7406\u8bed\u5883\u81f3\u5173\u91cd\u8981\u7684\u795e\u7ecf\u5143\u3002\u63a5\u7740\uff0c\u901a\u8fc7\u91cd\u65b0\u8d4b\u6743\uff0c\u6211\u4eec\u5f3a\u5316\u8fd9\u4e9b\u8bc6\u522b\u51fa\u7684\u4e0a\u4e0b\u6587\u76f8\u5173\u795e\u7ecf\u5143\uff0c\u4ece\u800c\u5f15\u5bfcLLMs\u751f\u6210\u66f4\u7b26\u5408\u4e0a\u4e0b\u6587\u65b0\u77e5\u8bc6\u7684\u54cd\u5e94\u3002\u6211\u4eec\u5728\u591a\u79cd\u6a21\u578b\u548c\u4efb\u52a1\u4e0a\u7684\u5e7f\u6cdb\u5b9e\u9a8c\u8868\u660e\uff0cIRCAN\u4e0d\u4ec5\u663e\u8457\u63d0\u5347\u4e86\u5904\u7406\u77e5\u8bc6\u51b2\u7a81\u7684\u80fd\u529b\uff0c\u8fd8\u63d0\u4f9b\u4e86\u4e00\u4e2a\u53ef\u6269\u5c55\u7684\u3001\u5373\u63d2\u5373\u7528\u7684\u89e3\u51b3\u65b9\u6848\uff0c\u80fd\u591f\u65e0\u7f1d\u878d\u5165\u73b0\u6709\u6a21\u578b\u4e2d\u3002|\n", "2406.19392": "|**2024-06-27**|**ReXTime: A Benchmark Suite for Reasoning-Across-Time in Videos**|Jr-Jen Chen 
et.al.|[2406.19392](http://arxiv.org/abs/2406.19392)|**[link](https://github.com/rextime/rextime)**|**\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u9879\u540d\u4e3aReXTime\u7684\u57fa\u51c6\u6d4b\u8bd5\uff0c\u4e13\u95e8\u9488\u5bf9\u4eba\u5de5\u667a\u80fd\u6a21\u578b\u5728\u89c6\u9891\u4e8b\u4ef6\u4e2d\u7684\u65f6\u95f4\u63a8\u7406\u80fd\u529b\u8fdb\u884c\u4e25\u8c28\u8bc4\u4f30\u3002ReXTime\u5173\u6ce8\u7684\u662f\u8de8\u65f6\u95f4\u63a8\u7406\uff0c\u5373\u7406\u89e3\u5f53\u95ee\u9898\u53ca\u5176\u76f8\u5e94\u7684\u7b54\u6848\u51fa\u73b0\u5728\u4e0d\u540c\u7684\u89c6\u9891\u7247\u6bb5\u65f6\u7684\u4eba\u7c7b\u5f0f\u7406\u89e3\u3002\u8fd9\u79cd\u9700\u8981\u6df1\u5165\u7406\u89e3\u89c6\u9891\u7247\u6bb5\u4e4b\u95f4\u56e0\u679c\u5173\u7cfb\u7684\u65f6\u95f4\u63a8\u7406\u80fd\u529b\u5bf9\u524d\u6cbf\u7684\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\u6784\u6210\u4e86\u91cd\u5927\u6311\u6218\u3002\u4e3a\u4e86\u652f\u6301\u8fd9\u79cd\u8bc4\u4ef7\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2a\u81ea\u52a8\u5316\u7ba1\u9053\uff0c\u7528\u4e8e\u751f\u6210\u65f6\u95f4\u63a8\u7406\u7684\u95ee\u7b54\u5bf9\uff0c\u5927\u5927\u51cf\u5c11\u4e86\u7e41\u7410\u7684\u624b\u52a8\u6807\u6ce8\u9700\u6c42\u3002\u6211\u4eec\u7684\u57fa\u51c6\u5305\u62ec921\u4e2a\u7cbe\u5fc3\u7b5b\u9009\u7684\u9a8c\u8bc1\u6837\u672c\u548c2,143\u4e2a\u6d4b\u8bd5\u6837\u672c\uff0c\u6bcf\u4e2a\u6837\u672c\u90fd\u7ecf\u8fc7\u4eba\u5de5\u7cbe\u5fc3\u6311\u9009\u4ee5\u786e\u4fdd\u51c6\u786e\u6027\u548c\u76f8\u5173\u6027\u3002\u8bc4\u4f30\u7ed3\u679c\u663e\u793a\uff0c\u5c3d\u7ba1\u524d\u6cbf\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u5b66\u672f\u6a21\u578b\u4e0a\u8868\u73b0\u7a81\u51fa\uff0c\u4f46\u5b83\u4eec\u4e0e\u4eba\u7c7b\u7684\u8868\u73b0\u4ecd\u5b58\u5728\u663e\u8457\u768414.3%\u7684\u7cbe\u5ea6\u5dee\u8ddd\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u7ba1\u9053\u65e0\u9700\u4eba\u5de5\u521b\u5efa\u4e86\u4e00\u4e2a\u5305\u542b9,695\u4e2a\u673a\u5668\u751f\u6210\u6837\u672c\u7684\u8bad\u7ec3\u6570\u636e\u96c6
\uff0c\u5b9e\u8bc1\u7814\u7a76\u8868\u660e\uff0c\u8fd9\u53ef\u4ee5\u901a\u8fc7\u5fae\u8c03\u6765\u63d0\u5347\u8de8\u65f6\u95f4\u63a8\u7406\u80fd\u529b\u3002**|\n", "2406.19384": "|**2024-06-27**|**The Remarkable Robustness of LLMs: Stages of Inference?**|Vedang Lad et.al.|[2406.19384](http://arxiv.org/abs/2406.19384)|**[link](https://github.com/vdlad/remarkable-robustness-of-llms)**|**\u6211\u4eec\u901a\u8fc7\u5220\u9664\u548c\u4ea4\u6362\u76f8\u90bb\u5c42\u6765\u5c55\u793a\u5e76\u7814\u7a76\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u60ca\u4eba\u9c81\u68d2\u6027\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u5728\u4e0d\u8fdb\u884c\u5fae\u8c03\u7684\u60c5\u51b5\u4e0b\uff0c\u8fd9\u4e9b\u5e72\u9884\u63aa\u65bd\u4ecd\u80fd\u4fdd\u7559\u539f\u59cb\u6a21\u578b72%\u81f395%\u7684\u9884\u6d4b\u7cbe\u5ea6\uff0c\u800c\u4e14\u6a21\u578b\u5c42\u6570\u8d8a\u591a\uff0c\u8868\u73b0\u51fa\u66f4\u9ad8\u7684\u9c81\u68d2\u6027\u3002\u6839\u636e\u9010\u5c42\u5e72\u9884\u5b9e\u9a8c\u548c\u5176\u4ed6\u5b9e\u9a8c\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u4e2a\u5047\u8bbe\uff1a\u5b58\u5728\u56db\u79cd\u901a\u7528\u7684\u63a8\u7406\u9636\u6bb5\uff0c\u8de8\u8d8a\u516b\u79cd\u4e0d\u540c\u7684\u6a21\u578b\uff1a\u89e3\u7801\u5668\u9636\u6bb5\uff0c\u5c06\u539f\u59cb\u4ee4\u724c\u8868\u793a\u63d0\u5347\u4e3a\u66f4\u9ad8\u7ea7\u7684\u4e0a\u4e0b\u6587\u8868\u793a\uff1b\u7279\u5f81\u5de5\u7a0b\u9636\u6bb5\uff0c\u8fed\u4ee3\u4f18\u5316\u4efb\u52a1\u548c\u5b9e\u4f53\u7279\u5b9a\u7279\u5f81\uff1b\u7136\u540e\u662f\u6a21\u578b\u7684\u534a\u90e8\u5206\uff0c\u968f\u7740\u4e13\u95e8\u7ec4\u4ef6\u7684\u4f5c\u7528\uff0c\u9690\u85cf\u8868\u793a\u4e0e\u8bcd\u6c47\u7a7a\u95f4\u7684\u5bf9\u9f50\u8fdb\u5165\u4e00\u4e2a\u76f8\u53d8\u9636\u6bb5\uff1b\u6700\u540e\uff0c\u6700\u540e\u4e00\u5c42\u901a\u8fc7\u6d88\u9664\u5bf9\u9884\u6d4b\u9020\u6210\u5e72\u6270\u7684\u8fc7\u65f6\u7279\u5f81\uff0c\u7cbe\u7ec6\u5316\u540e\u7eed\u7684\u4ee4\u724c\u5206\u5e03\u3002**|\n", "2406.19358": "|**2024-06-27**|**The Model Arena 
for Cross-lingual Sentiment Analysis: A Comparative Study in the Era of Large Language Models**|Xiliang Zhu et.al.|[2406.19358](http://arxiv.org/abs/2406.19358)|null|\u60c5\u611f\u5206\u6790\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u4e2d\u626e\u6f14\u7740\u6838\u5fc3\u89d2\u8272\u3002XLM-R\u548cmT5\u7b49\u591a\u8bed\u8a00\u9884\u8bad\u7ec3\u6a21\u578b\u7684\u5174\u8d77\u63a8\u52a8\u4e86\u8de8\u8bed\u8a00\u60c5\u611f\u5206\u6790\u7684\u5173\u6ce8\u5ea6\u63d0\u5347\u3002\u8fd1\u671f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u51fa\u73b0\u6781\u5927\u5730\u63a8\u52a8\u4e86\u901a\u7528NLP\u4efb\u52a1\u7684\u53d1\u5c55\uff0c\u4f46\u8fd9\u4e9b\u6a21\u578b\u5728\u8de8\u8bed\u8a00\u60c5\u611f\u5206\u6790\u65b9\u9762\u7684\u6027\u80fd\u5c1a\u672a\u5145\u5206\u63a2\u8ba8\u3002\u672c\u7814\u7a76\u901a\u8fc7\u5b9e\u8bc1\u5206\u6790\uff0c\u6bd4\u8f83\u4e86\u516c\u5171\u5c0f\u578b\u591a\u8bed\u8a00\u6a21\u578b\uff08SMLM\uff09\u5982XLM-R\u4e0e\u4ee5\u82f1\u8bed\u4e3a\u4e2d\u5fc3\u7684LLM\uff08\u5982Llama-3\uff09\u5728\u82f1\u8bed\u3001\u897f\u73ed\u7259\u8bed\u3001\u6cd5\u8bed\u548c\u4e2d\u6587\u7684\u60c5\u611f\u5206\u6790\u4e2d\u7684\u96f6\u6837\u672c\u548c\u5c11\u91cf\u6837\u672c\u8fc1\u79fb\u80fd\u529b\u3002\u7ed3\u679c\u663e\u793a\uff0c\u5c31\u516c\u5f00\u6a21\u578b\u800c\u8a00\uff0cSMLM\u5728\u96f6\u6837\u672c\u8de8\u8bed\u8a00\u8bbe\u7f6e\u4e2d\u8868\u73b0\u51fa\u66f4\u597d\u7684\u6027\u80fd\u3002\u7136\u800c\uff0c\u5728\u5c11\u91cf\u6837\u672c\u60c5\u51b5\u4e0b\uff0c\u516c\u5f00LLM\u663e\u793a\u51fa\u66f4\u5f3a\u7684\u9002\u5e94\u6027\u3002\u6b64\u5916\uff0c\u6211\u4eec\u53d1\u73b0\u4e13\u6709\u7684GPT-3.5\u548cGPT-4\u5728\u96f6\u6837\u672c\u8de8\u8bed\u8a00\u80fd\u529b\u4e0a\u9886\u5148\uff0c\u4f46\u5728\u5c11\u91cf\u6837\u672c\u573a\u666f\u4e0b\uff0c\u5b83\u4eec\u88ab\u516c\u5f00\u6a21\u578b\u8d85\u8d8a\u3002|\n", "2406.19356": "|**2024-06-27**|**DiVERT: Distractor Generation with Variational Errors Represented as Text for 
Math Multiple-choice Questions**|Nigel Fernandez et.al.|[2406.19356](http://arxiv.org/abs/2406.19356)|**[link](https://github.com/umass-ml4ed/divert)**|\u9ad8\u8d28\u91cf\u7684\u5e72\u6270\u9879\u5bf9\u4e8e\u9009\u62e9\u9898\uff08\u5c24\u5176\u662f\u6570\u5b66\u9009\u62e9\u9898\uff09\u7684\u8bc4\u4f30\u548c\u6559\u5b66\u4ef7\u503c\u81f3\u5173\u91cd\u8981\u3002\u7136\u800c\uff0c\u624b\u5de5\u8bbe\u8ba1\u80fd\u591f\u53cd\u6620\u5b66\u751f\u5b9e\u9645\u77e5\u8bc6\u7f3a\u9677\u6216\u8bef\u89e3\u7684\u5e72\u6270\u9879\u662f\u4e00\u9879\u8270\u5de8\u7684\u4efb\u52a1\u3002\u5c3d\u7ba1\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5982GPT-4\u5728\u751f\u6210\u5e72\u6270\u9879\u65b9\u9762\u6709\u6240\u52a9\u76ca\uff0c\u4f46\u6570\u5b66\u8fd9\u7c7b\u5b66\u79d1\u7684\u5904\u7406\u4ecd\u7136\u5177\u6709\u6311\u6218\u6027\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u65b9\u6cd5\uff0c\u65e8\u5728\u7406\u89e3\u548c\u751f\u6210\u89e3\u91ca\u6027\u7684\u9519\u8bef\u8868\u793a\uff0c\u4ee5\u751f\u6210\u6570\u5b66\u9009\u62e9\u9898\u7684\u5e72\u6270\u9879\u3002\u672c\u6587\u4ecb\u7ecdDiVERT\uff08\u57fa\u4e8e\u6587\u672c\u7684\u53d8\u5f02\u8bef\u5dee\u751f\u6210\u5668\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u5229\u75287\u4ebf\u53c2\u6570\u5f00\u6e90LLM\u7684\u53d8\u5206\u65b9\u6cd5\uff0c\u5b83\u5728\u771f\u5b9e\u4e16\u754c\u6570\u5b66\u9009\u62e9\u9898\u6570\u636e\u96c6\uff08\u5305\u542b1,434\u4e2a\u95ee\u9898\uff0c\u88ab\u6570\u5341\u4e07\u5b66\u751f\u4f7f\u7528\uff09\u4e0a\u7684\u5b9e\u9a8c\u8868\u660e\uff0c\u76f8\u8f83\u4e8e\u6700\u5148\u8fdb\u7684GPT-4\u65b9\u6cd5\uff0cDiVERT\u5728\u5e72\u6270\u9879\u751f\u6210\u65b9\u9762\u8868\u73b0\u51fa\u8272\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u8fdb\u884c\u4e86\u4e0e\u6570\u5b66\u6559\u80b2\u8005\u7684\u540c\u884c\u8bc4\u5ba1\uff0c\u7ed3\u679c\u8868\u660eDiVERT\u751f\u6210\u7684\u9519\u8bef\u6807\u7b7e\u8d28\u91cf\u63a5\u8fd1\u4eba\u7c7b\u7f16\u5199\u7684\u3002
|\n", "2406.19349": "|**2024-06-27**|**IndoToxic2024: A Demographically-Enriched Dataset of Hate Speech and Toxicity Types for Indonesian Language**|Lucky Susanto et.al.|[2406.19349](http://arxiv.org/abs/2406.19349)|null|\u9488\u5bf9\u7f51\u7edc\u4ec7\u6068\u8a00\u8bba\u5bf9\u793e\u4f1a\u548c\u8c10\u7684\u4e25\u5cfb\u5a01\u80c1\uff0c\u7279\u522b\u662f\u5728\u5370\u5c3c\u8fd9\u7c7b\u56fd\u5bb6\uff0c\u8fd1\u5e74\u6765\u4ec7\u6068\u8a00\u8bba\u5728\u7ebf\u6bd4\u7387\u589e\u957f\u4e86\u5341\u500d\uff0c\u8feb\u5207\u9700\u8981\u6709\u6548\u7684\u68c0\u6d4b\u673a\u5236\u3002\u7136\u800c\uff0c\u7531\u4e8e\u7f3a\u4e4f\u5145\u8db3\u7684\u6807\u8bb0\u6570\u636e\uff0c\u5c24\u5176\u662f\u9488\u5bf9\u5370\u5c3c\u6587\u672c\u7684\uff0c\u8fd9\u4e00\u8fdb\u5c55\u53d7\u5230\u4e86\u963b\u788d\u3002\u8fb9\u7f18\u5316\u7fa4\u4f53\uff0c\u5982\u4ec0\u53f6\u6d3e\u3001LGBTQ\u7b49\u5c11\u6570\u7fa4\u4f53\uff0c\u9762\u4e34\u7684\u6311\u6218\u66f4\u5927\uff0c\u56e0\u4e3a\u4ec7\u6068\u8a00\u8bba\u62a5\u544a\u4e0d\u8db3\uff0c\u73b0\u6709\u7684\u68c0\u6d4b\u5de5\u5177\u5bf9\u5176\u7406\u89e3\u6709\u9650\u3002\u6b64\u5916\uff0c\u5f53\u524d\u6570\u636e\u96c6\u5bf9\u4e3b\u89c2\u6027\u7684\u5904\u7406\u4e0d\u8db3\uff0c\u52a0\u5267\u4e86\u95ee\u9898\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51faIndoToxic2024\uff0c\u8fd9\u662f\u4e00\u4e2a\u5168\u9762\u7684\u5370\u5c3c\u4ec7\u6068\u8a00\u8bba\u548c\u6bd2\u6027\u5206\u7c7b\u6570\u636e\u96c6\uff0c\u5305\u542b43,692\u6761\u8bb0\u5f55\uff0c\u753119\u540d\u591a\u5143\u5316\u7684\u4e2a\u4f53\u8fdb\u884c\u6807\u6ce8\uff0c\u7279\u522b\u5173\u6ce8\u9009\u4e3e\u671f\u95f4\u9488\u5bf9\u56fd\u5185\u5f31\u52bf\u7fa4\u4f53\uff08\u5982\u603b\u7e
df\u9009\u4e3e\u4e2d\u7684\u7279\u5b9a\u7fa4\u4f53\uff09\u7684\u6587\u672c\u3002\u6211\u4eec\u4f7f\u7528BERT\u6a21\u578b\uff08IndoBERTweet\uff09\u8fdb\u884c\u4e86\u5fae\u8c03\uff0c\u4e3a\u4e03\u79cd\u4e8c\u5143\u5206\u7c7b\u4efb\u52a1\u8bbe\u5b9a\u4e86\u57fa\u51c6\uff0c\u53d6\u5f97\u4e860.78\u7684\u5b8fF1\u5206\u6570\u3002\u540c\u65f6\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u5982\u4f55\u5c06\u4eba\u53e3\u7edf\u8ba1\u4fe1\u606f\u878d\u5165\u5176\u4e2d\uff0c\u63d0\u5347\u5927\u578b\u8bed\u8a00\u6a21\u578bgpt-3.5-turbo\u5728\u96f6\u6837\u672c\u60c5\u51b5\u4e0b\u7684\u6027\u80fd\u3002\u7136\u800c\uff0c\u6211\u4eec\u4e5f\u8b66\u544a\uff0c\u8fc7\u5ea6\u4f9d\u8d56\u4eba\u53e3\u7edf\u8ba1\u4fe1\u606f\u53ef\u80fd\u5bfc\u81f4\u7ec6\u5316\u6a21\u578b\u6027\u80fd\u4e0b\u964d\uff0c\u56e0\u4e3a\u8fd9\u4f1a\u5bfc\u81f4\u6570\u636e\u788e\u7247\u5316\u3002|\n", "2406.19317": "|**2024-06-27**|**Jump Starting Bandits with LLM-Generated Prior Knowledge**|Parand A. Alamdari et.al.|[2406.19317](http://arxiv.org/abs/2406.19317)|null|\u6211\u4eec\u63d0\u4f9b\u4e86\u6709\u529b\u7684\u8bc1\u636e\uff0c\u5c55\u793a\u4e86\u5c06\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e0e\u4e0a\u4e0b\u6587\u5316\u591a\u81c2\u8001\u864e\u673a\u6846\u67b6\u76f8\u7ed3\u5408\u7684\u4f18\u52bf\u3002\u4e0a\u4e0b\u6587\u5316\u8001\u864e\u673a\u5728\u63a8\u8350\u7cfb\u7edf\u4e2d\u5e7f\u6cdb\u5e94\u7528\uff0c\u7528\u4e8e\u6839\u636e\u7528\u6237\u7279\u5b9a\u7684\u4e0a\u4e0b\u6587\u751f\u6210\u4e2a\u6027\u5316\u5efa\u8bae\u3002\u6211\u4eec\u8868\u660e\uff0c\u7ecf\u8fc7\u5927\u89c4\u6a21\u8bed\u6599\u5e93\u8bad\u7ec3\uff0c\u5bcc\u542b\u4eba\u7c7b\u77e5\u8bc6\u548c\u504f\u597d\u7684LLMs\u80fd\u591f\u5f88\u597d\u5730\u6a21\u62df\u4eba\u7c7b\u884c\u4e3a\uff0c\u4ece\u800c\u901a\u8fc7\u542f\u52a8\u4e0a\u4e0b\u6587\u5316\u591a\u81c2\u8001\u864e\u673a\u6765\u51cf\u5c11\u5728\u7ebf\u5b66\u4e60\u7684\u9057\u61be\uff08regret\uff09\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u521d\u59cb\u5316\u7b97\u6cd5\uff0c\u901a\u8f
c7\u63d0\u793aLLMs\u751f\u6210\u63a5\u8fd1\u4eba\u7c7b\u504f\u597d\u7684\u9884\u8bad\u7ec3\u6570\u636e\u96c6\uff0c\u4f9b\u8001\u864e\u673a\u5b66\u4e60\u4f7f\u7528\u3002\u8fd9\u663e\u8457\u964d\u4f4e\u4e86\u5728\u7ebf\u5b66\u4e60\u7684\u9057\u61be\u548c\u6570\u636e\u6536\u96c6\u6210\u672c\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u901a\u8fc7\u4e24\u7ec4\u5b9e\u9a8c\u9a8c\u8bc1\uff0c\u5305\u62ec\u4f7f\u7528LLMs\u4f5c\u4e3a\u5360\u535c\u8005\uff08oracle\uff09\u7684\u5b9e\u9a8c\u548c\u57fa\u4e8e\u8054\u5408\u8c03\u67e5\u5b9e\u9a8c\u6570\u636e\u7684\u771f\u5b9e\u4e16\u754c\u5b9e\u9a8c\u3002|\n", "2406.19292": "|**2024-06-27**|**From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data**|Zheyang Xiong et.al.|[2406.19292](http://arxiv.org/abs/2406.19292)|null|\u8fd1\u671f\u7684\u7814\u7a76\u6307\u51fa\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5904\u7406\u957f\u6587\u672c\u8f93\u5165\u65f6\u5728\u4fe1\u606f\u68c0\u7d22\u548c\u63a8\u7406\u80fd\u529b\u4e0a\u5b58\u5728\u56f0\u96be\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u5229\u7528\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u5408\u6210\u6570\u636e\u96c6\u8fdb\u884c\u5fae\u8c03\u7684\u65b9\u6cd5\uff0c\u8be5\u6570\u636e\u96c6\u5305\u542b\u6570\u503c\u578b\u952e\u503c\u5bf9\u68c0\u7d22\u4efb\u52a1\u3002\u6211\u4eec\u5728GPT-3.5 Turbo\u548cMistral 
7B\u7b49\u6a21\u578b\u4e0a\u7684\u5b9e\u9a8c\u663e\u793a\uff0c\u5bf9\u8fd9\u4e9b\u6a21\u578b\u8fdb\u884c\u8fd9\u79cd\u6570\u636e\u96c6\u7684\u5fae\u8c03\u663e\u8457\u63d0\u9ad8\u4e86\u5b83\u4eec\u5728\u957f\u6587\u672c\u73af\u5883\u4e2d\u7684\u4fe1\u606f\u68c0\u7d22\u548c\u63a8\u7406\u80fd\u529b\u3002\u6211\u4eec\u5206\u6790\u4e86\u5fae\u8c03\u540e\u7684\u6a21\u578b\uff0c\u53d1\u73b0\u5b83\u4eec\u5728\u4ece\u5408\u6210\u4efb\u52a1\u8fc1\u79fb\u5230\u5b9e\u9645\u8bc4\u4f30\uff08\u5982\u572820\u6587\u6863MDQA\u4e2d\u7684\u4f4d\u7f6e10\u5904\u63d0\u534710.5%\uff09\u65b9\u9762\u7684\u8868\u73b0\u6709\u6240\u63d0\u5347\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u53d1\u73b0\uff0c\u7ecf\u8fc7\u6211\u4eec\u5408\u6210\u6570\u636e\u96c6\u5fae\u8c03\u7684LLMs\u5728\u901a\u7528\u57fa\u51c6\u4e0a\u7684\u6027\u80fd\u4fdd\u6301\u7a33\u5b9a\uff0c\u800c\u4f7f\u7528\u5176\u4ed6\u57fa\u4e8e\u957f\u6587\u672c\u589e\u5f3a\u6570\u636e\u96c6\u5fae\u8c03\u7684LLMs\u53ef\u80fd\u4f1a\u5bfc\u81f4\u9519\u8bef\u589e\u52a0\uff08\u4f8b\u5982\uff0c\u5728TriviaQA\u4e0a\uff0cMistral 7B\u5728\u6211\u4eec\u7684\u5408\u6210\u6570\u636e\u4e0a\u5fae\u8c03\u65e0\u660e\u663e\u6027\u80fd\u4e0b\u964d\uff0c\u800c\u5176\u4ed6\u57fa\u7ebf\u6570\u636e\u53ef\u80fd\u5bfc\u81f4\u6027\u80fd\u4e0b\u964d\uff0c\u8303\u56f4\u57282.33%\u52306.19%\u4e4b\u95f4\uff09\u3002\u672c\u7814\u7a76\u7a81\u663e\u4e86\u901a\u8fc7\u5408\u6210\u6570\u636e\u5fae\u8c03\u6765\u63d0\u5347LLMs\u5728\u957f\u6587\u672c\u4efb\u52a1\u6027\u80fd\u7684\u6f5c\u529b\u3002|\n", "2406.19283": "|**2024-06-27**|**PhysioLLM: Supporting Personalized Health Insights with Wearables and Large Language Models**|Cathy Mengying Fang 
et.al.|[2406.19283](http://arxiv.org/abs/2406.19283)|null|\u6211\u4eec\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aPhysioLLM\u7684\u4e92\u52a8\u7cfb\u7edf\uff0c\u5b83\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7ed3\u5408\u53ef\u7a7f\u6234\u8bbe\u5907\u7684\u751f\u7406\u6570\u636e\u548c\u4e0a\u4e0b\u6587\u4fe1\u606f\uff0c\u63d0\u4f9b\u4e2a\u6027\u5316\u7684\u5065\u5eb7\u7406\u89e3\u548c\u63a2\u7d22\u3002\u4e0e\u5546\u4e1a\u5065\u5eb7\u5e94\u7528\u4e0d\u540c\uff0cPhysioLLM\u5177\u5907\u5168\u9762\u7684\u7edf\u8ba1\u5206\u6790\u529f\u80fd\uff0c\u80fd\u53d1\u73b0\u7528\u6237\u6570\u636e\u4e2d\u7684\u5173\u8054\u548c\u8d8b\u52bf\u3002\u7528\u6237\u53ef\u4ee5\u7528\u81ea\u7136\u8bed\u8a00\u63d0\u95ee\uff0c\u83b7\u53d6\u751f\u6210\u7684\u4e2a\u6027\u5316\u6d1e\u5bdf\uff0c\u5e76\u6839\u636e\u8fd9\u4e9b\u4fe1\u606f\u5236\u5b9a\u884c\u52a8\u76ee\u6807\u3002\u4ee5\u6539\u5584\u7761\u7720\u8d28\u91cf\u4e3a\u4f8b\uff0c\u56e0\u4e3a\u5176\u53ef\u901a\u8fc7\u751f\u7406\u6570\u636e\u91cf\u5316\u4e14\u5bf9\u6574\u4f53\u5065\u5eb7\u81f3\u5173\u91cd\u8981\u3002\u901a\u8fc7\u4e00\u9879\u6d89\u53ca24\u540dFitbit\u667a\u80fd\u624b\u8868\u7528\u6237\u7684\u7528\u6237\u7814\u7a76\uff0c\u6211\u4eec\u8bc1\u660e\u4e86PhysioLLM\u5728\u4fc3\u8fdb\u5bf9\u5065\u5eb7\u6570\u636e\u7684\u6df1\u5165\u4e2a\u6027\u5316\u7406\u89e3\uff0c\u4ee5\u53ca\u652f\u6301\u5b9e\u73b0\u4e2a\u4eba\u5065\u5eb7\u76ee\u6807\u65b9\u9762\uff0c\u4f18\u4e8eFitbit\u5e94\u7528\u548c\u901a\u7528LLM\u804a\u5929\u673a\u5668\u4eba\u3002|\n", "2406.19280": "|**2024-06-27**|**HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale**|Junying Chen 
et.al.|[2406.19280](http://arxiv.org/abs/2406.19280)|**[link](https://github.com/freedomintelligence/huatuogpt-vision)**|**\u968f\u7740\u5927\u578b\u591a\u6a21\u6001\u8bed\u8a00\u6a21\u578b\uff08\u5982GPT-4V\uff09\u7684\u8fc5\u901f\u53d1\u5c55\uff0c\u5b83\u4eec\u5728\u533b\u5b66\u591a\u6a21\u6001\u80fd\u529b\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\u3002\u7136\u800c\uff0c\u7531\u4e8e\u533b\u5b66\u5f71\u50cf-\u6587\u672c\u6570\u636e\u7684\u6570\u91cf\u548c\u8d28\u91cf\u53d7\u9650\u4e8e\u6570\u636e\u9690\u79c1\u95ee\u9898\u548c\u9ad8\u6602\u7684\u6807\u6ce8\u6210\u672c\uff0c\u8fd9\u4e9b\u6a21\u578b\u4ecd\u9762\u4e34\u6311\u6218\u3002\u65e9\u671f\u7684\u7814\u7a76\u5c1d\u8bd5\u5229\u7528PubMed\u7684\u5927\u578b\u53bb\u6807\u8bc6\u5316\u533b\u7597\u56fe\u50cf-\u6587\u672c\u5bf9\u6765\u7f13\u89e3\u8fd9\u4e9b\u95ee\u9898\uff0c\u4f46\u5b83\u4eec\u4ecd\u53d7\u5230\u6570\u636e\u566a\u97f3\u7684\u5f71\u54cd\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u4f18\u5316\u4e86PubMed\u4e2d\u7684\u533b\u7597\u56fe\u50cf-\u6587\u672c\u5bf9\uff0c\u5e76\u5229\u7528GPT-4V\u5728\u201c\u975e\u76f2\u201d\u6a21\u5f0f\u4e0b\u8fdb\u884c\u6570\u636e\u6e05\u6d17\u548c\u683c\u5f0f\u8f6c\u6362\uff0c\u521b\u5efa\u4e86PubMedVision\u6570\u636e\u96c6\uff0c\u5305\u542b130\u4e07\u4efd\u533b\u5b66\u89c6\u89c9\u95ee\u7b54\u6837\u672c\u3002\u6211\u4eec\u7684\u9a8c\u8bc1\u8868\u660e\uff1a\uff081\uff09PubMedVision\u663e\u8457\u63d0\u5347\u4e86\u5f53\u524d\u591a\u6a21\u6001\u8bed\u8a00\u6a21\u578b\u5728\u533b\u5b66\u9886\u57df\u7684\u6027\u80fd\uff0c\u5728\u8bf8\u5982MMMU Health & Medicine 
track\u7b49\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u8868\u73b0\u51fa\u663e\u8457\u6539\u5584\uff1b\uff082\uff09\u533b\u5b66\u4e13\u5bb6\u7684\u624b\u52a8\u68c0\u67e5\u548c\u5b9e\u8bc1\u7ed3\u679c\u8bc1\u5b9e\u4e86\u6211\u4eec\u7684\u6570\u636e\u96c6\u5728\u6570\u636e\u8d28\u91cf\u4e0a\u4f18\u4e8e\u5176\u4ed6\u6784\u5efa\u65b9\u6cd5\u3002\u5229\u7528PubMedVision\uff0c\u6211\u4eec\u8bad\u7ec3\u4e86\u4e00\u4e2a\u540d\u4e3aHuatuoGPT-Vision\u7684340\u4ebf\u53c2\u6570\u7684\u533b\u5b66\u591a\u6a21\u6001\u8bed\u8a00\u6a21\u578b\uff0c\u5b83\u5728\u516c\u5f00\u6e90\u591a\u6a21\u6001\u8bed\u8a00\u6a21\u578b\u4e2d\u8868\u73b0\u51fa\u8272\uff0c\u5728\u533b\u5b66\u591a\u6a21\u6001\u573a\u666f\u4e2d\u663e\u793a\u51fa\u4f18\u8d8a\u6027\u80fd\u3002**|\n", "2406.19271": "|**2024-06-27**|**AutoPureData: Automated Filtering of Web Data for LLM Fine-tuning**|Praneeth Vadlapati et.al.|[2406.19271](http://arxiv.org/abs/2406.19271)|**[link](https://github.com/Pro-GenAI/AutoPureData)**|**\u4eba\u4eec\u5bf9\u6700\u65b0\u7684\u548c\u53ef\u9760\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u9700\u6c42\u6301\u7eed\u589e\u957f\u3002\u901a\u5e38\uff0cLLMs\u662f\u57fa\u4e8e\u56fa\u5b9a\u7684\u6570\u636e\u96c6\u8bad\u7ec3\u7136\u540e\u90e8\u7f72\u7684\u3002\u7136\u800c\uff0c\u8bad\u7ec3\u6570\u636e\u4f1a\u968f\u7740\u65f6\u95f4\u9010\u6e10\u8fc7\u65f6\u3002\u7814\u7a76\u5173\u6ce8\u5982\u4f55\u5229\u7528\u7f51\u7edc\u6570\u636e\u81ea\u52a8\u66f4\u65b0AI\u6a21\u578b\uff0c\u4f46\u8fd9\u4e00\u8fc7\u7a0b\u6d89\u53ca\u6570\u636e\u8d28\u91cf\u4e0e\u5b89\u5168\u7684\u987e\u8651\uff0c\u5982\u504f\u89c1\u3001\u5783\u573e\u4fe1\u606f\u7b49\u3002\u786e\u4fdd\u6570\u636e\u7eaf\u51c0\u5bf9\u4e8e\u751f\u6210\u53ef\u9760\u7684\u6a21\u578b\u81f3\u5173\u91cd\u8981\u3002\u5728\u4e0d\u7eaf\u6570\u636e\u4e0a\u8bad\u7ec3\u53ef\u80fd\u5bfc\u81f4\u4e0d\u826f\u7ed3\u679c\u3002\u8be5\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u79cd\u7cfb\u7edf\uff0c\u5b83\u6536\u96c6\u7f51\u7edc\u6570\u636e\uff0c\u5e76\u501f\u52a9\u73b0
\u6709\u53ef\u4fe1\u7684AI\u6a21\u578b\u81ea\u52a8\u7b5b\u9009\u51fa\u4e0d\u9700\u8981\u7684\u5185\u5bb9\u3002\u5b9e\u9a8c\u4e2d\uff0c\u6211\u4eec\u6536\u96c6\u5e76\u5904\u7406\u4e86\u4e00\u5c0f\u90e8\u5206\u7f51\u7edc\u6570\u636e\uff0c\u9a8c\u8bc1\u4e86\u8be5\u7cfb\u7edf\u7684\u6570\u636e\u51c0\u5316\u6548\u679c\u3002**|\n", "2406.20098": "|**2024-06-28**|**Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs**|Sukmin Yun et.al.|[2406.20098](http://arxiv.org/abs/2406.20098)|**[link](https://github.com/mbzuai-llm/web2code)**|**\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u5728\u56fe\u50cf\u3001\u89c6\u9891\u548c\u97f3\u9891\u7b49\u591a\u79cd\u6a21\u6001\u7684\u5904\u7406\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\u3002\u7136\u800c\uff0c\u5b83\u4eec\u5728\u7406\u89e3\u548c\u751f\u6210\u7f51\u9875\u622a\u56fe\u4ee5\u53ca\u76f8\u5e94\u7684HTML\u4ee3\u7801\u65b9\u9762\u7684\u80fd\u529b\u76f8\u5bf9\u8f83\u5f31\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51faWeb2Code\uff0c\u8fd9\u662f\u4e00\u4e2a\u5305\u62ec\u5927\u89c4\u6a21\u7f51\u9875\u5230\u4ee3\u7801\u7684\u65b0\u57fa\u51c6\uff0c\u7528\u4e8e\u6307\u4ee4\u8c03\u4f18\uff0c\u5e76\u8bc4\u4f30MLLM\u5728\u7f51\u9875\u7406\u89e3\u53caHTML\u4ee3\u7801\u8f6c\u6362\u80fd\u529b\u4e0a\u7684\u8868\u73b0\u3002\u6211\u4eec\u6784\u5efa\u6570\u636e\u96c6\u65f6\uff0c\u5229\u7528\u9884\u8bad\u7ec3\u7684LLMs\u589e\u5f3a\u73b0\u6709\u7684\u7f51\u9875\u5230\u4ee3\u7801\u6570\u636e\u96c6\uff0c\u5e76\u751f\u6210\u591a\u6837\u5316\u7684\u7f51\u9875\u56fe\u7247\uff0c\u4ee5\u4f9b\u6e32\u67d3\u3002\u8f93\u5165\u662f\u7f51\u9875\u56fe\u7247\u548c\u8bf4\u660e\uff0c\u8f93\u51fa\u662f\u7f51\u9875\u7684HTML\u4ee3\u7801\uff0c\u540c\u65f6\u52a0\u5165\u5173\u4e8e\u7f51\u9875\u5185\u5bb9\u7684\u4e30\u5bcc\u81ea\u7136\u8bed\u8a00\u95ee\u7b54\u5bf9\uff0c\u4ee5\u4fc3\u8fdb\u5bf9\u7f51\u9875\u5185\u5bb9\u7684\u5168\u9762\u7406\u89e3\u3002\u4e3a\u4e86\u8bc4\u4f30\u
6a21\u578b\u5728\u8fd9\u7c7b\u4efb\u52a1\u4e2d\u7684\u6027\u80fd\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2a\u6d4b\u8bd5\u6846\u67b6\uff0c\u7528\u4e8e\u6d4b\u8bd5MLLM\u5728\u7f51\u9875\u7406\u89e3\u4e0e\u7f51\u9875\u5230\u4ee3\u7801\u751f\u6210\u65b9\u9762\u7684\u6280\u80fd\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u6570\u636e\u96c6\u4e0d\u4ec5\u6709\u76ca\u4e8e\u6211\u4eec\u63d0\u51fa\u7684\u4efb\u52a1\uff0c\u8fd8\u5728\u89c6\u89c9\u9886\u57df\u7684\u4e00\u822c\u6027\u80fd\u4e0a\u6709\u6240\u63d0\u5347\uff0c\u800c\u5148\u524d\u7684\u6570\u636e\u96c6\u4f1a\u5bfc\u81f4\u6027\u80fd\u4e0b\u964d\u3002\u6211\u4eec\u671f\u671b\u8fd9\u9879\u5de5\u4f5c\u80fd\u63a8\u52a8\u901a\u7528MLLM\u7684\u53d1\u5c55\uff0c\u4f7f\u5176\u9002\u7528\u4e8e\u7f51\u7edc\u5185\u5bb9\u751f\u6210\u548c\u81ea\u52a8\u5316\u4efb\u52a1\u3002\u6211\u4eec\u7684\u6570\u636e\u548c\u4ee3\u7801\u5c06\u5728\u4e0a\u516c\u5f00\u3002**|\n", "2406.20095": "|**2024-06-28**|**LLaRA: Supercharging Robot Learning Data for Vision-Language Policy**|Xiang Li 
et.al.|[2406.20095](http://arxiv.org/abs/2406.20095)|**[link](https://github.com/lostxine/llara)**|**\u8be5\u8bba\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aLLaRA\uff08\u5927\u578b\u8bed\u8a00\u548c\u673a\u5668\u4eba\u52a9\u624b\uff09\u7684\u6846\u67b6\uff0c\u5b83\u5c06\u673a\u5668\u4eba\u884c\u52a8\u7b56\u7565\u8f6c\u5316\u4e3a\u5bf9\u8bdd\u5f62\u5f0f\uff0c\u901a\u8fc7\u7ed3\u5408\u989d\u5916\u7684\u6570\u636e\u8f85\u52a9\u5b66\u4e60\uff0c\u63d0\u5347\u54cd\u5e94\u8d28\u91cf\u3002\u5229\u7528\u5177\u5907\u89c6\u89c9\u8f93\u5165\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08VLMs\uff09\uff0c\u5373\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff0c\u8fd9\u4e9b\u6a21\u578b\u80fd\u591f\u5904\u7406\u72b6\u6001\u4fe1\u606f\uff0c\u4f5c\u4e3a\u89c6\u89c9-\u6587\u672c\u63d0\u793a\uff0c\u5e76\u751f\u6210\u6700\u4f18\u7684\u673a\u5668\u4eba\u51b3\u7b56\u7b56\u7565\u3002\u9996\u5148\uff0c\u8bba\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u81ea\u52a8\u5316\u65b9\u6cd5\uff0c\u4ece\u73b0\u6709\u7684\u884c\u4e3a\u514b\u9686\u6570\u636e\u4e2d\u751f\u6210\u591a\u6837\u4e14\u9ad8\u8d28\u91cf\u7684\u673a\u5668\u4eba\u6307\u4ee4\u6570\u636e\u96c6\u3002\u7136\u540e\uff0c\u4f7f\u7528\u8fd9\u79cd\u5b9a\u5236\u7684\u5bf9\u8bdd\u5f0f\u683c\u5f0f\u5bf9VLM\u8fdb\u884c\u8bad\u7ec3\uff0c\u4f7f\u5176\u80fd\u591f\u751f\u6210\u6709\u610f\u4e49\u7684\u673a\u5668\u4eba\u884c\u52a8\u7b56\u7565\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cLLaRA\u6846\u67b6\u5728\u591a\u4e2a\u6a21\u62df\u548c\u771f\u5b9e\u4e16\u754c\u73af\u5883\u4e2d\u5c55\u73b0\u51fa\u6700\u5148\u8fdb\u7684\u6027\u80fd\u3002\u76f8\u5173\u4ee3\u7801\u3001\u6570\u636e\u96c6\u548c\u9884\u8bad\u7ec3\u6a21\u578b\u5df2\u5728\u63d0\u4f9b\u3002**|\n", "2406.20094": "|**2024-06-28**|**Scaling Synthetic Data Creation with 1,000,000,000 Personas**|Xin Chan 
et.al.|[2406.20094](http://arxiv.org/abs/2406.20094)|**[link](https://github.com/tencent-ailab/persona-hub)**|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u57fa\u4e8e\u4eba\u683c\u7684\u6570\u636e\u5408\u6210\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5185\u7684\u591a\u79cd\u89c6\u89d2\u6765\u751f\u6210\u591a\u6837\u5316\u7684\u4eba\u5de5\u5408\u6210\u6570\u636e\u3002\u4e3a\u4e86\u5728\u5927\u89c4\u6a21\u4e0a\u5145\u5206\u5229\u7528\u8fd9\u79cd\u65b9\u6cd5\uff0c\u6211\u4eec\u5f15\u5165\u4e86Persona Hub\uff0c\u8fd9\u662f\u4e00\u4e2a\u4ece\u7f51\u7edc\u6570\u636e\u81ea\u52a8\u6574\u7406\u51fa\u7684\u5341\u4ebf\u4e2a\u591a\u5143\u5316\u4eba\u683c\u7684\u96c6\u5408\uff0c\u76f8\u5f53\u4e8e\u5168\u7403\u4eba\u53e3\u7684\u7ea613%\u3002\u8fd9\u4e9b\u4eba\u683c\u4f5c\u4e3a\u5206\u5e03\u5f0f\u4e16\u754c\u77e5\u8bc6\u8f7d\u4f53\uff0c\u51e0\u4e4e\u53ef\u4ee5\u8c03\u7528LLM\u5185\u5305\u542b\u7684\u5404\u7c7b\u89c2\u70b9\uff0c\u4ece\u800c\u63a8\u52a8\u5927\u89c4\u6a21\u3001\u591a\u6837\u5316\u7684\u5408\u6210\u6570\u636e\u521b\u5efa\uff0c\u9002\u7528\u4e8e\u5404\u79cd\u573a\u666f\u3002\u901a\u8fc7\u5c55\u793aPersona Hub\u5982\u4f55\u5728\u5927\u89c4\u6a21\u751f\u6210\u9ad8\u8d28\u91cf\u7684\u6570\u5b66\u548c\u903b\u8f91\u63a8\u7406\u95ee\u9898\u3001\u6307\u4ee4\uff08\u7528\u6237\u63d0\u793a\uff09\u3001\u5bcc\u542b\u77e5\u8bc6\u7684\u6587\u672c\u3001\u6e38\u620fNPC\u548c\u5de5\u5177\uff08\u51fd\u6570\uff09\u7b49\u65b9\u9762\u7684\u5e94\u7528\uff0c\u6211\u4eec\u8bc1\u660e\u4e86\u57fa\u4e8e\u4eba\u683c\u7684\u6570\u636e\u5408\u6210\u5177\u6709\u591a\u6837\u6027\u3001\u53ef\u6269\u5c55\u6027\u3001\u7075\u6d3b\u6027\u548c\u6613\u7528\u6027\uff0c\u53ef\u80fd\u5f15\u9886\u5408\u6210\u6570\u636e\u521b\u9020\u548c\u5b9e\u9645\u5e94\u7528\u7684\u65b0\u8303\u5f0f\uff0c\u5bf9LLM\u7684\u7814\u7a76\u548c\u53d1\u5c55\u4ea7\u751f\u6df1\u8fdc\u5f71\u54cd\u3002|\n", "2406.20092": "|**2024-06-28**|**LLaVolta: Efficient 
Multi-modal Models via Stage-wise Visual Context Compression**|Jieneng Chen et.al.|[2406.20092](http://arxiv.org/abs/2406.20092)|**[link](https://github.com/beckschen/llavolta)**|**Despite significant progress in compressing text embeddings in large language models (LLMs), the compression of visual tokens in large multi-modal models (LMMs) has been largely overlooked. This work studies the redundancy of visual tokens and efficient training in these models. Initial experiments show that eliminating up to 70% of visual tokens at test time by simple average pooling lowers visual question answering accuracy on the GQA benchmark by only 3%, indicating substantial redundancy in the visual context. To address this, we introduce the Visual Context Compressor, which reduces the number of visual tokens during training to improve efficiency without sacrificing performance. To minimize the information loss caused by compressing visual tokens while keeping training efficient, we develop LLaVolta, a lightweight training scheme. LLaVolta compresses the visual context in stages, progressing from heavy to light compression and applying no compression at the end of training, so no information is lost at test time. Extensive experiments show that our approach improves the performance of multi-modal models on both image-language and video-language understanding while significantly reducing training costs. The code is open-sourced at https://github.com/Beckschen/LLaVolta.**|\n", "2406.20087": "|**2024-06-28**|**ProgressGym: Alignment with a Millennium of Moral Progress**|Tianyi Qiu 
et.al.|[2406.20087](http://arxiv.org/abs/2406.20087)|null|Frontier AI systems, including large language models (LLMs), hold increasing influence over human epistemology. They may reinforce prevailing societal values and thereby contribute to the lock-in of misguided moral beliefs, perpetuating widespread societal problems. We propose progress alignment as a technical solution to mitigate this risk. Progress alignment algorithms learn the mechanics of human moral progress, thereby addressing the susceptibility of existing alignment methods to contemporary moral blindspots. To support research in progress alignment, we introduce ProgressGym, an experimental framework that learns the mechanics of moral progress from history in order to facilitate future progress in real-world moral decisions. Drawing on 9 centuries of historical text and 18 historical LLMs, ProgressGym turns real-world progress alignment challenges into concrete benchmarks. We define three core challenges: tracking evolving values (PG-Follow), anticipating moral progress (PG-Predict), and regulating the feedback loop between human and AI value shifts (PG-Coevolve). These tasks call for methods with a temporal dimension, which conventional alignment approaches cannot handle. We therefore present lifelong and extrapolative algorithms as baseline methods for progress alignment, and we build an open leaderboard that solicits novel algorithms and challenges. The framework and the leaderboard are available at https://github.com/PKU-Alignment/ProgressGym and https://huggingface.co/spaces/PKU-Alignment/ProgressGym-LeaderBoard respectively.|\n", "2406.20085": "|**2024-06-28**|**Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language**|Yicheng Chen 
et.al.|[2406.20085](http://arxiv.org/abs/2406.20085)|null|Diffusion-based generative models have shown great potential in generating high-quality images with various layouts, which can benefit downstream perception tasks. However, fully automatic layout generation driven only by language, together with a suitable metric for evaluating multiple generated instances, has not been well explored. This work presents Auto Cherry-Picker (ACP), a novel framework that automatically generates high-quality multi-modal training examples to augment perception and multi-modal training. Given a list of natural-language concepts, we prompt large language models (LLMs) to generate a detailed description and design reasonable layouts. A text-to-image model then generates multiple images. The generated data are then refined with a carefully designed metric to ensure quality. In particular, we propose a new metric, the Composite Layout and Image Score (CLIS), to evaluate generated images fairly. By customizing the initial concept list, our synthetic high-quality examples improve performance in various scenarios, especially on long-tailed distributions and imbalanced datasets. Experiments on downstream tasks show that ACP significantly improves the performance of existing models. We further investigate the relationship between CLIS and downstream performance gains and find that a higher CLIS score corresponds to better performance. This suggests that evaluation metrics may play a critical role in visual perception and multi-modal LLM tasks. Code will be made available.|\n", "2406.20079": "|**2024-06-28**|**Molecular Facts: Desiderata for Decontextualization in LLM Fact Verification**|Anisha Gunjal 
et.al.|[2406.20079](http://arxiv.org/abs/2406.20079)|**[link](https://github.com/anisha2102/molecular_facts)**|**Automatic factuality verification of large language model (LLM) generations is increasingly used to combat erroneous statements. A key question in this line of work is the granularity of verification: larger chunks of text are hard to fact-check, while more atomic facts such as propositions may lack the context needed to interpret them correctly. This work examines the role of context in these atomic facts. We argue that fully atomic facts are not the right representation and define two criteria for molecular facts: decontextuality, or how well they can stand alone, and minimality, or how little extra information is added to achieve decontextuality. We quantify the impact of decontextualization on minimality and present a baseline method for generating molecular facts automatically, aiming to add the right amount of information while preserving accuracy. Comparing this method against various decontextualization strategies, we find that molecular facts balance minimality with fact verification accuracy in ambiguous settings.**|\n", "2406.20041": "|**2024-07-01**|**BMW Agents -- A Framework For Task Automation Through Multi-Agent Collaboration**|Noel Crawford 
et.al.|[2406.20041](http://arxiv.org/abs/2406.20041)|null|Autonomous agents driven by large language models (LLMs) offer enormous potential for automation. Early demonstrations show that such agents can solve complex tasks, interact with external systems to augment their knowledge, and trigger actions. In particular, workflows in which multiple agents collaborate on a complex task demonstrate that they can operate in less strict and less well-defined environments. The multi-agent approach therefore has great potential to serve as a backbone for many industrial applications, from complex knowledge retrieval systems to next-generation robotic process automation. Given the reasoning abilities of current LLMs, complex processes require a multi-step approach that includes a plan of well-defined, modular tasks. Depending on their complexity, these tasks can be executed by a single agent or by a group of agents. This work focuses on building a flexible agent engineering framework, with particular attention to planning and execution, that can handle complex use cases across domains. The framework provides reliability for industrial applications and presents techniques for ensuring scalable, flexible, and collaborative workflows in which multiple autonomous agents work together to solve tasks.|\n", "2406.20030": "|**2024-06-28**|**LEMoE: Advanced Mixture of Experts Adaptor for Lifelong Model Editing of Large Language Models**|Renzhi Wang 
et.al.|[2406.20030](http://arxiv.org/abs/2406.20030)|null|Large language models (LLMs) require continual knowledge updates to keep pace with ever-changing world facts, which motivates the task of lifelong model editing. Although many techniques for single and batch editing have been developed in recent years, they either cannot be applied to lifelong editing or perform poorly in that setting. In this paper, we introduce LEMoE, a Mixture of Experts (MoE) adaptor designed for lifelong model editing. We first analyze the factors that limit the effectiveness of conventional MoE adaptors in lifelong editing, including catastrophic forgetting, inconsistent routing, and order sensitivity. Based on these insights, we propose a tailored module insertion method, introduce a novel key-value anchor routing that improves routing consistency between the training and inference stages, and adopt a concise yet effective clustering-based editing order planning. Experiments show that our method performs strongly on lifelong editing, surpassing previous model editing techniques while retaining excellent performance on batch editing. Our code will be open-sourced.|\n", "2406.20015": "|**2024-06-28**|**ToolBeHonest: A Multi-level Hallucination Diagnostic Benchmark for Tool-Augmented Large Language Models**|Yuxiang Zhang 
et.al.|[2406.20015](http://arxiv.org/abs/2406.20015)|**[link](https://github.com/toolbehonest/toolbehonest)**|**As tool-augmented large language models (LLMs) are rapidly integrated into real-world applications, the community urgently needs a comprehensive understanding of hallucination in these models. We therefore introduce ToolBH, a comprehensive diagnostic benchmark. We assess hallucination along two dimensions, depth and breadth. For depth, we design a multi-level diagnostic process comprising (1) solvability detection, (2) solution planning, and (3) missing-tool analysis. For breadth, we consider three scenarios based on the characteristics of the toolset: missing necessary tools, potential tools, and limited-functionality tools. We build seven tasks and collect 700 evaluation samples through multiple rounds of manual annotation. The results show that the advanced models Gemini-1.5-Pro and GPT-4o achieve total scores of only 45.3 and 37.0, respectively, out of 100. In tool-augmented LLM scenarios, a larger parameter count does not guarantee better performance; training data and response strategy matter as well. Our diagnostic analysis indicates that the primary source of model errors lies in judging task solvability. Open-source models suffer performance drops when they give verbose replies, whereas proprietary models do better at long-chain reasoning.**|\n", "2407.02490": "|**2024-07-02**|**MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention**|Huiqiang Jiang 
et.al.|[2407.02490](http://arxiv.org/abs/2407.02490)|**[link](https://github.com/microsoft/MInference)**|**The computational challenges of large language models (LLMs) remain a barrier to their widespread deployment, especially as prompt lengths grow. Because attention computation has quadratic complexity, an 8B-parameter LLM takes 30 minutes to process a prompt of 1M tokens (the pre-filling stage) on a single A100 GPU. Existing methods for speeding up pre-filling often fail to remain both efficient and accurate when applied to long-context LLMs. To address this, we introduce MInference (Million-tokens Inference), a sparse calculation method designed to accelerate the pre-filling stage of long-sequence processing. We identify three distinctive patterns in attention matrices, the A-shape, Vertical-Slash, and Block-Sparse patterns, that can be exploited for efficient sparse computation on GPUs. We determine the optimal pattern for each attention head offline and dynamically build sparse indices during inference. With optimized GPU kernels, we perform sparse attention computation based on the assigned pattern, which significantly reduces pre-filling latency for long-context LLMs. The technique can be applied directly to existing LLMs without modifying the pre-training setup or additional fine-tuning. Experiments on a wide range of downstream tasks, including InfiniteBench, RULER, PG-19, and Needle In A Haystack, and on models including LLaMA-3-1M, GLM4-1M, Yi-200K, Phi-3-128K, and Qwen2-128K, show that MInference reduces pre-filling inference latency by up to 10x on an A100 while maintaining accuracy. Our code is open-sourced at https://aka.ms/MInference.**|\n", "2407.02486": "|**2024-07-02**|**Neurocache: Efficient Vector Retrieval for Long-range Language Modeling**|Ali Safaya 
et.al.|[2407.02486](http://arxiv.org/abs/2407.02486)|**[link](https://github.com/alisafaya/neurocache)**|**This paper introduces Neurocache, an approach that extends the effective context size of large language models (LLMs) by storing their past states in an external vector cache. Like recent vector retrieval approaches, Neurocache uses an efficient k-nearest-neighbor (kNN) algorithm to retrieve relevant past states and incorporates them into the attention process. Neurocache improves on previous methods in three ways: (1) it stores compressed states, which reduces cache size; (2) it performs a single retrieval operation per token, which increases inference speed; and (3) it extends the retrieval window to neighboring states, which improves accuracy on both language modeling and downstream tasks. 
Experiments show that Neurocache is effective both for models trained from scratch and for pre-trained models such as Llama2-7B and Mistral-7B when they are augmented with the cache. We also compare Neurocache with text retrieval methods and show its advantages on single-document question answering and few-shot learning tasks. The source code is available at https://github.com/alisafaya/neurocache.**|\n", "2407.02485": "|**2024-07-02**|**RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs**|Yue Yu 
et.al.|[2407.02485](http://arxiv.org/abs/2407.02485)|null|This work proposes RankRAG, a novel instruction fine-tuning framework that tunes a single large language model for the dual purpose of context ranking and answer generation in retrieval-augmented generation (RAG). By adding a small fraction of ranking data to the training blend, the instruction-tuned single model works surprisingly well and outperforms existing expert ranking models, including those fine-tuned separately on large amounts of ranking data. In the experiments, we compare against several strong baselines, including GPT-4-0613, GPT-4-turbo-2024-0409, and ChatQA-1.5, an open-source model with state-of-the-art RAG performance. Specifically, our Llama3-RankRAG significantly outperforms Llama3-ChatQA-1.5 and the GPT-4 models on nine knowledge-intensive benchmarks. It also performs comparably to GPT-4 on five RAG benchmarks in the biomedical domain without instruction fine-tuning on biomedical data, which demonstrates strong generalization to new domains.|\n", "2407.02483": "|**2024-07-02**|**MMedAgent: Learning to Use Medical Tools with Multi-modal Agent**|Binxu Li 
et.al.|[2407.02483](http://arxiv.org/abs/2407.02483)|**[link](https://github.com/Wangyixinxin/MMedAgent)**|Although multi-modal large language models (MLLMs) have been successful, their generality remains limited and in some cases they fall short of specialized models. Recently, LLM-based agents have been developed to address these problems by selecting suitable specialized models based on user input. However, such advances have not yet been widely applied in the medical domain. To bridge this gap, this paper introduces the first agent designed specifically for medicine, named \\textbf{M}ulti-modal \\textbf{Med}ical \\textbf{Agent} (MMedAgent). We build an instruction-tuning dataset that covers six medical tools solving seven tasks, enabling the agent to choose the most suitable tool for a given task. Comprehensive experiments show that MMedAgent outperforms open-source methods, and even the closed-source model GPT-4o, across a variety of medical tasks, and that it is efficient at adopting and integrating new medical tools.|\n", "2407.02477": "|**2024-07-02**|**Understanding Alignment in Multimodal LLMs: A Comprehensive Study**|Elmira Amirloo 
et.al.|[2407.02477](http://arxiv.org/abs/2407.02477)|null|Preference alignment has become an important factor in improving the performance of large language models (LLMs), yet it remains comparatively underexplored in multimodal large language models (MLLMs). In image understanding tasks, these models also face problems such as incorrect statements and content that is inconsistent with the image (hallucination). The goal of preference alignment in MLLMs is to bring model responses closer to the image content. Recent work has introduced preference datasets for MLLMs and examined different alignment methods, including Direct Preference Optimization (DPO) and Proximal Policy Optimization (PPO). However, because datasets, base model types, and alignment strategies differ across these works, it remains unclear which elements contribute most to the reported improvements. This paper independently analyzes each aspect of preference alignment in MLLMs. We categorize alignment algorithms into two groups, offline (such as DPO) and online (such as online-DPO), and show that combining the two can improve model performance in certain scenarios. We also review a variety of published multimodal preference datasets and discuss how the details of their construction affect model performance. Based on these findings, we introduce Bias-Driven Hallucination Sampling (BDHS), a new way of creating multimodal preference data that requires neither additional annotation nor external models, and show that it achieves performance competitive with previously published alignment work across a range of benchmarks.|\n", "2407.02473": "|**2024-07-02**|**Open Scene Graphs for Open World Object-Goal Navigation**|Joel Loo 
et.al.|[2407.02473](http://arxiv.org/abs/2407.02473)|null|How can we build robots for open-world semantic navigation tasks, such as searching for target objects in novel scenes? Although foundation models have the rich knowledge and generalization needed for such tasks, a suitable scene representation is required to integrate them into a complete robot system. We address this with Open Scene Graphs (OSGs), a topo-semantic representation that retains and organizes open-set scene information and whose structure can be adapted to different environment types. We integrate foundation models and OSGs into OpenSearch, a system for open-world object-goal navigation that understands natural-language instructions and generalizes zero-shot across diverse environments to find previously unseen objects. OSGs enhance reasoning with large language models (LLMs), enabling OpenSearch to perform strongly on object-goal navigation and outperform existing LLM approaches. Through simulation and real-world experiments, we validate OpenSearch's generalization across varied environments, robots, and novel instructions.|\n", "2407.02464": "|**2024-07-02**|**Reliable 
Confidence Intervals for Information Retrieval Evaluation Using Generative A.I**|Harrie Oosterhuis et.al.|[2407.02464](http://arxiv.org/abs/2407.02464)|null|Traditional evaluation of information retrieval (IR) systems is generally costly because it requires manual relevance annotation from human experts. Recent advances in generative artificial intelligence, specifically large language models (LLMs), can produce relevance annotations at scale with relatively small computational cost. This could alleviate the costs traditionally associated with IR evaluation and make it applicable to numerous low-resource applications. However, generated relevance annotations are not free of errors, and using them directly for evaluation can produce unreliable results. To address this, this work proposes two methods, based on prediction-powered inference and conformal risk control, that use computer-generated relevance annotations to place reliable confidence intervals (CIs) around IR evaluation metrics. 
\u6211\u4eec\u7684\u65b9\u6cd5\u9700\u8981\u5c11\u91cf\u53ef\u9760\u7684\u6ce8\u91ca\uff0c\u901a\u8fc7\u7edf\u8ba1\u5206\u6790\u751f\u6210\u6ce8\u91ca\u4e2d\u7684\u9519\u8bef\uff0c\u4ece\u800c\u4e3a\u8bc4\u4f30\u6307\u6807\u8bbe\u7f6eCIs\uff0c\u5177\u6709\u575a\u5b9e\u7684\u7406\u8bba\u57fa\u7840\u3002\u4e0e\u73b0\u6709\u65b9\u6cd5\u4e0d\u540c\uff0c\u6211\u4eec\u7279\u522b\u8bbe\u8ba1\u7684\u89c4\u8303\u98ce\u9669\u63a7\u5236\u65b9\u6cd5\u9002\u7528\u4e8e\u6392\u540d\u8bc4\u4f30\uff0c\u5e76\u4e14\u53ef\u4ee5\u6839\u636e\u67e5\u8be2\u548c\u6587\u6863\u81ea\u9002\u5e94\u8c03\u6574CIs\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u7f6e\u4fe1\u533a\u95f4\u51c6\u786e\u6355\u6349\u4e86\u57fa\u4e8eLLM\u6ce8\u91ca\u7684\u8bc4\u4f30\u4e2d\u7684\u53d8\u5f02\u6027\u548c\u504f\u5dee\uff0c\u4f18\u4e8e\u4f20\u7edf\u7684Bootstrap\u4f30\u8ba1\u3002\u6211\u4eec\u671f\u671b\u8fd9\u4e9b\u8d21\u732e\u80fd\u4e3a\u90a3\u4e9b\u4f20\u7edf\u4e0a\u96be\u4ee5\u5b9e\u73b0\u53ef\u9760\u8bc4\u4f30\u7684\u4f17\u591aIR\u5e94\u7528\u5e26\u6765\u9769\u65b0\u3002|\n", "2407.02411": "|**2024-07-03**|**Video Watermarking: Safeguarding Your Video from (Unauthorized) Annotations by Video-based LLMs**|Jinmin Li et.al.|[2407.02411](http://arxiv.org/abs/2407.02411)|null|\u968f\u7740\u89c6\u9891\u9a71\u52a8\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5174\u8d77\uff0c\u89c6\u9891\u7406\u89e3\u80fd\u529b\u5f97\u5230\u4e86\u663e\u8457\u63d0\u5347\uff0c\u4f46\u540c\u65f6\u4e5f\u5f15\u53d1\u4e86\u6570\u636e\u4fdd\u62a4\u65b9\u9762\u7684\u62c5\u5fe7\uff0c\u56e0\u4e3a\u89c6\u9891\u66f4\u5bb9\u6613\u88ab\u65e0\u6388\u6743\u5730\u6807\u6ce8\u3002\u4e3a\u6b64\uff0c\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201cVideo 
Watermarking\u201d, which aims to protect videos from unauthorized processing by video-based LLMs, particularly of their content and descriptions. By embedding imperceptible watermarks into key frames, we use a multimodal flow-based loss to prevent misuse of the video while preserving the viewing experience. Extensive experiments show that Video Watermarking significantly reduces the comprehensibility of videos to a range of video-based LLMs, demonstrating its stealthiness and robustness. Overall, our method offers a way to safeguard the security, integrity, and confidentiality of video content in the face of evolving video-based LLM technology.|\n", "2407.02408": "|**2024-07-02**|**CEB: Compositional Evaluation Benchmark for Fairness in Large Language Models**|Song Wang 
et.al.|[2407.02408](http://arxiv.org/abs/2407.02408)|null|As large language models (LLMs) are increasingly applied to a wide range of natural language processing tasks, concerns about the potential negative societal impact of their generated content have grown as well. To evaluate bias in LLMs, researchers have proposed a number of datasets. However, existing bias evaluation efforts tend to focus on only one type of bias and use inconsistent evaluation metrics, which makes comparisons across datasets and LLMs difficult. To address this, we collect a variety of datasets for evaluating LLM bias and further propose CEB (Compositional Evaluation Benchmark), which covers different types of bias across social groups and tasks. CEB is built on our newly proposed compositional taxonomy, which characterizes each dataset along three dimensions: bias type, social group, and task. By combining these three dimensions, we develop a comprehensive evaluation strategy for LLM bias. Experiments show that the level of bias varies across these dimensions, which provides guidance for developing mitigation methods targeted at specific biases.|\n", 
"2407.02402": "|**2024-07-02**|**Assessing the Code Clone Detection Capability of Large Language Models**|Zixian Zhang et.al.|[2407.02402](http://arxiv.org/abs/2407.02402)|null|This study evaluates the performance of two advanced large language models (LLMs), GPT-3.5 and GPT-4, on code clone detection. The models are tested on two datasets: BigCloneBench (human-written) and GPTCloneBench (LLM-generated). GPT-4 clearly outperforms GPT-3.5 across all clone types. The results show a correlation between the GPT models' accuracy in identifying code clones and code similarity, with both models performing poorly on the most complex Type-4 clones. In addition, the GPT models detect clones better in LLM-generated code than in human-written code, although overall accuracy remains unremarkable. These findings highlight the need to further improve LLMs' ability to recognize code clones, particularly with respect to self-generated code clones, which may become a problem as software engineers increasingly use LLM-based code generation and refactoring tools.|\n", "2407.03310": "|**2024-07-03**|**Universal Length Generalization with Turing Programs**|Kaiying Hou 
et.al.|[2407.03310](http://arxiv.org/abs/2407.03310)|null|Length generalization is the ability to extrapolate from short training sequences to long test sequences, and it remains a challenge for current large language models. Although prior work has proposed architectural or data-format changes to achieve length generalization, these approaches typically apply only to specific tasks. Building on this, we combine scratchpad and Chain-of-Thought (CoT) techniques and propose Turing Programs, a novel CoT strategy that decomposes an algorithmic task into steps resembling the computation of a Turing machine. The framework is both universal and simple, requiring only that text be copied from the context with small modifications. We show that with Turing Programs we obtain robust length generalization on algorithmic tasks such as addition, multiplication, and in-context SGD. We then show that transformers achieve length generalization on random Turing Programs, suggesting that length generalization is possible for any algorithmic task. Finally, we prove theoretically that transformers can implement Turing Programs, constructing a simple RASP (Weiss et al.) program that simulates an arbitrary Turing machine.
|\n", "2407.03286": "|**2024-07-03**|**Large Language Models for JSON Schema Discovery**|Michael J. Mior et.al.|[2407.03286](http://arxiv.org/abs/2407.03286)|null|Semi-structured data formats such as JSON are widely used because of their flexibility in storing data. However, JSON data often lacks a schema comparable to the table structures of relational databases. As a result, many tools have emerged for discovering schemas from datasets. Useful as these tools are, existing approaches focus mainly on the syntax of documents and neglect semantic information. In this work, we explore how to automatically enrich discovered schemas with meaningful semantic information similar to what a human author would include. We use large language models together with a corpus of manually written JSON Schema documents to generate natural language descriptions of elements and meaningful names for reusable definitions, and to identify which discovered properties are most useful and which can be treated as \u201cnoise\u201d. Our approach performs well on text-generation metrics previously shown to correlate strongly with human judgment.|\n", "2407.03282": "|**2024-07-03**|**LLM Internal States Reveal Hallucination Risk Faced With a Query**|Ziwei Ji 
et.al.|[2407.03282](http://arxiv.org/abs/2407.03282)|**[link](https://github.com/ziweiji/Internal_States_Reveal_Hallucination)**|The hallucination problem of large language models (LLMs) severely limits their reliability and trustworthiness. Humans have a self-awareness process that lets them recognize what they do not know when faced with a query. Motivated by this, our paper investigates whether LLMs can estimate their own hallucination risk before generating a response. We analyze the internal mechanisms of LLMs broadly, both in terms of training data sources and across 15 diverse natural language generation (NLG) tasks spanning more than 700 datasets. Our empirical analysis reveals two key findings: (1) LLM internal states indicate whether the model has seen the query in its training data; and (2) LLM internal states indicate whether the model is likely to hallucinate on the query. Our study examines the specific neurons, activation layers, and tokens that play a key role in an LLM's perception of uncertainty and hallucination risk. Using a probing estimator, we leverage this self-assessment ability and achieve an average hallucination estimation accuracy of 84.32% at run time.|\n", "2407.03227": "|**2024-07-03**|**Improving Retrieval-augmented Text-to-SQL with 
AST-based Ranking and Schema Pruning**|Zhili Shen et.al.|[2407.03227](http://arxiv.org/abs/2407.03227)|null|We study text-to-SQL semantic parsing from the perspective of large language models. Motivated by the challenges posed by the size of commercial database schemata and by the deployability of business intelligence solutions, we propose an approach that dynamically retrieves input database information and uses abstract syntax trees to select few-shot examples for in-context learning. We further investigate how a parallel semantic parser can be used to generate approximate versions of the SQL queries to support our retrieval. We take this approach to the extreme by adopting a model with fewer than 500 million parameters as an efficient approximator and giving it the ability to process schemata in parallel. We apply our approach to monolingual and cross-lingual semantic parsing benchmarks, where it improves over state-of-the-art baselines. Comprehensive experiments reveal the contribution of each module in this retrieval-augmented generation setting and point to interesting directions for future work.|\n", "2407.03211": "|**2024-07-03**|**How Does Quantization Affect Multilingual LLMs?**|Kelly Marchisio et.al.|[2407.03211](http://arxiv.org/abs/2407.03211)|null|
Quantization is widely used to improve the inference speed and deployability of large language models (LLMs). While a large body of work examines the effect of quantization on English tasks, none has examined multilingual settings. We conduct a thorough analysis of quantized multilingual LLMs, focusing on their performance across languages and at different scales. Using automatic benchmarks, LLM-as-a-judge methods, and human evaluation, we find that: (1) quantization has a negative effect on human evaluation, and automatic metrics severely underestimate the damage: an average drop of 1.7% on automatic tasks corresponds to a 16.0% drop in human evaluation on Japanese tasks; (2) languages are affected unevenly by quantization, with non-Latin-script languages hit hardest; and (3) challenging tasks such as mathematical reasoning degrade the most. As the ability to serve low-compute models becomes critical to the global adoption of NLP technology, our results argue that multilingual performance should be a key criterion when evaluating efficient models.|\n", "2407.03203": "|**2024-07-03**|**TheoremLlama: Transforming General-Purpose LLMs into Lean4 
Experts**|Ruida Wang et.al.|[2407.03203](http://arxiv.org/abs/2407.03203)|**[link](https://github.com/RickySkywalker/TheoremLlama)**|In proving mathematical theorems with computer-verifiable formal languages such as Lean, approaches that use large language models (LLMs) to work from natural language (NL) proofs have had a significant impact. However, because aligned NL and formal language (FL) proof data is scarce, modern LLMs perform poorly at generating complete proofs. To address this, this paper proposes **TheoremLlama**, an end-to-end framework for training a general-purpose LLM to become a Lean4 expert. The framework comprises methods for generating NL-FL aligned datasets, training strategies for the LLM formal theorem prover, and techniques for LLM Lean4 proof writing. The key innovation is an NL-FL bootstrapping method that embeds NL proofs into Lean4 code, so that the natural language reasoning ability of LLMs can be used for formal reasoning. Using this dataset generation method, we provide **Open Bootstrapped 
Theorems** (OBT), an NL-FL aligned and bootstrapped dataset. The **TheoremLlama** framework achieves cumulative accuracies of 36.48% and 33.61% on the MiniF2F-Valid and Test datasets, respectively, surpassing the GPT-4 baselines of 22.95% and 25.41%. We have released the model checkpoints and the generated dataset, and will soon open-source all the code.|\n", "2407.03181": "|**2024-07-03**|**Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models**|Haritz Puerto et.al.|[2407.03181](http://arxiv.org/abs/2407.03181)|**[link](https://github.com/ukplab/arxiv2024-divergent-cot)**|This work proposes a novel method called Divergent CoT (DCoT), which further improves performance by requiring models to compare multiple reasoning chains within a single inference step. The study finds that instruction tuning on DCoT data improves performance even for smaller, and therefore more accessible, large language models. Through extensive experiments covering different types of reasoning tasks, it finds that fine-tuning on DCoT datasets consistently outperforms the basic CoT approach across model sizes (from 1.3 billion to 7 billion parameters). Experiments and manual evaluation show that these gains stem from the model generating multiple divergent reasoning paths in a single inference step, which indicates that language models are capable of 
self-correction. The code and data are publicly available at https://github.com/UKPLab/arxiv2024-divergent-cot.|\n", "2407.03169": "|**2024-07-03**|**Investigating Decoder-only Large Language Models for Speech-to-text Translation**|Chao-Wei Huang et.al.|[2407.03169](http://arxiv.org/abs/2407.03169)|null|Large language models (LLMs), known for their exceptional reasoning, generalization, and fluency across domains, show great promise for improving speech-related tasks. This paper focuses on integrating decoder-only LLMs into speech-to-text translation (S2TT). We propose an architecture in which the LLM directly consumes encoded speech representations and generates the text translation. We also study the effects of different parameter-efficient fine-tuning techniques and task formulations. Without using proprietary data, our model achieves state-of-the-art performance on the CoVoST 2 and FLEURS benchmarks. We further conduct in-depth analyses that validate our design choices and offer insights into integrating LLMs with S2TT.|\n", "2407.03160": "|**2024-07-03**|**SOS! 
Soft Prompt Attack Against Open-Source Large Language Models**|Ziqing Yang et.al.|[2407.03160](http://arxiv.org/abs/2407.03160)|null|Open-source large language models (LLMs) are increasingly popular with both the general public and industry because they can be customized, fine-tuned, and used freely. However, some open-source LLMs require approval before use, which has led third parties to publish their own easily accessible versions, and even fine-tuned or quantized versions that reduce computational requirements. These convenient versions are attractive to users, but they also increase the risk of training-time attacks that compromise the integrity and security of LLMs. This paper presents SOS, a new training-time attack designed to have low computational demand; it requires neither clean data nor modification of the model weights, so the model's utility is preserved. SOS addresses security issues in various scenarios, including backdoor, jailbreak, and prompt-stealing attacks. Experimental results show that the attack is effective across all evaluated targets. In addition, we present the other side of the SOS technique, the copyright token: a novel technique that lets users mark their copyrighted content and prevent models from using it.|\n", "2407.03157": 
"|**2024-07-03**|**Let the Code LLM Edit Itself When You Edit the Code**|Zhenyu He et.al.|[2407.03157](http://arxiv.org/abs/2407.03157)|null|In this work, we study a common scenario in code generation: a developer edits existing code in real time and asks a large language model (LLM) to re-predict the next token or line on the fly. The straightforward approach is to have the LLM re-encode the entire key-value (KV) cache to give an accurate prediction, but this is computationally expensive, especially for long sequences. Encoding only the edited subsequence and integrating it into the original KV cache runs into the temporal confusion problem and degrades performance substantially. To address this, we propose Positional Integrity Encoding (PIE). 
Built on rotary positional encoding, PIE first removes the rotary matrices that introduce temporal confusion and then reapplies the correct ones, which ensures that the positional relationships between tokens are correct and requires only a single round of matrix multiplication. We run extensive experiments on the RepoBench-C-8k dataset with DeepSeek-Coder models of 1.3 billion, 6.7 billion, and 33 billion parameters, covering three real-world coding tasks: code insertion, code deletion, and multi-place code editing. The results show that, compared with the standard full-recomputation approach, PIE reduces computational overhead by more than 85% across all model sizes and tasks while closely approximating its performance.|\n", "2407.04694": "|**2024-07-05**|**Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs**|Rudolf Laine et.al.|[2407.04694](http://arxiv.org/abs/2407.04694)|**[link](https://github.com/lrudl/sad)**|
AI assistants such as ChatGPT are trained to respond to users by saying \u201cI am a large language model\u201d. This raises a question: do such models actually know that they are LLMs, and can they reliably act on that knowledge? Are they aware of their current circumstances, such as being deployed to the public? We call this a model's \u201csituational awareness\u201d. To quantify situational awareness in large language models (LLMs), we design a set of behavioral tests based on question answering and instruction following, which together form the **Situational Awareness Dataset (SAD)**. The benchmark comprises 7 task categories and more than 13,000 questions, and tests a range of abilities, such as recognizing a model's own generated text, predicting its own behavior, determining whether a prompt comes from internal evaluation or real-world deployment, and following instructions that depend on self-knowledge. We evaluate 16 LLMs on SAD, including both base (pretrained) and chat models. While all models perform better than chance, even the highest-scoring model (Claude 3 
Opus) is still far from human level on certain tasks. We also find that performance on SAD does not correlate fully with general-knowledge metrics such as MMLU. Chat models, which are fine-tuned to serve as AI assistants, outperform their base models on SAD but not on general-knowledge tasks. The goal of SAD is to advance the scientific understanding of situational awareness in LLMs by breaking it down into quantifiable abilities. Situational awareness matters because it enhances a model's capacity for autonomous planning and action; this benefits automation, but it also introduces novel risks related to AI safety and control. Code and the latest results are available online.|\n", "2407.04693": "|**2024-07-05**|**ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models**|Yuzhe Gu et.al.|[2407.04693](http://arxiv.org/abs/2407.04693)|**[link](https://github.com/open-compass/anah)**|
Large language models (LLMs) hallucinate in long-form question answering across many domains and applications. Current hallucination detection and mitigation datasets are limited in domain coverage and size, and they are hard to scale because of high labor costs and the insufficient reliability of existing hallucination annotators. To enable scalable oversight of LLM hallucinations, this paper introduces an iterative self-training framework. Based on the expectation-maximization (EM) algorithm, each iteration first applies a hallucination annotation pipeline to label an enlarged dataset and then trains a more accurate hallucination annotator on that dataset. The new annotator is then used to update the annotation pipeline for the next iteration. Extensive experiments show that the final hallucination annotator, with only 700 million parameters, surpasses GPT-4 and achieves new state-of-the-art zero-shot hallucination detection results on HaluEval and HalluQA. The annotator can not only evaluate the hallucination levels of different LLMs on large-scale datasets but also help mitigate hallucination in generated text, raising the NLI metric from 25% to 37%.|\n", "2407.04681": 
"|**2024-07-05**|**Rethinking Visual Prompting for Multimodal Large Language Models with External Knowledge**|Yuanze Lin et.al.|[2407.04681](http://arxiv.org/abs/2407.04681)|null|In recent years, multimodal large language models (MLLMs) trained on large, high-quality image-text datasets have made significant progress in understanding images holistically. However, the inherent difficulty of conveying fine-grained or spatially dense information, such as masks, in text limits their ability to answer questions that need it, which hampers their understanding of detailed visual elements. Inspired by the idea of retrieval-augmented generation (RAG), this paper proposes a new visual prompting approach that integrates fine-grained external knowledge, obtained from specialized vision models such as instance segmentation and OCR models, into MLLMs. This is a promising but underexplored direction for improving MLLM performance. Our approach differs from concurrent works, which convert external knowledge into additional text prompts and force the model to learn the correspondence between visual content and text coordinates indirectly. Instead, we propose embedding fine-grained knowledge directly into a spatial embedding map that serves as a visual prompt. 
This design can be integrated easily into a variety of MLLMs, such as LLaVA and Mipha, and considerably improves their visual understanding performance. Through rigorous experiments on nine benchmarks, we show that our method improves the overall performance of MLLMs and strengthens their fine-grained, context-aware capabilities.|\n", "2407.04675": "|**2024-07-05**|**Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition**|Ye Bai et.al.|[2407.04675](http://arxiv.org/abs/2407.04675)|null|Modern automatic speech recognition (ASR) models must accurately transcribe diverse speech signals from different domains, languages, and accents, while taking specific contextual information into account to meet the needs of various application scenarios. Traditional end-to-end models combined with additional language models perform well in data-matched scenarios but are gradually approaching a bottleneck. This paper introduces Seed-ASR, a new speech recognition model based on a large language model (LLM). It is built on the audio-conditioned LLM (AcLLM) architecture and leverages the capabilities of the LLM by feeding continuous speech representations together with contextual information into it. \u901a\u8fc7\u5206\u9636\u6bb5\u7684\u5927\u89c4\u6a21\u8bad\u7ec3\u4ee5\u53ca\u5728LLM\u4e2d\u6fc0\u53d1\u4e0a\u4e0b\u6587\u611f\u77e5\u80fd\u529b\uff0cSeed-ASR\u5728\u5305\u62ec\u591a\u4e2a\u9886\u57df\u
3001\u65b9\u8a00\u548c\u8bed\u8a00\u7684\u7efc\u5408\u8bc4\u4f30\u96c6\u4e0a\u663e\u8457\u4f18\u4e8e\u7aef\u5230\u7aef\u6a21\u578b\u3002\u6b64\u5916\uff0cSeed-ASR\u80fd\u591f\u90e8\u7f72\u5230\u5404\u79cd\u573a\u666f\u4e2d\u652f\u6301\u7279\u5b9a\u9700\u6c42\uff0c\u65e0\u9700\u989d\u5916\u7684\u8bed\u8a00\u6a21\u578b\u3002\u4e0e\u6700\u8fd1\u53d1\u5e03\u7684\u5927\u578bASR\u6a21\u578b\u76f8\u6bd4\uff0cSeed-ASR\u5728\u4e2d\u6587\u548c\u82f1\u6587\u516c\u5f00\u6d4b\u8bd5\u96c6\u4e0a\u7684\u8bcd\uff08\u6216\u5b57\u7b26\uff0c\u9488\u5bf9\u4e2d\u6587\uff09\u9519\u8bef\u7387\u964d\u4f4e\u4e8610%-40%\uff0c\u8fdb\u4e00\u6b65\u8bc1\u660e\u4e86\u5176\u5f3a\u5927\u7684\u6027\u80fd\u3002|\n", "2407.04656": "|**2024-07-05**|**Lazarus: Resilient and Elastic Training of Mixture-of-Experts Models with Adaptive Expert Placement**|Yongji Wu et.al.|[2407.04656](http://arxiv.org/abs/2407.04656)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u89c4\u6a21\u4e0d\u65ad\u6269\u5927\uff0c\u7a00\u758f\u6fc0\u6d3b\u7684\u6df7\u5408\u4e13\u5bb6\uff08MoE\uff09\u67b6\u6784\u56e0\u5176\u8ba1\u7b97\u6210\u672c\u7684\u4e9a\u7ebf\u6027\u6269\u5c55\u800c\u88ab\u8d8a\u6765\u8d8a\u591a\u5730\u91c7\u7528\u3002\u7136\u800c\uff0c\u9891\u7e41\u7684\u8bad\u7ec3\u5931\u8d25\u4ecd\u7136\u662f\u4e00\u4e2a\u91cd\u5927\u6311\u6218\uff0c\u56e0\u4e3a\u5355\u6b21\u5931\u8d25\u53ef\u80fd\u5bfc\u81f4\u6240\u6709GPU\u9677\u5165\u95f2\u7f6e\uff0c\u76f4\u81f3\u95ee\u9898\u89e3\u51b3\uff0c\u4ece\u800c\u53ef\u80fd\u4e22\u5931\u5927\u91cf\u8bad\u7ec3\u8fdb\u5ea6\uff0c\u9700\u8981\u4ece\u68c0\u67e5\u70b9\u91cd\u65b0\u5f00\u59cb\u3002\u73b0\u6709\u7684\u9ad8\u6548\u5bb9\u9519\u8bad\u7ec3\u89e3\u51b3\u65b9\u6848\u8981\u4e48\u7f3a\u4e4f\u5f39\u6027\uff0c\u8981\u4e48\u4f9d\u8d56\u4e8e\u5c06\u6062\u590d\u80fd\u529b\u6784\u5efa\u5230\u7ba1\u9053\u5e76\u884c\u6027\u4e2d\uff0c\u4f46\u8fd9\u4e0d\u9002\u7528\u4e8eMoE\u6a21\u578b\uff0c\u56e0\u4e3aMoE\u67b6\u6784\u91c7\u7528\u4e86\u4e13\u5bb6\u5e76\u884c\u7b5
6\u7565\u3002 \u6211\u4eec\u63d0\u51fa\u4e86Lazarus\uff0c\u4e00\u4e2a\u9488\u5bf9MoE\u6a21\u578b\u8fdb\u884c\u5bb9\u9519\u548c\u5f39\u6027\u7684\u8bad\u7ec3\u7cfb\u7edf\u3002Lazarus\u901a\u8fc7\u52a8\u6001\u5206\u914d\u4e13\u5bb6\u526f\u672c\u6765\u5e94\u5bf9\u4e13\u5bb6\u5de5\u4f5c\u8d1f\u8f7d\u7684\u56fa\u6709\u4e0d\u5e73\u8861\uff0c\u4ece\u800c\u52a0\u901f\u8bad\u7ec3\uff0c\u5e76\u5f00\u53d1\u4e86\u4e00\u79cd\u7406\u8bba\u4e0a\u6700\u4f18\u7684\u4e13\u5bb6\u653e\u7f6e\u7b97\u6cd5\uff0c\u4ee5\u6700\u5927\u9650\u5ea6\u5730\u63d0\u9ad8\u5728\u5931\u8d25\u540e\u7684\u6062\u590d\u6982\u7387\u3002\u901a\u8fc7\u81ea\u9002\u5e94\u7684\u4e13\u5bb6\u653e\u7f6e\u548c\u7075\u6d3b\u7684\u4ee4\u724c\u5206\u53d1\u5668\uff0cLazarus\u80fd\u591f\u5728\u6545\u969c\u540e\u5145\u5206\u5229\u7528\u6240\u6709\u53ef\u7528\u8282\u70b9\uff0c\u907f\u514dGPU\u7a7a\u95f2\u3002 \u6211\u4eec\u7684\u8bc4\u4f30\u8868\u660e\uff0c\u4e0e\u73b0\u6709MoE\u8bad\u7ec3\u7cfb\u7edf\u76f8\u6bd4\uff0cLazarus\u5728\u9891\u7e41\u7684\u8282\u70b9\u6545\u969c\u4e0b\u6027\u80fd\u63d0\u5347\u9ad8\u8fbe5.7\u500d\uff0c\u4e14\u5728\u771f\u5b9espot\u5b9e\u4f8b\u8ddf\u8e2a\u4e0a\u63d0\u5347\u4e863.4\u500d\u3002|\n", "2407.04629": "|**2024-07-05**|**Entity Decomposition with Filtering: A Zero-Shot Clinical Named Entity Recognition Framework**|Reza Averly et.al.|[2407.04629](http://arxiv.org/abs/2407.04629)|null|\u8be5\u8bba\u6587\u5173\u6ce8\u7684\u662f\u4e34\u5e8a\u547d\u540d\u5b9e\u4f53\u8bc6\u522b\uff08Clinical 
NER\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u4ece\u4e34\u5e8a\u75c5\u5386\u4e2d\u63d0\u53d6\u91cd\u8981\u5b9e\u4f53\u7684\u4efb\u52a1\u3002\u8fd1\u5e74\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u8fd9\u4e00\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\u3002\u7814\u7a76\u4e3b\u8981\u96c6\u4e2d\u5728\u4e13\u6709\u7684LLMs\uff0c\u4f46\u8bba\u6587\u63a2\u8ba8\u4e86\u5f00\u653e\u7684\u3001\u4e13\u95e8\u4e3a\u547d\u540d\u5b9e\u4f53\u8bc6\u522b\u8bad\u7ec3\u7684LLMs\u5728\u4e34\u5e8aNER\u4e2d\u7684\u6027\u80fd\u3002\u4f5c\u8005\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6846\u67b6\uff0c\u79f0\u4e3a\u201c\u5b9e\u4f53\u5206\u89e3\u4e0e\u8fc7\u6ee4\u201d\uff08Entity Decomposition with Filtering\uff0cEDF\uff09\uff0c\u76ee\u7684\u662f\u901a\u8fc7\u5c06\u5b9e\u4f53\u8bc6\u522b\u4efb\u52a1\u5206\u89e3\u4e3a\u5b50\u5b9e\u4f53\u7c7b\u578b\u7684\u68c0\u7d22\uff0c\u5e76\u5f15\u5165\u4e00\u4e2a\u8fc7\u6ee4\u673a\u5236\u6765\u6d88\u9664\u9519\u8bef\u5b9e\u4f53\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u8be5\u6846\u67b6\u5728\u6240\u6709\u5ea6\u91cf\u6807\u51c6\u3001\u6a21\u578b\u3001\u6570\u636e\u96c6\u548c\u5b9e\u4f53\u7c7b\u578b\u4e0a\u90fd\u8868\u73b0\u51fa\u6709\u6548\u6027\u3002\u5206\u6790\u663e\u793a\uff0c\u5b9e\u4f53\u5206\u89e3\u80fd\u591f\u663e\u8457\u63d0\u9ad8\u5bf9\u5148\u524d\u672a\u88ab\u6355\u6349\u5230\u7684\u5b9e\u4f53\u7684\u8bc6\u522b\u3002\u6b64\u5916\uff0c\u8bba\u6587\u8fd8\u63d0\u4f9b\u4e86\u5bf9\u6846\u67b6\u7684\u5168\u9762\u8bc4\u4f30\u548c\u6df1\u5165\u7684\u9519\u8bef\u5206\u6790\uff0c\u4ee5\u671f\u4e3a\u672a\u6765\u7684\u7814\u7a76\u63d0\u4f9b\u65b9\u5411\u3002|\n", "2407.04622": "|**2024-07-05**|**On scalable oversight with weak LLMs judging strong LLMs**|Zachary Kenton 
et.al.|[2407.04622](http://arxiv.org/abs/2407.04622)|null|This paper studies scalable oversight protocols, whose goal is to let humans effectively supervise superhuman AI. It focuses on three protocols (debate, consultancy, and direct question answering), using large language models (LLMs) as both the AI agents and the judges, with the judge models assumed to be weaker. The experiments cover a broad and heterogeneous set of tasks, extending earlier work that was limited to a single extractive QA task with information asymmetry by adding mathematics, coding, logic, and multimodal reasoning. Across all tasks, debate outperforms consultancy when the consultant is randomly assigned the correct or the incorrect answer. On extractive QA tasks with information asymmetry, debate outperforms direct question answering, but on tasks without information asymmetry the results are mixed. When the AI is allowed to choose which answer to argue for rather than being assigned one, the judge is persuaded by the wrong answer less often in debate. In addition, stronger debater models improve judge accuracy, although the gain is somewhat smaller than in previous studies.|\n", "2407.04581": "|**2024-07-05**|**Leveraging Large 
Language Models for Integrated Satellite-Aerial-Terrestrial Networks: Recent Advances and Future Directions**|Shumaila Javaid et.al.|[2407.04581](http://arxiv.org/abs/2407.04581)|null|This paper explores the transformative potential of integrating large language models (LLMs) into Integrated Satellite, Aerial, and Terrestrial Networks (ISATN), using advanced artificial intelligence (AI) and machine learning (ML) techniques to optimize connectivity in these networks. It first outlines the current architecture of ISATN and highlights the role of LLMs in improving data flow, signal processing, and network management, advancing 5G/6G communication technologies through advanced predictive algorithms and real-time decision-making. It then analyzes ISATN components in depth and examines how LLMs can be used effectively to address bottlenecks in traditional data transmission and processing. The paper focuses on network management challenges in ISATN, including resource allocation strategies, traffic routing, and network security that ensures seamless connectivity and optimal performance under changing conditions. We also discuss the technical challenges of integrating LLMs into ISATN, such as data integration, scalability issues, latency in decision-making, and the design of robust, fault-tolerant systems. Finally, the study identifies key directions for future research on fully exploiting the strengths of LLMs to improve network reliability, optimize performance, and achieve a truly globally interconnected and intelligent network.|\n", "2407.04573": "|**2024-07-05**|**VRSD: Rethinking Similarity and Diversity for Retrieval in Large Language Models**|Hang Gao et.al.|[2407.04573](http://arxiv.org/abs/2407.04573)|null|With the rapid development of large language models (LLMs), vector retrieval algorithms are essential for semantic queries that must satisfy both similarity and diversity requirements. Although Maximal Marginal Relevance (MMR) is widely used in retrieval scenarios that involve both requirements, variations in its parameter \u03bb cause the results to fluctuate and obscure the optimization trajectory in the vector space. Moreover, a solid theoretical analysis of the similarity and diversity constraints in retrieval is currently lacking. This paper proposes a new approach that characterizes both constraints through the relationship between the sum vector and the query vector. This relationship ensures similarity while also requiring the individual vectors within the sum vector to align with the query vector in a dispersed manner, which satisfies the diversity requirement. We also formulate a new combinatorial optimization problem: selecting $k$ vectors from a set of candidates so that their sum vector matches the query vector as closely as possible. We prove that this problem is NP-complete, which reveals the profound difficulty of pursuing similarity and diversity simultaneously in vector retrieval and lays a theoretical foundation for future research. In addition, we design a heuristic algorithm named Vectors Retrieval with Similarity and Diversity (VRSD), which has a clear optimization objective, requires no preset parameters, and has lower time complexity than MMR. Empirical validation shows that VRSD significantly outperforms MMR on various datasets.|\n", "2407.04541": "|**2024-07-05**|**PoPreRo: A New Dataset for Popularity Prediction of Romanian Reddit Posts**|Ana-Cristina Rogoz et.al.|[2407.04541](http://arxiv.org/abs/2407.04541)|**[link](https://github.com/ana-rogoz/poprero)**|**We introduce PoPreRo, the first dataset collected for popularity prediction of Romanian Reddit posts. PoPreRo gathers a diverse sample of posts from five distinct Romanian subreddits, totaling 28,107 data samples. Along with the dataset, we release a set of competitive models to serve as baselines for future research. Notably, the top-scoring model reaches an accuracy of 61.35% and a macro F1 score of 60.60% on the test set, which indicates that popularity prediction on PoPreRo is very challenging. Further investigation with few-shot prompting of the Falcon-7B large language model points to the same conclusion. We therefore believe that PoPreRo is a valuable resource for evaluating popularity prediction models for Romanian social media posts. Our dataset is publicly available at https://github.com/ana-rogoz/PoPreRo.**|\n", "2407.06189": "|**2024-07-08**|**Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision**|Orr Zohar et.al.|[2407.06189](http://arxiv.org/abs/2407.06189)|**[link](https://github.com/orrzohar/Video-STaR)**|**The performance of large vision-language models (LVLMs) depends on the size and quality of their training data. Existing video instruction tuning datasets lack diversity because they are built mainly by prompting large language models with video captions to produce question-answer pairs, so their content is mostly descriptive. Meanwhile, many labeled video datasets with rich labels and supervision already exist, but integrating them into LVLMs is non-trivial. We therefore propose Video Self-Training with augmented Reasoning (Video-STaR), the first video self-training approach. Video-STaR allows any labeled video dataset to be used for video instruction tuning. In this process, an LVLM cycles between instruction generation and fine-tuning, which we find (I) improves general video understanding and (II) adapts LVLMs to new downstream tasks using existing supervision. Specifically, the LVLM is prompted to propose an answer, and only answers that contain the original video labels are kept. The LVLM is then retrained on the generated dataset. By training only on generated answers that contain the correct video labels, Video-STaR uses existing video labels as weak supervision for video instruction tuning. Experimental results show that Video-STaR-enhanced LVLMs improve (I) general video question answering by 10% and (II) downstream tasks, raising Kinetics700-QA accuracy by 20% and action quality assessment on FineDiving by 15%. Overall, Video-STaR offers an effective and practical way to improve LVLM performance.**|\n", "2407.06188": "|**2024-07-08**|**CrowdMoGen: Zero-Shot Text-Driven Collective Motion Generation**|Xinying Guo 
et.al.|[2407.06188](http://arxiv.org/abs/2407.06188)|null|Crowd motion generation is essential in entertainment industries such as animation and games, as well as in strategic fields such as urban simulation and planning. The task requires a careful integration of control and generation to synthesize realistic crowd dynamics under specific spatial and semantic constraints, and its challenges have not yet been fully explored. Current human motion generation models tend to focus on individual behaviors and overlook the complexity of collective behaviors, while recent methods for multi-person motion generation depend heavily on predefined scenarios and are limited to a fixed, small number of inter-person interactions, which limits their practicality. To address these problems, we propose CrowdMoGen, a zero-shot text-driven framework that harnesses large language models (LLMs) to incorporate collective intelligence into the motion generation framework, enabling generalizable planning and generation of crowd motions without paired training data. Our framework consists of two key components: 1) a Crowd Scene Planner that learns to coordinate motions and dynamics according to a specific scene context or introduced perturbations, and 2) a Collective Motion Generator that efficiently synthesizes the required collective motions based on the holistic plan. Extensive quantitative and qualitative experiments validate the effectiveness of our framework, which not only fills an important gap in large-scale, generalizable crowd motion generation but also achieves high levels of realism and flexibility.|\n", "2407.06172": "|**2024-07-08**|**On Speeding Up Language Model Evaluation**|Jin Peng Zhou 
et.al.|[2407.06172](http://arxiv.org/abs/2407.06172)|null|Large language models (LLMs) dominate the field of natural language processing (NLP) and show state-of-the-art capabilities across a wide range of tasks. Building such models involves many decisions from training to inference, which together form a complex search problem. For example, finding the best pre-trained LLM, prompt, or hyperparameters for a given task usually requires a full evaluation of multiple candidates on the entire test set. This exhaustive evaluation is time-consuming and expensive, because both LLM inference and metric computation are resource-intensive. This paper addresses the challenge of identifying the best method under a limited budget for evaluating methods on test examples. We build on the well-studied multi-armed bandit framework, which sequentially selects the next method-example pair to evaluate; our approach, which combines multi-armed bandit algorithms with low-rank factorization, significantly reduces the required resources. Experiments show that our algorithms identify the top-performing method using only 5%-15% of the resources typically needed, a cost saving of 85%-95%.|\n", "2407.06153": "|**2024-07-08**|**What's Wrong with Your Code 
Generated by Large Language Models? An Extensive Study**|Shihan Dou et.al.|[2407.06153](http://arxiv.org/abs/2407.06153)|null|With the rapid development of large language models (LLMs) in code generation, the field has attracted growing attention from researchers. Current work focuses mainly on building high-quality datasets and applying diverse training techniques to improve the code generation abilities of LLMs. However, the limitations and boundaries of these existing methods have not been comprehensively studied. To fill this gap, we conducted an extensive empirical study that evaluates three leading closed-source LLMs and four open-source LLMs on three commonly used benchmarks. The study examines the length, cyclomatic complexity, and API count of the generated code, and the results show that these models struggle with more complex programming problems and tend to produce code that is shorter yet more complicated than the canonical solutions. We also developed a taxonomy of bugs in incorrect code, with three categories and 12 sub-categories, and analyzed the root causes of common bug types. To examine how LLMs perform in real projects, we hand-built a real-world benchmark of 140 code generation tasks. A comparative analysis shows that the distribution of bugs in real-world scenarios differs significantly from that in existing benchmarks. Finally, we propose a training-free iterative method that introduces self-critique, enabling LLMs to revise their generated code based on bug types and compiler feedback. Experimental results show that after two iterations our approach significantly reduces bugs and raises the passing rate by 29.2%, which indicates substantial potential for LLMs in handling complex problems.|\n", "2407.06146": "|**2024-07-09**|**Using Grammar Masking to Ensure Syntactic Validity in LLM-based Modeling Tasks**|Lukas Netz 
et.al.|[2407.06146](http://arxiv.org/abs/2407.06146)|null|We present and evaluate a method called grammar masking, which guides large language models (LLMs) toward producing syntactically correct models for a given context-free grammar. Prompt engineering methods such as few-shot learning or priming can increase the chance that an LLM produces correct syntax, but for complex grammars these methods tend to be time-consuming and less effective. Current work focuses mainly on language model training or prompt engineering. This paper presents a new approach that restricts the output through constrained decoding to ensure that the generated content conforms to a valid syntax. We experiment with several domain-specific languages (DSLs) built with MontiCore and with multiple LLMs, comparing results with and without constrained decoding. A corresponding parser is used to verify the syntactic correctness of each model. Experimental results show that grammar masking significantly improves the modeling capabilities of several LLMs, reducing the need for carefully crafted prompts while increasing the likelihood of producing correct models.|\n", "2407.06135": "|**2024-07-08**|**ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text 
Generation**|Ethan Chern et.al.|[2407.06135](http://arxiv.org/abs/2407.06135)|**[link](https://github.com/gair-nlp/anole)**|**Previous open-source large multimodal models (LMMs) have several limitations: (1) they often lack native integration and require adapters to align visual representations with pre-trained large language models (LLMs); (2) many are restricted to single-modal generation; (3) although some support multimodal generation, they rely on separate diffusion models for the visual part. To overcome these limitations, we present Anole, an open, autoregressive, native large multimodal model designed for interleaved image-text generation. We build Anole on Meta AI's Chameleon and adopt an innovative fine-tuning strategy that is both data-efficient and parameter-efficient. Anole demonstrates high-quality, coherent multimodal generation capabilities. We have open-sourced our model, our training framework, and our instruction tuning data.**|\n", "2407.06129": "|**2024-07-08**|**Evaluating the Semantic Profiling Abilities of LLMs for Natural Language Utterances in Data Visualization**|Hannah K. 
Bako et.al.|[2407.06129](http://arxiv.org/abs/2407.06129)|**[link](https://github.com/hdi-umd/semantic_profiling_llm_evaluation)**|**Automatically generating data visualizations from natural language utterances about a dataset requires a deep semantic understanding of the utterance, including implicit and explicit references to data attributes, visualization tasks, and necessary data preparation steps. Natural language interfaces (NLIs) for data visualization have explored ways to capture this information, but the inherent uncertainty of human speech poses challenges. Recent large language models (LLMs) offer a possible way to address these challenges, but their ability to extract the relevant semantic information remains unexplored. This study evaluates four publicly available LLMs (GPT-4, Gemini-Pro, Llama3, and Mixtral), analyzing their ability to understand utterances in the presence of uncertainty and to identify the relevant data context and visual tasks. The results show that LLMs are sensitive to uncertainty in utterances and can extract the key data context. However, they perform poorly at inferring visualization tasks. Based on these findings, we propose directions for future research on using LLMs for visualization generation.**|\n", "2407.06125": "|**2024-07-08**|**Depression Detection and Analysis using Large Language Models on Textual and Audio-Visual Modalities**|Avinash Anand et.al.|[2407.06125](http://arxiv.org/abs/2407.06125)|null|Depression is widely recognized as a major public health problem that severely affects individual mental health. Undiagnosed depression can lead to serious health problems, including physical symptoms and even suicide. Depression is usually diagnosed through structured interviews conducted by clinicians and mental health professionals, together with questionnaires such as the Patient Health Questionnaire (PHQ). This process depends heavily on the experience and judgment of the clinician and can be affected by personal bias. Because the underlying causes of depression are still being studied, clinicians face challenges in identifying and treating depression in its early stages. Recently, neural computing in artificial intelligence has made significant progress in areas such as text, image, and speech processing. Our study applies these state-of-the-art models in experiments on the E-DAIC (Extended Distress Analysis Interview Corpus Wizard of Oz) dataset and the 2019 Audio/Visual Emotion Challenge (AVEC), aiming for optimal multimodal results. Experimental results show that our proposed solution, which uses proprietary and open-source large language models (LLMs), achieves a Root Mean Square Error (RMSE) of 3.98 on the textual modality, outperforming the AVEC 2019 challenge baseline and current state-of-the-art regression architectures. In addition, our method reaches an accuracy of 71.43% on the classification task. The paper also presents a novel audio-visual multimodal network that predicts PHQ-8 scores with an RMSE of 6.51.|\n", "2407.06093": "|**2024-07-08**|**Artificial Intuition: Efficient Classification of Scientific Abstracts**|Harsh Sakhrani et.al.|[2407.06093](http://arxiv.org/abs/2407.06093)|null|## \u80cc\u666f 
\u4e3a\u4e86\u83b7\u53d6\u6218\u7565\u6d1e\u89c1\u6216\u8fdb\u884c\u79d1\u7814\u9879\u76ee\u7ba1\u7406\uff0c\u5bf9\u7b80\u77ed\u7684\u79d1\u5b66\u6587\u672c\uff08\u5982\u7814\u7a76\u57fa\u91d1\u7533\u8bf7\u4e66\u6216\u51fa\u7248\u7269\u6458\u8981\uff09\u8fdb\u884c\u7c97\u7c92\u5ea6\u5206\u7c7b\u81f3\u5173\u91cd\u8981\u3002\u8fd9\u4e9b\u6587\u672c\u5411\u5177\u5907\u6df1\u539a\u4e13\u4e1a\u77e5\u8bc6\u7684\u4e13\u5bb6\u4f20\u8fbe\u5bc6\u96c6\u4fe1\u606f\uff0c\u4f46\u81ea\u52a8\u5316\u7684\u4efb\u52a1\u6781\u5176\u8270\u5de8\uff0c\u56e0\u4e3a\u7bc7\u5e45\u6709\u9650\u4e14\u7f3a\u4e4f\u4e0a\u4e0b\u6587\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u65b0\u65b9\u6cd5\u6765\u751f\u6210\u5e76\u51c6\u786e\u5206\u914d\u7279\u5b9a\u9886\u57df\u7684\u7c97\u6807\u7b7e\u3002\u7814\u7a76\u8868\u660e\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u80fd\u591f\u63d0\u4f9b\u4efb\u52a1\u6240\u9700\u7684\u5143\u6570\u636e\uff0c\u7c7b\u4f3c\u4e8e\u589e\u5f3a\u4eba\u7c7b\u76f4\u89c9\u7684\u8865\u5145\u77e5\u8bc6\uff0c\u5e76\u63d0\u51fa\u4e86\u4e00\u4e2a\u5de5\u4f5c\u6d41\u7a0b\u3002\u4f5c\u4e3a\u521d\u6b65\u5b9e\u9a8c\uff0c\u6211\u4eec\u4f7f\u7528\u4e86\u7f8e\u56fd\u56fd\u5bb6\u822a\u7a7a\u822a\u5929\u5c40\uff08NASA\uff09\u7684\u5956\u9879\u6458\u8981\u6570\u636e\u5e93\u3002\u6211\u4eec\u7ed3\u5408\u73b0\u6709\u6027\u80fd\u6307\u6807\uff0c\u5f00\u53d1\u4e86\u65b0\u7684\u8bc4\u4f30\u5de5\u5177\u3002|\n", "2407.06089": "|**2024-07-08**|**Merge, Ensemble, and Cooperate! 
A Survey on Collaborative Strategies in the Era of Large Language Models**|Jinliang Lu et.al.|[2407.06089](http://arxiv.org/abs/2407.06089)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u663e\u8457\u6210\u529f\uff0c\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u7814\u7a76\u8fdb\u5165\u4e86\u65b0\u65f6\u4ee3\u3002\u5c3d\u7ba1\u8fd9\u4e9b\u6a21\u578b\u5404\u6709\u6240\u957f\uff0c\u4f46\u8bad\u7ec3\u5728\u4e0d\u540c\u8bed\u6599\u5e93\u4e0a\u7684LLMs\u8868\u73b0\u51fa\u4e0d\u540c\u7684\u4f18\u52bf\u548c\u52a3\u52bf\uff0c\u8fd9\u7ed9\u63d0\u9ad8\u6574\u4f53\u6548\u7387\u548c\u7075\u6d3b\u6027\u5e26\u6765\u4e86\u6311\u6218\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u6311\u6218\uff0c\u8fd1\u671f\u7684\u7814\u7a76\u63a2\u7d22\u4e86LLMs\u7684\u534f\u4f5c\u7b56\u7565\u3002\u672c\u6587\u5168\u9762\u6982\u8ff0\u4e86\u8fd9\u4e00\u65b0\u5174\u7814\u7a76\u9886\u57df\uff0c\u5f3a\u8c03\u4e86\u5408\u4f5c\u80cc\u540e\u7684\u52a8\u529b\u3002\u6211\u4eec\u5c06\u534f\u4f5c\u7b56\u7565\u4e3b\u8981\u5206\u4e3a\u4e09\u79cd\u65b9\u6cd5\uff1a\u5408\u5e76\u3001\u96c6\u6210\u548c\u534f\u4f5c\u3002\u5408\u5e76\u662f\u5c06\u591a\u4e2aLLMs\u7684\u53c2\u6570\u7a7a\u95f4\u6574\u5408\u3002\u96c6\u6210\u5219\u662f\u7ed3\u5408\u591a\u4e2a\u6a21\u578b\u7684\u8f93\u51fa\u3002\u534f\u4f5c\u5229\u7528\u4e0d\u540cLLMs\u7684\u4f18\u52bf\uff0c\u4f7f\u5176\u5728\u7279\u5b9a\u4efb\u52a1\u4e2d\u53d1\u6325\u5404\u81ea\u4e13\u957f\u3002\u6211\u4eec\u5c06\u4ece\u4e0d\u540c\u89d2\u5ea6\u8be6\u7ec6\u4ecb\u7ecd\u8fd9\u4e9b\u65b9\u6cd5\uff0c\u5e76\u8ba8\u8bba\u5176\u6f5c\u5728\u5e94\u7528\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u52fe\u52d2\u51fa\u672a\u6765\u7684\u7814\u7a76\u65b9\u5411\uff0c\u671f\u671b\u672c\u5de5\u4f5c\u80fd\u6fc0\u53d1\u66f4\u591a\u5173\u4e8eLLMs\u534f\u4f5c\u7684\u7814\u7a76\uff0c\u63a8\u52a8\u9ad8\u7ea7NLP\u5e94\u7528\u7684\u53d1\u5c55\u3002|\n", "2407.07094": "|**2024-07-09**|**AnyTaskTune: Advanced Domain-Specific Solutions through Task-Fine-Tuning**|Jiaxi 
Cui et.al.|[2407.07094](http://arxiv.org/abs/2407.07094)|**[link](https://github.com/pandavt/datatager)**|**\u5728\u5404\u884c\u5404\u4e1a\u5e7f\u6cdb\u91c7\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u8fc7\u7a0b\u4e2d\uff0c\u5f80\u5f80\u5ffd\u89c6\u4e86\u4e2a\u4f53\u548c\u5c0f\u578b\u7ec4\u7ec7\u5bf9\u9488\u5bf9\u5176\u7279\u5b9a\u4e1a\u52a1\u573a\u666f\u5b9a\u5236\u5316\u6a21\u578b\u7684\u9700\u6c42\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u5fae\u8c03\u65b9\u6cd5\u2014\u2014AnyTaskTune\uff0c\u5373\u4efb\u52a1\u5fae\u8c03\uff08Task-Fine-Tune\uff09\uff0c\u65e8\u5728\u63d0\u5347\u6a21\u578b\u5728\u591a\u6837\u5316\u7684\u9886\u57df\u7279\u5b9a\u4efb\u52a1\u4e0a\u7684\u6027\u80fd\u3002\u8be5\u65b9\u6cd5\u5305\u62ec\u7ec6\u81f4\u5730\u8bc6\u522b\u548c\u5b9a\u4e49\u9886\u57df\u5185\u7684\u5b50\u4efb\u52a1\uff0c\u968f\u540e\u521b\u5efa\u4e13\u95e8\u7684\u589e\u5f3a\u6570\u636e\u96c6\u8fdb\u884c\u7cbe\u7ec6\u8c03\u6574\uff0c\u4ece\u800c\u4f18\u5316\u4efb\u52a1\u7279\u5b9a\u7684\u6a21\u578b\u8868\u73b0\u3002\u6211\u4eec\u5728\u6cd5\u5f8b\uff08\u5982\u5173\u952e\u8bcd\u63d0\u53d6\u548c\u53e5\u5b50\u9884\u6d4b\uff09\u7b49\u591a\u4e2a\u9886\u57df\uff0c\u5305\u62ec\u91d1\u878d\u3001\u533b\u7597\u3001\u6cd5\u5f8b\u3001\u5fc3\u7406\u5b66\u3001\u5ba2\u6237\u670d\u52a1\u548c\u4eba\u529b\u8d44\u6e90\u7b49\u4e8c\u5341\u591a\u4e2a\u5b50\u4efb\u52a1\u4e0a\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u5fae\u8c03\u5b9e\u9a8c\u3002\u4e3a\u4e86\u652f\u6301\u793e\u533a\u53c2\u4e0e\u5e76\u5206\u4eab\u8d44\u6e90\uff0c\u6211\u4eec\u5c06\u5f00\u6e90\u8fd9\u4e9b\u53cc\u8bed\u4efb\u52a1\u6570\u636e\u96c6\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u4f7f\u7528Task-Fine-Tune\u65b9\u6cd5\u5fae\u8c03\u7684\u6a21\u578b\u4e0d\u4ec5\u5728\u7279\u5b9a\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u800c\u4e14\u5728\u5404\u81ea\u9886\u57df\u5185\u660e\u663e\u4f18\u4e8e\u901a\u7528\u80fd\u529b\u66f4\u5f3a\u7684\u6a21\u578b\u300
2\u6211\u4eec\u7684\u5de5\u4f5c\u5df2\u516c\u5f00\u53d1\u5e03\u5728\uff1ahttps://github.com/PandaVT/DataTager\u3002**|\n", "2407.07093": "|**2024-07-09**|**FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation**|Liqun Ma et.al.|[2407.07093](http://arxiv.org/abs/2407.07093)|**[link](https://github.com/liqunma/fbi-llm)**|**\u8be5\u7814\u7a76\u4ecb\u7ecd\u4e86\u4e00\u79cd\u5168\u65b0\u7684\u5168\u4e8c\u8fdb\u5236\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08FBI-LLM\uff09\uff0c\u8fd9\u662f\u9996\u6b21\u5c55\u793a\u5982\u4f55\u4ece\u5934\u5f00\u59cb\u8bad\u7ec3\u5927\u89c4\u6a21\u7684\u5168\u4e8c\u8fdb\u5236\u8bed\u8a00\u6a21\u578b\uff08\u4e0d\u540c\u4e8e\u90e8\u5206\u4e8c\u8fdb\u5236\u6216\u4e09\u8fdb\u5236\u7684LLM\uff0c\u5982BitNet b1.58\uff09\uff0c\u5176\u6027\u80fd\u80fd\u591f\u4e0e\u6d6e\u70b916\u4f4d\uff08FP16\uff09\u6216\u6df7\u5408\u7cbe\u5ea616\u4f4d\uff08BF16\uff09\u7684\u5e38\u89c4\u5927\u8bed\u8a00\u6a21\u578b\u76f8\u5f53\u3002\u901a\u8fc7\u4f7f\u7528\u81ea\u56de\u5f52\u84b8\u998f\uff08AD\uff09\u635f\u5931\uff0c\u540c\u65f6\u4fdd\u6301\u6a21\u578b\u5c3a\u5bf8\uff08130M\u30011.3B\u30017B\uff09\u548c\u9884\u8bad\u7ec3\u6570\u636e\u91cf\u4e0e\u5e38\u89c4LLM\u76f8\u5f53\uff0cFBI-LLM\u5728\u56f0\u60d1\u5ea6\u548c\u4efb\u52a1\u7279\u5b9a\u6548\u679c\u65b9\u9762\u8868\u73b0\u51fa\u7ade\u4e89\u6027\u3002\u6709\u8da3\u7684\u662f\uff0c\u6211\u4eec\u53d1\u73b0\u4ece\u96f6\u5f00\u59cb\u8bad\u7ec3\u5168\u4e8c\u8fdb\u5236\u8bed\u8a00\u6a21\u578b\u5e76\u4e0d\u9700\u8981\u9884\u8bad\u7ec3\u6743\u91cd\u3002\u8fd9\u9879\u5de5\u4f5c\u50ac\u751f\u4e86\u4e00\u4e2a\u65b0\u7684\u8ba1\u7b97\u6846\u67b6\uff0c\u5e76\u53ef\u80fd\u63a8\u52a8\u9488\u5bf9\u5b8c\u51681\u6bd4\u7279LLMs\u7684\u4e13\u4e1a\u786c\u4ef6\u8bbe\u8ba1\u3002\u6211\u4eec\u516c\u5f00\u6240\u6709\u6a21\u578b\u3001\u4ee3\u7801\u548c\u8bad\u7ec3\u6570\u636e\uff0c\u4ee5\u652f\u6301\u8fdb\u4e00\u6b65\u7684\u7814\u7a76\uff08\u4ee3\u7801\uff1ahttps://github.com/LiqunMa/FBI-LLM\uff0c\u6a21\
u578b\uff1ahttps://huggingface.co/LiqunMa/\uff09\u3002**|\n", "2407.07086": "|**2024-07-09**|**Hypothetical Minds: Scaffolding Theory of Mind for Multi-Agent Tasks with Large Language Models**|Logan Cross et.al.|[2407.07086](http://arxiv.org/abs/2407.07086)|**[link](https://github.com/locross93/hypothetical-minds)**|**\u5728\u591a\u667a\u80fd\u4f53\u5f3a\u5316\u5b66\u4e60\uff08MARL\uff09\u65b9\u6cd5\u4e2d\uff0c\u5904\u7406\u591a\u667a\u80fd\u4f53\u7cfb\u7edf\u7684\u975estationarity\u5e76\u9002\u5e94\u5728\u7ebf\u5b66\u4e60\u7684\u80fd\u529b\u662f\u4e00\u4e2a\u6311\u6218\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u6784\u5efa\u4e86\u4e00\u4e2a\u81ea\u4e3b\u7684\u89e3\u51b3\u7b56\u7565\u3002\u6211\u4eec\u7684\u65b0\u578b\u667a\u80fd\u4f53\u201c\u5047\u8bbe\u5fc3\u667a\u201d\uff08Hypothetical Minds\uff09\u91c7\u7528\u8ba4\u77e5\u542f\u53d1\u5f0f\u67b6\u6784\uff0c\u5305\u62ec\u611f\u77e5\u3001\u8bb0\u5fc6\u548c\u4e24\u4e2a\u62bd\u8c61\u5c42\u6b21\u4e0a\u7684\u5206\u5c42\u89c4\u5212\u6a21\u5757\u3002\u5173\u952e\u65b0\u589e\u7684\u662f\u201c\u5fc3\u7406\u7406\u8bba\u201d\u6a21\u5757\uff0c\u5b83\u4ee5\u81ea\u7136\u8bed\u8a00\u7684\u5f62\u5f0f\u751f\u6210\u5bf9\u5176\u4ed6\u667a\u80fd\u4f53\u7b56\u7565\u7684\u5047\u8bbe\uff0c\u5e76\u901a\u8fc7\u9a8c\u8bc1\u8fd9\u4e9b\u5047\u8bbe\u5bf9\u5176\u4ed6\u667a\u80fd\u4f53\u884c\u4e3a\u7684\u9884\u6d4b\u51c6\u786e\u6027\u6765\u9010\u6b65\u4f18\u5316\u3002\u5728Melting 
Pot\u57fa\u51c6\u7684\u591a\u79cd\u7ade\u4e89\u3001\u6df7\u5408\u52a8\u673a\u548c\u534f\u4f5c\u73af\u5883\u4e2d\uff0c\u5047\u8bbe\u5fc3\u667a\u663e\u8457\u4f18\u4e8e\u5148\u524d\u7684\u8bed\u8a00\u6a21\u578b\u667a\u80fd\u4f53\u548c\u5f3a\u5316\u5b66\u4e60\u57fa\u7ebf\uff0c\u65e0\u8bba\u662f\u5728\u4e8c\u5143\u73af\u5883\u8fd8\u662f\u7fa4\u4f53\u73af\u5883\u4e2d\u3002\u5bf9\u6bd4\u5206\u6790\u663e\u793a\uff0c\u5047\u8bbe\u7684\u8bc4\u4f30\u548c\u8fed\u4ee3\u7cbe\u70bc\u5bf9\u4e8e\u5e94\u5bf9\u590d\u6742\u573a\u666f\u81f3\u5173\u91cd\u8981\u3002**|\n", "2407.07080": "|**2024-07-09**|**Adapting LLMs to Hebrew: Unveiling DictaLM 2.0 with Enhanced Vocabulary and Instruction Capabilities**|Shaltiel Shmidman et.al.|[2407.07080](http://arxiv.org/abs/2407.07080)|null|\u8be5\u8bba\u6587\u63a2\u8ba8\u4e86\u5728\u5e0c\u4f2f\u6765\u7b49\u4f4e\u8d44\u6e90\u8bed\u8a00\u4e2d\u8bad\u7ec3\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u6311\u6218\u3002\u6211\u4eec\u4ecb\u7ecd\u4e86DictaLM2.0\u548cDictaLM2.0-Instruct\uff0c\u8fd9\u4e24\u4e2a\u6a21\u578b\u57fa\u4e8eMistral\u6a21\u578b\uff0c\u4f7f\u7528\u5927\u7ea62000\u4ebf\u4e2a\u5e0c\u4f2f\u6765\u8bed\u548c\u82f1\u8bed\u8bcd\u6c47\u8fdb\u884c\u8bad\u7ec3\u3002\u9002\u5e94\u9884\u8bad\u7ec3\u6a21\u578b\u5230\u65b0\u8bed\u8a00\u9700\u8981\u4e13\u95e8\u7684\u6280\u672f\uff0c\u8fd9\u4e0e\u4ece\u5934\u8bad\u7ec3\u6216\u5728\u8d44\u6e90\u4e30\u5bcc\u7684\u8bed\u8a00\uff08\u5982\u82f1\u8bed\uff09\u4e0a\u8fdb\u4e00\u6b65\u8bad\u7ec3\u73b0\u6709\u6a21\u578b\u6709\u663e\u8457\u5dee\u5f02\u3002\u8bba\u6587\u8be6\u7ec6\u9610\u8ff0\u4e86\u8fd9\u4e9b\u521b\u65b0\u7684\u8bad\u7ec3\u65b9\u6cd5\uff0c\u4ee5\u4fc3\u8fdb\u5e0c\u4f2f\u6765\u8bed\u7684\u9ad8\u6548\u5b66\u4e60\u548c\u9002\u5e94\u5176\u8bed\u8a00\u7279\u6027\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5bf9DictaLM2.0-Instruct\u8fdb\u884c\u4e86\u5168\u9762\u7684\u6307\u4ee4\u5fae\u8c03\uff0c\u4ee5\u63d0\u5347\u5176\u5728\u4efb\u52a1\u5bfc\u5411\u6307\u4ee4\u4e0a\u7684\u6027\u80fd\u30
02\u4e3a\u4e86\u4e25\u683c\u8bc4\u4f30\u6211\u4eec\u7684\u6a21\u578b\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2a\u65b0\u7684\u5e0c\u4f2f\u6765LLM\u8bc4\u4f30\u57fa\u51c6\uff0c\u6db5\u76d6\u4e86\u95ee\u7b54\u3001\u60c5\u611f\u5206\u6790\u3001Winograd Schema Challenge\u3001\u7ffb\u8bd1\u548c\u6458\u8981\u7b49\u591a\u4e2a\u4efb\u52a1\u3002\u672c\u6587\u4e0d\u4ec5\u89e3\u51b3\u4e86\u5728\u4f4e\u8d44\u6e90\u8bed\u8a00\u4e2d\u8bad\u7ec3LLMs\u7684\u590d\u6742\u6027\uff0c\u8fd8\u63d0\u51fa\u4e86\u4e00\u79cd\u53ef\u7528\u4e8e\u5176\u4ed6LLM\u8de8\u975e\u82f1\u8bed\u8bed\u8a00\u9002\u5e94\u7684\u6846\u67b6\uff0c\u4ece\u800c\u5bf9\u591a\u8bed\u8a00\u81ea\u7136\u8bed\u8a00\u5904\u7406\u9886\u57df\u505a\u51fa\u4e86\u8d21\u732e\u3002|\n", "2407.07071": "|**2024-07-09**|**Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps**|Yung-Sung Chuang et.al.|[2407.07071](http://arxiv.org/abs/2407.07071)|**[link](https://github.com/voidism/lookback-lens)**|**\u8be5\u8bba\u6587\u63a2\u8ba8\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u603b\u7ed3\u6587\u7ae0\u6216\u6839\u636e\u7ed9\u5b9a\u6bb5\u843d\u56de\u7b54\u95ee\u9898\u65f6\u53ef\u80fd\u51fa\u73b0\u7684\u8bed\u5883\u6027\u865a\u6784\u95ee\u9898\u3002LLMs\u53ef\u80fd\u4f1a\u675c\u64b0\u7ec6\u8282\uff0c\u63d0\u4f9b\u4e0e\u8f93\u5165\u4e0a\u4e0b\u6587\u4e0d\u7b26\u7684\u4e0d\u51c6\u786e\u7b54\u6848\u3002\u7814\u7a76\u8005\u63d0\u51fa\uff0c\u8fd9\u79cd\u865a\u6784\u4e0e\u6a21\u578b\u503e\u5411\u4e8e\u5173\u6ce8\u4e0a\u4e0b\u6587\u4fe1\u606f\u8fd8\u662f\u81ea\u52a8\u751f\u6210\u5185\u5bb9\u7684\u7a0b\u5ea6\u6709\u5173\u3002\u4e3a\u6b64\uff0c\u4ed6\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u7b80\u5355\u7684\u68c0\u6d4b\u6a21\u578b\u2014\u2014\u201cLookback 
Lens\u201d\uff0c\u5176\u8f93\u5165\u7279\u5f81\u662f\u57fa\u4e8e\u6bcf\u4e2a\u6ce8\u610f\u529b\u5934\u4e0a\u4e0b\u6587\u6ce8\u610f\u529b\u6743\u91cd\u4e0e\u65b0\u751f\u6210\u8bcd\u7684\u6bd4\u4f8b\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u4ec5\u4f7f\u7528\u8fd9\u4e9b\u56de\u987e\u6bd4\u7387\u7279\u5f81\u7684\u7ebf\u6027\u5206\u7c7b\u5668\u4e0e\u5229\u7528LLM\u6574\u4e2a\u9690\u85cf\u72b6\u6001\u6216\u6587\u672c\u8574\u542b\u6a21\u578b\u7684\u66f4\u590d\u6742\u68c0\u6d4b\u5668\u540c\u6837\u6709\u6548\u3002Lookback Lens\u4e0d\u4ec5\u9002\u7528\u4e8e\u4e0d\u540c\u4efb\u52a1\uff0c\u8fd8\u80fd\u8de8\u6a21\u578b\u8fc1\u79fb\uff0c\u4e00\u4e2a\u572870\u4ebf\u53c2\u6570\u6a21\u578b\u4e0a\u8bad\u7ec3\u7684\u68c0\u6d4b\u5668\u65e0\u9700\u91cd\u65b0\u8bad\u7ec3\u5373\u53ef\u5e94\u7528\u4e8e\u66f4\u5927\u7684130\u4ebf\u53c2\u6570\u6a21\u578b\u3002\u6b64\u5916\uff0c\u7814\u7a76\u8fd8\u53d1\u73b0\uff0c\u901a\u8fc7\u7b80\u5355\u7684\u5206\u7c7b\u5668\u6307\u5bfc\u89e3\u7801\u65b9\u6cd5\uff0c\u80fd\u591f\u51cf\u5c11\u8bf8\u5982XSum\u6458\u8981\u4efb\u52a1\u4e2d\u7684\u865a\u6784\u7a0b\u5ea6\uff0c\u4f8b\u5982\u964d\u4f4e9.6%\u7684\u865a\u6784\u53d1\u751f\u7387\u3002**|\n", "2407.07064": "|**2024-07-09**|**Prompting Techniques for Secure Code Generation: A Systematic Investigation**|Catherine Tony et.al.|[2407.07064](http://arxiv.org/abs/2407.07064)|null|## \u6982\u8981 
\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u8f6f\u4ef6\u5f00\u53d1\u4e2d\u7684\u5174\u8d77\uff0c\u901a\u8fc7\u63d0\u793a\u9a71\u52a8\u7f16\u7a0b\uff0c\u5f00\u53d1\u8005\u80fd\u591f\u901a\u8fc7\u81ea\u7136\u8bed\u8a00\uff08NL\uff09\u6307\u4ee4\u751f\u6210\u4ee3\u7801\u3002\u7136\u800c\uff0c\u5173\u4e8e\u5b83\u4eec\u80fd\u5426\u4ea7\u751f\u5b89\u5168\u4ee3\u7801\u7684\u7814\u7a76\u5f15\u53d1\u4e86\u8d28\u7591\uff0c\u8fd9\u5173\u7cfb\u5230\u63d0\u793a\u751f\u6210\u8f6f\u4ef6\u7684\u8d28\u91cf\u3002\u5c3d\u7ba1\u5df2\u7ecf\u51fa\u73b0\u4e86\u591a\u79cd\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u63d0\u793a\u7b56\u7565\u4ee5\u4f18\u5316LLM\u7684\u54cd\u5e94\uff0c\u4f46\u8fd9\u4e9b\u65b9\u6cd5\u4e0e\u5b89\u5168\u4ee3\u7801\u751f\u6210\u4e4b\u95f4\u7684\u76f8\u4e92\u4f5c\u7528\u4ecd\u9700\u8fdb\u4e00\u6b65\u7814\u7a76\u3002\u76ee\u6807\uff1a\u672c\u7814\u7a76\u65e8\u5728\u63a2\u7a76\u4e0d\u540c\u63d0\u793a\u6280\u672f\u5bf9LLMs\u6839\u636eNL\u6307\u4ee4\u751f\u6210\u4ee3\u7801\u7684\u5b89\u5168\u6027\u5f71\u54cd\u3002\u65b9\u6cd5\uff1a\u9996\u5148\uff0c\u6211\u4eec\u8fdb\u884c\u7cfb\u7edf\u6587\u732e\u56de\u987e\uff0c\u4ee5\u8bc6\u522b\u9002\u7528\u4e8e\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u7684\u73b0\u6709\u63d0\u793a\u6280\u672f\u3002\u7136\u540e\uff0c\u6211\u4eec\u5728GPT-3\u3001GPT-3.5\u548cGPT-4\u6a21\u578b\u4e0a\u8bc4\u4f30\u8fd9\u4e9b\u6280\u672f\u4e2d\u7684\u90e8\u5206\uff0c\u4f7f\u7528\u4e00\u4e2a\u5305\u542b150\u4e2a\u4e0e\u5b89\u5168\u76f8\u5173\u7684\u4ee3\u7801\u751f\u6210NL\u63d0\u793a\u7684\u6570\u636e\u96c6\u3002\u7ed3\u679c\uff1a\u6211\u4eec\u7684\u5de5\u4f5c\uff081\uff09\u5bf9\u4ee3\u7801\u751f\u6210\u7684\u6f5c\u5728\u63d0\u793a\u6280\u672f\u8fdb\u884c\u4e86\u5206\u7c7b\uff0c\uff082\uff09\u9002\u5e94\u5e76\u8bc4\u4f30\u4e86\u8fd9\u4e9b\u6280\u672f\u5728\u5b89\u5168\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u4e2d\u7684\u8868\u73b0\uff0c\uff083\uff09\u89c2\u5bdf\u5230\u5728\u6d4b\u8bd5\u7684LLMs\u4e2d\uff0c\u5c24\u5176\u662f\u5728\u4f7f\u7528\u4e86
\u540d\u4e3a\u201c\u9012\u5f52\u6279\u8bc4\u4e0e\u6539\u8fdb\u201d\uff08RCI\uff09\u7684\u73b0\u6709\u6280\u672f\u540e\uff0c\u5b89\u5168\u6f0f\u6d1e\u6709\u6240\u51cf\u5c11\uff0c\u4e3aLLM\u751f\u6210\u4ee3\u7801\u5b89\u5168\u6027\u7684\u8ba8\u8bba\u63d0\u4f9b\u4e86\u6709\u4ef7\u503c\u7684\u89c1\u89e3\u3002|\n", "2407.07061": "|**2024-07-09**|**Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence**|Weize Chen et.al.|[2407.07061](http://arxiv.org/abs/2407.07061)|**[link](https://github.com/openbmb/ioa)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u8fc5\u901f\u53d1\u5c55\uff0c\u51fa\u73b0\u4e86\u80fd\u6548\u5353\u8d8a\u7684\u81ea\u4e3b\u4ee3\u7406\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u591a\u4ee3\u7406\u6846\u67b6\u5728\u6574\u5408\u6765\u81ea\u4e0d\u540c\u751f\u6001\u7cfb\u7edf\u7684\u9ad8\u80fd\u529b\u7b2c\u4e09\u65b9\u4ee3\u7406\u65f6\u9762\u4e34\u6311\u6218\uff0c\u901a\u5e38\u5c40\u9650\u4e8e\u81ea\u8eab\u5c01\u95ed\u73af\u5883\u3002\u5b83\u4eec\u5728\u6a21\u62df\u5206\u5e03\u5f0f\u73af\u5883\u65f6\u4e5f\u53d7\u9650\u4e8e\u5355\u8bbe\u5907\u8bbe\u7f6e\uff0c\u5e76\u4e14\u5f80\u5f80\u4f9d\u8d56\u786c\u7f16\u7801\u7684\u901a\u4fe1\u7ba1\u9053\uff0c\u96be\u4ee5\u9002\u5e94\u4efb\u52a1\u9700\u6c42\u7684\u53d8\u5316\u3002\u53d7\u4e92\u8054\u7f51\u7406\u5ff5\u542f\u53d1\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201c\u4ee3\u7406\u4e92\u8054\u7f51\u201d\uff08Internet of 
Agents\uff0cIoA\uff09\u7684\u65b0\u6846\u67b6\u3002IoA\u65e8\u5728\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u63d0\u4f9b\u4e00\u4e2a\u7075\u6d3b\u4e14\u53ef\u6269\u5c55\u7684\u5e73\u53f0\uff0c\u4fc3\u8fdb\u57fa\u4e8e\u8bed\u8a00\u6a21\u578b\u7684\u591a\u4ee3\u7406\u534f\u4f5c\u3002\u5b83\u5f15\u5165\u4e86\u4ee3\u7406\u96c6\u6210\u534f\u8bae\u3001\u5373\u65f6\u6d88\u606f\u67b6\u6784\u4ee5\u53ca\u52a8\u6001\u7684\u56e2\u961f\u534f\u4f5c\u548c\u5bf9\u8bdd\u6d41\u7a0b\u63a7\u5236\u673a\u5236\u3002\u901a\u8fc7\u5728\u901a\u7528\u52a9\u624b\u4efb\u52a1\u3001\u5177\u8eabAI\u4efb\u52a1\u548c\u68c0\u7d22\u589e\u5f3a\u751f\u6210\u57fa\u51c6\u4e0a\u7684\u5e7f\u6cdb\u5b9e\u9a8c\uff0c\u6211\u4eec\u8bc1\u660eIoA\u5728\u6027\u80fd\u4e0a\u6301\u7eed\u4f18\u4e8e\u73b0\u6709\u6700\u5148\u8fdb\u7684\u57fa\u7ebf\uff0c\u5c55\u793a\u4e86\u5176\u5728\u5f02\u6784\u4ee3\u7406\u4e4b\u95f4\u6709\u6548\u5408\u4f5c\u7684\u80fd\u529b\u3002IoA\u4ee3\u8868\u4e86\u671d\u7740\u5c06\u591a\u6837\u5316\u7684\u4ee3\u7406\u94fe\u63a5\u5728\u4e00\u4e2a\u7c7b\u4f3c\u4e92\u8054\u7f51\u7684\u73af\u5883\u4e2d\u8fc8\u8fdb\uff0c\u8ba9\u5b83\u4eec\u80fd\u591f\u65e0\u7f1d\u534f\u4f5c\u4ee5\u63d0\u5347\u6574\u4f53\u667a\u80fd\u548c\u529f\u80fd\u3002\u6211\u4eec\u7684\u4ee3\u7801\u5e93\u5df2\u53d1\u5e03\u5728\uff1ahttps://github.com/OpenBMB/IoA\u3002**|\n", "2407.07053": "|**2024-07-09**|**Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model**|Wenqi Zhang
et.al.|[2407.07053](http://arxiv.org/abs/2407.07053)|**[link](https://github.com/zwq2018/multi-modal-self-instruct)**|**\u5c3d\u7ba1\u5f53\u524d\u7684\u5927\u578b\u591a\u6a21\u6001\u6a21\u578b\uff08LMMs\uff09\u5df2\u7ecf\u80fd\u591f\u7406\u89e3\u81ea\u7136\u573a\u666f\u7684\u7167\u7247\u548c\u8096\u50cf\uff0c\u4f46\u5b83\u4eec\u5bf9\u62bd\u8c61\u56fe\u50cf\uff08\u5982\u56fe\u8868\u3001\u5730\u56fe\u6216\u5e03\u5c40\uff09\u7684\u7406\u89e3\u4ee5\u53ca\u89c6\u89c9\u63a8\u7406\u80fd\u529b\u4ecd\u7136\u76f8\u5f53\u521d\u7ea7\u3002\u5b83\u4eec\u5728\u5904\u7406\u65e5\u5e38\u4efb\u52a1\u65f6\u5e38\u5e38\u9047\u5230\u56f0\u96be\uff0c\u4f8b\u5982\u9605\u8bfb\u65f6\u949f\u65f6\u95f4\u3001\u7406\u89e3\u6d41\u7a0b\u56fe\u6216\u6839\u636e\u8def\u7ebf\u56fe\u89c4\u5212\u8def\u5f84\u3002\u9274\u4e8e\u6b64\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u591a\u6a21\u6001\u81ea\u6211\u6307\u5bfc\u7cfb\u7edf\uff0c\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u53ca\u5176\u4ee3\u7801\u80fd\u529b\u6765\u751f\u6210\u5927\u91cf\u7684\u62bd\u8c61\u56fe\u50cf\u548c\u65e5\u5e38\u573a\u666f\u4e0b\u7684\u89c6\u89c9\u63a8\u7406\u6307\u4ee4\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u8f7b\u677e\u521b\u5efa\u4e86\u4e00\u4e2a\u591a\u6a21\u6001\u57fa\u51c6\uff0c\u5305\u542b11,193\u4e2a\u6307\u4ee4\uff0c\u6db5\u76d6\u516b\u4e2a\u89c6\u89c9\u573a\u666f\uff1a\u56fe\u8868\u3001\u8868\u683c\u3001\u6a21\u62df\u5730\u56fe\u3001\u4eea\u8868\u677f\u3001\u6d41\u7a0b\u56fe\u3001\u5173\u7cfb\u56fe\u3001\u697c\u5c42\u5e73\u9762\u56fe\u548c\u89c6\u89c9\u8c1c\u9898\u3002 
\u8fd9\u4e2a\u7531\u7b80\u5355\u7ebf\u6761\u548c\u51e0\u4f55\u5143\u7d20\u6784\u6210\u7684\u57fa\u51c6\u63ed\u793a\u4e86\u6700\u5148\u8fdb\u7684LMM\uff08\u5982Claude-3.5-Sonnet\u548cGPT-4o\uff09\u5728\u62bd\u8c61\u56fe\u50cf\u7406\u89e3\u3001\u7a7a\u95f4\u5173\u7cfb\u63a8\u7406\u548c\u89c6\u89c9\u5143\u7d20\u8bc6\u522b\u65b9\u9762\u7684\u5c40\u9650\u6027\u3002\u6b64\u5916\uff0c\u4e3a\u4e86\u9a8c\u8bc1\u5408\u6210\u6570\u636e\u7684\u8d28\u91cf\uff0c\u6211\u4eec\u4f7f\u752862,476\u6761\u5408\u6210\u7684\u56fe\u8868\u3001\u8868\u683c\u548c\u8def\u7ebf\u56fe\u6307\u4ee4\u5bf9LMM\u8fdb\u884c\u5fae\u8c03\u3002\u7ed3\u679c\u663e\u793a\uff0c\u56fe\u8868\u7406\u89e3\u548c\u5730\u56fe\u5bfc\u822a\u6027\u80fd\u5f97\u5230\u4e86\u63d0\u5347\uff0c\u540c\u65f6\u4e5f\u8868\u660e\u8fd9\u5bf9\u5176\u4ed6\u89c6\u89c9\u63a8\u7406\u4efb\u52a1\u53ef\u80fd\u5177\u6709\u6f5c\u5728\u76ca\u5904\u3002\u6211\u4eec\u7684\u4ee3\u7801\u5df2\u5728\u4ee5\u4e0b\u94fe\u63a5\u63d0\u4f9b\uff1ahttps://github.com/zwq2018/Multi-modal-Self-instruct\u3002**|\n", "2407.07019": "|**2024-07-09**|**Using Large Language Models for Generating Smart Contracts for Health Insurance from Textual Policies**|Inwon Kang
et.al.|[2407.07019](http://arxiv.org/abs/2407.07019)|null|\u6211\u4eec\u7814\u7a76\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u81ea\u52a8\u751f\u6210\u57fa\u4e8e\u6587\u672c\u7684\u5065\u5eb7\u4fdd\u9669\u653f\u7b56\u7684\u81ea\u52a8\u5316\u4ee3\u7801\uff0c\u76ee\u6807\u662f\u533a\u5757\u94fe\u667a\u80fd\u5408\u7ea6\u3002\u667a\u80fd\u5408\u7ea6\u56e0\u5176\u4e0d\u53ef\u53d8\u6027\u3001\u53ef\u9a8c\u8bc1\u6027\u3001\u6269\u5c55\u6027\u548c\u65e0\u9700\u9884\u8bbe\u4fe1\u4efb\u7684\u7279\u6027\u800c\u88ab\u9009\u4e2d\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u6309\u6280\u672f\u590d\u6742\u5ea6\u9012\u589e\u751f\u6210\u8f93\u51fa\uff1a\uff081\uff09\u6587\u672c\u6458\u8981\uff0c\uff082\uff09\u58f0\u660e\u5f0f\u51b3\u7b56\u903b\u8f91\uff0c\u4ee5\u53ca\uff083\uff09\u5e26\u6709\u5355\u5143\u6d4b\u8bd5\u7684\u667a\u80fd\u5408\u7ea6\u4ee3\u7801\u3002\u6211\u4eec\u786e\u8ba4LLMs\u5728\u4efb\u52a1\uff081\uff09\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u800c\u7ed3\u6784\u5316\u7684\u8f93\u51fa\u6709\u52a9\u4e8e\u9a8c\u8bc1\u4efb\u52a1\uff082\uff09\u548c\uff083\uff09\u3002\u58f0\u660e\u5f0f\u8bed\u8a00\u5e38\u7528\u4e8e\u89c4\u8303\u533b\u7597\u653f\u7b56\uff0c\u4f46\u5728\u533a\u5757\u94fe\u4e0a\u7684\u6267\u884c\u8f83\u4e3a\u590d\u6742\uff0c\u56e0\u6b64\u4efb\u52a1\uff083\uff09\u65e8\u5728\u76f4\u63a5\u901a\u8fc7\u667a\u80fd\u5408\u7ea6\u81ea\u52a8\u5b9e\u73b0\u8fd9\u4e00\u8fc7\u7a0b\u3002\u6211\u4eec\u63d0\u51fa\u5b8c\u6574\u6027\u3001\u6b63\u786e\u6027\u3001\u6e05\u6670\u5ea6\u3001\u8bed\u6cd5\u548c\u529f\u80fd\u6027\u4ee3\u7801\u4f5c\u4e3a\u8bc4\u4f30\u6307\u6807\u3002\u6211\u4eec\u4f7f\u7528\u4e86\u6765\u81eaMedicare\u5b98\u65b9\u624b\u518c\u7684\u4e09\u4e2a\u5177\u6709\u4e0d\u540c\u96be\u5ea6\u7684\u4fdd\u9669\u653f\u7b56\u573a\u666f\u8fdb\u884c\u8bc4\u4f30\uff0c\u6d89\u53caGPT-3.5 Turbo\u3001GPT-3.5 Turbo 16K\u3001GPT-4\u3001GPT-4 
Turbo\u548cCodeLLaMA\u7b49\u6a21\u578b\u3002\u7ed3\u679c\u663e\u793a\uff0cLLMs\u5728\u751f\u6210\u6587\u672c\u6458\u8981\u65b9\u9762\u8868\u73b0\u826f\u597d\u3002\u5c3d\u7ba1\u4efb\u52a1\uff082\uff09\u5230\uff083\uff09\u7684\u8f93\u51fa\u53ef\u4ee5\u4f5c\u4e3a\u8d77\u70b9\uff0c\u4f46\u5b83\u4eec\u4ecd\u9700\u4eba\u5de5\u5ba1\u6838\uff1a\u5728\u67d0\u4e9b\u60c5\u51b5\u4e0b\uff0c\u5373\u4f7f\u201c\u53ef\u8fd0\u884c\u201d\u7684\u4ee3\u7801\u4e5f\u53ef\u80fd\u4ea7\u751f\u4e0d\u6b63\u786e\u7684\u7ed3\u679c\uff1b\u76ee\u6807\u8bed\u8a00\u7684\u6d41\u884c\u7a0b\u5ea6\u4f1a\u5f71\u54cd\u8f93\u51fa\u8d28\u91cf\uff1b\u66f4\u590d\u6742\u7684\u573a\u666f\u4ecd\u662f\u5f53\u524d\u7684\u4e00\u5927\u6311\u6218\u3002\u7136\u800c\uff0c\u6211\u4eec\u7684\u5b9e\u9a8c\u5c55\u793a\u4e86LLMs\u5728\u5c06\u6587\u672c\u6d41\u7a0b\u63cf\u8ff0\u8f6c\u5316\u4e3a\u667a\u80fd\u5408\u7ea6\u65b9\u9762\u7684\u6f5c\u529b\u3002|\n", "2407.07018": "|**2024-07-09**|**End-To-End Causal Effect Estimation from Unstructured Natural Language Data**|Nikita Dhawan 
et.al.|[2407.07018](http://arxiv.org/abs/2407.07018)|null|\u4e86\u89e3\u5e72\u9884\u63aa\u65bd\u7684\u6548\u679c\u5bf9\u4eba\u7c7b\u51b3\u7b56\u81f3\u5173\u91cd\u8981\u3002\u7136\u800c\uff0c\u5f53\u524d\u56e0\u679c\u6548\u5e94\u4f30\u8ba1\u65b9\u6cd5\u4f9d\u8d56\u4e8e\u624b\u52a8\u6536\u96c6\u548c\u7ed3\u6784\u5316\u6570\u636e\uff0c\u8fd9\u5bfc\u81f4\u7814\u7a76\u6210\u672c\u589e\u52a0\u3001\u5b8c\u6210\u65f6\u95f4\u5ef6\u957f\u3002\u6211\u4eec\u5c55\u793a\u4e86\u5982\u4f55\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5f00\u91c7\u5927\u89c4\u6a21\u3001\u591a\u6837\u5316\u7684\u89c2\u5bdf\u6027\u6587\u672c\u6570\u636e\uff0c\u4ee5\u5728\u9002\u5f53\u7684\u56e0\u679c\u5047\u8bbe\u4e0b\u751f\u6210\u4f4e\u6210\u672c\u7684\u56e0\u679c\u6548\u5e94\u4f30\u8ba1\u3002\u6211\u4eec\u63d0\u51faNATURAL\uff0c\u4e00\u4e2a\u57fa\u4e8eLLMs\u7684\u65b0\u578b\u56e0\u679c\u6548\u5e94\u4f30\u8ba1\u7b97\u6cd5\u5bb6\u65cf\uff0c\u9002\u7528\u4e8e\u5904\u7406\u672a\u7ed3\u6784\u5316\u7684\u6587\u672c\u6570\u636e\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5229\u7528LLMs\u7684\u6761\u4ef6\u5206\u5e03\uff08\u9488\u5bf9\u611f\u5174\u8da3\u7684\u53d8\u91cf\uff0c\u6839\u636e\u6587\u672c\u6570\u636e\uff09\u8f85\u52a9\u8ba1\u7b97\u7ecf\u5178\u7684\u56e0\u679c\u6548\u5e94\u4f30\u8ba1\u3002\u6211\u4eec\u514b\u670d\u4e86\u4e00\u7cfb\u5217\u6280\u672f\u6311\u6218\uff0c\u5982\u81ea\u52a8\u5316\u6570\u636e\u6574\u7406\u548c\u4f7f\u7528LLMs\u586b\u8865\u7f3a\u5931\u4fe1\u606f\u3002 
\u6211\u4eec\u51c6\u5907\u4e86\u516d\u4e2a\uff08\u4e24\u4e2a\u5408\u6210\u7684\u548c\u56db\u4e2a\u5b9e\u9645\u7684\uff09\u89c2\u5bdf\u6027\u6570\u636e\u96c6\uff0c\u5e76\u914d\u4ee5\u968f\u673a\u5bf9\u7167\u8bd5\u9a8c\u5f62\u5f0f\u7684\u771f\u5b9e\u6807\u7b7e\uff0c\u7cfb\u7edf\u5730\u8bc4\u4f30\u4e86\u6211\u4eec\u7ba1\u9053\u4e2d\u7684\u6bcf\u4e00\u6b65\u3002NATURAL\u4f30\u8ba1\u7b97\u6cd5\u8868\u73b0\u51fa\u8272\uff0c\u5176\u7ed3\u679c\u4e0e\u771f\u5b9e\u503c\u7684\u5dee\u8ddd\u4e0d\u8d85\u8fc73\u4e2a\u767e\u5206\u70b9\uff0c\u5305\u62ec\u5728\u5b9e\u9645\u7684\u4e09\u671f\u548c\u56db\u671f\u4e34\u5e8a\u8bd5\u9a8c\u4e2d\u3002\u8fd9\u4e9b\u7ed3\u679c\u8868\u660e\uff0c\u672a\u7ed3\u6784\u5316\u7684\u6587\u672c\u6570\u636e\u662f\u56e0\u679c\u6548\u5e94\u4fe1\u606f\u7684\u4e30\u5bcc\u6765\u6e90\uff0cNATURAL\u662f\u5229\u7528\u8fd9\u4e00\u8d44\u6e90\u7684\u81ea\u52a8\u5316\u6d41\u7a0b\u7684\u7b2c\u4e00\u6b65\u3002|\n", "2407.07890": "|**2024-07-10**|**Training on the Test Task Confounds Evaluation and Emergence**|Ricardo Dominguez-Olmedo 
et.al.|[2407.07890](http://arxiv.org/abs/2407.07890)|**[link](https://github.com/socialfoundations/training-on-the-test-task)**|**\u6211\u4eec\u7814\u7a76\u4e86\u4e00\u4e2a\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8bc4\u4f30\u4e2d\u7684\u6838\u5fc3\u95ee\u9898\uff0c\u79f0\u4e3a\u5728\u6d4b\u8bd5\u4efb\u52a1\u4e0a\u8bad\u7ec3\u3002\u8fd9\u5e76\u975e\u5982\u6570\u636e\u6cc4\u9732\u6216\u6c61\u67d3\u7b49\u4e0d\u5f53\u505a\u6cd5\uff0c\u800c\u662f\u4e00\u79cd\u9010\u6e10\u589e\u957f\u7684\u5305\u62ec\u4efb\u52a1\u76f8\u5173\u6570\u636e\u5728\u9884\u8bad\u7ec3\u9636\u6bb5\u7684\u6280\u672f\u3002\u6211\u4eec\u53d1\u73b0\uff0c\u5728\u6d4b\u8bd5\u4efb\u52a1\u4e0a\u8bad\u7ec3\u4f1a\u6df7\u6dc6\u6a21\u578b\u7684\u76f8\u5bf9\u8bc4\u4f30\u548c\u5173\u4e8e\u6d8c\u73b0\u80fd\u529b\u7684\u58f0\u660e\u3002\u6211\u4eec\u63d0\u51fa\uff0c\u4e0d\u540c\u6a21\u578b\u5bb6\u65cf\u4e4b\u95f4\u7684\u770b\u4f3c\u4f18\u52bf\u53ef\u80fd\u7531\u4ed6\u4eec\u5728\u6d4b\u8bd5\u4efb\u52a1\u4e0a\u7684\u8bad\u7ec3\u7a0b\u5ea6\u5dee\u5f02\u6240\u89e3\u91ca\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u6709\u6548\u65b9\u6cd5\uff0c\u5373\u5728\u6bd4\u8f83\u524d\u5bf9\u6bcf\u4e2a\u6a21\u578b\u8fdb\u884c\u76f8\u540c\u7684\u4efb\u52a1\u76f8\u5173\u6570\u636e\u5fae\u8c03\uff0c\u4ee5\u6821\u6b63\u8fd9\u79cd\u8bad\u7ec3\u3002\u7ed3\u679c\u663e\u793a\uff0c\u4e00\u65e6\u8c03\u6574\u4e86\u5728\u6d4b\u8bd5\u4efb\u52a1\u4e0a\u7684\u8bad\u7ec3\uff0c\u6d8c\u73b0\u884c\u4e3a\u7684\u5b9e\u4f8b\u5927\u591a\u6d88\u5931\u3002\u540c\u6837\u9002\u7528\u4e8e\u90a3\u4e9b\u65e0\u6cd5\u7528\u8bc4\u4ef7\u6307\u6807\u89e3\u91ca\u7684\u6d8c\u73b0\u884c\u4e3a\u62a5\u544a\u6848\u4f8b\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u63a8\u52a8\u4e86\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u65b0\u8bc4\u4ef7\u89c6\u89d2\uff0c\u5bf9\u57fa\u51c6\u6d4b\u8bd5\u548c\u6d8c\u73b0\u80fd\u529b\u7814\u7a76\u5177\u6709\u5e7f\u6cdb\u5f71\u54cd\u3002**|\n", "2407.07880": "|**2024-07-10**|**Towards Robust Alignment of Language Models: 
Distributionally Robustifying Direct Preference Optimization**|Junkang Wu et.al.|[2407.07880](http://arxiv.org/abs/2407.07880)|**[link](https://github.com/junkangwu/dr_dpo)**|**\u672c\u7814\u7a76\u5173\u6ce8\u5728\u8bad\u7ec3\u6570\u636e\u4e2d\u566a\u58f0\u5bf9Direct Preference Optimization (DPO)\u65b9\u6cd5\u7684\u6311\u6218\uff0c\u8be5\u65b9\u6cd5\u7528\u4e8e\u8c03\u6574\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4ee5\u7b26\u5408\u4eba\u7c7b\u504f\u597d\u3002\u6211\u4eec\u533a\u5206\u4e86\u4e24\u7c7b\u566a\u58f0\uff1a\u70b9\u566a\u58f0\uff0c\u6d89\u53ca\u4f4e\u8d28\u91cf\u7684\u6570\u636e\u70b9\uff1b\u548c\u6210\u5bf9\u566a\u58f0\uff0c\u5f71\u54cd\u504f\u597d\u7684\u6b63\u786e\u6392\u5e8f\u3002\u901a\u8fc7\u5206\u5e03\u5f0f\u9c81\u68d2\u4f18\u5316\uff08DRO\uff09\uff0c\u6211\u4eec\u589e\u5f3a\u4e86DPO\u62b5\u6297\u8fd9\u4e9b\u566a\u58f0\u7684\u80fd\u529b\u3002\u7406\u8bba\u5206\u6790\u63ed\u793a\uff0cDPO\u672c\u8d28\u4e0a\u8574\u542b\u4e86DRO\u539f\u7406\uff0c\u5bf9\u70b9\u566a\u58f0\u5177\u6709\u5929\u7136\u7684\u9c81\u68d2\u6027\uff0c\u5176\u4e2d\u6b63\u5219\u5316\u7cfb\u6570$\\beta$\u5728\u6297\u566a\u58f0\u65b9\u9762\u8d77\u5173\u952e\u4f5c\u7528\u3002\u5728\u6b64\u57fa\u7840\u4e0a\uff0c\u6211\u4eec\u63d0\u51fa\u5206\u5e03\u5f0f\u9c81\u68d2\u589e\u5f3a\u7684DPO\uff08Dr. DPO\uff09\uff0c\u5b83\u901a\u8fc7\u4f18\u5316\u6700\u574f\u60c5\u51b5\u7684\u6210\u5bf9\u573a\u666f\u6765\u96c6\u6210\u6210\u5bf9\u9c81\u68d2\u6027\u3002Dr. DPO\u4e2d\u7684\u65b0\u8d85\u53c2\u6570$\\beta'$\u5141\u8bb8\u5bf9\u6570\u636e\u5bf9\u53ef\u9760\u6027\u8fdb\u884c\u7cbe\u7ec6\u63a7\u5236\uff0c\u5e73\u8861\u4e86\u5728\u5608\u6742\u8bad\u7ec3\u73af\u5883\u4e2d\u7684\u63a2\u7d22\u4e0e\u5229\u7528\u3002\u5b9e\u8bc1\u8bc4\u4f30\u663e\u793a\uff0cDr. 
DPO\u663e\u8457\u63d0\u9ad8\u4e86\u751f\u6210\u6587\u672c\u7684\u8d28\u91cf\u548c\u54cd\u5e94\u51c6\u786e\u6027\uff0c\u65e0\u8bba\u5728\u6709\u566a\u58f0\u8fd8\u662f\u65e0\u566a\u58f0\u7684\u8bbe\u7f6e\u4e0b\u90fd\u8868\u73b0\u51fa\u8272\u3002\u4ee3\u7801\u5df2\u5728https://github.com/junkangwu/Dr_DPO\u4e0a\u63d0\u4f9b\u3002**|\n", "2407.07858": "|**2024-07-10**|**FACTS About Building Retrieval Augmented Generation-based Chatbots**|Rama Akkiraju et.al.|[2407.07858](http://arxiv.org/abs/2407.07858)|null|\u968f\u7740\u751f\u6210\u5f0f\u4eba\u5de5\u667a\u80fd\u9a71\u52a8\u7684\u4f01\u4e1a\u804a\u5929\u673a\u5668\u4eba\u65e5\u76ca\u6210\u4e3a\u63d0\u5347\u5458\u5de5\u751f\u4ea7\u529b\u7684\u5173\u952e\u5de5\u5177\uff0c\u57fa\u4e8e\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u7684\u3001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4ee5\u53ca\u5982Langchain\u548cLlamaindex\u4e4b\u7c7b\u7684orchestration\u6846\u67b6\u5728\u6784\u5efa\u8fd9\u4e9b\u804a\u5929\u673a\u5668\u4eba\u4e2d\u626e\u6f14\u4e86\u91cd\u8981\u89d2\u8272\u3002\u7136\u800c\uff0c\u521b\u5efa\u6709\u6548\u7684\u4f01\u4e1a\u804a\u5929\u673a\u5668\u4eba\u662f\u4e00\u9879\u6311\u6218\uff0c\u9700\u8981\u7cbe\u5fc3\u8bbe\u8ba1\u7684RAG\u7ba1\u9053\u5de5\u7a0b\u3002\u8fd9\u5305\u62ec\u5fae\u8c03\u5d4c\u5165\u548cLLMs\u3001\u4ece\u5411\u91cf\u6570\u636e\u5e93\u63d0\u53d6\u6587\u6863\u3001\u91cd\u8ff0\u67e5\u8be2\u3001\u91cd\u65b0\u6392\u540d\u7ed3\u679c\u3001\u8bbe\u8ba1\u63d0\u793a\u3001\u9075\u5b88\u6587\u6863\u8bbf\u95ee\u63a7\u5236\u3001\u63d0\u4f9b\u7b80\u6d01\u7684\u56de\u7b54\u3001\u5305\u542b\u5f15\u7528\u3001\u4fdd\u62a4\u4e2a\u4eba\u4fe1\u606f\u4ee5\u53ca\u6784\u5efaorchestration\u4ee3\u7406\u3002\u6211\u4eec\u57fa\u4e8e\u4e09\u4e2aNVIDIA\u804a\u5929\u673a\u5668\u4eba\uff08\u5206\u522b\u7528\u4e8eIT/HR\u798f\u5229\u3001\u8d22\u52a1\u6536\u76ca\u548c\u901a\u7528\u5185\u5bb9\uff09\u7684\u7ecf\u9a8c\uff0c\u63d0\u51fa\u4e86\u4e00\u79cd\u6784\u5efaRAG\u804a\u5929\u673a\u5668\u4eba\u7684\u6846\u6
7b6\u2014\u2014FACTS\uff08Freshness\u3001Architectures\u3001Cost\u3001Testing\u3001Security\uff09\u3002\u6211\u4eec\u7684\u8d21\u732e\u6709\u4e09\u65b9\u9762\uff1a\u9996\u5148\u4ecb\u7ecdFACTS\u6846\u67b6\uff0c\u5176\u6b21\u5217\u51fa\u5341\u4e94\u4e2aRAG\u7ba1\u9053\u63a7\u5236\u70b9\uff0c\u6700\u540e\u63d0\u4f9b\u4e86\u5173\u4e8e\u5927\u6a21\u578b\u548c\u5c0f\u6a21\u578b\u5728\u51c6\u786e\u6027\u548c\u5ef6\u8fdf\u4e4b\u95f4\u6743\u8861\u7684\u5b9e\u8bc1\u7ed3\u679c\u3002\u636e\u6211\u4eec\u6240\u77e5\uff0c\u8fd9\u662f\u9996\u7bc7\u5168\u9762\u63a2\u8ba8\u6784\u5efa\u5b89\u5168\u4f01\u4e1a\u7ea7\u804a\u5929\u673a\u5668\u4eba\u7684\u65b9\u6cd5\u548c\u89e3\u51b3\u65b9\u6848\u7684\u8bba\u6587\u3002|\n", "2407.07852": "|**2024-07-10**|**OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training**|Sami Jaghouar et.al.|[2407.07852](http://arxiv.org/abs/2407.07852)|**[link](https://github.com/PrimeIntellect-ai/OpenDiLoCo)**|**OpenDiLoCo\u662f\u4e00\u4e2a\u5f00\u6e90\u7684\u5206\u5e03\u5f0f\u4f4e\u901a\u4fe1\uff08DiLoCo\uff09\u8bad\u7ec3\u65b9\u6cd5\u7684\u5b9e\u73b0\u548c\u590d\u5236\uff0c\u9488\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\u3002\u6211\u4eec\u63d0\u4f9b\u4e86\u53ef\u590d\u73b0\u7684DiLoCo\u5b9e\u9a8c\uff0c\u901a\u8fc7Hivemind\u5e93\u6784\u5efa\u4e86\u4e00\u4e2a\u53ef\u6269\u5c55\u7684\u53bb\u4e2d\u5fc3\u5316\u8bad\u7ec3\u6846\u67b6\u3002\u6211\u4eec\u5728\u4e24\u4e2a\u5927\u6d32\u548c\u4e09\u4e2a\u56fd\u5bb6\u4e4b\u95f4\u8bad\u7ec3\u6a21\u578b\uff0c\u540c\u65f6\u4fdd\u630190-95%\u7684\u8ba1\u7b97\u8d44\u6e90\u5229\u7528\u7387\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fdb\u884c\u4e86\u5173\u4e8e\u7b97\u6cd5\u8ba1\u7b97\u6548\u7387\u3001\u5de5\u4f5c\u5668\u6570\u91cf\u53ef\u6269\u5c55\u6027\u7684\u7814\u7a76\uff0c\u5e76\u8868\u660e\u5176\u68af\u5ea6\u53ef\u4ee5\u4f7f\u7528FP16\u8fdb\u884c\u5168\u5f52\u4e00\u5316\u800c\u4e0d\u4f1a\u5f71\u54cd\u6027\u80fd\u3002\u6700\u540e\uff0c\u6211\u4eec\u5c06OpenDiLoCo\u6269\u5c55\u5230\u539f\u59cb\u
5de5\u4f5c\u7684\u4e09\u500d\u89c4\u6a21\uff0c\u8bc1\u660e\u4e86\u5b83\u5728\u767e\u4ebf\u53c2\u6570\u6a21\u578b\u4e0a\u7684\u6709\u6548\u6027\u3002**|\n", "2407.07845": "|**2024-07-10**|**Natural Language Mechanisms via Self-Resolution with Foundation Models**|Nicolas Della Penna et.al.|[2407.07845](http://arxiv.org/abs/2407.07845)|null|\u5728\u5b9e\u9645\u64cd\u4f5c\u4e2d\uff0c\u4ee3\u7406\u4eba\u901a\u5e38\u53d7\u9650\u4e8e\u8bf8\u5982\u4ea4\u6613\u6216\u8ba2\u5355\u4e4b\u7c7b\u7684\u6709\u9650\u62a5\u544a\u683c\u5f0f\uff0c\u8fd9\u53ef\u80fd\u9650\u5236\u4e86\u4ed6\u4eec\u8868\u8fbe\u4fe1\u606f\u7684\u80fd\u529b\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u578b\u673a\u5236\uff0c\u5b83\u4fc3\u4f7f\u4ee3\u7406\u4eba\u4ee5\u81ea\u7136\u8bed\u8a00\u63d0\u4ea4\u62a5\u544a\uff0c\u5e76\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u5f3a\u5927\u529f\u80fd\u6765\u9009\u62e9\u7ed3\u679c\u548c\u5206\u914d\u62a5\u916c\u3002\u6211\u4eec\u786e\u5b9a\u4e86\u8fd9\u4e9b\u673a\u5236\u5728LLM\u4f5c\u4e3a\u826f\u597d\u7684\u4e16\u754c\u6a21\u578b\u4ee5\u53ca\u5f3a\u70c8\u7684\u8de8\u4ee3\u7406\u4fe1\u606f\u8fc7\u5ea6\u786e\u5b9a\u6761\u4ef6\u4e0b\u7684\u6fc0\u52b1\u517c\u5bb9\u6027\u548c\u6548\u7387\u7684\u5fc5\u8981\u6761\u4ef6\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u5f53\u4f20\u7edf\u9884\u6d4b\u5e02\u573a\u5728\u4fe1\u53f7\u7ed3\u6784\u4e0a\u5b58\u5728\u95ee\u9898\u65f6\uff0c\u8fd9\u4e9b\u57fa\u4e8eLLM\u7684\u673a\u5236\u80fd\u591f\u6210\u529f\u5730\u6574\u5408\u4fe1\u606f\u3002|\n", "2407.07810": "|**2024-07-10**|**Transformer Alignment in Large Language Models**|Murdock Aubry 
et.al.|[2407.07810](http://arxiv.org/abs/2407.07810)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\uff0c\u6df1\u5165\u7406\u89e3\u5176\u5185\u90e8\u673a\u5236\u81f3\u5173\u91cd\u8981\u3002\u6211\u4eec\u89c6LLMs\u4e3a\u9ad8\u7ef4\u7a7a\u95f4\u4e2d\u7684\u79bb\u6563\u3001\u8026\u5408\u7684\u975e\u7ebf\u6027\u52a8\u529b\u7cfb\u7edf\uff0c\u901a\u8fc7\u7814\u7a76tokens\u5728Transformer\u5757\u4e2d\u7684\u8f68\u8ff9\uff0c\u5e76\u6cbf\u7740\u8fd9\u4e9b\u8f68\u8ff9\u7ebf\u6027\u5316\u7cfb\u7edf\uff0c\u5229\u7528\u96c5\u53ef\u6bd4\u77e9\u9635\u8fdb\u884c\u5206\u6790\u3002\u5728\u5bf938\u4e2a\u516c\u5f00\u53ef\u7528\u7684LLMs\u8fdb\u884c\u7814\u7a76\u540e\uff0c\u6211\u4eec\u89c2\u5bdf\u5230\u6b8b\u5dee\u96c5\u53ef\u6bd4\u77e9\u9635\u7684\u4e0a\u5de6\u548c\u53f3\u5947\u5f02\u5411\u91cf\u4e4b\u95f4\u7684\u5bf9\u9f50\uff0c\u4ee5\u53ca\u7ebf\u6027\u6027\u548c\u5c42\u5185\u6307\u6570\u589e\u957f\u7684\u51fa\u73b0\u3002\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u6211\u4eec\u53d1\u73b0\u5bf9\u9f50\u5ea6\u7684\u63d0\u9ad8\u4e0e\u6a21\u578b\u6027\u80fd\u5448\u6b63\u76f8\u5173\u3002\u8bad\u7ec3\u540e\u7684\u8bc4\u4f30\u663e\u793a\uff0c\u76f8\u6bd4\u4e8e\u968f\u673a\u521d\u59cb\u5316\u6743\u91cd\u65f6\u7684\u6307\u6807\uff0c\u6709\u663e\u8457\u6539\u5584\uff0c\u8fd9\u5f3a\u8c03\u4e86\u8bad\u7ec3\u5728Transformer\u67b6\u6784\u4e2d\u7684\u91cd\u8981\u5f71\u54cd\u3002\u8fd9\u4e9b\u53d1\u73b0\u63ed\u793a\u4e86\u4e00\u79cd\u4ee5\u524d\u672a\u88ab\u5145\u5206\u8ba4\u8bc6\u7684\u89c4\u5f8b\u6027\uff0c\u5f3a\u5316\u4e86\u52a8\u529b\u5b66\u89e3\u91ca\uff0c\u5e76\u4e3a\u8fdb\u4e00\u6b65\u7406\u89e3\u548c\u4f18\u5316LLM\u67b6\u6784\u94fa\u5e73\u4e86\u9053\u8def\u3002|\n", "2407.07799": "|**2024-07-10**|**Attribute or Abstain: Large Language Models as Long Document Assistants**|Jan Buchmann 
et.al.|[2407.07799](http://arxiv.org/abs/2407.07799)|**[link](https://github.com/ukplab/arxiv2024-attribute-or-abstain)**|**\u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u80fd\u591f\u8f85\u52a9\u5904\u7406\u957f\u7bc7\u6587\u6863\uff0c\u4f46\u5b83\u4eec\u4e5f\u5b58\u5728\u80e1\u8a00\u4e71\u8bed\u7684\u95ee\u9898\u3002\u589e\u52a0\u53ef\u4fe1\u5ea6\u7684\u65b9\u6cd5\u662f\u901a\u8fc7\u63d0\u4f9b\u8bc1\u636e\u652f\u6301\u54cd\u5e94\uff0c\u63d0\u9ad8\u53ef\u9a8c\u8bc1\u6027\u3002\u5f53\u524d\u7684\u5f52\u56e0\u65b9\u6cd5\u4ec5\u5728\u57fa\u4e8e\u68c0\u7d22\u7684\u751f\u6210\uff08RAG\uff09\u73af\u5883\u4e2d\u8bc4\u4f30\u8fc7\uff0c\u8fd9\u4e0e\u65e0\u9700\u68c0\u7d22\u7684\u957f\u6587\u6863\u573a\u666f\u4e0d\u540c\uff0c\u53ef\u80fd\u4ecd\u6709\u5e94\u7528\u4ef7\u503c\u3002\u56e0\u6b64\uff0c\u7f3a\u4e4f\u9488\u5bf9\u957f\u6587\u6863\u7684\u5f52\u56e0\u4e13\u95e8\u8bc4\u4f30\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51faLAB\uff0c\u4e00\u4e2a\u5305\u542b6\u4e2a\u591a\u6837\u5316\u7684\u957f\u6587\u6863\u4efb\u52a1\u7684\u57fa\u51c6\uff0c\u5e76\u5728\u56db\u79cd\u4e0d\u540c\u5927\u5c0f\u7684LLM\uff08\u5373\u63d0\u793a\u548c\u5fae\u8c03\uff09\u4e0a\u8bd5\u9a8c\u4e86\u4e0d\u540c\u7684\u5f52\u56e0\u65b9\u6cd5\u3002\u7814\u7a76\u7ed3\u679c\u663e\u793a\uff0c\u4e00\u6b65\u751f\u6210\u5f15\u7528\uff08citation\uff0c\u5373\u540c\u65f6\u8fdb\u884c\u54cd\u5e94\u751f\u6210\u548c\u8bc1\u636e\u63d0\u53d6\uff09\u7684\u8868\u73b0\u6700\u4f73\u3002\u6211\u4eec\u8fd8\u63a2\u7a76\u4e86\u201c\u8ff7\u5931\u5728\u4e2d\u95f4\u201d\u73b0\u8c61\u662f\u5426\u9002\u7528\u4e8e\u5f52\u56e0\uff0c\u4f46\u672a\u53d1\u73b0\u8fd9\u79cd\u60c5\u51b5\u3002\u6b64\u5916\uff0c\u6211\u4eec\u53d1\u73b0\u8bc1\u636e\u8d28\u91cf\u5728\u7b80\u5355\u54cd\u5e94\u7684\u573a\u666f\u4e0b\u53ef\u4ee5\u9884\u6d4b\u54cd\u5e94\u8d28\u91cf\uff0c\u4f46\u5bf9\u4e8e\u590d\u6742\u54cd\u5e94\u5219\u4e0d\u7136\uff0c\u56e0\u4e3a\u6a21\u578b\u5728\u4e3a\u590d\u6742\u4e3b\u5f20\u63d0\u4f9b\u8bc1\u636e\u65f6\u9762\u4e34\u
6311\u6218\u3002\u6211\u4eec\u516c\u5f00\u4e86\u4ee3\u7801\u548c\u6570\u636e\uff0c\u4ee5\u4f9b\u8fdb\u4e00\u6b65\u7814\u7a76\u3002**|\n", "2407.07796": "|**2024-07-11**|**Evaluating Large Language Models with Grid-Based Game Competitions: An Extensible LLM Benchmark and Leaderboard**|Oguzhan Topsakal et.al.|[2407.07796](http://arxiv.org/abs/2407.07796)|**[link](https://github.com/research-outcome/llm-game-benchmark)**|**\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u4e14\u53ef\u6269\u5c55\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u57fa\u51c6\u6d4b\u8bd5\uff0c\u901a\u8fc7\u7f51\u683c\u578b\u6e38\u620f\u5982\u4e95\u5b57\u68cb\u3001\u8fde\u63a5\u56db\u548c\u56f4\u68cb\u8fdb\u884c\u3002\u5f00\u6e90\u7684\u6e38\u620f\u6a21\u62df\u4ee3\u7801\u5728GitHub\u4e0a\u63d0\u4f9b\uff0c\u5141\u8bb8LLMs\u7ade\u6280\uff0c\u5e76\u751f\u6210JSON\u3001CSV\u3001TXT\u548cPNG\u683c\u5f0f\u7684\u8be6\u7ec6\u6570\u636e\u6587\u4ef6\uff0c\u7528\u4e8e\u6392\u884c\u699c\u6392\u540d\u548c\u8fdb\u4e00\u6b65\u5206\u6790\u3002\u6211\u4eec\u5c55\u793a\u4e86\u5305\u62ecAnthropic\u7684Claude 3.5 Sonnet\u548cClaude 3 Sonnet\uff0cGoogle\u7684Gemini 1.5 Pro\u548cGemini 1.5 Flash\uff0cOpenAI\u7684GPT-4 
Turbo\u548cGPT-4o\uff0c\u4ee5\u53caMeta\u7684Llama3-70B\u5728\u5185\u7684\u9886\u5148LLM\u4e4b\u95f4\u7684\u6bd4\u8d5b\u7ed3\u679c\u3002\u6211\u4eec\u9f13\u52b1\u5176\u4ed6LLM\u63d0\u4ea4\u7ed3\u679c\u3002\u603b\u5171\u8fdb\u884c\u4e862,310\u573a\u6a21\u62df\u6bd4\u8d5b\uff08\u6bcf\u5bf9\u6a21\u578b\u8fdb\u884c5\u8f6e\uff0c\u51717\u4e2a\u6a21\u578b\u95f4\u7684\u5bf9\u5c40\uff0c\u4ee5\u53ca\u4e0e\u968f\u673a\u73a9\u5bb6\u7684\u6bd4\u8d5b\uff09\uff0c\u6db5\u76d6\u4e09\u79cd\u7c7b\u578b\u7684\u6e38\u620f\uff0c\u4f7f\u7528\u4e86\u5217\u8868\u3001\u63d2\u56fe\u548c\u56fe\u50cf\u4e09\u79cd\u63d0\u793a\u65b9\u5f0f\u3002\u7ed3\u679c\u663e\u793a\uff0cLLM\u5728\u4e0d\u540c\u6e38\u620f\u548c\u63d0\u793a\u7c7b\u578b\u4e0b\u7684\u6027\u80fd\u5b58\u5728\u663e\u8457\u5dee\u5f02\uff0c\u5206\u6790\u5185\u5bb9\u5305\u62ec\u80dc\u7387\u3001\u9519\u5931\u673a\u4f1a\u548c\u65e0\u6548\u52a8\u4f5c\u3002\u6392\u884c\u699c\u548c\u7ed3\u679c\u77e9\u9635\u7684\u8be6\u7ec6\u6570\u636e\u4f5c\u4e3a\u5f00\u653e\u8bbf\u95ee\u6570\u636e\u5728GitHub\u4e0a\u63d0\u4f9b\u3002\u8fd9\u9879\u7814\u7a76\u52a0\u6df1\u4e86\u6211\u4eec\u5bf9LLM\u5728\u672a\u4e13\u95e8\u8bad\u7ec3\u7684\u6e38\u620f\u4e2d\u7684\u80fd\u529b\u7684\u7406\u89e3\uff0c\u6709\u52a9\u4e8e\u8bc4\u4f30\u5b83\u4eec\u7684\u89c4\u5219\u7406\u89e3\u80fd\u529b\u548c\u6218\u7565\u601d\u7ef4\u3002\u5728\u901a\u5411\u4eba\u5de5\u667a\u80fd\u901a\u7528\u6027\u7684\u9053\u8def\u4e0a\uff0c\u8fd9\u9879\u7814\u7a76\u4e3a\u672a\u6765\u63a2\u7d22\u5b83\u4eec\u5728\u590d\u6742\u51b3\u7b56\u573a\u666f\u4e2d\u7684\u5b9e\u7528\u6027\u5960\u5b9a\u4e86\u57fa\u7840\uff0c\u63ed\u793a\u4e86\u5b83\u4eec\u7684\u6218\u7565\u601d\u8003\u80fd\u529b\uff0c\u5e76\u4e3a\u6df1\u5165\u63a2\u7a76LLM\u5728\u57fa\u4e8e\u6e38\u620f\u6846\u67b6\u5185\u7684\u5c40\u9650\u6027\u63d0\u4f9b\u4e86\u65b9\u5411\u3002**|\n", "2407.07791": "|**2024-07-10**|**Flooding Spread of Manipulated Knowledge in LLM-Based Multi-Agent Communities**|Tianjie Ju 
et.al.|[2407.07791](http://arxiv.org/abs/2407.07791)|**[link](https://github.com/Jometeorie/KnowledgeSpread)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u591a\u4ee3\u7406\u7cfb\u7edf\u4e2d\u7684\u8fc5\u901f\u5e94\u7528\uff0c\u5b83\u4eec\u5728\u534f\u4f5c\u95ee\u9898\u89e3\u51b3\u548c\u81ea\u4e3b\u8c08\u5224\u7b49\u9886\u57df\u7684\u51fa\u8272\u6027\u80fd\u5f15\u8d77\u4e86\u5173\u6ce8\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u57fa\u4e8eLLM\u7684\u591a\u4ee3\u7406\u7cfb\u7edf\u7684\u5b89\u5168\u95ee\u9898\u5c1a\u672a\u5f97\u5230\u5145\u5206\u7814\u7a76\uff0c\u5c24\u5176\u662f\u5728\u77e5\u8bc6\u64cd\u7eb5\u4f20\u64ad\u65b9\u9762\u3002\u672c\u6587\u901a\u8fc7\u6784\u5efa\u8be6\u7ec6\u7684\u5a01\u80c1\u6a21\u578b\u548c\u6a21\u62df\u73af\u5883\uff0c\u6a21\u62df\u73b0\u5b9e\u4e16\u754c\u4e2d\u7684\u591a\u4ee3\u7406\u90e8\u7f72\u5728\u53ef\u4fe1\u5e73\u53f0\u4e0a\uff0c\u63a2\u8ba8\u8fd9\u4e00\u5173\u952e\u95ee\u9898\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u4e24\u9636\u6bb5\u653b\u51fb\u65b9\u6cd5\uff0c\u5305\u62ec\u8bf4\u670d\u6027\u6ce8\u5165\u548c\u64cd\u7eb5\u77e5\u8bc6\u6ce8\u5165\uff0c\u6765\u7cfb\u7edf\u5730\u63a2\u7a76\u5728\u65e0\u660e\u786e\u63d0\u793a\u64cd\u7eb5\u7684\u60c5\u51b5\u4e0b\uff0c\u5982\u4f55\u6f5c\u5728\u5730\u4f20\u64ad\u64cd\u7eb5\u77e5\u8bc6\uff08\u5982\u865a\u6784\u548c\u6709\u5bb3\u77e5\u8bc6\uff09\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5229\u7528\u4e86LLMs\u5904\u7406\u4e16\u754c\u77e5\u8bc6\u56fa\u6709\u7684\u6f0f\u6d1e\uff0c\u653b\u51fb\u8005\u53ef\u4ee5\u501f\u6b64\u65e0\u610f\u8bc6\u5730\u4f20\u64ad\u7f16\u9020\u7684\u4fe1\u606f\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u653b\u51fb\u65b9\u6cd5\u80fd\u591f\u6210\u529f\u8bf1\u5bfc\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u5728\u4ea4\u6d41\u4e2d\u4f20\u64ad\u8fd9\u4e24\u79cd\u64cd\u7eb5\u7684\u77e5\u8bc6\uff0c\u540c\u65f6\u4e0d\u4f1a\u663e\u8457\u964d\u4f4e\u5b83\u4eec\u7684\u57fa\u7840\u529f\u80fd\u3002\u6b64\u5916\uff0c\u6211\
u4eec\u53d1\u73b0\u8fd9\u4e9b\u64cd\u7eb5\u4f1a\u6301\u7eed\u5b58\u5728\u4e8e\u6d41\u884c\u7684\u68c0\u7d22\u589e\u5f3a\u751f\u6210\u6846\u67b6\u4e2d\uff0c\u5373\u4f7f\u4ea4\u4e92\u7ed3\u675f\uff0c\u82e5\u5e72\u826f\u6027\u4ee3\u7406\u4e5f\u53ef\u80fd\u7ee7\u7eed\u53d7\u5230\u64cd\u7eb5\u804a\u5929\u8bb0\u5f55\u7684\u5f71\u54cd\u3002\u6211\u4eec\u7684\u53d1\u73b0\u63ed\u793a\u4e86LLM\u591a\u4ee3\u7406\u7cfb\u7edf\u4e2d\u7684\u91cd\u5927\u5b89\u5168\u98ce\u9669\uff0c\u5f3a\u8c03\u4e86\u5bf9\u64cd\u7eb5\u77e5\u8bc6\u4f20\u64ad\u8fdb\u884c\u5f3a\u5927\u9632\u5fa1\u7684\u8feb\u5207\u9700\u6c42\uff0c\u6bd4\u5982\u5f15\u5165\u201c\u5b88\u62a4\u201d\u4ee3\u7406\u548c\u5148\u8fdb\u7684\u4e8b\u5b9e\u6838\u67e5\u5de5\u5177\u3002**|\n", "2407.07778": "|**2024-07-10**|**WorldAPIs: The World Is Worth How Many APIs? A Thought Experiment**|Jiefu Ou et.al.|[2407.07778](http://arxiv.org/abs/2407.07778)|null|\u672c\u6587\u63a2\u8ba8\u4e86\u5728\u7269\u7406\u73af\u5883\u4e2d\u90e8\u7f72\u4eba\u5de5\u667a\u80fd\uff08AI\uff09\u4ee3\u7406\u65f6\u6240\u9700\u7684\u57fa\u672c\u64cd\u4f5c\uff08API\uff09\u6570\u91cf\u548c\u8bbe\u8ba1\u95ee\u9898\u3002\u7814\u7a76\u8005\u8bbe\u60f3\uff0c\u5982\u679cwikiHow\u6559\u7a0b\u6db5\u76d6\u4e86\u5e7f\u6cdb\u7684\u7528\u6237\u81ea\u7f16\u4efb\u52a1\uff0c\u90a3\u4e48\u8fd9\u4e9b\u4efb\u52a1\u6240\u9700\u7684API\u8303\u56f4\u662f\u4ec0\u4e48\u3002\u4ed6\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b9\u6cd5\uff0c\u901a\u8fc7\u5c06wikiHow\u6307\u4ee4\u4e0e\u7f6e\u8eab\u4e8e\u73af\u5883\u4e2d\u7684\u4ee3\u7406\u7b56\u7565\u5173\u8054\uff0c\u8fed\u4ee3\u5730\u751f\u6210\u65b0\u7684API\u3002\u501f\u52a9\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u4f53\u611f\u89c4\u5212\u65b9\u9762\u7684\u6700\u65b0\u6210\u5c31\uff0c\u7814\u7a76\u8005\u63d0\u8bae\u4f7f\u7528\u5c11\u91cf\u6837\u4f8b\u63d0\u793aGPT-4\u751f\u6210Python\u4ee3\u7801\u4f5c\u4e3a\u4ee3\u7406\u7b56\u7565\uff0c\u5e76\u901a\u8fc7\u4ee5\u4e0b\u6b65\u9aa4\u6269\u5c55API\u5e93\uff1a1\uff09\u91cd\u
7528\u521d\u59cbAPI\u96c6\uff1b2\uff09\u5728\u5fc5\u8981\u65f6\u521b\u5efa\u65b0\u7684API\u8c03\u7528\u3002\u5b9e\u9a8c\u5173\u6ce8\u7684\u662f\u5b9a\u4e49API\uff0c\u800c\u975e\u5176\u5b9e\u73b0\u6027\u3002\u5728\u4e00\u5c0f\u90e8\u5206wikiHow\u6559\u7a0b\u4e0a\u5e94\u7528\u8be5\u65b9\u6cd5\u540e\uff0c\u53d1\u73b0\u9700\u8981300\u591a\u4e2aAPI\u6765\u6355\u6349\u73b0\u5b9e\u4e16\u754c\u4e2d\u7684\u591a\u6837\u4efb\u52a1\u3002\u81ea\u52a8\u548c\u4eba\u5de5\u5206\u6790\u663e\u793a\uff0c\u63d0\u51fa\u7684\u7ba1\u9053\u80fd\u6709\u6548\u590d\u7528\u548c\u521b\u9020API\u3002\u8fdb\u4e00\u6b65\u7684\u4eba\u5de5\u5ba1\u67e5\u53d1\u73b0\uff0c\u73b0\u6709\u7684\u6a21\u62df\u5668\u4ec5\u652f\u6301\u8bf1\u5bfc\u51fa\u7684API\u7684\u4e00\u5c0f\u90e8\u5206\uff08\u524d50\u4e2a\u5e38\u7528API\u4e2d\u76849\u4e2a\uff09\uff0c\u8fd9\u4fc3\u4f7f\u5f00\u53d1\u66f4\u4e30\u5bcc\u7684\u4f53\u611f\u73af\u5883\u3002|\n", "2407.08739": "|**2024-07-11**|**MAVIS: Mathematical Visual Instruction Tuning**|Renrui Zhang et.al.|[2407.08739](http://arxiv.org/abs/2407.08739)|**[link](https://github.com/zrrskywalker/mavis)**|**### \u80cc\u666f 
\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u8fd1\u5e74\u6765\u5728\u5b66\u672f\u754c\u548c\u5de5\u4e1a\u754c\u5f15\u8d77\u4e86\u5e7f\u6cdb\u5173\u6ce8\u3002\u5c3d\u7ba1\u5b83\u4eec\u5728\u591a\u6a21\u6001\u573a\u666f\u4e2d\u7684\u8868\u73b0\u7a81\u51fa\uff0c\u4f46\u5bf9\u6570\u5b66\u56fe\u89e3\u7684\u6570\u5b66\u95ee\u9898\u6c42\u89e3\u80fd\u529b\u7814\u7a76\u5c1a\u663e\u4e0d\u8db3\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u6307\u51fa\u4e86MLLM\u5728\u6570\u5b66\u89c6\u89c9\u9886\u57df\u7684\u4e09\u4e2a\u5173\u952e\u6539\u8fdb\u9886\u57df\uff1a\u6570\u5b66\u56fe\u89e3\u7684\u89c6\u89c9\u7f16\u7801\u3001\u56fe\u89e3\u4e0e\u8bed\u8a00\u7684\u5bf9\u9f50\u4ee5\u53ca\u6570\u5b66\u63a8\u7406\u6280\u80fd\u3002\u8fd9\u4fc3\u4f7f\u6211\u4eec\u9700\u8981\u5927\u89c4\u6a21\u3001\u9ad8\u8d28\u91cf\u7684\u89c6\u89c9\u6570\u5b66\u6570\u636e\u548c\u8bad\u7ec3\u6d41\u7a0b\u3002\u672c\u6587\u63d0\u51faMAVIS\uff08Mathematical VISual instruction tuning for MLLMs\uff09\uff0c\u4e00\u4e2a\u9488\u5bf9MLLM\u7684\u6570\u5b66\u89c6\u89c9\u6307\u5bfc\u8c03\u53c2\u8303\u5f0f\uff0c\u5305\u62ec\u4e00\u7cfb\u5217\u6570\u5b66\u89c6\u89c9\u6570\u636e\u96c6\u548c\u4e13\u95e8\u7684MLLM\u3002 ### \u65b9\u6cd5 
MAVIS\u5206\u4e3a\u4e09\u4e2a\u9636\u6bb5\u8fdb\u884c\u4ece\u5934\u5f00\u59cb\u7684\u8bad\u7ec3\u3002\u9996\u5148\uff0c\u6211\u4eec\u521b\u5efa\u4e86MAVIS-Caption\uff0c\u5305\u542b558,000\u4e2a\u56fe\u89e3-\u63cf\u8ff0\u5bf9\uff0c\u901a\u8fc7\u5bf9\u6bd4\u5b66\u4e60\u6765\u5fae\u8c03\u4e13\u4e3a\u6570\u5b66\u8bbe\u8ba1\u7684\u89c6\u89c9\u7f16\u7801\u5668\uff08CLIP-Math\uff09\uff0c\u4ee5\u63d0\u5347\u56fe\u89e3\u7684\u89c6\u89c9\u7406\u89e3\u80fd\u529b\u3002\u5176\u6b21\uff0c\u5229\u7528MAVIS-Caption\uff0c\u6211\u4eec\u901a\u8fc7\u6295\u5f71\u5c42\u5c06CLIP-Math\u4e0e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u8fdb\u884c\u5173\u8054\uff0c\u589e\u5f3a\u6570\u5b66\u9886\u57df\u7684\u89c6\u89c9\u8bed\u8a00\u5bf9\u9f50\u3002\u6700\u540e\uff0c\u6211\u4eec\u5f15\u5165MAVIS-Instruct\uff0c\u5305\u542b900,000\u4e2a\u7cbe\u5fc3\u6536\u96c6\u548c\u6807\u6ce8\u7684\u89c6\u89c9\u6570\u5b66\u95ee\u9898\uff0c\u7528\u4e8e\u6700\u7ec8\u6307\u5bfc\u8c03\u53c2\uff0c\u4ee5\u589e\u5f3aMLLM\u7684\u7a33\u5065\u6570\u5b66\u63a8\u7406\u80fd\u529b\u3002\u5728MAVIS-Instruct\u4e2d\uff0c\u6211\u4eec\u63d0\u4f9b\u4e86\u6bcf\u4e2a\u95ee\u9898\u7684\u5b8c\u6574\u94fe\u5f0f\u601d\u8003\uff08Chain-of-Thought, CoT\uff09\u7406\u7531\uff0c\u5e76\u51cf\u5c11\u6587\u672c\u5197\u4f59\uff0c\u4f7f\u6a21\u578b\u66f4\u4e13\u6ce8\u4e8e\u89c6\u89c9\u5143\u7d20\u3002 ### \u7ed3\u679c \u6570\u636e\u548c\u6a21\u578b\u5df2\u53d1\u5e03\u5728https://github.com/ZrrSkywalker/MAVIS\u3002\u901a\u8fc7MAVIS\uff0c\u6211\u4eec\u65e8\u5728\u586b\u8865\u6570\u5b66\u89c6\u89c9\u7406\u89e3\u7684\u7a7a\u767d\uff0c\u63d0\u5347MLLM\u5728\u89e3\u51b3\u5b9e\u9645\u6570\u5b66\u95ee\u9898\u65f6\u7684\u8868\u73b0\u3002**|\n", "2407.08735": "|**2024-07-11**|**Real-Time Anomaly Detection and Reactive Planning with Large Language Models**|Rohan Sinha 
et.al.|[2407.08735](http://arxiv.org/abs/2407.08735)|null|\u8fd9\u7bc7\u8bba\u6587\u63a2\u8ba8\u4e86\u5982\u4f55\u5229\u7528\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08\u5982\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff09\u5728\u673a\u5668\u4eba\u7cfb\u7edf\u4e2d\u68c0\u6d4b\u548c\u5e94\u5bf9\u5f02\u5e38\u60c5\u51b5\uff0c\u4ee5\u63d0\u9ad8\u5176\u9c81\u68d2\u6027\u548c\u5b89\u5168\u6027\u3002\u4e3b\u8981\u6311\u6218\u5305\u62ec\u51cf\u5c11\u6a21\u578b\u7684\u8ba1\u7b97\u5f00\u9500\u4ee5\u4fbf\u5b9e\u73b0\u5b9e\u65f6\u5e94\u7528\uff0c\u4ee5\u53ca\u5c06\u6a21\u578b\u7684\u5224\u65ad\u878d\u5165\u5230\u5b89\u5168\u63a7\u5236\u6846\u67b6\u4e2d\u3002\u7814\u7a76\u8005\u63d0\u51fa\u4e86\u4e00\u79cd\u4e24\u9636\u6bb5\u63a8\u7406\u6846\u67b6\uff1a\u9996\u5148\u662f\u4e00\u4e2a\u5feb\u901f\u7684\u4e8c\u5143\u5f02\u5e38\u5206\u7c7b\u5668\uff0c\u5b83\u5728\u8bed\u8a00\u6a21\u578b\u5d4c\u5165\u7a7a\u95f4\u4e2d\u5206\u6790\u89c2\u6d4b\u6570\u636e\uff0c\u5982\u679c\u53d1\u73b0\u5f02\u5e38\uff0c\u4f1a\u89e6\u53d1\u540e\u7eed\u7684\u6162\u901f\u63a8\u7406\u9636\u6bb5\uff0c\u5229\u7528\u751f\u6210\u5f0f\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u6df1\u5165\u7684\u903b\u8f91\u63a8\u7406\u3002\u8fd9\u79cd\u8bbe\u8ba1\u7c7b\u4f3c\u4e8e\u6a21\u578b\u9884\u6d4b\u63a7\u5236\u4e2d\u7684\u51b3\u7b56\u5206\u652f\uff0c\u8003\u8651\u5230\u6162\u901f\u63a8\u7406\u5668\u7684\u5ef6\u8fdf\uff0c\u53ef\u4ee5\u7acb\u5373\u91c7\u53d6\u5907\u4efd\u8ba1\u5212\uff0c\u786e\u4fdd\u7cfb\u7edf\u7684\u5b89\u5168\u6027\u3002 
\u901a\u8fc7\u4e0e\u6700\u5148\u8fdb\u7684GPT\u6a21\u578b\u7684\u81ea\u56de\u5f52\u63a8\u7406\u65b9\u6cd5\u8fdb\u884c\u6bd4\u8f83\uff0c\u7814\u7a76\u53d1\u73b0\uff0c\u5373\u4f7f\u4f7f\u7528\u5c0f\u578b\u8bed\u8a00\u6a21\u578b\uff0c\u4ed6\u4eec\u7684\u5feb\u901f\u5f02\u5e38\u5206\u7c7b\u5668\u4e5f\u8868\u73b0\u51fa\u8272\u3002\u8fd9\u4f7f\u5f97\u4ed6\u4eec\u5f00\u53d1\u7684\u8fd0\u884c\u65f6\u76d1\u63a7\u5668\u80fd\u591f\u5728\u8d44\u6e90\u548c\u65f6\u95f4\u9650\u5236\u4e0b\uff0c\u63d0\u5347\u52a8\u6001\u673a\u5668\u4eba\u7cfb\u7edf\uff0c\u5982\u56db\u65cb\u7ffc\u65e0\u4eba\u673a\u6216\u81ea\u52a8\u9a7e\u9a76\u8f66\u8f86\u7684\u4fe1\u4efb\u5ea6\u3002\u8bba\u6587\u7684\u89c6\u9891\u793a\u4f8b\u53ef\u4ee5\u5728\u9879\u76ee\u9875\u9762\u4e0a\u67e5\u770b\uff1ahttps://sites.google.com/view/aesop-llm\u3002|\n", "2407.08733": "|**2024-07-11**|**Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist**|Zihao Zhou et.al.|[2407.08733](http://arxiv.org/abs/2407.08733)|null|
\u5f3a\u5927\u7684\u6570\u5b66\u63a8\u7406\u80fd\u529b\u662f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5353\u8d8a\u6027\u80fd\u7684\u5173\u952e\u4f53\u73b0\u3002\u5982\u4f55\u5b9a\u4e49\u548c\u5168\u9762\u8bc4\u4f30LLMs\u7684\u6570\u5b66\u80fd\u529b\uff0c\u4ee5\u53ca\u5728\u5b9e\u9645\u5e94\u7528\u4e2d\u53cd\u6620\u7528\u6237\u4f53\u9a8c\uff0c\u5df2\u6210\u4e3a\u5173\u952e\u95ee\u9898\u3002\u76ee\u524d\u7684\u57fa\u51c6\u6d4b\u8bd5\u4e3b\u8981\u4fa7\u91cd\u4e8e\u95ee\u9898\u89e3\u51b3\u80fd\u529b\uff0c\u8fd9\u53ef\u80fd\u5bfc\u81f4\u6a21\u578b\u8fc7\u62df\u5408\uff0c\u5e76\u65e0\u6cd5\u51c6\u786e\u53cd\u6620\u771f\u6b63\u7684\u6570\u5b66\u63a8\u7406\u80fd\u529b\u3002\u6211\u4eec\u8ba4\u4e3a\uff0c\u5982\u679c\u6a21\u578b\u771f\u6b63\u7406\u89e3\u4e86\u95ee\u9898\uff0c\u5b83\u5e94\u8be5\u80fd\u5728\u5404\u79cd\u4efb\u52a1\u4e2d\u7a33\u5065\u4e14\u7075\u6d3b\u5730\u5e94\u7528\u3002\u5728\u6b64\u542f\u53d1\u4e0b\uff0c\u6211\u4eec\u63d0\u51faMATHCHECK\uff0c\u4e00\u4e2a\u65e8\u5728\u6d4b\u8bd5\u4efb\u52a1\u6cdb\u5316\u548c\u63a8\u7406\u9c81\u68d2\u6027\u7684\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u6e05\u5355\uff0c\u4ee5\u53ca\u4e00\u4e2a\u81ea\u52a8\u751f\u6210\u6e05\u5355\u7684\u5de5\u5177\u3002MATHCHECK\u5305\u542b\u591a\u4e2a\u6570\u5b66\u63a8\u7406\u4efb\u52a1\u548c\u6d4b\u8bd5\u7c7b\u578b\uff0c\u4ee5\u4fc3\u8fdb\u5bf9\u6570\u5b66\u63a8\u7406\u80fd\u529b\u548c\u884c\u4e3a\u6d4b\u8bd5\u7684\u5168\u9762\u8bc4\u4f30\u3002\u6211\u4eec\u5229\u7528MATHCHECK\u521b\u5efa\u4e86MATHCHECK-GSM\u548cMATHCHECK-GEO\uff0c\u5206\u522b\u9488\u5bf9\u6570\u5b66\u6587\u672c\u63a8\u7406\u548c\u591a\u6a21\u6001\u63a8\u7406\u80fd\u529b\u8fdb\u884c\u8bc4\u4f30\uff0c\u5b83\u4eec\u662fGSM8k\u3001GeoQA\u3001UniGeo\u548cGeometry3K\u7b49\u57fa\u51c6\u7684\u5347\u7ea7\u7248\u3002\u6211\u4eec\u4f7f\u7528MATHCHECK-GSM\u548cMATHCHECK-GEO\u5bf9\u8d85\u8fc720\u79cdLLM\u548c11\u79cd\u591a\u6a21\u6001LLMs\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u4ee5\u68c0\u9a8c\u5b83\u4eec\u7684\u7efc\u5408\u6570\u5b66\u63
a8\u7406\u80fd\u529b\u3002\u7ed3\u679c\u663e\u793a\uff0c\u5c3d\u7ba1\u524d\u6cbf\u6a21\u578b\u5982GPT-4\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5176\u4ed6\u6a21\u578b\u5bb6\u65cf\u5728\u6e05\u5355\u4e0a\u7684\u8868\u73b0\u663e\u8457\u4e0b\u964d\u3002\u8fdb\u4e00\u6b65\u5b9e\u9a8c\u8868\u660e\uff0c\u4e0e\u4f20\u7edf\u6570\u5b66\u57fa\u51c6\u76f8\u6bd4\uff0cMATHCHECK\u66f4\u597d\u5730\u53cd\u6620\u4e86\u771f\u6b63\u7684\u6570\u5b66\u80fd\u529b\uff0c\u7ebf\u6027\u5ea6\u66f4\u9ad8\uff0c\u4ece\u800c\u652f\u6301\u6211\u4eec\u7684\u8bbe\u8ba1\u3002\u901a\u8fc7MATHCHECK\uff0c\u6211\u4eec\u53ef\u4ee5\u8f7b\u677e\u8fdb\u884c\u8be6\u7ec6\u7684\u884c\u4e3a\u5206\u6790\uff0c\u6df1\u5165\u63a2\u7a76\u6a21\u578b\u3002|\n", "2407.08716": "|**2024-07-11**|**A Taxonomy for Data Contamination in Large Language Models**|Medha Palavalli et.al.|[2407.08716](http://arxiv.org/abs/2407.08716)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u57fa\u4e8e\u5e7f\u6cdb\u7f51\u7edc\u8bed\u6599\u5e93\u7684\u9884\u8bad\u7ec3\u540e\uff0c\u5728\u4f17\u591a\u4e0b\u6e38\u4efb\u52a1\u4e0a\u5c55\u73b0\u51fa\u5353\u8d8a\u6027\u80fd\u3002\u7136\u800c\uff0c\u6570\u636e\u6c61\u67d3\u95ee\u9898\u65e5\u76ca\u5f15\u8d77\u5173\u6ce8\uff0c\u5373\u8bc4\u4f30\u6570\u636e\u53ef\u80fd\u5b58\u5728\u4e8e\u9884\u8bad\u7ec3\u6570\u636e\u4e2d\uff0c\u5bfc\u81f4\u6a21\u578b\u8868\u73b0\u865a\u9ad8\u3002\u53bb\u6c61\u67d3\uff08decontamination\uff09\u4f5c\u4e3a\u4e00\u79cd\u53ef\u80fd\u7684\u89e3\u51b3\u65b9\u6848\uff0c\u8bd5\u56fe\u68c0\u6d4b\u5e76\u79fb\u9664\u8fd9\u4e9b\u6c61\u67d3\u6570\u636e\u3002\u7136\u800c\uff0c\u6c61\u67d3\u6570\u636e\u53ef\u80fd\u6e90\u4e8e\u6d4b\u8bd5\u96c6\u7684\u4fee\u6539\u7248\u672c\uff0c\u8fd9\u4f7f\u5f97\u68c0\u6d4b\u53d8\u5f97\u56f0\u96be\u3002\u76ee\u524d\u5c1a\u4e0d\u6e05\u695a\u4e0d\u540c\u7c7b\u578b\u7684\u6c61\u67d3\u5982\u4f55\u5f71\u54cd\u8bed\u8a00\u6a21\u578b\u5728\u4e0b\u6e38\u4efb\u52a1\u4e2d\u7684\u6027\u80fd\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u5206\u7c7b\u4f53\u7cfb\uff
0c\u5bf9\u8bed\u8a00\u6a21\u578b\u5728\u9884\u8bad\u7ec3\u9636\u6bb5\u9047\u5230\u7684\u5404\u79cd\u6c61\u67d3\u7c7b\u578b\u8fdb\u884c\u5212\u5206\uff0c\u5e76\u786e\u5b9a\u4e86\u54ea\u4e9b\u7c7b\u578b\u7684\u98ce\u9669\u6700\u9ad8\u3002\u6211\u4eec\u901a\u8fc7\u5206\u6790\u603b\u7ed3\u548c\u95ee\u7b54\u4e24\u4e2a\u5173\u952e\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\uff0c\u63ed\u793a\u4e86\u4e0d\u540c\u7c7b\u578b\u6c61\u67d3\u5982\u4f55\u5f71\u54cd\u6a21\u578b\u5728\u5b9e\u9645\u8bc4\u4f30\u4e2d\u7684\u8868\u73b0\u3002|\n", "2407.08713": "|**2024-07-11**|**GTA: A Benchmark for General Tool Agents**|Jize Wang et.al.|[2407.08713](http://arxiv.org/abs/2407.08713)|**[link](https://github.com/open-compass/GTA)**|**\u4eba\u4eec\u666e\u904d\u5173\u6ce8\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e0e\u5404\u79cd\u5de5\u5177\u7684\u6574\u5408\uff0c\u4ee5\u5f00\u53d1\u901a\u7528\u4ee3\u7406\uff0c\u4f46\u8fd9\u5bf9LLMs\u7684\u5de5\u5177\u4f7f\u7528\u80fd\u529b\u63d0\u51fa\u4e86\u6311\u6218\u3002\u5f53\u524d\u7684\u8bc4\u4f30\u65b9\u6cd5\u5b58\u5728\u660e\u663e\u7f3a\u9677\uff0c\u5982\u4f7f\u7528AI\u751f\u6210\u7684\u67e5\u8be2\u3001\u5355\u6b65\u9aa4\u4efb\u52a1\u3001\u6a21\u62df\u5de5\u5177\u4ee5\u53ca\u4ec5\u9650\u6587\u672c\u7684\u4ea4\u4e92\uff0c\u672a\u80fd\u5145\u5206\u5c55\u793a\u8fd9\u4e9b\u6a21\u578b\u5728\u5b9e\u9645\u95ee\u9898\u89e3\u51b3\u4e2d\u7684\u80fd\u529b\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u63d0\u51faGTA\uff08\u901a\u7528\u5de5\u5177\u4ee3\u7406\u57fa\u51c6\uff09\uff0c\u5b83\u5305\u542b\u4e09\u4e2a\u5173\u952e\u7279\u6027\uff1a\uff081\uff09\u771f\u5b9e\u7684\u7528\u6237\u67e5\u8be2\uff1a\u7531\u4eba\u7c7b\u7f16\u5199\uff0c\u5177\u6709\u7b80\u5355\u7684\u73b0\u5b9e\u4e16\u754c\u76ee\u6807\uff0c\u4f46\u9690\u542b\u4e86\u5de5\u5177\u4f7f\u7528\u9700\u6c42\uff0c\u8981\u6c42LLMs\u80fd\u63a8\u7406\u51fa\u5408\u9002\u7684\u5de5\u5177\u5e76\u89c4\u5212\u89e3\u51b3\u65b9\u6848\u6b65\u9aa4\u3002\uff082\uff09\u771f\u5b9e\u90e8\u7f72\u7684\u5de5\u5177\u
ff1a\u4e00\u4e2a\u914d\u5907\u6709\u611f\u77e5\u3001\u64cd\u4f5c\u3001\u903b\u8f91\u548c\u521b\u65b0\u7c7b\u5de5\u5177\u7684\u8bc4\u4f30\u5e73\u53f0\uff0c\u7528\u4e8e\u8bc4\u4f30\u6a21\u578b\u7684\u5b9e\u9645\u4efb\u52a1\u6267\u884c\u6027\u80fd\u3002\uff083\uff09\u771f\u5b9e\u7684\u591a\u6a21\u6001\u8f93\u5165\uff1a\u5305\u62ec\u7a7a\u95f4\u573a\u666f\u56fe\u7247\u3001\u7f51\u9875\u622a\u56fe\u3001\u8868\u683c\u3001\u4ee3\u7801\u7247\u6bb5\u548c\u6253\u5370/\u624b\u5199\u6750\u6599\u7b49\uff0c\u4ee5\u8d34\u8fd1\u771f\u5b9e\u4e16\u754c\u7684\u573a\u666f\u3002 \u6211\u4eec\u8bbe\u8ba1\u4e86229\u4e2a\u73b0\u5b9e\u751f\u6d3b\u4efb\u52a1\u548c\u53ef\u6267\u884c\u7684\u5de5\u5177\u94fe\uff0c\u6765\u8bc4\u4f30\u4e3b\u6d41LLMs\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u5bf9\u4e8e\u771f\u5b9e\u7684\u7528\u6237\u67e5\u8be2\uff0c\u73b0\u6709\u7684LLMs\u9762\u4e34\u4e25\u5cfb\u6311\u6218\uff0cGPT-4\u5b8c\u6210\u7684\u4efb\u52a1\u4e0d\u8db3\u4e00\u534a\uff0c\u5927\u591a\u6570\u6a21\u578b\u7684\u6210\u7ee9\u4f4e\u4e8e25%\u3002\u8fd9\u4e2a\u8bc4\u4f30\u63ed\u793a\u4e86\u5f53\u524dLLMs\u5728\u5b9e\u9645\u5de5\u5177\u4f7f\u7528\u80fd\u529b\u4e0a\u7684\u74f6\u9888\uff0c\u4e3a\u63d0\u5347\u901a\u7528\u5de5\u5177\u4ee3\u7406\u7684\u7814\u7a76\u63d0\u4f9b\u4e86\u65b9\u5411\u3002GTA\u7684\u76f8\u5173\u4ee3\u7801\u548c\u6570\u636e\u96c6\u5df2\u53ef\u5728\u83b7\u53d6\u3002**|\n", "2407.08701": "|**2024-07-11**|**Live2Diff: Live Stream Translation via Uni-directional Attention in Video Diffusion Models**|Zhening Xing 
et.al.|[2407.08701](http://arxiv.org/abs/2407.08701)|null|Large language models have proven highly effective at generating streaming text and audio data, owing to their uni-directional temporal attention mechanism. Yet video streaming remains comparatively under-studied despite growing demand for real-time video processing. Existing video diffusion models rely on bi-directional temporal attention, which limits their ability to handle live video. We therefore propose Live2Diff, the first video diffusion model with uni-directional temporal attention designed specifically for live video translation. Unlike prior work, our method maintains temporal consistency and smoothness by relating each frame to the preceding frame and a small number of warm-up frames, with no need to consider future frames. We also adopt an efficient denoising scheme, including a KV-cache mechanism and pipelining, to support video stream translation at interactive frame rates. Extensive experiments show that our attention mechanism and pipeline design clearly outperform previous methods in temporal smoothness and/or efficiency.|\n", "2407.08699": "|**2024-07-11**|**Mitigating Catastrophic Forgetting in Language Transfer via Model Merging**|Anton Alexandrov 
et.al.|[2407.08699](http://arxiv.org/abs/2407.08699)|null|As open large language models (LLMs) keep improving on English tasks, researchers are working to extend them to other languages. This language adaptation, however, often causes catastrophic forgetting of the base model's capabilities, which limits the usefulness of the adapted model. We propose a new adaptation method, Branch-and-Merge (BaM), based on iteratively merging multiple models that are each fine-tuned on a subset of the training data. The core idea of BaM is that this yields weight changes of smaller magnitude but higher quality, reducing forgetting of the source domain while preserving learning on the target domain. In an extensive empirical study on Bulgarian and German, we show that BaM significantly reduces forgetting while matching or even improving target-domain performance compared with standard continued pretraining and instruction fine-tuning across different model architectures.|\n", "2407.08694": "|**2024-07-11**|**Cloud Atlas: Efficient Fault Localization for Cloud Systems using Language Models and Causal Insight**|Zhiqiang Xie 
et.al.|[2407.08694](http://arxiv.org/abs/2407.08694)|null|Runtime failures and performance degradation are the norm in modern cloud systems. For cloud providers, automatically determining the root cause of a problem is key to ensuring high reliability and availability, because fast fault localization speeds up diagnosis and triage and enables timely resolution. A promising approach explored in recent work is causal reasoning, which uses causal graphs to capture the relationships among the performance metrics of a cloud system. However, system developers must define the causal graph precisely, which is time-consuming, brittle, and challenging, especially for large and dynamic systems, and requires deep domain expertise. Data-driven methods have limited effectiveness in cloud systems because fault events are relatively infrequent. In this work we propose Atlas, a novel solution that automatically synthesizes causal graphs for cloud systems. Atlas uses large language models (LLMs) together with system documentation, logs, and deployment feedback to generate the causal graph. Atlas complements data-driven causal discovery techniques and is strengthened by a data-driven validation step. We evaluate Atlas on a range of fault localization scenarios and show that it generates causal graphs in a scalable and generalizable manner, with performance far exceeding data-driven algorithms and on par with the baseline.|\n", "2407.08683": "|**2024-07-11**|**SEED-Story: Multimodal Long Story Generation with Large Language Model**|Shuai Yang 
et.al.|[2407.08683](http://arxiv.org/abs/2407.08683)|**[link](https://github.com/tencentarc/seed-story)**|**With remarkable advances in image generation and open-form text generation, creating interleaved image-text content has become increasingly attractive. Multimodal story generation, which produces interleaved sequences of narrative text and vivid images, has drawn attention as a valuable practical task with broad applications. The task is challenging, however, because it requires understanding the complex interplay between text and images and generating coherent, relevant text and visual content. In this work we propose SEED-Story, a novel method that uses a powerful multimodal large language model (MLLM) to generate extended multimodal stories. Building on the strong comprehension ability of the MLLM, our model predicts both text tokens and visual tokens, and the visual tokens are then processed by an adapted visual de-tokenizer to produce images with consistent characters and style. We also introduce a multimodal attention sink mechanism that enables autoregressive generation of stories of up to 25 sequences while training on only 10. In addition, we provide StoryStream, a large-scale high-resolution dataset, for training our model and for quantitatively evaluating multimodal story generation along several dimensions.**|\n", "2407.08662": "|**2024-07-11**|**Uncertainty Estimation of Large Language Models in Medical Question Answering**|Jiaxin Wu et.al.|[2407.08662](http://arxiv.org/abs/2407.08662)|null|Large language models (LLMs) show promise for natural language generation in healthcare but risk producing factually incorrect content. Deploying them for medical question answering requires reliable uncertainty estimation (UE) methods to detect hallucinations. In this study we evaluate popular UE methods across different model sizes on medical question-answering datasets. The results show that current methods generally perform poorly in this domain, underscoring the challenge of UE for medical applications. We also observe that larger models tend to achieve better results, suggesting a possible link between model size and UE reliability. To address these challenges, we propose \u201cTwo-phase Verification\u201d, a probability-free uncertainty estimation method. First, the LLM generates a step-by-step explanation along with an initial answer, and then formulates verification questions to check the factual claims in the explanation. The model answers these questions twice: once independently, and once with reference to the explanation. The inconsistency between the two sets of answers measures the uncertainty of the original response. We evaluate our method on three biomedical question-answering datasets using Llama 2 Chat models and compare it against baseline methods. The results show that Two-phase Verification achieves the best overall accuracy and stability across datasets and model sizes, and that its performance improves as model size increases.|\n", "2407.09467": "|**2024-07-12**|**FairyLandAI: Personalized Fairy Tales utilizing ChatGPT and DALLE-3**|Georgios Makridis 
et.al.|[2407.09467](http://arxiv.org/abs/2407.09467)|null|In a world of diverse AI-driven storytelling, there is a unique opportunity to engage young audiences through customized, personalized narratives. This paper introduces FairyLandAI, an innovative large language model (LLM) developed specifically for children and built on the OpenAI API. What sets FairyLandAI apart is that it not only generates engaging, age-appropriate stories reflecting a variety of traditions, but also automatically produces creative prompts suited to advanced image generation tools such as GenAI and Dalle-3, enriching the storytelling experience. FairyLandAI is carefully tailored to children's imaginative worlds, offering stories that both educate and entertain and that align with the values appropriate to different ages. Its distinctive strength is customizing stories to an individual child's preferences and cultural background, marking a new era of personalized storytelling. Its integration with image generation technology also provides a comprehensive narrative experience that stimulates both verbal and visual creativity. Empirical evaluations show that FairyLandAI is effective at creating stories that captivate children, stories that not only entertain but also embody the moral teachings of diverse traditions. The model is a valuable tool for parents and educators, helping them convey meaningful life lessons through engaging stories. FairyLandAI represents a pioneering step in using LLMs, and the OpenAI API in particular, for educational and cultural enrichment, making complex and instructive moral tales accessible and enjoyable for young, imaginative minds.|\n", "2407.09450": "|**2024-07-12**|**Human-like Episodic Memory for Infinite Context LLMs**|Zafeirios Fountas et.al.|[2407.09450](http://arxiv.org/abs/2407.09450)|null|Large language models (LLMs) show remarkable capabilities, but they still struggle to maintain coherence and accuracy over long sequences. The human brain, in contrast, excels at organizing and retrieving personal experiences across long time scales, spanning a lifetime of memories. This paper proposes EM-LLM, a novel approach that integrates key aspects of human episodic memory and event cognition into LLMs, enabling them to handle practically unlimited context lengths while remaining computationally efficient. EM-LLM organizes token sequences into coherent events in an online fashion by combining Bayesian surprise with graph-theoretic boundary refinement. When needed, these events are retrieved through a two-stage memory process that combines similarity-based and temporally contiguous retrieval, giving efficient, human-like access to information. Experiments on the LongBench dataset show that EM-LLM outperforms the state-of-the-art InfLLM model, with an overall relative improvement of 4.3% across tasks, including a 33% improvement on the PassageRetrieval task. Our analysis also reveals a strong correlation between EM-LLM's event segmentation and human-perceived events, suggesting a bridge between this artificial system and its biological counterpart. This work not only improves the ability of LLMs to process long sequences but also provides a computational framework for exploring human memory mechanisms, opening new avenues for interdisciplinary research in AI and cognitive science.|\n", "2407.09447": "|**2024-07-12**|**ASTPrompter: Weakly Supervised Automated Language Model Red-Teaming to Identify Likely Toxic Prompts**|Amelia F. 
Hardy et.al.|[2407.09447](http://arxiv.org/abs/2407.09447)|**[link](https://github.com/sisl/astprompter)**|Typical automated red-teaming schemes for large language models (LLMs) focus on finding prompts that trigger a frozen language model (the defender) to generate toxic text. This often leads the red-teaming model (the adversary) to produce text that is hard to understand and unnatural. Here we propose a reinforcement learning formulation of the LLM red-teaming task whose goal is to find prompts that both (1) trigger toxic output from the defender and (2) have low perplexity as scored by the defender. We argue that these cases are the most relevant in a red-teaming setting because they are likely to arise during normal use of the defender model. We solve this formulation with a novel online, weakly supervised variant of Identity Preference Optimization (IPO), applied with GPT-2 and GPT-2 XL as defenders. Experiments show that our policy generates prompts that are both likely and toxicity-triggering. Finally, we analyze the learned strategies and the trade-off between likelihood and toxicity, and discuss the implications. The source code is available at https://github.com/sisl/ASTPrompter/ for this project.|\n", "2407.09435": "|**2024-07-12**|**MUSCLE: A Model Update Strategy for Compatible LLM Evolution**|Jessica Echterhoff et.al.|[2407.09435](http://arxiv.org/abs/2407.09435)|null|Large language models (LLMs) are updated frequently, through changes to data or architecture, to improve performance. When updating, developers usually focus on raising overall performance metrics and pay less attention to compatibility with earlier versions. Users, however, build a mental model of the functionality and capabilities of the machine learning model they work with, and must adjust it with every update. Frequent model changes can therefore reduce user satisfaction. In practice, fine-tuned downstream task adapters depend on a pretrained LLM base model. When the base model is updated, these user-facing downstream models can suffer instance regression, or negative flips, in which previously correct instances are now predicted incorrectly. This happens even when the downstream training procedure stays the same. Our work aims to give users a seamless model update experience in two ways. First, we propose a set of evaluation metrics for compatibility with a prior model version, designed for generative tasks but also applicable to classification tasks. We observe regression and inconsistency between model versions and updates, especially on diverse tasks. Second, we propose a training strategy that trains a compatibility model to reduce inconsistency across model updates, lowering negative flips by up to 40% for updates such as Llama 1 to Llama 2. Users can then adapt to a new version more easily, without repeatedly adjusting their expectations and usage.|\n", "2407.09429": "|**2024-07-12**|**Open (Clinical) LLMs are Sensitive to Instruction Phrasings**|Alberto Mario Ceballos Arroyo 
et.al.|[2407.09429](http://arxiv.org/abs/2407.09429)|**[link](https://github.com/alceballosa/clin-robust)**|Instruction-tuned large language models (LLMs) can perform a wide range of tasks from natural language instructions, but their sensitivity to how those instructions are phrased is a problem. The issue is especially critical in healthcare, because clinicians are unlikely to be experienced prompt engineers and the potential consequences of wrong outputs are more severe. This raises a practical question: how robust are instruction-tuned LLMs to natural (non-adversarial) variations in instruction phrasing on clinical NLP tasks? We collect prompts from doctors across a range of tasks and measure the sensitivity of seven LLMs, both general and specialized, to subtle differences in instruction phrasing. We find that performance varies substantially across all models and that, surprisingly, models trained specifically on clinical data are less robust than their general-domain counterparts. Moreover, arbitrary phrasing differences can affect fairness: for example, valid but distinct instructions for mortality prediction produce variation not only in overall performance but also between demographic groups.|\n", 
"2407.09424": "|**2024-07-12**|**TelecomGPT: A Framework to Build Telecom-Specfic Large Language Models**|Hang Zou et.al.|[2407.09424](http://arxiv.org/abs/2407.09424)|null|This paper is the first to propose a method for adapting general-purpose large language models (LLMs) into telecom-specific models. To that end, we collect and build telecom-specific pretraining, instruction, and preference datasets, used for continued pretraining, instruction tuning, and alignment tuning respectively. Because the telecom domain lacks widely accepted evaluation benchmarks, we extend existing evaluations and propose three new benchmarks: Telecom Math Modeling, Telecom Open Question Answering (TeleQnA), and Telecom Code Tasks. These new benchmarks comprehensively evaluate LLM capabilities in the telecom domain, including math modeling, open-ended question answering, code generation, infilling, summarization, and analysis. Our fine-tuned model TelecomGPT significantly outperforms state-of-the-art models such as GPT-4, Llama-3, and Mistral on the Telecom Math Modeling benchmark, and achieves comparable performance on TeleQnA, 3GPP technical document classification, telecom code summarization and generation, and infilling.|\n", "2407.09417": "|**2024-07-12**|**Mitigating Entity-Level Hallucination in Large Language 
Models**|Weihang Su et.al.|[2407.09417](http://arxiv.org/abs/2407.09417)|**[link](https://github.com/oneal2000/entityhallucination)**|**The rise of large language models (LLMs) has changed how users access information, shifting from traditional search engines to direct question-answering interactions with LLMs. Their widespread adoption, however, has exposed a major challenge known as hallucination, in which the model produces responses that seem coherent but are factually wrong, leading users to distrust LLM-based information retrieval systems. To address this, the paper proposes a novel method, Dynamic Retrieval Augmentation based on hallucination Detection (DRAD). DRAD improves on traditional retrieval augmentation by dynamically adapting the retrieval process according to real-time hallucination detection. It has two main components: Real-time Hallucination Detection (RHD), which identifies potential hallucinations without any external model, and Self-correction based on External Knowledge (SEK), which corrects those errors using external knowledge. Experiments show that DRAD performs well at both detecting and mitigating hallucinations in LLMs. All code and data are open-sourced at https://github.com/oneal2000/EntityHallucination for use by the research community.**|\n", 
"2407.09413": "|**2024-07-12**|**SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers**|Shraman Pramanick et.al.|[2407.09413](http://arxiv.org/abs/2407.09413)|**[link](https://github.com/google/spiqa)**|**Finding answers quickly is essential when reading scientific papers in depth. However, existing paper-based question answering (QA) datasets are limited in scale and scope, focusing mainly on the text. To close this gap we introduce SPIQA (Scientific Paper Image Question Answering), a large-scale QA dataset designed specifically for interpreting complex figures, tables, and result visualizations across computer science. Drawing on the strong comprehension abilities of multimodal large language models (MLLMs), we build the dataset through automatic and manual curation. SPIQA contains 270K questions, divided into training, validation, and three different evaluation splits. Through extensive experiments with 12 foundation models, we evaluate how well current multimodal systems understand the nuances of research articles. We also propose a Chain-of-Thought (CoT) evaluation strategy with in-context retrieval that allows fine-grained, step-by-step assessment and helps improve model performance. We further explore the upper bound on performance gains from additional textual information, which highlights its potential for future research and points to the dataset's potential to transform how we interact with scientific literature.**|\n", "2407.09394": "|**2024-07-12**|**PersonaRAG: Enhancing Retrieval-Augmented Generation Systems with User-Centric Agents**|Saber Zerhoudi et.al.|[2407.09394](http://arxiv.org/abs/2407.09394)|**[link](https://github.com/padas-lab-de/PersonaRAG)**|Large language models (LLMs) struggle to produce reliable output because of outdated knowledge and hallucinations. Retrieval-Augmented Generation (RAG) models address this by augmenting LLMs with external knowledge, but they often fail to personalize the retrieval process. This paper proposes PersonaRAG, a novel framework that introduces user-centric agents which adapt retrieval and generation based on real-time user data and interactions. Evaluations on several question-answering datasets show that PersonaRAG clearly outperforms baseline models and better serves users' individual needs. The results point to promising directions for user-adapted information retrieval systems.|\n", "2407.09388": "|**2024-07-12**|**GAVEL: Generating Games Via Evolution and Language Models**|Graham Todd 
et.al.|[2407.09388](http://arxiv.org/abs/2407.09388)|null|Automatically creating novel and interesting games is a complex task that involves representing game rules in a computationally workable form, searching the vast space of potential games, and accurately evaluating the originality and quality of previously unseen games. Prior work has focused mainly on limited rule representations and relied on domain-specific heuristics. In this work, we focus on generating novel games in the Ludii game description language, which encodes the rules of over 1,000 board games in a variety of styles and modes of play. Drawing on recent advances in large language models and evolutionary computation, we train a model that intelligently mutates and recombines game mechanics expressed as code. We show through quantitative and qualitative analysis that our approach can create new and engaging games, including in regions of the game space not covered by existing games in the Ludii dataset. A sample of the generated games can be played online through the Ludii portal.|\n", "2407.10972": "|**2024-07-15**|**VGBench: Evaluating Large Language Models on Vector Graphics Understanding and Generation**|Bocheng Zou 
et.al.|[2407.10972](http://arxiv.org/abs/2407.10972)|**[link](https://github.com/vgbench/VGBench)**|**In the field of vision models, the dominant representation uses pixels to render the visual world. Yet this is not always the best or the only way to represent visual content, especially for designers and artists, who often build graphics from geometric shapes such as polygons. Vector graphics (VG) offer a textual representation of visual content that can be more concise and powerful for content such as cartoons or sketches. Recent studies have shown promising results for capable large language models (LLMs) in processing vector graphics. However, these works focus mainly on qualitative analysis, understanding, or a specific type of vector graphics. We propose VGBench, a comprehensive benchmark for evaluating LLMs on vector graphics, covering: (a) both visual understanding and generation, (b) evaluation across multiple vector graphics formats, (c) different question types, (d) a wide range of prompting techniques, and (e) performance under multiple LLMs. Evaluating on our collected 4,279 understanding samples and 5,845 generation samples, we find that LLMs show strong capability in both aspects but perform somewhat worse on low-level formats such as SVG. Our data and evaluation pipeline will be open-sourced.**|\n", "2407.10969": "|**2024-07-15**|**Q-Sparse: All Large Language Models can be Fully Sparsely-Activated**|Hongyu Wang et.al.|[2407.10969](http://arxiv.org/abs/2407.10969)|null|We propose Q-Sparse, a simple yet effective training approach designed for large language models (LLMs). Q-Sparse makes the activations of LLMs fully sparse, which brings significant efficiency gains at inference time. It does so by applying top-K sparsification to the activations and using the straight-through estimator during training. The key results are: (1) Q-Sparse achieves results comparable to baseline LLMs while being more efficient at inference; (2) we present an inference-optimal scaling law for sparsely activated LLMs; (3) Q-Sparse performs well in a range of settings, including training from scratch, continued training of pretrained models, and fine-tuning; (4) Q-Sparse works for both full-precision and 1-bit LLMs such as BitNet b1.58. In particular, the combination of BitNet b1.58 and Q-Sparse (which can be equipped with MoE) provides a cornerstone and a clear path toward more efficient future LLMs, including in cost and energy consumption.|\n", "2407.10960": "|**2024-07-15**|**Fast Matrix Multiplications for Lookup Table-Quantized LLMs**|Han Guo et.al.|[2407.10960](http://arxiv.org/abs/2407.10960)|**[link](https://github.com/hanguo97/flute)**|The deployment of large language models (LLMs) is often constrained by memory bandwidth, where the main bottleneck is the cost of transferring model parameters from GPU global memory to registers. Weight-only quantization reduces this memory movement and thus speeds up inference. However, designing high-performance kernels for quantized LLMs is a major challenge, especially when weights are compressed to bit widths that do not divide evenly (e.g., 3 bits) and quantized with non-uniform lookup tables (LUT). This paper introduces FLUTE, a flexible lookup table engine that restructures the quantized weight matrix offline to minimize the bit manipulations involved in unpacking, and that vectorizes and duplicates the lookup table to mitigate shared memory bandwidth constraints. At small batch sizes (below 32) and a quantization group size of 128 (typical in LLM inference), the FLUTE kernel can be 2-4x faster than existing GEMM kernels. As an application of FLUTE, we explore a simple extension to lookup table-based NormalFloat quantization and apply it to quantize LLaMA3, obtaining quantization performance comparable to strong baselines together with a 1.5 to 2x increase in end-to-end throughput.|\n", "2407.10953": "|**2024-07-15**|**MMM: Multilingual Mutual Reinforcement Effect Mix Datasets & Test with Open-domain Information Extraction Large Language Models**|Chengguang Gan et.al.|[2407.10953](http://arxiv.org/abs/2407.10953)|null|**Background:** The Mutual Reinforcement Effect (MRE) shows great promise in information extraction and multitask research. However, because the only available MRE mix datasets are limited to Japanese, broad exploration by the global research community has been constrained. To overcome this limitation, we construct a Multilingual MRE Mix dataset (MMM) comprising 21 sub-datasets in English, Japanese, and Chinese. We also propose a dataset translation method assisted by large language models (LLMs), which uses LLMs to translate the original Japanese text and greatly reduces the manual annotation time needed to build the dataset. **Contributions:** We extend the dataset with open-domain named entity recognition (NER) and sentence classification tasks. Using this expanded dataset, we develop a unified input-output framework and train an Open-domain Information Extraction Large Language Model (OIELLM). Experiments show that OIELLM handles the new MMM datasets effectively and delivers significant performance gains. In summary, our work aims to advance research on the Mutual Reinforcement Effect in multilingual information extraction by providing multilingual resources and an efficient translation strategy.|\n", "2407.10947": "|**2024-07-15**|**Can Textual Semantics Mitigate Sounding Object Segmentation Preference?**|Yaoting Wang et.al.|[2407.10947](http://arxiv.org/abs/2407.10947)|**[link](https://github.com/gewu-lab/sounding-object-segmentation-preference)**|**The Audio-Visual Segmentation (AVS) task aims to segment sounding objects in the visual space using audio cues. However, this work finds that existing AVS methods rely too heavily on a segmentation preference for audible objects rather than on precise audio guidance. The cause is that, compared with the visual modality, audio carries weaker semantics in multi-source sound scenes, which limits its ability to guide the visual space. Since the text modality is well explored and contains rich abstract semantics, we propose using text cues from the visual scene to make audio guidance more precise. Our method first obtains scene descriptions with an off-the-shelf image captioner, then uses a pretrained large language model to infer the potential sounding objects as text cues. Next, we design a novel semantics-driven audio modeling module with a dynamic mask that fuses audio features with the text cues to produce representative sounding-object features. These features contain not only audio information but also vivid semantics, and so provide clearer guidance in the visual space. Experiments on AVS benchmark datasets show that, with text cues, our method becomes more sensitive to audio and performs highly competitively on all three subsets. Project page: [https://github.com/GeWu-Lab/Sounding-Object-Segmentation-Preference](https://github.com/GeWu-Lab/Sounding-Object-Segmentation-Preference).**|\n", "2407.10943": "|**2024-07-15**|**GRUtopia: Dream General Robots in a City at Scale**|Hanqing Wang et.al.|[2407.10943](http://arxiv.org/abs/2407.10943)|**[link](https://github.com/openrobotlab/grutopia)**|**Recent work has been exploring scaling laws in Embodied AI. Given the high cost of collecting real-world data, we believe the simulation-to-real (Sim2Real) approach is crucial for scaling the learning of embodied models. This paper introduces project GRUtopia, the first interactive 3D society designed for various robots. It features several advances: (a) The scene dataset GRScenes includes 100,000 interactive, finely annotated scenes that can be freely combined into city-scale environments. Unlike previous works that focus mainly on homes, GRScenes covers 89 diverse scene categories, bridging the gap to the service-oriented environments where robots would initially be deployed. (b) GRResidents is a non-player character (NPC) system driven by large language models that is responsible for social interaction, task generation, and task assignment, thereby simulating social scenarios for embodied AI applications. (c) The standardized benchmark GRBench supports various robots but focuses on legged robots, and provides moderately challenging tasks involving object navigation, social navigation, and mobile manipulation. We hope this work can alleviate the scarcity of high-quality data in the field and provide a more comprehensive assessment of Embodied AI research. The project code is available at https://github.com/OpenRobotLab/GRUtopia.**|\n", "2407.10909": "|**2024-07-15**|**FinDKG: Dynamic Knowledge Graphs with Large Language Models for Detecting Global Trends in Financial Markets**|Xiaohui Victor Li 
et.al.|[2407.10909](http://arxiv.org/abs/2407.10909)|**[link](https://github.com/xiaohui-victor-li/FinDKG)**|Dynamic knowledge graphs (DKGs) are a popular data structure for representing the various connections between objects as they change over time. They are also efficient at handling information extracted from complex unstructured data sources such as text and images. In financial applications, DKGs can be used to detect trends for investment strategies based on financial news articles. This work explores the properties of large language models (LLMs) as dynamic knowledge graph generators, and to that end proposes an open-source fine-tuned LLM called the Integrated Contextual Knowledge Graph Generator (ICKG). Using ICKG, we create a new open-source dynamic knowledge graph from financial news articles, called FinDKG. We further design an attention-based graph neural network architecture, KGTransformer, for analyzing this graph. We test model performance on benchmark datasets and on FinDKG, and the results show that KGTransformer performs strongly on link prediction tasks. Finally, we evaluate KGTransformer on FinDKG for thematic investing and show that it can outperform existing thematic exchange-traded funds (ETFs).|\n
", "2407.10887": "|**2024-07-15**|**Hey, That's My Model! Introducing Chain & Hash, An LLM Fingerprinting Technique**|Mark Russinovich et.al.|[2407.10887](http://arxiv.org/abs/2407.10887)|null|\u968f\u7740\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u88ab\u76d7\u548c\u8bef\u7528\u7684\u62c5\u5fe7\u52a0\u5267\uff0c\u6a21\u578b\u6307\u7eb9\u5316\u7684\u5fc5\u8981\u6027\u63d0\u5347\u3002\u5728\u8fd9\u79cd\u80cc\u666f\u4e0b\uff0c\u6210\u529f\u7684\u6307\u7eb9\u5e94\u5177\u5907\u4e94\u4e2a\u7279\u6027\uff1a\u900f\u660e\u6027\u3001\u6548\u7387\u3001\u6301\u4e45\u6027\u3001\u9c81\u68d2\u6027\u548c\u4e0d\u53ef\u4f2a\u9020\u6027\u3002\u672c\u6587\u9996\u5148\u5b9a\u4e49\u4e86\u8fd9\u4e9b\u8981\u6c42\u3002\u63a5\u7740\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u7b80\u5355\u6307\u7eb9\u65b9\u6cd5\u2014\u2014Chain & Hash\uff0c\u5b83\u878d\u5408\u4e86\u52a0\u5bc6\u7406\u5ff5\uff0c\u5b9e\u73b0\u4e86\u6240\u6709\u8fd9\u4e9b\u7279\u6027\u3002Chain & Hash\u6d89\u53ca\u751f\u6210\u4e00\u7ec4\u95ee\u9898\uff08\u6307\u7eb9\uff09\u53ca\u5176\u53ef\u80fd\u7684\u7b54\u6848\uff0c\u7136\u540e\u4f7f\u7528\u5b89\u5168\u54c8\u5e0c\u6280\u672f\u5c06\u5b83\u4eec\u5408\u5e76\uff0c\u4ee5\u786e\u5b9a\u6bcf\u4e2a\u95ee\u9898\u7684\u503c\uff0c\u4ece\u800c\u4fdd\u8bc1\u4e0d\u53ef\u4f2a\u9020\u6027\uff0c\u9632\u6b62\u5bf9\u624b\u58f0\u79f0\u865a\u5047\u6240\u6709\u6743\u3002\u6211\u4eec\u5728\u591a\u4e2a\u6a21\u578b\u4e0a\u8bc4\u4f30\u4e86Chain & Hash\u6280\u672f\uff0c\u5e76\u5c55\u793a\u4e86\u5b83\u5bf9\u826f\u6027\u64cd\u4f5c\uff08\u5982\u5728\u4e0d\u540c\u6570\u636e\u96c6\u4e0a\u5fae\u8c03\uff09\u548c\u654c\u610f\u5220\u9664\u6307\u7eb9\u7684\u9c81\u68d2\u6027\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u5e26\u6307\u7eb9\u7684\u6a21\u578b\u5728\u5404\u79cd\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u7684\u6027\u80fd\u51e0\u4e4e\u4e0e\u975e\u6307\u7eb9\u5316\u6a21\u578b\u76f8\u5f53\uff0c\u540c\u65f6\u4fdd\u6301\u4e86\u9ad8\u6548\u6027\u53ca\u5176\u5b9e\u7528\u4ef7\u503c\u3002|\n", 
"2407.10886": "|**2024-07-15**|**SLIP: Securing LLMs IP Using Weights Decomposition**|Yehonathan Refael et.al.|[2407.10886](http://arxiv.org/abs/2407.10886)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5b66\u672f\u754c\u548c\u5de5\u4e1a\u754c\u7684\u5e7f\u6cdb\u5e94\u7528\uff0c\u8fd9\u4e9b\u6a21\u578b\u7684\u4ef7\u503c\u4f5c\u4e3a\u77e5\u8bc6\u4ea7\u6743\uff08IP\uff09\u65e5\u76ca\u51f8\u663e\uff0c\u53cd\u6620\u51fa\u5176\u80cc\u540e\u5de8\u5927\u7684\u6295\u8d44\u3002\u7136\u800c\uff0c\u7531\u4e8e\u4e91\u90e8\u7f72\u6210\u672c\u9ad8\uff0c\u8fb9\u7f18\u8bbe\u5907\u90e8\u7f72\u7684\u9700\u6c42\u589e\u52a0\uff0c\u8fd9\u53ef\u80fd\u5bfc\u81f4\u6a21\u578b\u53c2\u6570\u88ab\u76d7\u7528\u548c\u672a\u7ecf\u6388\u6743\u4f7f\u7528\u3002\u5f53\u524d\u7684\u4fdd\u62a4\u65b9\u6cd5\u5728\u5b9e\u7528\u6027\u3001\u51c6\u786e\u6027\u635f\u5931\u6216\u9002\u5e94\u6027\u65b9\u9762\u5b58\u5728\u5c40\u9650\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6df7\u5408\u63a8\u7406\u7b97\u6cd5\uff0c\u79f0\u4e3aSLIP\uff08Secure Lightweight Inference Protocol\uff09\uff0c\u65e8\u5728\u4fdd\u62a4\u90e8\u7f72\u5728\u8fb9\u7f18\u7684\u6a21\u578b\u514d\u53d7\u76d7\u7a83\u3002SLIP\u662f\u9996\u4e2a\u517c\u987e\u5b9e\u9645\u5e94\u7528\u7684\u5b9e\u7528\u6027\u548c\u4e25\u683c\u5b89\u5168\u6027\u7684\u6df7\u5408\u534f\u8bae\uff0c\u540c\u65f6\u4fdd\u6301\u96f6\u7cbe\u5ea6\u4e0b\u964d\u548c\u4f4e\u5ef6\u8fdf\u5f71\u54cd\u3002 
SLIP\u901a\u8fc7\u77e9\u9635\u5206\u89e3\u5b9e\u73b0\u4e86\u6a21\u578b\u5728\u4e24\u4e2a\u8ba1\u7b97\u8d44\u6e90\u4e4b\u95f4\u7684\u5212\u5206\uff1a\u4e00\u4e2a\u5b89\u5168\u4f46\u6602\u8d35\uff0c\u53e6\u4e00\u4e2a\u6210\u672c\u6548\u76ca\u9ad8\u4f46\u6613\u53d7\u653b\u51fb\u3002\u5173\u952e\u5728\u4e8e\uff0c\u5b89\u5168\u8d44\u6e90\u4fdd\u7559\u4e86\u6a21\u578bIP\u4e2d\u6700\u654f\u611f\u7684\u90e8\u5206\uff0c\u540c\u65f6\u6267\u884c\u6700\u5c11\u7684\u8ba1\u7b97\uff0c\u800c\u8106\u5f31\u8d44\u6e90\u5219\u76f8\u53cd\u3002\u6b64\u5916\uff0c\u8be5\u534f\u8bae\u63d0\u4f9b\u4e86\u9632\u6b62\u653b\u51fb\u8005\u5229\u7528\u5206\u5272\u83b7\u53d6\u4fdd\u5bc6\u4fe1\u606f\u7684\u5b89\u5168\u4fdd\u969c\u3002\u6700\u540e\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u5b9e\u9a8c\u7ed3\u679c\uff0c\u8bc1\u660e\u4e86SLIP\u7684\u7a33\u5065\u6027\u548c\u6709\u6548\u6027\uff0c\u4f7f\u5176\u6210\u4e3a\u4fdd\u62a4LLMs\u7684\u7406\u60f3\u89e3\u51b3\u65b9\u6848\u3002|\n", "2407.10873": "|**2024-07-15**|**Understanding the Importance of Evolutionary Search in Automated Heuristic Design with Large Language Models**|Rui Zhang 
et.al.|[2407.10873](http://arxiv.org/abs/2407.10873)|null|Automated heuristic design (AHD) has attracted wide attention for its potential to automate the development of effective heuristics. With the rise of large language models (LLMs), researchers have begun to explore a new avenue for AHD: framing it as an evolutionary program search (EPS) problem. However, inconsistent benchmark settings, inadequate baseline comparisons, and a lack of in-depth analysis of why LLMs need to be combined with search strategies make it hard to substantiate the actual progress of existing LLM-based EPS methods. This work presents a large-scale benchmark covering four LLM-based EPS methods and four AHD problems across nine LLMs, with five independent runs. Our extensive experiments yield useful insights, provide empirical evidence for the importance of evolutionary search in LLM-based AHD approaches, and inform the development of future EPS algorithms. To foster accessibility and reproducibility, we have fully open-sourced our benchmark and the corresponding results.|\n", "2407.11965": "|**2024-07-16**|**UrbanWorld: An Urban World Model for 3D City Generation**|Yu Shang 
et.al.|[2407.11965](http://arxiv.org/abs/2407.11965)|null|Cities, as the basic environment of human life, contain diverse physical elements such as buildings, roads, and vegetation, with complex interconnections among them. Building realistic, interactive 3D urban environments is crucial for developing AI that can perceive, decide, and act in real-world settings. However, the traditional manual process is time-consuming and painstaking, and requires designers to invest great effort in accurately representing complex urban features. We therefore propose UrbanWorld, the first model that automatically generates customized, realistic, and interactive 3D urban worlds with support for flexible control conditions. UrbanWorld's generation pipeline has four key stages: 3D layout generation from openly available OSM data, urban scene planning and design with a powerful urban multimodal large language model (Urban MLLM), controllable asset rendering with advanced 3D diffusion techniques, and MLLM-assisted scene refinement. The high-fidelity 3D urban environments that UrbanWorld produces enable realistic feedback and interaction for general AI and machine perception systems in simulation. We are working to make UrbanWorld an open-source, versatile platform for evaluating and improving AI abilities in perception, decision-making, and interaction in realistic urban environments.|\n", "2407.11963": "|**2024-07-16**|**NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?**|Mo Li et.al.|[2407.11963](http://arxiv.org/abs/2407.11963)|**[link](https://github.com/open-compass/opencompass)**|**This paper introduces NeedleBench, a framework consisting of a series of progressively more challenging tasks for evaluating the long-context understanding of large language models (LLMs). The framework spans multiple length intervals (4k, 8k, 32k, 128k, 200k, 1M, and beyond) and depth ranges, and inserts critical data points at different text depths to systematically test models' retrieval and reasoning abilities in diverse contexts. For bilingual long texts, we use the framework to examine how well leading open-source models identify the key information relevant to a question and apply it in reasoning. We also propose the Ancestral Trace Challenge (ATC), which mimics the complexity of logical reasoning in real-world long-context tasks and offers a simple way to evaluate how LLMs handle complex long-context situations. Our results show that current LLMs still have substantial room for improvement in practical long-context applications, because they struggle with hard logical reasoning problems. All code and resources are available in the OpenCompass project (https://github.com/open-compass/opencompass).**|\n", "2407.11934": "|**2024-07-16**|**Code Documentation and Analysis to Secure Software Development**|Paul Attie et.al.|[2407.11934](http://arxiv.org/abs/2407.11934)|null|We present the Code Documentation and Analysis Tool (CoDAT). CoDAT is designed to keep code documentation consistent: for example, when a line in a code sketch is changed, the corresponding comment is also updated, preserving both internal consistency and consistency with the code. By flagging out-of-date comments, CoDAT alerts developers to keep documentation current. We use a large language model to check the semantic consistency between a code fragment and its description, which also identifies semantically inconsistent and outdated comments. This helps programmers write code that correctly implements a code sketch, and supports stepwise refinement from a code sketch to code through one, two, or more refinement iterations. CoDAT is implemented in the IntelliJ IDEA IDE, where the Code Insight daemon package is coupled with a custom regular-expression algorithm to mark tagged comments whose corresponding code blocks have changed. CoDAT's backend is structurally decentralized and supports a distributed ledger framework for code consistency tracking and architectural compilation management.|\n", "2407.11919": "|**2024-07-16**|**What's Wrong? 
Refining Meeting Summaries with LLM Feedback**|Frederic Kirstein et.al.|[2407.11919](http://arxiv.org/abs/2407.11919)|null|As digital meetings become widespread, meeting summarization has become a key task. Large language models (LLMs) show great potential here, surpassing traditional methods in coherence and context understanding. They still need improvement, however, to stay relevant and avoid errors. We propose a multi-LLM correction approach for meeting summaries that mimics human review in two phases: mistake identification and summary refinement. We release QMSum Mistake, a dataset of 200 automatically generated meeting summaries annotated by humans for nine error types, including structural, omission, and irrelevance errors. Experiments show that LLMs can identify these errors accurately. We turn the identified mistakes into actionable feedback to improve summary quality in terms of relevance, informativeness, conciseness, and coherence. This post-hoc refinement effectively improves summary quality by using multiple LLMs to validate the output. Our multi-LLM approach to meeting summarization has potential applications in complex text generation tasks that require robustness, action planning, and goal orientation.|\n", 
"2407.11888": "|**2024-07-16**|**Ascend-CC: Confidential Computing on Heterogeneous NPU for Emerging Generative AI Workloads**|Aritra Dhar et.al.|[2407.11888](http://arxiv.org/abs/2407.11888)|null|Generative AI based on large language models (LLMs) dominates cloud workloads. Specialized hardware accelerators such as GPUs, NPUs, and TPUs outperform general-purpose CPUs in AI applications. AI models and data are often highly sensitive and come from mutually distrusting parties. Existing CPU-based trusted execution environments (TEEs), such as Intel SGX or AMD SEV, do not provide sufficient protection. Device-centric TEEs such as Nvidia-CC address only tightly coupled CPU-GPU systems, use a proprietary solution, and require a TEE on the host CPU. Existing academic proposals, on the other hand, mostly target specific CPU-TEE platforms. 
To fill this gap, we propose Ascend-CC, a confidential computing architecture based on discrete NPUs that requires no trust in the host system. Ascend-CC provides strong security by ensuring data and model encryption, protecting the data, the model parameters, and the operator binaries. It uses delegation-based memory semantics to ensure isolation from the host software stack, and task attestation provides strong guarantees of model integrity. Our Ascend-CC implementation and an evaluation with state-of-the-art LLMs such as Llama2 and Llama3 show that Ascend-CC introduces minimal overhead and requires no changes to the AI software stack.|\n", "2407.11852": "|**2024-07-16**|**Schema Matching with Large Language Models: an Experimental Study**|Marcel Parciak 
et.al.|[2407.11852](http://arxiv.org/abs/2407.11852)|**[link](https://github.com/uhasselt-dsi-data-systems-lab/code-schema-matching-llms-artefacs)**|**This paper explores the use of large language models (LLMs) for schema matching in relational databases. The goal is to find semantic correspondences between two relational schemas using only element names and descriptions. The authors build a benchmark from the health domain and propose different task scopes, that is, ways of prompting the model for schema matching with varying amounts of context information. They compare LLM-based matching with a string-similarity baseline, examining matching quality, verification effort, decisiveness, and complementarity. They find that too little context lowers matching quality and that too much context also has a negative effect. Newer LLM versions generally improve decisiveness. For some task scopes the verification effort is moderate while a large number of true semantic matches are still identified. The results suggest that LLMs have potential as an initial tool for schema matching, letting data engineers match quickly from names and descriptions alone, without relying on actual data instances.**|\n", 
"2407.11833": "|**2024-07-16**|**LoFTI: Localization and Factuality Transfer to Indian Locales**|Sona Elza Simon et.al.|[2407.11833](http://arxiv.org/abs/2407.11833)|**[link](https://github.com/csalt-research/lofti)**|**Large language models (LLMs) accumulate vast world knowledge by training on large web-crawled datasets. These datasets, however, tend to skew toward English-speaking Western countries, so LLMs give biased or hallucinated answers to localized queries from other regions, India in particular. We introduce LoFTI (Localization and Factuality Transfer to Indian Locales), a new benchmark for evaluating the localization and factual text transfer capabilities of LLMs. LoFTI contains factual statements about entities in source locations worldwide and target locations in India (at the country, state, and city levels), covering a wide range of topics. We use LoFTI to evaluate Mixtral, GPT-4, and two Mixtral-based approaches suited to localized factual transfer. Experiments show that LoFTI is a high-quality evaluation benchmark and that all models, including GPT-4, produce skewed results across levels of locality.**|\n", "2407.11827": "|**2024-07-16**|**GPT Assisted Annotation of Rhetorical and Linguistic Features for Interpretable Propaganda Technique Detection in News 
Text**|Kyle Hamilton et.al.|[2407.11827](http://arxiv.org/abs/2407.11827)|null|While machine learning for detecting propaganda techniques in text has attracted wide interest, most approaches are black-box solutions with opaque inner workings. Interpretable approaches offer an alternative, but they depend on careful feature engineering and costly expert-annotated data. In addition, the linguistic features of persuasive text are usually studied by rhetoricians or linguists, and no dataset labeled with such features is suitable for machine learning. This study codifies 22 rhetorical and linguistic features identified in the literature, with the aim of annotating an existing dataset already labeled with propaganda techniques. To help human experts annotate natural-language sentences with these features, we designed RhetAnn, a web application built to reduce an otherwise considerable cognitive load. Then, using a small set of annotated data, we fine-tuned GPT-3.5, a generative large language model (LLM), to annotate the remaining data while balancing cost and classification accuracy. The study shows that combining a few human-annotated examples with GPT can scale the annotation process at roughly one tenth of the cost of annotation that relies on human experts alone. The results are on par with the best-performing model at the time of writing (GPT-4) at 10x lower cost. Our contributions are a machine-readable set of the features with their properties, definitions, and examples, along with the RhetAnn code, the GPT prompts, and the fine-tuning procedure, all of which advance the state of the art in interpretable propaganda technique detection.|\n", 
"2407.11798": "|**2024-07-16**|**PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation**|Branden Butler et.al.|[2407.11798](http://arxiv.org/abs/2407.11798)|null|Inference of large language models (LLMs) across distributed computer clusters has become a research focus in recent years, and many acceleration techniques borrow from CPU speculative execution. These techniques relieve memory-bandwidth bottlenecks but increase the end-to-end latency of each inference run, so they need high speculation acceptance rates to improve performance. Because acceptance rates vary across tasks, speculative inference can end up reducing performance. In addition, pipeline-parallel designs need many user requests to maintain high utilization. To address these issues we propose PipeInfer, a pipelined speculative acceleration technique that reduces inter-token latency and improves system utilization in single-request scenarios, while also improving tolerance of low speculation acceptance rates and low-bandwidth interconnects. PipeInfer achieves its gains through continuous asynchronous speculation and early inference cancellation. Continuous asynchronous speculation runs single-token inference simultaneously with several speculative runs, which lowers latency and raises generation speed. Early inference cancellation skips the computation of invalidated runs, even in the middle of inference, further improving speed and latency. PipeInfer improves generation speed by up to 2.15x over standard speculative inference.|\n", 
"2407.11789": "|**2024-07-16**|**Large Language Models as Misleading Assistants in Conversation**|Betty Li Hou 
et.al.|[2407.11789](http://arxiv.org/abs/2407.11789)|null|Large language models (LLMs) can assist with a wide range of information-seeking tasks. Model outputs may, however, mislead users, whether unintentionally or deliberately. We investigate the ability of LLMs to provide deceptive assistance in a reading comprehension task, using LLMs as proxies for human users. We compare three settings: (1) the model is prompted to provide truthful information, (2) the model is prompted to be subtly misleading, and (3) the model is prompted to argue for an incorrect answer. The results show that GPT-4 can effectively mislead both GPT-3.5-Turbo and GPT-4 itself, with deceptive assistants reducing task accuracy by up to 23% compared with truthful assistants. We also find that giving the user model more context partially offsets the influence of the deceptive model. This work highlights the ability of LLMs to produce misleading information and its potential effects in real-world situations.|\n", "2407.12735": "|**2024-07-17**|**EchoSight: Advancing Visual-Language Models with Wiki Knowledge**|Yibin Yan et.al.|[2407.12735](http://arxiv.org/abs/2407.12735)|null|**Abstract:** 
Knowledge-based visual question answering (KVQA) requires answering questions about an image using extensive background knowledge, which generative models often struggle with. We introduce EchoSight, a novel multimodal Retrieval-Augmented Generation (RAG) framework that helps large language models (LLMs) answer visual questions requiring detailed encyclopedic knowledge. EchoSight first searches Wikipedia articles using image information only, then reranks the candidate articles by their relevance to the combined text-image query, which substantially improves the integration of multimodal knowledge and, in turn, retrieval quality and answer accuracy. Experiments on the Encyclopedic VQA and InfoSeek datasets show that EchoSight sets a new state of the art in knowledge-based VQA, reaching 41.8% accuracy on Encyclopedic VQA and 31.3% on InfoSeek.|\n", "2407.12727": "|**2024-07-17**|**NL2Contact: Natural Language Guided 3D Hand-Object Contact Modeling with Diffusion Model**|Zhongqun Zhang et.al.|[2407.12727](http://arxiv.org/abs/2407.12727)|null|Background: 
In 3D hand-object reconstruction, modeling the physical contact between hand and object is standard practice for refining hand pose estimates and generating novel human grasps. Existing methods, however, rely on geometric constraints that are hard to specify or control. This paper introduces a new task: controllable 3D hand-object contact modeling from natural language descriptions. The challenges are (i) the complexity of cross-modal modeling from language to contact, and (ii) the lack of text data describing contact patterns. To address them, we design NL2Contact, a model that generates controllable contacts using staged diffusion models. Given a natural language description of the hand and the contact, NL2Contact generates realistic and faithful 3D hand-object contacts. Task: 
We develop NL2Contact to generate controllable 3D hand-object contacts from natural language descriptions. To train it, we build ContactDescribe, the first dataset of its kind, containing rich and diverse hand-centered contact descriptions generated from carefully designed prompts (e.g., grasp action, grasp type, contact location, and free finger status). We show applications of the model to grasp pose optimization and to generating novel human grasps from text descriptions.|\n", "2407.12725": "|**2024-07-17**|**Is Sarcasm Detection A Step-by-Step Reasoning Process in Large Language Models?**|Ben Yao 
et.al.|[2407.12725](http://arxiv.org/abs/2407.12725)|null|Elaborating a series of intermediate reasoning steps significantly improves the ability of large language models (LLMs) to solve complex problems, because such steps prompt the model to think sequentially. Human sarcasm understanding, however, is often regarded as an intuitive and holistic cognitive process that integrates linguistic, contextual, and emotional cues into a comprehensive understanding of the speaker's true intent, and it is argued not to be limited to step-by-step reasoning. To test this view, we introduce SarcasmCue, a new prompting framework comprising four prompting strategies: chain of contradiction (CoC), graph of cues (GoC), bagging of cues (BoC), and tensor of cues (ToC). These strategies guide LLMs to detect human sarcasm by considering both sequential and non-sequential prompting. A comprehensive empirical comparison on four benchmark datasets shows that our four prompting methods clearly outperform standard input-output prompting, CoT, and ToT, and that non-sequential prompting generally outperforms sequential prompting.|\n", "2407.12723": "|**2024-07-17**|**The Future of Learning: Large Language Models through the Lens of Students**|He Zhang 
et.al.|[2407.12723](http://arxiv.org/abs/2407.12723)|null|As large language models (LLMs) continue to evolve, their gains in performance and expanding functionality are having a marked impact on education. In this study we interviewed 14 students about their everyday interactions with ChatGPT. Preliminary findings show that students benefit from ChatGPT's efficiency for learning and information seeking while also experiencing a crisis of trust and ethical concerns. They perceive ChatGPT as more human-like than traditional AI. This dilemma, marked by mixed emotions, inconsistent behavior, and an overall positive attitude among students, underscores ChatGPT's potential value in education. We note, however, that such advanced intelligence may also bring adverse consequences. We therefore stress the need for caution in its application and for reducing potential harms in future development.|\n", "2407.12709": "|**2024-07-17**|**MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models**|Leyang Shen 
et.al.|[2407.12709](http://arxiv.org/abs/2407.12709)|**[link](https://github.com/jiutian-vl/mome)**|**Multimodal large language models (MLLMs) have shown impressive capabilities across a variety of vision-language (VL) tasks. A generalist MLLM, however, typically underperforms specialist MLLMs on most VL tasks, which can be attributed to task interference. In this paper we propose a mixture of multimodal experts (MoME) architecture to mitigate task interference and obtain a generalist MLLM. MoME has two key components: a mixture of vision experts (MoVE) and a mixture of language experts (MoLE). MoVE adaptively modulates the features coming from different vision encoders and is broadly compatible with different transformation architectures. MoLE incorporates sparsely gated experts into the language model, improving performance at almost no additional cost. To counter task interference, MoME specializes in both the vision and language modalities so that it can adapt to differences between tasks. Extensive experiments show that MoME significantly improves the performance of generalist MLLMs across VL tasks. The source code is released at https://github.com/JiuTian-VL/MoME.**|\n", "2407.12665": "|**2024-07-17**|**Patch-Level Training for Large 
Language Models**|Chenze Shao et.al.|[2407.12665](http://arxiv.org/abs/2407.12665)|**[link](https://github.com/shaochenze/patchtrain)**|**As large language models (LLMs) make remarkable progress in language understanding and generation, their training efficiency has become a critical concern. LLMs are traditionally trained to predict the next token in a sequence. Despite its success, token-level training is computationally expensive because of the large number of tokens that must be processed. This paper introduces patch-level training, which shortens the sequence by compressing multiple tokens into a single patch. During patch-level training, the model is fed shorter sequences of patches and learns to predict the next patch, so the bulk of the training data is processed at a much lower cost. The model then continues with token-level training on the remaining data to align with inference mode. Experiments on models of various sizes (370M-2.7B parameters) show that patch-level training reduces overall computational cost to 0.5x without compromising model performance. Source code: https://github.com/shaochenze/PatchTrain.**|\n", "2407.12642": "|**2024-07-17**|**Zero-shot Text-guided Infinite 
Image Synthesis with LLM guidance**|Soyeong Kwon et.al.|[2407.12642](http://arxiv.org/abs/2407.12642)|null|**Background:** Text-guided image editing and generation methods have a wide range of real-world applications. Text-guided infinite image synthesis, however, faces several challenges. First, there is a lack of text-image paired datasets that are both high-resolution and contextually diverse. Second, expanding an image based on text requires global coherence and a rich understanding of local context. Previous work has focused mainly on limited categories, such as natural landscapes, and has required training on high-resolution images with paired text. To address these challenges, we propose a novel approach that uses large language models (LLMs) for both global coherence and local context understanding, without any high-resolution text-image paired training data. **Method:** 
We train a diffusion model to expand an image conditioned on the global and local captions generated by the LLM, together with visual features. At inference time, given an image and a global caption, we use the LLM to generate the next local caption for expanding the input image. We then expand the image using the global caption, the generated local caption, and the visual features, which keeps the result globally consistent while accounting for spatial local context. **Results:** Experiments show that our model outperforms the baselines both quantitatively and qualitatively. The model also demonstrates text-guided generation of arbitrary-sized images in a zero-shot manner with LLM guidance. Summary: The paper presents an LLM-guided method for text-guided image expansion that needs no high-resolution paired data, achieves global coherence and local context understanding, performs well in experiments, and supports zero-shot generation of arbitrary-sized images.|\n", "2407.12620": "|**2024-07-17**|**Harnessing the Power of Artificial Intelligence to Vitalize Endangered Indigenous Languages: Technologies and Experiences**|Claudio Pinhanez 
et.al.|[2407.12620](http://arxiv.org/abs/2407.12620)|null|Since 2022 we have been exploring how artificial intelligence (AI) and modern natural language processing (NLP), large language models (LLMs) in particular, can be applied to support and promote the use and documentation of endangered Indigenous languages. We begin with the decline of the world's linguistic diversity and discuss the unique ethical challenges of working with Indigenous languages. In response to these challenges, we propose a new AI development cycle based on community engagement and usage. We then report encouraging results in developing high-quality machine translation for Indigenous languages by fine-tuning state-of-the-art translators with small amounts of data, and discuss how to avoid some common pitfalls in the process. We also present prototypes built in projects with Indigenous communities in Brazil in 2023 and 2024, aimed at simplifying writing, and discuss the development of Indigenous Language Models (ILMs) as a replicable and scalable way to create spell-checkers, next-word predictors, and similar tools. Finally, we envision a future in which endangered languages are preserved through interactive language models.|\n", 
"2407.12613": "|**2024-07-17**|**AudienceView: AI-Assisted Interpretation of Audience Feedback in Journalism**|William Brannon et.al.|[2407.12613](http://arxiv.org/abs/2407.12613)|**[link](https://github.com/mit-ccc/AudienceView-demo)**|**Understanding and making use of audience feedback is important for journalists, but the sheer volume of audience comments online makes this a daunting task. We introduce AudienceView, an online tool that uses large language models (LLMs) to help journalists categorize and interpret this feedback. AudienceView identifies themes and topics, connects them back to specific comments, shows the sentiment and distribution of the comments, and helps users develop ideas for follow-up reporting. We consider how such tools can fit into a journalist's workflow, and emphasize the importance of contextual awareness and human judgment.**|\n", "2407.12580": "|**2024-07-17**|**E5-V: Universal Embeddings with Multimodal Large Language Models**|Ting Jiang et.al.|[2407.12580](http://arxiv.org/abs/2407.12580)|**[link](https://github.com/kongds/e5-v)**|**Background: 
\u5927\u89c4\u6a21\u591a\u6a21\u6001\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u5728\u901a\u7528\u89c6\u89c9\u548c\u8bed\u8a00\u7406\u89e3\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\u3002\u7136\u800c\uff0c\u5982\u4f55\u5229\u7528MLLMs\u5904\u7406\u591a\u6a21\u6001\u4fe1\u606f\u7684\u8868\u793a\u65b9\u5f0f\u5c1a\u672a\u5145\u5206\u7814\u7a76\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u6846\u67b6E5-V\uff0c\u65e8\u5728\u4f7fMLLMs\u9002\u5e94\u5b9e\u73b0\u901a\u7528\u591a\u6a21\u6001\u5d4c\u5165\u3002\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0c\u4e0e\u5148\u524d\u65b9\u6cd5\u76f8\u6bd4\uff0cMLLMs\u5728\u5904\u7406\u591a\u6a21\u6001\u8f93\u5165\u65b9\u9762\u5c55\u73b0\u51fa\u5de8\u5927\u6f5c\u529b\u3002\u901a\u8fc7\u7ed3\u5408\u63d0\u793a\uff0cE5-V\u6709\u6548\u5730\u5f25\u5408\u4e86\u4e0d\u540c\u7c7b\u578b\u8f93\u5165\u4e4b\u95f4\u7684\u6a21\u6001\u9e3f\u6c9f\uff0c\u5373\u4f7f\u5728\u65e0\u9700\u5fae\u8c03\u7684\u60c5\u51b5\u4e0b\u4e5f\u80fd\u8868\u73b0\u51fa\u5f3a\u5927\u7684\u591a\u6a21\u6001\u5d4c\u5165\u80fd\u529b\u3002 E5-V\u91c7\u7528\u5355\u4e00\u6a21\u6001\u8bad\u7ec3\u7b56\u7565\uff0c\u4ec5\u4f7f\u7528\u6587\u672c\u5bf9\u8fdb\u884c\u8bad\u7ec3\uff0c\u8fd9\u76f8\u8f83\u4e8e\u4f20\u7edf\u7684\u57fa\u4e8e\u56fe\u50cf-\u6587\u672c\u5bf9\u7684\u591a\u6a21\u6001\u8bad\u7ec3\uff0c\u663e\u8457\u63d0\u9ad8\u4e86\u6027\u80fd\uff0c\u540c\u65f6\u964d\u4f4e\u4e86\u5927\u7ea695%\u7684\u8bad\u7ec3\u6210\u672c\uff0c\u907f\u514d\u4e86\u6536\u96c6\u6602\u8d35\u7684\u591a\u6a21\u6001\u8bad\u7ec3\u6570\u636e\u7684\u9700\u6c42\u3002\u5b9e\u9a8c\u5728\u56db\u79cd\u4efb\u52a1\u4e0a\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u9a8c\u8bc1\uff0c\u4ee5\u5c55\u793aE5-V\u7684\u6709\u6548\u6027\u3002 
\u4f5c\u4e3a\u4e00\u6b3e\u901a\u7528\u591a\u6a21\u6001\u6a21\u578b\uff0cE5-V\u4e0d\u4ec5\u5728\u5404\u4efb\u52a1\u4e2d\u5b9e\u73b0\u4e86\u9876\u5c16\u6027\u80fd\uff0c\u751a\u81f3\u5728\u67d0\u4e9b\u60c5\u51b5\u4e0b\u8d85\u8d8a\u4e86\u73b0\u6709\u6280\u672f\u6c34\u5e73\uff0c\u6240\u6709\u8fd9\u4e9b\u90fd\u662f\u57fa\u4e8e\u5355\u6a21\u6001\u8bad\u7ec3\u5b8c\u6210\u7684\u3002**|\n", "2407.13761": "|**2024-07-18**|**SegPoint: Segment Any Point Cloud via Large Language Model**|Shuting He et.al.|[2407.13761](http://arxiv.org/abs/2407.13761)|null|\u5c3d\u7ba1\u4e09\u7ef4\u70b9\u4e91\u5206\u5272\u9886\u57df\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u5c55\uff0c\u4f46\u73b0\u6709\u7684\u65b9\u6cd5\u4e3b\u8981\u9488\u5bf9\u7279\u5b9a\u4efb\u52a1\uff0c\u4f9d\u8d56\u4e8e\u660e\u786e\u7684\u6307\u4ee4\u6765\u8bc6\u522b\u76ee\u6807\uff0c\u7f3a\u4e4f\u5728\u7edf\u4e00\u6846\u67b6\u4e2d\u7406\u89e3\u548c\u63a8\u65ad\u7528\u6237\u9690\u542b\u610f\u56fe\u7684\u80fd\u529b\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aSegPoint\u7684\u6a21\u578b\uff0c\u5b83\u5229\u7528\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u63a8\u7406\u80fd\u529b\uff0c\u5728\u591a\u79cd\u4efb\u52a1\u4e0a\u8fdb\u884c\u70b9\u7ea7\u5206\u5272\uff1a1\uff09\u4e09\u7ef4\u6307\u4ee4\u5206\u5272\uff0c2\uff09\u4e09\u7ef4\u6307\u79f0\u5206\u5272\uff0c3\uff09\u4e09\u7ef4\u8bed\u4e49\u5206\u5272\uff0c\u4ee5\u53ca4\uff09\u4e09\u7ef4\u5f00\u653e\u8bcd\u6c47\u8bed\u4e49\u5206\u5272\u3002\u4e3a\u4e86\u63a8\u52a8\u4e09\u7ef4\u6307\u4ee4\u7814\u7a76\uff0c\u6211\u4eec\u521b\u5efa\u4e86\u4e00\u4e2a\u65b0\u7684\u57fa\u51c6Instruct3D\uff0c\u7528\u4e8e\u8bc4\u4f30\u4ece\u590d\u6742\u548c\u9690\u542b\u6307\u4ee4\u6587\u672c\u8fdb\u884c\u5206\u5272\u6027\u80fd\uff0c\u5305\u542b2,565\u4e2a\u70b9\u4e91-\u6307\u4ee4\u5bf9\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cSegPoint\u5728ScanRefer\u6307\u79f0\u5206\u5272\u548cScanNet\u8bed\u4e49\u5206\u5272\u7b49\u65e2\u6709\u57fa\u51c6\u4e0a\u8868\u73b0\u51fa
\u7ade\u4e89\u529b\uff0c\u540c\u65f6\u5728Instruct3D\u6570\u636e\u96c6\u4e0a\u7684\u8868\u73b0\u4f18\u5f02\u3002\u636e\u6211\u4eec\u6240\u77e5\uff0cSegPoint\u662f\u9996\u4e2a\u5728\u4e00\u4e2a\u6846\u67b6\u5185\u5904\u7406\u8fd9\u4e9b\u591a\u6837\u5316\u7684\u5206\u5272\u4efb\u52a1\u5e76\u8fbe\u5230\u6ee1\u610f\u6027\u80fd\u7684\u6a21\u578b\u3002|\n", "2407.13757": "|**2024-07-18**|**Black-Box Opinion Manipulation Attacks to Retrieval-Augmented Generation of Large Language Models**|Zhuo Chen et.al.|[2407.13757](http://arxiv.org/abs/2407.13757)|null|\u672c\u7814\u7a76\u5173\u6ce8\u4e8eRetrieval-Augmented Generation\uff08RAG\uff09\u6a21\u578b\u5728\u9762\u5bf9\u9ed1\u76d2\u653b\u51fb\u65f6\u7684\u8106\u5f31\u6027\uff0c\u5c24\u5176\u662f\u5728\u610f\u89c1\u64cd\u7eb5\u65b9\u9762\u7684\u5e94\u7528\u3002RAG\u65e8\u5728\u89e3\u51b3\u5927\u8bed\u8a00\u6a21\u578b\u7684\u5e7b\u89c9\u95ee\u9898\u548c\u5b9e\u65f6\u7ea6\u675f\uff0c\u4f46\u540c\u65f6\u4e5f\u66b4\u9732\u51fa\u5bf9\u6297\u68c0\u7d22\u7be1\u6539\u653b\u51fb\u7684\u5f31\u70b9\u3002\u5f53\u524d\u7684\u7814\u7a76\u4e3b\u8981\u96c6\u4e2d\u5728\u767d\u76d2\u548c\u5c01\u95ed\u9886\u57df\u95ee\u7b54\u4efb\u52a1\u4e2d\u7684RAG\u4e0d\u7a33\u5b9a\u6027\u3002\u672c\u6587\u7684\u76ee\u6807\u662f\u63ed\u793a\u5f53RAG\u6a21\u578b\u906d\u9047\u9ed1\u76d2\u653b\u51fb\u65f6\uff0c\u5bf9\u7528\u6237\u8ba4\u77e5\u548c\u51b3\u7b56\u7684\u5f71\u54cd\uff0c\u4ece\u800c\u4e3a\u63d0\u9ad8\u6a21\u578b\u7684\u53ef\u9760\u6027\u548c\u5b89\u5168\u6027\u63d0\u4f9b\u65b0\u89c1\u89e3\u3002 
\u6211\u4eec\u901a\u8fc7\u64cd\u63a7RAG\u4e2d\u68c0\u7d22\u6a21\u578b\u7684\u6392\u540d\u7ed3\u679c\uff0c\u5229\u7528\u8fd9\u4e9b\u64cd\u7eb5\u540e\u7684\u6570\u636e\u8bad\u7ec3\u4e00\u4e2a\u4ee3\u7406\u6a21\u578b\u3002\u63a5\u7740\uff0c\u91c7\u7528\u5bf9\u6297\u6027\u68c0\u7d22\u653b\u51fb\u65b9\u6cd5\u9488\u5bf9\u4ee3\u7406\u6a21\u578b\u5b9e\u65bd\u9ed1\u76d2\u8fc1\u79fb\u653b\u51fb\uff0c\u8fdb\u4e00\u6b65\u5f71\u54cdRAG\u7684\u751f\u6210\u8fc7\u7a0b\u3002\u5728\u6d89\u53ca\u591a\u4e2a\u4e3b\u9898\u7684\u610f\u89c1\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u5b9e\u9a8c\uff0c\u7ed3\u679c\u663e\u793a\uff0c\u63d0\u51fa\u7684\u653b\u51fb\u7b56\u7565\u80fd\u663e\u8457\u6539\u53d8RAG\u751f\u6210\u5185\u5bb9\u7684\u89c2\u70b9\u6781\u6027\uff0c\u8fd9\u63ed\u793a\u4e86\u6a21\u578b\u7684\u6613\u53d7\u653b\u51fb\u6027\uff0c\u5e76\u4e14\u6f5c\u5728\u5730\u6307\u51fa\u5bf9\u7528\u6237\u8ba4\u77e5\u548c\u51b3\u7b56\u7684\u8d1f\u9762\u5f71\u54cd\uff0c\u4f7f\u5f97\u8bef\u5bfc\u7528\u6237\u63a5\u53d7\u9519\u8bef\u6216\u6709\u504f\u89c1\u7684\u4fe1\u606f\u53d8\u5f97\u66f4\u52a0\u5bb9\u6613\u3002|\n", "2407.13742": "|**2024-07-18**|**CellularLint: A Systematic Approach to Identify Inconsistent Behavior in Cellular Network Specifications**|Mirza Masfiqur Rahman et.al.|[2407.13742](http://arxiv.org/abs/2407.13742)|null|\u8fd1\u5e74\u6765\uff0c\u4eba\u4eec\u8d8a\u6765\u8d8a\u5173\u6ce8\u8702\u7a9d\u7f51\u7edc\u7684\u5b89\u5168\u6027\uff0c\u5e38\u5e38\u5c06\u5b89\u5168\u6f0f\u6d1e\u5f52\u548e\u4e8e\u5e95\u5c42\u534f\u8bae\u8bbe\u8ba1\u63cf\u8ff0\u7684\u95ee\u9898\u3002\u8fd9\u4e9b\u901a\u5e38\u957f\u8fbe\u6570\u5343\u9875\u7684\u8be6\u7ec6\u89c4\u683c\u6587\u6863\u53ef\u80fd\u5305\u542b\u9519\u8bef\u3001\u4e0d\u5b8c\u6574\u63cf\u8ff0\u3001\u9690\u542b\u5047\u8bbe\u548c\u5185\u90e8\u77db\u76fe\u3002\u9274\u4e8e\u6b64\uff0c\u6211\u4eec\u63d0\u51faCellularLint\u2014\u2014\u4e00\u4e2a\u9488\u5bf94G\u548c5G\u975e\u63a5\u5165\u5c42\uff08Non-Access 
Stratum\uff0cNAS\uff09\u548c\u5b89\u5168\u89c4\u8303\u7684\u534a\u81ea\u52a8\u6846\u67b6\uff0c\u5229\u7528\u4e00\u5957\u81ea\u7136\u8bed\u8a00\u5904\u7406\u6280\u672f\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u57fa\u4e8e\u9886\u57df\u9002\u5e94\u7684\u5927\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u6539\u826f\u7684\u5c11\u91cf\u6837\u4f8b\u5b66\u4e60\u3002\u8be5\u6a21\u578b\u9884\u8bad\u7ec3\u5728\u5927\u91cf\u7684\u8702\u7a9d\u7f51\u7edc\u534f\u8bae\u6570\u636e\u4e0a\uff0c\u80fd\u591f\u540c\u65f6\u68c0\u6d4b\u4e0d\u540c\u8bed\u4e49\u5c42\u6b21\u548c\u5b9e\u9645\u4f7f\u7528\u6848\u4f8b\u4e2d\u7684\u4e0d\u4e00\u81f4\u6027\uff0c\u4ee5\u4e00\u79cd\u53ef\u6269\u5c55\u7684\u65b9\u5f0f\u63d0\u5347\u534f\u8bae\u89c4\u683c\u7684\u81ea\u52a8\u5316\u5206\u6790\u3002\u901a\u8fc7\u7814\u7a76\uff0c\u6211\u4eec\u57284G\u548c5G\u7f51\u7edc\u4e2d\u53d1\u73b0\u4e86157\u4e2a\u4e0d\u4e00\u81f4\u70b9\uff0c\u51c6\u786e\u7387\u4e3a82.67%\u3002\u7ecf\u8fc7\u5bf9\u5f00\u6e90\u5b9e\u73b0\u548c17\u6b3e\u5546\u7528\u8bbe\u5907\u7684\u9a8c\u8bc1\uff0c\u6211\u4eec\u786e\u8ba4\u8fd9\u4e9b\u4e0d\u4e00\u81f4\u786e\u5b9e\u5bf9\u8bbe\u8ba1\u51b3\u7b56\u6709\u91cd\u5927\u5f71\u54cd\uff0c\u53ef\u80fd\u5bfc\u81f4\u9690\u79c1\u3001\u5b8c\u6574\u6027\u3001\u53ef\u7528\u6027\u548c\u4e92\u64cd\u4f5c\u6027\u65b9\u9762\u7684\u62c5\u5fe7\u3002|\n", "2407.13729": "|**2024-07-18**|**Baba Is AI: Break the Rules to Beat the Benchmark**|Nathan Cloos et.al.|[2407.13729](http://arxiv.org/abs/2407.13729)|null|\u4eba\u7c7b\u89e3\u51b3\u95ee\u9898\u65e2\u4f9d\u8d56\u4e8e\u9075\u5faa\u73b0\u6709\u89c4\u5219\u548c\u7a0b\u5e8f\uff0c\u4e5f\u4f9d\u8d56\u4e8e\u521b\u65b0\u601d\u7ef4\u6765\u91cd\u65b0\u5b9a\u4e49\u89c4\u5219\u548c\u76ee\u6807\u3002\u4e3a\u4e86\u68c0\u9a8c\u8fd9\u4e9b\u80fd\u529b\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u65b0\u7684\u57fa\u51c6\uff0c\u5b83\u57fa\u4e8e\u6e38\u620f\u300aBaba Is 
You\u300b\u3002\u5728\u8fd9\u4e2a\u6e38\u620f\u4e2d\uff0c\u4ee3\u7406\u9700\u8981\u64cd\u63a7\u73af\u5883\u4e2d\u7684\u7269\u4f53\u548c\u53ef\u79fb\u52a8\u7684\u6587\u5b57\u89c4\u5219\u74f7\u7816\uff0c\u4ee5\u5b9e\u73b0\u7279\u5b9a\u76ee\u6807\u5e76\u8d62\u5f97\u6bd4\u8d5b\u3002\u6211\u4eec\u6d4b\u8bd5\u4e86\u4e09\u79cd\u6700\u5148\u8fdb\u7684\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08OpenAI GPT-4\u3001Google Gemini-1.5-Pro\u548cGemini-1.5-Flash\uff09\uff0c\u53d1\u73b0\u5f53\u9700\u8981\u5bf9\u6e38\u620f\u89c4\u5219\u8fdb\u884c\u64cd\u7eb5\u548c\u7ec4\u5408\u65f6\uff0c\u5b83\u4eec\u7684\u8868\u73b0\u5927\u5e45\u4e0b\u6ed1\u3002|\n", "2407.13717": "|**2024-07-18**|**CoDefeater: Using LLMs To Find Defeaters in Assurance Cases**|Usman Gohar et.al.|[2407.13717](http://arxiv.org/abs/2407.13717)|**[link](https://gitlab.com/anonymousdot/codefeater)**|\u6784\u5efa\u4fdd\u8bc1\u6848\u4f8b\u662f\u4e00\u79cd\u5e38\u7528\u4e14\u6709\u65f6\u5fc5\u8981\u7684\u65b9\u6cd5\uff0c\u7528\u4e8e\u8bc1\u660e\u5b89\u5168\u5173\u952e\u7cfb\u7edf\u5728\u5176\u89c4\u5212\u73af\u5883\u4e2d\u5c06\u5b89\u5168\u8fd0\u884c\u3002\u4e3a\u4e86\u964d\u4f4e\u9519\u8bef\u548c\u8fb9\u7f18\u60c5\u51b5\u9057\u6f0f\u7684\u98ce\u9669\uff0c\u5f15\u5165\u4e86\u201c\u53cd\u9a73\u201d\u6982\u5ff5\uff0c\u5373\u6311\u6218\u4fdd\u8bc1\u6848\u4f8b\u4e2d\u8bba\u70b9\u6216\u8bc1\u636e\u7684\u8bba\u636e\u3002\u53cd\u9a73\u6709\u52a9\u4e8e\u53ca\u65f6\u53d1\u73b0\u8bba\u70b9\u4e2d\u7684\u5f31\u70b9\uff0c\u4fc3\u4f7f\u8fdb\u4e00\u6b65\u8c03\u67e5\u548c\u53ca\u65f6\u8865\u6551\u3002\u7136\u800c\uff0c\u6355\u6349\u53cd\u9a73\u4f9d\u8d56\u4e8e\u4e13\u5bb6\u5224\u65ad\u3001\u7ecf\u9a8c\u548c\u521b\u65b0\u601d\u7ef4\uff0c\u5e76\u4e14\u5fc5\u987b\u968f\u7740\u9700\u6c42\u548c\u6cd5\u89c4\u7684\u53d8\u5316\u8fdb\u884c\u8fed\u4ee3\u3002\u8fd9\u7bc7\u8bba\u6587\u63d0\u51faCoDefeater\uff0c\u4e00\u79cd\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u6765\u81ea\u52a8\u5bfb\u627e\u53cd\u9a73\u7684\u8
1ea\u52a8\u5316\u8fc7\u7a0b\u3002\u521d\u6b65\u7ed3\u679c\u8868\u660e\uff0cLLMs\u80fd\u591f\u6709\u6548\u5730\u627e\u5230\u5df2\u77e5\u548c\u672a\u77e5\u7684\u5408\u7406\u53cd\u9a73\uff0c\u4ece\u800c\u5e2e\u52a9\u5b89\u5168\u5206\u6790\u5e08\u589e\u5f3a\u4fdd\u8bc1\u6848\u4f8b\u7684\u5b8c\u6574\u6027\u548c\u4fe1\u5fc3\u3002|\n", "2407.13709": "|**2024-07-18**|**Understanding Reference Policies in Direct Preference Optimization**|Yixin Liu et.al.|[2407.13709](http://arxiv.org/abs/2407.13709)|**[link](https://github.com/yale-nlp/refdpo)**|\u76f4\u63a5\u504f\u597d\u4f18\u5316\uff08Direct Preference Optimization\uff0c\u7b80\u79f0 DPO\uff09\u5df2\u6210\u4e3a\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08Large Language Models\uff0cLLMs\uff09\u6307\u4ee4\u5fae\u8c03\u7684\u5e38\u7528\u8bad\u7ec3\u65b9\u6cd5\u3002\u672c\u7814\u7a76\u5173\u6ce8DPO\u7684\u4e00\u4e2a\u672a\u5145\u5206\u63a2\u8ba8\u7684\u65b9\u9762\uff1a\u5176\u5bf9\u53c2\u8003\u6a21\u578b\u6216\u7b56\u7565\u7684\u4f9d\u8d56\u6027\u3002\u8fd9\u4e9b\u53c2\u8003\u7b56\u7565\u901a\u5e38\u8868\u73b0\u4e3a\u5f85\u8fdb\u4e00\u6b65\u5fae\u8c03\u7684\u6a21\u578b\uff0c\u5b83\u4eec\u5bf9\u4e8eDPO\u7684\u6548\u679c\u81f3\u5173\u91cd\u8981\u3002\u56e0\u6b64\uff0c\u672c\u5de5\u4f5c\u9488\u5bf9\u4ee5\u4e0b\u4e09\u4e2a\u76f8\u5173\u95ee\u9898\u8fdb\u884c\u4e86\u63a2\u7a76\uff1a 1. \u9996\u5148\uff0c\u6211\u4eec\u7814\u7a76\u4e86DPO\u4e2d\u7684KL\u6563\u5ea6\u7ea6\u675f\u5f3a\u5ea6\u7684\u6700\u4f73\u9009\u62e9\uff0c\u8be5\u7ea6\u675f\u60e9\u7f5a\u4e0e\u53c2\u8003\u7b56\u7565\u7684\u504f\u79bb\uff0c\u53d1\u73b0DPO\u5bf9\u6b64\u654f\u611f\u3002 2. \u5176\u6b21\uff0c\u6211\u4eec\u4ece\u7406\u8bba\u548c\u5b9e\u8bc1\u4e0a\u6bd4\u8f83\u4e86DPO\u4e0e\u5176\u4ed6\u5b66\u4e60\u76ee\u6807\uff0c\u4ee5\u63a2\u8ba8\u53c2\u8003\u7b56\u7565\u5728\u6307\u4ee4\u5fae\u8c03\u4e2d\u7684\u5fc5\u8981\u6027\uff0c\u5e76\u663e\u793a\u4e86DPO\u7684\u4f18\u52bf\u3002 3. 
\u6700\u540e\uff0c\u6211\u4eec\u63a2\u8ba8\u4e86\u66f4\u5f3a\u7684\u53c2\u8003\u7b56\u7565\u662f\u5426\u6709\u5229\u4e8eDPO\uff0c\u7ed3\u679c\u8868\u660e\uff0c\u5f53\u53c2\u8003\u7b56\u7565\u4e0e\u88ab\u5fae\u8c03\u6a21\u578b\u76f8\u4f3c\u65f6\uff0c\u66f4\u5f3a\u7684\u53c2\u8003\u7b56\u7565\u53ef\u80fd\u4f1a\u63d0\u9ad8\u6027\u80fd\u3002 \u6211\u4eec\u7684\u53d1\u73b0\u63ed\u793a\u4e86\u53c2\u8003\u7b56\u7565\u5728DPO\u4e2d\u7684\u6df7\u6dc6\u4f5c\u7528\uff0c\u63d0\u4f9b\u4e86\u6700\u4f73\u5b9e\u8df5\u7684\u89c1\u89e3\uff0c\u540c\u65f6\u4e5f\u4e3a\u672a\u6765\u7814\u7a76\u63d0\u51fa\u4e86\u5f00\u653e\u6027\u95ee\u9898\u3002|\n", "2407.13699": "|**2024-07-18**|**A Comprehensive Review of Recommender Systems: Transitioning from Theory to Practice**|Shaina Raza et.al.|[2407.13699](http://arxiv.org/abs/2407.13699)|null|\u63a8\u8350\u7cfb\u7edf\uff08RS\uff09\u901a\u8fc7\u63d0\u4f9b\u4e2a\u6027\u5316\u9879\u76ee\u5efa\u8bae\uff0c\u5bf9\u63d0\u5347\u7528\u6237\u4f53\u9a8c\u81f3\u5173\u91cd\u8981\u3002\u672c\u7efc\u8ff0\u56de\u987e\u4e86\u4ece2017\u5e74\u81f32024\u5e74\u95f4RS\u9886\u57df\u7684\u8fdb\u5c55\uff0c\u5c06\u7406\u8bba\u521b\u65b0\u4e0e\u5b9e\u9645\u5e94\u7528\u7d27\u5bc6\u7ed3\u5408\u3002\u6211\u4eec\u63a2\u8ba8\u4e86\u4ece\u4f20\u7edf\u65b9\u6cd5\u5982\u57fa\u4e8e\u5185\u5bb9\u548c\u534f\u540c\u8fc7\u6ee4\u7684\u63a8\u8350\uff0c\u5230\u9ad8\u7ea7\u6280\u672f\u5982\u6df1\u5ea6\u5b66\u4e60\u3001\u56fe\u6a21\u578b\u3001\u5f3a\u5316\u5b66\u4e60\u4ee5\u53ca\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u53d1\u5c55\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5173\u6ce8\u4e86\u4e13\u95e8\u5316\u7684\u7cfb\u7edf\uff0c\u5982\u4e0a\u4e0b\u6587\u611f\u77e5\u3001\u8bc4\u8bba\u9a71\u52a8\u548c\u516c\u5e73\u6027\u8003\u91cf\u7684RS\u3002\u672c\u8c03\u67e5\u7684\u76ee\u6807\u662f\u8fde\u63a5\u7406\u8bba\u4e0e\u5b9e\u8df5\uff0c\u5173\u6ce8\u7535\u5b50\u5546\u52a1\u3001\u533b\u7597\u4fdd\u5065\u548c\u91d1\u878d\u7b49\u9886\u57df\u7684\u6311\u6218\uff0c\u5f3a\u8c03\u5
bf9\u53ef\u6269\u5c55\u3001\u5b9e\u65f6\u4e14\u503c\u5f97\u4fe1\u8d56\u89e3\u51b3\u65b9\u6848\u7684\u9700\u6c42\u3002\u901a\u8fc7\u6b64\u7efc\u8ff0\uff0c\u6211\u4eec\u9f13\u52b1\u5b66\u672f\u7814\u7a76\u4e0e\u884c\u4e1a\u5b9e\u8df5\u7684\u7d27\u5bc6\u5408\u4f5c\u3002\u672c\u7814\u7a76\u63d0\u4f9b\u7684\u6d1e\u89c1\u65e8\u5728\u5e2e\u52a9\u4e1a\u754c\u4e13\u4e1a\u4eba\u5458\u4f18\u5316RS\u90e8\u7f72\uff0c\u5e76\u6fc0\u53d1\u672a\u6765\u7814\u7a76\u7684\u65b0\u65b9\u5411\uff0c\u7279\u522b\u662f\u5728\u5e94\u5bf9\u65b0\u5174\u6280\u672f\u548c\u793e\u4f1a\u8d8b\u52bf\u65f6\u3002|\n", "2407.13692": "|**2024-07-18**|**Prover-Verifier Games improve legibility of LLM outputs**|Jan Hendrik Kirchner et.al.|[2407.13692](http://arxiv.org/abs/2407.13692)|null|\u4e3a\u4e86\u63d0\u9ad8\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u8f93\u51fa\u7ed3\u679c\u7684\u53ef\u4fe1\u5ea6\uff0c\u4e00\u4e2a\u65b9\u6cd5\u662f\u652f\u6301\u6e05\u6670\u6613\u9a8c\u8bc1\u7684\u63a8\u7406\uff0c\u6211\u4eec\u79f0\u4e4b\u4e3a\u53ef\u8bfb\u6027\u3002\u672c\u6587\u4ee5\u89e3\u51b3\u5c0f\u5b66\u6570\u5b66\u95ee\u9898\u4e3a\u80cc\u666f\uff0c\u7814\u7a76\u4e86\u53ef\u8bfb\u6027\uff0c\u5e76\u53d1\u73b0\u4ec5\u4f18\u5316\u8fde\u8d2f\u601d\u7ef4\u89e3\u9898\u7684\u51c6\u786e\u6027\u53ef\u80fd\u4f1a\u964d\u4f4e\u5176\u53ef\u8bfb\u6027\u3002\u4e3a\u7f13\u89e3\u8fd9\u4e00\u635f\u5931\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u53d7Anil\u7b49\u4eba\uff082021\uff09\u7684\u8bc1\u660e\u5668-\u9a8c\u8bc1\u5668\u6e38\u620f\u542f\u53d1\u7684\u8bad\u7ec3\u7b97\u6cd5\u3002\u8be5\u7b97\u6cd5\u8fed\u4ee3\u5730\u8bad\u7ec3\u5c0f\u578b\u9a8c\u8bc1\u5668\u9884\u6d4b\u89e3\u9898\u6b63\u786e\u6027\uff0c\"\u6709\u5e2e\u52a9\"\u7684\u8bc1\u660e\u5668\u751f\u6210\u9a8c\u8bc1\u5668\u63a5\u53d7\u7684\u6b63\u786e\u89e3\u7b54\uff0c\u4ee5\u53ca\"\u72e1\u733e\"\u7684\u8bc1\u660e\u5668\u751f\u6210\u6b3a\u9a97\u9a8c\u8bc1\u5668\u7684\u9519\u8bef\u89e3\u7b54\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u6709\u5e2e\u52a9\u8bc1\u660e\u56
68\u7684\u51c6\u786e\u6027\u548c\u9a8c\u8bc1\u5668\u5bf9\u6297\u653b\u51fb\u7684\u9c81\u68d2\u6027\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u63d0\u9ad8\u3002\u6b64\u5916\uff0c\u6211\u4eec\u53d1\u73b0\uff0c\u9488\u5bf9\u5c0f\u578b\u9a8c\u8bc1\u5668\u7684\u53ef\u8bfb\u6027\u8bad\u7ec3\u80fd\u591f\u8f6c\u79fb\u7ed9\u65f6\u95f4\u6709\u9650\u7684\u4eba\u7c7b\uff0c\u4ed6\u4eec\u5728\u9a8c\u8bc1\u89e3\u51b3\u65b9\u6848\u6b63\u786e\u6027\u65f6\u7684\u51c6\u786e\u6027\u4f1a\u968f\u7740\u8bad\u7ec3\u63d0\u9ad8\uff0c\u800c\u5728\u9a8c\u8bc1\u72e1\u733e\u8bc1\u660e\u5668\u7684\u89e3\u51b3\u65b9\u6848\u65f6\u4f1a\u4e0b\u964d\u3002\u56e0\u6b64\uff0c\u901a\u8fc7\u5c0f\u578b\u9a8c\u8bc1\u5668\u8fdb\u884c\u53ef\u8bfb\u6027\u8bad\u7ec3\u53ef\u80fd\u662f\u4e00\u79cd\u5b9e\u9645\u53ef\u884c\u7684\u65b9\u6cd5\uff0c\u7528\u4e8e\u63d0\u5347\u5927\u578bLLMs\u5bf9\u4eba\u7c7b\u7684\u53ef\u8bfb\u6027\uff0c\u4ece\u800c\u6709\u52a9\u4e8e\u8d85\u7ea7\u4eba\u7c7b\u6a21\u578b\u7684\u5bf9\u9f50\u3002|\n", "2407.13648": "|**2024-07-18**|**COMCAT: Leveraging Human Judgment to Improve Automatic Documentation and Summarization**|Skyler Grandel 
et.al.|[2407.13648](http://arxiv.org/abs/2407.13648)|null|\u8fd9\u7bc7\u8bba\u6587\u4e3b\u8981\u63a2\u8ba8\u4e86\u8f6f\u4ef6\u7ef4\u62a4\u4e2d\u4ee3\u7801\u7406\u89e3\u7684\u91cd\u8981\u6027\uff0c\u4ee5\u53ca\u5982\u4f55\u901a\u8fc7\u81ea\u52a8\u5316\u751f\u6210\u6ce8\u91ca\u6765\u63d0\u5347\u8fd9\u4e00\u8fc7\u7a0b\u3002\u4f5c\u8005\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aCOMCAT\u7684\u65b9\u6cd5\uff0c\u5b83\u7ed3\u5408\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e0e\u9886\u57df\u4e13\u5bb6\u6307\u5bfc\uff0c\u65e8\u5728\u4e3a\u6e90\u4ee3\u7801\u63d0\u4f9b\u6709\u52a9\u4e8e\u7406\u89e3\u7684\u6ce8\u91ca\u3002COMCAT\u6d41\u7a0b\u5305\u62ec\u81ea\u52a8\u8bc6\u522b\u4ee3\u7801\u4e2d\u9002\u5408\u6dfb\u52a0\u6ce8\u91ca\u7684\u4f4d\u7f6e\u3001\u9884\u6d4b\u6bcf\u4e2a\u4f4d\u7f6e\u6700\u9002\u5408\u7684\u6ce8\u91ca\u7c7b\u578b\uff0c\u5e76\u6839\u636e\u9009\u5b9a\u4f4d\u7f6e\u548c\u7c7b\u578b\u751f\u6210\u6ce8\u91ca\u3002\u5728\u4eba\u7c7b\u53d7\u8bd5\u8005\u7684\u7814\u7a76\u4e2d\uff0c\u7ed3\u679c\u663e\u793aCOMCAT\u751f\u6210\u7684\u6ce8\u91ca\u663e\u8457\u63d0\u9ad8\u4e86\u5f00\u53d1\u4eba\u5458\u5728\u4e09\u4e2a\u5178\u578b\u8f6f\u4ef6\u5de5\u7a0b\u4efb\u52a1\u4e2d\u7684\u4ee3\u7801\u7406\u89e3\u80fd\u529b\uff0c\u5bf9\u4e8e87%\u7684\u53c2\u4e0e\u8005\uff0c\u63d0\u5347\u5e45\u5ea6\u8fbe\u523012%\u3002\u6b64\u5916\uff0c\u7814\u7a76\u8fd8\u8868\u660eCOMCAT\u751f\u6210\u7684\u6ce8\u91ca\u5728\u51c6\u786e\u6027\u3001\u53ef\u8bfb\u6027\u4e0a\u81f3\u5c11\u4e0e\u4eba\u5de5\u6ce8\u91ca\u76f8\u5f53\uff0c\u5e76\u4e14\u572892%\u7684\u4ee3\u7801\u7247\u6bb5\u4e2d\uff0c\u5f00\u53d1\u8005\u66f4\u504f\u597dCOMCAT\u751f\u6210\u7684\u6ce8\u91ca\uff0c\u800c\u975e\u6807\u51c6\u7684ChatGPT\u751f\u6210\u7684\u6ce8\u91ca\u3002\u8bba\u6587\u8fd8\u4ecb\u7ecd\u4e86\u5f00\u53d1\u5e76\u516c\u5f00\u4e86\u4e00\u4e2a\u5305\u542b\u6e90\u4ee3\u7801\u7247\u6bb5\u3001\u4eba\u5de5\u7f16\u5199\u6ce8\u91ca\u548c\u6807\u6ce8\u7684\u7c7b\u522b\u6570\u636e\u96c6\u3002\u603b\u7684\u6765\u8bf4\uff0cCO
MCAT\u5229\u7528LLMs\u5728\u591a\u79cd\u8f6f\u4ef6\u5de5\u7a0b\u4efb\u52a1\u4e2d\u663e\u8457\u63d0\u5347\u4e86\u4ee3\u7801\u7406\u89e3\u6c34\u5e73\u3002|\n", "2407.13647": "|**2024-07-18**|**Weak-to-Strong Reasoning**|Yuqing Yang et.al.|[2407.13647](http://arxiv.org/abs/2407.13647)|**[link](https://github.com/gair-nlp/weak-to-strong-reasoning)**|\u5f53\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u6027\u80fd\u8d85\u8d8a\u4eba\u7c7b\u65f6\uff0c\u4e3a\u5176\u63d0\u4f9b\u5168\u9762\u800c\u7cbe\u786e\u7684\u76d1\u7763\u53d8\u5f97\u56f0\u96be\u3002\u5728\u8fd9\u79cd\u60c5\u51b5\u4e0b\uff0c\u5f31\u5230\u5f3a\u5b66\u4e60\u65b9\u6cd5\uff0c\u5373\u5229\u7528\u80fd\u529b\u8f83\u5f31\u7684\u6a21\u578b\u6fc0\u53d1\u8f83\u5f3a\u6a21\u578b\u7684\u6f5c\u5728\u80fd\u529b\uff0c\u663e\u793a\u51fa\u4ef7\u503c\u3002\u7136\u800c\uff0c\u8fd9\u79cd\u7b56\u7565\u5728\u5904\u7406\u590d\u6742\u63a8\u7406\u4efb\u52a1\u65f6\u7684\u6548\u679c\u5c1a\u672a\u5f97\u5230\u5145\u5206\u68c0\u9a8c\uff0c\u4e14\u5f53\u524d\u7f3a\u4e4f\u6709\u6548\u7684\u65b9\u6cd5\u6765\u907f\u514d\u6a21\u578b\u76f2\u76ee\u6a21\u4eff\u5f31\u5bfc\u5e08\uff0c\u5305\u62ec\u5176\u9519\u8bef\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u6e10\u8fdb\u5b66\u4e60\u6846\u67b6\uff0c\u4f7f\u5f3a\u6a21\u578b\u80fd\u591f\u81ea\u4e3b\u4f18\u5316\u5176\u8bad\u7ec3\u6570\u636e\uff0c\u65e0\u9700\u4f9d\u8d56\u9ad8\u7ea7\u6a21\u578b\u6216\u4eba\u5de5\u6807\u6ce8\u7684\u6570\u636e\u3002\u8be5\u6846\u67b6\u9996\u5148\u5bf9\u9009\u5b9a\u7684\u5c0f\u800c\u9ad8\u8d28\u91cf\u6570\u636e\u8fdb\u884c\u76d1\u7763\u5fae\u8c03\uff0c\u7136\u540e\u5728\u5f3a\u6a21\u578b\u81ea\u884c\u8bc6\u522b\u7684\u5bf9\u6bd4\u6837\u672c\u4e0a\u8fdb\u884c\u504f\u597d\u4f18\u5316\u3002\u6211\u4eec\u5728GSM8K\u548cMATH\u6570\u636e\u96c6\u4e0a\u7684\u5927\u91cf\u5b9e\u9a8c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u663e\u8457\u63d0\u5347\u4e86Llama2-70b\u7684\u63a8\u7406\u80fd\u529b\uff0c\u901a\u8fc7\u4e09\u79cd\u4e0d\u540c\u7684\u5f31\u6a21\u578b\u8
fdb\u884c\u9a8c\u8bc1\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5728\u524d\u77bb\u6027\u7684\u5b9e\u9a8c\u8bbe\u7f6e\u4e2d\u9a8c\u8bc1\u4e86\u8fd9\u79cd\u65b9\u6cd5\uff0cLlama3-8b-instruct\u6210\u529f\u6307\u5bfcLlama3-70b\u5728\u6781\u5177\u6311\u6218\u6027\u7684OlympicArena\u6570\u636e\u96c6\u4e0a\u3002\u8fd9\u9879\u5de5\u4f5c\u4e3a\u63d0\u5347\u4eba\u5de5\u667a\u80fd\u7684\u63a8\u7406\u80fd\u529b\u63d0\u4f9b\u4e86\u4e00\u79cd\u66f4\u53ef\u6269\u5c55\u548c\u9ad8\u7ea7\u7684\u7b56\u7565\u3002\u6240\u6709\u76f8\u5173\u4ee3\u7801\u548c\u8d44\u6e90\u53ef\u5728\u83b7\u53d6\u3002|\n", "2407.14507": "|**2024-07-19**|**Internal Consistency and Self-Feedback in Large Language Models: A Survey**|Xun Liang et.al.|[2407.14507](http://arxiv.org/abs/2407.14507)|**[link](https://github.com/iaar-shanghai/icsfsurvey)**|**\u672c\u6587\u603b\u7ed3\u4e86\u4e00\u4e2a\u7406\u8bba\u6846\u67b6\uff0c\u79f0\u4e3a\u5185\u90e8\u4e00\u81f4\u6027\uff08Internal Consistency\uff09\uff0c\u5b83\u4e3a\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u63a8\u7406\u4e0d\u8db3\u548c\u751f\u6210\u5e7b\u89c9\u5185\u5bb9\u7b49\u95ee\u9898\u4e0a\u7684\u8868\u73b0\u63d0\u4f9b\u4e86\u4e00\u81f4\u7684\u89e3\u91ca\u3002\u5185\u90e8\u4e00\u81f4\u6027\u8bc4\u4f30\u4e86LLM\u7684\u6f5c\u5728\u5c42\u3001\u89e3\u7801\u5c42\u548c\u54cd\u5e94\u5c42\u4e4b\u95f4\u7684\u5185\u5728\u4e00\u81f4\u6027\uff0c\u57fa\u4e8e\u91c7\u6837\u65b9\u6cd5\u3002 \u5728\u6b64\u57fa\u7840\u4e0a\uff0c\u6211\u4eec\u5f15\u5165\u4e86Self-Feedback\u6846\u67b6\uff0c\u8fd9\u662f\u4e00\u4e2a\u7b80\u6d01\u800c\u6709\u6548\u7684\u7406\u8bba\u6846\u67b6\uff0c\u7528\u4e8e\u6316\u6398\u5185\u90e8\u4e00\u81f4\u6027\u7684\u4fe1\u606f\u3002Self-Feedback\u6846\u67b6\u5305\u62ec\u4e24\u4e2a\u6a21\u5757\uff1a\u81ea\u6211\u8bc4\u4f30\uff08Self-Evaluation\uff09\u548c\u81ea\u6211\u66f4\u65b0\uff08Self-Update\uff09\u3002 
\u6211\u4eec\u7cfb\u7edf\u5730\u6309\u4efb\u52a1\u548c\u7814\u7a76\u65b9\u5411\u5bf9\u8fd9\u4e9b\u7814\u7a76\u8fdb\u884c\u4e86\u5206\u7c7b\uff1b\u603b\u7ed3\u4e86\u76f8\u5173\u7684\u8bc4\u4f30\u65b9\u6cd5\u548c\u57fa\u51c6\uff1b\u6df1\u5165\u63a2\u8ba8\u4e86\u201cSelf-Feedback\u771f\u7684\u6709\u6548\u5417\uff1f\u201d\u8fd9\u4e00\u95ee\u9898\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u51e0\u4e2a\u5173\u952e\u89c2\u70b9\uff0c\u5305\u62ec\u201c\u5185\u90e8\u4e00\u81f4\u6027\u7684\u6c99\u6f0f\u5f0f\u6f14\u5316\u201d\u3001\u201c\u4e00\u81f4\u6027\u51e0\u4e4e\u662f\u6b63\u786e\u6027\u201d\u7684\u5047\u8bbe\u4ee5\u53ca\u201c\u6f5c\u610f\u8bc6\u4e0e\u663e\u5f0f\u63a8\u7406\u6096\u8bba\u201d\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u6982\u8ff0\u4e86\u672a\u6765\u7814\u7a76\u7684\u6709\u524d\u666f\u7684\u65b9\u5411\u3002 \u6211\u4eec\u5df2\u7ecf\u5f00\u6e90\u4e86\u5b9e\u9a8c\u4ee3\u7801\u3001\u53c2\u8003\u5217\u8868\u548c\u7edf\u8ba1\u6570\u636e\uff0c\u4f9b\u516c\u4f17\u8bbf\u95ee\uff0c\u94fe\u63a5\u4e3a\uff1a[ICSFSurvey](https://github.com/IAAR-Shanghai/ICSFSurvey)**|\n", "2407.14506": "|**2024-07-19**|**On Pre-training of Multimodal Language Models Customized for Chart Understanding**|Wan-Cyuan Fan 
et.al.|[2407.14506](http://arxiv.org/abs/2407.14506)|null|\u8fd1\u671f\u7684\u7814\u7a76\u5728\u9488\u5bf9\u7279\u5b9a\u9886\u57df\u4efb\u52a1\u5b9a\u5236\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u65b9\u9762\u53d6\u5f97\u4e86\u4ee4\u4eba\u9f13\u821e\u7684\u6210\u679c\uff0c\u7279\u522b\u662f\u5728\u79d1\u5b66\u56fe\u8868\u7406\u89e3\u9886\u57df\u3002\u8fd9\u4e9b\u7814\u7a76\u901a\u5e38\u901a\u8fc7\u4f7f\u7528\u4e13\u95e8\u7684\u6570\u636e\u96c6\u8fdb\u884c\u89c6\u89c9\u6307\u4ee4\u8c03\u4f18\u6765\u589e\u5f3a\u95ee\u7b54\uff08QA\uff09\u51c6\u786e\u6027\u3002\u7136\u800c\uff0c\u5b83\u4eec\u5f80\u5f80\u5ffd\u89c6\u4e86\u81ea\u7136\u56fe\u50cf-\u63cf\u8ff0\u9884\u8bad\u7ec3\u6570\u636e\u4e0e\u6570\u5b57\u56fe\u8868\u56fe\u50cf-QA\u6570\u636e\u4e4b\u95f4\u7684\u57fa\u672c\u5dee\u5f02\uff0c\u7279\u522b\u662f\u5bf9\u4e8e\u6a21\u578b\u4ece\u56fe\u8868\u4e2d\u63d0\u53d6\u6f5c\u5728\u6570\u503c\u7684\u80fd\u529b\u3002\u672c\u6587\u65e8\u5728\u89e3\u51b3\u8fd9\u4e00\u758f\u6f0f\uff0c\u63a2\u7d22\u6539\u8fdbMLLMs\u5bf9\u56fe\u8868\u7406\u89e3\u6240\u9700\u7684\u5173\u952e\u8bad\u7ec3\u8fc7\u7a0b\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e09\u4e2a\u5173\u952e\u53d1\u73b0\uff1a\uff081\uff09\u5728\u5bf9\u9f50\u9884\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u878d\u5165\u539f\u59cb\u6570\u636e\u503c\u663e\u8457\u63d0\u9ad8\u4e86\u5bf9\u56fe\u8868\u6570\u636e\u7684\u7406\u89e3\u80fd\u529b\u3002\uff082\uff09\u5728\u7aef\u5230\u7aef\u5fae\u8c03\u8fc7\u7a0b\u4e2d\u968f\u673a\u66ff\u6362\u56fe\u50cf\u4e3a\u6587\u672c\u8868\u793a\uff0c\u80fd\u591f\u5c06\u8bed\u8a00\u63a8\u7406\u80fd\u529b\u8f6c\u79fb\u5230\u56fe\u8868\u89e3\u91ca\u6280\u80fd\u4e0a\u3002\uff083\uff09\u8981\u6c42\u6a21\u578b\u9996\u5148\u63d0\u53d6\u5e95\u5c42\u56fe\u8868\u6570\u636e\uff0c\u7136\u540e\u5728\u5fae\u8c03\u8fc7\u7a0b\u4e2d\u56de\u7b54\u95ee\u9898\uff0c\u53ef\u4ee5\u8fdb\u4e00\u6b65\u63d0\u9ad8\u51c6\u786e\u6027\u3002 
\u56e0\u6b64\uff0c\u6211\u4eec\u5f15\u5165\u4e86CHOPINLLM\uff0c\u4e00\u79cd\u4e13\u4e3a\u6df1\u5165\u56fe\u8868\u7406\u89e3\u5b9a\u5236\u7684MLLM\u3002CHOPINLLM\u6709\u6548\u5730\u89e3\u6790\u5404\u79cd\u7c7b\u578b\u7684\u56fe\u8868\uff0c\u5305\u62ec\u672a\u6807\u6ce8\u7684\u56fe\u8868\uff0c\u540c\u65f6\u4fdd\u6301\u4e86\u5f3a\u5927\u7684\u63a8\u7406\u80fd\u529b\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5efa\u7acb\u4e86\u4e00\u4e2a\u65b0\u7684\u57fa\u51c6\uff0c\u7528\u4e8e\u8bc4\u4f30MLLMs\u5728\u4e0d\u540c\u56fe\u8868\u7c7b\u578b\u548c\u7406\u89e3\u6c34\u5e73\u4e0a\u7684\u7406\u89e3\u80fd\u529b\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cCHOPINLLM\u5728\u7406\u89e3\u5404\u79cd\u7c7b\u578b\u3001\u5e26\u6709\u6807\u6ce8\u548c\u672a\u6807\u6ce8\u7684\u56fe\u8868\u65b9\u9762\u8868\u73b0\u51fa\u5f3a\u5927\u7684\u6027\u80fd\u3002|\n", "2407.14487": "|**2024-07-19**|**Evaluating the Reliability of Self-Explanations in Large Language Models**|Korbinian Randl et.al.|[2407.14487](http://arxiv.org/abs/2407.14487)|**[link](https://github.com/k-randl/self-explaining_llms)**|**\u672c\u6587\u63a2\u8ba8\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u88ab\u63d0\u793a\u89e3\u91ca\u5176\u5148\u524d\u8f93\u51fa\u65f6\u751f\u6210\u7684\u89e3\u91ca\u53ef\u9760\u6027\u3002\u6211\u4eec\u5229\u7528\u4e09\u79cd\u5148\u8fdb\u7684\u5927\u8bed\u8a00\u6a21\u578b\uff08\u53c2\u6570\u4ece2B\u52308B\uff09\u5728\u4e24\u79cd\u4e0d\u540c\u7684\u5206\u7c7b\u4efb\u52a1\uff08\u5ba2\u89c2\u548c\u4e3b\u89c2\uff09\u4e0a\u8bc4\u4f30\u4e86\u4e24\u79cd\u7c7b\u578b\u7684\u81ea\u6211\u89e3\u91ca\u2014\u2014\u62bd\u53d6\u5f0f\u548c\u53cd\u4e8b\u5b9e\u5f0f\u3002\u6211\u4eec\u7684\u53d1\u73b0\u8868\u660e\uff0c\u5c3d\u7ba1\u8fd9\u4e9b\u81ea\u6211\u89e3\u91ca\u4e0e\u4eba\u7c7b\u5224\u65ad\u76f8\u5173\u8054\uff0c\u4f46\u5b83\u4eec\u5e76\u4e0d\u5b8c\u5168\u4e14\u51c6\u786e\u5730\u9075\u5faa\u6a21\u578b\u7684\u51b3\u7b56\u8fc7\u7a0b\uff0c\u6307\u51fa\u4e86\u4e00\u79cd\u611f\u77e5\u4e0e\u5b9e\u9645\u6a21\u578b\u63a8\u740
6\u4e4b\u95f4\u7684\u5dee\u8ddd\u3002\u6211\u4eec\u663e\u793a\uff0c\u901a\u8fc7\u63d0\u793a\u5927\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u53cd\u4e8b\u5b9e\u89e3\u91ca\uff0c\u53ef\u4ee5\u4ea7\u751f\u5fe0\u5b9e\u3001\u4fe1\u606f\u4e30\u5bcc\u4e14\u6613\u4e8e\u9a8c\u8bc1\u7684\u7ed3\u679c\u3002\u8fd9\u4e9b\u53cd\u4e8b\u5b9e\u4e3a\u4f20\u7edf\u53ef\u89e3\u91ca\u6027\u65b9\u6cd5\uff08\u4f8b\u5982SHAP\u3001LIME\uff09\u63d0\u4f9b\u4e86\u6709\u524d\u666f\u7684\u66ff\u4ee3\u65b9\u6848\uff0c\u524d\u63d0\u662f\u5bf9\u7279\u5b9a\u4efb\u52a1\u5b9a\u5236\u63d0\u793a\u5e76\u68c0\u67e5\u5176\u6709\u6548\u6027\u3002**|\n", "2407.14474": "|**2024-07-19**|**Contrastive Learning with Counterfactual Explanations for Radiology Report Generation**|Mingjie Li et.al.|[2407.14474](http://arxiv.org/abs/2407.14474)|null|\u7531\u4e8e\u89e3\u5256\u5b66\u7684\u5e38\u89c1\u5185\u5bb9\u548c\u4e0e\u4e4b\u5bf9\u5e94\u7684\u5f71\u50cf\u5b66\u56fe\u50cf\u4e4b\u95f4\u7684\u9ad8\u5ea6\u76f8\u4f3c\u6027\uff0c\u8fd9\u79cd\u56fa\u6709\u7684\u6570\u636e\u504f\u89c1\u53ef\u80fd\u5bfc\u81f4\u81ea\u52a8\u62a5\u544a\u751f\u6210\u6a21\u578b\u5b66\u4e60\u7ea0\u7f20\u548c\u76f8\u5173\u6027\u589e\u5f3a\u7684\u8868\u793a\uff0c\u4ece\u800c\u4ea7\u751f\u8bef\u8bca\u62a5\u544a\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u201cCo\u201dunter\u201cF\u201dactual 
\u201cE\u201dxplanations\uff08CoFE\uff09\u6846\u67b6\u7528\u4e8e\u653e\u5c04\u5b66\u62a5\u544a\u751f\u6210\u3002\u53cd\u4e8b\u5b9e\u89e3\u91ca\u662f\u4e00\u79cd\u5f3a\u5927\u7684\u5de5\u5177\uff0c\u7528\u4e8e\u7406\u89e3\u7b97\u6cd5\u51b3\u7b56\u5982\u4f55\u901a\u8fc7\u63d0\u51fa\u201c\u5982\u679c\u201d\u573a\u666f\u800c\u88ab\u6539\u53d8\u3002\u901a\u8fc7\u5229\u7528\u8fd9\u4e00\u6982\u5ff5\uff0cCoFE\u53ef\u4ee5\u901a\u8fc7\u5bf9\u6bd4\u6b63\u4f8b\u548c\u8d1f\u4f8b\u4e4b\u95f4\u7684\u8868\u793a\u6765\u5b66\u4e60\u975e\u76f8\u5173\u6027\u89c6\u89c9\u8868\u793a\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u901a\u8fc7\u5728\u6b63\u4f8b\u548c\u8d1f\u4f8b\u4e4b\u95f4\u4ea4\u6362\u8865\u4e01\u76f4\u5230\u9884\u6d4b\u8bca\u65ad\u53d1\u751f\u53d8\u5316\uff0c\u6211\u4eec\u63a8\u5bfc\u51fa\u53cd\u4e8b\u5b9e\u56fe\u50cf\u3002\u5728\u8fd9\u91cc\uff0c\u6b63\u4f8b\u548c\u8d1f\u4f8b\u662f\u8bed\u4e49\u4e0a\u6700\u76f8\u4f3c\u7684\uff0c\u4f46\u5177\u6709\u4e0d\u540c\u7684\u8bca\u65ad\u6807\u7b7e\u3002\u6b64\u5916\uff0cCoFE\u91c7\u7528\u53ef\u5b66\u4e60\u63d0\u793a\u9ad8\u6548\u5730\u5bf9\u9884\u8bad\u7ec3\u7684\u5927\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u5fae\u8c03\uff0c\u5c01\u88c5\u4e86\u4e8b\u5b9e\u5b9e\u4f8b\u548c\u53cd\u4e8b\u5b9e\u5b9e\u4f8b\u7684\u5185\u5bb9\uff0c\u63d0\u4f9b\u66f4\u901a\u7528\u7684\u63d0\u793a\u8868\u793a\u3002\u5728\u4e24\u4e2a\u57fa\u51c6\u4e0a\u7684\u5e7f\u6cdb\u5b9e\u9a8c\u8868\u660e\uff0c\u5229\u7528\u53cd\u4e8b\u5b9e\u89e3\u91ca\u4f7fCoFE\u80fd\u591f\u751f\u6210\u8bed\u4e49\u4e0a\u8fde\u8d2f\u4e14\u4e8b\u5b9e\u5b8c\u6574\u7684\u62a5\u544a\uff0c\u5e76\u5728\u8bed\u8a00\u751f\u6210\u548c\u4e34\u5e8a\u6709\u6548\u6027\u6307\u6807\u65b9\u9762\u8868\u73b0\u51fa\u8272\u3002|\n", "2407.14467": "|**2024-07-19**|**Check-Eval: A Checklist-based Approach for Evaluating Text Quality**|Jayr Pereira 
et.al.|[2407.14467](http://arxiv.org/abs/2407.14467)|null|\u8bc4\u4f30\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u751f\u6210\u6587\u672c\u7684\u8d28\u91cf\u4ecd\u7136\u662f\u4e00\u4e2a\u91cd\u5927\u6311\u6218\u3002\u4f20\u7edf\u7684\u8bc4\u4f30\u6807\u51c6\u5f80\u5f80\u4e0e\u4eba\u7c7b\u7684\u5224\u65ad\u4e0d\u5339\u914d\uff0c\u5c24\u5176\u662f\u5728\u9700\u8981\u521b\u9020\u6027\u548c\u7ec6\u5fae\u5dee\u522b\u7684\u4efb\u52a1\u4e2d\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aCheck-Eval\u7684\u65b0\u8bc4\u4f30\u6846\u67b6\uff0c\u901a\u8fc7\u5229\u7528LLM\u4ee5\u68c0\u67e5\u8868\u4e3a\u57fa\u7840\u7684\u65b9\u6cd5\u6765\u8bc4\u4f30\u751f\u6210\u6587\u672c\u7684\u8d28\u91cf\u3002Check-Eval\u53ef\u4ee5\u4f5c\u4e3a\u65e0\u53c2\u8003\u548c\u6709\u53c2\u8003\u7684\u8bc4\u4f30\u65b9\u6cd5\u4f7f\u7528\uff0c\u63d0\u4f9b\u4e86\u4e00\u4e2a\u7ed3\u6784\u5316\u4e14\u53ef\u89e3\u91ca\u7684\u6587\u672c\u8d28\u91cf\u8bc4\u4f30\u4f53\u7cfb\u3002\u8be5\u6846\u67b6\u4e3b\u8981\u7531\u4e24\u4e2a\u9636\u6bb5\u7ec4\u6210\uff1a\u68c0\u67e5\u8868\u751f\u6210\u548c\u68c0\u67e5\u8868\u8bc4\u4f30\u3002\u6211\u4eec\u5728\u4e24\u4e2a\u57fa\u51c6\u6570\u636e\u96c6\u4e0a\u9a8c\u8bc1\u4e86Check-Eval\uff1a\u8461\u8404\u7259\u8bed\u6cd5\u5f8b\u8bed\u4e49\u6587\u672c\u76f8\u4f3c\u6027\u4ee5\u53caSummEval\u3002\u6211\u4eec\u7684\u7ed3\u679c\u663e\u793a\uff0cCheck-Eval\u4e0e\u73b0\u6709\u6307\u6807\uff08\u5982G-Eval\u548cGPTScore\uff09\u76f8\u6bd4\uff0c\u5728\u4e0e\u4eba\u7c7b\u5224\u65ad\u7684\u76f8\u5173\u6027\u65b9\u9762\u53d6\u5f97\u4e86\u66f4\u9ad8\u7684\u5206\u6570\uff0c\u8fd9\u8868\u660e\u5176\u4f5c\u4e3a\u81ea\u7136\u8bed\u8a00\u751f\u6210\u4efb\u52a1\u66f4\u53ef\u9760\u548c\u6709\u6548\u7684\u8bc4\u4f30\u6846\u67b6\u7684\u6f5c\u529b\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u4ee3\u7801\u53ef\u5728https://anonymous.4open.science/r/check-eval-0DB4\u83b7\u53d6\u3002|\n", "2407.14452": "|**2024-07-19**|**Undermining Mental Proof: How AI Can Make Cooperation Harder by Making 
Thinking Easier**|Zachary Wojtowicz et.al.|[2407.14452](http://arxiv.org/abs/2407.14452)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\u548c\u5176\u4ed6\u9ad8\u5ea6\u5148\u8fdb\u7684AI\u7cfb\u7edf\u5728\u51b3\u5b9a\u8bf4\u4ec0\u4e48\u6216\u505a\u4ec0\u4e48\u65f6\u63d0\u4f9b\u4e86\u4fbf\u5229\uff0c\u4f46\u8fd9\u4fbf\u5229\u6027\u5b9e\u9645\u4e0a\u524a\u5f31\u4e86\u5728\u793e\u4f1a\u60c5\u5883\u4e0b\u91c7\u53d6\u6709\u6548\u884c\u52a8\u7684\u80fd\u529b\u3002\u6211\u4eec\u901a\u8fc7\u5f15\u5165\u201c\u5fc3\u7406\u8bc1\u660e\u201d\u8fd9\u4e00\u6574\u5408\u6027\u7406\u8bba\u6982\u5ff5\u6765\u89e3\u91ca\u8fd9\u79cd\u770b\u4f3c\u77db\u76fe\u7684\u73b0\u8c61\u3002\u201c\u5fc3\u7406\u8bc1\u660e\u201d\u53d1\u751f\u5728\u4f7f\u7528\u53ef\u89c2\u5bdf\u7684\u884c\u4e3a\u6765\u8bc1\u5b9e\u4e0d\u53ef\u89c2\u5bdf\u7684\u5fc3\u7406\u4e8b\u5b9e\u7684\u60c5\u51b5\u4e2d\u3002\u4ece\u62db\u8058\u5230\u7ea6\u4f1a\uff0c\u201c\u5fc3\u7406\u8bc1\u660e\u201d\u4f7f\u4eba\u4eec\u80fd\u591f\u5728\u4f4e\u4fe1\u4efb\u73af\u5883\u4e2d\u76f8\u4e92\u4f20\u8fbe\u4ef7\u503c\u89c2\u3001\u610f\u56fe\u3001\u77e5\u8bc6\u72b6\u6001\u7b49\u5fc3\u7406\u7279\u5f81\uff0c\u8fd9\u4e9b\u73af\u5883\u4e2d\u7684\u8bda\u5b9e\u96be\u4ee5\u5f97\u5230\u5f3a\u5236\u6267\u884c\u3002 \u57fa\u4e8e\u7ecf\u6d4e\u5b66\u3001\u7406\u8bba\u751f\u7269\u5b66\u548c\u8ba1\u7b97\u673a\u79d1\u5b66\u7684\u7814\u7a76\u6210\u679c\uff0c\u6211\u4eec\u63cf\u8ff0\u4e86\u4f7f\u4eba\u7c7b\u80fd\u591f\u5b9e\u65bd\u5fc3\u7406\u8bc1\u660e\u7684\u6838\u5fc3\u7406\u8bba\u673a\u5236\u3002\u5bf9\u8fd9\u4e9b\u673a\u5236\u7684\u5206\u6790\u63ed\u793a\u4e86\u4eba\u5de5\u667a\u80fd\u5982\u4f55\u5728\u4f7f\u601d\u8003\u53d8\u5f97\u5bb9\u6613\u7684\u540c\u65f6\uff0c\u5374\u53ef\u80fd\u4f7f\u4f4e\u4fe1\u4efb\u5408\u4f5c\u53d8\u5f97\u66f4\u96be\u3002 
\u901a\u8fc7\u7406\u89e3\u5fc3\u7406\u8bc1\u660e\u7684\u5de5\u4f5c\u539f\u7406\u53ca\u5176\u5728\u4e0d\u540c\u60c5\u5883\u4e0b\u7684\u5e94\u7528\uff0c\u6211\u4eec\u53ef\u4ee5\u8bbe\u8ba1\u51fa\u65e2\u80fd\u4fc3\u8fdb\u9ad8\u6548\u6c9f\u901a\u53c8\u80fd\u7ef4\u62a4\u793e\u4f1a\u534f\u4f5c\u7684AI\u7cfb\u7edf\u3002\u4f8b\u5982\uff0c\u5728\u62db\u8058\u8fc7\u7a0b\u4e2d\uff0cAI\u53ef\u4ee5\u901a\u8fc7\u5206\u6790\u5019\u9009\u4eba\u7684\u884c\u4e3a\u6a21\u5f0f\u548c\u5386\u53f2\u6570\u636e\u6765\u95f4\u63a5\u8bc4\u4f30\u5176\u6280\u80fd\u3001\u56e2\u961f\u5408\u4f5c\u80fd\u529b\u4ee5\u53ca\u5bf9\u516c\u53f8\u6587\u5316\u7684\u9002\u5e94\u6027\uff0c\u4ece\u800c\u5e2e\u52a9\u96c7\u4e3b\u505a\u51fa\u66f4\u53ef\u9760\u7684\u4eba\u624d\u9009\u62e9\u51b3\u7b56\u3002\u5728\u7ea6\u4f1a\u573a\u666f\u4e2d\uff0cAI\u53ef\u4ee5\u5229\u7528\u793e\u4ea4\u5a92\u4f53\u6d3b\u52a8\u3001\u5174\u8da3\u7231\u597d\u7b49\u4fe1\u606f\u6765\u6784\u5efa\u7528\u6237\u7684\u5fc3\u7406\u753b\u50cf\uff0c\u4ee5\u6b64\u5e2e\u52a9\u7528\u6237\u627e\u5230\u4e0e\u81ea\u5df1\u4ef7\u503c\u89c2\u548c\u751f\u6d3b\u65b9\u5f0f\u76f8\u5339\u914d\u7684\u4f34\u4fa3\u3002 \u603b\u4e4b\uff0c\u901a\u8fc7\u5408\u7406\u5730\u8bbe\u8ba1\u548c\u5e94\u7528AI\u6280\u672f\uff0c\u6211\u4eec\u4e0d\u4ec5\u53ef\u4ee5\u5728\u4f4e\u4fe1\u4efb\u73af\u5883\u4e0b\u589e\u5f3a\u4eba\u7c7b\u7684\u4ea4\u6d41\u548c\u5408\u4f5c\u80fd\u529b\uff0c\u800c\u4e14\u8fd8\u80fd\u4fc3\u8fdb\u66f4\u52a0\u516c\u6b63\u3001\u900f\u660e\u548c\u9ad8\u6548\u7684\u51b3\u7b56\u8fc7\u7a0b\u3002|\n", "2407.14439": "|**2024-07-19**|**Token-level Correlation-guided Compression for Efficient Multimodal Document Understanding**|Renshan Zhang et.al.|[2407.14439](http://arxiv.org/abs/2407.14439)|**[link](https://github.com/JiuTian-VL/TokenCorrCompressor)**|**\u5f53\u524d\u4e3b\u6d41\u7684\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08Multimodal Large Language Models, 
MLLMs\uff09\u5728\u8fdb\u884c\u6587\u6863\u7406\u89e3\u65f6\uff0c\u666e\u904d\u91c7\u7528\u5bf9\u9ad8\u5206\u8fa8\u7387\u6587\u6863\u56fe\u50cf\u8fdb\u884c\u88c1\u526a\uff0c\u4ece\u800c\u751f\u6210\u591a\u4e2a\u5b50\u56fe\u50cf\u7684\u65b9\u6cd5\u3002\u5927\u591a\u6570\u73b0\u6709\u7684\u6587\u6863\u7406\u89e3\u65b9\u6cd5\u4f1a\u4fdd\u7559\u6240\u6709\u5b50\u56fe\u50cf\u5185\u7684\u6807\u8bb0\uff0c\u5e76\u540c\u7b49\u5bf9\u5f85\u5b83\u4eec\uff0c\u8fd9\u5ffd\u89c6\u4e86\u8fd9\u4e9b\u6807\u8bb0\u7684\u4e0d\u540c\u4fe1\u606f\u4ef7\u503c\u6027\uff0c\u5bfc\u81f4\u4e86\u5927\u91cf\u4e0d\u5fc5\u8981\u7684\u56fe\u50cf\u6807\u8bb0\u589e\u52a0\u3002\u4e3a\u4e86\u5b9e\u73b0\u66f4\u52a0\u9002\u5e94\u6027\u548c\u9ad8\u6548\u7684\u6587\u6863\u7406\u89e3\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201cToken\u7ea7\u76f8\u5173\u6027\u5f15\u5bfc\u538b\u7f29\u201d\u7684\u65e0\u53c2\u6570\u4e14\u53ef\u63d2\u62d4\u65b9\u6cd5\uff0c\u65e8\u5728\u4f18\u5316\u6807\u8bb0\u5904\u7406\u8fc7\u7a0b\u3002\u8be5\u65b9\u6cd5\u9996\u5148\u5f15\u5165\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u8bc4\u4f30\u6a21\u5f0f\u91cd\u590d\u6027\u7684\u65b9\u6cd5\uff0c\u57fa\u4e8e\u6bcf\u4e2a\u7247\u6bb5\u6807\u8bb0\u4e4b\u95f4\u7684\u76f8\u5173\u6027\u8fdb\u884c\u3002\u8fd9\u79cd\u65b9\u6cd5\u80fd\u591f\u8bc6\u522b\u5197\u4f59\u6807\u8bb0\uff0c\u4ece\u800c\u786e\u5b9a\u5b50\u56fe\u50cf\u7684\u4fe1\u606f\u5bc6\u5ea6\u3002\u5176\u6b21\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u4e2a\u9488\u5bf9Token\u7ea7\u522b\u7684\u91c7\u6837\u65b9\u6cd5\uff0c\u901a\u8fc7\u6df1\u5165\u5206\u6790[CLS]\u6807\u8bb0\u4e0e\u7247\u6bb5\u6807\u8bb0\u4e4b\u95f4\u7684\u76f8\u5173\u6027\uff0c\u9ad8\u6548\u6355\u6349\u6700\u5177\u4fe1\u606f\u4ef7\u503c\u7684\u6807\u8bb0\u3002\u901a\u8fc7\u7ed3\u5408\u8fd9\u4e24\u79cd\u7b56\u7565\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2a\u53ef\u65e0\u7f1d\u96c6\u6210\u5230\u4f7f\u7528\u88c1\u526a\u6280\u672f\u7684MLLMs\u4e2d\u7684\u81ea\u9002\u5e94\u538b\u7f29\u6a21\u5757\u3002\u8fd9\u4e00\
u6a21\u5757\u4e0d\u4ec5\u5728\u8bad\u7ec3\u548c\u63a8\u7406\u8fc7\u7a0b\u4e2d\u663e\u8457\u63d0\u5347\u4e86\u5904\u7406\u901f\u5ea6\uff0c\u540c\u65f6\u4fdd\u6301\u4e86\u4e0e\u73b0\u6709\u538b\u7f29\u65b9\u6cd5\u76f8\u5f53\u7684\u6027\u80fd\u6c34\u5e73\u3002\u6211\u4eec\u4f7f\u7528\u5f53\u524d\u6700\u4f73\u7684\u6587\u6863\u7406\u89e3\u6a21\u578bmPLUG-DocOwl1.5\u8fdb\u884c\u4e86\u5b9e\u9a8c\uff0c\u5e76\u901a\u8fc7\u4e0e\u5176\u4ed6\u538b\u7f29\u65b9\u6cd5\u7684\u5e7f\u6cdb\u5bf9\u6bd4\uff0c\u9a8c\u8bc1\u4e86\u5176\u6709\u6548\u6027\u3002**|\n", "2407.14402": "|**2024-07-19**|**The Vision of Autonomic Computing: Can LLMs Make It a Reality?**|Zhiyang Zhang et.al.|[2407.14402](http://arxiv.org/abs/2407.14402)|null|\u300a\u81ea\u6cbb\u8ba1\u7b97\u613f\u666f\u4e0e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u5fae\u670d\u52a1\u7ba1\u7406\u4e2d\u7684\u5e94\u7528\u300b\u4e00\u6587\u56de\u987e\u4e86\u8d85\u8fc7\u4e8c\u5341\u5e74\u524d\u63d0\u51fa\u7684\u81ea\u6cbb\u8ba1\u7b97\uff08ACV\uff09\u613f\u666f\uff0c\u65e8\u5728\u6784\u5efa\u80fd\u591f\u81ea\u6211\u7ba1\u7406\u548c\u9002\u5e94\u73af\u5883\u53d8\u5316\u7684\u8ba1\u7b97\u7cfb\u7edf\uff0c\u8fd9\u4e00\u76ee\u6807\u81f3\u4eca\u4ecd\u9762\u4e34\u6311\u6218\u3002\u8fd1\u5e74\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u53d1\u5c55\u4e3a\u89e3\u51b3\u8fd9\u4e9b\u6311\u6218\u63d0\u4f9b\u4e86\u53ef\u80fd\uff0c\u5b83\u4eec\u901a\u8fc7\u5229\u7528\u5e7f\u6cdb\u7684\u77e5\u8bc6\u3001\u8bed\u8a00\u7406\u89e3\u80fd\u529b\u4ee5\u53ca\u4efb\u52a1\u81ea\u52a8\u5316\u80fd\u529b\u6765\u5b9e\u73b0\u8fd9\u4e00\u613f\u666f\u3002 
\u672c\u6587\u63a2\u8ba8\u4e86\u901a\u8fc7\u57fa\u4e8eLLM\u7684\u591a\u4ee3\u7406\u6846\u67b6\u5b9e\u73b0\u5fae\u670d\u52a1\u7ba1\u7406\u81ea\u4e3b\u6027\u7684\u53ef\u884c\u6027\uff0c\u5e76\u63d0\u51fa\u4e86\u4e00\u4e2a\u4e94\u7ea7\u5206\u7c7b\u4f53\u7cfb\uff0c\u7528\u4e8e\u63cf\u8ff0\u81ea\u4e3b\u670d\u52a1\u7ef4\u62a4\u7684\u4e0d\u540c\u5c42\u6b21\u3002\u6587\u4e2d\u8fd8\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u57fa\u4e8e\u201cSock Shop\u201d\u5fae\u670d\u52a1\u6f14\u793a\u9879\u76ee\u7684\u5728\u7ebf\u8bc4\u4f30\u57fa\u51c6\uff0c\u4ee5\u8bc4\u4f30\u8be5\u6846\u67b6\u7684\u6027\u80fd\u3002\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0c\u901a\u8fc7LLMs\u53ef\u4ee5\u663e\u8457\u63d0\u5347\u5fae\u670d\u52a1\u4f53\u7cfb\u7ed3\u6784\u4e2d\u95ee\u9898\u68c0\u6d4b\u548c\u89e3\u51b3\u7684\u80fd\u529b\uff0c\u5b9e\u73b0\u4e86\u7b2c\u4e09\u7ea7\u81ea\u4e3b\u6027\u6c34\u5e73\u7684\u7a81\u7834\uff0c\u8fd9\u6807\u5fd7\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u5fae\u670d\u52a1\u7ba1\u7406\u6846\u67b6\u96c6\u6210\u65b9\u9762\u7684\u5e94\u7528\u53d6\u5f97\u4e86\u91cd\u8981\u8fdb\u5c55\uff0c\u4e3a\u6784\u5efa\u66f4\u9002\u5e94\u6027\u548c\u81ea\u6211\u7ba1\u7406\u7684\u8ba1\u7b97\u7cfb\u7edf\u94fa\u5e73\u4e86\u9053\u8def\u3002 \u4e3a\u4e86\u4fc3\u8fdb\u8fd9\u4e00\u9886\u57df\u7684\u7814\u7a76\u548c\u53d1\u5c55\uff0c\u76f8\u5173\u7684\u4ee3\u7801\u5c06\u901a\u8fc7\u516c\u5f00\u63d0\u4f9b\u3002|\n", "2407.14371": "|**2024-07-19**|**Open Artificial Knowledge**|Vadim Borisov et.al.|[2407.14371](http://arxiv.org/abs/2407.14371)|null|\u300a\u5f00\u653e\u4eba\u5de5\u77e5\u8bc6\uff08OAK\uff09\u6570\u636e\u96c6\uff1a\u4fc3\u8fdb\u5927\u578b\u8bed\u8a00\u6a21\u578b\u53d1\u5c55\u4e0e\u89e3\u51b3\u6570\u636e\u7a00\u7f3a\u4e0e\u9690\u79c1\u95ee\u9898\u300b 
\u5f53\u524d\uff0c\u57fa\u4e8e\u5bf9\u8bdd\u7684AI\u7cfb\u7edf\u5982ChatGPT\u3001Claude\u548cGemini\u7684\u6210\u529f\uff0c\u4e3b\u8981\u5f97\u76ca\u4e8e\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5bf9\u6d77\u91cf\u6570\u636e\u96c6\u7684\u8bad\u7ec3\u3002\u7136\u800c\uff0c\u83b7\u53d6\u9ad8\u8d28\u91cf\u3001\u591a\u6837\u6027\u548c\u4f26\u7406\u6765\u6e90\u7684\u6570\u636e\u4ecd\u7136\u9762\u4e34\u91cd\u5927\u6311\u6218\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201c\u5f00\u653e\u4eba\u5de5\u77e5\u8bc6\u201d\uff08OAK\uff09\u6570\u636e\u96c6\uff0c\u8fd9\u662f\u4e00\u4e2a\u5305\u542b\u8d85\u8fc75\u4ebf\u4e2a\u4ee4\u724c\uff08\u64b0\u5199\u65f6\uff09\u7684\u5927\u578b\u8d44\u6e90\u5e93\u3002OAK\u901a\u8fc7\u96c6\u5408\u5305\u62ecGPT4o\u3001LLaMa3-70B\u3001LLaMa3-8B\u3001Mixtral-8x7B\u3001Gemma-7B\u548cGemma-2-9B\u5728\u5185\u7684\u6700\u5148\u8fdb\u7684LLMs\uff0c\u5229\u7528\u7ef4\u57fa\u767e\u79d1\u7684\u4e3b\u8981\u7c7b\u522b\u6765\u5f15\u5bfc\u6587\u672c\u751f\u6210\uff0c\u786e\u4fdd\u5e7f\u6cdb\u7684\u9886\u57df\u8986\u76d6\uff0c\u540c\u65f6\u4fdd\u6301\u8fde\u8d2f\u6027\u548c\u4e8b\u5b9e\u51c6\u786e\u6027\u3002OAK\u6570\u636e\u96c6\u65e8\u5728\u4fc3\u8fdb\u66f4\u5f3a\u5927\u3001\u66f4\u5bf9\u9f50\u7684\u8bed\u8a00\u6a21\u578b\u7684\u53d1\u5c55\uff0c\u5e76\u89e3\u51b3\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\u8bad\u7ec3\u4e2d\u7684\u5173\u952e\u95ee\u9898\uff0c\u5982\u6570\u636e\u7a00\u7f3a\u6027\u548c\u9690\u79c1\u95ee\u9898\u3002\u76ee\u524d\uff0c\u8be5\u6570\u636e\u96c6\u662f\u514d\u8d39\u63d0\u4f9b\u5728www.oakdataset.org\u3002|\n", "2407.14355": "|**2024-07-19**|**Enhancing Zero-shot Audio Classification using Sound Attribute Knowledge from Large Language Models**|Xuenan Xu 
et.al.|[2407.14355](http://arxiv.org/abs/2407.14355)|**[link](https://github.com/wsntxxn/attrenhzsac)**|\u8fd9\u7bc7\u8bba\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u65b9\u6cd5\u6765\u8fdb\u884c\u96f6\u6837\u672c\u97f3\u9891\u5206\u7c7b\uff0c\u5373\u8bc6\u522b\u548c\u5206\u7c7b\u6a21\u578b\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u4ece\u672a\u89c1\u8fc7\u7684\u97f3\u9891\u7c7b\u522b\u3002\u6211\u4eec\u63d0\u8bae\u5217\u51fa\u4e00\u7cfb\u5217\u97f3\u9891\u5c5e\u6027\uff0c\u5e76\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u9886\u57df\u77e5\u8bc6\u4e3a\u6bcf\u4e2a\u7c7b\u522b\u751f\u6210\u8be6\u7ec6\u7684\u5c5e\u6027\u63cf\u8ff0\u3002\u4e0e\u4ee5\u5f80\u4e3b\u8981\u4f9d\u8d56\u7c7b\u522b\u6807\u7b7e\u6216\u7b80\u5355\u63cf\u8ff0\u7684\u65b9\u6cd5\u4e0d\u540c\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u4e13\u6ce8\u4e8e\u591a\u7ef4\u5ea6\u7684\u5185\u5728\u542c\u89c9\u5c5e\u6027\uff0c\u6355\u6349\u97f3\u9891\u7c7b\u522b\u7684\u4e0d\u540c\u7279\u6027\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u91c7\u7528\u4e86\u5bf9\u6bd4\u5b66\u4e60\u65b9\u6cd5\u6765\u589e\u5f3a\u57fa\u4e8e\u6587\u672c\u6807\u7b7e\u7684\u96f6\u6837\u672c\u5b66\u4e60\u3002\u6211\u4eec\u5728VGGSound\u548cAudioSet\u4e0a\u9a8c\u8bc1\u4e86\u6211\u4eec\u65b9\u6cd5\u7684\u6709\u6548\u6027\uff08\u4ee3\u7801\u53ef\u8bbf\u95ee\uff1ahttps://www.github.com/wsntxxn/AttrEnhZsAc\uff09\u3002\u7ed3\u679c\u8868\u660e\uff0c\u5728\u96f6\u6837\u672c\u5206\u7c7b\u51c6\u786e\u6027\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u63d0\u9ad8\u3002\u6d88\u878d\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u65e0\u8bba\u6a21\u578b\u67b6\u6784\u5982\u4f55\uff0c\u6027\u80fd\u589e\u5f3a\u90fd\u975e\u5e38\u7a33\u5065\u3002|\n", "2407.15850": "|**2024-07-22**|**AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description**|Junyu Xie 
et.al.|[2407.15850](http://arxiv.org/abs/2407.15850)|**[link](https://github.com/Jyxarthur/AutoAD-Zero)**|**\u6211\u4eec\u7684\u76ee\u6807\u662f\u65e0\u9700\u8bad\u7ec3\u5730\u751f\u6210\u7535\u5f71\u548c\u7535\u89c6\u8fde\u7eed\u5267\u7684\u97f3\u9891\u63cf\u8ff0\uff08AD\uff09\u3002\u6211\u4eec\u5229\u7528\u73b0\u6210\u7684\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08VLM\uff09\u548c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\uff0c\u5e76\u5f00\u53d1\u4e86\u89c6\u89c9\u548c\u6587\u672c\u63d0\u793a\u7b56\u7565\u6765\u5b8c\u6210\u8fd9\u9879\u4efb\u52a1\u3002\u6211\u4eec\u7684\u8d21\u732e\u6709\u4e09\u70b9\uff1a(i) \u6211\u4eec\u8bc1\u660e\uff0c\u5982\u679c\u901a\u8fc7\u89c6\u89c9\u6307\u793a\u76f4\u63a5\u63d0\u793aVLM\u63d0\u4f9b\u89d2\u8272\u4fe1\u606f\uff0cVLM\u53ef\u4ee5\u6210\u529f\u547d\u540d\u548c\u5f15\u7528\u89d2\u8272\uff0c\u65e0\u9700\u4efb\u4f55\u5fae\u8c03\uff1b(ii) \u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u4e24\u9636\u6bb5\u8fc7\u7a0b\u6765\u751f\u6210AD\uff0c\u7b2c\u4e00\u9636\u6bb5\u8ba9VLM\u5168\u9762\u63cf\u8ff0\u89c6\u9891\uff0c\u7b2c\u4e8c\u9636\u6bb5\u4f7f\u7528LLM\u5c06\u5bc6\u96c6\u7684\u6587\u672c\u4fe1\u606f\u603b\u7ed3\u4e3a\u4e00\u4e2a\u7b80\u6d01\u7684AD\u53e5\u5b50\uff1b(iii) \u6211\u4eec\u5236\u5b9a\u4e86\u4e00\u4e2a\u65b0\u7684\u7535\u89c6\u97f3\u9891\u63cf\u8ff0\u6570\u636e\u96c6\u3002\u6211\u4eec\u7684\u65b9\u6cd5AutoAD-Zero\u5728AD\u751f\u6210\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff08\u751a\u81f3\u4e0e\u4e00\u4e9b\u5728\u771f\u5b9eAD\u4e0a\u5fae\u8c03\u7684\u6a21\u578b\u76f8\u5339\u654c\uff09\uff0c\u5b9e\u73b0\u4e86\u7535\u5f71\u548c\u7535\u89c6\u8fde\u7eed\u5267\u7684\u6700\u9ad8CRITIC\u8bc4\u5206\u3002**|\n", "2407.15847": "|**2024-07-22**|**LLMmap: Fingerprinting For Large Language Models**|Dario Pasquini 
et.al.|[2407.15847](http://arxiv.org/abs/2407.15847)|**[link](https://github.com/pasquini-dario/LLMmap)**|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u9488\u5bf9LLM\u96c6\u6210\u5e94\u7528\u7684\u9996\u4ee3\u6307\u7eb9\u8bc6\u522b\u653b\u51fb\u5de5\u5177\u2014\u2014LLMmap\u3002\u8be5\u5de5\u5177\u91c7\u7528\u79ef\u6781\u7684\u6307\u7eb9\u8bc6\u522b\u7b56\u7565\uff0c\u901a\u8fc7\u5411\u5e94\u7528\u53d1\u9001\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u67e5\u8be2\u5e76\u5206\u6790\u54cd\u5e94\u4fe1\u606f\uff0c\u4ee5\u8bc6\u522b\u6240\u4f7f\u7528\u7684\u5177\u4f53LLM\u6a21\u578b\u3002\u4ec5\u97008\u6b21\u4ea4\u4e92\uff0cLLMmap\u5373\u53ef\u572895%\u4ee5\u4e0a\u7684\u51c6\u786e\u7387\u4e0b\u7cbe\u786e\u8bc6\u522b\u51faLLM\u6a21\u578b\u3002\u66f4\u91cd\u8981\u7684\u662f\uff0cLLMmap\u88ab\u8bbe\u8ba1\u5f97\u5177\u6709\u8de8\u4e0d\u540c\u5e94\u7528\u5c42\u7684\u9c81\u68d2\u6027\uff0c\u4f7f\u5176\u80fd\u591f\u8bc6\u522b\u5728\u5404\u79cd\u7cfb\u7edf\u63d0\u793a\u3001\u968f\u673a\u62bd\u6837\u8d85\u53c2\u6570\u4ee5\u53ca\u590d\u6742\u7684\u751f\u6210\u6846\u67b6\u5982RAG\u6216Chain-of-Thought\u7b49\u73af\u5883\u4e0b\u7684LLM\u6a21\u578b\u3002|\n", "2407.15841": "|**2024-07-22**|**SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models**|Mingze Xu 
et.al.|[2407.15841](http://arxiv.org/abs/2407.15841)|**[link](https://github.com/apple/ml-slowfast-llava)**|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201cSlowFast-LLaVA\u201d\uff08\u6216\u7b80\u79f0\u4e3aSF-LLaVA\uff09\u7684\u65e0\u9700\u8bad\u7ec3\u7684\u89c6\u9891\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\uff0c\u5b83\u80fd\u591f\u540c\u65f6\u6355\u6349\u8be6\u7ec6\u7684\u7a7a\u95f4\u8bed\u4e49\u548c\u957f\u65f6\u5e8f\u4e0a\u4e0b\u6587\uff0c\u800c\u4e0d\u4f1a\u8d85\u51fa\u901a\u5e38\u4f7f\u7528\u7684LLM\u7684\u4ee4\u724c\u9884\u7b97\u3002\u8fd9\u4e00\u76ee\u6807\u901a\u8fc7\u4f7f\u7528\u89c6\u9891LLM\u8f93\u5165\u7684\u53cc\u6d41\u8bbe\u8ba1\u5b9e\u73b0\uff0c\u6709\u6548\u5730\u805a\u5408\u4e86\u4ece\u91c7\u6837\u89c6\u9891\u5e27\u4e2d\u63d0\u53d6\u7684\u7279\u5f81\u3002\u5177\u4f53\u800c\u8a00\uff0c\u6162\u901f\u8def\u5f84\u4ee5\u8f83\u4f4e\u7684\u5e27\u7387\u63d0\u53d6\u5c3d\u53ef\u80fd\u591a\u7684\u7a7a\u95f4\u7ec6\u8282\u7684\u7279\u5f81\uff08\u4f8b\u5982\uff0c\u4ee524x24\u7684\u4ee4\u724c\uff09\uff0c\u800c\u5feb\u901f\u8def\u5f84\u5219\u4ee5\u8f83\u9ad8\u7684\u5e27\u7387\u64cd\u4f5c\uff0c\u4f46\u4f7f\u7528\u8f83\u5927\u7684\u7a7a\u95f4\u6c60\u5316\u6b65\u957f\uff08\u4f8b\u5982\uff0c\u4e0b\u91c7\u68376x\uff09\u6765\u5173\u6ce8\u8fd0\u52a8\u7ebf\u7d22\u3002\u56e0\u6b64\uff0c\u8fd9\u79cd\u8bbe\u8ba1\u5141\u8bb8\u6211\u4eec\u9002\u5f53\u5730\u6355\u83b7\u5bf9\u4e8e\u7406\u89e3\u89c6\u9891\u4e2d\u7684\u8be6\u7ec6\u4fe1\u606f\u6709\u76ca\u7684\u65f6\u7a7a\u7279\u6027\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cSF-LLaVA\u5728\u5404\u79cd\u89c6\u9891\u4efb\u52a1\u4e0a\u90fd\u8d85\u8d8a\u4e86\u73b0\u6709\u7684\u65e0\u9700\u8bad\u7ec3\u7684\u65b9\u6cd5\u3002\u5728\u67d0\u4e9b\u57fa\u51c6\u6d4b\u8bd5\u4e2d\uff0c\u5b83\u751a\u81f3\u4e0e\u5728\u89c6\u9891\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u5fae\u8c03\u7684\u6700\u5148\u8fdb\u7684\u89c6\u9891LLM\u5b9e\u73b0\u4e86\u76f8\u5f53\u6216\u66f4\u597d\u7684\u6027\u80fd\u3002|\n", "2407.15838":
"|**2024-07-22**|**MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity**|Yangzhou Liu et.al.|[2407.15838](http://arxiv.org/abs/2407.15838)|**[link](https://github.com/yuecao0119/mminstruct)**|\u5c3d\u7ba1\u89c6\u89c9\u8bed\u8a00\u9884\u8bad\u7ec3\u6a21\u578b\u5728\u89c6\u89c9\u4efb\u52a1\u4e0a\u7684\u5fae\u8c03\u8868\u73b0\u51fa\u663e\u8457\u7684\u6027\u80fd\u63d0\u5347\uff0c\u4f46\u73b0\u6709\u7684\u89c6\u89c9\u6307\u4ee4\u8c03\u4f18\u6570\u636e\u96c6\u5b58\u5728\u4ee5\u4e0b\u5c40\u9650\u6027\uff1a 1. \u6307\u4ee4\u6ce8\u91ca\u8d28\u91cf\uff1a\u867d\u7136\u73b0\u6709\u7684\u89c6\u89c9\u8bed\u8a00\u9884\u8bad\u7ec3\u6a21\u578b\u5728\u6027\u80fd\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5b83\u4eec\u751f\u6210\u7684\u6307\u4ee4\u53ef\u80fd\u4ecd\u4f1a\u5305\u542b\u4e0d\u51c6\u786e\u6027\uff0c\u5982\u5e7b\u89c9\u73b0\u8c61\u3002 2. \u6307\u4ee4\u548c\u56fe\u50cf\u591a\u6837\u6027\uff1a\u6307\u4ee4\u7c7b\u578b\u8303\u56f4\u6709\u9650\u4ee5\u53ca\u56fe\u50cf\u6570\u636e\u7f3a\u4e4f\u591a\u6837\u6027\u53ef\u80fd\u4f1a\u5f71\u54cd\u6a21\u578b\u751f\u6210\u591a\u6837\u6027\u548c\u63a5\u8fd1\u771f\u5b9e\u4e16\u754c\u573a\u666f\u8f93\u51fa\u7684\u80fd\u529b\u3002 \u4e3a\u89e3\u51b3\u8fd9\u4e9b\u6311\u6218\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u9ad8\u8d28\u91cf\u3001\u591a\u6837\u6027\u7684\u89c6\u89c9\u6307\u4ee4\u8c03\u4f18\u6570\u636e\u96c6MMInstruct\uff0c\u5305\u542b\u6765\u81ea24\u4e2a\u9886\u57df\u5171\u8ba1973K\u6761\u6307\u4ee4\u3002\u8be5\u6570\u636e\u96c6\u5305\u62ec\u56db\u79cd\u6307\u4ee4\u7c7b\u578b\uff1a\u5224\u65ad\u3001\u591a\u9879\u9009\u62e9\u3001\u957f\u89c6\u89c9\u95ee\u9898\u56de\u7b54\u548c\u77ed\u89c6\u89c9\u95ee\u9898\u56de\u7b54\u3002 
\u4e3a\u4e86\u6784\u5efaMMInstruct\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u6307\u4ee4\u751f\u6210\u6570\u636e\u5f15\u64ce\uff0c\u5229\u7528GPT-4V\u3001GPT-3.5\u548c\u4eba\u5de5\u6821\u6b63\u3002\u6211\u4eec\u7684\u6307\u4ee4\u751f\u6210\u5f15\u64ce\u5141\u8bb8\u534a\u81ea\u52a8\u3001\u4f4e\u6210\u672c\u3001\u591a\u9886\u57df\u7684\u6307\u4ee4\u751f\u6210\uff0c\u6210\u672c\u4ec5\u4e3a\u624b\u52a8\u6784\u5efa\u7684\u516d\u5206\u4e4b\u4e00\u3002 \u901a\u8fc7\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u9a8c\u8bc1\u548c\u6d88\u878d\u5b9e\u9a8c\uff0c\u6211\u4eec\u8bc1\u660e\u4e86MMInstruct\u80fd\u591f\u663e\u8457\u63d0\u9ad8\u89c6\u89c9\u8bed\u8a00\u9884\u8bad\u7ec3\u6a21\u578b\u7684\u6027\u80fd\uff0c\u4f8b\u5982\uff0c\u57fa\u4e8eMMInstruct\u7684\u6a21\u578b\u5fae\u8c03\u572812\u4e2a\u57fa\u51c6\u4e2d\u768410\u4e2a\u4e0a\u8fbe\u5230\u4e86\u65b0\u7684\u72b6\u6001\u6700\u4f18\u8868\u73b0\u3002\u4ee3\u7801\u548c\u6570\u636e\u5c06\u5728https://github.com/yuecao0119/MMInstruct\u63d0\u4f9b\u3002|\n", "2407.15835": "|**2024-07-22**|**dMel: Speech Tokenization made Simple**|He Bai 
et.al.|[2407.15835](http://arxiv.org/abs/2407.15835)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\u901a\u8fc7\u5229\u7528\u5927\u89c4\u6a21\u6587\u672c\u6570\u636e\u7684\u81ea\u6211\u76d1\u7763\u9884\u8bad\u7ec3\uff0c\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u9886\u57df\u5b9e\u73b0\u4e86\u9769\u547d\u6027\u7684\u8fdb\u6b65\u3002\u53d7\u6b64\u6210\u529f\u542f\u53d1\uff0c\u7814\u7a76\u4eba\u5458\u63a2\u7d22\u4e86\u590d\u6742\u8bed\u97f3\u5206\u8bcd\u65b9\u6cd5\uff0c\u4ee5\u5c06\u8fde\u7eed\u7684\u8bed\u97f3\u4fe1\u53f7\u79bb\u6563\u5316\uff0c\u4ece\u800c\u4f7f\u8bed\u8a00\u5efa\u6a21\u6280\u672f\u53ef\u4ee5\u5e94\u7528\u4e8e\u8bed\u97f3\u6570\u636e\u3002\u7136\u800c\uff0c\u73b0\u6709\u65b9\u6cd5\u8981\u4e48\u5efa\u6a21\u8bed\u4e49\u4ee4\u724c\uff0c\u53ef\u80fd\u4f1a\u4e22\u5931\u58f0\u5b66\u4fe1\u606f\uff0c\u8981\u4e48\u5efa\u6a21\u58f0\u5b66\u4ee4\u724c\uff0c\u53c8\u53ef\u80fd\u9762\u4e34\u4e22\u5931\u8bed\u4e49\u4fe1\u606f\u7684\u98ce\u9669\u3002\u5177\u6709\u591a\u79cd\u4ee4\u724c\u7c7b\u578b\u4e5f\u4f7f\u67b6\u6784\u53d8\u5f97\u590d\u6742\uff0c\u5e76\u9700\u8981\u989d\u5916\u7684\u9884\u8bad\u7ec3\u3002\u6211\u4eec\u5c55\u793a\u4e86\u5c06\u6885\u5c14\u6ee4\u6ce2\u5668\u901a\u9053\u79bb\u6563\u5316\u4e3a\u79bb\u6563\u5f3a\u5ea6\u5355\u5143\uff08dMel\uff09\u4ea7\u751f\u4e86\u4e00\u4e2a\u7b80\u5355\u8868\u793a\uff0c\u5176\u6027\u80fd\u4f18\u4e8e\u5176\u4ed6\u73b0\u6709\u8bed\u97f3\u5206\u8bcd\u65b9\u6cd5\u3002\u4f7f\u7528\u4ec5\u89e3\u7801\u5668\u7684\u53d8\u6362\u5668\u67b6\u6784\u8fdb\u884c\u8bed\u97f3-\u6587\u672c\u5efa\u6a21\uff0c\u6211\u4eec\u5168\u9762\u8bc4\u4f30\u4e86\u4e0d\u540c\u7684\u8bed\u97f3\u5206\u8bcd\u65b9\u6cd5\u5728\u8bed\u97f3\u8bc6\u522b\uff08ASR\uff09\u548c\u8bed\u97f3\u5408\u6210\uff08TTS\uff09\u4efb\u52a1\u4e0a\u7684\u6027\u80fd\u3002\u6211\u4eec\u7684\u7ed3\u679c\u8868\u660e\uff0cdMel\u5728\u8054\u5408\u5efa\u6a21\u8bed\u97f3\u548c\u6587\u672c\u7684\u7edf\u4e00\u6846\u67b6\u4e2d\u5b9e\u73b0\u9ad8\u6027\u80fd\u7684\u6709\u6548\u6027\uff0c\u4e3a\
u9ad8\u6548\u4e14\u6709\u6548\u7684\u8bed\u97f3\u4e0e\u6587\u672c\u8054\u5408\u5efa\u6a21\u94fa\u5e73\u4e86\u9053\u8def\u3002|\n", "2407.15819": "|**2024-07-22**|**Accelerating Pre-training of Multimodal LLMs via Chain-of-Sight**|Ziyuan Huang et.al.|[2407.15819](http://arxiv.org/abs/2407.15819)|null|\u8fd9\u7bc7\u8bba\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201c\u94fe\u89c6\u56fe\u201d\u7684\u89c6\u89c9-\u8bed\u8a00\u6865\u6881\u6a21\u5757\uff0c\u65e8\u5728\u52a0\u901f\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u7684\u9884\u8bad\u7ec3\u8fc7\u7a0b\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u91c7\u7528\u4e86\u5e8f\u5217\u5316\u7684\u89c6\u89c9\u91cd\u91c7\u6837\u5668\uff0c\u80fd\u591f\u6709\u6548\u5730\u6355\u6349\u4e0d\u540c\u7a7a\u95f4\u5c3a\u5ea6\u7684\u89c6\u89c9\u7ec6\u8282\u3002\u8fd9\u79cd\u67b6\u6784\u4e0d\u4ec5\u80fd\u591f\u6709\u6548\u5229\u7528\u5168\u5c40\u548c\u5c40\u90e8\u89c6\u89c9\u4e0a\u4e0b\u6587\uff0c\u8fd8\u901a\u8fc7\u590d\u5408\u4ee4\u724c\u7f29\u653e\u7b56\u7565\u7075\u6d3b\u6269\u5c55\u89c6\u89c9\u4ee4\u724c\u7684\u6570\u91cf\uff0c\u6700\u591a\u53ef\u4ee5\u589e\u52a016\u500d\u7684\u4ee4\u724c\u6570\u91cf\uff0c\u800c\u65e0\u9700\u5728\u9884\u8bad\u7ec3\u540e\u8fdb\u884c\u5fae\u8c03\u3002\u56e0\u6b64\uff0c\u201c\u94fe\u89c6\u56fe\u201d\u5728\u9884\u8bad\u7ec3\u9636\u6bb5\u6240\u9700\u7684\u89c6\u89c9\u4ee4\u724c\u6570\u91cf\u8fdc\u5c11\u4e8e\u5fae\u8c03\u9636\u6bb5\uff0c\u8fd9\u6709\u610f\u5730\u51cf\u5c11\u4e86\u89c6\u89c9\u4ee4\u724c\u7684\u6570\u91cf\uff0c\u663e\u8457\u52a0\u901f\u4e86\u9884\u8bad\u7ec3\u8fc7\u7a0b\uff0c\u8282\u7701\u4e86\u5927\u7ea673%\u7684\u5b9e\u9645\u8bad\u7ec3\u65f6\u95f4\u3002 
\u5728\u4e00\u7cfb\u5217\u89c6\u89c9-\u8bed\u8a00\u57fa\u51c6\u6d4b\u8bd5\u4e0a\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u901a\u8fc7\u201c\u94fe\u89c6\u56fe\u201d\u52a0\u901f\u9884\u8bad\u7ec3\u8fc7\u7a0b\u5e76\u4e0d\u4f1a\u727a\u7272\u6027\u80fd\uff0c\u5176\u8868\u73b0\u4e0e\u5728\u6574\u4e2a\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u4f7f\u7528\u6240\u6709\u89c6\u89c9\u4ee4\u724c\u7684\u6807\u51c6\u6d41\u7a0b\u76f8\u5f53\u6216\u66f4\u597d\u3002\u8fdb\u4e00\u6b65\u589e\u52a0\u9884\u8bad\u7ec3\u9636\u6bb5\u7684\u89c6\u89c9\u4ee4\u724c\u6570\u91cf\u4f1a\u5bfc\u81f4\u66f4\u5f3a\u7684\u8868\u73b0\uff0c\u5728\u4e00\u7cfb\u5217\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u4e0e\u73b0\u6709\u65b9\u6cd5\u7ade\u4e89\u3002|\n", "2407.15788": "|**2024-07-22**|**Extracting Structured Insights from Financial News: An Augmented LLM Driven Approach**|Rian Dolphin
et.al.|[2407.15788](http://arxiv.org/abs/2407.15788)|null|\u91d1\u878d\u65b0\u95fb\u5728\u91d1\u878d\u5e02\u573a\u51b3\u7b56\u8fc7\u7a0b\u4e2d\u626e\u6f14\u7740\u5173\u952e\u89d2\u8272\uff0c\u4f46\u5c06\u5176\u8f6c\u5316\u4e3a\u7ed3\u6784\u5316\u6570\u636e\u7684\u8fc7\u7a0b\u4e00\u76f4\u5145\u6ee1\u6311\u6218\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u91d1\u878d\u65b0\u95fb\u5904\u7406\u65b9\u6cd5\uff0c\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u514b\u670d\u4e86\u4ee5\u5f80\u63d0\u53d6\u7ed3\u6784\u5316\u4fe1\u606f\u65f6\u9047\u5230\u7684\u9650\u5236\u3002\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u5957\u7cfb\u7edf\uff0c\u8be5\u7cfb\u7edf\u80fd\u591f\u4ece\u539f\u59cb\u65b0\u95fb\u6587\u7ae0\u5185\u5bb9\u4e2d\u63d0\u53d6\u76f8\u5173\u516c\u53f8\u4ee3\u7801\uff0c\u5e76\u5728\u4e0d\u4f9d\u8d56\u4e8e\u9884\u7ed3\u6784\u5316\u6570\u636e\u6d41\u7684\u60c5\u51b5\u4e0b\u8fdb\u884c\u516c\u53f8\u5c42\u9762\u7684\u60c5\u7eea\u5206\u6790\u548c\u751f\u6210\u6458\u8981\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u7ed3\u5408\u4e86LLMs\u7684\u751f\u6210\u80fd\u529b\u3001\u4ee5\u53ca\u6700\u65b0\u7684\u63d0\u793a\u6280\u672f\uff0c\u914d\u4ee5\u4e00\u4e2a\u5b9a\u5236\u7684\u5b57\u7b26\u4e32\u76f8\u4f3c\u5ea6\u9a8c\u8bc1\u6846\u67b6\u3002 
\u901a\u8fc7\u4f7f\u7528\u5305\u542b5530\u7bc7\u91d1\u878d\u65b0\u95fb\u6587\u7ae0\u7684\u6570\u636e\u96c6\u8fdb\u884c\u8bc4\u4f30\uff0c\u8bc1\u660e\u4e86\u6211\u4eec\u7684\u65b9\u6cd5\u7684\u6709\u6548\u6027\u3002\u76f8\u6bd4\u73b0\u6709\u6570\u636e\u63d0\u4f9b\u5546\uff0c\u6211\u4eec\u670990%\u7684\u6587\u7ae0\u4e0d\u4f1a\u9057\u6f0f\u4efb\u4f55\u516c\u53f8\u4ee3\u7801\uff0c\u800c\u670922%\u7684\u6587\u7ae0\u4f1a\u989d\u5916\u63d0\u4f9b\u76f8\u5173\u7684\u516c\u53f8\u4ee3\u7801\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5b9e\u73b0\u4e86\u8fd9\u4e00\u65b9\u6cd5\u7684\u5927\u89c4\u6a21\u90e8\u7f72\uff0c\u5e76\u901a\u8fc7\u5b9e\u65f6API\u7aef\u70b9\u63d0\u4f9b\u4e86\u7ecf\u8fc7\u5904\u7406\u7684\u6570\u636e\uff0c\u66f4\u65b0\u4e86\u6700\u65b0\u65b0\u95fb\u4fe1\u606f\u3002\u636e\u6211\u4eec\u6240\u77e5\uff0c\u6211\u4eec\u662f\u9996\u4e2a\u4ece\u65b0\u95fb\u6587\u7ae0\u4e2d\u63d0\u4f9b\u9488\u5bf9\u6bcf\u4e2a\u516c\u53f8\u7684\u7ec6\u81f4\u60c5\u7eea\u5206\u6790\u670d\u52a1\u7684\u6570\u636e\u63d0\u4f9b\u5546\uff0c\u589e\u5f3a\u4e86\u5e02\u573a\u53c2\u4e0e\u8005\u53ef\u83b7\u53d6\u7684\u4fe1\u606f\u6df1\u5ea6\u3002 \u4e3a\u4e86\u4fc3\u8fdb\u8fdb\u4e00\u6b65\u7684\u7814\u7a76\u5229\u7528\u91d1\u878d\u65b0\u95fb\uff0c\u6211\u4eec\u8fd8\u53d1\u5e03\u4e86\u5305\u542b5530\u7bc7\u5904\u7406\u540e\u6587\u7ae0\u7684\u8bc4\u4f30\u6570\u636e\u96c6\u3002|\n", "2407.15748": "|**2024-07-22**|**MoRSE: Bridging the Gap in Cybersecurity Expertise with Retrieval Augmented Generation**|Marco Simoni
et.al.|[2407.15748](http://arxiv.org/abs/2407.15748)|null|\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u5f15\u5165\u4e86MoRSE\uff08\u6df7\u5408RAG\u5b89\u5168\u4e13\u5bb6\uff09\uff0c\u8fd9\u662f\u9996\u4e2a\u4e13\u95e8\u7684AI\u804a\u5929\u673a\u5668\u4eba\u7528\u4e8e\u7f51\u7edc\u5b89\u5168\u3002MoRSE\u65e8\u5728\u63d0\u4f9b\u5168\u9762\u4e14\u5b8c\u6574\u7684\u7f51\u7edc\u5b89\u5168\u77e5\u8bc6\u3002MoRSE\u4f7f\u7528\u4e86\u4e24\u4e2aRAG\uff08\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff09\u7cfb\u7edf\uff0c\u8bbe\u8ba1\u7528\u4e8e\u4ece\u591a\u7ef4\u5ea6\u7684\u7f51\u7edc\u5b89\u5168\u4e0a\u4e0b\u6587\u4e2d\u68c0\u7d22\u548c\u7ec4\u7ec7\u4fe1\u606f\u3002\u4e0e\u4f20\u7edf\u7684RAG\u4e0d\u540c\uff0cMoRSE\u91c7\u7528\u4e86\u5e76\u884c\u68c0\u7d22\u5668\u534f\u540c\u5de5\u4f5c\uff0c\u4ee5\u5728\u4e0d\u540c\u683c\u5f0f\u548c\u7ed3\u6784\u4e2d\u68c0\u7d22\u8bed\u4e49\u76f8\u5173\u7684\u4fe1\u606f\u3002 \u4e0d\u540c\u4e8e\u4f9d\u8d56\u53c2\u6570\u77e5\u8bc6\u5e93\u7684\u4f20\u7edf\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\uff0cMoRSE\u54cd\u5e94\u7528\u6237\u67e5\u8be2\u65f6\u4ece\u975e\u53c2\u6570\u77e5\u8bc6\u5e93\u4e2d\u68c0\u7d22\u76f8\u5173\u6587\u6863\u3002\u968f\u540e\uff0cMoRSE\u5229\u7528\u8fd9\u4e9b\u4fe1\u606f\u751f\u6210\u51c6\u786e\u7684\u7b54\u6848\u3002\u6b64\u5916\uff0cMoRSE\u53d7\u76ca\u4e8e\u5176\u77e5\u8bc6\u5e93\u7684\u5b9e\u65f6\u66f4\u65b0\uff0c\u8fd9\u4f7f\u5f97\u7cfb\u7edf\u80fd\u591f\u5728\u4e0d\u91cd\u65b0\u8bad\u7ec3\u7684\u60c5\u51b5\u4e0b\u6301\u7eed\u7684\u77e5\u8bc6\u4e30\u5bcc\u3002 \u6211\u4eec\u5bf9MoRSE\u7684\u6709\u6548\u6027\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u9488\u5bf9600\u4e2a\u7279\u5b9a\u7684\u7f51\u7edc\u5b89\u5168\u95ee\u9898\u8fdb\u884c\u4e86\u5b9e\u9a8c\u6027\u8bc4\u4f30\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u4e0eGPT-4\u3001Mixtral 
7x8\u7b49\u5df2\u77e5\u89e3\u51b3\u65b9\u6848\u76f8\u6bd4\uff0c\u5728\u7b54\u6848\u7684\u76f8\u5173\u6027\u548c\u6b63\u786e\u6027\u7684\u6539\u8fdb\u4e0a\u8d85\u8fc7\u4e8610%\u3002|\n", "2407.15736": "|**2024-07-22**|**OMoS-QA: A Dataset for Cross-Lingual Extractive Question Answering in a German Migration Context**|Steffen Kleinle et.al.|[2407.15736](http://arxiv.org/abs/2407.15736)|null|\u5f53\u79fb\u6c11\u5230\u4e00\u4e2a\u65b0\u7684\u56fd\u5bb6\u65f6\uff0c\u4eba\u4eec\u5f88\u5bb9\u6613\u56e0\u9700\u8981\u83b7\u53d6\u6709\u5173\u8d22\u653f\u652f\u6301\u3001\u4f4f\u623f\u3001\u6559\u80b2\u3001\u8bed\u8a00\u8bfe\u7a0b\u4ee5\u53ca\u5176\u4ed6\u95ee\u9898\u7684\u4fe1\u606f\u800c\u611f\u5230\u4e0d\u77e5\u6240\u63aa\u3002\u5982\u679c\u642c\u8fc1\u8fc7\u7a0b\u5306\u5fd9\u6216\u751a\u81f3\u88ab\u8feb\u8fdb\u884c\uff0c\u5bf9\u8fd9\u4e9b\u95ee\u9898\u7684\u9ad8\u8d28\u91cf\u89e3\u7b54\u53d8\u5f97\u5c24\u4e3a\u8feb\u5207\u3002\u5b98\u65b9\u79fb\u6c11\u987e\u95ee\u901a\u5e38\u8fc7\u4e8e\u7e41\u5fd9\uff0c\u800c\u5728\u7ebf\u7cfb\u7edf\u53ef\u4ee5\u5f15\u5bfc\u65b0\u79fb\u6c11\u627e\u5230\u6240\u9700\u4fe1\u606f\u6216\u5408\u9002\u7684\u54a8\u8be2\u670d\u52a1\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86OMoS-QA\u6570\u636e\u96c6\uff0c\u5b83\u5305\u542b\u5fb7\u8bed\u548c\u82f1\u8bed\u95ee\u9898\u4e0e\u76f8\u5173\u53ef\u4fe1\u6587\u6863\u4ee5\u53ca\u624b\u52a8\u6807\u6ce8\u7684\u7b54\u6848\uff0c\u4e13\u95e8\u9488\u5bf9\u8fd9\u4e00\u573a\u666f\u3002\u95ee\u9898\u662f\u7531\u5f00\u6e90\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u81ea\u52a8\u751f\u6210\u7684\uff0c\u7b54\u6848\u53e5\u5b50\u7531\u5177\u6709\u9ad8\u5ea6\u4e00\u81f4\u6027\u7684\u4f17\u5305\u5de5\u4f5c\u8005\u9009\u62e9\u3002\u901a\u8fc7\u6211\u4eec\u7684\u6570\u636e\uff0c\u6211\u4eec\u5728\u5fb7\u8bed\u548c\u82f1\u8bed\u4e0a\u5bf95\u4e2a\u9884\u8bad\u7ec3\u7684LLM\u8fdb\u884c\u4e86\u63d0\u53d6\u5f0f\u95ee\u7b54\u4efb\u52a1\u7684\u6bd4\u8f83\u3002\u5728\u6240\u6709\u6a21\u578b\u548c\u4e24\u79cd\u8bed\u8a
00\u4e2d\uff0c\u9009\u62e9\u7b54\u6848\u53e5\u5b50\u7684\u7cbe\u786e\u5ea6\u9ad8\uff0c\u53ec\u56de\u7387\u4f4e\u81f3\u4e2d\u7b49\uff0c\u8fd9\u662f\u4e00\u4e2a\u6709\u5229\u7684\u6743\u8861\uff0c\u4ee5\u907f\u514d\u8bef\u5bfc\u7528\u6237\u3002\u8fd9\u79cd\u6027\u80fd\u5373\u4f7f\u5728\u95ee\u9898\u8bed\u8a00\u4e0e\u6587\u6863\u8bed\u8a00\u4e0d\u5339\u914d\u65f6\u4e5f\u80fd\u4fdd\u6301\u4e0d\u53d8\u3002\u5728\u6839\u636e\u4e0a\u4e0b\u6587\u8bc6\u522b\u4e0d\u53ef\u56de\u7b54\u7684\u95ee\u9898\u65b9\u9762\uff0c\u4e24\u79cd\u8bed\u8a00\u4e4b\u95f4\u5b58\u5728\u66f4\u5927\u7684\u5dee\u5f02\u3002|\n", "2407.15734": "|**2024-07-22**|**TaskGen: A Task-Based, Memory-Infused Agentic Framework using StrictJSON**|John Chong Min Tan et.al.|[2407.15734](http://arxiv.org/abs/2407.15734)|**[link](https://github.com/simbianai/taskgen)**|TaskGen\u662f\u4e00\u4e2a\u5f00\u6e90\u7684\u4ee3\u7406\u6846\u67b6\uff0c\u901a\u8fc7\u4f7f\u7528\u4ee3\u7406\u6765\u89e3\u51b3\u4efb\u610f\u4efb\u52a1\u5e76\u5c06\u5176\u5206\u89e3\u4e3a\u5b50\u4efb\u52a1\u6765\u5b9e\u73b0\u3002\u6bcf\u4e2a\u5b50\u4efb\u52a1\u88ab\u6620\u5c04\u5230\u4e00\u4e2a\u88c5\u5907\u51fd\u6570\u6216\u53e6\u4e00\u4e2a\u4ee3\u7406\u6267\u884c\u3002\u4e3a\u4e86\u51cf\u5c11\u5197\u4f59\uff08\u4ece\u800c\u51cf\u5c11\u4ee4\u724c\u4f7f\u7528\uff09\uff0cTaskGen\u4f7f\u7528\u4e86StrictJSON\uff0c\u786e\u4fdd\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u8f93\u51fa\u7684JSON\u683c\u5f0f\uff0c\u5e76\u5177\u5907\u7c7b\u578b\u68c0\u67e5\u548c\u8fed\u4ee3\u9519\u8bef\u4fee\u6b63\u7b49\u989d\u5916\u529f\u80fd\u3002TaskGen\u7684\u6838\u5fc3\u7406\u5ff5\u5728\u4e8e\u57fa\u4e8e\u9700\u6c42\u7ba1\u7406\u4fe1\u606f/\u8bb0\u5fc6\u3002 
\u6211\u4eec\u5bf9TaskGen\u5728\u5404\u79cd\u73af\u5883\u4e2d\u8fdb\u884c\u4e86\u5b9e\u8bc1\u8bc4\u4f30\uff0c\u5305\u62ec40x40\u52a8\u6001\u8ff7\u5bab\u5bfc\u822a\uff0c\u5176\u4e2d\u969c\u788d\u7269\u4f4d\u7f6e\u4f1a\u53d8\u5316\uff08100%\u7684\u6210\u529f\u7387\uff09\uff0c\u6587\u672c\u4e16\u754c\u9003\u8131\u623f\u95f4\u89e3\u8c1c\uff0c\u5177\u6709\u5bc6\u96c6\u5956\u52b1\u548c\u8be6\u7ec6\u76ee\u6807\uff0896%\u7684\u6210\u529f\u7387\uff09\uff0c\u7f51\u7edc\u6d4f\u89c8\uff0869%\u7684\u52a8\u4f5c\u6210\u529f\uff09\uff0c\u89e3\u51b3MATH\u6570\u636e\u96c6\uff08\u5728100\u4e2aLevel-5\u95ee\u9898\u4e0a\uff0c\u6210\u529f\u738771%\uff09\uff0c\u4ee5\u53ca\u81ea\u7136\u95ee\u9898\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08F1\u5206\u6570\u4e3a47.03%\uff09\u3002|\n", "2407.16686": "|**2024-07-23**|**Can Large Language Models Automatically Jailbreak GPT-4V?**|Yuanwei Wu et.al.|[2407.16686](http://arxiv.org/abs/2407.16686)|null|GPT-4V\u56e0\u5176\u5728\u6574\u5408\u548c\u5904\u7406\u591a\u6a21\u6001\u4fe1\u606f\u65b9\u9762\u7684\u5353\u8d8a\u80fd\u529b\u800c\u5f15\u8d77\u5e7f\u6cdb\u5173\u6ce8\u3002\u540c\u65f6\uff0c\u5176\u9762\u90e8\u8bc6\u522b\u529f\u80fd\u4e5f\u5f15\u53d1\u4e86\u9690\u79c1\u6cc4\u9732\u7684\u5b89\u5168\u62c5\u5fe7\u3002\u5c3d\u7ba1\u7814\u7a76\u8005\u901a\u8fc7\u5f3a\u5316\u5b66\u4e60\u4e0e\u4eba\u7c7b\u53cd\u9988\uff08RLHF\uff09\u6216\u9884\u5904\u7406\u8fc7\u6ee4\u5668\u7b49\u624b\u6bb5\u52aa\u529b\u5b9e\u73b0\u5b89\u5168\u5bf9\u9f50\uff0c\u4f46\u4ecd\u7136\u53ef\u80fd\u5b58\u5728\u88ab\u5229\u7528\u7684\u6f0f\u6d1e\u3002\u5728\u6211\u4eec\u7684\u7814\u7a76\u4e2d\uff0c\u6211\u4eec\u5f15\u5165\u4e86AutoJailbreak\uff0c\u8fd9\u662f\u4e00\u79cd\u521b\u65b0\u7684\u81ea\u52a8\u8d8a\u72f1\u6280\u672f\uff0c\u7075\u611f\u6765\u6e90\u4e8e\u63d0\u793a\u4f18\u5316\u3002\u6211\u4eec\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u8fdb\u884c\u7ea2\u961f\u8bad\u7ec3\uff0c\u4ee5\u7cbe\u70bc\u8d8a\u72f1\u63d0\u793a\uff0c\u5e76\u91c7\u7528\u5f31\u5230\u5f3a
\u7684\u4e0a\u4e0b\u6587\u5185\u5b66\u4e60\u63d0\u793a\u6765\u63d0\u9ad8\u6548\u7387\u3002\u6b64\u5916\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u6709\u6548\u7684\u65b9\u6cd5\uff0c\u7ed3\u5408\u65e9\u671f\u505c\u6b62\u7b56\u7565\uff0c\u4ee5\u6700\u5c0f\u5316\u4f18\u5316\u65f6\u95f4\u548c\u4ee4\u724c\u6d88\u8017\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cAutoJailbreak\u663e\u8457\u8d85\u8d8a\u4f20\u7edf\u65b9\u6cd5\uff0c\u5b9e\u73b0\u4e86\u8d85\u8fc795.3%\u7684\u6210\u529f\u653b\u51fb\u7387\uff08ASR\uff09\u3002\u8fd9\u9879\u7814\u7a76\u63ed\u793a\u4e86\u52a0\u5f3aGPT-4V\u5b89\u5168\u6027\u7684\u6f5c\u529b\uff0c\u7a81\u663e\u4e86LLMs\u53ef\u80fd\u88ab\u7528\u4e8e\u7834\u574fGPT-4V\u5b8c\u6574\u6027\u7684\u98ce\u9669\u3002|\n", "2407.16667": "|**2024-07-23**|**RedAgent: Red Teaming Large Language Models with Context-aware Autonomous Language Agent**|Huiyu Xu et.al.|[2407.16667](http://arxiv.org/abs/2407.16667)|null|\u8fd1\u671f\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5982GPT-4\u5df2\u88ab\u96c6\u6210\u81f3\u8bf8\u591a\u5b9e\u9645\u5e94\u7528\uff0c\u4f8b\u5982\u4ee3\u7801\u52a9\u624bCopilot\u3002\u8fd9\u4e9b\u96c6\u6210\u663e\u8457\u6269\u5c55\u4e86LLM\u7684\u653b\u51fb\u9762\uff0c\u4f7f\u5176\u9762\u4e34\u591a\u79cd\u5a01\u80c1\u3002\u5176\u4e2d\uff0c\u901a\u8fc7\u201c\u8d8a\u72f1\u201d\u653b\u51fb\u8bf1\u5bfc\u51fa\u6bd2\u6027\u54cd\u5e94\u7684\u201c\u8d8a\u72f1\u201d\u63d0\u793a\u5f15\u8d77\u4e86\u5b89\u5168\u9886\u57df\u7684\u5e7f\u6cdb\u5173\u6ce8\u3002\u4e3a\u4e86\u8bc6\u522b\u8fd9\u4e9b\u5a01\u80c1\uff0c\u8d8a\u6765\u8d8a\u591a\u7684\u7ea2\u961f\u7b56\u7565\u901a\u8fc7\u6784\u5efa\u201c\u8d8a\u72f1\u201d\u63d0\u793a\u6765\u6a21\u62df\u6f5c\u5728\u7684\u5bf9\u6297\u573a\u666f\uff0c\u4ee5\u6b64\u6d4b\u8bd5\u76ee\u6807LLM\u3002\u7136\u800c\uff0c\u73b0\u6709\u7ea2\u961f\u7b56\u7565\u5e76\u672a\u8003\u8651LLM\u5728\u4e0d\u540c\u60c5\u5883\u4e0b\u7684\u72ec\u7279\u8106\u5f31\u6027\uff0c\u4f7f\u5f97\u6784\u5efa\u9488\u5bf9\u7279\u5b9a\u60c5\
u5883\u7684\u201c\u8d8a\u72f1\u201d\u63d0\u793a\u53d8\u5f97\u56f0\u96be\u3002\u540c\u65f6\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u4ec5\u4f9d\u8d56\u4e8e\u5c11\u6570\u53d8\u5f02\u64cd\u4f5c\u5bf9\u201c\u8d8a\u72f1\u201d\u6a21\u677f\u8fdb\u884c\u7ec6\u5316\uff0c\u7f3a\u4e4f\u9002\u5e94\u4e0d\u540c\u60c5\u5883\u7684\u81ea\u52a8\u5316\u548c\u89c4\u6a21\u5316\u80fd\u529b\u3002 \u4e3a\u4e86\u5b9e\u73b0\u60c5\u5883\u611f\u77e5\u548c\u9ad8\u6548\u7ea2\u961f\u7b56\u7565\uff0c\u6211\u4eec\u62bd\u8c61\u5e76\u5efa\u6a21\u73b0\u6709\u653b\u51fb\u884c\u4e3a\u4e3a\u4e00\u4e2a\u7edf\u4e00\u6982\u5ff5\u2014\u2014\u201c\u8d8a\u72f1\u7b56\u7565\u201d\uff0c\u5e76\u63d0\u51fa\u4e86\u4e00\u79cd\u591a\u667a\u80fd\u4f53LLM\u7cfb\u7edfRedAgent\u3002\u8be5\u7cfb\u7edf\u5229\u7528\u8fd9\u4e9b\u7b56\u7565\u751f\u6210\u60c5\u5883\u611f\u77e5\u7684\u201c\u8d8a\u72f1\u201d\u63d0\u793a\uff0c\u5e76\u901a\u8fc7\u989d\u5916\u7684\u8bb0\u5fc6\u7f13\u51b2\u533a\u81ea\u6211\u53cd\u601d\u60c5\u5883\u53cd\u9988\uff0c\u6301\u7eed\u5b66\u4e60\u5982\u4f55\u5229\u7528\u8fd9\u4e9b\u7b56\u7565\u5728\u7279\u5b9a\u60c5\u5883\u4e0b\u5b9e\u73b0\u6709\u6548\u201c\u8d8a\u72f1\u201d\u3002\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u8868\u660e\uff0c\u6211\u4eec\u7684\u7cfb\u7edf\u53ef\u4ee5\u5728\u4e94\u4e2a\u67e5\u8be2\u5185\u6210\u529f\u201c\u8d8a\u72f1\u201d\u5927\u591a\u6570\u9ed1\u76d2LLM\uff0c\u76f8\u8f83\u4e8e\u73b0\u6709\u7ea2\u961f\u65b9\u6cd5\u6548\u7387\u63d0\u5347\u4e24\u500d\u3002\u6b64\u5916\uff0cRedAgent\u80fd\u591f\u66f4\u9ad8\u6548\u5730\u9488\u5bf9\u5b9a\u5236\u5316\u7684LLM\u5e94\u7528\u8fdb\u884c\u201c\u8d8a\u72f1\u201d\u3002 
\u901a\u8fc7\u751f\u6210\u9488\u5bf9\u7279\u5b9a\u5e94\u7528\u7684\u201c\u8d8a\u72f1\u201d\u63d0\u793a\uff0c\u6211\u4eec\u53d1\u73b0\u4e8660\u4e2a\u4e25\u91cd\u6f0f\u6d1e\u5b58\u5728\u4e8e\u5b9e\u9645\u5e94\u7528\u4e2d\u7684GPTs\u4e0a\uff0c\u4ec5\u9700\u6bcf\u6f0f\u6d1e\u4e24\u6b21\u67e5\u8be2\u3002\u6211\u4eec\u5df2\u62a5\u544a\u6240\u6709\u53d1\u73b0\u7684\u95ee\u9898\uff0c\u5e76\u4e0eOpenAI\u548cMeta\u8fdb\u884c\u4e86\u6c9f\u901a\u4ee5\u4fee\u590d\u6f0f\u6d1e\u3002|\n", "2407.16637": "|**2024-07-23**|**Course-Correction: Safety Alignment Using Synthetic Preferences**|Rongwu Xu et.al.|[2407.16637](http://arxiv.org/abs/2407.16637)|**[link](https://github.com/pillowsofwind/course-correction)**|\u672c\u6587\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u6267\u884c\u201c\u8bfe\u7a0b\u7ea0\u6b63\u201d\u4efb\u52a1\u7684\u80fd\u529b\u8fdb\u884c\u4e86\u4e00\u9879\u7cfb\u7edf\u6027\u7814\u7a76\uff0c\u5373\u6a21\u578b\u80fd\u591f\u81ea\u4e3b\u5730\u907f\u514d\u751f\u6210\u6709\u5bb3\u5185\u5bb9\u3002\u9996\u5148\uff0c\u6211\u4eec\u5f15\u5165\u4e86C$^2$-Eval\u57fa\u51c6\u7528\u4e8e\u5b9a\u91cf\u8bc4\u4f30\uff0c\u5e76\u5206\u6790\u4e8610\u4e2a\u6d41\u884cLLM\u7684\u6027\u80fd\uff0c\u63ed\u793a\u4e86\u5f53\u524d\u5b89\u5168\u8c03\u4f18\u7684LLM\u5728\u8bfe\u7a0b\u7ea0\u6b63\u65b9\u9762\u5b58\u5728\u663e\u8457\u5dee\u5f02\u3002\u4e3a\u4e86\u6539\u8fdb\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4f7f\u7528\u504f\u597d\u5b66\u4e60\u5bf9LLM\u8fdb\u884c\u5fae\u8c03\u7684\u65b9\u6cd5\uff0c\u5f3a\u8c03\u53ca\u65f6\u8bfe\u7a0b\u7ea0\u6b63\u7684\u91cd\u8981\u6027\u3002\u901a\u8fc7\u81ea\u52a8\u5316\u6d41\u7a0b\uff0c\u6211\u4eec\u521b\u5efa\u4e86C$^2$-Syn\u5408\u6210\u6570\u636e\u96c6\uff0c\u5305\u542b75\u4e07\u5bf9\u504f\u597d\uff0c\u4ee5\u6b64\u901a\u8fc7\u6570\u636e\u9a71\u52a8\u7684\u504f\u597d\u5b66\u4e60\u6559\u6388\u6a21\u578b\u53ca\u65f6\u8bfe\u7a0b\u7ea0\u6b63\u7684\u6982\u5ff5\u3002\u5728Llama2-Cha
t 7B\u548cQwen2 7B\u4e24\u4e2aLLM\u4e0a\u7684\u5b9e\u9a8c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u6709\u6548\u63d0\u9ad8\u4e86\u8bfe\u7a0b\u7ea0\u6b63\u80fd\u529b\uff0c\u540c\u65f6\u4e0d\u5f71\u54cd\u603b\u4f53\u6027\u80fd\uff0c\u5e76\u4e14\u7279\u522b\u6709\u6548\u5730\u63d0\u5347\u4e86LLM\u7684\u5b89\u5168\u6027\uff0c\u5c24\u5176\u662f\u62b5\u6297\u8d8a\u72f1\u653b\u51fb\u7684\u80fd\u529b\u3002|\n", "2407.16615": "|**2024-07-23**|**Lawma: The Power of Specialization for Legal Tasks**|Ricardo Dominguez-Olmedo et.al.|[2407.16615](http://arxiv.org/abs/2407.16615)|null|\u6cd5\u5f8b\u6587\u672c\u7684\u6ce8\u91ca\u4e0e\u5206\u7c7b\u662f\u5b9e\u8bc1\u6cd5\u5b66\u7814\u7a76\u7684\u6838\u5fc3\u90e8\u5206\u3002\u4f20\u7edf\u4e0a\uff0c\u8fd9\u4e9b\u4efb\u52a1\u5f80\u5f80\u7531\u53d7\u8fc7\u8bad\u7ec3\u7684\u7814\u7a76\u52a9\u7406\u627f\u62c5\u3002\u5728\u8bed\u8a00\u6a21\u578b\u53d6\u5f97\u8fdb\u5c55\u7684\u80cc\u666f\u4e0b\uff0c\u5b9e\u8bc1\u6cd5\u5f8b\u5b66\u8005\u8d8a\u6765\u8d8a\u591a\u5730\u8f6c\u5411\u4f7f\u7528\u5546\u4e1a\u6a21\u578b\uff0c\u5e0c\u671b\u4ee5\u6b64\u51cf\u8f7b\u4eba\u5de5\u6807\u6ce8\u7684\u5de8\u5927\u6210\u672c\u3002\u5c3d\u7ba1\u8fd9\u7c7b\u65b9\u6cd5\u7684\u5e94\u7528\u65e5\u76ca\u5e7f\u6cdb\uff0c\u4f46\u5173\u4e8e\u5982\u4f55\u6700\u6709\u6548\u5730\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u6cd5\u5f8b\u4efb\u52a1\u7684\u76f8\u5173\u7814\u7a76\u4ecd\u7136\u6709\u9650\u3002 
\u6211\u4eec\u8fdb\u884c\u4e86\u4e00\u9879\u5168\u9762\u7684\u7814\u7a76\uff0c\u6db5\u76d6\u4e86\u51e0\u4e4e\u5168\u90e8\u9488\u5bf9\u673a\u5668\u5b66\u4e60\u793e\u533a\u7684\u65b0\u6cd5\u5f8b\u6587\u672c\u5206\u7c7b\u4efb\u52a1\u3002\u4eceGPT-4\u4f5c\u4e3a\u57fa\u51c6\u5f00\u59cb\uff0c\u6211\u4eec\u53d1\u73b0\u5b83\u5728\u96f6\u6837\u672c\u51c6\u786e\u5ea6\u4e0a\u7684\u8868\u73b0\u5177\u6709\u975e\u540c\u5bfb\u5e38\u4f46\u9ad8\u5ea6\u591a\u53d8\u6027\uff0c\u7ecf\u5e38\u8868\u73b0\u51fa\u53ef\u80fd\u4e0d\u8db3\u4ee5\u6ee1\u8db3\u6cd5\u5f8b\u5de5\u4f5c\u9700\u6c42\u7684\u6027\u80fd\u3002\u63a5\u7740\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u8f7b\u5ea6\u5fae\u8c03\u540e\u7684Llama 3\u6a21\u578b\u5728\u51e0\u4e4e\u6240\u6709\u4efb\u52a1\u4e0a\u7684\u8868\u73b0\u5747\u8fdc\u8d85GPT-4\uff0c\u901a\u5e38\u63d0\u9ad8\u4e86\u4e24\u4f4d\u6570\u767e\u5206\u70b9\u7684\u51c6\u786e\u6027\u3002\u6211\u4eec\u53d1\u73b0\uff0c\u66f4\u5927\u7684\u6a21\u578b\u5728\u5fae\u8c03\u65f6\u54cd\u5e94\u6548\u679c\u66f4\u597d\u3002\u51e0\u5341\u5230\u51e0\u767e\u4e2a\u793a\u4f8b\u8db3\u4ee5\u5b9e\u73b0\u9ad8\u5206\u7c7b\u51c6\u786e\u6027\u3002\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u6211\u4eec\u53ef\u4ee5\u5728\u6240\u6709260\u4e2a\u4efb\u52a1\u4e0a\u540c\u65f6\u5fae\u8c03\u4e00\u4e2a\u6a21\u578b\uff0c\u76f8\u5bf9\u4e8e\u4e3a\u6bcf\u4e2a\u4efb\u52a1\u5355\u72ec\u521b\u5efa\u6a21\u578b\uff0c\u4ec5\u5728\u51c6\u786e\u6027\u65b9\u9762\u7565\u6709\u635f\u5931\u3002 \u6211\u4eec\u7684\u5de5\u4f5c\u6307\u51fa\u4e86\u66ff\u4ee3\u73b0\u6709\u505a\u6cd5\u7684\u4e00\u79cd\u53ef\u884c\u9009\u62e9\u3002\u5bf9\u4e8e\u5177\u5907\u4e00\u5b9a\u6807\u6ce8\u6570\u636e\u7684\u7279\u5b9a\u6cd5\u5f8b\u4efb\u52a1\uff0c\u7814\u7a76\u4eba\u5458\u66f4\u5e94\u8003\u8651\u4f7f\u7528\u5f00\u6e90\u6a21\u578b\u8fdb\u884c\u5fae\u8c03\u3002|\n", "2407.16604": "|**2024-07-23**|**Shared Imagination: LLMs Hallucinate Alike**|Yilun Zhou 
et.al.|[2407.16604](http://arxiv.org/abs/2407.16604)|null|\u5c3d\u7ba1\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u6700\u8fd1\u53d1\u5c55\u5448\u73b0\u4e86\u663e\u8457\u7684\u589e\u957f\uff0c\u4f46\u5b83\u4eec\u7684\u8bad\u7ec3\u65b9\u6cd5\u2014\u2014\u5305\u62ec\u6a21\u578b\u67b6\u6784\u3001\u9884\u8bad\u7ec3\u6570\u636e\u548c\u4f18\u5316\u7b97\u6cd5\u2014\u2014\u5f80\u5f80\u6781\u4e3a\u76f8\u4f3c\u3002\u8fd9\u81ea\u7136\u5f15\u53d1\u4e86\u4e00\u4e2a\u95ee\u9898\uff1a\u8fd9\u4e9b\u6a21\u578b\u4e4b\u95f4\u7684\u76f8\u4f3c\u6027\u5982\u4f55\uff1f\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u8bbe\u7f6e\uff0c\u5373\u60f3\u8c61\u95ee\u9898\u56de\u7b54\uff08IQA\uff09\uff0c\u4ee5\u66f4\u6df1\u5165\u5730\u7406\u89e3\u6a21\u578b\u4e4b\u95f4\u7684\u76f8\u4f3c\u6027\u3002\u5728IQA\u4e2d\uff0c\u6211\u4eec\u8ba9\u4e00\u4e2a\u6a21\u578b\u751f\u6210\u5b8c\u5168\u865a\u6784\u7684\u95ee\u9898\uff08\u4f8b\u5982\uff0c\u5173\u4e8e\u7269\u7406\u4e2d\u5b8c\u5168\u4e0d\u5b58\u5728\u7684\u6982\u5ff5\uff09\uff0c\u7136\u540e\u8ba9\u53e6\u4e00\u4e2a\u6a21\u578b\u8fdb\u884c\u56de\u7b54\u3002\u4ee4\u4eba\u60ca\u8bb6\u7684\u662f\uff0c\u5c3d\u7ba1\u8fd9\u4e9b\u95ee\u9898\u5b8c\u5168\u865a\u6784\uff0c\u4f46\u6240\u6709\u6a21\u578b\u90fd\u80fd\u6210\u529f\u56de\u7b54\u5bf9\u65b9\u7684\u95ee\u9898\uff0c\u8fd9\u8868\u660e\u5728\u8fd9\u6837\u7684\u5e7b\u89c9\u8fc7\u7a0b\u4e2d\uff0c\u8fd9\u4e9b\u6a21\u578b\u5171\u4eab\u7740\u4e00\u4e2a\u201c\u5171\u540c\u7684\u60f3\u8c61\u7a7a\u95f4\u201d\u3002 \u6211\u4eec\u5bf9\u8fd9\u4e00\u73b0\u8c61\u8fdb\u884c\u4e86\u7cfb\u5217\u8c03\u67e5\uff0c\u5e76\u8ba8\u8bba\u4e86\u5b83\u5bf9\u6a21\u578b\u540c\u8d28\u6027\u3001\u5e7b\u89c9\u4ee5\u53ca\u8ba1\u7b97\u521b\u9020\u529b\u7684\u542f\u793a\u3002|\n", "2407.16576": "|**2024-07-23**|**Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs**|Yifan Xia 
et.al.|[2407.16576](http://arxiv.org/abs/2407.16576)|null|\u672c\u6587\u63a2\u8ba8\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u68c0\u6d4b\u52a0\u5bc6API\u8bef\u7528\u65b9\u9762\u6240\u9762\u4e34\u7684\u6311\u6218\u4e0e\u673a\u9047\u3002\u5728\u5f53\u524d\u81ea\u52a8\u5316\u68c0\u6d4b\u6280\u672f\u8fdb\u6b65\u7684\u57fa\u7840\u4e0a\uff0c\u5bf9\u4e8e\u590d\u6742\u76ee\u6807\u7684\u7cbe\u786e\u5ea6\u4e0b\u964d\u4e3b\u8981\u5f52\u56e0\u4e8e\u624b\u52a8\u5b9a\u4e49\u6a21\u5f0f\u7684\u4f9d\u8d56\u3002LLM\u4ee5\u5176\u4e0a\u4e0b\u6587\u7406\u89e3\u80fd\u529b\uff0c\u5728\u6b64\u5173\u952e\u5b89\u5168\u9886\u57df\u5c55\u73b0\u51fa\u5de8\u5927\u7684\u6f5c\u529b\u3002\u7136\u800c\uff0c\u5c06LLM\u5e94\u7528\u4e8e\u8fd9\u4e00\u9886\u57df\u5b58\u5728\u6311\u6218\uff0c\u5c24\u5176\u662f\u7531\u4e8e\u5b83\u4eec\u56fa\u6709\u7684\u968f\u673a\u6027\u548c\u4f17\u6240\u5468\u77e5\u7684\u5e7b\u89c9\u95ee\u9898\u5bfc\u81f4\u7684\u4e0d\u7a33\u5b9a\u6027\u3002 \u4e3a\u4e86\u7cfb\u7edf\u5730\u8bc4\u4f30LLM\u5728\u68c0\u6d4b\u52a0\u5bc6\u8bef\u7528\u65b9\u9762\u7684\u53ef\u9760\u6027\uff0c\u5e76\u63a2\u7d22\u6f5c\u5728\u89e3\u51b3\u65b9\u6848\uff0c\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u5168\u9762\u7684\u8bc4\u4f30\u6846\u67b6\uff0c\u5229\u7528\u6db5\u76d6\u4eba\u5de5\u6784\u5efa\u6837\u672c\u548c\u5b9e\u9645\u9879\u76ee\u7684\u5927\u89c4\u6a21\u6570\u636e\u96c6\u8fdb\u884c\u5206\u6790\u3002\u901a\u8fc7\u6df1\u5165\u5206\u679011,940\u4efdLLM\u751f\u6210\u7684\u62a5\u544a\uff0c\u6211\u4eec\u63ed\u793a\u4e86LLM\u56fa\u6709\u4e0d\u7a33\u5b9a\u6027\u7684\u666e\u904d\u5b58\u5728\uff0c\u5bfc\u81f4\u8d85\u8fc7\u4e00\u534a\u7684\u62a5\u544a\u88ab\u8bef\u62a5\u4e3a\u8bef\u7528\u3002\u7136\u800c\uff0c\u6211\u4eec\u8bc1\u660e\u4e86\u901a\u8fc7\u9650\u5236\u95ee\u9898\u8303\u56f4\u5e76\u4e0eLLM\u7684\u81ea\u6211\u4fee\u6b63\u80fd\u529b\u76f8\u7ed3\u5408\uff0c\u53ef\u4ee5\u663e\u8457\u63d0\u9ad8\u68c0\u6d4b\u7684\u53ef\u9760\u6027\u3002\u4f18\u5316\u7684\u65b9\u6cd5\u5b9e\u73b0\u4
e86\u63a5\u8fd190%\u7684\u68c0\u6d4b\u7387\uff0c\u8d85\u8d8a\u4f20\u7edf\u65b9\u6cd5\uff0c\u5e76\u5728\u73b0\u6709\u57fa\u51c6\u4e2d\u53d1\u73b0\u4e86\u672a\u88ab\u53d1\u73b0\u7684\u8bef\u7528\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8bc6\u522b\u4e86\u6301\u7eed\u963b\u788dLLM\u53ef\u9760\u6027\u7684\u5931\u8d25\u6a21\u5f0f\uff0c\u5305\u62ec\u52a0\u5bc6\u77e5\u8bc6\u4e0d\u8db3\u548c\u4ee3\u7801\u8bed\u4e49\u8bef\u89e3\u3002 \u57fa\u4e8e\u8fd9\u4e9b\u6d1e\u5bdf\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u4ee5LLM\u4e3a\u57fa\u7840\u7684\u5de5\u4f5c\u6d41\u7a0b\u6765\u68c0\u67e5\u5f00\u6e90\u4ed3\u5e93\uff0c\u6700\u7ec8\u53d1\u73b0\u4e8663\u4e2a\u771f\u5b9e\u7684\u52a0\u5bc6\u8bef\u7528\u6848\u4f8b\u3002\u5176\u4e2d46\u4e2a\u5df2\u88ab\u5f00\u53d1\u793e\u533a\u8ba4\u53ef\uff0c23\u4e2a\u6b63\u5728\u5904\u7406\u4e2d\uff0c6\u4e2a\u5df2\u5f97\u5230\u89e3\u51b3\u3002\u8003\u8651\u5230\u5f00\u53d1\u8005\u53cd\u9988\uff0c\u6211\u4eec\u63d0\u4f9b\u4e86\u672a\u6765\u7814\u7a76\u548cLLM\u5b89\u5168\u5de5\u5177\u53d1\u5c55\u7684\u5efa\u8bae\u3002|\n", "2407.16565": "|**2024-07-23**|**Retrieve, Generate, Evaluate: A Case Study for Medical Paraphrases Generation with Small Language Models**|Ioana Buhnila 
et.al.|[2407.16565](http://arxiv.org/abs/2407.16565)|**[link](https://github.com/ATILF-UMR7118/pRAGe)**|\u8fd1\u671f\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5e7f\u6cdb\u5e94\u7528\u5bf9\u516c\u4f17\u800c\u8a00\u53d8\u5f97\u6108\u53d1\u4fbf\u6377\u3002\u8fd9\u53ef\u80fd\u5bfc\u81f4\u4eba\u4eec\u5728\u533b\u7597\u5efa\u8bae\u65b9\u9762\u4f7f\u7528\u6b64\u7c7b\u6a21\u578b\u7684\u60c5\u51b5\u96be\u4ee5\u8ffd\u8e2a\u3002\u5927\u578b\u8bed\u8a00\u751f\u6210\u6a21\u578b\u5b58\u5728\u4e24\u4e2a\u5173\u952e\u95ee\u9898\uff1a\u9996\u5148\uff0c\u5b83\u4eec\u5bb9\u6613\u51fa\u73b0\u9519\u8bef\u63a8\u7406\uff0c\u56e0\u6b64\u7528\u4e8e\u533b\u7597\u76ee\u7684\u65f6\u9700\u8981\u5177\u5907\u79d1\u5b66\u6027\u548c\u4e8b\u5b9e\u6027\uff1b\u5176\u6b21\uff0c\u7531\u4e8e\u6a21\u578b\u89c4\u6a21\u5de8\u5927\uff0c\u5bf9\u8ba1\u7b97\u8d44\u6e90\u6784\u6210\u91cd\u5927\u6311\u6218\u3002 \u672c\u7814\u7a76\u5f15\u5165\u4e86\u4e00\u79cd\u540d\u4e3apRAGe\u7684\u7ba1\u9053\uff0c\u65e8\u5728\u901a\u8fc7\u5c0f\u578b\u8bed\u8a00\u6a21\u578b\uff08SLM\uff09\u8fdb\u884c\u68c0\u7d22\u589e\u5f3a\u751f\u6210\u4e0e\u8bc4\u4f30\uff0c\u4ee5\u5b9e\u73b0\u6cd5\u8bed\u533b\u5b66\u77ed\u8bed\u751f\u6210\u3002\u6211\u4eec\u63a2\u8ba8\u4e86\u5c0f\u578b\u8bed\u8a00\u6a21\u578b\u7684\u6709\u6548\u6027\u4ee5\u53ca\u5916\u90e8\u77e5\u8bc6\u5e93\u5728\u533b\u5b66\u77ed\u8bed\u751f\u6210\u4e2d\u7684\u5f71\u54cd\u3002|\n", "2407.16557": "|**2024-07-23**|**Patched RTC: evaluating LLMs for diverse software development tasks**|Asankhaya Sharma et.al.|[2407.16557](http://arxiv.org/abs/2407.16557)|**[link](https://github.com/codelion/optillm/blob/main/rto.py)**|\u8fd9\u7bc7\u8bba\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201c\u8865\u4e01\u5f80\u8fd4\u6b63\u786e\u6027\uff08Patched 
RTC\uff09\u201d\u7684\u65b0\u578b\u8bc4\u4f30\u65b9\u6cd5\uff0c\u5e94\u7528\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u591a\u79cd\u8f6f\u4ef6\u5f00\u53d1\u4efb\u52a1\u4e2d\u7684\u5e94\u7528\uff0c\u7279\u522b\u662f\u201c\u5916\u5faa\u73af\u201d\u6d3b\u52a8\uff0c\u5982\u9519\u8bef\u4fee\u590d\u3001\u4ee3\u7801\u5ba1\u67e5\u548c\u6587\u6863\u66f4\u65b0\u3002Patched RTC\u662f\u5bf9\u539f\u5f80\u8fd4\u6b63\u786e\u6027\u65b9\u6cd5\u7684\u6269\u5c55\uff0c\u9002\u7528\u4e8e\u4efb\u4f55LLM\u548c\u4e0b\u6e38\u4efb\u52a1\uff0c\u63d0\u4f9b\u4e86\u4e00\u4e2a\u81ea\u6211\u8bc4\u4f30\u6846\u67b6\uff0c\u65e0\u9700\u4eba\u5de5\u5e72\u9884\u5373\u53ef\u6d4b\u91cf\u6a21\u578b\u54cd\u5e94\u7684\u4e00\u81f4\u6027\u548c\u7a33\u5065\u6027\u3002\u7814\u7a76\u663e\u793a\u4e86Patched RTC\u5206\u6570\u4e0e\u7279\u5b9a\u4efb\u52a1\u51c6\u786e\u6027\u6307\u6807\u4e4b\u95f4\u7684\u76f8\u5173\u6027\uff0c\u5c06\u5176\u4f5c\u4e3a\u66ff\u4ee3LLM-as-Judge\u8303\u5f0f\u7684\u65b9\u6cd5\uff0c\u7528\u4e8e\u5f00\u653e\u57df\u4efb\u52a1\u8bc4\u4f30\u3002\u6211\u4eec\u901a\u8fc7\u4e00\u4e2a\u540d\u4e3apatchwork\u7684\u5f00\u6e90\u6846\u67b6\u5b9e\u73b0Patched RTC\uff0c\u5728\u5404\u79cd\u8865\u4e01\u6d41\u4e2d\u5b9e\u73b0\u4e86\u5bf9\u4e0d\u540c\u8f6f\u4ef6\u5f00\u53d1\u4efb\u52a1\u7684\u900f\u660e\u8bc4\u4f30\u3002 \u6bd4\u8f83GPT-3.5\u548cGPT-4\u6a21\u578b\u5728\u4e0d\u540c\u8f6f\u4ef6\u5f00\u53d1\u4efb\u52a1\u4e0a\u7684\u5b9e\u9a8c\u7ed3\u679c\u63ed\u793a\u4e86Patched RTC\u80fd\u591f\u6709\u6548\u5730\u533a\u5206\u6a21\u578b\u6027\u80fd\u548c\u4efb\u52a1\u96be\u5ea6\u3002\u8bba\u6587\u8fd8\u63a2\u8ba8\u4e86\u4e00\u81f4\u6027\u63d0\u793a\u5bf9\u63d0\u9ad8\u6a21\u578b\u51c6\u786e\u6027\u7684\u5f71\u54cd\uff0c\u8868\u660ePatched RTC\u53ef\u4ee5\u6307\u5bfc\u63d0\u793a\u4f18\u5316\u548c\u6a21\u578b\u9009\u62e9\uff0c\u4ee5\u9002\u5e94\u590d\u6742\u7684\u8f6f\u4ef6\u5f00\u53d1\u6d41\u7a0b\u3002|\n", "2407.16552": "|**2024-07-24**|**MicroEmo: Time-Sensitive Multimodal Emotion Recognition 
with Micro-Expression Dynamics in Video Dialogues**|Liyun Zhang et.al.|[2407.16552](http://arxiv.org/abs/2407.16552)|null|\u5728\u89c6\u89c9\u3001\u542c\u89c9\u548c\u8bed\u8a00\u7b49\u591a\u6a21\u6001\u7ebf\u7d22\u7684\u89c6\u9891\u4e2d\uff0c\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u5c55\u793a\u4e86\u5353\u8d8a\u7684\u591a\u6a21\u6001\u60c5\u7eea\u8bc6\u522b\u80fd\u529b\uff0c\u80fd\u591f\u7efc\u5408\u8fd9\u4e9b\u7ebf\u7d22\u6765\u8bc6\u522b\u4eba\u7c7b\u7684\u60c5\u7eea\u72b6\u6001\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u65b9\u6cd5\u5ffd\u89c6\u4e86\u6355\u6349\u9762\u90e8\u5fae\u8868\u60c5\u7684\u65f6\u95f4\u52a8\u6001\u5c40\u90e8\u7279\u5f81\u4ee5\u53ca\u89c6\u9891\u4e2d\u8bdd\u8bed\u610f\u8bc6\u7247\u6bb5\u7684\u4e0a\u4e0b\u6587\u4f9d\u8d56\u6027\uff0c\u4ece\u800c\u5728\u4e00\u5b9a\u7a0b\u5ea6\u4e0a\u9650\u5236\u4e86\u5b83\u4eec\u7684\u6709\u6548\u6027\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65f6\u95f4\u654f\u611f\u7684\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578bMicroEmo\uff0c\u65e8\u5728\u5c06\u6ce8\u610f\u529b\u96c6\u4e2d\u4e8e\u9762\u90e8\u5fae\u8868\u60c5\u7684\u65f6\u95f4\u52a8\u6001\u7ec6\u8282\u548c\u89c6\u9891\u4e2d\u7684\u8bdd\u8bed\u610f\u8bc6\u7247\u6bb5\u7684\u4e0a\u4e0b\u6587\u4f9d\u8d56\u6027\u3002 \u6211\u4eec\u7684\u6a21\u578b\u5305\u542b\u4e86\u4e24\u4e2a\u5173\u952e\u7684\u67b6\u6784\u8d21\u732e\uff1a 1. \u5168\u5c40-\u5c40\u90e8\u6ce8\u610f\u529b\u89c6\u89c9\u7f16\u7801\u5668\uff0c\u5b83\u7ed3\u5408\u4e86\u5168\u5c40\u5e27\u7ea7\u65f6\u95f4\u7ed1\u5b9a\u56fe\u50cf\u7279\u5f81\u4e0e\u9762\u90e8\u5fae\u8868\u60c5\u7684\u65f6\u95f4\u52a8\u6001\u5c40\u90e8\u7279\u5f81\uff0c\u5b9e\u73b0\u4e86\u5bf9\u6574\u4f53\u548c\u5c40\u90e8\u4fe1\u606f\u7684\u6709\u6548\u878d\u5408\uff1b 2. 
\u4e00\u4e2a\u8bdd\u8bed\u610f\u8bc6\u7684\u89c6\u9891Q-Former\uff0c\u5b83\u901a\u8fc7\u4e3a\u6bcf\u4e2a\u8bdd\u8bed\u6bb5\u843d\u548c\u6574\u4e2a\u89c6\u9891\u751f\u6210\u89c6\u89c9\u4ee4\u724c\u5e8f\u5217\u6765\u6355\u83b7\u591a\u5c42\u6b21\u548c\u4e0a\u4e0b\u6587\u4f9d\u8d56\u6027\uff0c\u7136\u540e\u5c06\u5b83\u4eec\u7ec4\u5408\u5728\u4e00\u8d77\uff0c\u4ee5\u6355\u6349\u591a\u5c3a\u5ea6\u7684\u4e0a\u4e0b\u6587\u4f9d\u8d56\u5173\u7cfb\u3002 \u521d\u6b65\u7684\u5b9a\u6027\u5b9e\u9a8c\u8868\u660e\uff0c\u5728\u4e00\u4e2a\u5229\u7528\u591a\u6a21\u6001\u548c\u591a\u65b9\u9762\u7ebf\u7d22\u4ee5\u5f00\u653e\u8bcd\u6c47\uff08OV\uff09\u65b9\u5f0f\u9884\u6d4b\u60c5\u7eea\u7684\u65b0\u89e3\u91ca\u6027\u591a\u6a21\u6001\u60c5\u7eea\u8bc6\u522b\uff08EMER\uff09\u4efb\u52a1\u4e2d\uff0cMicroEmo\u76f8\u8f83\u4e8e\u6700\u65b0\u7684\u65b9\u6cd5\u663e\u793a\u51fa\u4e86\u5176\u6709\u6548\u6027\u3002|\n", "2407.16521": "|**2024-07-23**|**AMONGAGENTS: Evaluating Large Language Models in the Interactive Text-Based Social Deduction Game**|Yizhou Chi et.al.|[2407.16521](http://arxiv.org/abs/2407.16521)|null|\u6218\u7565\u6027\u7684\u793e\u4ea4\u63a8\u65ad\u6e38\u620f\u662f\u8bc4\u4f30\u8bed\u8a00\u6a21\u578b\u7406\u89e3\u548c\u63a8\u7406\u80fd\u529b\u7684\u5b9d\u8d35\u5b9e\u9a8c\u5e73\u53f0\uff0c\u5bf9\u4e8e\u793e\u4f1a\u79d1\u5b66\u7814\u7a76\u3001\u4eba\u5de5\u667a\u80fd\u9886\u57df\u4ee5\u53ca\u7b56\u7565\u6027\u6e38\u620f\u90fd\u6709\u91cd\u8981\u4ef7\u503c\u3002\u672c\u6587\u96c6\u4e2d\u4e8e\u5728\u6a21\u62df\u73af\u5883\u4e2d\u6784\u5efa\u4eba\u7c7b\u884c\u4e3a\u7684\u4ee3\u7406\uff0c\u4f7f\u7528\u300aAmong Us\u300b\u4f5c\u4e3a\u7814\u7a76\u6a21\u62df\u4eba\u7c7b\u884c\u4e3a\u7684\u5de5\u5177\u3002\u901a\u8fc7\u521b\u5efa\u4e00\u4e2a\u57fa\u4e8e\u6587\u672c\u7684\u6e38\u620f\u73af\u5883\uff0c\u79f0\u4e3aAmongAgent\uff0c\u8be5\u73af\u5883\u590d\u5236\u4e86\u300aAmong 
Us\u300b\u7684\u6e38\u620f\u52a8\u6001\u3002\u73a9\u5bb6\u626e\u6f14\u592a\u7a7a\u8239\u4e0a\u7684\u8239\u5458\uff0c\u4efb\u52a1\u662f\u8bc6\u522b\u7834\u574f\u592a\u7a7a\u8239\u7684\u5192\u540d\u9876\u66ff\u8005\u5e76\u6d88\u9664\u8239\u5458\u3002\u5728\u8fd9\u4e2a\u73af\u5883\u4e2d\uff0c\u6a21\u62df\u8bed\u8a00\u4ee3\u7406\u7684\u884c\u4e3a\u88ab\u5206\u6790\u3002\u5b9e\u9a8c\u6d89\u53ca\u4e0d\u540c\u8239\u5458\u548c\u5192\u540d\u9876\u66ff\u8005\u4eba\u683c\u539f\u578b\u914d\u7f6e\u7684\u591a\u6837\u5316\u7684\u6e38\u620f\u5e8f\u5217\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u8868\u660e\uff0c\u6700\u5148\u8fdb\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u80fd\u591f\u6709\u6548\u5730\u638c\u63e1\u6e38\u620f\u89c4\u5219\uff0c\u5e76\u6839\u636e\u5f53\u524d\u4e0a\u4e0b\u6587\u505a\u51fa\u51b3\u7b56\u3002\u8fd9\u9879\u5de5\u4f5c\u65e8\u5728\u4fc3\u8fdb\u5bf9\u5728\u4fe1\u606f\u4e0d\u5b8c\u6574\u548c\u590d\u6742\u52a8\u4f5c\u7a7a\u95f4\u4e2d\u7684\u76ee\u6807\u5bfc\u5411\u6e38\u620f\u4e2d\u7684\u8bed\u8a00\u6a21\u578b\u6027\u80fd\u8fdb\u884c\u8fdb\u4e00\u6b65\u63a2\u7d22\uff0c\u8fd9\u4e9b\u8bbe\u7f6e\u63d0\u4f9b\u4e86\u8bc4\u4f30\u8bed\u8a00\u6a21\u578b\u5728\u793e\u4f1a\u9a71\u52a8\u573a\u666f\u4e2d\u8868\u73b0\u7684\u5b9d\u8d35\u673a\u4f1a\u3002|\n", "2407.17469": "|**2024-07-24**|**I Could've Asked That: Reformulating Unanswerable Questions**|Wenting Zhao 
et.al.|[2407.17469](http://arxiv.org/abs/2407.17469)|**[link](https://github.com/wenting-zhao/couldask)**|\u5728\u4ece\u4e0d\u719f\u6089\u6587\u6863\u4e2d\u83b7\u53d6\u4fe1\u606f\u65f6\uff0c\u7528\u6237\u7ecf\u5e38\u63d0\u51fa\u65e0\u6cd5\u7531\u6587\u6863\u56de\u7b54\u7684\u95ee\u9898\u3002\u73b0\u6709\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u80fd\u591f\u8bc6\u522b\u8fd9\u4e9b\u65e0\u6cd5\u56de\u7b54\u7684\u95ee\u9898\uff0c\u4f46\u5b83\u4eec\u5e76\u672a\u5e2e\u52a9\u7528\u6237\u91cd\u65b0\u6784\u5efa\u95ee\u9898\uff0c\u4ece\u800c\u964d\u4f4e\u4e86\u5b83\u4eec\u7684\u6574\u4f53\u5b9e\u7528\u6027\u3002\u6211\u4eec\u7cbe\u5fc3\u7f16\u6392\u4e86CouldAsk\uff0c\u4e00\u4e2a\u7528\u4e8e\u6587\u6863\u652f\u6301\u7684\u95ee\u7b54\u4efb\u52a1\u7684\u8bc4\u4f30\u57fa\u51c6\uff0c\u65e8\u5728\u7814\u7a76\u91cd\u65b0\u6784\u5efa\u65e0\u6cd5\u56de\u7b54\u95ee\u9898\u7684\u80fd\u529b\u3002\u8fd9\u4e2a\u57fa\u51c6\u5305\u62ec\u4e86\u73b0\u6709\u7684\u548c\u65b0\u7684\u6570\u636e\u96c6\u3002\u6211\u4eec\u5bf9\u6700\u5148\u8fdb\u7684\u5f00\u6e90\u548c\u4e13\u6709LLMs\u5728CouldAsk\u4e0a\u7684\u8868\u73b0\u8fdb\u884c\u4e86\u8bc4\u4f30\u3002\u7ed3\u679c\u8868\u660e\uff0c\u8fd9\u4e9b\u6a21\u578b\u5728\u91cd\u65b0\u6784\u5efa\u95ee\u9898\u65b9\u9762\u80fd\u529b\u6709\u9650\u3002\u5177\u4f53\u800c\u8a00\uff0cGPT-4\u548cLlama2-7B\u4ec5\u6210\u529f\u5730\u91cd\u65b0\u6784\u5efa\u4e86\u95ee\u9898\u768426%\u548c12%\u3002\u9519\u8bef\u5206\u6790\u663e\u793a\uff0c\u5931\u8d25\u7684\u91cd\u65b0\u6784\u5efa\u4e2d\u670962%\u7684\u539f\u56e0\u662f\u6a21\u578b\u53ea\u662f\u91cd\u8ff0\u4e86\u95ee\u9898\uff0c\u751a\u81f3\u751f\u6210\u4e86\u5b8c\u5168\u76f8\u540c\u7684\u95ee\u9898\u3002\u6211\u4eec\u516c\u5f00\u53d1\u5e03\u4e86\u8fd9\u4e2a\u57fa\u51c6\u4ee5\u53ca\u91cd\u73b0\u5b9e\u9a8c\u6240\u9700\u7684\u4ee3\u7801\u3002|\n", "2407.17468": "|**2024-07-24**|**WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries**|Wenting Zhao 
et.al.|[2407.17468](http://arxiv.org/abs/2407.17468)|null|\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u5e7b\u89c9\u95ee\u9898\u666e\u904d\u5b58\u5728\u7684\u60c5\u51b5\u4e0b\uff0c\u73b0\u6709\u7684\u4e8b\u5b9e\u6027\u8bc4\u4f30\u57fa\u51c6\u672a\u80fd\u8986\u76d6\u73b0\u5b9e\u4e16\u754c\u7528\u6237\u5bfb\u6c42\u4fe1\u606f\u7684\u591a\u6837\u5316\u77e5\u8bc6\u9886\u57df\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7f3a\u53e3\uff0c\u6211\u4eec\u5f15\u5165\u4e86WildHallucinations\u57fa\u51c6\uff0c\u65e8\u5728\u8bc4\u4f30\u4e8b\u5b9e\u6027\u3002\u8be5\u57fa\u51c6\u901a\u8fc7\u4fc3\u4f7fLLM\u751f\u6210\u6765\u81ea\u91ce\u5916\u7528\u6237-\u804a\u5929\u673a\u5668\u4eba\u5bf9\u8bdd\u4e2d\u7684\u5b9e\u4f53\u7684\u4fe1\u606f\u6765\u5b9e\u73b0\u8fd9\u4e00\u76ee\u6807\u3002\u8fd9\u4e9b\u751f\u6210\u5185\u5bb9\u968f\u540e\u81ea\u52a8\u4e0e\u4ece\u7f51\u7edc\u641c\u7d22\u7cfb\u7edf\u6536\u96c6\u7684\u6709\u7ec4\u7ec7\u7684\u77e5\u8bc6\u5e93\u8fdb\u884c\u4e8b\u5b9e\u68c0\u67e5\u3002\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u4e00\u534a\u4ee5\u4e0a\u7684\u5b9e\u9645\u4e16\u754c\u5b9e\u4f53\u5e76\u6ca1\u6709\u76f8\u5173\u7684\u7ef4\u57fa\u767e\u79d1\u9875\u9762\u3002\u6211\u4eec\u572815\u4e2aLLM\u4e0a\u5bf97919\u4e2a\u5b9e\u4f53\u8fdb\u884c\u4e86118785\u6b21\u751f\u6210\u7684\u8bc4\u4f30\u3002\u6211\u4eec\u53d1\u73b0\uff0cLLM\u5728\u6ca1\u6709\u7ef4\u57fa\u767e\u79d1\u9875\u9762\u7684\u5b9e\u4f53\u4e0a\u4ea7\u751f\u66f4\u591a\u7684\u5e7b\u89c9\uff0c\u5e76\u4e14\u4e0d\u540c\u9886\u57df\u7684\u5e7b\u89c9\u7387\u5b58\u5728\u5dee\u5f02\u3002\u6700\u540e\uff0c\u5728\u4f7f\u7528\u76f8\u540c\u7684\u5e95\u5c42\u6a21\u578b\u65f6\uff0c\u4ec5\u589e\u52a0\u68c0\u7d22\u7ec4\u4ef6\u53ef\u4ee5\u7565\u5fae\u51cf\u5c11\u5e7b\u89c9\uff0c\u4f46\u65e0\u6cd5\u5b8c\u5168\u6d88\u9664\u5e7b\u89c9\u3002|\n", "2407.17467": "|**2024-07-24**|**CMR Scaling Law: Predicting Critical Mixture Ratios for Continual Pre-training of Language Models**|Jiawei Gu 
et.al.|[2407.17467](http://arxiv.org/abs/2407.17467)|null|Large language models (LLMs) excel at a wide range of tasks but often underperform in specialized fields because domain-specific or proprietary corpora are limited. Continual pre-training (CPT) strengthens LLMs by injecting new domain-specific knowledge while replaying general corpus data to prevent catastrophic forgetting. In practice, however, the mixture ratio of general and domain-specific corpora is chosen heuristically, which leads to inefficient training. We revisit the scaling behavior of LLMs under CPT and find a power-law relationship between loss, mixture ratio, and training token scale. We formalize the trade-off between general and domain-specific capabilities, which yields a well-defined Critical Mixture Ratio (CMR) of general and domain data. By striking this balance, CMR maintains the model's general ability while achieving the desired domain transfer, making the most of available resources. If one values the balance between efficiency and effectiveness, CMR can therefore be regarded as the optimal mixture ratio.
Through extensive experiments, we confirm that CMR is predictable, propose the CMR scaling law, and validate its generality. These findings offer practical guidance for optimizing LLM training in specialized domains, preserving general performance and achieving domain-specific performance while managing training resources effectively.|\n", "2407.17453": "|**2024-07-24**|**$VILA^2$: VILA Augmented VILA**|Yunhao Fang et.al.|[2407.17453](http://arxiv.org/abs/2407.17453)|null|Visual language models (VLMs) have advanced rapidly, driven by the success of large language models (LLMs). While model architectures and training infrastructure progress quickly, data collection and curation remain neglected. When data quantity and quality become the bottleneck, existing methods either crawl more raw data from the Internet, with no guarantee of quality, or distill from black-box commercial models (e.g., GPT-4V / Gemini), which caps performance at that model's level. This paper proposes a novel approach comprising a self-augment step and a specialist-augment step to iteratively improve data quality and model performance.
In the self-augment step, a VLM regenerates its own pretraining data to improve data quality and is then retrained on the refined dataset to improve model performance. This process can be repeated for several rounds. Once self-augmentation saturates, we employ several specialist VLMs, fine-tuned from the self-augmented VLM with domain-specific expertise, and further infuse specialist knowledge into the generalist model through task-oriented regeneration and retraining. With combined self-augmented and specialist-augmented training, we introduce the VILA\u00b2 (VILA-augmented-VILA) model family, which consistently improves accuracy over prior work on a wide range of tasks and achieves new state-of-the-art results on the MMMU leaderboard among open-source models.|\n", "2407.17417": "|**2024-07-24**|**Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data?**|Michael-Andrei Panaitescu-Liess
et.al.|[2407.17417](http://arxiv.org/abs/2407.17417)|null|This paper first examines watermarking of large language models (LLMs) as a deterrent against generating copyrighted text. Through theoretical analysis and empirical evaluation, we show that incorporating watermarks into LLMs significantly reduces the likelihood of generating copyrighted content, addressing a critical concern in LLM deployment. We also study the effect of watermarking on membership inference attacks (MIAs), which aim to determine whether a sample was part of the pretraining dataset and may be used to detect copyright violations. Surprisingly, we find that watermarking lowers the success rate of MIAs, complicating the detection of copyrighted text in the pretraining dataset. Finally, we propose an adaptive technique that improves the success rate of a recent MIA under watermarking. Our findings underscore the importance of developing adaptive methods for studying critical problems in LLMs that have potential legal implications.|\n", "2407.17412": "|**2024-07-24**|**(PASS) Visual Prompt Locates Good Structure Sparsity through a Recurrent HyperNetwork**|Tianjin Huang
et.al.|[2407.17412](http://arxiv.org/abs/2407.17412)|null|Large-scale neural networks deliver remarkable performance in domains such as vision and language processing, though at a massive computational cost. Structural model pruning, as noted in the compression literature, is a prominent route to model efficiency thanks to its acceleration-friendly sparsity patterns. A key question in structural pruning is how to estimate channel importance. In parallel, work on data-centric AI has shown that prompting-based techniques give large language models impressive generalization across diverse downstream tasks. This paper explores an intriguing possibility: using visual prompts to capture channel importance and derive high-quality structural sparsity. To this end, we propose a novel algorithmic framework, \\texttt{PASS}. It is a tailored hyper-network that takes visual prompts and network weight statistics as input and outputs layer-wise channel sparsity in a recurrent manner. This design accounts for the intrinsic channel dependency between layers. Comprehensive experiments across multiple network architectures and six datasets demonstrate the advantage of \\texttt{PASS} in locating good structural sparsity.
For example, at the same FLOPs level, \\texttt{PASS} subnetworks achieve 1%-3% higher accuracy on the Food101 dataset; or, when matching the baselines' accuracy of 80%, \\texttt{PASS} subnetworks obtain a 0.35x greater speedup.|\n", "2407.17404": "|**2024-07-24**|**Grammar-based Game Description Generation using Large Language Models**|Tsunehiko Tanaka et.al.|[2407.17404](http://arxiv.org/abs/2407.17404)|null|To lower the barrier to game design development, automated game design, which generates game designs through computational processes, has been explored. In automated game design, machine learning-based techniques such as evolutionary algorithms have achieved success. Benefiting from the remarkable advances of deep learning in computer vision and natural language processing, game generation has also progressed. However, because data in the game design domain is limited, deep learning has been under-applied to tasks such as game description generation. To pioneer a new approach for handling limited data in automated game design, we focus on the in-context learning of large language models (LLMs). LLMs can capture the features of a task from a few demonstration examples and apply the capabilities acquired during pre-training.
We introduce a grammar of game descriptions, which effectively structures the game design space, so that LLMs can capture the characteristics of the complex task of game description generation. Furthermore, we propose a decoding method that iteratively improves the generated output by leveraging the grammar. Our experiments demonstrate that this approach performs well in generating game descriptions.|\n", "2407.17398": "|**2024-07-24**|**3D Question Answering for City Scene Understanding**|Penglei Sun et.al.|[2407.17398](http://arxiv.org/abs/2407.17398)|null|3D multimodal question answering (MQA) plays a crucial role in scene understanding by enabling intelligent agents to comprehend their surroundings in 3D environments. Current research focuses mainly on indoor household tasks and outdoor roadside autonomous driving tasks, with limited exploration of city-level scene understanding. Existing work also struggles to understand city scenes because spatial semantic information and human-environment interaction information are lacking at the city level.
To address these challenges, we investigate 3D MQA from both the dataset and the method perspectives. On the dataset side, we introduce City-3DQA, a novel 3D MQA dataset and the first to incorporate city scene semantics and human-environment interaction tasks. On the method side, we propose a scene graph enhanced city-level understanding method (Sg-CityU), which uses the scene graph to introduce spatial semantic information. Sg-CityU achieves accuracies of 63.94% and 63.76% in different settings of City-3DQA and, compared with indoor 3D MQA methods and zero-shot methods using advanced large language models (LLMs), reaches state-of-the-art performance in robustness and generalization.|\n", "2407.17365": "|**2024-07-24**|**ViPer: Visual Personalization of Generative Models via Individual Preference Learning**|Sogand Salehi
et.al.|[2407.17365](http://arxiv.org/abs/2407.17365)|null|Different users prefer different images generated for the same prompt. This gives rise to personalized image generation, which creates images aligned with an individual's visual preferences. Current generative models, however, are unpersonalized, as they are tuned to appeal to a broad audience. Using them to produce images that match an individual's preferences relies on iterative manual prompt adjustment, which is inefficient and undesirable. We propose to personalize the image generation process by first capturing a user's generic preferences: the user is invited to comment on a small selection of images and explain why they like or dislike each one. Based on these comments, we use a large language model to infer the user's structured liked and disliked visual attributes, i.e., their visual preference. These attributes guide a text-to-image model toward images that are closer to the individual user's visual preference.
Through a series of user studies and large language model guided evaluations, we demonstrate that the proposed method produces generations that are well aligned with individual users' visual preferences.|\n", "2407.17353": "|**2024-07-24**|**Scalify: scale propagation for efficient low-precision LLM training**|Paul Balan\u00e7a et.al.|[2407.17353](http://arxiv.org/abs/2407.17353)|**[link](https://github.com/graphcore-research/jax-scalify)**|**Low-precision formats such as float8 have been introduced in machine learning accelerator hardware to improve the computational efficiency of large language model training and inference. However, adoption by the ML community has been slow because of the complex, and sometimes brittle, techniques required to match higher-precision training accuracy. This work presents Scalify, an end-to-end scale propagation paradigm for computational graphs that generalizes and formalizes existing tensor scaling methods. Experimental results show that Scalify supports out-of-the-box float8 matrix multiplication and gradient representation, as well as float16 optimizer state storage. Our JAX implementation of Scalify is open-sourced at https://github.com/graphcore-research/jax-scalify.**|\n", "2407.18219": "|**2024-07-26**|**Recursive Introspection: Teaching Language Model Agents How to Self-Improve**|Yuxiao Qu
et.al.|[2407.18219](http://arxiv.org/abs/2407.18219)|null|A central piece in enabling intelligent agentic behavior in foundation models is making them capable of introspecting on their behavior and reasoning and of correcting their mistakes as more computation or interaction becomes available. Even the strongest proprietary large language models (LLMs) do not exhibit the ability to keep improving their responses sequentially, even when explicitly told that they have made a mistake. This paper develops RISE (Recursive Introspection), an approach for fine-tuning LLMs to introduce this capability, despite prior work hypothesizing that it may not be attainable. Our approach prescribes an iterative fine-tuning procedure that attempts to teach the model how to alter its response after previously unsuccessful attempts to solve a hard test-time problem, optionally with additional environment feedback. RISE poses fine-tuning for a single-turn prompt as solving a multi-turn Markov decision process (MDP) whose initial state is the prompt. Inspired by principles of online imitation learning and reinforcement learning, we propose strategies for multi-turn data collection and training that give an LLM the ability to recursively detect and correct its previous mistakes in subsequent iterations.
Our experiments show that RISE enables Llama2, Llama3, and Mistral models to improve themselves with more turns on math reasoning tasks, outperforming several single-turn strategies given an equal amount of inference-time computation. We also find that RISE scales well, often yielding larger gains with more capable models. Our analysis shows that RISE makes meaningful improvements to responses so as to reach the correct solution on challenging prompts, without disrupting single-turn abilities as a result of expressing more complex distributions.|\n", "2407.18213": "|**2024-07-26**|**Exploring Scaling Trends in LLM Robustness**|Nikolaus Howe
et.al.|[2407.18213](http://arxiv.org/abs/2407.18213)|null|Language model capabilities improve predictably as model size and training data scale up. Motivated by this, increasingly large language models have been trained, yielding an array of impressive capabilities. Yet these models are vulnerable to adversarial prompts, such as jailbreaks that hijack models into performing undesired behaviors, which poses a significant risk of misuse. Prior work indicates that computer vision models become more robust with model and data scaling, raising the question: does language model robustness also improve with scale? We study this question empirically and find that larger models respond substantially better to adversarial training, but that scaling model size brings no benefit in the absence of explicit defenses.|\n", "2407.18158": "|**2024-07-25**|**Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models**|Sanae Lotfi
et.al.|[2407.18158](http://arxiv.org/abs/2407.18158)|null|Large language models (LLMs) excel at predicting the next token in a sequence. Recent work computes non-vacuous compression-based generalization bounds for LLMs, but these bounds are vacuous for large models at the billion-parameter scale. Moreover, they are obtained through restrictive compression techniques and bound compressed models that generate low-quality text. More importantly, existing bounds depend on the number of independent and identically distributed (IID) documents in the training set rather than the much larger number of non-IID constituent tokens, leaving untapped potential for tighter bounds.
In this work, we instead use properties of martingales to derive generalization bounds that benefit from the vast number of tokens in LLM training sets. Because a dataset contains far more tokens than documents, our generalization bounds not only tolerate far less restrictive compression schemes but actually benefit from them. With Monarch matrices, Kronecker factorizations, and post-training quantization, we achieve non-vacuous generalization bounds for LLMs as large as LLaMA2-70B. Unlike previous approaches, our work achieves the first non-vacuous bounds for models that are deployed in practice and generate high-quality text.|\n", "2407.18129": "|**2024-07-26**|**Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic**|Fakhraddin Alwajih
et.al.|[2407.18129](http://arxiv.org/abs/2407.18129)|null|Recent advances have significantly improved the ability of multimodal large language models (MLLMs) to generate and understand image-to-text content. Despite these successes, progress is largely limited to English because high-quality multimodal resources are scarce in other languages, which constrains the development of competitive models in languages such as Arabic. To alleviate this, we introduce Dallah, an efficient Arabic multimodal assistant that builds on an advanced language model based on LLaMA-2 to facilitate multimodal interactions. Dallah achieves state-of-the-art performance among Arabic MLLMs. By fine-tuning on six Arabic dialects, Dallah demonstrates its ability to handle complex dialectal interactions that combine textual and visual elements. The model excels on two benchmarks: one evaluating its performance on Modern Standard Arabic (MSA) and another designed specifically to assess dialectal responses. Beyond its robust performance on multimodal interaction tasks, Dallah has the potential to pave the way for further development of dialect-aware Arabic MLLMs.|\n", "2407.18103": "|**2024-07-25**|**Fine-Tuning Large Language Models for Stock
Return Prediction Using Newsflow**|Tian Guo et.al.|[2407.18103](http://arxiv.org/abs/2407.18103)|null|Large language models (LLMs) and their fine-tuning techniques have shown superior performance across a range of language understanding and generation tasks. This paper explores fine-tuning LLMs for stock return forecasting with financial newsflow. In quantitative investing, return forecasting underpins subsequent tasks such as stock picking and portfolio optimization. We formulate the model as a text representation module followed by a forecasting module. We compare encoder-only and decoder-only LLMs, since they generate text representations in distinct ways; the effect of these different representations on forecasting performance remains an open question. We also compare two simple methods of integrating LLMs' token-level representations into the forecasting module. Experiments on real news and investment universes show that: (1) representations aggregated from LLMs' token-level embeddings generally yield return predictions that improve the performance of long-only and long-short portfolios; (2) in a relatively large investment universe, the decoder LLM-based prediction model leads to stronger portfolios, whereas in smaller universes there is no consistent winner;
(3) return predictions derived from LLMs' text representations are a strong signal for portfolio construction, outperforming conventional sentiment scores.|\n", "2407.18078": "|**2024-07-25**|**PEFT-U: Parameter-Efficient Fine-Tuning for User Personalization**|Christopher Clarke et.al.|[2407.18078](http://arxiv.org/abs/2407.18078)|**[link](https://github.com/ChrisIsKing/Parameter-Efficient-Personalization)**|**The recent emergence of Large Language Models (LLMs) has opened a new chapter in human-AI interaction. These advanced models, exemplified by Chat-GPT, have shown remarkable capabilities in language understanding. However, as LLMs have grown exponentially in scale, one crucial dimension, the personalization of these models, remains understudied. Large foundation models such as GPT-3 focus on creating a universal model that serves a broad range of tasks and users. This approach emphasizes the model's generalization capabilities, treating users as a collective rather than as distinct individuals. While practical for many common applications, this one-size-fits-all approach often fails to address the richness of human diversity and individual needs. To explore this issue, we introduce the PEFT-U benchmark: a new dataset for building and evaluating user-oriented natural language processing (NLP) models.
PEFT-U consists of diverse and individualized expression tasks in which different users may have different preferences for the same input. Using PEFT-U, we explore how to efficiently personalize LLMs to accommodate user-specific preferences, particularly in the context of diverse user-centered tasks.**|\n", "2407.18069": "|**2024-07-25**|**C2P: Featuring Large Language Models with Causal Reasoning**|Abdolmahdi Bagheri et.al.|[2407.18069](http://arxiv.org/abs/2407.18069)|null|Causal reasoning is the primary bottleneck that large language models (LLMs) must overcome to attain human-level intelligence. To address this, we introduce the Causal Chain of Prompting (C2P), the first reasoning framework that equips current LLMs with causal reasoning capabilities. C2P operates autonomously, without relying on external tools or modules during either the causal learning or the reasoning phase, and can be seamlessly integrated into the training or fine-tuning of LLMs. Experimental results across various benchmark datasets show that C2P significantly improves the causal learning and subsequent reasoning accuracy of LLMs.
We illustrate how C2P enhances LLMs' ability to reason causally in real-world scenarios, addressing complex problems in fields such as healthcare, medicine, economics, education, social sciences, environmental science, and marketing. With few-shot learning, GPT-4 Turbo using C2P with as few as six examples achieves significant performance improvements, with reasoning accuracy more than 33% higher than state-of-the-art LLMs, which perform nearly randomly in similar circumstances. This demonstrates the transformative potential of integrating C2P into LLM training or fine-tuning to give these models advanced causal reasoning capabilities.|\n", "2407.18064": "|**2024-07-25**|**ComPeer: A Generative Conversational Agent for Proactive Peer Support**|Tianjian Liu
et.al.|[2407.18064](http://arxiv.org/abs/2407.18064)|**[link](https://github.com/liutj9/compeer)**|Conversational agents (CAs) acting as peer supporters have been widely studied and shown to benefit mental health. However, current peer-support CAs are either user-initiated or follow predefined rules to start conversations, which may discourage users from forming long-term relationships with the CA and so limit long-term benefits. To address this, we developed ComPeer, a generative CA that proactively offers adaptive peer support. ComPeer uses large language models to detect and reflect on significant events in the dialogue, which lets it strategically plan the timing and content of proactive care. In addition, ComPeer incorporates peer support strategies, conversation history, and its own personalized elements into the generated messages. In a one-week between-subjects study (N=24), we demonstrate ComPeer's ability to provide peer support over time and to significantly increase user engagement compared with a user-initiated CA. This study highlights the potential of generative CAs for peer support, particularly how proactive care strategies can foster deeper, more sustained interaction and so provide users with long-term mental health benefits.|\n", "2407.18062": "|**2024-07-25**|**Audio Entailment: Assessing Deductive Reasoning for Audio Understanding**|Soham Deshmukh et.al.|[2407.18062](http://arxiv.org/abs/2407.18062)|**[link](https://github.com/microsoft/audioentailment)**|**Recent literature uses language to build foundation models for audio. These audio-language models (ALMs) are trained on large numbers of audio-text pairs and perform remarkably well on tasks such as text-to-audio retrieval, captioning, and question answering. However, their ability on more complex open-ended tasks such as interactive question answering requires logical reasoning, which has not yet been adequately evaluated. We introduce a new task, Audio Entailment, to evaluate the deductive reasoning ability of ALMs. The task assesses whether a textual description (hypothesis) of audio content can be deduced from an audio recording (premise), with possible conclusions of entailment, neutral, or contradiction depending on the sufficiency of the evidence. We create two datasets for this task, with audio recordings sourced from two audio captioning datasets, AudioCaps and Clotho, and hypotheses generated by large language models (LLMs). We benchmark state-of-the-art ALMs and find deficiencies in their logical reasoning under both zero-shot and linear-probe evaluation. Finally, we propose 'caption-before-reason', an intermediate step that improves the zero-shot and linear-probe performance of ALMs by an absolute 6% and 3%, respectively.**|\n", "2407.18061": "|**2024-07-25**|**Difficulty Estimation and Simplification of French Text Using LLMs**|Henri Jamet 
et.al.|[2407.18061](http://arxiv.org/abs/2407.18061)|null|We use generative large language models to build foreign-language learning applications, focusing on estimating the difficulty of foreign-language texts and simplifying them to lower difficulty levels. We frame both tasks as prediction problems and build a difficulty classification model using labeled examples, transfer learning, and large language models, which achieves better accuracy than previous approaches. For simplification, we evaluate the trade-off between simplification quality and meaning preservation, comparing zero-shot and fine-tuned large language models. The results show that meaningful text simplifications can be obtained with limited fine-tuning. Our experiments are conducted on French texts, but our methods are language-agnostic and directly applicable to other foreign languages.|\n", "2407.18897": "|**2024-07-26**|**Small Molecule Optimization with Large Language Models**|Philipp Guevorguian 
et.al.|[2407.18897](http://arxiv.org/abs/2407.18897)|**[link](https://github.com/yerevann/chemlactica)**|**Recent advances in large language models have opened new possibilities for generative molecular drug design. We present the Chemlactica and Chemma language models, fine-tuned on a novel dataset of 110 million molecules with computed properties, totaling 40 billion tokens. These models perform strongly at generating molecules with specified properties and at predicting the characteristics of new molecules from limited samples. We introduce a novel optimization algorithm that uses our language models to optimize arbitrary properties while accessing only limited information through a black-box interface. Our approach combines ideas from genetic algorithms, rejection sampling, and prompt optimization. It achieves state-of-the-art performance on multiple molecular optimization benchmarks, including an 8% improvement over previous methods on the Practical Molecular Optimization task. We publicly release the training dataset, the language models, and the code for the optimization algorithm.**|\n", "2407.18827": "|**2024-07-26**|**Human-artificial intelligence teaming for scientific information extraction from data-driven additive manufacturing 
research using large language models**|Mutahar Safdar et.al.|[2407.18827](http://arxiv.org/abs/2407.18827)|null|Data-driven research in additive manufacturing (AM) has seen significant success in recent years, leading to a large body of scientific literature. The knowledge in this literature spans AM and artificial intelligence (AI) contexts but has not yet been mined and formalized in an integrated way. Extracting scientific information from these works requires considerable effort and time. AM domain experts have contributed more than two dozen review papers summarizing this work, yet information specific to AM and AI still has to be extracted manually. The recent success of foundation models such as BERT (Bidirectional Encoder Representations from Transformers) and GPT (Generative Pre-trained Transformers) on textual data opens the possibility of accelerating scientific information extraction. We propose a framework that enables collaboration between AM and AI experts to continuously extract scientific information from data-driven AM literature. A demonstration tool is implemented based on the proposed framework, and a case study is conducted to extract information relevant to datasets, modeling, sensing, and AM system categories. We show the ability of large language models (LLMs) to speed up the extraction of relevant information from data-driven AM literature. In the future, the framework can be used to extract information from the design and manufacturing literature of the engineering disciplines.|\n", "2407.18787": "|**2024-07-26**|**Automatic Detection of Moral Values in Music Lyrics**|Vjosa Preniqi et.al.|[2407.18787](http://arxiv.org/abs/2407.18787)|**[link](https://github.com/vjosapreniqi/ismir-mft-values)**|Moral values play a fundamental role in how we evaluate information, make decisions, and form judgements about important social issues. The ability to quickly extract moral values from lyrics gives us a deeper understanding of music-listening behaviour. Building on Moral Foundations Theory (MFT), we tasked a set of transformer-based language models (BERT), fine-tuned on 2,721 synthetic lyrics generated by a large language model (GPT-4), with detecting moral values in 200 real music lyrics annotated by two experts. We evaluate their predictive capability against a series of baselines, including out-of-domain (BERT fine-tuned on MFT-annotated social media texts) and zero-shot (GPT-4) classification. The proposed models achieve the best accuracy across all experiments, with an average weighted F1 score of 0.8, on average 5% higher than the baseline models. In binary classification precision, the proposed models outperform the baselines by 12% on average. Our approach contributes to annotation-free moral learning from lyrics and to distilling the knowledge of large language models about moral expression in music, and offers useful insights into the potential impact of these technologies on the creative industries and musical culture.|\n", "2407.18786": "|**2024-07-26**|**The power of Prompts: Evaluating and Mitigating Gender Bias in MT with LLMs**|Aleix Sant et.al.|[2407.18786](http://arxiv.org/abs/2407.18786)|null|This paper studies gender bias in machine translation through the lens of large language models (LLMs). Using four widely used test sets, we benchmark various base LLMs against state-of-the-art neural machine translation (NMT) models on the English to Catalan (En$\\rightarrow$Ca) and English to Spanish (En$\\rightarrow$Es) translation directions, evaluating both translation quality and gender bias. Our findings reveal pervasive gender bias across all models, with base LLMs exhibiting more bias than NMT models. To combat this bias, we explore prompt engineering techniques applied to an instruction-tuned LLM. We identify a prompt structure that significantly reduces gender bias, by up to 12% on the WinoMT evaluation dataset compared with more straightforward prompts. These results significantly narrow the gender bias accuracy gap between LLMs and traditional NMT systems.|\n", "2407.18764": "|**2024-07-26**|**TAGIFY: LLM-powered Tagging Interface for Improved Data Findability on OGD portals**|Kevin Kliimask et.al.|[2407.18764](http://arxiv.org/abs/2407.18764)|null|Efforts to promote Open Government Data (OGD) have gained significant momentum across all levels of government since the mid-2000s. As more datasets are published on OGD portals, finding specific data becomes harder, leading to information overload. Complete and accurate dataset documentation, including appropriate tags, is key to improving dataset findability and accessibility. An analysis of the Estonian Open Data Portal revealed that 11% of datasets have no associated tags and 26% have only one tag assigned, which points to findability and accessibility challenges within the portal. According to the recent Open Data Maturity Report, this portal is considered a leader. The aim of this study is to propose an automated solution for tagging datasets to improve data findability on OGD portals. This paper presents Tagify, a prototype that uses large language models (LLMs) such as GPT-3.5-turbo and GPT-4 to automatically generate tags for datasets in English and Estonian, thereby augmenting the metadata prepared by data publishers and improving the findability of data on OGD portals for data users. The developed solution was evaluated by users, and their feedback was collected to define an agenda for future prototype improvements.|\n", "2407.18752": "|**2024-07-26**|**Knowledge Graph Structure as Prompt: Improving Small Language Models Capabilities for Knowledge-based Causal Discovery**|Yuni Susanti 
et.al.|[2407.18752](http://arxiv.org/abs/2407.18752)|**[link](https://github.com/littleflow3r/kg-structure-as-prompt)**|**This paper presents a new perspective on causal discovery with large language models (LLMs) that reason from metadata rather than actual data values, namely knowledge-based causal discovery. We focus on how small language models (SLMs, with fewer than 1 billion parameters) can perform knowledge-based causal discovery through prompt-based learning. Specifically, we propose KG Structure as Prompt, a new method for integrating structural information from a knowledge graph, such as common neighbor nodes and metapaths, into prompt-based learning to enhance the capabilities of SLMs. Experimental results on three types of life-science and open-domain datasets in a few-shot setting show that the approach surpasses many baselines and even conventional fine-tuning on the full datasets. Our findings further highlight the strong capabilities of SLMs: combined with knowledge graphs and prompt-based learning, SLMs show the potential to surpass LLMs with far more parameters. Our code and datasets are available on GitHub.**|\n", "2407.18743": "|**2024-07-26**|**Towards Effective and Efficient Continual Pre-training of Large Language Models**|Jie Chen et.al.|[2407.18743](http://arxiv.org/abs/2407.18743)|null|This technical report presents recent progress in continual pre-training (CPT), focusing on strengthening large language models in specific domains or tasks. Using Llama-3 (8B) as the base model, the report describes how its Chinese language understanding and scientific reasoning were significantly improved. To add new abilities while retaining the original ones, we design data mixture and curriculum strategies that draw on existing datasets and synthesized high-quality datasets. Specifically, we generate multidisciplinary scientific question-answer (QA) pairs from related web pages and incorporate this synthetic data into training to improve the scientific reasoning of Llama-3. We refer to the resulting model as Llama-3-SynE (Synthetic data Enhanced Llama-3). We also run tuning experiments with a smaller model, TinyLlama, and use the findings to train the base model. Extensive experiments on several evaluation benchmarks show that our approach substantially improves the base model, including general abilities (+8.81 on C-Eval and +6.31 on CMMLU) and scientific reasoning (+12.00 on MATH and +4.13 on SciEval), without hurting the original capabilities. See https://github.com/RUC-GSAI/Llama-3-SynE for the open-source model, data, and code.|\n", "2407.18738": "|**2024-07-26**|**Towards Generalized Offensive Language Identification**|Alphaeus Dmonte et.al.|[2407.18738](http://arxiv.org/abs/2407.18738)|null|Offensive content on the internet, including hate speech and cyberbullying, is a pervasive issue worldwide. It has therefore drawn considerable attention from the machine learning (ML) and natural language processing (NLP) communities. In response, numerous systems have been developed to automatically identify potentially harmful content and mitigate its impact. These systems follow two main approaches: (1) using publicly available models and application endpoints, including prompting large language models (LLMs); and (2) annotating datasets and training ML models on them. However, how well either approach generalizes is unclear, and their applicability in practical and out-of-domain settings is often questioned. This paper empirically evaluates the generalizability of offensive language detection models and datasets on a novel generalized benchmark. We pose three research questions on generalizability and draw conclusions for each. The findings will help in building more robust real-world offensive language detection systems.|\n", "2407.18723": "|**2024-07-26**|**LLASP: Fine-tuning Large Language Models for Answer Set Programming**|Erica Coppolillo et.al.|[2407.18723](http://arxiv.org/abs/2407.18723)|null|Recently, large language models (LLMs) have shown great potential in natural language processing tasks, particularly code generation. Although significant progress has been made in adapting LLMs to generate code for many imperative programming languages and tasks, their ability to handle declarative formalisms such as Answer Set Programming (ASP) remains largely unexplored. This paper explores the capabilities of LLMs for ASP code generation. We first conduct a systematic evaluation of current state-of-the-art LLMs. Despite their strength in parameter count, training data, and computational resources, the empirical results show that they perform poorly at generating correct ASP programs. We therefore propose LLASP, a lightweight model designed specifically to encode fundamental ASP program patterns. To this end, we create a custom dataset covering a wide variety of fundamental problem specifications that can be encoded in ASP. Our experiments show that the ASP programs generated by LLASP are of remarkable quality, outperforming both the non-fine-tuned counterpart and most of the competing LLM candidates, particularly from a semantic perspective. See https://anonymous.4open.science/r/LLASP-D86C/ for all code and data used in the experiments.|\n", "2407.18722": "|**2024-07-26**|**Neurosymbolic AI for Enhancing Instructability in Generative AI**|Amit Sheth 
et.al.|[2407.18722](http://arxiv.org/abs/2407.18722)|null|Generative AI, especially via large language models (LLMs), has transformed content creation across text, images, and music, showing instruction-following capabilities through prompting that are largely enabled by instruction tuning. Instruction tuning is a supervised fine-tuning method in which LLMs are trained on datasets formatted as specific tasks with corresponding instructions; it systematically improves the model's ability to carry out the instructions it is given. Even so, LLMs still struggle to consistently interpret and execute complex, multi-step instructions and to generalize them to novel tasks, which is essential for broader real-world applicability. This article explores why neurosymbolic AI offers a better path to enhancing the instructability of LLMs. We explore using a symbolic task planner to decompose high-level instructions into structured tasks, a neural semantic parser to ground these tasks into executable actions, and a neuro-symbolic executor to implement these actions while dynamically maintaining an explicit representation of state. We also seek to show that a neurosymbolic approach improves the reliability and context-awareness of task execution, enabling LLMs to dynamically interpret and respond to a wider range of instructional contexts with greater precision and flexibility.|\n", "2407.20232": "|**2024-07-29**|**Specify and Edit: Overcoming Ambiguity in Text-Based Image Editing**|Ekaterina Iakovleva et.al.|[2407.20232](http://arxiv.org/abs/2407.20232)|null|Text-based editing diffusion models show limited performance when the user's input instruction is ambiguous. To address this, we propose Specify ANd Edit (SANE), a zero-shot inference pipeline for diffusion-based editing systems. We use a large language model (LLM) to decompose the input instruction into specific instructions, i.e. well-defined interventions to apply to the input image to satisfy the user's request. We benefit from the LLM-derived instructions along with the original one through a novel denoising guidance strategy designed specifically for the task. Our experiments with three baselines on two datasets demonstrate the benefits of SANE in all setups. Moreover, our pipeline improves the interpretability of editing models and increases output diversity. We also show that our approach can be applied to any edit, whether ambiguous or not. See https://github.com/fabvio/SANE for our publicly available code.|\n", "2407.20224": "|**2024-07-29**|**Can Editing LLMs Inject Harm?**|Canyu Chen et.al.|[2407.20224](http://arxiv.org/abs/2407.20224)|null|Knowledge editing techniques are increasingly adopted to efficiently correct false or outdated knowledge in large language models (LLMs), owing to the high cost of retraining from scratch. Meanwhile, one critical but under-explored question is: can knowledge editing be used to inject harm into LLMs? In this paper, we propose to reformulate knowledge editing as a new type of safety threat for LLMs, namely Editing Attack, and conduct a systematic investigation with a newly constructed dataset, EditAttack. Specifically, we focus on two typical safety risks of editing attacks: misinformation injection and bias injection. For misinformation injection, we first categorize it into commonsense misinformation injection and long-tail misinformation injection. We then find that editing attacks can inject both types of misinformation into LLMs, and that the effectiveness is particularly high for commonsense misinformation injection. For bias injection, we find that not only can biased sentences be injected into LLMs with high effectiveness, but a single injected biased sentence can also cause a significant increase in bias in the general outputs of LLMs, even outputs highly unrelated to the injected sentence, which indicates a catastrophic impact on the overall fairness of LLMs. We further show the high stealthiness of editing attacks, measured by their impact on the general knowledge and reasoning capacities of LLMs, and provide empirical evidence of how hard editing attacks are to defend against. Our findings reveal the emerging misuse risks of knowledge editing techniques for compromising the safety alignment of LLMs.|\n", "2407.20207": "|**2024-07-29**|**QAEA-DR: A Unified Text Augmentation Framework for Dense Retrieval**|Hongming Tan 
et.al.|[2407.20207](http://arxiv.org/abs/2407.20207)|null|In dense retrieval, embedding long texts into dense vectors can cause information loss, which reduces the accuracy of query-text matching. In addition, texts that are low in quality, overly noisy, or sparse in key information rarely match relevant queries well. Current research focuses mainly on improving sentence embedding models or the retrieval pipeline. This work proposes a novel text augmentation framework for dense retrieval. The framework transforms raw documents into information-dense text formats that supplement the original text, addressing the issues above without modifying the embedding or retrieval methods. Two text representations are generated through zero-shot prompting of a large language model (LLM): question-answer pairs and event-driven elements. We name this method QAEA-DR: a text augmentation framework for dense retrieval that unifies question generation and event extraction. To further improve the quality of the generated text, a scoring-based evaluation and regeneration mechanism is introduced into the LLM prompting process. Our QAEA-DR model has a positive effect on dense retrieval, as supported by both theoretical analysis and experimental evidence.|\n", "2407.20183": "|**2024-07-29**|**MindSearch: Mimicking Human Minds Elicits Deep AI Searcher**|Zehui Chen et.al.|[2407.20183](http://arxiv.org/abs/2407.20183)|**[link](https://github.com/internlm/mindsearch)**|**Information seeking and integration is a complex cognitive task that consumes a great deal of time and effort. Inspired by the remarkable recent progress of large language models (LLMs), recent work attempts to solve this task by combining search engines with LLMs. However, these methods still perform unsatisfactorily because of three challenges: (1) complex requests often cannot be retrieved accurately and completely by a search engine in one pass; (2) the information to be integrated is spread over multiple web pages and mixed with massive noise; (3) a large number of web pages with long content may quickly exceed the maximum context length of the LLM. 
Inspired by the human thought process when solving these problems, we introduce MindSearch, which mimics the human mind in web information seeking and integration and can be instantiated with a simple yet effective LLM-based multi-agent framework. WebPlanner models the human mind's multi-step information seeking as a dynamic graph construction process: it decomposes the user query into atomic sub-questions as nodes in the graph and progressively extends the graph based on the search results from WebSearcher. WebSearcher takes on each sub-question, performs hierarchical information retrieval, and collects valuable information from search engines for WebPlanner. The multi-agent design of MindSearch lets the whole framework seek and integrate information in parallel from more than 300 web pages in only 3 minutes, which is equivalent to saving 3 hours of human effort. MindSearch significantly improves response quality in both depth and breadth on closed-set and open-set question answering problems. Moreover, responses from MindSearch based on InternLM2.5-7B are preferred by humans over those of the ChatGPT-Web and Perplexity.ai applications, which suggests that MindSearch can already deliver a solution competitive with proprietary AI search engines.**|\n", "2407.20174": "|**2024-07-29**|**Advancing 
Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruction Tuning**|Xingchen Zeng et.al.|[2407.20174](http://arxiv.org/abs/2407.20174)|**[link](https://github.com/zengxingchen/chartqa-mllm)**|**Emerging multimodal large language models (MLLMs) show great potential for chart question answering (CQA). Recent efforts focus mainly on scaling up training datasets (charts, data tables, and question-answer pairs) through data collection and synthesis. However, our empirical study of existing MLLMs and CQA datasets reveals notable gaps. First, current data collection and synthesis focus on data volume and neglect fine-grained visual encodings and QA tasks, which leads to unbalanced data distributions that diverge from practical CQA scenarios. Second, existing work follows the training recipe of base MLLMs originally designed for natural images and under-explores adaptation to the unique characteristics of charts, such as rich text elements. 
To fill this gap, we propose a visualization-referenced instruction tuning approach to guide training-dataset enhancement and model development. Specifically, we propose a novel data engine that effectively filters diverse, high-quality data from existing datasets and then refines and augments the data using LLM-based generation techniques, so that it better matches practical QA tasks and visual encodings. Then, to facilitate adaptation to chart characteristics, we use the enriched data to train an MLLM by unfreezing the vision encoder and incorporating a mixture-of-resolution adaptation strategy to enhance fine-grained recognition. Experimental results validate the effectiveness of the approach. Even with fewer training examples, our model consistently outperforms existing CQA models on established benchmarks. We also contribute a dataset split as a benchmark for future research. The source code and datasets of this paper are available at https://github.com/zengxingchen/ChartQA-MLLM.**|\n", "2407.20171": "|**2024-07-29**|**Diffusion Feedback Helps CLIP See Better**|Wenxuan Wang 
et.al.|[2407.20171](http://arxiv.org/abs/2407.20171)|**[link](https://github.com/baaivision/diva)**|Contrastive Language-Image Pre-training (CLIP), which excels at abstracting open-world representations across domains and modalities, has become a foundation for a variety of vision and multimodal tasks. However, recent studies reveal that CLIP has severe visual shortcomings, such as difficulty distinguishing orientation, quantity, color, and structure. These visual shortcomings also limit the perception capabilities of multimodal large language models (MLLMs) built on CLIP. The main reason is that the image-text pairs used to train CLIP are inherently biased, because the text lacks distinctiveness and the images lack diversity. 
To this end, we present a simple post-processing approach for CLIP models that largely overcomes these visual shortcomings through a self-supervised diffusion process. We introduce DIVA, which uses a diffusion model as a visual assistant for CLIP. Specifically, DIVA uses generative feedback from text-to-image diffusion models to optimize CLIP representations, with images only (without the corresponding text). We show that DIVA significantly improves CLIP's performance on the MMVP-VLM benchmark, which broadly assesses fine-grained visual abilities (e.g., by 3-7%). In addition, our framework improves the performance of MLLMs and vision models on multimodal understanding and segmentation tasks. Extensive evaluation on 29 image classification and retrieval benchmarks confirms that our framework preserves CLIP's strong zero-shot capabilities. The code will be released at https://github.com/baaivision/DIVA.|\n", "2407.20164": "|**2024-07-29**|**Language-Conditioned Offline RL for Multi-Robot Navigation**|Steven Morad 
et.al.|[2407.20164](http://arxiv.org/abs/2407.20164)|null|We present a method for developing navigation policies for multi-robot teams that understand and follow natural language instructions. We condition these policies on embeddings from a pretrained large language model (LLM) and train them via offline reinforcement learning with as little as 20 minutes of randomly collected data. In experiments on a team of five real robots, these policies generalize well to unseen commands, which indicates an understanding of the LLM latent space. Our method requires no simulator or environment model and produces low-latency control policies that can be deployed directly to real robots without further fine-tuning. More information and videos of the experiments are available at https://sites.google.com/view/llm-marl.|\n", "2407.20157": "|**2024-07-29**|**rLLM: Relational Table Learning with LLMs**|Weichen Li 
et.al.|[2407.20157](http://arxiv.org/abs/2407.20157)|**[link](https://github.com/rllm-project/rllm)**|**We introduce rLLM (relationLLM), a PyTorch library designed for relational table learning (RTL) with large language models (LLMs). The core idea is to decompose state-of-the-art graph neural networks, LLMs, and table neural networks into standardized modules, so that novel RTL-type models can be built quickly in a simple 'combine, align, and co-train' manner. To illustrate the use of rLLM, we introduce a simple RTL method named BRIDGE. In addition, we present three new relational tabular datasets (TML1M, TLF2K, and TACM12K) obtained by enhancing classic datasets. We hope rLLM can serve as a useful and easy-to-use development framework for RTL-related tasks. Our code is available at https://github.com/rllm-project/rllm.**|\n", "2407.20143": "|**2024-07-29**|**ByteCheckpoint: A Unified Checkpointing System for LLM Development**|Borui Wan 
et.al.|[2407.20143](http://arxiv.org/abs/2407.20143)|null|Building real-world large language models (LLMs) requires checkpointing training states in persistent storage to guard against potential software and hardware failures, and to support checkpoint transfer within the training pipeline and across tasks. Because of the immense size of LLMs, saving and loading checkpoints often incurs intolerable minute-level stalls, which significantly reduces training efficiency. In addition, transferring checkpoints across tasks usually requires checkpoint resharding, i.e., loading checkpoints into a different parallel configuration according to the characteristics and resource quota of the specific task. Previous checkpointing systems assume consistent parallel configurations and fail to address the complexity of transforming checkpoints during resharding. Furthermore, on industrial platforms, developers create checkpoints from different training frameworks, each with its own storage and I/O logic, which makes unified checkpoint management and optimization harder. To address these challenges, we introduce ByteCheckpoint, a PyTorch-native, multi-framework LLM checkpointing system that supports automatic online checkpoint resharding. 
ByteCheckpoint adopts a storage architecture that separates data from metadata, decoupling checkpoint storage from the parallelism strategy and training framework in use. We design an efficient asynchronous tensor merging technique to resolve the irregular tensor sharding problem and propose several I/O performance optimizations that significantly improve the efficiency of checkpoint saving and loading. Experimental results show that, compared with baseline methods, ByteCheckpoint substantially reduces checkpoint saving costs (by up to 529.22x) and loading costs (by up to 3.51x).|\n", "2407.20053": "|**2024-07-29**|**Orca: Ocean Significant Wave Height Estimation with Spatio-temporally Aware Large Language Models**|Zhe Li 
et.al.|[2407.20053](http://arxiv.org/abs/2407.20053)|null|Significant wave height (SWH) is a key metric in marine science, and accurate SWH estimation is crucial for applications such as marine energy development, fisheries, and early warning systems for potential risks. Traditional SWH estimation methods based on numerical models and physical theories are limited by low computational efficiency. In recent years, machine learning has emerged as an appealing alternative for improving accuracy and reducing computation time. However, because observation technology is limited and costly, the scarcity of real-world data restricts the potential of machine learning models. To overcome these limitations, we propose an ocean SWH estimation framework named Orca. Specifically, Orca introduces a novel spatio-temporally aware encoding module that enhances the reasoning ability of classic language models in spatio-temporal settings with limited data. By segmenting the limited buoy observations in time, encoding the geographic locations of the buoys, and designing prompt templates, Orca draws on the strong generalization ability of large language models to estimate significant wave height effectively from limited data. 
Experimental results in the Gulf of Mexico show that Orca achieves state-of-the-art performance in SWH estimation.|\n", "2407.21018": "|**2024-07-30**|**ThinK: Thinner Key Cache by Query-Driven Pruning**|Yuhui Xu et.al.|[2407.21018](http://arxiv.org/abs/2407.21018)|null|Large language models (LLMs) have revolutionized natural language processing, achieving unprecedented performance by using larger model sizes and sequence lengths. However, the accompanying rise in computation and memory costs poses challenges, especially for cache memory management when processing long sequences, because of the quadratic complexity of the attention mechanism. This paper focuses on the long-context scenario and examines the inefficiency of KV cache memory consumption during inference. Unlike existing methods that optimize memory based on sequence length, we find that the channels of the KV cache exhibit significant redundancy, characterized by an uneven weight distribution and a low-rank structure. Based on these observations, we propose ThinK, a novel query-dependent KV cache pruning method that selectively prunes the least important channels while minimizing the loss in attention weights. 
Our approach not only maintains or improves model accuracy but also reduces memory costs by more than 20% compared with conventional KV cache eviction methods. Extensive evaluations of the LLaMA3 and Mistral models on several long-sequence datasets confirm the effectiveness of ThinK, setting a new standard for efficient LLM deployment without sacrificing performance. We also outline the possibility of extending our method to value cache pruning, which shows the broad applicability and potential of ThinK for reducing memory and computation overhead.|\n", "2407.21011": "|**2024-07-30**|**CLEFT: Language-Image Contrastive Learning with Efficient Large Language Model and Prompt Fine-Tuning**|Yuexi Du 
et.al.|[2407.21011](http://arxiv.org/abs/2407.21011)|**[link](https://github.com/xypb/cleft)**|**Recent advances in Contrastive Language-Image Pre-training (CLIP) have achieved notable success in self-supervised representation learning across many tasks. However, existing CLIP-like methods often demand extensive GPU resources and long training times because of the large size of the models and datasets, and large-scale datasets are not always available for medical applications. Meanwhile, the language model prompts are mostly derived manually from the labels attached to the images, which may overlook the rich information within the training samples. We propose CLEFT, a language-image contrastive learning method with an efficient large language model and prompt fine-tuning, which takes full advantage of extensively pre-trained language and visual models. In addition, we present an effective strategy for learning context-based prompts that narrows the gap between clinical diagnostic data and simple class labels. Our method outperforms various baselines on multiple chest X-ray and mammography datasets, achieving state-of-the-art performance. 
Compared with the current BERT encoder, the proposed parameter-efficient framework reduces the total trainable model size by 39% and reduces the trainable language model to only 4%.**|\n", "2407.20999": "|**2024-07-30**|**MoFO: Momentum-Filtered Optimizer for Mitigating Forgetting in LLM Fine-Tuning**|Yupeng Chen et.al.|[2407.20999](http://arxiv.org/abs/2407.20999)|null|Recently, large language models (LLMs) have demonstrated remarkable capabilities across a wide range of tasks. Typically, an LLM is pre-trained on a large corpus and subsequently fine-tuned on task-specific datasets. During fine-tuning, however, the LLM may forget knowledge acquired in the pre-training stage, which leads to a decline in general capabilities. To address this issue, we propose a new fine-tuning algorithm, the Momentum-Filtered Optimizer (MoFO). The key idea of MoFO is to iteratively select and update the model parameters with the largest momentum magnitudes. Compared with full-parameter training, MoFO achieves similar fine-tuning performance while keeping the parameters closer to the pre-trained model, thereby mitigating knowledge forgetting. Unlike most existing methods for mitigating forgetting, MoFO has the following two advantages. First, MoFO does not require access to pre-training data. 
This makes MoFO particularly suitable for fine-tuning scenarios where the pre-training data is unavailable, such as fine-tuning from the checkpoints of open-source LLMs. Second, MoFO does not alter the original loss function. This avoids harming the model's performance on the fine-tuning task. We validate MoFO through rigorous convergence analysis and extensive experiments, demonstrating its advantages in mitigating forgetting and improving fine-tuning performance.|\n", "2407.20990": "|**2024-07-30**|**From Feature Importance to Natural Language Explanations Using LLMs with RAG**|Sule Tekkesinoglu et.al.|[2407.20990](http://arxiv.org/abs/2407.20990)|**[link](https://github.com/suletekkesinoglu/xai_llm_rag)**|As machine learning plays an increasingly important role in autonomous decision-making processes that involve human interaction, understanding model outputs becomes ever more critical. Recently, foundation models have been explored as post-hoc explainers, offering a way to reveal the decision mechanisms of predictive models. This work introduces a traceable question-answering approach that uses an external knowledge repository to guide a large language model (LLM) in responding to user queries in a scene understanding task. The knowledge repository contains contextual details about the model output, including high-level features, feature importance, and alternative probabilities. 
We compute feature importance with subtractive counterfactual reasoning, a method that analyzes how the output changes after semantic features are decomposed. To keep the conversation flowing smoothly, we distill four key characteristics from social science research (social, causal, selective, and contrastive) and integrate them into a prompt that guides the response generation process. Our evaluation shows that the generated explanations contain these elements, which indicates the method's potential to bridge the gap between complex model outputs and natural language expressions.|\n", "2407.20970": "|**2024-07-30**|**Large Language Models (LLMs) for Semantic Communication in Edge-based IoT Networks**|Alakesh Kalita 
et.al.|[2407.20970](http://arxiv.org/abs/2407.20970)|null|With the rise of fifth-generation (5G) and sixth-generation (6G) communication technologies and the Internet of Things (IoT), semantic communication is attracting attention from researchers because current communication technologies are approaching the Shannon limit. Meanwhile, large language models (LLMs), which have billions of parameters and are trained on extensive datasets, can understand and generate human-like text. Considering recent near-source computing technologies such as edge computing, this article outlines a framework and its modules in which LLMs can be used at the network edge of IoT networks as part of semantic communication, in order to make communication more efficient. Finally, we discuss several applications and analyze the challenges and opportunities in developing such systems.|\n", "2407.20906": "|**2024-07-30**|**Automated Review Generation Method Based on Large Language Models**|Shican Wu 
et.al.|[2407.20906](http://arxiv.org/abs/2407.20906)|**[link](https://github.com/tju-ecat-ai/automaticreviewgeneration)**|**\u6587\u732e\u7814\u7a76\u5bf9\u4e8e\u79d1\u5b66\u8fdb\u6b65\u81f3\u5173\u91cd\u8981\uff0c\u4f46\u9762\u5bf9\u6d77\u91cf\u4fe1\u606f\u7684\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u81ea\u52a8\u5316\u7efc\u8ff0\u751f\u6210\u65b9\u6cd5\uff0c\u65e8\u5728\u7b80\u5316\u6587\u732e\u5904\u7406\u6d41\u7a0b\u5e76\u51cf\u8f7b\u8ba4\u77e5\u8d1f\u62c5\u3002\u4ee5\u4e19\u70f7\u8131\u6c22\uff08PDH\uff09\u50ac\u5316\u5242\u4e3a\u4f8b\uff0c\u8be5\u65b9\u6cd5\u4ece343\u7bc7\u6587\u7ae0\u4e2d\u8fc5\u901f\u751f\u6210\u4e86\u5168\u9762\u7684\u7efc\u8ff0\uff0c\u5e73\u5747\u6bcf\u7bc7\u6587\u7ae0\u6bcfLLM\u8d26\u6237\u8017\u65f6\u4ec5\u6570\u79d2\u3002\u5bf91041\u7bc7\u6587\u7ae0\u7684\u8fdb\u4e00\u6b65\u5206\u6790\u63ed\u793a\u4e86\u50ac\u5316\u5242\u7ec4\u6210\u3001\u7ed3\u6784\u548c\u6027\u80fd\u7684\u6df1\u5165\u89c1\u89e3\u3002 
\u8ba4\u8bc6\u5230LLM\u53ef\u80fd\u51fa\u73b0\u5e7b\u89c9\u7684\u95ee\u9898\uff0c\u6211\u4eec\u5b9e\u65bd\u4e86\u591a\u5c42\u6b21\u7684\u8d28\u91cf\u63a7\u5236\u7b56\u7565\uff0c\u786e\u4fdd\u4e86\u65b9\u6cd5\u7684\u53ef\u9760\u6027\u548c\u6709\u6548\u7f13\u89e3\u5e7b\u89c9\u7684\u80fd\u529b\u3002\u4e13\u5bb6\u9a8c\u8bc1\u8bc1\u5b9e\uff0c\u901a\u8fc7\u8fd9\u79cd\u65b9\u6cd5\u751f\u6210\u7684\u7efc\u8ff0\u4e0d\u4ec5\u51c6\u786e\u4e14\u5f15\u6587\u5b8c\u6574\uff0cLLM\u5e7b\u89c9\u7684\u98ce\u9669\u5df2\u964d\u81f3\u4f4e\u4e8e0.5%\uff0c\u7f6e\u4fe1\u5ea6\u8d85\u8fc795%\u3002\u53d1\u5e03\u7684Windows\u5e94\u7528\u7a0b\u5e8f\u652f\u6301\u4e00\u952e\u751f\u6210\u7efc\u8ff0\uff0c\u5e2e\u52a9\u7814\u7a76\u4eba\u5458\u8ddf\u8e2a\u6700\u65b0\u8fdb\u5c55\u5e76\u63a8\u8350\u76f8\u5173\u6587\u732e\u3002\u8fd9\u4e00\u65b9\u6cd5\u5c55\u793a\u4e86LLM\u5728\u63d0\u5347\u79d1\u5b66\u7814\u7a76\u751f\u4ea7\u529b\u65b9\u9762\u7684\u6f5c\u529b\uff0c\u5e76\u4e3a\u8fdb\u4e00\u6b65\u63a2\u7d22\u5960\u5b9a\u4e86\u57fa\u7840\u3002**|\n", "2407.20898": "|**2024-07-30**|**ThinkRepair: Self-Directed Automated Program Repair**|Xin Yin 
et.al.|[2407.20898](http://arxiv.org/abs/2407.20898)|**[link](https://github.com/vinci-grape/ThinkRepair)**|**\u5c3d\u7ba1\u5df2\u7ecf\u63d0\u51fa\u4e86\u8bb8\u591a\u81ea\u52a8\u7a0b\u5e8f\u4fee\u590d\uff08APR\uff09\u65b9\u6cd5\uff0c\u5e76\u4e14\u5728\u4fee\u590d\u4e00\u4e9b\u7279\u5b9a\u7c7b\u578b\u7684\u9519\u8bef\u65f6\u53d6\u5f97\u4e86\u663e\u8457\u7684\u6027\u80fd\uff0c\u4f46\u8fd9\u4e9b\u65b9\u6cd5\u5728\u5904\u7406\u9700\u8981\u5bf9\u9519\u8bef\u7a0b\u5e8f\u7684\u903b\u8f91\u8fdb\u884c\u5206\u6790\u548c\u63a8\u7406\u7684\u590d\u6742\u9519\u8bef\u65f6\u4ecd\u5b58\u5728\u5c40\u9650\u6027\u3002\u6700\u8fd1\uff0c\u901a\u8fc7\u63d0\u793a\u5de5\u7a0b\u8bad\u7ec3\u7684\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u56e0\u5176\u5728\u89e3\u51b3\u5305\u62ec\u9519\u8bef\u4fee\u590d\u5728\u5185\u7684\u591a\u79cd\u4efb\u52a1\u7684\u5f3a\u5927\u80fd\u529b\u800c\u5f15\u8d77\u4e86\u5e7f\u6cdb\u5173\u6ce8\u3002\u7136\u800c\uff0c\u63d0\u793a\u7684\u8d28\u91cf\u4f1a\u6781\u5927\u5730\u5f71\u54cdLLMs\u7684\u80fd\u529b\uff0c\u800c\u624b\u52a8\u6784\u5efa\u9ad8\u8d28\u91cf\u7684\u63d0\u793a\u662f\u4e00\u4e2a\u8017\u65f6\u7684\u8fc7\u7a0b\u3002 
\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e00\u9650\u5236\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u81ea\u6211\u5bfc\u5411\u7684LLM\u57fa\u4e8e\u81ea\u52a8\u7a0b\u5e8f\u4fee\u590d\u65b9\u6cd5ThinkRepair\uff0c\u5b83\u5206\u4e3a\u4e24\u4e2a\u4e3b\u8981\u9636\u6bb5\uff1a\u6536\u96c6\u9636\u6bb5\u548c\u4fee\u590d\u9636\u6bb5\u3002\u5728\u6536\u96c6\u9636\u6bb5\uff0c\u901a\u8fc7\u4f7f\u7528\u94fe\u5f0f\u601d\u8003\uff08CoT\uff09\u63d0\u793a\u6307\u5bfcLLMs\uff0c\u81ea\u52a8\u6536\u96c6\u6784\u6210\u9884\u4fee\u590d\u77e5\u8bc6\u7684\u5404\u79cd\u601d\u8003\u94fe\u3002\u5728\u4fee\u590d\u9636\u6bb5\uff0c\u76ee\u6807\u662f\u901a\u8fc7\u9996\u5148\u9009\u62e9\u7528\u4e8e\u5c11\u91cf\u5b66\u4e60\u7684\u793a\u4f8b\u5e76\u5176\u6b21\u4e0eLLMs\u81ea\u52a8\u4ea4\u4e92\u6765\u4fee\u590d\u9519\u8bef\uff0c\u6839\u636e\u6d4b\u8bd5\u4fe1\u606f\u63d0\u4f9b\u53cd\u9988\uff08\u5982\u679c\u9700\u8981\u7684\u8bdd\uff09\u3002 \u5728\u5bf9\u4e24\u4e2a\u5e7f\u6cdb\u7814\u7a76\u7684\u6570\u636e\u96c6\uff08Defects4J\u548cQuixBugs\uff09\u7684\u8bc4\u4f30\u4e2d\uff0c\u4e0e12\u4e2a\u6700\u5148\u8fdb\u7684APR\u65b9\u6cd5\u8fdb\u884c\u6bd4\u8f83\uff0c\u8868\u660eThinkRepair\u5728\u4fee\u590d\u9519\u8bef\u65b9\u9762\u7684\u4f18\u5148\u7ea7\u3002\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u5728Defects4J V1.2\u4e0a\uff0cThinkRepair\u6210\u529f\u4fee\u590d\u4e8698\u4e2a\u9519\u8bef\uff0c\u76f8\u8f83\u4e8e\u57fa\u7ebf\u63d0\u5347\u4e8627%-344.4%\u3002\u5728Defects4J V2.0\u4e0a\uff0cThinkRepair\u6bd4\u6700\u5148\u8fdb\u7684APR\u65b9\u6cd5\u591a\u4fee\u590d\u4e8612-65\u4e2a\u9519\u8bef\u3002\u6b64\u5916\uff0c\u5728Java\u548cPython\u4e0a\uff0cThinkRepair\u5728QuixBugs\u4e0a\u7684\u8868\u73b0\u4e5f\u6709\u4e86\u663e\u8457\u63d0\u5347\uff08\u6700\u591a\u5206\u522b\u8fbe\u523031\u548c21\uff09\u3002**|\n", "2407.20884": "|**2024-07-30**|**Effective Black Box Testing of Sentiment Analysis Classification Networks**|Parsa Karbasizadeh 
et.al.|[2407.20884](http://arxiv.org/abs/2407.20884)|null|\u57fa\u4e8e\u8f6c\u6362\u5668\u7684\u795e\u7ecf\u7f51\u7edc\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\u5982\u60c5\u611f\u5206\u6790\u4e2d\u5c55\u73b0\u4e86\u5353\u8d8a\u6027\u80fd\u3002\u7136\u800c\uff0c\u786e\u4fdd\u8fd9\u4e9b\u590d\u6742\u67b6\u6784\u901a\u8fc7\u5168\u9762\u6d4b\u8bd5\u4fdd\u6301\u53ef\u9760\u6027\u7684\u6311\u6218\u4f9d\u7136\u5b58\u5728\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u7ec4\u4e13\u95e8\u8bbe\u8ba1\u7528\u4e8e\u8bc4\u4f30\u4e3a\u57fa\u4e8e\u8f6c\u6362\u5668\u7684\u60c5\u611f\u5206\u6790\u7f51\u7edc\u6784\u5efa\u7684\u6d4b\u8bd5\u5957\u4ef6\u7684\u8986\u76d6\u6807\u51c6\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u91c7\u7528\u8f93\u5165\u7a7a\u95f4\u5212\u5206\u7684\u9ed1\u76d2\u7b56\u7565\uff0c\u8003\u8651\u4e86\u4e0e\u60c5\u611f\u76f8\u5173\u7684\u5173\u952e\u8bed\u8a00\u7279\u5f81\uff0c\u5305\u62ec\u52a8\u8bcd\u3001\u5f62\u5bb9\u8bcd\u3001\u526f\u8bcd\u548c\u540d\u8bcd\u3002\u4e3a\u4e86\u6709\u6548\u5730\u751f\u6210\u6db5\u76d6\u5e7f\u6cdb\u60c5\u611f\u5143\u7d20\u7684\u6d4b\u8bd5\u7528\u4f8b\uff0c\u6211\u4eec\u91c7\u7528\u4e86k\u6295\u5f71\u8986\u76d6\u5ea6\u91cf\u3002\u8be5\u5ea6\u91cf\u901a\u8fc7\u4e00\u6b21\u68c0\u67e5k\u4e2a\u7279\u5f81\u7684\u5b50\u96c6\u6765\u51cf\u5c11\u95ee\u9898\u7684\u590d\u6742\u6027\uff0c\u4ece\u800c\u964d\u4f4e\u7ef4\u5ea6\u3002\u901a\u8fc7\u5927\u578b\u8bed\u8a00\u6a21\u578b\u751f\u6210\u5c55\u793a\u7279\u5b9a\u60c5\u611f\u7279\u5f81\u7ec4\u5408\u7684\u53e5\u5b50\u3002\u4ece\u60c5\u611f\u5206\u6790\u6570\u636e\u96c6\u5b9e\u9a8c\u4e2d\u83b7\u5f97\u7684\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u6807\u51c6\u548c\u751f\u6210\u7684\u6d4b\u8bd5\u5e73\u5747\u63d0\u9ad8\u4e8616%\u7684\u6d4b\u8bd5\u8986\u76d6\u7387\u3002\u540c\u65f6\uff0c\u6a21\u578b\u51c6\u786e\u5ea6\u5e73\u5747\u4e0b\u964d\u4e866.5%\uff0c\u663e\u793a\u4e86\u8bc6\u522b\u8106\u5f31\u6027\u7684\u80fd\u529b\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u4e3a\u901a\u8fc7\u5168\u9762\u6d4b\
u8bd5\u8bc4\u4f30\u6539\u8fdb\u57fa\u4e8e\u8f6c\u6362\u5668\u7684\u60c5\u611f\u5206\u6790\u7cfb\u7edf\u63d0\u4f9b\u4e86\u57fa\u7840\u3002|\n", "2407.20859": "|**2024-07-30**|**Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification**|Boyang Zhang et.al.|[2407.20859](http://arxiv.org/abs/2407.20859)|null|\u8fd1\u671f\uff0c\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u81ea\u4e3b\u4ee3\u7406\u5728\u7406\u8bba\u7814\u7a76\u548c\u5b9e\u9645\u5e94\u7528\u4e0a\u5747\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u5c55\u3002\u8fd9\u4e9b\u4ee3\u7406\u80fd\u591f\u901a\u8fc7\u5916\u90e8\u7ec4\u4ef6\u6269\u5c55\u57fa\u7840LLM\u7684\u80fd\u529b\uff0c\u5728\u591a\u79cd\u65b9\u5f0f\u4e0b\u589e\u5f3a\u6027\u80fd\u3002\u4f8b\u5982\uff0c\u5229\u7528GPT-3.5-Turbo\u6838\u5fc3\u6784\u5efa\u7684\u4ee3\u7406\u53ef\u80fd\u5728\u67d0\u4e9b\u4efb\u52a1\u4e0a\u8d85\u8d8a\u66f4\u5148\u8fdb\u7684GPT-4\u6a21\u578b\u3002\u66f4\u91cd\u8981\u7684\u662f\uff0c\u5de5\u5177\u7684\u5e94\u7528\u4f7f\u7cfb\u7edf\u80fd\u591f\u4e0e\u73b0\u5b9e\u4e16\u754c\u4e92\u52a8\uff0c\u4f7f\u5176\u4ece\u4ec5\u4ec5\u751f\u6210\u6587\u672c\u8f6c\u53d8\u4e3a\u6267\u884c\u5b9e\u9645\u64cd\u4f5c\u3002\u9274\u4e8e\u4ee3\u7406\u7684\u5b9e\u9645\u5e94\u7528\u8303\u56f4\u4ee5\u53ca\u5176\u5bf9\u73af\u5883\u8fdb\u884c\u64cd\u4f5c\u7684\u80fd\u529b\uff0c\u8bc4\u4f30\u6f5c\u5728\u6f0f\u6d1e\u53d8\u5f97\u81f3\u5173\u91cd\u8981\u3002\u5982\u679c\u88ab\u9ed1\u5ba2\u5165\u4fb5\uff0c\u8fd9\u4e9b\u81ea\u4e3b\u7cfb\u7edf\u9020\u6210\u7684\u635f\u5bb3\u53ef\u80fd\u4f1a\u8d85\u8fc7\u5355\u4e00\u8bed\u8a00\u6a21\u578b\u3002\u5c3d\u7ba1\u5df2\u6709\u7814\u7a76\u63a2\u8ba8\u4e86LLM\u4ee3\u7406\u7684\u6709\u5bb3\u884c\u4e3a\uff0c\u4f46\u6211\u4eec\u7684\u7814\u7a76\u4ece\u4e0d\u540c\u89d2\u5ea6\u5ba1\u89c6\u8fd9\u4e00\u95ee\u9898\u3002\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u65b0\u578b\u653b\u51fb\u65b9\u6cd5\uff0c\u65e8\u5728\u8bef\u5bfc\u4ee3\u7406\u6267\u884c\u91cd\u590d\u6216\u65e0\u5173\u7684
\u64cd\u4f5c\uff0c\u4ece\u800c\u5f15\u53d1\u6545\u969c\u3002\u6211\u4eec\u4f7f\u7528\u5404\u79cd\u653b\u51fb\u624b\u6bb5\u3001\u573a\u666f\u548c\u5c5e\u6027\u8fdb\u884c\u5168\u9762\u8bc4\u4f30\uff0c\u4ee5\u786e\u5b9a\u5176\u6613\u611f\u6027\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u5728\u591a\u4e2a\u573a\u666f\u4e2d\uff0c\u8fd9\u4e9b\u653b\u51fb\u53ef\u5bfc\u81f4\u8d85\u8fc780%\u7684\u5931\u8d25\u7387\u3002\u901a\u8fc7\u5728\u591a\u4ee3\u7406\u73af\u5883\u4e2d\u9488\u5bf9\u5b9e\u73b0\u5e76\u90e8\u7f72\u7684\u4ee3\u7406\u8fdb\u884c\u653b\u51fb\uff0c\u6211\u4eec\u5f3a\u8c03\u4e86\u6b64\u7c7b\u6f0f\u6d1e\u6240\u4f34\u968f\u7684\u73b0\u5b9e\u98ce\u9669\u3002\u4e3a\u4e86\u51cf\u8f7b\u6b64\u7c7b\u653b\u51fb\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u81ea\u6211\u68c0\u67e5\u68c0\u6d4b\u65b9\u6cd5\u3002\u7136\u800c\uff0c\u6211\u4eec\u7684\u53d1\u73b0\u663e\u793a\uff0c\u4ec5\u4f7f\u7528LLM\u5f88\u96be\u6709\u6548\u68c0\u6d4b\u5230\u8fd9\u4e9b\u653b\u51fb\uff0c\u8fd9\u51f8\u663e\u4e86\u8fd9\u79cd\u6f0f\u6d1e\u6240\u5e26\u6765\u7684\u91cd\u5927\u98ce\u9669\u3002|\n", "2407.20856": "|**2024-07-30**|**Learn by Selling: Equipping Large Language Models with Product Knowledge for Context-Driven Recommendations**|Sarthak Anand et.al.|[2407.20856](http://arxiv.org/abs/2407.20856)|null|\u5feb\u901f\u53d1\u5c55\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b(Large Language Models, 
LLMs)\u4e3a\u57fa\u4e8e\u4e0a\u4e0b\u6587\u7684\u4ea7\u54c1\u63a8\u8350\u5e94\u7528\u63d0\u4f9b\u4e86\u65b0\u7684\u53ef\u80fd\u6027\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u6a21\u578b\u5728\u8fd9\u4e00\u9886\u57df\u7684\u6709\u6548\u6027\u9ad8\u5ea6\u4f9d\u8d56\u4e8e\u5b83\u4eec\u5bf9\u4ea7\u54c1\u5e93\u5b58\u7684\u5168\u9762\u7406\u89e3\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\u6765\u589e\u5f3aLLMs\u7684\u4ea7\u54c1\u77e5\u8bc6\u80fd\u529b\uff0c\u901a\u8fc7\u8bad\u7ec3\u5b83\u4eec\u54cd\u5e94\u5305\u542b\u4ea7\u54c1ID\u7684\u5408\u6210\u641c\u7d22\u67e5\u8be2\uff0c\u4ee5\u8fdb\u884c\u4e0a\u4e0b\u6587\u76f8\u5173\u56de\u590d\u3002\u6211\u4eec\u6df1\u5165\u5206\u6790\u4e86\u8fd9\u79cd\u65b9\u6cd5\uff0c\u8bc4\u4f30\u5176\u6548\u679c\uff0c\u6982\u8ff0\u5176\u4f18\u70b9\uff0c\u5e76\u6307\u51fa\u4e86\u9650\u5236\u56e0\u7d20\u3002\u6587\u7ae0\u8fd8\u8ba8\u8bba\u4e86\u6b64\u65b9\u6cd5\u7684\u6539\u8fdb\u6f5c\u529b\u548c\u672a\u6765\u65b9\u5411\uff0c\u63d0\u4f9b\u4e86\u5bf9LLMs\u5728\u4ea7\u54c1\u63a8\u8350\u4e2d\u89d2\u8272\u7684\u5168\u9762\u7406\u89e3\u3002|\n", "2407.21771": "|**2024-07-31**|**Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs**|Shi Liu 
et.al.|[2407.21771](http://arxiv.org/abs/2407.21771)|null|\u73b0\u6709\u5927\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08LVLM\uff09\u4e3b\u8981\u901a\u8fc7\u5c06\u89c6\u89c9\u7f16\u7801\u5668\u7684\u56fe\u50cf\u7279\u5f81\u4e0e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5bf9\u9f50\uff0c\u5229\u7528\u5176\u5f3a\u5927\u7684\u6587\u672c\u751f\u6210\u80fd\u529b\u3002\u7136\u800c\uff0c\u89c6\u89c9\u7f16\u7801\u5668\u4e0e\u8bed\u8a00\u6a21\u578b\u4e4b\u95f4\u7684\u89c4\u6a21\u5dee\u5f02\u53ef\u80fd\u5bfc\u81f4LLM\u5728\u591a\u6a21\u6001\u7406\u89e3\u4e2d\u5360\u636e\u4e3b\u5bfc\u5730\u4f4d\u3002\u8fd9\u79cdLVLM\u4e2d\u7684\u4e0d\u5e73\u8861\u53ef\u80fd\u5f15\u53d1\u5e7b\u89c9\u73b0\u8c61\u3002\u5177\u4f53\u6765\u8bf4\uff0cLVLM\u53ef\u80fd\u751f\u6210\u4e00\u81f4\u7684\u63cf\u8ff0\uff0c\u65e0\u8bba\u662f\u5426\u6709\u89c6\u89c9\u8f93\u5165\uff0c\u8fd9\u8868\u660e\u67d0\u4e9b\u8f93\u51fa\u4ec5\u53d7\u4e0a\u4e0b\u6587\u6587\u672c\u7684\u5f71\u54cd\u3002\u6211\u4eec\u5c06\u8fd9\u79cd\u73b0\u8c61\u79f0\u4e3a\u201c\u6587\u672c\u60ef\u6027\u201d\u3002\u4e3a\u4e86\u5bf9\u6297\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65e0\u9700\u8bad\u7ec3\u7684\u7b97\u6cd5\u6765\u5bfb\u627e\u56fe\u50cf\u7406\u89e3\u548c\u8bed\u8a00\u63a8\u65ad\u4e4b\u95f4\u7684\u5e73\u8861\u70b9\u3002\u5177\u4f53\u5730\uff0c\u6211\u4eec\u52a8\u6001\u8c03\u6574\u5e76\u653e\u5927\u5206\u914d\u7ed9\u56fe\u50cf\u4ee4\u724c\u7684\u6ce8\u610f\u529b\u6743\u91cd\uff0c\u4ece\u800c\u8d4b\u4e88\u89c6\u89c9\u5143\u7d20\u66f4\u5927\u7684\u91cd\u8981\u6027\u3002\u540c\u65f6\uff0c\u6211\u4eec\u4ece\u591a\u6a21\u6001\u8f93\u5165\u7684logits\u4e2d\u51cf\u53bb\u7eaf\u6587\u672c\u8f93\u5165\u7684logits\uff0c\u6709\u52a9\u4e8eLVLM\u907f\u514d\u8fc7\u5206\u4f9d\u8d56LLM\u3002\u901a\u8fc7\u589e\u5f3a\u56fe\u50cf\u4ee4\u724c\u5e76\u51cf\u5c11LLM\u7684\u987d\u56fa\u8f93\u51fa\uff0c\u6211\u4eec\u53ef\u4ee5\u8ba9LVLM\u66f4\u591a\u5730\u5173\u6ce8\u56fe\u50cf\uff0c\u4ece\u800c\u7f13\u89e3\u6587\u672c\u
60ef\u6027\u548c\u51cf\u5c11LVLM\u4e2d\u7684\u5e7b\u89c9\u3002\u6211\u4eec\u7684\u5e7f\u6cdb\u5b9e\u9a8c\u663e\u793a\uff0c\u5728\u4e0d\u540c\u6307\u6807\u4e0b\uff0c\u8fd9\u79cd\u65b9\u6cd5\u663e\u8457\u51cf\u5c11\u4e86\u5404\u79cdLVLM\u4e2d\u7684\u5e7b\u89c9\u8f93\u51fa\u9891\u7387\u3002\u9879\u76ee\u9875\u9762\u53ef\u8bbf\u95ee\uff1ahttps://lalbj.github.io/projects/PAI/\u3002|\n", "2407.21762": "|**2024-07-31**|**ReplanVLM: Replanning Robotic Tasks with Visual Language Models**|Aoran Mei et.al.|[2407.21762](http://arxiv.org/abs/2407.21762)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u673a\u5668\u4eba\u4efb\u52a1\u89c4\u5212\u9886\u57df\u83b7\u5f97\u4e86\u8d8a\u6765\u8d8a\u591a\u7684\u5173\u6ce8\uff0c\u8fd9\u4e3b\u8981\u5f97\u76ca\u4e8e\u5b83\u4eec\u5728\u6587\u672c\u5206\u6790\u4e0e\u751f\u6210\u3001\u4ee5\u53ca\u5bf9\u4e16\u754c\u5e7f\u6cdb\u77e5\u8bc6\u65b9\u9762\u7684\u51fa\u8272\u80fd\u529b\u3002\u7136\u800c\uff0c\u5b83\u4eec\u5728\u89e3\u6790\u89c6\u89c9\u7ebf\u7d22\u65b9\u9762\u7684\u80fd\u529b\u6709\u9650\uff0c\u65e0\u6cd5\u76f4\u63a5\u611f\u77e5\u4e16\u754c\u72b6\u6001\uff0c\u8fd9\u5bfc\u81f4\u4e86\u5728\u63cf\u8ff0\u5f53\u524d\u4e16\u754c\u72b6\u6001\u4e0a\u7684\u4e0d\u8db3\u3002\u76f8\u6bd4\u4e4b\u4e0b\uff0c\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08VLM\uff09\u901a\u8fc7\u96c6\u6210\u89c6\u89c9\u611f\u77e5\u6a21\u5757\uff0c\u586b\u8865\u4e86\u8fd9\u4e00\u7a7a\u767d\uff0c\u589e\u5f3a\u4e86\u673a\u5668\u4eba\u7684\u81ea\u4e3b\u6027\u3002\u5c3d\u7ba1\u5982\u6b64\uff0cVLM\u4ecd\u9762\u4e34\u6311\u6218\uff0c\u4f8b\u5982\uff0c\u5728\u63d0\u4f9b\u51c6\u786e\u6307\u4ee4\u7684\u60c5\u51b5\u4e0b\uff0c\u4efb\u52a1\u6267\u884c\u9519\u8bef\u7684\u98ce\u9669\u4f9d\u7136\u5b58\u5728\u3002 
\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u95ee\u9898\uff0c\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u7528\u4e8e\u673a\u5668\u4eba\u4efb\u52a1\u89c4\u5212\u7684ReplanVLM\u6846\u67b6\u3002\u8be5\u7814\u7a76\u91cd\u70b9\u5728\u4e8e\u9519\u8bef\u4fee\u6b63\u5e72\u9884\u63aa\u65bd\u3002\u63d0\u51fa\u4e86\u5185\u90e8\u9519\u8bef\u4fee\u6b63\u673a\u5236\u548c\u5916\u90e8\u9519\u8bef\u4fee\u6b63\u673a\u5236\uff0c\u5728\u76f8\u5e94\u7684\u9636\u6bb5\u8fdb\u884c\u9519\u8bef\u7ea0\u6b63\u3002\u53d1\u5c55\u4e86\u4e00\u79cd\u91cd\u89c4\u5212\u7b56\u7565\uff0c\u5f53\u4efb\u52a1\u6267\u884c\u5931\u8d25\u65f6\uff0c\u7528\u4e8e\u91cd\u65b0\u89c4\u5212\u4efb\u52a1\u6216\u4fee\u6b63\u9519\u8bef\u4ee3\u7801\u3002\u5728\u771f\u5b9e\u673a\u5668\u4eba\u548c\u4eff\u771f\u73af\u5883\u4e2d\u8fdb\u884c\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6240\u63d0\u51fa\u7684\u6846\u67b6\u5177\u6709\u66f4\u9ad8\u7684\u6210\u529f\u7387\u548c\u66f4\u5f3a\u7684\u5f00\u653e\u4e16\u754c\u4efb\u52a1\u4e2d\u7684\u9519\u8bef\u4fee\u6b63\u80fd\u529b\u3002\u6709\u5173\u5b9e\u9a8c\u7684\u89c6\u9891\u53ef\u4ee5\u5728https://youtu.be/NPk2pWKazJc\u627e\u5230\u3002|\n", "2407.21712": "|**2024-07-31**|**Adaptive Retrieval-Augmented Generation for Conversational Systems**|Xi Wang 
et.al.|[2407.21712](http://arxiv.org/abs/2407.21712)|null|\u5c3d\u7ba1\u5728\u5bf9\u8bdd\u7cfb\u7edf\u5f00\u53d1\u4e2d\u878d\u5165\u5927\u578b\u8bed\u8a00\u6a21\u578b\u53d6\u5f97\u4e86\u6210\u529f\uff0c\u4f46\u8bb8\u591a\u7814\u7a76\u663e\u793a\u4e86\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u5bf9\u4e8e\u63d0\u4f9b\u4fe1\u606f\u6027\u54cd\u5e94\u7684\u6709\u6548\u6027\u3002\u56e0\u6b64\uff0c\u73b0\u6709\u7814\u7a76\u901a\u5e38\u5047\u8bbe\u5bf9\u8bdd\u7cfb\u7edf\u4e2d\u7684\u6bcf\u6b21\u56de\u590d\u90fd\u9700\u8981\u68c0\u7d22\u589e\u5f3a\uff0c\u800c\u65e0\u9700\u660e\u786e\u63a7\u5236\u3002\u8fd9\u5f15\u53d1\u4e86\u4e00\u4e2a\u5173\u4e8e\u8fd9\u79cd\u5fc5\u8981\u6027\u7684\u7814\u7a76\u95ee\u9898\u3002\u672c\u7814\u7a76\u65e8\u5728\u63a2\u7d22\u7cfb\u7edf\u56de\u5e94\u662f\u5426\u9700\u8981\u4f7f\u7528\u5916\u90e8\u77e5\u8bc6\u8fdb\u884c\u589e\u5f3a\u7684\u5fc5\u8981\u6027\u3002\u901a\u8fc7\u5229\u7528\u4eba\u7c7b\u5bf9\u662f\u5426\u9700\u8981\u9002\u5e94\u6027\u589e\u5f3a\u7684\u4e8c\u5143\u9009\u62e9\u8fdb\u884c\u5224\u65ad\uff0c\u6211\u4eec\u5f00\u53d1\u4e86RAGate\u2014\u2014\u4e00\u4e2a\u95f8\u95e8\u6a21\u578b\uff0c\u8be5\u6a21\u578b\u901a\u8fc7\u5206\u6790\u5bf9\u8bdd\u4e0a\u4e0b\u6587\u548c\u76f8\u5173\u8f93\u5165\u6765\u9884\u6d4b\u5bf9\u8bdd\u7cfb\u7edf\u662f\u5426\u9700\u8981RAG\u4ee5\u83b7\u5f97\u6539\u8fdb\u7684\u56de\u590d\u3002\u6211\u4eec\u5728\u6784\u5efa\u548c\u5e94\u7528RAGate\u5230\u5bf9\u8bdd\u6a21\u578b\u4ee5\u53ca\u5bf9\u4e0d\u540c\u5bf9\u8bdd\u573a\u666f\u8fdb\u884c\u8be6\u5c3d\u5206\u6790\u65b9\u9762\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u7ed3\u679c\u548c\u5206\u6790\u8868\u660e\uff0cRAGate\u5728\u8bc6\u522b\u9700\u8981RAG\u4ee5\u751f\u6210\u9ad8\u8d28\u91cf\u56de\u590d\u5e76\u5177\u6709\u9ad8\u751f\u6210\u7f6e\u4fe1\u5ea6\u7684\u7cfb\u7edf\u54cd\u5e94\u65b9\u9762\u6709\u6709\u6548\u5e94\u7528\u3002\u8fd9\u9879\u7814\u7a76\u8fd8\u53d1\u73b0\u4e86\u751f\u6210\u7f6e\u4fe1\u5ea6\u6c34\u5e73\
u4e0e\u589e\u5f3a\u77e5\u8bc6\u7684\u76f8\u5173\u6027\u3002|\n", "2407.21708": "|**2024-07-31**|**CEAR: Automatic construction of a knowledge graph of chemical entities and roles from scientific literature**|Stefan Langer et.al.|[2407.21708](http://arxiv.org/abs/2407.21708)|null|\u672c\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u79cd\u65b9\u6cd5\uff0c\u65e8\u5728\u901a\u8fc7\u5229\u7528\u5df2\u6807\u6ce8\u6587\u672c\u8bed\u6599\u5e93\u548c\u4eceChebi\u83b7\u53d6\u7684\u77e5\u8bc6\uff0c\u589e\u5f3a\u73b0\u6709\u77e5\u8bc6\uff0c\u5e76\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u8fdb\u884c\u5fae\u8c03\uff0c\u4ee5\u8bc6\u522b\u79d1\u5b66\u6587\u732e\u4e2d\u7684\u5316\u5b66\u5b9e\u4f53\u53ca\u5176\u4f5c\u7528\u3002\u5b9e\u9a8c\u7ed3\u679c\u8bc1\u660e\u4e86\u8fd9\u79cd\u65b9\u6cd5\u7684\u6709\u6548\u6027\u3002\u901a\u8fc7\u7ed3\u5408\u672c\u4f53\u8bba\u77e5\u8bc6\u4e0eLLM\u7684\u8bed\u8a00\u7406\u89e3\u80fd\u529b\uff0c\u6211\u4eec\u5b9e\u73b0\u4e86\u5728\u79d1\u5b66\u6587\u732e\u4e2d\u8bc6\u522b\u5316\u5b66\u5b9e\u4f53\u53ca\u5176\u4f5c\u7528\u7684\u9ad8\u7cbe\u786e\u5ea6\u548c\u53ec\u56de\u7387\u3002\u8fdb\u4e00\u6b65\u5730\uff0c\u6211\u4eec\u4ece8000\u7bc7ChemRxiv\u6587\u7ae0\u4e2d\u63d0\u53d6\u8fd9\u4e9b\u5b9e\u4f53\u548c\u89d2\u8272\uff0c\u7136\u540e\u4f7f\u7528\u7b2c\u4e8c\u4e2aLLM\u6784\u5efa\u4e86\u4e00\u4e2a\u5316\u5b66\u5b9e\u4f53\u548c\u89d2\u8272\u7684\u77e5\u8bc6\u56fe\u8c31\uff08CEAR\uff09\uff0c\u8be5\u56fe\u8c31\u4e0d\u4ec5\u4e3aChEBI\u63d0\u4f9b\u4e86\u8865\u5145\u4fe1\u606f\uff0c\u8fd8\u80fd\u5e2e\u52a9\u6269\u5c55\u5176\u5185\u5bb9\u3002|\n", "2407.21693": "|**2024-07-31**|**TransferTOD: A Generalizable Chinese Multi-Domain Task-Oriented Dialogue System with Transfer Capabilities**|Ming Zhang 
et.al.|[2407.21693](http://arxiv.org/abs/2407.21693)|**[link](https://github.com/konglonggefdu/transfertod)**|\u4efb\u52a1\u5bfc\u5411\u5bf9\u8bdd\uff08TOD\uff09\u7cfb\u7edf\u65e8\u5728\u6709\u6548\u5904\u7406\u4efb\u52a1\u5bfc\u5411\u7684\u5bf9\u8bdd\uff0c\u5305\u62ec\u4fe1\u606f\u6536\u96c6\u3002\u5982\u4f55\u51c6\u786e\u3001\u9ad8\u6548\u4e14\u6709\u6548\u5730\u5229\u7528TOD\u8fdb\u884c\u4fe1\u606f\u6536\u96c6\u4e00\u76f4\u4ee5\u6765\u90fd\u662f\u4e00\u4e2a\u5173\u952e\u4e14\u5177\u6709\u6311\u6218\u6027\u7684\u4efb\u52a1\u3002\u8fd1\u671f\u7684\u7814\u7a76\u8868\u660e\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5bf9\u8bdd\u3001\u6307\u4ee4\u751f\u6210\u548c\u63a8\u7406\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u5e76\u80fd\u591f\u901a\u8fc7\u5fae\u8c03\u663e\u8457\u63d0\u9ad8TOD\u6027\u80fd\u3002\u7136\u800c\uff0c\u5f53\u524d\u7684\u6570\u636e\u96c6\u4e3b\u8981\u9488\u5bf9\u7528\u6237\u9a71\u52a8\u7684\u7cfb\u7edf\uff0c\u5e76\u5c40\u9650\u4e8e\u9884\u5b9a\u4e49\u7684\u7279\u5b9a\u573a\u666f\u548c\u69fd\u4f4d\uff0c\u56e0\u6b64\u9700\u8981\u5728TOD\u7684\u4e3b\u52a8\u6027\u3001\u591a\u6837\u6027\u548c\u80fd\u529b\u65b9\u9762\u8fdb\u884c\u6539\u8fdb\u3002\u672c\u7814\u7a76\u4ecb\u7ecd\u4e86\u4e00\u4e2a\u591a\u9886\u57df\u4efb\u52a1\u5bfc\u5411\u5bf9\u8bdd\u6570\u636e\u6784\u5efa\u8fc7\u7a0b\u4ee5\u53ca\u57fa\u4e8e\u6b64\u8fc7\u7a0b\u751f\u6210\u7684\u4e2d\u6587\u5bf9\u8bdd\u6570\u636e\u96c6\u2014\u2014TransferTOD\uff0c\u8be5\u6570\u636e\u96c6\u771f\u5b9e\u6a21\u62df\u4e86\u572830\u4e2a\u6d41\u884c\u751f\u6d3b\u670d\u52a1\u573a\u666f\u4e2d\u7684\u4eba\u673a\u5bf9\u8bdd\u3002\u5229\u7528\u8fd9\u4e2a\u6570\u636e\u96c6\uff0c\u6211\u4eec\u8bad\u7ec3\u4e86\u4e00\u4e2a\u4f7f\u7528\u5168\u53c2\u6570\u5fae\u8c03\u7684TransferTOD-7B\u6a21\u578b\uff0c\u5c55\u793a\u4e86\u5728\u5404\u79cd\u4e0b\u6e38\u573a\u666f\u4e2d\u7684\u663e\u8457\u7684\u586b\u69fd\u80fd\u529b\u548c\u63d0\u95ee\u80fd\u529b\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u8bc1\u6
60e\u4e86\u5176\u5728\u4e0d\u540c\u6570\u636e\u5e94\u7528\u573a\u666f\u4e0b\u7684\u5f3a\u5927\u6cdb\u5316\u80fd\u529b\uff0c\u663e\u8457\u63d0\u9ad8\u4e86\u6570\u636e\u4f7f\u7528\u6548\u7387\u548c\u7cfb\u7edf\u6027\u80fd\u3002\u6570\u636e\u5df2\u53d1\u5e03\u4e8ehttps://github.com/KongLongGeFDU/TransferTOD\u3002|\n", "2407.21669": "|**2024-07-31**|**Synth-Empathy: Towards High-Quality Synthetic Empathy Data**|Hao Liang et.al.|[2407.21669](http://arxiv.org/abs/2407.21669)|**[link](https://github.com/aurora-slz/synth-empathy)**|\u8fd1\u5e74\u6765\uff0c\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u8fc5\u901f\u53d1\u5c55\uff0c\u5b9e\u73b0\u51fa\u8272\u540c\u7406\u5fc3\u54cd\u5e94\u80fd\u529b\u5df2\u6210\u4e3a\u4e00\u4e2a\u81f3\u5173\u91cd\u8981\u7684\u524d\u63d0\u3002\u56e0\u6b64\uff0c\u7ba1\u7406\u548c\u7406\u89e3\u540c\u7406\u5fc3\u6570\u636e\u96c6\u7684\u91cd\u8981\u6027\u65e5\u76ca\u51f8\u663e\u3002\u7136\u800c\uff0c\u540c\u7406\u5fc3\u6570\u636e\u901a\u5e38\u7531\u4eba\u7c7b\u6807\u6ce8\uff0c\u5bfc\u81f4\u6570\u636e\u91cf\u4e0d\u8db3\u548c\u5927\u91cf\u7684\u4eba\u529b\u6d6a\u8d39\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aSynth-Empathy\u7684LLM\u57fa\u4e8e\u7684\u6570\u636e\u751f\u6210\u4e0e\u8d28\u91cf\u3001\u591a\u6837\u6027\u9009\u62e9\u7ba1\u9053\uff0c\u8be5\u7ba1\u9053\u80fd\u591f\u81ea\u52a8\u751f\u6210\u9ad8\u8d28\u91cf\u7684\u540c\u7406\u5fc3\u6570\u636e\u5e76\u7b5b\u9009\u6389\u4f4e\u8d28\u91cf\u6570\u636e\u3002\u901a\u8fc7\u5229\u7528\u4f4e\u540c\u7406\u5fc3\u6a21\u578b\u751f\u6210\u7684\u6570\u636e\uff0c\u6211\u4eec\u8fdb\u4e00\u6b65\u63d0\u9ad8\u4e86\u540c\u7406\u5fc3\u54cd\u5e94\u6027\u80fd\uff0c\u5e76\u5728\u591a\u4e2a\u57fa\u51c6\u4e0a\u8fbe\u5230\u4e86\u6700\u5148\u8fdb\u7684\uff08SoTA\uff09\u7ed3\u679c\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u6a21\u578b\u5728\u5404\u79cd\u4eba\u7c7b\u8bc4\u4f30\u57fa\u51c6\u4e0a\u5747\u8868\u73b0\u51fa\u8272\uff0c\u8bc1\u660e\u4e86\u5176\u5728\u5b9e\u9645\
u5e94\u7528\u4e2d\u7684\u6709\u6548\u6027\u548c\u9c81\u68d2\u6027\u3002\u8fdb\u4e00\u6b65\u5730\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u6570\u636e\u91cf\u4e0e\u8d28\u91cf\u4e4b\u95f4\u7684\u6743\u8861\uff0c\u63d0\u4f9b\u4e86\u540c\u7406\u5fc3\u6570\u636e\u751f\u6210\u4e0e\u9009\u62e9\u65b9\u9762\u7684\u89c1\u89e3\u3002|\n", "2407.21593": "|**2024-07-31**|**LLM-for-X: Application-agnostic Integration of Large Language Models to Support Personal Writing Workflows**|Lukas Teufelberger et.al.|[2407.21593](http://arxiv.org/abs/2407.21593)|null|\u4e3a\u4e86\u63d0\u9ad8\u751f\u4ea7\u529b\u5e76\u4f18\u5316\u5de5\u4f5c\u6d41\u7a0b\uff0c\u5c06\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u529f\u80fd\u5d4c\u5165\u5e94\u7528\u7a0b\u5e8f\u7684\u8d8b\u52bf\u6b63\u5728\u589e\u957f\uff0c\u4ece\u57fa\u4e8e\u6d4f\u89c8\u5668\u7684\u7f51\u7edc\u5e94\u7528\u5230\u5728\u4e2a\u4eba\u8ba1\u7b97\u673a\u4e0a\u8fd0\u884c\u7684\u539f\u751f\u5e94\u7528\u3002\u6211\u4eec\u4ecb\u7ecd\u4e86\u4e00\u79cd\u7cfb\u7edf\u7ea7\u5feb\u6377\u65b9\u5f0f\u5c42\u2014\u2014LLM-for-X\uff0c\u5b83\u901a\u8fc7\u8f7b\u91cf\u7ea7\u5f39\u51fa\u5f0f\u5bf9\u8bdd\u6846\u65e0\u7f1d\u5730\u5411\u4efb\u4f55\u5e94\u7528\u7a0b\u5e8f\u6dfb\u52a0LLM\u670d\u52a1\u3002\u6211\u4eec\u7684\u539f\u751f\u5c42\u901a\u8fc7\u7edf\u4e00\u7684\u804a\u5929\u524d\u7aef\u4f5c\u4e3a\u7f16\u7a0b\u63a5\u53e3\u6216\u81ea\u5b9a\u4e49API\u8c03\u7528\uff0c\u5c06\u524d\u7aef\u5e94\u7528\u7a0b\u5e8f\u4e0e\u6d41\u884c\u7684LLM\u540e\u7aef\uff08\u5982ChatGPT\u548cGemini\uff09\u65e0\u7f1d\u8fde\u63a5\u3002\u6211\u4eec\u5c55\u793a\u4e86LLM-for-X\u5728Microsoft Office\u3001VSCode\u3001Adobe 
Acrobat\u4ee5\u53caOverleaf\u7b49\u6d41\u884c\u7f51\u7edc\u5e94\u7528\u4e2d\u7684\u4f18\u52bf\u3002\u5728\u8bc4\u4f30\u4e2d\uff0c\u6211\u4eec\u5c06LLM-for-X\u4e0eChatGPT\u7684\u7f51\u9875\u754c\u9762\u8fdb\u884c\u4e86\u4efb\u52a1\u6bd4\u8f83\uff0c\u8bc1\u660e\u4e86\u6211\u4eec\u7684\u65b9\u6cd5\u80fd\u591f\u63d0\u4f9b\u5feb\u901f\u3001\u9ad8\u6548\u4e14\u6613\u4e8e\u4f7f\u7528\u7684LLM\u8f85\u52a9\uff0c\u65e0\u9700\u5207\u6362\u4e0a\u4e0b\u6587\u652f\u6301\u5199\u4f5c\u548c\u9605\u8bfb\u4efb\u52a1\uff0c\u540c\u65f6\u5bf9\u7279\u5b9a\u5e94\u7528\u65e0\u7279\u5b9a\u4f9d\u8d56\u3002|\n", "2407.21579": "|**2024-07-31**|**A Performance Study of LLM-Generated Code on Leetcode**|Tristan Coignion et.al.|[2407.21579](http://arxiv.org/abs/2407.21579)|null|\u672c\u6587\u7814\u7a76\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u4ee3\u7801\u751f\u6210\u65b9\u9762\u7684\u6548\u7387\uff0c\u5e76\u4f7f\u7528\u6765\u81eaLeetCode\u7684\u6570\u636e\u96c6\u8bc4\u4f30\u4e86\u5b83\u4eec\u4e0e\u4eba\u7c7b\u7f16\u5199\u7684\u89e3\u51b3\u65b9\u6848\u7684\u6027\u80fd\u3002\u6211\u4eec\u5bf9\u6bd4\u4e8618\u4e2aLLM\uff0c\u8003\u8651\u4e86\u6a21\u578b\u6e29\u5ea6\u548c\u6210\u529f\u7387\u7b49\u56e0\u7d20\u5bf9\u4ee3\u7801\u6027\u80fd\u7684\u5f71\u54cd\u3002\u7814\u7a76\u5f15\u5165\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\u6765\u5ea6\u91cf\u548c\u6bd4\u8f83LLM\u751f\u6210\u4ee3\u7801\u7684\u901f\u5ea6\uff0c\u7ed3\u679c\u8868\u660e\uff0c\u91c7\u7528\u4e0d\u540cLLM\u65f6\uff0c\u751f\u6210\u7684\u4ee3\u7801\u6027\u80fd\u76f8\u5f53\u3002\u6211\u4eec\u8fd8\u53d1\u73b0\uff0cLLM\u751f\u6210\u7684\u4ee3\u7801\u5e73\u5747\u800c\u8a00\u6bd4\u4eba\u7c7b\u7f16\u5199\u7684\u4ee3\u7801\u66f4\u9ad8\u6548\u3002\u8bba\u6587\u8fdb\u4e00\u6b65\u8ba8\u8bba\u4e86\u4f7f\u7528LeetCode\u4f5c\u4e3a\u57fa\u51c6\u6570\u636e\u96c6\u3001\u6f5c\u5728\u6570\u636e\u6c61\u67d3\u5e26\u6765\u7684\u9650\u5236\u4ee5\u53ca\u5e73\u53f0\u6d4b\u91cf\u53ef\u9760\u6027\u7684\u95ee\u9898\u3002\u6211\u4eec\u8ba4\u4e3
a\uff0c\u6211\u4eec\u7684\u53d1\u73b0\u6709\u52a9\u4e8e\u66f4\u597d\u5730\u7406\u89e3LLM\u5728\u4ee3\u7801\u751f\u6210\u9886\u57df\u7684\u80fd\u529b\uff0c\u5e76\u4e3a\u8be5\u9886\u57df\u672a\u6765\u7684\u4f18\u5316\u5960\u5b9a\u4e86\u57fa\u7840\u3002|\n", "2407.21571": "|**2024-07-31**|**PMoE: Progressive Mixture of Experts with Asymmetric Transformer for Continual Learning**|Min Jae Jung et.al.|[2407.21571](http://arxiv.org/abs/2407.21571)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u6301\u7eed\u5b66\u4e60\u8fc7\u7a0b\u4e2d\u9047\u5230\u91cd\u5927\u6311\u6218\uff0c\u4e3b\u8981\u5728\u4e8e\u707e\u96be\u6027\u9057\u5fd8\u73b0\u8c61\uff0c\u5373\u65b0\u4fe1\u606f\u4f1a\u8986\u76d6\u4e4b\u524d\u83b7\u5f97\u7684\u77e5\u8bc6\u3002\u8fd9\u4e00\u5c40\u9650\u6027\u5bfc\u81f4\u4e86\u5927\u91cf\u73af\u5883\u548c\u7ecf\u6d4e\u8d44\u6e90\u7684\u6d6a\u8d39\u3002\u672c\u7814\u7a76\u5f15\u5165\u4e86\u4e00\u79cd\u540d\u4e3aPMoE\uff08Progressive Mixture of Experts with Asymmetric Transformer\uff09\u7684\u89e3\u51b3\u65b9\u6848\uff0c\u65e8\u5728\u901a\u8fc7\u91c7\u7528\u5177\u6709\u6d45\u5c42\u7528\u4e8e\u4e00\u822c\u77e5\u8bc6\u548c\u6df1\u5c42\u7528\u4e8e\u65b0\u77e5\u8bc6\u7684\u4e0d\u5bf9\u79f0\u8bbe\u8ba1\u6765\u6700\u5c0f\u5316\u9057\u5fd8\u3002PMoE\u5728\u6df1\u5c42\u5f15\u5165\u4e86\u9010\u6b65\u589e\u52a0\u7684\u4e13\u5bb6\uff0c\u5e76\u914d\u5907\u4e86\u4e00\u4e2a\u8def\u7531\u5668\uff0c\u8be5\u8def\u7531\u5668\u80fd\u591f\u9ad8\u6548\u5730\u5c06\u65b0\u77e5\u8bc6\u5206\u914d\u7ed9\u5408\u9002\u7684\u4e13\u5bb6\u3002 
\u8def\u7531\u5668\u4f4d\u4e8e\u6df1\u5c42\u9644\u8fd1\uff0c\u5229\u7528\u6df1\u5ea6\u7279\u5f81\u805a\u5408\u5df2\u6574\u5408\u7684\u4fe1\u606f\u3002\u8fd9\u4f7f\u5f97\u8def\u7531\u5668\u80fd\u591f\u6709\u6548\u5730\u6267\u884c\u4efb\u52a1\uff0c\u5c06\u65b0\u77e5\u8bc6\u5206\u914d\u7ed9\u9010\u6b65\u589e\u52a0\u7684\u6df1\u5c42\u4e13\u5bb6\u3002\u901a\u8fc7\u5728TRACE\u6570\u636e\u96c6\u548c\u901a\u7528\u8bed\u8a00\u7406\u89e3\u6570\u636e\u96c6\u4e0a\u7684\u5e7f\u6cdb\u5b9e\u9a8c\uff0c\u8bc1\u660e\u4e86\u6240\u63d0\u51fa\u7684PMoE\u65b9\u6cd5\u4f18\u4e8e\u5148\u524d\u7684\u6700\u5148\u8fdb\u7684\u65b9\u6cd5\u3002|\n", "2407.21553": "|**2024-07-31**|**CXSimulator: A User Behavior Simulation using LLM Embeddings for Web-Marketing Campaign Assessment**|Akira Kasuga et.al.|[2407.21553](http://arxiv.org/abs/2407.21553)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u5ba2\u6237\u4f53\u9a8c\uff08CX\uff09\u6a21\u62df\u5668\u7684\u65b0\u578b\u6846\u67b6\uff0c\u65e8\u5728\u901a\u8fc7\u7528\u6237\u884c\u4e3a\u6a21\u62df\u6765\u8bc4\u4f30\u672a\u6d4b\u8bd5\u7684\u7f51\u7edc\u8425\u9500\u6d3b\u52a8\u7684\u5f71\u54cd\u3002\u8be5\u63d0\u51fa\u7684\u6846\u67b6\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5c06\u7528\u6237\u884c\u4e3a\u5386\u53f2\u4e2d\u7684\u5404\u79cd\u4e8b\u4ef6\uff0c\u5982\u67e5\u770b\u5546\u54c1\u3001\u4f7f\u7528\u4f18\u60e0\u5238\u6216\u8d2d\u4e70\u5546\u54c1\u7b49\uff0c\u8868\u793a\u4e3a\u8bed\u4e49\u5d4c\u5165\u5411\u91cf\u3002\u6211\u4eec\u8bad\u7ec3\u4e86\u4e00\u4e2a\u6a21\u578b\uff0c\u7528\u4e8e\u4ece\u5176LLM\u5d4c\u5165\u4e2d\u9884\u6d4b\u4e8b\u4ef6\u4e4b\u95f4\u7684\u8fc7\u6e21\uff0c\u751a\u81f3\u53ef\u4ee5\u4ece\u591a\u6837\u5316\u7684\u8bad\u7ec3\u6570\u636e\u4e2d\u5b66\u4e60\uff0c\u4ece\u800c\u5bf9\u672a\u77e5\u4e8b\u4ef6\u8fdb\u884c\u6cdb\u5316\u3002\u5728web\u8425\u9500\u5e94\u7528\u4e2d\uff0c\u6211\u4eec\u5229\u7528\u8fd9\u4e2a\u8fc7\u6e21\u9884\u6d4b\u6a21\u578b\u6765\u6a21\u62df\u5f53\u65b0\u7684\u8425\u9500\u6d3
b\u52a8\u6216\u4ea7\u54c1\u5c55\u793a\u7ed9\u7528\u6237\u65f6\uff0c\u7528\u6237\u53ef\u80fd\u5982\u4f55\u53cd\u5e94\u4e0d\u540c\u3002\u8fd9\u4f7f\u5f97\u6211\u4eec\u80fd\u591f\u6d88\u9664\u5728\u7ebf\u6d4b\u8bd5\u7684\u9ad8\u6602\u6210\u672c\uff0c\u5e76\u589e\u5f3a\u8425\u9500\u4eba\u5458\u63ed\u793a\u6d1e\u5bdf\u529b\u7684\u80fd\u529b\u3002\u6211\u4eec\u7684\u6570\u503c\u8bc4\u4f30\u548c\u4f7f\u7528Google\u5546\u54c1\u5546\u5e97\u7684\u5927\u89c4\u6a21\u516c\u5171\u6570\u636e\u96c6\u8fdb\u884c\u7684\u7528\u6237\u7814\u7a76\u8bc1\u660e\u4e86\u6211\u4eec\u6846\u67b6\u7684\u6709\u6548\u6027\u3002|\n", "2408.00764": "|**2024-08-01**|**AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation**|Mengkang Hu et.al.|[2408.00764](http://arxiv.org/abs/2408.00764)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u57fa\u4e8e\u7684\u4ee3\u7406\u5df2\u5f15\u8d77\u5e7f\u6cdb\u5173\u6ce8\u5e76\u53d8\u5f97\u8d8a\u6765\u8d8a\u6d41\u884c\u3002\u6b64\u5916\uff0c\u89c4\u5212\u80fd\u529b\u662fLLM\u57fa\u4e8e\u4ee3\u7406\u7684\u5173\u952e\u7ec4\u6210\u90e8\u5206\uff0c\u6d89\u53ca\u4e0e\u73af\u5883\u7684\u4ea4\u4e92\u548c\u6267\u884c\u52a8\u4f5c\u4ee5\u5b8c\u6210\u89c4\u5212\u4efb\u52a1\uff0c\u901a\u5e38\u5305\u62ec\u4ece\u521d\u59cb\u72b6\u6001\u8fbe\u5230\u9884\u671f\u76ee\u6807\u7684\u8fc7\u7a0b\u3002\u672c\u6587\u7814\u7a76\u4e86\u901a\u8fc7\u6307\u4ee4\u8c03\u6574\u589e\u5f3aLLM\u89c4\u5212\u80fd\u529b\u7684\u65b9\u6cd5\uff0c\u79f0\u4e3a\u4ee3\u7406\u8bad\u7ec3\u3002\u8fd1\u671f\u7684\u7814\u7a76\u8868\u660e\uff0c\u5229\u7528\u4e13\u5bb6\u7ea7\u8f68\u8ff9\u5bf9\u6307\u4ee4\u8c03\u6574LLM\u80fd\u6709\u6548\u63d0\u5347\u5176\u89c4\u5212\u80fd\u529b\u3002\u7136\u800c\uff0c\u73b0\u6709\u5de5\u4f5c\u4e3b\u8981\u96c6\u4e2d\u5728\u4ece\u624b\u52a8\u8bbe\u8ba1\u7684\u4efb\u52a1\u548c\u73af\u5883\u4e2d\u5408\u6210\u8f68\u8ff9\u3002\u521b\u5efa\u8fd9\u4e9b\u73af\u5883\u548c\u4efb\u52a1\u7684\u52b3\u52a8\u5bc6\u96c6\u578b\u8fc7\u7a0b
\u9650\u5236\u4e86\u751f\u6210\u8db3\u591f\u591a\u6837\u6027\u548c\u5e7f\u6cdb\u6027\u7684\u8f68\u8ff9\u7684\u80fd\u529b\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u5c40\u9650\u6027\uff0c\u672c\u6587\u63a2\u7d22\u4e86\u81ea\u52a8\u5408\u6210\u591a\u6837\u5316\u73af\u5883\u4ee5\u53ca\u89c4\u5212\u4efb\u52a1\u7684\u6e10\u8fdb\u8303\u56f4\uff0c\u4ece\u7b80\u5355\u5230\u590d\u6742\u3002\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u4e2a\u6846\u67b6AgentGen\uff0c\u5229\u7528LLM\u9996\u5148\u751f\u6210\u73af\u5883\uff0c\u968f\u540e\u6839\u636e\u8fd9\u4e9b\u73af\u5883\u751f\u6210\u89c4\u5212\u4efb\u52a1\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u4e3a\u4e86\u63d0\u9ad8\u73af\u5883\u591a\u6837\u6027\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4f7f\u7528\u5305\u542b\u5404\u79cd\u9886\u57df\u7279\u5b9a\u6587\u672c\u6bb5\u843d\u7684\u7075\u611f\u8bed\u6599\u5e93\u4f5c\u4e3a\u5408\u6210\u73af\u5883\u7684\u4e0a\u4e0b\u6587\u3002\u6b64\u5916\uff0c\u4e3a\u4e86\u589e\u52a0\u751f\u6210\u89c4\u5212\u4efb\u52a1\u96be\u5ea6\u591a\u6837\u6027\u7684\u7a0b\u5ea6\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u53cc\u5411\u6f14\u5316\u65b9\u6cd5Bi-Evol\uff0c\u8be5\u65b9\u6cd5\u4ece\u5bb9\u6613\u548c\u56f0\u96be\u7684\u4e24\u4e2a\u65b9\u5411\u8fdb\u5316\u89c4\u5212\u4efb\u52a1\uff0c\u4ee5\u5408\u6210\u5177\u6709\u5e73\u6ed1\u96be\u5ea6\u66f2\u7ebf\u7684\u4efb\u52a1\u96c6\u3002\u6765\u81eaAgentBoard\u7684\u8bc4\u4f30\u7ed3\u679c\u663e\u793a\uff0cAgentGen\u663e\u8457\u63d0\u9ad8\u4e86LLM\u7684\u89c4\u5212\u80fd\u529b\uff0c\u4f8b\u5982\uff0c\u4f7f\u7528AgentGen\u6307\u4ee4\u8c03\u6574\u7684Llama-3 8B\u5728\u603b\u4f53\u6027\u80fd\u4e0a\u8d85\u8d8a\u4e86GPT-3.5\u3002\u6b64\u5916\uff0c\u5728\u67d0\u4e9b\u4efb\u52a1\u4e2d\uff0c\u5b83\u751a\u81f3\u8d85\u8d8a\u4e86GPT-4\u3002|\n", "2408.00761": "|**2024-08-01**|**Tamper-Resistant Safeguards for Open-Weight LLMs**|Rishub Tamirisa 
et.al.|[2408.00761](http://arxiv.org/abs/2408.00761)|**[link](https://github.com/rishub-tamirisa/tamper-resistance)**|\u5feb\u901f\u53d1\u5c55\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u80fd\u529b\u5f15\u53d1\u4e86\u5bf9\u6f5c\u5728\u6076\u610f\u7528\u9014\u7684\u5e7f\u6cdb\u62c5\u5fe7\u3002\u9488\u5bf9\u5f00\u653e\u6743\u91cd\u7684LLM\uff0c\u73b0\u6709\u4fdd\u62a4\u63aa\u65bd\u5728\u62b5\u6297\u7be1\u6539\u653b\u51fb\u65b9\u9762\u7f3a\u4e4f\u8db3\u591f\u7684\u7a33\u5b9a\u6027\uff0c\u8fd9\u4e9b\u653b\u51fb\u53ef\u4ee5\u901a\u8fc7\u5fae\u8c03\u6b65\u9aa4\u8f7b\u6613\u5730\u79fb\u9664\u62d2\u7edd\u548c\u9057\u5fd8\u4fdd\u62a4\u63aa\u65bd\u3002\u8fd9\u7c7b\u6f0f\u6d1e\u8981\u6c42\u91c7\u53d6\u65b0\u7684\u65b9\u6cd5\u6765\u786e\u4fdd\u5b89\u5168\u91ca\u653e\u5f00\u653e\u6743\u91cd\u7684LLM\u3002 \u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u540d\u4e3aTAR\u7684\u65b9\u6cd5\uff0c\u65e8\u5728\u5c06\u4e0d\u53ef\u7be1\u6539\u7684\u5b89\u5168\u9632\u62a4\u878d\u5165\u5230\u5f00\u653e\u6743\u91cd\u7684LLM\u4e2d\uff0c\u4f7f\u5f97\u5373\u4f7f\u7ecf\u8fc7\u6570\u5343\u6b65\u7684\u5fae\u8c03\uff0c\u653b\u51fb\u8005\u4e5f\u65e0\u6cd5\u79fb\u9664\u8fd9\u4e9b\u9632\u62a4\u63aa\u65bd\u3002\u5728\u5168\u9762\u7684\u8bc4\u4f30\u548c\u7ea2\u961f\u6d4b\u8bd5\u5206\u6790\u4e2d\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u663e\u8457\u63d0\u9ad8\u4e86\u9632\u62a4\u7684\u4e0d\u53ef\u7be1\u6539\u6027\uff0c\u540c\u65f6\u4fdd\u6301\u4e86\u826f\u6027\u529f\u80fd\u3002\u6211\u4eec\u7684\u7ed3\u679c\u8868\u660e\uff0c\u4e0d\u53ef\u7be1\u6539\u6027\u662f\u4e00\u4e2a\u53ef\u884c\u7684\u95ee\u9898\uff0c\u4e3a\u6539\u8fdb\u5f00\u653e\u6743\u91cdLLM\u7684\u5b89\u5168\u6027\u548c\u5b89\u5168\u6027\u5f00\u8f9f\u4e86\u6709\u524d\u666f\u7684\u65b0\u9014\u5f84\u3002|\n", "2408.00741": "|**2024-08-01**|**DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency**|Jovan Stojkovic 
et.al.|[2408.00741](http://arxiv.org/abs/2408.00741)|null|\u5feb\u901f\u53d1\u5c55\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u751f\u6210\u80fd\u529b\u4f7f\u5176\u5728\u5404\u79cd\u5e94\u7528\u4e2d\u6210\u4e3a\u5173\u952e\u7684\u5de5\u4f5c\u8d1f\u8f7d\u3002\u5982\u4eca\uff0cLLM\u63a8\u7406\u96c6\u7fa4\u5904\u7406\u5927\u91cf\u67e5\u8be2\uff0c\u5e76\u5bf9\u670d\u52a1\u8d28\u91cf\u6307\u6807\uff08SLOs\uff09\u6709\u4e25\u683c\u8981\u6c42\u3002\u4e3a\u4e86\u8fbe\u5230\u9884\u671f\u6027\u80fd\uff0c\u8fd9\u4e9b\u6a21\u578b\u5728\u80fd\u8017\u9ad8\u7684GPU\u4e0a\u6267\u884c\uff0c\u5bfc\u81f4\u63a8\u7406\u96c6\u7fa4\u6d88\u8017\u5927\u91cf\u80fd\u6e90\uff0c\u5e76\u4ea7\u751f\u8fc7\u91cf\u7684\u78b3\u6392\u653e\u3002\u5e78\u8fd0\u7684\u662f\uff0c\u6211\u4eec\u53d1\u73b0\u53ef\u4ee5\u901a\u8fc7\u5229\u7528\u63a8\u7406\u8ba1\u7b97\u7279\u6027\u7684\u5f02\u8d28\u6027\u4ee5\u53ca\u5de5\u4f5c\u8d1f\u8f7d\u7684\u6ce2\u52a8\uff0c\u663e\u8457\u63d0\u9ad8\u80fd\u6548\u3002\u7136\u800c\uff0c\u8fd9\u79cd\u591a\u6837\u6027\u548c\u52a8\u6001\u73af\u5883\u521b\u9020\u4e86\u4e00\u4e2a\u5de8\u5927\u7684\u641c\u7d22\u7a7a\u95f4\uff0c\u4e0d\u540c\u7684\u7cfb\u7edf\u914d\u7f6e\uff08\u5982\u5b9e\u4f8b\u6570\u91cf\u3001\u6a21\u578b\u5e76\u884c\u6027\u548cGPU\u9891\u7387\uff09\u5bfc\u81f4\u4e0d\u540c\u7684\u80fd\u6e90\u548c\u6027\u80fd\u6298\u8877\u3002 
\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86DynamoLLM\uff0c\u8fd9\u662f\u9996\u4e2a\u9488\u5bf9LLM\u63a8\u7406\u73af\u5883\u7684\u80fd\u6548\u7ba1\u7406\u6846\u67b6\u3002DynamoLLM\u81ea\u52a8\u4e14\u52a8\u6001\u5730\u91cd\u65b0\u914d\u7f6e\u63a8\u7406\u96c6\u7fa4\uff0c\u4ee5\u4f18\u5316\u80fd\u6e90\u548c\u6210\u672c\uff0c\u540c\u65f6\u6ee1\u8db3\u670d\u52a1\u7684\u6027\u80fdSLOs\u3002\u7814\u7a76\u8868\u660e\uff0c\u5728\u670d\u52a1\u5c42\u9762\uff0cDynamoLLM\u80fd\u591f\u8282\u770153%\u7684\u80fd\u6e90\u548c38%\u7684\u64cd\u4f5c\u78b3\u6392\u653e\uff0c\u5e76\u4e3a\u5ba2\u6237\u51cf\u5c1161%\u7684\u6210\u672c\uff0c\u540c\u65f6\u4ecd\u80fd\u6ee1\u8db3\u5ef6\u8fdfSLOs\u3002|\n", "2408.00727": "|**2024-08-01**|**Improving Retrieval-Augmented Generation in Medicine with Iterative Follow-up Questions**|Guangzhi Xiong et.al.|[2408.00727](http://arxiv.org/abs/2408.00727)|**[link](https://github.com/teddy-xionggz/medrag)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5c55\u73b0\u51fa\u4e86\u89e3\u51b3\u533b\u7597\u95ee\u9898\u7684\u5de8\u5927\u6f5c\u529b\uff0c\u5b83\u4eec\u80fd\u591f\u638c\u63e1\u5927\u91cf\u533b\u5b66\u77e5\u8bc6\uff0c\u4f46\u4ecd\u7136\u53ef\u80fd\u51fa\u73b0\u5e7b\u89c9\uff0c\u5e76\u4e14\u5728\u77e5\u8bc6\u66f4\u65b0\u65b9\u9762\u5177\u6709\u5c40\u9650\u6027\u3002\u4e3a\u4e86\u589e\u5f3aLLM\u5728\u533b\u5b66\u95ee\u7b54\u65b9\u9762\u7684\u80fd\u529b\uff0c\u63d0\u51fa\u4e86\u57fa\u4e8e\u68c0\u7d22\u7684\u751f\u6210\uff08RAG\uff09\u65b9\u6cd5\uff0c\u901a\u8fc7\u5916\u90e8\u77e5\u8bc6\u5e93\u6765\u63d0\u5347\u6027\u80fd\u3002\u7136\u800c\uff0c\u5728\u9700\u8981\u591a\u6b21\u4fe1\u606f\u67e5\u8be2\u7684\u590d\u6742\u60c5\u51b5\u4e0b\uff0cRAG\u53ef\u80fd\u4ecd\u7136\u4f1a\u5931\u8d25\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u8fed\u4ee3RAG\u65b9\u6cd5\uff08i-MedRAG\uff09\uff0c\u5141\u8bb8LLM\u5728\u6bcf\u6b21\u5c1d\u8bd5\u540e\u8fed\u4ee3\u5730\u63d0\u51fa\u54
0e\u7eed\u95ee\u9898\u3002\u5728\u6bcf\u6b21i-MedRAG\u8fed\u4ee3\u4e2d\uff0c\u540e\u7eed\u95ee\u9898\u7531\u57fa\u672c\u7684RAG\u7cfb\u7edf\u56de\u7b54\uff0c\u5e76\u7528\u4e8e\u6307\u5bfc\u4e0b\u4e00\u4e2a\u8fed\u4ee3\u4e2d\u7684\u67e5\u8be2\u751f\u6210\u3002 \u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u4e0e\u4ec5\u4f7f\u7528RAG\u7684\u4f20\u7edf\u65b9\u6cd5\u76f8\u6bd4\uff0ci-MedRAG\u663e\u8457\u63d0\u9ad8\u4e86\u5404\u79cdLLM\u5728\u590d\u6742\u95ee\u9898\u4e0a\u7684\u6027\u80fd\uff0c\u8fd9\u4e9b\u95ee\u9898\u662f\u7f8e\u56fd\u533b\u5b66\u751f\u6267\u7167\u8003\u8bd5\uff08USMLE\uff09\u4e34\u5e8a\u6848\u4f8b\u548c\u5927\u89c4\u6a21\u591a\u4efb\u52a1\u8bed\u8a00\u7406\u89e3\uff08MMLU\uff09\u6570\u636e\u96c6\u4e2d\u7684\u77e5\u8bc6\u6d4b\u8bd5\u6240\u6db5\u76d6\u7684\u3002\u7279\u522b\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u6211\u4eec\u7684\u96f6\u6837\u672ci-MedRAG\u5728GPT-3.5\u4e0a\u53d6\u5f97\u4e8669.68%\u7684\u51c6\u786e\u6027\uff0c\u8d85\u8d8a\u4e86\u6240\u6709\u73b0\u6709\u7684\u63d0\u793a\u5de5\u7a0b\u548c\u5fae\u8c03\u65b9\u6cd5\u5728MedQA\u6570\u636e\u96c6\u4e0a\u7684\u8868\u73b0\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u7814\u7a76\u4e86i-MedRAG\u5728\u4e0d\u540c\u8fed\u4ee3\u6b21\u6570\u548c\u6bcf\u8fed\u4ee3\u67e5\u8be2\u6570\u91cf\u4e0b\u7684\u6269\u5c55\u7279\u6027\u3002 \u6211\u4eec\u7684\u6848\u4f8b\u7814\u7a76\u663e\u793a\uff0ci-MedRAG\u80fd\u591f\u7075\u6d3b\u5730\u63d0\u51fa\u540e\u7eed\u95ee\u9898\u5f62\u6210\u63a8\u7406\u94fe\uff0c\u6df1\u5165\u5206\u6790\u533b\u7597\u95ee\u9898\u3002\u636e\u6211\u4eec\u6240\u77e5\uff0c\u8fd9\u662f\u9996\u6b21\u5c06\u540e\u7eed\u95ee\u9898\u878d\u5165\u533b\u5b66RAG\u7684\u7814\u7a76\u3002|\n", "2408.00724": "|**2024-08-01**|**An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models**|Yangzhen Wu 
et.al.|[2408.00724](http://arxiv.org/abs/2408.00724)|null|\u5728\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u6700\u4f18\u8bad\u7ec3\u914d\u7f6e\u7814\u7a76\u4e2d\uff0c\u7279\u522b\u662f\u5728\u6a21\u578b\u89c4\u6a21\u548c\u8ba1\u7b97\u9884\u7b97\u65b9\u9762\u7684\u914d\u7f6e\uff0c\u5df2\u7ecf\u8fdb\u884c\u4e86\u5927\u91cf\u7684\u63a2\u8ba8\u3002\u7136\u800c\uff0c\u5bf9\u4e8e\u63a8\u7406\u9636\u6bb5\u5982\u4f55\u6700\u4f18\u5316\u914d\u7f6eLLM\u4ee5\u5e73\u8861\u989d\u5916\u7684\u63a8\u7406\u8ba1\u7b97\u65f6\u95f4\u548c\u6027\u80fd\u63d0\u5347\u7684\u7814\u7a76\u8fd8\u4e0d\u591f\u6df1\u5165\u3002\u672c\u6587\u65e8\u5728\u63a2\u7d22\u8ba1\u7b97\u4f18\u5316\u7684\u63a8\u7406\u65b9\u6cd5\uff0c\u5373\u8bbe\u8ba1\u80fd\u591f\u901a\u8fc7\u8c03\u6574\u63a8\u7406\u65f6\u95f4\u7684\u8ba1\u7b97\u91cf\u6765\u4f18\u5316\u6027\u80fd\u7684\u6a21\u578b\u548c\u63a8\u7406\u7b56\u7565\u3002 \u4e3a\u4e86\u7406\u89e3\u5e76\u8bbe\u8ba1\u8ba1\u7b97\u4f18\u5316\u7684\u63a8\u7406\u65b9\u6cd5\u7684\u7b2c\u4e00\u6b65\uff0c\u6211\u4eec\u5bf9\u591a\u79cd\u63a8\u7406\u7b56\u7565\uff0c\u5982\u8d2a\u5fc3\u641c\u7d22\u3001\u591a\u6570\u6295\u7968\u3001\u6700\u4f73N\u79cd\u7ec4\u5408\u3001\u52a0\u6743\u6295\u7968\u53ca\u5176\u53d8\u4f53\uff0c\u5728\u4e24\u79cd\u4e0d\u540c\u7684\u6811\u641c\u7d22\u7b97\u6cd5\u4e2d\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u6d89\u53ca\u4e0d\u540c\u6a21\u578b\u89c4\u6a21\u548c\u8ba1\u7b97\u9884\u7b97\u3002\u6211\u4eec\u7684\u7814\u7a76\u53d1\u73b0\uff0c\u8f83\u5c0f\u7684\u8bed\u8a00\u6a21\u578b\u914d\u5408\u66f4\u5148\u8fdb\u7684\u89e3\u7801\u7b97\u6cd5\u901a\u5e38\u80fd\u5b9e\u73b0\u5e15\u7d2f\u6258\u6700\u4f18\u7684\u6743\u8861\uff0c\u5373\u5728\u989d\u5916\u7684\u8ba1\u7b97\u6210\u672c\u4e0e\u6027\u80fd\u63d0\u5347\u4e4b\u95f4\u627e\u5230\u6700\u4f73\u5e73\u8861\u70b9\u3002\u8fd9\u4e9b\u7ed3\u679c\u8868\u660e\uff0c\u5728\u9884\u7b97\u6709\u9650\u7684\u573a\u666f\u4e0b\uff0c\u5982\u7ec8\u7aef\u8bbe\u5907\u4e0a\u90e8\u7f72\u5c0f\u578b\u6a21\u578b\uff0c\u
53ef\u80fd\u5177\u6709\u663e\u8457\u7684\u4f18\u52bf\uff0c\u4ee5\u63d0\u9ad8\u95ee\u9898\u89e3\u51b3\u7684\u51c6\u786e\u7387\u3002 \u4f8b\u5982\uff0c\u6211\u4eec\u5c55\u793a\u4e86Llemma-7B\u6a21\u578b\u5728\u4f7f\u7528\u7ea6\u4e24\u500d\u4e8eLlemma-34B\u6a21\u578b\u7684\u6d6e\u70b9\u8fd0\u7b97\uff08FLOPs\uff09\u7684\u60c5\u51b5\u4e0b\uff0c\u4ecd\u80fd\u5b9e\u73b0\u4e0e\u540e\u8005\u76f8\u5f53\u7684MATH500\u4efb\u52a1\u51c6\u786e\u6027\u3002\u6211\u4eec\u7684\u53d1\u73b0\u53ef\u80fd\u9002\u7528\u4e8e\u4efb\u4f55\u6709\u660e\u786e\u6210\u529f\u5ea6\u91cf\u6807\u51c6\u7684\u751f\u6210\u4efb\u52a1\u3002|\n", "2408.00722": "|**2024-08-01**|**Pathway to Secure and Trustworthy 6G for LLMs: Attacks, Defense, and Opportunities**|Sunder Ali Khowaja et.al.|[2408.00722](http://arxiv.org/abs/2408.00722)|null|\u8fd1\u671f\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u56e0\u5176\u5728\u65b0\u5174\u5e94\u7528\u4e2d\u7684\u9002\u5e94\u6027\u548c\u53ef\u6269\u5c55\u6027\u800c\u5907\u53d7\u5173\u6ce8\uff0c\u8fd9\u4e9b\u5e94\u7528\u5305\u62ec\u901a\u4fe1\u7f51\u7edc\u3002\u9884\u8ba16G\u79fb\u52a8\u8fb9\u7f18\u8ba1\u7b97\u7f51\u7edc\u5c06\u80fd\u591f\u4f5c\u4e3a\u670d\u52a1\u652f\u6301LLMs\uff0c\u56e0\u4e3a\u5b83\u4eec\u63d0\u4f9b\u8d85\u53ef\u9760\u7684\u4f4e\u5ef6\u8fdf\u901a\u4fe1\u548c\u95ed\u73af\u5927\u89c4\u6a21\u8fde\u63a5\u3002\u7136\u800c\uff0cLLMs\u5728\u6570\u636e\u548c\u6a21\u578b\u9690\u79c1\u65b9\u9762\u5b58\u5728\u6f0f\u6d1e\uff0c\u8fd9\u5f71\u54cd\u4e86\u5728\u7528\u6237\u670d\u52a1\u4e2d\u90e8\u7f72LLMs\u7684\u4fe1\u4efb\u5ea6\u3002\u672c\u6587\u63a2\u8ba8\u4e86\u57286G\u7f51\u7edc\u4e2d\u5bf9LLMs\u8fdb\u884c\u5fae\u8c03\u65f6\u7684\u5b89\u5168\u6f0f\u6d1e\uff0c\u7279\u522b\u662f\u6210\u5458\u5f52\u5c5e\u653b\u51fb\u3002\u6211\u4eec\u5b9a\u4e49\u4e86\u653b\u51fb\u7f51\u7edc\u7684\u7279\u5f81\uff0c\u8be5\u7f51\u7edc\u53ef\u4ee5\u5728\u8bbf\u95ee\u4e0b\u6e38\u4efb\u52a1\u7ec6\u8c03\u6a21\u578b\u65f6\u6267\u884c\u6210\u5458\u5f52\u5c5e\u653b\u51fb\uff0c\u524
d\u63d0\u662f\u653b\u51fb\u8005\u53ef\u4ee5\u8bbf\u95ee\u8be5\u6a21\u578b\u3002\u6211\u4eec\u8868\u660e\uff0c\u5bf9\u4e8e\u4efb\u4f55\u4e0b\u6e38\u4efb\u52a1\uff0c\u6210\u5458\u5f52\u5c5e\u653b\u51fb\u90fd\u662f\u6709\u6548\u7684\uff0c\u8fd9\u53ef\u80fd\u5bfc\u81f4\u5728\u4f7f\u7528LLMs\u4f5c\u4e3a\u670d\u52a1\u65f6\u53d1\u751f\u4e2a\u4eba\u6570\u636e\u6cc4\u9732\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u5728\u547d\u540d\u5b9e\u4f53\u8bc6\u522b\u4efb\u52a1\u4e0a\uff0c\u653b\u51fb\u6210\u529f\u7387\u53ef\u8fbe92%\u3002\u57fa\u4e8e\u5b9e\u9a8c\u5206\u6790\uff0c\u6211\u4eec\u8ba8\u8bba\u4e86\u53ef\u80fd\u7684\u9632\u5fa1\u673a\u5236\uff0c\u5e76\u63d0\u51fa\u4e86\u53ef\u80fd\u7684\u7814\u7a76\u65b9\u5411\uff0c\u4ee5\u4f7f\u57286G\u7f51\u7edc\u80cc\u666f\u4e0bLLMs\u66f4\u52a0\u53ef\u9760\u3002|\n", "2408.00690": "|**2024-08-02**|**Improving Text Embeddings for Smaller Language Models Using Contrastive Fine-tuning**|Trapoom Ukarapol et.al.|[2408.00690](http://arxiv.org/abs/2408.00690)|**[link](https://github.com/trapoom555/language-model-sts-cft)**|\u5728\u81ea\u7136\u8bed\u8a00\u7406\u89e3\u4efb\u52a1\u4e0a\u5c55\u73b0\u51fa\u5353\u8d8a\u6027\u80fd\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u56e0\u8d44\u6e90\u5bc6\u96c6\u578b\u7684\u7279\u70b9\u800c\u964d\u4f4e\u4e86\u5176\u53ef\u83b7\u53d6\u6027\u3002\u76f8\u6bd4\u4e4b\u4e0b\uff0c\u5c0f\u578b\u8bed\u8a00\u6a21\u578b\u5982MiniCPM\u63d0\u4f9b\u4e86\u66f4\u53ef\u6301\u7eed\u7684\u6269\u5c55\u6027\uff0c\u4f46\u5f80\u5f80\u5728\u6ca1\u6709\u4e13\u95e8\u4f18\u5316\u7684\u60c5\u51b5\u4e0b\u8868\u73b0\u4e0d\u4f73\u3002\u672c\u6587\u65e8\u5728\u901a\u8fc7\u63d0\u5347\u5c0f\u578b\u8bed\u8a00\u6a21\u578b\u7684\u6587\u672c\u5d4c\u5165\u8d28\u91cf\u6765\u589e\u5f3a\u5b83\u4eec\u7684\u8868\u73b0\u3002\u6211\u4eec\u9009\u62e9\u4e86\u4e09\u4e2a\u8bed\u8a00\u6a21\u578b\uff1aMiniCPM\u3001Phi-2\u548cGemma\uff0c\u5728NLI\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u5bf9\u6bd4\u5f0f\u5fae\u8c03\u3002\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0c\
u8fd9\u79cd\u65b9\u6cd5\u80fd\u663e\u8457\u63d0\u5347\u6240\u6709\u4e09\u79cd\u6a21\u578b\u5728\u5404\u79cd\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u7684\u6587\u672c\u5d4c\u5165\u8d28\u91cf\uff0c\u5176\u4e2dMiniCPM\u8868\u73b0\u51fa\u6700\u663e\u8457\u7684\u5e73\u574756.33%\u6027\u80fd\u63d0\u5347\u3002\u5bf9\u6bd4\u5f0f\u5fae\u8c03\u7684\u4ee3\u7801\u5df2\u516c\u5f00\u5728https://github.com/trapoom555/Language-Model-STS-CFT\u3002|\n", "2408.00686": "|**2024-08-01**|**Can Developers Prompt? A Controlled Experiment for Code Documentation Generation**|Hans-Alexander Kruse et.al.|[2408.00686](http://arxiv.org/abs/2408.00686)|null|\u6211\u4eec\u5bf920\u540d\u4e13\u4e1a\u4eba\u58eb\u548c30\u540d\u8ba1\u7b97\u673a\u79d1\u5b66\u5b66\u751f\u8fdb\u884c\u4e86\u4e00\u4e2a\u53d7\u63a7\u5b9e\u9a8c\uff0c\u8981\u6c42\u4ed6\u4eec\u4f7f\u7528ChatGPT\u98ce\u683c\u7684Visual Studio Code\u6269\u5c55\u6765\u4e3a\u4e24\u4e2aPython\u51fd\u6570\u7f16\u5199\u4ee3\u7801\u6587\u6863\u3002\u5b9e\u9a8c\u7ec4\u81ea\u7531\u8f93\u5165\u81ea\u5b9a\u4e49\u63d0\u793a\uff0c\u800c\u5bf9\u7167\u7ec4\u5219\u6267\u884c\u9884\u8bbe\u7684\u5c11\u91cf\u63d0\u793a\u3002\u6211\u4eec\u7684\u7ed3\u679c\u8868\u660e\uff0c\u65e0\u8bba\u662f\u4e13\u4e1a\u4eba\u58eb\u8fd8\u662f\u5b66\u751f\uff0c\u90fd\u5bf9\u6216\u65e0\u6cd5\u5e94\u7528\u63d0\u793a\u5de5\u7a0b\u6280\u5de7\u611f\u5230\u4e0d\u77e5\u6240\u63aa\u3002\u5c24\u5176\u662f\u5b66\u751f\uff0c\u4ed6\u4eec\u8ba4\u4e3a\u4ece\u81ea\u5b9a\u4e49\u63d0\u793a\u751f\u6210\u7684\u6587\u6863\u6bd4\u4ece\u51c6\u5907\u597d\u7684\u63d0\u793a\u751f\u6210\u7684\u6587\u6863\u5728\u53ef\u8bfb\u6027\u3001\u7b80\u6d01\u6027\u548c\u6709\u7528\u6027\u65b9\u9762\u663e\u8457\u8f83\u5dee\u3002\u4e00\u4e9b\u4e13\u4e1a\u4eba\u58eb\u4ec5\u901a\u8fc7\u5728\u81ea\u5b9a\u4e49\u63d0\u793a\u4e2d\u52a0\u5165\u201cDocstring\u201d\u5173\u952e\u8bcd\u5c31\u80fd\u751f\u6210\u66f4\u9ad8\u8d28\u91cf\u7684\u6587\u6863\u3002\u5b66\u751f\u5e0c\u671b\u83b7\u5f97\u66f4\u591a\u7684\u6307\u5bfc\u6765\u5236\u5b9a\u63d
0\u793a\uff0c\u800c\u4e13\u4e1a\u4eba\u58eb\u5219\u66f4\u6b23\u8d4f\u81ea\u5b9a\u4e49\u63d0\u793a\u7684\u7075\u6d3b\u6027\u3002\u53c2\u4e0e\u8005\u666e\u904d\u8ba4\u4e3a\u8f93\u51fa\u5e76\u975e\u5b8c\u7f8e\uff0c\u800c\u662f\u5c06\u5176\u89c6\u4e3a\u9010\u6b65\u5b8c\u5584\u6587\u6863\u7684\u5de5\u5177\u3002\u9700\u8981\u8fdb\u4e00\u6b65\u7684\u7814\u7a76\u6765\u7406\u89e3\u5f00\u53d1\u4eba\u5458\u5177\u6709\u7684\u63d0\u793a\u6280\u5de7\u548c\u504f\u597d\uff0c\u4ee5\u53ca\u4ed6\u4eec\u5b8c\u6210\u7279\u5b9a\u4efb\u52a1\u6240\u9700\u7684\u652f\u63f4\u3002|\n", "2408.00665": "|**2024-08-01**|**AutoM3L: An Automated Multimodal Machine Learning Framework with Large Language Models**|Daqin Luo et.al.|[2408.00665](http://arxiv.org/abs/2408.00665)|**[link](https://github.com/tim120526/AutoM3L)**|### \u6458\u8981 \u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u591a\u6a21\u6001\u673a\u5668\u5b66\u4e60\u81ea\u52a8\u5316\u6846\u67b6\u2014\u2014AutoM3L\uff0c\u8be5\u6846\u67b6\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4f5c\u4e3a\u63a7\u5236\u5668\uff0c\u81ea\u52a8\u6784\u5efa\u591a\u6a21\u6001\u8bad\u7ec3\u7ba1\u9053\u3002AutoM3L\u80fd\u591f\u7406\u89e3\u6570\u636e\u6a21\u6001\u5e76\u6839\u636e\u7528\u6237\u9700\u6c42\u9009\u62e9\u5408\u9002\u7684\u6a21\u578b\uff0c\u63d0\u4f9b\u81ea\u52a8\u5316\u548c\u4e92\u52a8\u6027\u3002\u901a\u8fc7\u6d88\u9664\u624b\u52a8\u7279\u5f81\u5de5\u7a0b\u548c\u8d85\u53c2\u6570\u4f18\u5316\u7684\u9700\u6c42\uff0c\u6211\u4eec\u7684\u6846\u67b6\u7b80\u5316\u4e86\u7528\u6237\u53c2\u4e0e\u8fc7\u7a0b\uff0c\u5e76\u901a\u8fc7\u6307\u4ee4\u63d0\u4f9b\u4e86\u5b9a\u5236\u5316\u9009\u9879\uff0c\u4ece\u800c\u89e3\u51b3\u4e86\u4ee5\u5f80\u57fa\u4e8e\u89c4\u5219\u7684\u81ea\u52a8\u673a\u5668\u5b66\u4e60\u65b9\u6cd5\u7684\u5c40\u9650\u6027\u3002 
\u6211\u4eec\u5bf9AutoM3L\u5728\u516d\u4e2a\u4e0d\u540c\u7c7b\u578b\u7684\u591a\u6a21\u6001\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u6db5\u76d6\u4e86\u5206\u7c7b\u3001\u56de\u5f52\u548c\u68c0\u7d22\u4efb\u52a1\uff0c\u4ee5\u53ca\u4e00\u7cfb\u5217\u5e7f\u6cdb\u7684\u5355\u6a21\u6001\u6570\u636e\u96c6\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cAutoM3L\u5728\u6027\u80fd\u4e0a\u4e0e\u4f20\u7edf\u7684\u57fa\u4e8e\u89c4\u5219\u7684\u81ea\u52a8\u673a\u5668\u5b66\u4e60\u65b9\u6cd5\u76f8\u6bd4\u5177\u6709\u7ade\u4e89\u529b\u6216\u8d85\u8d8a\u6027\u3002\u6b64\u5916\uff0c\u7528\u6237\u7814\u7a76\u8fdb\u4e00\u6b65\u9a8c\u8bc1\u4e86AutoM3L\u5728\u7528\u6237\u53cb\u597d\u6027\u548c\u6613\u7528\u6027\u65b9\u9762\u7684\u4f18\u52bf\uff0c\u76f8\u8f83\u4e8e\u57fa\u4e8e\u89c4\u5219\u7684\u81ea\u52a8\u673a\u5668\u5b66\u4e60\u65b9\u6cd5\u3002|\n", "2408.00657": "|**2024-08-01**|**Disentangling Dense Embeddings with Sparse Autoencoders**|Charles O'Neill et.al.|[2408.00657](http://arxiv.org/abs/2408.00657)|null|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u5e94\u7528\u7a00\u758f\u81ea\u52a8\u7f16\u7801\u5668\uff08SAEs\uff09\u5230\u5927\u578b\u8bed\u8a00\u6a21\u578b\u751f\u6210\u7684\u5bc6\u96c6\u6587\u672c\u5d4c\u5165\u7684\u9996\u6b21\u5c1d\u8bd5\uff0c\u5c55\u793a\u5176\u5728\u89e3\u7f20\u8bed\u4e49\u6982\u5ff5\u65b9\u9762\u7684\u6f5c\u529b\u3002\u901a\u8fc7\u5728\u8d85\u8fc742\u4e07\u7bc7\u8ba1\u7b97\u673a\u79d1\u5b66\u548c\u5929\u6587\u5b66\u9886\u57df\u79d1\u5b66\u8bba\u6587\u6458\u8981\u7684\u5d4c\u5165\u4e0a\u8bad\u7ec3SAEs\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u6240\u5f97\u5230\u7684\u7a00\u758f\u8868\u793a\u4fdd\u6301\u4e86\u8bed\u4e49\u4e00\u81f4\u6027\u7684\u540c\u65f6\u63d0\u4f9b\u4e86\u53ef\u89e3\u91ca\u6027\u3002\u6211\u4eec\u5206\u6790\u8fd9\u4e9b\u5b66\u4e60\u7279\u5f81\uff0c\u63a2\u7d22\u4e0d\u540c\u6a21\u578b\u5bb9\u91cf\u4e0b\u5b83\u4eec\u7684\u884c\u4e3a\uff0c\u5e76\u5f15\u5165\u4e86\u4e00\u79cd\u65b0\u7684\u65b9\u6cd5\u6765\u8bc6\u522b\u201c\u7279\u5f
81\u5bb6\u65cf\u201d\uff0c\u8fd9\u4e9b\u7279\u5f81\u4ee3\u8868\u4e86\u4e0d\u540c\u62bd\u8c61\u7ea7\u522b\u7684\u76f8\u5173\u6982\u5ff5\u3002\u4e3a\u4e86\u5c55\u793a\u6211\u4eec\u7684\u65b9\u6cd5\u7684\u5b9e\u9645\u5e94\u7528\u4ef7\u503c\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u5982\u4f55\u4f7f\u7528\u8fd9\u4e9b\u53ef\u89e3\u91ca\u7279\u5f81\u7cbe\u786e\u63a7\u5236\u8bed\u4e49\u641c\u7d22\uff0c\u4ece\u800c\u5b9e\u73b0\u5bf9\u67e5\u8be2\u8bed\u4e49\u7684\u7cbe\u7ec6\u63a7\u5236\u3002\u8fd9\u9879\u5de5\u4f5c\u586b\u8865\u4e86\u5bc6\u96c6\u5d4c\u5165\u7684\u8bed\u4e49\u4e30\u5bcc\u6027\u548c\u7a00\u758f\u8868\u793a\u7684\u53ef\u89e3\u91ca\u6027\u4e4b\u95f4\u7684\u5dee\u8ddd\u3002\u6211\u4eec\u5f00\u6e90\u4e86\u8bad\u7ec3\u540e\u7684\u5d4c\u5165\u3001\u7a00\u758f\u81ea\u52a8\u7f16\u7801\u5668\u4ee5\u53ca\u53ef\u89e3\u91ca\u7279\u5f81\uff0c\u540c\u65f6\u63d0\u4f9b\u4e86\u4e00\u4e2a\u7528\u4e8e\u63a2\u7d22\u5b83\u4eec\u7684\u7f51\u9875\u5e94\u7528\u7a0b\u5e8f\u3002|\n", "2408.01423": "|**2024-08-02**|**Prompt Recursive Search: A Living Framework with Adaptive Growth in LLM Auto-Prompting**|Xiangyu Zhao 
et.al.|[2408.01423](http://arxiv.org/abs/2408.01423)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u9886\u57df\u5c55\u73b0\u51fa\u4e86\u60ca\u4eba\u7684\u80fd\u529b\uff0c\u5728\u6267\u884c\u5404\u79cd\u4efb\u52a1\u65f6\u8868\u73b0\u51fa\u8272\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u6a21\u578b\u7684\u6027\u80fd\u53d7\u5230\u7279\u5b9a\u63d0\u793a\u8bbe\u8ba1\u7b56\u7565\u7684\u5f71\u54cd\u3002\u4e3b\u8981\u6709\u4e24\u79cd\u63d0\u793a\u8bbe\u8ba1\u65b9\u6cd5\uff1a\u4e00\u79cd\u662f\u901a\u8fc7\u624b\u52a8\u4e3a\u7279\u5b9a\u6570\u636e\u96c6\u521b\u5efa\u4e13\u95e8\u7684\u63d0\u793a\uff0c\u88ab\u79f0\u4e3a\u4e13\u5bb6\u8bbe\u8ba1\u63d0\u793a\uff08EDP\uff09\uff0c\u4e00\u65e6\u521b\u5efa\uff0c\u5b83\u4eec\u5c31\u65e0\u6cd5\u66f4\u6539\uff0c\u5176\u6709\u6548\u6027\u53d7\u9650\u4e8e\u4eba\u7c7b\u8bbe\u8ba1\u8005\u7684\u4e13\u4e1a\u77e5\u8bc6\u3002\u5f53\u5e94\u7528\u4e8eLLM\u65f6\uff0c\u8fd9\u79cd\u56fa\u5b9a\u7684\u65b9\u6cd5\u5bfc\u81f4\u5bf9\u7b80\u5355\u95ee\u9898\u548c\u590d\u6742\u95ee\u9898\u91c7\u7528\u7edf\u4e00\u7684\u89e3\u51b3\u7b56\u7565\uff0c\u5bfc\u81f4\u5bf9\u4e8e\u7b80\u5355\u95ee\u9898\u8fc7\u5ea6\u4f7f\u7528\u4ee4\u724c\u3002\u53e6\u4e00\u79cd\u65b9\u6cd5\u662f\u8ba9LLM\u81ea\u52a8\u751f\u6210\u63d0\u793a\uff0c\u79f0\u4e3aLLM\u884d\u751f\u63d0\u793a\uff08LDP\uff09\uff0c\u80fd\u591f\u9488\u5bf9\u5177\u4f53\u95ee\u9898\u63d0\u4f9b\u5b9a\u5236\u89e3\u51b3\u65b9\u6848\uff0c\u4ece\u800c\u51cf\u8f7b\u4e86EDP\u7684\u5c40\u9650\u6027\u3002\u7136\u800c\uff0cLDP\u5728\u5904\u7406\u590d\u6742\u95ee\u9898\u65f6\u53ef\u80fd\u4f1a\u9047\u5230\u6027\u80fd\u4e0b\u964d\u7684\u95ee\u9898\uff0c\u8fd9\u662f\u56e0\u4e3a\u5728\u89e3\u51b3\u95ee\u9898\u89c4\u5212\u8fc7\u7a0b\u4e2d\u53ef\u80fd\u7d2f\u79ef\u9519\u8bef\u3002 
\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u4e2a\u65b0\u9896\u7684\u63d0\u793a\u9012\u5f52\u641c\u7d22\uff08PRS\uff09\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u5229\u7528LLM\u751f\u6210\u9488\u5bf9\u7279\u5b9a\u95ee\u9898\u7684\u89e3\u51b3\u65b9\u6848\uff0c\u540c\u65f6\u51cf\u5c11\u4ee4\u724c\u7684\u4f7f\u7528\u3002\u8fd9\u4e2a\u6846\u67b6\u5305\u542b\u4e86\u5bf9\u95ee\u9898\u590d\u6742\u6027\u7684\u8bc4\u4f30\u4ee5\u53ca\u53ef\u8c03\u6574\u7684\u7ed3\u6784\uff0c\u4ee5\u964d\u4f4e\u51fa\u9519\u7684\u53ef\u80fd\u6027\u3002\u6211\u4eec\u901a\u8fc7\u4f7f\u7528\u4e0d\u540c\u53c2\u6570\u6570\u91cf\u7684LLM\u6a21\u578b\u5728\u591a\u4e2a\u9886\u57df\u5185\u7684\u591a\u79cd\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u5b9e\u9a8c\uff0c\u9a8c\u8bc1\u4e86PRS\u6846\u67b6\u7684\u6709\u6548\u6027\u3002\u4e0e\u94fe\u5f0f\u601d\u8003\uff08CoT\uff09\u65b9\u6cd5\u76f8\u6bd4\uff0cPRS\u65b9\u6cd5\u5728\u4f7f\u7528Llama3-7B\u6a21\u578b\u65f6\uff0cBBH\u6570\u636e\u96c6\u4e0a\u7684\u51c6\u786e\u7387\u63d0\u9ad8\u4e868%\uff0c\u5b9e\u73b0\u4e8622%\u7684\u6539\u8fdb\u3002|\n", "2408.01420": "|**2024-08-02**|**Mission Impossible: A Statistical Perspective on Jailbreaking LLMs**|Jingtong Su 
et.al.|[2408.01420](http://arxiv.org/abs/2408.01420)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u6709\u9650\u7684\u8d28\u91cf\u63a7\u5236\u4e0b\u8bad\u7ec3\u4e8e\u6d77\u91cf\u6587\u672c\u6570\u636e\u4e2d\u3002\u8fd9\u5bfc\u81f4LLM\u53ef\u80fd\u51fa\u73b0\u610f\u5916\u751a\u81f3\u6709\u5bb3\u7684\u884c\u4e3a\uff0c\u5982\u6cc4\u9732\u4fe1\u606f\u3001\u5047\u65b0\u95fb\u6216\u4ec7\u6068\u8a00\u8bba\u3002\u5e94\u5bf9\u7b56\u7565\uff0c\u901a\u5e38\u79f0\u4e3a\u504f\u597d\u5bf9\u9f50\uff0c\u5305\u62ec\u901a\u8fc7\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u6587\u672c\u793a\u4f8b\u7cbe\u7ec6\u8c03\u6574\u9884\u8bad\u7ec3\u7684LLM\uff0c\u4ee5\u4f53\u73b0\u671f\u671b\u7684\u884c\u4e3a\u6a21\u5f0f\u3002\u7136\u800c\uff0c\u5b9e\u8bc1\u7814\u7a76\u8868\u660e\uff0c\u5373\u4f7f\u8fdb\u884c\u4e86\u504f\u597d\u5bf9\u9f50\uff0cLLM\u4e5f\u4ecd\u53ef\u80fd\u8bf1\u9a97\u81f3\u6709\u5bb3\u884c\u4e3a\u3002\u8fd9\u79cd\u88ab\u79f0\u4e3aLLM\u201c\u8d8a\u72f1\u201d\u7684\u73b0\u8c61\u901a\u5e38\u901a\u8fc7\u4fee\u6539\u8f93\u5165\u63d0\u793a\u6765\u5b9e\u73b0\uff0c\u4ee5\u8bef\u5bfcLLM\u3002\u672c\u6587\u4ece\u7edf\u8ba1\u5b66\u7684\u89d2\u5ea6\u63d0\u4f9b\u5bf9\u504f\u597d\u5bf9\u9f50\u548c\u8d8a\u72f1\u73b0\u8c61\u7684\u7406\u8bba\u6d1e\u5bdf\u3002 
\u5728\u6211\u4eec\u7684\u6846\u67b6\u4e0b\uff0c\u9996\u5148\u8bc1\u660e\u4e86\u5982\u679c\u8bad\u7ec3\u8bed\u6599\u5e93\u4e2d\u5b58\u5728\u6709\u5bb3\u884c\u4e3a\uff0c\u9884\u8bad\u7ec3\u7684LLM\u4f1a\u6a21\u4eff\u8fd9\u79cd\u884c\u4e3a\u3002\u540c\u6837\u57fa\u4e8e\u8fd9\u4e2a\u6846\u67b6\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u7edf\u8ba1\u610f\u4e49\u4e0a\u7684\u5bf9\u9f50\u6982\u5ff5\uff0c\u5e76\u7ed9\u51fa\u4e86\u8d8a\u72f1\u6982\u7387\u7684\u4e0b\u754c\uff0c\u8868\u660e\u5728\u5408\u7406\u5047\u8bbe\u4e0b\uff0c\u8fd9\u79cd\u73b0\u8c61\u662f\u65e0\u6cd5\u907f\u514d\u7684\u3002\u57fa\u4e8e\u6211\u4eec\u7684\u89c1\u89e3\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u5bf9\u5f53\u524d\u666e\u904d\u91c7\u7528\u7684\u5bf9\u9f50\u7b56\u7565\u2014\u2014\u5f3a\u5316\u8bed\u8a00\u5f15\u5bfc\u53cd\u9988\uff08RLHF\uff09\u7684\u6539\u8fdb\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u4e2a\u540d\u4e3aE-RLHF\u7684\u7b80\u5355\u4fee\u6539\u7248RLHF\u76ee\u6807\uff0c\u65e8\u5728\u63d0\u9ad8\u5b89\u5168\u54cd\u5e94\u7684\u53ef\u80fd\u6027\u3002E-RLHF\u4e0d\u4f1a\u589e\u52a0\u989d\u5916\u7684\u8bad\u7ec3\u6210\u672c\uff0c\u4e14\u4e0e\u5176\u5b83\u65b9\u6cd5\u517c\u5bb9\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u5728\u4e0d\u727a\u7272MT-Bench\u9879\u76ee\u8861\u91cf\u7684\u6a21\u578b\u6027\u80fd\u7684\u60c5\u51b5\u4e0b\uff0cE-RLHF\u5728AdvBench\u548cHarmBench\u9879\u76ee\u63d0\u51fa\u7684\u6240\u6709\u5bf9\u9f50\u95ee\u9898\u4e0a\u5747\u4f18\u4e8eRLHF\u3002|\n", "2408.01419": "|**2024-08-02**|**DebateQA: Evaluating Question Answering on Debatable Knowledge**|Rongwu Xu 
et.al.|[2408.01419](http://arxiv.org/abs/2408.01419)|**[link](https://github.com/pillowsofwind/debateqa)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u5174\u8d77\u4f7f\u5f97\u6211\u4eec\u80fd\u591f\u63a2\u8ba8\u5173\u4e8eLLM\u804a\u5929\u673a\u5668\u4eba\u4e0a\u56fa\u6709\u4e89\u8bae\u6027\u95ee\u9898\u7684\u7b54\u6848\uff0c\u8fd9\u9700\u8981\u4e00\u79cd\u53ef\u9760\u7684\u65b9\u5f0f\u6765\u8bc4\u4f30\u5b83\u4eec\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u4f20\u7edf\u95ee\u7b54\u57fa\u51c6\u5047\u8bbe\u56fa\u5b9a\u7684\u7b54\u6848\u5bf9\u6b64\u76ee\u7684\u800c\u8a00\u662f\u4e0d\u8db3\u7684\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u5f15\u5165\u4e86DebateQA\uff0c\u8fd9\u662f\u4e00\u4e2a\u5305\u542b2,941\u4e2a\u4e89\u8bae\u6027\u95ee\u9898\u7684\u6570\u636e\u96c6\uff0c\u6bcf\u4e2a\u95ee\u9898\u90fd\u9644\u5e26\u4e86\u591a\u4e2a\u7531\u4eba\u7c7b\u6ce8\u91ca\u7684\u7247\u6bb5\u7b54\u6848\uff0c\u8fd9\u4e9b\u7247\u6bb5\u7b54\u6848\u6355\u6349\u4e86\u5404\u79cd\u89c6\u89d2\u3002\u6211\u4eec\u5f00\u53d1\u4e86\u4e24\u4e2a\u5ea6\u91cf\u6807\u51c6\uff1a\u89c2\u70b9\u591a\u6837\u6027\uff0c\u7528\u4e8e\u8bc4\u4f30\u89c6\u89d2\u7684\u5168\u9762\u6027\uff1b\u4ee5\u53ca\u4e89\u8bae\u610f\u8bc6\uff0c\u7528\u4e8e\u8bc4\u4f30LLM\u662f\u5426\u8ba4\u8bc6\u5230\u95ee\u9898\u7684\u4e89\u8bae\u6027\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u8fd9\u4e24\u4e2a\u5ea6\u91cf\u6807\u51c6\u4e0e\u4eba\u7c7b\u504f\u597d\u4e00\u81f4\uff0c\u5e76\u4e14\u5728\u4e0d\u540c\u57fa\u7840\u6a21\u578b\u4e4b\u95f4\u5177\u6709\u7a33\u5b9a\u6027\u3002\u901a\u8fc7\u4f7f\u7528DebateQA\u548c\u8fd9\u4e24\u4e2a\u5ea6\u91cf\u6807\u51c6\uff0c\u6211\u4eec\u8bc4\u4f30\u4e8612\u79cd\u6d41\u884c\u7684LLM\u548c\u68c0\u7d22\u589e\u5f3a\u751f\u6210\u65b9\u6cd5\u3002\u6211\u4eec\u7684\u53d1\u73b0\u63ed\u793a\u4e86\u867d\u7136LLM\u901a\u5e38\u64c5\u957f\u8bc6\u522b\u4e89\u8bae\u6027\u95ee\u9898\uff0c\u4f46\u5b83\u4eec\u63d0\u4f9b\u5168\u9762\u7b54\u6848\u3001\u6db5\u76d6\u591a\
u6837\u89c6\u89d2\u7684\u80fd\u529b\u5b58\u5728\u663e\u8457\u5dee\u5f02\u3002|\n", "2408.01417": "|**2024-08-02**|**Talk Less, Interact Better: Evaluating In-context Conversational Adaptation in Multimodal LLMs**|Yilun Hua et.al.|[2408.01417](http://arxiv.org/abs/2408.01417)|null|\u4eba\u7c7b\u5728\u5bf9\u8bdd\u8fc7\u7a0b\u4e2d\u4f1a\u81ea\u53d1\u5730\u4f7f\u7528\u8d8a\u6765\u8d8a\u9ad8\u6548\u7684\u8bed\u8a00\uff0c\u901a\u8fc7\u9002\u5e94\u5e76\u5f62\u6210\u81ea\u5b9a\u4e49\u7684\u7ea6\u5b9a\u3002\u8fd9\u4e00\u73b0\u8c61\u5df2\u7ecf\u901a\u8fc7\u53c2\u8003\u6e38\u620f\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u7814\u7a76\uff0c\u5c55\u793a\u4e86\u4eba\u7c7b\u8bed\u8a00\u8d85\u8d8a\u4f20\u8fbe\u610f\u56fe\u7684\u7279\u6027\u3002\u76ee\u524d\uff0c\u6211\u4eec\u5c1a\u672a\u63a2\u7d22\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\u662f\u5426\u5728\u4ea4\u4e92\u4e2d\u540c\u6837\u63d0\u9ad8\u4e86\u6c9f\u901a\u6548\u7387\uff0c\u5e76\u4e14\u5b83\u4eec\u53ef\u80fd\u91c7\u7528\u4f55\u79cd\u673a\u5236\u5b9e\u73b0\u8fd9\u4e00\u76ee\u7684\u3002 
\u6211\u4eec\u5f15\u5165\u4e86ICCA\u6846\u67b6\uff0c\u8fd9\u662f\u4e00\u4e2a\u81ea\u52a8\u5316\u7684\u8bc4\u4f30\u65b9\u6cd5\uff0c\u7528\u4e8e\u5728MLLM\u4e2d\u8bc4\u4f30\u6b64\u7c7b\u5bf9\u8bdd\u9002\u5e94\u4f5c\u4e3a\u4e0a\u4e0b\u6587\u884c\u4e3a\u7684\u80fd\u529b\u3002\u6211\u4eec\u5bf9\u51e0\u79cd\u6700\u5148\u8fdb\u7684MLLM\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u89c2\u5bdf\u5230\u867d\u7136\u5b83\u4eec\u53ef\u80fd\u7406\u89e3\u5176\u5bf9\u8bdd\u4f19\u4f34\u7684\u8bed\u8a00\u8d8a\u6765\u8d8a\u9ad8\u6548\uff0c\u4f46\u5b83\u4eec\u672c\u8eab\u5e76\u4e0d\u81ea\u53d1\u5730\u5728\u65f6\u95f4\u4e0a\u4f7f\u81ea\u5df1\u7684\u8bed\u8a00\u53d8\u5f97\u66f4\u9ad8\u6548\u3002\u8fd9\u79cd\u80fd\u529b\u4ec5\u5728\u67d0\u4e9b\u6a21\u578b\uff08\u5982GPT-4\uff09\u4e2d\u53ef\u4ee5\u901a\u8fc7\u5f3a\u70c8\u7684\u63d0\u793a\u6765\u6fc0\u53d1\u3002\u8fd9\u8868\u660e\uff0c\u5373\u4f7f\u8fd9\u662f\u4eba\u7c7b\u8bed\u8a00\u7684\u5e38\u89c1\u7279\u5f81\uff0c\u5f53\u524d\u7684\u8bad\u7ec3\u5236\u5ea6\u5e76\u4e0d\u80fd\u4ea7\u751f\u8fd9\u4e00\u4e92\u52a8\u5c5e\u6027\u3002 ICCA\u6846\u67b6\u5df2\u5f00\u6e90\u53d1\u5e03\u4e8ehttps://github.com/lil-lab/ICCA\u3002|\n", "2408.01380": "|**2024-08-02**|**Coalitions of Large Language Models Increase the Robustness of AI Agents**|Prattyush Mangal 
et.al.|[2408.01380](http://arxiv.org/abs/2408.01380)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u5174\u8d77\u4ece\u6839\u672c\u4e0a\u6539\u53d8\u4e86\u6211\u4eec\u4e0e\u6570\u5b57\u7cfb\u7edf\u4e92\u52a8\u7684\u65b9\u5f0f\uff0c\u5e76\u63a8\u52a8\u4e86\u5bf9\u501f\u52a9\u4e8e\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684AI\u4ee3\u7406\u4ee5\u8f85\u52a9\u65e5\u5e38\u6d41\u7a0b\u7684\u7814\u7a76\u3002\u5c3d\u7ba1LLM\u5177\u6709\u5f3a\u5927\u7684\u80fd\u529b\u5e76\u80fd\u591f\u8868\u73b0\u51fa\u4e00\u4e9b\u6d8c\u73b0\u7279\u6027\uff0c\u4f46\u5b83\u4eec\u5e76\u975e\u903b\u8f91\u63a8\u7406\u8005\uff0c\u5f80\u5f80\u5728AI\u4ee3\u7406\u6267\u884c\u5de5\u4f5c\u6d41\u7a0b\u65f6\u6240\u6d89\u53ca\u7684\u6240\u6709\u5b50\u4efb\u52a1\u4e0a\u8868\u73b0\u4e0d\u4f73\u3002\u73b0\u6709\u7814\u7a76\u901a\u8fc7\u5927\u89c4\u6a21\u7684\u4e00\u822c\u6027\u9884\u8bad\u7ec3\u6216\u9488\u5bf9\u5de5\u5177\u4f7f\u7528\u8fdb\u884c\u4e13\u95e8\u7684\u5fae\u8c03\u6765\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u800c\u6211\u4eec\u8bc4\u4f30\u4e86\u4e00\u4e2a\u7531\u4e13\u6ce8\u4e8e\u7279\u5b9a\u5b50\u4efb\u52a1\u7684\u9884\u8bad\u7ec3\u6a21\u578b\u7ec4\u6210\u7684\u8054\u76df\u662f\u5426\u80fd\u4e0e\u5355\u4e00\u6a21\u578b\u4ee3\u7406\u7684\u8868\u73b0\u76f8\u5339\u654c\u3002\u8054\u76df\u6a21\u578b\u7684\u65b9\u6cd5\u5c55\u793a\u4e86\u5176\u5728\u6784\u5efa\u9c81\u68d2\u6027\u548c\u964d\u4f4e\u8fd9\u4e9bAI\u4ee3\u7406\u8fd0\u884c\u6210\u672c\u65b9\u9762\u7684\u6f5c\u529b\uff0c\u901a\u8fc7\u5229\u7528\u7279\u5b9a\u6a21\u578b\u5c55\u73b0\u7684\u7279\u6027\u3002\u6211\u4eec\u7684\u53d1\u73b0\u8868\u660e\uff0c\u901a\u8fc7\u8003\u8651\u4e00\u7ec4\u9884\u8bad\u7ec3\u6a21\u578b\uff0c\u53ef\u4ee5\u51cf\u8f7b\u5fae\u8c03\u7684\u9700\u6c42\uff0c\u5e76\u76f8\u4fe1\u8fd9\u79cd\u65b9\u6cd5\u53ef\u4ee5\u5e94\u7528\u4e8e\u5176\u4ed6\u5229\u7528LLM\u7684\u975e\u4ee3\u7406\u7cfb\u7edf\u3002|\n", "2408.01363": "|**2024-08-02**|**Toward Automatic Relevance Judgment using 
Vision--Language Models for Image--Text Retrieval Evaluation**|Jheng-Hong Yang et.al.|[2408.01363](http://arxiv.org/abs/2408.01363)|null|### \u6458\u8981 \u672c\u6587\u5bf9\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08VLMs\uff09\u5728\u8fdb\u884c\u76f8\u5173\u6027\u8bc4\u4f30\u65b9\u9762\u7684\u6f5c\u529b\u8fdb\u884c\u4e86\u63a2\u7d22\u3002\u901a\u8fc7\u8bbe\u8ba1\u4e00\u4e2a\u9488\u5bf9\u591a\u5a92\u4f53\u5185\u5bb9\u521b\u4f5c\u7684\u5927\u578b\u96f6\u6837\u672c\u68c0\u7d22\u4efb\u52a1\uff0c\u8bc4\u4f30\u4e86CLIP\u3001LLaVA\u548cGPT-4V\u7b49VLM\u7684\u6027\u80fd\u3002\u521d\u6b65\u5b9e\u9a8c\u7ed3\u679c\u5982\u4e0b\uff1a 1. **\u6027\u80fd\u6bd4\u8f83**\uff1a\u5728\u4e0e\u4eba\u7c7b\u5224\u65ad\u7684\u76f8\u5173\u6027\u4e0a\uff0cLLaVA\u548cGPT-4V\uff08\u5305\u62ec\u5f00\u6e90\u548c\u4e13\u6709\u89c6\u89c9\u6307\u4ee4\u8c03\u4f18\u7684\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff09\u53d6\u5f97\u4e86\u663e\u8457\u7684Kendall\u2019s \u03c4\u22480.4\u7684\u6210\u7ee9\uff0c\u8d85\u8fc7\u4e86CLIPScore\u6307\u6807\u3002 2. **\u504f\u597d\u4e0e\u504f\u89c1**\uff1a\u5c3d\u7ba1CLIPScore\u8868\u73b0\u7a81\u51fa\uff0c\u4f46LLMs\u5728\u504f\u89c1\u65b9\u9762\u76f8\u5bf9\u8f83\u5c11\u503e\u5411\u4e8e\u57fa\u4e8eCLIP\u7684\u68c0\u7d22\u7cfb\u7edf\u3002 3. 
**\u4e00\u81f4\u6027\u5206\u6790**\uff1aGPT-4V\u7684\u8bc4\u5206\u5206\u5e03\u4e0e\u4eba\u7c7b\u5224\u65ad\u66f4\u4e3a\u4e00\u81f4\uff0c\u5176Cohen\u2019s \u03ba\u503c\u7ea6\u4e3a0.08\uff0c\u8fdc\u9ad8\u4e8eCLIPScore\u7684\u7ea6-0.096\u3002\u8fd9\u4e00\u53d1\u73b0\u8868\u660e\uff0c\u57fa\u4e8eLLM\u7684VLM\u5728\u589e\u5f3a\u76f8\u5173\u6027\u8bc4\u4f30\u65b9\u9762\u5177\u6709\u6f5c\u529b\u3002 ### \u7ed3\u8bba \u672c\u7814\u7a76\u63ed\u793a\u4e86\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\u5728\u76f8\u5173\u6027\u8bc4\u4f30\u4efb\u52a1\u4e2d\u7684\u5e94\u7528\u4ef7\u503c\uff0c\u7279\u522b\u662f\u5f53\u5b83\u4eec\u88ab\u7528\u4e8e\u96f6\u6837\u672c\u68c0\u7d22\u4efb\u52a1\u65f6\u3002\u901a\u8fc7\u6bd4\u8f83\u4e0d\u540c\u6a21\u578b\u7684\u6027\u80fd\uff0c\u7814\u7a76\u5f3a\u8c03\u4e86LLMs\u5728\u591a\u5a92\u4f53\u5185\u5bb9\u521b\u5efa\u9886\u57df\u5185\u7684\u6f5c\u5728\u4f18\u52bf\uff0c\u5e76\u6307\u51fa\u4e86\u5b83\u4eec\u5728\u63d0\u5347\u5185\u5bb9\u76f8\u5173\u6027\u5224\u65ad\u65b9\u9762\u7684\u53ef\u80fd\u6027\u3002|\n", "2408.01355": "|**2024-08-02**|**Hallu-PI: Evaluating Hallucination in Multi-modal Large Language Models within Perturbed Inputs**|Peng Ding 
et.al.|[2408.01355](http://arxiv.org/abs/2408.01355)|**[link](https://github.com/njunlp/hallu-pi)**|\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\u5728\u89c6\u89c9\u8bed\u8a00\u7406\u89e3\u4e0e\u751f\u6210\u4efb\u52a1\u4e0a\u5c55\u73b0\u51fa\u5353\u8d8a\u6027\u80fd\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u6a21\u578b\u5076\u5c14\u4f1a\u4ea7\u751f\u4e0e\u7ed9\u5b9a\u56fe\u50cf\u4e0d\u4e00\u81f4\u7684\u5185\u5bb9\uff0c\u5373\u6240\u8c13\u7684\u201c\u5e7b\u89c9\u201d\u3002\u5148\u524d\u7684\u7814\u7a76\u4e3b\u8981\u96c6\u4e2d\u5728\u4f7f\u7528\u6807\u51c6\u3001\u672a\u6270\u52a8\u57fa\u51c6\u8bc4\u4f30\u5e7b\u89c9\u4e0a\uff0c\u8fd9\u5ffd\u89c6\u4e86\u73b0\u5b9e\u4e16\u754c\u573a\u666f\u4e2d\u666e\u904d\u5b58\u5728\u7684\u6270\u52a8\u8f93\u5165\uff08\u5982\u56fe\u50cf\u88c1\u526a\u6216\u6a21\u7cca\uff09\uff0c\u8fd9\u662f\u5bf9\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\u5e7b\u89c9\u5168\u9762\u8bc4\u4f30\u7684\u5173\u952e\u3002 \u672c\u7bc7\u8bba\u6587\u65e8\u5728\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u63d0\u51fa\u4e86Hallu-PI\uff0c\u9996\u4e2a\u4e13\u95e8\u7528\u4e8e\u8bc4\u4f30\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\u5728\u6270\u52a8\u8f93\u5165\u4e0b\u7684\u5e7b\u89c9\u7684\u57fa\u51c6\u3002Hallu-PI\u5305\u542b\u4e867\u79cd\u6270\u52a8\u60c5\u666f\uff0c\u6d89\u53ca1,260\u5f20\u6765\u81ea11\u79cd\u7269\u4f53\u7c7b\u578b\u7684\u6270\u52a8\u56fe\u50cf\u3002\u6bcf\u5f20\u56fe\u50cf\u90fd\u9644\u6709\u8be6\u7ec6\u7684\u6ce8\u91ca\uff0c\u5305\u62ec\u7cbe\u7ec6\u7c92\u5ea6\u7684\u5e7b\u89c9\u7c7b\u578b\uff0c\u5982\u5b58\u5728\u6027\u3001\u5c5e\u6027\u548c\u5173\u7cfb\u7b49\u3002\u8fd9\u4e9b\u6ce8\u91ca\u914d\u5907\u4e86\u4e00\u4e2a\u4e30\u5bcc\u7684\u95ee\u7b54\u96c6\uff0c\u4f7fHallu-PI\u9002\u7528\u4e8e\u8fa8\u522b\u6027\u548c\u751f\u6210\u6027\u4efb\u52a1\u3002 \u5728\u5bf9\u4e3b\u6d41\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\uff08\u5982GPT-4V\u548cGemini-Pro 
Vision\uff09\u8fdb\u884c\u7684\u5e7f\u6cdb\u5b9e\u9a8c\u4e2d\uff0c\u6211\u4eec\u53d1\u73b0\u8fd9\u4e9b\u6a21\u578b\u5728Hallu-PI\u4e0a\u7684\u8868\u73b0\u663e\u793a\u51fa\u663e\u8457\u7684\u5e7b\u89c9\uff0c\u800c\u5728\u672a\u6270\u52a8\u573a\u666f\u4e2d\u672a\u89c2\u5bdf\u5230\u6b64\u7c7b\u73b0\u8c61\u3002\u6211\u4eec\u7684\u7814\u7a76\u8fd8\u63ed\u793a\u4e86\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\u5904\u7406\u4e0d\u540c\u7c7b\u578b\u5e7b\u89c9\u65f6\u5b58\u5728\u7684\u4e25\u91cd\u504f\u5dee\u3002 \u4e3a\u6b64\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e24\u4e2a\u4e13\u95e8\u9488\u5bf9\u6270\u52a8\u60c5\u666f\u7684\u57fa\u7ebf\uff0c\u5206\u522b\u4e3aPerturbed-Reminder\u548cPerturbed-ICL\u3002\u6211\u4eec\u5e0c\u671b\u8fd9\u9879\u7814\u7a76\u80fd\u5f15\u8d77\u7814\u7a76\u4eba\u5458\u5bf9\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\u5728\u5904\u7406\u6270\u52a8\u8f93\u5165\u65f6\u5c40\u9650\u6027\u7684\u5173\u6ce8\uff0c\u5e76\u6fc0\u53d1\u8fdb\u4e00\u6b65\u7684\u8c03\u67e5\u4ee5\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\u3002\u6211\u4eec\u7684\u4ee3\u7801\u548c\u6570\u636e\u96c6\u5df2\u5728GitHub\uff08https://github.com/NJUNLP/Hallu-PI\uff09\u4e0a\u516c\u5f00\u63d0\u4f9b\u3002|\n", "2408.01354": "|**2024-08-02**|**MCGMark: An Encodable and Robust Online Watermark for LLM-Generated Malicious Code**|Kaiwen Ning 
et.al.|[2408.01354](http://arxiv.org/abs/2408.01354)|**[link](https://github.com/KevinHeiwa/MCGTM)**|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u5174\u8d77\uff0c\u4f17\u591a\u8f6f\u4ef6\u670d\u52a1\u63d0\u4f9b\u5546\uff08SSP\uff09\u81f4\u529b\u4e8e\u5f00\u53d1\u9488\u5bf9\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u7684\u5b9a\u5236\u5316LLM\uff0c\u5982CodeLlama\u548cCopilot\u3002\u7136\u800c\uff0c\u8fd9\u4e9bLLM\u6709\u53ef\u80fd\u88ab\u653b\u51fb\u8005\u5229\u7528\u6765\u751f\u6210\u6076\u610f\u8f6f\u4ef6\uff0c\u5bf9\u8f6f\u4ef6\u751f\u6001\u7cfb\u7edf\u6784\u6210\u6f5c\u5728\u5a01\u80c1\uff0c\u4f8b\u5982\u81ea\u52a8\u5316\u9ad8\u7ea7\u7f51\u7edc\u9493\u9c7c\u6076\u610f\u8f6f\u4ef6\u7684\u521b\u5efa\u3002\u4e3a\u5e94\u5bf9\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u9996\u5148\u8fdb\u884c\u4e86\u4e00\u9879\u5b9e\u8bc1\u7814\u7a76\uff0c\u5e76\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u5305\u542b\u7ea6400\u5c0f\u65f6\u5de5\u4f5c\u91cf\u3001\u5171\u8ba1406\u4e2a\u6076\u610f\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u7684\u63d0\u793a\u6570\u636e\u96c6MCGTest\u3002\u5229\u7528\u8fd9\u4e2a\u6570\u636e\u96c6\uff0c\u6211\u4eec\u63d0\u51fa\u4e86MCGMark\uff0c\u8fd9\u662f\u9996\u4e2a\u80fd\u591f\u5b9e\u73b0\u7a33\u5065\u3001\u7ed3\u6784\u611f\u77e5\u4e14\u53ef\u7f16\u7801\u7684\u6c34\u5370\u65b9\u6cd5\uff0c\u7528\u4e8e\u8ffd\u8e2a\u7531LLM\u751f\u6210\u7684\u6076\u610f\u4ee3\u7801\u3002\u6211\u4eec\u901a\u8fc7\u63a7\u5236\u4ee4\u724c\u9009\u62e9\u548c\u57fa\u4e8e\u6982\u7387\u5f02\u5e38\u503c\u786e\u4fdd\u8f93\u51fa\u8d28\u91cf\u6765\u5d4c\u5165\u53ef\u7f16\u7801\u4fe1\u606f\u3002\u6b64\u5916\uff0c\u6211\u4eec\u901a\u8fc7\u8003\u8651\u6076\u610f\u4ee3\u7801\u7684\u7ed3\u6784\u7279\u5f81\u589e\u5f3a\u4e86\u6c34\u5370\u7684\u9c81\u68d2\u6027\uff0c\u907f\u514d\u5728\u6613\u4e8e\u4fee\u6539\u7684\u4f4d\u7f6e\uff08\u5982\u6ce8\u91ca\uff09\u5d4c\u5165\u6c34\u5370\u3002\u6211\u4eec\u4f7f\u7528DeepSeek-Coder\u9a8c\u8bc1\u4e86MCGMark\u7684\u6709\u6548\u6027\u548c\u9c81\u68d2\u6027\uf
f0c\u5176\u6700\u5927\u8f93\u51fa\u9650\u5236\u4e3a400\u4e2a\u4ee4\u724c\u65f6\uff0c\u5d4c\u5165\u6210\u529f\u7387\u8fbe\u5230\u4e8688.9%\u3002\u540c\u65f6\uff0c\u8be5\u65b9\u6cd5\u4e5f\u5c55\u793a\u4e86\u5f3a\u5927\u7684\u9c81\u68d2\u6027\uff0c\u5e76\u5bf9\u8f93\u51fa\u4ee3\u7801\u7684\u8d28\u91cf\u5f71\u54cd\u6781\u5c0f\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5e2e\u52a9SSP\u8ffd\u8e2a\u5e76\u8ffd\u7a76\u7531LLM\u751f\u6210\u7684\u6076\u610f\u4ee3\u7801\u7684\u6e90\u5934\u53ca\u8d23\u4efb\u3002|\n", "2408.01346": "|**2024-08-02**|**Prompt Refinement or Fine-tuning? Best Practices for using LLMs in Computational Social Science Tasks**|Anders Giovanni M\u00f8ller et.al.|[2408.01346](http://arxiv.org/abs/2408.01346)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\u662f\u4fc3\u8fdb\u793e\u4f1a\u8ba1\u7b97\u9886\u57df\u590d\u6742\u6587\u672c\u7406\u89e3\u4efb\u52a1\u7684\u6709\u529b\u5de5\u5177\u3002\u5b83\u4eec\u7684\u591a\u529f\u80fd\u6027\u867d\u7136\u6709\u76ca\uff0c\u4f46\u4e5f\u5e26\u6765\u4e86\u5728\u8be5\u9886\u57df\u5efa\u7acb\u6807\u51c6\u5316\u6700\u4f73\u5b9e\u8df5\u7684\u969c\u788d\u3002\u4e3a\u4e86\u63d0\u4f9b\u4e0d\u540c\u7b56\u7565\u4ef7\u503c\u7684\u6e05\u6670\u5ea6\uff0c\u6211\u4eec\u6982\u8ff0\u4e86\u73b0\u4ee3\u57fa\u4e8eLLM\u7684\u5206\u7c7b\u65b9\u6cd5\u572823\u4e2a\u793e\u4f1a\u77e5\u8bc6\u4efb\u52a1\u57fa\u51c6\u4e0a\u7684\u6027\u80fd\u3002\u6211\u4eec\u7684\u7ed3\u679c\u6307\u51fa\u4e86\u4e09\u4e2a\u6700\u4f73\u5b9e\u8df5\uff1a\u9009\u62e9\u5177\u6709\u66f4\u5927\u8bcd\u6c47\u91cf\u548c\u9884\u8bad\u7ec3\u8bed\u6599\u5e93\u7684\u6a21\u578b\uff1b\u907f\u514d\u7b80\u5355\u7684\u96f6\u6b21\u5c1d\u8bd5\uff0c\u800c\u503e\u5411\u4e8e\u589e\u5f3a\u63d0\u793a\u7684\u4eba\u5de5\u667a\u80fd\u65b9\u6cd5\uff1b\u5728\u7279\u5b9a\u4efb\u52a1\u6570\u636e\u4e0a\u8fdb\u884c\u5fae\u8c03\uff0c\u5e76\u8003\u8651\u5728\u591a\u4e2a\u6570\u636e\u96c6\u4e0a\u4f7f\u7528\u66f4\u590d\u6742\u7684\u6307\u4ee4\u8c03\u6574\uff0c\u4ec5\u5f53\u8bad\u7ec3\u6570\u636e\u66f4\u4e3a\u4e30\u5
bcc\u65f6\u624d\u8fd9\u6837\u505a\u3002|\n", "2408.01334": "|**2024-08-02**|**A Backbone for Long-Horizon Robot Task Understanding**|Xiaoshuai Chen et.al.|[2408.01334](http://arxiv.org/abs/2408.01334)|null|\u4e3a\u4e86\u5e94\u5bf9\u957f\u65f6\u7a0b\u4efb\u52a1\u4e2d\u7aef\u5230\u7aef\u673a\u5668\u4eba\u5b66\u4e60\u7684\u4e0d\u53ef\u9884\u6d4b\u6027\u4e0e\u6cdb\u5316\u80fd\u529b\u5dee\u7684\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8eTherblig\u7684\u9aa8\u67b6\u6846\u67b6\uff08TBBF\uff09\uff0c\u65e8\u5728\u589e\u5f3a\u673a\u5668\u4eba\u4efb\u52a1\u7406\u89e3\u4e0e\u8f6c\u79fb\u80fd\u529b\u3002\u6b64\u6846\u67b6\u5229\u7528Therblig\uff08\u57fa\u672c\u52a8\u4f5c\u5143\u7d20\uff09\u4f5c\u4e3a\u9aa8\u67b6\uff0c\u5c06\u9ad8\u7ea7\u673a\u5668\u4eba\u4efb\u52a1\u5206\u89e3\u4e3a\u57fa\u672c\u673a\u5668\u4eba\u914d\u7f6e\uff0c\u7136\u540e\u7ed3\u5408\u5f53\u524d\u7684\u57fa\u7840\u6a21\u578b\u6765\u63d0\u5347\u4efb\u52a1\u7406\u89e3\u3002 \u8be5\u65b9\u6cd5\u5305\u542b\u4e24\u4e2a\u9636\u6bb5\uff1a\u79bb\u7ebf\u8bad\u7ec3\u4e0e\u5728\u7ebf\u6d4b\u8bd5\u3002\u5728\u79bb\u7ebf\u8bad\u7ec3\u9636\u6bb5\uff0c\u6211\u4eec\u5f00\u53d1\u4e86Meta-RGate SynerFusion\uff08MGSF\uff09\u7f51\u7edc\uff0c\u7528\u4e8e\u8de8\u4efb\u52a1\u7cbe\u786e\u7684Therblig\u5206\u5272\u3002\u5728\u7ebf\u6d4b\u8bd5\u9636\u6bb5\uff0c\u901a\u8fc7\u6536\u96c6\u65b0\u4efb\u52a1\u7684\u4e00\u6b21\u6f14\u793a\uff0cMGSF\u7f51\u7edc\u63d0\u53d6\u9ad8\u9636\u77e5\u8bc6\uff0c\u5e76\u901a\u8fc7Action Registration\uff08ActionREG\uff09\u7f16\u7801\u5165\u56fe\u50cf\u3002\u6b64\u5916\uff0c\u6211\u4eec\u91c7\u7528Large Language Model\uff08LLM\uff09-Alignment Policy for Visual 
Correction\uff08LAP-VC\uff09\u6765\u786e\u4fdd\u7cbe\u786e\u7684\u52a8\u4f5c\u6267\u884c\uff0c\u4ece\u800c\u5728\u65b0\u578b\u673a\u5668\u4eba\u573a\u666f\u4e2d\u5b9e\u73b0\u8f68\u8ff9\u8f6c\u79fb\u3002 \u5b9e\u9a8c\u7ed3\u679c\u8bc1\u5b9e\u4e86\u8fd9\u4e9b\u65b9\u6cd5\u7684\u6709\u6548\u6027\uff0cTherblig\u5206\u5272\u8fbe\u5230\u4e8694.37%\u7684\u53ec\u56de\u7387\uff0c\u5728\u771f\u5b9e\u4e16\u754c\u4e2d\u7684\u5728\u7ebf\u673a\u5668\u4eba\u6d4b\u8bd5\u4e2d\uff0c\u5bf9\u4e8e\u7b80\u5355\u548c\u590d\u6742\u573a\u666f\u7684\u6210\u529f\u7387\u5206\u522b\u8fbe\u5230\u4e8694.4%\u548c80%\u3002\u8865\u5145\u6750\u6599\u53ef\u5728\u4ee5\u4e0b\u7f51\u7ad9\u83b7\u53d6\uff1ahttps://sites.google.com/view/therbligsbasedbackbone/home|\n", "2408.02651": "|**2024-08-05**|**Can Reinforcement Learning Unlock the Hidden Dangers in Aligned Large Language Models?**|Mohammad Bahrami Karkevandi et.al.|[2408.02651](http://arxiv.org/abs/2408.02651)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u81ea\u7136\u8bed\u8a00\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u4ee4\u4eba\u5370\u8c61\u6df1\u523b\u7684\u80fd\u529b\uff0c\u4f46\u5b83\u4eec\u7684\u5b89\u5168\u6027\u548c\u9053\u5fb7\u6027\u4ecd\u7136\u5b58\u5728\u4e89\u8bae\uff0c\u56e0\u4e3a\u5b83\u4eec\u7684\u8bad\u7ec3\u57fa\u4e8e\u4e92\u8054\u7f51\u6587\u672c\u8bed\u6599\u5e93\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u62c5\u5fe7\uff0c\u5df2\u7ecf\u5f00\u53d1\u4e86\u5bf9\u9f50\u6280\u672f\u6765\u63d0\u9ad8\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u516c\u5171\u53ef\u7528\u6027\u548c\u5b89\u5168\u6027\u3002\u7136\u800c\uff0c\u901a\u8fc7\u8fd9\u4e9b\u6a21\u578b\u751f\u6210\u6709\u5bb3\u5185\u5bb9\u7684\u53ef\u80fd\u6027\u4f3c\u4e4e\u4ecd\u7136\u5b58\u5728\u3002\u672c\u6587\u63a2\u8ba8\u4e86\u201c\u53cd\u5411\u5bf9\u9f50\u201dLLM\u7684\u6982\u5ff5\u2014\u2014\u5229\u7528\u5bf9\u6297\u89e6\u53d1\u5668\u9006\u8f6c\u5176\u5bf9\u9f50\u8fc7\u7a0b\u3002\u5148\u524d\u7684\u65b9\u6cd5\uff0c\u5982\u8f6f\u5d4c\u5165\u63d0\u793a\u3001\u624b\u52
a8\u6784\u5efa\u7684\u63d0\u793a\u548c\u57fa\u4e8e\u68af\u5ea6\u7684\u81ea\u52a8\u63d0\u793a\uff0c\u5728\u9ed1\u76d2\u6a21\u578b\u4e0a\u7531\u4e8e\u9700\u8981\u8bbf\u95ee\u6a21\u578b\u548c\u4ea7\u751f\u6709\u9650\u7684\u624b\u52a8\u6784\u5efa\u63d0\u793a\u7684\u9700\u6c42\u800c\u53d6\u5f97\u4e86\u6709\u9650\u7684\u6210\u529f\uff0c\u8fd9\u4f7f\u5f97\u5b83\u4eec\u5bb9\u6613\u88ab\u963b\u65ad\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u65b9\u6cd5\uff0c\u4f7f\u7528\u5f3a\u5316\u5b66\u4e60\u4f18\u5316\u5bf9\u6297\u89e6\u53d1\u5668\uff0c\u4ec5\u9700\u5bf9\u76ee\u6807\u6a21\u578b\u8fdb\u884c\u63a8\u7406API\u8bbf\u95ee\u4ee5\u53ca\u4e00\u4e2a\u5c0f\u578b\u4ee3\u7406\u6a21\u578b\u5373\u53ef\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5229\u7528BERTScore\u4e3a\u57fa\u7840\u7684\u5956\u52b1\u51fd\u6570\uff0c\u589e\u5f3a\u4e86\u5bf9\u6297\u89e6\u53d1\u5668\u5728\u65b0\u9ed1\u76d2\u6a21\u578b\u4e0a\u7684\u53ef\u79fb\u690d\u6027\u548c\u6709\u6548\u6027\u3002\u6211\u4eec\u5c55\u793a\u4e86\u8fd9\u79cd\u65b9\u6cd5\u5982\u4f55\u5728\u672a\u6d4b\u8bd5\u7684\u8bed\u8a00\u6a21\u578b\u4e0a\u63d0\u9ad8\u4e86\u5bf9\u6297\u89e6\u53d1\u5668\u7684\u8868\u73b0\u3002|\n", "2408.02632": "|**2024-08-05**|**SEAS: Self-Evolving Adversarial Safety Optimization for Large Language Models**|Muxi Diao 
et.al.|[2408.02632](http://arxiv.org/abs/2408.02632)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u80fd\u529b\u4e0e\u5f71\u54cd\u529b\u7684\u6301\u7eed\u589e\u5f3a\uff0c\u786e\u4fdd\u5176\u5b89\u5168\u6027\u548c\u9884\u9632\u6709\u5bb3\u8f93\u51fa\u53d8\u5f97\u81f3\u5173\u91cd\u8981\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e9b\u5173\u5207\uff0c\u4e00\u79cd\u6709\u524d\u666f\u7684\u65b9\u6cd5\u662f\u8bad\u7ec3\u6a21\u578b\u81ea\u52a8\u751f\u6210\u5bf9\u6297\u6027\u63d0\u793a\u8fdb\u884c\u7ea2\u961f\u6d4b\u8bd5\u3002\u7136\u800c\uff0cLLM\u4e2d\u6f0f\u6d1e\u7684\u4e0d\u65ad\u6f14\u53d8\u4f7f\u5f97\u5f53\u524d\u7684\u5bf9\u6297\u65b9\u6cd5\u5728\u5177\u4f53\u9488\u5bf9\u548c\u63a2\u7d22\u8fd9\u4e9b\u6a21\u578b\u5f31\u70b9\u65b9\u9762\u663e\u5f97\u529b\u4e0d\u4ece\u5fc3\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u6311\u6218\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u201c\u81ea\u6211\u6f14\u5316\u5b89\u5168\u4f18\u5316\u201d\uff08SEAS\uff09\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u901a\u8fc7\u5229\u7528\u6a21\u578b\u81ea\u8eab\u751f\u6210\u7684\u6570\u636e\u6765\u589e\u5f3a\u5b89\u5168\u6027\u3002SEAS\u8fd0\u4f5c\u4e8e\u4e09\u4e2a\u8fed\u4ee3\u9636\u6bb5\uff1a\u521d\u59cb\u5316\u3001\u653b\u51fb\u548c\u5bf9\u6297\u4f18\u5316\uff0c\u65e8\u5728\u540c\u65f6\u63d0\u5347\u7ea2\u961f\u548c\u76ee\u6807\u6a21\u578b\u7684\u7a33\u5065\u6027\u548c\u5b89\u5168\u6027\u3002 
\u8be5\u6846\u67b6\u51cf\u5c11\u4e86\u5bf9\u4eba\u5de5\u6d4b\u8bd5\u7684\u4f9d\u8d56\uff0c\u5e76\u663e\u8457\u589e\u5f3a\u4e86LLM\u7684\u5b89\u5168\u6027\u80fd\u529b\u3002\u6211\u4eec\u7684\u8d21\u732e\u5305\u62ec\u4e00\u4e2a\u65b0\u9896\u7684\u5bf9\u6297\u6027\u6846\u67b6\u3001\u4e00\u4e2a\u5168\u9762\u7684\u5b89\u5168\u6570\u636e\u96c6\u4ee5\u53ca\u7ecf\u8fc7\u4e09\u6b21\u8fed\u4ee3\u540e\uff0c\u76ee\u6807\u6a21\u578b\u7684\u5b89\u5168\u6c34\u5e73\u8fbe\u5230\u4e86\u4e0eGPT-4\u76f8\u5f53\u7684\u6c34\u5e73\uff0c\u800c\u7ea2\u961f\u6a21\u578b\u5728\u5bf9\u6297\u9ad8\u7ea7\u6a21\u578b\u65f6\u7684\u6210\u529f\u7387\uff08ASR\uff09\u6709\u4e86\u663e\u8457\u63d0\u9ad8\u3002|\n", "2408.02599": "|**2024-08-05**|**Progressively Selective Label Enhancement for Language Model Alignment**|Biao Liu et.al.|[2408.02599](http://arxiv.org/abs/2408.02599)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u5404\u79cd\u8bed\u8a00\u4efb\u52a1\u4e0a\u5c55\u73b0\u51fa\u4ee4\u4eba\u5370\u8c61\u6df1\u523b\u7684\u80fd\u529b\uff0c\u4f46\u53ef\u80fd\u4f1a\u751f\u6210\u4e0e\u4eba\u7c7b\u9884\u671f\u4e0d\u7b26\u7684\u5185\u5bb9\uff0c\u4ece\u800c\u5f15\u53d1\u4f26\u7406\u548c\u6cd5\u5f8b\u95ee\u9898\u3002\u56e0\u6b64\uff0c\u63a2\u7d22\u8fd9\u4e9b\u6a21\u578b\u7684\u5c40\u9650\u6027\u5e76\u5b9e\u65bd\u9650\u5236\u4ee5\u786e\u4fdd\u5b89\u5168\u6027\u548c\u5408\u89c4\u6027\u53d8\u5f97\u81f3\u5173\u91cd\u8981\uff0c\u5176\u4e2d\u5f3a\u5316\u5b66\u4e60\u4ece\u4eba\u7c7b\u53cd\u9988\uff08RLHF\uff09\u662f\u4e3b\u8981\u65b9\u6cd5\u3002\u7136\u800c\uff0c\u7531\u4e8eRLHF\u9636\u6bb5\u5728\u7a33\u5b9a\u6027\u548c\u53ef\u6269\u5c55\u6027\u65b9\u9762\u9762\u4e34\u7684\u6311\u6218\uff0c\u7814\u7a76\u4eba\u5458\u6b63\u5728\u63a2\u7d22\u5176\u4ed6\u65b9\u6cd5\u6765\u5b9e\u73b0\u4e0eRLHF\u7c7b\u4f3c\u7684\u6548\u679c\u3002\u8fd9\u4e9b\u65b9\u6cd5\u5f80\u5f80\u4f9d\u8d56\u4e8e\u5927\u91cf\u9ad8\u8d28\u91cf\u7684\u6570\u636e\u96c6\uff0c\u5e76\u4e14\u4f4e\u6548\u5730\u5229\u7528\u751f\u6210\u7684\u6570\u636e\u3002\u4e
3a\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aPSLE\uff08Progressively Selective Label Enhancement for Language Model Alignment\uff09\u7684\u6846\u67b6\uff0c\u5b83\u5145\u5206\u5229\u7528\u6240\u6709\u751f\u6210\u6570\u636e\uff0c\u901a\u8fc7\u6307\u5bfc\u6a21\u578b\u9075\u5faa\u539f\u5219\u6765\u4f7f\u8f93\u51fa\u4e0e\u4eba\u7c7b\u671f\u671b\u4fdd\u6301\u4e00\u81f4\u3002\u901a\u8fc7\u52a8\u6001\u66f4\u65b0\u9608\u503c\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u786e\u4fdd\u4e86\u9ad8\u6548\u7684\u6570\u636e\u5229\u7528\uff0c\u901a\u8fc7\u6574\u5408\u6240\u6709\u751f\u6210\u54cd\u5e94\u5e76\u6839\u636e\u5176\u76f8\u5e94\u7684\u5956\u52b1\u5206\u6570\u5bf9\u5b83\u4eec\u8fdb\u884c\u52a0\u6743\u3002\u5728\u591a\u4e2a\u6570\u636e\u96c6\u4e0a\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cPSLE\u5728\u73b0\u6709\u8bed\u8a00\u6a21\u578b\u5bf9\u9f50\u65b9\u6cd5\u4e2d\u8868\u73b0\u51fa\u6709\u6548\u6027\u3002|\n", "2408.02584": "|**2024-08-05**|**Leveraging the Power of LLMs: A Fine-Tuning Approach for High-Quality Aspect-Based Summarization**|Ankan Mullick 
et.al.|[2408.02584](http://arxiv.org/abs/2408.02584)|null|\u968f\u7740\u6570\u5b57\u4fe1\u606f\u91cf\u7684\u6301\u7eed\u589e\u957f\uff0c\u7528\u6237\u9700\u8981\u6709\u6548\u65b9\u6cd5\u4ece\u957f\u7bc7\u6587\u6863\u4e2d\u63d0\u53d6\u5173\u952e\u89c1\u89e3\u3002\u9762\u5411\u65b9\u9762\u7684\u603b\u7ed3\u63d0\u4f9b\u4e86\u4e00\u79cd\u6709\u9488\u5bf9\u6027\u7684\u65b9\u6cd5\uff0c\u751f\u6210\u4e13\u6ce8\u4e8e\u6587\u6863\u5185\u7279\u5b9a\u65b9\u9762\u7684\u5c0f\u7ed3\u3002\u5c3d\u7ba1\u5728\u9762\u5411\u65b9\u9762\u7684\u603b\u7ed3\u7814\u7a76\u9886\u57df\u53d6\u5f97\u4e86\u8fdb\u5c55\uff0c\u4f46\u63d0\u9ad8\u6a21\u578b\u6027\u80fd\u7684\u6301\u7eed\u8ffd\u6c42\u662f\u5fc5\u8981\u7684\u3002\u9274\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\u4e2d\u7684\u6f5c\u529b\uff0c\u7279\u522b\u662f\u5728\u603b\u7ed3\u95ee\u9898\u4e0a\uff0c\u672c\u6587\u63a2\u8ba8\u4e86\u5bf9LLMs\u8fdb\u884c\u5fae\u8c03\u4ee5\u6267\u884c\u9762\u5411\u65b9\u9762\u7684\u603b\u7ed3\u4efb\u52a1\u7684\u53ef\u80fd\u6027\u3002\u6211\u4eec\u8bc4\u4f30\u4e86\u5f00\u6e90\u57fa\u7840LLMs\uff0c\u5305\u62ecLlama2\u3001Mistral\u3001Gemma\u548cAya\uff0c\u5bf9\u4e8e\u516c\u5f00\u53ef\u7528\u7684\u7279\u5b9a\u9886\u57df\u9762\u5411\u65b9\u9762\u7684\u603b\u7ed3\u6570\u636e\u96c6\u7684\u5f71\u54cd\u3002\u6211\u4eec\u7684\u5047\u8bbe\u662f\uff0c\u8fd9\u79cd\u65b9\u6cd5\u80fd\u591f\u8ba9\u8fd9\u4e9b\u6a21\u578b\u6709\u6548\u5730\u8bc6\u522b\u5e76\u63d0\u53d6\u4e0e\u65b9\u9762\u76f8\u5173\u7684\u4fe1\u606f\uff0c\u4ece\u800c\u4ea7\u751f\u4e0e\u6700\u5148\u8fdb\u7684\u65b9\u6cd5\u76f8\u6bd4\u66f4\u9ad8\u8d28\u91cf\u7684\u9762\u5411\u65b9\u9762\u7684\u603b\u7ed3\u3002\u6211\u4eec\u5efa\u7acb\u4e86\u4e00\u4e2a\u5168\u9762\u7684\u8bc4\u4f30\u6846\u67b6\uff0c\u5c06\u5fae\u8c03\u540e\u7684LLMs\u7684\u6027\u80fd\u4e0e\u7ade\u4e89\u6027\u7684\u9762\u5411\u65b9\u9762\u7684\u603b\u7ed3\u65b9\u6cd5\u4ee5\u53ca\u5fae\u8c03\u524dLLMs\u7684\u539f\u59cb\u7248\u672
c\u8fdb\u884c\u6bd4\u8f83\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u901a\u8fc7\u8bc1\u660e\u5bf9LLMs\u8fdb\u884c\u5fae\u8c03\u53ef\u4ee5\u751f\u6210\u9ad8\u8d28\u91cf\u7684\u9762\u5411\u65b9\u9762\u7684\u603b\u7ed3\uff0c\u4e3a\u9762\u5411\u65b9\u9762\u7684\u603b\u7ed3\u9886\u57df\u505a\u51fa\u4e86\u8d21\u732e\u3002\u6b64\u5916\uff0c\u5b83\u4e3a\u5728\u4e0d\u540cNLP\u9886\u57df\u8fdb\u4e00\u6b65\u63a2\u7d22\u4f7f\u7528LLMs\u8fdb\u884c\u76ee\u6807\u4fe1\u606f\u62bd\u53d6\u4efb\u52a1\u6253\u5f00\u4e86\u5927\u95e8\u3002|\n", "2408.02559": "|**2024-08-05**|**Evaluating and Enhancing LLMs Agent based on Theory of Mind in Guandan: A Multi-Player Cooperative Game under Imperfect Information**|Yauwai Yim et.al.|[2408.02559](http://arxiv.org/abs/2408.02559)|null|\u672c\u6587\u63a2\u8ba8\u4e86\u5f00\u6e90\u4e0eAPI\u9a71\u52a8\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u590d\u6742\u3001\u4e0d\u5b8c\u5168\u4fe1\u606f\u73af\u5883\u4e0b\u7684\u6587\u672c\u6e38\u620f\u534f\u4f5c\u80fd\u529b\uff0c\u7279\u522b\u662f\u5728\u975e\u82f1\u8bed\u73af\u5883\u4e2d\u7684\u5e94\u7528\u6f5c\u529b\u3002\u7814\u7a76\u5bf9\u6bd4\u4e86\u8fd9\u4e9b\u6a21\u578b\u4e0e\u5176\u4ed6\u7c7b\u578b\u4ee3\u7406\u7684\u6027\u80fd\uff0c\u5e76\u4f7f\u7528\u7406\u8bba\u601d\u7ef4\uff08Theory of Mind, 
ToM\uff09\u89c4\u5212\u6280\u672f\u6765\u8bc4\u4f30\u5b83\u4eec\u5728\u9700\u8981\u591a\u667a\u80fd\u4f53\u534f\u4f5c\u7684\u4e0d\u5b8c\u5168\u4fe1\u606f\u6e38\u620f\u4e2d\u8868\u73b0\u7684\u80fd\u529b\u3002\u901a\u8fc7\u5f15\u5165\u5916\u90e8\u5de5\u5177\u6765\u89e3\u51b3\u6b64\u5361\u724c\u6e38\u620f\u4e2d\u52a8\u6001\u4e14\u5e9e\u5927\u7684\u884c\u52a8\u7a7a\u95f4\u95ee\u9898\uff0c\u6211\u4eec\u7684\u7ed3\u679c\u63ed\u793a\u4e86\u5f53\u524d\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u9762\u5bf9\u9ad8\u7ea7\u522b\u4efb\u52a1\u65f6\u4e0e\u5f3a\u5316\u5b66\u4e60\u6a21\u578b\u4e4b\u95f4\u7684\u6027\u80fd\u5dee\u8ddd\u3002\u5c3d\u7ba1\u5b58\u5728\u8fd9\u4e00\u5dee\u8ddd\uff0c\u4f46\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5c55\u73b0\u4e86\u5728\u6e38\u620f\u573a\u666f\u4e0b\u7684\u7406\u8bba\u601d\u7ef4\u80fd\u529b\uff0c\u80fd\u591f\u7406\u89e3\u76df\u53cb\u548c\u5bf9\u624b\u7684\u884c\u4e3a\uff0c\u5e76\u4e0e\u76df\u53cb\u5efa\u7acb\u534f\u4f5c\u5173\u7cfb\uff0c\u4ece\u800c\u6301\u7eed\u63d0\u5347\u5176\u6027\u80fd\u3002\u4e3a\u4e86\u4fc3\u8fdb\u8fdb\u4e00\u6b65\u7684\u7814\u7a76\u4e0e\u7406\u89e3\uff0c\u6211\u4eec\u5df2\u516c\u5f00\u4e86\u4ee3\u7801\u5e93\u3002|\n", "2408.02549": "|**2024-08-05**|**Generative AI as a Service in 6G Edge-Cloud: Generation Task Offloading by In-context Learning**|Hao Zhou 
et.al.|[2408.02549](http://arxiv.org/abs/2408.02549)|null|\u672c\u6587\u63a2\u8ba8\u4e86\u57286G\u7f51\u7edc\u4e2d\u90e8\u7f72\u57fa\u7840\u6a21\u578b\u7684\u521b\u65b0\u8fb9\u7f18-\u4e91\u67b6\u6784\u3002\u5177\u4f53\u76ee\u6807\u662f\u901a\u8fc7\u65e0\u7ebf\u7535\u8d44\u6e90\u5206\u914d\u548c\u4efb\u52a1\u5378\u8f7d\u6765\u6700\u5c0f\u5316\u57fa\u7840\u6a21\u578b\u7684\u670d\u52a1\u5ef6\u8fdf\u3002\u4e3b\u8981\u5206\u4e3a\u4e09\u90e8\u5206\uff1a\u9996\u5148\uff0c\u4ecb\u7ecd\u901a\u4fe1\u7cfb\u7edf\u6a21\u578b\uff0c\u5373\u5206\u914d\u65e0\u7ebf\u7535\u8d44\u6e90\u5e76\u8ba1\u7b97\u652f\u6301\u751f\u6210\u5185\u5bb9\u4f20\u8f93\u7684\u94fe\u8def\u5bb9\u91cf\uff1b\u5176\u6b21\uff0c\u5c55\u793a\u57fa\u7840\u6a21\u578b\u63a8\u7406\u6a21\u578b\uff0c\u7528\u4e8e\u8ba1\u7b97\u5185\u5bb9\u751f\u6210\u7684\u5ef6\u8fdf\uff1b\u6700\u540e\uff0c\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u4e0a\u4e0b\u6587\u5b66\u4e60\u65b9\u6cd5\u6765\u4f18\u5316\u4efb\u52a1\u5378\u8f7d\u51b3\u7b56\u3002\u8be5\u65b9\u6cd5\u5229\u7528\u57fa\u7840\u6a21\u578b\u7684\u63a8\u7406\u80fd\u529b\uff0c\u907f\u514d\u4e86\u4f20\u7edf\u673a\u5668\u5b66\u4e60\u7b97\u6cd5\u4e2d\u9700\u8981\u4e13\u95e8\u6a21\u578b\u8bad\u7ec3\u6216\u5fae\u8c03\u7684\u56f0\u96be\u3002\u4eff\u771f\u7ed3\u679c\u8868\u660e\uff0c\u63d0\u51fa\u7684\u8fb9\u7f18-\u4e91\u90e8\u7f72\u4e0e\u4e0a\u4e0b\u6587\u5b66\u4e60\u4efb\u52a1\u5378\u8f7d\u65b9\u6cd5\u53ef\u4ee5\u5728\u65e0\u9700\u4e13\u95e8\u6a21\u578b\u8bad\u7ec3\u6216\u5fae\u8c03\u7684\u60c5\u51b5\u4e0b\uff0c\u5b9e\u73b0\u6ee1\u610f\u7684\u751f\u6210\u670d\u52a1\u8d28\u91cf\u3002|\n", "2408.02545": "|**2024-08-05**|**RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation**|Daniel Fleischer 
et.al.|[2408.02545](http://arxiv.org/abs/2408.02545)|**[link](https://github.com/intellabs/ragfoundry)**|\u5b9e\u65bd\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u7cfb\u7edf\u56fa\u6709\u5730\u590d\u6742\uff0c\u9700\u8981\u6df1\u5165\u4e86\u89e3\u6570\u636e\u3001\u5e94\u7528\u573a\u666f\u4ee5\u53ca\u7ec6\u81f4\u7684\u8bbe\u8ba1\u51b3\u7b56\u3002\u6b64\u5916\uff0c\u8bc4\u4f30\u8fd9\u4e9b\u7cfb\u7edf\u5e26\u6765\u4e86\u91cd\u5927\u6311\u6218\uff0c\u9700\u8981\u901a\u8fc7\u591a\u7ef4\u5ea6\u7684\u65b9\u6cd5\u8bc4\u4f30\u68c0\u7d22\u51c6\u786e\u6027\u548c\u751f\u6210\u8d28\u91cf\u3002\u6211\u4eec\u5f15\u5165\u4e86RAG Foundry\uff0c\u8fd9\u662f\u4e00\u4e2a\u5f00\u6e90\u6846\u67b6\uff0c\u7528\u4e8e\u5728RAG\u573a\u666f\u4e2d\u589e\u5f3a\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u6570\u636e\u3002RAG Foundry\u5c06\u6570\u636e\u521b\u5efa\u3001\u8bad\u7ec3\u3001\u63a8\u7406\u548c\u8bc4\u4f30\u6574\u5408\u5230\u4e00\u4e2a\u5de5\u4f5c\u6d41\u7a0b\u4e2d\uff0c\u4ece\u800c\u4e3a\u5728RAG\u8bbe\u7f6e\u4e2d\u8bad\u7ec3\u548c\u8bc4\u4f30\u5927\u578b\u8bed\u8a00\u6a21\u578b\u521b\u5efa\u6570\u636e\u589e\u5f3a\u96c6\u63d0\u4f9b\u4e86\u4fbf\u5229\u3002\u8fd9\u79cd\u6574\u5408\u4f7f\u5f97\u5feb\u901f\u539f\u578b\u8bbe\u8ba1\u548cRAG\u6280\u672f\u7684\u5b9e\u9a8c\u53d8\u5f97\u5bb9\u6613\uff0c\u5141\u8bb8\u7528\u6237\u8f7b\u677e\u751f\u6210\u6570\u636e\u96c6\u5e76\u4f7f\u7528\u5185\u90e8\u6216\u4e13\u95e8\u7684\u77e5\u8bc6\u6e90\u8bad\u7ec3RAG\u6a21\u578b\u3002\u6211\u4eec\u901a\u8fc7\u4f7f\u7528\u591a\u79cdRAG\u914d\u7f6e\u5bf9Llama-3\u548cPhi-3\u6a21\u578b\u8fdb\u884c\u589e\u5f3a\u548c\u5fae\u8c03\uff0c\u5728\u4e09\u4e2a\u77e5\u8bc6\u5bc6\u96c6\u578b\u6570\u636e\u96c6\u4e0a\u5c55\u793a\u4e86\u6301\u7eed\u6539\u8fdb\u7684\u6709\u6548\u6027\u3002\u4ee3\u7801\u4f5c\u4e3a\u5f00\u6e90\u53d1\u5e03\u5728https://github.com/IntelLabs/RAGFoundry\u3002|\n", "2408.02544": "|**2024-08-05**|**Caution for the Environment: Multimodal Agents are Susceptible to Environmental Distractions**|Xinbei 
Ma et.al.|[2408.02544](http://arxiv.org/abs/2408.02544)|**[link](https://github.com/xbmxb/EnvDistraction)**|\u672c\u6587\u63a2\u8ba8\u4e86\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\u4ee3\u7406\u5728\u56fe\u5f62\u7528\u6237\u754c\u9762\uff08GUI\uff09\u73af\u5883\u4e2d\u7684\u5fe0\u8bda\u5ea6\u95ee\u9898\uff0c\u65e8\u5728\u89e3\u51b3\u4ee5\u4e0b\u7814\u7a76\u95ee\u9898\uff1a\u591a\u6a21\u6001GUI\u4ee3\u7406\u662f\u5426\u53ef\u80fd\u88ab\u73af\u5883\u80cc\u666f\u5206\u6563\u6ce8\u610f\u529b\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u901a\u7528\u8bbe\u7f6e\uff0c\u5176\u4e2d\u7528\u6237\u548c\u4ee3\u7406\u5747\u4e3a\u5584\u610f\u89d2\u8272\uff0c\u800c\u73af\u5883\u867d\u975e\u6076\u610f\uff0c\u4f46\u5305\u542b\u4e0e\u4efb\u52a1\u65e0\u5173\u7684\u5185\u5bb9\u3002\u901a\u8fc7\u6211\u4eec\u7684\u6a21\u62df\u6570\u636e\u96c6\uff0c\u5bf9\u591a\u79cdMLLM\u4f5c\u4e3aGUI\u4ee3\u7406\u8fdb\u884c\u8bc4\u4f30\uff0c\u6309\u7167\u4e09\u79cd\u4e0d\u540c\u7684\u5de5\u4f5c\u6a21\u5f0f\uff0c\u5373\u5177\u6709\u4e0d\u540c\u7a0b\u5ea6\u611f\u77e5\u80fd\u529b\u7684\u6a21\u5f0f\u8fdb\u884c\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u5373\u4fbf\u662f\u6700\u5f3a\u5927\u7684\u6a21\u578b\uff0c\u65e0\u8bba\u662f\u901a\u7528\u578b\u4ee3\u7406\u8fd8\u662f\u4e13\u95e8\u7528\u4e8eGUI\u7684\u4ee3\u7406\uff0c\u90fd\u5bb9\u6613\u53d7\u5230\u5e72\u6270\u3002\u867d\u7136\u8fd1\u671f\u7684\u7814\u7a76\u4e3b\u8981\u5173\u6ce8\u591a\u6a21\u6001\u4ee3\u7406\u7684\u52a8\u4f5c\u51c6\u786e\u6027\uff08\u5373\u5e2e\u52a9\u6027\uff09\uff0c\u4f46\u6211\u4eec\u7684\u53d1\u73b0\u63ed\u793a\u4e86\u8fd9\u4e9b\u4ee3\u7406\u5728\u9762\u5bf9\u73af\u5883\u5e72\u6270\u65f6\u8868\u73b0\u51fa\u4e0d\u5fe0\u884c\u4e3a\u7684\u53ef\u80fd\u6027\u3002\u6b64\u5916\uff0c\u6211\u4eec\u4ece\u5bf9\u6297\u6027\u89c6\u89d2\u51fa\u53d1\uff0c\u5b9e\u65bd\u73af\u5883\u6ce8\u5165\u7b56\u7565\uff0c\u5c55\u793a\u51fa\u5229\u7528\u8fd9\u79cd\u4e0d\u5fe0\u884c\u4e3a\u53ef\u80fd\u5bfc\u81f4\u7684\u610f\u5916
\u98ce\u9669\u3002|\n", "2408.02535": "|**2024-08-05**|**Towards Coarse-grained Visual Language Navigation Task Planning Enhanced by Event Knowledge Graph**|Zhao Kaichen et.al.|[2408.02535](http://arxiv.org/abs/2408.02535)|null|\u89c6\u89c9\u8bed\u8a00\u5bfc\u822a\uff08VLN\uff09\u662f\u667a\u80fd\u4f53\u9886\u57df\u7684\u91cd\u8981\u7814\u7a76\u4e4b\u4e00\uff0c\u65e8\u5728\u4f7f\u667a\u80fd\u4f53\u7406\u89e3\u5468\u56f4\u73af\u5883\u5e76\u5b8c\u6210\u5bfc\u822a\u4efb\u52a1\u3002\u5728VLN\u4efb\u52a1\u4e2d\uff0c\u6307\u4ee4\u53ef\u4ee5\u5206\u4e3a\u7c97\u7c92\u5ea6\u548c\u7ec6\u7c92\u5ea6\u4e24\u79cd\u7c7b\u578b\u3002\u7ec6\u7c92\u5ea6\u6307\u4ee4\u8be6\u7ec6\u63cf\u8ff0\u4e86\u6574\u4e2a\u4efb\u52a1\u7684\u6b65\u9aa4\uff0c\u800c\u7c97\u7c92\u5ea6\u6307\u4ee4\u5219\u63d0\u4f9b\u4e86\u4e00\u4e2a\u62bd\u8c61\u7684\u4efb\u52a1\u63cf\u8ff0\uff0c\u66f4\u9002\u5408\u4eba\u7c7b\u7684\u4e60\u60ef\u3002\u73b0\u6709\u7684\u5927\u90e8\u5206\u5de5\u4f5c\u90fd\u96c6\u4e2d\u5728\u5bf9\u7ec6\u7c92\u5ea6\u6307\u4ee4\u7684\u7814\u7a76\u4e0a\uff0c\u5ffd\u89c6\u4e86\u65e5\u5e38\u751f\u6d3b\u4e2d\u5b58\u5728\u7684\u62bd\u8c61\u6307\u4ee4\u3002\u4e3a\u4e86\u514b\u670d\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u5c1d\u8bd5\u901a\u8fc7\u4e8b\u4ef6\u77e5\u8bc6\u589e\u5f3a\u7684\u65b9\u5f0f\u8003\u8651VLN\u4e2d\u7684\u7c97\u7c92\u5ea6\u6307\u4ee4\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u9996\u5148\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u63d0\u793a\u7684\u65b9\u6cd5\u6765\u6574\u5408\u591a\u4e2a\u4e3b\u6d41\u57fa\u51c6\u6570\u636e\u96c6\uff0c\u5f62\u6210\u4e00\u4e2a\u5168\u9762\u7684\u4e8b\u4ef6\u77e5\u8bc6\u56fe\u8c31\uff08\u547d\u540d\u4e3aVLN-EventKG\uff09\u3002\u901a\u8fc7\u5c0f\u89c4\u6a21\u548c\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\u7684\u5408\u4f5c\uff0c\u6211\u4eec\u5b9e\u73b0\u4e86\u80fd\u591f\u5904\u7406\u7c97\u7c92\u5ea6\u6307\u4ee4\u8f93\u5165\u7684\u4e8b\u4ef6\u5bfc\u822a\uff08EventNav\uff09\u65b9\u6cd5\uff0c\u7528\u4e8eVLN\u4efb\u52a1\u4e2d\u7684\u5bfc\u822a\u89c4\
u5212\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u65b0\u9896\u7684\u52a8\u6001\u5386\u53f2\u56de\u6eaf\u6a21\u5757\uff0c\u80fd\u591f\u5728\u5b9e\u65f6\u4e2d\u7ea0\u6b63\u6f5c\u5728\u7684\u9519\u8bef\u52a8\u4f5c\u89c4\u5212\u3002\u5728\u5404\u79cd\u516c\u5171\u57fa\u51c6\u4e0a\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u4f7f\u7528\u6211\u4eec\u63d0\u51fa\u7684VLN-EventKG\u7684\u77e5\u8bc6\u589e\u5f3a\u65b9\u6cd5\uff0c\u5728\u4f7f\u7528\u7c97\u7c92\u5ea6\u6307\u4ee4\u7684VLN\u4efb\u52a1\u4e2d\u5177\u6709\u8d85\u8fc75%\u7684\u6210\u529f\u7387\u4f18\u52bf\u3002\u6211\u4eec\u7684\u9879\u76ee\u53ef\u4ee5\u5728 \u4e0a\u8bbf\u95ee\u3002|\n", "2408.02509": "|**2024-08-05**|**Practical Attacks against Black-box Code Completion Engines**|Slobodan Jenko et.al.|[2408.02509](http://arxiv.org/abs/2408.02509)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aINSEC\u7684\u65b0\u578b\u653b\u51fb\u65b9\u6cd5\uff0c\u65e8\u5728\u5f15\u5bfc\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u4ee3\u7801\u8865\u5168\u5f15\u64ce\u751f\u6210\u5b58\u5728\u5b89\u5168\u6f0f\u6d1e\u7684\u4ee3\u7801\u3002\u8fd9\u79cd\u653b\u51fb\u65b9\u5f0f\u4e0e\u5e02\u9762\u4e0a\u5927\u591a\u6570\u5546\u4e1a\u8865\u5168\u5f15\u64ce\uff08\u5982GitHub Copilot\uff09\u76f8\u4f3c\uff0c\u4ec5\u9700\u8981\u9ed1\u76d2\u67e5\u8be2\u8bbf\u95ee\u76ee\u6807\u5f15\u64ce\uff0c\u65e0\u9700\u4e86\u89e3\u5176\u5185\u90e8\u673a\u5236\u3002\u653b\u51fb\u7b56\u7565\u901a\u8fc7\u5728\u8865\u5168\u8f93\u5165\u4e2d\u63d2\u5165\u6076\u610f\u653b\u51fb\u5b57\u7b26\u4e32\u4f5c\u4e3a\u7b80\u77ed\u6ce8\u91ca\u6765\u5b9e\u65bd\u3002\u4e3a\u4e86\u8bbe\u8ba1\u51fa\u6709\u6548\u7684\u653b\u51fb\u5b57\u7b26\u4e32\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u7cfb\u5217\u4e13\u95e8\u7684\u521d\u59cb\u5316\u65b9\u6848\uff0c\u5e76\u901a\u8fc7\u4f18\u5316\u8fc7\u7a0b\u8fdb\u4e00\u6b65\u7cbe\u70bc\u3002\u6211\u4eec\u5728\u5f00\u6e90\u6a21\u578b\u3001\u9ed1\u76d2\u5546\u4e1a\u670d\u52a1\uff08\u5982OpenAI 
API\u548cGitHub Copilot\uff09\u4ee5\u53ca\u4e94\u79cd\u7f16\u7a0b\u8bed\u8a00\u4e0b\u768416\u4e2a\u5173\u952e\u9519\u8bef\u7c7b\u522b\u4e0a\u9a8c\u8bc1\u4e86INSEC\u7684\u6709\u6548\u6027\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u4e0e\u73b0\u6709\u6280\u672f\u76f8\u6bd4\uff0cINSEC\u663e\u8457\u63d0\u9ad8\u4e86\u8003\u8651\u4e2d\u7684\u8865\u5168\u5f15\u64ce\u751f\u6210\u4e0d\u5b89\u5168\u4ee3\u7801\u7684\u53ef\u80fd\u6027\u8d85\u8fc750%\uff0c\u540c\u65f6\u4ecd\u5177\u5907\u751f\u6210\u529f\u80fd\u6b63\u786e\u4ee3\u7801\u7684\u80fd\u529b\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u653b\u51fb\u65b9\u6cd5\u8d44\u6e90\u9700\u6c42\u8f83\u4f4e\uff0c\u5f00\u53d1\u6210\u672c\u4f4e\u4e8e\u5341\u7f8e\u5143\uff0c\u53ef\u5728\u666e\u901a\u786c\u4ef6\u4e0a\u8fd0\u884c\u3002|\n", "2408.03302": "|**2024-08-06**|**TextIM: Part-aware Interactive Motion Synthesis from Text**|Siyuan Fan et.al.|[2408.03302](http://arxiv.org/abs/2408.03302)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aTextIM\u7684\u65b0\u578b\u6846\u67b6\uff0c\u65e8\u5728\u5408\u6210\u57fa\u4e8e\u6587\u672c\u9a71\u52a8\u7684\u4eba\u7c7b\u4ea4\u4e92\u52a8\u4f5c\uff0c\u5e76\u7279\u522b\u5173\u6ce8\u4e8e\u90e8\u5206\u7ea7\u8bed\u4e49\u7684\u7cbe\u786e\u5bf9\u9f50\u3002\u73b0\u6709\u65b9\u6cd5\u5f80\u5f80\u5ffd\u89c6\u4e86\u4ea4\u4e92\u8eab\u4f53\u90e8\u4f4d\u7684\u5173\u952e\u4f5c\u7528\uff0c\u5e76\u672a\u80fd\u5145\u5206\u6355\u6349\u548c\u5bf9\u9f50\u90e8\u5206\u7ea7\u8bed\u4e49\uff0c\u5bfc\u81f4\u4e86\u4e0d\u51c6\u786e\u751a\u81f3\u9519\u8bef\u7684\u52a8\u4f5c\u7ed3\u679c\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0cTextIM\u91c7\u7528\u4e86\u4e00\u4e2a\u89e3\u8026\u6761\u4ef6\u6269\u6563\u6846\u67b6\uff0c\u4ee5\u589e\u5f3a\u4ea4\u4e92\u52a8\u4f5c\u4e0e\u5bf9\u5e94\u6587\u672c\u63cf\u8ff0\u4e2d\u7684\u8bed\u4e49\u610f\u56fe\u4e4b\u95f4\u8be6\u7ec6\u7684\u5bf9\u9f50\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff0c\u4f5c\u4e3a\u4eba\u7c7b\u5927\u8111
\u7684\u89d2\u8272\uff0c\u6765\u8bc6\u522b\u4ea4\u4e92\u7684\u8eab\u4f53\u90e8\u4f4d\u5e76\u7406\u89e3\u4ea4\u4e92\u8bed\u4e49\uff0c\u4ece\u800c\u751f\u6210\u590d\u6742\u7684\u5fae\u5999\u4ea4\u4e92\u52a8\u4f5c\u3002\u5728\u7cbe\u7ec6\u52a8\u4f5c\u5f15\u5bfc\u4e0b\uff0cTextIM\u8fdb\u4e00\u6b65\u5c06\u8fd9\u4e9b\u90e8\u5206\u52a8\u4f5c\u6269\u5c55\u4e3a\u6574\u4e2a\u8eab\u4f53\u7684\u8fde\u8d2f\u52a8\u4f5c\u3002\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u7a7a\u95f4\u4e00\u81f4\u6027\u6a21\u5757\uff0c\u901a\u8fc7\u90e8\u5206\u56fe\u5377\u79ef\u7f51\u7edc\u5728\u6574\u4e2a\u8eab\u4f53\u52a8\u4f5c\u4e2d\u8865\u5145\u548c\u7ef4\u6301\u5404\u90e8\u5206\u4e4b\u95f4\u7684\u8fde\u8d2f\u6027\u548c\u548c\u8c10\u6027\u3002\u5bf9\u4e8e\u8bad\u7ec3\u548c\u8bc4\u4f30\uff0c\u6211\u4eec\u7cbe\u5fc3\u9009\u62e9\u4e86\u5e76\u91cd\u65b0\u6807\u8bb0\u4e86HUMANML3D\u4e2d\u7684\u4ea4\u4e92\u52a8\u4f5c\u6570\u636e\u96c6\uff0c\u521b\u5efa\u4e86\u4e00\u4e2a\u4e13\u95e8\u7684\u6570\u636e\u96c6\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cTextIM\u80fd\u591f\u4ea7\u751f\u8bed\u4e49\u4e0a\u51c6\u786e\u7684\u4eba\u7c7b\u4ea4\u4e92\u52a8\u4f5c\uff0c\u663e\u8457\u63d0\u9ad8\u4e86\u5728\u5404\u79cd\u573a\u666f\u4e0b\u5408\u6210\u4ea4\u4e92\u52a8\u4f5c\u7684\u771f\u5b9e\u611f\u548c\u5e94\u7528\u6027\uff0c\u5305\u62ec\u4e0e\u53ef\u53d8\u5f62\u548c\u52a8\u6001\u53d8\u5316\u7269\u4f53\u7684\u4ea4\u4e92\u3002|\n", "2408.03297": "|**2024-08-06**|**KaPO: Knowledge-aware Preference Optimization for Controllable Knowledge Selection in Retrieval-Augmented Language Models**|Ruizhe Zhang 
et.al.|[2408.03297](http://arxiv.org/abs/2408.03297)|null|\u901a\u8fc7\u6574\u5408\u5916\u90e8\u77e5\u8bc6\uff0c\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u7b56\u7565\u5df2\u6210\u4e3a\u7f13\u89e3\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u5904\u7406\u77e5\u8bc6\u5bc6\u96c6\u578b\u4efb\u52a1\u65f6\u9047\u5230\u7684\u5e7b\u89c9\u95ee\u9898\u7684\u6709\u6548\u65b9\u6cd5\u3002\u7136\u800c\uff0c\u5728\u6574\u5408\u975e\u53c2\u6570\u5316\u7684\u5916\u90e8\u652f\u6301\u8bc1\u636e\u4e0e\u5185\u90e8\u53c2\u6570\u5316\u77e5\u8bc6\u7684\u8fc7\u7a0b\u4e2d\uff0c\u4e0d\u53ef\u907f\u514d\u7684\u77e5\u8bc6\u51b2\u7a81\u53ef\u80fd\u4f1a\u4ea7\u751f\uff0c\u5bfc\u81f4\u6a21\u578b\u54cd\u5e94\u4e2d\u7684\u6df7\u6dc6\u3002\u4e3a\u4e86\u5728\u4e0d\u540c\u60c5\u5883\u4e0b\u63d0\u5347\u8bed\u8a00\u6a21\u578b\u7684\u77e5\u8bc6\u9009\u62e9\u80fd\u529b\uff0c\u4e00\u4e9b\u7814\u7a76\u5df2\u7ecf\u5173\u6ce8\u4e8e\u901a\u8fc7\u6307\u4ee4\u8c03\u6574\u6765\u7ec6\u5316\u5176\u884c\u4e3a\u6a21\u5f0f\u3002\u7136\u800c\uff0c\u7531\u4e8e\u7f3a\u4e4f\u660e\u786e\u7684\u8d1f\u5411\u4fe1\u53f7\u548c\u6bd4\u8f83\u76ee\u6807\uff0c\u901a\u8fc7\u8fd9\u79cd\u65b9\u5f0f\u8fdb\u884c\u5fae\u8c03\u7684\u8bed\u8a00\u6a21\u578b\u5728\u590d\u6742\u7684\u3001\u73b0\u5b9e\u7684\u68c0\u7d22\u573a\u666f\u4e2d\u4ecd\u7136\u53ef\u80fd\u8868\u73b0\u51fa\u4e0d\u7406\u60f3\u7684\u7279\u6027\u3002 
\u9488\u5bf9\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u77e5\u8bc6\u610f\u8bc6\u504f\u597d\u4f18\u5316\uff08KaPO\uff09\uff0c\u65e8\u5728\u5b9e\u73b0\u5bf9\u771f\u5b9e\u68c0\u7d22\u573a\u666f\u4e2d\u77e5\u8bc6\u9009\u62e9\u7684\u53ef\u63a7\u6027\u3002\u5177\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u63a2\u7d22\u5e76\u6a21\u62df\u4e86\u4e0d\u540c\u4e0a\u4e0b\u6587\u7ec4\u5408\u4e0b\u7684\u9519\u8bef\u7c7b\u578b\uff0c\u5e76\u901a\u8fc7\u504f\u597d\u4f18\u5316\u65b9\u6cd5\u5b66\u4e60\u5982\u4f55\u907f\u514d\u8fd9\u4e9b\u8d1f\u5411\u4fe1\u53f7\u3002\u540c\u65f6\uff0c\u901a\u8fc7\u8c03\u6574\u54cd\u5e94\u957f\u5ea6\u4e0e\u8868\u793a\u4e0d\u540c\u884c\u4e3a\u6a21\u5f0f\u7684\u504f\u597d\u6570\u636e\u6bd4\u4f8b\u4e4b\u95f4\u7684\u5e73\u8861\uff0c\u6211\u4eec\u589e\u5f3a\u4e86\u8bed\u8a00\u6a21\u578b\u7684\u9002\u5e94\u80fd\u529b\u548c\u566a\u58f0\u9c81\u68d2\u6027\u3002 \u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u4e0e\u5148\u524d\u7684\u65b9\u6cd5\u76f8\u6bd4\uff0cKaPO\u5728\u5904\u7406\u77e5\u8bc6\u51b2\u7a81\u65b9\u9762\u53d6\u5f97\u4e86\u8d85\u8fc737%\u7684\u6027\u80fd\u63d0\u5347\uff0c\u5e76\u4e14\u5728\u5404\u79cd\u79bb\u7fa4\u6570\u636e\u96c6\u4e0a\u8868\u73b0\u51fa\u4e86\u7a33\u5065\u7684\u6cdb\u5316\u80fd\u529b\u3002|\n", "2408.03281": "|**2024-08-07**|**StructEval: Deepen and Broaden Large Language Model Assessment via Structured Evaluation**|Boxi Cao 
et.al.|[2408.03281](http://arxiv.org/abs/2408.03281)|**[link](https://github.com/c-box/structeval)**|\u8bc4\u4ef7\u662f\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5f00\u53d1\u7684\u5173\u952e\u5de5\u5177\u3002\u5f53\u524d\u7684\u8bc4\u4f30\u65b9\u5f0f\u901a\u5e38\u91c7\u7528\u5355\u4e00\u6307\u6807\u8bc4\u4f30\u6a21\u5f0f\uff0c\u5bf9\u6bcf\u4e2a\u57fa\u672c\u6d4b\u8bd5\u76ee\u6807\u8fdb\u884c\u8bc4\u4f30\uff0c\u8fd9\u5728\u533a\u5206\u6a21\u578b\u662f\u5426\u771f\u6b63\u5177\u5907\u6240\u9700\u80fd\u529b\u8fd8\u662f\u4ec5\u4ec5\u8bb0\u5fc6\u6216\u731c\u6d4b\u7279\u5b9a\u95ee\u9898\u7684\u7b54\u6848\u65b9\u9762\u5b58\u5728\u56f0\u96be\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aStructEval\u7684\u65b0\u8bc4\u4f30\u6846\u67b6\u3002\u4ece\u57fa\u672c\u6d4b\u8bd5\u76ee\u6807\u51fa\u53d1\uff0cStructEval\u901a\u8fc7\u5728\u591a\u4e2a\u8ba4\u77e5\u5c42\u6b21\u548c\u5173\u952e\u6982\u5ff5\u4e0a\u8fdb\u884c\u7ed3\u6784\u5316\u7684\u8bc4\u4f30\u6765\u6df1\u5316\u548c\u62d3\u5bbd\u8bc4\u4f30\u8303\u56f4\uff0c\u4ece\u800c\u4e3a\u5927\u578b\u8bed\u8a00\u6a21\u578b\u63d0\u4f9b\u5168\u9762\u3001\u7a33\u5065\u4e14\u4e00\u81f4\u7684\u8bc4\u4f30\u3002\u5728\u4e09\u4e2a\u5e7f\u6cdb\u4f7f\u7528\u7684\u57fa\u51c6\u4e0a\u8fdb\u884c\u7684\u5b9e\u9a8c\u8868\u660e\uff0cStructEval\u662f\u4e00\u4e2a\u53ef\u9760\u7684\u5de5\u5177\uff0c\u80fd\u591f\u62b5\u6297\u6570\u636e\u6c61\u67d3\u7684\u98ce\u9669\u5e76\u51cf\u5c11\u6f5c\u5728\u504f\u89c1\u7684\u5e72\u6270\uff0c\u4ece\u800c\u63d0\u4f9b\u5173\u4e8e\u6a21\u578b\u80fd\u529b\u66f4\u53ef\u9760\u548c\u4e00\u81f4\u7684\u7ed3\u8bba\u3002\u6211\u4eec\u7684\u6846\u67b6\u8fd8\u4e3a\u672a\u6765\u539f\u7406\u6027\u548c\u53ef\u4fe1\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8bc4\u4f30\u534f\u8bae\u7684\u8bbe\u8ba1\u63d0\u4f9b\u4e86\u542f\u793a\u3002|\n", "2408.03256": "|**2024-08-06**|**Synthesizing Text-to-SQL Data from Weak and Strong LLMs**|Jiaxi Yang 
et.al.|[2408.03256](http://arxiv.org/abs/2408.03256)|null|\u672c\u6587\u63a2\u8ba8\u4e86\u5f00\u6e90\u4e0e\u5c01\u95ed\u5f0f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u6587\u672c\u5230SQL\u4efb\u52a1\u4e2d\u7684\u80fd\u529b\u5dee\u8ddd\u95ee\u9898\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u5408\u6210\u6570\u636e\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u7ed3\u5408\u4e86\u66f4\u5927\u3001\u66f4\u5f3a\u5927\u7684\u6a21\u578b\u751f\u6210\u7684\u6570\u636e\uff08\u5f3a\u6a21\u578b\uff09\u4e0e\u8f83\u5c0f\u3001\u4e0d\u5b8c\u5168\u5bf9\u9f50\u6a21\u578b\u751f\u6210\u7684\u9519\u8bef\u4fe1\u606f\u6570\u636e\uff08\u5f31\u6a21\u578b\uff09\u3002\u8fd9\u79cd\u65b9\u6cd5\u4e0d\u4ec5\u63d0\u9ad8\u4e86\u6587\u672c\u5230SQL\u6a21\u578b\u7684\u9886\u57df\u6cdb\u5316\u80fd\u529b\uff0c\u8fd8\u63a2\u7d22\u4e86\u9519\u8bef\u6570\u636e\u76d1\u7763\u901a\u8fc7\u504f\u597d\u5b66\u4e60\u7684\u6f5c\u529b\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5229\u7528\u5408\u6210\u6570\u636e\u65b9\u6cd5\u5bf9\u5f00\u6e90LLM\u8fdb\u884c\u6307\u4ee4\u8c03\u6574\uff0c\u7531\u6b64\u4ea7\u751f\u4e86\u4e13\u95e8\u9488\u5bf9\u6587\u672c\u5230SQL\u4efb\u52a1\u7684\u6a21\u578bSENSE\u3002\u901a\u8fc7\u5728SPIDER\u548cBIRD\u57fa\u51c6\u4e0a\u7684\u8868\u73b0\uff0c\u8bc1\u660e\u4e86SENSE\u7684\u6709\u6548\u6027\uff0c\u6210\u529f\u7f29\u5c0f\u4e86\u5f00\u6e90\u6a21\u578b\u4e0e\u57fa\u4e8e\u5c01\u95ed\u6e90\u6a21\u578b\u7684\u65b9\u6cd5\u4e4b\u95f4\u7684\u6027\u80fd\u5dee\u8ddd\u3002|\n", "2408.03247": "|**2024-08-06**|**Unveiling Factual Recall Behaviors of Large Language Models through Knowledge Neurons**|Yifei Wang 
et.al.|[2408.03247](http://arxiv.org/abs/2408.03247)|**[link](https://github.com/wangyifei0047/tfrkn)**|\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u6df1\u5165\u7814\u7a76\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u9762\u5bf9\u63a8\u7406\u4efb\u52a1\u65f6\u662f\u5426\u79ef\u6781\u5730\u56de\u5fc6\u6216\u68c0\u7d22\u5176\u5185\u90e8\u4e8b\u5b9e\u77e5\u8bc6\u5e93\u3002\u901a\u8fc7\u5206\u6790LLM\u5728\u6bcf\u4e2a\u63a8\u7406\u6b65\u9aa4\u4e2d\u7684\u5185\u90e8\u4e8b\u5b9e\u53ec\u56de\u60c5\u51b5\uff0c\u5373\u6240\u8c13\u7684\u77e5\u8bc6\u795e\u7ecf\u5143\uff0c\u6211\u4eec\u63ed\u793a\u4e86\u5728\u67d0\u4e9b\u60c5\u51b5\u4e0b\uff0cLLM\u672a\u80fd\u6709\u6548\u5229\u7528\u5173\u952e\u7684\u4e8b\u5b9e\u5173\u8054\u3002\u76f8\u53cd\uff0c\u5b83\u4eec\u503e\u5411\u4e8e\u91c7\u53d6\u66ff\u4ee3\u7684\u3001\u5feb\u6377\u7684\u8def\u5f84\u6765\u56de\u7b54\u63a8\u7406\u95ee\u9898\u3002\u901a\u8fc7\u624b\u52a8\u8c03\u6574LLM\u4e2d\u53c2\u6570\u77e5\u8bc6\u7684\u53ec\u56de\u8fc7\u7a0b\uff0c\u6211\u4eec\u8bc1\u660e\u76f4\u63a5\u589e\u5f3a\u8fd9\u4e00\u8fc7\u7a0b\u53ef\u4ee5\u663e\u8457\u63d0\u9ad8\u63a8\u7406\u6027\u80fd\uff0c\u800c\u6291\u5236\u5b83\u5219\u4f1a\u5bfc\u81f4\u660e\u663e\u7684\u6027\u80fd\u4e0b\u964d\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8bc4\u4f30\u4e86\u94fe\u5f0f\u601d\u8003\uff08CoT\uff09\u63d0\u793a\u7684\u5f71\u54cd\uff0c\u8fd9\u662f\u4e00\u79cd\u5904\u7406\u590d\u6742\u63a8\u7406\u4efb\u52a1\u7684\u5f3a\u5927\u6280\u672f\u3002\u6211\u4eec\u7684\u53d1\u73b0\u8868\u660e\uff0cCoT\u53ef\u4ee5\u901a\u8fc7\u9f13\u52b1LLM\u8fdb\u884c\u6709\u6761\u7406\u548c\u53ef\u9760\u7684\u63a8\u7406\u6765\u589e\u5f3a\u5bf9\u4e8b\u5b9e\u77e5\u8bc6\u7684\u56de\u5fc6\u3002\u8fdb\u4e00\u6b65\u5730\uff0c\u6211\u4eec\u63a2\u8ba8\u4e86\u4e0a\u4e0b\u6587\u51b2\u7a81\u5982\u4f55\u5f71\u54cd\u63a8\u7406\u8fc7\u7a0b\u4e2d\u4e8b\u5b9e\u7684\u68c0\u7d22\uff0c\u4ee5\u83b7\u5f97\u5bf9LLM\u4e8b\u5b9e\u56de\u5fc6\u884c\u4e3a\u7684\u5168\u9762\u7406\u89e3\u3002\u76f8
\u5173\u4ee3\u7801\u548c\u6570\u636e\u5c06\u5728\u4e0d\u4e45\u540e\u63d0\u4f9b\u3002|\n", "2408.03172": "|**2024-08-06**|**Leveraging Parameter Efficient Training Methods for Low Resource Text Classification: A Case Study in Marathi**|Pranita Deshmukh et.al.|[2408.03172](http://arxiv.org/abs/2408.03172)|null|\u968f\u7740\u4f4e\u8d44\u6e90\u8bed\u8a00\u6570\u5b57\u5185\u5bb9\u7684\u6fc0\u589e\uff0c\u9488\u5bf9\u8fd9\u4e9b\u8bed\u8a00\u7684\u9ad8\u7ea7\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u6280\u672f\u9700\u6c42\u6b63\u5728\u589e\u52a0\u3002BERT\uff08\u53cc\u5411\u7f16\u7801\u8868\u793a\u7684Transformer\uff09\u4f5c\u4e3a\u4f17\u591aNLP\u67b6\u6784\u548c\u8bed\u8a00\u6a21\u578b\u7684\u57fa\u7840\u6846\u67b6\uff0c\u6b63\u8d8a\u6765\u8d8a\u591a\u5730\u7528\u4e8e\u5f00\u53d1\u4f4e\u8d44\u6e90NLP\u6a21\u578b\u3002\u53c2\u6570\u9ad8\u6548\u5fae\u8c03\uff08PEFT\uff09\u662f\u4e00\u79cd\u65b9\u6cd5\uff0c\u7528\u4e8e\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u8fdb\u884c\u5fae\u8c03\uff0c\u5e76\u5728\u4e00\u5b9a\u7a0b\u5ea6\u4e0a\u51cf\u5c11\u8bad\u7ec3\u53c2\u6570\uff0c\u4ee5\u964d\u4f4e\u8bad\u7ec3\u6a21\u578b\u6240\u9700\u7684\u8ba1\u7b97\u6210\u672c\uff0c\u5e76\u8fbe\u5230\u4e0e\u5b8c\u5168\u5fae\u8c03\u6a21\u578b\u76f8\u5f53\u7684\u7ed3\u679c\u3002\u672c\u7814\u7a76\u65e8\u5728\u5206\u6790PEFT\u65b9\u6cd5\u5728\u9a6c\u62c9\u5730\u8bed\u4f4e\u8d44\u6e90\u8bed\u8a00\u4e2d\u7684\u5e94\u7528\u3002\u6211\u4eec\u5bf9\u5404\u79cd\u5355\u8bed\u548c\u591a\u8bed\u79cd\u9a6c\u62c9\u5730\u8bedBERT\u6a21\u578b\u8fdb\u884c\u4e86\u5168\u9762\u5206\u6790\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u5728MahaSent\u3001MahaHate\u548cMahaNews\u7b49\u91cd\u8981\u6587\u672c\u5206\u7c7b\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u4e86\u8bc4\u4f30\u3002PEFT\u6280\u672f\u7684\u5f15\u5165\u663e\u8457\u52a0\u5feb\u4e86\u6a21\u578b\u7684\u8bad\u7ec3\u901f\u5ea6\uff0c\u89e3\u51b3\u4e86\u6a21\u578b\u5f00\u53d1\u548c\u90e8\u7f72\u7684\u5173\u952e\u65b9\u9762\u3002\u5728\u672c\u7814\u7a76\u4e2d\u
ff0c\u6211\u4eec\u63a2\u7d22\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u4f4e\u79e9\u9002\u5e94\uff08LoRA\uff09\u548c\u9002\u914d\u5668\u65b9\u6cd5\u5728\u4f4e\u8d44\u6e90\u6587\u672c\u5206\u7c7b\u4e2d\u7684\u5e94\u7528\u3002\u7ed3\u679c\u663e\u793a\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u5728\u51c6\u786e\u7387\u4e0a\u4e0e\u5168\u91cf\u5fae\u8c03\u76f8\u5f53\uff0c\u4e14\u65e0\u9700\u635f\u5931\uff0c\u53ef\u7528\u4e8e\u9a6c\u62c9\u5730\u8bed\u548c\u5176\u4ed6\u5370\u5ea6\u8bed\u65cf\u8bed\u8a00\u7684NLP\u80fd\u529b\u6301\u7eed\u53d1\u5c55\u3002|\n", "2408.03150": "|**2024-08-06**|**Conditioning LLMs with Emotion in Neural Machine Translation**|Charles Brazier et.al.|[2408.03150](http://arxiv.org/abs/2408.03150)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\u4e2d\u5c55\u73b0\u4e86\u5353\u8d8a\u7684\u6027\u80fd\uff0c\u7279\u522b\u662f\u5728\u673a\u5668\u7ffb\u8bd1\u9886\u57df\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u673a\u5668\u7ffb\u8bd1\u7ba1\u9053\uff0c\u8be5\u7ba1\u9053\u901a\u8fc7\u5c06\u60c5\u611f\u4fe1\u606f\u6574\u5408\u5230\u8bed\u8a00\u6a21\u578b\u4e2d\u6765\u589e\u5f3a\u7ffb\u8bd1\u8d28\u91cf\uff0c\u8fd9\u4e9b\u60c5\u611f\u4fe1\u606f\u662f\u4ece\u8bed\u97f3\u60c5\u611f\u8bc6\u522b\uff08SER\uff09\u6a21\u578b\u4e2d\u63d0\u53d6\u7684\u3002\u9996\u5148\uff0c\u6211\u4eec\u5bf9\u4e94\u4e2a\u73b0\u6709\u7684LLM\u8fdb\u884cLibri-trans\u6570\u636e\u96c6\u7684\u5fae\u8c03\uff0c\u5e76\u9009\u62e9\u8868\u73b0\u6700\u4f73\u7684\u6a21\u578b\u3002\u968f\u540e\uff0c\u6211\u4eec\u4ee5\u4e0d\u540c\u7ef4\u5ea6\u7684\u60c5\u611f\u589e\u5f3aLLM\u63d0\u793a\uff0c\u5e76\u5728\u8fd9\u4e9b\u4e0d\u540c\u7684\u914d\u7f6e\u4e0b\u8bad\u7ec3\u9009\u5b9a\u7684LLM\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u5c06\u60c5\u611f\u4fe1\u606f\uff0c\u5c24\u5176\u662f\u5524\u9192\u5ea6\uff0c\u6574\u5408\u5230LLM\u63d0\u793a\u4e2d\uff0c\u80fd\u591f\u663e\u8457\u63d0\u9ad8\u7ffb\u8bd1\u8d28\u
91cf\u3002|\n", "2408.03130": "|**2024-08-06**|**Inference Optimizations for Large Language Models: Effects, Challenges, and Practical Considerations**|Leo Donisch et.al.|[2408.03130](http://arxiv.org/abs/2408.03130)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u9886\u57df\u65e0\u5904\u4e0d\u5728\uff0c\u56e0\u4e3a\u5b83\u4eec\u80fd\u591f\u5728\u65e0\u9700\u91cd\u65b0\u8bad\u7ec3\u7684\u60c5\u51b5\u4e0b\u9002\u5e94\u65b0\u4efb\u52a1\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u6a21\u578b\u7684\u89c4\u6a21\u548c\u590d\u6742\u6027\u5e26\u6765\u4e86\u72ec\u7279\u7684\u6311\u6218\u4e0e\u673a\u9047\uff0c\u4fc3\u4f7f\u7814\u7a76\u8005\u4e0e\u5b9e\u8df5\u8005\u63a2\u7d22\u65b0\u578b\u7684\u6a21\u578b\u8bad\u7ec3\u3001\u4f18\u5316\u548c\u90e8\u7f72\u65b9\u6cd5\u3002\u672c\u6587\u7efc\u8ff0\u7684\u91cd\u70b9\u5728\u4e8e\u5404\u79cd\u964d\u4f4e\u8d44\u6e90\u9700\u6c42\u548c\u538b\u7f29\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u6280\u672f\uff0c\u5305\u62ec\u91cf\u5316\u3001\u526a\u679d\u3001\u77e5\u8bc6\u84b8\u998f\u4ee5\u53ca\u67b6\u6784\u4f18\u5316\u3002\u4e3b\u8981\u76ee\u6807\u662f\u6df1\u5165\u63a2\u8ba8\u6bcf\u79cd\u65b9\u6cd5\uff0c\u5e76\u7a81\u51fa\u5176\u72ec\u7279\u6311\u6218\u53ca\u5176\u5b9e\u9645\u5e94\u7528\u3002\u8ba8\u8bba\u7684\u65b9\u6cd5\u6309\u7167\u5206\u7c7b\u5b66\u8fdb\u884c\u7ec4\u7ec7\uff0c\u63d0\u4f9b\u4e86\u4e00\u4e2a\u4f18\u5316\u666f\u89c2\u7684\u6982\u89c8\uff0c\u6709\u52a9\u4e8e\u66f4\u597d\u5730\u7406\u89e3\u7814\u7a76\u8f68\u8ff9\u3002|\n", "2408.03127": "|**2024-08-06**|**Lisbon Computational Linguists at SemEval-2024 Task 2: Using A Mistral 7B Model and Data Augmentation**|Artur Guimar\u00e3es 
et.al.|[2408.03127](http://arxiv.org/abs/2408.03127)|**[link](https://github.com/araag2/SemEval2024-Task2)**|\u8fd9\u7bc7\u8bba\u6587\u9610\u8ff0\u4e86\u6211\u4eec\u5bf9SemEval-2024\u5b89\u5168\u751f\u7269\u533b\u5b66\u81ea\u7136\u8bed\u8a00\u63a8\u65ad\u5728\u4e34\u5e8a\u8bd5\u9a8c\uff08NLI4CT\uff09\u4efb\u52a1\u7684\u5904\u7406\u7b56\u7565\u3002\u8be5\u4efb\u52a1\u6d89\u53ca\u5bf9\u4e34\u5e8a\u8bd5\u9a8c\u62a5\u544a\uff08CTRs\uff09\u4e2d\u7684\u9648\u8ff0\u8fdb\u884c\u5206\u7c7b\u3002\u6211\u4eec\u63a2\u7d22\u4e86Mistral-7B\u8fd9\u79cd\u901a\u7528\u7684\u5f00\u6e90\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u80fd\u529b\u3002\u6211\u4eec\u4e3aNLI4CT\u4efb\u52a1\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u63d0\u793a\uff0c\u5e76\u4f7f\u7528\u589e\u5f3a\u540e\u7684\u8bad\u7ec3\u6570\u636e\u96c6\u5bf9\u91cf\u5316\u7248\u672c\u7684\u6a21\u578b\u8fdb\u884c\u4e86\u5fae\u8c03\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u8fd9\u79cd\u65b9\u6cd5\u5728\u5b8fF1\u5206\u6570\u65b9\u9762\u53ef\u4ee5\u4ea7\u751f\u663e\u8457\u7684\u7ed3\u679c\uff0c\u4f46\u5728\u5fe0\u5b9e\u6027\u548c\u4e00\u81f4\u6027\u65b9\u9762\u5b58\u5728\u5c40\u9650\u6027\u3002\u6240\u6709\u5f00\u53d1\u7684\u4ee3\u7801\u90fd\u5728GitHub\u4ed3\u5e93\u4e2d\u516c\u5f00\u63d0\u4f9b\u3002|\n", "2408.03119": "|**2024-08-06**|**Evaluating the Translation Performance of Large Language Models Based on Euas-20**|Yan Huang 
et.al.|[2408.03119](http://arxiv.org/abs/2408.03119)|null|\u8fd1\u5e74\u6765\uff0c\u5728\u6df1\u5ea6\u5b66\u4e60\u6280\u672f\u7684\u5feb\u901f\u53d1\u5c55\u7684\u63a8\u52a8\u4e0b\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5982BERT\u548cGPT\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\u4e0a\u53d6\u5f97\u4e86\u7a81\u7834\u6027\u6210\u679c\u3002\u673a\u5668\u7ffb\u8bd1\u4f5c\u4e3a\u81ea\u7136\u8bed\u8a00\u5904\u7406\u7684\u6838\u5fc3\u4efb\u52a1\u4e4b\u4e00\uff0c\u4e5f\u4ece\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u53d1\u5c55\u4e2d\u53d7\u76ca\u532a\u6d45\uff0c\u5b9e\u73b0\u4e86\u8d28\u7684\u98de\u8dc3\u3002\u5c3d\u7ba1\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u7ffb\u8bd1\u6027\u80fd\u4e0a\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u5c55\uff0c\u4f46\u673a\u5668\u7ffb\u8bd1\u4ecd\u9762\u4e34\u8bf8\u591a\u6311\u6218\u3002\u56e0\u6b64\uff0c\u672c\u6587\u6784\u5efa\u4e86Euas-20\u6570\u636e\u96c6\uff0c\u7528\u4e8e\u8bc4\u4f30\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u7ffb\u8bd1\u4efb\u52a1\u4e0a\u7684\u6027\u80fd\u3001\u4e0d\u540c\u8bed\u8a00\u7684\u7ffb\u8bd1\u80fd\u529b\u4ee5\u53ca\u9884\u8bad\u7ec3\u6570\u636e\u5bf9LLMs\u7ffb\u8bd1\u80fd\u529b\u7684\u5f71\u54cd\uff0c\u65e8\u5728\u4e3a\u7814\u7a76\u4eba\u5458\u548c\u5f00\u53d1\u8005\u63d0\u4f9b\u53c2\u8003\u3002|\n", "2408.03940": "|**2024-08-07**|**How Well Can Vision Language Models See Image Details?**|Chenhui Gou 
et.al.|[2408.03940](http://arxiv.org/abs/2408.03940)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\u9a71\u52a8\u7684\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08LLM-\u9a71\u52a8\u7684VLM\uff09\u5728\u5404\u79cd\u89c6\u89c9\u8bed\u8a00\u7406\u89e3\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\u3002\u7136\u800c\uff0c\u8fd9\u4e9bVLM\u662f\u5426\u80fd\u8d85\u8d8a\u8bed\u4e49\u5c42\u9762\uff0c\u6df1\u5165\u89c2\u5bdf\u56fe\u50cf\u7ec6\u8282\u4ecd\u7136\u4e0d\u660e\u6717\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u4e2a\u50cf\u7d20\u503c\u9884\u6d4b\u4efb\u52a1\uff08PVP\uff09\uff0c\u4ee5\u63a2\u7d22\u201c\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\u80fd\u591f\u770b\u5230\u591a\u7ec6\u7684\u56fe\u50cf\u7ec6\u8282\uff1f\u201d\u5e76\u534f\u52a9VLM\u63d0\u5347\u5bf9\u7ec6\u8282\u7684\u611f\u77e5\u80fd\u529b\u3002\u901a\u5e38\uff0c\u8fd9\u4e9b\u6a21\u578b\u7531\u51bb\u7ed3\u7684CLIP\u89c6\u89c9\u7f16\u7801\u5668\u3001\u5927\u578b\u8bed\u8a00\u6a21\u578b\u548c\u8fde\u63a5\u6a21\u5757\u7ec4\u6210\u3002\u5728\u5bf9PVP\u4efb\u52a1\u8fdb\u884c\u5fae\u8c03\u540e\uff0c\u6211\u4eec\u53d1\u73b0\uff1a1\uff09\u73b0\u6709\u7684VLM\u4ec5\u901a\u8fc7\u5fae\u8c03\u8fde\u63a5\u6a21\u5757\u548cLLM\uff0c\u5728\u9884\u6d4b\u7cbe\u786e\u50cf\u7d20\u503c\u65b9\u9762\u8868\u73b0\u4e0d\u4f73\uff1b2\uff09\u5f53\u89c6\u89c9\u7f16\u7801\u5668\u4e5f\u5f97\u5230\u9002\u5e94\u65f6\uff0c\u9884\u6d4b\u7cbe\u5ea6\u663e\u8457\u63d0\u9ad8\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u7814\u7a76\u63ed\u793a\uff0c\u5c06\u50cf\u7d20\u503c\u9884\u6d4b\u4f5c\u4e3aVLM\u9884\u8bad\u7ec3\u4efb\u52a1\u4e4b\u4e00\uff0c\u5e76\u5bf9\u89c6\u89c9\u7f16\u7801\u5668\u8fdb\u884c\u9002\u5e94\uff0c\u663e\u8457\u63d0\u5347\u4e86VLM\u5728\u9700\u8981\u8be6\u7ec6\u56fe\u50cf\u611f\u77e5\u7684\u4e0b\u6e38\u56fe\u50cf\u8bed\u8a00\u7406\u89e3\u4efb\u52a1\u4e0a\u7684\u6027\u80fd\uff0c\u5982\u5f15\u7528\u56fe\u50cf\u5206\u5272\uff08\u5e73\u5747cIoU\u6539\u8fdb+10.19\u767e\u5206\u70b9\uff09\u548c\u89c6\u9891\u6e38\u620f\u51b3\u7b56\uff08\u572
8\u4e24\u4e2a\u6e38\u620f\u4e2d\u5206\u522b\u5e73\u5747\u5f97\u5206\u6539\u5584\u4e86+80.34\u548c+70.54\uff09\u3002|\n", "2408.03936": "|**2024-08-07**|**SLIM-RAFT: A Novel Fine-Tuning Approach to Improve Cross-Linguistic Performance for Mercosur Common Nomenclature**|Vin\u00edcius Di Oliveira et.al.|[2408.03936](http://arxiv.org/abs/2408.03936)|null|\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5174\u8d77\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\u3002\u7136\u800c\uff0c\u5bf9\u4e8e\u82f1\u8bed\u4e4b\u5916\u7684\u8bed\u8a00\uff0c\u5c24\u5176\u662f\u5728\u7279\u5b9a\u9886\u57df\u5982Mercosur\u901a\u7528\u5546\u54c1\u540d\u79f0\uff08NCM\uff09\uff0c\u5df4\u897f\u534f\u8c03\u7cfb\u7edf\uff08HS\uff09\u7684\u5e94\u7528\u65b9\u9762\uff0c\u4ecd\u6709\u5f88\u5927\u7684\u6539\u8fdb\u7a7a\u95f4\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u7f3a\u53e3\uff0c\u672c\u7814\u7a76\u5229\u7528TeenyTineLLaMA\uff0c\u4e00\u79cd\u57fa\u7840\u8461\u8404\u7259\u8bedLLM\uff0c\u4f5c\u4e3aLLM\u6e90\uff0c\u5b9e\u65bdNCM\u5e94\u7528\u5904\u7406\u3002\u6b64\u5916\uff0c\u63d0\u51fa\u4e86\u4e00\u79cd\u9488\u5bf9\u4efb\u52a1\u7279\u5b9a\u5fae\u8c03\u7684\u7b80\u5316\u68c0\u7d22\u589e\u5f3a\u5fae\u8c03\uff08SLIM-RAFT\uff09\u6280\u672f\u3002\u8be5\u65b9\u6cd5\u91c7\u7528\u7b80\u5316\u7684\u94fe\u5f0f\u601d\u7ef4\uff08CoT\uff09\u7b56\u7565\u8fdb\u884c\u63d0\u793a\u5f00\u53d1\uff0c\u4f7f\u7528\u7b80\u77ed\u800c\u96c6\u4e2d\u7684\u6587\u6863\u8fdb\u884c\u8bad\u7ec3\uff0c\u4ee5\u66f4\u7d27\u51d1\u548c\u9ad8\u6548\u7684\u65b9\u5f0f\u8fdb\u884c\u3002\u63d0\u51fa\u7684\u6a21\u578b\u5728\u76f8\u540c\u4efb\u52a1\u4e0a\u663e\u8457\u4f18\u4e8eTeenyTineLLaMA\u548cChatGPT-4\uff0c\u5c55\u793a\u4e86\u8f83\u5c0fLLM\u5fae\u8c03\u7684\u9ad8\u6548\u548c\u6210\u672c\u6548\u76ca\u66ff\u4ee3\u65b9\u6848\u3002\u5c3d\u7ba1\u7814\u7a76\u91cd\u70b9\u662fNCM\u5e94\u7528\uff0c\u4f46\u6240\u63d0\u51fa\u7684\u65b9\u6cd5\u53ef\u4ee5\u8f7b\u677e\u5730\u9002\u5e94\u5
168\u7403\u8303\u56f4\u5185\u7684HS\u5e94\u7528\u3002|\n", "2408.03910": "|**2024-08-07**|**CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases**|Xiangyan Liu et.al.|[2408.03910](http://arxiv.org/abs/2408.03910)|**[link](https://github.com/modelscope/modelscope-agent)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u8bf8\u5982HumanEval\u548cMBPP\u7684\u72ec\u7acb\u4ee3\u7801\u4efb\u52a1\u4e2d\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5b83\u4eec\u5728\u5904\u7406\u6574\u4e2a\u4ee3\u7801\u4ed3\u5e93\u65f6\u5b58\u5728\u6311\u6218\u3002\u8fd9\u4fc3\u4f7f\u7814\u7a76\u754c\u63a2\u7d22\u5982\u4f55\u5728\u4ed3\u5e93\u7ea7\u522b\u4e0a\u589e\u5f3aLLM\u4e0e\u4ee3\u7801\u5e93\u7684\u4ea4\u4e92\u3002\u76ee\u524d\u7684\u89e3\u51b3\u65b9\u6848\u4f9d\u8d56\u4e8e\u57fa\u4e8e\u76f8\u4f3c\u6027\u7684\u68c0\u7d22\u6216\u624b\u52a8\u5de5\u5177\u548cAPI\uff0c\u6bcf\u79cd\u65b9\u6cd5\u90fd\u6709\u5176\u663e\u8457\u7684\u7f3a\u70b9\u3002\u57fa\u4e8e\u76f8\u4f3c\u6027\u7684\u68c0\u7d22\u5728\u590d\u6742\u4efb\u52a1\u4e2d\u53ec\u56de\u7387\u8f83\u4f4e\uff0c\u800c\u624b\u52a8\u5de5\u5177\u548cAPI\u901a\u5e38\u5177\u6709\u7279\u5b9a\u7684\u4efb\u52a1\u6027\uff0c\u5e76\u4e14\u9700\u8981\u4e13\u5bb6\u77e5\u8bc6\uff0c\u8fd9\u964d\u4f4e\u4e86\u5b83\u4eec\u5728\u4e0d\u540c\u4ee3\u7801\u4efb\u52a1\u548c\u5b9e\u9645\u5e94\u7528\u4e2d\u7684\u901a\u7528\u6027\u3002\u4e3a\u4e86\u51cf\u8f7b\u8fd9\u4e9b\u9650\u5236\uff0c\u6211\u4eec\u5f15\u5165\u4e86CodexGraph\uff0c\u8fd9\u662f\u4e00\u4e2a\u7cfb\u7edf\uff0c\u5b83\u5c06LLM\u4ee3\u7406\u4e0e\u4ece\u4ee3\u7801\u4ed3\u5e93\u63d0\u53d6\u7684\u56fe\u6570\u636e\u5e93\u63a5\u53e3\u96c6\u6210\u5728\u4e00\u8d77\u3002\u901a\u8fc7\u5229\u7528\u56fe\u6570\u636e\u5e93\u7684\u7ed3\u6784\u7279\u6027\u4ee5\u53ca\u56fe\u67e5\u8be2\u8bed\u8a00\u7684\u7075\u6d3b\u6027\uff0cCodexGraph\u4f7fLLM\u4ee3\u7406\u80fd\u591f\u6784\u5efa\u5e76\u6267\u884c\u67e5\u8be2\uff0c\u4ece\u800c\u5b9e\u73b0\u7cbe\u786e\u3001\u4ee3\u7801\u7ed3\u67
84\u610f\u8bc6\u7684\u4e0a\u4e0b\u6587\u68c0\u7d22\u548c\u4ee3\u7801\u5bfc\u822a\u3002\u6211\u4eec\u4f7f\u7528\u4e09\u4e2a\u57fa\u51c6\u6d4b\u8bd5\u8bc4\u4f30CodexGraph\uff1aCrossCodeEval\u3001SWE-bench\u548cEvoCodeBench\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5f00\u53d1\u4e86\u4e94\u4e2a\u771f\u5b9e\u4e16\u754c\u7684\u7f16\u7801\u5e94\u7528\u3002\u51ed\u501f\u7edf\u4e00\u7684\u56fe\u6570\u636e\u5e93\u6a21\u5f0f\uff0cCodexGraph\u5728\u5b66\u672f\u548c\u5b9e\u9645\u73af\u5883\u4e2d\u90fd\u5c55\u793a\u4e86\u7ade\u4e89\u529b\u548c\u6f5c\u529b\uff0c\u4f53\u73b0\u4e86\u5176\u5728\u8f6f\u4ef6\u5de5\u7a0b\u9886\u57df\u7684\u591a\u529f\u80fd\u6027\u548c\u6709\u6548\u6027\u3002\u6211\u4eec\u7684\u5e94\u7528\u6f14\u793a\uff1ahttps://github.com/modelscope/modelscope-agent/tree/master/apps/codexgraph_agent\u3002**|\n", "2408.03907": "|**2024-08-07**|**Decoding Biases: Automated Methods and LLM Judges for Gender Bias Detection in Language Models**|Shachi H Kumar et.al.|[2408.03907](http://arxiv.org/abs/2408.03907)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u7406\u89e3\u8bed\u8a00\u548c\u751f\u6210\u4e0e\u4eba\u7c7b\u6c34\u5e73\u76f8\u5f53\u7684\u6587\u672c\u65b9\u9762\u8868\u73b0\u51fa\u8272\u3002\u7136\u800c\uff0c\u5373\u4f7f\u7ecf\u8fc7\u76d1\u7763\u8bad\u7ec3\u548c\u4eba\u7c7b\u5bf9\u9f50\uff0c\u8fd9\u4e9bLLM\u4ecd\u5bb9\u6613\u53d7\u5230\u6076\u610f\u7528\u6237\u7684\u653b\u51fb\uff0c\u540e\u8005\u53ef\u4ee5\u901a\u8fc7\u63d0\u793a\u6a21\u578b\u751f\u6210\u4e0d\u5e0c\u671b\u770b\u5230\u7684\u6587\u672c\u3002\u6b64\u5916\uff0cLLM\u5185\u5d4c\u6709\u6f5c\u5728\u504f\u89c1\uff0c\u8fd9\u53ef\u80fd\u5bfc\u81f4\u4e92\u52a8\u4e2d\u7684\u5404\u79cd\u6709\u5bb3\u5f71\u54cd\u3002\u5f53\u524d\u7684\u504f\u89c1\u8bc4\u4f30\u6307\u6807\u7f3a\u4e4f\u6807\u51c6\u548c\u5171\u8bc6\uff0c\u73b0\u6709\u65b9\u6cd5\u5f80\u5f80\u4f9d\u8d56\u4e8e\u4eba\u5de5\u751f\u6210\u7684\u6a21\u677f\u548c\u6ce8\u91ca\uff0c\u8fd9\u65e2\u6602\u8d35\u53c8\u8d39\u65f6\u3002 
\u6211\u4eec\u7684\u5de5\u4f5c\u65e8\u5728\u901a\u8fc7\u8bad\u7ec3\u6a21\u578b\u81ea\u52a8\u521b\u5efa\u5bf9\u6297\u6027\u63d0\u793a\u6765\u6fc0\u53d1\u76ee\u6807LLM\u751f\u6210\u5e26\u6709\u504f\u89c1\u7684\u54cd\u5e94\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8eLLM\u7684\u504f\u89c1\u8bc4\u4f30\u6307\u6807\uff0c\u5e76\u5206\u6790\u4e86\u591a\u79cd\u73b0\u6709\u7684\u81ea\u52a8\u8bc4\u4f30\u65b9\u6cd5\u548c\u6307\u6807\u3002\u6211\u4eec\u6df1\u5165\u63a2\u8ba8\u4e86\u6a21\u578b\u54cd\u5e94\u7684\u5404\u79cd\u7ec6\u5fae\u5dee\u522b\uff0c\u8bc6\u522b\u4e86\u4e0d\u540c\u6a21\u578b\u5bb6\u65cf\u7684\u4f18\u52bf\u548c\u52a3\u52bf\uff0c\u5e76\u8bc4\u4f30\u4e86\u8bc4\u4f30\u65b9\u6cd5\u7684\u4e0d\u8db3\u4e4b\u5904\u3002\u6211\u4eec\u5c06\u8fd9\u4e9b\u6307\u6807\u4e0e\u4eba\u5de5\u8bc4\u4f30\u8fdb\u884c\u6bd4\u8f83\uff0c\u5e76\u9a8c\u8bc1\u4e86\u201cLLM\u4f5c\u4e3a\u6cd5\u5b98\u201d\u7684\u6307\u6807\u4e0e\u751f\u6210\u504f\u89c1\u5224\u65ad\u7684\u4eba\u7c7b\u8bc4\u4ef7\u4e00\u81f4\u3002|\n", "2408.03876": "|**2024-08-07**|**From Data to Story: Towards Automatic Animated Data Video Creation with LLM-based Multi-Agent Systems**|Leixian Shen 
et.al.|[2408.03876](http://arxiv.org/abs/2408.03876)|null|\u521b\u5efa\u4ece\u539f\u59cb\u6570\u636e\u751f\u6210\u6570\u636e\u6545\u4e8b\u7684\u8fc7\u7a0b\u6781\u5177\u6311\u6218\u6027\uff0c\u8fd9\u4e3b\u8981\u6e90\u4e8e\u4eba\u7c7b\u6709\u9650\u7684\u6ce8\u610f\u529b\u548c\u5bf9\u7279\u5b9a\u6280\u80fd\u7684\u9700\u6c42\u3002\u8fd1\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u53d1\u5c55\u4e3a\u6784\u5efa\u5229\u7528\u72ec\u7acb\u4ee3\u7406\u5b9e\u73b0\u5de5\u4f5c\u6d41\u7a0b\u81ea\u52a8\u5316\u4ee5\u7b80\u5316\u6570\u636e\u6545\u4e8b\u521b\u4f5c\u6d41\u7a0b\u7684\u7cfb\u7edf\u63d0\u4f9b\u4e86\u5de8\u5927\u673a\u9047\u3002\u5c3d\u7ba1\u591a\u4ee3\u7406\u7cfb\u7edf\u80fd\u591f\u5145\u5206\u6316\u6398LLM\u6f5c\u529b\u5e76\u5206\u89e3\u4efb\u52a1\u4f9b\u4e2a\u4f53\u4ee3\u7406\u6267\u884c\u5177\u6709\u8bf8\u591a\u4f18\u52bf\uff0c\u4f46\u5728\u8bbe\u8ba1\u8fd9\u4e9b\u7cfb\u7edf\u65f6\uff0c\u4e5f\u9762\u4e34\u7740\u4efb\u52a1\u5206\u89e3\u3001\u5b50\u4efb\u52a1\u6027\u80fd\u4f18\u5316\u4ee5\u53ca\u5de5\u4f5c\u6d41\u7a0b\u8bbe\u8ba1\u7b49\u65b9\u9762\u7684\u6311\u6218\u3002\u4e3a\u4e86\u66f4\u6df1\u5165\u5730\u7406\u89e3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u5f00\u53d1\u4e86Data Director\u2014\u2014\u4e00\u4e2a\u57fa\u4e8eLLM\u7684\u591a\u4ee3\u7406\u7cfb\u7edf\uff0c\u65e8\u5728\u81ea\u52a8\u5316\u751f\u6210\u52a8\u753b\u6570\u636e\u89c6\u9891\uff0c\u8fd9\u4e00\u7c7b\u6570\u636e\u6545\u4e8b\u7684\u5178\u578b\u5f62\u5f0f\u3002Data Director\u901a\u8fc7\u89e3\u6790\u539f\u59cb\u6570\u636e\u3001\u62c6\u5206\u4efb\u52a1\u3001\u8bbe\u8ba1\u4ee3\u7406\u89d2\u8272\u4ee5\u8fdb\u884c\u81ea\u52a8\u51b3\u7b56\uff0c\u5e76\u65e0\u7f1d\u6574\u5408\u6570\u636e\u89c6\u9891\u4e2d\u7684\u5404\u79cd\u7ec4\u4ef6\u6765\u5b9e\u73b0\u8fd9\u4e00\u76ee\u6807\u3002\u4e00\u4e2a\u6848\u4f8b\u7814\u7a76\u5c55\u793a\u4e86Data 
Director\u5728\u751f\u6210\u6570\u636e\u89c6\u9891\u65b9\u9762\u7684\u6709\u6548\u6027\u3002\u5728\u6574\u4e2a\u5f00\u53d1\u8fc7\u7a0b\u4e2d\uff0c\u6211\u4eec\u4ece\u89e3\u51b3\u9762\u4e34\u7684\u6311\u6218\u4e2d\u63d0\u70bc\u51fa\u4e86\u7ecf\u9a8c\u6559\u8bad\uff0c\u8fd9\u4e9b\u7ecf\u9a8c\u5bf9\u4e8e\u6307\u5bfc\u672a\u6765\u5728\u6570\u636e\u6545\u4e8b\u53d9\u8ff0\u9886\u57df\u81ea\u4e3b\u4ee3\u7406\u7684\u53d1\u5c55\u5177\u6709\u91cd\u8981\u610f\u4e49\u3002\u6b64\u5916\uff0c\u6211\u4eec\u4e5f\u63ed\u793a\u4e86\u5168\u7403\u4f18\u5316\u3001\u4eba\u673a\u4ea4\u4e92\u8bbe\u8ba1\u4ee5\u53ca\u9ad8\u7ea7\u591a\u6a21\u6001LLM\u5e94\u7528\u7684\u672a\u6765\u53d1\u5c55\u65b9\u5411\u3002|\n", "2408.03865": "|**2024-08-07**|**PackMamba: Efficient Processing of Variable-Length Sequences in Mamba training**|Haoran Xu et.al.|[2408.03865](http://arxiv.org/abs/2408.03865)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u53d1\u5c55\uff0c\u4f20\u7edf\u7684Transformer\u6a21\u578b\u5728\u5904\u7406\u957f\u5e8f\u5217\u65f6\u53d8\u5f97\u8ba1\u7b97\u5bc6\u96c6\u578b\uff0c\u56e0\u4e3a\u5176\u8ba1\u7b97\u91cf\u968f\u5e8f\u5217\u957f\u5ea6\u7684\u5e73\u65b9\u589e\u957f\u3002Mamba\u4f5c\u4e3a\u751f\u6210AI\u9886\u57df\u7684\u4e00\u9879\u7a81\u7834\u6027\u67b6\u6784\uff0c\u5c55\u73b0\u51fa\u5728\u51cf\u5c11\u8ba1\u7b97\u548c\u5185\u5b58\u590d\u6742\u6027\u7684\u524d\u63d0\u4e0b\uff0c\u9ad8\u6548\u5904\u7406\u957f\u5e8f\u5217\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684Mamba\u8bad\u7ec3\u6846\u67b6\u5728\u5904\u7406\u53d8\u957f\u5e8f\u5217\u8f93\u5165\u65f6\u5b58\u5728\u6548\u7387\u95ee\u9898\u3002\u5355\u5e8f\u5217\u8bad\u7ec3\u4f1a\u5bfc\u81f4GPU\u5229\u7528\u7387\u4f4e\u4e0b\uff0c\u800c\u5bf9\u53d8\u957f\u5e8f\u5217\u8fdb\u884c\u6279\u91cf\u5904\u7406\u5230\u6700\u5927\u957f\u5ea6\u5219\u4f1a\u5e26\u6765\u663e\u8457\u7684\u5185\u5b58\u548c\u8ba1\u7b97\u5f00\u9500\u3002 
\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u5206\u6790\u4e86Mamba\u67b6\u6784\u4e2d\u74f6\u9888\u64cd\u4f5c\u5668\u5728\u4e0d\u540c\u5f20\u91cf\u5f62\u72b6\u4e0b\u7684\u6027\u80fd\uff0c\u5e76\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aPackMamba\u7684\u9ad8\u541e\u5410\u91cfMamba\uff0c\u5b83\u80fd\u591f\u6709\u6548\u5730\u5904\u7406\u53d8\u957f\u5e8f\u5217\u3002\u6df1\u5165\u7814\u7a76\u72b6\u6001\u7a7a\u95f4\u6a21\u578b\uff08SSMs\uff09\uff0c\u6211\u4eec\u4fee\u6539\u4e86\u5e76\u884c\u64cd\u4f5c\u5668\uff0c\u4ee5\u907f\u514d\u5728\u5404\u4e2a\u5e8f\u5217\u4e4b\u95f4\u4f20\u9012\u4fe1\u606f\uff0c\u540c\u65f6\u4fdd\u6301\u9ad8\u6027\u80fd\u3002\u5728NVIDIA A100 GPU\u4e0a\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cPackMamba\u5728\u5904\u74061.4B\u6a21\u578b\u65f6\u6bd4\u57fa\u7ebf\u5355\u5e8f\u5217\u5904\u7406\u65b9\u6848\u63d0\u9ad8\u4e863.06\u500d\u7684\u901f\u5ea6\uff0c\u5728\u5904\u74062.8B\u6a21\u578b\u65f6\u63d0\u9ad8\u4e862.62\u500d\u7684\u901f\u5ea6\u3002|\n", "2408.03847": "|**2024-08-07**|**GAIA -- A Large Language Model for Advanced Power Dispatch**|Yuheng Cheng 
et.al.|[2408.03847](http://arxiv.org/abs/2408.03847)|null|\u7535\u529b\u8c03\u5ea6\u5bf9\u4e8e\u63d0\u4f9b\u7a33\u5b9a\u3001\u7ecf\u6d4e\u4e14\u73af\u4fdd\u7684\u7535\u529b\u81f3\u5173\u91cd\u8981\u3002\u7136\u800c\uff0c\u968f\u7740\u7535\u529b\u7cfb\u7edf\u89c4\u6a21\u548c\u590d\u6742\u6027\u7684\u589e\u957f\uff0c\u4f20\u7edf\u7684\u8c03\u5ea6\u65b9\u6cd5\u5728\u591a\u4efb\u52a1\u5904\u7406\u3001\u5feb\u901f\u95ee\u9898\u89e3\u51b3\u4ee5\u53ca\u4eba\u673a\u534f\u4f5c\u65b9\u9762\u9047\u5230\u6311\u6218\u3002\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u4e13\u4e3a\u7535\u529b\u8c03\u5ea6\u4efb\u52a1\u8bbe\u8ba1\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u2014\u2014GAIA\u3002\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6570\u636e\u96c6\u6784\u5efa\u6280\u672f\uff0c\u5229\u7528\u591a\u79cd\u6570\u636e\u6e90\u5bf9GAIA\u8fdb\u884c\u5fae\u8c03\uff0c\u4ee5\u4f18\u5316\u5176\u5728\u8be5\u9886\u57df\u7684\u6027\u80fd\u3002\u8fd9\u79cd\u65b9\u6cd5\u7b80\u5316\u4e86LLM\u7684\u8bad\u7ec3\u8fc7\u7a0b\uff0c\u4f7f\u5f97\u5728\u7535\u529b\u7cfb\u7edf\u7ba1\u7406\u4e2d\u80fd\u591f\u65e0\u7f1d\u6574\u5408\u591a\u7ef4\u6570\u636e\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u8bbe\u8ba1\u4e86\u4e13\u95e8\u7684\u63d0\u793a\u7b56\u7565\u6765\u63d0\u9ad8GAIA\u5728\u8c03\u5ea6\u573a\u666f\u4e0b\u7684\u8f93\u5165\u8f93\u51fa\u6548\u7387\u3002\u5728ElecBench\u57fa\u51c6\u6d4b\u8bd5\u4e2d\uff0cGAIA\u5728\u591a\u4e2a\u6307\u6807\u4e0a\u8d85\u8d8a\u4e86\u57fa\u7840\u6a21\u578bLLaMA2\u3002\u5b9e\u9645\u5e94\u7528\u8868\u660e\uff0cGAIA\u80fd\u591f\u589e\u5f3a\u51b3\u7b56\u8fc7\u7a0b\u3001\u63d0\u9ad8\u8fd0\u8425\u6548\u7387\uff0c\u5e76\u4fc3\u8fdb\u7535\u529b\u8c03\u5ea6\u64cd\u4f5c\u4e2d\u7684\u4eba\u673a\u4ea4\u4e92\u3002\u672c\u6587\u6269\u5c55\u4e86LLM\u5728\u7535\u529b\u8c03\u5ea6\u9886\u57df\u7684\u5e94\u7528\uff0c\u5e76\u9a8c\u8bc1\u4e86\u5176\u5b9e\u7528\u6027\uff0c\u4e3a\u8fd9\u4e00\u9886\u57df\u672a\u6765\u7684\u521b\u65b0\u5f00\u8f9f\u4e86\u9053\u8def\u3002|\n",
"2408.03841": "|**2024-08-07**|**MaxMind: A Memory Loop Network to Enhance Software Productivity based on Large Language Models**|Yuchen Dong et.al.|[2408.03841](http://arxiv.org/abs/2408.03841)|null|\u672c\u6587\u63a2\u8ba8\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u81ea\u52a8\u5316\u8f6f\u4ef6\u64cd\u4f5c\u548c\u5de5\u5177\u751f\u6210\uff08SOTG\uff09\u9886\u57df\u7684\u5e94\u7528\uff0c\u4ee5\u6b64\u6765\u63d0\u5347\u8f6f\u4ef6\u751f\u4ea7\u529b\u3002\u8fd9\u4e00\u8fc7\u7a0b\u7c7b\u4f3c\u4e8e\u4eba\u7c7b\u6587\u660e\u65e9\u671f\u901a\u8fc7\u521b\u9020\u5e76\u4f7f\u7528\u5de5\u5177\u52a0\u901f\u53d1\u5c55\u7684\u9636\u6bb5\u3002\u8fd9\u4e9b\u590d\u6742\u4efb\u52a1\u8981\u6c42AI\u80fd\u591f\u6301\u7eed\u603b\u7ed3\u5e76\u6539\u8fdb\u3002\u5f53\u524d\u7814\u7a76\u5f80\u5f80\u5ffd\u89c6\u4e86\u5c06\u5b9e\u65f6\u4efb\u52a1\u7ecf\u9a8c\u8f6c\u5316\u4e3a\u7cfb\u7edf\u8bb0\u5fc6\u4ee5\u53ca\u533a\u5206\u73b0\u6709\u77e5\u8bc6\u672a\u6765\u4ef7\u503c\u7684\u91cd\u8981\u6027\u3002\u672c\u6587\u901a\u8fc7\u5f15\u5165\u201cMemory-Loop\u7f51\u7edc\u201d\u6765\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u4ee5\u5b9e\u73b0\u53ca\u65f6\u7684\u8bb0\u5fc6\u5b58\u50a8\u4e0e\u7ecf\u9a8c\u5f15\u7528\u3002
\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5bf9\u57fa\u4e8e\u77e5\u8bc6\u7cbe\u786e\u5206\u6bb5\u7684RAG\u673a\u5236\u8fdb\u884c\u4e86\u589e\u5f3a\uff0c\u4ee5\u4fbf\u6839\u636e\u4ef7\u503c\u5dee\u5f02\u5229\u7528\u8bb0\u5fc6\u3002\u9488\u5bf9SOTG\u8bbe\u8ba1\u4e86MaxMind\u6a21\u578b\u3002\u4e3a\u4e86\u9a8c\u8bc1\u6211\u4eec\u7684\u65b9\u6cd5\uff0c\u6211\u4eec\u5f00\u53d1\u4e86MaxMind4Sheet\uff0c\u4e00\u4e2a\u9075\u5faaMaxMind\u7406\u5ff5\u7684\u7535\u5b50\u8868\u683c\u5904\u7406\u7cfb\u7edf\u3002\u4e0eSheetCopilot\u7684\u6bd4\u8f83\u5b9e\u9a8c\u663e\u793a\uff0c\u4efb\u52a1\u8bb0\u5fc6\u7684\u79ef\u7d2f\u548c\u5faa\u73af\u80fd\u591f\u7a33\u6b65\u63d0\u9ad8\u4efb\u52a1\u6210\u529f\u7387\uff0c\u5728\u6b64\u793a\u4f8b\u5b9e\u65bd\u4e2d\uff0c\u6bcf\u8f6e\u7684\u6210\u529f\u7387\u63d0\u5347\u7ea6\u4e3a3%-6%\u3002\u968f\u7740\u8bb0\u5fc6\u7684\u6301\u7eed\u589e\u957f\uff0c\u8fd9\u79cd\u7d2f\u79ef\u6539\u8fdb\u53ef\u80fd\u4f1a\u975e\u5e38\u663e\u8457\u3002 \u5f15\u5165\u8bb0\u5fc6\u5faa\u73af\u8fd8\u53ef\u4ee5\u901a\u8fc7\u9ad8\u8fbe25%\u7684\u6548\u7387\u63d0\u5347\u589e\u52a0\u7cfb\u7edf\u7684\u4efb\u52a1\u6267\u884c\u6548\u7387\uff0c\u5e76\u901a\u8fc7\u8bb0\u5fc6\u8f6c\u79fb\u89e3\u51b3LLM\u5728\u5904\u7406\u4e13\u4e1a\u4efb\u52a1\u65f6\u9762\u4e34\u7684\u518d\u8bad\u7ec3\u95ee\u9898\u3002\u8fd9\u8868\u660eMaxMind\u6709\u6f5c\u529b\u663e\u8457\u589e\u5f3a\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728SOTG\u9886\u57df\u7684\u529f\u80fd\u548c\u751f\u4ea7\u529b\u3002|\n", "2408.03837": "|**2024-08-07**|**WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models**|Prannaya Gupta 
et.al.|[2408.03837](http://arxiv.org/abs/2408.03837)|**[link](https://github.com/walledai/walledeval)**|WalledEval\u662f\u4e00\u4e2a\u5168\u9762\u7684AI\u5b89\u5168\u6027\u6d4b\u8bd5\u5de5\u5177\u5305\uff0c\u65e8\u5728\u8bc4\u4f30\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u3002\u5b83\u80fd\u591f\u517c\u5bb9\u5404\u79cd\u6a21\u578b\uff0c\u5305\u62ec\u5f00\u6e90\u548cAPI\u4e24\u79cd\u7c7b\u578b\uff0c\u5e76\u5305\u542b\u4e86\u8d85\u8fc735\u4e2a\u8986\u76d6\u591a\u8bed\u8a00\u5b89\u5168\u3001\u5938\u5f20\u5b89\u5168\u4ee5\u53ca\u63d0\u793a\u6ce8\u5165\u7b49\u9886\u57df\u7684\u5b89\u5168\u57fa\u51c6\u3002\u8be5\u6846\u67b6\u652f\u6301\u5bf9LLM\u548c\u88c1\u5224\u8fdb\u884c\u57fa\u51c6\u6d4b\u8bd5\uff0c\u5e76\u4e14\u96c6\u6210\u81ea\u5b9a\u4e49\u7a81\u53d8\u5668\uff0c\u7528\u4e8e\u6d4b\u8bd5\u5728\u4e0d\u540c\u6587\u672c\u98ce\u683c\u53d8\u5f02\u5982\u5c06\u6765\u65f6\u6001\u548c\u91cd\u8ff0\u4e0b\u7684\u5b89\u5168\u6027\u3002\u6b64\u5916\uff0cWalledEval\u5f15\u5165\u4e86WalledGuard\uff0c\u8fd9\u662f\u4e00\u79cd\u65b0\u7684\u5c0f\u578b\u9ad8\u6548\u5185\u5bb9\u5ba1\u6838\u5de5\u5177\uff0c\u4ee5\u53caSGXSTest\uff0c\u7528\u4e8e\u8bc4\u4f30\u6587\u5316\u80cc\u666f\u4e0b\u7684\u5938\u5927\u5b89\u5168\u95ee\u9898\u3002\u6211\u4eec\u5df2\u5c06WalledEval\u516c\u5f00\u53d1\u5e03\u5728https://github.com/walledai/walledeval\u3002|\n", "2408.03834": "|**2024-08-07**|**Target Prompting for Information Extraction with Vision Language Model**|Dipankar Medhi
et.al.|[2408.03834](http://arxiv.org/abs/2408.03834)|null|\u8fd1\u671f\uff0c\u5927\u578b\u89c6\u89c9\u4e0e\u8bed\u8a00\u6a21\u578b\uff08VLM\uff09\u9886\u57df\u7684\u53d1\u5c55\u5728\u6784\u5efa\u4fe1\u606f\u63d0\u53d6\u7cfb\u7edf\u65b9\u9762\u5e26\u6765\u4e86\u65b0\u7684\u53d8\u9769\u3002\u8fd9\u4e9b\u6a21\u578b\u5728\u7406\u89e3\u6587\u6863\u548c\u6784\u5efa\u8de8\u884c\u4e1a\u7684\u95ee\u9898\u56de\u7b54\u7cfb\u7edf\u65b9\u9762\u8fbe\u5230\u4e86\u9876\u5c16\u6c34\u5e73\uff0c\u663e\u8457\u63d0\u5347\u4e86\u4ece\u6587\u6863\u56fe\u50cf\u751f\u6210\u6587\u672c\u4ee5\u53ca\u63d0\u4f9b\u7cbe\u786e\u7b54\u6848\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u5229\u7528\u8fd9\u4e9b\u6a21\u578b\u6784\u5efa\u7cbe\u51c6\u5bf9\u8bdd\u7cfb\u7edf\u65f6\u4ecd\u5b58\u5728\u4e00\u4e9b\u6311\u6218\u3002\u4f20\u7edf\u7684\u901a\u7528\u63d0\u793a\u6280\u672f\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4e0a\u7684\u5e94\u7528\u5f80\u5f80\u4e0d\u9002\u5408\u8fd9\u4e9b\u4e13\u95e8\u8bbe\u8ba1\u7684\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\u3002\u4f7f\u7528\u8fd9\u7c7b\u901a\u7528\u8f93\u5165\u63d0\u793a\u6240\u751f\u6210\u7684\u8f93\u51fa\u901a\u5e38\u8f83\u4e3a\u666e\u901a\uff0c\u4e0e\u6587\u6863\u5b9e\u9645\u5185\u5bb9\u76f8\u6bd4\u53ef\u80fd\u5b58\u5728\u4fe1\u606f\u7f3a\u53e3\u3002\u4e3a\u4e86\u83b7\u5f97\u66f4\u51c6\u786e\u3001\u66f4\u5177\u4f53\u7684\u7b54\u6848\uff0c\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\u9700\u8981\u9488\u5bf9\u7279\u5b9a\u90e8\u5206\u7684\u6587\u6863\u56fe\u50cf\u8fdb\u884c\u63d0\u793a\uff0c\u5e76\u4ec5\u4ece\u8fd9\u4e9b\u7279\u5b9a\u533a\u57df\u751f\u6210\u76f8\u5173\u7b54\u6848\u3002\u672c\u6587\u8ba8\u8bba\u4e86\u4e00\u79cd\u79f0\u4e3a\u201c\u76ee\u6807\u63d0\u793a\u201d\u7684\u6280\u672f\uff0c\u8be5\u6280\u672f\u4e13\u6ce8\u4e8e\u660e\u786e\u6307\u5411\u6587\u6863\u56fe\u50cf\u7684\u90e8\u5206\u5e76\u4ec5\u4ece\u8fd9\u4e9b\u7279\u5b9a\u533a\u57df\u751f\u6210\u76f8\u5173\u7684\u7b54\u6848\u3002\u6b64\u5916\uff0c\u6587\u7ae0\u8fd8\u901a\u8fc7\u4f7f\u7528\u4e0d\u540c\u7528\u6237
\u67e5\u8be2\u548c\u8f93\u5165\u63d0\u793a\u5bf9\u6bcf\u79cd\u63d0\u793a\u6280\u672f\u7684\u54cd\u5e94\u8fdb\u884c\u4e86\u8bc4\u4f30\u3002|\n", "2408.04614": "|**2024-08-08**|**Better Alignment with Instruction Back-and-Forth Translation**|Thao Nguyen et.al.|[2408.04614](http://arxiv.org/abs/2408.04614)|null|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u65b9\u6cd5\u2014\u2014\u6307\u4ee4\u53cc\u5411\u7ffb\u8bd1\uff0c\u7528\u4e8e\u6784\u5efa\u57fa\u4e8e\u4e16\u754c\u77e5\u8bc6\u7684\u9ad8\u8d28\u91cf\u5408\u6210\u6570\u636e\uff0c\u4ee5\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u8fdb\u884c\u5bf9\u9f50\u3002\u7ed9\u5b9a\u7f51\u7edc\u8bed\u6599\u5e93\u4e2d\u7684\u6587\u6863\uff0c\u6211\u4eec\u4f7f\u7528\u4e86Li\u7b49\u4eba(2023a)\u63d0\u51fa\u7684\u56de\u8bd1\u65b9\u6cd5\u751f\u6210\u5e76\u6574\u7406\u5408\u6210\u6307\u4ee4\uff0c\u5e76\u901a\u8fc7\u6839\u636e\u521d\u59cb\u6587\u6863\u8fdb\u4e00\u6b65\u6539\u8fdb\u54cd\u5e94\u7684\u8d28\u91cf\u6765\u91cd\u5199\u8fd9\u4e9b\u6307\u4ee4\u3002\u901a\u8fc7\u4f7f\u7528\u4ea7\u751f\u7684\uff08\u56de\u8bd1\u6307\u4ee4\uff0c\u91cd\u5199\u54cd\u5e94\uff09\u5bf9\u8fdb\u884c\u5fae\u8c03\uff0c\u6211\u4eec\u5728AlpacaEval\u4e0a\u7684\u83b7\u80dc\u7387\u9ad8\u4e8e\u4f7f\u7528\u5176\u4ed6\u5e38\u89c1\u6307\u4ee4\u6570\u636e\u96c6\uff08\u5982Humpback\u3001ShareGPT\u3001Open
Orca\u3001Alpaca-GPT4\u548cSelf-instruct\uff09\u3002\u6211\u4eec\u4e5f\u5c55\u793a\u4e86\u7528LLM\u91cd\u5199\u54cd\u5e94\u4f18\u4e8e\u76f4\u63a5\u7684\u84b8\u998f\u65b9\u6cd5\uff0c\u5e76\u4e14\u751f\u6210\u7684\u6587\u672c\u5206\u5e03\u5728\u8fd9\u4e24\u4e2a\u65b9\u9762\u4e4b\u95f4\u5b58\u5728\u663e\u8457\u5dee\u5f02\u3002\u8fdb\u4e00\u6b65\u7684\u5206\u6790\u8868\u660e\uff0c\u6211\u4eec\u7684\u56de\u8bd1\u6307\u4ee4\u7684\u8d28\u91cf\u6bd4\u5176\u4ed6\u5408\u6210\u6307\u4ee4\u6765\u6e90\u66f4\u9ad8\uff0c\u800c\u6211\u4eec\u7684\u54cd\u5e94\u5728\u591a\u6837\u6027\u4e0e\u590d\u6742\u6027\u4e0a\u6bd4\u4ece\u84b8\u998f\u83b7\u5f97\u7684\u7ed3\u679c\u66f4\u4e3a\u51fa\u8272\u3002\u603b\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u53d1\u73b0\u6307\u4ee4\u53cc\u5411\u7ffb\u8bd1\u7ed3\u5408\u4e86\u7f51\u7edc\u4e0a\u4fe1\u606f\u591a\u6837\u6027\u548c\u6570\u91cf\u7684\u4f18\u52bf\uff0c\u540c\u65f6\u786e\u4fdd\u4e86\u54cd\u5e94\u7684\u8d28\u91cf\uff0c\u8fd9\u662f\u6709\u6548\u5bf9\u9f50\u6240\u5fc5\u9700\u7684\u3002|\n", "2408.04594": "|**2024-08-09**|**Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models**|Qirui Jiao 
et.al.|[2408.04594](http://arxiv.org/abs/2408.04594)|**[link](https://github.com/modelscope/data-juicer)**|**\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aImg-Diff\u7684\u65b0\u6570\u636e\u96c6\uff0c\u65e8\u5728\u901a\u8fc7\u5bf9\u6bd4\u5b66\u4e60\u548c\u56fe\u50cf\u5dee\u5f02\u63cf\u8ff0\u7684\u65b9\u6cd5\u6765\u589e\u5f3a\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u7ec6\u5fae\u56fe\u50cf\u8bc6\u522b\u4efb\u52a1\u4e2d\u7684\u6027\u80fd\u3002\u8be5\u65b9\u6cd5\u901a\u8fc7\u5206\u6790\u76f8\u4f3c\u56fe\u50cf\u95f4\u7684\u5bf9\u8c61\u5dee\u5f02\uff0c\u8981\u6c42\u6a21\u578b\u8bc6\u522b\u76f8\u540c\u4e0e\u4e0d\u540c\u4e4b\u5904\u3002\u5229\u7528Stable-Diffusion-XL\u6a21\u578b\u53ca\u9ad8\u7ea7\u56fe\u50cf\u7f16\u8f91\u6280\u672f\u751f\u6210\u7a81\u51fa\u5bf9\u8c61\u66ff\u6362\u7684\u76f8\u4f3c\u56fe\u50cf\u5bf9\u3002\u6570\u636e\u751f\u6210\u6d41\u7a0b\u5305\u62ec\u5dee\u5f02\u533a\u57df\u751f\u6210\u5668\u8bc6\u522b\u5bf9\u8c61\u5dee\u5f02\uff0c\u968f\u540e\u5dee\u5f02\u63cf\u8ff0\u751f\u6210\u5668\u63d0\u4f9b\u8be6\u7ec6\u7684\u5dee\u5f02\u8bf4\u660e\u3002\u7ed3\u679c\u662f\u521b\u5efa\u4e86\u4e00\u4e2a\u5c0f\u800c\u9ad8\u8d28\u91cf\u7684\u201c\u5bf9\u8c61\u66ff\u6362\u201d\u6837\u672c\u96c6\u5408\u3002\u4f7f\u7528\u6b64\u6570\u636e\u96c6\u5bf9\u5f53\u524d\u6700\u4f73\u7684\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\uff08\u5982MGM-7B\uff09\u8fdb\u884c\u5fae\u8c03\uff0c\u663e\u8457\u63d0\u9ad8\u4e86\u8fd9\u4e9b\u6a21\u578b\u5728\u56fe\u50cf\u5dee\u5f02\u548c\u89c6\u89c9\u95ee\u7b54\u4efb\u52a1\u4e0a\u7684\u8868\u73b0\u5206\u6570\uff0c\u8d85\u8d8a\u4e86\u57fa\u4e8e\u5927\u89c4\u6a21\u6570\u636e\u96c6\u8bad\u7ec3\u7684\u5f53\u524d\u6700\u4f73\u6a21\u578b\uff08\u5982GPT-4V\u548cGemini\uff09\u5728MMVP\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u7684\u8868\u73b0\u3002\u6b64\u5916\uff0c\u672c\u6587\u8fd8\u63a2\u8ba8\u4e86\u901a\u8fc7\u201c\u5bf9\u8c61\u79fb\u9664\u201d\u65b9\u6cd5\u751f\u6210\u56fe\u50cf\u5dee\u5f02\u6570\u636e\u7684\u66ff\u4ee3\u65b9\u6cd5\uff0c\u5e76\u8fdb
\u884c\u4e86\u5168\u9762\u8bc4\u4f30\u4ee5\u9a8c\u8bc1\u6570\u636e\u96c6\u7684\u591a\u6837\u6027\u548c\u8d28\u91cf\uff0c\u63d0\u4f9b\u4e86\u5173\u4e8e\u6b64\u7c7b\u5bf9\u6bd4\u6027\u6570\u636e\u96c6\u5408\u6210\u7684\u6df1\u5165\u89c1\u89e3\u3002\u4e3a\u4e86\u4fc3\u8fdb\u8fdb\u4e00\u6b65\u7684\u7814\u7a76\u5e76\u63a8\u52a8\u591a\u6a21\u6001\u6570\u636e\u5408\u6210\u548c\u589e\u5f3a\u5927\u578b\u8bed\u8a00\u6a21\u578b\u57fa\u7840\u80fd\u529b\u7684\u53d1\u5c55\uff0c\u6211\u4eec\u5df2\u5c06\u4ee3\u7801\u548c\u6570\u636e\u96c6\u53d1\u5e03\u5728https://github.com/modelscope/data-juicer/tree/ImgDiff\u4e0a\u4f9b\u516c\u4f17\u4f7f\u7528\u3002**|\n", "2408.04585": "|**2024-08-08**|**Towards Resilient and Efficient LLMs: A Comparative Study of Efficiency, Performance, and Adversarial Robustness**|Xiaojing Fan et.al.|[2408.04585](http://arxiv.org/abs/2408.04585)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5b9e\u7528\u5e94\u7528\u9700\u6c42\u7684\u589e\u52a0\uff0c\u8bb8\u591a\u5173\u6ce8\u6548\u7387\u7684\u6a21\u578b\u88ab\u5f00\u53d1\u51fa\u6765\u4ee5\u5e73\u8861\u6027\u80fd\u548c\u8ba1\u7b97\u6210\u672c\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u6a21\u578b\u7684\u5bf9\u6297\u9c81\u68d2\u6027\u4ecd\u7136\u7f3a\u4e4f\u6df1\u5165\u7814\u7a76\u3002\u672c\u7814\u7a76\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u6846\u67b6\uff0c\u901a\u8fc7\u6bd4\u8f83\u4e09\u4e2a\u5177\u6709\u4e0d\u540c\u590d\u6742\u5ea6\u548c\u6548\u7387\u6c34\u5e73\u7684\u4e3b\u8981\u6a21\u578b\u2014\u2014Transformer++\u3001\u95e8\u63a7\u7ebf\u6027\u6ce8\u610f\u529b\uff08GLA\uff09\u53d8\u6362\u5668\u4ee5\u53caMatMul-Free LM\uff0c\u6765\u63a2\u7d22\u6548\u7387\u3001\u6027\u80fd\u4e0e\u5bf9\u6297\u9c81\u68d2\u6027\u7684\u6743\u8861\u5173\u7cfb\u3002\u5229\u7528GLUE\u548cAdvGLUE\u6570\u636e\u96c6\u8fdb\u884c\u6bd4\u8f83\u3002AdvGLUE\u6570\u636e\u96c6\u901a\u8fc7\u6dfb\u52a0\u65e8\u5728\u6311\u6218\u6a21\u578b\u9c81\u68d2\u6027\u7684\u5bf9\u6297\u6837\u672c\u6269\u5c55\u4e86GLUE\u6570\u636e\u96c6\u3002 
\u6211\u4eec\u7684\u7ed3\u679c\u8868\u660e\uff0c\u5728GLUE\u4efb\u52a1\u4e0a\u7684\u51c6\u786e\u6027\u7a0d\u4f4e\u7684\u60c5\u51b5\u4e0b\uff0cGLA\u53d8\u6362\u5668\u548cMatMul-Free LM\u5728AdvGLUE\u4efb\u52a1\u4e0a\u663e\u793a\u51fa\u66f4\u9ad8\u7684\u6548\u7387\uff0c\u5e76\u4e14\u5728\u4e0d\u540c\u653b\u51fb\u7ea7\u522b\u4e0b\uff0c\u5b83\u4eec\u7684\u9c81\u68d2\u6027\u8981\u4e48\u4f18\u4e8e\uff0c\u8981\u4e48\u4e0eTransformer++\u76f8\u5339\u654c\u3002\u8fd9\u4e9b\u53d1\u73b0\u5f3a\u8c03\u4e86\u7b80\u5316\u67b6\u6784\u5728\u5b9e\u73b0\u9ad8\u6548\u80fd\u3001\u9ad8\u6027\u80fd\u4e0e\u5bf9\u6297\u9c81\u68d2\u6027\u4e4b\u95f4\u53d6\u5f97\u826f\u597d\u5e73\u8861\u7684\u53ef\u80fd\u6027\uff0c\u4e3a\u8d44\u6e90\u53d7\u9650\u73af\u5883\u548c\u5bf9\u5bf9\u6297\u653b\u51fb\u6709\u9ad8\u62b5\u6297\u529b\u9700\u6c42\u7684\u5e94\u7528\u63d0\u4f9b\u4e86\u6709\u4ef7\u503c\u7684\u89c1\u89e3\u3002|\n", "2408.04575": "|**2024-08-08**|**SCENE: Evaluating Explainable AI Techniques Using Soft Counterfactuals**|Haoran Zheng 
et.al.|[2408.04575](http://arxiv.org/abs/2408.04575)|null|\u89e3\u91ca\u6027\u4eba\u5de5\u667a\u80fd\uff08XAI\uff09\u5bf9\u4e8e\u589e\u5f3a\u4eba\u5de5\u667a\u80fd\u6a21\u578b\u7684\u900f\u660e\u5ea6\u548c\u8d23\u4efb\u6027\u81f3\u5173\u91cd\u8981\uff0c\u5c24\u5176\u662f\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u4efb\u52a1\u4e2d\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aSCENE\uff08\u8f6f\u53cd\u4e8b\u5b9e\u8bc4\u4f30\u7528\u4e8e\u81ea\u7136\u8bed\u8a00\u53ef\u89e3\u91ca\u6027\uff09\u7684\u65b0\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u96f6\u6b21\u5c04\u51fb\u7684\u60c5\u51b5\u4e0b\u751f\u6210\u8f6f\u53cd\u4e8b\u5b9e\u89e3\u91ca\u3002\u901a\u8fc7\u5173\u6ce8\u57fa\u4e8e\u8bcd\u5143\u7684\u66ff\u6362\uff0cSCENE\u521b\u5efa\u4e86\u4e0a\u4e0b\u6587\u76f8\u5173\u4e14\u8bed\u4e49\u4e0a\u5177\u6709\u610f\u4e49\u7684\u8f6f\u53cd\u4e8b\u5b9e\uff0c\u800c\u65e0\u9700\u8fdb\u884c\u5927\u91cf\u5fae\u8c03\u3002SCENE\u91c7\u7528\u6709\u6548\u6027\u8f6f\u548cC\u8f6f\u6307\u6807\u6765\u8bc4\u4f30\u5404\u79cd\u6a21\u578b\u65e0\u5173\u7684XAI\u65b9\u6cd5\u5728\u6587\u672c\u5206\u7c7b\u4efb\u52a1\u4e2d\u7684\u6548\u679c\u3002\u5e94\u7528\u4e8eCNN\u3001RNN\u548cBERT\u67b6\u6784\uff0cSCENE\u63d0\u4f9b\u4e86\u5bf9\u5404\u79cdXAI\u6280\u672f\u5f3a\u9879\u548c\u5c40\u9650\u6027\u7684\u6709\u4ef7\u503c\u89c1\u89e3\u3002|\n", "2408.04568": "|**2024-08-08**|**Learning Fine-Grained Grounded Citations for Attributed Large Language Models**|Lei Huang 
et.al.|[2408.04568](http://arxiv.org/abs/2408.04568)|**[link](https://github.com/luckyyysta/fine-grained-attribution)**|**\u5c3d\u7ba1\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u4fe1\u606f\u67e5\u8be2\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5b83\u4eec\u4ecd\u7136\u5728\u5e7b\u89c9\u95ee\u9898\u4e0a\u5b58\u5728\u6311\u6218\u3002\u57fa\u4e8e\u5c5e\u6027\u7684LLM\uff0c\u901a\u8fc7\u5728\u751f\u6210\u6587\u672c\u4e2d\u6dfb\u52a0\u5185\u8054\u5f15\u7528\uff0c\u663e\u793a\u51fa\u51cf\u5c11\u5e7b\u89c9\u5e76\u63d0\u9ad8\u53ef\u9a8c\u8bc1\u6027\u7684\u6f5c\u529b\u3002\u7136\u800c\uff0c\u5f53\u524d\u7684\u65b9\u6cd5\u5728\u751f\u6210\u9ad8\u8d28\u91cf\u5f15\u7528\u65b9\u9762\u6548\u679c\u4e0d\u4f73\uff0c\u8fd9\u4e3b\u8981\u662f\u7531\u4e8e\u5b83\u4eec\u4f9d\u8d56\u4e8e\u4e0a\u4e0b\u6587\u5b66\u4e60\u3002\u6b64\u5916\uff0c\u53ea\u5f15\u7528\u7c97\u7c92\u5ea6\u6587\u6863\u6807\u8bc6\u7684\u505a\u6cd5\u4f7f\u5f97\u7528\u6237\u96be\u4ee5\u8fdb\u884c\u7cbe\u7ec6\u9a8c\u8bc1\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86FRONT\u6846\u67b6\uff0c\u65e8\u5728\u6559\u5bfcLLM\u751f\u6210\u7ec6\u7c92\u5ea6\u76f8\u5173\u5f15\u7528\u3002\u8fd9\u4e9b\u5f15\u7528\u901a\u8fc7\u8fde\u63a5\u5230\u751f\u6210\u54cd\u5e94\u7684\u7ec6\u7c92\u5ea6\u652f\u6301\u5f15\u7528\u6765\u63d0\u4f9b\u6307\u5bfc\uff0c\u4e0d\u4ec5\u63d0\u9ad8\u4e86\u5f15\u7528\u8d28\u91cf\uff0c\u8fd8\u4fbf\u4e8e\u8fdb\u884c\u7cbe\u7ec6\u9a8c\u8bc1\u3002\u5728ALCE\u57fa\u51c6\u6d4b\u8bd5\u4e0a\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cFRONT\u5728\u751f\u6210\u4f18\u79c0\u76f8\u5173\u54cd\u5e94\u548c\u9ad8\u5ea6\u652f\u6301\u6027\u5f15\u7528\u65b9\u9762\u975e\u5e38\u6709\u6548\u3002\u4f7f\u7528LLaMA-2-7B\u65f6\uff0c\u8be5\u6846\u67b6\u663e\u8457\u4f18\u4e8e\u6240\u6709\u57fa\u7ebf\uff0c\u5e73\u5747\u63d0\u9ad8\u4e8614.21%\u7684\u5f15\u7528\u8d28\u91cf\uff0c\u5e76\u4e14\u8d85\u8d8a\u4e86ChatGPT\u3002**|\n", "2408.04556": "|**2024-08-08**|**Bias-Aware Low-Rank Adaptation: Mitigating 
Catastrophic Inheritance of Large Language Models**|Yupeng Chang et.al.|[2408.04556](http://arxiv.org/abs/2408.04556)|**[link](https://github.com/cyp-jlu-ai/ba-lora)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u5404\u79cd\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u4ee4\u4eba\u77a9\u76ee\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u5728\u5c06\u8fd9\u4e9b\u6a21\u578b\u5e94\u7528\u4e8e\u4e0b\u6e38\u5e94\u7528\u65f6\uff0c\u901a\u5e38\u9700\u8981\u8fdb\u884c\u8ba1\u7b97\u5bc6\u96c6\u578b\u548c\u5185\u5b58\u6d88\u8017\u5927\u7684\u5fae\u8c03\u8fc7\u7a0b\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u53c2\u6570\u9ad8\u6548\u5fae\u8c03\uff08PEFT\uff09\u6280\u672f\u5df2\u7ecf\u4f5c\u4e3a\u4e00\u79cd\u6709\u524d\u666f\u7684\u65b9\u6cd5\u51fa\u73b0\uff0c\u65e8\u5728\u4ee5\u6700\u5c0f\u7684\u8ba1\u7b97\u6210\u672c\u6765\u5b9a\u5236LLM\u3002\u5c3d\u7ba1PEFT\u65b9\u6cd5\u63d0\u4f9b\u4e86\u663e\u8457\u7684\u4f18\u52bf\uff0c\u4f46\u5b83\u4eec\u5e76\u672a\u5b8c\u5168\u89e3\u51b3\u4ece\u9884\u8bad\u7ec3\u6570\u636e\u7ee7\u627f\u504f\u89c1\u7684\u95ee\u9898\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684PEFT\u65b9\u6cd5\u2014\u2014Bias-Aware Low-Rank Adaptation (BA-LoRA)\uff0c\u65e8\u5728\u5bf9\u6297\u504f\u89c1\u7ee7\u627f\u3002 
BA-LoRA\u6574\u5408\u4e86\u4e09\u4e2a\u4e0d\u540c\u7684\u6b63\u5219\u5316\u9879\uff1a\u4e00\u81f4\u6027\u6b63\u5219\u5316\u5668\u3001\u591a\u6837\u6027\u6b63\u5219\u5316\u5668\u4ee5\u53ca\u5947\u5f02\u503c\u5206\u89e3\u6b63\u5219\u5316\u5668\u3002\u8fd9\u4e09\u4e2a\u6b63\u5219\u5316\u5668\u5171\u540c\u65e8\u5728\u63d0\u9ad8\u751f\u6210\u6a21\u578b\u5728\u5fae\u8c03\u8fc7\u7a0b\u4e2d\u7684\u4e00\u81f4\u6027\u3001\u591a\u6837\u6027\u548c\u6cdb\u5316\u80fd\u529b\u3002\u901a\u8fc7\u5728\u591a\u79cd\u81ea\u7136\u8bed\u8a00\u7406\u89e3\uff08NLU\uff09\u548c\u81ea\u7136\u8bed\u8a00\u751f\u6210\uff08NLG\uff09\u4efb\u52a1\u4e0a\u7684\u5e7f\u6cdb\u5b9e\u9a8c\uff0c\u5e76\u4f7f\u7528\u5982LLaMA\u3001Mistral\u548cGemma\u7b49\u4e3b\u6d41LLM\uff0c\u6211\u4eec\u5c55\u793a\u4e86BA-LoRA\u5728\u6027\u80fd\u4e0a\u8d85\u8d8a\u4e86LoRA\u53ca\u5176\u6700\u5148\u8fdb\u7684\u53d8\u4f53\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u6709\u6548\u5730\u51cf\u8f7b\u4e86\u9884\u8bad\u7ec3\u504f\u89c1\u7684\u8d1f\u9762\u5f71\u54cd\uff0c\u5bfc\u81f4\u66f4\u53ef\u9760\u4e14\u7a33\u5065\u7684\u6a21\u578b\u8f93\u51fa\u3002\u76f8\u5173\u4ee3\u7801\u5df2\u5f00\u6e90\u5728https://github.com/cyp-jlu-ai/BA-LoRA\u3002**|\n", "2408.04522": "|**2024-08-08**|**Compromesso! 
Italian Many-Shot Jailbreaks Undermine the Safety of Large Language Models**|Fabio Pernisi et.al.|[2408.04522](http://arxiv.org/abs/2408.04522)|null|\u968f\u7740\u4e0d\u540c\u8bed\u8a00\u7684\u591a\u5143\u8bed\u8a00\u793e\u533a\u548c\u7528\u6237\u91c7\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\uff0c\u8bc4\u4f30\u8fd9\u4e9b\u6a21\u578b\u5728\u4e0d\u540c\u8bed\u8a00\u73af\u5883\u4e0b\u7684\u5b89\u5168\u6027\u53d8\u5f97\u81f3\u5173\u91cd\u8981\u3002\u5c3d\u7ba1\u5df2\u7ecf\u8fdb\u884c\u4e86\u6301\u7eed\u7684\u52aa\u529b\u4ee5\u786e\u4fddLLM\u7684\u5b89\u5168\u6027\uff0c\u4f46\u5b83\u4eec\u4ecd\u7136\u53ef\u4ee5\u901a\u8fc7\u201c\u8d8a\u72f1\u201d\u6280\u672f\u6765\u8868\u73b0\u5f97\u4e0d\u5b89\u5168\uff0c\u8fd9\u662f\u4e00\u79cd\u4fc3\u4f7f\u6a21\u578b\u5728\u5176\u64cd\u4f5c\u51c6\u5219\u4e4b\u5916\u884c\u52a8\u7684\u6280\u672f\u3002\u5bf9\u4e8eLLM\u5b89\u5168\u6027\u4ee5\u53ca\u201c\u8d8a\u72f1\u201d\u7684\u7814\u7a76\u76ee\u524d\u4e3b\u8981\u96c6\u4e2d\u5728\u82f1\u8bed\u4e0a\uff0c\u8fd9\u9650\u5236\u4e86\u6211\u4eec\u5bf9\u5176\u4ed6\u8bed\u8a00\u4e2dLLM\u5b89\u5168\u6027\u7684\u7406\u89e3\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u6211\u4eec\u901a\u8fc7\u5728\u610f\u5927\u5229\u8bed\u4e2d\u7814\u7a76\u591a\u8f6e\u201c\u8d8a\u72f1\u201d\u7684\u6709\u6548\u6027\uff0c\u5373\u4f7f\u7528\u4e0d\u5b89\u5168\u793a\u4f8b\u6765\u8bf1\u5bfc\u4e0d\u5b89\u5168\u884c\u4e3a\uff0c\u6765\u8d21\u732e\u4e8e\u8fd9\u4e00\u9886\u57df\u3002\u4e3a\u4e86\u652f\u6301\u6211\u4eec\u7684\u5206\u6790\uff0c\u6211\u4eec\u521b\u5efa\u4e86\u4e00\u4e2a\u65b0\u7684\u610f\u5927\u5229\u8bed\u95ee\u9898-\u7b54\u6848\u4e0d\u5b89\u5168\u6570\u636e\u96c6\u3002\u5229\u7528\u8fd9\u4e2a\u6570\u636e\u96c6\uff0c\u6211\u4eec\u5728\u56db\u4e2a\u5f00\u653e\u6743\u91cdLLM\u5bb6\u65cf\u4e2d\u8bc6\u522b\u51fa\u4e86\u660e\u663e\u7684\u5b89\u5168\u6f0f\u6d1e\u3002\u6211\u4eec\u53d1\u73b0\uff0c\u5373\u4f7f\u5728\u4f7f\u7528\u5c11\u91cf\u4e0d\u5b89\u5168\u793a\u4f8b\u7684\u60c5\u51b5\u4e0b
\uff0c\u6a21\u578b\u4e5f\u4f1a\u8868\u73b0\u51fa\u4e0d\u5b89\u5168\u7684\u884c\u4e3a\uff0c\u5e76\u4e14\u66f4\u4ee4\u4eba\u62c5\u5fe7\u7684\u662f\uff0c\u968f\u7740\u66f4\u591a\u793a\u4f8b\u7684\u51fa\u73b0\uff0c\u8fd9\u79cd\u8d8b\u52bf\u8fc5\u901f\u52a0\u5267\u3002|\n", "2408.04477": "|**2024-08-08**|**What You Need is What You Get: Theory of Mind for an LLM-Based Code Understanding Assistant**|Jonan Richards et.al.|[2408.04477](http://arxiv.org/abs/2408.04477)|null|\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7528\u4e8e\u8f85\u52a9\u5f00\u53d1\u8005\u7406\u89e3\u4ee3\u7801\u7684\u5de5\u5177\u6570\u91cf\u4e0d\u65ad\u589e\u52a0\u7684\u540c\u65f6\uff0c\u5f00\u53d1\u8005\u5728\u4f7f\u7528\u8fd9\u4e9b\u5de5\u5177\u65f6\u4ecd\u9762\u4e34\u4e00\u4e9b\u969c\u788d\uff0c\u5305\u62ec\u7528\u81ea\u7136\u8bed\u8a00\u63cf\u8ff0\u5176\u610f\u56fe\u7684\u6311\u6218\u3001\u89e3\u8bfb\u5de5\u5177\u7ed3\u679c\u7684\u56f0\u96be\uff0c\u4ee5\u53ca\u8c03\u6574\u6709\u6548\u63d0\u793a\u4ee5\u83b7\u5f97\u6709\u7528\u4fe1\u606f\u7684\u8fc7\u7a0b\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u57fa\u4e8eLLM\u7684\u5bf9\u8bdd\u52a9\u624b\uff0c\u8be5\u52a9\u624b\u6839\u636e\u63a8\u65ad\u51fa\u7684\u7528\u6237\u5fc3\u7406\u72b6\u6001\uff08\u5982\u80cc\u666f\u77e5\u8bc6\u548c\u7ecf\u9a8c\uff09\u63d0\u4f9b\u4e2a\u6027\u5316\u4e92\u52a8\u3002\u901a\u8fc7\u9488\u5bf9\u5341\u56db\u4f4d\u65b0\u624b\u8fdb\u884c\u7684\u5185\u90e8\u4e3b\u9898\u7814\u7a76\uff0c\u6211\u4eec\u6355\u6349\u4e86\u4ed6\u4eec\u7684\u611f\u77e5\u548c\u504f\u597d\u3002\u7814\u7a76\u7ed3\u679c\u4e3a\u5e0c\u671b\u521b\u5efa\u6216\u6539\u8fdb\u9762\u5411\u65b0\u624b\u7684LLM\u4e3a\u57fa\u7840\u7684\u5bf9\u8bdd\u52a9\u624b\u4ee5\u652f\u6301\u4ee3\u7801\u7406\u89e3\u7684\u7814\u7a76\u4eba\u5458\u548c\u5de5\u5177\u5f00\u53d1\u8005\u63d0\u4f9b\u4e86\u89c1\u89e3\u3002|\n", "2408.04472": "|**2024-08-08**|**Can LLMs Beat Humans in Debating?
A Dynamic Multi-agent Framework for Competitive Debate**|Yiqun Zhang et.al.|[2408.04472](http://arxiv.org/abs/2408.04472)|**[link](https://github.com/zhangyiqun018/agent-for-debate)**|**\u5728\u7ade\u4e89\u6027\u8fa9\u8bba\u8fd9\u4e00\u5168\u9762\u4e14\u590d\u6742\u7684\u8ba1\u7b97\u8bba\u8fa9\u4efb\u52a1\u4e2d\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u9762\u4e34\u7740\u5e7b\u89c9\u548c\u7ade\u4e89\u529b\u4e0d\u8db3\u7684\u95ee\u9898\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201c\u8fa9\u8bba\u8005\u201d\uff08Agent4Debate\uff09\u7684\u52a8\u6001\u3001\u591a\u4ee3\u7406\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u57fa\u4e8eLLMs\u8bbe\u8ba1\uff0c\u65e8\u5728\u589e\u5f3a\u5176\u5728\u7ade\u4e89\u6027\u8fa9\u8bba\u4e2d\u7684\u80fd\u529b\u3002\u8be5\u6846\u67b6\u53d7\u5230\u4eba\u7c7b\u5728\u8fa9\u8bba\u51c6\u5907\u4e0e\u6267\u884c\u8fc7\u7a0b\u4e2d\u884c\u4e3a\u7684\u542f\u53d1\uff0c\u91c7\u7528\u534f\u4f5c\u67b6\u6784\uff0c\u7531\u56db\u4e2a\u4e13\u95e8\u7684\u4ee3\u7406\uff08\u641c\u7d22\u8005\u3001\u5206\u6790\u8005\u3001\u64b0\u5199\u8005\u548c\u5ba1\u9605\u8005\uff09\u52a8\u6001\u4ea4\u4e92\u5e76\u5408\u4f5c\u3002\u8fd9\u56db\u4e2a\u4ee3\u7406\u5728\u6574\u4e2a\u8fa9\u8bba\u8fc7\u7a0b\u4e2d\u8986\u76d6\u4e86\u4ece\u521d\u59cb\u7814\u7a76\u5230\u8bba\u70b9\u5f62\u6210\u3001\u53cd\u9a73\u548c\u603b\u7ed3\u7684\u591a\u4e2a\u9636\u6bb5\u3002 
\u4e3a\u4e86\u5168\u9762\u8bc4\u4f30\u6846\u67b6\u7684\u6027\u80fd\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u540d\u4e3a\u201c\u4e2d\u56fd\u8fa9\u8bba\u7ade\u6280\u573a\u201d\u7684\u6570\u636e\u5e93\uff0c\u5305\u542b\u4e8666\u4e2a\u7cbe\u5fc3\u6311\u9009\u7684\u4e2d\u6587\u8fa9\u8bba\u8bae\u9898\u3002\u6211\u4eec\u62db\u52df\u4e86\u5341\u4f4d\u7ecf\u9a8c\u4e30\u5bcc\u7684\u4e13\u4e1a\u8fa9\u8bba\u8005\uff0c\u5e76\u6536\u96c6\u4e86\u6d89\u53caAgent4Debate\u3001\u57fa\u7ebf\u6a21\u578b\u548c\u4eba\u7c7b\u7684200\u573a\u8fa9\u8bba\u8bb0\u5f55\u3002\u8bc4\u4ef7\u4f53\u7cfb\u91c7\u7528\u4e86\u81ea\u52a8\u8bc4\u5206\u7cfb\u7edfDebatrix\u4ee5\u53ca\u57fa\u4e8eDebatrix-Elo\u548cHuman-Elo\u6392\u540d\u7684\u4e13\u4e1a\u8bc4\u5ba1\u56e2\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u6700\u5148\u8fdb\u7684Agent4Debate\u5728\u80fd\u529b\u4e0a\u4e0e\u4eba\u7c7b\u76f8\u5f53\u3002\u8fdb\u4e00\u6b65\u7684\u6d88\u878d\u7814\u7a76\u8868\u660e\uff0c\u4ee3\u7406\u7ed3\u6784\u4e2d\u7684\u6bcf\u4e2a\u7ec4\u4ef6\u7684\u6709\u6548\u6027\u3002**|\n", "2408.04449": "|**2024-08-08**|**RiskAwareBench: Towards Evaluating Physical Risk Awareness for High-level Planning of LLM-based Embodied Agents**|Zihao Zhu 
et.al.|[2408.04449](http://arxiv.org/abs/2408.04449)|null|\u6458\u8981\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aRiskAwareBench\u7684\u81ea\u52a8\u5316\u6846\u67b6\uff0c\u65e8\u5728\u8bc4\u4f30\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u5b9e\u4f53\u5316\u4ee3\u7406\u5728\u7269\u7406\u98ce\u9669\u610f\u8bc6\u65b9\u9762\u7684\u80fd\u529b\u3002\u8be5\u6846\u67b6\u7531\u56db\u4e2a\u6a21\u5757\u7ec4\u6210\uff1a\u5b89\u5168\u63d0\u793a\u751f\u6210\u3001\u5371\u9669\u573a\u666f\u751f\u6210\u3001\u8ba1\u5212\u751f\u6210\u548c\u8bc4\u4f30\uff0c\u5b83\u5141\u8bb8\u8fdb\u884c\u5168\u9762\u7684\u98ce\u9669\u8bc4\u4f30\uff0c\u4e14\u6240\u9700\u7684\u4eba\u5de5\u5e72\u9884\u6700\u5c11\u3002\u901a\u8fc7\u4f7f\u7528\u8fd9\u4e2a\u6846\u67b6\uff0c\u6784\u5efa\u4e86\u4e00\u4e2a\u540d\u4e3aPhysicalRisk\u7684\u6570\u636e\u96c6\uff0c\u6db5\u76d6\u4e86\u5404\u79cd\u6d89\u53ca\u76f8\u5173\u5b89\u5168\u63d0\u793a\u3001\u89c2\u5bdf\u548c\u6307\u4ee4\u7684\u573a\u666f\u3002 \u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u5927\u591a\u6570LLM\u5728\u7269\u7406\u98ce\u9669\u610f\u8bc6\u65b9\u9762\u8868\u73b0\u4e0d\u8db3\uff0c\u5e76\u4e14\u57fa\u7840\u7684\u98ce\u9669\u7f13\u89e3\u7b56\u7565\u5e26\u6765\u7684\u63d0\u5347\u6709\u9650\u3002\u8fd9\u5f3a\u8c03\u4e86\u5728\u672a\u6765\u6539\u8fdb\u57fa\u4e8eLLM\u7684\u5b9e\u4f53\u5316\u4ee3\u7406\u7684\u7269\u7406\u98ce\u9669\u610f\u8bc6\u7684\u7d27\u8feb\u6027\u548c\u91cd\u8981\u6027\u3002|\n", "2408.05212": "|**2024-08-10**|**Preserving Privacy in Large Language Models: A Survey on Current Threats and Solutions**|Michele Miranda 
et.al.|[2408.05212](http://arxiv.org/abs/2408.05212)|**[link](https://github.com/michele17284/awesome-privacy-preserving-llms)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u4eba\u5de5\u667a\u80fd\u9886\u57df\u53d6\u5f97\u4e86\u91cd\u5927\u8fdb\u6b65\uff0c\u5e76\u5728\u591a\u4e2a\u9886\u57df\u627e\u5230\u4e86\u5e94\u7528\u3002\u7136\u800c\uff0c\u5b83\u4eec\u4f9d\u8d56\u4e8e\u5e9e\u5927\u7684\u4e92\u8054\u7f51\u6765\u6e90\u6570\u636e\u96c6\u8fdb\u884c\u8bad\u7ec3\uff0c\u8fd9\u5e26\u6765\u4e86\u663e\u8457\u7684\u9690\u79c1\u95ee\u9898\uff0c\u5c24\u5176\u662f\u5728\u5173\u952e\u9886\u57df\uff08\u5982\u533b\u7597\u4fdd\u5065\uff09\u7684\u60c5\u51b5\u4e0b\u4f1a\u52a0\u5267\u8fd9\u4e9b\u95ee\u9898\u3002\u6b64\u5916\uff0c\u5728\u7279\u5b9a\u5e94\u7528\u573a\u666f\u4e0b\uff0c\u53ef\u80fd\u9700\u8981\u5bf9\u8fd9\u4e9b\u6a21\u578b\u8fdb\u884c\u9488\u5bf9\u79c1\u6709\u6570\u636e\u7684\u5fae\u8c03\u3002\u672c\u6587\u5bf9LLM\u7684\u9690\u79c1\u5a01\u80c1\u8fdb\u884c\u4e86\u6279\u5224\u6027\u8bc4\u4f30\uff0c\u5f3a\u8c03\u4e86\u8fd9\u4e9b\u6a21\u578b\u53ef\u80fd\u8bb0\u4f4f\u5e76\u65e0\u610f\u95f4\u6cc4\u9732\u654f\u611f\u4fe1\u606f\u7684\u98ce\u9669\u3002 
\u6211\u4eec\u901a\u8fc7\u56de\u987e\u9488\u5bf9LLM\u7684\u9690\u79c1\u653b\u51fb\u6765\u63a2\u8ba8\u5f53\u524d\u7684\u5a01\u80c1\uff0c\u5e76\u63d0\u51fa\u5168\u9762\u7684\u89e3\u51b3\u65b9\u6848\uff0c\u4ee5\u5728\u6574\u4e2a\u5b66\u4e60\u7ba1\u9053\u4e2d\u6574\u5408\u9690\u79c1\u673a\u5236\u3002\u8fd9\u4e9b\u89e3\u51b3\u65b9\u6848\u6db5\u76d6\u4e86\u4ece\u533f\u540d\u5316\u8bad\u7ec3\u6570\u636e\u5230\u5728\u8bad\u7ec3\u6216\u63a8\u7406\u8fc7\u7a0b\u4e2d\u5b9e\u65bd\u5dee\u5206\u9690\u79c1\uff0c\u4ee5\u53ca\u5728\u8bad\u7ec3\u540e\u6267\u884c\u673a\u5668\u9057\u5fd8\u7684\u8303\u56f4\u3002\u6211\u4eec\u7684\u6587\u732e\u7efc\u8ff0\u6df1\u5165\u7814\u7a76\u4e86\u73b0\u6709\u7814\u7a76\u4e2d\u7684\u6301\u7eed\u6311\u6218\u3001\u53ef\u7528\u5de5\u5177\u548c\u672a\u6765\u65b9\u5411\uff0c\u4ee5\u4fdd\u62a4LLM\u4e2d\u7684\u9690\u79c1\u3002\u8fd9\u9879\u5de5\u4f5c\u65e8\u5728\u901a\u8fc7\u63d0\u4f9b\u5bf9\u9690\u79c1\u4fdd\u5b58\u65b9\u6cd5\u53ca\u5176\u5728\u51cf\u8f7b\u98ce\u9669\u65b9\u9762\u7684\u6709\u6548\u6027\u7684\u5168\u9762\u7406\u89e3\uff0c\u6307\u5bfc\u5f00\u53d1\u66f4\u5b89\u5168\u3001\u66f4\u53ef\u4fe1\u7684AI\u7cfb\u7edf\u3002|\n", "2408.05211": "|**2024-08-09**|**VITA: Towards Open-Source Interactive Omni Multimodal LLM**|Chaoyou Fu et.al.|[2408.05211](http://arxiv.org/abs/2408.05211)|**[link](https://github.com/VITA-MLLM/VITA)**|\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u5f15\u5165\u4e86VITA\uff0c\u8fd9\u662f\u9996\u4e2a\u5f00\u6e90\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff0c\u80fd\u591f\u540c\u65f6\u5904\u7406\u548c\u5206\u6790\u89c6\u9891\u3001\u56fe\u50cf\u3001\u6587\u672c\u548c\u97f3\u9891\u7b49\u591a\u5143\u6a21\u6001\u4fe1\u606f\uff0c\u5e76\u4e14\u5177\u5907\u9ad8\u7ea7\u7684\u591a\u6a21\u6001\u4ea4\u4e92\u4f53\u9a8c\u3002\u4eceMixtral 
8x7B\u4f5c\u4e3a\u8bed\u8a00\u57fa\u7840\u51fa\u53d1\uff0c\u6211\u4eec\u6269\u5c55\u4e86\u5176\u5728\u4e2d\u6587\u65b9\u9762\u7684\u8bcd\u6c47\uff0c\u5e76\u901a\u8fc7\u53cc\u8bed\u6307\u4ee4\u5fae\u8c03\u8fdb\u4e00\u6b65\u63d0\u5347\u4e86\u6a21\u578b\u80fd\u529b\u3002\u6211\u4eec\u8fd8\u901a\u8fc7\u4e24\u9636\u6bb5\u591a\u4efb\u52a1\u5b66\u4e60\u7684\u65b9\u5f0f\uff0c\u4e3a\u8bed\u8a00\u6a21\u578b\u8d4b\u4e88\u4e86\u89c6\u89c9\u548c\u97f3\u9891\u5904\u7406\u7684\u80fd\u529b\u3002 VITA\u5c55\u73b0\u4e86\u5f3a\u5927\u7684\u591a\u8bed\u8a00\u3001\u89c6\u89c9\u548c\u97f3\u9891\u7406\u89e3\u7684\u57fa\u7840\u80fd\u529b\uff0c\u5e76\u5728\u4e00\u7cfb\u5217\u5355\u6a21\u6001\u4e0e\u591a\u6a21\u6001\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u8868\u73b0\u51fa\u8272\u3002\u9664\u4e86\u57fa\u7840\u80fd\u529b\u5916\uff0c\u6211\u4eec\u5728\u63d0\u5347\u81ea\u7136\u591a\u6a21\u6001\u4eba\u673a\u4ea4\u4e92\u4f53\u9a8c\u65b9\u9762\u4e5f\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u5c55\u3002\u636e\u6211\u4eec\u6240\u77e5\uff0c\u8fd9\u662f\u9996\u6b21\u5728\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4e2d\u5229\u7528\u975e\u5524\u9192\u4ea4\u4e92\u548c\u97f3\u9891\u4e2d\u65ad\u529f\u80fd\u3002 VITA\u662f\u5f00\u6e90\u793e\u533a\u63a2\u7d22\u65e0\u7f1d\u878d\u5408\u591a\u6a21\u6001\u7406\u89e3\u548c\u4ea4\u4e92\u7684\u7b2c\u4e00\u6b65\u3002\u5c3d\u7ba1VITA\u4e0e\u4e13\u6709\u6a21\u578b\u8fd8\u6709\u8f83\u5927\u5dee\u8ddd\uff0c\u4f46\u6211\u4eec\u76f8\u4fe1\u5b83\u4f5c\u4e3a\u5148\u950b\u89d2\u8272\u53ef\u4ee5\u6210\u4e3a\u540e\u7eed\u7814\u7a76\u7684\u91cd\u8981\u57fa\u77f3\u3002\u9879\u76ee\u9875\u9762\uff1ahttps://vita-home.github.io|\n", "2408.05204": "|**2024-08-09**|**Evaluating the capability of large language models to personalize science texts for diverse middle-school-age learners**|Michael Vaccaro Jr 
et.al.|[2408.05204](http://arxiv.org/abs/2408.05204)|null|\u8fd1\u671f\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\uff0c\u5c24\u5176\u662fOpenAI\u7684GPT\u7cfb\u5217\uff0c\u5728\u591a\u4e2a\u9886\u57df\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\u3002\u8fd9\u4e9b\u6a21\u578b\u56e0\u5176\u5728\u4e0d\u540c\u5b66\u79d1\u9886\u57df\u7684\u4e13\u4e1a\u77e5\u8bc6\u4ee5\u53ca\u5bf9\u7528\u6237\u63d0\u793a\u7684\u5feb\u901f\u9002\u5e94\u6027\u800c\u53d7\u5230\u5173\u6ce8\uff0c\u5e76\u4e14\u5c55\u73b0\u51fa\u4f5c\u4e3a\u4e2a\u6027\u5316\u5b66\u4e60\uff08PL\uff09\u5de5\u5177\u7684\u72ec\u7279\u6f5c\u529b\u3002\u7136\u800c\uff0c\u5b83\u4eec\u5728K-12\u6559\u80b2\u4e2d\u7684\u5e94\u7528\u4ecd\u5904\u4e8e\u63a2\u7d22\u9636\u6bb5\u3002 \u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u9879\u9996\u6b21\u91c7\u7528\u968f\u673a\u5bf9\u7167\u8bd5\u9a8c\u65b9\u6cd5\uff08\u6837\u672c\u91cf\u4e3a23\uff09\u6765\u8bc4\u4f30GPT-4\u5728\u4e2d\u5b66\u79d1\u5b66\u6587\u672c\u4e2a\u6027\u5316\u65b9\u9762\u7684\u6709\u6548\u6027\u7684\u7814\u7a76\u3002\u5728\u8be5\u7814\u7a76\u4e2d\uff0cGPT-4\u7528\u4e8e\u6839\u636e\u5b66\u751f\u5728\u8bad\u7ec3\u9636\u6bb5\u505a\u51fa\u7684\u9009\u62e9\u6765\u5206\u6790\u548c\u9884\u6d4b\u4ed6\u4eec\u7684\u5b66\u4e60\u504f\u597d\u3002\u5bf9\u4e8e\u5b9e\u9a8c\u7ec4\u7684\u5b66\u751f\uff0cGPT-4\u88ab\u7528\u6765\u4fee\u6539\u79d1\u5b66\u6587\u672c\u4ee5\u4e0e\u5b66\u751f\u7684\u9884\u6d4b\u504f\u597d\u76f8\u5339\u914d\uff1b\u800c\u5bf9\u4e8e\u63a7\u5236\u7ec4\u7684\u5b66\u751f\uff0c\u6587\u672c\u5219\u88ab\u4fee\u6539\u4e3a\u4e0e\u5176\u5b66\u4e60\u504f\u597d\u76f8\u53cd\u3002\u901a\u8fc7\u66fc-\u60e0\u7279\u5c3cU\u68c0\u9a8c\uff0c\u7814\u7a76\u53d1\u73b0\uff0c\u5f53\u6587\u672c\u4e0e\u5b66\u751f\u504f\u597d\u5339\u914d\u65f6\uff0c\u5b66\u751f\u660e\u663e\u66f4\u503e\u5411\u4e8e\u63a5\u53d7\uff08\u57280.10\u6c34\u5e73\u4e0a\u5177\u6709\u7edf\u8ba1\u5b66\u610f\u4e49\uff0cp=0.059\uff09\u3002\u8fd9\u4e9b\u7ed3\u679c\u8868\u660e\uff0cGPT-4\u80fd\u591f\u6709\u6548\u5
730\u7406\u89e3\u548c\u5b9a\u5236\u6559\u80b2\u5185\u5bb9\u4ee5\u6ee1\u8db3\u4e0d\u540c\u5b66\u4e60\u8005\u7684\u504f\u597d\uff0c\u6807\u5fd7\u7740\u4e2a\u6027\u5316\u5b66\u4e60\u6280\u672f\u9886\u57df\u7684\u4e00\u4e2a\u91cd\u8981\u8fdb\u5c55\u3002 \u6b64\u5916\uff0c\u6587\u7ae0\u8fd8\u8ba8\u8bba\u4e86\u8fd9\u9879\u7814\u7a76\u7684\u5c40\u9650\u6027\u548c\u5728\u6559\u80b2\u4e2d\u4f7f\u7528\u4eba\u5de5\u667a\u80fd\u7684\u4f26\u7406\u8003\u8651\u3002|\n", "2408.05200": "|**2024-08-09**|**TaSL: Task Skill Localization and Consolidation for Language Model Continual Learning**|Yujie Feng et.al.|[2408.05200](http://arxiv.org/abs/2408.05200)|**[link](https://github.com/WoodScene/TaSL)**|\u8bed\u8a00\u6a21\u578b\u8fde\u7eed\u5b66\u4e60\uff08CL\uff09\u6700\u8fd1\u5f15\u8d77\u4e86\u5e7f\u6cdb\u5173\u6ce8\uff0c\u56e0\u4e3a\u5b83\u6709\u53ef\u80fd\u5728\u65e0\u9700\u91cd\u65b0\u8bad\u7ec3\u7684\u60c5\u51b5\u4e0b\uff0c\u9002\u5e94\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u52a8\u6001\u73b0\u5b9e\u73af\u5883\u3002\u4e00\u4e2a\u5173\u952e\u6311\u6218\u662f\u707e\u96be\u6027\u9057\u5fd8\uff0c\u5373\u6a21\u578b\u5728\u5b66\u4e60\u65b0\u4efb\u52a1\u65f6\u4f1a\u5931\u53bb\u5148\u524d\u83b7\u5f97\u7684\u77e5\u8bc6\u3002\u73b0\u6709\u65b9\u6cd5\u901a\u5e38\u4f7f\u7528\u591a\u4e2a\u53c2\u6570\u6548\u7387\u5fae\u8c03\uff08PEFT\uff09\u5757\u6765\u4e3a\u6bcf\u4e2a\u4efb\u52a1\u83b7\u53d6\u7279\u5b9a\u4e8e\u4efb\u52a1\u7684\u77e5\u8bc6\uff0c\u4f46\u8fd9\u4e9b\u65b9\u6cd5\u7f3a\u4e4f\u6548\u7387\uff0c\u5e76\u4e14\u5ffd\u89c6\u4e86\u901a\u8fc7\u4efb\u52a1\u4ea4\u4e92\u8fdb\u884c\u77e5\u8bc6\u4f20\u9012\u7684\u53ef\u80fd\u6027\u3002 
\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u4efb\u52a1\u6280\u80fd\u5b9a\u4f4d\u4e0e\u6574\u5408\uff08TaSL\uff09\u7684\u65b0CL\u6846\u67b6\uff0c\u5b83\u901a\u8fc7\u4e0d\u4f9d\u8d56\u4e8e\u8bb0\u5fc6\u91cd\u64ad\u6765\u589e\u5f3a\u77e5\u8bc6\u4f20\u9012\u3002TaSL\u9996\u5148\u6839\u636e\u53c2\u6570\u4f9d\u8d56\u6027\u5c06\u6a21\u578b\u5206\u4e3a\u201c\u6280\u80fd\u5355\u5143\u201d\uff0c\u8fd9\u4f7f\u5f97\u5bf9\u6280\u80fd\u5355\u5143\u7684\u63a7\u5236\u66f4\u52a0\u7cbe\u7ec6\u3002\u7136\u540e\uff0c\u5b83\u91c7\u7528\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u7ec4\u7ea7\u6280\u80fd\u5b9a\u4f4d\u6280\u672f\uff0c\u4ee5\u8bc6\u522b\u65b0\u4efb\u52a1\u4e2d\u6280\u80fd\u5355\u5143\u7684\u91cd\u8981\u6027\u5206\u5e03\u3002\u901a\u8fc7\u6bd4\u8f83\u8fd9\u4e2a\u91cd\u8981\u6027\u5206\u5e03\u4e0e\u5176\u4ed6\u5148\u524d\u4efb\u52a1\u4e2d\u7684\u5206\u5e03\uff0c\u6211\u4eec\u5b9e\u65bd\u4e86\u4e00\u4e2a\u7cbe\u7ec6\u7684\u6280\u80fd\u6574\u5408\u7b56\u7565\uff0c\u4fdd\u7559\u4e86\u7279\u5b9a\u4e8e\u4efb\u52a1\u7684\u77e5\u8bc6\uff0c\u4ece\u800c\u9632\u6b62\u9057\u5fd8\uff0c\u5e76\u66f4\u65b0\u4e86\u5171\u4eab\u4efb\u52a1\u77e5\u8bc6\uff0c\u8fd9\u4fc3\u8fdb\u4e86\u53cc\u5411\u77e5\u8bc6\u4f20\u9012\u3002\u56e0\u6b64\uff0cTaSL\u5b9e\u73b0\u4e86\u4fdd\u6301\u5148\u524d\u77e5\u8bc6\u548c\u5728\u65b0\u4efb\u52a1\u4e0a\u53d6\u5f97\u4f18\u5f02\u8868\u73b0\u4e4b\u95f4\u7684\u6700\u4f73\u5e73\u8861\u3002 
TaSL\u4e5f\u5c55\u793a\u4e86\u5f3a\u5927\u7684\u6cdb\u5316\u80fd\u529b\uff0c\u9002\u7528\u4e8e\u901a\u7528\u6a21\u578b\uff0c\u5e76\u53ef\u4ee5\u6839\u636eLoRA\u7b49PEFT\u65b9\u6cd5\u8fdb\u884c\u5b9a\u5236\u3002\u6b64\u5916\uff0c\u5b83\u8fd8\u8868\u73b0\u51fa\u663e\u8457\u7684\u6269\u5c55\u6027\uff0c\u5141\u8bb8\u4e0e\u8bb0\u5fc6\u91cd\u64ad\u96c6\u6210\u4ee5\u8fdb\u4e00\u6b65\u63d0\u9ad8\u6027\u80fd\u3002\u5728\u4e24\u4e2aCL\u57fa\u51c6\u6d4b\u8bd5\u4e2d\uff0c\u4f7f\u7528\u4e0d\u540c\u89c4\u6a21\u7684\u6a21\u578b\uff08\u4ece2.2\u4ebf\u523070\u4ebf\u53c2\u6570\uff09\uff0c\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u8bc1\u660e\u4e86TaSL\u53ca\u5176\u53d8\u4f53\u5728\u4e0d\u540c\u8bbe\u7f6e\u4e0b\u7684\u6709\u6548\u6027\u3002|\n", "2408.05149": "|**2024-08-09**|**AttackER: Towards Enhancing Cyber-Attack Attribution with a Named Entity Recognition Dataset**|Pritam Deka et.al.|[2408.05149](http://arxiv.org/abs/2408.05149)|null|\u5728\u7f51\u7edc\u5b89\u5168\u9886\u57df\uff0c\u653b\u51fb\u5f52\u56e0\u662f\u81f3\u5173\u91cd\u8981\u7684\u8fc7\u7a0b\uff0c\u5b83\u5141\u8bb8\u4e13\u5bb6\u5236\u5b9a\u9488\u5bf9\u653b\u51fb\u8005\u7684\u9632\u5fa1\u63aa\u65bd\u548c\u6cd5\u5f8b\u884c\u52a8\u3002\u76ee\u524d\uff0c\u5206\u6790\u4eba\u5458\u4e3b\u8981\u901a\u8fc7\u624b\u52a8\u64cd\u4f5c\u6765\u8fdb\u884c\u5f52\u56e0\uff0c\u8fd9\u4e3b\u8981\u662f\u7531\u4e8e\u4efb\u52a1\u7684\u590d\u6742\u6027\u3002\u4eba\u5de5\u667a\u80fd\uff0c\u5c24\u5176\u662f\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u6280\u672f\u53ef\u4ee5\u88ab\u7528\u6765\u8f85\u52a9\u7f51\u7edc\u5b89\u5168\u5206\u6790\u5e08\u5728\u5f52\u56e0\u8fc7\u7a0b\u4e2d\u8fdb\u884c\u5de5\u4f5c\u3002\u5c3d\u7ba1\u8fd9\u4e9b\u6280\u672f\u975e\u5e38\u5f3a\u5927\uff0c\u4f46\u5728\u7f3a\u4e4f\u653b\u51fb\u5f52\u56e0\u9886\u57df\u7684\u6570\u636e\u96c6\u7684\u60c5\u51b5\u4e0b\uff0c\u5b83\u4eec\u9700\u8981\u5e94\u5bf9\u6311\u6218\u3002\u5728\u672c\u6587\u4e2d\uff0c\u6211\u4eec\u5c06\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u5e76\u63d0\u4f9b\u5230
\u76ee\u524d\u4e3a\u6b62\u6211\u4eec\u6240\u77e5\u7684\u7b2c\u4e00\u4e2a\u653b\u51fb\u5f52\u56e0\u6570\u636e\u96c6\u3002\u6211\u4eec\u7684\u6570\u636e\u96c6\u8bbe\u8ba1\u7684\u4e3b\u8981\u76ee\u6807\u662f\u4ece\u7f51\u7edc\u5b89\u5168\u6587\u672c\u4e2d\u63d0\u53d6\u653b\u51fb\u5f52\u56e0\u4fe1\u606f\uff0c\u5229\u7528NLP\u9886\u57df\u7684\u547d\u540d\u5b9e\u4f53\u8bc6\u522b\uff08NER\uff09\u65b9\u6cd5\u3002\u4e0e\u5176\u5b83\u7f51\u7edc\u5b89\u5168NER\u6570\u636e\u96c6\u4e0d\u540c\uff0c\u6211\u4eec\u7684\u6570\u636e\u96c6\u63d0\u4f9b\u4e86\u4e30\u5bcc\u4e14\u5305\u542b\u4e0a\u4e0b\u6587\u7ec6\u8282\u7684\u6ce8\u91ca\uff0c\u5305\u62ec\u4e00\u4e9b\u8de8\u77ed\u8bed\u548c\u53e5\u5b50\u7684\u6ce8\u91ca\u3002\u6211\u4eec\u8fdb\u884c\u4e86\u5927\u91cf\u5b9e\u9a8c\uff0c\u5e76\u5e94\u7528\u4e86NLP\u6280\u672f\u6765\u5c55\u793a\u6570\u636e\u96c6\u5728\u653b\u51fb\u5f52\u56e0\u65b9\u9762\u7684\u6709\u6548\u6027\u3002\u8fd9\u4e9b\u5b9e\u9a8c\u7a81\u663e\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u80fd\u529b\u5728\u6539\u8fdb\u7f51\u7edc\u5b89\u5168\u6570\u636e\u96c6\u4e2d\u7684NER\u4efb\u52a1\u4ee5\u63d0\u5347\u653b\u51fb\u5f52\u56e0\u80fd\u529b\u7684\u6f5c\u529b\u3002|\n", "2408.05141": "|**2024-08-09**|**A Hybrid RAG System with Comprehensive Enhancement on Complex Reasoning**|Ye Yuan 
et.al.|[2408.05141](http://arxiv.org/abs/2408.05141)|null|\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u7efc\u5408\u4f18\u5316\u7684\u589e\u5f3a\u68c0\u7d22\u8f85\u52a9\u751f\u6210\uff08RAG\uff09\u7cfb\u7edf\uff0c\u65e8\u5728\u901a\u8fc7\u96c6\u6210\u5916\u90e8\u77e5\u8bc6\u5e93\u663e\u8457\u63d0\u9ad8\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u51c6\u786e\u6027\u548c\u964d\u4f4e\u5e7b\u89c9\u73b0\u8c61\u3002\u6211\u4eec\u7684\u7cfb\u7edf\u8fdb\u884c\u4e86\u591a\u9879\u6539\u8fdb\uff0c\u5305\u62ec\u5bf9\u7f51\u9875\u4e2d\u7684\u6587\u672c\u6bb5\u843d\u548c\u8868\u683c\u8fdb\u884c\u7ec6\u5316\u5904\u7406\u3001\u5f15\u5165\u5c5e\u6027\u9884\u6d4b\u5668\u4ee5\u51cf\u5c11\u5e7b\u89c9\u3001\u6784\u5efaLLM\u77e5\u8bc6\u62bd\u53d6\u5668\u548c\u77e5\u8bc6\u56fe\u8c31\u62bd\u53d6\u5668\uff0c\u5e76\u6700\u7ec8\u5efa\u7acb\u4e86\u4e00\u4e2a\u6574\u5408\u6240\u6709\u53c2\u8003\u4fe1\u606f\u7684\u63a8\u7406\u7b56\u7565\u3002\u6211\u4eec\u901a\u8fc7Meta CRAG KDD\u676f2024\u7ade\u8d5b\u4e2d\u7684CRAG\u6570\u636e\u96c6\u5bf9\u7cfb\u7edf\u8fdb\u884c\u4e86\u8bc4\u4f30\u3002\u672c\u5730\u4e0e\u5728\u7ebf\u8bc4\u4f30\u5747\u8868\u660e\uff0c\u6211\u4eec\u7684\u7cfb\u7edf\u5728\u590d\u6742\u63a8\u7406\u80fd\u529b\u4e0a\u5b9e\u73b0\u4e86\u663e\u8457\u63d0\u5347\u3002\u5728\u672c\u5730\u8bc4\u4f30\u4e2d\uff0c\u76f8\u8f83\u4e8e\u57fa\u7ebf\u6a21\u578b\uff0c\u6211\u4eec\u7684\u7cfb\u7edf\u5728\u51c6\u786e\u6027\u65b9\u9762\u6709\u663e\u8457\u63d0\u5347\uff0c\u9519\u8bef\u7387\u4e5f\u6709\u6240\u4e0b\u964d\uff0c\u53d6\u5f97\u4e86\u8f83\u9ad8\u7684\u5206\u6570\u3002\u540c\u65f6\uff0c\u5728\u7ebf\u8bc4\u4f30\u7ed3\u679c\u540c\u6837\u8868\u73b0\u4f18\u5f02\uff0c\u8bc1\u660e\u4e86\u6240\u63d0\u51fa\u7cfb\u7edf\u7684\u6027\u80fd\u548c\u6cdb\u5316\u80fd\u529b\u3002\u8be5\u7cfb\u7edf\u7684\u6e90\u4ee3\u7801\u5df2\u53d1\u5e03\u4e8e\\url{https://gitlab.aicrowd.com/shizueyy/crag-new}\u3002|\n", "2408.05128": "|**2024-08-09**|**Is ChatGPT a Good 
Software Librarian? An Exploratory Study on the Use of ChatGPT for Software Library Recommendations**|Jasmine Latendresse et.al.|[2408.05128](http://arxiv.org/abs/2408.05128)|null|\u5728\u8f6f\u4ef6\u7cfb\u7edf\u529f\u80fd\u3001\u6548\u7387\u4e0e\u53ef\u7ef4\u62a4\u6027\u65b9\u9762\uff0c\u8f6f\u4ef6\u5e93\u626e\u6f14\u7740\u81f3\u5173\u91cd\u8981\u7684\u89d2\u8272\u3002\u968f\u7740\u5f00\u53d1\u8005\u8d8a\u6765\u8d8a\u591a\u5730\u4f9d\u8d56\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4ee5\u7b80\u5316\u7f16\u7801\u6d41\u7a0b\uff0c\u8fd9\u4e9b\u6a21\u578b\u63a8\u8350\u5408\u9002\u5e93\u7684\u6709\u6548\u6027\u4ecd\u5904\u4e8e\u63a2\u7d22\u9636\u6bb5\u3002\u672c\u6587\u8bc4\u4f30\u4e86ChatGPT\u4f5c\u4e3a\u8f6f\u4ef6\u56fe\u4e66\u9986\u5458\u7684\u6709\u6548\u6027\uff0c\u5e76\u8bc6\u522b\u4e86\u6539\u8fdb\u7a7a\u95f4\u3002\u6211\u4eec\u901a\u8fc7\u4f7f\u7528GPT-3.5 Turbo\u751f\u6210\u9488\u5bf910000\u4e2aStack Overflow\u95ee\u9898\u7684Python\u4ee3\u7801\uff0c\u8fdb\u884c\u4e86\u4e00\u9879\u5b9e\u8bc1\u7814\u7a76\u3002\u6211\u4eec\u7684\u53d1\u73b0\u8868\u660e\uff0cChatGPT\u6bd4\u4eba\u7c7b\u5f00\u53d1\u8005\u66f4\u9891\u7e41\u5730\u4f7f\u7528\u7b2c\u4e09\u65b9\u5e93\uff0c\u503e\u5411\u4e8e\u5e7f\u6cdb\u91c7\u7528\u4e14\u5386\u53f2\u60a0\u4e45\u7684\u9009\u62e9\u3002\u7136\u800c\uff0c14.2%\u63a8\u8350\u7684\u5e93\u5177\u6709\u9650\u5236\u6027\u7684Copyleft\u8bb8\u53ef\uff0c\u8fd9\u5e76\u672a\u7531ChatGPT\u660e\u786e\u4f20\u8fbe\u3002\u6b64\u5916\uff0c\u67096.5%\u7684\u5e93\u65e0\u6cd5\u76f4\u63a5\u4f7f\u7528\uff0c\u53ef\u80fd\u5bfc\u81f4\u5f00\u53d1\u8005\u56f0\u60d1\u548c\u6d6a\u8d39\u65f6\u95f4\u3002\u5c3d\u7ba1ChatGPT\u53ef\u4ee5\u4f5c\u4e3a\u6709\u6548\u7684\u8f6f\u4ef6\u56fe\u4e66\u9986\u5458\uff0c\u4f46\u5e94\u63d0\u4f9b\u5173\u4e8e\u7ef4\u62a4\u6307\u6807\u548c\u8bb8\u53ef\u7684\u66f4\u591a\u660e\u786e\u4fe1\u606f\u3002\u6211\u4eec\u5efa\u8bae\u5f00\u53d1\u8005\u5b9e\u65bd\u4e25\u683c\u7684\u4f9d\u8d56\u7ba1\u7406\u5b9e\u8df5\uff0c\u5e76\u5728\u5c06LLM\u
751f\u6210\u7684\u4ee3\u7801\u96c6\u6210\u5230\u9879\u76ee\u4e2d\u4e4b\u524d\uff0c\u4ed4\u7ec6\u68c0\u67e5\u5e93\u7684\u8bb8\u53ef\u8bc1\u3002|\n", "2408.05126": "|**2024-08-09**|**Large Language Models and Thematic Analysis: Human-AI Synergy in Researching Hate Speech on Social Media**|Petre Breazu et.al.|[2408.05126](http://arxiv.org/abs/2408.05126)|null|\u5728\u4eba\u5de5\u667a\u80fd\u7684\u5feb\u901f\u6f14\u8fdb\u9886\u57df\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u6587\u672c\u5206\u6790\u4e2d\u7684\u53d1\u5c55\u4e0e\u5e94\u7528\u5f15\u8d77\u4e86\u5b66\u672f\u754c\u7684\u5e7f\u6cdb\u5173\u6ce8\u3002\u5c3d\u7ba1\u5404\u79cdLLMs\u5728\u8fdb\u884c\u5b9a\u6027\u5206\u6790\u65f6\u5c55\u73b0\u51fa\u7684\u6f5c\u529b\u88ab\u5bc4\u4e88\u539a\u671b\uff0c\u4f46\u5b83\u4eec\u5728\u4eba\u6587\u5b66\u79d1\u548c\u793e\u4f1a\u79d1\u5b66\u4e2d\u7684\u5e94\u7528\u5e76\u672a\u5f97\u5230\u5145\u5206\u63a2\u8ba8\u3002\u672c\u6587\u901a\u8fc7\u4e00\u9879\u4ee5GPT-4\u4e3a\u6838\u5fc3\u7684\u7814\u7a76\u5b9e\u9a8c\uff0c\u4e3aLLMs\u5728\u5b9a\u6027\u5206\u6790\u9886\u57df\u7684\u5e94\u7528\u63d0\u4f9b\u4e86\u65b0\u7684\u89c6\u89d2\u3002\u7814\u7a76\u57fa\u4e8e\u4e00\u4e2a\u6765\u81ea\u6b27\u76df\u8d44\u52a9\u9879\u76ee\u7684YouTube\u6570\u636e\u96c6\uff0c\u8be5\u6570\u636e\u96c6\u805a\u7126\u4e8e2016\u5e74\u745e\u5178\u7f57\u9a6c\u5c3c\u4e9a\u79fb\u6c11\u7fa4\u4f53\u7684\u4ee3\u8868\u5f62\u8c61\uff0c\u8fd9\u4e00\u65f6\u671f\u6b63\u503c2015\u5e74\u96be\u6c11\u5371\u673a\u4e4b\u540e\uff0c\u7d27\u90bb2017\u5e74\u7684\u745e\u5178\u5168\u56fd\u9009\u4e3e\u3002\u6211\u4eec\u7684\u7814\u7a76\u65e8\u5728\u63a2\u7d22\u5c06\u4eba\u7c7b\u667a\u6167\u4e0eAI\u7684\u89c4\u6a21\u548c\u6548\u7387\u76f8\u7ed3\u5408\u7684\u53ef\u80fd\u6027\uff0c\u901a\u8fc7\u5206\u6790LLMs\u5728\u4eba\u6587\u5b66\u79d1\u548c\u793e\u4f1a\u79d1\u5b66\u9886\u57df\u7684\u5e94\u7528\u4f18\u52a3\uff0c\u5e76\u8ba8\u8bba\u672a\u6765\u53ef\u80fd\u7684\u53d1\u5c55\u65b9\u5411\u3002|\n", "2408.05123": 
"|**2024-08-09**|**Sportify: Question Answering with Embedded Visualizations and Personified Narratives for Sports Video**|Chunggi Lee et.al.|[2408.05123](http://arxiv.org/abs/2408.05123)|null|\u968f\u7740\u7bee\u7403\u8fd0\u52a8\u7684\u666e\u53ca\uff0c\u7c89\u4e1d\u4eec\u5e38\u5e38\u56e0\u6bd4\u8d5b\u8282\u594f\u5feb\u548c\u590d\u6742\u5ea6\u9ad8\u800c\u611f\u5230\u56f0\u60d1\u3002\u7bee\u7403\u6218\u672f\u6d89\u53ca\u4e00\u7cfb\u5217\u590d\u6742\u7684\u52a8\u4f5c\uff0c\u9700\u8981\u5927\u91cf\u7684\u77e5\u8bc6\u624d\u80fd\u5b8c\u5168\u7406\u89e3\u3002\u8fd9\u79cd\u590d\u6742\u6027\u5bfc\u81f4\u4e86\u5bf9\u989d\u5916\u4fe1\u606f\u548c\u89e3\u91ca\u7684\u9700\u6c42\uff0c\u8fd9\u53ef\u80fd\u4f1a\u5206\u6563\u7c89\u4e1d\u4eec\u5bf9\u6bd4\u8d5b\u7684\u5173\u6ce8\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aSportify\u7684\u89c6\u89c9\u95ee\u7b54\u7cfb\u7edf\uff0c\u5b83\u878d\u5408\u4e86\u53d9\u4e8b\u548c\u5d4c\u5165\u5f0f\u53ef\u89c6\u5316\uff0c\u65e8\u5728\u4e3a\u7403\u8ff7\u63d0\u4f9b\u7bee\u7403\u6218\u672f\u7591\u95ee\u7684\u6e05\u6670\u89e3\u7b54\uff0c\u5e2e\u52a9\u4ed6\u4eec\u7406\u89e3\u6bd4\u8d5b\u7684\u5404\u79cd\u65b9\u9762\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e09\u79cd\u65b0\u578b\u7684\u52a8\u4f5c\u53ef\u89c6\u5316\uff08\u4f20\u7403\u3001\u5207\u5165\u548c\u63a9\u62a4\uff09\uff0c\u4ee5\u5c55\u793a\u5173\u952e\u52a8\u4f5c\u5e8f\u5217\u3002\u4e3a\u4e86\u89e3\u91ca\u7403\u5458\u884c\u52a8\u80cc\u540e\u7684\u539f\u56e0\u548c\u903b\u8f91\uff0c\u6211\u4eec\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u751f\u6210\u53d9\u4e8b\u6587\u672c\u3002\u6211\u4eec\u91c7\u7528\u6545\u4e8b\u8bb2\u8ff0\u7684\u65b9\u6cd5\u6765\u63cf\u8ff0\u590d\u6742\u573a\u666f\uff0c\u4ece\u7b2c\u4e00\u4eba\u79f0\u548c\u7b2c\u4e09\u4eba\u79f0\u7684\u89d2\u5ea6\u8fdb\u884c\u53d9\u8ff0\uff0c\u5e76\u878d\u5165\u52a8\u4f5c\u53ef\u89c6\u5316\u3002\u6211\u4eec\u901a\u8fc7\u4e0e\u7bee\u7403\u7c89\u4e1d\u7684\u8bc4\u4f30\u
ff0c\u63a2\u8ba8\u4e86Sportify\u5728\u6df1\u5316\u6218\u672f\u6d1e\u5bdf\u529b\u548c\u589e\u5f3a\u89c2\u8d5b\u4f53\u9a8c\u65b9\u9762\u7684\u6548\u679c\u3002\u6b64\u5916\uff0c\u7b2c\u4e09\u4eba\u79f0\u53d9\u8ff0\u6709\u52a9\u4e8e\u4eba\u4eec\u83b7\u5f97\u6df1\u5165\u7684\u6bd4\u8d5b\u89e3\u91ca\uff0c\u800c\u7b2c\u4e00\u4eba\u79f0\u53d9\u8ff0\u5219\u589e\u5f3a\u4e86\u7c89\u4e1d\u4eec\u5bf9\u6bd4\u8d5b\u7684\u53c2\u4e0e\u611f\u3002|\n", "2408.05109": "|**2024-08-09**|**A Survey of NL2SQL with Large Language Models: Where are we, and where are we going?**|Xinyu Liu et.al.|[2408.05109](http://arxiv.org/abs/2408.05109)|**[link](https://github.com/hkustdial/nl2sql_handbook)**|\u81ea\u7136\u8bed\u8a00\u67e5\u8be2\u5230SQL\u67e5\u8be2\uff08\u5373NL2SQL\uff09\u7684\u7ffb\u8bd1\u53ef\u4ee5\u663e\u8457\u964d\u4f4e\u8bbf\u95ee\u5173\u7cfb\u6570\u636e\u5e93\u7684\u969c\u788d\uff0c\u5e76\u652f\u6301\u5404\u79cd\u5546\u4e1a\u5e94\u7528\u3002\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u51fa\u73b0\uff0cNL2SQL\u7684\u6027\u80fd\u5f97\u5230\u4e86\u5927\u5e45\u63d0\u5347\u3002\u672c\u6587\u63d0\u4f9b\u4e86\u4e00\u4e2a\u5168\u9762\u7684NL2SQL\u6280\u672f\u7efc\u8ff0\uff0c\u57fa\u4e8eLLMs\u9a71\u52a8\uff0c\u8986\u76d6\u4e86\u4ece\u56db\u4e2a\u65b9\u9762\u5bf9\u6574\u4e2a\u751f\u547d\u5468\u671f\u7684\u5168\u9762\u5ba1\u67e5\uff1a\uff081\uff09\u6a21\u578b\uff1a\u5904\u7406\u81ea\u7136\u8bed\u8a00\u7684\u6a21\u7cca\u6027\u548c\u4e0d\u5145\u5206\u6027\uff0c\u5e76\u6b63\u786e\u6620\u5c04\u81ea\u7136\u8bed\u8a00\u4e0e\u6570\u636e\u5e93\u6a21\u5f0f\u548c\u5b9e\u4f8b\uff1b\uff082\uff09\u6570\u636e\uff1a\u4ece\u6536\u96c6\u8bad\u7ec3\u6570\u636e\u3001\u5e94\u5bf9\u8bad\u7ec3\u6570\u636e\u7a00\u7f3a\u7684\u6570\u636e\u5408\u6210\uff0c\u5230NL2SQL\u57fa\u51c6\uff1b\uff083\uff09\u8bc4\u4f30\uff1a\u4ece\u591a\u4e2a\u89d2\u5ea6\u4f7f\u7528\u4e0d\u540c\u6307\u6807\u5bf9NL2SQL\u65b9\u6cd5\u8fdb\u884c\u8bc4\u4f30\uff1b\uff084\uff09\u9519\u8bef\u52
06\u6790\uff1a\u5206\u6790NL2SQL\u9519\u8bef\u4ee5\u627e\u5230\u6839\u672c\u539f\u56e0\uff0c\u5e76\u6307\u5bfcNL2SQL\u6a21\u578b\u53d1\u5c55\u3002\u6b64\u5916\uff0c\u6211\u4eec\u63d0\u4f9b\u4e86\u5f00\u53d1NL2SQL\u89e3\u51b3\u65b9\u6848\u7684\u4e00\u6761\u7ecf\u9a8c\u6cd5\u5219\u3002\u6700\u540e\uff0c\u8ba8\u8bba\u4e86\u5728LLMs\u65f6\u4ee3NL2SQL\u7684\u7814\u7a76\u6311\u6218\u548c\u5f00\u653e\u95ee\u9898\u3002|\n", "2408.06332": "|**2024-08-12**|**Animate, or Inanimate, That is the Question for Large Language Models**|Leonardo Ranaldi et.al.|[2408.06332](http://arxiv.org/abs/2408.06332)|null|\u4eba\u7c7b\u7684\u8ba4\u77e5\u6838\u5fc3\u4e0e\u201c\u6709\u751f\u547d\u6027\u201d\u8fd9\u4e00\u6982\u5ff5\u7d27\u5bc6\u76f8\u8fde\uff0c\u5b83\u5728\u5851\u9020\u8bb0\u5fc6\u3001\u89c6\u89c9\u4ee5\u53ca\u591a\u5c42\u6b21\u8bed\u8a00\u7406\u89e3\u65b9\u9762\u53d1\u6325\u7740\u5173\u952e\u4f5c\u7528\u3002\u867d\u7136\u201c\u6709\u751f\u547d\u6027\u201d\u5728\u8bed\u8a00\u4e2d\u901a\u8fc7\u52a8\u8bcd\u548c\u5f62\u5bb9\u8bcd\u7684\u7ec6\u5fae\u7ea6\u675f\u4f53\u73b0\u51fa\u6765\uff0c\u4f46\u5176\u5b66\u4e60\u548c\u7cbe\u70bc\u8fc7\u7a0b\u4e5f\u4f9d\u8d56\u4e8e\u975e\u8bed\u8a00\u4fe1\u606f\u3002\u540c\u6837\u5730\uff0c\u6211\u4eec\u5047\u8bbe\u5927\u6a21\u578b\u5728\u5904\u7406\u201c\u6709\u751f\u547d\u6027\u201d\u65f6\u80fd\u529b\u6709\u9650\u7684\u539f\u56e0\u662f\u5b83\u4eec\u4ec5\u4ee5\u6587\u672c\u6570\u636e\u8fdb\u884c\u8bad\u7ec3\u3002\u56e0\u6b64\uff0c\u8fd9\u7bc7\u8bba\u6587\u65e8\u5728\u63a2\u8ba8\u7684\u95ee\u9898\u662f\uff1a\u5927\u6a21\u578b\u662f\u5426\u80fd\u591f\u4ee5\u7c7b\u4f3c\u4e8e\u4eba\u7c7b\u7684\u65b9\u5f0f\u5904\u7406\u201c\u6709\u751f\u547d\u6027\u201d\uff1f\u6211\u4eec\u901a\u8fc7\u63d0\u793a\u65b9\u6cd5\u8fdb\u884c\u4e86\u7cfb\u7edf\u5206\u6790\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u901a\u8fc7\u
63d0\u793a\u5927\u6a21\u578b\u5728\u4e0d\u540c\u7684\u6709\u751f\u547d\u3001\u65e0\u751f\u547d\u3001\u5e38\u89c1\u548c\u5f02\u5e38\u60c5\u5883\u4e0b\u8fdb\u884c\u64cd\u4f5c\u3002\u7ed3\u679c\u663e\u793a\uff0c\u5c3d\u7ba1\u5927\u6a21\u578b\u4e3b\u8981\u57fa\u4e8e\u6587\u672c\u6570\u636e\u8fdb\u884c\u8bad\u7ec3\uff0c\u4f46\u5728\u9762\u5bf9\u5178\u578b\u7684\u6709\u751f\u547d\u4f53\u548c\u65e0\u751f\u547d\u4f53\u65f6\uff0c\u5b83\u4eec\u5c55\u73b0\u51fa\u4e0e\u5148\u524d\u7814\u7a76\u4e00\u81f4\u7684\u4eba\u7c7b\u884c\u4e3a\u6a21\u5f0f\u3002\u56e0\u6b64\uff0c\u5927\u6a21\u578b\u80fd\u591f\u9002\u5e94\u7406\u89e3\u975e\u5178\u578b\u60c5\u51b5\uff0c\u901a\u8fc7\u8bc6\u522b\u5f02\u5e38\u60c5\u51b5\u4e3a\u6709\u751f\u547d\u4f53\uff0c\u800c\u65e0\u9700\u4f9d\u8d56\u4eba\u7c7b\u4f9d\u8d56\u7684\u672a\u8a00\u660e\u7684\u8ba4\u77e5\u89e6\u53d1\u673a\u5236\u6765\u5206\u89e3\u52a8\u753b\u3002|\n", "2408.06318": "|**2024-08-12**|**Can We Rely on LLM Agents to Draft Long-Horizon Plans? Let's Take TravelPlanner as an Example**|Yanan Chen 
et.al.|[2408.06318](http://arxiv.org/abs/2408.06318)|null|\u672c\u6587\u65e8\u5728\u586b\u8865\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u81ea\u4e3b\u4ee3\u7406\u4e0e\u4eba\u5de5\u901a\u7528\u667a\u80fd\uff08AGI\uff09\u63a5\u8fd1\u8fc7\u7a0b\u4e2d\u7814\u7a76\u7684\u7a7a\u767d\u3002\u5c3d\u7ba1LLM\u5c55\u73b0\u51fa\u51fa\u8272\u7684\u6cdb\u5316\u80fd\u529b\u548c\u6d8c\u73b0\u80fd\u529b\uff0c\u4f46\u76ee\u524d\u7f3a\u4e4f\u5bf9LLM\u9a71\u52a8\u7684\u4ee3\u7406\u884c\u4e3a\u3001\u6f5c\u5728\u5931\u8d25\u539f\u56e0\u4ee5\u53ca\u5982\u4f55\u63d0\u5347\u5176\u6027\u80fd\u7684\u7814\u7a76\uff0c\u5c24\u5176\u662f\u5728\u5177\u6709\u6311\u6218\u6027\u7684\u73b0\u5b9e\u4e16\u754c\u89c4\u5212\u4efb\u52a1\u4e2d\u7684\u8868\u73b0\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7f3a\u53e3\uff0c\u6211\u4eec\u5229\u7528\u4e86\u4e00\u4e2a\u540d\u4e3aTravelPlanner\u7684\u771f\u5b9e\u57fa\u51c6\uff0c\u5176\u4e2d\u7684\u4ee3\u7406\u5fc5\u987b\u6ee1\u8db3\u591a\u4e2a\u7ea6\u675f\u4ee5\u751f\u6210\u51c6\u786e\u7684\u8ba1\u5212\u3002\u901a\u8fc7TravelPlanner\u57fa\u51c6\uff0c\u6211\u4eec\u9488\u5bf9\u56db\u4e2a\u5173\u952e\u7814\u7a76\u95ee\u9898\u8fdb\u884c\u4e86\u5168\u9762\u7684\u5b9e\u9a8c\uff1a\uff081\uff09LLM\u4ee3\u7406\u5728\u5904\u7406\u957f\u7bc7\u548c\u5608\u6742\u4e0a\u4e0b\u6587\u65f6\uff0c\u5bf9\u4e8e\u63a8\u7406\u548c\u89c4\u5212\u7684\u9c81\u68d2\u6027\u662f\u5426\u8db3\u591f\uff1f\uff082\uff09\u5c11\u91cf\u63d0\u793a\u80fd\u5426\u5bf9\u5177\u6709\u957f\u4e0a\u4e0b\u6587\u7684\u573a\u666f\u4ea7\u751f\u8d1f\u9762\u5f71\u54cd\uff1f\uff083\uff09\u6211\u4eec\u80fd\u5426\u4f9d\u8d56\u7ec6\u5316\u6765\u6539\u5584\u8ba1\u5212\uff1f\uff084\uff09\u662f\u5426\u53ef\u4ee5\u4f7f\u7528\u6b63\u8d1f\u53cd\u9988\u76f8\u7ed3\u5408\u7684\u65b9\u6cd5\u5bf9LLM\u8fdb\u884c\u5fae\u8c03\uff0c\u4ece\u800c\u8fdb\u4e00\u6b65\u63d0\u9ad8\u6027\u80fd\uff1f 
\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff1a\u9996\u5148\uff0c\u5c3d\u7ba1LLM\u80fd\u591f\u5904\u7406\u5927\u91cf\u7684\u53c2\u8003\u4fe1\u606f\u548c\u5c11\u91cf\u793a\u4f8b\uff0c\u4f46\u5728\u5904\u7406\u957f\u7bc7\u4e0a\u4e0b\u6587\u65f6\uff0c\u5b83\u4eec\u5f80\u5f80\u65e0\u6cd5\u5173\u6ce8\u5173\u952e\u90e8\u5206\uff1b\u5176\u6b21\uff0c\u5b83\u4eec\u4ecd\u7136\u96be\u4ee5\u5206\u6790\u957f\u671f\u89c4\u5212\uff0c\u5e76\u4e0d\u80fd\u63d0\u4f9b\u51c6\u786e\u7684\u53cd\u9988\u4f9b\u7ec6\u5316\u4f7f\u7528\uff1b\u7b2c\u4e09\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u79f0\u4e3a\u53cd\u9988\u611f\u77e5\u5fae\u8c03\uff08FAFT\uff09\u7684\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u5229\u7528\u4e86\u6b63\u8d1f\u53cd\u9988\uff0c\u76f8\u8f83\u4e8e\u76d1\u7763\u5f0f\u5fae\u8c03\uff08SFT\uff09\uff0c\u5b83\u80fd\u5e26\u6765\u663e\u8457\u7684\u6027\u80fd\u63d0\u5347\u3002\u6211\u4eec\u7684\u53d1\u73b0\u4e3a\u793e\u533a\u63d0\u4f9b\u4e86\u6709\u5173\u73b0\u5b9e\u4e16\u754c\u89c4\u5212\u5e94\u7528\u65b9\u9762\u7684\u6df1\u5165\u89c1\u89e3\u3002|\n", "2408.06292": "|**2024-08-12**|**The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery**|Chris Lu 
et.al.|[2408.06292](http://arxiv.org/abs/2408.06292)|**[link](https://github.com/sakanaai/ai-scientist)**|**\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u5168\u9762\u6846\u67b6\uff0c\u65e8\u5728\u5b9e\u73b0\u5b8c\u5168\u81ea\u52a8\u7684\u79d1\u5b66\u53d1\u73b0\uff0c\u4f7f\u524d\u6cbf\u7684\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\u80fd\u591f\u72ec\u7acb\u8fdb\u884c\u7814\u7a76\uff0c\u5e76\u4f20\u8fbe\u5176\u7814\u7a76\u6210\u679c\u3002\u6211\u4eec\u5f15\u5165\u4e86\u201cAI\u79d1\u5b66\u5bb6\u201d\u8fd9\u4e00\u6982\u5ff5\uff0c\u5b83\u80fd\u751f\u6210\u65b0\u9896\u7684\u7814\u7a76\u601d\u8def\uff0c\u7f16\u5199\u4ee3\u7801\uff0c\u6267\u884c\u5b9e\u9a8c\uff0c\u53ef\u89c6\u5316\u7ed3\u679c\uff0c\u64b0\u5199\u5b8c\u6574\u7684\u79d1\u5b66\u8bba\u6587\uff0c\u5e76\u8fdb\u884c\u6a21\u62df\u7684\u540c\u884c\u8bc4\u5ba1\u8fc7\u7a0b\u4ee5\u8fdb\u884c\u8bc4\u4f30\u3002\u7406\u8bba\u4e0a\uff0c\u8fd9\u4e00\u8fc7\u7a0b\u53ef\u4ee5\u8fed\u4ee3\u8fdb\u884c\uff0c\u4ee5\u5f00\u653e\u6027\u65b9\u5f0f\u53d1\u5c55\u60f3\u6cd5\uff0c\u5c31\u50cf\u4eba\u7c7b\u7684\u79d1\u5b66\u793e\u533a\u4e00\u6837\u3002 
\u901a\u8fc7\u5c06\u5176\u5e94\u7528\u4e8e\u673a\u5668\u5b66\u4e60\u7684\u4e09\u4e2a\u4e0d\u540c\u5b50\u9886\u57df\uff1a\u6269\u6563\u5efa\u6a21\u3001\u57fa\u4e8e\u8f6c\u6362\u5668\u7684\u8bed\u8a00\u5efa\u6a21\u548c\u5b66\u4e60\u52a8\u6001\uff0c\u5c55\u793a\u4e86\u5176\u7075\u6d3b\u6027\u3002\u6bcf\u4e00\u7bc7\u8bba\u6587\u7684\u5f00\u53d1\u6210\u672c\u4f4e\u4e8e15\u7f8e\u5143\u3002\u4e3a\u4e86\u8bc4\u4f30\u751f\u6210\u7684\u8bba\u6587\uff0c\u6211\u4eec\u8bbe\u8ba1\u5e76\u9a8c\u8bc1\u4e86\u4e00\u4e2a\u81ea\u52a8\u5ba1\u7a3f\u4eba\uff0c\u7ed3\u679c\u663e\u793a\u5b83\u5728\u8bc4\u4ef7\u8bba\u6587\u5206\u6570\u65b9\u9762\u63a5\u8fd1\u4eba\u7c7b\u6c34\u5e73\u8868\u73b0\u3002AI\u79d1\u5b66\u5bb6\u80fd\u591f\u4ea7\u751f\u8d85\u8fc7\u9876\u7ea7\u673a\u5668\u5b66\u4e60\u4f1a\u8bae\u63a5\u53d7\u9608\u503c\u7684\u8bba\u6587\uff0c\u8fd9\u662f\u7531\u6211\u4eec\u7684\u81ea\u52a8\u5ba1\u7a3f\u4eba\u5224\u65ad\u7684\u3002\u8fd9\u4e00\u65b9\u6cd5\u6807\u5fd7\u7740\u673a\u5668\u5b66\u4e60\u9886\u57df\u79d1\u5b66\u7814\u7a76\u65b0\u7eaa\u5143\u7684\u5f00\u59cb\uff1a\u5c06AI\u4ee3\u7406\u7684\u53d8\u9769\u6027\u4f18\u52bf\u5e26\u5165\u6574\u4e2a\u7814\u7a76\u8fc7\u7a0b\uff0c\u4f7f\u6211\u4eec\u66f4\u63a5\u8fd1\u4e00\u4e2a\u80fd\u591f\u91ca\u653e\u89e3\u51b3\u4e16\u754c\u6700\u8270\u5de8\u95ee\u9898\u7684\u65e0\u9650\u53ef\u8d1f\u62c5\u521b\u65b0\u4e0e\u521b\u9020\u529b\u7684\u4e16\u754c\u3002\u6240\u6709\u4ee3\u7801\u5df2\u5f00\u6e90\u5728https://github.com/SakanaAI/AI-Scientist\u3002**|\n", "2408.06281": "|**2024-08-12**|**MovieSum: An Abstractive Summarization Dataset for Movie Screenplays**|Rohit Saxena 
et.al.|[2408.06281](http://arxiv.org/abs/2408.06281)|**[link](https://github.com/saxenarohit/moviesum)**|**\u7535\u5f71\u5267\u672c\u7684\u6982\u8ff0\u662f\u4e00\u4e2a\u6311\u6218\uff0c\u56e0\u4e3a\u5b83\u8981\u6c42\u7406\u89e3\u957f\u8f93\u5165\u4e0a\u4e0b\u6587\u548c\u7535\u5f71\u7279\u6709\u7684\u5404\u79cd\u5143\u7d20\u3002\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u6587\u6863\u6982\u8ff0\u65b9\u9762\u5df2\u7ecf\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u5c55\uff0c\u4f46\u5b83\u4eec\u5f80\u5f80\u5728\u5904\u7406\u957f\u8f93\u5165\u4e0a\u4e0b\u6587\u65f6\u9047\u5230\u56f0\u96be\u3002\u6b64\u5916\uff0c\u867d\u7136\u6700\u8fd1\u7684\u7814\u7a76\u5173\u6ce8\u7535\u89c6\u811a\u672c\uff0c\u4f46\u7535\u5f71\u5267\u672c\u6982\u8ff0\u4ecd\u7136\u7f3a\u4e4f\u63a2\u7d22\u3002\u4e3a\u4e86\u6fc0\u53d1\u8fd9\u4e00\u9886\u57df\u7684\u7814\u7a76\uff0c\u6211\u4eec\u63d0\u51fa\u4e00\u4e2a\u540d\u4e3aMovieSum\u7684\u65b0\u6570\u636e\u96c6\uff0c\u7528\u4e8e\u7535\u5f71\u5267\u672c\u7684\u62bd\u8c61\u6982\u8ff0\u3002\u8fd9\u4e2a\u6570\u636e\u96c6\u5305\u542b\u4e862200\u4e2a\u7535\u5f71\u5267\u672c\u53ca\u5176\u5bf9\u5e94\u7684\u7ef4\u57fa\u767e\u79d1\u5267\u60c5\u6982\u8ff0\u3002\u6211\u4eec\u4eba\u5de5\u683c\u5f0f\u5316\u4e86\u7535\u5f71\u5267\u672c\u4ee5\u8868\u793a\u5176\u7ed3\u6784\u5143\u7d20\u3002\u4e0e\u73b0\u6709\u7684\u6570\u636e\u96c6\u76f8\u6bd4\uff0cMovieSum\u5177\u6709\u51e0\u4e2a\u72ec\u7279\u7279\u70b9\uff1a\uff081\uff09\u5b83\u5305\u62ec\u7535\u5f71\u5267\u672c\uff0c\u8fd9\u4e9b\u5267\u672c\u6bd4\u7535\u89c6\u5267\u811a\u672c\u66f4\u957f\u3002\uff082\uff09\u5b83\u7684\u89c4\u6a21\u662f\u4e4b\u524d\u7535\u5f71\u5267\u672c\u6570\u636e\u96c6\u7684\u4e24\u500d\u3002\uff083\uff09\u5b83\u63d0\u4f9b\u4e86IMDb 
ID\u7b49\u5143\u6570\u636e\uff0c\u65b9\u4fbf\u83b7\u53d6\u989d\u5916\u7684\u5916\u90e8\u77e5\u8bc6\u3002\u6211\u4eec\u8fd8\u5c55\u793a\u4e86\u6700\u8fd1\u53d1\u5e03\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u6211\u4eec\u7684\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u6982\u8ff0\u7684\u7ed3\u679c\uff0c\u4ee5\u63d0\u4f9b\u8be6\u7ec6\u7684\u57fa\u51c6\u3002**|\n", "2408.06276": "|**2024-08-13**|**Review-driven Personalized Preference Reasoning with Large Language Models for Recommendation**|Jieyong Kim et.al.|[2408.06276](http://arxiv.org/abs/2408.06276)|null|\u8fd1\u671f\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u5404\u7c7b\u4efb\u52a1\u4e2d\u7684\u5353\u8d8a\u8868\u73b0\u5f15\u8d77\u4e86\u5e7f\u6cdb\u5173\u6ce8\uff0c\u5e76\u6fc0\u53d1\u4e86\u5b83\u4eec\u5728\u63a8\u8350\u7cfb\u7edf\u9886\u57df\u7684\u5e94\u7528\u6f5c\u529b\u3002\u7136\u800c\uff0c\u73b0\u6709\u65b9\u6cd5\u5e76\u672a\u5145\u5206\u5229\u7528LLM\u7684\u6f5c\u529b\uff0c\u5f80\u5f80\u53d7\u9650\u4e8e\u8f93\u5165\u4fe1\u606f\u7684\u6709\u9650\u6027\uff0c\u672a\u80fd\u5168\u9762\u53d1\u6325\u5176\u9ad8\u7ea7\u63a8\u7406\u80fd\u529b\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aEXP3RT\u7684\u65b0\u9896LLM\u63a8\u8350\u7cfb\u7edf\uff0c\u65e8\u5728\u5229\u7528\u7528\u6237\u548c\u7269\u54c1\u8bc4\u8bba\u4e2d\u8574\u542b\u7684\u4e30\u5bcc\u504f\u597d\u4fe1\u606f\u3002 
EXP3RT\u901a\u8fc7\u4ece\u6559\u5e08LLM\u4e2d\u8fdb\u884c\u77e5\u8bc6\u84b8\u998f\u8fdb\u884c\u5fae\u8c03\uff0c\u4ee5\u6267\u884c\u5173\u952e\u7684\u4e09\u9879\u4efb\u52a1\uff1a\u9996\u5148\uff0c\u5b83\u4ece\u539f\u59cb\u8bc4\u8bba\u4e2d\u63d0\u53d6\u5e76\u5c01\u88c5\u6838\u5fc3\u7684\u4e3b\u89c2\u504f\u597d\uff1b\u5176\u6b21\uff0c\u6839\u636e\u7279\u5b9a\u6807\u51c6\u805a\u5408\u548c\u603b\u7ed3\u8fd9\u4e9b\u504f\u597d\uff0c\u5f62\u6210\u7528\u6237\u548c\u7269\u54c1\u7684\u6863\u6848\uff1b\u6700\u540e\uff0c\u8003\u8651\u7528\u6237/\u7269\u54c1\u6863\u6848\u4ee5\u53ca\u7269\u54c1\u63cf\u8ff0\u4e2d\u7684\u4e3b\u5ba2\u89c2\u4fe1\u606f\uff0c\u751f\u6210\u8be6\u7ec6\u7684\u63a8\u7406\u6b65\u9aa4\u548c\u9884\u6d4b\u8bc4\u7ea7\uff0c\u5373\u57fa\u4e8e\u63a8\u7406\u7684\u8bc4\u7ea7\u9884\u6d4b\u3002\u8fd9\u79cd\u7531EXP3RT\u63d0\u4f9b\u7684\u4e2a\u6027\u5316\u504f\u597d\u63a8\u7406\u80fd\u591f\u63d0\u9ad8\u8bc4\u7ea7\u9884\u6d4b\u7684\u51c6\u786e\u6027\uff0c\u5e76\u4e3a\u63a8\u8350\u7cfb\u7edf\u63d0\u4f9b\u5fe0\u5b9e\u4e14\u5408\u7406\u7684\u89e3\u91ca\u3002 \u5e7f\u6cdb\u5b9e\u9a8c\u8868\u660e\uff0cEXP3RT\u5728\u8bc4\u7ea7\u9884\u6d4b\u548c\u5019\u9009\u9879\u76ee\u91cd\u6392\u5e8f\uff08\u7528\u4e8etop-k\u63a8\u8350\uff09\u65b9\u9762\u5747\u8d85\u8d8a\u4e86\u73b0\u6709\u65b9\u6cd5\uff0c\u540c\u65f6\u663e\u8457\u63d0\u5347\u4e86\u63a8\u8350\u7cfb\u7edf\u7684\u53ef\u89e3\u91ca\u6027\u3002|\n", "2408.06273": "|**2024-08-12**|**FuxiTranyu: A Multilingual Large Language Model Trained with Balanced Data**|Haoran Sun 
et.al.|[2408.06273](http://arxiv.org/abs/2408.06273)|**[link](https://github.com/tjunlp-lab/fuxitranyu)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u5404\u79cd\u4efb\u52a1\u4e2d\u5c55\u73b0\u51fa\u4e86\u5f3a\u5927\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u8bb8\u591aLLM\u5728\u9ad8\u8d44\u6e90\u548c\u4f4e\u8d44\u6e90\u8bed\u8a00\u4e4b\u95f4\u7684\u6027\u80fd\u5b58\u5728\u663e\u8457\u5dee\u5f02\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u5f00\u6e90\u591a\u8bed\u8a00LLM\u2014\u2014FuxiTranyu\uff0c\u65e8\u5728\u6ee1\u8db3\u7814\u7a76\u793e\u533a\u5bf9\u5e73\u8861\u4e14\u9ad8\u6027\u80fd\u591a\u8bed\u8a00\u80fd\u529b\u7684\u9700\u6c42\u3002FuxiTranyu-8B\uff0c\u5177\u670980\u4ebf\u53c2\u6570\u7684\u57fa\u6a21\uff0c\u4ece\u5934\u5f00\u59cb\u8bad\u7ec3\u5728\u4e00\u4e2a\u7cbe\u5fc3\u5e73\u8861\u7684\u591a\u8bed\u8a00\u6570\u636e\u4ed3\u5e93\u4e0a\uff0c\u8be5\u4ed3\u5e93\u5305\u542b\u8986\u76d643\u79cd\u81ea\u7136\u8bed\u8a00\u548c16\u79cd\u7f16\u7a0b\u8bed\u8a00\u76846000\u4ebf\u4e2a\u4ee4\u724c\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5f00\u53d1\u4e86\u4e24\u4e2a\u6307\u4ee4\u8c03\u4f18\u6a21\u578b\uff1aFuxiTranyu-8B-SFT\uff0c\u5b83\u57fa\u4e8e\u591a\u5143\u6307\u4ee4\u6570\u636e\u96c6\u8fdb\u884c\u5fae\u8c03\uff1b\u4ee5\u53caFuxiTranyu-8B-DPO\uff0c\u5728\u504f\u597d\u6570\u636e\u96c6\u4e0a\u8fdb\u4e00\u6b65\u7cbe\u70bc\u4ee5\u589e\u5f3a\u5bf9\u9f50\u80fd\u529b\u7684DPO\u3002\u5e7f\u6cdb\u5b9e\u9a8c\u5728\u591a\u79cd\u591a\u8bed\u8a00\u57fa\u51c6\u4e0a\u7684\u7ed3\u679c\u663e\u793a\uff0cFuxiTranyu\u5728\u4e0e\u73b0\u6709\u591a\u8bed\u8a00LLM\uff08\u5982BLOOM-7B\u3001PolyLM-13B\u3001Llama-2-Chat-7B\u548cMistral-7B-Instruct\uff09\u7684\u6bd4\u8f83\u4e2d\u8868\u73b0\u51fa\u7ade\u4e89\u6027\u6027\u80fd\u3002\u795e\u7ecf\u5143\u7ea7\u548c\u8868\u793a\u7ea7\u53ef\u89e3\u91ca\u6027\u5206\u6790\u8868\u660e\uff0cFuxiTranyu\u80fd\u591f\u5728\u4e0d\u540c\u8bed\u8a00\u4e4b\u95f4\u5b66\u4e60\u4e00\u81f4\u7684\u591a
\u8bed\u8a00\u8868\u793a\u3002\u4e3a\u4e86\u4fc3\u8fdb\u5bf9\u591a\u8bed\u8a00LLM\u53ca\u5176\u5de5\u4f5c\u673a\u5236\u7684\u7814\u7a76\uff0c\u6211\u4eec\u53d1\u5e03\u4e86\u57fa\u6a21\u548c\u6307\u4ee4\u8c03\u4f18\u7684FuxiTranyu\u6a21\u578b\uff0c\u4ee5\u53ca58\u4e2a\u9884\u8bad\u7ec3\u68c0\u67e5\u70b9\uff0c\u901a\u8fc7HuggingFace\u548cGithub\u516c\u5f00\u5206\u4eab\u3002|\n", "2408.06272": "|**2024-08-12**|**A RAG-Based Question-Answering Solution for Cyber-Attack Investigation and Attribution**|Sampath Rajapaksha et.al.|[2408.06272](http://arxiv.org/abs/2408.06272)|null|\u5728\u4e0d\u65ad\u6f14\u8fdb\u7684\u7f51\u7edc\u5b89\u5168\u9886\u57df\uff0c\u5206\u6790\u5e08\u9700\u8981\u5bc6\u5207\u5173\u6ce8\u6700\u65b0\u7684\u653b\u51fb\u8d8b\u52bf\u548c\u76f8\u5173\u4fe1\u606f\uff0c\u4ee5\u534f\u52a9\u8c03\u67e5\u4e0e\u5f52\u56e0\u7f51\u7edc\u653b\u51fb\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u6280\u672f\u7684\u95ee\u7b54\u6a21\u578b\u53ca\u5176\u5e94\u7528\uff0c\u65e8\u5728\u4e3a\u7f51\u7edc\u5b89\u5168\u4e13\u5bb6\u63d0\u4f9b\u6709\u5173\u7f51\u7edc\u653b\u51fb\u8c03\u67e5\u4e0e\u5f52\u56e0\u7684\u4fe1\u606f\u3002\u6211\u4eec\u7684\u95ee\u7b54\u6a21\u578b\u7ed3\u5408\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u548c\u77e5\u8bc6\u5e93\uff08KB\uff09\uff0c\u80fd\u591f\u6839\u636e\u77e5\u8bc6\u5e93\u6216\u7528\u6237\u63d0\u4f9b\u7684\u5916\u90e8\u8d44\u6e90\u56de\u7b54\u7528\u6237\u7684\u67e5\u8be2\u3002 
\u6211\u4eec\u901a\u8fc7\u5404\u79cd\u7c7b\u578b\u7684\u63d0\u95ee\uff0c\u5305\u62ec\u57fa\u4e8e\u77e5\u8bc6\u5e93\u3001\u5143\u6570\u636e\u3001\u77e5\u8bc6\u5e93\u4e2d\u7684\u7279\u5b9a\u6587\u6863\u4ee5\u53ca\u5916\u90e8\u8d44\u6e90\u7684\u63d0\u95ee\uff0c\u5bf9\u6211\u4eec\u7684\u95ee\u7b54\u6a21\u578b\u8fdb\u884c\u4e86\u6d4b\u8bd5\u4e0e\u8bc4\u4f30\u3002\u6211\u4eec\u5c06\u77e5\u8bc6\u5e93\u4e3a\u57fa\u7840\u7684\u95ee\u9898\u7684\u7b54\u6848\u4e0eOpenAI\u7684GPT-3.5\u53ca\u6700\u65b0GPT-4\u7684LLM\u7b54\u6848\u8fdb\u884c\u4e86\u6bd4\u8f83\u3002\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u95ee\u7b54\u6a21\u578b\u5728\u63d0\u4f9b\u7b54\u6848\u7684\u540c\u65f6\u7ed9\u51fa\u4e86\u6765\u6e90\u4fe1\u606f\uff0c\u5e76\u4e14\u514b\u670d\u4e86GPT\u6a21\u578b\u53ef\u80fd\u4ea7\u751f\u7684\u5e7b\u89c9\u95ee\u9898\uff0c\u8fd9\u5bf9\u4e8e\u7f51\u7edc\u653b\u51fb\u7684\u8c03\u67e5\u4e0e\u5f52\u56e0\u81f3\u5173\u91cd\u8981\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u5206\u6790\u8868\u660e\uff0c\u5f53RAG\u95ee\u7b54\u6a21\u578b\u5728\u67e5\u8be2\u4e4b\u5916\u63d0\u4f9b\u5c11\u91cf\u793a\u4f8b\u65f6\uff0c\u5176\u751f\u6210\u7684\u7b54\u6848\u8d28\u91cf\u901a\u5e38\u4f18\u4e8e\u4ec5\u63d0\u4f9b\u67e5\u8be2\u800c\u6ca1\u6709\u793a\u4f8b\u7684\u60c5\u51b5\u3002|\n", "2408.06266": "|**2024-08-12**|**Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment**|Karel D'Oosterlinck 
et.al.|[2408.06266](http://arxiv.org/abs/2408.06266)|**[link](https://github.com/contextualai/clair_and_apo)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u901a\u5e38\u4f7f\u7528\u5bf9\u6bd4\u6027\u5bf9\u9f50\u76ee\u6807\u548c\u504f\u597d\u5bf9\u6570\u636e\u96c6\u8fdb\u884c\u5bf9\u9f50\u3002\u8fd9\u4e00\u8fc7\u7a0b\u6d89\u53ca\u5230\u6a21\u578b\u3001\u914d\u5bf9\u6570\u636e\u4ee5\u53ca\u76ee\u6807\u4e4b\u95f4\u7684\u4ea4\u4e92\uff0c\u4f7f\u5f97\u5bf9\u9f50\u53d8\u5f97\u590d\u6742\uff0c\u5e76\u6709\u65f6\u5bfc\u81f4\u4e0d\u7406\u60f3\u7684\u6210\u679c\u3002\u6211\u4eec\u5bf9\u6b64\u8fdb\u884c\u4e86\u7814\u7a76\uff0c\u53d1\u73b0\uff08i\uff09\u5f53\u5e95\u5c42\u54cd\u5e94\u5177\u6709\u5bf9\u6bd4\u6027\u65f6\uff0c\u504f\u597d\u6570\u636e\u63d0\u4f9b\u4e86\u66f4\u597d\u7684\u5b66\u4e60\u4fe1\u53f7\uff1b\uff08ii\uff09\u5bf9\u9f50\u76ee\u6807\u5728\u8bad\u7ec3\u671f\u95f4\u4e3a\u6a21\u578b\u63d0\u4f9b\u4e86\u66f4\u591a\u7684\u63a7\u5236\uff0c\u4ece\u800c\u5bfc\u81f4\u4e86\u66f4\u597d\u7684\u6027\u80fd\u3002\u57fa\u4e8e\u8fd9\u4e9b\u6d1e\u5bdf\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u5bf9\u6bd4\u5b66\u4e60\u4eceAI\u4fee\u8ba2\uff08CLAIR\uff09\uff0c\u4e00\u79cd\u6570\u636e\u521b\u5efa\u65b9\u6cd5\uff0c\u53ef\u4ee5\u751f\u6210\u66f4\u5177\u6709\u5bf9\u6bd4\u6027\u7684\u504f\u597d\u5bf9\uff0c\u4ee5\u53ca\u951a\u5b9a\u504f\u597d\u4f18\u5316\uff08APO\uff09\uff0c\u4e00\u4e2a\u66f4\u5177\u53ef\u63a7\u6027\u548c\u7a33\u5b9a\u6027\u7684\u5bf9\u9f50\u76ee\u6807\u3002\u6211\u4eec\u4f7f\u7528\u5404\u79cd\u53ef\u6bd4\u8f83\u7684\u6570\u636e\u96c6\u548c\u5bf9\u9f50\u76ee\u6807\u6765\u5bf9Llama-3-8B-Instruct\u8fdb\u884c\u5bf9\u9f50\uff0c\u5e76\u6d4b\u91cf\u4e86\u4e0e\u4eba\u7c7b\u5224\u65ad\u9ad8\u5ea6\u76f8\u5173\u7684MixEval-Hard\u5206\u6570\u3002CLAIR\u504f\u597d\u5bfc\u81f4\u6240\u6709\u6570\u636e\u96c6\u4e2d\u7684\u6700\u4f73\u6027\u80fd\uff0c\u800cAPO\u59cb\u7ec8\u4f18\u4e8e\u8f83\u5c11\u53ef\u63a7\u7684\u76ee\u6807\u3002\u901a\u8fc7\u572832K 
CLAIR\u504f\u597d\u4e0a\u4f7f\u7528APO\u8fdb\u884c\u8bad\u7ec3\uff0c\u6211\u4eec\u7684\u6700\u4f73\u6a21\u578b\u63d0\u9ad8\u4e86Llama-3-8B-Instruct\u7684\u6027\u80fd\u8fbe7.65%\uff0c\u5c06\u4e0eGPT4-turbo\u7684\u5dee\u8ddd\u7f29\u5c0f\u4e8645%\u3002\u6211\u4eec\u7684\u4ee3\u7801\u5df2\u53d1\u5e03\u4e8ehttps://github.com/ContextualAI/CLAIR_and_APO\u3002|\n", "2408.06223": "|**2024-08-12**|**On Effects of Steering Latent Representation for Large Language Model Unlearning**|Dang Huu-Tien et.al.|[2408.06223](http://arxiv.org/abs/2408.06223)|null|\u672c\u6587\u9996\u5148\u901a\u8fc7\u7406\u8bba\u5206\u6790\u8bc1\u660e\u4e86\u5f15\u5bfc\u6a21\u578b\u4e2d\u95f4\u5c42\u9057\u5fd8\u8868\u793a\u5411\u968f\u673a\u65b9\u5411\u504f\u79fb\uff0c\u80fd\u964d\u4f4e\u6587\u672c\u751f\u6210\u7684\u7f6e\u4fe1\u5ea6\uff0c\u5bfc\u81f4\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4ea7\u751f\u9519\u8bef\u6216\u65e0\u610f\u4e49\u7684\u56de\u7b54\u3002\u5176\u6b21\uff0c\u6211\u4eec\u63a2\u8ba8\u4e86\u7cfb\u6570\u5982\u4f55\u5f71\u54cd\u9057\u5fd8\u6837\u672c\u8868\u793a\u4e0e\u968f\u673a\u65b9\u5411\u7684\u4e00\u81f4\u6027\uff0c\u5e76\u6697\u793a\u4e86\u4e0d\u540c\u7f51\u7edc\u5c42\u4e0b\u6709\u6548\u7684\u6700\u4f18\u7cfb\u6570\u503c\uff0c\u4ee5\u5b9e\u73b0\u9ad8\u6548\u7684\u5b66\u4e60\u64a4\u9500\u3002\u63a5\u7740\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u5229\u7528\u4ee3\u8868\u9519\u4e71\u6cd5\uff08RMU\uff09\u8fdb\u884c\u5b66\u4e60\u64a4\u9500\u540e\u7684\u6a21\u578b\u80fd\u591f\u62b5\u5fa1\u5bf9\u6297\u6027\u9003\u8131\u653b\u51fb\u3002 
\u6700\u540e\uff0c\u6211\u4eec\u7684\u5b9e\u8bc1\u5206\u6790\u8868\u660e\uff0c\u5f53\u5e94\u7528\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u4e2d\u95f4\u548c\u540e\u671f\u5c42\u65f6\uff0cRMU\u7684\u6709\u6548\u6027\u8f83\u4f4e\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u7b80\u5355\u800c\u6709\u6548\u7684\u65b9\u6cd5\u2014\u2014\u81ea\u9002\u5e94RMU\uff0c\u8be5\u65b9\u6cd5\u4f7f\u5927\u591a\u6570\u5c42\u90fd\u80fd\u591f\u5b9e\u73b0\u9ad8\u6548\u7684\u5b66\u4e60\u64a4\u9500\uff0c\u4e14\u4e0d\u589e\u52a0\u989d\u5916\u7684\u8ba1\u7b97\u6210\u672c\u3002\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u4e0e\u5148\u524d\u7684\u7814\u7a76\u76f8\u6bd4\uff0c\u81ea\u9002\u5e94RMU\u663e\u8457\u63d0\u9ad8\u4e86\u5b66\u4e60\u64a4\u9500\u7684\u6027\u80fd\u3002|\n", "2408.06186": "|**2024-08-12**|**Improving Structural Diversity of Blackbox LLMs via Chain-of-Specification Prompting**|Halley Young et.al.|[2408.06186](http://arxiv.org/abs/2408.06186)|null|\u751f\u6210\u591a\u6837\u5316\u7684\u6587\u672c\u662f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u9762\u4e34\u7684\u5173\u952e\u6311\u6218\u3002\u5230\u76ee\u524d\u4e3a\u6b62\uff0c\u591a\u6837\u6027\u7684\u7814\u7a76\u4e3b\u8981\u901a\u8fc7$n$-gram\u591a\u6837\u6027\u6216BERT\u5d4c\u5165\u7684\u591a\u6837\u6027\u7b49\u6307\u6807\u8fdb\u884c\uff0c\u4f46\u8fd9\u4e9b\u65b9\u6cd5\u5728\u8003\u8651\u591a\u6837\u6027\u7684\u7ef4\u5ea6\u4e0a\u7f3a\u4e4f\u7528\u6237\u63a7\u5236\u6743\u3002\u4f8b\u5982\uff0c\u5728\u8bd7\u6b4c\u9886\u57df\uff0c\u7528\u6237\u53ef\u80fd\u5e0c\u671b\u5728\u62bc\u97f5\u548c\u8282\u594f\u65b9\u9762\u5b9e\u73b0\u591a\u6837\u6027\uff0c\u800c\u5728\u4ee3\u7801\u9886\u57df\uff0c\u7528\u6237\u53ef\u80fd\u66f4\u5173\u6ce8\u89e3\u51b3\u95ee\u9898\u65f6\u6240\u4f7f\u7528\u7684\u8868\u8fbe\u65b9\u5f0f\u7684\u591a\u6837\u6027\u3002 
\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u7ed3\u6784\u591a\u6837\u6027\uff08Structural Diversity\uff09\u7684\u65b0\u6307\u6807\u3002\u8be5\u6307\u6807\u5141\u8bb8\u7528\u6237\u63d0\u4f9b\u4e00\u4e2a\u6620\u5c04\uff0c\u5c06\u751f\u6210\u7684\u6587\u672c\u8f6c\u6362\u4e3a\u6355\u83b7\u7528\u6237\u5173\u5fc3\u7684\u591a\u6837\u6027\u7684\u7279\u5f81\u3002\u8fd9\u6837\uff0c\u7528\u6237\u53ef\u4ee5\u66f4\u5177\u4f53\u5730\u63a7\u5236\u4ed6\u4eec\u60f3\u8981\u63a2\u7d22\u7684\u591a\u6837\u6027\u7ef4\u5ea6\uff0c\u5982\u5728\u8bd7\u6b4c\u9886\u57df\u5173\u6ce8\u62bc\u97f5\u548c\u8282\u594f\uff0c\u5728\u4ee3\u7801\u9886\u57df\u5173\u6ce8\u7279\u5b9a\u7684\u8868\u8fbe\u65b9\u5f0f\u7b49\u3002 \u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86\u4e00\u4e2a\u540d\u4e3a\u94fe\u5f0f\u89c4\u8303\uff08Chain-of-Specification\uff0cCoS\uff09\u7684\u65b0\u578b\u7b56\u7565\uff0c\u7528\u4e8e\u901a\u8fc7\u9996\u5148\u8ba9LLM\u751f\u6210\u63cf\u8ff0\u7279\u5b9a\u7ed3\u6784\u7279\u5f81\u5b9e\u4f8b\u7684\u89c4\u8303\uff0c\u7136\u540e\u5f15\u5bfcLLM\u751f\u6210\u6ee1\u8db3\u8fd9\u4e9b\u7279\u5f81\u7684\u6587\u672c\u6765\u63d0\u9ad8\u591a\u6837\u6027\uff1b\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u6211\u4eec\u7684\u7b56\u7565\u9002\u7528\u4e8e\u9ed1\u76d2LLM\u3002\u5728\u6211\u4eec\u7684\u5b9e\u9a8c\u4e2d\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u5728\u8bd7\u6b4c\u548c\u4ee3\u7801\u9886\u57df\u5b9e\u73b0\u7ed3\u6784\u591a\u6837\u6027\u65f6\uff0cCoS\u7b56\u7565\u76f8\u6bd4\u591a\u4e2a\u57fa\u7ebf\u663e\u8457\u63d0\u9ad8\u4e86\u591a\u6837\u6027\u3002|\n", "2408.07060": "|**2024-08-13**|**Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents**|Kexun Zhang 
et.al.|[2408.07060](http://arxiv.org/abs/2408.07060)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4ee3\u7406\u5728\u89e3\u51b3\u5b9e\u9645\u4e16\u754c\u8f6f\u4ef6\u5de5\u7a0b\uff08SWE\uff09\u95ee\u9898\u65b9\u9762\u5c55\u73b0\u51fa\u5de8\u5927\u7684\u6f5c\u529b\u3002\u6700\u5148\u8fdb\u5f00\u6e90\u7684SWE\u4ee3\u7406\u80fd\u591f\u5728SWE-Bench Lite\u4e2d\u89e3\u51b3\u8d85\u8fc727%\u7684\u5b9e\u9645GitHub\u95ee\u9898\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u590d\u6742\u7684\u4ee3\u7406\u6846\u67b6\u5728\u8868\u73b0\u4e0a\u5b58\u5728\u5dee\u5f02\uff0c\u6709\u7684\u5728\u7279\u5b9a\u4efb\u52a1\u4e2d\u8868\u73b0\u51fa\u8272\uff0c\u5728\u5176\u4ed6\u4efb\u52a1\u4e2d\u5219\u8868\u73b0\u4e0d\u4f73\u3002\u4e3a\u4e86\u5145\u5206\u5229\u7528\u8fd9\u4e9b\u4ee3\u7406\u7684\u591a\u6837\u6027\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aDEI\uff08\u591a\u5143\u5316\u667a\u80fd\uff09\u7684\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u5229\u7528\u4e86\u5b83\u4eec\u7684\u72ec\u7279\u4e13\u957f\u3002DEI\u4f5c\u4e3a\u4e00\u4e2a\u4f4d\u4e8e\u73b0\u6709SWE\u4ee3\u7406\u6846\u67b6\u4e4b\u4e0a\u7684\u5143\u6a21\u5757\uff0c\u7ba1\u7406\u4ee3\u7406\u96c6\u4f53\u4ee5\u5b9e\u73b0\u589e\u5f3a\u7684\u95ee\u9898\u89e3\u51b3\u80fd\u529b\u3002 \u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u7531DEI\u6307\u5bfc\u7684\u4ee3\u7406\u59d4\u5458\u4f1a\u80fd\u591f\u663e\u8457\u8d85\u8d8a\u5355\u4e2a\u4ee3\u7406\u7684\u6700\u4f73\u6027\u80fd\u3002\u4f8b\u5982\uff0c\u4e00\u7ec4\u5f00\u6e90\u7684SWE\u4ee3\u7406\uff0c\u5176\u4e2a\u4f53\u89e3\u51b3\u7387\u6700\u9ad8\u4e3a27.3%\u5728SWE-Bench Lite\u4e2d\uff0c\u901a\u8fc7\u91c7\u7528DEI\uff0c\u53ef\u4ee5\u8fbe\u523034.3%\u7684\u89e3\u51b3\u7387\uff0c\u5b9e\u73b0\u4e8625%\u7684\u6539\u8fdb\uff0c\u5e76\u51fb\u8d25\u4e86\u8bb8\u591a\u95ed\u6e90\u89e3\u51b3\u65b9\u6848\u3002\u6211\u4eec\u7684\u6700\u4f73\u6027\u80fd\u7ec4\u8868\u73b0\u51fa\u8272\uff0c\u8fbe\u5230\u4e8655%\u7684\u89e3\u51b3\u7387\uff0c\u5728SWE-Bench 
Lite\u4e2d\u83b7\u5f97\u4e86\u6700\u9ad8\u6392\u540d\u3002\u6211\u4eec\u7684\u7814\u7a76\u7ed3\u679c\u5bf9\u5408\u4f5c\u578b\u4eba\u5de5\u667a\u80fd\u7cfb\u7edf\u7684\u7814\u7a76\u9886\u57df\u505a\u51fa\u4e86\u8d21\u732e\uff0c\u5c55\u793a\u4e86\u5b83\u4eec\u5728\u89e3\u51b3\u590d\u6742\u8f6f\u4ef6\u5de5\u7a0b\u6311\u6218\u65b9\u9762\u7684\u6f5c\u529b\u3002|\n", "2408.07055": "|**2024-08-13**|**LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs**|Yushi Bai et.al.|[2408.07055](http://arxiv.org/abs/2408.07055)|**[link](https://github.com/thudm/longwriter)**|**\u5f53\u524d\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u80fd\u591f\u5904\u7406\u6700\u591a10\u4e07\u5b57\u7684\u8f93\u5165\uff0c\u7136\u800c\u5728\u751f\u6210\u8d85\u8fc72\u5343\u5b57\u7684\u8f93\u51fa\u65f6\u5374\u529b\u4e0d\u4ece\u5fc3\u3002\u901a\u8fc7\u63a7\u5236\u5b9e\u9a8c\uff0c\u6211\u4eec\u53d1\u73b0\u6a21\u578b\u7684\u6709\u6548\u751f\u6210\u957f\u5ea6\u672c\u8d28\u4e0a\u53d7\u5230\u5176\u5728\u76d1\u7763\u5fae\u8c03\uff08SFT\uff09\u671f\u95f4\u6240\u89c1\u6837\u672c\u7684\u9650\u5236\u3002\u6362\u53e5\u8bdd\u8bf4\uff0c\u5b83\u4eec\u7684\u8f93\u51fa\u9650\u5236\u6e90\u4e8e\u73b0\u6709SFT\u6570\u636e\u96c6\u4e2d\u957f\u8f93\u51fa\u793a\u4f8b\u7684\u7a00\u7f3a\u6027\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u5f15\u5165\u4e86AgentWrite\uff0c\u8fd9\u662f\u4e00\u79cd\u57fa\u4e8e\u4ee3\u7406\u7684\u7ba1\u9053\uff0c\u5c06\u8d85\u957f\u751f\u6210\u4efb\u52a1\u5206\u89e3\u4e3a\u5b50\u4efb\u52a1\uff0c\u4ece\u800c\u4f7f\u73b0\u6709\u7684LLMs\u80fd\u591f\u751f\u6210\u8d85\u8fc72\u4e07\u5b57\u7684\u8fde\u8d2f\u8f93\u51fa\u3002 
\u501f\u52a9AgentWrite\uff0c\u6211\u4eec\u6784\u5efa\u4e86LongWriter-6k\u6570\u636e\u96c6\uff0c\u5176\u4e2d\u5305\u542b\u4e866000\u4e2aSFT\u6570\u636e\uff0c\u8f93\u51fa\u957f\u5ea6\u8303\u56f4\u4ece2\u5343\u523032\u5343\u5b57\u3002\u901a\u8fc7\u5c06\u6b64\u6570\u636e\u96c6\u7eb3\u5165\u6a21\u578b\u8bad\u7ec3\uff0c\u6211\u4eec\u6210\u529f\u5730\u5c06\u73b0\u6709\u6a21\u578b\u7684\u8f93\u51fa\u957f\u5ea6\u6269\u5c55\u81f3\u8d85\u8fc71\u4e07\u5b57\uff0c\u540c\u65f6\u4fdd\u6301\u4e86\u8f93\u51fa\u8d28\u91cf\u3002\u6211\u4eec\u4e5f\u5f00\u53d1\u4e86LongBench-Write\uff0c\u8fd9\u662f\u4e00\u4e2a\u5168\u9762\u7684\u57fa\u51c6\uff0c\u7528\u4e8e\u8bc4\u4f30\u8d85\u957f\u751f\u6210\u80fd\u529b\u3002\u6211\u4eec\u76849\u4ebf\u53c2\u6570\u6a21\u578b\uff0c\u5728\u7ecf\u8fc7DPO\u8fdb\u4e00\u6b65\u6539\u8fdb\u540e\uff0c\u5728\u8fd9\u4e00\u57fa\u51c6\u4e0a\u5b9e\u73b0\u4e86\u6700\u5148\u8fdb\u7684\u6027\u80fd\uff0c\u751a\u81f3\u8d85\u8fc7\u4e86\u66f4\u5927\u89c4\u6a21\u7684\u4e13\u6709\u6a21\u578b\u3002 \u603b\u7684\u6765\u8bf4\uff0c\u6211\u4eec\u7684\u5de5\u4f5c\u8868\u660e\uff0c\u73b0\u6709\u7684\u957f\u4e0a\u4e0b\u6587LLMs\u5b9e\u9645\u4e0a\u5df2\u7ecf\u5177\u5907\u4e86\u66f4\u5927\u7684\u8f93\u51fa\u7a97\u53e3\u7684\u80fd\u529b\u2014\u2014\u4f60\u53ea\u9700\u8981\u5728\u6a21\u578b\u5bf9\u9f50\u8fc7\u7a0b\u4e2d\u4f7f\u7528\u5e26\u6709\u5ef6\u957f\u8f93\u51fa\u7684\u6570\u636e\u5373\u53ef\u89e3\u9501\u8fd9\u4e00\u80fd\u529b\u3002\u6211\u4eec\u7684\u4ee3\u7801\u548c\u6a21\u578b\u53ef\u4ee5\u5728\uff1ahttps://github.com/THUDM/LongWriter\u627e\u5230\u3002**|\n", "2408.07004": "|**2024-08-13**|**Casper: Prompt Sanitization for Protecting User Privacy in Web-Based Large Language Models**|Chun Jie Chong 
et.al.|[2408.07004](http://arxiv.org/abs/2408.07004)|null|\u57fa\u4e8e\u7f51\u7edc\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u670d\u52a1\u5df2\u88ab\u5e7f\u6cdb\u91c7\u7528\uff0c\u5e76\u5df2\u6210\u4e3a\u6211\u4eec\u4e92\u8054\u7f51\u4f53\u9a8c\u4e0d\u53ef\u6216\u7f3a\u7684\u4e00\u90e8\u5206\u3002\u7b2c\u4e09\u65b9\u63d2\u4ef6\u901a\u8fc7\u63d0\u4f9b\u5bf9\u73b0\u5b9e\u4e16\u754c\u6570\u636e\u548c\u670d\u52a1\u7684\u8bbf\u95ee\uff0c\u589e\u5f3a\u4e86LLM\u7684\u529f\u80fd\u6027\u3002\u7136\u800c\uff0c\u4e0e\u8fd9\u4e9b\u670d\u52a1\u53ca\u5176\u7b2c\u4e09\u65b9\u63d2\u4ef6\u76f8\u5173\u7684\u9690\u79c1\u540e\u679c\u5e76\u672a\u5f97\u5230\u5145\u5206\u7406\u89e3\u3002\u654f\u611f\u63d0\u793a\u6570\u636e\u5728\u4e91\u57faLLM\u63d0\u4f9b\u5546\u548c\u7b2c\u4e09\u65b9\u63d2\u4ef6\u4e2d\u88ab\u5b58\u50a8\u3001\u5904\u7406\u548c\u5171\u4eab\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aCasper\u7684\u63d0\u793a\u51c0\u5316\u6280\u672f\uff0c\u65e8\u5728\u901a\u8fc7\u68c0\u6d4b\u5e76\u4ece\u7528\u6237\u8f93\u5165\u4e2d\u5220\u9664\u654f\u611f\u4fe1\u606f\u6765\u4fdd\u62a4\u7528\u6237\u9690\u79c1\uff0c\u4ece\u800c\u5728\u53d1\u9001\u7ed9LLM\u670d\u52a1\u4e4b\u524d\u4fdd\u62a4\u7528\u6237\u9690\u79c1\u3002Casper\u5b8c\u5168\u4f5c\u4e3a\u6d4f\u89c8\u5668\u6269\u5c55\u8fd0\u884c\u5728\u7528\u6237\u7684\u8bbe\u5907\u4e0a\uff0c\u65e0\u9700\u5bf9\u5728\u7ebfLLM\u670d\u52a1\u8fdb\u884c\u4efb\u4f55\u66f4\u6539\u3002Casper\u7684\u6838\u5fc3\u662f\u4e00\u4e2a\u4e09\u5c42\u51c0\u5316\u673a\u5236\uff0c\u5305\u62ec\u89c4\u5219\u57fa\u4e8e\u8fc7\u6ee4\u5668\u3001\u673a\u5668\u5b66\u4e60\uff08ML\uff09\u547d\u540d\u5b9e\u4f53\u8bc6\u522b\u5668\u548c\u6d4f\u89c8\u5668\u672c\u5730LLM\u4e3b\u9898\u6807\u8bc6\u5668\u3002\u6211\u4eec\u4f7f\u75284000\u4e2a\u5408\u6210\u63d0\u793a\u96c6\u5bf9Casper\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u7ed3\u679c\u663e\u793a\uff0c\u5b83\u80fd\u591f\u4ee5\u9ad8\u51c6\u786e\u7387\uff0898.5%\uff09\u6709\u6548\u5730\u8fc7\u6ee4\u51fa\u4e2a\u4eb
a\u53ef\u8bc6\u522b\u4fe1\u606f\uff08PII\uff09\u548c\u9690\u79c1\u654f\u611f\u8bdd\u9898\uff0889.9%\uff09\u3002|\n", "2408.06993": "|**2024-08-13**|**LLMs can Schedule**|Henrik Abgaryan et.al.|[2408.06993](http://arxiv.org/abs/2408.06993)|**[link](https://github.com/starjob42/datasetjsp)**|**\u5de5\u4f5c\u8f66\u95f4\u8c03\u5ea6\u95ee\u9898(JSSP)\u5728\u4f18\u5316\u751f\u4ea7\u6d41\u7a0b\u65b9\u9762\u4ecd\u662f\u4e00\u4e2a\u91cd\u5927\u6311\u6218\u3002\u8be5\u95ee\u9898\u6d89\u53ca\u6709\u6548\u5206\u914d\u4efb\u52a1\u5230\u6709\u9650\u6570\u91cf\u7684\u673a\u5668\u4e0a\uff0c\u4ee5\u6700\u5c0f\u5316\u603b\u5904\u7406\u65f6\u95f4\u6216\u4f5c\u4e1a\u5ef6\u8fdf\u7b49\u56e0\u7d20\u3002\u5c3d\u7ba1\u8fd1\u671f\u4eba\u5de5\u667a\u80fd\u9886\u57df\u7684\u8fdb\u6b65\u5df2\u7ecf\u4ea7\u751f\u4e86\u6709\u524d\u666f\u7684\u89e3\u51b3\u65b9\u6848\uff0c\u4f8b\u5982\u5f3a\u5316\u5b66\u4e60\u548c\u56fe\u795e\u7ecf\u7f51\u7edc\uff0c\u4f46\u672c\u6587\u63a2\u8ba8\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b(LLM)\u5728JSSP\u4e2d\u7684\u6f5c\u529b\u3002\u6211\u4eec\u9996\u6b21\u5f15\u5165\u4e86\u4e00\u4e2a\u4e13\u95e8\u4e3a\u8bad\u7ec3LLM\u8bbe\u8ba1\u7684120k\u6570\u636e\u96c6\uff0c\u4e13\u95e8\u9488\u5bf9JSSP\u3002\u4ee4\u4eba\u60ca\u8bb6\u7684\u662f\uff0c\u6211\u4eec\u7684\u53d1\u73b0\u8868\u660e\uff0c\u57fa\u4e8eLLM\u7684\u8c03\u5ea6\u53ef\u4ee5\u5b9e\u73b0\u4e0e\u5176\u5b83\u795e\u7ecf\u65b9\u6cd5\u76f8\u5f53\u7684\u6027\u80fd\u3002\u6b64\u5916\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u91c7\u6837\u65b9\u6cd5\uff0c\u4ee5\u63d0\u9ad8LLM\u5728\u89e3\u51b3JSSP\u65f6\u7684\u6709\u6548\u6027\u3002**|\n", "2408.06941": "|**2024-08-13**|**OpenResearcher: Unleashing AI for Accelerated Scientific Research**|Yuxiang Zheng 
et.al.|[2408.06941](http://arxiv.org/abs/2408.06941)|**[link](https://github.com/gair-nlp/openresearcher)**|**\u5feb\u901f\u53d1\u5c55\u7684\u79d1\u5b66\u6587\u732e\u5bf9\u7814\u7a76\u4eba\u5458\u5728\u5404\u81ea\u9886\u57df\u4fdd\u6301\u6700\u65b0\u8fdb\u5c55\u548c\u63a2\u7d22\u65b0\u9886\u57df\u5e26\u6765\u4e86\u91cd\u5927\u6311\u6218\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u5e73\u53f0\u2014\u2014OpenResearcher\uff0c\u5b83\u5229\u7528\u4eba\u5de5\u667a\u80fd\u6280\u672f\u52a0\u901f\u7814\u7a76\u8fc7\u7a0b\uff0c\u901a\u8fc7\u56de\u7b54\u7814\u7a76\u4eba\u5458\u7684\u591a\u79cd\u95ee\u9898\u6765\u5e2e\u52a9\u4ed6\u4eec\u3002OpenResearcher\u57fa\u4e8e\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u6784\u5efa\uff0c\u7ed3\u5408\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e0e\u7279\u5b9a\u9886\u57df\u7684\u6700\u65b0\u77e5\u8bc6\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u5404\u79cd\u5de5\u5177\uff0c\u4f7fOpenResearcher\u80fd\u591f\u7406\u89e3\u7814\u7a76\u4eba\u5458\u7684\u95ee\u9898\u3001\u4ece\u79d1\u5b66\u6587\u732e\u4e2d\u641c\u7d22\u3001\u7b5b\u9009\u68c0\u7d22\u5230\u7684\u4fe1\u606f\u3001\u63d0\u4f9b\u51c6\u786e\u5168\u9762\u7684\u7b54\u6848\uff0c\u5e76\u81ea\u6211\u4f18\u5316\u8fd9\u4e9b\u7b54\u6848\u3002OpenResearcher\u7075\u6d3b\u5730\u4f7f\u7528\u8fd9\u4e9b\u5de5\u5177\uff0c\u5728\u6548\u7387\u4e0e\u6709\u6548\u6027\u4e4b\u95f4\u627e\u5230\u5e73\u8861\u3002\u7ed3\u679c\uff0cOpenResearcher\u5e2e\u52a9\u7814\u7a76\u4eba\u5458\u8282\u7701\u65f6\u95f4\uff0c\u63d0\u9ad8\u4ed6\u4eec\u53d1\u73b0\u65b0\u89c1\u89e3\u548c\u63a8\u52a8\u79d1\u5b66\u7814\u7a76\u7a81\u7834\u7684\u6f5c\u529b\u3002\u6f14\u793a\u3001\u89c6\u9891\u548c\u4ee3\u7801\u53ef\u5728\u4ee5\u4e0b\u94fe\u63a5\u83b7\u53d6\uff1ahttps://github.com/GAIR-NLP/OpenResearcher\u3002**|\n", "2408.06929": "|**2024-08-13**|**Evaluating Cultural Adaptability of a Large Language Model via Simulation of Synthetic Personas**|Louis Kwok 
et.al.|[2408.06929](http://arxiv.org/abs/2408.06929)|**[link](https://github.com/louiskwoklf/llms-cultural-adaptability)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u591a\u6587\u5316\u73af\u5883\u4e2d\u7684\u6210\u529f\u53d6\u51b3\u4e8e\u5b83\u4eec\u7406\u89e3\u7528\u6237\u4e0d\u540c\u6587\u5316\u80cc\u666f\u7684\u80fd\u529b\u3002\u6211\u4eec\u901a\u8fc7\u8ba9LLM\u6a21\u62df\u4ee3\u8868\u5404\u79cd\u56fd\u7c4d\u7684\u4eba\u7c7b\u89d2\u8272\u8fdb\u884c\u95ee\u5377\u5f0f\u5fc3\u7406\u5b66\u5b9e\u9a8c\u6765\u8861\u91cf\u8fd9\u4e00\u80fd\u529b\u3002\u5177\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u4f7f\u7528GPT-3.5\u5bf9\u6765\u81ea15\u4e2a\u56fd\u5bb6\u76847,286\u540d\u53c2\u4e0e\u8005\u9605\u8bfb\u5e76\u56de\u5e94\u5177\u6709\u8bf4\u670d\u529b\u7684\u65b0\u95fb\u6587\u7ae0\u7684\u53cd\u5e94\u8fdb\u884c\u6a21\u62df\uff1b\u5e76\u5c06\u7ed3\u679c\u4e0e\u62e5\u6709\u76f8\u540c\u4eba\u53e3\u7edf\u8ba1\u7279\u5f81\u7684\u771f\u5b9e\u53c2\u4e0e\u8005\u6570\u636e\u96c6\u8fdb\u884c\u6bd4\u8f83\u3002\u6211\u4eec\u7684\u5206\u6790\u663e\u793a\uff0c\u660e\u786e\u6307\u5b9a\u4e00\u4e2a\u4eba\u7684\u5c45\u4f4f\u56fd\u53ef\u4ee5\u63d0\u9ad8GPT-3.5\u4e0e\u4ed6\u4eec\u7684\u53cd\u5e94\u7684\u4e00\u81f4\u6027\u3002\u76f8\u6bd4\u4e4b\u4e0b\uff0c\u4f7f\u7528\u6bcd\u8bed\u63d0\u793a\u5f15\u5165\u7684\u53d8\u5316\u663e\u8457\u964d\u4f4e\u4e86\u6574\u4f53\u4e00\u81f4\u6027\uff0c\u5e76\u4e14\u67d0\u4e9b\u8bed\u8a00\u7279\u522b\u5f71\u54cd\u4e86\u6027\u80fd\u3002\u8fd9\u4e9b\u53d1\u73b0\u8868\u660e\uff0c\u5c3d\u7ba1\u76f4\u63a5\u63d0\u4f9b\u56fd\u7c4d\u4fe1\u606f\u53ef\u4ee5\u589e\u5f3a\u6a21\u578b\u7684\u6587\u5316\u9002\u5e94\u6027\uff0c\u4f46\u4f7f\u7528\u6bcd\u8bed\u63d0\u793a\u5e76\u4e0d\u4e00\u5b9a\u80fd\u53ef\u9760\u5730\u63d0\u9ad8\u6a21\u62df\u51c6\u786e\u6027\uff0c\u53cd\u800c\u53ef\u80fd\u635f\u5bb3\u6a21\u578b\u7684\u6709\u6548\u6027\u3002|\n", "2408.06904": "|**2024-08-13**|**Re-TASK: Revisiting LLM Tasks from Capability, Skill, and Knowledge Perspectives**|Zhihu Wang 
et.al.|[2408.06904](http://arxiv.org/abs/2408.06904)|null|\u968f\u7740\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u6301\u7eed\u6269\u5c55\uff0c\u5b83\u4eec\u5728\u6027\u80fd\u4e0a\u7684\u589e\u5f3a\u5f80\u5f80\u4e0d\u8db3\u4ee5\u89e3\u51b3\u7279\u5b9a\u9886\u57df\u7684\u4efb\u52a1\u3002\u7cfb\u7edf\u6027\u5730\u5206\u6790\u8fd9\u4e9b\u5931\u8d25\u5e76\u6709\u6548\u63d0\u5347\u5176\u6027\u80fd\u4ecd\u7136\u662f\u4e00\u4e2a\u91cd\u5927\u6311\u6218\u3002\u672c\u6587\u63d0\u51fa\u4e86Re-TASK\u6846\u67b6\uff0c\u8fd9\u662f\u4e00\u79cd\u65b0\u9896\u7684\u7406\u8bba\u6a21\u578b\uff0c\u4ece\u80fd\u529b\u3001\u6280\u80fd\u3001\u77e5\u8bc6\u7684\u89d2\u5ea6\u91cd\u65b0\u5ba1\u89c6LLM\u4efb\u52a1\uff0c\u9075\u5faa\u5e03\u5362\u59c6\u5206\u7c7b\u6cd5\u548c\u77e5\u8bc6\u7a7a\u95f4\u7406\u8bba\u7684\u539f\u5219\u3002Re-TASK\u6846\u67b6\u63d0\u4f9b\u4e86\u4e00\u79cd\u7cfb\u7edf\u7684\u65b9\u6cd5\u6765\u6df1\u5316\u6211\u4eec\u5bf9LLM\u7684\u7406\u89e3\u3001\u8bc4\u4f30\u548c\u63d0\u5347\uff0c\u7279\u522b\u9488\u5bf9\u7279\u5b9a\u9886\u57df\u4efb\u52a1\u3002\u5b83\u63a2\u7d22\u4e86LLM\u7684\u80fd\u529b\u3001\u5904\u7406\u7684\u77e5\u8bc6\u4ee5\u53ca\u5e94\u7528\u7684\u6280\u80fd\u4e4b\u95f4\u7684\u76f8\u4e92\u4f5c\u7528\uff0c\u9610\u660e\u4e86\u8fd9\u4e9b\u5143\u7d20\u5982\u4f55\u76f8\u4e92\u5173\u8054\u5e76\u5f71\u54cd\u4efb\u52a1\u8868\u73b0\u3002 
\u901a\u8fc7\u5e94\u7528Re-TASK\u6846\u67b6\uff0c\u6211\u4eec\u63ed\u793a\u4e86\u8bb8\u591a\u7279\u5b9a\u9886\u57df\u4efb\u52a1\u5931\u8d25\u7684\u539f\u56e0\u4e3b\u8981\u5f52\u548e\u4e8e\u77e5\u8bc6\u4e0d\u8db3\u6216\u6280\u80fd\u9002\u5e94\u5ea6\u4e0d\u591f\u3002\u57fa\u4e8e\u8fd9\u4e00\u6d1e\u5bdf\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u7cfb\u5217\u7ed3\u6784\u5316\u7684\u7b56\u7565\u6765\u589e\u5f3aLLM\uff0c\u901a\u8fc7\u6709\u9488\u5bf9\u6027\u7684\u77e5\u8bc6\u6ce8\u5165\u548c\u6280\u80fd\u9002\u5e94\u3002\u5177\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u8bc6\u522b\u4e0e\u4efb\u52a1\u76f8\u5173\u7684\u5173\u952e\u80fd\u529b\u9879\uff0c\u5e76\u91c7\u7528\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u63d0\u793a\u7b56\u7565\u6765\u63d0\u5347\u4efb\u52a1\u6027\u80fd\uff0c\u4ece\u800c\u51cf\u5c11\u5927\u91cf\u5fae\u8c03\u7684\u9700\u6c42\u3002\u6216\u8005\uff0c\u6211\u4eec\u4f7f\u7528\u80fd\u529b\u7279\u5b9a\u6307\u4ee4\u5bf9LLM\u8fdb\u884c\u5fae\u8c03\uff0c\u8fdb\u4e00\u6b65\u9a8c\u8bc1\u4e86\u6846\u67b6\u7684\u6709\u6548\u6027\u3002\u5b9e\u9a8c\u7ed3\u679c\u8bc1\u5b9e\u4e86\u6846\u67b6\u7684\u6709\u6548\u6027\uff0c\u5c55\u793a\u4e86\u663e\u8457\u63d0\u9ad8LLM\u5728\u6027\u80fd\u548c\u9002\u7528\u6027\u65b9\u9762\u7684\u6548\u679c\u3002|\n", "2408.06874": "|**2024-08-13**|**Leveraging Language Models for Emotion and Behavior Analysis in Education**|Kaito Tanaka 
et.al.|[2408.06874](http://arxiv.org/abs/2408.06874)|null|Analyzing students' emotions and behaviors is crucial for improving learning outcomes and personalizing educational experiences. Traditional methods often rely on intrusive collection of visual and physiological data, which raises privacy concerns and limits scalability. This paper proposes a novel method that uses large language models (LLMs) and prompt engineering to analyze students' textual data. Our approach uses tailored prompts to guide LLMs in detecting emotional and engagement states, providing a non-intrusive, scalable solution. We ran experiments with Qwen, ChatGPT, Claude2, and GPT-4, comparing our method against base models and chain-of-thought (CoT) prompting. The results show that our method significantly outperforms the baselines in both accuracy and contextual understanding. This study highlights the potential of LLMs combined with prompt engineering to provide practical, effective tools for analyzing emotion and behavior in education.|\n", "2408.06854": "|**2024-08-13**|**LoRA$^2$ : Multi-Scale Low-Rank Approximations for Fine-Tuning Large Language Models**|Jia-Chen Zhang
et.al.|[2408.06854](http://arxiv.org/abs/2408.06854)|null|Fine-tuning large language models (LLMs) with high parameter efficiency for downstream tasks has become a new research direction. Low-Rank Adaptation (LoRA) significantly reduces the number of trainable parameters during fine-tuning. Although it performs well, tuning at a single scale may not be the optimal strategy for complex downstream tasks. This paper proposes a method that extends LoRA, called LoRA$^2$. First, drawing on orthogonal projection theory, we train two sets of LoRAs in mutually orthogonal planes. Then, we improve the importance score algorithm, which reduces parameter sensitivity score calculations by approximately 98.5%. Pruning the singular values with lower importance scores improves adaptability to a variety of downstream tasks.
We conducted extensive experiments on two widely used pre-trained models to validate the effectiveness of LoRA$^2$. The results show that it reduces the number of trainable parameters to only 0.72% of that required for full fine-tuning while still delivering impressive performance. Even when the parameters are further reduced to 0.17M, its results remain comparable to those of a baseline with 8 times more parameters. Our code is available here:|\n", "2408.06849": "|**2024-08-13**|**Causal Agent based on Large Language Model**|Kairong Han et.al.|[2408.06849](http://arxiv.org/abs/2408.06849)|**[link](https://github.com/kairong-han/causal_agent)**|**Large language models (LLMs) have achieved remarkable success in many domains. However, the inherent complexity of causal problems and causal theory makes them difficult to describe accurately in natural language, which hinders LLMs from understanding and using them effectively. Causal methods are not easily conveyed in natural language, which limits how accurately LLMs can apply them. In addition, causal datasets are typically tabular, whereas LLMs excel at handling natural language data; this structural mismatch impedes effective reasoning over tabular data. The lack of causal reasoning ability limits the development of LLMs.
To address these problems, we equip the LLM with causal tools inside an agent framework, named the \"Causal Agent\". The agent comprises tools, memory, and reasoning modules. In the tools module, the Causal Agent applies causal methods by aligning tabular data with natural language. In the reasoning module, the Causal Agent uses the ReAct framework to reason over multiple iterations with these tools. In the memory module, the Causal Agent maintains a dictionary instance whose keys are unique names and whose values are causal graphs. To verify the Causal Agent's causal ability, we established a benchmark consisting of four levels of causal problems: variable level, edge level, causal graph level, and causal effect level. We used ChatGPT-3.5 to generate a test dataset of 1,300 items covering these four levels and tested the Causal Agent on it. Our method is highly effective on all four levels of causal problems, with accuracy above 80% at every level. For further insights and implementation details, our code is available in the GitHub repository https://github.com/Kairong-Han/Causal_Agent.**|\n", "2408.07702": "|**2024-08-14**|**The
Death of Schema Linking? Text-to-SQL in the Age of Well-Reasoned Language Models**|Karime Maamari et.al.|[2408.07702](http://arxiv.org/abs/2408.07702)|null|Schema linking is a crucial step in Text-to-SQL pipelines, which translate natural language queries into SQL. The goal of schema linking is to retrieve relevant tables and columns (signal) while disregarding irrelevant ones (noise). However, imperfect schema linking can often exclude essential columns needed for accurate query generation. In this work, we revisit the need for schema linking when using the latest generation of large language models (LLMs). We find empirically that newer models are adept at identifying relevant schema elements during generation, without the need for explicit schema linking. This allows Text-to-SQL pipelines to bypass schema linking entirely and instead pass the full database schema to the LLM, eliminating the risk of excluding necessary information. Furthermore, as alternatives to schema linking, we propose techniques that improve Text-to-SQL accuracy without compromising on essential schema information. Our approach achieves 71.83\\% execution accuracy on the BIRD benchmark, ranking first at the time of submission.|\n", "2408.07666": "|**2024-08-15**|**Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities**|Enneng Yang et.al.|[2408.07666](http://arxiv.org/abs/2408.07666)|**[link](https://github.com/ennengyang/awesome-model-merging-methods-theories-applications)**|**Model merging is an efficient empowerment technique in the machine learning community that does not require the collection of raw training data and does not require expensive computation. As model merging becomes increasingly prevalent across various fields, it is crucial to understand the available model merging techniques comprehensively. However, there is a significant gap in the literature regarding a systematic and thorough review of these techniques. 
This survey provides a comprehensive overview of model merging methods and theories, their applications in various domains and settings, and future research directions. Specifically, we first propose a new taxonomic approach that exhaustively discusses existing model merging methods. Secondly, we discuss the application of model merging techniques in large language models, multimodal large language models, and 10+ machine learning subfields, including continual learning, multi-task learning, few-shot learning, etc. Finally, we highlight the remaining challenges of model merging and discuss future research directions. A comprehensive list of papers about model merging is available at \\url{https://github.com/EnnengYang/Awesome-Model-Merging-Methods-Theories-Applications}.**|\n", "2408.07665": "|**2024-08-14**|**Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models**|Yi-Cheng Lin et.al.|[2408.07665](http://arxiv.org/abs/2408.07665)|**[link](https://github.com/dlion168/spoken_stereoset)**|Warning: This paper may contain texts with uncomfortable content. Large Language Models (LLMs) have achieved remarkable performance in various tasks, including those involving multimodal data like speech. However, these models often exhibit biases due to the nature of their training data. Recently, more Speech Large Language Models (SLLMs) have emerged, underscoring the urgent need to address these biases. This study introduces Spoken Stereoset, a dataset specifically designed to evaluate social biases in SLLMs. By examining how different models respond to speech from diverse demographic groups, we aim to identify these biases. Our experiments reveal significant insights into their performance and bias levels. 
The findings indicate that while most models show minimal bias, some still exhibit slightly stereotypical or anti-stereotypical tendencies.|\n", "2408.07663": "|**2024-08-14**|**Alignment-Enhanced Decoding:Defending via Token-Level Adaptive Refining of Probability Distributions**|Quan Liu et.al.|[2408.07663](http://arxiv.org/abs/2408.07663)|**[link](https://github.com/gigabaozi/aed)**|**Large language models are susceptible to jailbreak attacks, which can result in the generation of harmful content. While prior defenses mitigate these risks by perturbing or inspecting inputs, they ignore competing objectives, the underlying cause of alignment failures. In this paper, we propose Alignment-Enhanced Decoding (AED), a novel defense that employs adaptive decoding to address the root causes of jailbreak issues. We first define the Competitive Index to quantify alignment failures and utilize feedback from self-evaluation to compute post-alignment logits. Then, AED adaptively combines AED and post-alignment logits with the original logits to obtain harmless and helpful distributions. Consequently, our method enhances safety alignment while maintaining helpfulness. We conduct experiments across five models and four common jailbreaks, with the results validating the effectiveness of our approach. Code is available at https://github.com/GIGABaozi/AED.git.**|\n", "2408.07611": "|**2024-08-14**|**WeKnow-RAG: An Adaptive Approach for Retrieval-Augmented Generation Integrating Web Search and Knowledge Graphs**|Weijian Xie et.al.|[2408.07611](http://arxiv.org/abs/2408.07611)|null|Large Language Models (LLMs) have greatly contributed to the development of adaptive intelligent agents and are positioned as an important way to achieve Artificial General Intelligence (AGI). 
However, LLMs are prone to produce factually incorrect information and often produce \"phantom\" content that undermines their reliability, which poses a serious challenge for their deployment in real-world scenarios. Enhancing LLMs by combining external databases and information retrieval mechanisms is an effective path. To address the above challenges, we propose a new approach called WeKnow-RAG, which integrates Web search and Knowledge Graphs into a \"Retrieval-Augmented Generation (RAG)\" system. First, the accuracy and reliability of LLM responses are improved by combining the structured representation of Knowledge Graphs with the flexibility of dense vector retrieval. WeKnow-RAG then utilizes domain-specific knowledge graphs to satisfy a variety of queries and domains, thereby improving performance on factual information and complex reasoning tasks by employing multi-stage web page retrieval techniques using both sparse and dense retrieval methods. Our approach effectively balances the efficiency and accuracy of information retrieval, thus improving the overall retrieval process. Finally, we also integrate a self-assessment mechanism for the LLM to evaluate the trustworthiness of the answers it generates. Our approach proves its outstanding effectiveness in a wide range of offline experiments and online submissions.|\n", "2408.07583": "|**2024-08-14**|**Transformers and Large Language Models for Efficient Intrusion Detection Systems: A Comprehensive Survey**|Hamza Kheddar et.al.|[2408.07583](http://arxiv.org/abs/2408.07583)|null|With significant advancements in Transformers LLMs, NLP has extended its reach into many research fields due to its enhanced capabilities in text generation and user interaction. One field benefiting greatly from these advancements is cybersecurity. 
In cybersecurity, many parameters that need to be protected and exchanged between senders and receivers are in the form of text and tabular data, making NLP a valuable tool in enhancing the security measures of communication protocols. This survey paper provides a comprehensive analysis of the utilization of Transformers and LLMs in cyber-threat detection systems. The methodology of paper selection and bibliometric analysis is outlined to establish a rigorous framework for evaluating existing research. The fundamentals of Transformers are discussed, including background information on various cyber-attacks and datasets commonly used in this field. The survey explores the application of Transformers in IDSs, focusing on different architectures such as Attention-based models, LLMs like BERT and GPT, CNN/LSTM-Transformer hybrids, emerging approaches like ViTs, among others. Furthermore, it explores the diverse environments and applications where Transformers and LLMs-based IDS have been implemented, including computer networks, IoT devices, critical infrastructure protection, cloud computing, SDN, as well as in autonomous vehicles. The paper also addresses research challenges and future directions in this area, identifying key issues such as interpretability, scalability, and adaptability to evolving threats, and more. 
Finally, the conclusion summarizes the findings and highlights the significance of Transformers and LLMs in enhancing cyber-threat detection capabilities, while also outlining potential avenues for further research and development.|\n", "2408.07543": "|**2024-08-15**|**MathScape: Evaluating MLLMs in multimodal Math Scenarios through a Hierarchical Benchmark**|Minxuan Zhou et.al.|[2408.07543](http://arxiv.org/abs/2408.07543)|**[link](https://github.com/PKU-Baichuan-MLSystemLab/MathScape)**|With the development of Multimodal Large Language Models (MLLMs), the evaluation of multimodal models in the context of mathematical problems has become a valuable research field. Multimodal visual-textual mathematical reasoning serves as a critical indicator for evaluating the comprehension and complex multi-step quantitative reasoning abilities of MLLMs. However, previous multimodal math benchmarks have not sufficiently integrated visual and textual information. To address this gap, we proposed MathScape, a new benchmark that emphasizes the understanding and application of combined visual and textual information. MathScape is designed to evaluate photo-based math problem scenarios, assessing the theoretical understanding and application ability of MLLMs through a categorical hierarchical approach. We conduct a multi-dimensional evaluation on 11 advanced MLLMs, revealing that our benchmark is challenging even for the most sophisticated models. By analyzing the evaluation results, we identify the limitations of MLLMs, offering valuable insights for enhancing model performance.|\n", "2408.07537": "|**2024-08-15**|**Usefulness of data flow diagrams and large language models for security threat validation: a registered report**|Winnie Bahati Mbaka et.al.|[2408.07537](http://arxiv.org/abs/2408.07537)|null|The arrival of recent cybersecurity standards has raised the bar for security assessments in organizations, but existing techniques don't always scale well. 
Threat analysis and risk assessment are used to identify security threats for new or refactored systems. Still, there is a lack of definition-of-done, so identified threats have to be validated, which slows down the analysis. Existing literature has focused on the overall performance of threat analysis, but no previous work has investigated how deeply analysts must dig into the material before they can effectively validate the identified security threats. We propose a controlled experiment with practitioners to investigate whether some analysis material (like LLM-generated advice) is better than none and whether more material (the system's data flow diagram and LLM-generated advice) is better than some material. In addition, we present key findings from running a pilot with 41 MSc students, which are used to improve the study design. Finally, we also provide an initial replication package, including experimental material and data analysis scripts, and a plan to extend it to include new materials based on the final data collection campaign with practitioners (e.g., pre-screening questions).|\n", "2408.07531": "|**2024-08-14**|**Development of a Multi-Agent Clinical Decision Support System for Korean Triage and Acuity Scale (KTAS)-Based Triage and Treatment Planning in Emergency Departments**|Seungjun Han et.al.|[2408.07531](http://arxiv.org/abs/2408.07531)|null|Emergency department (ED) overcrowding and the complexity of rapid decision-making in critical care settings pose significant challenges to healthcare systems worldwide. While clinical decision support systems (CDSS) have shown promise, the integration of large language models (LLMs) offers new possibilities for enhancing triage accuracy and clinical decision-making. This study presents an LLM-driven CDSS designed to assist ED physicians and nurses in patient triage, treatment planning, and overall emergency care management.
We developed a multi-agent CDSS utilizing Llama-3-70b as the base LLM, orchestrated by CrewAI and Langchain. The system comprises four AI agents emulating key ED roles: Triage Nurse, Emergency Physician, Pharmacist, and ED Coordinator. It incorporates the Korean Triage and Acuity Scale (KTAS) for triage assessment and integrates with the RxNorm API for medication management. The model was evaluated using the Asclepius dataset, with performance assessed by a clinical emergency medicine specialist. The CDSS demonstrated high accuracy in triage decision-making compared to the baseline of a single-agent system. Furthermore, the system exhibited strong performance in critical areas, including primary diagnosis, critical findings identification, disposition decision-making, treatment planning, and resource allocation. Our multi-agent CDSS demonstrates significant potential for supporting comprehensive emergency care management. By leveraging state-of-the-art AI technologies, this system offers a scalable and adaptable tool that could enhance emergency medical care delivery, potentially alleviating ED overcrowding and improving patient outcomes. This work contributes to the growing field of AI applications in emergency medicine and offers a promising direction for future research and clinical implementation.|\n", "2408.07505": "|**2024-08-14**|**Large Language Models Know What Makes Exemplary Contexts**|Quanyu Long et.al.|[2408.07505](http://arxiv.org/abs/2408.07505)|null|In-context learning (ICL) has proven to be a significant capability with the advancement of Large Language models (LLMs). By instructing LLMs using few-shot demonstrative examples, ICL enables them to perform a wide range of tasks without needing to update millions of parameters. 
This paper presents a unified framework for LLMs that allows them to self-select influential in-context examples to compose their contexts; self-rank candidates with different demonstration compositions; self-optimize the demonstration selection and ordering through reinforcement learning. Specifically, our method designs a parameter-efficient retrieval head that generates the optimized demonstration after training with rewards from LLM's own preference. Experimental results validate the proposed method's effectiveness in enhancing ICL performance. Additionally, our approach effectively identifies and selects the most representative examples for the current task, and includes more diversity in retrieval.|\n", "2408.08313": "|**2024-08-15**|**Can Large Language Models Understand Symbolic Graphics Programs?**|Zeju Qiu et.al.|[2408.08313](http://arxiv.org/abs/2408.08313)|null|Assessing the capabilities of large language models (LLMs) is often challenging, in part, because it is hard to find tasks to which they have not been exposed during training. We take one step to address this challenge by turning to a new task: focusing on symbolic graphics programs, which are a popular representation for graphics content that procedurally generates visual data. LLMs have shown exciting promise towards program synthesis, but do they understand symbolic graphics programs? Unlike conventional programs, symbolic graphics programs can be translated to graphics content. Here, we characterize an LLM's understanding of symbolic programs in terms of their ability to answer questions related to the graphics content. This task is challenging as the questions are difficult to answer from the symbolic programs alone -- yet, they would be easy to answer from the corresponding graphics content as we verify through a human experiment. 
To understand symbolic programs, LLMs may need to possess the ability to imagine how the corresponding graphics content would look without directly accessing the rendered visual content. We use this task to evaluate LLMs by creating a large benchmark for the semantic understanding of symbolic graphics programs. This benchmark is built via program-graphics correspondence, hence requiring minimal human effort. We evaluate current LLMs on our benchmark to elucidate a preliminary assessment of their ability to reason about visual scenes from programs. We find that this task distinguishes existing LLMs, and that models considered good at reasoning perform better. Lastly, we introduce Symbolic Instruction Tuning (SIT) to improve this ability. Specifically, we query GPT4-o with questions and images generated by symbolic programs. Such data are then used to finetune an LLM. We also find that SIT data can improve the general instruction-following ability of LLMs.|\n", "2408.08310": "|**2024-08-15**|**ScalingFilter: Assessing Data Quality through Inverse Utilization of Scaling Laws**|Ruihang Li et.al.|[2408.08310](http://arxiv.org/abs/2408.08310)|null|High-quality data is crucial for the pre-training performance of large language models. Unfortunately, existing quality filtering methods rely on a known high-quality dataset as reference, which can introduce potential bias and compromise diversity. In this paper, we propose ScalingFilter, a novel approach that evaluates text quality based on the perplexity difference between two language models trained on the same data, thereby eliminating the influence of the reference dataset in the filtering process. A theoretical analysis shows that ScalingFilter is equivalent to an inverse utilization of scaling laws. Through training models with 1.3B parameters on the same data source processed by various quality filters, we find ScalingFilter can improve zero-shot performance of pre-trained models in downstream tasks.
To assess the bias introduced by quality filtering, we introduce semantic diversity, a metric of utilizing text embedding models for semantic representations. Extensive experiments reveal that semantic diversity is a reliable indicator of dataset diversity, and ScalingFilter achieves an optimal balance between downstream performance and semantic diversity.|\n", "2408.08302": "|**2024-08-15**|**Benchmarking the Capabilities of Large Language Models in Transportation System Engineering: Accuracy, Consistency, and Reasoning Behaviors**|Usman Syed et.al.|[2408.08302](http://arxiv.org/abs/2408.08302)|null|In this paper, we explore the capabilities of state-of-the-art large language models (LLMs) such as GPT-4, GPT-4o, Claude 3.5 Sonnet, Claude 3 Opus, Gemini 1.5 Pro, Llama 3, and Llama 3.1 in solving some selected undergraduate-level transportation engineering problems. We introduce TransportBench, a benchmark dataset that includes a sample of transportation engineering problems on a wide range of subjects in the context of planning, design, management, and control of transportation systems. This dataset is used by human experts to evaluate the capabilities of various commercial and open-sourced LLMs, especially their accuracy, consistency, and reasoning behaviors, in solving transportation engineering problems. Our comprehensive analysis uncovers the unique strengths and limitations of each LLM, e.g. our analysis shows the impressive accuracy and some unexpected inconsistent behaviors of Claude 3.5 Sonnet in solving TransportBench problems. Our study marks a thrilling first step toward harnessing artificial general intelligence for complex transportation challenges.|\n", "2408.08300": "|**2024-08-15**|**HELP: Hierarchical Embeddings-based Log Parsing**|Andy Xu et.al.|[2408.08300](http://arxiv.org/abs/2408.08300)|null|Logs are a first-hand source of information for software maintenance and failure diagnosis. 
Log parsing, which converts semi-structured log messages into structured templates, is a prerequisite for automated log analysis tasks such as anomaly detection, troubleshooting, and root cause analysis. However, existing log parsers fail in real-world systems for three main reasons. First, traditional heuristics-based parsers require handcrafted features and domain knowledge, which are difficult to generalize at scale. Second, existing large language model-based parsers rely on periodic offline processing, limiting their effectiveness in real-time use cases. Third, existing online parsing algorithms are susceptible to log drift, where slight log changes create false positives that drown out real anomalies. To address these challenges, we propose HELP, a Hierarchical Embeddings-based Log Parser. HELP is the first online semantic-based parser to leverage LLMs for performant and cost-effective log parsing. We achieve this through a novel hierarchical embeddings module, which fine-tunes a text embedding model to cluster logs before parsing, reducing querying costs by multiple orders of magnitude. To combat log drift, we also develop an iterative rebalancing module, which periodically updates existing log groupings. We evaluate HELP extensively on 14 public large-scale datasets, showing that HELP achieves significantly higher F1-weighted grouping and parsing accuracy than current state-of-the-art online log parsers. We also implement HELP into Iudex's production observability platform, confirming HELP's practicality in a production environment. 
Our results show that HELP is effective and efficient for high-throughput real-world log parsing.|\n", "2408.08291": "|**2024-08-15**|**The ShareLM Collection and Plugin: Contributing Human-Model Chats for the Benefit of the Community**|Shachar Don-Yehiya et.al.|[2408.08291](http://arxiv.org/abs/2408.08291)|null|Human-model conversations provide a window into users' real-world scenarios, behavior, and needs, and thus are a valuable resource for model development and research. While for-profit companies collect user data through the APIs of their models, using it internally to improve their own models, the open source and research community lags behind. We introduce the ShareLM collection, a unified set of human conversations with large language models, and its accompanying plugin, a Web extension for voluntarily contributing user-model conversations. Where few platforms share their chats, the ShareLM plugin adds this functionality, thus allowing users to share conversations from most platforms. The plugin allows the user to rate their conversations, both at the conversation and the response levels, and delete conversations they prefer to keep private before they ever leave the user's local storage. We release the plugin conversations as part of the ShareLM collection, and call for more community effort in the field of open human-model data. The code, plugin, and data are available.|\n", "2408.08282": "|**2024-08-15**|**Autonomous Behavior Planning For Humanoid Loco-manipulation Through Grounded Language Model**|Jin Wang et.al.|[2408.08282](http://arxiv.org/abs/2408.08282)|null|Enabling humanoid robots to autonomously perform loco-manipulation in unstructured environments is crucial and highly challenging for achieving embodied intelligence. This involves robots being able to plan their actions and behaviors in long-horizon tasks while using multi-modality to perceive deviations between task execution and high-level planning.
Recently, large language models (LLMs) have demonstrated powerful planning and reasoning capabilities for comprehension and processing of semantic information through robot control tasks, as well as the usability of analytical judgment and decision-making for multi-modal inputs. To leverage the power of LLMs towards humanoid loco-manipulation, we propose a novel language-model based framework that enables robots to autonomously plan behaviors and low-level execution under given textual instructions, while observing and correcting failures that may occur during task execution. To systematically evaluate this framework in grounding LLMs, we created the robot 'action' and 'sensing' behavior library for task planning, and conducted mobile manipulation tasks and experiments in both simulated and real environments using the CENTAURO robot, and verified the effectiveness and application of this approach in robotic tasks with autonomous behavioral planning.|\n", "2408.08274": "|**2024-08-15**|**BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts**|Qizhen Zhang et.al.|[2408.08274](http://arxiv.org/abs/2408.08274)|null|The Mixture of Experts (MoE) framework has become a popular architecture for large language models due to its superior performance over dense models. However, training MoEs from scratch in a large-scale regime is prohibitively expensive. Existing methods mitigate this by pre-training multiple dense expert models independently and using them to initialize an MoE. This is done by using experts' feed-forward network (FFN) to initialize the MoE's experts while merging other parameters. However, this method limits the reuse of dense model parameters to only the FFN layers, thereby constraining the advantages when \"upcycling\" these models into MoEs. We propose BAM (Branch-Attend-Mix), a simple yet effective method that addresses this shortcoming. 
BAM makes full use of specialized dense models by not only using their FFN to initialize the MoE layers but also leveraging experts' attention parameters fully by initializing them into a soft variant of Mixture of Attention (MoA) layers. We explore two methods for upcycling attention parameters: 1) initializing separate attention experts from dense models including all attention parameters for the best model performance; and 2) sharing key and value parameters across all experts to facilitate better inference efficiency. To further improve efficiency, we adopt a parallel attention transformer architecture to MoEs, which allows the attention experts and FFN experts to be computed concurrently. Our experiments on seed models ranging from 590 million to 2 billion parameters demonstrate that BAM surpasses baselines in both perplexity and downstream task performance, within the same computational and data constraints.|\n", "2408.08231": "|**2024-08-15**|**DaRec: A Disentangled Alignment Framework for Large Language Model and Recommender System**|Xihong Yang et.al.|[2408.08231](http://arxiv.org/abs/2408.08231)|null|Benefiting from their strong reasoning capabilities, large language models (LLMs) have demonstrated remarkable performance in recommender systems. Various efforts have been made to distill knowledge from LLMs to enhance collaborative models, employing techniques like contrastive learning for representation alignment. In this work, we prove that directly aligning the representations of LLMs and collaborative models is sub-optimal for enhancing downstream recommendation task performance, based on the information theorem. Consequently, the challenge of effectively aligning semantic representations between collaborative models and LLMs remains unresolved. Inspired by this viewpoint, we propose a novel plug-and-play alignment framework for LLMs and collaborative models. 
Specifically, we first disentangle the latent representations of both LLMs and collaborative models into specific and shared components via projection layers and representation regularization. Subsequently, we perform both global and local structure alignment on the shared representations to facilitate knowledge transfer. Additionally, we theoretically prove that the specific and shared representations contain more pertinent and less irrelevant information, which can enhance the effectiveness of downstream recommendation tasks. Extensive experimental results on benchmark datasets demonstrate that our method is superior to existing state-of-the-art algorithms.|\n", "2408.08217": "|**2024-08-15**|**RED-CT: A Systems Design Methodology for Using LLM-labeled Data to Train and Deploy Edge Classifiers for Computational Social Science**|David Farr et.al.|[2408.08217](http://arxiv.org/abs/2408.08217)|null|Large language models (LLMs) have enhanced our ability to rapidly analyze and classify unstructured natural language data. However, concerns regarding cost, network limitations, and security constraints have posed challenges for their integration into work processes. In this study, we adopt a systems design approach to employing LLMs as imperfect data annotators for downstream supervised learning tasks, introducing novel system intervention measures aimed at improving classification performance. Our methodology outperforms LLM-generated labels in seven of eight tests, demonstrating an effective strategy for incorporating LLMs into the design and deployment of specialized, supervised learning models present in many industry use cases.|\n", "2408.08210": "|**2024-08-15**|**Does Reasoning Emerge? 
Examining the Probabilities of Causation in Large Language Models**|Javier Gonz\u00e1lez et.al.|[2408.08210](http://arxiv.org/abs/2408.08210)|null|Recent advances in AI have been significantly driven by the capabilities of large language models (LLMs) to solve complex problems in ways that resemble human thinking. However, there is an ongoing debate about the extent to which LLMs are capable of actual reasoning. Central to this debate are two key probabilistic concepts that are essential for connecting causes to their effects: the probability of necessity (PN) and the probability of sufficiency (PS). This paper introduces a framework that is both theoretical and practical, aimed at assessing how effectively LLMs are able to replicate real-world reasoning mechanisms using these probabilistic measures. By viewing LLMs as abstract machines that process information through a natural language interface, we examine the conditions under which it is possible to compute suitable approximations of PN and PS. 
Our research marks an important step towards gaining a deeper understanding of when LLMs are capable of reasoning, as illustrated by a series of math examples.|\n", "2408.08869": "|**2024-08-16**|**PEDAL: Enhancing Greedy Decoding with Large Language Models using Diverse Exemplars**|Sumanth Prabhu et.al.|[2408.08869](http://arxiv.org/abs/2408.08869)|null|Self-ensembling techniques such as Self-Consistency, which depend on an accurate answer-extraction process, have delivered notable accuracy gains for large language models (LLMs). However, these techniques incur a high inference cost when aggregating multiple outputs, because they generate considerably more output tokens than greedy decoding. Research shows that the free-text outputs produced by Self-Consistency can be reliably aggregated by an LLM to produce the final output. In addition, recent advances in LLM inference show that using diverse exemplars in prompts can induce diversity in LLM outputs. These proven techniques can be readily extended to self-ensembling approaches to improve overall text-generation performance. 
This paper introduces PEDAL (Prompts based on Exemplar Diversity Aggregated using LLMs), a hybrid self-ensembling approach that combines the strengths of diverse-exemplar prompting and LLM-based aggregation to improve performance. Experiments on the public SVAMP and ARC datasets show that PEDAL achieves better accuracy than greedy-decoding-based strategies, at a lower inference cost than Self-Consistency-based approaches.|\n", "2408.08862": "|**2024-08-16**|**Visual Agents as Fast and Slow Thinkers**|Guangyan Sun et.al.|[2408.08862](http://arxiv.org/abs/2408.08862)|**[link](https://github.com/guangyans/sys2-llava)**|Achieving human-level intelligence requires refining the cognitive distinction between System 1 and System 2 thinking. Current AI, particularly AI built on large language models, exhibits human-like traits but falls short of genuine cognition. When moving from structured benchmarks to real-world scenarios, visual agents struggle and often give responses that are both inaccurate and overconfident. To address this, we introduce FaST (Fast and Slow Thinking), which incorporates a fast and slow thinking mechanism into visual agents. 
FaST uses a switch adapter to dynamically select between System 1 and System 2 modes, tailoring the problem-solving approach to the complexity of the task. It handles uncertain and unseen objects flexibly by adjusting model confidence and integrating new contextual data. We advocate a flexible system, hierarchical reasoning capabilities, and a transparent decision-making pipeline, all of which enable FaST to emulate human cognitive processes in visual intelligence. Experiments show that FaST reaches 80.8% accuracy on visual question answering (VQA^{v2}) and a 48.7% GIoU score on reasoning segmentation (ReasonSeg), demonstrating its strong performance. Extensive testing validates the efficacy and robustness of FaST's core components and shows its potential to advance cognitive visual agents in AI systems.|\n", "2408.08849": "|**2024-08-16**|**ECG-Chat: A Large ECG-Language Model for Cardiac Disease Diagnosis**|Yubao Zhao 
et.al.|[2408.08849](http://arxiv.org/abs/2408.08849)|null|In medical assistance, the success of multimodal large language models (MLLMs) shows great potential, allowing patients to hold conversations grounded in physiological signal data. However, general-purpose MLLMs perform poorly at cardiac disease diagnosis, especially at integrating ECG data analysis with long-form medical report generation, mainly because of the complexity of ECG data analysis and the gap between the text and ECG signal modalities. In addition, models often suffer from severe stability problems in long-text generation, largely because they lack precise knowledge closely related to user queries. 
To address these problems, we propose ECG-Chat, a multitask MLLM focused on ECG medical report generation that provides cross-modal conversational capabilities grounded in cardiology knowledge. We use contrastive learning to combine ECG waveform data with text reports, aligning ECG features with report content in a fine-grained manner. This approach also yields an ECG encoder that excels at zero-shot report retrieval. Furthermore, by extending existing datasets, we construct a 19K ECG diagnosis dataset and a 25K multi-turn dialogue dataset for training and fine-tuning ECG-Chat, giving it professional diagnostic and conversational capabilities. ECG-Chat can also generate comprehensive ECG analysis reports through an automated LaTeX generation pipeline. We establish a benchmark for the ECG report generation task and test our model against multiple baselines. ECG-Chat achieves the best performance on classification, retrieval, multimodal dialogue, and medical report generation tasks. Our report template design was also unanimously endorsed by medical practitioners.|\n", "2408.08848": "|**2024-08-16**|**PsychoLex: Unveiling the Psychological Mind of Large Language Models**|Mohammad Amin Abbasi 
et.al.|[2408.08848](http://arxiv.org/abs/2408.08848)|null|This paper explores the intersection of psychology and artificial intelligence by developing and evaluating large language models (LLMs) specialized for psychological tasks. We introduce the PsychoLex suite, designed to strengthen LLMs' handling of psychological tasks in both Persian and English. Key contributions include the PsychoLexQA dataset for instructional content and the PsychoLexEval dataset for rigorous evaluation of LLMs in complex psychological scenarios. We also present the PsychoLexLLaMA model, which is optimized specifically for psychological applications and clearly outperforms general-purpose models. The findings underscore the potential of tailored LLMs for advancing psychological research and applications, while also identifying areas for further refinement. This research lays the groundwork for integrating LLMs into specialized psychological domains, with important implications for the future development of AI-driven psychological practice.|\n", "2408.08841": "|**2024-08-16**|**FLEXTAF: Enhancing Table Reasoning with Flexible Tabular Formats**|Xuanliang Zhang et.al.|[2408.08841](http://arxiv.org/abs/2408.08841)|**[link](https://github.com/zhxlia/FLEXTAF)**|**
The table reasoning task aims to answer a question based on a given table. Currently, using large language models (LLMs) is the predominant approach to table reasoning. Most existing methods represent the table in a fixed tabular format, which can limit performance. Because each instance requires different capabilities and models have different abilities, we argue that different instances and models suit different tabular formats. A quantitative analysis of experimental results supports this claim: different instances and models achieve different performance with different tabular formats. Building on this, we propose FLEXTAF-Single and FLEXTAF-Vote, which improve table reasoning performance by using flexible tabular formats. Specifically, (i) FLEXTAF-Single trains a classifier to predict the most suitable tabular format based on the instance and the LLM. (ii) 
FLEXTAF-Vote integrates the results across different formats. Our experiments on WikiTableQuestions and TabFact show significant improvements, with average gains of 2.3% and 4.8% over the best performance achieved with a fixed tabular format under greedy decoding and self-consistency decoding, thereby validating the effectiveness of our methods.**|\n", "2408.08811": "|**2024-08-16**|**Artificial Intelligence and Strategic Decision-Making: Evidence from Entrepreneurs and Investors**|Felipe A. Csaszar et.al.|[2408.08811](http://arxiv.org/abs/2408.08811)|null|This paper explores how artificial intelligence (AI) affects the strategic decision-making process in firms. We illustrate with examples how AI can augment existing strategic decision-making tools, and we provide empirical evidence from a leading accelerator program and a startup competition showing that current large language models (LLMs) can generate and evaluate strategies at a level comparable to entrepreneurs and investors. We then analyze the key cognitive processes underlying strategic decision-making (search, representation, and aggregation) and argue that AI has the potential to improve the speed, quality, and scale of strategic analysis while also enabling new approaches such as virtual strategy simulations. However, AI's impact on firm development ultimately depends on competitive dynamics and on how 
AI capabilities develop. We propose a framework that connects the use of AI in strategic decision-making to firm outcomes and discuss how AI may reshape the sources of competitive advantage. Finally, we consider how AI can both support and challenge core tenets of the theory-based view of strategy. Overall, our work maps out an emerging research frontier at the intersection of AI and strategy.|\n", "2408.08808": "|**2024-08-16**|**Constructing Domain-Specific Evaluation Sets for LLM-as-a-judge**|Ravi Raju et.al.|[2408.08808](http://arxiv.org/abs/2408.08808)|null|Large language models (LLMs) have revolutionized machine learning, yet existing benchmarks often fail to capture the diverse behavior of these models in real-world applications. A benchmark's value lies in its ability to clearly distinguish models of different capability levels (separability) and in how closely it aligns with human preferences. Current frameworks such as Alpaca-Eval 2.0 LC \\cite{dubois2024lengthcontrolledalpacaevalsimpleway} and Arena-Hard v0.1 
\\cite{li2024crowdsourced} focus mainly on general-purpose queries and lack diversity across domains such as law and medicine. This paper addresses these limitations by introducing a novel data pipeline that curates diverse, domain-specific evaluation sets tailored to LLM-as-a-Judge frameworks. Our approach combines manual curation, semi-supervised learning to generate clusters, and stratified sampling to ensure balanced representation across a wide range of domains and languages. The resulting evaluation set comprises 1573 samples across 14 categories and shows high separability (84%) across ten top-ranked models, together with strong agreement with Chatbot Arena (84%) and a Spearman correlation of 0.915. The agreement value is 9% higher than that of AlpacaEval 2.0 LC and 20% higher than that of Arena 
Hard, and the Spearman coefficient is 0.7 higher than that of the next-best benchmark, indicating a significant improvement in the usefulness of the benchmark. We also provide an open-source evaluation tool that lets users define their own categories for fine-grained analysis, giving practitioners valuable insights. This work contributes to making LLM evaluation methods more transparent, diverse, and effective.|\n", "2408.08782": "|**2024-08-16**|**EmoDynamiX: Emotional Support Dialogue Strategy Prediction by Modelling MiXed Emotions and Discourse Dynamics**|Chenwei Wan et.al.|[2408.08782](http://arxiv.org/abs/2408.08782)|**[link](https://github.com/cw-wan/EmoDynamiX-v2)**|**Designing emotionally intelligent conversational systems that offer comfort and advice to people experiencing distress is a compelling area of research. Previous efforts have focused on building modular dialogue systems that treat socio-emotional strategy prediction as an auxiliary task and generate strategy-conditioned responses with customized decoders. Recently, advances in large language models (LLMs) have popularized end-to-end dialogue agents that have no explicit socio-emotional strategy prediction step. 
Although these agents excel at language generation, recent studies show that LLMs' inherent preference bias toward certain socio-emotional strategies hinders the delivery of high-quality emotional support. To address this challenge, we propose decoupling strategy prediction from language generation and introduce EmoDynamiX, a novel dialogue strategy predictor that uses a heterogeneous graph to model the discourse dynamics between user emotions and system strategies. In addition, we make use of the Emotion Recognition in Conversations (ERC) task and design a flexible mixed-emotion module to capture users' fine-grained emotional states. Experimental results on two ESC datasets show that EmoDynamiX significantly outperforms previous state-of-the-art methods.**|\n", "2408.08780": "|**2024-08-16**|**Large Language Models Might Not Care What You Are Saying: Prompt Format Beats Descriptions**|Chenming Tang 
et.al.|[2408.08780](http://arxiv.org/abs/2408.08780)|null|With the help of in-context learning (ICL), large language models (LLMs) have achieved impressive performance across a variety of tasks. However, the role of descriptive instructions during ICL remains under-explored. This work proposes an ensemble prompt framework that describes the selection criteria of multiple in-context examples, and preliminary experiments on machine translation (MT) across six translation directions show that this framework improves ICL performance. Surprisingly, LLMs may not care what the descriptions actually say: the performance gain comes mainly from the ensemble format, and the framework yields improvements even with random descriptive nouns. We further apply this new ensemble prompt to commonsense, math, logical reasoning, and hallucination tasks with three LLMs and obtain promising results, which again suggests that designing a proper prompt format is more effective and efficient than focusing on specific descriptions. Our code will be made publicly available once the paper is published.|\n", "2408.08779": "|**2024-08-16**|**DAC: Decomposed Automation Correction for Text-to-SQL**|Dingzirui Wang 
et.al.|[2408.08779](http://arxiv.org/abs/2408.08779)|**[link](https://github.com/zirui-HIT/DAC)**|**Text-to-SQL is an important task that helps people obtain information from databases by automatically generating SQL queries. Given their strong performance, approaches based on large language models (LLMs) have become the mainstream for text-to-SQL. Among these approaches, automated correction is an effective technique that further improves performance by fixing mistakes in the generated results. Existing correction methods require the LLM to correct the generated SQL directly, yet previous research shows that LLMs do not know how to detect mistakes, which leads to poor performance. In this paper, we therefore propose using decomposed correction to improve text-to-SQL performance. We first show that decomposed correction outperforms direct correction, because detecting and fixing mistakes using the results of decomposed sub-tasks is easier than doing so with the SQL itself. Based on this analysis, we introduce Decomposed Automation Correction (DAC), which corrects SQL by decomposing text-to-SQL into two sub-tasks: entity linking and skeleton parsing. DAC first generates the entities and skeleton corresponding to the question, then 
uses the differences between the initial SQL and the generated entities and skeleton as feedback for correction. Experimental results show that, compared with the baseline method, our method improves average performance on Spider, Bird, and KaggleDBQA by 3.7%, demonstrating the effectiveness of DAC.**|\n", "2408.10197": "|**2024-08-19**|**Demystifying the Communication Characteristics for Distributed Transformer Models**|Quentin Anthony et.al.|[2408.10197](http://arxiv.org/abs/2408.10197)|null|Deep learning (DL) models based on the transformer architecture have revolutionized many DL applications, including large language models (LLMs), vision transformers, audio generation, and time series prediction. Much of this progress has been driven by distributed training, yet distributed communication remains a major bottleneck to training progress. This paper examines the communication behavior of transformer models, that is, how the different parallelism schemes used in multi-node/multi-GPU 
DL training communicate data in the context of transformers. We use GPT-based language models as the main case study of the transformer architecture because of their widespread use. We validate the empirical results obtained from our communication logs using analytical models. Overall, our analysis reveals a need to further optimize small-message point-to-point communication; correlations between sequence length, per-GPU throughput, model size, and the optimizations used; and directions where further optimization of framework and HPC middleware design may need to be guided.|\n", "2408.10174": "|**2024-08-19**|**SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models**|Anke Tang 
et.al.|[2408.10174](http://arxiv.org/abs/2408.10174)|**[link](https://github.com/tanganke/fusion_bench)**|**Training deep models on large-scale datasets is becoming increasingly costly, which has prompted the widespread adoption of deep model fusion techniques that reuse knowledge from existing models. From simple weight averaging to more sophisticated methods such as AdaMerging, model fusion effectively improves model performance and accelerates the development of new models. However, interference between the parameters of individual models and the limited interpretability of the fusion process remain challenges. Existing methods often try to resolve parameter interference by evaluating parameter attributes, such as magnitude or sign, or by pruning parameters. In this study, we first examine the fine-tuning of linear layers through subspace analysis and explicitly define parameter interference as an optimization problem to shed light on the subject. We then introduce an innovative model fusion approach called zero-shot Sparse Mixture of Low-Rank Experts (SMILE) construction, which upscales source models into a Mixture of Experts (MoE) model without extra data or further training. Our approach builds on the observation that fine-tuning mostly 
preserves the important parts of pre-training while using less important or unused regions to adapt to new tasks. In addition, the parameter interference that is inherent in the original parameter space can be managed by expanding the dimensions. We conduct extensive experiments across diverse scenarios, including image classification and text generation tasks, using both full fine-tuning and LoRA fine-tuning, and we apply our method to large models (CLIP models, Flan-T5 models, and Mistral-7B models), highlighting the adaptability and scalability of SMILE. Code is available at https://github.com/tanganke/fusion_bench**|\n", "2408.10159": "|**2024-08-19**|**Customizing Language Models with Instance-wise LoRA for Sequential Recommendation**|Xiaoyu Kong 
et.al.|[2408.10159](http://arxiv.org/abs/2408.10159)|null|\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u77e5\u8bc6\u7406\u89e3\u548c\u63a8\u7406\u65b9\u9762\u7684\u4f18\u52bf\uff0c\u8fd1\u671f\u7684\u7814\u7a76\u901a\u8fc7\u8bed\u8a00\u751f\u6210\u8303\u5f0f\u5c06LLM\u5e94\u7528\u4e8e\u5e8f\u5217\u63a8\u8350\u7cfb\u7edf\u4e2d\u3002\u8fd9\u4e9b\u65b9\u6cd5\u5c06\u7528\u6237\u884c\u4e3a\u5e8f\u5217\u8f6c\u6362\u4e3aLLM\u5fae\u8c03\u7684\u63d0\u793a\uff0c\u5229\u7528LoRA\u6a21\u5757\u6765\u7ec6\u5316\u63a8\u8350\u3002\u7136\u800c\uff0c\u5728\u4e0d\u540c\u7528\u6237\u884c\u4e3a\u4e4b\u95f4\u8fdb\u884c\u7edf\u4e00\u5e94\u7528\u65f6\uff0cLoRA\u6709\u65f6\u65e0\u6cd5\u6355\u6349\u5230\u4e2a\u4f53\u5dee\u5f02\u6027\uff0c\u5bfc\u81f4\u6027\u80fd\u4e0d\u4f73\u4ee5\u53ca\u5728\u4e0d\u540c\u884c\u4e3a\u5e8f\u5217\u95f4\u7684\u8d1f\u8fc1\u79fb\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u5b9e\u4f8b\u7684LoRA\uff08iLoRA\uff09\uff0c\u5b83\u7ed3\u5408\u4e86LoRA\u4e0e\u6df7\u5408\u4e13\u5bb6\uff08MoE\uff09\u6846\u67b6\u3002iLoRA\u521b\u5efa\u4e86\u4e00\u4e2a\u591a\u6837\u5316\u7684\u4e13\u5bb6\u96c6\u5408\uff0c\u6bcf\u4e2a\u4e13\u5bb6\u90fd\u80fd\u591f\u6355\u83b7\u7279\u5b9a\u7684\u7528\u6237\u504f\u597d\u65b9\u9762\uff0c\u5e76\u5f15\u5165\u4e86\u4e00\u4e2a\u7531\u5386\u53f2\u4ea4\u4e92\u5e8f\u5217\u5f15\u5bfc\u7684\u95e8\u63a7\u51fd\u6570\u3002\u8be5\u95e8\u63a7\u51fd\u6570\u5904\u7406\u5386\u53f2\u4ea4\u4e92\u5e8f\u5217\u4ee5\u751f\u6210\u589e\u5f3a\u8868\u793a\uff0c\u4ece\u800c\u6307\u5bfc\u95e8\u63a7\u7f51\u7edc\u8f93\u51fa\u5b9a\u5236\u7684\u4e13\u5bb6\u53c2\u4e0e\u6743\u91cd\u3002\u8fd9\u79cd\u5b9a\u5236\u5316\u7684\u65b9\u6cd5\u53ef\u4ee5\u51cf\u5c11\u8d1f\u8fc1\u79fb\u5e76\u52a8\u6001\u9002\u5e94\u591a\u6837\u7684\u884c\u4e3a\u6a21\u5f0f\u3002\u5728\u4e09\u4e2a\u57fa\u51c6\u6570\u636e\u96c6\u4e0a\u7684\u5e7f\u6cdb\u5b9e\u9a8c\u663e\u793a\u4e86iLoRA\u7684\u6709\u6548\u6027\uff0c\u8
bc1\u660e\u4e86\u5176\u5728\u6355\u6349\u7528\u6237\u7279\u5b9a\u504f\u597d\u548c\u63d0\u9ad8\u63a8\u8350\u51c6\u786e\u5ea6\u65b9\u9762\u7684\u4f18\u8d8a\u6027\u80fd\u3002|\n", "2408.10151": "|**2024-08-19**|**Multilingual Needle in a Haystack: Investigating Long-Context Behavior of Multilingual Large Language Models**|Amey Hengle et.al.|[2408.10151](http://arxiv.org/abs/2408.10151)|**[link](https://github.com/AmeyHengle/multilingual-needle-in-a-haystack)**|\u5728\u8fd1\u671f\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5c55\u793a\u4e86\u5728\u591a\u79cd\u8bed\u8a00\u4e2d\u54cd\u5e94\u67e5\u8be2\u7684\u80fd\u529b\u4e4b\u540e\uff0c\u5b83\u4eec\u5904\u7406\u957f\u591a\u8bed\u8a00\u4e0a\u4e0b\u6587\u7684\u80fd\u529b\u5c1a\u672a\u5f97\u5230\u63a2\u7d22\u3002\u56e0\u6b64\uff0c\u5728\u591a\u8bed\u8a00\u80cc\u666f\u4e0b\u8bc4\u4f30LLM\u7684\u957f\u671f\u4e0a\u4e0b\u6587\u80fd\u529b\u81f3\u5173\u91cd\u8981\uff0c\u7279\u522b\u662f\u5728\u4fe1\u606f\u68c0\u7d22\u7684\u80cc\u666f\u4e0b\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u591a\u8bed\u8a00\u9488\u5728\u8349\u5806\u4e2d\u7684\u6d4b\u8bd5\uff08MultiLingual 
Needle-in-a-Haystack\uff0c\u7b80\u79f0MLNeedle\uff09\uff0c\u65e8\u5728\u8bc4\u4f30\u6a21\u578b\u4ece\u591a\u8bed\u8a00\u5e72\u6270\u6587\u672c\u96c6\u5408\uff08\u8349\u5806\uff09\u4e2d\u68c0\u7d22\u76f8\u5173\u4fe1\u606f\uff08\u9488\uff09\u7684\u80fd\u529b\u3002\u8fd9\u4e00\u6d4b\u8bd5\u6269\u5c55\u4e86\u591a\u8bed\u8a00\u95ee\u7b54\u4efb\u52a1\uff0c\u6db5\u76d6\u4e86\u5355\u8bed\u8a00\u548c\u8de8\u8bed\u8a00\u68c0\u7d22\u3002\u6211\u4eec\u5bf9\u5f53\u524d\u7684\u56db\u5927\u5148\u8fdbLLM\u8fdb\u884c\u4e86MLNeedle\u6d4b\u8bd5\u3002\u6211\u4eec\u7684\u53d1\u73b0\u663e\u793a\uff0c\u6a21\u578b\u6027\u80fd\u5728\u4e0d\u540c\u8bed\u8a00\u548c\u9488\u7684\u4f4d\u7f6e\u4e0a\u5b58\u5728\u663e\u8457\u5dee\u5f02\u3002\u5177\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u89c2\u5bdf\u5230\u5f53\u9488\u4f4d\u4e8e\u82f1\u8bed\u8bed\u7cfb\u4e4b\u5916\u7684\u8bed\u8a00\u4e2d\u4ee5\u53ca\u8f93\u5165\u4e0a\u4e0b\u6587\u7684\u4e2d\u95f4\u4f4d\u7f6e\u65f6\uff0c\u6a21\u578b\u7684\u6027\u80fd\u6700\u4f4e\u3002\u6b64\u5916\uff0c\u5c3d\u7ba1\u67d0\u4e9b\u6a21\u578b\u58f0\u79f0\u5177\u6709\u9ad8\u8fbe8k\u4e2a\u4ee4\u724c\u7684\u4e0a\u4e0b\u6587\u5927\u5c0f\uff0c\u4f46\u5728\u4e0a\u4e0b\u6587\u957f\u5ea6\u589e\u52a0\u65f6\uff0c\u5b83\u4eec\u90fd\u6ca1\u6709\u8868\u73b0\u51fa\u6ee1\u610f\u7684\u8de8\u8bed\u8a00\u68c0\u7d22\u6027\u80fd\u3002\u6211\u4eec\u7684\u5206\u6790\u63d0\u4f9b\u4e86\u5173\u4e8eLLM\u5728\u591a\u8bed\u8a00\u80cc\u666f\u4e0b\u5904\u7406\u957f\u4e0a\u4e0b\u6587\u7684\u5173\u952e\u89c1\u89e3\uff0c\u4ee5\u6307\u5bfc\u672a\u6765\u7684\u8bc4\u4f30\u65b9\u6cd5\u3002\u636e\u6211\u4eec\u6240\u77e5\uff0c\u8fd9\u662f\u9996\u6b21\u7814\u7a76LLM\u5728\u591a\u8bed\u8a00\u80cc\u666f\u4e0b\u7684\u957f\u4e0a\u4e0b\u6587\u884c\u4e3a\u3002|\n", "2408.10147": "|**2024-08-19**|**In-Context Learning with Representations: Contextual Generalization of Trained Transformers**|Tong Yang 
et.al.|[2408.10147](http://arxiv.org/abs/2408.10147)|null|\u672c\u6587\u901a\u8fc7\u975e\u7ebf\u6027\u56de\u5f52\u4efb\u52a1\u7684\u89c6\u89d2\u6765\u63a2\u8ba8Transformer\u5728\u68af\u5ea6\u4e0b\u964d\u8fc7\u7a0b\u4e2d\u7684\u8bad\u7ec3\u52a8\u6001\u3002\u5728\u6b64\u7c7b\u4efb\u52a1\u4e2d\uff0c\u6211\u4eec\u53ef\u4ee5\u901a\u8fc7\u5b66\u4e60\u6bcf\u4e2a\u4efb\u52a1\u7684\u6a21\u677f\u51fd\u6570\u5b9e\u73b0\u4e0a\u4e0b\u6587\u6cdb\u5316\uff0c\u6240\u6709\u6a21\u677f\u51fd\u6570\u90fd\u4f4d\u4e8e\u5305\u542b$m$\u4e2a\u57fa\u51fd\u6570\u7684\u7ebf\u6027\u7a7a\u95f4\u5185\u3002\u6211\u4eec\u5bf9\u5355\u5c42\u591a\u5934Transformer\u8fdb\u884c\u4e86\u5206\u6790\uff0c\u4ee5\u5728\u90e8\u5206\u6807\u8bb0\u63d0\u793a\u4e0b\u9884\u6d4b\u672a\u6807\u8bb0\u8f93\u5165\u7684\u4e0a\u4e0b\u6587\u5185\u9884\u6d4b\u80fd\u529b\uff0c\u5176\u4e2d\u6807\u7b7e\u5305\u542b\u9ad8\u65af\u566a\u58f0\uff0c\u6bcf\u4e2a\u63d0\u793a\u4e2d\u7684\u793a\u4f8b\u6570\u91cf\u4e0d\u8db3\u4ee5\u786e\u5b9a\u6a21\u677f\u3002 \u5728\u6e29\u548c\u5047\u8bbe\u4e0b\uff0c\u6211\u4eec\u8bc1\u660e\u4e86\u5355\u5c42\u591a\u5934Transformer\u7684\u8bad\u7ec3\u635f\u5931\u4f1a\u7ebf\u6027\u6536\u655b\u81f3\u5168\u5c40\u6700\u5c0f\u503c\u3002\u6b64\u5916\uff0cTransformer\u6709\u6548\u5730\u5b66\u4e60\u4e86\u5728\u57fa\u51fd\u6570\u4e0a\u8fdb\u884c\u5cad\u56de\u5f52\u7684\u65b9\u6cd5\u3002\u636e\u6211\u4eec\u6240\u77e5\uff0c\u8fd9\u662f\u9996\u6b21\u901a\u8fc7\u7406\u8bba\u8bc1\u660e\u5c55\u793a\u4e86\u5f53\u63d0\u793a\u4ec5\u5305\u542b\u5c11\u91cf\u67e5\u8be2-\u7b54\u6848\u5bf9\u65f6\uff0cTransformer\u80fd\u591f\u5b66\u4e60\u4e0a\u4e0b\u6587\u4fe1\u606f\uff08\u5373\u6a21\u677f\uff09\u4ee5\u5bf9\u672a\u89c1\u8fc7\u7684\u793a\u4f8b\u548c\u4efb\u52a1\u8fdb\u884c\u6cdb\u5316\u3002|\n", "2408.10141": "|**2024-08-19**|**Instruction Finetuning for Leaderboard Generation from Empirical AI Research**|Salomon Kabongo 
et.al.|[2408.10141](http://arxiv.org/abs/2408.10141)|null|\u672c\u6587\u5c55\u793a\u4e86\u9884\u8bad\u7ec3\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u6307\u4ee4\u5fae\u8c03\u5728\u81ea\u52a8\u5316\u751f\u6210AI\u7814\u7a76\u6392\u884c\u699c\u4e2d\u7684\u5e94\u7528\uff0c\u4ece\u6587\u7ae0\u4e2d\u63d0\u53d6\uff08\u4efb\u52a1\uff0c\u6570\u636e\u96c6\uff0c\u6307\u6807\uff0c\u5206\u6570\uff09\u56db\u5143\u7ec4\u3002\u8be5\u7814\u7a76\u65e8\u5728\u901a\u8fc7\u4ece\u4f20\u7edf\u7684\u3001\u57fa\u4e8e\u793e\u533a\u7684\u624b\u52a8\u6574\u7406\u8f6c\u53d8\u4e3a\u5229\u7528\u81ea\u52a8\u5316\u3001\u751f\u6210\u5f0fLLM\u65b9\u6cd5\u6765\u7b80\u5316AI\u7814\u7a76\u8fdb\u5c55\u7684\u4f20\u64ad\uff0c\u4ece\u800c\u8d85\u8d8a\u4f9d\u8d56\u4e8e\u7279\u5b9a\u5206\u7c7b\u7684\u81ea\u7136\u8bed\u8a00\u63a8\u7406\uff08NLI\uff09\u6a21\u578b\u7684\u4f20\u7edf\u65b9\u5f0f\u3002\u901a\u8fc7\u5229\u7528FLAN-T5\u6a21\u578b\uff0c\u672c\u7814\u7a76\u589e\u5f3a\u4e86LLMs\u5728\u4fe1\u606f\u62bd\u53d6\u65b9\u9762\u7684\u9002\u5e94\u6027\u548c\u53ef\u9760\u6027\uff0c\u5e76\u63d0\u4f9b\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\u6765\u6784\u5efa\u7ed3\u6784\u5316\u77e5\u8bc6\u8868\u793a\u3002|\n", "2408.10124": "|**2024-08-19**|**Molecular Graph Representation Learning Integrating Large Language Models with Domain-specific Small Models**|Tianyu Zhang 
et.al.|[2408.10124](http://arxiv.org/abs/2408.10124)|**[link](https://github.com/zhangtia16/molgraph-lardo)**|**\u5206\u5b50\u5c5e\u6027\u9884\u6d4b\u662f\u836f\u7269\u53d1\u73b0\u7684\u57fa\u7840\u3002\u8fd1\u5e74\u6765\uff0c\u9884\u8bad\u7ec3\u6df1\u5ea6\u5b66\u4e60\u6a21\u578b\u5728\u8fd9\u4e00\u9886\u57df\u5f97\u5230\u4e86\u5e7f\u6cdb\u5e94\u7528\uff0c\u5e76\u53d6\u5f97\u4e86\u663e\u8457\u6210\u679c\u3002\u4e00\u4e9b\u5c06\u751f\u7269\u5316\u5b66\u9886\u57df\u7684\u5148\u9a8c\u77e5\u8bc6\u878d\u5165\u9884\u8bad\u7ec3\u6846\u67b6\u7684\u65b9\u6cd5\u8868\u73b0\u51fa\u4e86\u4ee4\u4eba\u5370\u8c61\u6df1\u523b\u7684\u6027\u80fd\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u9ad8\u5ea6\u4f9d\u8d56\u4e8e\u751f\u7269\u5316\u5b66\u4e13\u5bb6\uff0c\u83b7\u53d6\u548c\u603b\u7ed3\u5927\u91cf\u7684\u9886\u57df\u77e5\u8bc6\u6587\u732e\u65e2\u8017\u65f6\u53c8\u6602\u8d35\u3002\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u7406\u89e3\u5e76\u9ad8\u6548\u63d0\u4f9b\u901a\u7528\u77e5\u8bc6\u65b9\u9762\u8868\u73b0\u51fa\u5353\u8d8a\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u5b83\u4eec\u5076\u5c14\u4f1a\u51fa\u73b0\u5e7b\u89c9\uff0c\u5e76\u7f3a\u4e4f\u751f\u6210\u7279\u5b9a\u9886\u57df\u77e5\u8bc6\u7684\u7cbe\u786e\u6027\u3002\u4e0e\u6b64\u76f8\u53cd\uff0c\u9886\u57df\u7279\u5b9a\u5c0f\u578b\u6a21\u578b\uff08DSMs\uff09\u62e5\u6709\u4e30\u5bcc\u7684\u9886\u57df\u77e5\u8bc6\uff0c\u80fd\u591f\u51c6\u786e\u8ba1\u7b97\u4e0e\u5206\u5b50\u9886\u57df\u76f8\u5173\u7684\u6307\u6807\u3002\u7136\u800c\uff0c\u7531\u4e8e\u5b83\u4eec\u7684\u6a21\u578b\u5927\u5c0f\u6709\u9650\u4e14\u529f\u80fd\u5355\u4e00\uff0c\u5b83\u4eec\u7f3a\u4e4f\u5168\u9762\u7684\u8868\u793a\u5b66\u4e60\u6240\u9700\u7684\u5e7f\u6cdb\u77e5\u8bc6\u3002\u4e3a\u4e86\u5728\u5206\u5b50\u5c5e\u6027\u9884\u6d4b\u4e2d\u5145\u5206\u5229\u7528\u4e24\u79cd\u65b9\u6cd5\u7684\u4f18\u52bf\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aMolGraph-LarDo\u7684\u65b0\u578b\u5206\u5b50\u56fe\u8868\u793a\u5b66\u4e60\u6846
\u67b6\uff0c\u8be5\u6846\u67b6\u878d\u5408\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\u548c\u9886\u57df\u7279\u5b9a\u5c0f\u578b\u6a21\u578b\u3002\u6280\u672f\u4e0a\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u4e24\u9636\u6bb5\u63d0\u793a\u7b56\u7565\uff0c\u5176\u4e2d\u5f15\u5165DSMs\u6765\u6821\u51c6LLMs\u63d0\u4f9b\u7684\u77e5\u8bc6\uff0c\u4ece\u800c\u589e\u5f3a\u9886\u57df\u7279\u5b9a\u4fe1\u606f\u7684\u51c6\u786e\u6027\uff0c\u4f7fLLMs\u80fd\u591f\u4e3a\u5206\u5b50\u6837\u672c\u751f\u6210\u66f4\u7cbe\u786e\u7684\u6587\u5b57\u63cf\u8ff0\u3002\u968f\u540e\uff0c\u6211\u4eec\u91c7\u7528\u591a\u6a21\u6001\u5bf9\u9f50\u65b9\u6cd5\u534f\u8c03\u5305\u62ec\u5206\u5b50\u56fe\u53ca\u5176\u5bf9\u5e94\u63cf\u8ff0\u6587\u672c\u5728\u5185\u7684\u5404\u79cd\u6a21\u6001\uff0c\u4ee5\u6307\u5bfc\u5206\u5b50\u8868\u793a\u7684\u9884\u8bad\u7ec3\u3002\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u7ed3\u679c\u8bc1\u660e\u4e86\u6240\u63d0\u51fa\u65b9\u6cd5\u7684\u6709\u6548\u6027\u3002**|\n", "2408.10111": "|**2024-08-20**|**PLUTUS: A Well Pre-trained Large Unified Transformer can Unveil Financial Time Series Regularities**|Yuanjian Xu et.al.|[2408.10111](http://arxiv.org/abs/2408.10111)|null|\u91d1\u878d\u65f6\u95f4\u5e8f\u5217\u5efa\u6a21\u5bf9\u4e8e\u7406\u89e3\u4e0e\u9884\u6d4b\u5e02\u573a\u884c\u4e3a\u81f3\u5173\u91cd\u8981\uff0c\u4f46\u9762\u4e34\u7740\u975e\u7ebf\u6027\u3001\u975e\u5e73\u7a33\u6027\u548c\u9ad8\u566a\u58f0\u7b49\u6311\u6218\u3002\u4f20\u7edf\u7684\u6a21\u578b\u5728\u6355\u6349\u590d\u6742\u6a21\u5f0f\u65f6\u53d7\u5230\u8fd9\u4e9b\u56e0\u7d20\u7684\u5f71\u54cd\uff0c\u540c\u65f6\u53d7\u5230\u8ba1\u7b97\u8d44\u6e90\u548c\u6a21\u578b\u5bb9\u91cf\u7684\u9650\u5236\u3002\u53d7\u81ea\u7136\u8bed\u8a00\u5904\u7406\u9886\u57df\u5927\u578b\u8bed\u8a00\u6a21\u578b\u6210\u529f\u542f\u53d1\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a$\\textbf{PLUTUS}$\u7684\u6a21\u578b\uff0c\u5176\u5168\u79f0\u4e3a$\\textbf{P}$re-trained $\\textbf{L}$arge $\\textbf{U}$nified 
$\\textbf{T}$ransformer-based\u6a21\u578b\uff0c\u7528\u4e8e\u63ed\u793a\u91d1\u878d\u65f6\u95f4\u5e8f\u5217\u4e2d\u7684\u89c4\u5f8b\u3002$\\textbf{PLUTUS}$\u901a\u8fc7\u7ed3\u5408\u53ef\u9006\u5d4c\u5165\u6a21\u5757\u3001\u5bf9\u6bd4\u5b66\u4e60\u548c\u81ea\u52a8\u7f16\u7801\u6280\u672f\uff0c\u521b\u5efa\u4e86\u539f\u59cb\u6570\u636e\u4e0e\u5757\u5d4c\u5165\u4e4b\u95f4\u7684\u8fd1\u4f3c\u4e00\u4e00\u6620\u5c04\u3002 TimeFormer\uff0c\u4e00\u4e2a\u57fa\u4e8e\u6ce8\u610f\u529b\u7684\u67b6\u6784\uff0c\u6784\u6210\u4e86$\\textbf{PLUTUS}$\u7684\u6838\u5fc3\uff0c\u6709\u6548\u5730\u5904\u7406\u4e86\u9ad8\u566a\u58f0\u65f6\u95f4\u5e8f\u5217\u6570\u636e\u3002\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6ce8\u610f\u529b\u673a\u5236\uff0c\u4ee5\u8de8\u53d8\u91cf\u548c\u65f6\u95f4\u7ef4\u5ea6\u6355\u83b7\u7279\u5f81\u3002$\\textbf{PLUTUS}$\u5728\u89c4\u6a21\u7a7a\u524d\u76841000\u4ebf\u4e2a\u89c2\u5bdf\u503c\u7684\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u9884\u8bad\u7ec3\uff0c\u65e8\u5728\u9002\u5e94\u5608\u6742\u7684\u91d1\u878d\u5e02\u573a\u73af\u5883\u3002\u636e\u6211\u4eec\u6240\u77e5\uff0c$\\textbf{PLUTUS}$\u662f\u9996\u4e2a\u5f00\u6e90\u7684\u3001\u5927\u89c4\u6a21\u7684\u9884\u8bad\u7ec3\u91d1\u878d\u65f6\u95f4\u5e8f\u5217\u6a21\u578b\uff0c\u53c2\u6570\u8d85\u8fc7\u5341\u4ebf\u4e2a\u3002\u5b83\u5728\u5404\u79cd\u4efb\u52a1\u4e0a\u5b9e\u73b0\u4e86\u6700\u5148\u8fdb\u7684\u6027\u80fd\uff0c\u5c55\u793a\u4e86\u5f3a\u5927\u7684\u8fc1\u79fb\u6027\uff0c\u5e76\u4e3a\u91d1\u878d\u9886\u57df\u5efa\u7acb\u4e86\u4e00\u4e2a\u575a\u5b9e\u7684\u57fa\u7840\u6a21\u578b\u3002\u6211\u4eec\u7684\u7814\u7a76\u63d0\u4f9b\u4e86\u9884\u8bad\u7ec3\u91d1\u878d\u65f6\u95f4\u5e8f\u5217\u6570\u636e\u7684\u6280\u672f\u6307\u5bfc\uff0c\u786e\u7acb\u4e86\u8be5\u9886\u57df\u7684\u5168\u65b0\u6807\u51c6\u3002|\n", "2408.10086": "|**2024-08-19**|**ARMADA: Attribute-Based Multimodal Data Augmentation**|Xiaomeng Jin 
et.al.|[2408.10086](http://arxiv.org/abs/2408.10086)|null|\u5728\u591a\u6a21\u6001\u8bed\u8a00\u6a21\u578b\uff08MLMs\uff09\u4e2d\uff0c\u624b\u52a8\u6807\u6ce8\u9ad8\u8d28\u91cf\u7684\u56fe\u50cf-\u6587\u672c\u914d\u5bf9\u6570\u636e\u4ee5\u8fdb\u884c\u5fae\u8c03\u548c\u5bf9\u9f50\u7684\u6210\u672c\u975e\u5e38\u9ad8\u3002\u5c3d\u7ba1\u73b0\u6709\u7684\u591a\u6a21\u6001\u6570\u636e\u589e\u5f3a\u6846\u67b6\u63d0\u51fa\u4e86\u589e\u5f3a\u56fe\u50cf-\u6587\u672c\u914d\u5bf9\u7684\u65b9\u6cd5\uff0c\u4f46\u5b83\u4eec\u8981\u4e48\u5728\u6587\u672c\u548c\u56fe\u50cf\u4e4b\u95f4\u5b58\u5728\u8bed\u4e49\u4e0d\u4e00\u81f4\uff0c\u8981\u4e48\u751f\u6210\u4e0d\u5207\u5b9e\u9645\u7684\u56fe\u50cf\uff0c\u5bfc\u81f4\u4e0e\u73b0\u5b9e\u4e16\u754c\u793a\u4f8b\u7684\u77e5\u8bc6\u5dee\u8ddd\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aAttribute-based Multimodal Data Augmentation (ARMADA)\u7684\u65b0\u578b\u591a\u6a21\u6001\u6570\u636e\u589e\u5f3a\u65b9\u6cd5\uff0c\u901a\u8fc7\u77e5\u8bc6\u5f15\u5bfc\u7684\u63d0\u53ca\u5b9e\u4f53\u89c6\u89c9\u5c5e\u6027\u7684\u4fee\u6539\u6765\u589e\u5f3a\u6570\u636e\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u4ece\u539f\u59cb\u6587\u672c\u6570\u636e\u4e2d\u63d0\u53d6\u5b9e\u4f53\u53ca\u5176\u89c6\u89c9\u5c5e\u6027\uff0c\u7136\u540e\u5728\u77e5\u8bc6\u5e93\uff08KBs\uff09\u548c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u6307\u5bfc\u4e0b\u641c\u7d22\u89c6\u89c9\u5c5e\u6027\u7684\u66ff\u4ee3\u503c\u3002\u63a5\u7740\uff0c\u6211\u4eec\u5229\u7528\u56fe\u50cf\u7f16\u8f91\u6a21\u578b\u6839\u636e\u63d0\u53d6\u7684\u5c5e\u6027\u7f16\u8f91\u56fe\u50cf\u3002ARMADA\u662f\u4e00\u4e2a\u65b0\u9896\u7684\u591a\u6a21\u6001\u6570\u636e\u751f\u6210\u6846\u67b6\uff1a(i) 
\u4ece\u7b26\u53f7\u77e5\u8bc6\u5e93\u4e2d\u63d0\u53d6\u77e5\u8bc6\u5173\u8054\u7684\u5c5e\u6027\uff0c\u5b9e\u73b0\u8bed\u4e49\u4e00\u81f4\u4e14\u5177\u6709\u533a\u522b\u7684\u56fe\u50cf-\u6587\u672c\u5bf9\u751f\u6210\uff1b(ii) \u5229\u7528\u77e5\u8bc6\u5e93\u5c42\u6b21\u7ed3\u6784\u4e2d\u7684\u540c\u7c7b\u522b\u5b9e\u4f53\u751f\u6210\u89c6\u89c9\u4e0a\u76f8\u4f3c\u4f46\u4e0d\u540c\u7c7b\u522b\u7684\u56fe\u50cf\uff1b(iii) \u4f7f\u7528LLMs\u7684\u5e38\u8bc6\u77e5\u8bc6\u8c03\u8282\u8f85\u52a9\u89c6\u89c9\u5c5e\u6027\uff0c\u5982\u80cc\u666f\uff0c\u4ee5\u66f4\u5168\u9762\u5730\u8868\u793a\u539f\u59cb\u5b9e\u4f53\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u8bc1\u660e\uff0c\u5728\u56db\u4e2a\u4e0b\u6e38\u4efb\u52a1\u4e0a\uff0c\u6211\u4eec\u7684\u6846\u67b6\u80fd\u591f\u4ea7\u751f\u9ad8\u8d28\u91cf\u7684\u6570\u636e\u5e76\u63d0\u9ad8\u6a21\u578b\u6027\u80fd\u3002\u8fd9\u4e5f\u5f3a\u8c03\u4e86\u5229\u7528\u5916\u90e8\u77e5\u8bc6\u4ee3\u7406\u4ee5\u589e\u5f3a\u53ef\u89e3\u91ca\u6027\u548c\u73b0\u5b9e\u4e16\u754c\u76f8\u5173\u6027\u7684\u5fc5\u8981\u6027\u3002|\n", "2408.10072": "|**2024-08-19**|**FFAA: Multimodal Large Language Model based Explainable Open-World Face Forgery Analysis Assistant**|Zhengchao Huang 
et.al.|[2408.10072](http://arxiv.org/abs/2408.10072)|null|\u5feb\u901f\u53d1\u5c55\u7684\u6df1\u5ea6\u4f2a\u9020\u6280\u672f\u5f15\u53d1\u4e86\u516c\u4f17\u7684\u5e7f\u6cdb\u5173\u6ce8\uff0c\u5c24\u5176\u662f\u5728\u5bf9\u516c\u5171\u4fe1\u606f\u5b89\u5168\u6784\u6210\u4e25\u91cd\u5a01\u80c1\u7684\u9762\u90e8\u4f2a\u9020\u65b9\u9762\u3002\u7136\u800c\uff0c\u672a\u77e5\u548c\u591a\u6837\u7684\u4f2a\u9020\u6280\u672f\u3001\u591a\u53d8\u7684\u9762\u90e8\u7279\u5f81\u4ee5\u53ca\u590d\u6742\u7684\u73af\u5883\u56e0\u7d20\u7ed9\u9762\u90e8\u4f2a\u9020\u5206\u6790\u5e26\u6765\u4e86\u5de8\u5927\u6311\u6218\u3002\u73b0\u6709\u6570\u636e\u96c6\u5728\u63cf\u8ff0\u8fd9\u4e9b\u65b9\u9762\u65f6\u5b58\u5728\u4e0d\u8db3\uff0c\u4f7f\u5f97\u4ec5\u901a\u8fc7\u89c6\u89c9\u4fe1\u606f\u96be\u4ee5\u5728\u5404\u79cd\u5e72\u6270\u56e0\u7d20\u4e2d\u533a\u5206\u771f\u5b9e\u4e0e\u4f2a\u9020\u7684\u9762\u90e8\u3002\u6b64\u5916\uff0c\u73b0\u6709\u7684\u65b9\u6cd5\u672a\u80fd\u63d0\u4f9b\u7528\u6237\u53cb\u597d\u4e14\u53ef\u89e3\u91ca\u7684\u7ed3\u679c\uff0c\u590d\u6742\u5316\u4e86\u6a21\u578b\u51b3\u7b56\u8fc7\u7a0b\u7684\u7406\u89e3\u3002 
\u4e3a\u89e3\u51b3\u8fd9\u4e9b\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u9879\u65b0\u9896\u7684\u201c\u5f00\u653e\u4e16\u754c\u9762\u90e8\u4f2a\u9020\u5206\u6790\u95ee\u7b54\u201d\uff08OW-FFA-VQA\uff09\u4efb\u52a1\u53ca\u5176\u76f8\u5e94\u7684\u57fa\u51c6\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e00\u4efb\u52a1\uff0c\u6211\u4eec\u9996\u5148\u5efa\u7acb\u4e86\u4e00\u4e2a\u5305\u542b\u771f\u5b9e\u548c\u4f2a\u9020\u9762\u90e8\u56fe\u50cf\u7684\u591a\u6837\u96c6\u5408\uff0c\u5e76\u914d\u6709\u5173\u952e\u63cf\u8ff0\u548c\u53ef\u9760\u4f2a\u9020\u63a8\u7406\u7684\u6570\u636e\u96c6\u3002\u57fa\u4e8e\u6b64\u6570\u636e\u96c6\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u201c\u9762\u90e8\u4f2a\u9020\u5206\u6790\u52a9\u624b\u201d\uff08FFAA\uff09\uff0c\u5b83\u7531\u4e00\u4e2a\u5fae\u8c03\u7684\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\u548c\u4e00\u4e2a\u591a\u7b54\u6848\u667a\u80fd\u51b3\u7b56\u7cfb\u7edf\uff08MIDS\uff09\u7ec4\u6210\u3002\u901a\u8fc7\u7ed3\u5408\u5047\u8bbe\u6027\u63d0\u793a\u4e0eMIDS\uff0c\u6709\u6548\u6d88\u9664\u4e86\u6a21\u7cca\u5206\u7c7b\u8fb9\u754c\u7684\u5f71\u54cd\u529b\uff0c\u589e\u5f3a\u4e86\u6a21\u578b\u7684\u9c81\u68d2\u6027\u3002\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u4e0d\u4ec5\u63d0\u4f9b\u4e86\u7528\u6237\u53cb\u597d\u7684\u53ef\u89e3\u91ca\u7ed3\u679c\uff0c\u800c\u4e14\u5728\u51c6\u786e\u6027\u4e0e\u9c81\u68d2\u6027\u65b9\u9762\u663e\u8457\u8d85\u8d8a\u4e86\u4ee5\u5f80\u7684\u65b9\u6cd5\u3002|\n", "2408.11053": "|**2024-08-20**|**Revisiting VerilogEval: Newer LLMs, In-Context Learning, and Specification-to-RTL Tasks**|Nathaniel Pinckney 
et.al.|[2408.11053](http://arxiv.org/abs/2408.11053)|**[link](https://github.com/nvlabs/verilog-eval)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u6570\u5b57\u786c\u4ef6\u4ee3\u7801\u751f\u6210\u9886\u57df\u7684\u5e94\u7528\u662f\u4e00\u4e2a\u65b0\u5174\u9886\u57df\u3002\u5927\u591a\u6570LLM\u4e3b\u8981\u662f\u5728\u81ea\u7136\u8bed\u8a00\u548c\u8f6f\u4ef6\u4ee3\u7801\u4e0a\u8fdb\u884c\u8bad\u7ec3\u7684\u3002\u786c\u4ef6\u4ee3\u7801\uff0c\u5982Verilog\uff0c\u53ea\u5360\u8bad\u7ec3\u6570\u636e\u7684\u4e00\u5c0f\u90e8\u5206\uff0c\u800c\u4e14\u5f88\u5c11\u6709\u786c\u4ef6\u57fa\u51c6\u5b58\u5728\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7f3a\u53e3\uff0c2023\u5e74\u53d1\u5e03\u4e86\u4e00\u4e2a\u540d\u4e3aVerilogEval\u7684\u5f00\u6e90\u57fa\u51c6\uff0c\u5b83\u63d0\u4f9b\u4e86\u4e00\u4e2a\u4e00\u81f4\u7684\u8bc4\u4f30\u6846\u67b6\uff0c\u7528\u4e8eLLM\u5728\u4ee3\u7801\u5b8c\u6210\u4efb\u52a1\u4e0a\u7684\u6027\u80fd\u3002\u8be5\u57fa\u51c6\u5728\u5f53\u65f6\u7684\u9886\u5148\u6a21\u578b\uff0c\u5305\u62ecGPT-4\uff0c\u8fdb\u884c\u4e86\u6d4b\u8bd5\u3002\u7136\u800c\uff0cVerilogEval\u548c\u5176\u4ed6Verilog\u751f\u6210\u57fa\u51c6\u7f3a\u4e4f\u5931\u8d25\u5206\u6790\uff0c\u5f53\u524d\u5f62\u5f0f\u4e0b\u4e5f\u4e0d\u5229\u4e8e\u63a2\u7d22\u63d0\u793a\u6280\u672f\u3002\u6b64\u5916\uff0c\u5728VerilogEval\u53d1\u5e03\u540e\uff0c\u5546\u4e1a\u548c\u5f00\u6e90\u6a21\u578b\u90fd\u7ecf\u5386\u4e86\u6301\u7eed\u7684\u53d1\u5c55\u3002 
\u5728\u8fd9\u4e2a\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u8bc4\u4f30\u4e86\u65b0\u53d1\u5e03\u7684\u5546\u4e1a\u548c\u5f00\u6e90\u6a21\u578b\u7684\u4e0d\u540c\u89c4\u6a21\uff0c\u9488\u5bf9\u6539\u8fdb\u540e\u7684VerilogEval\u57fa\u51c6\u5957\u4ef6\u3002\u6211\u4eec\u589e\u5f3a\u4e86VerilogEval\u7684\u57fa\u7840\u67b6\u6784\u548c\u6570\u636e\u96c6\uff0c\u901a\u8fc7\u81ea\u52a8\u5206\u7c7b\u5931\u8d25\uff0c\u5f15\u5165\u4e86\u652f\u6301\u4e0a\u4e0b\u6587\u5b66\u4e60\uff08ICL\uff09\u793a\u4f8b\u7684\u65b0\u63d0\u793a\uff0c\u5e76\u6269\u5c55\u4e86\u652f\u6301\u7684\u4efb\u52a1\u5230\u89c4\u683c\u5230RTL\u8f6c\u6362\u3002\u6211\u4eec\u53d1\u73b0\u5546\u4e1a\u9886\u57df\u7684\u6700\u65b0\u6a21\u578b\u6709\u4e86\u53ef\u6d4b\u91cf\u7684\u6539\u8fdb\uff0c\u5176\u4e2dGPT-4 Turbo\u5728\u89c4\u683c\u5230RTL\u4efb\u52a1\u4e0a\u8fbe\u5230\u4e8659%\u7684\u6210\u529f\u7387\u3002\u6211\u4eec\u4e5f\u7814\u7a76\u4e86\u65b0\u51fa\u73b0\u7684\u5f00\u6e90\u548c\u9886\u57df\u7279\u5b9a\u6a21\u578b\u7684\u6027\u80fd\uff0c\u5e76\u5c55\u793a\u4e86\u6a21\u578b\u4ece\u4e0a\u4e0b\u6587\u5b66\u4e60\u4e2d\u83b7\u5f97\u663e\u8457\u76ca\u5904\u7684\u53ef\u80fd\u6027\u3002\u6211\u4eec\u53d1\u73b0\u6700\u8fd1\u53d1\u5e03\u7684Llama 3.1 405B\u6a21\u578b\u5728\u6027\u80fd\u4e0a\u4e0eGPT-4 Turbo\u76f8\u5f53\uff0c\u5b9e\u73b0\u4e8658%\u7684\u6210\u529f\u7387\uff0c\u800c\u8f83\u5c0f\u7684\u9886\u57df\u7279\u5b9a\u7684RTL-Coder 6.7B\u6a21\u578b\u5219\u53d6\u5f97\u4e86\u4ee4\u4eba\u5370\u8c61\u6df1\u523b\u768437%\u7684\u6210\u529f\u7387\u3002\u7136\u800c\uff0c\u63d0\u793a\u5de5\u7a0b\u5bf9\u4e8e\u5b9e\u73b0\u826f\u597d\u7684\u6210\u529f\u7387\u81f3\u5173\u91cd\u8981\uff0c\u5e76\u4e14\u968f\u7740\u6a21\u578b\u548c\u4efb\u52a1\u7684\u53d8\u5316\u800c\u53d8\u5316\u3002\u4e00\u4e2a\u5141\u8bb8\u8fdb\u884c\u63d0\u793a\u5de5\u7a0b\u548c\u5931\u8d25\u5206\u6790\u7684\u57fa\u51c6\u57fa\u7840\u8bbe\u65bd\u5bf9\u4e8e\u6301\u7eed\u7684\u6a21\u578b\u5f00\u53d1\u548c\u90e8\u7f72\u81f3\u5173\u91cd\u8981\u3002|\n", 
"2408.11051": "|**2024-08-20**|**FLAME: Learning to Navigate with Multimodal LLM in Urban Environments**|Yunzhe Xu et.al.|[2408.11051](http://arxiv.org/abs/2408.11051)|**[link](https://github.com/xyz9911/FLAME)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u89c6\u89c9\u4e0e\u8bed\u8a00\u5bfc\u822a\uff08VLN\uff09\u4efb\u52a1\u4e2d\u5c55\u73b0\u51fa\u4e86\u6f5c\u5728\u80fd\u529b\uff0c\u4f46\u5f53\u524d\u7684\u5e94\u7528\u4ecd\u9762\u4e34\u6311\u6218\u3002\u867d\u7136LLM\u5728\u901a\u7528\u5bf9\u8bdd\u573a\u666f\u4e2d\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5728\u4e13\u95e8\u7684\u5bfc\u822a\u4efb\u52a1\u4e0a\u5374\u8868\u73b0\u4e0d\u4f73\uff0c\u76f8\u8f83\u4e8e\u4e13\u4e3aVLN\u8bbe\u8ba1\u7684\u6a21\u578b\uff0c\u5176\u6027\u80fd\u5f80\u5f80\u8f83\u4f4e\u4e0b\u3002\u6211\u4eec\u5f15\u5165\u4e86FLAME\uff08FLAMingo\u67b6\u6784\u5316\u5b9e\u4f53\u4ee3\u7406\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u57fa\u4e8e\u591a\u6a21\u6001LLM\u7684\u65b0\u578b\u4ee3\u7406\u548c\u67b6\u6784\uff0c\u65e8\u5728\u89e3\u51b3\u57ce\u5e02VLN\u4efb\u52a1\uff0c\u5e76\u80fd\u9ad8\u6548\u5904\u7406\u591a\u4e2a\u89c2\u5bdf\u7ed3\u679c\u3002 
\u6211\u4eec\u7684\u65b9\u6cd5\u91c7\u7528\u4e86\u4e09\u9636\u6bb5\u8c03\u4f18\u6280\u672f\u4ee5\u5b9e\u73b0\u5bf9\u5bfc\u822a\u4efb\u52a1\u7684\u6709\u6548\u9002\u5e94\uff1a\u5355\u611f\u77e5\u8c03\u6574\u7528\u4e8e\u8857\u9053\u89c6\u56fe\u63cf\u8ff0\u3001\u591a\u611f\u77e5\u8c03\u6574\u7528\u4e8e\u8f68\u8ff9\u603b\u7ed3\u4ee5\u53ca\u7aef\u5230\u7aef\u8bad\u7ec3\u5728VLN\u6570\u636e\u96c6\u4e0a\u7684\u7efc\u5408\u80fd\u529b\u3002\u751f\u6210\u7684\u6570\u636e\u96c6\u901a\u8fc7\u81ea\u52a8\u5316\u8fc7\u7a0b\u5408\u6210\u800c\u6210\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cFLAME\u5728Touchdown\u6570\u636e\u96c6\u4e0a\u7684\u4efb\u52a1\u5b8c\u6210\u7387\u8f83\u73b0\u6709\u65b9\u6cd5\u63d0\u9ad8\u4e867.3%\uff0c\u8d85\u8d8a\u4e86\u6700\u5148\u8fdb\u7684\u65b9\u6cd5\u3002\u8fd9\u9879\u5de5\u4f5c\u5c55\u793a\u4e86\u591a\u6a21\u6001LLM\u5728\u590d\u6742\u5bfc\u822a\u4efb\u52a1\u4e2d\u7684\u6f5c\u529b\uff0c\u4ee3\u8868\u4e86\u5411\u5b9e\u9645\u5e94\u7528\u591a\u6a21\u6001LLM\u4e8e\u5b9e\u4f53\u4eba\u5de5\u667a\u80fd\u9886\u57df\u8fc8\u51fa\u7684\u91cd\u8981\u4e00\u6b65\u3002\u9879\u76ee\u9875\u9762\uff1ahttps://flame-sjtu.github.io**|\n", "2408.11049": "|**2024-08-20**|**MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding**|Jian Chen 
et.al.|[2408.11049](http://arxiv.org/abs/2408.11049)|**[link](https://github.com/infini-ai-lab/magicdec)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u8bf8\u5982\u4ea4\u4e92\u5f0f\u804a\u5929\u673a\u5668\u4eba\u3001\u6587\u6863\u5206\u6790\u548c\u4ee3\u7406\u5de5\u4f5c\u6d41\u7a0b\u7b49\u957f\u671f\u4e0a\u4e0b\u6587\u5e94\u7528\u4e2d\u53d8\u5f97\u8d8a\u6765\u8d8a\u666e\u904d\uff0c\u4f46\u63d0\u4f9b\u957f\u4e0a\u4e0b\u6587\u8bf7\u6c42\u65f6\uff0c\u8981\u5b9e\u73b0\u4f4e\u5ef6\u8fdf\u548c\u9ad8\u541e\u5410\u91cf\u662f\u4e00\u4e2a\u6311\u6218\u3002\u63a8\u6d4b\u6027\u89e3\u7801\uff08SD\uff09\u662f\u4e00\u79cd\u5e7f\u6cdb\u4f7f\u7528\u7684\u964d\u4f4e\u5ef6\u8fdf\u7684\u6280\u672f\uff0c\u4f20\u7edf\u89c2\u70b9\u8ba4\u4e3a\u5176\u6548\u80fd\u4ec5\u9650\u4e8e\u8f83\u5c0f\u7684\u6279\u6b21\u5927\u5c0f\u3002\u7136\u800c\uff0c\u5728MagicDec\u4e2d\uff0c\u6211\u4eec\u63ed\u793a\u4e86\u4ee4\u4eba\u60ca\u8bb6\u7684\u4e8b\u5b9e\uff1a\u5373\u4f7f\u5728\u9ad8\u541e\u5410\u91cf\u63a8\u7406\u73af\u5883\u4e2d\uff0c\u5bf9\u4e8e\u4e2d\u7b49\u5230\u8f83\u957f\u5e8f\u5217\uff0cSD\u4ecd\u80fd\u5b9e\u73b0\u52a0\u901f\u3002\u66f4\u6709\u8da3\u7684\u662f\uff0c\u57fa\u4e8e\u6211\u4eec\u7684\u4e25\u8c28\u5206\u6790\uff0c\u4e00\u79cd\u667a\u80fd\u8d77\u8349\u7b56\u7565\u53ef\u4ee5\u5728\u6279\u6b21\u5927\u5c0f\u589e\u52a0\u65f6\u83b7\u5f97\u66f4\u597d\u7684\u52a0\u901f\u6548\u679c\u3002 MagicDec\u9996\u5148\u8bc6\u522b\u51fa\u968f\u7740\u6279\u6b21\u5927\u5c0f\u548c\u5e8f\u5217\u957f\u5ea6\u589e\u52a0\u7684\u74f6\u9888\u8f6c\u79fb\uff0c\u5e76\u5229\u7528\u8fd9\u4e9b\u6d1e\u5bdf\u6765\u66f4\u6709\u6548\u5730\u90e8\u7f72\u63a8\u6d4b\u6027\u89e3\u7801\u4ee5\u652f\u6301\u9ad8\u541e\u5410\u91cf\u63a8\u7406\u3002\u7136\u540e\uff0c\u5b83\u901a\u8fc7\u5229\u7528\u7a00\u758fKV\u7f13\u5b58\u7684\u8349\u6848\u6a21\u578b\u6765\u89e3\u51b3\u968f\u7740\u5e8f\u5217\u957f\u5ea6\u548c\u6279\u6b21\u5927\u5c0f\u589e\u52a0\u800c\u6269\u5c55\u7684KV\u74f6\u9888\u95ee\u9898\u3002|\n", "2408.11043": 
"|**2024-08-20**|**Reconciling Methodological Paradigms: Employing Large Language Models as Novice Qualitative Research Assistants in Talent Management Research**|Sreyoshi Bhaduri et.al.|[2408.11043](http://arxiv.org/abs/2408.11043)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u5229\u7528\u57fa\u4e8e\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u6765\u5206\u6790\u8bbf\u8c08\u8bb0\u5f55\uff0c\u4ee5\u89e3\u51b3\u624b\u52a8\u5206\u6790\u5b9a\u6027\u6570\u636e\u9700\u8981\u5927\u91cf\u65f6\u95f4\u548c\u52aa\u529b\u7684\u95ee\u9898\u3002\u7814\u7a76\u65e8\u5728\u5c06\u7814\u7a76\u95ee\u9898\u8bbe\u5b9a\u4e3a\u7531LLM\u4f5c\u4e3a\u521d\u7ea7\u7814\u7a76\u52a9\u624b\u8fdb\u884c\u8f85\u52a9\u7684\u6a21\u5f0f\u3002\u672c\u7814\u7a76\u63a2\u8ba8\u4e86\u5c06LLM\u89c6\u4e3a\u4eba\u624d\u7ba1\u7406\u9886\u57df\u7814\u7a76\u4eba\u5458\u7684\u521d\u7ea7\u8d28\u6027\u7814\u7a76\u52a9\u624b\u7684\u601d\u7ef4\u6a21\u578b\u3002\u901a\u8fc7\u6269\u5c55\u57fa\u4e8eRAG\u7684LLM\u65b9\u6cd5\uff0c\u672c\u6587\u5c55\u793a\u4e86\u8fd9\u4e9b\u6a21\u578b\u5728\u5bf9\u534a\u7ed3\u6784\u5316\u8bbf\u8c08\u6570\u636e\u8fdb\u884c\u4e3b\u9898\u5efa\u6a21\u65b9\u9762\u7684\u7075\u6d3b\u6027\uff0c\u8d85\u8d8a\u4e86\u5b83\u4eec\u5728\u4fe1\u606f\u68c0\u7d22\u548c\u641c\u7d22\u4e2d\u7684\u4f20\u7edf\u5e94\u7528\u3002 
\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0c\u57fa\u4e8eLLM\u7684RAG\u65b9\u6cd5\u80fd\u591f\u6210\u529f\u63d0\u53d6\u611f\u5174\u8da3\u7684\u8bae\u9898\uff0c\u4e0e\u4ece\u540c\u4e00\u6570\u636e\u96c6\u624b\u52a8\u751f\u6210\u7684\u4e3b\u9898\u76f8\u6bd4\uff0c\u8986\u76d6\u8303\u56f4\u663e\u8457\u66f4\u9ad8\u3002\u8fd9\u8bc1\u660e\u4e86\u4f7f\u7528LLM\u4f5c\u4e3a\u521d\u7ea7\u8d28\u6027\u7814\u7a76\u52a9\u624b\u7684\u53ef\u884c\u6027\u3002\u6b64\u5916\uff0c\u7814\u7a76\u5efa\u8bae\uff0c\u4f7f\u7528\u6b64\u7c7b\u6a21\u578b\u7684\u7814\u7a76\u8005\u5e94\u4e25\u683c\u9075\u5faa\u4f20\u7edf\u8d28\u6027\u7814\u7a76\u4e2d\u4f7f\u7528\u7684\u8d28\u91cf\u6807\u51c6\uff0c\u4ee5\u786e\u4fdd\u5176\u65b9\u6cd5\u7684\u4e25\u8c28\u6027\u548c\u53ef\u9760\u6027\u3002 \u6700\u540e\uff0c\u8bba\u6587\u63d0\u51fa\u4e86\u9488\u5bf9\u5e0c\u671b\u5c06LLM\u4e0e\u73b0\u6709\u8d28\u6027\u7814\u7a76\u8303\u5f0f\u76f8\u878d\u5408\u7684\u884c\u4e1a\u5b9e\u8df5\u8005\u7684\u5173\u952e\u5efa\u8bae\uff0c\u63d0\u4f9b\u4e86\u4e00\u6761\u6709\u6548\u6574\u5408\u8fd9\u4e9b\u5f3a\u5927\u4f46\u521d\u7ea7\u7684\u4eba\u5de5\u667a\u80fd\u5de5\u5177\u5728\u5b9a\u6027\u6570\u636e\u5206\u6790\u4e2d\u7684\u8def\u5f84\uff0c\u7279\u522b\u662f\u5728\u4eba\u624d\u9886\u57df\u3002|\n", "2408.11029": "|**2024-08-20**|**Scaling Law with Learning Rate Annealing**|Howe Tissue et.al.|[2408.11029](http://arxiv.org/abs/2408.11029)|null|\u6211\u4eec\u53d1\u73b0\u795e\u7ecf\u8bed\u8a00\u6a21\u578b\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\uff0c\u4ea4\u53c9\u71b5\u635f\u5931\u66f2\u7ebf\u9075\u5faa\u4e86\u4e00\u4e2a\u4e0e\u5b66\u4e60\u7387\uff08LR\uff09\u8870\u51cf\u76f8\u5173\u7684\u7f29\u653e\u5b9a\u5f8b\uff1a$L(s) = L_0 + A\\cdot S_1^{-\\alpha} - C\\cdot 
S_2$\u3002\u5176\u4e2d\uff0c$S_1$\u4ee3\u8868\u524d\u5411\u533a\u57df\uff0c$S_2$\u4ee3\u8868\u5b66\u4e60\u7387\u8870\u51cf\u533a\u57df\u3002\u8fd9\u4e00\u516c\u5f0f\u8003\u8651\u4e86\u4e24\u4e2a\u56e0\u7d20\uff1a\uff081\uff09\u4f20\u7edf\u7684\u7f29\u653e\u5f8b\u5b9a\u4e49\u7684\u524d\u5411\u7f29\u653e\uff1b\u4ee5\u53ca\uff082\uff09\u5b66\u4e60\u7387\u8870\u51cf\u5e26\u6765\u7684\u989d\u5916\u635f\u5931\u4e0b\u964d\u3002\u56e0\u6b64\uff0c\u8be5\u516c\u5f0f\u80fd\u591f\u63cf\u8ff0\u6bcf\u4e2a\u6b65\u9aa4\u7684\u5b8c\u6574\u635f\u5931\u66f2\u7ebf\uff0c\u800c\u975e\u4ec5\u9650\u4e8e\u8bad\u7ec3\u7ed3\u675f\u65f6\u7684\u5355\u4e00\u635f\u5931\u70b9\u3002\u901a\u8fc7\u5e94\u7528\u5305\u542b\u5b66\u4e60\u7387\u8870\u51cf\u7684\u7f29\u653e\u5f8b\uff0c\u5e76\u4ec5\u901a\u8fc7\u4e00\u5230\u4e24\u6b21\u8bad\u7ec3\u66f2\u7ebf\u62df\u5408\uff0c\u6211\u4eec\u80fd\u591f\u51c6\u786e\u9884\u6d4b\u8bed\u8a00\u6a21\u578b\u8bad\u7ec3\u5728\u4efb\u4f55\u7ed9\u5b9a\u6b65\u9aa4\u548c\u4efb\u4f55\u5b66\u4e60\u7387\u8c03\u5ea6\uff08LRS\uff09\u4e0b\u7684\u635f\u5931\u3002 
\u6b64\u5916\uff0c\u8fd9\u4e00\u65b9\u7a0b\u51c6\u786e\u5730\u63cf\u8ff0\u4e86\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u7684\u52a8\u6001\uff0c\u5e76\u4e3a\u5148\u524d\u7814\u7a76\u4e2d\u5173\u6ce8\u7684\u5b66\u4e60\u7387\u8c03\u5ea6\u548c\u5b66\u4e60\u7387\u8870\u51cf\u7684\u76f8\u5173\u5b9e\u9a8c\u53d1\u73b0\u63d0\u4f9b\u4e86\u7406\u8bba\u9a8c\u8bc1\u548c\u89e3\u91ca\u3002\u7531\u6b64\u4ea7\u751f\u7684\u6d1e\u5bdf\uff0c\u4e5f\u4e3a\u7814\u7a76\u4eba\u5458\u5728\u5f00\u53d1\u5927\u578b\u8bed\u8a00\u6a21\u578b\u65f6\u63d0\u524d\u9009\u62e9\u5173\u952e\u7684\u5b66\u4e60\u7387\u8c03\u5ea6\u7b56\u7565\u63d0\u4f9b\u4e86\u6307\u5bfc\u3002\u6700\u91cd\u8981\u7684\u662f\uff0c\u7531\u4e8e\u6574\u4e2a\u8bad\u7ec3\u66f2\u7ebf\u4e0a\u7684\u6240\u6709\u70b9\u90fd\u9075\u5faa\u8be5\u65b9\u7a0b\uff0c\u6211\u4eec\u53ef\u4ee5\u5728\u4efb\u4f55\u7ed9\u5b9a\u6b65\u9aa4\u548c\u4efb\u4f55\u5b66\u4e60\u7387\u8c03\u5ea6\u4e0b\u5b9e\u73b0\u51c6\u786e\u7684\u635f\u5931\u9884\u6d4b\uff0c\u800c\u6240\u9700\u8ba1\u7b97\u6210\u672c\u4ec5\u4e3a\u4f7f\u7528\u5c0f\u677e\u9f20\u7f29\u653e\u6cd5\u5219\u62df\u5408\u8bed\u8a00\u6a21\u578b\u635f\u5931\u6240\u9700\u76841%\u4ee5\u4e0b\u3002\u8fd9\u4e00\u65b9\u6cd5\u6781\u5927\u5730\u4fc3\u8fdb\u4e86\u7f29\u653e\u5f8b\u62df\u5408\u548c\u9884\u6d4b\u5728\u5f00\u53d1\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8fc7\u7a0b\u4e2d\u7684\u666e\u53ca\u6027\u3002|\n", "2408.11021": "|**2024-08-20**|**Athena: Safe Autonomous Agents with Verbal Contrastive Learning**|Tanmana Sadhu 
et.al.|[2408.11021](http://arxiv.org/abs/2408.11021)|null|\u7531\u4e8e\u65b0\u5174\u80fd\u529b\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u88ab\u7528\u4f5c\u57fa\u4e8e\u8bed\u8a00\u7684\u4ee3\u7406\uff0c\u6267\u884c\u5404\u79cd\u4efb\u52a1\u5e76\u4ee5\u4e0d\u65ad\u589e\u957f\u7684\u7a0b\u5ea6\u81ea\u4e3b\u505a\u51fa\u51b3\u7b56\u3002\u8fd9\u4e9b\u81ea\u4e3b\u4ee3\u7406\u80fd\u591f\u7406\u89e3\u9ad8\u7ea7\u6307\u4ee4\u3001\u4e0e\u73af\u5883\u4e92\u52a8\uff0c\u5e76\u4f7f\u7528\u53ef\u7528\u7ed9\u5b83\u4eec\u7684\u5de5\u5177\u96c6\u6267\u884c\u590d\u6742\u4efb\u52a1\u3002\u968f\u7740\u4ee3\u7406\u80fd\u529b\u7684\u6269\u5c55\uff0c\u786e\u4fdd\u5b83\u4eec\u7684\u5b89\u5168\u6027\u548c\u53ef\u4fe1\u5ea6\u53d8\u5f97\u8d8a\u6765\u8d8a\u91cd\u8981\u3002\u5728\u8fd9\u9879\u7814\u7a76\u4e2d\uff0c\u6211\u4eec\u5f15\u5165\u4e86Athena\u6846\u67b6\uff0c\u5b83\u5229\u7528\u4e86\u53e3\u5934\u5bf9\u6bd4\u5b66\u4e60\u7684\u6982\u5ff5\uff0c\u901a\u8fc7\u5c06\u8fc7\u53bb\u5b89\u5168\u548c\u4e0d\u5b89\u5168\u7684\u8f68\u8ff9\u4f5c\u4e3a\u4e0a\u4e0b\u6587\uff08\u5bf9\u6bd4\uff09\u793a\u4f8b\u6765\u6307\u5bfc\u4ee3\u7406\u5411\u5b89\u5168\u6027\u53d1\u5c55\uff0c\u540c\u65f6\u5b8c\u6210\u7ed9\u5b9a\u7684\u4efb\u52a1\u3002\u8be5\u6846\u67b6\u8fd8\u6574\u5408\u4e86\u4e00\u4e2a\u6279\u5224\u6027\u673a\u5236\uff0c\u5728\u6bcf\u4e2a\u6b65\u9aa4\u4e0a\u5f15\u5bfc\u4ee3\u7406\u907f\u514d\u98ce\u9669\u884c\u4e3a\u3002\u6b64\u5916\uff0c\u7531\u4e8e\u7f3a\u4e4f\u5bf9\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u5b89\u5168\u63a8\u7406\u80fd\u529b\u7684\u73b0\u6709\u57fa\u51c6\uff0c\u6211\u4eec\u6536\u96c6\u4e86\u6db5\u76d68\u4e2a\u7c7b\u522b\u5171\u8ba180\u4e2a\u5de5\u5177\u5305\u548c180\u4e2a\u573a\u666f\u7684\u4e00\u7ec4\u6570\u636e\u96c6\uff0c\u63d0\u4f9b\u4e86\u4e00\u79cd\u5b89\u5168\u8bc4\u4f30\u57fa\u51c6\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u8bc4\u4f30\u8868\u660e\uff0c\u53e3\u5934\u5bf9\u6bd4\u5b66\u4e60\u548c\u4ea4\u4e92\u7ea7\u6279\u5224\u6027\u601d\u8003\u663e\u8457\u63d0\u9ad8\u4e86\u5
b89\u5168\u6027\u7387\u3002|\n", "2408.11006": "|**2024-08-20**|**While GitHub Copilot Excels at Coding, Does It Ensure Responsible Output?**|Wen Cheng et.al.|[2408.11006](http://arxiv.org/abs/2408.11006)|**[link](https://github.com/sensente/security-attacks-on-lccts)**|**\u5feb\u901f\u53d1\u5c55\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u4ee3\u7801\u8865\u5168\u80fd\u529b\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\uff0c\u50ac\u751f\u4e86\u65b0\u4e00\u4ee3\u57fa\u4e8eLLM\u7684\u4ee3\u7801\u8865\u5168\u5de5\u5177\uff08LCCT\uff09\u3002\u4e0e\u901a\u7528LLM\u4e0d\u540c\uff0c\u8fd9\u4e9b\u5de5\u5177\u5177\u6709\u72ec\u7279\u7684\u64cd\u4f5c\u6d41\u7a0b\uff0c\u6574\u5408\u591a\u79cd\u4fe1\u606f\u6e90\u4f5c\u4e3a\u8f93\u5165\uff0c\u5e76\u4f18\u5148\u8003\u8651\u4ee3\u7801\u5efa\u8bae\u800c\u975e\u81ea\u7136\u8bed\u8a00\u4ea4\u4e92\uff0c\u8fd9\u5f15\u5165\u4e86\u7279\u5b9a\u7684\u5b89\u5168\u6311\u6218\u3002\u6b64\u5916\uff0cLCCT\u901a\u5e38\u4f9d\u8d56\u4e8e\u4e13\u6709\u4ee3\u7801\u6570\u636e\u96c6\u8fdb\u884c\u8bad\u7ec3\uff0c\u5f15\u53d1\u4e86\u5173\u4e8e\u654f\u611f\u6570\u636e\u6cc4\u9732\u7684\u62c5\u5fe7\u3002\u672c\u6587\u5229\u7528LCCT\u7684\u72ec\u7279\u7279\u6027\uff0c\u5f00\u53d1\u4e86\u9488\u5bf9\u4e24\u79cd\u5173\u952e\u5b89\u5168\u98ce\u9669\u7684\u9488\u5bf9\u6027\u653b\u51fb\u65b9\u6cd5\uff1a\u8d8a\u72f1\u653b\u51fb\u548c\u8bad\u7ec3\u6570\u636e\u63d0\u53d6\u653b\u51fb\u3002 \u5b9e\u9a8c\u7ed3\u679c\u63ed\u793a\u4e86LCCT\u4e2d\u5b58\u5728\u7684\u91cd\u5927\u6f0f\u6d1e\uff0c\u5305\u62ec\u5728GitHub Copilot\u4e0a\u768499.4%\u6210\u529f\u8d8a\u72f1\u653b\u51fb\u7387\uff0c\u5728Amazon Q\u4e0a\u768446.3%\u6210\u529f\u7387\u3002\u6211\u4eec\u8fd8\u6210\u529f\u4eceGitHub 
Copilot\u4e2d\u63d0\u53d6\u4e86\u654f\u611f\u7528\u6237\u6570\u636e\uff0c\u5305\u62ec54\u4e2a\u771f\u5b9e\u7535\u5b50\u90ae\u4ef6\u5730\u5740\u548c314\u4e2a\u4e0eGitHub\u7528\u6237\u540d\u5173\u8054\u7684\u7269\u7406\u5730\u5740\u3002\u7814\u7a76\u8fd8\u8868\u660e\uff0c\u8fd9\u4e9b\u57fa\u4e8e\u4ee3\u7801\u7684\u653b\u51fb\u65b9\u6cd5\u5bf9\u901a\u7528LLM\uff08\u5982GPT\u7cfb\u5217\uff09\u540c\u6837\u6709\u6548\uff0c\u7a81\u663e\u4e86\u73b0\u4ee3LLM\u5904\u7406\u4ee3\u7801\u65f6\u5b58\u5728\u7684\u66f4\u5e7f\u6cdb\u5b89\u5168\u95ee\u9898\u3002\u8fd9\u4e9b\u53d1\u73b0\u5f3a\u8c03\u4e86LCCT\u9762\u4e34\u7684\u5173\u952e\u5b89\u5168\u6311\u6218\uff0c\u5e76\u63d0\u51fa\u4e86\u52a0\u5f3a\u5176\u5b89\u5168\u6846\u67b6\u7684\u91cd\u8981\u65b9\u5411\u3002 \u4e3a\u4e86\u9a8c\u8bc1\u6211\u4eec\u7684\u7814\u7a76\u6210\u679c\uff0c\u6211\u4eec\u63d0\u4f9b\u4e86\u76f8\u5173\u4ee3\u7801\u793a\u4f8b\u548c\u653b\u51fb\u6837\u672c\uff0c\u5b83\u4eec\u53ef\u4ecehttps://github.com/Sensente/Security-Attacks-on-LCCTs\u83b7\u53d6\u3002**|\n", "2408.10995": "|**2024-08-20**|**CTP-LLM: Clinical Trial Phase Transition Prediction Using Large Language Models**|Michael Reinisch 
et.al.|[2408.10995](http://arxiv.org/abs/2408.10995)|null|\u65b0\u533b\u7597\u6cbb\u7597\u65b9\u6cd5\u7684\u5f00\u53d1\u9700\u8981\u591a\u4e2a\u4e34\u5e8a\u8bd5\u9a8c\u9636\u6bb5\u3002\u5c3d\u7ba1\u5c06\u836f\u7269\u63a8\u5411\u5e02\u573a\u7684\u6210\u672c\u9ad8\u6602\u4e14\u5177\u6709\u6311\u6218\u6027\uff0c\u4f46\u53ea\u6709\u4e0d\u523020%\u7684\u836f\u7269\u80fd\u4ece\u7b2c\u4e00\u9636\u6bb5\u8fc7\u6e21\u5230\u6700\u540e\u7684\u6279\u51c6\u3002\u8fd1\u671f\u7684\u7814\u7a76\u6587\u732e\u8868\u660e\uff0c\u8bd5\u9a8c\u65b9\u6848\u7684\u8bbe\u8ba1\u5bf9\u8bd5\u9a8c\u8868\u73b0\u6709\u7740\u663e\u8457\u5f71\u54cd\u3002\u6211\u4eec\u7814\u7a76\u4e86\u4e34\u5e8a\u8bd5\u9a8c\u7ed3\u679c\u9884\u6d4b\uff08CTOP\uff09\uff0c\u65e8\u5728\u901a\u8fc7\u5229\u7528\u8bd5\u9a8c\u8bbe\u8ba1\u6587\u4ef6\u81ea\u52a8\u9884\u6d4b\u4e0d\u540c\u9636\u6bb5\u7684\u8f6c\u6362\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u9996\u4e2a\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684CTOP\u6a21\u578b\u2014\u2014CTP-LLM\u3002\u6211\u4eec\u8fd8\u5f15\u5165\u4e86\u4e00\u4e2a\u540d\u4e3aPhaseTransition\uff08PT\uff09\u7684\u6570\u636e\u96c6\uff0c\u8be5\u6570\u636e\u96c6\u6839\u636e\u8bd5\u9a8c\u5728\u76d1\u7ba1\u8fc7\u7a0b\u4e2d\u7684\u8fdb\u5c55\u8fdb\u884c\u6807\u8bb0\uff0c\u5e76\u4f5c\u4e3aCTOP\u8bc4\u4f30\u7684\u6807\u51c6\u57fa\u51c6\u3002 
\u6211\u4eec\u7684\u7cbe\u7ec6\u8c03\u53c2GPT-3.5\u4e3a\u57fa\u7840\u7684\u6a21\u578b\uff08CTP-LLM\uff09\u80fd\u591f\u901a\u8fc7\u5206\u6790\u539f\u59cb\u534f\u8bae\u6587\u672c\u6765\u9884\u6d4b\u4e34\u5e8a\u8bd5\u9a8c\u9636\u6bb5\u7684\u8f6c\u6362\uff0c\u65e0\u9700\u4f9d\u8d56\u4eba\u7c7b\u9009\u62e9\u7684\u7279\u5f81\u3002CTP-LLM\u5728\u6240\u6709\u9636\u6bb5\u7684\u9884\u6d4b\u4e2d\u8fbe\u5230\u4e8667%\u7684\u51c6\u786e\u7387\uff0c\u5728\u9884\u6d4b\u4ece\u7b2c\u4e09\u9636\u6bb5\u5230\u6700\u7ec8\u6279\u51c6\u7684\u8f6c\u6362\u65f6\uff0c\u51c6\u786e\u7387\u66f4\u8fbe\u5230\u4e8675%\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u7ed3\u679c\u5f3a\u8c03\u4e86LLM\u9a71\u52a8\u5e94\u7528\u5728\u9884\u6d4b\u4e34\u5e8a\u8bd5\u9a8c\u7ed3\u679c\u548c\u8bc4\u4f30\u8bd5\u9a8c\u8bbe\u8ba1\u65b9\u9762\u7684\u6f5c\u529b\u3002|\n", "2408.10947": "|**2024-08-20**|**Dr.Academy: A Benchmark for Evaluating Questioning Capability in Education for Large Language Models**|Yuyan Chen et.al.|[2408.10947](http://arxiv.org/abs/2408.10947)|null|\u6559\u5e08\u5728\u4f20\u6388\u77e5\u8bc6\u548c\u5f15\u5bfc\u5b66\u4e60\u8005\u65b9\u9762\u53d1\u6325\u7740\u91cd\u8981\u4f5c\u7528\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4f5c\u4e3a\u6f5c\u5728\u6559\u80b2\u8005\u7684\u89d2\u8272\u6b63\u5728\u6210\u4e3a\u4e00\u4e2a\u91cd\u8981\u7814\u7a76\u9886\u57df\u3002\u8ba4\u8bc6\u5230LLMs\u751f\u6210\u6559\u80b2\u5185\u5bb9\u7684\u80fd\u529b\u53ef\u4ee5\u63a8\u52a8\u81ea\u52a8\u5316\u548c\u4e2a\u6027\u5316\u5b66\u4e60\u7684\u8fdb\u5c55\u3002\u867d\u7136LLMs\u5728\u7406\u89e3\u529b\u548c\u89e3\u51b3\u95ee\u9898\u80fd\u529b\u65b9\u9762\u7684\u6d4b\u8bd5\u5df2\u7ecf\u8fdb\u884c\uff0c\u4f46\u5b83\u4eec\u5728\u6559\u5b66\u65b9\u9762\u7684\u6f5c\u529b\u4ecd\u9c9c\u4e3a\u4eba\u77e5\u3002\u5728\u6559\u5b66\u4e2d\uff0c\u63d0\u95ee\u662f\u4e00\u9879\u5173\u952e\u6280\u80fd\uff0c\u80fd\u591f\u6307\u5bfc\u5b66\u751f\u5206\u6790\u3001\u8bc4\u4f30\u5e76\u7efc\u5408\u6838\u5fc3\u6982\u5ff5\u548c\u539f\u7406\u3002\
u56e0\u6b64\uff0c\u6211\u4eec\u7684\u7814\u7a76\u5f15\u5165\u4e86\u4e00\u4e2a\u57fa\u51c6\u6765\u8bc4\u4f30\u6559\u80b2\u4e2dLLMs\u7684\u63d0\u95ee\u80fd\u529b\uff0c\u901a\u8fc7\u8bc4\u4f30\u5b83\u4eec\u751f\u6210\u7684\u6559\u80b2\u95ee\u9898\uff0c\u5229\u7528\u5b89\u5fb7\u68ee\u548c\u514b\u62c9\u592b\u970d\u592b\u7684\u5206\u7c7b\u6cd5\u8986\u76d6\u4e00\u822c\u3001\u5355\u5b66\u79d1\u548c\u8de8\u5b66\u79d1\u9886\u57df\u3002\u6211\u4eec\u4ece\u5c06LLMs\u89c6\u4e3a\u5b66\u4e60\u8005\u8f6c\u5411\u5c06\u5176\u89c6\u4e3a\u6559\u80b2\u8005\uff0c\u901a\u8fc7\u8bc4\u4f30\u5b83\u4eec\u751f\u6210\u95ee\u9898\u7684\u80fd\u529b\u6765\u8bc4\u4f30\u5b83\u4eec\u7684\u6559\u5b66\u80fd\u529b\u3002\u6211\u4eec\u5e94\u7528\u4e86\u56db\u4e2a\u6307\u6807\uff0c\u5305\u62ec\u76f8\u5173\u6027\u3001\u8986\u76d6\u7387\u3001\u4ee3\u8868\u6027\u4ee5\u53ca\u4e00\u81f4\u6027\uff0c\u6765\u8bc4\u4f30LLMs\u8f93\u51fa\u7684\u6559\u80b2\u8d28\u91cf\u3002\u6211\u4eec\u7684\u7ed3\u679c\u8868\u660e\uff0cGPT-4\u5728\u6559\u6388\u4e00\u822c\u3001\u4eba\u6587\u5b66\u79d1\u548c\u79d1\u5b66\u8bfe\u7a0b\u65b9\u9762\u663e\u793a\u51fa\u663e\u8457\u6f5c\u529b\uff1bClaude2\u4f3c\u4e4e\u66f4\u9002\u5408\u62c5\u4efb\u8de8\u5b66\u79d1\u6559\u5e08\u3002\u6b64\u5916\uff0c\u81ea\u52a8\u8bc4\u5206\u4e0e\u4eba\u7c7b\u89c2\u70b9\u4e00\u81f4\u3002|\n", "2408.10946": "|**2024-08-20**|**Large Language Model Driven Recommendation**|Anton Korikov et.al.|[2408.10946](http://arxiv.org/abs/2408.10946)|null|### \u6458\u8981 
\u672c\u6587\u63a2\u8ba8\u4e86\u5229\u7528\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u6784\u5efa\u4e2a\u6027\u5316\u63a8\u8350\u7cfb\u7edf\u7684\u65b0\u673a\u9047\u3002\u5728\u4e4b\u524d\u7684\u7ae0\u8282\u4e2d\uff0c\u6211\u4eec\u5173\u6ce8\u7684\u662f\u57fa\u4e8e\u6807\u51c6\u5316\u3001\u975e\u8a00\u8bed\u7528\u6237\u53cd\u9988\u7684\u63a8\u8350\u7cfb\u7edf\uff0c\u5982\u8d2d\u4e70\u3001\u89c2\u770b\u548c\u70b9\u51fb\u7b49\u884c\u4e3a\u3002\u7136\u800c\uff0c\u968f\u7740LLM\u80fd\u529b\u7684\u589e\u5f3a\uff0c\u5b83\u4eec\u80fd\u591f\u8fdb\u884c\u901a\u7528\u81ea\u7136\u8bed\u8a00\u63a8\u7406\uff0c\u8fd9\u4e3a\u4f7f\u7528\u81ea\u7136\u8bed\u8a00\u4ea4\u4e92\u6765\u6784\u5efa\u9ad8\u5ea6\u4e2a\u6027\u5316\u7684\u63a8\u8350\u7cfb\u7edf\u5f00\u8f9f\u4e86\u65b0\u9014\u5f84\u3002 \u672c\u7ae0\u9996\u5148\u901a\u8fc7\u5206\u7c7b\u7684\u65b9\u5f0f\u4ecb\u7ecd\u5173\u952e\u7684\u6570\u636e\u6e90\uff0c\u6db5\u76d6\u5546\u54c1\u63cf\u8ff0\u3001\u7528\u6237\u4e0e\u7cfb\u7edf\u7684\u4ea4\u4e92\u4ee5\u53ca\u7528\u6237\u6863\u6848\u3002\u63a5\u7740\uff0c\u8be6\u7ec6\u8ba8\u8bba\u4e86\u57fa\u4e8eLLM\u7684\u63a8\u8350\u6280\u672f\uff0c\u5305\u62ec\u8c03\u4f18\u548c\u672a\u8c03\u4f18\u60c5\u51b5\u4e0b\u7684\u7f16\u7801\u5668\u4ec5\u4f7f\u7528\u548c\u81ea\u56de\u5f52\u63a8\u8350\u65b9\u6cd5\u3002\u7136\u540e\uff0c\u8f6c\u5411\u591a\u6a21\u5757\u63a8\u8350\u67b6\u6784\uff0c\u5176\u4e2dLLM\u4e0e\u5176\u4ed6\u7ec4\u4ef6\u5982\u68c0\u7d22\u5668\u548c\u63a8\u8350\u7cfb\u7edf\u5728\u591a\u9636\u6bb5\u7ba1\u9053\u4e2d\u534f\u4f5c\u3002\u6700\u540e\uff0c\u4ecb\u7ecd\u4e86\u5bf9\u8bdd\u5f0f\u63a8\u8350\u7cfb\u7edf\uff08CRS\uff09\uff0c\u5728\u8fd9\u4e9b\u7cfb\u7edf\u4e2d\uff0cLLM\u4fc3\u8fdb\u591a\u8f6e\u5bf9\u8bdd\uff0c\u6bcf\u4e00\u8f6e\u4e0d\u4ec5\u63d0\u4f9b\u63a8\u8350\uff0c\u8fd8\u63d0\u4f9b\u4e86\u4e0e\u7528\u6237\u7684\u4e92\u52a8\uff0c\u7528\u4e8e\u504f\u597d\u63d0\u53d6\u3001\u6279\u8bc4\u548c\u95ee\u7b54\u3002 ### \u7ffb\u8bd1 
\u672c\u6587\u63a2\u8ba8\u4e86\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u6784\u5efa\u4e2a\u6027\u5316\u63a8\u8350\u7cfb\u7edf\u65b9\u9762\u7684\u65b0\u578b\u5e94\u7528\u3002\u6b64\u524d\u7ae0\u8282\u4e3b\u8981\u5173\u6ce8\u57fa\u4e8e\u6807\u51c6\u3001\u975e\u8a00\u8bed\u7528\u6237\u53cd\u9988\u7684\u63a8\u8350\u7cfb\u7edf\uff0c\u4f8b\u5982\u8d2d\u4e70\u3001\u6d4f\u89c8\u548c\u70b9\u51fb\u7b49\u884c\u4e3a\u3002\u7136\u800c\uff0c\u968f\u7740LLM\u80fd\u529b\u7684\u63d0\u5347\uff0c\u5b83\u4eec\u5177\u5907\u4e86\u901a\u7528\u81ea\u7136\u8bed\u8a00\u63a8\u7406\u7684\u80fd\u529b\uff0c\u4ece\u800c\u6253\u5f00\u4e86\u4f7f\u7528\u81ea\u7136\u8bed\u8a00\u4ea4\u4e92\u6784\u5efa\u9ad8\u5ea6\u5b9a\u5236\u5316\u63a8\u8350\u7cfb\u7edf\u7684\u53ef\u80fd\u6027\u3002 \u672c\u7ae0\u9996\u5148\u901a\u8fc7\u5206\u7c7b\u65b9\u5f0f\u6982\u8ff0\u4e86\u5173\u952e\u6570\u636e\u6e90\uff0c\u5305\u62ec\u5546\u54c1\u63cf\u8ff0\u3001\u7528\u6237\u4e0e\u7cfb\u7edf\u4ea4\u4e92\u4ee5\u53ca\u7528\u6237\u6863\u6848\u3002\u968f\u540e\uff0c\u6df1\u5165\u63a2\u8ba8\u4e86\u57fa\u4e8eLLM\u7684\u63a8\u8350\u6280\u672f\uff0c\u6db5\u76d6\u4e86\u7f16\u7801\u5668\u4ec5\u4f7f\u7528\u548c\u81ea\u56de\u5f52\u63a8\u8350\u65b9\u6cd5\uff0c\u65e0\u8bba\u662f\u5728\u8c03\u4f18\u8fd8\u662f\u672a\u8c03\u4f18\u72b6\u6001\u4e0b\u3002\u63a5\u7740\uff0c\u8ba8\u8bba\u4e86\u591a\u6a21\u5757\u63a8\u8350\u67b6\u6784\uff0c\u5176\u4e2dLLM\u4e0e\u5176\u4ed6\u7ec4\u4ef6\u5982\u68c0\u7d22\u5668\u548c\u63a8\u8350\u7cfb\u7edf\u5728\u591a\u9636\u6bb5\u6d41\u7a0b\u4e2d\u534f\u540c\u5de5\u4f5c\u3002\u6700\u540e\uff0c\u4ecb\u7ecd\u4e86\u5bf9\u8bdd\u5f0f\u63a8\u8350\u7cfb\u7edf\uff08CRS\uff09\uff0c\u5728\u8fd9\u4e9b\u7cfb\u7edf\u4e2d\uff0cLLM\u652f\u6301\u591a\u8f6e\u5bf9\u8bdd\uff0c\u6bcf\u4e00\u8f6e\u4e0d\u4ec5\u7528\u4e8e\u751f\u6210\u63a8\u8350\uff0c\u8fd8\u80fd\u4e0e\u7528\u6237\u8fdb\u884c\u4e92\u52a8\uff0c\u8fdb\u884c\u504f\u597d\u6536\u96c6\u3001\u8bc4\u4ef7\u548c\u95ee\u7b54\u3002|\n", "2408.11813": "|**2024-08-21**|**SEA: 
Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs**|Yuanyang Yin et.al.|[2408.11813](http://arxiv.org/abs/2408.11813)|null|\u8fd1\u671f\uff0c\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u5728\u611f\u77e5\u548c\u63a8\u7406\u80fd\u529b\u65b9\u9762\u5c55\u73b0\u51fa\u4e86\u60ca\u4eba\u7684\u8868\u73b0\uff0c\u5b83\u4eec\u901a\u5e38\u7531\u89c6\u89c9\u7f16\u7801\u5668\u3001\u9002\u914d\u5668\u548c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7ec4\u6210\u3002\u9002\u914d\u5668\u4f5c\u4e3a\u89c6\u89c9\u4e0e\u8bed\u8a00\u7ec4\u4ef6\u4e4b\u95f4\u7684\u5173\u952e\u6865\u6881\u3002\u7136\u800c\uff0c\u901a\u8fc7\u56fe\u50cf\u7ea7\u76d1\u7763\u8bad\u7ec3\u9002\u914d\u5668\u5f80\u5f80\u4f1a\u5bfc\u81f4\u663e\u8457\u7684\u5bf9\u9f50\u504f\u5dee\uff0c\u8fd9\u4f1a\u524a\u5f31LLM\u7684\u80fd\u529b\u5e76\u9650\u5236\u591a\u6a21\u6001LLM\u7684\u6f5c\u529b\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u76d1\u7763\u5d4c\u5165\u5bf9\u9f50\uff08SEA\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u57fa\u4e8e\u89c6\u89c9\u8bed\u8a00\u9884\u8bad\u7ec3\u6a21\u578b\uff08\u5982CLIP\uff09\u7684\u5206\u8bcd\u7ea7\u5bf9\u9f50\u65b9\u6cd5\uff0c\u901a\u8fc7\u5bf9\u6bd4\u5b66\u4e60\u6765\u8c03\u6574\u89c6\u89c9\u5206\u8bcd\u4e0eLLM\u5d4c\u5165\u7a7a\u95f4\u7684\u4e00\u81f4\u6027\u3002\u8fd9\u79cd\u65b9\u6cd5\u786e\u4fdd\u4e86\u89c6\u89c9\u548c\u8bed\u8a00\u8868\u793a\u4e4b\u95f4\u66f4\u534f\u8c03\u7684\u6574\u5408\uff0c\u4ece\u800c\u589e\u5f3a\u591a\u6a21\u6001LLM\u7684\u6027\u80fd\u548c\u53ef\u89e3\u91ca\u6027\uff0c\u540c\u65f6\u4fdd\u7559\u5176\u56fa\u6709\u7279\u6027\u3002\u5e7f\u6cdb\u5b9e\u9a8c\u8868\u660e\uff0cSEA\u6709\u6548\u5730\u63d0\u9ad8\u4e86MLLMs\uff0c\u7279\u522b\u662f\u5bf9\u4e8e\u8f83\u5c0f\u7684\u6a21\u578b\uff0c\u65e0\u9700\u989d\u5916\u7684\u6570\u636e\u6216\u63a8\u7406\u8ba1\u7b97\u3002\u6b64\u5916\uff0cSEA\u4e5f\u4e3a\u5f00\u53d1\u66f4\u901a\u7528\u548c\u9002\u5e94\u6027\u5f3a\u7684
\u89e3\u51b3\u65b9\u6848\u4ee5\u589e\u5f3a\u591a\u6a21\u6001\u7cfb\u7edf\u5960\u5b9a\u4e86\u57fa\u7840\u3002|\n", "2408.11801": "|**2024-08-21**|**Story3D-Agent: Exploring 3D Storytelling Visualization with Large Language Models**|Yuzhou Huang et.al.|[2408.11801](http://arxiv.org/abs/2408.11801)|null|\u4f20\u7edf\u89c6\u89c9\u53d9\u4e8b\u590d\u6742\uff0c\u9700\u8981\u4e13\u4e1a\u77e5\u8bc6\u548c\u5927\u91cf\u8d44\u6e90\uff0c\u4f46\u5f80\u5f80\u53d7\u9650\u4e8e\u4eba\u7c7b\u7684\u521b\u9020\u529b\u4e0e\u521b\u4f5c\u7cbe\u5ea6\u3002\u5c3d\u7ba1\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u589e\u5f3a\u4e86\u89c6\u89c9\u53d9\u4e8b\u80fd\u529b\uff0c\u73b0\u6709\u65b9\u6cd5\u5f80\u5f80\u5c40\u9650\u4e8e\u4e8c\u7ef4\u89c6\u89c9\u6548\u679c\u6216\u901a\u8fc7\u52a8\u4f5c\u5408\u6210\u548c\u884c\u4e3a\u6a21\u62df\u7b80\u5316\u6545\u4e8b\uff0c\u672a\u80fd\u751f\u6210\u5168\u9762\u3001\u591a\u7ef4\u7684\u53d9\u4e8b\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51faStory3D-Agent\uff0c\u4e00\u79cd\u521b\u65b0\u7684\u65b9\u6cd5\uff0c\u5229\u7528LLM\u7684\u80fd\u529b\u5c06\u63d0\u4f9b\u7684\u53d9\u4e8b\u8f6c\u5316\u4e3a\u4e09\u7ef4\u6e32\u67d3\u53ef\u89c6\u5316\u3002\u901a\u8fc7\u96c6\u6210\u7a0b\u5e8f\u5efa\u6a21\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u80fd\u591f\u7cbe\u786e\u63a7\u5236\u591a\u89d2\u8272\u7684\u52a8\u4f5c\u548c\u52a8\u6001\uff0c\u4ee5\u53ca\u5404\u79cd\u88c5\u9970\u5143\u7d20\uff0c\u786e\u4fdd\u957f\u671f\u548c\u52a8\u6001\u7684\u4e09\u7ef4\u8868\u73b0\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u652f\u6301\u901a\u8fc7\u903b\u8f91\u63a8\u7406\u8fdb\u884c\u53d9\u4e8b\u6269\u5c55\uff0c\u786e\u4fdd\u751f\u6210\u7684\u5185\u5bb9\u4e0e\u73b0\u6709\u6761\u4ef6\u4fdd\u6301\u4e00\u81f4\u3002\u6211\u4eec\u5bf9Story3D-Agent\u8fdb\u884c\u4e86\u8be6\u5c3d\u7684\u8bc4\u4f30\uff0c\u4ee5\u9a8c\u8bc1\u5176\u6709\u6548\u6027\uff0c\u5e76\u63d0\u4f9b\u4e86\u57fa\u672c\u6846\u67b6\u6765\u63a8\u52a8\u4e09\u7ef4\u6545\u4e8b\u8868\u793a\u7684\u53d1\u5c55\u3002|\n", 
"2408.11800": "|**2024-08-21**|**PermitQA: A Benchmark for Retrieval Augmented Generation in Wind Siting and Permitting domain**|Rounak Meyur et.al.|[2408.11800](http://arxiv.org/abs/2408.11800)|null|\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u548c\u6587\u672c\u751f\u6210\u9886\u57df\u5feb\u901f\u53d1\u5c55\u7684\u80cc\u666f\u4e0b\uff0c\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u7684\u5174\u8d77\u4e3a\u901a\u8fc7\u5229\u7528\u7528\u6237\u6307\u5b9a\u6570\u636e\u5e93\u4e2d\u7684\u4fe1\u606f\u6765\u63d0\u9ad8\u751f\u6210\u6587\u672c\u7684\u8d28\u91cf\u548c\u53ef\u9760\u6027\u63d0\u4f9b\u4e86\u6709\u524d\u666f\u7684\u9014\u5f84\u3002\u57fa\u51c6\u6d4b\u8bd5\u5bf9\u4e8e\u8bc4\u4f30\u548c\u6bd4\u8f83\u4e0d\u540cRAG\u914d\u7f6e\u5728\u68c0\u7d22\u5668\u548c\u751f\u6210\u5668\u65b9\u9762\u7684\u6027\u80fd\u81f3\u5173\u91cd\u8981\uff0c\u63d0\u4f9b\u4e86\u8fd9\u4e9b\u914d\u7f6e\u7684\u6709\u6548\u6027\u3001\u53ef\u6269\u5c55\u6027\u548c\u7279\u5b9a\u9886\u57df\u548c\u5e94\u7528\u7684\u9002\u7528\u6027\u7684\u6d1e\u5bdf\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u5168\u9762\u6846\u67b6\uff0c\u7528\u4e8e\u751f\u6210\u4e0e\u7279\u5b9a\u9886\u57df\u76f8\u5173\u7684RAG\u57fa\u51c6\u3002\u8be5\u6846\u67b6\u57fa\u4e8e\u81ea\u52a8\u95ee\u9898\u7b54\u6848\u751f\u6210\u4e0e\u4eba\u7c7b\uff08\u9886\u57df\u4e13\u5bb6\uff09-\u4eba\u5de5\u667a\u80fd\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u534f\u4f5c\u7684\u81ea\u52a8\u5316\u8fc7\u7a0b\u3002\u4ee5\u6848\u4f8b\u7814\u7a76\u7684\u5f62\u5f0f\uff0c\u6211\u4eec\u901a\u8fc7\u5f15\u5165PermitQA\u4f5c\u4e3a\u98ce\u573a\u9009\u5740\u548c\u8bb8\u53ef\u9886\u57df\u7684\u9996\u4e2a\u57fa\u51c6\u8fdb\u884c\u4e86\u6846\u67b6\u5c55\u793a\uff0c\u8be5\u57fa\u51c6\u5305\u542b\u4e86\u4e0e\u98ce\u80fd\u9879\u76ee\u73af\u5883\u5f71\u54cd\u76f8\u5173\u7684\u591a\u7bc7\u79d1\u5b66\u6587\u6863/\u62a5\u544a\u3002 
\u6211\u4eec\u7684\u6846\u67b6\u7cfb\u7edf\u5730\u4f7f\u7528\u591a\u79cd\u6307\u6807\u548c\u4e0d\u540c\u590d\u6742\u5ea6\u7ea7\u522b\u7684\u95ee\u9898\u7c7b\u578b\u6765\u8bc4\u4f30RAG\u6027\u80fd\u3002\u6211\u4eec\u8fd8\u5c55\u793a\u4e86\u4e0d\u540c\u6a21\u578b\u5728\u6211\u4eec\u7684\u57fa\u51c6\u4e0a\u7684\u8868\u73b0\u3002|\n", "2408.11795": "|**2024-08-21**|**EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model**|Feipeng Ma et.al.|[2408.11795](http://arxiv.org/abs/2408.11795)|null|\u5728\u591a\u6a21\u6001\u7814\u7a76\u9886\u57df\uff0c\u4f17\u591a\u7814\u7a76\u5229\u7528\u5927\u91cf\u7684\u56fe\u50cf-\u6587\u672c\u5bf9\u8fdb\u884c\u6a21\u6001\u5bf9\u9f50\u5b66\u4e60\uff0c\u5c06\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08Large Language Models, LLMs\uff09\u8f6c\u5316\u4e3a\u591a\u6a21\u6001LLMs\uff0c\u5e76\u5728\u5404\u79cd\u89c6\u89c9\u8bed\u8a00\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\u3002\u76ee\u524d\u4e3b\u8981\u7684\u5b9e\u73b0\u65b9\u6cd5\u5206\u4e3a\u4e24\u7c7b\uff1a\u81ea\u6ce8\u610f\u529b\u57fa\u548c\u4ea4\u53c9\u6ce8\u610f\u529b\u57fa\u65b9\u6cd5\u3002\u81ea\u6ce8\u610f\u529b\u57fa\u65b9\u6cd5\u56e0\u5176\u7b80\u5355\u7684\u591a\u5c42\u611f\u77e5\u673a\uff08MLP\uff09\u67b6\u6784\u800c\u5177\u6709\u8f83\u9ad8\u7684\u6570\u636e\u6548\u7387\uff0c\u4f46\u5728\u8ba1\u7b97\u6548\u7387\u65b9\u9762\u5374\u76f8\u5bf9\u8f83\u4f4e\uff0c\u539f\u56e0\u5728\u4e8e\u5176\u9700\u8981\u5c06\u89c6\u89c9\u548c\u6587\u672c\u4ee4\u724c\u4f5c\u4e3a\u8f93\u5165\u8fdb\u884c\u8fde\u63a5\u3002\u800c\u4ea4\u53c9\u6ce8\u610f\u529b\u57fa\u65b9\u6cd5\u867d\u7136\u5728\u989d\u5916\u7684\u5b66\u4e60\u53c2\u6570\u65b9\u9762\u4e0d\u5982\u81ea\u6ce8\u610f\u529b\u57fa\u65b9\u6cd5\u9ad8\u6548\uff0c\u4f46\u7531\u4e8e\u907f\u514d\u4e86\u4e3aLLM\u63d0\u4f9b\u8fc7\u957f\u5e8f\u5217\u8f93\u5165\uff0c\u56e0\u6b64\u5728\u8ba1\u7b97\u6548\u7387\u65b9\u9762\u8868\u73b0\u66f4\u9ad8\u3002\u4e3a\u4e86\u5e73\u8861\u8fd9\u4e9b\u6743\u8861\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u657
0\u636e\u9ad8\u6548\u4e14\u8ba1\u7b97\u9ad8\u6548\u7684\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08EE-MLLM\uff09\u3002EE-MLLM\u5728\u4e0d\u5f15\u5165\u989d\u5916\u6a21\u5757\u6216\u53ef\u5b66\u4e60\u53c2\u6570\u7684\u60c5\u51b5\u4e0b\uff0c\u5b9e\u73b0\u4e86\u6570\u636e\u548c\u8ba1\u7b97\u6548\u7387\u7684\u63d0\u5347\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u5bf9\u591a\u6a21\u6001LLM\u4e2d\u7684\u539f\u59cb\u81ea\u6ce8\u610f\u529b\u673a\u5236\u8fdb\u884c\u4e86\u6539\u8fdb\uff0c\u5f15\u5165\u4e86\u4e00\u79cd\u590d\u5408\u6ce8\u610f\u529b\u673a\u5236\u3002\u8be5\u673a\u5236\u6709\u4e24\u4e2a\u5173\u952e\u7279\u6027\uff1a1\uff09\u6d88\u9664\u89c6\u89c9\u4ee4\u724c\u5185\u90e8\u7684\u81ea\u6ce8\u610f\u529b\u8ba1\u7b97\uff0c\u4ee5\u5b9e\u73b0\u8ba1\u7b97\u6548\u7387\uff1b2\uff09\u91cd\u7528LLM\u6bcf\u4e00\u5c42\u7684\u6743\u91cd\uff0c\u4ee5\u4fc3\u8fdb\u89c6\u89c9\u4e0e\u8bed\u8a00\u4e4b\u95f4\u7684\u6709\u6548\u6a21\u6001\u5bf9\u9f50\uff0c\u4ece\u800c\u5b9e\u73b0\u6570\u636e\u6548\u7387\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cEE-MLLM\u5728\u5305\u62ecMMBench\u3001SeedBench\u7b49\u901a\u7528\u6027\u6570\u636e\u96c6\u4ee5\u53caTextVQA\u3001DocVQA\u7b49\u7cbe\u7ec6\u7c92\u5ea6\u4efb\u52a1\u5728\u5185\u7684\u591a\u79cd\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u90fd\u5c55\u73b0\u51fa\u663e\u8457\u7684\u6709\u6548\u6027\u3002|\n", "2408.11793": "|**2024-08-21**|**Leveraging Chemistry Foundation Models to Facilitate Structure Focused Retrieval Augmented Generation in Multi-Agent Workflows for Catalyst and Materials Design**|Nathaniel H. 
Park et.al.|[2408.11793](http://arxiv.org/abs/2408.11793)|null|\u5206\u5b50\u5c5e\u6027\u9884\u6d4b\u548c\u901a\u8fc7\u6df1\u5ea6\u5b66\u4e60\u6a21\u578b\u8fdb\u884c\u751f\u6210\u8bbe\u8ba1\u662f\u7814\u7a76\u7684\u70ed\u70b9\u9886\u57df\uff0c\u8fd9\u4e3b\u8981\u5f52\u56e0\u4e8e\u5b83\u5728\u52a0\u901f\u65b0\u6750\u6599\u5f00\u53d1\u65b9\u9762\u7684\u6f5c\u529b\u3002\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u548c\u7531LLM\u9a71\u52a8\u7684\u4ee3\u7406\u7cfb\u7edf\u7684\u51fa\u73b0\uff0c\u8fd9\u4e9b\u5de5\u4f5c\u6d41\u7a0b\u5f97\u5230\u4e86\u663e\u8457\u589e\u5f3a\uff0c\u8fd9\u4e9b\u7cfb\u7edf\u5229\u7528\u9884\u8bad\u7ec3\u6a21\u578b\u5728\u66f4\u590d\u6742\u7684\u7814\u7a76\u4efb\u52a1\u80cc\u666f\u4e0b\u8fdb\u884c\u9884\u6d4b\u3002\u5c3d\u7ba1\u6709\u6548\uff0c\u4f46\u5728\u6750\u6599\u8bbe\u8ba1\u4efb\u52a1\u4e2d\u7684\u4fe1\u606f\u68c0\u7d22\u65b9\u9762\uff0c\u4ee3\u7406\u7cfb\u7edf\u4ecd\u6709\u6539\u8fdb\u7a7a\u95f4\u3002\u6b64\u5916\uff0c\u5bf9\u9884\u6d4b\u6df1\u5ea6\u5b66\u4e60\u6a21\u578b\u7684\u66ff\u4ee3\u5e94\u7528\uff0c\u5982\u5229\u7528\u5b83\u4eec\u7684\u6f5c\u5728\u8868\u793a\u6765\u4fc3\u8fdb\u8de8\u6a21\u6001\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff0c\u5728\u7531LLM\u9a71\u52a8\u7684\u4ee3\u7406\u7cfb\u7edf\u4e2d\u5b9e\u73b0\u4efb\u52a1\u7279\u5b9a\u7684\u6750\u6599\u8bbe\u8ba1\uff0c\u8fd9\u4e00\u9886\u57df\u5c1a\u672a\u5f97\u5230\u63a2\u7d22\u3002 
\u5728\u6b64\uff0c\u6211\u4eec\u8bc1\u660e\u4e86\u5927\u89c4\u6a21\u3001\u9884\u8bad\u7ec3\u7684\u5316\u5b66\u57fa\u7840\u6a21\u578b\u53ef\u4ee5\u4f5c\u4e3a\u4f7f\u5316\u5b66\u4fe1\u606f\u68c0\u7d22\u8bed\u4e49\u5316\u7684\u57fa\u7840\uff0c\u9002\u7528\u4e8e\u5c0f\u5206\u5b50\u3001\u590d\u6742\u805a\u5408\u7269\u6750\u6599\u548c\u53cd\u5e94\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u5316\u5b66\u57fa\u7840\u6a21\u578b\u4e0e\u56fe\u50cf\u6a21\u578b\uff08\u5982OpenCLIP\uff09\u76f8\u7ed3\u5408\uff0c\u80fd\u591f\u5b9e\u73b0\u8de8\u591a\u4e2a\u8868\u5f81\u6570\u636e\u57df\u7684\u524d\u6240\u672a\u6709\u7684\u67e5\u8be2\u548c\u4fe1\u606f\u68c0\u7d22\u3002\u6700\u540e\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u8fd9\u4e9b\u7cfb\u7edf\u5728\u591a\u4ee3\u7406\u7cfb\u7edf\u4e2d\u7684\u96c6\u6210\uff0c\u4ee5\u652f\u6301\u7ed3\u6784\u548c\u62d3\u6251\u4e3a\u57fa\u7840\u7684\u81ea\u7136\u8bed\u8a00\u67e5\u8be2\u548c\u4fe1\u606f\u68c0\u7d22\uff0c\u4ece\u800c\u4fc3\u8fdb\u590d\u6742\u7814\u7a76\u4efb\u52a1\u7684\u6267\u884c\u3002|\n", "2408.11791": "|**2024-08-21**|**Critique-out-Loud Reward Models**|Zachary Ankner 
et.al.|[2408.11791](http://arxiv.org/abs/2408.11791)|**[link](https://github.com/zankner/cloud)**|**\u4f20\u7edf\u7684\u5956\u52b1\u6a21\u578b\u5728\u4ece\u4eba\u7c7b\u53cd\u9988\u8fdb\u884c\u5f3a\u5316\u5b66\u4e60\uff08RLHF\uff09\u65f6\uff0c\u4ec5\u7528\u4e8e\u76f4\u63a5\u9884\u6d4b\u504f\u597d\u5206\u6570\uff0c\u800c\u4e0d\u5229\u7528\u5e95\u5c42\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u751f\u6210\u80fd\u529b\u3002\u8fd9\u9650\u5236\u4e86\u5956\u52b1\u6a21\u578b\u7684\u80fd\u529b\uff0c\u56e0\u4e3a\u5b83\u4eec\u5fc5\u987b\u901a\u8fc7\u5355\u4e00\u524d\u5411\u4f20\u9012\u6765\u9690\u5f0f\u5730\u63a8\u7406\u54cd\u5e94\u7684\u8d28\u91cf\uff0c\u5373\uff0c\u5fc5\u987b\u5728\u504f\u597d\u5efa\u6a21\u8fc7\u7a0b\u4e2d\u5b8c\u6210\u63a8\u7406\u3002\u4e3a\u4e86\u4f7f\u5956\u52b1\u6a21\u578b\u80fd\u591f\u663e\u5f0f\u5730\u63a8\u7406\u54cd\u5e94\u7684\u8d28\u91cf\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u201c\u53e3\u5934\u6279\u8bc4\u201d\uff08CLoud\uff09\u5956\u52b1\u6a21\u578b\u3002CLoud\u5956\u52b1\u6a21\u578b\u9996\u5148\u751f\u6210\u5bf9\u52a9\u624b\u54cd\u5e94\u7684\u81ea\u7136\u8bed\u8a00\u6279\u8bc4\uff0c\u7136\u540e\u4f7f\u7528\u8fd9\u4e9b\u6279\u8bc4\u6765\u9884\u6d4b\u54cd\u5e94\u8d28\u91cf\u7684\u6807\u91cf\u5956\u52b1\u3002 
\u6211\u4eec\u8bc1\u660e\u4e86\u5bf9\u4e8eLlama-3-8B\u548c70B\u57fa\u7840\u6a21\u578b\uff0cCLoud\u5956\u52b1\u6a21\u578b\u7684\u6210\u529f\uff1a\u4e0e\u7ecf\u5178\u5956\u52b1\u6a21\u578b\u76f8\u6bd4\uff0cCLoud\u5956\u52b1\u6a21\u578b\u5206\u522b\u5728RewardBench\u4e0a\u63d0\u9ad8\u4e868B\u548c70B\u57fa\u7840\u6a21\u578b\u7684\u4e8c\u5143\u504f\u597d\u5206\u7c7b\u51c6\u786e\u73874.65\u548c5.84\u4e2a\u767e\u5206\u70b9\u3002\u6b64\u5916\uff0c\u5f53\u4f5c\u4e3aBest-of-N\u8bc4\u5206\u6a21\u578b\u4f7f\u7528\u65f6\uff0cCLoud\u5956\u52b1\u6a21\u578b\u5728ArenaHard\u4e0a\u7684\u80dc\u7387\u4e5f\u5b9e\u73b0\u4e86\u5e15\u7d2f\u6258\u6539\u8fdb\u3002\u6700\u540e\uff0c\u6211\u4eec\u63a2\u7d22\u4e86\u5982\u4f55\u5229\u7528CLoud\u5956\u52b1\u6a21\u578b\u7684\u52a8\u6001\u63a8\u7406\u8ba1\u7b97\u80fd\u529b\uff0c\u901a\u8fc7\u81ea\u6211\u4e00\u81f4\u6027\u89e3\u7801\u6765\u8fdb\u884c\u5956\u52b1\u9884\u6d4b\u3002 \u4ee5\u4e0a\u662f\u5173\u4e8e\u201c\u53e3\u5934\u6279\u8bc4\u201d\uff08CLoud\uff09\u5956\u52b1\u6a21\u578b\u7684\u6458\u8981\u7ffb\u8bd1\uff0c\u5b83\u5c55\u793a\u4e86\u8fd9\u79cd\u65b0\u578b\u5956\u52b1\u6a21\u578b\u5728\u63d0\u5347\u5f3a\u5316\u5b66\u4e60\u7cfb\u7edf\u6027\u80fd\u65b9\u9762\u7684\u6f5c\u529b\u3002**|\n", "2408.11788": "|**2024-08-21**|**DreamFactory: Pioneering Multi-Scene Long Video Generation with a Multi-Agent Framework**|Zhifei Xie et.al.|[2408.11788](http://arxiv.org/abs/2408.11788)|null|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201cDreamFactory\u201d\u7684LLM\u57fa\u6846\u67b6\uff0c\u5b83\u80fd\u89e3\u51b3\u5f53\u524d\u89c6\u9891\u751f\u6210\u6a21\u578b\u5728\u521b\u5efa\u957f\u89c6\u9891\u65f6\u9047\u5230\u7684\u6311\u6218\u3002DreamFactory\u901a\u8fc7\u591a\u667a\u80fd\u4f53\u534f\u4f5c\u539f\u5219\u548c\u5173\u952e\u5e27\u8fed\u4ee3\u8bbe\u8ba1\u65b9\u6cd5\uff0c\u786e\u4fdd\u4e86\u957f\u89c6\u9891\u7684\u4e00\u81f4\u6027\u548c\u98ce\u683c\u7edf\u4e00\u3002\u5b83\u5229\u7528\u94fe\u5f0f\u601d\u7ef4\uff08Chain of 
Thought\uff0cCOT\uff09\u6765\u5904\u7406\u5927\u578b\u8bed\u8a00\u6a21\u578b\u56fa\u6709\u7684\u4e0d\u786e\u5b9a\u6027\u3002DreamFactory\u80fd\u591f\u751f\u6210\u957f\u3001\u98ce\u683c\u4e00\u81f4\u4e14\u590d\u6742\u7684\u89c6\u9891\u3002 \u5bf9\u4e8e\u8fd9\u4e9b\u957f\u5f62\u5f0f\u89c6\u9891\u7684\u8bc4\u4f30\u63d0\u51fa\u4e86\u6311\u6218\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u65b0\u7684\u8bc4\u4f30\u6307\u6807\uff0c\u5982\u8de8\u573a\u666f\u9762\u90e8\u8ddd\u79bb\u5206\u6570\u548c\u8de8\u573a\u666f\u98ce\u683c\u4e00\u81f4\u6027\u5206\u6570\u3002\u4e3a\u4e86\u4fc3\u8fdb\u8fd9\u4e00\u9886\u57df\u7684\u8fdb\u4e00\u6b65\u7814\u7a76\uff0c\u6211\u4eec\u8d21\u732e\u4e86\u4e00\u4e2a\u5305\u542b\u8d85\u8fc7150\u4e2a\u7531\u4eba\u7c7b\u8bc4\u5206\u7684\u591a\u573a\u666f\u89c6\u9891\u7684\u591a\u573a\u666f\u89c6\u9891\u6570\u636e\u96c6\u3002|\n", "2408.11779": "|**2024-08-21**|**Personality Alignment of Large Language Models**|Minjun Zhu et.al.|[2408.11779](http://arxiv.org/abs/2408.11779)|**[link](https://github.com/zhu-minjun/palign)**|**\u4e3a\u4e86\u5f25\u8865\u73b0\u6709\u5927\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5bf9\u9f50\u65b9\u6cd5\u5728\u53cd\u6620\u4eba\u7c7b\u666e\u904d\u4ef7\u503c\u89c2\u548c\u884c\u4e3a\u65f6\u7684\u4e0d\u8db3\uff0c\u5ffd\u89c6\u4e86\u4e2a\u4f53\u7528\u6237\u72ec\u7279\u7279\u5f81\u548c\u504f\u597d\u7684\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e2a\u6027\u5bf9\u9f50\u7684\u6982\u5ff5\u3002\u8be5\u65b9\u6cd5\u65e8\u5728\u6839\u636e\u4e2a\u4f53\u7528\u6237\u6216\u7d27\u5bc6\u5173\u8054\u7fa4\u4f53\u7684\u5177\u4f53\u504f\u597d\u8c03\u6574LLM\u7684\u54cd\u5e94\u4e0e\u51b3\u7b56\u3002\u53d7\u5fc3\u7406\u6d4b\u91cf\u5b66\u7684\u542f\u53d1\uff0c\u6211\u4eec\u6784\u5efa\u4e86Personality Alignment with Personality Inventories (PAPI) 
\u6570\u636e\u96c6\uff0c\u5305\u542b\u4e8630\u4e07\u771f\u5b9e\u4e3b\u4f53\u7684\u6570\u636e\uff0c\u6bcf\u4e2a\u4e3b\u4f53\u57fa\u4e8e\u4e94\u5927\u4eba\u683c\u56e0\u7d20\u63d0\u4f9b\u884c\u4e3a\u504f\u597d\u4fe1\u606f\u3002\u8fd9\u4e00\u6570\u636e\u96c6\u4f7f\u6211\u4eec\u80fd\u591f\u5b9a\u91cf\u8bc4\u4f30LLM\u5728\u591a\u5927\u7a0b\u5ea6\u4e0a\u80fd\u591f\u4e0e\u6bcf\u4e2a\u4e3b\u4f53\u7684\u884c\u4e3a\u6a21\u5f0f\u76f8\u5339\u914d\u3002\u9274\u4e8e\u4e2a\u6027\u5bf9\u9f50\u9762\u4e34\u7684\u6311\u6218\uff1a\u5982\u4e2a\u4eba\u6570\u636e\u6709\u9650\u3001\u504f\u597d\u591a\u6837\u4ee5\u53ca\u53ef\u6269\u5c55\u6027\u9700\u6c42\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u6fc0\u6d3b\u5e72\u9884\u4f18\u5316\u65b9\u6cd5\u3002\u8fd9\u79cd\u65b9\u6cd5\u5229\u7528\u6700\u5c11\u7684\u6570\u636e\u548c\u8ba1\u7b97\u8d44\u6e90\u63d0\u9ad8\u4e86LLM\u9ad8\u6548\u5bf9\u9f50\u4e2a\u4f53\u884c\u4e3a\u504f\u597d\u7684\u80fd\u529b\u3002\u6211\u4eec\u7684\u65b9\u6cd5PAS\u4e0d\u4ec5\u5728\u6027\u80fd\u4e0a\u8d85\u8d8a\u4e86DPO\uff0c\u800c\u4e14\u4f18\u5316\u65f6\u95f4\u4ec5\u4e3a\u540e\u8005\u7684\u4e94\u5206\u4e4b\u4e00\uff0c\u5177\u6709\u5b9e\u9645\u4ef7\u503c\uff0c\u63a8\u52a8\u4e86\u4e2a\u6027\u5316\u7684AI\u7cfb\u7edf\u51b3\u7b56\u4e0e\u63a8\u7406\u7684\u53d1\u5c55\uff0c\u589e\u5f3a\u4e86\u4e0e\u6bcf\u4f4d\u7528\u6237\u7684\u4ea4\u4e92\u76f8\u5173\u6027\u548c\u610f\u4e49\uff0c\u4fc3\u8fdb\u4e86\u4ee5\u4eba\u4e3a\u672c\u7684\u4eba\u5de5\u667a\u80fd\u7684\u8fdb\u6b65\u3002\u76f8\u5173\u4ee3\u7801\u5df2\u53d1\u5e03\u5728\u3002**|\n", "2408.11775": "|**2024-08-21**|**Leveraging Fine-Tuned Retrieval-Augmented Generation with Long-Context Support: For 3GPP Standards**|Omar Erak 
et.al.|[2408.11775](http://arxiv.org/abs/2408.11775)|**[link](https://github.com/Nouf-Alabbasi/oKUmura_AI_Telecom_challenge)**|**\u8fd1\u671f\u7684\u7814\u7a76\u63ed\u793a\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u7535\u4fe1\u6807\u51c6\u65b9\u9762\u7684\u6280\u672f\u89c4\u8303\u6311\u6218\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8ePhi-2\u5c0f\u578b\u8bed\u8a00\u6a21\u578b\uff08SLM\uff09\u7684\u5fae\u8c03\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u7cfb\u7edf\uff0c\u65e8\u5728\u4f5c\u4e3a\u901a\u4fe1\u7f51\u7edc\u7684\u6743\u5a01\u7b54\u6848\u6765\u6e90\u3002\u6211\u4eec\u5f00\u53d1\u7684\u7cfb\u7edf\u5229\u7528\u524d\u77bb\u6027\u7684\u8bed\u4e49\u5206\u5757\u6765\u52a8\u6001\u786e\u5b9a\u89e3\u6790\u65ad\u70b9\uff0c\u4f9d\u636e\u5d4c\u5165\u76f8\u4f3c\u5ea6\u8fdb\u884c\u8c03\u6574\uff0c\u4ece\u800c\u6709\u6548\u5904\u7406\u591a\u79cd\u6587\u6863\u683c\u5f0f\u3002\u9488\u5bf9\u6280\u672f\u6807\u51c6\u4e2d\u53ef\u80fd\u51fa\u73b0\u7684\u591a\u4e2a\u76f8\u4f3c\u4e0a\u4e0b\u6587\u95ee\u9898\uff0c\u6211\u4eec\u91c7\u7528\u4e86\u91cd\u65b0\u6392\u540d\u7b97\u6cd5\u4ee5\u4f18\u5148\u8003\u8651\u6700\u76f8\u5173\u7684\u63d0\u53d6\u7247\u6bb5\u3002\u8003\u8651\u5230Phi-2\u7684\u5c0f\u8bed\u5883\u7a97\u53e3\u9650\u5236\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u540d\u4e3aSelfExtend\u7684\u6700\u65b0\u6280\u672f\uff0c\u5728\u63a8\u7406\u8fc7\u7a0b\u4e2d\u6269\u5c55\u8bed\u5883\u7a97\u53e3\uff0c\u4e0d\u4ec5\u63d0\u5347\u4e86\u6027\u80fd\uff0c\u8fd8\u80fd\u9002\u5e94\u5ba2\u6237\u5230\u4e13\u4e1a\u6280\u672f\u4eba\u5458\u7684\u5404\u79cd\u67e5\u8be2\u548c\u8bbe\u8ba1\u9700\u6c42\u3002\u4e3a\u4e86\u5fae\u8c03\uff0c\u6211\u4eec\u4f7f\u7528\u4e86\u4f4e\u79e9\u9002\u914d\uff08LoRA\uff09\u6280\u672f\uff0c\u5728\u8bad\u7ec3\u65f6\u63d0\u9ad8\u8ba1\u7b97\u6548\u7387\uff0c\u5e76\u5728\u5c0f\u6570\u636e\u96c6\u4e0a\u5b9e\u73b0\u6709\u6548\u7684\u5fae\u8c03\u3002\u6211\u4eec\u7684\u5168\u9762\u5b9e\u9a8c\u8868\u660e\uff0c\u5728\u7535\
u4fe1\u9886\u57df\u5bf9\u73b0\u6709\u95ee\u7b54\u65b9\u6cd5\u7684\u663e\u8457\u6539\u8fdb\uff0c\u6027\u80fd\u8d85\u8fc7GPT-4\uff08\u5927\u7ea6\u662f\u5176\u89c4\u6a21\u7684880\u500d\uff09\u3002\u8fd9\u9879\u5de5\u4f5c\u5c55\u793a\u4e86\u5229\u7528SLM\u5728\u901a\u4fe1\u7f51\u7edc\u4e2d\u7684\u65b0\u65b9\u6cd5\uff0c\u63d0\u4f9b\u4e86\u9ad8\u6548\u6027\u548c\u6027\u80fd\u4e4b\u95f4\u7684\u5e73\u8861\uff0c\u53ef\u4f5c\u4e3a\u6784\u5efa\u667a\u80fd\u8bed\u8a00\u6a21\u578b\u7684\u57fa\u7840\u3002**|\n", "2408.11749": "|**2024-08-21**|**Against All Odds: Overcoming Typology, Script, and Language Confusion in Multilingual Embedding Inversion Attacks**|Yiyi Chen et.al.|[2408.11749](http://arxiv.org/abs/2408.11749)|**[link](https://github.com/siebeniris/vec2text_exp)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u9762\u4e34\u7740\u6765\u81ea\u7f51\u7edc\u653b\u51fb\u8005\u7684\u6076\u610f\u5f71\u54cd\uff0c\u5982\u5bf9\u6297\u6027\u3001\u540e\u95e8\u548c\u5d4c\u5165\u53cd\u8f6c\u653b\u51fb\u3002\u5bf9\u6b64\uff0c\u65b0\u5174\u7684LLM\u5b89\u5168\u9886\u57df\u81f4\u529b\u4e8e\u7814\u7a76\u5e76\u9632\u5fa1\u6b64\u7c7b\u5a01\u80c1\u3002\u8fc4\u4eca\u4e3a\u6b62\uff0c\u8be5\u9886\u57df\u7684\u5927\u591a\u6570\u5de5\u4f5c\u90fd\u96c6\u4e2d\u5728\u82f1\u8bed\u5355\u4e00\u8bed\u8a00\u6a21\u578b\u4e0a\uff0c\u7136\u800c\uff0c\u6700\u65b0\u7814\u7a76\u8868\u660e\uff0c\u591a\u8bed\u8a00LLM\u53ef\u80fd\u6bd4\u5176\u5355\u4e00\u8bed\u8a00\u540c\u50da\u66f4\u6613\u53d7\u5230\u5404\u79cd\u653b\u51fb\u3002\u5c3d\u7ba1\u5148\u524d\u7684\u7814\u7a76\u5df2\u7ecf\u63a2\u8ba8\u4e86\u5728\u90e8\u5206\u6b27\u6d32\u8bed\u8a00\u4e0a\u7684\u5d4c\u5165\u53cd\u8f6c\uff0c\u4f46\u8981\u5c06\u8fd9\u4e9b\u53d1\u73b0\u63a8\u53ca\u5230\u4e0d\u540c\u8bed\u7cfb\u548c\u4e0d\u540c\u4e66\u5199\u7cfb\u7edf\u7684\u8bed\u8a00\uff0c\u5374\u6781\u5177\u6311\u6218\u6027\u3002\u56e0\u6b64\uff0c\u672c\u7814\u7a76\u65e8\u5728\u63a2\u7d22\u591a\u8bed\u8a00LLM\u5728\u5d4c\u5165\u53cd\u8f6c\u653b\u51fb\u4e0b\u7684\u5b8
9\u5168\u6027\uff0c\u5e76\u572820\u79cd\u8bed\u8a00\u4e2d\u8fdb\u884c\u8de8\u8bed\u8a00\u548c\u8de8\u4e66\u5199\u7684\u53cd\u8f6c\u6d4b\u8bd5\uff0c\u8986\u76d68\u4e2a\u8bed\u7cfb\u548c12\u79cd\u4e66\u5199\u7cfb\u7edf\u3002\u6211\u4eec\u7684\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0c\u963f\u62c9\u4f2f\u5b57\u6bcd\u548c\u897f\u91cc\u5c14\u5b57\u6bcd\u4e66\u5199\u7684\u8bed\u8a00\u4ee5\u53ca\u5370\u5ea6-\u96c5\u5229\u5b89\u8bed\u7cfb\u7684\u8bed\u8a00\u7279\u522b\u5bb9\u6613\u53d7\u5230\u5d4c\u5165\u53cd\u8f6c\u7684\u5f71\u54cd\u3002\u6211\u4eec\u8fdb\u4e00\u6b65\u89c2\u5bdf\u5230\u53cd\u8f6c\u6a21\u578b\u503e\u5411\u4e8e\u51fa\u73b0\u8bed\u8a00\u6df7\u6dc6\uff0c\u6709\u65f6\u5927\u5e45\u5ea6\u964d\u4f4e\u4e86\u653b\u51fb\u7684\u6709\u6548\u6027\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u7cfb\u7edf\u5730\u63a2\u7d22\u4e86\u8fd9\u4e00\u74f6\u9888\uff0c\u63ed\u793a\u4e86\u4e00\u4e9b\u53ef\u9884\u6d4b\u6a21\u5f0f\uff0c\u8fd9\u53ef\u80fd\u88ab\u653b\u51fb\u8005\u5229\u7528\u3002\u6700\u7ec8\uff0c\u672c\u7814\u7a76\u65e8\u5728\u6df1\u5316\u5bf9\u591a\u8bed\u8a00LLM\u9762\u4e34\u7684\u4e3b\u8981\u5b89\u5168\u6f0f\u6d1e\u7684\u7406\u89e3\uff0c\u5e76\u63d0\u9ad8\u5bf9\u6700\u6613\u53d7\u8fd9\u4e9b\u653b\u51fb\u5f71\u54cd\u7684\u8bed\u8a00\u7684\u610f\u8bc6\u3002|\n", "2408.12599": "|**2024-08-22**|**Controllable Text Generation for Large Language Models: A Survey**|Xun Liang 
et.al.|[2408.12599](http://arxiv.org/abs/2408.12599)|**[link](https://github.com/iaar-shanghai/ctgsurvey)**|**\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u9886\u57df\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5c55\u73b0\u4e86\u5353\u8d8a\u7684\u6587\u672c\u751f\u6210\u8d28\u91cf\u3002\u7136\u800c\uff0c\u5728\u5b9e\u9645\u5e94\u7528\u4e2d\uff0cLLMs\u9700\u8981\u6ee1\u8db3\u65e5\u76ca\u590d\u6742\u7684\u9700\u6c42\u3002\u9664\u4e86\u907f\u514d\u8bef\u5bfc\u6027\u6216\u4e0d\u9002\u5f53\u7684\u5185\u5bb9\uff0cLLMs\u8fd8\u88ab\u671f\u671b\u6839\u636e\u7279\u5b9a\u7528\u6237\u9700\u6c42\u8fdb\u884c\u8c03\u6574\uff0c\u5982\u6a21\u4eff\u7279\u5b9a\u7684\u5199\u4f5c\u98ce\u683c\u6216\u751f\u6210\u5bcc\u6709\u8bd7\u610f\u7684\u6587\u672c\u3002\u8fd9\u4e9b\u591a\u6837\u7684\u9700\u6c42\u63a8\u52a8\u4e86\u53ef\u63a7\u6587\u672c\u751f\u6210\uff08CTG\uff09\u6280\u672f\u7684\u53d1\u5c55\uff0c\u65e8\u5728\u786e\u4fdd\u8f93\u51fa\u5185\u5bb9\u7b26\u5408\u9884\u8bbe\u7684\u63a7\u5236\u6761\u4ef6\uff0c\u5982\u5b89\u5168\u6027\u3001\u60c5\u611f\u503e\u5411\u3001\u4e3b\u9898\u4e00\u81f4\u6027\u4ee5\u53ca\u8bed\u8a00\u98ce\u683c\uff0c\u540c\u65f6\u4fdd\u6301\u9ad8\u8d28\u91cf\u7684\u6709\u7528\u6027\u3001\u6d41\u7545\u6027\u548c\u591a\u6837\u6027\u3002 
\u672c\u6587\u7cfb\u7edf\u5730\u56de\u987e\u4e86CTG\u5728LLMs\u9886\u57df\u7684\u6700\u65b0\u8fdb\u5c55\uff0c\u8be6\u7ec6\u5b9a\u4e49\u4e86\u5176\u6838\u5fc3\u6982\u5ff5\uff0c\u5e76\u660e\u786e\u4e86\u63a7\u5236\u6761\u4ef6\u548c\u6587\u672c\u8d28\u91cf\u7684\u8981\u6c42\u3002\u6211\u4eec\u5c06CTG\u4efb\u52a1\u5206\u4e3a\u4e24\u5927\u7c7b\uff1a\u5185\u5bb9\u63a7\u5236\u548c\u5c5e\u6027\u63a7\u5236\uff0c\u5e76\u5bf9\u6bcf\u79cd\u7c7b\u578b\u7684\u65b9\u6cd5\u8fdb\u884c\u4e86\u8ba8\u8bba\uff0c\u5305\u62ec\u6a21\u578b\u91cd\u8bad\u7ec3\u3001\u5fae\u8c03\u3001\u5f3a\u5316\u5b66\u4e60\u3001\u63d0\u793a\u5de5\u7a0b\u3001\u6f5c\u5728\u7a7a\u95f4\u64cd\u7eb5\u548c\u89e3\u7801\u65f6\u5e72\u9884\u3002\u6211\u4eec\u5206\u6790\u4e86\u6bcf\u79cd\u65b9\u6cd5\u7684\u7279\u70b9\u3001\u4f18\u52bf\u548c\u5c40\u9650\u6027\uff0c\u63d0\u4f9b\u4e86\u5b9e\u73b0\u751f\u6210\u63a7\u5236\u7684\u6df1\u5165\u89c1\u89e3\u3002\u6b64\u5916\uff0c\u6211\u4eec\u56de\u987e\u4e86CTG\u8bc4\u4f30\u65b9\u6cd5\u3001\u603b\u7ed3\u4e86\u5176\u8de8\u9886\u57df\u7684\u5e94\u7528\uff0c\u5e76\u6307\u51fa\u4e86\u5f53\u524d\u7814\u7a76\u7684\u5173\u952e\u6311\u6218\uff0c\u5982\u6d41\u7545\u5ea6\u548c\u5b9e\u7528\u6027\u7684\u964d\u4f4e\u3002\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86\u82e5\u5e72\u547c\u5401\uff0c\u5f3a\u8c03\u672a\u6765\u7814\u7a76\u5e94\u66f4\u6ce8\u91cd\u5b9e\u9645\u5e94\u7528\u3002\u672c\u6587\u65e8\u5728\u4e3a\u8be5\u9886\u57df\u7684\u7814\u7a76\u4eba\u5458\u548c\u5f00\u53d1\u8005\u63d0\u4f9b\u6709\u4ef7\u503c\u7684\u6307\u5bfc\u3002\u6211\u4eec\u7684\u53c2\u8003\u6587\u732e\u5217\u8868\u548c\u4e2d\u6587\u7248\u672c\u5df2\u5f00\u6e90\u5728https://github.com/IAAR-Shanghai/CTGSurvey\u3002**|\n", "2408.12579": "|**2024-08-22**|**RuleAlign: Making Large Language Models Better Physicians with Diagnostic Rule Alignment**|Xiaohan Wang 
et.al.|[2408.12579](http://arxiv.org/abs/2408.12579)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5982GPT-4\u3001MedPaLM-2\u548cMed-Gemini\u5728\u5404\u7c7b\u533b\u7597\u8bc4\u4f30\u6307\u6807\u4e0a\u8868\u73b0\u51fa\u4e0e\u533b\u5b66\u4e13\u5bb6\u7ade\u4e89\u7684\u6027\u80fd\u3002\u7136\u800c\uff0c\u5b83\u4eec\u5728\u4e0e\u533b\u751f\u76f8\u5ab2\u7f8e\u7684\u4e13\u4e1a\u8bca\u65ad\u65b9\u9762\u4ecd\u9762\u4e34\u6311\u6218\uff0c\u7279\u522b\u662f\u5728\u9ad8\u6548\u6536\u96c6\u60a3\u8005\u4fe1\u606f\u4ee5\u53ca\u63a8\u7406\u6700\u7ec8\u8bca\u65ad\u7684\u8fc7\u7a0b\u4e2d\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aRuleAlign\u7684\u6846\u67b6\uff0c\u65e8\u5728\u4f7fLLM\u4e0e\u7279\u5b9a\u8bca\u65ad\u89c4\u5219\u4fdd\u6301\u4e00\u81f4\u3002\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u5305\u542b\u57fa\u4e8e\u89c4\u5219\u7684\u533b\u60a3\u5bf9\u8bdd\u6570\u636e\u96c6\uff0c\u5e76\u8bbe\u8ba1\u4e86\u4e00\u79cd\u901a\u8fc7\u504f\u597d\u5b66\u4e60\u8fdb\u884c\u5bf9\u9f50\u7684\u5b66\u4e60\u65b9\u6cd5\u3002\u5b9e\u9a8c\u7ed3\u679c\u8bc1\u660e\u4e86\u6240\u63d0\u51fa\u65b9\u6cd5\u7684\u6709\u6548\u6027\u3002\u6211\u4eec\u671f\u671b\u6211\u4eec\u7684\u5de5\u4f5c\u80fd\u591f\u542f\u53d1\u63a2\u7d22LLM\u4f5c\u4e3aAI\u533b\u5e08\u7684\u6f5c\u529b\u3002|\n", "2408.12570": "|**2024-08-22**|**Jamba-1.5: Hybrid Transformer-Mamba Models at Scale**|Jamba Team 
et.al.|[2408.12570](http://arxiv.org/abs/2408.12570)|null|\u6211\u4eec\u63a8\u51fa\u4e86Jamba-1.5\uff0c\u57fa\u4e8e\u6211\u4eecJamba\u67b6\u6784\u7684\u65b0\u578b\u6307\u4ee4\u4f18\u5316\u5927\u578b\u8bed\u8a00\u6a21\u578b\u3002Jamba\u662f\u4e00\u79cd\u6df7\u5408Transformer-Mamba\u4e13\u5bb6\u6df7\u5408\u67b6\u6784\uff0c\u5b83\u5728\u4e0a\u4e0b\u6587\u957f\u5ea6\u8303\u56f4\u5185\u63d0\u4f9b\u4e86\u9ad8\u541e\u5410\u91cf\u548c\u4f4e\u5185\u5b58\u4f7f\u7528\uff0c\u540c\u65f6\u4fdd\u6301\u4e0eTransformer\u6a21\u578b\u76f8\u540c\u6216\u66f4\u597d\u7684\u8d28\u91cf\u3002\u6211\u4eec\u53d1\u5e03\u4e86\u4e24\u79cd\u6a21\u578b\u5927\u5c0f\uff1aJamba-1.5-Large\uff0c\u5177\u670994B\u4e2a\u6d3b\u8dc3\u53c2\u6570\uff1b\u4ee5\u53caJamba-1.5-Mini\uff0c\u5177\u670912B\u4e2a\u6d3b\u8dc3\u53c2\u6570\u3002\u8fd9\u4e24\u79cd\u6a21\u578b\u5747\u9488\u5bf9\u591a\u79cd\u5bf9\u8bdd\u548c\u6307\u4ee4\u9075\u5faa\u80fd\u529b\u8fdb\u884c\u4e86\u5fae\u8c03\uff0c\u5e76\u4e14\u5177\u6709256K\u4ee4\u724c\u7684\u6700\u5927\u6709\u6548\u4e0a\u4e0b\u6587\u957f\u5ea6\uff0c\u5728\u5f00\u653e\u6743\u91cd\u6a21\u578b\u4e2d\u6700\u5927\u3002\u4e3a\u4e86\u652f\u6301\u6210\u672c\u6548\u76ca\u7684\u63a8\u7406\uff0c\u6211\u4eec\u5f15\u5165\u4e86ExpertsInt8\uff0c\u8fd9\u662f\u4e00\u79cd\u65b0\u9896\u7684\u91cf\u5316\u6280\u672f\uff0c\u5141\u8bb8\u5728\u5904\u7406256K\u4ee4\u724c\u4e0a\u4e0b\u6587\u65f6\u5c06Jamba-1.5-Large\u6a21\u578b\u653e\u5165\u5177\u67098\u4e2a80GB 
GPU\u7684\u673a\u5668\u4e0a\u800c\u4e0d\u4f1a\u635f\u5931\u8d28\u91cf\u3002\u5f53\u5728\u4e00\u7cfb\u5217\u5b66\u672f\u548c\u804a\u5929\u673a\u5668\u4eba\u57fa\u51c6\u4e0a\u8fdb\u884c\u8bc4\u4f30\u65f6\uff0cJamba-1.5\u6a21\u578b\u53d6\u5f97\u4e86\u51fa\u8272\u7684\u7ed3\u679c\uff0c\u540c\u65f6\u63d0\u4f9b\u4e86\u9ad8\u541e\u5410\u91cf\u5e76\u4f18\u4e8e\u5176\u4ed6\u5f00\u653e\u6743\u91cd\u6a21\u578b\u5728\u957f\u4e0a\u4e0b\u6587\u57fa\u51c6\u4e0a\u7684\u6027\u80fd\u3002\u4e24\u79cd\u5927\u5c0f\u7684\u6a21\u578b\u7684\u6743\u91cd\u90fd\u6839\u636eJamba\u5f00\u653e\u6a21\u578b\u8bb8\u53ef\u516c\u5f00\u63d0\u4f9b\uff0c\u5e76\u4e14\u6211\u4eec\u53d1\u5e03\u4e86ExpertsInt8\u4f5c\u4e3a\u5f00\u6e90\u8f6f\u4ef6\u3002|\n", "2408.12561": "|**2024-08-22**|**ssProp: Energy-Efficient Training for Convolutional Neural Networks with Scheduled Sparse Back Propagation**|Lujia Zhong et.al.|[2408.12561](http://arxiv.org/abs/2408.12561)|**[link](https://github.com/lujiazho/ssprop)**|**\u8fd1\u671f\uff0c\u6df1\u5ea6\u5b66\u4e60\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u5c55\uff0c\u5c24\u5176\u662f\u5728\u751f\u6210\u6a21\u578b\u9886\u57df\uff0c\u5982\u5927\u578b\u8bed\u8a00\u6a21\u578b\u548c\u6982\u7387\u6027\u6269\u6563\u6a21\u578b\u3002\u7136\u800c\uff0c\u8bad\u7ec3\u8fd9\u4e9b\u6a21\u578b\u5f80\u5f80\u9700\u8981\u5927\u91cf\u7684\u8ba1\u7b97\u8d44\u6e90\uff0c\u6d88\u8017\u6570\u5341\u4ebf\u7684\u6d6e\u70b9\u8fd0\u7b97\uff08petaFLOPs\uff09\uff0c\u5bfc\u81f4\u5de8\u5927\u7684\u80fd\u6e90\u6d88\u8017\u548c\u78b3\u8db3\u8ff9\uff0c\u5f15\u53d1\u4e86\u5bf9\u73af\u5883\u7684\u91cd\u5927\u62c5\u5fe7\u3002\u5728\u8bad\u7ec3\u6df1\u5ea6\u5b66\u4e60\u6a21\u578b\u7684\u8fc7\u7a0b\u4e2d\uff0c\u53cd\u5411\u4f20\u64ad\uff08Back-propagation, BP\uff09\u662f\u4e3b\u8981\u7684\u8ba1\u7b97\u8d1f\u62c5\u6765\u6e90\u3002 
\u4e3a\u4e86\u63a8\u52a8\u80fd\u6e90\u6548\u7387\u7684\u63d0\u9ad8\uff0c\u5e76\u5141\u8bb8\u5728\u4efb\u4f55\u673a\u5668\u548c\u8bbe\u5907\u4e0a\u5b9e\u73b0\u7a00\u758f\u5b66\u4e60\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u901a\u7528\u3001\u80fd\u6e90\u9ad8\u6548\u7684\u5377\u79ef\u6a21\u5757\uff0c\u5b83\u80fd\u591f\u65e0\u7f1d\u96c6\u6210\u5230\u4efb\u4f55\u6df1\u5ea6\u5b66\u4e60\u67b6\u6784\u4e2d\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u901a\u9053\u7ea7\u7a00\u758f\u6027\uff0c\u5e76\u57fa\u4e8e\u5047\u8bbeBP\u901a\u5e38\u5bc6\u96c6\u4e14\u4f4e\u6548\uff0c\u8fd9\u53ef\u80fd\u5bfc\u81f4\u8fc7\u62df\u5408\u548c\u9ad8\u8ba1\u7b97\u6d88\u8017\uff0c\u63d0\u51fa\u4e86\u989d\u5916\u7684\u68af\u5ea6\u9009\u62e9\u8c03\u5ea6\u5668\uff0c\u5728\u53cd\u5411\u4f20\u64ad\u9636\u6bb5\u8fdb\u884c\u9009\u62e9\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u53ef\u4ee5\u51cf\u5c1140%\u7684\u8ba1\u7b97\u91cf\uff0c\u540c\u65f6\u6709\u53ef\u80fd\u63d0\u5347\u6a21\u578b\u6027\u80fd\uff0c\u5728\u56fe\u50cf\u5206\u7c7b\u548c\u751f\u6210\u4efb\u52a1\u4e0a\u5f97\u5230\u9a8c\u8bc1\u3002\u8fd9\u79cd\u51cf\u5c11\u53ef\u4ee5\u5e26\u6765\u663e\u8457\u7684\u80fd\u6e90\u8282\u7701\u548c\u8f83\u4f4e\u7684\u78b3\u8db3\u8ff9\uff0c\u5c24\u5176\u662f\u5728\u5927\u578bAI\u7cfb\u7edf\u7684\u7814\u7a76\u4e0e\u5f00\u53d1\u9636\u6bb5\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u4ee5\u4e0d\u540c\u4e8eDropout\u7684\u65b9\u5f0f\u7f13\u89e3\u4e86\u8fc7\u62df\u5408\u95ee\u9898\uff0c\u5141\u8bb8\u5b83\u4e0eDropout\u7ed3\u5408\u4f7f\u7528\uff0c\u8fdb\u4e00\u6b65\u63d0\u9ad8\u6a21\u578b\u6027\u80fd\u5e76\u964d\u4f4e\u8ba1\u7b97\u8d44\u6e90\u6d88\u8017\u3002\u5e7f\u6cdb\u5b9e\u9a8c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u9002\u7528\u4e8e\u5404\u79cd\u6570\u636e\u96c6\u548c\u4efb\u52a1\uff0c\u5e76\u4e0e\u591a\u79cd\u6df1\u5ea6\u5b66\u4e60\u67b6\u6784\u548c\u6a21\u5757\u517c\u5bb9\u3002\u76f8\u5173\u4ee3\u7801\u5df2\u516c\u5f00\u53d1\u5e0
3\u5728https://github.com/lujiazho/ssProp\u3002**|\n", "2408.12547": "|**2024-08-22**|**Towards Evaluating and Building Versatile Large Language Models for Medicine**|Chaoyi Wu et.al.|[2408.12547](http://arxiv.org/abs/2408.12547)|**[link](https://github.com/magic-ai4med/meds-ins)**|**\u5728\u8fd9\u9879\u7814\u7a76\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u5168\u9762\u7684\u57fa\u51c6\u6d4b\u8bd5\u2014\u2014MedS-Bench\uff0c\u65e8\u5728\u8bc4\u4f30\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u4e34\u5e8a\u573a\u666f\u4e2d\u7684\u6027\u80fd\u3002\u4e0e\u73b0\u6709\u4fa7\u91cd\u4e8e\u591a\u9879\u9009\u62e9\u95ee\u9898\u56de\u7b54\u7684\u57fa\u51c6\u4e0d\u540c\uff0cMedS-Bench\u8986\u76d6\u4e8611\u4e2a\u9ad8\u7ea7\u522b\u4e34\u5e8a\u4efb\u52a1\uff0c\u5305\u62ec\u4e34\u5e8a\u62a5\u544a\u6458\u8981\u3001\u6cbb\u7597\u5efa\u8bae\u3001\u8bca\u65ad\u3001\u5b9e\u4f53\u8bc6\u522b\u548c\u533b\u5b66\u6982\u5ff5\u89e3\u91ca\u7b49\u3002\u6211\u4eec\u4f7f\u7528\u5c11\u91cf\u63d0\u793a\u5bf9\u516d\u6b3e\u9886\u5148\u7684LLM\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u5982MEDITRON\u3001Mistral\u3001InternLM 2\u3001Llama 
3\u3001GPT-4\u548cClaude-3.5\uff0c\u53d1\u73b0\u5373\u4f7f\u662f\u6700\u9ad8\u7ea7\u7684\u6a21\u578b\u5728\u8fd9\u4e9b\u590d\u6742\u4efb\u52a1\u4e0a\u4e5f\u5b58\u5728\u6311\u6218\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u5c40\u9650\u6027\uff0c\u6211\u4eec\u5f00\u53d1\u4e86MedS-Ins\uff0c\u4e00\u4e2a\u9762\u5411\u533b\u5b66\u9886\u57df\u7684\u5927\u578b\u6307\u4ee4\u8c03\u4f18\u6570\u636e\u96c6\u3002MedS-Ins\u5305\u542b\u4e8658\u4e2a\u533b\u5b66\u76f8\u5173\u7684\u8bed\u8a00\u8bed\u6599\u5e93\uff0c\u603b\u8ba11350\u4e07\u6837\u672c\uff0c\u6db5\u76d6\u4e86122\u4e2a\u4efb\u52a1\u3002\u901a\u8fc7\u5c55\u793a\u8be5\u6570\u636e\u96c6\u7684\u7528\u9014\uff0c\u6211\u4eec\u5728\u4e00\u4e2a\u8f7b\u91cf\u7ea7\u3001\u5f00\u6e90\u7684\u533b\u7597\u8bed\u8a00\u6a21\u578b\u4e0a\u8fdb\u884c\u4e86\u6307\u4ee4\u8c03\u4f18\u5b9e\u9a8c\uff0c\u7ed3\u679c\u5f97\u5230\u4e86\u540d\u4e3aMMedIns-Llama 3\u7684\u65b0\u6a21\u578b\uff0c\u5b83\u5728\u51e0\u4e4e\u6240\u6709\u4e34\u5e8a\u4efb\u52a1\u4e0a\u7684\u8868\u73b0\u90fd\u8d85\u8fc7\u4e86\u73b0\u6709\u6a21\u578b\u3002\u4e3a\u4e86\u4fc3\u8fdb\u5bf9LLMs\u5e94\u7528\u4e8e\u4e34\u5e8a\u6311\u6218\u7684\u8fdb\u4e00\u6b65\u53d1\u5c55\uff0c\u6211\u4eec\u5df2\u5c06MedS-Ins\u6570\u636e\u96c6\u5b8c\u5168\u516c\u5f00\uff0c\u5e76\u9080\u8bf7\u7814\u7a76\u793e\u533a\u53c2\u4e0e\u5176\u6269\u5c55\u3002\u6b64\u5916\uff0c\u6211\u4eec\u542f\u52a8\u4e86\u4e00\u4e2a\u52a8\u6001\u6392\u884c\u699c\uff0c\u8ba1\u5212\u5b9a\u671f\u66f4\u65b0\u6d4b\u8bd5\u96c6\uff0c\u4ee5\u8ddf\u8e2a\u8fdb\u5c55\u5e76\u589e\u5f3a\u901a\u7528LLM\u5728\u533b\u5b66\u9886\u57df\u4e2d\u7684\u9002\u5e94\u80fd\u529b\u3002\u6392\u884c\u699c\uff1ahttps://henrychur.github.io/MedS-Bench/\u3002Github\uff1ahttps://github.com/MAGIC-AI4Med/MedS-Ins\u3002**|\n", "2408.12496": "|**2024-08-22**|**MEDCO: Medical Education Copilots Based on A Multi-Agent Framework**|Hao Wei 
et.al.|[2408.12496](http://arxiv.org/abs/2408.12496)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u533b\u5b66\u548c\u5065\u5eb7\u9886\u57df\u7b49\u591a\u4e2a\u7814\u7a76\u9886\u57df\u4ea7\u751f\u4e86\u91cd\u5927\u5f71\u54cd\uff0c\u7136\u800cLLMs\u4f5c\u4e3a\u533b\u7597\u6559\u80b2\u4e2d\u7684\u52a9\u624b\u6f5c\u529b\u5c1a\u672a\u5f97\u5230\u5145\u5206\u63a2\u7d22\u3002\u5f53\u524d\u7684AI\u8f85\u52a9\u6559\u80b2\u5de5\u5177\u53d7\u9650\u4e8e\u5355\u4e00\u5b66\u4e60\u65b9\u6cd5\u4ee5\u53ca\u65e0\u6cd5\u6a21\u62df\u5b9e\u9645\u533b\u7597\u57f9\u8bad\u7684\u591a\u5b66\u79d1\u6027\u548c\u4e92\u52a8\u6027\u3002\u4e3a\u4e86\u514b\u670d\u8fd9\u4e9b\u5c40\u9650\u6027\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aMEDCO\uff08Medical EDucation COpilots\uff09\u7684\u65b0\u578b\u591a\u4ee3\u7406\u52a9\u624b\u7cfb\u7edf\uff0c\u4e13\u95e8\u7528\u4e8e\u6a21\u62df\u771f\u5b9e\u4e16\u754c\u533b\u7597\u57f9\u8bad\u73af\u5883\u3002MEDCO\u6574\u5408\u4e86\u4e09\u4e2a\u6838\u5fc3\u4ee3\u7406\uff1a\u4e00\u4e2a\u81ea\u4e3b\u60a3\u8005\u3001\u4e00\u4f4d\u4e13\u5bb6\u533b\u751f\u548c\u4e00\u4f4d\u653e\u5c04\u79d1\u533b\u5e08\uff0c\u4ece\u800c\u6784\u5efa\u4e86\u4e00\u4e2a\u591a\u6a21\u6001\u548c\u4e92\u52a8\u7684\u5b66\u4e60\u73af\u5883\u3002\u6211\u4eec\u7684\u6846\u67b6\u7740\u91cd\u4e8e\u6559\u6388\u9ad8\u6548\u63d0\u95ee\u6280\u5de7\u3001\u8de8\u5b66\u79d1\u534f\u4f5c\u4ee5\u53ca\u5b66\u751f\u4e4b\u95f4\u7684\u540c\u4f34\u8ba8\u8bba\u3002 
\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u7ecf\u8fc7MEDCO\u8bad\u7ec3\u7684\u865a\u62df\u5b66\u751f\u4e0d\u4ec5\u5b9e\u73b0\u4e86\u4e0e\u9ad8\u7ea7\u6a21\u578b\u76f8\u5ab2\u7f8e\u7684\u663e\u8457\u6027\u80fd\u63d0\u5347\uff0c\u8fd8\u5c55\u73b0\u51fa\u7c7b\u4f3c\u4eba\u7c7b\u7684\u5b66\u4e60\u884c\u4e3a\u548c\u8fdb\u6b65\uff0c\u5e76\u4e14\u5b66\u4e60\u6837\u672c\u6570\u91cf\u589e\u52a0\u3002\u8fd9\u9879\u5de5\u4f5c\u5bf9\u533b\u7597\u6559\u80b2\u9886\u57df\u505a\u51fa\u4e86\u8d21\u732e\uff0c\u901a\u8fc7\u5f15\u5165\u4e00\u79cd\u4e92\u52a8\u548c\u534f\u4f5c\u7684\u5b66\u4e60\u65b9\u6cd5\u3002\u6b64\u5916\uff0c\u5b83\u8fd8\u63d0\u4f9b\u4e86\u5173\u4e8e\u96c6\u6210AI\u7684\u8bad\u7ec3\u6a21\u5f0f\u6709\u6548\u6027\u7684\u5b9d\u8d35\u89c1\u89e3\u3002|\n", "2408.12494": "|**2024-08-22**|**GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models**|Kunsheng Tang et.al.|[2408.12494](http://arxiv.org/abs/2408.12494)|**[link](https://github.com/kstanghere/gendercare-ccs24)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u81ea\u7136\u8bed\u8a00\u751f\u6210\u65b9\u9762\u5c55\u73b0\u4e86\u60ca\u4eba\u7684\u80fd\u529b\uff0c\u4f46\u4e5f\u88ab\u89c2\u5bdf\u5230\u653e\u5927\u4e86\u793e\u4f1a\u504f\u89c1\uff0c\u5c24\u5176\u662f\u4e0e\u6027\u522b\u76f8\u5173\u7684\u504f\u89c1\u3002\u9488\u5bf9\u8fd9\u4e00\u95ee\u9898\uff0c\u5df2\u7ecf\u63d0\u51fa\u4e86\u82e5\u5e72\u57fa\u51c6\u6d4b\u8bd5\u6765\u8bc4\u4f30LLM\u4e2d\u7684\u6027\u522b\u504f\u89c1\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u57fa\u51c6\u6d4b\u8bd5\u5f80\u5f80\u7f3a\u4e4f\u5b9e\u9645\u7684\u7075\u6d3b\u6027\u6216\u65e0\u610f\u4e2d\u5f15\u5165\u4e86\u504f\u89c1\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u5f15\u5165\u4e86GenderCARE\u6846\u67b6\uff0c\u8fd9\u662f\u4e00\u4e2a\u5168\u9762\u7684\u6846\u67b6\uff0c\u5305\u62ec\u521b\u65b0\u7684\u51c6\u5219\u3001\u8bc4\u4f30\u3001\u51cf\u5c11\u6280\u672f\u4ee5\u53ca\u8bc4\u4ef7\u6307\u6807\uff0c\u65e8\u
5728\u91cf\u5316\u548c\u51cf\u8f7bLLM\u4e2d\u7684\u6027\u522b\u504f\u89c1\u3002 \u9996\u5148\uff0c\u6211\u4eec\u786e\u7acb\u4e86\u5f00\u521b\u6027\u7684\u6027\u522b\u5e73\u7b49\u57fa\u51c6\u51c6\u5219\uff0c\u8986\u76d6\u4e86\u5305\u5bb9\u6027\u3001\u591a\u6837\u6027\u3001\u53ef\u89e3\u91ca\u6027\u3001\u5ba2\u89c2\u6027\u3001\u7a33\u5065\u6027\u548c\u73b0\u5b9e\u6027\u7b49\u591a\u4e2a\u7ef4\u5ea6\u3002\u6839\u636e\u8fd9\u4e9b\u51c6\u5219\uff0c\u6211\u4eec\u6784\u5efa\u4e86GenderPair\uff0c\u4e00\u4e2a\u65b0\u9896\u7684\u914d\u5bf9\u57fa\u51c6\uff0c\u65e8\u5728\u5168\u9762\u8bc4\u4f30LLM\u4e2d\u7684\u6027\u522b\u504f\u89c1\u3002\u6211\u4eec\u7684\u57fa\u51c6\u63d0\u4f9b\u4e86\u6807\u51c6\u5316\u4e14\u73b0\u5b9e\u7684\u8bc4\u4f30\uff0c\u5305\u62ec\u4ee5\u524d\u88ab\u5ffd\u89c6\u7684\u6027\u522b\u7fa4\u4f53\uff0c\u5982\u8de8\u6027\u522b\u8005\u548c\u975e\u4e8c\u5143\u4e2a\u4f53\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u6709\u6548\u7684\u53bb\u504f\u6280\u672f\uff0c\u5305\u62ec\u53cd\u4e8b\u5b9e\u6570\u636e\u589e\u5f3a\u548c\u4e13\u95e8\u7684\u5fae\u8c03\u7b56\u7565\uff0c\u4ee5\u5728\u4e0d\u635f\u5bb3LLM\u6574\u4f53\u6027\u80fd\u7684\u524d\u63d0\u4e0b\u51cf\u5c11\u6027\u522b\u504f\u89c1\u3002 
\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u8868\u660e\uff0c\u572817\u4e2a\u4e0d\u540c\u7684LLM\u4e0a\uff0c\u5404\u79cd\u6027\u522b\u504f\u89c1\u57fa\u51c6\u7684\u663e\u8457\u51cf\u5c11\uff0c\u6700\u9ad8\u53ef\u8fbe\u8d85\u8fc790%\uff0c\u5e73\u5747\u503c\u8d85\u8fc735%\u3002\u91cd\u8981\u7684\u662f\uff0c\u8fd9\u4e9b\u51cf\u5c11\u5e26\u6765\u7684\u4e3b\u6d41\u8bed\u8a00\u4efb\u52a1\u65b9\u9762\u7684\u53d8\u5f02\u6027\u4fdd\u6301\u57282%\u4ee5\u4e0b\u3002\u901a\u8fc7\u63d0\u4f9b\u771f\u5b9e\u6027\u7684\u8bc4\u4f30\u548c\u9488\u5bf9\u6027\u522b\u504f\u89c1\u7684\u5b9a\u5236\u51cf\u5c11\uff0c\u6211\u4eec\u5e0c\u671bGenderCARE\u80fd\u591f\u4ee3\u8868\u5728LLM\u4e2d\u5b9e\u73b0\u516c\u5e73\u548c\u516c\u6b63\u7684\u4e00\u4e2a\u91cd\u8981\u6b65\u9aa4\u3002\u66f4\u591a\u7ec6\u8282\u8bf7\u53c2\u9605https://github.com/kstanghere/GenderCARE-ccs24\u3002**|\n", "2408.12480": "|**2024-08-23**|**Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese**|Khang T. Doan et.al.|[2408.12480](http://arxiv.org/abs/2408.12480)|null|\u5728\u8fd9\u4efd\u62a5\u544a\u4e2d\uff0c\u6211\u4eec\u5f15\u5165\u4e86Vintern-1B\uff0c\u8fd9\u662f\u4e00\u4e2a\u9488\u5bf9\u8d8a\u5357\u8bed\u4efb\u52a1\u7684\u53ef\u9760\u7684\u5341\u4ebf\u53c2\u6570\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\u3002\u901a\u8fc7\u6574\u5408Qwen2-0.5B-Instruct\u8bed\u8a00\u6a21\u578b\u4e0eInternViT-300M-448px\u89c6\u89c9\u6a21\u578b\uff0cVintern-1B\u4f18\u5316\u4e86\u5728\u5149\u5b66\u5b57\u7b26\u8bc6\u522b\uff08OCR\uff09\u3001\u6587\u6863\u63d0\u53d6\u548c\u8d8a\u5357\u8bed\u4e0a\u4e0b\u6587\u4e2d\u7684\u901a\u7528\u95ee\u9898\u56de\u7b54\u7b49\u5e94\u7528\u3002\u8be5\u6a21\u578b\u5728\u8d85\u8fc7\u4e09\u767e\u4e07\u5f20\u56fe\u50cf-\u95ee\u9898-\u7b54\u6848\u5bf9\u7684\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u4e86\u5fae\u8c03\uff0c\u5b9e\u73b0\u4e86\u5728\u591a\u4e2a\u8d8a\u5357\u8bed\u57fa\u51c6\u6d4b\u8bd5\u5982OpenViVQA\u548cViTextVQA\u4e0a\u7684\u7a33\u5065\u6027\u80fd\u548c\u53ef\u9760\u7ed
3\u679c\u3002Vintern-1B\u8db3\u591f\u5c0f\uff0c\u53ef\u4ee5\u8f7b\u677e\u5730\u96c6\u6210\u5230\u5404\u79cd\u79bb\u7ebf\u5e94\u7528\u4e2d\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5f00\u6e90\u4e86\u51e0\u7ec4\u7528\u4e8e\u6587\u672c\u548c\u56fe\u8868\u7684\u8d8a\u5357\u8bed\u89c6\u89c9\u95ee\u7b54\uff08VQA\uff09\u6570\u636e\u96c6\uff0c\u4f7f\u7528\u7684\u662fGemini 1.5 Flash\u521b\u5efa\u7684\u3002\u6211\u4eec\u7684\u6a21\u578b\u53ef\u4ee5\u5728\u4ee5\u4e0b\u94fe\u63a5\u83b7\u53d6\uff1ahttps://huggingface.co/5CD-AI/Vintern-1B-v2\u3002|\n", "2408.12475": "|**2024-08-22**|**Frame Order Matters: A Temporal Sequence-Aware Model for Few-Shot Action Recognition**|Bozheng Li et.al.|[2408.12475](http://arxiv.org/abs/2408.12475)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65f6\u5e8f\u5e8f\u5217\u611f\u77e5\u6a21\u578b\uff08TSAM\uff09\u4ee5\u8fdb\u884c\u5c11\u91cf\u6837\u672c\u52a8\u4f5c\u8bc6\u522b\uff08FSAR\uff09\uff0c\u8be5\u6a21\u578b\u5728\u9884\u8bad\u7ec3\u6846\u67b6\u4e2d\u5f15\u5165\u4e86\u5e8f\u5217\u611f\u77e5\u5668\u9002\u914d\u5668\uff0c\u65e8\u5728\u6574\u5408\u7a7a\u95f4\u4fe1\u606f\u548c\u5e8f\u5217\u65f6\u95f4\u52a8\u6001\u5230\u7279\u5f81\u5d4c\u5165\u4e2d\u3002\u4e0e\u73b0\u6709\u901a\u8fc7\u63a2\u7d22\u6240\u6709\u5e27\u4e4b\u95f4\u5173\u7cfb\u6765\u6355\u83b7\u65f6\u95f4\u4fe1\u606f\u7684\u7ec6\u8c03\u65b9\u6cd5\u4e0d\u540c\uff0c\u6211\u4eec\u7684\u57fa\u4e8e\u611f\u77e5\u5668\u7684\u9002\u914d\u5668\u80fd\u591f\u6cbf\u65f6\u95f4\u7ebf\u9012\u5f52\u5730\u6355\u6349\u5e8f\u5217\u52a8\u6001\uff0c\u5e76\u611f\u77e5\u987a\u5e8f\u53d8\u5316\u3002\u4e3a\u4e86\u83b7\u53d6\u6bcf\u4e2a\u7c7b\u522b\u7684\u5224\u522b\u6027\u8868\u793a\uff0c\u6211\u4eec\u6269\u5c55\u4e86\u4ece\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5bfc\u51fa\u7684\u6587\u672c\u5e93\uff0c\u5bf9\u89c6\u89c9\u539f\u578b\u8fdb\u884c\u4e86\u4e30\u5bcc\uff0c\u901a\u8fc7\u6574\u5408\u4e0a\u4e0b\u6587\u8bed\u4e49\u4fe1\u606f\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f15\u51
65\u4e86\u4e00\u79cd\u4e0d\u5e73\u8861\u6700\u4f18\u4f20\u8f93\u7b56\u7565\u6765\u8fdb\u884c\u7279\u5f81\u5339\u914d\uff0c\u4ee5\u51cf\u8f7b\u4e0e\u7c7b\u522b\u65e0\u5173\u7279\u5f81\u7684\u5f71\u54cd\uff0c\u4ece\u800c\u4fc3\u8fdb\u66f4\u6709\u6548\u7684\u51b3\u7b56\u3002\u5728\u4e94\u4e2aFSAR\u6570\u636e\u96c6\u4e0a\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u521b\u4e0b\u4e86\u65b0\u7684\u57fa\u51c6\uff0c\u4e0e\u7b2c\u4e8c\u597d\u7684\u7ade\u4e89\u5bf9\u624b\u76f8\u6bd4\u53d6\u5f97\u4e86\u663e\u8457\u7684\u4f18\u52bf\u3002|\n", "2408.12470": "|**2024-08-22**|**DLCRec: A Novel Approach for Managing Diversity in LLM-Based Recommender Systems**|Jiaju Chen et.al.|[2408.12470](http://arxiv.org/abs/2408.12470)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u63a8\u8350\u7cfb\u7edf\u4e2d\u7684\u96c6\u6210\u663e\u8457\u63d0\u5347\u4e86\u6027\u80fd\uff0c\u4f46\u5f80\u5f80\u4f34\u968f\u7740\u63a8\u8350\u591a\u6837\u6027\u4e0b\u964d\u7684\u95ee\u9898\uff0c\u8fd9\u53ef\u80fd\u635f\u5bb3\u7528\u6237\u4f53\u9a8c\u3002\u4e3a\u4e86\u514b\u670d\u8fd9\u4e00\u6311\u6218\uff0c\u53ef\u63a7\u63a8\u8350\u7cfb\u7edf\u5e94\u8fd0\u800c\u751f\uff0c\u5b83\u5141\u8bb8\u7528\u6237\u6307\u5b9a\u504f\u597d\u5e76\u83b7\u5f97\u6ee1\u8db3\u5176\u591a\u6837\u5316\u9700\u6c42\u7684\u63a8\u8350\u3002\u5c3d\u7ba1\u5177\u6709\u6f5c\u529b\uff0c\u73b0\u6709\u7684\u53ef\u63a7\u63a8\u8350\u7cfb\u7edf\u901a\u5e38\u4f9d\u8d56\u4e8e\u7b80\u5355\u673a\u5236\uff0c\u5982\u5355\u4e00\u63d0\u793a\uff0c\u6765\u8c03\u8282\u591a\u6837\u6027\uff0c\u8fd9\u79cd\u505a\u6cd5\u672a\u80fd\u5145\u5206\u6355\u6349\u7528\u6237\u504f\u597d\u7684\u590d\u6742\u6027\u3002\u9488\u5bf9\u8fd9\u4e9b\u5c40\u9650\u6027\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aDLCRec\u7684\u65b0\u6846\u67b6\uff0c\u65e8\u5728\u5b9e\u73b0\u57fa\u4e8eLLM\u7684\u63a8\u8350\u7cfb\u7edf\u7684\u7cbe\u7ec6\u7c92\u5ea6\u591a\u6837\u6027\u63a7\u5236\u3002\u4e0e\u4f20\u7edf\u65b9\u6cd5\u4e0d\u540c\uf
f0cDLCRec\u91c7\u7528\u7cbe\u7ec6\u4efb\u52a1\u5206\u89e3\u7b56\u7565\uff0c\u5c06\u63a8\u8350\u8fc7\u7a0b\u62c6\u5206\u4e3a\u4e09\u4e2a\u4f9d\u6b21\u8fdb\u884c\u7684\u5b50\u4efb\u52a1\uff1a\u4f53\u88c1\u9884\u6d4b\u3001\u4f53\u88c1\u586b\u5145\u548c\u9879\u76ee\u9884\u6d4b\u3002\u8fd9\u4e9b\u5b50\u4efb\u52a1\u72ec\u7acb\u8bad\u7ec3\u5e76\u5728\u7528\u6237\u5b9a\u4e49\u7684\u63a7\u5236\u6570\u6307\u5bfc\u4e0b\u4f9d\u6b21\u63a8\u7406\uff0c\u786e\u4fdd\u4e86\u5bf9\u591a\u6837\u6027\u7684\u66f4\u7cbe\u786e\u63a7\u5236\u3002\u6b64\u5916\uff0c\u7a00\u7f3a\u4e14\u5206\u5e03\u4e0d\u5747\u7684\u591a\u6837\u6027\u76f8\u5173\u7528\u6237\u884c\u4e3a\u6570\u636e\u7684\u7f3a\u4e4f\u6784\u6210\u4e86\u5bf9\u5fae\u8c03\u7684\u4e25\u5cfb\u6311\u6218\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e24\u79cd\u6570\u636e\u589e\u5f3a\u6280\u672f\uff0c\u4ee5\u589e\u5f3a\u6a21\u578b\u5bf9\u566a\u58f0\u548c\u79bb\u7fa4\u6570\u636e\u7684\u9c81\u68d2\u6027\u3002\u8fd9\u4e9b\u6280\u672f\u4f7f\u6a21\u578b\u63a5\u89e6\u5230\u66f4\u5e7f\u6cdb\u7684\u6a21\u5f0f\uff0c\u4ece\u800c\u63d0\u9ad8\u5176\u751f\u6210\u4e0d\u540c\u591a\u6837\u6027\u7684\u63a8\u8350\u7684\u9002\u5e94\u6027\u3002\u6211\u4eec\u7684\u5168\u9762\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cDLCRec\u4e0d\u4ec5\u63d0\u4f9b\u4e86\u5bf9\u591a\u6837\u6027\u7684\u7cbe\u786e\u63a7\u5236\uff0c\u800c\u4e14\u5728\u591a\u4e2a\u63a8\u8350\u573a\u666f\u4e2d\u90fd\u4f18\u4e8e\u6700\u5148\u8fdb\u7684\u57fa\u7ebf\u65b9\u6cd5\u3002|\n", "2408.13257": "|**2024-08-23**|**MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?**|Yi-Fan Zhang 
et.al.|[2408.13257](http://arxiv.org/abs/2408.13257)|null|Comprehensive evaluation of multimodal large language models (MLLMs) has recently drawn wide attention in the research community. However, we observe that existing benchmarks share several common barriers that make it difficult to measure the real-world challenges models face: 1) small data scale leads to large performance variance; 2) reliance on model-generated annotations limits data quality; 3) task difficulty is insufficient, especially because of limited image resolution. To tackle these issues, we introduce MME-RealWorld. Specifically, we collected more than 300,000 images from public datasets and the Internet and filtered them down to 13,366 high-quality images for annotation. The effort involved 25 professional annotators and 7 MLLM experts, who contributed 29,429 question-answer pairs covering 43 subtasks across 5 real-world scenarios, tasks that are extremely challenging even for humans. To our knowledge, MME-RealWorld is the largest manually annotated benchmark to date, featuring the highest resolution and a targeted focus on real-world applications. 
We further conducted a thorough evaluation of 28 leading MLLMs, such as GPT-4o, Gemini 1.5 Pro, and Claude 3.5 Sonnet. Our results show that even the most advanced models struggle with our benchmark: none of them reaches 60% accuracy. Perceiving high-resolution images and understanding complex real-world scenarios remain urgent open problems. The data and evaluation code are released at https://mme-realworld.github.io/ .|\n", "2408.13253": "|**2024-08-23**|**Domain-specific long text classification from sparse relevant information**|C\u00e9lia D'Cruz et.al.|[2408.13253](http://arxiv.org/abs/2408.13253)|null|Large language models have undoubtedly revolutionized natural language processing, and the current trend is to promote a single model for all tasks (sentiment analysis, translation, and so on). However, when information is sparse or the signal is weak, the statistical mechanisms of these models struggle to exploit the relevant information. For example, when classifying long domain-specific documents, relevance often hinges on one or a few key terms. In the medical domain, it is essential to determine whether a given report contains critical information about a patient's condition. This critical information usually rests on one or two specific isolated terms. 
This paper proposes a hierarchical model that uses a list of potential target terms to retrieve candidate sentences and represents each sentence by the contextualized embedding of the target term(s) it contains. Pooling the term embeddings yields a document representation that is then used for classification. We evaluate our model on public English and French medical document benchmarks and on a private medical dataset. The results show that our narrower hierarchical model outperforms large language models at retrieving relevant long documents in a domain-specific context.|\n", "2408.13233": "|**2024-08-23**|**Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time**|Yingyu Liang 
et.al.|[2408.13233](http://arxiv.org/abs/2408.13233)|null|This paper proposes a new fast method for computing gradients in multi-layer transformer models. The method computes the gradient of the entire multi-layer transformer in almost linear time, $n^{1+o(1)}$, where $n$ is the input sequence length. This substantially reduces the computational bottleneck associated with the traditional quadratic time complexity. Our theory holds for any loss function and keeps the approximation error bounded across the whole model. In addition, our analysis covers multi-layer transformers that contain many practical sub-modules, such as residual connections, causal masking, and multi-head attention. By making gradient computation in large language models more efficient, we hope our theoretical results will enable more effective training and deployment of long-context language models.|\n", "2408.13214": "|**2024-08-23**|**EUR-USD Exchange Rate Forecasting Based on Information Fusion with Large Language Models and Deep Learning Methods**|Hongcheng Ding 
et.al.|[2408.13214](http://arxiv.org/abs/2408.13214)|null|Accurately forecasting the EUR/USD exchange rate is crucial for investors, businesses, and policymakers. This paper proposes IUS, a framework that combines unstructured text data from news and analysis with structured data on exchange rates and financial indicators to improve exchange-rate forecasting. The IUS framework uses large language models for sentiment polarity scoring of text and for classifying exchange-rate movements. These textual features are combined with quantitative features and fed into a causality-driven feature generator. An Optuna-optimized Bi-LSTM model then forecasts the EUR/USD exchange rate. Experiments show that the proposed model outperforms the benchmark models, reducing mean absolute error (MAE) by 10.69% and root mean squared error (RMSE) by 9.56%. The results also show that fusing unstructured and structured data yields higher accuracy than using structured data alone. Furthermore, feature selection that combines the 12 most important quantitative features with the textual features proves most effective. The proposed IUS framework and Optuna-Bi-LSTM model offer a powerful new approach to exchange-rate forecasting through multi-source data integration.|\n", "2408.13204": "|**2024-08-23**|**DOMAINEVAL: An Auto-Constructed 
Benchmark for Multi-Domain Code Generation**|Qiming Zhu et.al.|[2408.13204](http://arxiv.org/abs/2408.13204)|null|Code benchmarks such as HumanEval are widely used to evaluate large language models (LLMs), offering insight into their strengths and weaknesses. However, current benchmarks focus mainly on general coding tasks (e.g., bubble sort, greatest common divisor) and leave domain-specific coding tasks (e.g., computation, system, cryptography) largely unexplored. To fill this gap, we propose DOMAINEVAL, a multi-domain code benchmark designed to evaluate LLMs' coding capabilities comprehensively. Our pipeline is fully automated, enabling push-button construction of formatted study subjects from code repositories. Evaluating 12 representative LLMs on DOMAINEVAL yields several interesting findings. 
We find that LLMs perform well on computation tasks but fall short on cryptography and system coding tasks. For some LLMs, the performance gap between these domains is as large as 68.94% (80.94% - 12.0%). We also observe that generating more samples improves LLMs' overall performance, while the domain bias may even increase. The contributions of this study are DOMAINEVAL, a code generation benchmark dataset covering six popular domains; a fully automated pipeline for constructing code benchmarks; and an identification, based on performance on DOMAINEVAL, of the limitations of LLMs in code generation, which points to directions for future research. The leaderboard is available at https://domaineval.github.io/ .|\n", "2408.13184": "|**2024-08-23**|**Can LLM be a Good Path Planner based on Prompt Engineering? 
Mitigating the Hallucination for Path Planning**|Hourui Deng et.al.|[2408.13184](http://arxiv.org/abs/2408.13184)|null|Spatial reasoning in large language models (LLMs) is a foundation for perceptual intelligence. However, even in simple maze environments, LLMs still struggle with long-term path planning, mainly because of spatial hallucination and the context-inconsistency hallucination caused by long-term reasoning. To address this challenge, this study proposes a new model, Spatial-to-Relational Transformation and Curriculum Q-Learning (S2RCQL). To address the spatial hallucination of LLMs, we propose the Spatial-to-Relational approach, which transforms spatial prompts into entity relations and paths that represent chains of entity relations, fully tapping the LLM's potential for sequential thinking. Building on this, we design a Q-learning-based path-planning algorithm that mitigates the context-inconsistency hallucination and strengthens the LLM's reasoning ability. By supplying the Q-values of state-action pairs as auxiliary information in the prompt, we correct the LLM's hallucinations and guide it toward the optimal path. Finally, we propose an LLM-based reverse curriculum learning technique that further mitigates the context-inconsistency hallucination. 
By lowering task difficulty, the technique lets the LLM quickly accumulate successful experience and use it to tackle more complex tasks. We ran comprehensive experiments on ERNIE-Bot 4.0, an LLM developed in-house by Baidu. The results show that S2RCQL improves success rate and optimality by 23% to 40% compared with advanced prompt engineering methods.|\n", "2408.13073": "|**2024-08-23**|**IntelliCare: Improving Healthcare Analysis with Variance-Controlled Patient-Level Knowledge from Large Language Models**|Zhihao Yu et.al.|[2408.13073](http://arxiv.org/abs/2408.13073)|**[link](https://github.com/yzhHoward/IntelliCare)**|Although deep learning methods have made great strides on electronic health record (EHR) data, they often struggle to fully capture the semantics of diverse medical codes from limited data. Integrating knowledge from large language models (LLMs) offers a promising way to improve healthcare predictions. However, LLM analyses can vary significantly because of ambiguity and inconsistency, which hinders their effective use. To address these challenges, we propose IntelliCare, a new framework that uses LLMs to supply high-quality patient-level external knowledge and enhance 
existing EHR models, thereby improving healthcare predictions. Specifically, IntelliCare identifies patient cohorts and uses task-relevant statistical information to strengthen the LLM's understanding and generation, effectively mitigating the ambiguity problem. It further refines the LLM-derived knowledge with a hybrid approach that generates multiple analyses and calibrates them using both the EHR model and perplexity measures. Experiments on three clinical prediction tasks across two large-scale EHR datasets show that IntelliCare significantly improves the performance of existing methods, highlighting its potential for advancing personalized healthcare prediction and decision support systems.|\n", "2408.13071": "|**2024-08-23**|**Guiding IoT-Based Healthcare Alert Systems with Large Language Models**|Yulan Gao 
et.al.|[2408.13071](http://arxiv.org/abs/2408.13071)|null|Healthcare alert systems (HAS) are evolving rapidly, driven by advances in artificial intelligence (AI) and Internet of Things (IoT) technologies and by rising public health awareness. Despite significant progress, a core challenge remains: balancing the accuracy of personalized health alerts against strict privacy protection in resource-constrained environments. To address this, we propose a unified framework, LLM-HAS (large language model healthcare alert system). It integrates large language models (LLMs) into HAS to significantly improve alert accuracy, protect user privacy, and enhance personalized health services, while also improving users' quality of experience (QoE). The framework combines a Mixture of Experts (MoE) approach with an LLM, which processes additional textual job descriptions to analyze users' personalized preferences and potential health risks. This analysis guides the selection of specialized deep reinforcement learning (DDPG) experts, which are responsible for delivering precise health alerts. Moreover, LLM-HAS can process 
conversational user feedback, which not only allows DDPG to be fine-tuned but also deepens user engagement, making health management strategies more accurate and personalized. Simulation results validate the effectiveness of the LLM-HAS framework and indicate its potential as a groundbreaking approach that uses generative AI (GAI) to deliver highly accurate and reliable alerts.|\n", "2408.13031": "|**2024-08-23**|**VFM-Det: Towards High-Performance Vehicle Detection via Large Foundation Models**|Wentao Wu et.al.|[2408.13031](http://arxiv.org/abs/2408.13031)|**[link](https://github.com/event-ahu/vfm-det)**|**Existing vehicle detectors are usually obtained by training a typical detector (e.g., the YOLO, RCNN, or DETR series) on vehicle images, starting from a pre-trained backbone (e.g., ResNet, ViT). Some researchers also use and enhance large foundation models to improve detection performance. However, we argue that these detectors may achieve only sub-optimal results because the large models they use are not designed specifically for vehicles. In addition, their results depend heavily on visual features and seldom consider the alignment between vehicle semantic information and visual representations. 
In this work, we propose VFM-Det, a new vehicle detection paradigm built on a pre-trained vehicle model (VehicleMAE) and a large language model (T5). It follows the region proposal-based detection framework, and the features of each proposal can be enhanced with VehicleMAE. More importantly, we propose a new VAtt2Vec module that predicts the vehicle semantic attributes of these proposals and converts them into feature vectors, which enhance the visual features through contrastive learning. Extensive experiments on three vehicle detection benchmark datasets demonstrate the effectiveness of our vehicle detector. Specifically, on the Cityscapes dataset our model improves over the baseline by $+5.1\\%$ and $+6.2\\%$ on the $AP_{0.5}$ and $AP_{0.75}$ metrics, respectively. The source code of this work will be released at https://github.com/Event-AHU/VFM-Det .**|\n", "2408.13028": "|**2024-08-23**|**In-Context Learning with Reinforcement Learning for Incomplete Utterance Rewriting**|Haowei Du et.al.|[2408.13028](http://arxiv.org/abs/2408.13028)|null|In-context learning (ICL), in which large language models (LLMs) make predictions from instructions augmented with a few examples, 
has attracted increasing attention in the research community. Existing example selection methods for ICL use sparse or dense retrievers and achieve effective performance. However, these methods do not use feedback from the LLM to train the retriever, so the selected examples may not meaningfully improve the LLM's analogical ability. To address this, we propose a policy-based reinforcement learning framework for example selection (RLS). It consists of a language model (LM) selector and an LLM generator. The LM selector encodes the candidate examples into dense representations and selects the top-k examples as demonstrations for the LLM. The LLM's outputs are used to compute the reward and the policy gradient that optimize the LM selector. In experiments on different datasets, our approach significantly outperforms existing example selection methods. Moreover, it shows advantages over supervised fine-tuning (SFT) 
models in few-shot settings. Further experiments show that both the abundance of the examples and their similarity to the test case are important for the LLM's ICL performance.|\n", "2408.14470": "|**2024-08-27**|**Step-by-Step Unmasking for Parameter-Efficient Fine-tuning of Large Language Models**|Aradhye Agarwal et.al.|[2408.14470](http://arxiv.org/abs/2408.14470)|**[link](https://github.com/Aradhye2002/selective-peft-toolkit)**|**Fine-tuning large language models (LLMs) on downstream tasks requires substantial computational resources. Parameter-efficient fine-tuning (PEFT) methods aim to ease this burden by fine-tuning only a small fraction of the model parameters. Although computationally efficient, these techniques often fail to match the performance of fully fine-tuned models, primarily because of biases inherent in parameter selection. Traditional selective PEFT techniques use a fixed set of parameters based on a predefined budget (a process also known as unmasking), failing to capture parameter importance dynamically and often exceeding the budget. We introduce $\\text{ID}^3$, a novel selective PEFT method that calculates parameter importance continually and unmasks parameters dynamically by balancing exploration and exploitation in parameter selection. 
Our experiments on 15 tasks spanning natural language understanding and generation demonstrate the effectiveness of our method compared with fixed-masking PEFT techniques. We show analytically that $\\text{ID}^3$ reduces the number of gradient updates by a factor of two, improving computational efficiency. $\\text{ID}^3$ is robust to random initialization of neurons and can therefore be integrated seamlessly into existing additive and reparametrization-based PEFT modules, such as adapters and LoRA, for dynamic sparsification.**|\n", "2408.14469": "|**2024-08-26**|**Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos**|Qirui Chen et.al.|[2408.14469](http://arxiv.org/abs/2408.14469)|null|This paper considers the problem of Multi-Hop Video Question Answering (MH-VidQA) 
in long-form egocentric videos. The task requires not only answering visual questions but also localizing multiple relevant time intervals in the video as visual evidence. We develop an automated pipeline that creates multi-hop question-answer pairs with associated temporal evidence, yielding a large-scale dataset for instruction tuning. To track progress on this new task, we further curate a high-quality benchmark, MultiHop-EgoQA, built with careful manual verification and refinement. Experiments reveal that existing multimodal systems have inadequate multi-hop grounding and reasoning abilities, resulting in poor performance. We then propose a new architecture, Grounding Scattered Evidence with Large Language Model (GeLM), which enhances large language models (LLMs) 
with a grounding module that retrieves temporal evidence from videos using flexible grounding tokens. Trained on our visual instruction data, GeLM shows improved multi-hop grounding and reasoning, setting a new baseline for this challenging task. Furthermore, when trained on third-person view videos, the same architecture also achieves state-of-the-art performance on the single-hop VidQA benchmark ActivityNet-RTL, demonstrating its effectiveness.|\n", "2408.14467": "|**2024-08-26**|**Explicit Inductive Inference using Large Language Models**|Tianyang Liu et.al.|[2408.14467](http://arxiv.org/abs/2408.14467)|null|In this paper, we propose a pipeline that exploits the attestation bias of large language models (LLMs) to perform explicit inductive inference. The pipeline uses an LLM to transform a premise into a set of attested alternatives and then aggregates the answers to the derived new entailment inquiries to support the original inference prediction. On a directional predicate entailment benchmark, we show that this simple pipeline improves the overall inference performance of LLMs and substantially reduces the impact of their attestation bias.|\n", "2408.14438": "|**2024-08-26**|**Evaluating Large Language Models on Spatial 
Tasks: A Multi-Task Benchmarking Study**|Liuchang Xu Shuo Zhao et.al.|[2408.14438](http://arxiv.org/abs/2408.14438)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5982ChatGPT\u3001Gemini\u7b49\u7684\u95ee\u4e16\uff0c\u8bc4\u4f30\u5b83\u4eec\u5728\u81ea\u7136\u8bed\u8a00\u7406\u89e3\u3001\u4ee3\u7801\u751f\u6210\u7b49\u591a\u65b9\u9762\u80fd\u529b\u7684\u91cd\u8981\u6027\u65e5\u76ca\u51f8\u663e\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u6a21\u578b\u5728\u7a7a\u95f4\u4efb\u52a1\u65b9\u9762\u7684\u8868\u73b0\u5e76\u672a\u5f97\u5230\u5168\u9762\u8bc4\u4f30\u3002\u672c\u7814\u7a76\u586b\u8865\u4e86\u8fd9\u4e00\u7a7a\u767d\uff0c\u901a\u8fc7\u5f15\u5165\u4e00\u4e2a\u65b0\u9896\u7684\u591a\u4efb\u52a1\u7a7a\u95f4\u8bc4\u4ef7\u6570\u636e\u96c6\uff0c\u7cfb\u7edf\u6027\u5730\u63a2\u7d22\u548c\u6bd4\u8f83\u51e0\u79cd\u5148\u8fdb\u6a21\u578b\u5728\u7a7a\u95f4\u4efb\u52a1\u4e0a\u7684\u6027\u80fd\u3002\u8be5\u6570\u636e\u96c6\u6db5\u76d6\u4e86\u5341\u4e8c\u79cd\u4e0d\u540c\u7684\u4efb\u52a1\u7c7b\u578b\uff0c\u5305\u62ec\u7a7a\u95f4\u7406\u89e3\u548c\u8def\u5f84\u89c4\u5212\uff0c\u5e76\u4e14\u6bcf\u9879\u4efb\u52a1\u90fd\u6709\u7ecf\u8fc7\u9a8c\u8bc1\u7684\u51c6\u786e\u7b54\u6848\u3002 
\u6211\u4eec\u91c7\u7528\u53cc\u9636\u6bb5\u6d4b\u8bd5\u65b9\u6cd5\u5bf9\u591a\u4e2a\u6a21\u578b\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u5305\u62ecOpenAI\u7684gpt-3.5-turbo\u3001gpt-4o\u4ee5\u53caZhipuAI\u7684glm-4\u3002\u9996\u5148\u8fdb\u884c\u96f6\u6837\u672c\u6d4b\u8bd5\uff0c\u968f\u540e\u6839\u636e\u96be\u5ea6\u5bf9\u6570\u636e\u96c6\u8fdb\u884c\u5206\u7c7b\uff0c\u5e76\u6267\u884c\u4e86\u63d0\u793a\u8c03\u4f18\u6d4b\u8bd5\u3002\u7ed3\u679c\u663e\u793a\uff0c\u5728\u7b2c\u4e00\u9636\u6bb5\u7684\u6d4b\u8bd5\u4e2d\uff0cgpt-4o\u7684\u6574\u4f53\u51c6\u786e\u6027\u6700\u9ad8\uff0c\u5e73\u5747\u8fbe\u5230\u4e8671.3%\u3002\u5c3d\u7ba1moonshot-v1-8k\u5728\u603b\u4f53\u4e0a\u7565\u900a\u4e00\u7b79\uff0c\u4f46\u5728\u5730\u540d\u8bc6\u522b\u4efb\u52a1\u4e0a\u5374\u8d85\u8d8a\u4e86gpt-4o\u3002\u7814\u7a76\u8fd8\u63ed\u793a\u4e86\u7279\u5b9a\u4efb\u52a1\u4e2d\u63d0\u793a\u7b56\u7565\u5bf9\u6a21\u578b\u6027\u80fd\u7684\u5f71\u54cd\u3002\u4f8b\u5982\uff0c\u94fe\u5f0f\u601d\u8003\uff08COT\uff09\u7b56\u7565\u4f7fgpt-4o\u5728\u8def\u5f84\u89c4\u5212\u4efb\u52a1\u4e0a\u7684\u51c6\u786e\u7387\u4ece12.4%\u63d0\u5347\u81f387.5%\uff0c\u800c\u5355\u6837\u672c\uff08one-shot\uff09\u7b56\u7565\u5219\u4f7fmoonshot-v1-8k\u5728\u5730\u56fe\u7ed8\u5236\u4efb\u52a1\u4e0a\u7684\u51c6\u786e\u7387\u4ece10.1%\u63d0\u9ad8\u523076.3%\u3002|\n", "2408.14419": "|**2024-08-26**|**CHARTOM: A Visual Theory-of-Mind Benchmark for Multimodal Large Language Models**|Shubham Bharti 
et.al.|[2408.14419](http://arxiv.org/abs/2408.14419)|null|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aCHARTOM\u7684\u89c6\u89c9\u5fc3\u667a\u7406\u8bba\u57fa\u51c6\uff0c\u9488\u5bf9\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\u3002CHARTOM\u7531\u4e13\u95e8\u8bbe\u8ba1\u7684\u6570\u636e\u53ef\u89c6\u5316\u56fe\u8868\u7ec4\u6210\u3002\u7ed9\u5b9a\u4e00\u4e2a\u56fe\u8868\uff0c\u8bed\u8a00\u6a21\u578b\u4e0d\u4ec5\u9700\u8981\u6b63\u786e\u7406\u89e3\u56fe\u8868\uff08\u4e8b\u5b9e\u95ee\u9898\uff09\uff0c\u8fd8\u9700\u8981\u5224\u65ad\u8be5\u56fe\u8868\u662f\u5426\u4f1a\u8ba9\u4eba\u7c7b\u8bfb\u8005\u4ea7\u751f\u8bef\u5bfc\uff08\u5fc3\u667a\u95ee\u9898\uff09\u3002\u8fd9\u4e24\u4e2a\u95ee\u9898\u90fd\u5177\u6709\u91cd\u8981\u7684\u793e\u4f1a\u4ef7\u503c\u3002\u6211\u4eec\u5c06\u8be6\u7ec6\u4ecb\u7ecd\u6784\u5efaCHARTOM\u57fa\u51c6\u7684\u8fc7\u7a0b\uff0c\u5305\u62ec\u5176\u5bf9\u4eba\u7c7b\u8868\u73b0\u7684\u6821\u51c6\u3002|\n", "2408.14418": "|**2024-08-26**|**MEDSAGE: Enhancing Robustness of Medical Dialogue Summarization to ASR Errors with LLM-generated Synthetic Dialogues**|Kuluhan Binici 
et.al.|[2408.14418](http://arxiv.org/abs/2408.14418)|null|\u81ea\u52a8\u8bed\u97f3\u8bc6\u522b(ASR)\u7cfb\u7edf\u5728\u5c06\u8bed\u97f3\u8f6c\u6362\u4e3a\u6587\u672c\u65b9\u9762\u81f3\u5173\u91cd\u8981\uff0c\u7136\u800c\uff0c\u5b83\u4eec\u5f15\u5165\u7684\u9519\u8bef\u4f1a\u4e25\u91cd\u964d\u4f4e\u4e0b\u6e38\u4efb\u52a1\u5982\u6458\u8981\u751f\u6210\u7684\u8868\u73b0\u3002\u8fd9\u4e2a\u95ee\u9898\u5728\u4e34\u5e8a\u5bf9\u8bdd\u6458\u8981\u9886\u57df\u5c24\u4e3a\u7a81\u51fa\uff0c\u8fd9\u662f\u4e00\u4e2a\u6570\u636e\u8d44\u6e90\u6709\u9650\u7684\u9886\u57df\uff0c\u7528\u4e8e\u5fae\u8c03\u7684\u76d1\u7763\u6570\u636e\u7a00\u7f3a\uff0c\u56e0\u6b64\u9700\u8981\u5c06ASR\u6a21\u578b\u4f5c\u4e3a\u9ed1\u76d2\u89e3\u51b3\u65b9\u6848\u4f7f\u7528\u3002\u4f20\u7edf\u7684\u6570\u636e\u589e\u5f3a\u65b9\u6cd5\u4e5f\u4e0d\u9002\u7528\u4e8e\u63d0\u9ad8\u6458\u8981\u6a21\u578b\u5bf9\u566a\u97f3\u7684\u9c81\u68d2\u6027\uff0c\u539f\u56e0\u662f\u7f3a\u4e4f\u8db3\u591f\u7684\u533b\u7597\u5bf9\u8bdd\u97f3\u9891\u8bb0\u5f55\u53ca\u5176\u5bf9\u5e94\u7684ASR\u8f6c\u5f55\u6587\u672c\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aMEDSAGE\u7684\u65b9\u6cd5\uff0c\u7528\u4e8e\u901a\u8fc7\u5927\u578b\u8bed\u8a00\u6a21\u578b(LLMs)\u751f\u6210\u5408\u6210\u6837\u672c\u8fdb\u884c\u6570\u636e\u589e\u5f3a\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u5229\u7528LLMs\u7684\u4e0a\u4e0b\u6587\u5b66\u4e60\u80fd\u529b\uff0c\u5e76\u6307\u5bfc\u5b83\u4eec\u57fa\u4e8e\u5c11\u91cf\u53ef\u7528\u7684\u533b\u7597\u5bf9\u8bdd\u793a\u4f8b\u548c\u97f3\u9891\u8bb0\u5f55\uff0c\u751f\u6210\u7c7b\u4f3cASR\u7684\u9519\u8bef\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cLLMs\u80fd\u591f\u6709\u6548\u5730\u5efa\u6a21ASR\u566a\u97f3\uff0c\u5c06\u8fd9\u79cd\u542b\u566a\u6570\u636e\u878d\u5165\u8bad\u7ec3\u8fc7\u7a0b\u663e\u8457\u63d0\u9ad8\u4e86\u533b\u7597\u5bf9\u8bdd\u6458\u8981\u7cfb\u7edf\u7684\u9c81\u68d2\u6027\u548c\u51c6\u786e\u6027\u3002\u8fd9\u79cd\u
65b9\u6cd5\u89e3\u51b3\u4e86\u5173\u952e\u5e94\u7528\u4e2dASR\u8f93\u51fa\u566a\u97f3\u7684\u95ee\u9898\uff0c\u63d0\u4f9b\u4e86\u4e00\u4e2a\u589e\u5f3a\u4e34\u5e8a\u5bf9\u8bdd\u6458\u8981\u53ef\u9760\u6027\u7684\u7a33\u5065\u89e3\u51b3\u65b9\u6848\u3002|\n", "2408.14398": "|**2024-08-26**|**Language-specific Calibration for Pruning Multilingual Language Models**|Simon Kurz et.al.|[2408.14398](http://arxiv.org/abs/2408.14398)|null|\u8fd1\u671f\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u526a\u679d\u9886\u57df\u53d6\u5f97\u7684\u8fdb\u5c55\uff0c\u5728\u65e0\u9700\u91cd\u65b0\u8bad\u7ec3\u7684\u60c5\u51b5\u4e0b\u5b9e\u73b0\u4e86\u5353\u8d8a\u7684\u538b\u7f29\u6548\u679c\uff0c\u5e76\u4fdd\u6301\u4e86\u9ad8\u9884\u6d4b\u6027\u80fd\u3002\u7136\u800c\uff0c\u8fd9\u7c7b\u7814\u7a76\u4e3b\u8981\u5173\u6ce8\u4e8e\u4f7f\u7528\u82f1\u8bed\u6587\u672c\u8fdb\u884c\u526a\u679d\u6821\u51c6\uff0c\u800c\u5ffd\u7565\u4e86\u73b0\u4ee3LLM\u7684\u591a\u8bed\u8a00\u6027\u8d28\u53ca\u5176\u5728\u975e\u82f1\u8bed\u8bed\u8a00\u4e2d\u7684\u5e7f\u6cdb\u5e94\u7528\u3002\u672c\u6587\u65e8\u5728\u63a2\u7d22\u7528\u4e8e\u526a\u679d\u591a\u8bed\u8a00\u6a21\u578b\u7684\u6709\u6548\u7b56\u7565\u3002 
\u6211\u4eec\u8fdb\u884c\u4e86\u9996\u4e2a\u5168\u9762\u7684\u5b9e\u8bc1\u7814\u7a76\uff0c\u5bf9\u6bd4\u4e86\u4e0d\u540c\u6821\u51c6\u8bed\u8a00\u5728\u591a\u8bed\u8a00\u4efb\u52a1\u3001\u6a21\u578b\u548c\u6700\u5148\u8fdb\u7684\u526a\u679d\u6280\u672f\u4e0b\u5bf9\u526a\u679d\u7684\u5f71\u54cd\u3002\u6211\u4eec\u7684\u7ed3\u679c\u63d0\u4f9b\u4e86\u5b9e\u7528\u7684\u5efa\u8bae\uff0c\u4f8b\u5982\uff0c\u5728\u76ee\u6807\u8bed\u8a00\u4e0a\u8fdb\u884c\u6821\u51c6\u53ef\u4ee5\u6709\u6548\u5730\u964d\u4f4e\u56f0\u60d1\u5ea6\uff0c\u4f46\u4e0d\u4e00\u5b9a\u80fd\u4fc3\u8fdb\u4e0b\u6e38\u4efb\u52a1\u7684\u6027\u80fd\u63d0\u5347\u3002\u8fdb\u4e00\u6b65\u7684\u5206\u6790\u5b9e\u9a8c\u63ed\u793a\uff0c\u76ee\u6807\u8bed\u8a00\u4e0a\u7684\u6821\u51c6\u4e3b\u8981\u8d21\u732e\u5728\u4e8e\u4fdd\u7559\u4e0e\u6d41\u7545\u6027\u548c\u8fde\u8d2f\u6027\u76f8\u5173\u7684\u8bed\u8a00\u7279\u5b9a\u7279\u6027\uff0c\u4f46\u53ef\u80fd\u65e0\u6cd5\u6355\u6349\u5230\u4e0e\u7406\u89e3\u80fd\u529b\u548c\u63a8\u7406\u80fd\u529b\u7b49\u8bed\u8a00\u901a\u7528\u7279\u6027\u7684\u5173\u8054\u3002 \u6700\u540e\uff0c\u6211\u4eec\u4e3a\u672a\u6765\u7684\u5b9e\u8df5\u8005\u63d0\u4f9b\u4e86\u5b9e\u9645\u7684\u5efa\u8bae\u3002|\n", "2408.14387": "|**2024-08-26**|**Reprogramming Foundational Large Language Models(LLMs) for Enterprise Adoption for Spatio-Temporal Forecasting Applications: Unveiling a New Era in Copilot-Guided Cross-Modal Time Series Representation Learning**|Sakhinana Sagar Srinivas 
et.al.|[2408.14387](http://arxiv.org/abs/2408.14387)|null|\u7a7a\u95f4\u65f6\u95f4\u9884\u6d4b\u5728\u4ea4\u901a\u7cfb\u7edf\u3001\u7269\u6d41\u548c\u4f9b\u5e94\u94fe\u7ba1\u7406\u7b49\u591a\u4e2a\u9886\u57df\u53d1\u6325\u7740\u5173\u952e\u4f5c\u7528\u3002\u7136\u800c\uff0c\u73b0\u6709\u65b9\u6cd5\u53d7\u9650\u4e8e\u5904\u7406\u5927\u89c4\u6a21\u590d\u6742\u6570\u636e\u7684\u80fd\u529b\u3002\u4e3a\u4e86\u514b\u670d\u8fd9\u4e00\u9650\u5236\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u7ed3\u5408\u5f00\u6e90\u5927\u578b\u548c\u5c0f\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs \u548c LMs\uff09\u4e0e\u4f20\u7edf\u9884\u6d4b\u65b9\u6cd5\u7684\u6df7\u5408\u7b56\u7565\u3002\u901a\u8fc7\u5f15\u5165\u52a8\u6001\u63d0\u793a\u548c\u5206\u7ec4\u67e5\u8be2\u3001\u591a\u5934\u6ce8\u610f\u529b\u673a\u5236\uff0c\u8be5\u7b56\u7565\u80fd\u591f\u66f4\u6709\u6548\u5730\u6355\u6349\u6f14\u53d8\u975e\u7ebf\u6027\u65f6\u95f4\u5e8f\u5217\u6570\u636e\u4e2d\u7684\u5185\u90e8\u7cfb\u5217\u548c\u8de8\u7cfb\u5217\u4f9d\u8d56\u5173\u7cfb\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5229\u7528\u4f4e\u79e9\u9002\u914d\u4e0e\u6fc0\u6d3b\u8bb0\u5fc6\u51cf\u5c11\u6280\u672f\uff08LoRA-AMR\uff09\uff0c\u5728\u6d88\u8d39\u7ea7\u786c\u4ef6\u4e0a\u5bf9\u5f00\u6e90\u5c0f\u578b LM 
\u8fdb\u884c\u5b9a\u5236\u5316\u5fae\u8c03\uff0c\u4ee5\u5206\u6790\u65f6\u95f4\u5e8f\u5217\u8d8b\u52bf\uff0c\u540c\u65f6\u4fdd\u7559\u63a8\u7406\u5ef6\u8fdf\u5e76\u964d\u4f4e\u8ba1\u7b97\u5f00\u9500\u548c\u6fc0\u6d3b\u5b58\u50a8\u5185\u5b58\u9700\u6c42\u3002\u6211\u4eec\u5c06\u8bed\u8a00\u6a21\u578b\u5904\u7406\u4e0e\u4f20\u7edf\u65f6\u95f4\u5e8f\u5217\u8868\u793a\u5b66\u4e60\u65b9\u6cd5\u76f8\u7ed3\u5408\uff0c\u5b9e\u73b0\u8de8\u6a21\u6001\u96c6\u6210\uff0c\u4ece\u800c\u83b7\u5f97\u7a33\u5065\u4e14\u51c6\u786e\u7684\u9884\u6d4b\u7ed3\u679c\u3002\u901a\u8fc7\u5728\u591a\u4e2a\u5b9e\u9645\u4e16\u754c\u6570\u636e\u96c6\u4e0a\u7684\u5e7f\u6cdb\u5b9e\u9a8c\uff0c\u8be5\u6846\u67b6\u7684\u6548\u80fd\u5f97\u5230\u4e86\u5145\u5206\u9a8c\u8bc1\uff0c\u5176\u9884\u6d4b\u51c6\u786e\u6027\u663e\u8457\u4f18\u4e8e\u73b0\u6709\u65b9\u6cd5\u3002|\n", "2408.14380": "|**2024-08-26**|**Probing Causality Manipulation of Large Language Models**|Chenyang Zhang et.al.|[2408.14380](http://arxiv.org/abs/2408.14380)|**[link](https://github.com/tongjinlp/llm-causality-probing)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\u4e0a\u5c55\u73b0\u4e86\u591a\u79cd\u80fd\u529b\uff0c\u5305\u62ec\u56e0\u679c\u5173\u7cfb\u95ee\u9898\u3002\u9884\u8bad\u7ec3\u7684\u6a21\u578b\u901a\u5e38\u57fa\u4e8e\u7edf\u8ba1\u5173\u8054\u5de5\u4f5c\uff0c\u800c\u975e\u4e13\u6ce8\u4e8e\u53e5\u5b50\u4e2d\u7684\u56e0\u679c\u4e0e\u5f71\u54cd\u3002\u56e0\u6b64\uff0c\u63a2\u7d22LLM\u5185\u90e8\u5bf9\u56e0\u679c\u6027\u7684\u64cd\u7eb5\u662f\u5fc5\u8981\u7684\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u901a\u8fc7\u63d0\u4f9b\u4e0d\u540c\u7684\u6377\u5f84\u5e76\u89c2\u5bdf\u6a21\u578b\u884c\u4e3a\u6765\u63a2\u67e5\u56e0\u679c\u6027\u64cd\u7eb5\u7684\u5c42\u7ea7\u3002\u6211\u4eec\u5229\u7528\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u548c\u4e0a\u4e0b\u6587\u5b66\u4e60\uff08ICL\uff09\u6280\u672f\uff0c\u9488\u5bf9\u8bbe
\u8ba1\u7684\u56e0\u679c\u5206\u7c7b\u4efb\u52a1\uff0c\u5bf9\u4e3b\u6d41LLM\u8fdb\u884c\u5b9e\u9a8c\uff0c\u5305\u62ecGPT-4\u4ee5\u53ca\u4e00\u4e9b\u8f83\u5c0f\u7684\u548c\u7279\u5b9a\u9886\u57df\u7684\u6a21\u578b\u3002 \u6211\u4eec\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cLLM\u80fd\u591f\u8bc6\u522b\u4e0e\u56e0\u679c\u6027\u76f8\u5173\u7684\u5b9e\u4f53\uff0c\u5e76\u8ba4\u8bc6\u5230\u76f4\u63a5\u7684\u56e0\u679c\u5173\u7cfb\u3002\u7136\u800c\uff0cLLM\u7f3a\u4e4f\u4e13\u95e8\u7684\u56e0\u679c\u8ba4\u77e5\u80fd\u529b\uff0c\u53ea\u662f\u5c06\u56e0\u679c\u6027\u89c6\u4e3a\u53e5\u5b50\u6574\u4f53\u8bed\u4e49\u7684\u4e00\u90e8\u5206\u3002**|\n", "2408.14354": "|**2024-08-26**|**SWE-bench-java: A GitHub Issue Resolving Benchmark for Java**|Daoguang Zan et.al.|[2408.14354](http://arxiv.org/abs/2408.14354)|**[link](https://github.com/multi-swe-bench/multi-swe-bench-env)**|**GitHub\u95ee\u9898\u89e3\u51b3\u662f\u8f6f\u4ef6\u5de5\u7a0b\u4e2d\u7684\u5173\u952e\u4efb\u52a1\uff0c\u8fd1\u671f\u5728\u884c\u4e1a\u548c\u5b66\u672f\u754c\u90fd\u53d7\u5230\u4e86\u5e7f\u6cdb\u5173\u6ce8\u3002\u5728\u8fd9\u4e2a\u9886\u57df\u5185\uff0cSWE-bench\u5df2\u7ecf\u53d1\u5e03\uff0c\u65e8\u5728\u8bc4\u4f30\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u95ee\u9898\u89e3\u51b3\u80fd\u529b\uff0c\u4f46\u76ee\u524d\u4ec5\u5173\u6ce8Python\u7248\u672c\u3002\u7136\u800c\uff0c\u652f\u6301\u66f4\u591a\u7f16\u7a0b\u8bed\u8a00\u540c\u6837\u81f3\u5173\u91cd\u8981\uff0c\u56e0\u4e3a\u5de5\u4e1a\u754c\u5bf9\u6b64\u6709\u5f3a\u70c8\u9700\u6c42\u3002\u4f5c\u4e3a\u8fc8\u5411\u591a\u8bed\u8a00\u652f\u6301\u7684\u7b2c\u4e00\u6b65\uff0c\u6211\u4eec\u5f00\u53d1\u4e86Java\u7248\u7684SWE-bench\uff0c\u79f0\u4e3aSWE-bench-java\u3002\u6211\u4eec\u5df2\u516c\u5f00\u53d1\u5e03\u4e86\u6570\u636e\u96c6\uff0c\u5e76\u63d0\u4f9b\u4e86\u57fa\u4e8eDocker\u7684\u8bc4\u4f30\u73af\u5883\u548c\u6392\u884c\u699c\uff0c\u8fd9\u4e9b\u90fd\u5c06\u6301\u7eed\u7ef4\u62a4\u548c\u66f4\u65b0\u3002\u4e3a\u4e86\u9a8c\u8bc1SWE-bench
-java\u7684\u53ef\u9760\u6027\uff0c\u6211\u4eec\u5b9e\u73b0\u4e86\u7ecf\u5178\u65b9\u6cd5SWE-agent\uff0c\u5e76\u5728\u5176\u4e2d\u6d4b\u8bd5\u4e86\u51e0\u79cd\u5f3a\u5927\u7684LLMs\u3002\u4f17\u6240\u5468\u77e5\uff0c\u6784\u5efa\u9ad8\u8d28\u91cf\u7684\u591a\u8bed\u8a00\u57fa\u51c6\u65e2\u8017\u65f6\u53c8\u8d39\u529b\uff0c\u56e0\u6b64\u6211\u4eec\u6b22\u8fce\u901a\u8fc7\u62c9\u53d6\u8bf7\u6c42\u6216\u5408\u4f5c\u6765\u52a0\u901f\u5176\u8fed\u4ee3\u548c\u6539\u8fdb\uff0c\u4e3a\u5b8c\u5168\u81ea\u52a8\u5316\u7684\u7f16\u7a0b\u94fa\u5e73\u9053\u8def\u3002**|\n", "2408.15240": "|**2024-08-27**|**Generative Verifiers: Reward Modeling as Next-Token Prediction**|Lunjun Zhang et.al.|[2408.15240](http://arxiv.org/abs/2408.15240)|null|\u9a8c\u8bc1\u5668\u6216\u5956\u52b1\u6a21\u578b\u5e38\u7528\u4e8e\u589e\u5f3a\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u63a8\u7406\u6027\u80fd\u3002\u4e00\u79cd\u5e38\u89c1\u7684\u65b9\u6cd5\u662fBest-of-N\u7b56\u7565\uff0c\u5176\u4e2d\u4eceLLM\u751f\u6210\u7684N\u4e2a\u5019\u9009\u89e3\u51b3\u65b9\u6848\u4e2d\u7531\u9a8c\u8bc1\u5668\u8fdb\u884c\u6392\u540d\uff0c\u9009\u62e9\u6700\u4f73\u4e00\u4e2a\u3002\u4f20\u7edf\u4e0a\uff0c\u9a8c\u8bc1\u5668\u662f\u4f5c\u4e3a\u5224\u522b\u5206\u7c7b\u5668\u8fdb\u884c\u8bad\u7ec3\u4ee5\u5bf9\u89e3\u51b3\u65b9\u6848\u6253\u5206\u7684\uff0c\u4f46\u5b83\u4eec\u5e76\u672a\u5145\u5206\u5229\u7528\u9884\u8bad\u7ec3LLM\u7684\u6587\u672c\u751f\u6210\u80fd\u529b\u3002\u4e3a\u4e86\u514b\u670d\u8fd9\u4e00\u9650\u5236\uff0c\u6211\u4eec\u63d0\u8bae\u901a\u8fc7\u5728\u9a8c\u8bc1\u548c\u89e3\u51b3\u65b9\u6848\u751f\u6210\u4e0a\u4f7f\u7528\u901a\u7528\u7684\u4e0b\u4e00\u4e2a\u8bcd\u9884\u6d4b\u76ee\u6807\u8054\u5408\u8bad\u7ec3\u9a8c\u8bc1\u5668\u3002\u4e0e\u6807\u51c6\u9a8c\u8bc1\u5668\u76f8\u6bd4\uff0c\u8fd9\u6837\u7684\u751f\u6210\u578b\u9a8c\u8bc1\u5668\uff08GenRM\uff09\u53ef\u4ee5\u4eceLLM\u7684\u51e0\u4e2a\u4f18\u52bf\u4e2d\u83b7\u76ca\uff1a\u5b83\u4eec\u53ef\u4ee5\u65e0\u7f1d\u5730\u4e0e\u6307\u4ee4\u
8c03\u8c10\u76f8\u7ed3\u5408\uff0c\u652f\u6301\u94fe\u5f0f\u601d\u8003\u63a8\u7406\uff0c\u5e76\u4e14\u53ef\u4ee5\u901a\u8fc7\u589e\u52a0\u63a8\u7406\u65f6\u7684\u8ba1\u7b97\u91cf\u6765\u5229\u7528\u591a\u6570\u6295\u7968\uff0c\u4ece\u800c\u8fdb\u884c\u66f4\u597d\u7684\u9a8c\u8bc1\u3002\u6211\u4eec\u5c55\u793a\u4e86\uff0c\u5728\u7b97\u6cd5\u95ee\u9898\u548c\u5c0f\u5b66\u6570\u5b66\u63a8\u7406\u4efb\u52a1\u4e0a\u4f7f\u7528Gemma\u4e3a\u57fa\u7840\u7684\u9a8c\u8bc1\u5668\u65f6\uff0cGenRM\u4f18\u4e8e\u5224\u522b\u578b\u9a8c\u8bc1\u5668\u548cLLM\u4f5c\u4e3a\u88c1\u5224\uff0c\u8868\u73b0\u51fa16%-64%\u7684\u95ee\u9898\u89e3\u51b3\u7387\u63d0\u5347\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8bc1\u660e\u4e86GenRM\u5728\u6570\u636e\u96c6\u89c4\u6a21\u3001\u6a21\u578b\u5bb9\u91cf\u548c\u63a8\u7406\u65f6\u8ba1\u7b97\u91cf\u589e\u52a0\u65b9\u9762\u5177\u6709\u826f\u597d\u7684\u53ef\u6269\u5c55\u6027\u3002|\n", "2408.15221": "|**2024-08-27**|**LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet**|Nathaniel Li et.al.|[2408.15221](http://arxiv.org/abs/2408.15221)|null|\u8fd1\u671f\u7684\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u9632\u5fa1\u63aa\u65bd\u663e\u8457\u63d0\u5347\u4e86\u6a21\u578b\u5bf9\u6709\u5bb3\u67e5\u8be2\u7684\u62d2\u7edd\u80fd\u529b\uff0c\u5373\u4f7f\u5728\u906d\u53d7\u6709\u7ec4\u7ec7\u653b\u51fb\u7684\u60c5\u51b5\u4e0b\u4e5f\u4e0d\u4f8b\u5916\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u9632\u5fa1\u63aa\u65bd\u4e3b\u8981\u662f\u5728\u5355\u8f6e\u5bf9\u8bdd\u4e2d\u9488\u5bf9\u81ea\u52a8\u5316\u653b\u51fb\u8fdb\u884c\u8bc4\u4f30\uff0c\u8fd9\u79cd\u5a01\u80c1\u6a21\u578b\u4e0d\u8db3\u4ee5\u53cd\u6620\u771f\u5b9e\u4e16\u754c\u4e2d\u6076\u610f\u884c\u4e3a\u7684\u590d\u6742\u6027\u3002 
\u6211\u4eec\u901a\u8fc7\u5b9e\u9a8c\u5c55\u793a\u4e86\u591a\u8f6e\u5bf9\u8bdd\u7684\u4eba\u5de5\u201c\u8d8a\u72f1\u201d\uff08\u5373\u653b\u51fb\u8005\u5229\u7528\u6a21\u578b\u7684\u6f0f\u6d1e\u6765\u7ed5\u8fc7\u9632\u5fa1\u673a\u5236\uff09\u80fd\u591f\u63ed\u9732\u9632\u5fa1\u7cfb\u7edf\u4e2d\u7684\u91cd\u5927\u6f0f\u6d1e\u3002\u5728\u4f7f\u7528HarmBench\u8fd9\u4e00\u8bc4\u4f30\u5e73\u53f0\uff0c\u5bf9\u6297\u90a3\u4e9b\u5728\u5355\u8f6e\u5bf9\u8bdd\u4e2d\u4ec5\u62a5\u544a\u4f4e\u767e\u5206\u6bd4\u653b\u51fb\u6210\u529f\u7387\uff08ASR\uff09\u7684\u9632\u5fa1\u7cfb\u7edf\u65f6\uff0c\u6211\u4eec\u53d1\u73b0\u591a\u8f6e\u5bf9\u8bdd\u7684\u4eba\u5de5\u201c\u8d8a\u72f1\u201d\u7684\u6210\u529f\u7387\u8d85\u8fc7\u4e8670%\u3002\u8fd9\u8868\u660e\u5f53\u524d\u7684\u9632\u5fa1\u673a\u5236\u5728\u9762\u5bf9\u66f4\u590d\u6742\u7684\u3001\u591a\u6b65\u9aa4\u7684\u653b\u51fb\u7b56\u7565\u65f6\u5b58\u5728\u4e0d\u8db3\u3002 \u6b64\u5916\uff0c\u591a\u8f6e\u5bf9\u8bdd\u7684\u4eba\u5de5\u201c\u8d8a\u72f1\u201d\u8fd8\u63ed\u793a\u4e86\u673a\u5668\u9057\u5fd8\u9632\u5fa1\u7cfb\u7edf\u7684\u6f0f\u6d1e\u3002\u653b\u51fb\u8005\u6210\u529f\u5730\u4ece\u7ecf\u8fc7\u9057\u5fd8\u5904\u7406\u7684\u6a21\u578b\u4e2d\u6062\u590d\u4e86\u53ef\u7528\u4e8e\u751f\u7269\u5b89\u5168\u53cc\u91cd\u7528\u9014\u7684\u77e5\u8bc6\uff0c\u8fd9\u8fdb\u4e00\u6b65\u8bc1\u660e\u4e86\u73b0\u6709\u9632\u5fa1\u63aa\u65bd\u5728\u4fdd\u62a4\u654f\u611f\u4fe1\u606f\u65b9\u9762\u5b58\u5728\u7684\u5f31\u70b9\u3002 \u4e3a\u4e86\u603b\u7ed3\u548c\u5171\u4eab\u8fd9\u4e9b\u53d1\u73b0\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u540d\u4e3a\u201c\u591a\u8f6e\u5bf9\u8bdd\u4eba\u5de5\u8d8a\u72f1\u201d\uff08Multi-Turn Human 
Jailbreaks\uff0c\u7b80\u79f0MHJ\uff09\u7684\u6570\u636e\u96c6\uff0c\u5305\u542b\u4e86\u6765\u81ea537\u4e2a\u4e0d\u540c\u591a\u8f6e\u5bf9\u8bdd\u573a\u666f\u76842,912\u4e2a\u89e6\u53d1\u6307\u4ee4\u3002\u540c\u65f6\uff0c\u6211\u4eec\u8fd8\u516c\u5f00\u53d1\u5e03\u4e86\u8fd9\u4e2a\u6570\u636e\u96c6\u4ee5\u53ca\u5728\u591a\u79cd\u5546\u4e1a\u7ea2\u961f\u6d4b\u8bd5\u4e2d\u53d1\u5c55\u51fa\u7684\u4e00\u7cfb\u5217\u201c\u8d8a\u72f1\u201d\u7b56\u7565\u7684\u7efc\u8ff0\uff0c\u65e8\u5728\u4e3a\u7814\u7a76\u66f4\u5f3a\u5927\u7684LLM\u9632\u5fa1\u7cfb\u7edf\u63d0\u4f9b\u8d44\u6e90\u548c\u652f\u6301\u3002|\n", "2408.15207": "|**2024-08-27**|**Investigating Coverage Criteria in Large Language Models: An In-Depth Study Through Jailbreak Attacks**|Shide Zhou et.al.|[2408.15207](http://arxiv.org/abs/2408.15207)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u8fc5\u901f\u53d1\u5c55\u6781\u5927\u5730\u6539\u53d8\u4e86\u4eba\u5de5\u667a\u80fd\u7684\u683c\u5c40\uff0c\u7136\u800c\u5728\u654f\u611f\u9886\u57df\u90e8\u7f72\u65f6\uff0c\u5b83\u4eec\u7684\u8106\u5f31\u6027\u5f15\u53d1\u4e86\u4e00\u7cfb\u5217\u4e25\u91cd\u5173\u5207\uff0c\u5c24\u5176\u662f\u5bf9\u4e8e\u6076\u610f\u5229\u7528\u7684\u98ce\u9669\u3002\u8fd9\u79cd\u60c5\u51b5\u51f8\u663e\u4e86\u9884\u90e8\u7f72\u6d4b\u8bd5\u4e0d\u8db3\u7684\u95ee\u9898\uff0c\u5f3a\u8c03\u4e86\u9700\u8981\u66f4\u52a0\u4e25\u683c\u548c\u5168\u9762\u8bc4\u4f30\u65b9\u6cd5\u7684\u7d27\u8feb\u6027\u3002\u672c\u7814\u7a76\u901a\u8fc7\u5168\u9762\u7684\u5b9e\u8bc1\u5206\u6790\uff0c\u8bc4\u4f30\u4e86\u4f20\u7edf\u8986\u76d6\u6807\u51c6\u5728\u8bc6\u522b\u8fd9\u4e9b\u6f0f\u6d1e\u65b9\u9762\u7684\u6709\u6548\u6027\uff0c\u7279\u522b\u5173\u6ce8\u4e86\u5173\u952e\u95ee\u9898\u2014\u2014\u201c\u8d8a\u72f1\u201d\u653b\u51fb\u3002\u7814\u7a76\u9996\u5148\u5bf9LLM\u4e2d\u7684\u9690\u85cf\u72b6\u6
001\u8fdb\u884c\u4e86\u805a\u7c7b\u5206\u6790\uff0c\u7ed3\u679c\u663e\u793a\u8fd9\u4e9b\u72b6\u6001\u7684\u5185\u5728\u7279\u6027\u80fd\u591f\u660e\u663e\u533a\u5206\u4e0d\u540c\u7c7b\u578b\u7684\u67e5\u8be2\u3002\u968f\u540e\uff0c\u6211\u4eec\u4ece\u4e09\u4e2a\u5173\u952e\u7ef4\u5ea6\u2014\u2014\u6807\u51c6\u7ea7\u522b\u3001\u5c42\u7ea7\u522b\u548c\u8bcd\u7ea7\u522b\u2014\u2014\u8bc4\u4f30\u4e86\u8fd9\u4e9b\u6807\u51c6\u7684\u6027\u80fd\u3002\u6211\u4eec\u7684\u53d1\u73b0\u63ed\u793a\u4e86\u6b63\u5e38\u67e5\u8be2\u4e0e\u201c\u8d8a\u72f1\u201d\u67e5\u8be2\u5728\u795e\u7ecf\u5143\u6fc0\u6d3b\u6a21\u5f0f\u4e0a\u7684\u663e\u8457\u5dee\u5f02\uff0c\u4ece\u800c\u9a8c\u8bc1\u4e86\u805a\u7c7b\u7ed3\u679c\u3002\u57fa\u4e8e\u8fd9\u4e9b\u53d1\u73b0\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u65b9\u6cd5\uff0c\u7528\u4e8e\u5b9e\u65f6\u68c0\u6d4b\u201c\u8d8a\u72f1\u201d\u653b\u51fb\uff0c\u5229\u7528\u795e\u7ecf\u6fc0\u6d3b\u7279\u5f81\u3002\u6211\u4eec\u7684\u5206\u7c7b\u5668\u8868\u73b0\u51fa\u4e86\u6781\u9ad8\u7684\u51c6\u786e\u7387\uff0c\u5e73\u5747\u8fbe\u523096.33%\uff0c\u6210\u529f\u8bc6\u522b\u51fa\u5305\u62ec\u53ef\u80fd\u5bfc\u81f4\u5bf9\u6297\u6027\u653b\u51fb\u7684\u201c\u8d8a\u72f1\u201d\u67e5\u8be2\u3002\u8fd9\u9879\u7814\u7a76\u7684\u91cd\u8981\u6027\u5728\u4e8e\u5176\u5bf9LLM\u5b89\u5168\u6027\u6d4b\u8bd5\u590d\u6742\u6311\u6218\u7684\u5168\u9762\u5e94\u5bf9\u3002\u901a\u8fc7\u4f7f\u7cfb\u7edf\u80fd\u591f\u5728\u751f\u6210\u7b2c\u4e00\u4e2a\u8bcd\u65f6\u7acb\u5373\u68c0\u6d4b\u5230\u653b\u51fb\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u4e3a\u96c6\u6210LLM\u7684\u672a\u6765\u7cfb\u7edf\u63d0\u4f9b\u4e86\u5f3a\u5927\u7684\u5b9e\u65f6\u68c0\u6d4b\u80fd\u529b\u3002\u8fd9\u4e00\u7814\u7a76\u6df1\u5316\u4e86\u6211\u4eec\u5bf9LLM\u5b89\u5168\u6027\u7684\u7406\u89e3\uff0c\u5e76\u4e3a\u5f00\u53d1\u66f4\u7a33\u5065\u7684\u4eba\u5de5\u667a\u80fd\u7cfb\u7edf\u5960\u5b9a\u4e86\u57fa\u7840\u3002|\n", "2408.15205": "|**2024-08-27**|**Leveraging Hallucinations to 
Reduce Manual Prompt Dependency in Promptable Segmentation**|Jian Hu et.al.|[2408.15205](http://arxiv.org/abs/2408.15205)|**[link](https://github.com/lwpyh/ProMaC_code)**|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u4efb\u52a1\u901a\u7528\u7684\u63d0\u793a\u53ef\u5206\u5272\u65b9\u6cd5\uff0c\u65e8\u5728\u51cf\u5c11\u5bf9\u6bcf\u79cd\u6240\u9700\u5bf9\u8c61\u7684\u5b9e\u4f8b\u7279\u5b9a\u624b\u52a8\u63d0\u793a\u7684\u9700\u6c42\u3002\u901a\u8fc7\u4f7f\u7528\u5355\u4e2a\u4efb\u52a1\u901a\u7528\u63d0\u793a\u6765\u6307\u5bfc\u540c\u4e00\u4efb\u52a1\u4e0b\u4e0d\u540c\u5bf9\u8c61\u7684\u4e0d\u540c\u56fe\u50cf\u7684\u5206\u5272\uff0c\u5f15\u5165\u4e86\u4efb\u52a1\u901a\u7528\u63d0\u793a\u5206\u5272\u3002\u5f53\u524d\u7684\u65b9\u6cd5\u5229\u7528\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u4ece\u901a\u7528\u63d0\u793a\u63a8\u7406\u51fa\u8be6\u7ec6\u7684\u5b9e\u4f8b\u7279\u5b9a\u63d0\u793a\uff0c\u4ee5\u63d0\u9ad8\u5206\u5272\u51c6\u786e\u6027\u3002\u8fd9\u79cd\u65b9\u6cd5\u7684\u6709\u6548\u6027\u5728\u5f88\u5927\u7a0b\u5ea6\u4e0a\u53d6\u51b3\u4e8e\u751f\u6210\u63d0\u793a\u7684\u7cbe\u786e\u5ea6\u3002\u7136\u800c\uff0cMLLMs\u5728\u63a8\u7406\u8fc7\u7a0b\u4e2d\u7ecf\u5e38\u51fa\u73b0\u5e7b\u89c9\uff0c\u5bfc\u81f4\u63d0\u793a\u4e0d\u51c6\u786e\u3002\u73b0\u6709\u65b9\u6cd5\u4e13\u6ce8\u4e8e\u6d88\u9664\u5e7b\u89c9\u4ee5\u63d0\u9ad8\u6a21\u578b\u6027\u80fd\uff0c\u672c\u6587\u8ba4\u4e3aMLLM\u5e7b\u89c9\u5728\u6b63\u786e\u5229\u7528\u65f6\u53ef\u4ee5\u63ed\u793a\u6709\u4ef7\u503c\u7684\u4efb\u52a1\u76f8\u5173\u4fe1\u606f\uff0c\u56e0\u4e3a\u5b83\u4eec\u4ee3\u8868\u4e86\u8d85\u8d8a\u5355\u5f20\u56fe\u50cf\u7684\u9884\u8bad\u7ec3\u5927\u89c4\u6a21\u77e5\u8bc6\u3002\u56e0\u6b64\uff0c\u672c\u6587\u5229\u7528\u5e7b\u89c9\u4ece\u56fe\u50cf\u4e2d\u6316\u6398\u4efb\u52a1\u76f8\u5173\u4fe1\u606f\uff0c\u5e76\u9a8c\u8bc1\u5176\u51c6\u786e\u6027\u4ee5\u589e\u5f3a\u751f\u6210\u63d0\u793a\u7684\u7cbe\u786e\u5ea6\u3002 
\u5177\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u4e2a\u8fed\u4ee3\u7684\u63d0\u793a-\u63a9\u7801\u5faa\u73af\u751f\u6210\u6846\u67b6\uff08ProMaC\uff09\uff0c\u8be5\u6846\u67b6\u5305\u62ec\u4e00\u4e2a\u63d0\u793a\u751f\u6210\u5668\u548c\u4e00\u4e2a\u63a9\u7801\u751f\u6210\u5668\u3002\u63d0\u793a\u751f\u6210\u5668\u4f7f\u7528\u591a\u5c3a\u5ea6\u94fe\u5f0f\u601d\u8003\u63d0\u793a\uff0c\u6700\u521d\u63a2\u7d22\u5e7b\u89c9\u4ee5\u63d0\u53d6\u6d4b\u8bd5\u56fe\u50cf\u4e0a\u7684\u6269\u5c55\u4e0a\u4e0b\u6587\u77e5\u8bc6\u3002\u7136\u540e\uff0c\u5c06\u8fd9\u4e9b\u5e7b\u89c9\u964d\u4f4e\u5230\u5f62\u6210\u7cbe\u786e\u7684\u5b9e\u4f8b\u7279\u5b9a\u63d0\u793a\uff0c\u4ece\u800c\u5f15\u5bfc\u63a9\u7801\u751f\u6210\u5668\u901a\u8fc7\u63a9\u7801\u8bed\u4e49\u5bf9\u9f50\u4ea7\u751f\u4e0e\u4efb\u52a1\u8bed\u4e49\u4e00\u81f4\u7684\u63a9\u7801\u3002\u751f\u6210\u7684\u63a9\u7801\u901a\u8fc7\u8fed\u4ee3\u5f15\u5bfc\u63d0\u793a\u751f\u6210\u5668\u66f4\u5173\u6ce8\u4efb\u52a1\u76f8\u5173\u7684\u56fe\u50cf\u533a\u57df\u5e76\u51cf\u5c11\u65e0\u5173\u7684\u5e7b\u89c9\uff0c\u6700\u7ec8\u5171\u540c\u63d0\u9ad8\u4e86\u63d0\u793a\u548c\u63a9\u7801\u7684\u8d28\u91cf\u3002 \u5b9e\u9a8c\u7ed3\u679c\u57285\u4e2a\u57fa\u51c6\u6570\u636e\u96c6\u4e0a\u8bc1\u660e\u4e86ProMaC\u7684\u6709\u6548\u6027\u3002\u8be6\u7ec6\u4ee3\u7801\u89c1https://lwpyh.github.io/ProMaC/\u3002|\n", "2408.15204": "|**2024-08-27**|**Can Unconfident LLM Annotations Be Used for Confident Conclusions?**|Kristina Gligori\u0107 
et.al.|[2408.15204](http://arxiv.org/abs/2408.15204)|**[link](https://github.com/kristinagligoric/confidence-driven-inference)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u5404\u79cd\u4efb\u52a1\u4e2d\u4e0e\u4eba\u7c7b\u8bc4\u4f30\u8005\u9ad8\u5ea6\u4e00\u81f4\uff0c\u663e\u793a\u51fa\u51cf\u8f7b\u4eba\u7c7b\u6570\u636e\u6536\u96c6\u6311\u6218\u7684\u6f5c\u529b\u3002\u5728\u8ba1\u7b97\u793e\u4f1a\u79d1\u5b66\uff08CSS\uff09\u9886\u57df\uff0c\u7814\u7a76\u4eba\u5458\u8d8a\u6765\u8d8a\u591a\u5730\u5229\u7528LLM\u6ce8\u91ca\u6765\u8865\u5145\u7f13\u6162\u4e14\u6602\u8d35\u7684\u4eba\u7c7b\u6ce8\u91ca\u3002\u7136\u800c\uff0c\u5bf9\u4e8e\u5982\u4f55\u6536\u96c6\u548c\u4f7f\u7528LLM\u6ce8\u91ca\u800c\u4e0d\u635f\u5bb3\u4e0b\u6e38\u7ed3\u8bba\u7684\u6709\u6548\u6027\uff0c\u4ecd\u7f3a\u4e4f\u660e\u786e\u7684\u6307\u5357\u3002\u6211\u4eec\u5f15\u5165\u4e86\u201c\u7f6e\u4fe1\u9a71\u52a8\u63a8\u7406\u201d\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u7ed3\u5408\u4e86LLM\u6ce8\u91ca\u548cLLM\u7f6e\u4fe1\u5ea6\u6307\u793a\u5668\uff0c\u4ee5\u6218\u7565\u65b9\u5f0f\u9009\u62e9\u5e94\u6536\u96c6\u54ea\u4e9b\u4eba\u7c7b\u6ce8\u91ca\uff0c\u65e8\u5728\u751f\u4ea7\u51c6\u786e\u7684\u7edf\u8ba1\u4f30\u8ba1\u548c\u53ef\u9a8c\u8bc1\u7684\u7f6e\u4fe1\u533a\u95f4\uff0c\u540c\u65f6\u51cf\u5c11\u6240\u9700\u7684\u4eba\u7c7b\u6ce8\u91ca\u6570\u91cf\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5177\u6709\u9632\u6b62LLM\u6ce8\u91ca\u8d28\u91cf\u5dee\u7684\u4fdd\u969c\u63aa\u65bd\uff0c\u786e\u4fdd\u5f97\u51fa\u7684\u7ed3\u8bba\u65e2\u6709\u6548\u53c8\u4e0d\u6bd4\u4ec5\u4f9d\u8d56\u4eba\u7c7b\u6ce8\u91ca\u66f4\u4e0d\u51c6\u786e\u3002\u6211\u4eec\u5728\u4e09\u4e2aCSS\u573a\u666f\u2014\u2014\u793c\u8c8c\u6587\u672c\u3001\u7acb\u573a\u548c\u504f\u89c1\u2014\u2014\u4e2d\u7684\u7edf\u8ba1\u4f30\u8ba1\u4efb\u52a1\u4e2d\uff0c\u901a\u8fc7\u4e0e\u57fa\u7ebf\u6bd4\u8f83\uff0c\u8bc1\u660e\u4e86\u7f6e\u4fe1\u9a71\u52a8\u63a8\u7406\u7684\u6709\u6548\u6027\uff0c\u6bcf\u79cd\u573a\u666f\u4e0b\u6240\u9700\u7684\u4eb
a\u7c7b\u6ce8\u91ca\u6570\u91cf\u51cf\u5c11\u4e86\u8d85\u8fc725%\u3002\u5c3d\u7ba1\u6211\u4eec\u4f7f\u7528CSS\u573a\u666f\u8fdb\u884c\u6f14\u793a\uff0c\u4f46\u7f6e\u4fe1\u9a71\u52a8\u63a8\u7406\u53ef\u4ee5\u7528\u4e8e\u5e7f\u6cdbNLP\u95ee\u9898\u4e2d\u7684\u5927\u591a\u6570\u6807\u51c6\u91cf\u4f30\u8ba1\u3002|\n", "2408.15176": "|**2024-08-27**|**Unlocking Potential in Pre-Trained Music Language Models for Versatile Multi-Track Music Arrangement**|Longshen Ou et.al.|[2408.15176](http://arxiv.org/abs/2408.15176)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u591a\u4e2a\u9886\u57df\u5c55\u793a\u4e86\u663e\u8457\u7684\u80fd\u529b\uff0c\u5305\u62ec\u7b26\u53f7\u97f3\u4e50\u751f\u6210\u3002\u7136\u800c\uff0c\u5229\u7528\u8fd9\u4e9b\u9884\u8bad\u7ec3\u7684\u6a21\u578b\u8fdb\u884c\u53ef\u63a7\u97f3\u4e50\u7f16\u6392\u4efb\u52a1\u7684\u6311\u6218\u4ecd\u7136\u65b0\u9896\uff0c\u6bcf\u4e2a\u4efb\u52a1\u90fd\u9700\u8981\u4e0d\u540c\u7684\u97f3\u4e50\u4fe1\u606f\u4f5c\u4e3a\u63a7\u5236\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u7edf\u4e00\u7684\u5e8f\u5217\u5230\u5e8f\u5217\u6846\u67b6\uff0c\u5b83\u5141\u8bb8\u5bf9\u7b26\u53f7\u97f3\u4e50\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u5fae\u8c03\uff0c\u4ee5\u6267\u884c\u56db\u4e2a\u4e0d\u540c\u7684\u591a\u8f68\u7f16\u6392\u4efb\u52a1\uff1a\u4e50\u961f\u7f16\u6392\u3001\u94a2\u7434\u7f29\u51cf\u3001\u9f13\u7f16\u6392\u548c\u58f0\u97f3\u5206\u79bb\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6240\u63d0\u51fa\u7684\u7b56\u7565\u5728\u6240\u6709\u56db\u4e2a\u4efb\u52a1\u4e0a\u5747\u5b9e\u73b0\u4e86\u66f4\u9ad8\u97f3\u4e50\u8d28\u91cf\u7684\u7ed3\u679c\uff0c\u4e0e\u4e13\u95e8\u9488\u5bf9\u7279\u5b9a\u4efb\u52a1\u7684\u57fa\u7ebf\u76f8\u6bd4\u3002\u6b64\u5916\uff0c\u901a\u8fc7\u989d\u5916\u7684\u63a2\u67e5\u5206\u6790\u5b9e\u9a8c\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u9884\u8bad\u7ec3\u9636\u6bb5\u8d4b\u4e88\u6a21\u578b\u7406\u89e3\u97f3\u4e50\u6761\u4ef6\u7684\u57fa\u672c\u77e5\u8bc6\uff0c\u8fd9\u5728\u4ec5\u901a\u
8fc7\u7279\u5b9a\u4efb\u52a1\u7684\u5fae\u8c03\u96be\u4ee5\u83b7\u5f97\u7684\u60c5\u51b5\u4e0b\u5c24\u4e3a\u91cd\u8981\u3002|\n",
"2408.15172": "|**2024-08-27**|**X-Reflect: Cross-Reflection Prompting for Multimodal Recommendation**|Hanjia Lyu et.al.|[2408.15172](http://arxiv.org/abs/2408.15172)|null|Large language models (LLMs) and large multimodal models (LMMs) have been shown to significantly improve the enrichment of item descriptions, thereby enhancing the accuracy of recommender systems. However, existing approaches often rely solely on text-only prompting or employ basic multimodal strategies that fail to fully exploit the complementary information between the textual and visual modalities. This paper proposes a new framework called Cross-Reflection Prompting (X-Reflect), which addresses these issues by prompting LMMs to explicitly identify and reconcile supportive and conflicting information between text and images. By capturing nuanced insights from both modalities, the approach generates more comprehensive and contextually rich item representations. Extensive experiments on two widely used benchmarks show that our method outperforms existing prompting baselines in downstream recommendation accuracy. In addition, we evaluate the framework's generalizability across different LMM architectures and the robustness of the prompting strategies, offering insights for optimization. This work underscores the importance of integrating multimodal information and presents a novel solution for improving item understanding in multimodal recommender systems.|\n",
"2408.15171": "|**2024-08-27**|**Measuring text summarization factuality using atomic facts entailment metrics in the context of retrieval augmented generation**|N. E. Kriman et.al.|[2408.15171](http://arxiv.org/abs/2408.15171)|null|Since the release of ChatGPT in 2022, the range of applications of large language models (LLMs) has expanded significantly, demonstrating their value in a variety of scenarios. However, for enterprise and commercial applications, the tendency of LLMs to generate inaccurate information, the so-called \"hallucination\" phenomenon, has become a major challenge. This project proposes a method for assessing the accuracy of LLM-generated summaries when compared against the original text. Our approach uses Naive Bayes classification to judge the truthfulness of the generated content. With this method, we can estimate how well the generated text matches the actual information, thereby improving the quality and reliability of LLM applications. This not only helps identify possible errors or inaccuracies, but also strengthens users' trust in LLM-generated content and promotes its effective use in a wider range of domains. In addition, the method can provide valuable feedback for the continued improvement of LLMs, advancing the technology toward higher-quality, more reliable AI-assisted content generation.|\n",
"2408.15079": "|**2024-08-27**|**BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Competitive Large Language Model Baseline**|Guosheng Dong
et.al.|[2408.15079](http://arxiv.org/abs/2408.15079)|null|The core capabilities of large language models (LLMs) depend heavily on the composition and selection of extensive pretraining datasets, which many institutions treat as trade secrets. To mitigate this problem, we open-source a universally applicable data processing pipeline and validate its effectiveness and potential by introducing a competitive LLM baseline. Specifically, the data processing pipeline consists of broad collection to scale up the data and reweighting to improve its quality. We then pretrain on 3 trillion tokens processed by our pipeline, without any deliberate downstream task optimization, followed by a simple but effective supervised fine-tuning stage. BaichuanSEED demonstrates consistency and predictability throughout training and achieves performance comparable to several advanced commercial LLMs, such as Qwen1.5 and Llama3, on comprehensive benchmarks. We also conduct several heuristic experiments to discuss the potential for further optimization on downstream tasks such as mathematics and coding.|\n",
"2408.15066": "|**2024-08-27**|**Constraining Participation: Affordances of Feedback Features in Interfaces to Large Language Models**|Ned Cooper
et.al.|[2408.15066](http://arxiv.org/abs/2408.15066)|null|This paper examines the affordances of interactive feedback features in the ChatGPT interface, analyzing how these features shape user input and participation in the iteration of large language models (LLMs). Through a survey of ChatGPT users and the application of an affordance framework, we show that these features encourage simple, frequent, and performance-focused feedback while limiting collective input and discussion among users. We argue that this feedback format significantly constrains user participation and reinforces power imbalances between users, the public, and the companies developing LLMs. Our analysis contributes a new perspective to the participatory AI literature, highlighting the limitations of existing feedback processes and proposing directions for their redesign. To enable more meaningful public participation in AI development, we advocate a shift away from processes focused on aligning model outputs with specific user preferences. Instead, we emphasize the need for processes that facilitate dialogue between companies and diverse \"publics\" about the purpose and applications of LLMs. This approach requires attention to the ongoing work of building social infrastructure, that is, creating and sustaining the social, technical, and institutional structures needed to address the concerns of groups affected by AI development and deployment.|\n",
"2408.15998": "|**2024-08-28**|**Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders**|Min Shi
et.al.|[2408.15998](http://arxiv.org/abs/2408.15998)|**[link](https://github.com/nvlabs/eagle)**|**This paper examines the importance of accurately interpreting complex visual information for multimodal large language models (MLLMs). Recent work shows that enhanced visual perception significantly reduces hallucinations and improves performance on resolution-sensitive tasks such as optical character recognition and document analysis. Many advanced MLLMs achieve this by integrating multiple vision encoders. However, systematic comparisons and detailed ablation studies of key aspects, such as expert selection and strategies for fusing multiple vision experts, are lacking. This paper extensively explores the design space of MLLMs that use a mixture of vision encoders. The study finds that simply concatenating the visual tokens from several complementary vision encoders is as effective as more complex mixing architectures or strategies. It also introduces a Pre-Alignment mechanism that bridges the gap between vision-focused encoders and language tokens, improving model coherence. The resulting family of MLLMs, Eagle, surpasses other leading open-source models on major MLLM benchmarks. Code and models are released as open source: https://github.com/NVlabs/Eagle**|\n",
"2408.15971": "|**2024-08-28**|**BattleAgentBench: A Benchmark for Evaluating Cooperation and Competition Capabilities of Language Models in Multi-Agent Systems**|Wei Wang et.al.|[2408.15971](http://arxiv.org/abs/2408.15971)|null|Large language models (LLMs) are becoming increasingly powerful and capable of handling complex tasks, such as building single-agent and multi-agent systems. Compared with single agents, multi-agent systems place higher demands on the collaboration capabilities of language models. Existing benchmarks focus mainly on collaboration in multi-agent systems, but they lack fine-grained evaluation and overlook scenarios that combine cooperation and competition. To fill this gap, we propose a new benchmark, BattleAgentBench, which defines seven sub-stages across three difficulty levels and evaluates language models in detail along several dimensions: single-agent scenario navigation, paired-agent task execution, and multi-agent cooperation and competition. We conduct an extensive evaluation of four leading closed-source models and seven open-source models. Experimental results show that API-based models perform excellently on simple tasks, whereas small open-source models struggle even on simple tasks. On difficult tasks that require cooperation and competition, API-based models demonstrate some collaborative capability, but there is still enormous room for improvement.|\n",
"2408.15966": "|**2024-08-28**|**More Text, Less Point: Towards 3D Data-Efficient Point-Language Understanding**|Yuan Tang et.al.|[2408.15966](http://arxiv.org/abs/2408.15966)|**[link](https://github.com/tangyuan96/greenplm)**|In this paper, we revisit the challenge of enabling large language models (LLMs) to understand the 3D physical world. Because large-scale paired 3D point cloud and text datasets are scarce, the success of LLMs has yet to be replicated in 3D understanding. We therefore propose a new task: 3D Data-Efficient Point-Language Understanding. The goal is to enable LLMs to achieve robust 3D object understanding with minimal 3D point cloud and text data pairs. To address this task, we introduce GreenPLM, which leverages more text data to compensate for the lack of 3D data. First, inspired by the use of CLIP to align images and text, we use a pre-trained point cloud-text encoder to map the 3D point cloud space to the text space. This mapping lets us seamlessly connect the text space with LLMs. Once the point-text-LLM connection is established, we further enhance text-LLM alignment by expanding the intermediate text space, thereby reducing the reliance on 3D point cloud data. Specifically, we generate 6 million free-text descriptions of 3D objects and design a three-stage training strategy that helps LLMs better explore the intrinsic connections between modalities. To achieve efficient modality alignment, we design a zero-parameter cross-attention module for token pooling. Extensive experimental results show that GreenPLM needs only 12% of the 3D training data used by existing state-of-the-art models to achieve superior 3D understanding. Remarkably, GreenPLM also achieves competitive performance using text-only data. Code and weights (https://github.com/TangYuan96/GreenPLM) are publicly available.|\n",
"2408.15950": "|**2024-08-28**|**Atari-GPT: Investigating the Capabilities of Multimodal Large Language Models as Low-Level Policies for Atari Games**|Nicholas R.
Waytowich et.al.|[2408.15950](http://arxiv.org/abs/2408.15950)|null|Recent advances in large language models (LLMs) have extended their capabilities beyond traditional text tasks into multimodal domains that integrate visual, auditory, and textual data. While multimodal LLMs have been studied extensively for high-level planning in areas such as robotics and games, their potential for low-level control tasks remains largely unexplored. This paper explores the application of multimodal LLMs to Atari video games, introducing Atari game performance as a new benchmark for evaluating the ability of multimodal LLMs to perform low-level control tasks. Compared with traditional reinforcement learning (RL) and imitation learning (IL) methods, these LLMs do not require extensive computational resources or reward function definitions; instead, they use existing multimodal knowledge to interact directly with game environments. Our study evaluates several multimodal LLMs against traditional RL agents, human players, and random agents, focusing on their ability to understand complex visual scenes and formulate strategic responses. We also examine the impact of in-context learning (ICL) by incorporating human-demonstrated gameplay trajectories to enhance the models' contextual understanding. Through this investigation, we aim to determine whether multimodal LLMs can leverage their extensive training to act effectively as low-level controllers, thereby redefining potential applications in dynamic and visually complex environments. See our project webpage (https://sites.google.com/view/atari-gpt/) for additional results and videos.|\n",
"2408.15915": "|**2024-08-28**|**Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models**|Yuncheng Yang
et.al.|[2408.15915](http://arxiv.org/abs/2408.15915)|**[link](https://github.com/yaphabates/rocket)**|Cultivating the expertise that large language models (LLMs) need to solve tasks in specific domains often requires special-purpose tuning toward stable expected outputs. To avoid the huge cost of manually preparing instruction datasets and training resources, exploiting open knowledge, including low-rank adaptation (LoRA) models and instruction datasets, is a reasonable starting point. However, existing methods for model and data selection focus on general-purpose performance and neglect the knowledge gap exposed in domain-specific deployment. This study proposes to bridge that gap by introducing a few human-annotated samples (i.e., K-shot) to advance the task expertise of LLMs with open knowledge. Specifically, we develop an efficient and scalable pipeline that produces task experts cost-effectively, in which K-shot data guide the selection of the most promising expert candidates and task-relevant instructions. A mixture-of-experts (MoE) system is built to make full use of the distinct yet complementary knowledge of multiple experts. We identify two keys to the success of the MoE system: 1. Adherence to K-shot: ensuring that the selected models genuinely can solve the K-shot problems, rather than being blind guessers. 2. Emphasis on diversity: diversity is reflected not only in the experts themselves but also in the fine-tuning instructions throughout model and data selection. Extensive experimental results confirm the superiority of our approach in utilizing open knowledge across various tasks. Code and models will be released later.|\n",
"2408.15907": "|**2024-08-28**|**Decentralized LLM Inference over Edge Networks with Energy Harvesting**|Aria Khoshsirat et.al.|[2408.15907](http://arxiv.org/abs/2408.15907)|null|The exceptional performance of large language models on natural language tasks has significantly transformed many fields, but deploying them in resource-constrained environments such as edge networks remains challenging. Distributed inference techniques have emerged that improve flexibility and cost-effectiveness by distributing model blocks across multiple devices, but energy limitations remain a problem, particularly for battery-powered edge devices. We propose a sustainable model for collaborative inference on interconnected, battery-powered edge devices with energy harvesting. A semi-Markov model describes the states of the devices, taking into account processing parameters and average green energy arrivals. This model informs the design of scheduling algorithms that aim to reduce device downtime and maximize network throughput. Empirical evaluations and simulated runs validate the effectiveness of our approach, paving the way for energy-efficient distributed inference over edge networks.|\n",
"2408.15903": "|**2024-08-28**|**LLM-Based Multi-Hop Question Answering with Knowledge Graph Integration in Evolving Environments**|Ruirui Chen et.al.|[2408.15903](http://arxiv.org/abs/2408.15903)|null|Rapidly outdated information makes it challenging for large language models (LLMs) to incorporate new knowledge. Existing methods still struggle with multi-hop questions that require accurate fact identification and sequential logical reasoning, particularly when faced with a large number of fact updates. To address these issues, this paper introduces Graph Memory-based Editing for Large Language Models (GMeLLo), a simple and effective method that combines the explicit knowledge representation of knowledge graphs (KGs) with the linguistic flexibility of LLMs. Beyond using LLMs for question answering, GMeLLo employs these models to convert natural language into structured queries and fact triples, enabling seamless interaction with KGs for rapid updates and precise multi-hop reasoning. Experimental results show that GMeLLo significantly surpasses current state-of-the-art knowledge editing methods on the multi-hop question answering benchmark MQuAKE, especially in scenarios with extensive knowledge updates.|\n",
"2408.15901": "|**2024-08-28**|**Nexus: Specialization meets Adaptability for Efficiently Training Mixture of Experts**|Nikolas Gritsch et.al.|[2408.15901](http://arxiv.org/abs/2408.15901)|null|Efficiency, specialization, and adaptability to new data distributions are qualities that current large language models struggle to combine. The Mixture of Experts (MoE) architecture has become a focus of research because its inherent conditional computation can deliver these qualities. This work focuses on \"upcycling\" dense expert models into an MoE architecture, aiming to improve specialization while also adding the ability to adapt flexibly to new tasks. We introduce Nexus, an enhanced MoE architecture with adaptive routing in which the model learns to project expert embeddings from domain representations. This approach allows Nexus to flexibly add new experts from separately trained dense models, without requiring large-scale MoE training for unseen data domains. Experiments show that Nexus achieves a relative gain of up to 2.1% over the baseline in the initial upcycling stage, and an 18.8% relative gain when extending the MoE with limited fine-tuning data. The flexibility of Nexus is crucial for building an open-source ecosystem in which every user continuously assembles their own MoE mix according to their needs.|\n",
"2408.15895": "|**2024-08-28**|**Bias in LLMs as Annotators: The Effect of Party Cues on Labelling Decision by Large Language Models**|Sebastian Vallejo Vera
et.al.|[2408.15895](http://arxiv.org/abs/2408.15895)|null|Human coders are biased. By replicating the experiment of Ennser-Jedenastik and Meyer (2018), we find that large language models (LLMs) use political information, and specifically party cues, when judging political statements. LLMs not only use the party cue to contextualize whether a statement is positive, negative, or neutral, but also reflect the biases of the human-generated data on which they were trained. We also find that, unlike humans, who show bias only when faced with statements from extreme parties, LLMs exhibit significant bias even when prompted with statements from center-left and center-right parties. The final section discusses the implications of these findings.|\n",
"2408.15879": "|**2024-08-28**|**Persuasion Games using Large Language Models**|Ganesh Prasath Ramani
et.al.|[2408.15879](http://arxiv.org/abs/2408.15879)|null|Large language models (LLMs) have emerged as powerful tools capable of understanding and generating human-like text. This paper examines the potential of LLMs to shape human perspectives and, in turn, influence their decisions on particular tasks. This capability finds applications in domains such as investment, credit cards, and insurance, where LLMs help users select appropriate insurance policies, investment plans, credit cards, and retail products, as well as in behavioral change support systems (BCSS). We present a sophisticated multi-agent framework in which a group of agents operates collaboratively. The primary agent engages users directly in persuasive dialogue, while auxiliary agents perform tasks such as information retrieval, response analysis, persuasion strategy development, and fact validation. Empirical evidence from our experiments shows that this collaborative approach significantly improves the persuasive efficacy of the LLM. We continuously analyze users' resistance to persuasion and counter it with a combination of rule-based and LLM-based resistance-persuasion mapping techniques. We use simulated personas and generate conversations in the insurance, banking, and retail domains to evaluate how well LLMs recognize, adapt to, and influence different personality types. We also examine the resistance mechanisms employed by LLM-simulated personas. Persuasion is quantified through measurable surveys before and after the interaction, LLM-generated scores on the conversations, and user decisions (purchase or non-purchase).|\n",
"2408.16756": "|**2024-08-29**|**How Far Can Cantonese NLP Go? Benchmarking Cantonese Capabilities of Large Language Models**|Jiyue Jiang et.al.|[2408.16756](http://arxiv.org/abs/2408.16756)|null|The rapid evolution of large language models (LLMs) has transformed the competitive landscape of natural language processing (NLP), particularly for English and other data-rich languages. However, a significant development gap remains for underrepresented languages such as Cantonese, which is especially concerning given the economic importance of the Guangdong-Hong Kong-Macau Greater Bay Area and the large Cantonese-speaking populations in Singapore and North America. Despite its wide use, Cantonese is scarcely represented in NLP research, especially compared with languages from other similarly developed regions. To bridge these gaps, we outline current Cantonese NLP methods and introduce new benchmarks that evaluate LLM performance in factual generation, mathematical logic, complex reasoning, and general knowledge in Cantonese, with the aim of advancing open-source Cantonese LLM technology. We also propose future research directions and recommended models to enhance Cantonese LLM development.|\n",
"2408.16753": "|**2024-08-29**|**Reinforcement Learning without Human Feedback for Last Mile Fine-Tuning of Large Language Models**|Alec Solway et.al.|[2408.16753](http://arxiv.org/abs/2408.16753)|null|Reinforcement learning is used to align language models with human preference signals after the model has first been pre-trained, via likelihood maximization, to predict the next token of text in a large corpus. Before being deployed in a specific domain, models are often further fine-tuned on task-specific data. Because human preference signals are often unavailable at this last stage, fine-tuning is usually performed with likelihood maximization, the default method. However, reinforcement learning has advantages beyond facilitating alignment with a human-derived reward function. Whereas likelihood maximization is a form of imitation learning in which the model is trained on what to do under ideal conditions, reinforcement learning is not limited to demonstrating actions for optimally reached states; it trains the model on what to do across a range of scenarios as it explores the policy space. It also trains the model on what not to do, suppressing competitive but poor actions. This work develops a framework for last-mile fine-tuning using reinforcement learning and tests whether it yields performance gains. The experiments center on abstractive summarization, but the framework is general and broadly applicable. The procedure produced significantly better results than raw maximum-likelihood outputs. For the specific data tested, the performance gap could be narrowed by post-processing the maximum-likelihood outputs. Nonetheless, the framework offers a new avenue for model optimization in situations where post-processing may be less straightforward or effective, and it can be extended to cover more classes of undesirable outputs to penalize and train against, such as hallucinations.|\n",
"2408.16749": "|**2024-08-29**|**Assessing Large Language Models for Online Extremism Research: Identification, Explanation, and New Knowledge**|Beidi Dong
et.al.|[2408.16749](http://arxiv.org/abs/2408.16749)|null|\u672c\u6587\u63a2\u8ba8\u4e86\u5728\u68c0\u6d4b\u548c\u9650\u5236\u7f51\u7edc\u4e0a\u6781\u7aef\u4e3b\u4e49\u601d\u60f3\u4f20\u64ad\u65b9\u9762\uff0c\u81ea\u52a8\u5de5\u5177\u7684\u91cd\u8981\u6027\u3002\u7814\u7a76\u6bd4\u8f83\u4e86\u53cc\u5411\u7f16\u7801\u8868\u793a\u7684Transformer\uff08BERT\uff09\u548c\u751f\u6210\u9884\u8bad\u7ec3Transformer\uff08GPT\uff09\u6a21\u578b\uff0c\u5728\u201c\u53f3\u7ffc\u201d\u548c\u201c\u5de6\u7ffc\u201d\u610f\u8bc6\u5f62\u6001\u5173\u952e\u8bcd\u7684\u793e\u4ea4\u5a92\u4f53\u5e16\u5b50\u4e2d\u8fdb\u884c\u68c0\u6d4b\u4e0e\u5206\u7c7b\u7684\u80fd\u529b\u3002\u6211\u4eec\u6536\u96c6\u4e86\u542b\u6709\u4e0a\u8ff0\u5173\u952e\u8bcd\u7684\u5e16\u5b50\uff0c\u5e76\u4eba\u5de5\u6807\u8bb0\u4e3a\u6781\u7aef\u4e3b\u4e49\u6216\u975e\u6781\u7aef\u4e3b\u4e49\u3002\u8fdb\u4e00\u6b65\u5730\uff0c\u6211\u4eec\u5c06\u6781\u7aef\u4e3b\u4e49\u5e16\u5b50\u5206\u4e3a\u4e94\u4e2a\u6784\u6210\u8981\u7d20\u4e4b\u4e00\uff0c\u57fa\u4e8e\u5de5\u4f5c\u5b9a\u4e49\u6846\u67b6\u3002 BERT\u6a21\u578b\u7684\u6027\u80fd\u8bc4\u4f30\u57fa\u4e8e\u8bad\u7ec3\u6570\u636e\u89c4\u6a21\u548c\u7c7b\u522b\u95f4\u7684\u77e5\u8bc6\u8f6c\u79fb\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5bf9\u6bd4\u4e86\u4f7f\u7528\u4e0d\u540c\u63d0\u793a\u7684GPT 3.5\u548cGPT 4\u6a21\u578b\u7684\u6027\u80fd\uff1a\u539f\u59cb\u63d0\u793a\u3001\u4e00\u822c\u5b9a\u4e49\u3001\u89d2\u8272\u626e\u6f14\u548c\u4e13\u4e1a\u5b9a\u4e49\u3002\u7ed3\u679c\u8868\u660e\uff0c\u6700\u4f73\u8868\u73b0\u7684GPT\u6a21\u578b\u4f18\u4e8e\u6700\u4f73\u8868\u73b0\u7684BERT\u6a21\u578b\uff0c\u66f4\u8be6\u7ec6\u7684\u63d0\u793a\u901a\u5e38\u80fd\u5e26\u6765\u66f4\u597d\u7684\u7ed3\u679c\u3002\u7136\u800c\uff0c\u8fc7\u4e8e\u590d\u6742\u7684\u63d0\u793a\u53ef\u80fd\u4f1a\u5f71\u54cd\u6027\u80fd\u3002\u4e0d\u540c\u7684GPT\u7248\u672c\u5bf9\u88ab\u8ba4\u5b9a\u4e3a\u6781\u7aef\u4e3b\u4e49\u7684\u654f\u611f\u5ea6\u5404\u4e0d\u76f8\u540c\u3002GPT 
3.5\u5728\u8bc6\u522b\u5de6\u7ffc\u6781\u7aef\u4e3b\u4e49\u5e16\u5b50\u65b9\u9762\u8868\u73b0\u66f4\u597d\uff0c\u800cGPT 4\u5219\u5728\u8bc6\u522b\u53f3\u7ffc\u6781\u7aef\u4e3b\u4e49\u5e16\u5b50\u65b9\u9762\u8868\u73b0\u66f4\u597d\u3002 \u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08GPT\u6a21\u578b\uff09\u5728\u5728\u7ebf\u6781\u7aef\u4e3b\u4e49\u5206\u7c7b\u4efb\u52a1\u4e2d\u5c55\u73b0\u51fa\u663e\u8457\u6f5c\u529b\uff0c\u8d85\u8d8a\u4e86\u4f20\u7edf\u7684BERT\u6a21\u578b\uff0c\u5728\u96f6\u6837\u672c\u8bbe\u7f6e\u4e0b\u8868\u73b0\u51fa\u8272\u3002\u672a\u6765\u7814\u7a76\u5e94\u63a2\u7d22\u4eba\u7c7b\u4e0e\u8ba1\u7b97\u673a\u4ea4\u4e92\u5728\u4f18\u5316GPT\u6a21\u578b\u4ee5\u8fdb\u884c\u6781\u7aef\u4e3b\u4e49\u68c0\u6d4b\u4e0e\u5206\u7c7b\u4efb\u52a1\u4e2d\u7684\u4f5c\u7528\uff0c\u4ee5\u5f00\u53d1\u66f4\u9ad8\u6548\uff08\u4f8b\u5982\uff0c\u66f4\u5feb\u6377\u3001\u66f4\u5c11\u52aa\u529b\uff09\u4e14\u66f4\u6709\u6548\u7684\u8bc6\u522b\u6781\u7aef\u4e3b\u4e49\u5185\u5bb9\u65b9\u6cd5\u3002|\n", "2408.16740": "|**2024-08-29**|**Theoretical and Methodological Framework for Studying Texts Produced by Large Language Models**|Ji\u0159\u00ed Mili\u010dka 
et.al.|[2408.16740](http://arxiv.org/abs/2408.16740)|null|\u672c\u6587\u4ece\u5b9a\u91cf\u8bed\u8a00\u5b66\u7684\u89d2\u5ea6\u63a2\u8ba8\u4e86\u7814\u7a76\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u53ca\u5176\u751f\u6210\u6587\u672c\u6240\u9762\u4e34\u7684\u6982\u5ff5\u3001\u65b9\u6cd5\u8bba\u548c\u6280\u672f\u6311\u6218\u3002\u672c\u6587\u57fa\u4e8e\u4e00\u4e2a\u7406\u8bba\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u533a\u5206\u4e86\u4f5c\u4e3a\u8f7d\u4f53\u7684LLM\u4e0e\u6a21\u62df\u7684\u5b9e\u4f53\u3002\u672c\u6587\u5021\u5bfc\u5bf9\u6a21\u578b\u91c7\u53d6\u4e25\u683c\u975e\u62df\u4eba\u5316\u7684\u65b9\u6cd5\uff0c\u540c\u65f6\u8c28\u614e\u5730\u5e94\u7528\u7528\u4e8e\u7814\u7a76\u4eba\u7c7b\u8bed\u8a00\u884c\u4e3a\u7684\u65b9\u6cd5\u6765\u5206\u6790\u6a21\u62df\u5b9e\u4f53\u3002\u867d\u7136\u81ea\u7136\u8bed\u8a00\u5904\u7406\u7814\u7a76\u8005\u5173\u6ce8\u6a21\u578b\u672c\u8eab\u3001\u5176\u67b6\u6784\u3001\u8bc4\u4f30\u4ee5\u53ca\u63d0\u9ad8\u6027\u80fd\u7684\u65b9\u6cd5\uff0c\u4f5c\u4e3a\u5b9a\u91cf\u8bed\u8a00\u5b66\u5bb6\uff0c\u6211\u4eec\u7684\u76ee\u6807\u662f\u6784\u5efa\u5173\u4e8eLLM\u751f\u6210\u6587\u672c\u7279\u6027\u7684\u7406\u8bba\u4f53\u7cfb\uff0c\u5b83\u4eec\u4e0e\u4eba\u7c7b\u751f\u6210\u7684\u6587\u672c\u6709\u4f55\u4e0d\u540c\uff0c\u4ee5\u53ca\u6a21\u62df\u5b9e\u4f53\u7684\u5c5e\u6027\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5e94\u63a2\u7d22LLM\u4f5c\u4e3a\u7814\u7a76\u4eba\u7c7b\u6587\u5316\u5de5\u5177\u7684\u53ef\u80fd\u6027\uff0c\u800c\u8bed\u8a00\u662f\u8fd9\u4e00\u6587\u5316\u4e0d\u53ef\u6216\u7f3a\u7684\u4e00\u90e8\u5206\u3002|\n", "2408.16700": "|**2024-08-29**|**GradBias: Unveiling Word Influence on Bias in Text-to-Image Generative Models**|Moreno D'Inc\u00e0 
et.al.|[2408.16700](http://arxiv.org/abs/2408.16700)|**[link](https://github.com/moreno98/gradbias)**|**\u8fd1\u671f\u5728\u6587\u672c\u5230\u56fe\u50cf\uff08T2I\uff09\u751f\u6210\u6a21\u578b\u9886\u57df\u53d6\u5f97\u7684\u8fdb\u5c55\u4f7f\u5f97\u9ad8\u8d28\u91cf\u56fe\u50cf\u751f\u6210\u6210\u4e3a\u53ef\u80fd\u3002\u968f\u7740\u6027\u80fd\u548c\u53ef\u8bbf\u95ee\u6027\u7684\u63d0\u9ad8\uff0c\u8fd9\u4e9b\u6a21\u578b\u6b63\u53d7\u5230\u8d8a\u6765\u8d8a\u591a\u7684\u5173\u6ce8\u548c\u6b22\u8fce\uff0c\u786e\u4fdd\u5b83\u4eec\u7684\u516c\u5e73\u6027\u548c\u5b89\u5168\u6027\u662f\u9632\u6b62\u504f\u89c1\u4f20\u64ad\u548c\u5ef6\u7eed\u7684\u5173\u952e\u3002\u73b0\u6709\u7814\u7a76\u4e3b\u8981\u96c6\u4e2d\u5728\u9884\u5b9a\u4e49\u504f\u89c1\uff08\u5982\u6027\u522b\u3001\u79cd\u65cf\uff09\u7684\u5c01\u95ed\u96c6\u5408\u4e0a\u8fdb\u884c\u504f\u89c1\u68c0\u6d4b\u3002\u7136\u800c\uff0c\u5728\u5f00\u653e\u96c6\u8bbe\u7f6e\u4e0b\uff0c\u5373\u65e0\u9700\u9884\u5148\u8bbe\u5b9a\u7684\u60c5\u51b5\u4e0b\uff0c\u68c0\u6d4b\u548c\u91cf\u5316\u504f\u89c1\u662f\u4e00\u4e2a\u6311\u6218\u3002 \u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u901a\u7528\u6846\u67b6\uff0c\u7528\u4e8e\u8bc6\u522b\u3001\u91cf\u5316\u548c\u89e3\u91ca\u5f00\u653e\u96c6\u8bbe\u7f6e\u4e0b\u7684\u504f\u89c1\u3002\u8be5\u7ba1\u9053\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4ece\u4e00\u7ec4\u63cf\u8ff0\u4e2d\u63d0\u51fa\u504f\u89c1\u3002\u968f\u540e\uff0c\u4f7f\u7528\u76ee\u6807\u751f\u6210\u6a21\u578b\u751f\u6210\u4e00\u7cfb\u5217\u56fe\u50cf\u3002\u6700\u540e\uff0c\u901a\u8fc7\u89c6\u89c9\u95ee\u7b54\uff08VQA\uff09\u8fdb\u884c\u504f\u89c1\u8bc4\u4f30\u3002\u6211\u4eec\u5c55\u793a\u4e86\u4e24\u79cd\u57fa\u4e8e\u6b64\u6846\u67b6\u7684\u65b9\u6cd5\uff1aOpenBias \u548c GradBias\u3002OpenBias 
\u80fd\u591f\u68c0\u6d4b\u5e76\u91cf\u5316\u4e0e\u4eba\u3001\u7269\u4f53\u548c\u52a8\u7269\u76f8\u5173\u7684\u5df2\u77e5\u548c\u65b0\u578b\u504f\u89c1\uff0c\u5e76\u4e0e\u73b0\u6709\u7684\u5c01\u95ed\u96c6\u504f\u89c1\u68c0\u6d4b\u65b9\u6cd5\u4ee5\u53ca\u4eba\u7c7b\u5224\u65ad\u9ad8\u5ea6\u4e00\u81f4\u3002GradBias \u663e\u793a\u51fa\u4e2d\u6027\u8bcd\u6c47\u5bf9\u504f\u89c1\u7684\u5f71\u54cd\u663e\u8457\uff0c\u5e76\u4e14\u5728\u591a\u9879\u57fa\u7ebf\u4e2d\u8868\u73b0\u6700\u4f73\uff0c\u5305\u62ec\u6700\u5148\u8fdb\u7684\u57fa\u7840\u6a21\u578b\u3002 \u4ee3\u7801\u5df2\u5728\u6b64\u5904\u63d0\u4f9b\uff1ahttps://github.com/Moreno98/GradBias\u3002**|\n", "2408.16673": "|**2024-08-29**|**Entropic Distribution Matching in Supervised Fine-tuning of LLMs: Less Overfitting and Better Diversity**|Ziniu Li et.al.|[2408.16673](http://arxiv.org/abs/2408.16673)|null|\u672c\u6587\u65e8\u5728\u89e3\u51b3\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u4e0b\u6e38\u4efb\u52a1\u7684\u7cbe\u8c03\uff08Supervised Fine-Tuning\uff0cSFT\uff09\u8fc7\u7a0b\u4e2d\u9047\u5230\u7684\u8fc7\u62df\u5408\u548c\u8f93\u51fa\u591a\u6837\u6027\u53d7\u9650\u7684\u95ee\u9898\u3002\u4f20\u7edf\u4e0a\uff0c\u4ea4\u53c9\u71b5\uff08Cross Entropy\uff0cCE\uff09\u635f\u5931\u51fd\u6570\u88ab\u5e7f\u6cdb\u7528\u4e8eSFT\uff0c\u7136\u800c\u5b83\u53ef\u80fd\u5bfc\u81f4\u6a21\u578b\u5bf9\u6570\u636e\u5206\u5e03\u8fdb\u884c\u8fc7\u4e8e\u6fc0\u8fdb\u7684\u66f4\u65b0\uff0c\u4ece\u800c\u5f15\u53d1\u8fc7\u62df\u5408\u548c\u964d\u4f4e\u8f93\u51fa\u7684\u591a\u6837\u6027\u3002 
\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u672c\u6587\u5f15\u5165\u4e86\u6700\u5927\u71b5\u539f\u5219\uff0c\u8be5\u539f\u5219\u503e\u5411\u4e8e\u4fc3\u8fdb\u6a21\u578b\u751f\u6210\u66f4\u5e73\u6ed1\u7684\u6982\u7387\u5206\u5e03\uff0c\u540c\u65f6\u4ecd\u80fd\u6709\u6548\u6355\u6349\u6570\u636e\u7279\u5f81\u3002\u5177\u4f53\u5730\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aGEM\u7684\u65b0\u65b9\u6cd5\uff0c\u5b83\u901a\u8fc7\u89e3\u51b3\u53cd\u5411Kullback-Leibler\u6563\u5ea6\u6700\u5c0f\u5316\u95ee\u9898\uff0c\u5e76\u52a0\u5165\u71b5\u6b63\u5219\u5316\u5668\uff0c\u6765\u5339\u914d\u76ee\u6807\u5206\u5e03\u3002 \u5728\u5bf9Llama-3-8B\u6a21\u578b\u8fdb\u884cSFT\u65f6\uff0cGEM\u5728\u591a\u4e2a\u65b9\u9762\u4f18\u4e8eCE\u3002\u9996\u5148\uff0c\u5728\u4f7f\u7528UltraFeedback\u6570\u636e\u96c6\u8bad\u7ec3\u4ee5\u589e\u5f3a\u6a21\u578b\u7684\u6307\u4ee4\u9075\u5faa\u80fd\u529b\u65f6\uff0cGEM\u8868\u73b0\u51fa\u8f83\u4f4e\u7684\u8fc7\u62df\u5408\u8ff9\u8c61\uff0c\u8868\u73b0\u4e3a\u66f4\u4f4e\u7684\u56f0\u60d1\u5ea6\u548c\u5728IFEval\u57fa\u51c6\u6d4b\u8bd5\u4e0a\u7684\u66f4\u597d\u6027\u80fd\u3002\u6b64\u5916\uff0cGEM\u8fd8\u63d0\u9ad8\u4e86\u8f93\u51fa\u7684\u591a\u6837\u6027\uff0c\u5373\u4f7f\u5728\u6ca1\u6709\u7279\u5b9a\u9886\u57df\u6570\u636e\u7684\u60c5\u51b5\u4e0b\uff0c\u4ec5\u901a\u8fc7\u6700\u4f73n\u91c7\u6837\uff0c\u6570\u5b66\u63a8\u7406\u548c\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u7684\u6027\u80fd\u4e5f\u5f97\u5230\u4e86\u6700\u9ad87\u5206\u7684\u63d0\u5347\u3002 \u8fdb\u4e00\u6b65\u5730\uff0c\u5f53\u4f7f\u7528\u7279\u5b9a\u9886\u57df\u7684\u6570\u636e\u96c6\u5bf9\u6570\u5b66\u63a8\u7406\u548c\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u8fdb\u884c\u5fae\u8c03\u65f6\uff0cGEM\u540c\u6837\u8868\u73b0\u51fa\u8f83\u4f4e\u7684\u8fc7\u62df\u5408\u548c\u4e0eCE\u76f8\u6bd4\u9ad8\u8fbe10\u5206\u7684\u6027\u80fd\u63d0\u5347\u3002|\n", "2408.16601": "|**2024-08-29**|**Examination of Code generated by Large Language Models**|Robin Beer 
et.al.|[2408.16601](http://arxiv.org/abs/2408.16601)|**[link](https://github.com/t-muras/ai-code-analysis)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\uff0c\u4f8b\u5982ChatGPT\u548cCopilot\uff0c\u6b63\u5728\u901a\u8fc7\u81ea\u52a8\u5316\u4ee3\u7801\u751f\u6210\u5f7b\u5e95\u6539\u53d8\u8f6f\u4ef6\u5f00\u53d1\uff0c\u8fd9\u5728\u4e00\u5b9a\u7a0b\u5ea6\u4e0a\u4fc3\u8fdb\u4e86\u5feb\u901f\u539f\u578b\u8bbe\u8ba1\u3001\u6559\u80b2\u652f\u6301\u4ee5\u53ca\u751f\u4ea7\u529b\u7684\u63d0\u5347\u3002\u56e0\u6b64\uff0cLLM\u751f\u6210\u7684\u4ee3\u7801\u6b63\u786e\u6027\u548c\u8d28\u91cf\u5e94\u4e0e\u4eba\u5de5\u7f16\u5199\u7684\u4ee3\u7801\u76f8\u5f53\u3002\u4e3a\u4e86\u8bc4\u4f30\u5f53\u524dLLM\u5728\u751f\u6210Java\u548cPython\u8bed\u8a00\u4e2d\u7684\u7b80\u5355\u7b97\u6cd5\u53ca\u5176\u5bf9\u5e94\u7684\u5355\u5143\u6d4b\u8bd5\u65f6\u7684\u6b63\u786e\u6027\u548c\u8d28\u91cf\uff08\u8986\u76d6\u7387\uff09\u7684\u80fd\u529b\uff0c\u6211\u4eec\u8fdb\u884c\u4e86\u53d7\u63a7\u5b9e\u9a8c\u3002\u5b9e\u9a8c\u5305\u62ec\u8ba9LLM\u751f\u6210\u4ee3\u7801\u5e76\u8bc4\u4f30\u5176\u6b63\u786e\u6027\u4e0e\u8d28\u91cf\u3002\u6211\u4eec\u89c2\u5bdf\u5230LLM\u4e4b\u95f4\u3001\u4e0d\u540c\u7f16\u7a0b\u8bed\u8a00\u4e4b\u95f4\u3001\u7b97\u6cd5\u4e0e\u6d4b\u8bd5\u4ee3\u7801\u4e4b\u95f4\u4ee5\u53ca\u65f6\u95f4\u4e0a\u7684\u663e\u8457\u5dee\u5f02\u3002\u672c\u6587\u62a5\u544a\u4e86\u8fd9\u4e9b\u7ed3\u679c\u53ca\u5b9e\u9a8c\u65b9\u6cd5\uff0c\u4ee5\u4fbf\u8fdb\u884c\u91cd\u590d\u548c\u53ef\u6bd4\u7684\u8bc4\u4f30\uff0c\u4ee5\u6db5\u76d6\u66f4\u591a\u7684\u7b97\u6cd5\u3001\u8bed\u8a00\u548cLLM\u968f\u65f6\u95f4\u7684\u53d8\u5316\u60c5\u51b5\u3002**|\n", "2408.16586": "|**2024-08-29**|**Enhancing Dialogue Generation in Werewolf Game Through Situation Analysis and Persuasion Strategies**|Zhiyang Qi 
et.al.|[2408.16586](http://arxiv.org/abs/2408.16586)|null|\u8fd1\u671f\u81ea\u7136\u8bed\u8a00\u5904\u7406\u9886\u57df\u7684\u8fdb\u6b65\uff0c\u5c24\u5176\u662f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5982GPT-4\u7684\u53d1\u5c55\uff0c\u663e\u8457\u63d0\u5347\u4e86\u5bf9\u8bdd\u7cfb\u7edf\u7684\u6027\u80fd\uff0c\u4f7f\u5f97\u5b83\u4eec\u80fd\u591f\u751f\u6210\u66f4\u4e3a\u81ea\u7136\u6d41\u7545\u7684\u5bf9\u8bdd\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u7cfb\u7edf\u4ecd\u9762\u4e34\u7740\u8bf8\u5982\u6301\u7eed\u5bf9\u8bdd\u7ba1\u7406\u3001\u8bb0\u5fc6\u4fdd\u7559\u548c\u51cf\u5c11\u5e7b\u89c9\u7b49\u6311\u6218\u3002AIWolfDial2024\u8fd9\u4e00\u9879\u76ee\u901a\u8fc7\u91c7\u7528\u201c\u72fc\u4eba\u6740\u201d\u8fd9\u4e00\u4e0d\u5b8c\u5168\u4fe1\u606f\u6e38\u620f\u6765\u6d4b\u8bd5LLM\u5728\u590d\u6742\u4e92\u52a8\u73af\u5883\u4e2d\u7684\u80fd\u529b\uff0c\u4ee5\u5e94\u5bf9\u4e0a\u8ff0\u6311\u6218\u3002\u8be5\u9879\u76ee\u5f15\u5165\u4e86\u4e00\u79cd\u57fa\u4e8eLLM\u7684\u201c\u72fc\u4eba\u6740\u201d\u6e38\u620fAI\uff0c\u5176\u4e2d\u6bcf\u4e2a\u89d2\u8272\u90fd\u901a\u8fc7\u60c5\u5883\u5206\u6790\u6765\u8f85\u52a9\u56de\u5e94\u751f\u6210\u3002\u5bf9\u4e8e\u201c\u72fc\u4eba\u201d\u8fd9\u4e00\u89d2\u8272\uff0c\u9879\u76ee\u91c7\u7528\u4e86\u5305\u62ec\u903b\u8f91\u5438\u5f15\u529b\u3001\u53ef\u4fe1\u5ea6\u5438\u5f15\u529b\u548c\u60c5\u611f\u5438\u5f15\u529b\u5728\u5185\u7684\u591a\u79cd\u8bf4\u670d\u7b56\u7565\uff0c\u4ee5\u6709\u6548\u5730\u5f15\u5bfc\u5176\u4ed6\u73a9\u5bb6\u4e0e\u81ea\u5df1\u7684\u884c\u52a8\u4fdd\u6301\u4e00\u81f4\u3002|\n", "2408.16518": "|**2024-08-29**|**CNIMA: A Universal Evaluation Framework and Automated Approach for Assessing Second Language Dialogues**|Rena Gao 
et.al.|[2408.16518](http://arxiv.org/abs/2408.16518)|**[link](https://github.com/renagao/csl2024)**|\u6211\u4eec\u5f00\u53d1\u4e86CNIMA\uff08\u4e00\u79cd\u4e2d\u6587\u4f5c\u4e3a\u7b2c\u4e8c\u8bed\u8a00\u7684\u975e\u6bcd\u8bed\u4e92\u52a8\u6d4b\u91cf\u4e0e\u81ea\u52a8\u5316\u6570\u636e\u96c6\uff09\uff0c\u5305\u542b10,000\u4e2a\u5bf9\u8bdd\u3002\u6211\u4eec\u4f7f\u7528\u4e86\u4e00\u4e2a\u8bc4\u4f30\u6846\u67b6\u6765\u6ce8\u91caCNIMA\uff0c\u8be5\u6846\u67b6\u6700\u521d\u7528\u4e8e\u82f1\u8bed\u4f5c\u4e3a\u7b2c\u4e8c\u8bed\u8a00\u7684\u5bf9\u8bdd\uff0c\u5b83\u8bc4\u4f30\u4e86\u5fae\u89c2\u5c42\u9762\u7279\u5f81\uff08\u5982\u56de\u8bdd\uff09\u548c\u5b8f\u89c2\u5c42\u9762\u4e92\u52a8\u6807\u7b7e\uff08\u5982\u4e3b\u9898\u7ba1\u7406\uff09\u3002\u6211\u4eec\u6d4b\u8bd5\u4e86\u8be5\u6846\u67b6\u4ece\u82f1\u8bed\u5230\u4e2d\u6587\u7684\u53ef\u79fb\u690d\u6027\u3002\u53d1\u73b0\u8be5\u6846\u67b6\u5728\u4e0d\u540c\u8bed\u8a00\u4e4b\u95f4\u5177\u6709\u9c81\u68d2\u6027\uff0c\u5e76\u63ed\u793a\u4e86\u666e\u904d\u6027\u548c\u7279\u5b9a\u4e8e\u8bed\u8a00\u7684\u5fae\u89c2\u5c42\u9762\u548c\u5b8f\u89c2\u5c42\u9762\u7279\u5f81\u4e4b\u95f4\u7684\u5173\u7cfb\u3002\u63a5\u4e0b\u6765\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u81ea\u52a8\u5316\u8bc4\u4f30\u7684\u65b9\u6cd5\uff0c\u5e76\u627e\u5230\u4e86\u5f3a\u5927\u7684\u6027\u80fd\uff0c\u521b\u5efa\u4e86\u4e00\u4e2a\u65b0\u7684\u81ea\u52a8\u5316\u7b2c\u4e8c\u8bed\u8a00\u8bc4\u4f30\u5de5\u5177\u3002\u6211\u4eec\u7684\u7cfb\u7edf\u6613\u4e8e\u9002\u5e94\u5176\u4ed6\u8bed\u8a00\uff0c\u56e0\u4e3a\u5b83\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff0c\u56e0\u6b64\u4e0d\u9700\u8981\u5927\u89c4\u6a21\u6807\u6ce8\u8bad\u7ec3\u6570\u636e\u3002|\n", "2408.16502": "|**2024-08-29**|**LLMs vs Established Text Augmentation Techniques for Classification: When do the Benefits Outweight the Costs?**|Jan Cegin 
et.al.|[2408.16502](http://arxiv.org/abs/2408.16502)|null|\u751f\u6210\u5f0f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u6570\u636e\u589e\u5f3a\u4efb\u52a1\u4e2d\u7684\u5e94\u7528\u8d8a\u6765\u8d8a\u5e7f\u6cdb\uff0c\u6587\u672c\u6837\u672c\u901a\u8fc7LLM\u8fdb\u884c\u540c\u4e49\u66ff\u6362\u540e\u7528\u4e8e\u5206\u7c7b\u6a21\u578b\u7684\u5fae\u8c03\u3002\u7136\u800c\uff0c\u5173\u4e8eLLM\u6570\u636e\u589e\u5f3a\u65b9\u6cd5\u76f8\u8f83\u4e8e\u73b0\u6709\u6210\u719f\u65b9\u6cd5\u662f\u5426\u5177\u6709\u660e\u663e\u4f18\u52bf\u7684\u7814\u7a76\u8bc1\u636e\u76f8\u5bf9\u7f3a\u4e4f\u3002\u4e3a\u4e86\u63a2\u8ba8\u5728\u4f55\u79cd\u60c5\u51b5\u4e0b\u4f7f\u7528LLM\u6570\u636e\u589e\u5f3a\u65b9\u6cd5\u66f4\u4e3a\u6709\u5229\uff0c\u672c\u7814\u7a76\u57286\u4e2a\u6570\u636e\u96c6\u30013\u4e2a\u5206\u7c7b\u5668\u548c2\u79cd\u5fae\u8c03\u65b9\u6cd5\u4e0a\u8fdb\u884c\u4e86\u5bf9\u6bd4\u5b9e\u9a8c\u3002\u6211\u4eec\u8fd8\u8c03\u6574\u4e86\u79cd\u5b50\u6570\u91cf\u548c\u6536\u96c6\u6837\u672c\u7684\u6570\u91cf\uff0c\u4ee5\u4fbf\u66f4\u5168\u9762\u5730\u63a2\u7d22\u4e0b\u6e38\u6a21\u578b\u51c6\u786e\u5ea6\u7a7a\u95f4\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u8fdb\u884c\u4e86\u6210\u672c\u6548\u76ca\u5206\u6790\uff0c\u7ed3\u679c\u8868\u660e\uff0c\u5728\u4f7f\u7528\u975e\u5e38\u5c11\u91cf\u79cd\u5b50\u7684\u60c5\u51b5\u4e0b\uff0cLLM\u6570\u636e\u589e\u5f3a\u65b9\u6cd5\u503c\u5f97\u90e8\u7f72\u3002\u5728\u8bb8\u591a\u60c5\u51b5\u4e0b\uff0c\u73b0\u6709\u65b9\u6cd5\u80fd\u591f\u8fbe\u5230\u6216\u8d85\u8fc7\u7c7b\u4f3c\u751a\u81f3\u66f4\u597d\u7684\u6a21\u578b\u51c6\u786e\u5ea6\u3002|\n", "2408.17437": "|**2024-08-30**|**SYNTHEVAL: Hybrid Behavioral Testing of NLP Models with Synthetic CheckLists**|Raoyuan Zhao 
et.al.|[2408.17437](http://arxiv.org/abs/2408.17437)|**[link](https://github.com/loreley99/syntheval_checklist)**|**\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u9886\u57df\uff0c\u4f20\u7edf\u7684\u57fa\u51c6\u6d4b\u8bd5\u901a\u5e38\u4f7f\u7528\u9759\u6001\u9884\u7559\u6d4b\u8bd5\u96c6\u3002\u7136\u800c\uff0c\u8fd9\u79cd\u65b9\u6cd5\u5f80\u5f80\u4f1a\u5bfc\u81f4\u6027\u80fd\u8fc7\u4f30\u8ba1\uff0c\u5e76\u7f3a\u4e4f\u63d0\u4f9b\u5168\u9762\u3001\u53ef\u89e3\u91ca\u548c\u52a8\u6001\u8bc4\u4f30NLP\u6a21\u578b\u7684\u80fd\u529b\u3002\u8fd1\u671f\uff0c\u5982DynaBench\uff08Kiela\u7b49\uff0c2021\u5e74\uff09\u548cCheckList\uff08Ribeiro\u7b49\uff0c2020\u5e74\uff09\u7b49\u4f5c\u54c1\u901a\u8fc7\u591a\u6b65\u9aa4\u4eba\u5de5\u6ce8\u91ca\u7ba1\u9053\u751f\u6210\u6d4b\u8bd5\u7c7b\u578b\u6765\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u4ee5\u5bf9NLP\u6a21\u578b\u8fdb\u884c\u884c\u4e3a\u6d4b\u8bd5\u3002\u4e0d\u5e78\u7684\u662f\uff0c\u624b\u52a8\u521b\u5efa\u5404\u79cd\u6d4b\u8bd5\u7c7b\u578b\u9700\u8981\u5927\u91cf\u7684\u4eba\u529b\u52b3\u52a8\uff0c\u6210\u672c\u9ad8\u6602\u3002\u672c\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aSYNTHEVAL\u7684\u6df7\u5408\u884c\u4e3a\u6d4b\u8bd5\u6846\u67b6\uff0c\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u751f\u6210\u5927\u91cf\u6d4b\u8bd5\u7c7b\u578b\uff0c\u4e3aNLP\u6a21\u578b\u8fdb\u884c\u5168\u9762\u8bc4\u4f30\u3002SYNTHEVAL\u9996\u5148\u901a\u8fc7LLMs\u8fdb\u884c\u53d7\u63a7\u751f\u6210\u751f\u6210\u53e5\u5b50\uff0c\u7136\u540e\u901a\u8fc7\u6bd4\u8f83LLMs\u4e0e\u7279\u5b9a\u4efb\u52a1\u7684NLP\u6a21\u578b\u7684\u9884\u6d4b\u7ed3\u679c\u6765\u8bc6\u522b\u6311\u6218\u6027\u793a\u4f8b\u3002\u6700\u540e\u9636\u6bb5\uff0c\u7531\u4eba\u7c7b\u4e13\u5bb6\u8c03\u67e5\u8fd9\u4e9b\u6311\u6218\u6027\u793a\u4f8b\uff0c\u624b\u52a8\u8bbe\u8ba1\u6a21\u677f\uff0c\u5e76\u786e\u5b9a\u7279\u5b9a\u4efb\u52a1\u6a21\u578b\u4e00\u81f4\u8868\u73b0\u7684\u5931\u8d25\u7c7b\u578b\u3002\u6211\u4eec\u5c06SYNTHEVAL\u5e94\u7528\u4e
8e\u60c5\u611f\u5206\u6790\u548c\u6709\u6bd2\u8bed\u8a00\u68c0\u6d4b\u4e24\u4e2a\u5206\u7c7b\u4efb\u52a1\u4e0a\uff0c\u5e76\u5c55\u793a\u4e86\u6211\u4eec\u7684\u6846\u67b6\u5728\u8bc6\u522b\u8fd9\u4e9b\u4efb\u52a1\u4e2d\u5f3a\u5927\u6a21\u578b\u7684\u5f31\u70b9\u65b9\u9762\u7684\u6709\u6548\u6027\u3002\u6211\u4eec\u5206\u4eab\u4e86\u4ee3\u7801\u4e8ehttps://github.com/Loreley99/SynthEval_CheckList\u3002**|\n", "2408.17431": "|**2024-08-30**|**Advancing Multi-talker ASR Performance with Large Language Models**|Mohan Shi et.al.|[2408.17431](http://arxiv.org/abs/2408.17431)|null|\u5728\u81ea\u52a8\u8bed\u97f3\u8bc6\u522b\uff08ASR\uff09\u9886\u57df\uff0c\u8bc6\u522b\u5bf9\u8bdd\u573a\u666f\u4e2d\u7684\u91cd\u53e0\u8bed\u97f3\u662f\u6781\u5177\u6311\u6218\u6027\u7684\u95ee\u9898\u3002\u4f20\u7edf\u7684\u5904\u7406\u65b9\u6cd5\u901a\u8fc7\u5e8f\u5217\u8f93\u51fa\u8bad\u7ec3\uff08SOT\uff09\uff0c\u5373\u5c06\u591a\u4e2a\u8bf4\u8bdd\u8005\u7684\u58f0\u97f3\u6392\u653e\u65f6\u95f4\u6309\u7167\u5176\u53d1\u8a00\u987a\u5e8f\u8fdb\u884c\u62fc\u63a5\uff0c\u6765\u89e3\u51b3\u591a\u8bf4\u8bdd\u8005ASR\u95ee\u9898\u3002\u7136\u800c\uff0c\u8fd9\u79cd\u4ece\u5bf9\u8bdd\u4e2d\u62fc\u63a5\u76f8\u5173\u8bdd\u8bed\u7684\u8f6c\u5f55\u4f9d\u8d56\u4e8e\u6784\u5efa\u957f\u4e0a\u4e0b\u6587\u7684\u80fd\u529b\u3002\u76f8\u6bd4\u4e4b\u4e0b\uff0c\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u65b0\u65b9\u6cd5\u53ef\u80fd\u66f4\u9002\u5408\u5904\u7406\u8fd9\u7c7b\u590d\u6742\u4e14\u5177\u6709\u6311\u6218\u6027\u7684\u573a\u666f\uff0c\u56e0\u4e3a\u5b83\u5229\u7528\u4e86\u9884\u8bad\u7ec3\u89e3\u7801\u5668\u7684\u5f3a\u5927\u80fd\u529b\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8eLLM\u7684SOT\u65b9\u6cd5\u7528\u4e8e\u591a\u8bf4\u8bdd\u8005ASR\uff0c\u8be5\u65b9\u6cd5\u5229\u7528\u9884\u8bad\u7ec3\u7684\u8bed\u97f3\u7f16\u7801\u5668\u548cLLM\uff0c\u5e76\u901a\u8fc7\u9002\u5f53\u7684\u7b56\u7565\u5bf9\u591a\u8bf4\u8bdd\u8005\u6570\u636e\u96c6\u8fdb\u884c\u5fae\u8c03\u3
002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u6a21\u62df\u6570\u636e\u96c6LibriMix\u4e0a\u4f18\u4e8e\u4f20\u7edf\u7684\u65b9\u6cd5\uff0c\u5e76\u5728\u771f\u5b9e\u4e16\u754c\u6570\u636e\u96c6AMI\u7684\u8bc4\u4f30\u96c6\u4e0a\u8fbe\u5230\u4e86\u6700\u5148\u8fdb\u7684\u6027\u80fd\uff0c\u663e\u8457\u8d85\u8d8a\u4e86\u4e4b\u524d\u4f7f\u75281000\u500d\u66f4\u591a\u76d1\u7763\u6570\u636e\u8bad\u7ec3\u7684AED\u6a21\u578b\u3002|\n", "2408.17404": "|**2024-08-30**|**Getting Inspiration for Feature Elicitation: App Store- vs. LLM-based Approach**|Jialiang Wei et.al.|[2408.17404](http://arxiv.org/abs/2408.17404)|null|\u5728\u8fc7\u53bb\u5341\u5e74\u4e2d\uff0c\u501f\u9274\u5e94\u7528\u5546\u5e97\uff08AppStore\uff09\u7684\u89c4\u8303\u83b7\u53d6\u65b9\u6cd5\u88ab\u8bc1\u660e\u975e\u5e38\u6709\u76ca\u3002\u5f00\u53d1\u8005\u7ecf\u5e38\u7814\u7a76\u7ade\u4e89\u5bf9\u624b\u7684\u5e94\u7528\u7a0b\u5e8f\u4ee5\u6536\u96c6\u65b0\u529f\u80fd\u7684\u7075\u611f\u3002\u968f\u7740\u751f\u6210\u5f0f\u4eba\u5de5\u667a\u80fd\u7684\u8fdb\u6b65\uff0c\u6700\u8fd1\u7684\u7814\u7a76\u8868\u660e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u542f\u53d1\u7684\u89c4\u8303\u83b7\u53d6\u5177\u6709\u6f5c\u529b\u3002LLMs\u53ef\u4ee5\u5728\u8fd9\u4e00\u8fc7\u7a0b\u4e2d\u63d0\u4f9b\u65b0\u529f\u80fd\u60f3\u6cd5\u7684\u7075\u611f\u3002\u5c3d\u7ba1\u8fd9\u4e24\u79cd\u65b9\u6cd5\u5728\u5b9e\u8df5\u4e2d\u8d8a\u6765\u8d8a\u53d7\u6b22\u8fce\uff0c\u4f46\u5b83\u4eec\u4e4b\u95f4\u7684\u5dee\u5f02\u7f3a\u4e4f\u6df1\u5165\u7406\u89e3\u3002\u6211\u4eec\u8fdb\u884c\u4e86\u4e00\u9879\u6bd4\u8f83\u7814\u7a76\uff0c\u5bf9\u6bd4\u4e86\u5e94\u7528\u5546\u5e97\u548cLLM\u542f\u53d1\u7684\u65b9\u6cd5\u5728\u7ec6\u5316\u529f\u80fd\u4e3a\u5b50\u529f\u80fd\u65f6\u7684\u8868\u73b0\u3002\u901a\u8fc7\u624b\u52a8\u5206\u6790\u4ece\u4e24\u79cd\u65b9\u6cd5\u63a8\u8350\u76841200\u4e2a\u5b50\u529f\u80fd\uff0c\u6211\u4eec\u8bc6\u522b\u51fa\u4e86\u5b83\u4eec\u7684\u4f18\u70b9\u3001\u6311\u6218\u4ee5\u
53ca\u5173\u952e\u5dee\u5f02\u3002\u5c3d\u7ba1\u4e24\u79cd\u65b9\u6cd5\u90fd\u63a8\u8350\u4e86\u9ad8\u5ea6\u76f8\u5173\u4e14\u63cf\u8ff0\u6e05\u6670\u7684\u5b50\u529f\u80fd\uff0c\u4f46LLMs\u5728\u7279\u522b\u6d89\u53ca\u672a\u89c1\u5e94\u7528\u8303\u56f4\u7684\u65b0\u9896\u6027\u65b9\u9762\u4f3c\u4e4e\u66f4\u4e3a\u5f3a\u5927\u3002\u6b64\u5916\uff0c\u4e00\u4e9b\u63a8\u8350\u7684\u529f\u80fd\u662f\u865a\u6784\u7684\uff0c\u5176\u53ef\u884c\u6027\u4e0d\u660e\u786e\uff0c\u8fd9\u5f3a\u8c03\u4e86\u4eba\u7c7b\u5206\u6790\u5e08\u5728\u83b7\u53d6\u8fc7\u7a0b\u4e2d\u7684\u91cd\u8981\u6027\u3002|\n", "2408.17377": "|**2024-08-30**|**NDP: Next Distribution Prediction as a More Broad Target**|Junhao Ruan et.al.|[2408.17377](http://arxiv.org/abs/2408.17377)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u901a\u8fc7\u4e0b\u4e00\u4e2a\u8bcd\u9884\u6d4b\uff08NTP\uff09\u8303\u5f0f\u8fdb\u884c\u8bad\u7ec3\uff0c\u5c55\u793a\u4e86\u5f3a\u5927\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684NTP\u8303\u5f0f\u5b58\u5728\u51e0\u4e2a\u9650\u5236\uff0c\u7279\u522b\u662f\u5728\u8ba1\u5212\u4efb\u52a1\u590d\u6742\u6027\u548c\u63a8\u7406\u9636\u6bb5\u7684\u9519\u8bef\u4f20\u64ad\u65b9\u9762\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u6269\u5c55\u4e86\u5bf9NTP\u7684\u6279\u8bc4\uff0c\u6307\u51fa\u5176\u9650\u5236\u8fd8\u6e90\u4e8e\u8bad\u7ec3\u76ee\u6807\u72ed\u7a84\uff1a\u9884\u6d4b\u4e00\u4e2a\u6b21\u4f18\u7684\u4e00\u70ed\u5206\u5e03\u3002\u4e3a\u4e86\u652f\u6301\u8fd9\u4e00\u6279\u8bc4\uff0c\u6211\u4eec\u8fdb\u884c\u4e86\u4e00\u9879\u9884\u5b9e\u9a8c\uff0c\u5c06\u5f3a\u5927\u7684LLM\u7684\u8f93\u51fa\u5206\u5e03\u89c6\u4e3a\u9ad8\u6548\u7684\u4e16\u754c\u6570\u636e\u538b\u7f29\u3002\u901a\u8fc7\u8bc4\u4f30n-gram\u5206\u5e03\u4e0eLLM\u8f93\u51fa\u5206\u5e03\u4e4b\u95f4\u7684\u76f8\u4f3c\u6027\uff0c\u6211\u4eec\u53d1\u73b0n-gram\u5206\u5e03\u4e0eLLM\u8f93\u51fa\u5206\u5e03\u66f4\u4e3a\u4e00\u81f4\u3002\u57fa\u4e8e\u8fd9\u4e00\u6d1e\u5bdf\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e0b\u4
e00\u4e2a\u5206\u5e03\u9884\u6d4b\uff08NDP\uff09\uff0c\u4f7f\u7528n-gram\u5206\u5e03\u6765\u66ff\u6362\u4e00\u70ed\u76ee\u6807\uff0c\u4ece\u800c\u589e\u5f3a\u5b66\u4e60\u8fc7\u7a0b\u800c\u65e0\u9700\u989d\u5916\u7684\u5728\u7ebf\u8bad\u7ec3\u65f6\u95f4\u3002\u6211\u4eec\u5728\u7ffb\u8bd1\u3001\u901a\u7528\u4efb\u52a1\u3001\u8bed\u8a00\u8fc1\u79fb\u548c\u533b\u7597\u9886\u57df\u9002\u5e94\u7b49\u56db\u4e2a\u9886\u57df\u8fdb\u884c\u4e86\u5b9e\u9a8c\u3002\u4e0eNTP\u76f8\u6bd4\uff0cNDP\u5728\u7ffb\u8bd1\u4efb\u52a1\u4e0a\u53ef\u8fbe\u5230+2.97 COMET\u6539\u8fdb\uff0c\u5728\u901a\u7528\u4efb\u52a1\u4e0a\u5e73\u5747\u6539\u5584+0.61\uff0c\u5728\u533b\u7597\u9886\u57df\u4e0a\u5e73\u5747\u6539\u5584+10.75\u3002\u8fd9\u8868\u660e\u89e3\u51b3\u76ee\u6807\u72ed\u7a84\u95ee\u9898\u7684\u5177\u4f53\u76ca\u5904\uff0c\u5e76\u6307\u51fa\u4e86\u672a\u6765\u6539\u8fdbNTP\u7684\u4e00\u4e2a\u65b0\u65b9\u5411\u3002|\n", "2408.17362": "|**2024-08-30**|**Assessing Generative Language Models in Classification Tasks: Performance and Self-Evaluation Capabilities in the Environmental and Climate Change Domain**|Francesca Grasso 
et.al.|[2408.17362](http://arxiv.org/abs/2408.17362)|**[link](https://github.com/stefanolocci/LLMClassification)**|**\u672c\u6587\u63a2\u8ba8\u4e86\u4e24\u79cd\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09GPT3.5\u548cLlama2\u4ee5\u53ca\u4e00\u79cd\u5c0f\u578b\u8bed\u8a00\u6a21\u578b\uff08SLM\uff09Gemma\u5728\u6c14\u5019\u53d8\u5316\uff08CC\uff09\u548c\u73af\u5883\u9886\u57df\u5185\u7684\u4e09\u79cd\u4e0d\u540c\u5206\u7c7b\u4efb\u52a1\u4e2d\u7684\u6027\u80fd\u3002\u901a\u8fc7\u4f7f\u7528\u57fa\u4e8eBERT\u7684\u6a21\u578b\u4f5c\u4e3a\u57fa\u51c6\uff0c\u6211\u4eec\u5c06\u8fd9\u4e9b\u8f6c\u6362\u5668\u57fa\u6a21\u578b\u4e0e\u5b83\u4eec\u8fdb\u884c\u6bd4\u8f83\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u8bc4\u4f30\u4e86\u6a21\u578b\u7684\u81ea\u6211\u8bc4\u4f30\u80fd\u529b\uff0c\u901a\u8fc7\u5206\u6790\u8fd9\u4e9b\u6587\u672c\u5206\u7c7b\u4efb\u52a1\u4e2d\u7684\u53e3\u5934\u4fe1\u5fc3\u5206\u6570\u7684\u6821\u51c6\u60c5\u51b5\u3002\u6211\u4eec\u7684\u53d1\u73b0\u8868\u660e\uff0c\u5c3d\u7ba1\u57fa\u4e8eBERT\u7684\u6a21\u578b\u901a\u5e38\u5728\u6240\u6709\u6a21\u578b\u4e2d\u8868\u73b0\u6700\u4f73\uff0c\u4f46\u5927\u751f\u6210\u6a21\u578b\u7684\u6027\u80fd\u4ecd\u7136\u503c\u5f97\u6ce8\u610f\u3002\u8fdb\u4e00\u6b65\u5730\uff0c\u6211\u4eec\u7684\u6821\u51c6\u5206\u6790\u663e\u793a\uff0cGemma\u5728\u521d\u671f\u4efb\u52a1\u4e2d\u8868\u73b0\u51fa\u826f\u597d\u7684\u6821\u51c6\u6027\uff0c\u968f\u540e\u4ea7\u751f\u4e0d\u4e00\u81f4\u7684\u7ed3\u679c\uff1bLlama\u5177\u6709\u5408\u7406\u7684\u6821\u51c6\u6027\uff0c\u800cGPT\u59cb\u7ec8\u8868\u73b0\u51fa\u5f3a\u5927\u7684\u6821\u51c6\u6027\u3002\u901a\u8fc7\u8fd9\u9879\u7814\u7a76\uff0c\u6211\u4eec\u65e8\u5728\u4e3a\u8ba8\u8bba\u5927\u578b\u751f\u6210\u578bLM\u5728\u89e3\u51b3\u5730\u7403\u6700\u7d27\u8feb\u95ee\u9898\u65b9\u9762\u7684\u9002\u7528\u6027\u548c\u6709\u6548\u6027\u505a\u51fa\u8d21\u732e\uff0c\u7279\u522b\u662f\u5728\u751f\u6001\u5b66\u548cCC\u80cc\u666f\u4e0b\u7a81\u51fa\u5176\u4f18\u52bf\u548c\u9650\u5236\u3002**
|\n", "2408.17354": "|**2024-08-30**|**Forget to Flourish: Leveraging Machine-Unlearning on Pretrained Language Models for Privacy Leakage**|Md Rafi Ur Rashid et.al.|[2408.17354](http://arxiv.org/abs/2408.17354)|null|\u9488\u5bf9\u79c1\u6709\u6570\u636e\u8fdb\u884c\u4e0b\u6e38\u5e94\u7528\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5fae\u8c03\u5b58\u5728\u91cd\u5927\u9690\u79c1\u98ce\u9669\uff0c\u53ef\u80fd\u6cc4\u9732\u654f\u611f\u4fe1\u606f\u3002\u5f53\u524d\u793e\u533a\u5e73\u53f0\u63d0\u4f9b\u4e86\u65b9\u4fbf\u7684\u5927\u89c4\u6a21\u9884\u8bad\u7ec3\u6a21\u578b\u5206\u53d1\uff0c\u4efb\u4f55\u4eba\u90fd\u53ef\u4ee5\u53d1\u5e03\u800c\u65e0\u9700\u4e25\u683c\u7684\u9a8c\u8bc1\u3002\u8fd9\u79cd\u60c5\u5883\u4e0b\uff0c\u9690\u79c1\u5a01\u80c1\u663e\u8457\u589e\u52a0\uff0c\u56e0\u4e3a\u9884\u8bad\u7ec3\u6a21\u578b\u53ef\u80fd\u88ab\u6545\u610f\u7be1\u6539\u4ee5\u5728\u5fae\u8c03\u8fc7\u7a0b\u4e2d\u6cc4\u9732\u79c1\u4eba\u6570\u636e\u3002\u672c\u7814\u7a76\u5f15\u5165\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u4e2d\u6bd2\u6280\u672f\uff0c\u4f7f\u7528\u6a21\u578b\u5378\u8f7d\u4f5c\u4e3a\u653b\u51fb\u5de5\u5177\u3002\u8fd9\u79cd\u65b9\u6cd5\u901a\u8fc7\u8c03\u6574\u9884\u8bad\u7ec3\u8bed\u8a00\u6a21\u578b\u6765\u63d0\u9ad8\u5fae\u8c03\u8fc7\u7a0b\u4e2d\u7684\u79c1\u4eba\u6570\u636e\u6cc4\u9732\u7a0b\u5ea6\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u4fdd\u6301\u6a21\u578b\u5b9e\u7528\u6027\u7684\u540c\u65f6\uff0c\u589e\u5f3a\u4e86\u6210\u5458\u5f52\u5c5e\u6027\u548c\u6570\u636e\u63d0\u53d6\u653b\u51fb\u7684\u6548\u679c\u3002\u5b9e\u9a8c\u7ed3\u679c\u5728\u4e0d\u540c\u6a21\u578b\u3001\u6570\u636e\u96c6\u548c\u5fae\u8c03\u8bbe\u7f6e\u4e0b\u663e\u793a\uff0c\u6211\u4eec\u7684\u653b\u51fb\u663e\u8457\u8d85\u8d8a\u4e86\u57fa\u51c6\u6027\u80fd\u3002\u8fd9\u9879\u5de5\u4f5c\u5411\u4e0b\u8f7d\u672a\u7ecf\u8fc7\u4e25\u683c\u9a8c\u8bc1\u6765\u6e90\u9884\u8bad\u7ec3\u6a21\u578b\u7684\u7528\u6237\u53d1\u51fa\u4e86\u8b66\u544a\uff0c\u7a81\u663e\u4e86\u6f5c\u5728\u7684\u98ce\u9669\u3002|\n", 
"2408.17316": "|**2024-08-30**|**Bridging Domain Knowledge and Process Discovery Using Large Language Models**|Ali Norouzifar et.al.|[2408.17316](http://arxiv.org/abs/2408.17316)|**[link](https://github.com/alinorouzifar/imr-llm)**|**\u53d1\u73b0\u4f18\u8d28\u6d41\u7a0b\u6a21\u578b\u5bf9\u4e8e\u6267\u884c\u4e0d\u540c\u7684\u6d41\u7a0b\u5206\u6790\u4efb\u52a1\u81f3\u5173\u91cd\u8981\uff0c\u5982\u4e00\u81f4\u6027\u68c0\u67e5\u548c\u6d41\u7a0b\u6539\u8fdb\u3002\u81ea\u52a8\u5316\u6d41\u7a0b\u53d1\u73b0\u65b9\u6cd5\u5f80\u5f80\u5ffd\u89c6\u4e86\u6709\u4ef7\u503c\u7684\u4e13\u4e1a\u9886\u57df\u77e5\u8bc6\u3002\u8fd9\u4e9b\u77e5\u8bc6\uff0c\u5305\u62ec\u6765\u81ea\u4e13\u4e1a\u9886\u57df\u4e13\u5bb6\u7684\u89c1\u89e3\u548c\u8be6\u7ec6\u6d41\u7a0b\u6587\u6863\uff0c\u901a\u5e38\u5728\u6d41\u7a0b\u53d1\u73b0\u8fc7\u7a0b\u4e2d\u672a\u5f97\u5230\u5145\u5206\u5229\u7528\u3002\u672c\u6587\u901a\u8fc7\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u76f4\u63a5\u5c06\u6b64\u7c7b\u77e5\u8bc6\u6574\u5408\u5230\u6d41\u7a0b\u53d1\u73b0\u4e2d\u6765\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\u3002\u6211\u4eec\u4f7f\u7528\u4eceLLMs\u4e2d\u63d0\u53d6\u7684\u89c4\u5219\u6765\u6307\u5bfc\u6a21\u578b\u6784\u5efa\u8fc7\u7a0b\uff0c\u786e\u4fdd\u5176\u4e0e\u9886\u57df\u77e5\u8bc6\u548c\u5b9e\u9645\u6d41\u7a0b\u6267\u884c\u4fdd\u6301\u4e00\u81f4\u3002\u901a\u8fc7\u6574\u5408LLMs\uff0c\u6211\u4eec\u5efa\u7acb\u4e86\u4e00\u5ea7\u8fde\u63a5\u4ee5\u81ea\u7136\u8bed\u8a00\u8868\u8fbe\u7684\u6d41\u7a0b\u77e5\u8bc6\u4e0e\u53d1\u73b0\u7a33\u5065\u6d41\u7a0b\u6a21\u578b\u4e4b\u95f4\u7684\u6865\u6881\uff0c\u663e\u8457\u63a8\u8fdb\u4e86\u6d41\u7a0b\u53d1\u73b0\u65b9\u6cd5\u8bba\u3002\u4e3a\u4e86\u5c55\u793a\u6211\u4eec\u6846\u67b6\u7684\u5b9e\u7528\u6027\uff0c\u6211\u4eec\u8fdb\u884c\u4e86\u4e00\u4e2a\u6848\u4f8b\u7814\u7a76\uff0c\u5bf9\u8c61\u662fUWV\u5458\u5de5\u4fdd\u9669\u516c\u53f8\uff0c\u8fd9\u8bc1\u660e\u4e86\u5176\u5b9e\u9645\u4f18\u52bf\u548c\u6709\u6548\u6027\u3002**|\n", "2408.17280": 
"|**2024-08-30**|**Flexible and Effective Mixing of Large Language Models into a Mixture of Domain Experts**|Rhui Dih Lee et.al.|[2408.17280](http://arxiv.org/abs/2408.17280)|null|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u5de5\u5177\u5305\uff0c\u7528\u4e8e\u4ece\u5df2\u8bad\u7ec3\u7684\u6a21\u578b\u521b\u5efa\u4f4e\u6210\u672c\u7684\u9886\u57df\u4e13\u5bb6\u6df7\u5408\uff08MOE\uff09\u3002\u8be5\u5de5\u5177\u5305\u53ef\u4ee5\u7528\u4e8e\u4ece\u6a21\u578b\u6216\u9002\u914d\u5668\u521b\u5efa\u6df7\u5408\u3002\u6211\u4eec\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u6d4b\u8bd5\uff0c\u5e76\u63d0\u4f9b\u4e86\u5173\u4e8e\u4f7f\u7528\u5de5\u5177\u5305\u5b9a\u4e49\u7ed3\u679cMOE\u67b6\u6784\u7684\u6307\u5bfc\u3002\u516c\u5f00\u4e86\u4e00\u4e2a\u53ef\u7528\u7684\u5b58\u50a8\u5e93\u3002|\n", "2408.17258": "|**2024-08-30**|**Joint Estimation and Prediction of City-wide Delivery Demand: A Large Language Model Empowered Graph-based Learning Approach**|Tong Nie et.al.|[2408.17258](http://arxiv.org/abs/2408.17258)|null|\u7535\u5b50\u5546\u52a1\u548c\u57ce\u5e02\u5316\u7684\u84ec\u52c3\u53d1\u5c55\uff0c\u6781\u5927\u5730\u589e\u5f3a\u4e86\u57ce\u5e02\u533a\u57df\u7684\u914d\u9001\u6d3b\u52a8\uff0c\u5bfc\u81f4\u4e86\u9700\u6c42\u91cf\u7684\u589e\u52a0\u4e0e\u590d\u6742\u6027\u7684\u63d0\u5347\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u6311\u6218\uff0c\u6570\u636e\u9a71\u52a8\u7684\u9884\u6d4b\u65b9\u6cd5\uff0c\u7279\u522b\u662f\u57fa\u4e8e\u673a\u5668\u5b66\u4e60\u7684\u6280\u672f\uff0c\u5f00\u59cb\u5728\u57ce\u5e02\u914d\u9001\u9700\u6c42\u7ba1\u7406\u95ee\u9898\u4e2d\u53d1\u6325\u5173\u952e\u4f5c\u7528\u3002\u7136\u800c\uff0c\u4e00\u4e2a\u5c1a\u672a\u5f97\u5230\u5145\u5206\u7814\u7a76\u7684\u95ee\u9898\u662f\u5168\u57ce\u8303\u56f4\u5185\u7684\u914d\u9001\u9700\u6c42\u8054\u5408\u4f30\u8ba1\u4e0e\u9884\u6d4b\u3002\u9488\u5bf9\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u5c06\u5176\u5efa\u6a21\u4e3a\u4e00\u4e2a\u57fa\u4e8e\u56fe\u7684\u65f6\u7a7a\u5b66\u4e60\u4efb\u52a1\u3002 
\u9996\u5148\uff0c\u6211\u4eec\u5b9a\u4e49\u4e86\u4e00\u4e2a\u6d88\u606f\u4f20\u9012\u795e\u7ecf\u7f51\u7edc\u6a21\u578b\u6765\u6355\u6349\u76f8\u5173\u533a\u57df\u4e4b\u95f4\u9700\u6c42\u6a21\u5f0f\u7684\u4ea4\u4e92\u3002\u5176\u6b21\uff0c\u901a\u8fc7\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u6700\u65b0\u8fdb\u5c55\uff0c\u6211\u4eec\u4ece\u672a\u7ed3\u6784\u5316\u7684\u5730\u7406\u4f4d\u7f6e\u6570\u636e\u4e2d\u63d0\u53d6\u901a\u7528\u7684\u5730\u7406\u7a7a\u95f4\u77e5\u8bc6\u7f16\u7801\uff0c\u5e76\u5c06\u5176\u6574\u5408\u5230\u9700\u6c42\u9884\u6d4b\u5668\u4e2d\u3002\u6700\u540e\uff0c\u4e3a\u4e86\u4fc3\u8fdb\u6a21\u578b\u5728\u4e0d\u540c\u57ce\u5e02\u7684\u8fc1\u79fb\u80fd\u529b\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u79cd\u7aef\u5230\u7aef\u7684\u5f52\u7eb3\u8bad\u7ec3\u65b9\u6848\u3002 \u6211\u4eec\u5728\u4e24\u4e2a\u771f\u5b9e\u7684\u914d\u9001\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u9a8c\u8bc1\uff0c\u5305\u62ec\u4e2d\u56fd\u7684\u516b\u4e2a\u57ce\u5e02\u548c\u7f8e\u56fd\u7684\u57ce\u5e02\uff0c\u7ed3\u679c\u8868\u660e\u6211\u4eec\u7684\u6a21\u578b\u5728\u8fd9\u4e9b\u5177\u6709\u6311\u6218\u6027\u7684\u4efb\u52a1\u4e2d\u663e\u8457\u4f18\u4e8e\u73b0\u6709\u7684\u57fa\u51c6\u65b9\u6cd5\u3002|\n", "2408.17253": "|**2024-08-30**|**VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters**|Mouxiang Chen 
et.al.|[2408.17253](http://arxiv.org/abs/2408.17253)|**[link](https://github.com/keytoyze/visionts)**|**\u672c\u6587\u63a2\u8ba8\u4e86\u4ece\u4e30\u5bcc\u4e14\u9ad8\u8d28\u91cf\u7684\u81ea\u7136\u56fe\u50cf\u51fa\u53d1\u6784\u5efa\u65f6\u95f4\u5e8f\u5217\u9884\u6d4b\uff08TSF\uff09\u57fa\u7840\u6a21\u578b\u7684\u65b0\u8def\u5f84\u3002\u73b0\u6709\u7684\u65b9\u6cd5\u8981\u4e48\u901a\u8fc7\u5fae\u8c03\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\uff0c\u8981\u4e48\u5efa\u7acb\u5927\u89c4\u6a21\u65f6\u95f4\u5e8f\u5217\u6570\u636e\u96c6\u6765\u5f00\u53d1TSF\u57fa\u7840\u6a21\u578b\uff0c\u4f46\u8fd9\u4e9b\u65b9\u6cd5\u9762\u4e34\u8de8\u57df\u5dee\u8ddd\u6216\u9886\u57df\u5185\u5f02\u8d28\u6027\u7684\u4e25\u5cfb\u6311\u6218\u3002\u6211\u4eec\u57fa\u4e8e\u56fe\u50cf\u4e0e\u65f6\u95f4\u5e8f\u5217\u4e4b\u95f4\u5185\u5728\u76f8\u4f3c\u6027\uff0c\u63a2\u7d22\u4e86\u4e00\u79cd\u65b0\u7684TSF\u4efb\u52a1\u8868\u793a\uff0c\u5c06\u5176\u91cd\u65b0\u8868\u8ff0\u4e3a\u56fe\u50cf\u91cd\u5efa\u4efb\u52a1\uff0c\u5e76\u5229\u7528\u5728ImageNet\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u81ea\u6211\u76d1\u7763\u9884\u8bad\u7ec3\u7684\u89c6\u89c9\u63a9\u7801\u81ea\u52a8\u7f16\u7801\u5668\uff08MAE\uff09\u8fdb\u884c\u5904\u7406\u3002 
\u4ee4\u4eba\u60ca\u8bb6\u7684\u662f\uff0c\u5728\u65e0\u9700\u8fdb\u4e00\u6b65\u5728\u65f6\u95f4\u5e8f\u5217\u9886\u57df\u8fdb\u884c\u9002\u5e94\u7684\u60c5\u51b5\u4e0b\uff0c\u6240\u63d0\u51fa\u7684VisionTS\u5c31\u80fd\u5b9e\u73b0\u4f18\u4e8e\u73b0\u6709TSF\u57fa\u7840\u6a21\u578b\u7684\u96f6\u6837\u672c\u9884\u6d4b\u6027\u80fd\u3002\u901a\u8fc7\u6700\u5c0f\u7a0b\u5ea6\u7684\u5fae\u8c03\uff0cVisionTS\u80fd\u591f\u8fdb\u4e00\u6b65\u63d0\u5347\u9884\u6d4b\u6027\u80fd\uff0c\u5e76\u5728\u5927\u591a\u6570\u60c5\u51b5\u4e0b\u8fbe\u5230\u6700\u5148\u8fdb\u7684\u6c34\u5e73\u3002\u8fd9\u4e9b\u53d1\u73b0\u8868\u660e\uff0c\u89c6\u89c9\u6a21\u578b\u53ef\u80fd\u4e3aTSF\u63d0\u4f9b\u514d\u8d39\u5348\u9910\uff0c\u5e76\u5f3a\u8c03\u4e86\u8ba1\u7b97\u673a\u89c6\u89c9\u4e0eTSF\u9886\u57df\u672a\u6765\u4ea4\u53c9\u7814\u7a76\u7684\u6f5c\u529b\u3002\u6211\u4eec\u7684\u4ee3\u7801\u5df2\u516c\u5f00\u5728https://github.com/Keytoyze/VisionTS\u4e0a\u3002**|\n", "2409.02920": "|**2024-09-04**|**RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins (early version)**|Yao Mu et.al.|[2409.02920](http://arxiv.org/abs/2409.02920)|null|\u672c\u7bc7\u8bba\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aRoboTwin\u7684\u65b0\u578b\u57fa\u51c6\u6570\u636e\u96c6\uff0c\u5b83\u7ed3\u5408\u4e86\u73b0\u5b9e\u4e16\u754c\u4e2d\u7684\u9065\u63a7\u6570\u636e\u4e0e\u901a\u8fc7\u6570\u5b57\u5b6a\u751f\u751f\u6210\u7684\u5408\u6210\u6570\u636e\u3002RoboTwin\u65e8\u5728\u4e3a\u53cc\u81c2\u673a\u5668\u4eba\u573a\u666f\u63d0\u4f9b\u652f\u6301\uff0c\u7279\u522b\u5173\u6ce8\u5de5\u5177\u4f7f\u7528\u80fd\u529b\u548c\u4eba\u673a\u4ea4\u4e92\u80fd\u529b\u3002\u6211\u4eec\u5229\u7528COBOT Magic\u5e73\u53f0\u6536\u96c6\u4e86\u4e30\u5bcc\u7684\u6570\u636e\uff0c\u6db5\u76d6\u5de5\u5177\u64cd\u4f5c\u548c\u4eba\u673a\u4e92\u52a8\u7684\u591a\u6837\u6027\u3002 
\u8bba\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u65b9\u6cd5\u6765\u521b\u5efa\u6570\u5b57\u5b6a\u751f\u4f53\uff0c\u5229\u7528AI\u751f\u6210\u7684\u5185\u5bb9\u5c06\u4e8c\u7ef4\u56fe\u50cf\u8f6c\u6362\u4e3a\u8be6\u7ec6\u7684\u4e09\u7ef4\u6a21\u578b\u3002\u540c\u65f6\uff0c\u6211\u4eec\u501f\u52a9\u5927\u578b\u8bed\u8a00\u6a21\u578b\u751f\u6210\u4e13\u5bb6\u7ea7\u8bad\u7ec3\u6570\u636e\u548c\u9762\u5411\u529f\u80fd\u6027\u7684\u4efb\u52a1\u7279\u5b9a\u59ff\u6001\u5e8f\u5217\u3002 \u6211\u4eec\u7684\u4e3b\u8981\u8d21\u732e\u5305\u62ec\uff1a 1. RoboTwin\u57fa\u51c6\u6570\u636e\u96c6\uff0c 2. \u9ad8\u6548\u7684\u73b0\u5b9e\u5230\u6a21\u62df\u7ba1\u9053\uff0c\u4ee5\u53ca 3. \u5229\u7528\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u81ea\u52a8\u4e13\u5bb6\u7ea7\u6570\u636e\u751f\u6210\u3002 \u8fd9\u4e9b\u8fdb\u5c55\u65e8\u5728\u89e3\u51b3\u673a\u5668\u4eba\u8bad\u7ec3\u6570\u636e\u7a00\u7f3a\u7684\u95ee\u9898\uff0c\u6709\u671b\u52a0\u901f\u5f00\u53d1\u66f4\u591a\u529f\u80fd\u5f3a\u5927\u3001\u9002\u5e94\u6027\u5e7f\u6cdb\u7684\u673a\u5668\u4eba\u7cfb\u7edf\uff0c\u5e94\u7528\u4e8e\u5e7f\u6cdb\u7684\u73b0\u5b9e\u4e16\u754c\u573a\u666f\u3002\u9879\u76ee\u9875\u9762\u53ef\u8bbf\u95ee\uff1ahttps://robotwin-benchmark.github.io/early-version/|\n", "2409.02897": "|**2024-09-05**|**LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA**|Jiajie Zhang 
et.al.|[2409.02897](http://arxiv.org/abs/2409.02897)|**[link](https://github.com/THUDM/LongCite)**|\u5c3d\u7ba1\u5f53\u524d\u7684\u957f\u6587\u672c\u5927\u8bed\u8a00\u6a21\u578b\u5728\u57fa\u4e8e\u5927\u91cf\u6587\u672c\u56de\u7b54\u7528\u6237\u95ee\u9898\u65b9\u9762\u8868\u73b0\u51fa\u4ee4\u4eba\u5370\u8c61\u6df1\u523b\u7684\u6027\u80fd\uff0c\u4f46\u5b83\u4eec\u7f3a\u4e4f\u5f15\u7528\u4f7f\u5f97\u7528\u6237\u96be\u4ee5\u9a8c\u8bc1\u7b54\u6848\u7684\u51c6\u786e\u6027\uff0c\u4ece\u800c\u5f15\u53d1\u4e86\u5bf9\u5176\u53ef\u9760\u6027\u7684\u62c5\u5fe7\uff0c\u56e0\u4e3a\u5b83\u4eec\u53ef\u80fd\u4ea7\u751f\u9519\u8bef\u7684\u4fe1\u606f\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u65e8\u5728\u4f7f\u8fd9\u4e9b\u957f\u6587\u672c\u5927\u8bed\u8a00\u6a21\u578b\u80fd\u591f\u751f\u6210\u5305\u542b\u7cbe\u7ec6\u53e5\u7ea7\u5f15\u7528\u7684\u54cd\u5e94\uff0c\u4ee5\u63d0\u9ad8\u5b83\u4eec\u7684\u5fe0\u5b9e\u5ea6\u548c\u53ef\u9a8c\u8bc1\u6027\u3002 \u6211\u4eec\u9996\u5148\u5f15\u5165\u4e86LongBench-Cite\uff0c\u4e00\u4e2a\u81ea\u52a8\u8bc4\u4f30\u5f53\u524d\u5927\u8bed\u8a00\u6a21\u578b\u5728\u957f\u6587\u672c\u4e0a\u4e0b\u6587\u95ee\u9898\u56de\u7b54\u4e2d\u7684\u8868\u73b0\u7684\u57fa\u51c6\uff0c\u63ed\u793a\u4e86\u5728\u53e5\u7ea7\u5f15\u7528\u65b9\u9762\u5b58\u5728\u5de8\u5927\u7684\u6539\u8fdb\u7a7a\u95f4\u3002\u4e3a\u4e86\u5b9e\u73b0\u8fd9\u4e00\u76ee\u6807\uff0c\u6211\u4eec\u63d0\u51fa\u4e86CoF\uff08\u7c97\u5230\u7ec6\uff09\u8fd9\u4e00\u65b0\u9896\u7684\u7ba1\u9053\uff0c\u5229\u7528\u73b0\u6210\u7684\u5927\u8bed\u8a00\u6a21\u578b\u81ea\u52a8\u751f\u6210\u5305\u542b\u7cbe\u786e\u53e5\u7ea7\u5f15\u7528\u7684\u957f\u6587\u672c\u95ee\u7b54\u5b9e\u4f8b\uff0c\u5e76\u4ee5\u6b64\u7ba1\u9053\u6784\u5efa\u4e86LongCite-45k\uff0c\u4e00\u4e2a\u7528\u4e8e\u53e5\u7ea7\u5f15\u7528\u95ee\u9898\u7684\u5927\u578b\u81ea\u76d1\u7763\u8bad\u7ec3\u6570\u636e\u96c6\u3002\u6700\u540e\uff0c\u6211\u4eec\u4f7f\u7528LongCite-45k\u6570\u636e\u96c6\u8bad\u7ec3\u4e86LongCite-8B\u548cLongCite-9B\u6a21\u578b\uff0c\
u6210\u529f\u5730\u4f7f\u5b83\u4eec\u80fd\u591f\u5728\u5355\u4e2a\u8f93\u51fa\u4e2d\u751f\u6210\u51c6\u786e\u7684\u54cd\u5e94\u548c\u7cbe\u7ec6\u7684\u53e5\u7ea7\u5f15\u7528\u3002\u5728LongBench-Cite\u4e0a\u7684\u8bc4\u4f30\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u8bad\u7ec3\u6a21\u578b\u5728\u5f15\u7528\u8d28\u91cf\u65b9\u9762\u8fbe\u5230\u4e86\u6700\u5148\u8fdb\u7684\u6c34\u5e73\uff0c\u8d85\u8d8a\u4e86\u5305\u62ecGPT-4\u5728\u5185\u7684\u9ad8\u7ea7\u4e13\u6709\u6a21\u578b\u3002|\n", "2409.02889": "|**2024-09-04**|**LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture**|Xidong Wang et.al.|[2409.02889](http://arxiv.org/abs/2409.02889)|**[link](https://github.com/freedomintelligence/longllava)**|**\u6269\u5c55\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u7684\u957f\u671f\u4e0a\u4e0b\u6587\u80fd\u529b\u5bf9\u4e8e\u89c6\u9891\u7406\u89e3\u3001\u9ad8\u5206\u8fa8\u7387\u56fe\u50cf\u7406\u89e3\u548c\u591a\u6a21\u6001\u4ee3\u7406\u81f3\u5173\u91cd\u8981\u3002\u8fd9\u6d89\u53ca\u5230\u4e00\u7cfb\u5217\u7cfb\u7edf\u4f18\u5316\uff0c\u5305\u62ec\u6a21\u578b\u67b6\u6784\u3001\u6570\u636e\u6784\u9020\u548c\u8bad\u7ec3\u7b56\u7565\uff0c\u5c24\u5176\u662f\u89e3\u51b3\u968f\u7740\u66f4\u591a\u56fe\u50cf\u5f15\u5165\u800c\u51fa\u73b0\u7684\u6027\u80fd\u4e0b\u964d\u4ee5\u53ca\u9ad8\u6602\u8ba1\u7b97\u6210\u672c\u7b49\u95ee\u9898\u3002\u672c\u6587\u901a\u8fc7\u5c06\u6a21\u578b\u67b6\u6784\u8c03\u6574\u4e3aMamba\u548cTransformer\u5757\u7684\u6df7\u5408\u4f53\u3001\u91c7\u7528\u65e2\u80fd\u8003\u8651\u591a\u4e2a\u56fe\u50cf\u95f4\u65f6\u95f4\u4f9d\u8d56\u6027\u53c8\u80fd\u8003\u8651\u7a7a\u95f4\u4f9d\u8d56\u6027\u7684\u6570\u636e\u6784\u9020\u65b9\u6cd5\uff0c\u5e76\u5b9e\u65bd\u6e10\u8fdb\u5f0f\u8bad\u7ec3\u7b56\u7565\uff0c\u5bf9\u8fd9\u4e9b\u6311\u6218\u8fdb\u884c\u4e86\u5e94\u5bf9\u3002\u53d1\u5e03\u7684\u6a21\u578b\u201cLongLLaVA\u201d\uff08\u957f\u671f\u8bed\u8a00\u4e0e\u89c6\u89c9\u52a9\u624b\uff09\u662f\u9996\u4e2a\u6d
f7\u5408\u578bMLLM\uff0c\u5b9e\u73b0\u4e86\u6548\u7387\u4e0e\u6548\u679c\u4e4b\u95f4\u7684\u826f\u597d\u5e73\u8861\u3002LongLLaVA\u4e0d\u4ec5\u5728\u5404\u79cd\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u53d6\u5f97\u4e86\u7ade\u4e89\u529b\u7684\u7ed3\u679c\uff0c\u800c\u4e14\u4fdd\u6301\u4e86\u9ad8\u541e\u5410\u91cf\u548c\u4f4e\u5185\u5b58\u6d88\u8017\u7684\u7279\u70b9\u3002\u7279\u522b\u5730\uff0c\u5b83\u80fd\u591f\u5728\u5355\u4e2aA100 80GB GPU\u4e0a\u5904\u7406\u8fd1\u4e00\u5343\u5f20\u56fe\u7247\uff0c\u5c55\u793a\u4e86\u5e7f\u6cdb\u4efb\u52a1\u5e94\u7528\u524d\u666f\u7684\u6f5c\u529b\u3002**|\n", "2409.02841": "|**2024-09-04**|**Historical German Text Normalization Using Type- and Token-Based Language Modeling**|Anton Ehrmanntraut et.al.|[2409.02841](http://arxiv.org/abs/2409.02841)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u9488\u5bf91700\u5e74\u81f31900\u5e74\u5fb7\u56fd\u6587\u5b66\u6587\u672c\u7684\u6b63\u8bcd\u6cd5\u89c4\u8303\u5316\u7cfb\u7edf\uff0c\u8be5\u7cfb\u7edf\u57fa\u4e8e\u5e73\u884c\u8bed\u6599\u5e93\u8bad\u7ec3\u3002\u6240\u63d0\u51fa\u7684\u7cfb\u7edf\u5229\u7528\u673a\u5668\u5b66\u4e60\u65b9\u6cd5\u548cTransformer\u8bed\u8a00\u6a21\u578b\uff0c\u7ed3\u5408\u7f16\u7801\u5668-\u89e3\u7801\u5668\u6a21\u578b\u5bf9\u5355\u4e2a\u8bcd\u6c47\u7c7b\u578b\u8fdb\u884c\u89c4\u8303\u5316\uff0c\u5e76\u901a\u8fc7\u9884\u8bad\u7ec3\u7684\u56e0\u679c\u8bed\u8a00\u6a21\u578b\u5728\u4e0a\u4e0b\u6587\u4e2d\u8c03\u6574\u8fd9\u4e9b\u89c4\u8303\u5316\u7ed3\u679c\u3002\u5e7f\u6cdb\u8bc4\u4f30\u8868\u660e\uff0c\u8be5\u63d0\u51fa\u7684\u7cfb\u7edf\u63d0\u4f9b\u4e86\u6700\u5148\u8fdb\u7684\u51c6\u786e\u6027\uff0c\u4e0e\u5b8c\u5168\u7aef\u5230\u7aef\u7684\u53e5\u5b50\u7ea7\u89c4\u8303\u5316\u7cfb\u7edf\u76f8\u5f53\uff0c\u8be5\u7cfb\u7edf\u662f\u901a\u8fc7\u5bf9\u9884\u8bad\u7ec3\u7684Transformer\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u5fae\u8c03\u800c\u5b9e\u73b0\u7684\u3002\u7136\u800c\uff0c\u7531\u4e8e\u6a21\u578b\u96be\u4ee5\u6cdb\u5316\u4ee5\u53ca\u7f3a\u4e4f\u5927\u91
cf\u9ad8\u8d28\u91cf\u5e73\u884c\u6570\u636e\uff0c\u5386\u53f2\u6587\u672c\u7684\u89c4\u8303\u5316\u4ecd\u662f\u4e00\u4e2a\u6311\u6218\u3002|\n", "2409.02836": "|**2024-09-04**|**Exploring Sentiment Dynamics and Predictive Behaviors in Cryptocurrency Discussions by Few-Shot Learning with Large Language Models**|Moein Shahiki Tash et.al.|[2409.02836](http://arxiv.org/abs/2409.02836)|null|\u672c\u6587\u901a\u8fc7\u8fd0\u7528\u9ad8\u7ea7\u81ea\u7136\u8bed\u8a00\u5904\u7406\u6280\u672f\uff0c\u5bf9\u52a0\u5bc6\u8d27\u5e01\u76f8\u5173\u8ba8\u8bba\u4e2d\u7684\u9884\u6d4b\u9648\u8ff0\u3001\u5e0c\u671b\u6f14\u8bb2\u53ca\u6094\u6068\u68c0\u6d4b\u884c\u4e3a\u8fdb\u884c\u5206\u6790\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u5206\u7c7b\u65b9\u6cd5\u2014\u2014\u201c\u9884\u6d4b\u9648\u8ff0\u201d\uff0c\u5c06\u5176\u7ec6\u5206\u4e3a\u9884\u6d4b\u589e\u52a0\u3001\u9884\u6d4b\u51cf\u5c11\u3001\u9884\u6d4b\u4e2d\u7acb\u6216\u975e\u9884\u6d4b\u7c7b\u522b\u3002\u5229\u7528GPT-4o\u8fd9\u4e00\u524d\u6cbf\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff0c\u6211\u4eec\u5728\u4e94\u5927\u4e3b\u6d41\u52a0\u5bc6\u8d27\u5e01\uff08Cardano\u3001Binance\u3001Matic\u3001Fantom\u3001Ripple\uff09\u7684\u8ba8\u8bba\u4e2d\u63a2\u7d22\u4e86\u60c5\u7eea\u52a8\u6001\u3002\u7814\u7a76\u53d1\u73b0\uff0cMatic\u5728\u4e50\u89c2\u9884\u6d4b\u65b9\u9762\u663e\u793a\u51fa\u7279\u522b\u9ad8\u7684\u503e\u5411\u6027\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63a2\u8ba8\u4e86\u5e0c\u671b\u4e0e\u6094\u6068\u60c5\u7eea\u4e4b\u95f4\u7684\u76f8\u4e92\u4f5c\u7528\uff0c\u63ed\u793a\u4e86\u8fd9\u4e9b\u60c5\u611f\u4e0e\u9884\u6d4b\u884c\u4e3a\u4e4b\u95f4\u590d\u6742\u7684\u4e92\u52a8\u6a21\u5f0f\u3002\u5c3d\u7ba1\u9762\u4e34\u6570\u636e\u91cf\u548c\u8d44\u6e90\u53ef\u7528\u6027\u65b9\u9762\u7684\u9650\u5236\uff0c\u6211\u4eec\u7684\u7814\u7a76\u4ecd\u63ed\u793a\u4e86\u52a0\u5bc6\u8d27\u5e01\u5e02\u573a\u6295\u8d44\u8005\u884c\u4e3a\u548c\u60c5\u7eea\u8d8b\u52bf\u7684\u91cd\u8981\u53d1\u73b0\uff0c\u4e3a\u6218\u
7565\u51b3\u7b56\u548c\u672a\u6765\u7814\u7a76\u63d0\u4f9b\u4e86\u4fe1\u606f\u3002|\n", "2409.02834": "|**2024-09-04**|**CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models**|Wentao Liu et.al.|[2409.02834](http://arxiv.org/abs/2409.02834)|null|\u672c\u6587\u53d1\u5e03\u4e86\u4e00\u4e2a\u540d\u4e3aCMM-Math\u7684\u4e2d\u6587\u591a\u6a21\u6001\u6570\u5b66\u6570\u636e\u96c6\uff0c\u5305\u542b\u57fa\u51c6\u548c\u8bad\u7ec3\u90e8\u5206\uff0c\u65e8\u5728\u8bc4\u4f30\u548c\u589e\u5f3a\u5927\u578b\u591a\u6a21\u6001\u6a21\u578b\uff08LMM\uff09\u5728\u6570\u5b66\u63a8\u7406\u65b9\u9762\u7684\u8868\u73b0\u3002CMM-Math\u5305\u542b\u4e86\u8d85\u8fc728,000\u4e2a\u9ad8\u8d28\u91cf\u6837\u672c\uff0c\u6db5\u76d6\u4e86\u4ece\u5c0f\u5b66\u5230\u9ad8\u4e2d\u7684\u4e2d\u56fd12\u4e2a\u5e74\u7ea7\u7684\u591a\u79cd\u95ee\u9898\u7c7b\u578b\uff08\u4f8b\u5982\u9009\u62e9\u9898\u3001\u586b\u7a7a\u9898\u7b49\uff09\uff0c\u5e76\u63d0\u4f9b\u4e86\u8be6\u7ec6\u7684\u89e3\u51b3\u65b9\u6848\u3002\u7279\u522b\u5730\uff0c\u95ee\u9898\u6216\u89c2\u70b9\u4e2d\u53ef\u80fd\u5305\u542b\u89c6\u89c9\u4e0a\u4e0b\u6587\uff0c\u4f7f\u5f97\u8fd9\u4e2a\u6570\u636e\u96c6\u66f4\u5177\u6311\u6218\u6027\u3002\u901a\u8fc7\u5168\u9762\u5206\u6790\uff0c\u6211\u4eec\u53d1\u73b0\u5f53\u524d\u6700\u5148\u8fdb\u7684LMM\u5728CMM-Math\u6570\u636e\u96c6\u4e0a\u9762\u4e34\u6311\u6218\uff0c\u8fd9\u5f3a\u8c03\u4e86\u5728LMM\u5f00\u53d1\u65b9\u9762\u8fdb\u4e00\u6b65\u6539\u8fdb\u7684\u5fc5\u8981\u6027\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aMultimodal Mathematical 
LMM\uff08Math-LMM\uff09\u7684\u6a21\u578b\u6765\u5904\u7406\u6df7\u5408\u8f93\u5165\u7684\u591a\u4e2a\u56fe\u50cf\u548c\u6587\u672c\u6bb5\u843d\u7684\u95ee\u9898\u3002\u6211\u4eec\u91c7\u7528\u4e09\u4e2a\u9636\u6bb5\u8fdb\u884c\u6a21\u578b\u8bad\u7ec3\uff1a\u57fa\u7840\u9884\u8bad\u7ec3\u3001\u57fa\u7840\u5fae\u8c03\u548c\u6570\u5b66\u5fae\u8c03\u3002\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u8868\u660e\uff0c\u6211\u4eec\u7684\u6a21\u578b\u5728\u4e0e\u4e09\u4e2a\u591a\u6a21\u6001\u6570\u5b66\u6570\u636e\u96c6\u4e0a\u7684SOTA LMM\u8fdb\u884c\u6bd4\u8f83\u65f6\uff0c\u6709\u6548\u5730\u63d0\u9ad8\u4e86\u6570\u5b66\u63a8\u7406\u6027\u80fd\u3002|\n", "2409.02828": "|**2024-09-04**|**ExpLLM: Towards Chain of Thought for Facial Expression Recognition**|Xing Lan et.al.|[2409.02828](http://arxiv.org/abs/2409.02828)|null|\u9762\u90e8\u8868\u60c5\u8bc6\u522b\uff08FER\uff09\u5728\u591a\u5a92\u4f53\u9886\u57df\u81f3\u5173\u91cd\u8981\uff0c\u5bf9\u5404\u79cd\u5e94\u7528\u5177\u6709\u91cd\u5927\u5f71\u54cd\u3002\u7136\u800c\uff0c\u7406\u89e3\u9762\u90e8\u8868\u60c5\u7684\u539f\u56e0\u5bf9\u4e8e\u51c6\u786e\u8bc6\u522b\u8868\u60c5\u81f3\u5173\u91cd\u8981\u3002\u76ee\u524d\u7684\u65b9\u6cd5\uff0c\u5982\u57fa\u4e8e\u9762\u90e8\u52a8\u4f5c\u5355\u4f4d\uff08AUs\uff09\u7684\u65b9\u6cd5\uff0c\u901a\u5e38\u63d0\u4f9bAU\u540d\u79f0\u548c\u5f3a\u5ea6\uff0c\u4f46\u7f3a\u4e4f\u5173\u4e8eAU\u4e4b\u95f4\u7684\u4e92\u52a8\u4ee5\u53ca\u6574\u4f53\u8868\u60c5\u4e4b\u95f4\u5173\u7cfb\u7684\u6d1e\u5bdf\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aExpLLM\u7684\u65b0\u65b9\u6cd5\uff0c\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u751f\u6210\u9762\u90e8\u8868\u60c5\u8bc6\u522b\u7684\u51c6\u786e\u601d\u7ef4\u94fe\uff08CoT\uff09\u3002\u6211\u4eec\u4ece\u4e09\u4e2a\u5173\u952e\u89c6\u89d2\u8bbe\u8ba1\u4e86CoT\u673a\u5236\uff1a\u5173\u952e\u89c2\u5bdf\u3001\u603b\u4f53\u60c5\u611f\u89e3\u91ca\u548c\u7ed3\u8bba\u3002\u5173\u952e\u89c2\u5bdf\u63cf\u8ff0\u4e86AU\u7684\u540d\u79f0\u3001\u5f3a\u5ea6\u53ca
\u5176\u76f8\u5173\u60c5\u611f\u3002\u603b\u4f53\u60c5\u611f\u89e3\u91ca\u57fa\u4e8e\u591a\u4e2aAU\u53ca\u5176\u4e92\u52a8\u8fdb\u884c\u5206\u6790\uff0c\u786e\u5b9a\u4e3b\u5bfc\u60c5\u611f\u53ca\u5176\u5173\u7cfb\u3002\u6700\u540e\uff0c\u7ed3\u8bba\u57fa\u4e8e\u524d\u4e00\u5206\u6790\u5f97\u51fa\u6700\u7ec8\u7684\u8868\u60c5\u6807\u7b7e\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5f15\u5165\u4e86Exp-CoT\u5f15\u64ce\uff0c\u7528\u4e8e\u6784\u5efa\u6b64\u8868\u60c5CoT\u5e76\u751f\u6210\u6307\u4ee4\u63cf\u8ff0\u6570\u636e\u4ee5\u8bad\u7ec3\u6211\u4eec\u7684ExpLLM\u3002\u5728RAF-DB\u548cAffectNet\u6570\u636e\u96c6\u4e0a\u7684\u5927\u91cf\u5b9e\u9a8c\u8868\u660e\uff0cExpLLM\u4f18\u4e8e\u5f53\u524d\u6700\u5148\u8fdb\u7684\u9762\u90e8\u8868\u60c5\u8bc6\u522b\u65b9\u6cd5\u3002\u5728\u5fae\u8868\u60c5\u8bc6\u522b\u65b9\u9762\uff0cExpLLM\u4e5f\u8d85\u8d8a\u4e86\u6700\u65b0\u7684GPT-4o\uff0c\u5c24\u5176\u662f\u5728GPT-4o\u7ecf\u5e38\u5931\u8d25\u7684\u60c5\u51b5\u4e0b\u3002|\n", "2409.02823": "|**2024-09-04**|**Design Contradictions: Help or Hindrance?**|Aron E. 
Owen et.al.|[2409.02823](http://arxiv.org/abs/2409.02823)|null|\u5728\u6570\u636e\u53ef\u89c6\u5316\u9886\u57df\uff0c\u521b\u65b0\u601d\u7ef4\u7684\u8feb\u5207\u9700\u6c42\u4fc3\u4f7f\u6211\u4eec\u63a2\u7d22\u65b0\u7684\u521b\u610f\u65b9\u6cd5\u3002\u901a\u8fc7\u7ec4\u5408\u4e24\u4e2a\u6216\u66f4\u591a\u5177\u6709\u5bf9\u7acb\u6027\u8d28\u7684\u521b\u9020\u6027\u8bcd\u6c47\uff0c\u80fd\u591f\u6fc0\u53d1\u65b0\u578b\u60f3\u6cd5\u4e0e\u8bbe\u8ba1\uff0c\u5bf9\u521b\u610f\u8fc7\u7a0b\u4ea7\u751f\u79ef\u6781\u5f71\u54cd\u3002\u968f\u7740\u4eba\u5de5\u667a\u80fd\u9a71\u52a8\u8bbe\u8ba1\u7684\u53d1\u5c55\uff0c\u4e00\u4e2a\u5173\u952e\u95ee\u9898\u6d6e\u51fa\u6c34\u9762\uff1a\u8fd9\u4e9b\u8bbe\u8ba1\u77db\u76fe\u662f\u5426\u80fd\u4e0eAI\u5de5\u5177\u534f\u540c\u5de5\u4f5c\uff1f\u76ee\u524d\u7b54\u6848\u662f\u5426\u5b9a\u7684\u3002AI\u7cfb\u7edf\uff0c\u5c24\u5176\u662f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\uff0c\u4f9d\u8d56\u4e8e\u4ea7\u751f\u76f8\u4f3c\u6027\u7684\u7b97\u6cd5\uff0c\u800c\u521b\u9020\u529b\u5f80\u5f80\u9700\u8981\u5dee\u5f02\u6027\u548c\u65b0\u9896\u6027\u3002\u8fd9\u4efd\u6d77\u62a5\u5f00\u542f\u4e86\u5173\u4e8e\u5982\u4f55\u5f15\u5bfcAI\u7cfb\u7edf\u53d8\u5f97\u66f4\u5177\u521b\u9020\u6027\u548c\u751f\u6210\u65b0\u60f3\u6cd5\u7684\u5bf9\u8bdd\u3002\u8fd9\u9879\u7814\u7a76\u9080\u8bf7\u6211\u4eec\u91cd\u65b0\u8003\u8651\u4f20\u7edf\u8bbe\u8ba1\u65b9\u6cd5\uff0c\u5e76\u63a2\u7d22AI\u9a71\u52a8\u4e16\u754c\u4e2d\u7684\u65b0\u65b9\u6cd5\u3002\u6211\u4eec\u80fd\u5426\u5e94\u7528\u4f20\u7edf\u7684\u8bbe\u8ba1\u65b9\u6cd5\uff0c\u5982\u53cc\u94bb\u77f3\u6a21\u578b\uff0c\u6216\u8005\u662f\u5426\u9700\u8981\u65b0\u7684\u8bbe\u8ba1\u5de5\u7a0b\u65b9\u6cd5\uff1f\u5982\u4f55\u5229\u7528\u751f\u6210\u5f0fAI\u5feb\u901f\u8bbe\u8ba1\u53ef\u89c6\u5316\u5e76\u6784\u601d\u65b0\u60f3\u6cd5\uff1f\u8fd9\u7bc7\u8bba\u6587\u65e8\u5728\u5f00\u542f\u8fd9\u4e00\u91cd\u8981\u5bf9\u8bdd\uff0c\u5e76\u63d0\u4f9b\u6709\u5173AI\u5728\u63a8\u52a8\u6570\u636e\u53ef\u89c6\u5316\
u521b\u610f\u65b9\u9762\u7684\u6f5c\u529b\u7684\u5b9e\u7528\u89c1\u89e3\u3002|\n", "2409.02822": "|**2024-09-04**|**Language Understanding as a Constraint on Consensus Size in LLM Societies**|Giordano De Marzo et.al.|[2409.02822](http://arxiv.org/abs/2409.02822)|null|\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u5e94\u7528\u671d\u7740\u534f\u4f5c\u4efb\u52a1\u53d1\u5c55\u7684\u60c5\u51b5\u4e0b\uff0c\u591a\u4e2a\u4ee3\u7406\u76f8\u4e92\u4f5c\u7528\uff0c\u5982\u540c\u4e00\u4e2aLLM\u793e\u4f1a\u3002\u5728\u8fd9\u79cd\u80cc\u666f\u4e0b\uff0c\u5927\u91cf\u7684LLM\u80fd\u591f\u901a\u8fc7\u81ea\u6211\u7ec4\u7ec7\u65b9\u5f0f\u8fbe\u6210\u5173\u4e8e\u4efb\u610f\u89c4\u8303\u7684\u5171\u8bc6\uff0c\u8fd9\u4e9b\u89c4\u8303\u5728\u4fe1\u606f\u652f\u6301\u67d0\u4e00\u9009\u9879\u4f18\u4e8e\u53e6\u4e00\u9009\u9879\u7684\u60c5\u51b5\u4e0b\u4e0d\u5b58\u5728\u3002\u4e3a\u4e86\u7406\u89e3LLM\u662f\u5426\u4e0e\u4eba\u7c7b\u793e\u4f1a\u4e00\u6837\uff0c\u5728\u6ca1\u6709\u673a\u6784\u7684\u60c5\u51b5\u4e0b\u80fd\u591f\u8fbe\u5230\u5171\u8bc6\uff0c\u6211\u4eec\u5e94\u7528\u4e86\u590d\u6742\u79d1\u5b66\u7684\u65b9\u6cd5\u548c\u884c\u4e3a\u79d1\u5b66\u7684\u539f\u5219\uff0c\u5f00\u521b\u4e86\u4e00\u79cdAI\u4eba\u7c7b\u5b66\u7684\u65b0\u65b9\u6cd5\u3002\u7814\u7a76\u53d1\u73b0\uff0cLLM\u80fd\u591f\u5728\u7fa4\u4f53\u4e2d\u8fbe\u6210\u5171\u8bc6\uff0c\u5e76\u4e14LLM\u7684\u610f\u89c1\u52a8\u6001\u53ef\u4ee5\u7528\u4e00\u4e2a\u7531\u591a\u6570\u529b\u91cf\u7cfb\u6570\u53c2\u6570\u5316\u7684\u51fd\u6570\u6765\u7406\u89e3\uff0c\u8be5\u7cfb\u6570\u51b3\u5b9a\u4e86\u5171\u8bc6\u662f\u5426\u53ef\u80fd\u3002\u5bf9\u4e8e\u5177\u6709\u66f4\u9ad8\u8bed\u8a00\u7406\u89e3\u80fd\u529b\u7684\u6a21\u578b\u800c\u8a00\uff0c\u8fd9\u79cd\u591a\u6570\u529b\u91cf\u66f4\u5f3a\uff0c\u800c\u5bf9\u4e8e\u8f83\u5927\u7684\u7fa4\u4f53\u800c\u8a00\u5219\u4f1a\u51cf\u5f31\uff0c\u5bfc\u81f4\u5b58\u5728\u4e00\u4e2a\u4e34\u754c\u7fa4\u4f53\u5927\u5c0f\uff0c\u8d85\u8fc7\u8fd9\u4e2a\u5927\u5c0f\uff0c\u5bf9\u
4e8e\u7ed9\u5b9a\u7684LLM\uff0c\u8fbe\u6210\u5171\u8bc6\u53d8\u5f97\u4e0d\u53ef\u80fd\u3002\u8fd9\u4e00\u4e34\u754c\u7fa4\u4f53\u5927\u5c0f\u968f\u7740\u6a21\u578b\u7684\u8bed\u8a00\u7406\u89e3\u80fd\u529b\u7684\u589e\u957f\u5448\u6307\u6570\u7ea7\u589e\u957f\uff0c\u5bf9\u4e8e\u6700\u5148\u8fdb\u7684\u6a21\u578b\u800c\u8a00\uff0c\u5176\u53ef\u4ee5\u8fbe\u5230\u8fdc\u8d85\u975e\u6b63\u5f0f\u4eba\u7c7b\u7fa4\u4f53\u5178\u578b\u89c4\u6a21\u7684\u6570\u91cf\u7ea7\u3002|\n", "2409.02795": "|**2024-09-04**|**Towards a Unified View of Preference Learning for Large Language Models: A Survey**|Bofei Gao et.al.|[2409.02795](http://arxiv.org/abs/2409.02795)|**[link](https://github.com/kbsdjames/awesome-llm-preference-learning)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5c55\u73b0\u4e86\u60ca\u4eba\u7684\u80fd\u529b\u3002\u5b9e\u73b0\u6210\u529f\u7684\u5173\u952e\u56e0\u7d20\u4e4b\u4e00\u662f\u4f7fLLM\u7684\u8f93\u51fa\u4e0e\u4eba\u7c7b\u504f\u597d\u4fdd\u6301\u4e00\u81f4\u3002\u8fd9\u4e00\u8fc7\u7a0b\u901a\u5e38\u9700\u8981\u5c11\u91cf\u6570\u636e\u5c31\u80fd\u9ad8\u6548\u63d0\u5347LLM\u7684\u8868\u73b0\u3002\u5c3d\u7ba1\u6709\u6548\uff0c\u4f46\u5728\u8fd9\u4e00\u9886\u57df\u7684\u7814\u7a76\u8986\u76d6\u4e86\u591a\u4e2a\u9886\u57df\uff0c\u76f8\u5173\u65b9\u6cd5\u76f8\u5bf9\u590d\u6742\u96be\u4ee5\u7406\u89e3\u3002\u4e0d\u540c\u65b9\u6cd5\u4e4b\u95f4\u7684\u5173\u7cfb\u5c1a\u672a\u5f97\u5230\u5145\u5206\u63a2\u7d22\uff0c\u9650\u5236\u4e86\u504f\u597d\u8c03\u6574\u7b56\u7565\u7684\u53d1\u5c55\u3002\u9274\u4e8e\u6b64\uff0c\u6211\u4eec\u5206\u89e3\u4e86\u73b0\u6709\u6d41\u884c\u8c03\u6574\u7b56\u7565\u7684\u56db\u4e2a\u7ec4\u6210\u90e8\u5206\uff0c\u5e76\u63d0\u4f9b\u4e86\u4e00\u4e2a\u7edf\u4e00\u6846\u67b6\u6765\u7814\u7a76\u5f53\u524d\u7684\u8c03\u6574\u7b56\u7565\uff0c\u4ee5\u6b64\u5efa\u7acb\u5b83\u4eec\u4e4b\u95f4\u7684\u8054\u7cfb\u3002\u5728\u672c\u6587\u7efc\u8ff0\u4e2d\uff0c\u6211\u4eec\u5c06\u6240\u6709\u504f\u597d\u5b66\u4e60\u7b56\u7565\u5206\u89e3\u4e3a\u5
6db\u4e2a\u90e8\u5206\uff1a\u6a21\u578b\u3001\u6570\u636e\u3001\u53cd\u9988\u548c\u7b97\u6cd5\u3002\u8fd9\u79cd\u7edf\u4e00\u89c6\u89d2\u4e3a\u73b0\u6709\u8c03\u6574\u7b97\u6cd5\u63d0\u4f9b\u4e86\u6df1\u5165\u7406\u89e3\uff0c\u5e76\u4e14\u4e5f\u5f00\u542f\u4e86\u6574\u5408\u4e0d\u540c\u7b56\u7565\u4f18\u52bf\u7684\u53ef\u80fd\u6027\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8be6\u7ec6\u4ecb\u7ecd\u4e86\u73b0\u6709\u4e3b\u6d41\u7b97\u6cd5\u7684\u5de5\u4f5c\u793a\u4f8b\uff0c\u4ee5\u5e2e\u52a9\u8bfb\u8005\u5168\u9762\u4e86\u89e3\u3002\u6700\u540e\uff0c\u57fa\u4e8e\u6211\u4eec\u7684\u7edf\u4e00\u89c6\u89d2\uff0c\u6211\u4eec\u63a2\u8ba8\u4e86\u8c03\u6574\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4e0e\u4eba\u7c7b\u504f\u597d\u4e4b\u95f4\u7684\u6311\u6218\u4ee5\u53ca\u672a\u6765\u7814\u7a76\u65b9\u5411\u3002|\n", "2409.03752": "|**2024-09-05**|**Attention Heads of Large Language Models: A Survey**|Zifan Zheng et.al.|[2409.03752](http://arxiv.org/abs/2409.03752)|**[link](https://github.com/iaar-shanghai/awesome-attention-heads)**|**\u81eaChatGPT\u95ee\u4e16\u4ee5\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u5404\u79cd\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5b83\u4eec\u4ecd\u7136\u4f5c\u4e3a\u9ed1\u76d2\u7cfb\u7edf\u5b58\u5728\u3002\u56e0\u6b64\uff0c\u5176\u53d1\u5c55\u4e3b\u8981\u4f9d\u8d56\u4e8e\u6570\u636e\u9a71\u52a8\u7684\u65b9\u6cd5\uff0c\u9650\u5236\u4e86\u901a\u8fc7\u6539\u53d8\u5185\u90e8\u67b6\u6784\u548c\u63a8\u7406\u8def\u5f84\u6765\u63d0\u5347\u6027\u80fd\u7684\u53ef\u80fd\u6027\u3002\u8bb8\u591a\u7814\u7a76\u8005\u5f00\u59cb\u63a2\u7d22\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u5185\u90e8\u673a\u5236\uff0c\u65e8\u5728\u8bc6\u522b\u63a8\u7406\u74f6\u9888\u7684\u672c\u8d28\uff0c\u5927\u591a\u6570\u7814\u7a76\u96c6\u4e2d\u5728\u6ce8\u610f\u529b\u5934\u90e8\u4e0a\u3002\u6211\u4eec\u7684\u7efc\u8ff0\u65e8\u5728\u901a\u8fc7\u805a\u7126\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u53ef\u89e3\u91ca\u6027\u548c\u6ce8\u610f\u529b\u5934\u90e8\u7684\u51
85\u5728\u673a\u5236\uff0c\u63ed\u793a\u5176\u5185\u90e8\u63a8\u7406\u8fc7\u7a0b\u3002\u9996\u5148\uff0c\u6211\u4eec\u5c06\u4eba\u7c7b\u601d\u8003\u8fc7\u7a0b\u63d0\u70bc\u4e3a\u56db\u4e2a\u9636\u6bb5\u6846\u67b6\uff1a\u77e5\u8bc6\u56de\u5fc6\u3001\u60c5\u5883\u5185\u8bc6\u522b\u3001\u6f5c\u5728\u63a8\u7406\u548c\u8868\u8fbe\u51c6\u5907\u3002\u5229\u7528\u8fd9\u4e00\u6846\u67b6\uff0c\u6211\u4eec\u7cfb\u7edf\u5730\u56de\u987e\u73b0\u6709\u7814\u7a76\uff0c\u8bc6\u522b\u5e76\u5206\u7c7b\u7279\u5b9a\u6ce8\u610f\u529b\u5934\u90e8\u7684\u529f\u80fd\u3002\u6b64\u5916\uff0c\u6211\u4eec\u603b\u7ed3\u4e86\u53d1\u73b0\u8fd9\u4e9b\u7279\u6b8a\u5934\u90e8\u6240\u4f7f\u7528\u7684\u5b9e\u9a8c\u65b9\u6cd5\uff0c\u5206\u4e3a\u65e0\u6a21\u578b\u65b9\u6cd5\u548c\u6709\u6a21\u578b\u65b9\u6cd5\u4e24\u5927\u7c7b\u3002\u6211\u4eec\u4e5f\u6982\u8ff0\u4e86\u76f8\u5173\u8bc4\u4f30\u65b9\u6cd5\u548c\u57fa\u51c6\u3002\u6700\u540e\uff0c\u6211\u4eec\u8ba8\u8bba\u5f53\u524d\u7814\u7a76\u7684\u5c40\u9650\u6027\uff0c\u5e76\u63d0\u51fa\u51e0\u4e2a\u6f5c\u5728\u7684\u53d1\u5c55\u65b9\u5411\u3002\u6211\u4eec\u7684\u53c2\u8003\u6587\u732e\u5217\u8868\u5f00\u6e90\u4e8e\u3002**|\n", "2409.03735": "|**2024-09-05**|**LLM-CI: Assessing Contextual Integrity Norms in Language Models**|Yan Shvartzshnaider 
et.al.|[2409.03735](http://arxiv.org/abs/2409.03735)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u4ece\u4e92\u8054\u7f51\u4e0a\u6536\u96c6\u7684\u6570\u636e\u4e2d\u8bb0\u5fc6\u90e8\u5206\u8bad\u7ec3\u6570\u636e\u7684\u540c\u65f6\uff0c\u4e5f\u53ef\u80fd\u65e0\u610f\u4e2d\u7f16\u7801\u4e86\u793e\u4f1a\u504f\u597d\u548c\u89c4\u8303\u3002\u968f\u7740\u8fd9\u4e9b\u6a21\u578b\u88ab\u6574\u5408\u5230\u793e\u4f1a\u6280\u672f\u7cfb\u7edf\u4e2d\uff0c\u786e\u4fdd\u5b83\u4eec\u7f16\u7801\u7684\u89c4\u8303\u7b26\u5408\u793e\u4f1a\u671f\u671b\u81f3\u5173\u91cd\u8981\u3002\u8fd9\u4e9b\u89c4\u8303\u53ef\u80fd\u56e0\u6a21\u578b\u3001\u8d85\u53c2\u6570\u3001\u4f18\u5316\u6280\u672f\u4ee5\u53ca\u6570\u636e\u96c6\u7684\u4e0d\u540c\u800c\u4e0d\u540c\u3002\u7531\u4e8e\u63d0\u793a\u654f\u611f\u6027\u7684\u95ee\u9898\u2014\u2014\u5fae\u5c0f\u7684\u63d0\u793a\u53d8\u5316\u4f1a\u5bfc\u81f4\u4e0d\u540c\u7684\u54cd\u5e94\uff0c\u73b0\u6709\u7684\u8bc4\u4f30\u65b9\u6cd5\u53d8\u5f97\u4e0d\u53ef\u9760\u3002\u9700\u8981\u4e00\u4e2a\u5168\u9762\u7684\u6846\u67b6\u6765\u6db5\u76d6\u5404\u79cd\u6a21\u578b\u3001\u4f18\u5316\u548c\u6570\u636e\u96c6\uff0c\u5e76\u63d0\u4f9b\u53ef\u9760\u7684\u65b9\u6cd5\u6765\u8bc4\u4f30\u7f16\u7801\u7684\u89c4\u8303\u3002 
\u6211\u4eec\u63d0\u51fa\u4e86LLM-CI\uff0c\u8fd9\u662f\u7b2c\u4e00\u4e2a\u7528\u4e8e\u8bc4\u4f30LLM\u4e2d\u7f16\u7801\u9690\u79c1\u89c4\u8303\u7684\u5f00\u6e90\u6846\u67b6\u3002LLM-CI\u4f7f\u7528\u57fa\u4e8e\u4e0a\u4e0b\u6587\u5b8c\u6574\u6027\u56e0\u7d20\u7684\u60c5\u5883\u53d9\u8ff0\u65b9\u6cd5\u6765\u8bc4\u4f30\u4e0d\u540c\u4e0a\u4e0b\u6587\u4e2d\u548c\u4e0d\u540cLLM\u4e2d\u7684\u7f16\u7801\u89c4\u8303\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u591a\u63d0\u793a\u8bc4\u4f30\u65b9\u6cd5\u6765\u89e3\u51b3\u63d0\u793a\u654f\u611f\u6027\u95ee\u9898\uff0c\u901a\u8fc7\u4ec5\u4ece\u5bfc\u81f4\u591a\u4e2a\u53d8\u4f53\u4e00\u81f4\u54cd\u5e94\u7684\u63d0\u793a\u4e2d\u8bc4\u4f30\u89c4\u8303\uff0c\u4ee5\u5168\u9762\u8bc4\u4f30\u4f7f\u7528\u5148\u524d\u5de5\u4f5c\u4e2d\u7684IoT\u548cCOPPA\u60c5\u666f\u6570\u636e\u96c6\u7684LLM\u3002 \u901a\u8fc7\u4f7f\u7528LLM-CI\u548c\u6211\u4eec\u63d0\u51fa\u7684\u8fd9\u79cd\u65b9\u6cd5\uff0c\u6211\u4eec\u5168\u9762\u5730\u8bc4\u4f30\u4e86LLM\uff0c\u7814\u7a76\u4e86\u6a21\u578b\u5c5e\u6027\uff08\u5982\u8d85\u53c2\u6570\u3001\u5bb9\u91cf\uff09\u548c\u4f18\u5316\u7b56\u7565\uff08\u5982\u5bf9\u9f50\u3001\u91cf\u5316\uff09\u7684\u5f71\u54cd\u3002|\n", "2409.03734": "|**2024-09-05**|**Safety vs. 
Performance: How Multi-Objective Learning Reduces Barriers to Market Entry**|Meena Jagadeesan et.al.|[2409.03734](http://arxiv.org/abs/2409.03734)|null|\u672c\u6587\u4ece\u7ecf\u6d4e\u548c\u7b97\u6cd5\u4e24\u4e2a\u89d2\u5ea6\u7814\u7a76\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7b49\u5927\u89c4\u6a21\u673a\u5668\u5b66\u4e60\uff08ML\uff09\u6a21\u578b\u5e02\u573a\u4e2d\u7684\u96c6\u4e2d\u95ee\u9898\uff0c\u4ee5\u53ca\u662f\u5426\u5b58\u5728\u8fdb\u5165\u6b64\u7c7b\u5e02\u573a\u7684\u4e0d\u53ef\u514b\u670d\u969c\u788d\u3002\u6211\u4eec\u901a\u8fc7\u6b63\u5f0f\u5b9a\u4e49\u4e00\u4e2a\u591a\u76ee\u6807\u9ad8\u7ef4\u56de\u5f52\u6846\u67b6\u6765\u63a2\u8ba8\u964d\u4f4e\u8fdb\u5165\u58c1\u5792\u7684\u95ee\u9898\uff0c\u8be5\u6846\u67b6\u6355\u6349\u5230\u4e86\u58f0\u8a89\u635f\u5bb3\u7684\u7279\u5f81\uff0c\u5e76\u5206\u6790\u4e86\u65b0\u516c\u53f8\u8fdb\u5165\u5e02\u573a\u6240\u9700\u7684\u6837\u672c\u6570\u91cf\u3002\u6211\u4eec\u7684\u7ed3\u679c\u8868\u660e\uff0c\u591a\u76ee\u6807\u8003\u8651\u80fd\u591f\u4ece\u6839\u672c\u4e0a\u964d\u4f4e\u8fdb\u5165\u58c1\u5792\u2014\u2014\u6240\u9700\u6837\u672c\u6570\u91cf\u53ef\u80fd\u8fdc\u5c0f\u4e8e\u73b0\u6709\u516c\u53f8\u7684\u6570\u636e\u96c6\u5927\u5c0f\u3002\u5728\u8bc1\u660e\u8fd9\u4e9b\u7ed3\u679c\u7684\u8fc7\u7a0b\u4e2d\uff0c\u6211\u4eec\u8fd8\u53d1\u5c55\u4e86\u591a\u76ee\u6807\u73af\u5883\u4e2d\u9ad8\u7ef4\u7ebf\u6027\u56de\u5f52\u7684\u7f29\u653e\u5b9a\u5f8b\uff0c\u5c55\u793a\u4e86\u5f53\u6570\u636e\u96c6\u89c4\u6a21\u8f83\u5927\u65f6\uff0c\u7f29\u653e\u7387\u4f1a\u53d8\u5f97\u8f83\u6162\uff0c\u8fd9\u4e00\u53d1\u73b0\u53ef\u80fd\u5177\u6709\u72ec\u7acb\u7684\u7814\u7a76\u4ef7\u503c\u3002|\n", "2409.03733": "|**2024-09-05**|**Planning In Natural Language Improves LLM Search For Code Generation**|Evan Wang 
et.al.|[2409.03733](http://arxiv.org/abs/2409.03733)|null|\u5728\u5927\u89c4\u6a21\u63d0\u5347\u8bad\u7ec3\u8ba1\u7b97\u80fd\u529b\u7684\u540c\u65f6\uff0c\u63a8\u7406\u8ba1\u7b97\u7684\u89c4\u6a21\u6269\u5c55\u5e76\u672a\u5e26\u6765\u7c7b\u4f3c\u7684\u8fdb\u6b65\u3002\u6211\u4eec\u5047\u8bbe\uff0c\u8fd9\u4e00\u9886\u57df\u7f3a\u4e4f\u5173\u952e\u6027\u7684\u7a81\u7834\u5728\u4e8e\u751f\u6210\u6a21\u578b\u7684\u8f93\u51fa\u591a\u6837\u6027\u4e0d\u8db3\uff0c\u5bfc\u81f4\u641c\u7d22\u6548\u7387\u4f4e\u4e0b\uff0c\u56e0\u4e3a\u6a21\u578b\u4e0d\u65ad\u4ea7\u751f\u9ad8\u5ea6\u76f8\u4f3c\u4f46\u9519\u8bef\u7684\u7ed3\u679c\u3002\u901a\u8fc7\u5b9e\u8bc1\u7814\u7a76\uff0c\u6211\u4eec\u53d1\u73b0\u63d0\u9ad8\u8f93\u51fa\u591a\u6837\u6027\u53ef\u4ee5\u6709\u6548\u7f13\u89e3\u8fd9\u4e00\u95ee\u9898\u3002 \u57fa\u4e8e\u8fd9\u4e00\u53d1\u73b0\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aPLANSEARCH\u7684\u65b0\u9896\u641c\u7d22\u7b97\u6cd5\uff0c\u5b83\u5728\u4eba\u7c7b\u8bc4\u4ef7\u3001MBPP+\u548cLiveCodeBench\uff08\u4e00\u4e2a\u7528\u4e8e\u7ade\u4e89\u6027\u7f16\u7a0b\u7684\u65e0\u6c61\u67d3\u57fa\u51c6\uff09\u7b49\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\u3002\u8be5\u7b97\u6cd5\u901a\u8fc7\u751f\u6210\u5173\u4e8e\u95ee\u9898\u7684\u591a\u6837\u89c2\u5bdf\uff0c\u5e76\u5229\u7528\u8fd9\u4e9b\u89c2\u5bdf\u6784\u5efa\u89e3\u51b3\u7b56\u7565\uff0c\u6765\u63a2\u7d22\u6bd4\u4f20\u7edf\u65b9\u6cd5\u66f4\u5e7f\u6cdb\u7684\u6f5c\u5728\u89e3\u51b3\u65b9\u6848\u7a7a\u95f4\u3002\u5728\u4f7f\u7528PLANSEARCH\u7ed3\u5408Claude 3.5 
Sonnet\u8fdb\u884c\u4f18\u5316\u540e\uff0c\u6211\u4eec\u5b9e\u73b0\u4e86LiveCodeBench\u4e0a77.0%\u7684\u901a\u8fc7\u7387\uff08pass@200\uff09\uff0c\u8fd9\u4e0d\u4ec5\u8d85\u8d8a\u4e86\u4e0d\u4f7f\u7528\u641c\u7d22\u65b9\u6cd5\uff08pass@1=41.4%\uff09\u7684\u7ed3\u679c\uff0c\u4e5f\u4f18\u4e8e\u4ec5\u4f9d\u8d56\u91cd\u590d\u91c7\u6837\u7684\u65b9\u6cd5\uff08pass@200=60.6%\uff09\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5c55\u793a\u4e86\u80fd\u591f\u51c6\u786e\u9884\u6d4b\u641c\u7d22\u5e26\u6765\u7684\u6027\u80fd\u63d0\u5347\uff0c\u5176\u5173\u952e\u56e0\u7d20\u662f\u751f\u6210\u60f3\u6cd5\u7684\u591a\u6837\u6027\u3002|\n", "2409.03708": "|**2024-09-06**|**RAG based Question-Answering for Contextual Response Prediction System**|Sriram Veturi et.al.|[2409.03708](http://arxiv.org/abs/2409.03708)|null|\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u7aef\u5230\u7aef\u7684\u6846\u67b6\uff0c\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u80fd\u529b\uff0c\u9488\u5bf9\u5b9e\u9645\u5de5\u4e1a\u5e94\u7528\u4e2d\u7684\u95ee\u9898\u56de\u7b54\u573a\u666f\u3002\u7ed9\u5b9a\u5ba2\u6237\u67e5\u8be2\uff0c\u8be5\u7cfb\u7edf\u4f1a\u68c0\u7d22\u76f8\u5173\u77e5\u8bc6\u6587\u6863\uff0c\u5e76\u7ed3\u5408\u4e4b\u524d\u7684\u804a\u5929\u5386\u53f2\uff0c\u4e3a\u96f6\u552e\u516c\u53f8\u7684\u5ba2\u670d\u4e2d\u5fc3\u63d0\u4f9b\u5ba2\u6237\u670d\u52a1\u4ee3\u8868\u751f\u6210\u54cd\u5e94\u5efa\u8bae\u3002\u901a\u8fc7\u5168\u9762\u7684\u81ea\u52a8\u5316\u548c\u4eba\u5de5\u8bc4\u4f30\uff0c\u7ed3\u679c\u663e\u793a\uff0c\u8fd9\u79cd\u89e3\u51b3\u65b9\u6848\u5728\u51c6\u786e\u6027\u548c\u76f8\u5173\u6027\u4e0a\u4f18\u4e8e\u5f53\u524d\u57fa\u4e8eBERT\u7684\u7b97\u6cd5\u3002\u6211\u4eec\u7684\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0c\u57fa\u4e8eRAG\u7684LLMs\u53ef\u4ee5\u4f5c\u4e3a\u4eba\u7c7b\u5ba2\u6237\u670d\u52a1\u4ee3\u8868\u7684\u4f18\u79c0\u8f85\u52a9\u5de5\u5177\uff0c\u51cf\u8f7b\u4ed6\u4eec\u7684\u5de5\u4f5c\u8d1f\u62c5\u3002
|\n", "2409.03671": "|**2024-09-05**|**TRACE-cs: Trustworthy Reasoning for Contrastive Explanations in Course Scheduling Problems**|Stylianos Loukas Vasileiou et.al.|[2409.03671](http://arxiv.org/abs/2409.03671)|null|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aTRACE-cs\u7684\u65b0\u578b\u6df7\u5408\u7cfb\u7edf\uff0c\u5b83\u7ed3\u5408\u4e86\u7b26\u53f7\u63a8\u7406\u4e0e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\uff0c\u4ee5\u89e3\u51b3\u6392\u7a0b\u95ee\u9898\u4e2d\u7684\u5bf9\u6bd4\u67e5\u8be2\u3002TRACE-cs\u5229\u7528SAT\u6c42\u89e3\u6280\u672f\u7f16\u7801\u6392\u7a0b\u7ea6\u675f\uff0c\u5e76\u751f\u6210\u7528\u6237\u67e5\u8be2\u7684\u89e3\u91ca\uff0c\u540c\u65f6\u901a\u8fc7\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5c06\u7528\u6237\u7684\u67e5\u8be2\u8f6c\u6362\u4e3a\u903b\u8f91\u6761\u76ee\uff0c\u5e76\u7ec6\u5316\u7b26\u53f7\u6c42\u89e3\u5668\u751f\u6210\u7684\u89e3\u91ca\u4e3a\u81ea\u7136\u8bed\u8a00\u53e5\u5b50\u3002\u901a\u8fc7\u6574\u5408\u8fd9\u4e9b\u7ec4\u4ef6\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5c55\u793a\u4e86\u5c06\u7b26\u53f7\u65b9\u6cd5\u4e0eLLM\u76f8\u7ed3\u5408\uff0c\u521b\u5efa\u5177\u6709\u6b63\u786e\u6027\u4fdd\u8bc1\u7684\u53ef\u89e3\u91caAI\u4ee3\u7406\u7684\u6f5c\u529b\u3002|\n", "2409.03668": "|**2024-09-05**|**A Fused Large Language Model for Predicting Startup Success**|Abdurahman Maarouf 
et.al.|[2409.03668](http://arxiv.org/abs/2409.03668)|null|\u4e3a\u4e86\u5e2e\u52a9\u6295\u8d44\u8005\u505a\u51fa\u6709\u6548\u7684\u51b3\u7b56\u5e76\u6301\u7eed\u5bfb\u627e\u76c8\u5229\u7684\u521b\u4e1a\u6295\u8d44\u673a\u4f1a\uff0c\u9700\u8981\u9884\u6d4b\u521d\u521b\u516c\u53f8\u7684\u6210\u529f\u7387\u3002\u5982\u4eca\uff0c\u6295\u8d44\u8005\u4e0d\u4ec5\u53ef\u4ee5\u5229\u7528\u6709\u5173\u521d\u521b\u516c\u53f8\u7684\u5404\u79cd\u57fa\u672c\u9762\u4fe1\u606f\uff08\u5982\u516c\u53f8\u7684\u6210\u7acb\u65f6\u95f4\u3001\u521b\u59cb\u4eba\u6570\u91cf\u4ee5\u53ca\u6240\u5904\u884c\u4e1a\uff09\uff0c\u8fd8\u53ef\u4ee5\u901a\u8fc7\u5728\u7ebf\u98ce\u9669\u6295\u8d44\uff08VC\uff09\u5e73\u53f0\u83b7\u53d6\u5173\u4e8e\u516c\u53f8\u521b\u65b0\u548c\u4e1a\u52a1\u6a21\u5f0f\u7684\u6587\u672c\u63cf\u8ff0\u4fe1\u606f\uff0c\u4f8b\u5982Crunchbase\u3002\u4e3a\u4e86\u652f\u6301\u6295\u8d44\u8005\u7684\u51b3\u7b56\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u673a\u5668\u5b66\u4e60\u65b9\u6cd5\uff0c\u65e8\u5728\u5728VC\u5e73\u53f0\u4e0a\u5b9a\u4f4d\u6210\u529f\u7684\u521d\u521b\u516c\u53f8\u3002\u5177\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u5f00\u53d1\u3001\u8bad\u7ec3\u5e76\u8bc4\u4f30\u4e86\u4e00\u4e2a\u4e13\u95e8\u7684\u878d\u5408\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff0c\u7528\u4e8e\u9884\u6d4b\u521d\u521b\u516c\u53f8\u7684\u6210\u529f\u7387\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u65e8\u5728\u8bc4\u4f30VC\u5e73\u53f0\u4e0a\u516c\u53f8\u7684\u81ea\u6211\u63cf\u8ff0\u5728\u591a\u5927\u7a0b\u5ea6\u4e0a\u80fd\u591f\u9884\u6d4b\u5176\u6210\u529f\u6027\u3002\u4f7f\u7528\u6765\u81eaCrunchbase\u768420,172\u4e2a\u5728\u7ebf\u8d44\u6599\u6863\u6848\uff0c\u6211\u4eec\u53d1\u73b0\u6211\u4eec\u7684\u878d\u5408\u5927\u578b\u8bed\u8a00\u6a21\u578b\u53ef\u4ee5\u9884\u6d4b\u521d\u521b\u516c\u53f8\u7684\u6210\u529f\u7387\uff0c\u5176\u4e2d\u6587\u672c\u81ea\u6211\u63cf\u8ff0\u5bf9\u9884\u6d4b\u80fd\u529b\u8d21\u732e\u4e86\u663e\u8457\u90e8\u5206\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u63d0\u4f9b\u4e86\u
4e00\u4e2a\u51b3\u7b56\u652f\u6301\u5de5\u5177\uff0c\u5e2e\u52a9\u6295\u8d44\u8005\u627e\u5230\u76c8\u5229\u7684\u6295\u8d44\u673a\u4f1a\u3002|\n", "2409.03662": "|**2024-09-05**|**The representation landscape of few-shot learning and fine-tuning in large language models**|Diego Doimo et.al.|[2409.03662](http://arxiv.org/abs/2409.03662)|**[link](https://github.com/diegodoimo/geometry_icl_finetuning)**|**\u672c\u6587\u63a2\u8ba8\u4e86\u5728\u7279\u5b9a\u4efb\u52a1\u4e0a\u6539\u8fdb\u73b0\u4ee3\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u6027\u80fd\u7684\u4e24\u79cd\u5e38\u89c1\u7b56\u7565\uff1a\u4e0a\u4e0b\u6587\u5b66\u4e60\uff08ICL\uff09\u548c\u76d1\u7763\u5fae\u8c03\uff08SFT\uff09\u3002\u5c3d\u7ba1\u8fd9\u4e24\u79cd\u65b9\u6cd5\u7684\u672c\u8d28\u4e0d\u540c\uff0c\u4f46\u5b83\u4eec\u5f80\u5f80\u80fd\u4ea7\u751f\u76f8\u4f3c\u7684\u6027\u80fd\u63d0\u5347\u3002\u7136\u800c\uff0c\u6211\u4eec\u5bf9\u5b83\u4eec\u662f\u5426\u5728LLM\u5185\u90e8\u8bf1\u5bfc\u51fa\u76f8\u4f3c\u7684\u8868\u793a\u7ed3\u6784\u77e5\u4e4b\u751a\u5c11\u3002\u6211\u4eec\u901a\u8fc7\u5206\u6790\u8fd9\u4e24\u79cd\u60c5\u51b5\u4e0b\u9690\u85cf\u8868\u793a\u7684\u6982\u7387\u666f\u89c2\u6765\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u5728\u76f8\u540c\u7684\u95ee\u7b54\u4efb\u52a1\u4e0a\u6bd4\u8f83\u4e86LLM\u7684\u8868\u73b0\uff0c\u53d1\u73b0ICL\u548cSFT\u4ea7\u751f\u4e86\u975e\u5e38\u4e0d\u540c\u7684\u5185\u90e8\u7ed3\u6784\uff0c\u4e24\u8005\u90fd\u5728\u7f51\u7edc\u7684\u4e2d\u95f4\u90e8\u5206\u7ecf\u5386\u4e86\u4e00\u4e2a\u660e\u663e\u7684\u8f6c\u53d8\u3002\u5728\u6a21\u578b\u7684\u524d\u534a\u90e8\u5206\uff0cICL\u5851\u9020\u4e86\u5206\u5c42\u7ec4\u7ec7\u7684\u53ef\u89e3\u91ca\u8868\u793a\uff0c\u6309\u7167\u5176\u8bed\u4e49\u5185\u5bb9\u8fdb\u884c\u6392\u5e8f\u3002\u76f8\u6bd4\u4e4b\u4e0b\uff0cSFT\u5f97\u5230\u7684\u6982\u7387\u666f\u89c2\u66f4\u52a0\u6a21\u7cca\u4e14\u8bed\u4e49\u6df7\u6742\u3002\u5728\u7f51\u7edc\u7684\u540e\u534a\u90e8\u5206\uff0c
\u5fae\u8c03\u540e\u7684\u8868\u793a\u53d1\u5c55\u51fa\u4e86\u66f4\u6709\u5229\u4e8e\u7f16\u7801\u7b54\u6848\u8eab\u4efd\u7684\u6982\u7387\u6a21\u5f0f\uff0c\u800cICL\u8868\u793a\u7684\u6982\u7387\u5cf0\u5219\u4e0d\u592a\u660e\u786e\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u63ed\u793a\u4e86LLM\u5728\u4e0d\u540c\u6761\u4ef6\u4e0b\u89e3\u51b3\u76f8\u540c\u4efb\u52a1\u65f6\u6240\u91c7\u7528\u7684\u591a\u6837\u5316\u8ba1\u7b97\u7b56\u7565\uff0c\u8fd9\u6709\u52a9\u4e8e\u6211\u4eec\u671d\u7740\u8bbe\u8ba1\u51fa\u4ece\u8bed\u8a00\u6a21\u578b\u4e2d\u63d0\u53d6\u4fe1\u606f\u7684\u6700\u4f73\u65b9\u6cd5\u8fc8\u8fdb\u3002**|\n", "2409.03659": "|**2024-09-06**|**LLM-based multi-agent poetry generation in non-cooperative environments**|Ran Zhang et.al.|[2409.03659](http://arxiv.org/abs/2409.03659)|**[link](https://github.com/zhangr2021/Multiagent_poetry)**|**\u5c3d\u7ba1\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u81ea\u52a8\u8bd7\u6b4c\u751f\u6210\u9886\u57df\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u5c55\uff0c\u4f46\u751f\u6210\u7684\u8bd7\u6b4c\u5728\u591a\u6837\u6027\u65b9\u9762\u5b58\u5728\u4e0d\u8db3\uff0c\u4e14\u8bad\u7ec3\u8fc7\u7a0b\u4e0e\u4eba\u7c7b\u5b66\u4e60\u65b9\u5f0f\u5927\u76f8\u5f84\u5ead\u3002\u57fa\u4e8e\u8fd9\u6837\u7684\u8003\u8651\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u793e\u4f1a\u5b66\u4e60\u7684\u6846\u67b6\uff0c\u5728\u6b64\u6846\u67b6\u4e0b\uff0c\u6211\u4eec\u5f3a\u8c03\u975e\u5408\u4f5c\u4e92\u52a8\uff0c\u4ee5\u9f13\u52b1\u591a\u6837\u6027\uff0c\u540c\u65f6\u9664\u4e86\u5408\u4f5c\u4e92\u52a8\u5916\u8fd8\u5f3a\u8c03\u975e\u5408\u4f5c\u4e92\u52a8\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u662f\u9996\u6b21\u5c1d\u8bd5\u5728\u975e\u5408\u4f5c\u73af\u5883\u4e2d\u4f7f\u7528\u57fa\u4e8e\u8bad\u7ec3\u7684\u591a\u667a\u80fd\u4f53\u7cfb\u7edf\uff08GPT-2\uff09\u548c\u57fa\u4e8e\u63d0\u793a\u7684\u7cfb\u7edf\uff08GPT-3 \u548c GPT-4\uff09\u8fdb\u884c\u8bd7\u6b4c\u751f\u6210\u3002 
\u6839\u636e\u5bf9\u751f\u6210\u768496,000\u9996\u8bd7\u6b4c\u7684\u8bc4\u4f30\uff0c\u6211\u4eec\u7684\u6846\u67b6\u5bf9\u57fa\u4e8e\u8bad\u7ec3\u7684\u667a\u80fd\u4f53\u7684\u8bd7\u6b4c\u751f\u6210\u8fc7\u7a0b\u4ea7\u751f\u4e86\u79ef\u6781\u5f71\u54cd\uff0c\u5bfc\u81f4\u4ee5\u4e0b\u7ed3\u679c\uff1a1\uff09\u591a\u6837\u6027\u589e\u52a0\u4e863.0-3.7\u4e2a\u767e\u5206\u70b9\uff08pp\uff09\uff0c\u65b0\u9896\u6027\u589e\u52a0\u4e865.6-11.3\u4e2a\u767e\u5206\u70b9\uff0c\u6839\u636e\u72ec\u7279\u548c\u65b0\u9896\u7684n-grams\u8bc4\u4f30\u3002\u751f\u6210\u7684\u8bd7\u6b4c\u5728\u8bcd\u6c47\u3001\u98ce\u683c\u548c\u8bed\u4e49\u65b9\u9762\u4e5f\u8868\u73b0\u51fa\u7fa4\u4f53\u5dee\u5f02\u3002\u57fa\u4e8e\u63d0\u793a\u7684\u667a\u80fd\u4f53\u5728\u6211\u4eec\u7684\u6846\u67b6\u4e2d\u4e5f\u4ece\u975e\u5408\u4f5c\u73af\u5883\u4e2d\u83b7\u76ca\uff0c\u5177\u6709\u975e\u540c\u8d28\u667a\u80fd\u4f53\u7684\u591a\u6837\u5316\u7684\u6a21\u578b\u7ec4\u5408\u6709\u53ef\u80fd\u8fdb\u4e00\u6b65\u63d0\u9ad8\u591a\u6837\u6027\uff0c\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\u591a\u6837\u6027\u589e\u52a0\u4e867.0-17.5\u4e2a\u767e\u5206\u70b9\u3002\u7136\u800c\uff0c\u57fa\u4e8e\u63d0\u793a\u7684\u667a\u80fd\u4f53\u663e\u793a\u4e86\u968f\u7740\u65f6\u95f4\u63a8\u79fb\u8bcd\u6c47\u591a\u6837\u6027\u7684\u4e0b\u964d\uff0c\u5e76\u6ca1\u6709\u5c55\u73b0\u51fa\u65e8\u5728\u5728\u793e\u4ea4\u7f51\u7edc\u4e2d\u5b9e\u73b0\u7684\u7fa4\u4f53\u95f4\u5206\u5316\u3002 \u672c\u6587\u8ba4\u4e3a\uff0c\u5728\u8bf8\u5982\u81ea\u52a8\u8bd7\u6b4c\u751f\u6210\u7b49\u521b\u610f\u4efb\u52a1\u4e2d\uff0c\u9700\u8981\u8fdb\u884c\u8303\u5f0f\u8f6c\u53d8\uff0c\u5f15\u5165\u7c7b\u4f3c\u4e8e\u4eba\u7c7b\u4ea4\u4e92\u7684\u793e\u4f1a\u5b66\u4e60\u8fc7\u7a0b\uff08\u901a\u8fc7\u57fa\u4e8eLLM\u7684\u667a\u80fd\u4f53\u5efa\u6a21\uff09\uff0c\u4ee5\u4fc3\u8fdb\u66f4\u52a0\u591a\u6837\u6027\u548c\u521b\u65b0\u7684\u751f\u6210\u3002**|\n", "2409.03512": "|**2024-09-05**|**From MOOC to MAIC: Reshaping Online Teaching and Learning through 
LLM-driven Agents**|Jifan Yu et.al.|[2409.03512](http://arxiv.org/abs/2409.03512)|null|\u81ea\u6700\u65e9\u7684\u5728\u7ebf\u6559\u80b2\u5b9e\u4f8b\u51fa\u73b0\uff0c\u8bfe\u7a0b\u88ab\u4e0a\u4f20\u81f3\u53ef\u8bbf\u95ee\u5e76\u5171\u4eab\u7684\u5728\u7ebf\u5e73\u53f0\u4ee5\u6765\uff0c\u8fd9\u79cd\u6269\u5927\u77e5\u8bc6\u4f20\u64ad\u8303\u56f4\u3001\u89e6\u53ca\u66f4\u5e7f\u6cdb\u53d7\u4f17\u7684\u5f62\u5f0f\u5f15\u53d1\u4e86\u5e7f\u6cdb\u8ba8\u8bba\u548c\u666e\u904d\u91c7\u7eb3\u3002\u8ba4\u8bc6\u5230\u4e2a\u6027\u5316\u5b66\u4e60\u4ecd\u5b58\u5728\u6539\u8fdb\u7a7a\u95f4\uff0c\u4eba\u5de5\u667a\u80fd\u6280\u672f\u4e0d\u65ad\u878d\u5165\u8fd9\u4e00\u5b66\u4e60\u6a21\u5f0f\uff0c\u7531\u6b64\u4ea7\u751f\u4e86\u591a\u79cd\u6559\u80b2AI\u5e94\u7528\uff0c\u5982\u6559\u80b2\u63a8\u8350\u548c\u667a\u80fd\u8f85\u5bfc\u3002\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u667a\u80fd\u7684\u6d8c\u73b0\uff0c\u4f7f\u5f97\u8fd9\u4e9b\u6559\u80b2\u589e\u5f3a\u529f\u80fd\u5f97\u4ee5\u57fa\u4e8e\u7edf\u4e00\u7684\u57fa\u7840\u6a21\u578b\u6784\u5efa\uff0c\u5b9e\u73b0\u66f4\u6df1\u5c42\u9762\u7684\u6574\u5408\u3002\u5728\u6b64\u80cc\u666f\u4e0b\uff0c\u6211\u4eec\u63d0\u51faMAIC\uff08\u5927\u89c4\u6a21AI\u8d4b\u80fd\u8bfe\u7a0b\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u65b0\u7684\u5728\u7ebf\u6559\u80b2\u5f62\u5f0f\uff0c\u5229\u7528LLM\u9a71\u52a8\u7684\u591a\u4ee3\u7406\u7cfb\u7edf\u6784\u5efaAI\u8f85\u52a9\u8bfe\u5802\uff0c\u5e73\u8861\u4e86\u89c4\u6a21\u6027\u548c\u9002\u5e94\u6027\u3002\u9664\u4e86\u63a2\u7d22\u6982\u5ff5\u6846\u67b6\u548c\u6280\u672f\u521b\u65b0\u5916\uff0c\u6211\u4eec\u5728\u6e05\u534e\u5927\u5b66\u2014\u2014\u4e2d\u56fd\u9876\u5c16\u5927\u5b66\u4e4b\u4e00\u2014\u2014\u8fdb\u884c\u4e86\u521d\u6b65\u5b9e\u9a8c\u3002\u901a\u8fc7\u8d85\u8fc710\u4e07\u6761\u5b66\u4e60\u8bb0\u5f55\u548c500\u591a\u540d\u5b66\u751f\u7684\u6570\u636e\uff0c\u6211\u4eec\u83b7\u5f97\u4e86\u5b9d\u8d35\u89c2\u5bdf\u548c\u521d\u6b65\u5206\u6790\u3002\u8fd9\u4e2a\u9879\u76ee\u5c06\u6301\u7eed\
u53d1\u5c55\uff0c\u6700\u7ec8\u76ee\u6807\u662f\u5efa\u7acb\u4e00\u4e2a\u5168\u9762\u5f00\u653e\u7684\u5e73\u53f0\uff0c\u652f\u6301\u548c\u7edf\u4e00\u7814\u7a76\u3001\u6280\u672f\u548c\u5e94\u7528\uff0c\u5728\u5927\u6a21\u578bAI\u65f6\u4ee3\u63a2\u7d22\u5728\u7ebf\u6559\u80b2\u7684\u53ef\u80fd\u6027\u3002\u6211\u4eec\u8bbe\u60f3\u8fd9\u4e2a\u5e73\u53f0\u662f\u4e00\u4e2a\u5408\u4f5c\u67a2\u7ebd\uff0c\u6c47\u96c6\u6559\u80b2\u8005\u3001\u7814\u7a76\u4eba\u5458\u548c\u521b\u65b0\u8005\u5171\u540c\u63a2\u7d22AI\u9a71\u52a8\u5728\u7ebf\u6559\u80b2\u7684\u672a\u6765\u3002|\n", "2409.04421": "|**2024-09-06**|**RLPF: Reinforcement Learning from Prediction Feedback for User Summarization with LLMs**|Jiaxing Wu et.al.|[2409.04421](http://arxiv.org/abs/2409.04421)|null|\u672c\u6587\u5f15\u5165\u4e86\u4e00\u79cd\u540d\u4e3a\u201c\u57fa\u4e8e\u9884\u6d4b\u53cd\u9988\u7684\u5f3a\u5316\u5b66\u4e60\uff08Reinforcement Learning from Prediction Feedback\uff0cRLPF\uff09\u201d\u7684\u65b9\u6cd5\uff0c\u65e8\u5728\u89e3\u51b3\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08Large Language Models\uff0cLLMs\uff09\u5728\u4e2a\u4eba\u5316\u7cfb\u7edf\u4e2d\u5e94\u7528\u65f6\u9762\u4e34\u7684\u95ee\u9898\u3002\u5177\u4f53\u800c\u8a00\uff0c\u5f53LLMs\u4ece\u7528\u6237\u7684\u8fc7\u5f80\u6d3b\u52a8\u9884\u6d4b\u884c\u4e3a\u65f6\uff0c\u5b83\u4eec\u7684\u6709\u6548\u6027\u5f80\u5f80\u53d6\u51b3\u4e8e\u80fd\u5426\u6709\u6548\u5730\u5229\u7528\u5927\u91cf\u3001\u957f\u7bc7\u7684\u7528\u6237\u5386\u53f2\u6570\u636e\uff0c\u800c\u8fd9\u4e9b\u6570\u636e\u901a\u5e38\u542b\u6709\u566a\u97f3\u4e14\u957f\u5ea6\u8fc7\u957f\u3002\u73b0\u6709\u9884\u8bad\u7ec3\u7684LLMs\u53ef\u80fd\u751f\u6210\u7684\u6458\u8981\u867d\u77ed\u5c0f\u7cbe\u608d\uff0c\u4f46\u7f3a\u4e4f\u5bf9\u4e0b\u6e38\u4efb\u52a1\u81f3\u5173\u91cd\u8981\u7684\u4e0a\u4e0b\u6587\u4fe1\u606f\uff0c\u4ece\u800c\u9650\u5236\u4e86\u5176\u5728\u4e2a\u4eba\u5316\u7cfb\u7edf\u4e2d\u7684\u5e94\u7528\u3002 
\u4e3a\u4e86\u514b\u670d\u8fd9\u4e00\u6311\u6218\uff0cRLPF\u65b9\u6cd5\u901a\u8fc7\u5fae\u8c03LLMs\u6765\u751f\u6210\u7cbe\u70bc\u3001\u4eba\u7c7b\u53ef\u8bfb\u7684\u7528\u6237\u6982\u8981\uff0c\u8fd9\u4e9b\u6982\u8981\u80fd\u591f\u4f18\u5316\u4e0b\u6e38\u4efb\u52a1\u7684\u8868\u73b0\u3002\u901a\u8fc7\u6700\u5927\u5316\u751f\u6210\u6982\u8981\u7684\u6709\u7528\u6027\uff0cRLPF\u80fd\u591f\u6709\u6548\u63d0\u53d6\u5927\u91cf\u7528\u6237\u5386\u53f2\u6570\u636e\u7684\u5173\u952e\u4fe1\u606f\uff0c\u540c\u65f6\u4fdd\u6301\u5bf9\u4e0b\u6e38\u4efb\u52a1\u81f3\u5173\u91cd\u8981\u7684\u4fe1\u606f\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u4e0e\u57fa\u7ebf\u65b9\u6cd5\u76f8\u6bd4\uff0cRLPF\u5728\u4e0b\u6e38\u4efb\u52a1\u6027\u80fd\u4e0a\u663e\u8457\u63d0\u5347\u4e8622%\uff0c\u5728\u4e8b\u5b9e\u6027\u3001\u62bd\u8c61\u6027\u548c\u53ef\u8bfb\u6027\u7b49\u6307\u6807\u4e0a\u7684\u8868\u73b0\u5206\u522b\u8fbe\u5230\u4e8684.59%\u7684\u80dc\u7387\uff0c\u540c\u65f6\u5b9e\u73b0\u4e8674%\u7684\u4e0a\u4e0b\u6587\u957f\u5ea6\u51cf\u5c11\uff0c\u4e14\u572816\u4e2a\u672a\u89c1\u7684\u4efb\u52a1\u548c/\u6216\u6570\u636e\u96c6\u4e0a\u5747\u6709\u6027\u80fd\u63d0\u5347\uff0c\u8fd9\u8868\u660e\u5176\u5177\u6709\u826f\u597d\u7684\u6cdb\u5316\u80fd\u529b\u3002 \u603b\u4e4b\uff0cRLPF\u63d0\u4f9b\u4e86\u4e00\u79cd\u589e\u5f3aLLMs\u5728\u4e2a\u4eba\u5316\u9886\u57df\u5e94\u7528\u7684\u6709\u524d\u666f\u7684\u89e3\u51b3\u65b9\u6848\uff0c\u901a\u8fc7\u5c06\u957f\u7bc7\u3001\u566a\u97f3\u4e30\u5bcc\u7684\u7528\u6237\u5386\u53f2\u8f6c\u5316\u4e3a\u4fe1\u606f\u4e30\u5bcc\u3001\u6613\u4e8e\u7406\u89e3\u7684\u8868\u793a\uff0c\u4ece\u800c\u63d0\u9ad8LLMs\u7684\u4e2a\u4eba\u5316\u80fd\u529b\u3002|\n", "2409.04388": "|**2024-09-06**|**Question-Answering Dense Video Events**|Hangyu Qin 
et.al.|[2409.04388](http://arxiv.org/abs/2409.04388)|null|\u5728\u672c\u6587\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u9879\u65b0\u7684\u4efb\u52a1\u2014\u2014\u9488\u5bf9\u957f\u89c6\u9891\u4e2d\u7684\u5bc6\u96c6\u4e8b\u4ef6\u8fdb\u884c\u95ee\u9898\u56de\u7b54\u4e0e\u5b9a\u4f4d\uff0c\u8fd9\u8981\u6c42\u6a21\u578b\u80fd\u591f\u51c6\u786e\u7406\u89e3\u5e76\u63a8\u7406\u6301\u7eed\u65f6\u95f4\u8f83\u957f\u7684\u591a\u4e2a\u4e8b\u4ef6\u3002\u4e3a\u4e86\u652f\u6301\u8fd9\u4e00\u7814\u7a76\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u540d\u4e3aDeVE-QA\u7684\u6570\u636e\u96c6\uff0c\u5176\u4e2d\u5305\u542b\u5173\u4e8e10600\u4e2a\u957f\u89c6\u9891\u4e2d26000\u4e2a\u4e8b\u4ef6\u768478000\u4e2a\u95ee\u9898\u3002 \u73b0\u6709\u5728\u5355\u4e8b\u4ef6\u95ee\u7b54\u4e0a\u8868\u73b0\u51fa\u8272\u7684\u5927\u578b\u591a\u6a21\u6001\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u5728\u9762\u5bf9DeVE-QA\u65f6\u9047\u5230\u6311\u6218\uff0c\u8fd9\u8868\u660e\u5b83\u4eec\u5728\u5904\u7406\u957f\u65f6\u95f4\u6bb5\u5185\u53d1\u751f\u7684\u591a\u4e2a\u4e8b\u4ef6\u7684\u7406\u89e3\u548c\u63a8\u7406\u65b9\u9762\u5b58\u5728\u5c40\u9650\u6027\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aDeVi\u7684\u65b0\u65b9\u6cd5\uff0c\u8fd9\u662f\u4e00\u79cd\u65e0\u9700\u8bad\u7ec3\u5373\u53ef\u63d0\u5347MLLM\u6027\u80fd\u7684\u65b9\u6cd5\u3002DeVi\u901a\u8fc7\u5f15\u5165\u4e09\u4e2a\u5173\u952e\u6a21\u5757\u6765\u6539\u8fdb\u73b0\u6709\u7684MLLMs\uff1a\u5c42\u7ea7\u63cf\u8ff0\u6a21\u5757\u3001\u65f6\u95f4\u4e8b\u4ef6\u8bb0\u5fc6\u6a21\u5757\u548c\u81ea\u6211\u4e00\u81f4\u6027\u68c0\u67e5\u6a21\u5757\u3002\u8fd9\u4e09\u4e2a\u6a21\u5757\u5206\u522b\u7528\u4e8e\u68c0\u6d4b\u3001\u4e0a\u4e0b\u6587\u5316\u548c\u8bb0\u5fc6\u957f\u89c6\u9891\u4e2d\u7684\u5bc6\u96c6\u4e8b\u4ef6\uff0c\u4ee5\u53ca\u5b9a\u4f4d\u76f8\u5173\u89c6\u9891\u7247\u6bb5\u4ee5\u8fdb\u884c\u95ee\u9898\u56de\u7b54\u3002 
\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u4e0e\u73b0\u6709MLLMs\u76f8\u6bd4\uff0cDeVi\u5728\u56de\u7b54\u5bc6\u96c6\u4e8b\u4ef6\u95ee\u9898\u548c\u5b9a\u4f4d\u76f8\u5173\u89c6\u9891\u7247\u6bb5\u65b9\u9762\u8868\u73b0\u66f4\u4f18\u3002\u5177\u4f53\u800c\u8a00\uff0c\u5728DeVE-QA\u6570\u636e\u96c6\u4e0a\uff0cDeVi\u7684G(round)QA\u51c6\u786e\u7387\u63d0\u9ad8\u4e864.1%\uff0c\u5728NExT-GQA\u6570\u636e\u96c6\u4e0a\u7684\u51c6\u786e\u7387\u63d0\u9ad8\u4e863.7%\u3002|\n", "2409.04318": "|**2024-09-06**|**Learning vs Retrieval: The Role of In-Context Examples in Regression with LLMs**|Aliakbar Nafar et.al.|[2409.04318](http://arxiv.org/abs/2409.04318)|**[link](https://github.com/HLR/LvsR-LLM)**|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u8bc4\u4f30\u751f\u6210\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5185\u5728\u5b66\u4e60\u673a\u5236\u7684\u6846\u67b6\u3002\u6211\u4eec\u58f0\u79f0\uff0c\u8fd9\u4e9b\u673a\u5236\u662f\u901a\u8fc7\u68c0\u7d22\u5185\u90e8\u77e5\u8bc6\u548c\u901a\u8fc7\u5173\u6ce8\u56de\u5f52\u4efb\u52a1\u4ece\u4e0a\u4e0b\u6587\u4e2d\u7684\u793a\u4f8b\u8fdb\u884c\u5b66\u4e60\u7684\u7ec4\u5408\u3002\u9996\u5148\uff0c\u6211\u4eec\u5c55\u793a\u4e86LLMs\u5728\u771f\u5b9e\u4e16\u754c\u6570\u636e\u96c6\u4e0a\u6267\u884c\u56de\u5f52\u7684\u80fd\u529b\uff0c\u5e76\u8bbe\u8ba1\u5b9e\u9a8c\u6765\u8861\u91cf\u6a21\u578b\u5728\u591a\u5927\u7a0b\u5ea6\u4e0a\u901a\u8fc7\u68c0\u7d22\u5176\u5185\u90e8\u77e5\u8bc6\u800c\u4e0d\u662f\u4ece\u4e0a\u4e0b\u6587\u793a\u4f8b\u4e2d\u5b66\u4e60\u6765\u8fdb\u884c\u5185\u5728\u5b66\u4e60\u3002\u6211\u4eec\u8ba4\u4e3a\u8fd9\u4e2a\u8fc7\u7a0b\u4f4d\u4e8e\u8fd9\u4e24\u4e2a\u6781\u7aef\u4e4b\u95f4\u7684\u8fde\u7eed\u4f53\u4e0a\u3002\u6211\u4eec\u6df1\u5165\u5206\u6790\u4e86\u6839\u636e\u5404\u79cd\u56e0\u7d20\uff08\u5982\u4efb\u52a1\u7684\u5148\u9a8c\u77e5\u8bc6\u4ee5\u53ca\u63d0\u4f9b\u7ed9\u4e0a\u4e0b\u6587\u793a\u4f8b\u7684\u4fe1\u606f\u7c7b\u578b\u548c\u4e30\u5bcc\u5ea6\uff09\u8fd9\u4e9b\u673a\u5236\u88ab\u89e6\u53d1\u7684\u7a
0b\u5ea6\u3002\u6211\u4eec\u4f7f\u7528\u4e09\u79cdLLMs\u5e76\u5229\u7528\u591a\u4e2a\u6570\u636e\u96c6\u6765\u9a8c\u8bc1\u6211\u4eec\u7684\u53d1\u73b0\u7684\u7a33\u5065\u6027\u3002\u6211\u4eec\u7684\u7ed3\u679c\u63ed\u793a\u4e86\u5982\u4f55\u6839\u636e\u6240\u89e3\u51b3\u7684\u95ee\u9898\u5229\u7528\u4e0a\u4e0b\u6587\u793a\u4f8b\u4e2d\u7684\u5143\u5b66\u4e60\u548c\u4fc3\u8fdb\u77e5\u8bc6\u68c0\u7d22\u7684\u65b9\u6cd5\u3002|\n", "2409.04312": "|**2024-09-06**|**An optically accelerated extreme learning machine using hot atomic vapors**|Pierre Azam et.al.|[2409.04312](http://arxiv.org/abs/2409.04312)|null|\u673a\u5668\u5b66\u4e60\u6b63\u9010\u6e10\u6210\u4e3a\u4e00\u79cd\u5e7f\u6cdb\u5e94\u7528\u7684\u6280\u672f\uff0c\u5176\u589e\u957f\u901f\u5ea6\u4ee4\u4eba\u5370\u8c61\u6df1\u523b\uff0c\u539f\u56e0\u5728\u4e8e\u5b83\u80fd\u591f\u63d0\u4f9b\u89e3\u51b3\u793e\u4f1a\u5173\u6ce8\u95ee\u9898\u7684\u5b9e\u7528\u89e3\u51b3\u65b9\u6848\u7684\u591a\u6837\u6027\u3002\u7136\u800c\uff0c\u968f\u7740\u5e94\u7528\u548c\u6240\u9700\u8d44\u6e90\u7684\u589e\u52a0\uff0c\u5f53\u524d\u7684\u786c\u4ef6\u6280\u672f\u5f00\u59cb\u53d7\u9650\u3002\u7279\u522b\u662f\u5bf9\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u6216\u9ad8\u5206\u8fa8\u7387\u56fe\u50cf\u8bc6\u522b\u7b49\u65b0\u578b\u673a\u5668\u5b66\u4e60\u9886\u57df\uff0c\u8ba1\u7b97\u65f6\u95f4\u4e0e\u80fd\u6e90\u6210\u672c\u6210\u4e3a\u4e86\u5173\u952e\u95ee\u9898\u3002\u5728\u6b64\u80cc\u666f\u4e0b\uff0c\u591a\u5e74\u6765\u5df2\u7ecf\u8bbe\u8ba1\u51fa\u4e86\u5149\u5b66\u5e73\u53f0\uff0c\u65e8\u5728\u5f00\u53d1\u66f4\u9ad8\u6548\u7684\u673a\u5668\u5b66\u4e60\u786c\u4ef6\u3002 
\u5176\u4e2d\uff0c\u81ea\u7531\u7a7a\u95f4\u4f20\u64ad\u5e73\u53f0\u5177\u6709\u591a\u79cd\u4f18\u52bf\uff1a\u5e76\u884c\u6027\u3001\u4f4e\u80fd\u8017\u4e0e\u8ba1\u7b97\u901f\u5ea6\u3002\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u7ed3\u5408\u5149\u675f\u5728\u70ed\u539f\u5b50\u84b8\u6c14\u4e2d\u4f20\u64ad\u7684\u5f3a\u70c8\u4e14\u53ef\u8c03\u975e\u7ebf\u6027\u7279\u6027\u7684\u65b0\u8bbe\u8ba1\uff0c\u5e76\u4e0e\u6781\u7aef\u5b66\u4e60\u673a\u6a21\u578b\u76f8\u7ed3\u5408\u3002\u901a\u8fc7\u6570\u503c\u6a21\u62df\u4e0e\u5b9e\u9a8c\u9a8c\u8bc1\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u5728MNIST\u56fe\u50cf\u5206\u7c7b\u4efb\u52a1\u4e2d\u4f7f\u7528\u6b64\u7c7b\u81ea\u7531\u7a7a\u95f4\u975e\u7ebf\u6027\u4f20\u64ad\u589e\u5f3a\u8bad\u7ec3\u7684\u6548\u679c\u3002\u6b64\u5916\uff0c\u6211\u4eec\u6307\u51fa\u4e86\u5b9e\u9a8c\u4e2d\u7684\u591a\u4e2a\u8d85\u53c2\u6570\uff0c\u8fd9\u4e9b\u53c2\u6570\u8fdb\u4e00\u6b65\u4f18\u5316\u540e\u53ef\u4ee5\u63d0\u9ad8\u5e73\u53f0\u7684\u51c6\u786e\u6027\u3002|\n", "2409.04286": "|**2024-09-06**|**Using Large Language Models to Generate Authentic Multi-agent Knowledge Work Datasets**|Desiree Heim 
et.al.|[2409.04286](http://arxiv.org/abs/2409.04286)|null|\u5f53\u524d\u516c\u5f00\u7684\u77e5\u8bc6\u5de5\u4f5c\u6570\u636e\u96c6\u5728\u591a\u6837\u6027\u3001\u8be6\u5c3d\u6ce8\u91ca\u4ee5\u53ca\u7528\u6237\u548c\u6587\u6863\u7684\u4e0a\u4e0b\u6587\u4fe1\u606f\u65b9\u9762\u5b58\u5728\u4e0d\u8db3\uff0c\u8fd9\u963b\u788d\u4e86\u5bf9\u77e5\u8bc6\u5de5\u4f5c\u8f85\u52a9\u7cfb\u7edf\u8fdb\u884c\u5ba2\u89c2\u548c\u53ef\u6bd4\u8f83\u7684\u6570\u636e\u9a71\u52a8\u8bc4\u4f30\u4e0e\u4f18\u5316\u3002\u7531\u4e8e\u5728\u771f\u5b9e\u73af\u5883\u4e2d\u6536\u96c6\u6b64\u7c7b\u6570\u636e\u6240\u9700\u7684\u8d44\u6e90\u5de8\u5927\uff0c\u4ee5\u53ca\u6570\u636e\u5ba1\u67e5\u7684\u5fc5\u8981\u6027\uff0c\u56e0\u6b64\u6784\u5efa\u8fd9\u6837\u7684\u6570\u636e\u96c6\u51e0\u4e4e\u4e0d\u53ef\u80fd\u5b9e\u73b0\u3002\u9274\u4e8e\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u53ef\u914d\u7f6e\u7684\u591a\u4ee3\u7406\u77e5\u8bc6\u5de5\u4f5c\u6570\u636e\u96c6\u751f\u6210\u5668\u3002\u8be5\u7cfb\u7edf\u6a21\u62df\u4e86\u7531\u751f\u6210\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u6587\u6863\u5e76\u76f8\u4e92\u534f\u4f5c\u7684\u4ee3\u7406\u4e4b\u95f4\u7684\u77e5\u8bc6\u5de5\u4f5c\uff0c\u5e76\u8bb0\u5f55\u4e86\u4f34\u968f\u7684\u6570\u636e\u8f68\u8ff9\u3002\u6b64\u5916\uff0c\u751f\u6210\u5668\u5728\u5176\u914d\u7f6e\u4e2d\u6355\u83b7\u6216\u5728\u6a21\u62df\u8fc7\u7a0b\u4e2d\u521b\u5efa\u7684\u6240\u6709\u80cc\u666f\u4fe1\u606f\uff0c\u5e76\u4ee5\u77e5\u8bc6\u56fe\u8c31\u7684\u5f62\u5f0f\u5b58\u50a8\u3002\u6700\u540e\uff0c\u4ea7\u751f\u7684\u6570\u636e\u96c6\u53ef\u4ee5\u7528\u4e8e\u5229\u7528\u548c\u5171\u4eab\uff0c\u800c\u65e0\u9700\u6d89\u53ca\u9690\u79c1\u6216\u673a\u5bc6\u95ee\u9898\u3002 
\u672c\u6587\u4ecb\u7ecd\u4e86\u6211\u4eec\u65b9\u6cd5\u7684\u8bbe\u8ba1\u613f\u666f\uff0c\u5e76\u4e13\u6ce8\u4e8e\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u751f\u6210\u771f\u5b9e\u7684\u77e5\u8bc6\u5de5\u4f5c\u6587\u6863\u3002\u6211\u4eec\u7684\u7814\u7a76\u4e2d\uff0c\u4eba\u7c7b\u8bc4\u4f30\u8005\u8bc4\u4f30\u4e86\u751f\u6210\u6587\u6863\u768453%\u548c\u771f\u5b9e\u6587\u6863\u768474%\uff0c\u8ba4\u4e3a\u5b83\u4eec\u5177\u6709\u771f\u5b9e\u6027\uff0c\u8fd9\u8868\u660e\u4e86\u6211\u4eec\u65b9\u6cd5\u7684\u6f5c\u529b\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5206\u6790\u4e86\u53c2\u4e0e\u8005\u8bc4\u8bba\u4e2d\u63d0\u5230\u7684\u771f\u5b9e\u6027\u6807\u51c6\uff0c\u5e76\u5bf9\u5df2\u8bc6\u522b\u7684\u5e38\u89c1\u95ee\u9898\u8fdb\u884c\u4e86\u8be6\u7ec6\u8bf4\u660e\uff0c\u63d0\u51fa\u4e86\u6539\u8fdb\u63aa\u65bd\u3002|\n", "2409.04270": "|**2024-09-06**|**Advancing Automated Knowledge Transfer in Evolutionary Multitasking via Large Language Models**|Yuxiao Huang et.al.|[2409.04270](http://arxiv.org/abs/2409.04270)|null|\u672c\u6587\u5f15\u5165\u4e86\u4e00\u79cd\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u4f18\u5316\u8303\u5f0f\uff0c\u4ee5\u5efa\u7acb\u4e00\u4e2a\u81ea\u4e3b\u6a21\u578b\u5de5\u5382\uff0c\u7528\u4e8e\u751f\u6210\u9002\u7528\u4e8e\u4e0d\u540c\u4f18\u5316\u4efb\u52a1\u7684\u77e5\u8bc6\u8f6c\u79fb\u6a21\u578b\u3002\u8fd9\u4e00\u65b9\u6cd5\u65e8\u5728\u901a\u8fc7\u81ea\u52a8\u5316\u8bbe\u8ba1\u8fc7\u7a0b\uff0c\u5b9e\u73b0\u9ad8\u6548\u4e14\u6709\u6548\u7684\u77e5\u8bc6\u8f6c\u79fb\u3002\u4e3a\u4e86\u8bc4\u4f30\u6240\u63d0\u51fa\u65b9\u6cd5\u7684\u6027\u80fd\uff0c\u6211\u4eec\u8fdb\u884c\u4e86\u5168\u9762\u7684\u5b9e\u9a8c\u7814\u7a76\uff0c\u5c06\u751f\u6210\u7684\u77e5\u8bc6\u8f6c\u79fb\u6a21\u578b\u4e0e\u73b0\u6709\u7684\u6700\u4f73\u77e5\u8bc6\u8f6c\u79fb\u65b9\u6cd5\u8fdb\u884c\u4e86\u6bd4\u8f83\u3002\u7ed3\u679c\u8868\u660e\uff0c\u751f\u6210\u7684\u6a21\u578b\u5728\u6548\u7387\u548c\u6709\u6548\u6027\u65b9\u9762\u5747\u8868\u7
3b0\u51fa\u4f18\u4e8e\u6216\u4e0e\u624b\u5de5\u8bbe\u8ba1\u7684\u77e5\u8bc6\u8f6c\u79fb\u6a21\u578b\u76f8\u5f53\u7684\u6027\u80fd\u3002|\n", "2409.04183": "|**2024-09-06**|**GALLa: Graph Aligned Large Language Models for Improved Source Code Understanding**|Ziyin Zhang et.al.|[2409.04183](http://arxiv.org/abs/2409.04183)|null|\u5728\u672c\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86GALLa - \u56fe\u5f62\u5bf9\u9f50\u5927\u578b\u8bed\u8a00\u6a21\u578b\u3002GALLa \u5229\u7528\u56fe\u795e\u7ecf\u7f51\u7edc\u548c\u8de8\u6a21\u6001\u5bf9\u9f50\u6280\u672f\uff0c\u5728\u5fae\u8c03\u8fc7\u7a0b\u4e2d\u5411LLM\u6ce8\u5165\u4ee3\u7801\u7684\u7ed3\u6784\u4fe1\u606f\u4f5c\u4e3a\u8f85\u52a9\u4efb\u52a1\u3002\u8fd9\u79cd\u6846\u67b6\u65e2\u65e0\u6a21\u578b\u4f9d\u8d56\u6027\u4e5f\u65e0\u4efb\u52a1\u4f9d\u8d56\u6027\uff0c\u5b83\u53ef\u4ee5\u5e94\u7528\u4e8e\u4efb\u4f55\u4ee3\u7801LLM\u7528\u4e8e\u4efb\u4f55\u4ee3\u7801\u4e0b\u6e38\u4efb\u52a1\uff0c\u5e76\u4ec5\u5728\u8bad\u7ec3\u65f6\u4ece\u4e0e\u5fae\u8c03\u6570\u636e\u65e0\u5173\u7684\u8bed\u6599\u5e93\u4e2d\u83b7\u53d6\u7ed3\u6784\u5316\u56fe\u5f62\u6570\u636e\uff0c\u800c\u5728\u63a8\u7406\u9636\u6bb5\u65e0\u9700\u989d\u5916\u6210\u672c\u3002\u901a\u8fc7\u56db\u79cd\u4e0d\u540c\u57fa\u7ebfLLM\uff08\u53c2\u6570\u91cf\u4ece3.5\u4ebf\u523080\u4ebf\u4e0d\u7b49\uff09\u5728\u4e94\u4e2a\u4ee3\u7801\u4efb\u52a1\u4e0a\u7684\u5b9e\u9a8c\u9a8c\u8bc1\u4e86GALLa\u7684\u6709\u6548\u6027\uff0c\u5373\u4f7f\u5bf9\u4e8e\u5f3a\u5927\u7684\u6a21\u578b\u5982LLaMA3\uff0c\u4e5f\u8bc1\u660e\u4e86\u5176\u4e00\u81f4\u6027\u6539\u8fdb\u3002|\n", "2409.04181": "|**2024-09-06**|**Combining LLMs and Knowledge Graphs to Reduce Hallucinations in Question Answering**|Larissa Pusch 
et.al.|[2409.04181](http://arxiv.org/abs/2409.04181)|null|\u81ea\u7136\u8bed\u8a00\u5904\u7406\u9886\u57df\u7684\u8fdb\u6b65\u6781\u5927\u5730\u6539\u53d8\u4e86\u6211\u4eec\u4e0e\u6570\u636e\u5e93\u7b49\u4fe1\u606f\u7cfb\u7edf\u7684\u4ea4\u4e92\u65b9\u5f0f\uff0c\u4f7f\u5176\u53d8\u5f97\u66f4\u52a0\u4fbf\u6377\u3002\u7136\u800c\uff0c\u5728\u5173\u952e\u51c6\u786e\u6027\u9886\u57df\uff0c\u5982\u751f\u7269\u533b\u5b66\u9886\u57df\uff0c\u4ecd\u5b58\u5728\u6311\u6218\u3002\u5176\u4e2d\u4e00\u4e2a\u91cd\u8981\u95ee\u9898\u662f\u5e7b\u89c9\u95ee\u9898\uff0c\u5373\u6a21\u578b\u751f\u6210\u4e86\u6570\u636e\u652f\u6301\u4e4b\u5916\u7684\u4fe1\u606f\uff0c\u8fd9\u53ef\u80fd\u5bfc\u81f4\u5371\u9669\u7684\u9519\u8bef\u4fe1\u606f\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u65e8\u5728\u901a\u8fc7\u7ed3\u5408\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u548c\u77e5\u8bc6\u56fe\u8c31\uff08KG\uff09\u6765\u6539\u5584\u95ee\u7b54\u7cfb\u7edf\u7684\u51c6\u786e\u6027\u548c\u53ef\u9760\u6027\uff0c\u4ee5\u751f\u7269\u533b\u5b66KG\u4e3a\u4f8b\u3002\u8be5\u65b9\u6cd5\u57fa\u4e8eLangChain\u6846\u67b6\u6784\u5efa\uff0c\u901a\u8fc7\u5f15\u5165\u67e5\u8be2\u68c0\u67e5\u5668\u786e\u4fddLLM\u751f\u6210\u7684\u67e5\u8be2\u5728\u8bed\u6cd5\u548c\u8bed\u4e49\u4e0a\u7684\u6709\u6548\u6027\uff0c\u7136\u540e\u4f7f\u7528\u8fd9\u4e9b\u67e5\u8be2\u4ece\u77e5\u8bc6\u56fe\u8c31\u4e2d\u63d0\u53d6\u4fe1\u606f\uff0c\u5927\u5e45\u51cf\u5c11\u4e86\u9519\u8bef\u5982\u5e7b\u89c9\u7684\u53d1\u751f\u3002 \u6211\u4eec\u4f7f\u7528\u4e86\u4e00\u4e2a\u5305\u542b50\u4e2a\u751f\u7269\u533b\u5b66\u95ee\u9898\u7684\u65b0\u57fa\u51c6\u6570\u636e\u96c6\u5bf9\u6574\u4f53\u6027\u80fd\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u6d4b\u8bd5\u4e86\u5305\u62ecGPT-4 Turbo\u548cllama3:70b\u5728\u5185\u7684\u51e0\u79cdLLM\u3002\u7ed3\u679c\u663e\u793a\uff0c\u867d\u7136GPT-4 
Turbo\u5728\u751f\u6210\u51c6\u786e\u67e5\u8be2\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5f00\u6e90\u6a21\u578b\u5982llama3:70b\u5728\u9002\u5f53\u7684\u95ee\u9898\u63d0\u793a\u5de5\u7a0b\u4e0b\u4e5f\u663e\u793a\u51fa\u6f5c\u529b\u3002\u4e3a\u4e86\u4f7f\u8fd9\u79cd\u65b9\u6cd5\u6613\u4e8e\u8bbf\u95ee\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2a\u7528\u6237\u53cb\u597d\u7684Web\u754c\u9762\uff0c\u5141\u8bb8\u7528\u6237\u8f93\u5165\u81ea\u7136\u8bed\u8a00\u67e5\u8be2\uff0c\u67e5\u770b\u751f\u6210\u548c\u4fee\u6b63\u7684Cypher\u67e5\u8be2\uff0c\u5e76\u9a8c\u8bc1\u7ed3\u679c\u8def\u5f84\u7684\u51c6\u786e\u6027\u3002 \u603b\u4f53\u800c\u8a00\uff0c\u8fd9\u79cd\u6df7\u5408\u65b9\u6cd5\u6709\u6548\u5730\u89e3\u51b3\u4e86\u6570\u636e\u7f3a\u53e3\u548c\u5e7b\u89c9\u7b49\u5e38\u89c1\u95ee\u9898\uff0c\u63d0\u4f9b\u4e86\u4e00\u4e2a\u53ef\u9760\u4e14\u76f4\u89c2\u7684\u89e3\u51b3\u65b9\u6848\u6765\u6539\u8fdb\u95ee\u7b54\u7cfb\u7edf\u3002\u751f\u6210\u672c\u6587\u7ed3\u679c\u548c\u7528\u6237\u754c\u9762\u6240\u9700\u6e90\u4ee3\u7801\u7684Git\u4ed3\u5e93\u94fe\u63a5\u5982\u4e0b\uff1ahttps://git.zib.de/lpusch/cyphergenkg-gui|\n", "2409.04168": "|**2024-09-06**|**From Calculation to Adjudication: Examining LLM judges on Mathematical Reasoning Tasks**|Andreas Stephan 
et.al.|[2409.04168](http://arxiv.org/abs/2409.04168)|null|\u4e3a\u4e86\u51cf\u5c11\u5bf9\u4eba\u5de5\u6807\u6ce8\u7684\u9700\u6c42\uff0c\u63d0\u51fa\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4f5c\u4e3a\u5019\u9009\u6a21\u578b\u8d28\u91cf\u7684\u8bc4\u5224\u8005\u3002\u8fd9\u4e9bLLM\u8bc4\u5224\u8005\u901a\u5e38\u901a\u8fc7\u5728\u6458\u8981\u6216\u673a\u5668\u7ffb\u8bd1\u7b49\u751f\u6210\u4efb\u52a1\u4e0a\u4e0e\u4eba\u7c7b\u5224\u65ad\u7684\u76f8\u5173\u6027\u6765\u8bc4\u4f30\u3002\u76f8\u6bd4\u4e4b\u4e0b\uff0c\u6211\u4eec\u7814\u7a76\u4e86\u5728\u6570\u5b66\u63a8\u7406\u4efb\u52a1\u4e0a\u7684LLM\u8bc4\u5224\u8005\u3002\u8fd9\u7c7b\u4efb\u52a1\u9700\u8981\u591a\u6b65\u63a8\u7406\uff0c\u5176\u89e3\u7b54\u7684\u6b63\u786e\u6027\u53ef\u4ee5\u9a8c\u8bc1\uff0c\u4ece\u800c\u63d0\u4f9b\u4e86\u4e00\u79cd\u66f4\u5ba2\u89c2\u7684\u8bc4\u4f30\u65b9\u5f0f\u3002\u6211\u4eec\u8fdb\u884c\u4e86\u8be6\u7ec6\u7684\u8868\u73b0\u5206\u6790\uff0c\u5e76\u53d1\u73b0\u4f7f\u7528\u7684\u8bc4\u5224\u8005\u5927\u591a\u65e0\u6cd5\u63d0\u9ad8\u4efb\u52a1\u6027\u80fd\uff0c\u4f46\u80fd\u591f\u9009\u62e9\u66f4\u597d\u7684\u6a21\u578b\u3002\u6211\u4eec\u7684\u5206\u6790\u63ed\u793a\u4e86\u8bc4\u5224\u8868\u73b0\u4e0e\u5019\u9009\u6a21\u578b\u4efb\u52a1\u8868\u73b0\u4e4b\u95f4\u7684\u5f3a\u76f8\u5173\u6027\u3002\u89c2\u5bdf\u5230\u8bc4\u5224\u8005\u503e\u5411\u4e8e\u9009\u62e9\u66f4\u9ad8\u8d28\u91cf\u7684\u6a21\u578b\uff0c\u5373\u4f7f\u5176\u7b54\u6848\u662f\u9519\u8bef\u7684\u3002\u8fdb\u4e00\u6b65\u5730\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u53ef\u4ee5\u901a\u8fc7\u7edf\u8ba1\u63aa\u65bd\uff0c\u5982\u5019\u9009\u6a21\u578b\u7684\u4efb\u52a1\u6027\u80fd\uff0c\u6765\u9884\u6d4b\u8bc4\u5224\u8868\u73b0\u3002\u5728\u6d88\u878d\u5b9e\u9a8c\u4e2d\uff0c\u6211\u4eec\u4ea4\u6362\u6216\u5c4f\u853d\u5019\u9009\u7b54\u6848\uff0c\u5e76\u89c2\u5bdf\u5230\u8bc4\u5224\u8005\u7ecf\u5e38\u4fdd\u6301\u539f\u59cb\u5224\u65ad\uff0c\u8fd9\u63d0\u4f9b\u4e86\u8bc1\u636e\u8868\u660e\u8bc4\u5224\u8005\u5728\
u5224\u65ad\u4e2d\u878d\u5165\u4e86\u5199\u4f5c\u98ce\u683c\u3002\u603b\u4e4b\uff0c\u6211\u4eec\u53d1\u73b0\u4f7f\u7528\u7edf\u8ba1\u6307\u6807\u91cf\u5316\u5224\u65ad\u4e2d\u7684\u89c4\u5f8b\u6027\uff0c\u5e76\u63d0\u4f9b\u4e86\u5229\u7528\u5b83\u4eec\u7684\u5404\u79cd\u89d2\u5ea6\u3002|\n", "2409.04164": "|**2024-09-06**|**Can OpenSource beat ChatGPT? -- A Comparative Study of Large Language Models for Text-to-Code Generation**|Luis Mayer et.al.|[2409.04164](http://arxiv.org/abs/2409.04164)|null|\u8fd1\u5e74\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4f5c\u4e3a\u4e00\u79cd\u5f3a\u5927\u7684\u5de5\u5177\uff0c\u5728\u591a\u4e2a\u9886\u57df\u5c55\u73b0\u51fa\u6f5c\u529b\uff0c\u5305\u62ec\u8f6f\u4ef6\u5de5\u7a0b\u3002\u5728\u672c\u7814\u7a76\u4e2d\uff0c\u6211\u4eec\u8bc4\u4f30\u4e86\u4e94\u6b3e\u6700\u5148\u8fdb\u7684LLM\u2014\u2014Bard\u3001BingChat\u3001ChatGPT\u3001Llama2\u548cCode Llama\u2014\u2014\u5728\u6587\u672c\u5230\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u4e0a\u7684\u80fd\u529b\u3002\u6211\u4eec\u901a\u8fc7\u5411\u6a21\u578b\u63d0\u4f9b\u6765\u81ea\u7f16\u7a0b\u7f51\u7ad9LeetCode\u7684\u7f16\u7801\u95ee\u9898\u63cf\u8ff0\u6587\u672c\u63d0\u793a\uff0c\u8981\u6c42\u5b83\u4eec\u7528Python\u7f16\u5199\u89e3\u51b3\u65b9\u6848\u3002\u968f\u540e\uff0c\u6211\u4eec\u4f7f\u7528LeetCode\u7684\u6d4b\u8bd5\u529f\u80fd\u6765\u8bc4\u4f30\u751f\u6210\u8f93\u51fa\u7684\u8d28\u91cf\u3002 \u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0c\u8fd9\u4e9b\u6a21\u578b\u5728\u6027\u80fd\u4e0a\u5b58\u5728\u663e\u8457\u5dee\u5f02\u3002ChatGPT\u5728\u5904\u7406\u8fd9\u7c7b\u7f16\u7a0b\u6311\u6218\u65b9\u9762\u8868\u73b0\u6700\u4e3a\u6709\u6548\uff0c\u751a\u81f3\u8d85\u8fc7\u4e86\u4e13\u95e8\u9488\u5bf9\u4ee3\u7801\u7684\u6a21\u578b\uff0c\u5982Code 
Llama\u3002\u4e3a\u4e86\u8fdb\u4e00\u6b65\u4e86\u89e3\u60c5\u51b5\uff0c\u6211\u4eec\u6d4b\u91cf\u4e86\u751f\u6210\u4ee3\u7801\u7684\u8fd0\u884c\u65f6\u95f4\u548c\u5185\u5b58\u4f7f\u7528\u60c5\u51b5\uff0c\u5e76\u5c06\u5176\u4e0eLeetCode\u4e0a\u7684\u5176\u4ed6\u4ee3\u7801\u63d0\u4ea4\u8fdb\u884c\u4e86\u6bd4\u8f83\u3002\u8be6\u7ec6\u9519\u8bef\u5206\u6790\u5305\u62ec\u6bd4\u8f83\u751f\u6210\u4ee3\u7801\u4e2d\u7684\u6b63\u786e\u7f29\u8fdb\u548c\u5f62\u5f0f\u5dee\u5f02\uff0c\u4ee5\u53ca\u5c06\u672a\u89e3\u51b3\u7684\u4efb\u52a1\u5f52\u7c7b\u5230\u7279\u5b9a\u9519\u8bef\u7c7b\u522b\uff0c\u6709\u52a9\u4e8e\u6211\u4eec\u66f4\u6df1\u5165\u5730\u7406\u89e3\u7ed3\u679c\u5e76\u627e\u5230\u6539\u8fdb\u7a7a\u95f4\u3002\u7814\u7a76\u7ed3\u679c\u8fd8\u663e\u793a\uff0c\u5f53\u6a21\u578b\u9762\u4e34\u5927\u91cf\u4e0a\u4e0b\u6587\u4fe1\u606f\u65f6\uff0c\u5373\u8f83\u957f\u63d0\u793a\u65f6\uff0c\u751f\u6210\u7684\u4ee3\u7801\u8d8a\u6765\u8d8a\u4e0d\u51c6\u786e\u3002|\n", "2409.05840": "|**2024-09-09**|**MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct**|Run Luo 
et.al.|[2409.05840](http://arxiv.org/abs/2409.05840)|null|\u5728\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u7684\u53d1\u5c55\u8fc7\u7a0b\u4e2d\uff0c\u6211\u4eec\u5df2\u7ecf\u53d6\u5f97\u4e86\u663e\u8457\u7684\u8fdb\u6b65\u3002\u7136\u800c\uff0c\u5728\u6570\u636e\u91cf\u548c\u6570\u636e\u8d28\u91cf\u65b9\u9762\u4ecd\u7136\u5b58\u5728\u5173\u952e\u74f6\u9888\u3002\u624b\u52a8\u521b\u5efa\u591a\u6a21\u6001\u6307\u4ee4\u6570\u636e\u65e2\u8017\u65f6\u53c8\u4f4e\u6548\uff0c\u5c24\u5176\u662f\u5728\u751f\u6210\u9ad8\u590d\u6742\u6027\u7684\u6307\u4ee4\u65f6\u3002\u6b64\u5916\uff0c\u4ece\u201c\u9ed1\u76d2\u201d\u5546\u4e1a\u6a21\u578b\uff08\u4f8b\u5982GPT-4o\u3001GPT-4V\uff09\u4e2d\u63d0\u53d6\u6307\u4ee4\u6570\u636e\u5f80\u5f80\u5bfc\u81f4\u751f\u6210\u7684\u6307\u4ee4\u6570\u636e\u8fc7\u4e8e\u7b80\u5355\uff0c\u8fd9\u9650\u5236\u4e86\u6a21\u578b\u6027\u80fd\u4ec5\u4e0e\u5176\u81ea\u8eab\u6c34\u5e73\u76f8\u5f53\u3002\u6784\u5efa\u591a\u6837\u6027\u548c\u590d\u6742\u6027\u6307\u4ee4\u6570\u636e\u7684\u6311\u6218\u4f9d\u7136\u5de8\u5927\u3002 
\u4e3a\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aMMEvol\u7684\u65b0\u9896\u591a\u6a21\u6001\u6307\u4ee4\u6570\u636e\u8fdb\u5316\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u7ed3\u5408\u4e86\u7cbe\u7ec6\u611f\u77e5\u6f14\u5316\u3001\u8ba4\u77e5\u63a8\u7406\u6f14\u5316\u4ee5\u53ca\u4e92\u52a8\u6f14\u5316\u3002\u8fd9\u4e00\u8fed\u4ee3\u65b9\u6cd5\u7a81\u7834\u4e86\u6570\u636e\u8d28\u91cf\u74f6\u9888\uff0c\u751f\u6210\u4e86\u4e00\u4e2a\u590d\u6742\u4e14\u591a\u6837\u5316\u7684\u56fe\u50cf-\u6587\u672c\u6307\u4ee4\u6570\u636e\u96c6\uff0c\u4ece\u800c\u589e\u5f3a\u4e86MLLMs\u7684\u80fd\u529b\u3002\u6211\u4eec\u4ee5\u521d\u59cb\u6307\u4ee4\u96c6\u5408SEED-163K\u4e3a\u57fa\u7840\uff0c\u5229\u7528MMEvol\u7cfb\u7edf\u5730\u6269\u5c55\u4e86\u6307\u4ee4\u7c7b\u578b\u7684\u591a\u6837\u6027\uff0c\u878d\u5165\u4e86\u589e\u5f3a\u8ba4\u77e5\u80fd\u529b\u7684\u63a8\u7406\u6b65\u9aa4\uff0c\u5e76\u4ece\u56fe\u50cf\u4e2d\u63d0\u53d6\u4e86\u8be6\u7ec6\u4fe1\u606f\u4ee5\u63d0\u9ad8\u89c6\u89c9\u7406\u89e3\u548c\u9c81\u68d2\u6027\u3002 \u4e3a\u4e86\u5168\u9762\u8bc4\u4f30\u6211\u4eec\u6570\u636e\u7684\u6709\u6548\u6027\uff0c\u6211\u4eec\u4f7f\u7528\u8fdb\u5316\u7684\u6570\u636e\u8bad\u7ec3\u4e86LLaVA-NeXT\uff0c\u5e76\u572813\u4e2a\u89c6\u89c9\u8bed\u8a00\u4efb\u52a1\u4e0a\u8fdb\u884c\u4e86\u5b9e\u9a8c\u3002\u4e0e\u57fa\u4e8e\u539f\u59cb\u6570\u636e\u8bad\u7ec3\u7684\u57fa\u7ebf\u76f8\u6bd4\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5e73\u5747\u63d0\u9ad8\u4e863.1\u70b9\u51c6\u786e\u7387\uff0c\u5e76\u57289\u4e2a\u4efb\u52a1\u4e0a\u8fbe\u5230\u4e86\u6700\u5148\u8fdb\u7684\u6027\u80fd\u6c34\u5e73\u3002|\n", "2409.05824": "|**2024-09-09**|**Are Large Language Models a Threat to Programming Platforms? 
An Exploratory Study**|Md Mustakim Billah et.al.|[2409.05824](http://arxiv.org/abs/2409.05824)|null|\u672c\u6587\u7814\u7a76\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5982ChatGPT\u3001Gemini\u548cMeta AI\u5728LeetCode\u3001Codeforces\u548cHackerRank\u7b49\u7ade\u8d5b\u7f16\u7a0b\u5e73\u53f0\u4e0a\u7684\u95ee\u9898\u89e3\u51b3\u80fd\u529b\u3002\u8fd9\u4e9b\u5e73\u53f0\u5e38\u88ab\u62db\u8058\u4eba\u5458\u7528\u6765\u7b5b\u9009\u7f16\u7a0b\u6280\u80fd\u3002\u968f\u7740LLM\u80fd\u529b\u7684\u63d0\u5347\uff0c\u5bf9\u5176\u5728\u4e0d\u540c\u96be\u5ea6\u7ea7\u522b\u3001\u5404\u7c7b\u522b\u7684\u7f16\u7a0b\u6311\u6218\u4e2d\u7684\u8868\u73b0\u8fdb\u884c\u8bc4\u4f30\u53d8\u5f97\u5c24\u4e3a\u91cd\u8981\u3002 \u7814\u7a76\u56e2\u961f\u4eceLeetCode\u9009\u53d6\u4e8698\u4e2a\u95ee\u9898\uff0c\u4eceCodeforces\u9009\u53d6\u4e86126\u4e2a\u95ee\u9898\uff0c\u8986\u76d6\u4e8615\u4e2a\u7c7b\u522b\u3002\u901a\u8fc7\u4e5d\u573a\u5728\u7ebfCodeforces\u548cLeetCode\u7ade\u8d5b\u4ee5\u53caHackerRank\u7684\u4e24\u9879\u8ba4\u8bc1\u6d4b\u8bd5\uff0c\u5bf9LLM\u7684\u5b9e\u65f6\u6027\u80fd\u8fdb\u884c\u4e86\u8bc4\u4f30\u3002\u7814\u7a76\u8fc7\u7a0b\u4e2d\u4f7f\u7528\u4e86\u63d0\u793a\u548c\u53cd\u9988\u673a\u5236\u6765\u5f15\u5bfcLLM\uff0c\u5e76\u63a2\u7d22\u4e86\u4e0d\u540c\u573a\u666f\u4e4b\u95f4\u7684\u76f8\u5173\u6027\u3002 
\u7ed3\u679c\u663e\u793a\uff0cChatGPT\u7b49LLM\u5728LeetCode\u548cHackerRank\u7684\u8ba4\u8bc1\u6d4b\u8bd5\u4e2d\u8868\u73b0\u51fa\u8272\uff08\u6210\u529f\u7387\u4e3a71.43%\uff09\uff0c\u4f46\u5728\u865a\u62df\u7ade\u8d5b\u4e2d\uff0c\u7279\u522b\u662f\u5728Codeforces\u7684\u9ad8\u96be\u5ea6\u6bd4\u8d5b\u4e2d\uff0c\u5b83\u4eec\u7684\u8868\u73b0\u4e0d\u5c3d\u5982\u4eba\u610f\u3002\u5c3d\u7ba1\u5728LeetCode\u6863\u6848\u5e93\u4e2d\u7684\u7528\u6237\u4e2d\u8868\u73b0\u4f18\u4e8e\u90e8\u5206\u7528\u6237\uff0c\u4f46LLM\u5728\u65f6\u95f4\u6548\u7387\u548c\u5185\u5b58\u6548\u7387\u4e0a\u8868\u73b0\u7a81\u51fa\uff0c\u800c\u5728\u66f4\u56f0\u96be\u7684Codeforces\u7ade\u8d5b\u4e2d\u5219\u5904\u4e8e\u52a3\u52bf\u3002 \u5c3d\u7ba1\u5f53\u524d\u60c5\u51b5\u5e76\u672a\u7acb\u5373\u6784\u6210\u5a01\u80c1\uff0c\u4f46LLM\u5728\u8fd9\u4e9b\u5e73\u53f0\u4e0a\u7684\u8868\u73b0\u4ee4\u4eba\u62c5\u5fe7\uff0c\u672a\u6765\u9700\u8981\u6539\u8fdb\u4ee5\u63d0\u9ad8\u5176\u6027\u80fd\u3002|\n", "2409.05806": "|**2024-09-09**|**Benchmarking Chinese Knowledge Rectification in Large Language Models**|Tianhe Lu 
et.al.|[2409.05806](http://arxiv.org/abs/2409.05806)|**[link](https://github.com/zjunlp/easyedit)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5c55\u73b0\u51fa\u60ca\u4eba\u7684\u751f\u6210\u80fd\u529b\uff0c\u4f46\u5b83\u4eec\u5e76\u975e\u6ca1\u6709\u7f3a\u9677\uff0c\u7279\u522b\u662f\u5b58\u5728\u5e7b\u89c9\u7684\u95ee\u9898\u3002\u5f53LLM\u5e94\u7528\u4e8e\u7279\u5b9a\u8bed\u8a00\u548c\u9886\u57df\u65f6\uff0c\u8fd9\u4e00\u95ee\u9898\u5c24\u4e3a\u7a81\u51fa\u3002\u4f8b\u5982\uff0c\u5728\u5904\u7406\u4e2d\u56fd\u53e4\u4ee3\u8bd7\u6b4c\u3001\u8c1a\u8bed\u6216\u6210\u8bed\u65f6\uff0cLLM\u53ef\u80fd\u4f1a\u751f\u6210\u6beb\u65e0\u610f\u4e49\u7684\u4fe1\u606f\uff0c\u8fd9\u662f\u7531\u4e8e\u7f3a\u4e4f\u7279\u5b9a\u77e5\u8bc6\u9020\u6210\u7684\u3002\u4e3a\u6b64\uff0c\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u9488\u5bf9LLM\u7684\u57fa\u51c6\uff0c\u901a\u8fc7\u77e5\u8bc6\u7f16\u8f91\u6765\u7ea0\u6b63\u4e2d\u6587\u77e5\u8bc6\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u901a\u8fc7\u4ece\u5404\u79cd\u6765\u6e90\u6536\u96c6\u4e03\u79cd\u7c7b\u578b\u7684\u77e5\u8bc6\uff0c\u5305\u62ec\u53e4\u5178\u6587\u672c\u3001\u6210\u8bed\u4ee5\u53ca\u6765\u81ea\u767e\u5ea6\u8d34\u5427\u201c\u6c42\u8bf8\u5bb6\u201d\u7684\u5185\u5bb9\uff0c\u6784\u5efa\u4e86\u4e00\u4e2a\u65b0\u7684\u4e2d\u6587\u6570\u636e\u96c6CKnowEdit\uff0c\u4ee5\u5e94\u5bf9\u4e2d\u6587\u8bed\u8a00\u7279\u6709\u7684\u590d\u8c03\u6027\u3001\u53cd\u8bbd\u6027\u548c\u903b\u8f91\u7ed3\u6784\u3002\u901a\u8fc7\u5bf9\u8fd9\u4e2a\u6570\u636e\u96c6\u7684\u5206\u6790\uff0c\u6211\u4eec\u63ed\u793a\u4e86\u5f53\u524dLLM\u5728\u638c\u63e1\u4e2d\u6587\u65b9\u9762\u7684\u6311\u6218\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5728\u8be5\u6570\u636e\u96c6\u4e0a\u5bf9\u73b0\u6709\u7684\u77e5\u8bc6\u7f16\u8f91\u6280\u672f\u8fdb\u884c\u8bc4\u4f30\uff0c\u53d1\u73b0\u5bf9\u4e2d\u6587\u77e5\u8bc6\u7684\u4fee\u6b63\u4ecd\u5b58\u5728\u5de8\u5927\u7684\u63d0\u5347\u7a7a\u95f4\u3002\u4ee3\u7801\u548c\u6570\u636e\u96c6\u53ef\u8bbf\u95ee\uff1aht
tps://github.com/zjunlp/EasyEdit\u3002**|\n", "2409.05771": "|**2024-09-09**|**Evidence from fMRI Supports a Two-Phase Abstraction Process in Language Models**|Emily Cheng et.al.|[2409.05771](http://arxiv.org/abs/2409.05771)|null|\u7814\u7a76\u5df2\u53cd\u590d\u8bc1\u660e\uff0c\u4ece\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4e2d\u63d0\u53d6\u7684\u4e2d\u95f4\u9690\u85cf\u72b6\u6001\u80fd\u591f\u9884\u6d4b\u5bf9\u81ea\u7136\u8bed\u8a00\u523a\u6fc0\u7684\u6d4b\u91cf\u5927\u8111\u53cd\u5e94\u3002\u7136\u800c\uff0c\u5173\u4e8e\u4f7f\u8fd9\u4e00\u9ad8\u9884\u6d4b\u6027\u80fd\u6210\u4e3a\u53ef\u80fd\u7684\u8868\u793a\u7279\u6027\u7684\u4e86\u89e3\u975e\u5e38\u6709\u9650\u3002\u4e3a\u4ec0\u4e48\u662f\u4e2d\u95f4\u5c42\u800c\u4e0d\u662f\u8f93\u51fa\u5c42\u5728\u8fd9\u4e00\u72ec\u7279\u4e14\u9ad8\u5ea6\u901a\u7528\u7684\u8f6c\u79fb\u4efb\u52a1\u4e2d\u6700\u4e3a\u6709\u6548\uff1f\u5728\u8fd9\u9879\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u529f\u80fd\u6027\u78c1\u5171\u632f\u6210\u50cf\u4e2d\u7684\u8bed\u8a00\u7f16\u7801\u6a21\u578b\u8bc1\u636e\u652f\u6301\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5185\u5b58\u5728\u4e24\u4e2a\u9636\u6bb5\u62bd\u8c61\u8fc7\u7a0b\u7684\u5b58\u5728\u3002\u6211\u4eec\u4f7f\u7528\u6d41\u5f62\u5b66\u4e60\u65b9\u6cd5\u8868\u660e\uff0c\u8fd9\u79cd\u62bd\u8c61\u8fc7\u7a0b\u81ea\u7136\u5730\u5728\u8bed\u8a00\u6a21\u578b\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u4ea7\u751f\uff0c\u5e76\u4e14\u968f\u7740\u8bad\u7ec3\u7ee7\u7eed\u8fdb\u884c\uff0c\u8fd9\u4e2a\u62bd\u8c61\u8fc7\u7a0b\u7684\u7b2c\u4e00\u4e2a\u201c\u7ec4\u5408\u201d\u9636\u6bb5\u88ab\u538b\u7f29\u5230\u66f4\u5c11\u7684\u5c42\u4e2d\u3002\u6700\u540e\uff0c\u6211\u4eec\u8bc1\u660e\u4e86\u5c42\u6b21\u7f16\u7801\u6027\u80fd\u4e0e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8868\u793a\u7684\u5185\u5728\u7ef4\u5ea6\u4e4b\u95f4\u5b58\u5728\u5f3a\u70c8\u7684\u5bf9\u5e94\u5173\u7cfb\u3002\u6211\u4eec\u521d\u6b65\u8bc1\u636e\u8868\u660e\uff0c\u8fd9\u79cd\u5bf9\u5e94\u5173\u7cfb\u4e3b\u8981\u6765\u6e90\u4e8e\u5927\u578b
\u8bed\u8a00\u6a21\u578b\u7684\u5185\u5728\u7ec4\u5408\u6027\uff0c\u800c\u975e\u5176\u4e0b\u4e00\u4e2a\u5355\u8bcd\u9884\u6d4b\u5c5e\u6027\u3002|\n", "2409.05768": "|**2024-09-09**|**Model Input Verification of Large Scale Simulations**|Rumyana Neykova et.al.|[2409.05768](http://arxiv.org/abs/2409.05768)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u7528\u4e8e\u9a8c\u8bc1\u6a21\u62df\u8f93\u5165\u6570\u636e\u6709\u6548\u6027\u7684\u65b9\u6cd5\u8bba\uff0c\u6211\u4eec\u5c06\u5176\u79f0\u4e3a\u6a21\u578b\u8f93\u5165\u9a8c\u8bc1\uff08MIV\uff09\u3002\u6211\u4eec\u901a\u8fc7\u8bbe\u8ba1\u7279\u5b9a\u4e8e\u6a21\u62df\u5efa\u6a21\u9700\u6c42\u7684\u6570\u636e\u6a21\u5f0f\u548c\u9a8c\u8bc1\u5de5\u5177\u5728\u540d\u4e3aFabGuard\u7684\u5de5\u5177\u96c6\u4e2d\u5b9e\u73b0\u4e86\u8fd9\u4e00\u65b9\u6cd5\u3002\u672c\u6587\u5f15\u5165\u4e86MIV\u6a21\u5f0f\u7684\u6b63\u5f0f\u5206\u7c7b\uff0c\u5e76\u63d0\u4f9b\u4e86\u4e00\u4e2a\u96c6\u6210\u5230\u73b0\u6709\u6a21\u62df\u5de5\u4f5c\u6d41\u7a0b\u4e2d\u7684\u7b80\u5316\u9a8c\u8bc1\u7ba1\u9053\u3002FabGuard\u5728\u4e09\u4e2a\u4e0d\u540c\u9886\u57df\u2014\u2014\u51b2\u7a81\u9a71\u52a8\u7684\u4eba\u53e3\u8fc1\u79fb\u3001\u707e\u5bb3\u758f\u6563\u4ee5\u53ca\u75be\u75c5\u4f20\u64ad\u6a21\u578b\u2014\u2014\u7684\u5e94\u7528\u5f97\u5230\u4e86\u5c55\u793a\u3002\u6211\u4eec\u8fd8\u63a2\u8ba8\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u81ea\u52a8\u5316\u7ea6\u675f\u751f\u6210\u548c\u63a8\u7406\u65b9\u9762\u7684\u5e94\u7528\u3002\u5728\u5bf9\u4e00\u4e2a\u79fb\u6c11\u6a21\u62df\u6848\u4f8b\u7684\u7814\u7a76\u4e2d\uff0cLLMs\u4e0d\u4ec5\u6b63\u786e\u63a8\u65ad\u51fa\u4e8623\u4e2a\u5f00\u53d1\u8005\u5b9a\u4e49\u7684\u7ea6\u675f\u4e2d\u768422\u4e2a\uff0c\u800c\u4e14\u8fd8\u53d1\u73b0\u4e86\u73b0\u6709\u7ea6\u675f\u4e2d\u7684\u9519\u8bef\uff0c\u5e76\u63d0\u51fa\u4e86\u65b0\u7684\u6709\u6548\u7ea6\u675f\u3002\u6211\u4eec\u7684\u8bc4\u4f30\u8868\u660e\uff0c\u5bf9\u4e8e\u5927\u578b\u6570\u636e\u96c6\uff0cMIV\u662f\u53ef\u884c\u7684\u
ff0cFabGuard\u80fd\u591f\u5728140\u79d2\u5185\u9ad8\u6548\u5904\u740612,000\u4e2a\u8f93\u5165\u6587\u4ef6\uff0c\u5e76\u4e14\u5176\u6027\u80fd\u5728\u4e0d\u540c\u6587\u4ef6\u5927\u5c0f\u4e0b\u4fdd\u6301\u4e00\u81f4\u3002|\n", "2409.05747": "|**2024-09-09**|**A Novel Idea Generation Tool using a Structured Conversational AI (CAI) System**|B. Sankar et.al.|[2409.05747](http://arxiv.org/abs/2409.05747)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u578b\u7684\u3001\u57fa\u4e8e\u5bf9\u8bdd\u7684\u4eba\u5de5\u667a\u80fd\u6fc0\u6d3b\u521b\u65b0\u754c\u9762\uff0c\u4f5c\u4e3a\u521b\u610f\u751f\u6210\u5de5\u5177\uff0c\u65e8\u5728\u5e2e\u52a9\u521d\u5b66\u8005\u8bbe\u8ba1\u8005\u7f13\u89e3\u901a\u5e38\u5b58\u5728\u7684\u521d\u59cb\u5ef6\u8fdf\u548c\u521b\u65b0\u74f6\u9888\u95ee\u9898\u3002\u8fd9\u662f\u4e00\u4e2a\u52a8\u6001\u3001\u4e92\u52a8\u4e14\u4e0a\u4e0b\u6587\u54cd\u5e94\u5f0f\u7684\u89e3\u51b3\u65b9\u6848\uff0c\u79ef\u6781\u5730\u5229\u7528\u4eba\u5de5\u667a\u80fd\u9886\u57df\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u4e2d\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\uff0c\u4ee5\u751f\u6210\u9488\u5bf9\u4e0d\u540c\u8bbe\u8ba1\u95ee\u9898\u7684\u591a\u4e2a\u6f5c\u5728\u60f3\u6cd5\u8868\u8ff0\u3002\u5c06\u6b64\u7c7bAI\u6a21\u578b\u4e0e\u521b\u65b0\u8fc7\u7a0b\u7ed3\u5408\uff0c\u6211\u4eec\u79f0\u4e4b\u4e3a\u201c\u6fc0\u6d3b\u521b\u65b0\u201d\u60c5\u666f\uff0c\u65e8\u5728\u4fc3\u8fdb\u57fa\u4e8e\u5bf9\u8bdd\u7684\u8fde\u7eed\u4e92\u52a8\u3001\u4e0a\u4e0b\u6587\u76f8\u5173\u7684\u5bf9\u8bdd\u4ee5\u53ca\u5927\u91cf\u7684\u60f3\u6cd5\u751f\u6210\u3002 
\u4e3a\u4e86\u9a8c\u8bc1\u8fd9\u4e00\u5de5\u5177\u7684\u6709\u6548\u6027\uff0c\u6211\u4eec\u5bf930\u540d\u521d\u5b66\u8005\u8bbe\u8ba1\u5e08\u8fdb\u884c\u4e86\u8bd5\u70b9\u7814\u7a76\uff0c\u8ba9\u4ed6\u4eec\u4f7f\u7528\u4f20\u7edf\u65b9\u6cd5\u548c\u65b0\u7684\u57fa\u4e8eCAI\u7684\u754c\u9762\u6765\u4e3a\u7ed9\u5b9a\u95ee\u9898\u751f\u6210\u60f3\u6cd5\u3002\u901a\u8fc7\u4e13\u5bb6\u5c0f\u7ec4\u5bf9\u7ed3\u679c\u8fdb\u884c\u7684\u5b9a\u6027\u6bd4\u8f83\uff0c\u6211\u4eec\u91c7\u7528\u4e86\u6d41\u7545\u5ea6\u3001\u65b0\u9896\u6027\u548c\u591a\u6837\u6027\u4f5c\u4e3a\u5173\u952e\u53c2\u6570\u3002\u7814\u7a76\u53d1\u73b0\uff0c\u6240\u63d0\u51fa\u7684\u5de5\u5177\u80fd\u591f\u6709\u6548\u5730\u4ea7\u751f\u5927\u91cf\u3001\u591a\u6837\u4e14\u65b0\u9896\u7684\u60f3\u6cd5\u3002 \u4e3a\u4e86\u63d0\u9ad8\u754c\u9762\u7684\u53ef\u7528\u6027\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u7ed3\u6784\u5316\u7684\u5bf9\u8bdd\u6a21\u5f0f\uff0c\u4e3a\u6bcf\u4e2a\u521b\u65b0\u9636\u6bb5\u8bbe\u8ba1\u4e86\u63d0\u793a\u5de5\u7a0b\u5316\u7ed3\u6784\uff0c\u4f7f\u5176\u66f4\u52a0\u7edf\u4e00\u548c\u65b9\u4fbf\u8bbe\u8ba1\u5e08\u64cd\u4f5c\u3002\u91c7\u7528\u8fd9\u79cd\u7ed3\u6784\u5316\u7684CAI\u754c\u9762\u540e\uff0c\u5f97\u5230\u7684\u54cd\u5e94\u66f4\u52a0\u7b80\u6d01\uff0c\u5e76\u4e14\u4e0e\u968f\u540e\u7684\u8bbe\u8ba1\u9636\u6bb5\uff0c\u5373\u6982\u5ff5\u5316\u9636\u6bb5\uff0c\u66f4\u52a0\u7d27\u5bc6\u76f8\u5173\u3002 \u7efc\u4e0a\u6240\u8ff0\uff0c\u672c\u6587\u8bc1\u660e\u4e86\u751f\u6210\u5f0f\u4eba\u5de5\u667a\u80fd\uff08Gen-AI\uff09\u5728\u521b\u610f\u4ea7\u54c1\u8bbe\u8ba1\u8fc7\u7a0b\u7684\u65e9\u671f\u3001\u7ed3\u6784\u4e0d\u660e\u786e\u9636\u6bb5\u7684\u5e94\u7528\u6f5c\u529b\u3002|\n", "2409.05746": "|**2024-09-09**|**LLMs Will Always Hallucinate, and We Need to Live With This**|Sourav Banerjee 
et.al.|[2409.05746](http://arxiv.org/abs/2409.05746)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u5404\u4e2a\u9886\u57df\u7684\u5e7f\u6cdb\u5e94\u7528\uff0c\u6df1\u5165\u63a2\u8ba8\u5b83\u4eec\u5185\u5728\u5c40\u9650\u6027\u53d8\u5f97\u81f3\u5173\u91cd\u8981\u3002\u672c\u6587\u63d0\u51fa\uff0c\u8bed\u8a00\u6a21\u578b\u4e2d\u7684\u5e7b\u89c9\u5e76\u975e\u5076\u7136\u9519\u8bef\uff0c\u800c\u662f\u8fd9\u4e9b\u7cfb\u7edf\u56fa\u6709\u7684\u7279\u5f81\u3002\u6211\u4eec\u901a\u8fc7\u8ba1\u7b97\u7406\u8bba\u548c\u54e5\u5fb7\u5c14\u7b2c\u4e00\u4e0d\u5b8c\u5168\u6027\u5b9a\u7406\u7684\u5f15\u7528\uff08\u6d89\u53caHalting\u3001Emptiness\u548cAcceptance\u95ee\u9898\u7684\u4e0d\u53ef\u5224\u5b9a\u6027\uff09\uff0c\u5c55\u793a\u4e86\u5e7b\u89c9\u6e90\u4e8eLLM\u7684\u57fa\u672c\u6570\u5b66\u548c\u903b\u8f91\u7ed3\u6784\u3002\u56e0\u6b64\uff0c\u901a\u8fc7\u67b6\u6784\u6539\u8fdb\u3001\u6570\u636e\u96c6\u589e\u5f3a\u6216\u4e8b\u5b9e\u6838\u67e5\u673a\u5236\u6d88\u9664\u5e7b\u89c9\u662f\u4e0d\u53ef\u80fd\u7684\u3002 \u6211\u4eec\u7684\u5206\u6790\u8868\u660e\uff0c\u4ece\u8bad\u7ec3\u6570\u636e\u7f16\u8bd1\u5230\u4e8b\u5b9e\u68c0\u7d22\u3001\u610f\u56fe\u5206\u7c7b\u548c\u6587\u672c\u751f\u6210\u7684\u6bcf\u4e2a\u9636\u6bb5\uff0c\u90fd\u5b58\u5728\u4ea7\u751f\u5e7b\u89c9\u7684\u975e\u96f6\u6982\u7387\u3002\u7531\u6b64\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u7ed3\u6784\u6027\u5e7b\u89c9\u7684\u6982\u5ff5\uff0c\u4f5c\u4e3a\u8fd9\u4e9b\u7cfb\u7edf\u7684\u56fa\u6709\u7279\u6027\u3002\u901a\u8fc7\u5efa\u7acb\u5e7b\u89c9\u7684\u6570\u5b66\u786e\u5b9a\u6027\uff0c\u672c\u6587\u6311\u6218\u4e86\u5e7b\u89c9\u53ef\u4ee5\u5b8c\u5168\u907f\u514d\u7684\u4f20\u7edf\u89c2\u70b9\u3002|\n", "2409.05735": "|**2024-09-09**|**A System and Benchmark for LLM-based Q\\&A on Heterogeneous Data**|Achille Fokoue 
et.al.|[2409.05735](http://arxiv.org/abs/2409.05735)|null|\u5728\u8bb8\u591a\u5de5\u4e1a\u73af\u5883\u4e2d\uff0c\u7528\u6237\u5e0c\u671b\u4ee5\u81ea\u7136\u8bed\u8a00\u5f62\u5f0f\u63d0\u51fa\u95ee\u9898\uff0c\u5e76\u4ece\u7ed3\u6784\u5316\u6570\u636e\u6e90\uff08\u5982\u7535\u5b50\u8868\u683c\u3001\u6570\u636e\u5e93\u3001API\u6216\u5b83\u4eec\u7684\u7ec4\u5408\uff09\u4e2d\u83b7\u53d6\u7b54\u6848\u3002\u901a\u5e38\u60c5\u51b5\u4e0b\uff0c\u7528\u6237\u5e76\u4e0d\u77e5\u9053\u5982\u4f55\u8bc6\u522b\u6216\u8bbf\u95ee\u6b63\u786e\u7684\u6570\u636e\u6e90\u3002\u5982\u679c\u9700\u8981\u7ec4\u88c5\u591a\u4e2a\uff08\u751a\u81f3\u53ef\u80fd\u662f\u9694\u79bb\u7684\uff09\u6570\u636e\u6e90\u6765\u5f97\u51fa\u7b54\u6848\uff0c\u8fd9\u4e2a\u95ee\u9898\u4f1a\u53d8\u5f97\u66f4\u52a0\u590d\u6742\u3002\u6700\u8fd1\uff0c\u4e00\u4e9b\u4f9d\u8d56\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u6587\u672c\u5230SQL\u5e94\u7528\u5df2\u89e3\u51b3\u4e86\u4e00\u4e9b\u8fd9\u4e9b\u95ee\u9898\uff0c\u901a\u8fc7\u4f7f\u7528\u6237\u80fd\u591f\u7528\u81ea\u7136\u8bed\u8a00\u63d0\u51fa\u95ee\u9898\u3002\u7136\u800c\uff0c\u5728\u73b0\u5b9e\u7684\u5de5\u4e1a\u573a\u666f\u4e2d\uff0c\u8fd9\u4e9b\u5e94\u7528\u4ecd\u7136\u4e0d\u5b9e\u7528\uff0c\u56e0\u4e3a\u5b83\u4eec\u65e0\u6cd5\u5e94\u5bf9\u5178\u578b\u73af\u5883\u4e2d\u6570\u636e\u6e90\u7684\u5f02\u8d28\u6027\u3002\u672c\u6587\u65e8\u5728\u901a\u8fc7\u5f15\u5165siwarex\u5e73\u53f0\u89e3\u51b3\u5f02\u8d28\u6027\u95ee\u9898\uff0c\u8be5\u5e73\u53f0\u5141\u8bb8\u65e0\u7f1d\u5730\u4f7f\u7528\u81ea\u7136\u8bed\u8a00\u8bbf\u95ee\u6570\u636e\u5e93\u548cAPI\u3002 
\u4e3a\u4e86\u5c55\u793asiwarex\u7684\u6709\u6548\u6027\uff0c\u6211\u4eec\u6269\u5c55\u4e86\u6d41\u884c\u7684Spider\u6570\u636e\u96c6\u5e76\u8fdb\u884c\u57fa\u51c6\u6d4b\u8bd5\uff0c\u901a\u8fc7\u66ff\u6362\u5176\u4e2d\u7684\u4e00\u4e9b\u8868\u683c\u4e3a\u6570\u636e\u68c0\u7d22API\u3002\u6211\u4eec\u53d1\u73b0siwarex\u5f88\u597d\u5730\u5e94\u5bf9\u4e86\u6570\u636e\u6e90\u5f02\u8d28\u6027\u7684\u95ee\u9898\u3002\u6211\u4eec\u4fee\u6539\u540e\u7684Spider\u57fa\u51c6\u5f88\u5feb\u5c06\u5bf9\u7814\u7a76\u793e\u533a\u5f00\u653e\u3002|\n", "2409.05732": "|**2024-09-09**|**Towards Democratizing Multilingual Large Language Models For Medicine Through A Two-Stage Instruction Fine-tuning Approach**|Meng Zhou et.al.|[2409.05732](http://arxiv.org/abs/2409.05732)|null|\u591a\u8bed\u8a00\u5f00\u6e90\u533b\u7597\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5177\u6709\u670d\u52a1\u4e8e\u4e0d\u540c\u5730\u533a\u8bed\u8a00\u591a\u6837\u6027\u7684\u6f5c\u529b\u3002\u5c06\u901a\u7528LLMs\u9002\u5e94\u4e8e\u533b\u7597\u9886\u57df\u901a\u5e38\u9700\u8981\u6301\u7eed\u9884\u8bad\u7ec3\uff0c\u4f46\u8fd9\u5728\u8ba1\u7b97\u4e0a\u6210\u672c\u9ad8\u6602\u4e14\u6709\u65f6\u4e0d\u53ef\u884c\u3002\u4ec5\u901a\u8fc7\u6307\u4ee4\u5fae\u8c03\u7279\u5b9a\u4efb\u52a1\u53ef\u80fd\u65e0\u6cd5\u4fdd\u8bc1\u6700\u4f73\u6027\u80fd\uff0c\u56e0\u4e3a\u7f3a\u4e4f\u5e7f\u6cdb\u9886\u57df\u77e5\u8bc6\u4f7f\u5f97\u6a21\u578b\u96be\u4ee5\u5728\u5404\u79cd\u573a\u666f\u4e0b\u7406\u89e3\u548c\u63a8\u7406\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e9b\u6311\u6218\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e24\u4e2a\u591a\u8bed\u8a00\u6307\u4ee4\u5fae\u8c03\u6570\u636e\u96c6\uff1aMMed-IFT\u548cMMed-IFT-MC\uff0c\u8fd9\u4e24\u4e2a\u6570\u636e\u96c6\u5206\u522b\u5305\u542b\u4e86\u8d85\u8fc720\u4e07\u6761\u9ad8\u8d28\u91cf\u7684\u591a\u8bed\u79cd\u533b\u7597\u6837\u672c\uff0c\u5728\u516d\u79cd\u8bed\u8a00\u4e2d\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u4e24\u9636\u6bb5\u8bad\u7ec3\u8303\u5f0f\u
ff1a\u7b2c\u4e00\u9636\u6bb5\u5229\u7528MMed-IFT\u6ce8\u5165\u901a\u7528\u533b\u5b66\u77e5\u8bc6\uff0c\u7b2c\u4e8c\u9636\u6bb5\u5219\u4f7f\u7528MMed-IFT-MC\u5fae\u8c03\u9488\u5bf9\u7279\u5b9a\u4efb\u52a1\u7684\u591a\u9879\u9009\u62e9\u9898\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u82f1\u8bed\u548c\u591a\u8bed\u8a00\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u5747\u53d6\u5f97\u4e86\u7ade\u4e89\u529b\u7684\u7ed3\u679c\uff0c\u5b9e\u73b0\u4e86\u9ad8\u6548\u6027\u548c\u6027\u80fd\u4e4b\u95f4\u7684\u5e73\u8861\u3002\u6211\u4eec\u8ba1\u5212\u5728\u672a\u6765\u5c06\u6211\u4eec\u7684\u6570\u636e\u96c6\u548c\u6a21\u578b\u6743\u91cd\u516c\u5f00\u5728https://github.com/SpassMed/Med-Llama3\u3002|\n", "2409.05703": "|**2024-09-09**|**The Influence of Task and Group Disparities over Users' Attitudes Toward Using Large Language Models for Psychotherapy**|Qihang He 
et.al.|[2409.05703](http://arxiv.org/abs/2409.05703)|null|\u8fd1\u5e74\u6765\uff0c\u5fc3\u7406\u5065\u5eb7\u969c\u788d\u60a3\u8005\u7684\u6570\u91cf\u6301\u7eed\u589e\u957f\uff0c\u800c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u4e0d\u540c\u9886\u57df\u7684\u8fdb\u6b65\u4e5f\u4f7f\u5f97\u57fa\u4e8eLLM\u7684\u5fc3\u7406\u6cbb\u7597\u5f15\u8d77\u4e86\u8d8a\u6765\u8d8a\u591a\u7684\u5173\u6ce8\u3002\u7136\u800c\uff0c\u5f71\u54cd\u7528\u6237\u5bf9\u57fa\u4e8eLLM\u5fc3\u7406\u6cbb\u7597\u5de5\u5177\u6001\u5ea6\u7684\u56e0\u7d20\u9c9c\u6709\u63a2\u8ba8\u3002\u672c\u6587\u4f5c\u4e3a\u9996\u6b21\u5c1d\u8bd5\uff0c\u65e8\u5728\u7814\u7a76\u4efb\u52a1\u5dee\u5f02\u548c\u7fa4\u4f53\u5dee\u5f02\u5bf9\u7528\u6237\u5bf9\u57fa\u4e8eLLM\u5fc3\u7406\u6cbb\u7597\u5de5\u5177\u7684\u6001\u5ea6\u7684\u5f71\u54cd\u3002\u901a\u8fc7\u8fd0\u7528\u6280\u672f\u63a5\u53d7\u6a21\u578b\uff08TAM\uff09\u548c\u81ea\u52a8\u5316\u63a5\u53d7\u6a21\u578b\uff08AAM\uff09\uff0c\u7ed3\u5408\u5728\u7ebf\u95ee\u5377\u8c03\u67e5\uff0c\u6211\u4eec\u6536\u96c6\u5e76\u5206\u6790\u4e86\u6765\u81ea\u4e2d\u56fd\u5927\u9646222\u540d\u57fa\u4e8eLLM\u5fc3\u7406\u6cbb\u7597\u5de5\u5177\u7528\u6237\u7684\u53cd\u9988\u3002\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0c\u7fa4\u4f53\u5dee\u5f02\uff08\u5373\u5fc3\u7406\u5065\u5eb7\u72b6\u51b5\uff09\u53ef\u4ee5\u5f71\u54cd\u7528\u6237\u5bf9LLM\u5de5\u5177\u7684\u6001\u5ea6\u3002\u8fdb\u4e00\u6b65\u5730\uff0c\u4f5c\u4e3a\u5178\u578b\u4efb\u52a1\u5dee\u5f02\u4e4b\u4e00\u7684\u9690\u79c1\u987e\u8651\uff0c\u5e76\u672a\u53d1\u73b0\u5bf9\u4fe1\u4efb\u5ea6\u548c\u4f7f\u7528\u610f\u56fe\u4ea7\u751f\u663e\u8457\u5f71\u54cd\u3002\u8fd9\u4e9b\u53d1\u73b0\u53ef\u6307\u5bfc\u672a\u6765\u57fa\u4e8eLLM\u5fc3\u7406\u6cbb\u7597\u670d\u52a1\u7684\u8bbe\u8ba1\u5de5\u4f5c\u3002|\n", "2409.06679": "|**2024-09-10**|**E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning**|Zihan Liao 
et.al.|[2409.06679](http://arxiv.org/abs/2409.06679)|null|\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u9886\u57df\uff0c\u5904\u7406\u957f\u6587\u672c\u4e0a\u4e0b\u6587\u7684\u80fd\u529b\u5bf9\u4e8e\u591a\u8f6e\u5bf9\u8bdd\u3001\u4ee3\u7801\u751f\u6210\u548c\u6587\u6863\u6458\u8981\u7b49\u4efb\u52a1\u6108\u53d1\u91cd\u8981\u3002\u672c\u6587\u63a2\u8ba8\u4e86\u589e\u5f3a\u957f\u6587\u672c\u4e0a\u4e0b\u6587\u6027\u80fd\u3001\u964d\u4f4e\u8ba1\u7b97\u590d\u6742\u6027\u4ee5\u53ca\u5145\u5206\u5229\u7528\u9884\u8bad\u7ec3\u6a21\u578b\u6240\u9762\u4e34\u7684\u6311\u6218\u2014\u2014\u5373\u6240\u8c13\u7684\u201c\u4e0d\u53ef\u80fd\u4e09\u89d2\u201d\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aE2LLM\uff08\u7f16\u7801\u5668\u6269\u5c55\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff09\u7684\u521b\u65b0\u65b9\u6cd5\uff0c\u65e8\u5728\u6709\u6548\u89e3\u51b3\u8fd9\u4e00\u6096\u8bba\u3002 \u8be5\u65b9\u6cd5\u7684\u6838\u5fc3\u601d\u60f3\u662f\u5c06\u957f\u6587\u672c\u4e0a\u4e0b\u6587\u5212\u5206\u4e3a\u591a\u4e2a\u7247\u6bb5\uff0c\u5e76\u901a\u8fc7\u9884\u8bad\u7ec3\u7684\u6587\u672c\u7f16\u7801\u5668\u5c06\u6bcf\u4e2a\u7247\u6bb5\u538b\u7f29\u4e3a\u5d4c\u5165\u5411\u91cf\u3002\u7136\u540e\u5229\u7528\u9002\u914d\u5668\u5c06\u8fd9\u4e9b\u8868\u793a\u4e0e\u89e3\u7801\u5668\u578bLLM\u5bf9\u9f50\uff0c\u4ee5\u4fc3\u8fdb\u5bf9\u8f6f\u63d0\u793a\u7684\u7406\u89e3\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e24\u4e2a\u8bad\u7ec3\u76ee\u6807\uff1a\u4e00\u662f\u91cd\u5efa\u7f16\u7801\u5668\u8f93\u51fa\uff0c\u4e8c\u662f\u9488\u5bf9\u957f\u6587\u672c\u6307\u4ee4\u8fdb\u884c\u5fae\u8c03\uff0c\u4ee5\u5e2e\u52a9LLM\u7406\u89e3\u8f6f\u63d0\u793a\u3002 
\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cE2LLM\u5728\u957f\u6587\u672c\u4e0a\u4e0b\u6587\u573a\u666f\u4e2d\u53d6\u5f97\u4e86\u663e\u8457\u7684\u6027\u80fd\u63d0\u5347\uff0c\u540c\u65f6\u4fdd\u6301\u4e86\u6548\u7387\u3001\u6027\u80fd\u548c\u4e0e\u9884\u8bad\u7ec3\u6a21\u578b\u7684\u517c\u5bb9\u6027\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u7684\u6846\u67b6\u4ee3\u8868\u4e86\u9886\u57df\u5185\u7684\u91cd\u5927\u8fdb\u5c55\uff0c\u4e3a\u6709\u6548\u7684\u5927\u6587\u672c\u5efa\u6a21\u505a\u51fa\u4e86\u8d21\u732e\u3002|\n", "2409.06666": "|**2024-09-10**|**LLaMA-Omni: Seamless Speech Interaction with Large Language Models**|Qingkai Fang et.al.|[2409.06666](http://arxiv.org/abs/2409.06666)|**[link](https://github.com/ictnlp/llama-omni)**|**\u9488\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u901a\u8fc7\u8bed\u97f3\u5b9e\u73b0\u5b9e\u65f6\u4ea4\u4e92\u7684\u80fd\u529b\u63d0\u5347\uff0c\u76f8\u8f83\u4e8e\u4f20\u7edf\u7684\u6587\u672c\u4ea4\u4e92\u65b9\u5f0f\uff0c\u6a21\u578b\u5982GPT-4\u663e\u8457\u589e\u5f3a\u4e86\u7528\u6237\u4f53\u9a8c\u3002\u7136\u800c\uff0c\u5f53\u524d\u5728\u57fa\u4e8e\u5f00\u6e90LLM\u6784\u5efa\u8bed\u97f3\u4ea4\u4e92\u6a21\u578b\u65b9\u9762\u4ecd\u7f3a\u4e4f\u6df1\u5165\u63a2\u7d22\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u578b\u6a21\u578b\u67b6\u6784\u2014\u2014LLaMA-Omni\uff0c\u65e8\u5728\u5b9e\u73b0\u4f4e\u5ef6\u8fdf\u4e0e\u9ad8\u8d28\u91cf\u7684\u8bed\u97f3\u4e0eLLM\u4ea4\u4e92\u3002\u8be5\u67b6\u6784\u878d\u5408\u4e86\u9884\u8bad\u7ec3\u7684\u8bed\u97f3\u7f16\u7801\u5668\u3001\u8bed\u97f3\u9002\u914d\u5668\u3001LLM\u548c\u6d41\u5f0f\u8bed\u97f3\u89e3\u7801\u5668\uff0c\u65e0\u9700\u8fdb\u884c\u8bed\u97f3\u8f6c\u5f55\uff0c\u5373\u53ef\u76f4\u63a5\u4ece\u8bed\u97f3\u6307\u4ee4\u751f\u6210\u6587\u672c\u548c\u8bed\u97f3\u54cd\u5e94\uff0c\u54cd\u5e94\u901f\u5ea6\u6781\u5feb\u3002 
\u6211\u4eec\u7684\u6a21\u578b\u57fa\u4e8e\u6700\u65b0\u7684Llama-3.1-8B-Instruct\u6a21\u578b\u6784\u5efa\uff0c\u5e76\u9488\u5bf9\u8bed\u97f3\u4ea4\u4e92\u573a\u666f\u6784\u5efa\u4e86\u4e00\u4e2a\u540d\u4e3aInstructS2S-200K\u7684\u6570\u636e\u96c6\uff0c\u5176\u4e2d\u5305\u542b\u4e8620\u4e07\u6761\u8bed\u97f3\u6307\u4ee4\u53ca\u5176\u5bf9\u5e94\u7684\u8bed\u97f3\u56de\u5e94\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u4e0e\u4ee5\u5f80\u7684\u8bed\u97f3\u8bed\u8a00\u6a21\u578b\u76f8\u6bd4\uff0cLLaMA-Omni\u5728\u5185\u5bb9\u4e0e\u98ce\u683c\u4e0a\u63d0\u4f9b\u4e86\u66f4\u597d\u7684\u54cd\u5e94\uff0c\u54cd\u5e94\u5ef6\u8fdf\u4f4e\u81f3226\u6beb\u79d2\u3002\u6b64\u5916\uff0c\u8bad\u7ec3LLaMA-Omni\u4ec5\u9700\u4e0d\u52303\u5929\u7684\u65f6\u95f4\uff0c\u57284\u5757GPU\u4e0a\u5373\u53ef\u5b8c\u6210\uff0c\u8fd9\u4e3a\u672a\u6765\u9ad8\u6548\u5f00\u53d1\u8bed\u97f3\u8bed\u8a00\u6a21\u578b\u94fa\u5e73\u4e86\u9053\u8def\u3002**|\n", "2409.06653": "|**2024-09-10**|**Human Perception of LLM-generated Text Content in Social Media Environments**|Kristina Radivojevic 
et.al.|[2409.06653](http://arxiv.org/abs/2409.06653)|null|\u65b0\u5174\u6280\u672f\uff0c\u5c24\u5176\u662f\u4eba\u5de5\u667a\u80fd\uff08AI\uff09\u548c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\uff0c\u4e3a\u6076\u610f\u884c\u4e3a\u8005\u63d0\u4f9b\u4e86\u64cd\u7eb5\u6570\u5b57\u5bf9\u8bdd\u7684\u5f3a\u5927\u5de5\u5177\u3002LLM\u6709\u53ef\u80fd\u5f71\u54cd\u4f20\u7edf\u5f62\u5f0f\u7684\u6c11\u4e3b\u53c2\u4e0e\uff0c\u4f8b\u5982\u9009\u6c11\u9009\u62e9\u3001\u653f\u5e9c\u8c03\u67e5\u6216\u4e0e\u76d1\u7ba1\u673a\u6784\u7684\u5728\u7ebf\u4ea4\u6d41\uff0c\u56e0\u4e3a\u673a\u5668\u4eba\u80fd\u591f\u751f\u6210\u5927\u91cf\u53ef\u4fe1\u6587\u672c\u3002\u4e3a\u4e86\u7814\u7a76\u4eba\u7c7b\u5bf9LLM\u751f\u6210\u5185\u5bb9\u7684\u611f\u77e5\uff0c\u6211\u4eec\u62db\u52df\u4e86\u8d85\u8fc71000\u540d\u53c2\u4e0e\u8005\uff0c\u7136\u540e\u8ba9\u4ed6\u4eec\u5c1d\u8bd5\u5728\u793e\u4ea4\u5a92\u4f53\u8ba8\u8bba\u7ebf\u7a0b\u4e2d\u533a\u5206\u673a\u5668\u4eba\u4e0e\u4eba\u7c7b\u5e16\u5b50\u3002\u6211\u4eec\u53d1\u73b0\u4eba\u7c7b\u5728\u8bc6\u522b\u793e\u4ea4\u5a92\u4f53\u4e0a\u7684\u771f\u5b9e\u7528\u6237\u5e16\u5b50\u65b9\u9762\u8868\u73b0\u4e0d\u4f73\u3002\u6211\u4eec\u4e5f\u53d1\u73b0\u4e86\u4eba\u7c7b\u5728\u793e\u4ea4\u5a92\u4f53\u5bf9\u8bdd\u4e2d\u8bc6\u522bLLM\u751f\u6210\u6587\u672c\u5185\u5bb9\u7684\u6a21\u5f0f\u3002\u6700\u540e\uff0c\u6211\u4eec\u89c2\u5bdf\u5230\u4e86\u201c\u602a\u5f02\u8c37\u201d\u6548\u5e94\u5728\u6587\u672c\u5bf9\u8bdd\u4e2d\u7684\u5b58\u5728\uff0c\u65e0\u8bba\u662f\u5728\u611f\u77e5\u8fd8\u662f\u8bc6\u522b\u8fc7\u7a0b\u4e2d\u3002\u8fd9\u8868\u660e\u5c3d\u7ba1\u4eba\u7c7b\u5728\u8bc6\u522b\u8fc7\u7a0b\u4e2d\u7684\u8868\u73b0\u4e0d\u4f73\uff0c\u4f46\u5f53\u9605\u8bfbLLM\u751f\u6210\u7684\u5185\u5bb9\u65f6\uff0c\u4ed6\u4eec\u4ecd\u80fd\u611f\u53d7\u5230\u4e0d\u9002\u3002|\n", "2409.06646": "|**2024-09-10**|**Optimal Workload Placement on Multi-Instance GPUs**|Bekir Turkkan 
et.al.|[2409.06646](http://arxiv.org/abs/2409.06646)|null|\u672c\u6587\u65e8\u5728\u63a2\u8ba8\u5982\u4f55\u4f18\u5316\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4e3a\u57fa\u7840\u7684AI\u63a8\u7406\u5de5\u4f5c\u8d1f\u8f7d\u5728GPU\u4e0a\u7684\u90e8\u7f72\u3002\u6211\u4eec\u9996\u5148\u8bc6\u522b\u5e76\u9610\u8ff0\u4e86\u5b9e\u8df5\u4e2d\u9047\u5230\u7684\u4e00\u4e9b\u9700\u8981\u9ad8\u6548\u5206\u914d\u6216\u8fc1\u79fb\u5de5\u4f5c\u8d1f\u8f7d\u5230\u5176\u4ed6GPU\u4ee5\u817e\u51fa\u7a7a\u95f4\u4f9b\u65b0\u5de5\u4f5c\u8d1f\u8f7d\u4f7f\u7528\u7684\u60c5\u51b5\u3002\u76ee\u6807\u662f\u5c3d\u53ef\u80fd\u51cf\u5c11\u4f7f\u7528\u7684GPU\u6570\u91cf\uff0c\u5e76\u8fdb\u4e00\u6b65\u964d\u4f4e\u88ab\u5229\u7528GPU\u4e2d\u7684\u5185\u5b58\u548c\u8ba1\u7b97\u6d6a\u8d39\u3002 \u4e3a\u4e86\u5b9e\u73b0\u8fd9\u4e00\u76ee\u6807\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e24\u79cd\u65b9\u6cd5\uff1a\u4e00\u79cd\u662f\u4f18\u5316\u65b9\u6cd5\uff0c\u53e6\u4e00\u79cd\u662f\u542f\u53d1\u5f0f\u65b9\u6cd5\u3002\u6211\u4eec\u4f7f\u7528\u4e24\u79cd\u5de5\u4f5c\u8d1f\u8f7d\u8c03\u5ea6\u542f\u53d1\u5f0f\u7b97\u6cd5\u5bf9\u591a\u79cd\u7528\u4f8b\u8fdb\u884c\u4e86\u57fa\u51c6\u6d4b\u8bd5\u3002\u7ed3\u679c\u663e\u793a\uff0c\u5728\u4e0e\u57fa\u7ebf\u542f\u53d1\u5f0f\u76f8\u6bd4\u7684\u60c5\u51b5\u4e0b\uff0c\u6211\u4eec\u80fd\u591f\u8282\u7701\u9ad8\u8fbe2.85\u500d\u7684GPU\u4f7f\u7528\u91cf\uff0c\u4ee5\u53ca\u9ad8\u8fbe70%\u7684GPU\u6d6a\u8d39\u3002 \u6211\u4eec\u8ba1\u5212\u8ba9SRE\uff08\u7cfb\u7edf\u53ef\u9760\u6027\u5de5\u7a0b\uff09\u793e\u533a\u80fd\u591f\u5728\u751f\u4ea7\u73af\u5883\u4e2d\u5229\u7528\u6211\u4eec\u7684\u63d0\u8bae\u65b9\u6cd5\u3002|\n", "2409.06635": "|**2024-09-10**|**MoWE-Audio: Multitask AudioLLMs with Mixture of Weak Encoders**|Wenyu Zhang 
et.al.|[2409.06635](http://arxiv.org/abs/2409.06635)|null|\u5feb\u901f\u53d1\u5c55\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u663e\u8457\u63d0\u9ad8\u4e86\u81ea\u7136\u8bed\u8a00\u5904\u7406\u80fd\u529b\uff0c\u4fc3\u8fdb\u4e86\u97f3\u9891LLM\u7684\u53d1\u5c55\uff0c\u8fd9\u4e9b\u6a21\u578b\u80fd\u591f\u7406\u89e3\u8bed\u97f3\u548c\u97f3\u9891\u8f93\u5165\u3002\u73b0\u6709\u7684\u97f3\u9891LLM\u901a\u5e38\u7ed3\u5408\u9884\u8bad\u7ec3\u7684\u97f3\u9891\u7f16\u7801\u5668\u4e0e\u6587\u672c\u9884\u8bad\u7ec3\u7684LLM\uff0c\u5e76\u5728\u7279\u5b9a\u7684\u97f3\u9891\u4efb\u52a1\u4e0a\u8fdb\u884c\u5fae\u8c03\u3002\u7136\u800c\uff0c\u9884\u8bad\u7ec3\u7684\u97f3\u9891\u7f16\u7801\u5668\u7684\u5bb9\u91cf\u6709\u9650\uff0c\u65e0\u6cd5\u6355\u83b7\u65b0\u4efb\u52a1\u548c\u6570\u636e\u96c6\u4e2d\u7684\u7279\u5f81\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u5c06\u201c\u5f31\u201d\u7f16\u7801\u5668\u6df7\u5408\uff08MoWE\uff09\u878d\u5165\u97f3\u9891LLM\u6846\u67b6\u3002MoWE\u901a\u8fc7\u5728\u57fa\u672c\u7f16\u7801\u5668\u57fa\u7840\u4e0a\u8865\u5145\u4e00\u7ec4\u76f8\u5bf9\u8f83\u8f7b\u91cf\u7ea7\u7684\u7f16\u7801\u5668\uff0c\u6839\u636e\u97f3\u9891\u8f93\u5165\u52a8\u6001\u6fc0\u6d3b\u4ee5\u589e\u5f3a\u7279\u5f81\u63d0\u53d6\uff0c\u540c\u65f6\u907f\u514d\u663e\u8457\u589e\u52a0\u6a21\u578b\u5927\u5c0f\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cMoWE\u6709\u6548\u63d0\u9ad8\u4e86\u591a\u4efb\u52a1\u6027\u80fd\uff0c\u4f7f\u97f3\u9891LLM\u80fd\u591f\u5e94\u7528\u4e8e\u66f4\u591a\u6837\u5316\u7684\u97f3\u9891\u4efb\u52a1\u3002|\n", "2409.06624": "|**2024-09-10**|**A Practice of Post-Training on Llama-3 70B with Optimal Selection of Additional Language Mixture Ratio**|Ningyuan Xi 
et.al.|[2409.06624](http://arxiv.org/abs/2409.06624)|null|\u672c\u6587\u7814\u7a76\u4e86\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u6301\u7eed\u9884\u8bad\u7ec3\uff08CPT\uff09\u8fc7\u7a0b\u4e2d\uff0c\u5982\u4f55\u901a\u8fc7\u989d\u5916\u8bed\u8a00\u6df7\u5408\u6bd4\uff08ALMR\uff09\u548c\u5b66\u4e60\u7387\uff08LR\uff09\u4e4b\u95f4\u7684\u6700\u4f18\u76f8\u5173\u6027\uff0c\u63d0\u5347\u6a21\u578b\u5728\u4e2d\u6587\u53ca\u5176\u4ed6\u7279\u5b9a\u9886\u57df\u7684\u6027\u80fd\u3002\u9488\u5bf98B\u5927\u5c0f\u7684Llama-3\u6a21\u578b\uff0c\u6211\u4eec\u8fdb\u884c\u4e86\u6df1\u5165\u7814\u7a76\uff0c\u786e\u5b9a\u4e86\u5b9e\u9a8c\u8bbe\u7f6e\u4e2d\u7684\u5173\u952e\u8d85\u53c2\u6570\uff0c\u5e76\u901a\u8fc7\u7cbe\u7ec6\u8c03\u6574\uff0c\u663e\u8457\u63d0\u5347\u4e86\u6a21\u578b\u5728\u4e2d\u6587\u76f8\u5173\u7684\u57fa\u51c6\u6d4b\u8bd5\u4ee5\u53ca\u6570\u5b66\u3001\u7f16\u7a0b\u548c\u60c5\u7eea\u667a\u80fd\u7b49\u7279\u5b9a\u9886\u57df\u7684\u80fd\u529b\u3002\u6700\u7ec8\uff0c\u6211\u4eec\u5c0670B\u5927\u5c0f\u7684LLM\u90e8\u7f72\u5230\u5b9e\u9645\u804a\u5929\u7cfb\u7edf\u4e2d\uff0c\u5e76\u53d6\u5f97\u4e86\u4ee4\u4eba\u6ee1\u610f\u7684\u6548\u679c\u3002|\n", "2409.06601": "|**2024-09-10**|**Alleviating Hallucinations in Large Language Models with Scepticism Modeling**|Yetao Wu 
et.al.|[2409.06601](http://arxiv.org/abs/2409.06601)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u9762\u4e34\u7684\u4e3b\u8981\u6311\u6218\u662f\u5e7b\u89c9\u73b0\u8c61\uff0c\u8fd9\u963b\u788d\u4e86\u5176\u5728\u591a\u4e2a\u9886\u57df\u7684\u5e94\u7528\u3002\u4e0d\u786e\u5b9a\u6027\u4f30\u8ba1\u53ef\u4ee5\u88ab\u7528\u4e8e\u7f13\u89e3\u5e7b\u89c9\u5e26\u6765\u7684\u635f\u5bb3\u3002\u4eba\u7c7b\u7684\u6000\u7591\u60c5\u7eea\u88ab\u8ba4\u4e3a\u80fd\u589e\u5f3a\u81ea\u6211\u8bc4\u4f30\u7684\u80fd\u529b\u3002\u57fa\u4e8e\u8fd9\u4e00\u89c2\u5bdf\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201c\u8d28\u7591\u5efa\u6a21\u201d\uff08SM\uff09\u7684\u65b0\u65b9\u6cd5\u3002\u8fd9\u4e00\u65b9\u6cd5\u901a\u8fc7\u7ed3\u5408\u8bcd\u5143\u548clogits\u4fe1\u606f\u6765\u8fdb\u884c\u81ea\u6211\u8bc4\u4f30\u800c\u5f97\u5230\u5f62\u5f0f\u5316\u3002\u6211\u4eec\u6784\u5efa\u4e86\u5305\u542b\u6000\u7591\u60c5\u7eea\u610f\u8bc6\u7684\u6570\u636e\u96c6\uff0c\u5e76\u8fdb\u884c\u8fde\u7eed\u9884\u8bad\u7ec3\uff0c\u7136\u540e\u5bf9LLM\u8fdb\u884c\u5fae\u8c03\uff0c\u4ece\u800c\u63d0\u5347\u5b83\u4eec\u81ea\u6211\u8bc4\u4f30\u7684\u80fd\u529b\u3002\u5b9e\u9a8c\u7ed3\u679c\u8bc1\u660e\u4e86\u8fd9\u79cd\u65b9\u6cd5\u6709\u6548\u589e\u5f3a\u4e86\u6a21\u578b\u4f30\u7b97\u4e0d\u786e\u5b9a\u6027\u7684\u80fd\u529b\uff0c\u5e76\u901a\u8fc7\u8de8\u9886\u57df\u5b9e\u9a8c\u9a8c\u8bc1\u4e86\u5176\u5728\u5176\u4ed6\u4efb\u52a1\u4e2d\u7684\u6cdb\u5316\u80fd\u529b\u3002|\n", "2409.06595": "|**2024-09-10**|**GroUSE: A Benchmark to Evaluate Evaluators in Grounded Question Answering**|Sacha Muller 
et.al.|[2409.06595](http://arxiv.org/abs/2409.06595)|**[link](https://github.com/illuin-tech/grouse)**|\u672c\u6587\u63a2\u8ba8\u4e86\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e0e\u79c1\u6709\u4e14\u66f4\u65b0\u81f3\u6700\u65b0\u7684\u77e5\u8bc6\u5e93\u76f8\u7ed3\u5408\u7684\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u8303\u5f0f\u65f6\u9762\u4e34\u7684\u6311\u6218\u3002\u6211\u4eec\u7279\u522b\u5173\u6ce8\u8bc4\u4f30\u7531RAG\u7cfb\u7edf\u751f\u6210\u7684\u57fa\u4e8e\u73b0\u5b9e\u7684\u7b54\u6848\u65f6\uff0c\u4f5c\u4e3a\u88c1\u5224\u7684LLM\u6240\u9047\u5230\u7684\u95ee\u9898\u3002\u4e3a\u4e86\u8bc4\u4f30\u88c1\u5224\u6a21\u578b\u7684\u6821\u51c6\u548c\u533a\u5206\u80fd\u529b\uff0c\u6211\u4eec\u8bc6\u522b\u4e867\u79cd\u751f\u6210\u5668\u5931\u8d25\u6a21\u5f0f\uff0c\u5e76\u5f15\u5165\u4e86GroUSE\uff08\u57fa\u4e8e\u95ee\u9898\u89e3\u7b54\u7684\u5143\u8bc4\u4f30\u57fa\u51c6\uff09\uff0c\u8fd9\u662f\u4e00\u4e2a\u5305\u542b144\u4e2a\u5355\u5143\u6d4b\u8bd5\u7684\u5143\u8bc4\u4f30\u57fa\u51c6\u3002\u8fd9\u4e2a\u57fa\u51c6\u63ed\u793a\u4e86\u73b0\u6709\u7684\u81ea\u52a8\u5316RAG\u8bc4\u4f30\u6846\u67b6\u5f80\u5f80\u5ffd\u89c6\u4e86\u91cd\u8981\u5931\u8d25\u6a21\u5f0f\uff0c\u5373\u4f7f\u5728\u4f7f\u7528GPT-4\u4f5c\u4e3a\u88c1\u5224\u7684\u60c5\u51b5\u4e0b\u4e5f\u662f\u5982\u6b64\u3002 
\u4e3a\u4e86\u6539\u8fdb\u5f53\u524d\u81ea\u52a8\u5316RAG\u8bc4\u4f30\u6846\u67b6\u7684\u8bbe\u8ba1\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u7ba1\u9053\uff0c\u5e76\u53d1\u73b0\u5c01\u95ed\u6a21\u578b\u5728GroUSE\u4e0a\u8868\u73b0\u826f\u597d\uff0c\u800c\u6700\u5148\u8fdb\u7684\u5f00\u6e90\u88c1\u5224\u6a21\u578b\u5728\u6211\u4eec\u7684\u63d0\u8bae\u6807\u51c6\u4e0b\u5e76\u672a\u8868\u73b0\u51fa\u826f\u597d\u7684\u6cdb\u5316\u80fd\u529b\uff0c\u5c3d\u7ba1\u5b83\u4eec\u4e0eGPT-4\u7684\u5224\u65ad\u9ad8\u5ea6\u76f8\u5173\u3002\u6211\u4eec\u7684\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0c\u4e0eGPT-4\u7684\u76f8\u5173\u6027\u662f\u4e00\u4e2a\u4e0d\u5b8c\u6574\u7684\u4ee3\u7406\u6307\u6807\uff0c\u7528\u4e8e\u8861\u91cf\u88c1\u5224\u6a21\u578b\u7684\u5b9e\u9645\u6027\u80fd\uff0c\u5e76\u5e94\u8be5\u901a\u8fc7\u5bf9\u53c2\u8003\u60c5\u51b5\u7684\u7cbe\u786e\u5931\u8d25\u6a21\u5f0f\u68c0\u6d4b\u8fdb\u884c\u8865\u5145\u8bc4\u4f30\u3002 \u8fdb\u4e00\u6b65\u7684\u7814\u7a76\u663e\u793a\uff0c\u901a\u8fc7\u5728GPT-4\u7684\u63a8\u7406\u75d5\u8ff9\u4e0a\u5bf9Llama-3\u8fdb\u884c\u5fae\u8c03\uff0c\u663e\u8457\u63d0\u5347\u4e86\u5176\u8bc4\u4f30\u80fd\u529b\uff0c\u4e0d\u4ec5\u63d0\u9ad8\u4e86\u4e0eGPT-4\u8bc4\u4ef7\u7684\u76f8\u5173\u6027\u548c\u53c2\u8003\u60c5\u51b5\u7684\u6821\u51c6\u5ea6\u3002|\n", "2409.06558": "|**2024-09-10**|**MAPS: Energy-Reliability Tradeoff Management in Autonomous Vehicles Through LLMs Penetrated Science**|Mahdieh Aliazam 
et.al.|[2409.06558](http://arxiv.org/abs/2409.06558)|null|\u968f\u7740\u81ea\u52a8\u9a7e\u9a76\u8f66\u8f86\u7684\u65e5\u76ca\u666e\u53ca\uff0c\u5bf9\u9ad8\u5ea6\u7cbe\u786e\u548c\u9ad8\u6548\u7684\u7cfb\u7edf\u7684\u9700\u6c42\u4e5f\u5728\u4e0d\u65ad\u589e\u957f\uff0c\u4ee5\u63d0\u5347\u5b89\u5168\u6027\u80fd\u3001\u64cd\u4f5c\u6548\u7387\u548c\u80fd\u6e90\u6d88\u8017\u3002\u5728\u7ba1\u7406\u80fd\u6e90\u4e0e\u53ef\u9760\u6027\u4e4b\u95f4\u7684\u6743\u8861\u65f6\uff0c\u9884\u6d4b\u8f66\u8f86\u8fd0\u884c\u671f\u95f4\u7684\u5404\u79cd\u6761\u4ef6\u53d8\u5f97\u5c24\u4e3a\u91cd\u8981\u3002\u8fd1\u5e74\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u6539\u8fdb\u4ee5\u53ca\u77e5\u540d\u6a21\u578b\u5982ChatGPT\u7684\u51fa\u73b0\uff0c\u4e3a\u81ea\u52a8\u9a7e\u9a76\u76f8\u5173\u9884\u6d4b\u63d0\u4f9b\u4e86\u72ec\u7279\u7684\u673a\u4f1a\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aMAPS\u7684\u65b9\u6cd5\uff0c\u5229\u7528LLMs\u4f5c\u4e3a\u5730\u56fe\u9605\u8bfb\u8f85\u52a9\u9a7e\u9a76\u5458\uff0c\u9884\u6d4b\u5728\u81ea\u52a8\u9a7e\u9a76\u8f66\u8f86\u64cd\u4f5c\u8fc7\u7a0b\u4e2d\u8bbe\u7f6e\u7684\u5173\u952e\u53c2\u6570\uff0c\u4ee5\u5e73\u8861\u80fd\u6e90\u4e0e\u53ef\u9760\u6027\u4e4b\u95f4\u7684\u6743\u8861\u3002MAPS\u65b9\u6cd5\u5728\u5bfc\u822a\u7cbe\u5ea6\u65b9\u9762\u76f8\u8f83\u4e8e\u6700\u4f73\u57fa\u7ebf\u65b9\u6cd5\u63d0\u9ad8\u4e8620%\u3002\u6b64\u5916\uff0cMAPS\u8fd8\u663e\u793a\u4e86\u5728\u8ba1\u7b97\u5355\u5143\u4e0a\u8282\u7701\u4e8611%\u7684\u80fd\u6e90\uff0c\u5e76\u5728\u673a\u68b0\u548c\u8ba1\u7b97\u5355\u5143\u4e0a\u6700\u9ad8\u8282\u7701\u4e8654%\u3002|\n", "2409.06518": "|**2024-09-10**|**Questioning Internal Knowledge Structure of Large Language Models Through the Lens of the Olympic Games**|Juhwan Choi 
et.al.|[2409.06518](http://arxiv.org/abs/2409.06518)|**[link](https://github.com/c-juhwan/olympics_analysis)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u9886\u57df\u5df2\u7ecf\u6210\u4e3a\u4e3b\u5bfc\u6027\u65b9\u6cd5\uff0c\u7136\u800c\u5b83\u4eec\u7684\u5185\u90e8\u77e5\u8bc6\u7ed3\u6784\u4ecd\u7136\u672a\u88ab\u5145\u5206\u63a2\u7d22\u3002\u672c\u6587\u901a\u8fc7\u5206\u6790\u5965\u6797\u5339\u514b\u8fd0\u52a8\u4f1a\u7684\u5386\u53f2\u5956\u724c\u7edf\u8ba1\u60c5\u51b5\uff0c\u7814\u7a76\u4e86LLM\u7684\u5185\u90e8\u77e5\u8bc6\u7ed3\u6784\u3002\u6211\u4eec\u8981\u6c42\u6a21\u578b\u63d0\u4f9b\u5404\u961f\u7684\u5956\u724c\u6570\u91cf\uff0c\u5e76\u786e\u5b9a\u54ea\u4e9b\u961f\u4f0d\u83b7\u5f97\u4e86\u7279\u5b9a\u6392\u540d\u3002\u6211\u4eec\u7684\u7ed3\u679c\u8868\u660e\uff0c\u5c3d\u7ba1\u6700\u5148\u8fdb\u7684LLM\u5728\u62a5\u544a\u5355\u4e2a\u961f\u4f0d\u7684\u5956\u724c\u6570\u91cf\u65b9\u9762\u8868\u73b0\u5f97\u975e\u5e38\u51fa\u8272\uff0c\u4f46\u5728\u56de\u7b54\u5173\u4e8e\u7279\u5b9a\u6392\u540d\u7684\u95ee\u9898\u65f6\u5374\u9047\u5230\u663e\u8457\u56f0\u96be\u3002\u8fd9\u6697\u793a\u4e86LLM\u7684\u5185\u90e8\u77e5\u8bc6\u7ed3\u6784\u4e0e\u4eba\u7c7b\u7684\u6839\u672c\u4e0d\u540c\uff0c\u4eba\u7c7b\u80fd\u591f\u8f7b\u677e\u5730\u4ece\u5df2\u77e5\u7684\u5956\u724c\u6570\u91cf\u63a8\u65ad\u51fa\u6392\u540d\u3002\u4e3a\u4e86\u652f\u6301\u8fdb\u4e00\u6b65\u7684\u7814\u7a76\uff0c\u6211\u4eec\u516c\u5f00\u53d1\u5e03\u4e86\u4ee3\u7801\u3001\u6570\u636e\u96c6\u548c\u6a21\u578b\u8f93\u51fa\u3002|\n", "2409.07453": "|**2024-09-11**|**\"My Grade is Wrong!\": A Contestable AI Framework for Interactive Feedback in Evaluating Student Essays**|Shengxin Hong 
et.al.|[2409.07453](http://arxiv.org/abs/2409.07453)|null|\u4ea4\u4e92\u5f0f\u53cd\u9988\u5728\u6559\u5e08\u4e0e\u5b66\u751f\u4e4b\u95f4\u53cc\u5411\u6d41\u52a8\uff0c\u76f8\u8f83\u4e8e\u4f20\u7edf\u7684\u5355\u5411\u53cd\u9988\u66f4\u4e3a\u6709\u6548\u3002\u7136\u800c\uff0c\u8fd9\u79cd\u53cd\u9988\u65b9\u5f0f\u5f80\u5f80\u8017\u65f6\u8fc7\u591a\uff0c\u96be\u4ee5\u5728\u6559\u80b2\u5b9e\u8df5\u4e2d\u5e7f\u6cdb\u5e94\u7528\u3002\u867d\u7136\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5177\u6709\u81ea\u52a8\u5316\u53cd\u9988\u7684\u6f5c\u529b\uff0c\u4f46\u5b83\u4eec\u5728\u4e92\u52a8\u60c5\u5883\u4e0b\u7684\u63a8\u7406\u548c\u4ea4\u4e92\u65b9\u9762\u5b58\u5728\u56f0\u96be\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aCAELF\uff08Contestable AI Empowered LLM\u6846\u67b6\uff09\uff0c\u65e8\u5728\u901a\u8fc7\u96c6\u6210\u591a\u4ee3\u7406\u7cfb\u7edf\u4e0e\u8ba1\u7b97\u8bba\u8fa9\u6765\u81ea\u52a8\u5316\u4ea4\u4e92\u5f0f\u53cd\u9988\u3002\u9996\u5148\uff0c\u5b66\u751f\u7684\u4f5c\u6587\u7531\u591a\u4e2a\u6559\u5b66\u52a9\u7406\u4ee3\u7406\uff08TA\u4ee3\u7406\uff09\u8fdb\u884c\u8bc4\u4f30\uff0c\u968f\u540e\uff0c\u6559\u5e08\u4ee3\u7406\u901a\u8fc7\u5f62\u5f0f\u5316\u63a8\u7406\u6574\u5408\u8fd9\u4e9b\u8bc4\u4ef7\uff0c\u751f\u6210\u53cd\u9988\u548c\u8bc4\u5206\u3002\u5b66\u751f\u53ef\u4ee5\u8fdb\u4e00\u6b65\u4e0e\u53cd\u9988\u4e92\u52a8\uff0c\u4ee5\u6df1\u5316\u7406\u89e3\u3002\u901a\u8fc7\u5bf9500\u7bc7\u6279\u5224\u6027\u601d\u7ef4\u4f5c\u6587\u7684\u6848\u4f8b\u7814\u7a76\uff0c\u5e76\u7ed3\u5408\u7528\u6237\u7814\u7a76\uff0c\u7ed3\u679c\u8868\u660e\uff0cCAELF\u663e\u8457\u63d0\u9ad8\u4e86\u4ea4\u4e92\u5f0f\u53cd\u9988\u7684\u8d28\u91cf\uff0c\u589e\u5f3a\u4e86LLM\u7684\u63a8\u7406\u548c\u4e92\u52a8\u80fd\u529b\u3002\u8fd9\u4e00\u65b9\u6cd5\u63d0\u4f9b\u4e86\u4e00\u4e2a\u514b\u670d\u5f71\u54cd\u6559\u80b2\u9886\u57df\u5e7f\u6cdb\u5e94\u7528\u4ea4\u4e92\u5f0f\u53cd\u9988\u7684\u65f6\u95f4\u548c\u8d44\u6e90\u969c\u788d\u7684\u6709\u524d\u666f\u89e3\u51b3\u6
5b9\u6848\u3002|\n", "2409.07440": "|**2024-09-11**|**SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories**|Ben Bogin et.al.|[2409.07440](http://arxiv.org/abs/2409.07440)|**[link](https://github.com/allenai/super-benchmark)**|**\u7ed9\u5b9a\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u7f16\u5199\u4ee3\u7801\u65b9\u9762\u53d6\u5f97\u7684\u91cd\u5927\u8fdb\u5c55\uff0c\u5b83\u4eec\u73b0\u5728\u662f\u5426\u80fd\u591f\u81ea\u4e3b\u91cd\u73b0\u7814\u7a76\u4ed3\u5e93\u4e2d\u7684\u7ed3\u679c\uff1f\u8fd9\u6837\u7684\u80fd\u529b\u5c06\u5bf9\u7814\u7a76\u793e\u533a\u4ea7\u751f\u5de8\u5927\u76ca\u5904\uff0c\u5e2e\u52a9\u7814\u7a76\u4eba\u5458\u9a8c\u8bc1\u3001\u7406\u89e3\u5e76\u6269\u5c55\u5148\u524d\u7684\u5de5\u4f5c\u3002\u4e3a\u4e86\u5411\u8fd9\u4e00\u76ee\u6807\u8fc8\u8fdb\uff0c\u6211\u4eec\u5f15\u5165\u4e86SUPER\uff0c\u8fd9\u662f\u9996\u4e2a\u65e8\u5728\u8bc4\u4f30LLM\u5728\u4ece\u7814\u7a76\u4ed3\u5e93\u8bbe\u7f6e\u548c\u6267\u884c\u4efb\u52a1\u65b9\u9762\u7684\u80fd\u529b\u7684\u57fa\u51c6\u3002SUPER\u65e8\u5728\u6355\u6349\u7814\u7a76\u4eba\u5458\u5728\u673a\u5668\u5b66\u4e60\uff08ML\uff09\u548c\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u7814\u7a76\u4ed3\u5e93\u5de5\u4f5c\u65f6\u6240\u9762\u4e34\u7684\u771f\u5b9e\u6311\u6218\u3002\u6211\u4eec\u7684\u57fa\u51c6\u7531\u4e09\u4e2a\u4e0d\u540c\u7684\u95ee\u9898\u96c6\u7ec4\u6210\uff1a45\u4e2a\u7aef\u5230\u7aef\u95ee\u9898\uff0c\u9644\u6709\u4e13\u5bb6\u89e3\u51b3\u65b9\u6848\u7684\u6ce8\u91ca\uff0c152\u4e2a\u4e13\u6ce8\u4e8e\u7279\u5b9a\u6311\u6218\uff08\u4f8b\u5982\u914d\u7f6e\u8bad\u7ec3\u5668\uff09\u7684\u5b50\u95ee\u9898\uff0c\u4ee5\u53ca602\u4e2a\u7528\u4e8e\u66f4\u5927\u89c4\u6a21\u5f00\u53d1\u7684\u81ea\u52a8\u751f\u6210\u95ee\u9898\u3002\u6211\u4eec\u5f15\u5165\u4e86\u5404\u79cd\u8bc4\u4f30\u6307\u6807\u6765\u8bc4\u4f30\u4efb\u52a1\u6210\u529f\u548c\u8fdb\u5ea6\uff0c\u5f53\u6709\u9ec4\u91d1\u89e3\u51b3\u65b9\u6848\u53ef\u7528\u65f6\u4f7f\u7528\u9ec4\u91d1\u89e
3\u51b3\u65b9\u6848\uff0c\u5426\u5219\u4f7f\u7528\u8fd1\u4f3c\u503c\u3002\u6211\u4eec\u5c55\u793a\u4e86\u6700\u5148\u8fdb\u7684\u65b9\u6cd5\u5728\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\u65f6\u9047\u5230\u4e86\u56f0\u96be\uff0c\u6700\u597d\u7684\u6a21\u578b\uff08GPT-4o\uff09\u4ec5\u89e3\u51b3\u4e8616.3%\u7684\u7aef\u5230\u7aef\u96c6\u548c46.1%\u7684\u573a\u666f\u3002\u8fd9\u8868\u660e\u4e86\u8fd9\u9879\u4efb\u52a1\u7684\u6311\u6218\u6027\uff0c\u5e76\u8868\u660eSUPER\u53ef\u4ee5\u4f5c\u4e3a\u793e\u533a\u8861\u91cf\u548c\u63a8\u52a8\u8fdb\u6b65\u7684\u5b9d\u8d35\u8d44\u6e90\u3002**|\n", "2409.07407": "|**2024-09-11**|**CLNX: Bridging Code and Natural Language for C/C++ Vulnerability-Contributing Commits Identification**|Zeqing Qin et.al.|[2409.07407](http://arxiv.org/abs/2409.07407)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u6f0f\u6d1e\u8bc6\u522b\u9886\u57df\u5c55\u73b0\u51fa\u4e86\u5de8\u5927\u7684\u6f5c\u529b\u3002\u7531\u4e8eC/C++\u5728\u8fc7\u53bb\u5341\u5e74\u4e2d\u5360\u636e\u4e86\u5f00\u6e90\u8f6f\u4ef6\uff08OSS\uff09\u6f0f\u6d1e\u7684\u4e00\u534a\uff0c\u5e76\u4e14\u4e3b\u8981\u901a\u8fc7\u63d0\u4ea4\u8fdb\u884c\u66f4\u65b0\uff0c\u56e0\u6b64\u589e\u5f3aLLM\u5728\u8bc6\u522bC/C++\u6f0f\u6d1e\u8d21\u732e\u63d0\u4ea4\uff08VCC\uff09\u65b9\u9762\u7684\u80fd\u529b\u53d8\u5f97\u81f3\u5173\u91cd\u8981\u3002\u7136\u800c\uff0c\u5f53\u524d\u7684\u7814\u7a76\u4e3b\u8981\u96c6\u4e2d\u5728\u5bf9\u5927\u89c4\u6a21\u4ee3\u7801\u96c6\u8fdb\u4e00\u6b65\u9884\u8bad\u7ec3LLM\u4e0a\uff0c\u8fd9\u65e2\u8017\u8d39\u8d44\u6e90\u53c8\u5b58\u5728\u6548\u7387\u6311\u6218\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u8f7b\u91cf\u7ea7\u65b9\u6cd5\u6765\u63d0\u5347\u57fa\u4e8eBERT\u7684LLM\u8bc6\u522bC/C++ 
VCC\u7684\u80fd\u529b\u3002\u6211\u4eec\u63d0\u51fa\u4e86CodeLinguaNexus\uff08CLNX\uff09\uff0c\u4f5c\u4e3a\u8fde\u63a5C/C++\u7a0b\u5e8f\u4e0eLLM\u7684\u6865\u6881\u3002CLNX\u901a\u8fc7\u5728\u4fdd\u7559\u5173\u952e\u7ec6\u8282\u7684\u540c\u65f6\uff0c\u4ee5\u66f4\u81ea\u7136\u7684\u65b9\u5f0f\u9ad8\u6548\u5730\u5c06\u6e90\u4ee3\u7801\u8f6c\u6362\u4e3a\u66f4\u9002\u5408LLM\u5904\u7406\u7684\u8868\u793a\u3002\u5177\u4f53\u6765\u8bf4\uff0cCLNX\u9996\u5148\u5e94\u7528\u7ed3\u6784\u7ea7\u81ea\u7136\u5316\u6765\u5206\u89e3\u590d\u6742\u7684\u7a0b\u5e8f\uff0c\u7136\u540e\u5e94\u7528\u7b26\u53f7\u7ea7\u81ea\u7136\u5316\u6765\u89e3\u91ca\u590d\u6742\u7684\u7b26\u53f7\u3002\u6211\u4eec\u5728\u5305\u542b25,872\u4e2aC/C++\u51fd\u6570\u53ca\u5176\u63d0\u4ea4\u7684\u516c\u5f00\u6570\u636e\u96c6\u4e0a\u8bc4\u4f30\u4e86CLNX\u3002\u7ed3\u679c\u8868\u660e\uff0cCLNX\u663e\u8457\u63d0\u5347\u4e86LLM\u8bc6\u522bC/C++ VCC\u7684\u80fd\u529b\u3002\u6b64\u5916\uff0c\u914d\u5907CLNX\u7684CodeBERT\u8fbe\u5230\u4e86\u65b0\u7684\u6700\u4f18\u6027\u80fd\uff0c\u5e76\u5728\u771f\u5b9e\u4e16\u754c\u4e2d\u8bc6\u522b\u4e8638\u4e2aOSS\u6f0f\u6d1e\u3002|\n", "2409.07394": "|**2024-09-11**|**AdaCAD: Adaptively Decoding to Balance Conflicts between Contextual and Parametric Knowledge**|Han Wang 
et.al.|[2409.07394](http://arxiv.org/abs/2409.07394)|**[link](https://github.com/hannight/adacad)**|**\u5728\u5927\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u4e0a\u4e0b\u6587\u4e0e\u6a21\u578b\u53c2\u6570\u5b58\u50a8\u7684\u77e5\u8bc6\u4e4b\u95f4\u5b58\u5728\u77e5\u8bc6\u51b2\u7a81\uff0c\u8fd9\u4f1a\u5bfc\u81f4\u4f7f\u7528\u6807\u51c6\u89e3\u7801\u6280\u672f\u65f6\u6027\u80fd\u53d7\u635f\uff0c\u56e0\u4e3a\u8fd9\u4e9b\u6280\u672f\u5f80\u5f80\u5ffd\u89c6\u4e86\u4e0a\u4e0b\u6587\u3002\u73b0\u6709\u7684\u6d4b\u8bd5\u65f6\u95f4\u5bf9\u6bd4\u65b9\u6cd5\u8bd5\u56fe\u901a\u8fc7\u6bd4\u8f83\u5e26\u6709\u548c\u4e0d\u5e26\u6709\u4e0a\u4e0b\u6587\u7684LLM\u8f93\u51fa\u5206\u5e03\u4e4b\u95f4\u7684\u5bf9\u6bd4\uff0c\u5e76\u6839\u636e\u5b83\u4eec\u4e4b\u95f4\u7684\u5bf9\u6bd4\u8c03\u6574\u6a21\u578b\u6765\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\u3002\u7136\u800c\uff0c\u6211\u4eec\u53d1\u73b0\u8fd9\u4e9b\u65b9\u6cd5\u7ecf\u5e38\u9519\u8bef\u5730\u5224\u65ad\u51b2\u7a81\u7684\u7a0b\u5ea6\uff0c\u5e76\u4e14\u96be\u4ee5\u5904\u7406\u4e0d\u540c\u51b2\u7a81\u7a0b\u5ea6\u7684\u5b9e\u4f8b\uff0c\u9759\u6001\u65b9\u6cd5\u5728\u51b2\u7a81\u4e0d\u5b58\u5728\u65f6\u8fc7\u5ea6\u8c03\u6574\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u5b9e\u4f8b\u7684\u7cbe\u7ec6\u7c92\u5ea6\u65b9\u6cd5AdaCAD\uff0c\u5b83\u52a8\u6001\u5730\u6839\u636eJensen-Shannon\u6563\u5ea6\u6d4b\u91cf\u7684\u4e0a\u4e0b\u6587\u548c\u53c2\u6570\u77e5\u8bc6\u5206\u5e03\u4e4b\u95f4\u7684\u51b2\u7a81\u7a0b\u5ea6\u6765\u63a8\u65ad\u8c03\u6574\u6743\u91cd\u3002\u6211\u4eec\u5728\u56db\u4e2a\u6a21\u578b\u4e0a\u5bf9\u516d\u4e2a\u591a\u6837\u5316\u7684\u95ee\u7b54\uff08QA\uff09\u6570\u636e\u96c6\u548c\u4e09\u4e2a\u6458\u8981\u4efb\u52a1\u8fdb\u884c\u7684\u5b9e\u9a8c\u663e\u793a\uff0c\u6211\u4eec\u7684\u65e0\u9700\u8bad\u7ec3\u7684\u81ea\u9002\u5e94\u65b9\u6cd5\u59cb\u7ec8\u5728\u95ee\u7b54\u4efb\u52a1\u4e0a\u4f18\u4e8e\u5176\u4ed6\u89e3\u7801\u65b9\u6cd5\uff0c\u5e73\u5747\u51c6\u786e\u7387\u63d0\u9ad8\u4e
8614.21%\uff08\u7edd\u5bf9\u503c\uff09\uff0c\u5e76\u4e14\u63d0\u9ad8\u4e86\u6458\u8981\u7684\u771f\u5b9e\u6027\uff0cAlignScore\u63d0\u9ad8\u4e865.59\u5206\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u5206\u6790\u8868\u660e\uff0c\u4e0e\u51b2\u7a81\u7684\u5bf9\u6bd4\u57fa\u7ebf\u76f8\u6bd4\uff0c\u5f53\u51b2\u7a81\u4e0d\u5b58\u5728\u65f6\uff0c\u89e3\u7801\u4f1a\u635f\u5bb3\u6027\u80fd\uff0c\u800cAdaCAD\u80fd\u591f\u7f13\u89e3\u8fd9\u4e9b\u635f\u5931\uff0c\u4f7f\u5176\u66f4\u9002\u7528\u4e8e\u73b0\u5b9e\u4e16\u754c\u7684\u6570\u636e\u96c6\uff0c\u5728\u8fd9\u4e9b\u6570\u636e\u96c6\u4e2d\uff0c\u6709\u4e9b\u793a\u4f8b\u5b58\u5728\u51b2\u7a81\uff0c\u800c\u5176\u4ed6\u793a\u4f8b\u5219\u4e0d\u5b58\u5728\u51b2\u7a81\u3002**|\n", "2409.07368": "|**2024-09-11**|**Demo: SGCode: A Flexible Prompt-Optimizing System for Secure Generation of Code**|Khiem Ton et.al.|[2409.07368](http://arxiv.org/abs/2409.07368)|null|\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aSGCode\u7684\u7075\u6d3b\u63d0\u793a\u4f18\u5316\u7cfb\u7edf\uff0c\u7528\u4e8e\u901a\u8fc7\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u751f\u6210\u5b89\u5168\u4ee3\u7801\u3002SGCode\u5c06\u6700\u8fd1\u7684\u63d0\u793a\u4f18\u5316\u65b9\u6cd5\u4e0eLLM\u7ed3\u5408\u5728\u4e00\u4e2a\u7edf\u4e00\u7684\u7cfb\u7edf\u4e2d\uff0c\u901a\u8fc7\u524d\u7aef\u548c\u540e\u7aefAPI\u63d0\u4f9b\u670d\u52a1\uff0c\u4f7f\u7528\u6237\u80fd\u591f\uff1a1\uff09\u751f\u6210\u65e0\u6f0f\u6d1e\u7684\u5b89\u5168\u4ee3\u7801\uff1b2\uff09\u67e5\u770b\u548c\u5171\u4eab\u5b89\u5168\u6027\u5206\u6790\uff1b\u4ee5\u53ca3\uff09\u8f7b\u677e\u5728\u4e0d\u540c\u7684\u63d0\u793a\u4f18\u5316\u65b9\u6cd5\u4e4b\u95f4\u5207\u6362\uff0c\u5e76\u63d0\u4f9b\u6709\u5173\u6a21\u578b\u548c\u7cfb\u7edf\u6027\u80fd\u7684\u89c1\u89e3\u3002\u6211\u4eec\u4f7f\u7528AWS\u670d\u52a1\u5668\u4e0a\u7684PromSec\u586b\u5145SGCode\uff0c\u8fd9\u662f\u4e00\u79cd\u65b9\u6cd5\uff0c\u901a\u8fc7\u5c06LLM\u3001\u5b89\u5168\u5de5\u5177\u4e0e\u8f7b\u91cf\u7ea7\u751f\u6210\u5bf9\u6297\u56
fe\u795e\u7ecf\u7f51\u7edc\u76f8\u7ed3\u5408\uff0c\u6765\u68c0\u6d4b\u5e76\u4fee\u590d\u751f\u6210\u4ee3\u7801\u4e2d\u7684\u5b89\u5168\u6f0f\u6d1e\uff0c\u4ece\u800c\u4f18\u5316\u63d0\u793a\u3002\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u8868\u660e\uff0cSGCode\u4f5c\u4e3a\u516c\u5171\u5de5\u5177\uff0c\u80fd\u591f\u63ed\u793a\u6a21\u578b\u5b9e\u7528\u6027\u3001\u5b89\u5168\u4ee3\u7801\u751f\u6210\u548c\u7cfb\u7edf\u6210\u672c\u4e4b\u95f4\u7684\u6743\u8861\uff0c\u5177\u6709\u76f8\u5bf9\u8f83\u4f4e\u7684\u6210\u672c\u3002SGCode\u5df2\u4e0a\u7ebf\u4e8e\uff1a\u3002|\n", "2409.07355": "|**2024-09-11**|**Think Together and Work Better: Combining Humans' and LLMs' Think-Aloud Outcomes for Effective Text Evaluation**|SeongYeub Chu et.al.|[2409.07355](http://arxiv.org/abs/2409.07355)|**[link](https://github.com/BBeeChu/InteractEval)**|**\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3a\u201cInteractEval\u201d\u7684\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u91c7\u7528\u201cThink-Aloud\u201d\u65b9\u6cd5\u7ed3\u5408\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4e0e\u4eba\u7c7b\u4e13\u5bb6\u610f\u89c1\uff0c\u4ee5\u751f\u6210\u57fa\u4e8e\u68c0\u67e5\u6e05\u5355\u7684\u6587\u672c\u8bc4\u4f30\u7684\u5c5e\u6027\u3002\u901a\u8fc7\u878d\u5408\u4eba\u7c7b\u7684\u7075\u6d3b\u6027\u548c\u63a8\u7406\u80fd\u529b\u4ee5\u53caLLM\u7684\u4e00\u81f4\u6027\uff0cInteractEval\u5728\u4e00\u81f4\u6027\u3001\u6d41\u7545\u6027\u3001\u76f8\u5173\u6027\u548c\u8fde\u8d2f\u6027\u56db\u4e2a\u7ef4\u5ea6\u4e0a\u5747\u8d85\u8d8a\u4e86\u4f20\u7edf\u7684\u975eLLM\u57fa\u7ebf\u548cLLM\u57fa\u7ebf\u6a21\u578b\u3002\u5b9e\u9a8c\u8fd8\u63a2\u8ba8\u4e86\u201cThink-Aloud\u201d\u65b9\u6cd5\u7684\u6709\u6548\u6027\uff0c\u8868\u660e\u5b83\u80fd\u4fc3\u8fdb\u4eba\u7c7b\u548cLLM\u7684\u53d1\u6563\u601d\u7ef4\uff0c\u4ece\u800c\u4ea7\u751f\u66f4\u5e7f\u6cdb\u7684\u76f8\u5173\u5c5e\u6027\uff0c\u5e76\u63d0\u9ad8\u6587\u672c\u8bc4\u4f30\u6027\u80fd\u3002\u6bd4\u8f83\u5206\u6790\u663e\u793a\uff0c\u4eba\u7c7b\u5728\u8bc6\u522b\u4e0e\u
5185\u90e8\u8d28\u91cf\u76f8\u5173\u7684\u5c5e\u6027\uff08\u5982\u8fde\u8d2f\u6027\u548c\u6d41\u7545\u6027\uff09\u65b9\u9762\u8868\u73b0\u4f18\u5f02\uff0c\u800cLLM\u5728\u4e0e\u5916\u90e8\u5bf9\u9f50\u76f8\u5173\u7684\u5c5e\u6027\uff08\u5982\u4e00\u81f4\u6027\u548c\u76f8\u5173\u6027\uff09\u4e0a\u8868\u73b0\u66f4\u597d\u3002\u56e0\u6b64\uff0c\u7ed3\u5408\u4eba\u7c7b\u548cLLM\u5171\u540c\u4ea7\u751f\u7684\u8bc4\u4f30\u7ed3\u679c\u6700\u4f73\u3002\u6362\u53e5\u8bdd\u8bf4\uff0c\u672c\u6587\u5f3a\u8c03\u4e86\u5728\u81ea\u52a8\u5316\u57fa\u4e8e\u68c0\u67e5\u6e05\u5355\u7684\u6587\u672c\u8bc4\u4f30\u6846\u67b6\u4e2d\u6709\u6548\u6574\u5408\u4eba\u7c7b\u548cLLM\u7684\u5fc5\u8981\u6027\u3002\u4ee3\u7801\u5df2\u5f00\u6e90\u4e8ehttps://github.com/BBeeChu/InteractEval.git\u3002**|\n", "2409.07331": "|**2024-09-11**|**Learning to Compress Contexts for Efficient Knowledge-based Visual Question Answering**|Weixi Weng et.al.|[2409.07331](http://arxiv.org/abs/2409.07331)|null|\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u5728\u89c6\u89c9\u95ee\u7b54\uff08VQA\uff09\u4efb\u52a1\u4e0a\u5c55\u793a\u4e86\u51fa\u8272\u7684\u96f6\u6837\u672c\u6027\u80fd\u3002\u7136\u800c\uff0c\u5728\u77e5\u8bc6\u57fa\u89c6\u89c9\u95ee\u7b54\uff08KB-VQA\uff09\u4efb\u52a1\u4e2d\uff0cMLLMs\u53ef\u80fd\u7f3a\u4e4f\u4eba\u7c7b\u5e38\u8bc6\u6216\u7279\u5b9a\u9886\u57df\u7684\u4e13\u4e1a\u77e5\u8bc6\uff0c\u4ece\u800c\u9700\u8981\u4ece\u5916\u90e8\u77e5\u8bc6\u6e90\u83b7\u53d6\u6240\u9700\u4fe1\u606f\u4ee5\u56de\u7b54\u6b64\u7c7b\u95ee\u9898\u3002\u5148\u524d\u7684\u5de5\u4f5c\uff0c\u5982\u68c0\u7d22\u589e\u5f3a\u7684VQA-v2\uff08RAVQA-v2\uff09\uff0c\u4fa7\u91cd\u4e8e\u5145\u5206\u5229\u7528\u8f93\u5165\u4fe1\u606f\uff0c\u4f8b\u5982\u56fe\u50cf\u6587\u672c\u63cf\u8ff0\u548c\u68c0\u7d22\u7684\u77e5\u8bc6\uff0c\u4ee5\u63d0\u9ad8\u6027\u80fd\uff0c\u4f46\u5b83\u4eec\u90fd\u5ffd\u89c6\u4e86\u4e00\u4e2a\u95ee\u9898\uff1a\u968f\u7740\u8f93\u5165\u4ee4\u724c\u6570\u91cf\u7684\u5
89e\u52a0\uff0c\u63a8\u7406\u6548\u7387\u663e\u8457\u964d\u4f4e\uff0c\u8fd9\u4e0e\u5b9e\u9645\u5e94\u7528\u7684\u9700\u6c42\u76f8\u77db\u76fe\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u68c0\u7d22\u589e\u5f3a\u7684\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\uff08RACC\uff09\u3002RACC\u5b66\u4e60\u538b\u7f29\u5e76\u805a\u5408\u68c0\u7d22\u4e0a\u4e0b\u6587\uff0c\u5e76\u751f\u6210\u7d27\u51d1\u7684\u952e\u503c\uff08KV\uff09\u7f13\u5b58\u5f62\u5f0f\u7684\u8c03\u8282\u3002\u7136\u540e\uff0c\u4f7f\u7528\u8fd9\u79cd\u8c03\u8282\u6765\u9002\u5e94\u4e0b\u6e38\u51bb\u7ed3\u7684MLLM\uff0c\u4ece\u800c\u5b9e\u73b0\u6709\u6548\u4e14\u9ad8\u6548\u7684\u63a8\u7406\u3002RACC\u5728OK-VQA\u4e0a\u5b9e\u73b0\u4e86\u5f53\u524d\u6700\u4f73\u768462.9%\u6027\u80fd\u3002\u6b64\u5916\uff0c\u5b83\u5c06RAVQA-v2\u7684\u63a8\u7406\u5ef6\u8fdf\u663e\u8457\u964d\u4f4e\u4e8622.0%-59.7%\u3002\u5927\u91cf\u7684\u5b9e\u9a8c\u8868\u660e\u4e86RACC\u7684\u5e7f\u6cdb\u9002\u7528\u6027\u3002\u5b83\u4e0e\u5404\u79cd\u73b0\u6210\u7684MLLM\u517c\u5bb9\uff0c\u5e76\u53ef\u4ee5\u5904\u7406\u5305\u62ec\u6587\u672c\u548c\u591a\u6a21\u6001\u6587\u6863\u5728\u5185\u7684\u4e0d\u540c\u77e5\u8bc6\u6e90\u3002|\n", "2409.07314": "|**2024-09-11**|**MEDIC: Towards a Comprehensive Framework for Evaluating LLMs in Clinical Applications**|Praveen K Kanithi 
et.al.|[2409.07314](http://arxiv.org/abs/2409.07314)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u533b\u7597\u5065\u5eb7\u9886\u57df\u7684\u5feb\u901f\u5f00\u53d1\u5f15\u53d1\u4e86\u5bf9\u8d85\u8d8a\u5982USMLE\u7b49\u5e38\u7528\u57fa\u51c6\u8bc4\u4f30\u7684\u5168\u9762\u8bc4\u4f30\u9700\u6c42\uff0c\u4ee5\u66f4\u597d\u5730\u53cd\u6620\u5b9e\u9645\u5e94\u7528\u8868\u73b0\u3002\u867d\u7136\u73b0\u5b9e\u4e16\u754c\u7684\u8bc4\u4f30\u662f\u5b9e\u7528\u6027\u7684\u91cd\u8981\u6307\u6807\uff0c\u4f46\u5b83\u4eec\u5f80\u5f80\u843d\u540e\u4e8eLLM\u6f14\u8fdb\u7684\u901f\u5ea6\uff0c\u53ef\u80fd\u5bfc\u81f4\u7814\u7a76\u7ed3\u679c\u5728\u90e8\u7f72\u65f6\u53d8\u5f97\u8fc7\u65f6\u3002\u8fd9\u79cd\u65f6\u95f4\u4e0a\u7684\u8131\u8282\u9700\u8981\u4e00\u79cd\u5168\u9762\u7684\u524d\u671f\u8bc4\u4f30\u65b9\u6cd5\uff0c\u4ee5\u6307\u5bfc\u7279\u5b9a\u4e34\u5e8a\u5e94\u7528\u4e2d\u7684\u6a21\u578b\u9009\u62e9\u3002 \u6211\u4eec\u5f15\u5165\u4e86MEDIC\u6846\u67b6\uff0c\u5b83\u4ece\u4e94\u4e2a\u5173\u952e\u7684\u4e34\u5e8a\u80fd\u529b\u7ef4\u5ea6\u8bc4\u4f30LLM\uff1a\u533b\u5b66\u63a8\u7406\u3001\u4f26\u7406\u4e0e\u504f\u89c1\u3001\u6570\u636e\u548c\u8bed\u8a00\u7406\u89e3\u3001\u4e0a\u4e0b\u6587\u5b66\u4e60\u4ee5\u53ca\u4e34\u5e8a\u5b89\u5168\u6027\u3002MEDIC\u91c7\u7528\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u4ea4\u53c9\u5ba1\u67e5\u6846\u67b6\uff0c\u91cf\u5316\u4e86LLM\u5728\u8986\u76d6\u8303\u56f4\u548c\u5e7b\u89c9\u68c0\u6d4b\u7b49\u9886\u57df\u7684\u6027\u80fd\uff0c\u800c\u65e0\u9700\u53c2\u8003\u8f93\u51fa\u3002\u6211\u4eec\u4f7f\u7528MEDIC\u5bf9\u533b\u7597\u95ee\u7b54\u3001\u5b89\u5168\u3001\u603b\u7ed3\u3001\u7b14\u8bb0\u751f\u6210\u4ee5\u53ca\u5176\u4ed6\u4efb\u52a1\u8fdb\u884c\u4e86\u8bc4\u4f30\u3002 
\u6211\u4eec\u7684\u7ed3\u679c\u663e\u793a\u4e0d\u540c\u6a21\u578b\u5927\u5c0f\u4e4b\u95f4\u3001\u57fa\u7ebf\u6a21\u578b\u4e0e\u533b\u5b66\u5fae\u8c03\u6a21\u578b\u4e4b\u95f4\u7684\u6027\u80fd\u5dee\u5f02\uff0c\u5e76\u5bf9\u9700\u8981\u7279\u5b9a\u6a21\u578b\u4f18\u52bf\u7684\u5e94\u7528\uff08\u5982\u4f4e\u5e7b\u89c9\u6216\u8f83\u4f4e\u63a8\u7406\u6210\u672c\uff09\u7684\u6a21\u578b\u9009\u62e9\u5177\u6709\u542f\u793a\u610f\u4e49\u3002MEDIC\u7684\u591a\u7ef4\u5ea6\u8bc4\u4f30\u63ed\u793a\u4e86\u7406\u8bba\u80fd\u529b\u548c\u5b9e\u9645\u5b9e\u65bd\u4e4b\u95f4\u7684\u6027\u80fd\u6743\u8861\uff0c\u5f25\u5408\u4e86\u5728\u533b\u7597\u4fdd\u5065\u73af\u5883\u4e2d\u8bc6\u522b\u548c\u9002\u5e94\u6700\u6709\u524d\u666f\u6a21\u578b\u7684\u5dee\u8ddd\uff0c\u786e\u4fdd\u4e86\u9002\u5408\u591a\u79cd\u533b\u7597\u4fdd\u5065\u5e94\u7528\u7684\u6a21\u578b\u5f97\u5230\u8bc6\u522b\u548c\u9002\u5e94\u3002|\n", "2409.07276": "|**2024-09-11**|**STORE: Streamlining Semantic Tokenization and Generative Recommendation with A Single LLM**|Qijiong Liu et.al.|[2409.07276](http://arxiv.org/abs/2409.07276)|null|\u4f20\u7edf\u63a8\u8350\u6a21\u578b\u901a\u5e38\u4f9d\u8d56\u4e8e\u72ec\u7279\u7684\u9879\u76ee\u6807\u8bc6\u7b26\uff08ID\uff09\u6765\u533a\u5206\u9879\u76ee\uff0c\u8fd9\u53ef\u80fd\u9650\u5236\u4e86\u5b83\u4eec\u5229\u7528\u9879\u76ee\u5185\u5bb9\u4fe1\u606f\u548c\u63a8\u5e7f\u957f\u5c3e\u6216\u51b7\u542f\u52a8\u9879\u76ee\u7684\u80fd \u529b\u3002\u8fd1\u671f\uff0c\u5df2\u63d0\u51fa\u8bed\u4e49\u5206\u8bcd\u4f5c\u4e3a\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\u7684\u6709\u5e0c\u671b\u7684\u65b9\u6cd5\uff0c\u65e8\u5728\u5c06\u6bcf\u4e2a\u9879\u76ee\u7684\u8bed\u4e49\u8868\u793a\u5206\u8bcd\u4e3a\u4e00\u7cfb\u5217\u79bb\u6563\u7684\u4ee4\u724c\u3002\u901a\u8fc7\u8fd9\u79cd\u65b9\u5f0f\uff0c\u5b83\u4fdd 
\u7559\u4e86\u9879\u76ee\u5728\u8fd9\u4e9b\u4ee4\u724c\u5185\u7684\u8bed\u4e49\uff0c\u5e76\u786e\u4fdd\u5177\u6709\u76f8\u4f3c\u8bed\u4e49\u7684\u9879\u76ee\u7531\u76f8\u4f3c\u7684\u4ee4\u724c\u8868\u793a\u3002\u8fd9\u4e9b\u8bed\u4e49\u4ee4\u724c\u6210\u4e3a\u8bad\u7ec3\u751f\u6210\u63a8\u8350\u6a21\u578b\u7684\u57fa\u7840\u3002\u7136\u800c\uff0c\u73b0\u6709 \u7684\u751f\u6210\u63a8\u8350\u65b9\u6cd5\u901a\u5e38\u6d89\u53ca\u591a\u4e2a\u5b50\u6a21\u578b\u8fdb\u884c\u5d4c\u5165\u3001\u91cf\u5316\u548c\u63a8\u8350\uff0c\u5bfc\u81f4\u7cfb\u7edf\u8fc7\u4e8e\u590d\u6742\u3002\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u7edf\u4e00\u6846\u67b6\uff0c\u79f0\u4e3aSTORE\uff0c \u5229\u7528\u5355\u4e00\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u540c\u65f6\u6267\u884c\u8fd9\u4e24\u9879\u4efb\u52a1\u3002\u5177\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u5c06\u8bed\u4e49\u5206\u8bcd\u8868\u8ff0\u4e3a\u6587\u672c\u5230\u4ee4\u724c\u7684\u4efb\u52a1\uff0c\u800c\u751f\u6210\u63a8\u8350\u5219\u8868\u8ff0\u4e3a\u4ee4\u724c\u5230 \u4ee4\u724c\u7684\u4efb\u52a1\uff0c\u901a\u8fc7\u8865\u5145\u4ee4\u724c\u5230\u6587\u672c\u91cd\u6784\u4efb\u52a1\u548c\u6587\u672c\u5230\u4ee4\u724c\u8f85\u52a9\u4efb\u52a1\uff0c\u6240\u6709\u8fd9\u4e9b\u4efb\u52a1\u5747\u4ee5\u751f\u6210\u65b9\u5f0f\u8868\u8ff0\u5e76\u4f7f\u7528\u5355\u4e00LLM\u9aa8\u5e72\u8fdb\u884c\u8bad\u7ec3\u3002 \u6211\u4eec\u8fdb\u884c\u4e86\u5927\u91cf\u5b9e\u9a8c\uff0c\u4ee5\u9a8c\u8bc1\u6211\u4eec\u7684STORE\u6846\u67b6\u5728\u5404\u79cd\u63a8\u8350\u4efb\u52a1\u548c\u6570\u636e\u96c6\u4e0a\u7684\u6709\u6548\u6027\u3002\u6211\u4eec\u5c06\u53d1\u5e03\u6e90\u4ee3\u7801\u548c\u914d\u7f6e\uff0c\u4ee5\u4fbf\u8fdb\u884c\u53ef\u590d\u73b0\u7684\u7814\u7a76\u3002|\n", "2409.07267": "|**2024-09-11**|**MiniDrive: More Efficient Vision-Language Models with Multi-Level 2D Features as Text Tokens for Autonomous Driving**|Enming Zhang 
et.al.|[2409.07267](http://arxiv.org/abs/2409.07267)|**[link](https://github.com/emzucas/minidrive)**|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aMiniDrive\u7684\u65b0\u578b\u6846\u67b6\uff0c\u65e8\u5728\u89e3\u51b3\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08VLM\uff09\u5728\u81ea\u52a8\u9a7e\u9a76\u573a\u666f\u4e2d\u7684\u5e94\u7528\u96be\u9898\u3002\u73b0\u6709\u7684VLM\u65b9\u6cd5\u901a\u5e38\u4f9d\u8d56\u4e8e\u8ba1\u7b97\u5bc6\u96c6\u578b\u7684\u89c6\u89c9\u7f16\u7801\u5668\u548c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\uff0c\u8fd9\u4f7f\u5f97\u5b83\u4eec\u96be\u4ee5\u5728\u5b9e\u9645\u4e16\u754c\u548c\u5b9e\u65f6\u5e94\u7528\u4e2d\u90e8\u7f72\u3002\u6b64\u5916\uff0c\u5927\u591a\u6570\u73b0\u6709VLM\u7f3a\u4e4f\u5904\u7406\u591a\u5f20\u56fe\u7247\u7684\u80fd\u529b\uff0c\u8fd9\u4f7f\u5f97\u5b83\u4eec\u96be\u4ee5\u9002\u5e94\u81ea\u52a8\u9a7e\u9a76\u4e2d\u7684\u591a\u6444\u50cf\u5934\u611f\u77e5\u9700\u6c42\u3002 \u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e24\u4e2a\u5173\u952e\u6a21\u5757\uff1a\u7279\u5f81\u5de5\u7a0b\u6df7\u5408\u4e13\u5bb6\uff08FE-MoE\uff09\u548c\u52a8\u6001\u6307\u4ee4\u9002\u914d\u5668\uff08DI-Adapter\uff09\u3002FE-MoE\u6709\u6548\u5730\u5c06\u4e8c\u7ef4\u7279\u5f81\u6620\u5c04\u5230\u89c6\u89c9\u4ee4\u724c\u5d4c\u5165\uff0c\u7136\u540e\u4f5c\u4e3a\u8f93\u5165\u4f20\u9012\u7ed9\u8bed\u8a00\u6a21\u578b\u3002DI-Adapter\u5141\u8bb8\u89c6\u89c9\u4ee4\u724c\u5d4c\u5165\u6839\u636e\u6307\u4ee4\u6587\u672c\u5d4c\u5165\u52a8\u6001\u53d8\u5316\uff0c\u89e3\u51b3\u4e86\u4ee5\u5f80\u65b9\u6cd5\u4e2d\u540c\u4e00\u56fe\u7247\u4e0b\u9759\u6001\u89c6\u89c9\u4ee4\u724c\u5d4c\u5165\u7684\u95ee\u9898\u3002 \u4e0e\u4e4b\u524d\u7684\u6210\u679c\u76f8\u6bd4\uff0cMiniDrive\u5728\u53c2\u6570\u5927\u5c0f\u3001\u6d6e\u70b9\u8fd0\u7b97\u91cf\u548c\u54cd\u5e94\u6548\u7387\u65b9\u9762\u5747\u8fbe\u5230\u4e86\u6700\u4f18\u6027\u80fd\uff0c\u6700\u5c0f\u7248\u672c\u4ec5\u5305\u542b83M\u53c2\u6570\u3002|\n", 
"2409.08264": "|**2024-09-12**|**Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale**|Rogerio Bonatti et.al.|[2409.08264](http://arxiv.org/abs/2409.08264)|**[link](https://github.com/microsoft/windowsagentarena)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5c55\u73b0\u51fa\u5728\u9700\u8981\u89c4\u5212\u548c\u63a8\u7406\u7684\u591a\u6a21\u6001\u4efb\u52a1\u4e2d\u4f5c\u4e3a\u8ba1\u7b97\u673a\u4ee3\u7406\u7684\u5f3a\u5927\u6f5c\u529b\uff0c\u80fd\u663e\u8457\u63d0\u5347\u4eba\u7c7b\u751f\u4ea7\u529b\u548c\u8f6f\u4ef6\u53ef\u8bbf\u95ee\u6027\u3002\u7136\u800c\uff0c\u8861\u91cf\u8fd9\u4e9b\u4ee3\u7406\u5728\u771f\u5b9e\u73af\u5883\u4e2d\u7684\u6027\u80fd\u4ecd\u5b58\u5728\u6311\u6218\uff1a\uff08i\uff09\u5927\u591a\u6570\u57fa\u51c6\u6d4b\u8bd5\u4ec5\u9650\u4e8e\u7279\u5b9a\u6a21\u6001\u6216\u9886\u57df\uff08\u4f8b\u5982\u7eaf\u6587\u672c\u3001\u7f51\u9875\u5bfc\u822a\u3001\u95ee\u9898\u56de\u7b54\u3001\u7f16\u7a0b\uff09\uff0c\uff08ii\uff09\u5b8c\u6574\u57fa\u51c6\u8bc4\u4f30\u8017\u65f6\u957f\uff08\u901a\u5e38\u9700\u6570\u5929\u65f6\u95f4\uff09\uff0c\u56e0\u4e3a\u4efb\u52a1\u5177\u6709\u591a\u6b65\u9aa4\u7684\u5e8f\u5217\u6027\u8d28\u3002 \u4e3a\u89e3\u51b3\u8fd9\u4e9b\u6311\u6218\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u201cWindows Agent 
Arena\u201d\uff1a\u4e00\u4e2a\u53ef\u590d\u73b0\u7684\u901a\u7528\u73af\u5883\uff0c\u4e13\u6ce8\u4e8eWindows\u64cd\u4f5c\u7cfb\u7edf\uff0c\u5141\u8bb8\u4ee3\u7406\u81ea\u7531\u64cd\u4f5c\u5e76\u4f7f\u7528\u4e0e\u4eba\u7c7b\u7528\u6237\u5728\u89e3\u51b3\u4efb\u52a1\u65f6\u76f8\u540c\u7684\u5e7f\u6cdb\u5e94\u7528\u7a0b\u5e8f\u3001\u5de5\u5177\u548c\u7f51\u7edc\u6d4f\u89c8\u5668\u3002\u6211\u4eec\u6839\u636eOSWorld\u6846\u67b6\uff08Xie\u7b49\u4eba\uff0c2024\u5e74\uff09\u521b\u5efa\u4e86150\u591a\u4e2a\u8de8\u4ee3\u8868\u9886\u57df\u7684\u591a\u6837\u5316Windows\u4efb\u52a1\uff0c\u8fd9\u4e9b\u4efb\u52a1\u6db5\u76d6\u4e86\u89c4\u5212\u3001\u5c4f\u5e55\u7406\u89e3\u53ca\u5de5\u5177\u4f7f\u7528\u7684\u4ee3\u7406\u80fd\u529b\u8981\u6c42\u3002 \u6211\u4eec\u7684\u57fa\u51c6\u5177\u6709\u53ef\u6269\u5c55\u6027\uff0c\u5e76\u80fd\u591f\u65e0\u7f1d\u5730\u5728Azure\u4e0a\u5e76\u884c\u5316\uff0c\u4ece\u800c\u5728\u77ed\u77ed20\u5206\u949f\u5185\u5b8c\u6210\u5168\u9762\u57fa\u51c6\u8bc4\u4f30\u3002\u4e3a\u4e86\u5c55\u793aWindows Agent Arena\u7684\u80fd\u529b\uff0c\u6211\u4eec\u8fd8\u5f15\u5165\u4e86\u4e00\u4e2a\u65b0\u7684\u591a\u6a21\u6001\u4ee3\u7406Navi\u3002Navi\u5728Windows\u9886\u57df\u5185\u7684\u6210\u529f\u7387\u8fbe\u5230\u4e8619.5%\uff0c\u76f8\u6bd4\u4e4b\u4e0b\uff0c\u672a\u7ecf\u8f85\u52a9\u7684\u4eba\u7c7b\u8868\u73b0\u5219\u4e3a74.5%\u3002\u6b64\u5916\uff0cNavi\u5728\u53e6\u4e00\u4e2a\u6d41\u884c\u7684\u57fa\u4e8e\u7f51\u7edc\u7684\u57fa\u51c6\u6d4b\u8bd5Mind2Web\u4e2d\u4e5f\u8868\u73b0\u51fa\u8272\u3002 \u6211\u4eec\u63d0\u4f9b\u4e86\u5bf9Navi\u6027\u80fd\u7684\u8be6\u7ec6\u5b9a\u91cf\u548c\u5b9a\u6027\u5206\u6790\uff0c\u5e76\u63d0\u4f9b\u4e86\u5229\u7528Windows Agent Arena\u8fdb\u884c\u672a\u6765\u7814\u7a76\u7684\u4ee3\u7406\u5f00\u53d1\u548c\u6570\u636e\u751f\u6210\u673a\u4f1a\u7684\u89c1\u89e3\u3002\u7f51\u9875\uff1ahttps://microsoft.github.io/WindowsAgentArena \u4ee3\u7801\uff1ahttps://github.com/microsoft/WindowsAgentArena**|\n", "2409.08250": 
"|**2024-09-12**|**OmniQuery: Contextually Augmenting Captured Multimodal Memory to Enable Personal Question Answering**|Jiahao Nick Li et.al.|[2409.08250](http://arxiv.org/abs/2409.08250)|null|\u4eba\u4eec\u5e38\u901a\u8fc7\u7167\u7247\u3001\u5c4f\u5e55\u622a\u56fe\u548c\u89c6\u9891\u6765\u6355\u6349\u8bb0\u5fc6\u3002\u73b0\u6709\u7684\u57fa\u4e8eAI\u7684\u5de5\u5177\u80fd\u591f\u4f7f\u7528\u81ea\u7136\u8bed\u8a00\u68c0\u7d22\u8fd9\u4e9b\u6570\u636e\uff0c\u4f46\u4e3b\u8981\u5c40\u9650\u4e8e\u68c0\u7d22\u50cf\u7167\u7247\u4e2d\u7684\u7279\u5b9a\u7269\u4f53\u8fd9\u6837\u7684\u5355\u4e00\u4fe1\u606f\uff0c\u96be\u4ee5\u5904\u7406\u6d89\u53ca\u7406\u89e3\u76f8\u4e92\u5173\u8054\u8bb0\u5fc6\uff08\u5982\u4e8b\u4ef6\u5e8f\u5217\uff09\u7684\u66f4\u590d\u6742\u67e5\u8be2\u3002\u6211\u4eec\u8fdb\u884c\u4e86\u4e00\u9879\u4e3a\u671f\u4e00\u4e2a\u6708\u7684\u65e5\u5fd7\u7814\u7a76\uff0c\u6536\u96c6\u4e86\u73b0\u5b9e\u7528\u6237\u67e5\u8be2\uff0c\u5e76\u751f\u6210\u4e86\u4e00\u4e2a\u96c6\u6210\u4e0e\u6355\u83b7\u8bb0\u5fc6\u76f8\u5173\u5fc5\u8981\u4e0a\u4e0b\u6587\u4fe1\u606f\u7684\u5206\u7c7b\u4f53\u7cfb\u3002\u968f\u540e\uff0c\u6211\u4eec\u5f15\u5165\u4e86OmniQuery\uff0c\u8fd9\u662f\u4e00\u79cd\u80fd\u591f\u56de\u7b54\u9700\u8981\u63d0\u53d6\u548c\u63a8\u65ad\u591a\u5c42\u4e0a\u4e0b\u6587\u4fe1\u606f\u4ee5\u6574\u5408\u76f8\u4e92\u5173\u8054\u8bb0\u5fc6\u7684\u590d\u6742\u4e2a\u4eba\u8bb0\u5fc6\u76f8\u5173\u95ee\u9898\u7684\u65b0\u578b\u7cfb\u7edf\u3002OmniQuery\u901a\u8fc7\u4ece\u591a\u4e2a\u76f8\u4e92\u5173\u8054\u7684\u8bb0\u5fc6\u4e2d\u96c6\u6210\u5206\u6563\u7684\u4e0a\u4e0b\u6587\u4fe1\u606f\u6765\u589e\u5f3a\u5355\u4e2a\u6355\u83b7\u7684\u8bb0\u5fc6\uff0c\u68c0\u7d22\u76f8\u5173\u8bb0\u5fc6\uff0c\u5e76\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u63d0\u4f9b\u5168\u9762\u7684\u7b54\u6848\u3002\u5728\u4eba\u7c7b\u8bc4\u4f30\u4e2d\uff0c\u6211\u4eec\u5c55\u793a\u4e86OmniQuery\u7684\u6709\u6548\u6027\uff0c\u51c6\u786e\u7387\u8fbe\u523071.5%\uff0c\u5e76\u4e1
4\u5b83\u572874.5%\u7684\u65f6\u95f4\u91cc\u8d85\u8d8a\u4e86\u4f20\u7edf\u7684RAG\u7cfb\u7edf\uff0c\u5728\u67d0\u4e9b\u4efb\u52a1\u4e0a\u751a\u81f3\u53d6\u5f97\u4e86\u80dc\u5229\u6216\u5e76\u5217\u7b2c\u4e00\u7684\u6210\u7ee9\u3002|\n", "2409.08239": "|**2024-09-12**|**Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources**|Alisia Lupidi et.al.|[2409.08239](http://arxiv.org/abs/2409.08239)|null|\u5728\u9762\u5bf9\u4f9d\u8d56\u7ed3\u6784\u5316\u6570\u636e\u3001\u590d\u6742\u63a8\u7406\u6216\u5de5\u5177\u4f7f\u7528\u7684\u6311\u6218\u6027\u573a\u666f\u65f6\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4ecd\u7136\u5b58\u5728\u56f0\u96be\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aSource2Synth\u7684\u65b0\u65b9\u6cd5\uff0c\u5b83\u65e0\u9700\u6602\u8d35\u7684\u4eba\u7c7b\u6807\u6ce8\u5373\u53ef\u7528\u4e8e\u6559\u6388LLMs\u65b0\u6280\u80fd\u3002Source2Synth\u63a5\u53d7\u81ea\u5b9a\u4e49\u6570\u636e\u6e90\u4f5c\u4e3a\u8f93\u5165\uff0c\u5e76\u751f\u6210\u5177\u6709\u57fa\u4e8e\u73b0\u5b9e\u4e16\u754c\u6765\u6e90\u7684\u4e2d\u95f4\u63a8\u7406\u6b65\u9aa4\u7684\u5408\u6210\u6570\u636e\u70b9\u3002\u8be5\u65b9\u6cd5\u901a\u8fc7\u6839\u636e\u5176\u53ef\u56de\u7b54\u6027\u4e22\u5f03\u4f4e\u8d28\u91cf\u751f\u6210\u6765\u63d0\u9ad8\u6570\u636e\u96c6\u8d28\u91cf\u3002\u6211\u4eec\u901a\u8fc7\u5728\u4e24\u4e2a\u5177\u6709\u6311\u6218\u6027\u7684\u9886\u57df\u4e2d\u5e94\u7528\u6b64\u65b9\u6cd5\u6765\u5c55\u793a\u5176\u901a\u7528\u6027\uff1a\u5728\u591a\u8df3\u95ee\u9898\u56de\u7b54\uff08MHQA\uff09\u4e2d\u6d4b\u8bd5\u63a8\u7406\u80fd\u529b\uff0c\u5728\u8868\u683c\u578b\u95ee\u9898\u56de\u7b54\uff08TQA\uff09\u4e2d\u6d4b\u8bd5\u5de5\u5177\u4f7f\u7528\u3002\u4e0e\u7ecf\u8fc7\u5fae\u8c03\u7684\u57fa\u672c\u6a21\u578b\u76f8\u6bd4\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728WikiSQL\u4e0a\u7684TQA\u4e0a\u63d0\u9ad8\u4e8625.51%\uff0c\u5728HotPotQA\u4e0a\u7684MHQA\u4e0a\u63d0\u9ad8\u4e8622.57%\u7684\u6027\u80fd\u3002|\n", "2409.08234": 
"|**2024-09-12**|**LLM Honeypot: Leveraging Large Language Models as Advanced Interactive Honeypot Systems**|Hakan T. Otal et.al.|[2409.08234](http://arxiv.org/abs/2409.08234)|**[link](https://github.com/ai-in-complex-systems-lab/llm-honeypot)**|**\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u521b\u65b0\u65b9\u6cd5\uff0c\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u6784\u5efa\u771f\u5b9e\u4e14\u4e92\u52a8\u7684\u871c\u7f50\u7cfb\u7edf\u3002\u901a\u8fc7\u5728\u5305\u542b\u653b\u51fb\u8005\u751f\u6210\u547d\u4ee4\u548c\u54cd\u5e94\u7684\u591a\u6837\u5316\u6570\u636e\u96c6\u4e0a\u5bf9\u5f00\u6e90\u9884\u8bad\u7ec3\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u5fae\u8c03\uff0c\u6211\u4eec\u5f00\u53d1\u51fa\u4e00\u79cd\u80fd\u591f\u4e0e\u653b\u51fb\u8005\u8fdb\u884c\u9ad8\u7ea7\u4ea4\u4e92\u7684\u871c\u7f50\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u6d89\u53ca\u5173\u952e\u6b65\u9aa4\uff1a\u6570\u636e\u6536\u96c6\u4e0e\u5904\u7406\u3001\u63d0\u793a\u5de5\u7a0b\u3001\u6a21\u578b\u9009\u62e9\u4ee5\u53ca\u76d1\u7763\u5f0f\u5fae\u8c03\uff0c\u4ee5\u4f18\u5316\u6a21\u578b\u6027\u80fd\u3002\u901a\u8fc7\u76f8\u4f3c\u6027\u6307\u6807\u8bc4\u4f30\u4e0e\u73b0\u573a\u90e8\u7f72\uff0c\u7ed3\u679c\u663e\u793a\u6211\u4eec\u7684\u65b9\u6cd5\u80fd\u591f\u751f\u6210\u51c6\u786e\u4e14\u4fe1\u606f\u4e30\u5bcc\u7684\u54cd\u5e94\u3002\u7814\u7a76\u7ed3\u679c\u5f3a\u8c03\u4e86LLMs\u5728\u91cd\u5851\u871c\u7f50\u6280\u672f\u65b9\u9762\u7684\u6f5c\u529b\uff0c\u4e3a\u7f51\u7edc\u5b89\u5168\u4e13\u4e1a\u4eba\u5458\u63d0\u4f9b\u4e86\u4e00\u4e2a\u5f3a\u5927\u7684\u5de5\u5177\u6765\u68c0\u6d4b\u548c\u5206\u6790\u6076\u610f\u6d3b\u52a8\uff0c\u4ece\u800c\u589e\u5f3a\u6574\u4f53\u5b89\u5168\u67b6\u6784\u3002**|\n", "2409.08202": "|**2024-09-12**|**What Makes a Maze Look Like a Maze?**|Joy Hsu 
et.al.|[2409.08202](http://arxiv.org/abs/2409.08202)|null|\u4eba\u7c7b\u89c6\u89c9\u7406\u89e3\u7684\u72ec\u7279\u4e4b\u5904\u5728\u4e8e\u80fd\u591f\u7075\u6d3b\u5730\u89e3\u91ca\u62bd\u8c61\u6982\u5ff5\u7684\u80fd\u529b\uff1a\u83b7\u53d6\u63d0\u5347\u89c4\u5219\u6765\u89e3\u91ca\u5b83\u4eec\u6240\u8c61\u5f81\u7684\u542b\u4e49\uff0c\u5728\u719f\u6089\u548c\u4e0d\u719f\u6089\u7684\u4e0a\u4e0b\u6587\u4e2d\u951a\u5b9a\u5b83\u4eec\uff0c\u5e76\u5bf9\u5b83\u4eec\u8fdb\u884c\u9884\u6d4b\u6216\u63a8\u7406\u3002\u5c3d\u7ba1\u73b0\u6210\u7684\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\u5728\u8bc6\u522b\u56fe\u50cf\u4e2d\u7684\u5177\u4f53\u5bf9\u8c61\u7c7b\u522b\uff08\u5982\u6811\u679d\uff09\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5b83\u4eec\u4ecd\u7136\u96be\u4ee5\u7406\u89e3\u8fd9\u6837\u7684\u89c6\u89c9\u62bd\u8c61\uff08\u4f8b\u5982\uff0c\u4e00\u7ec4\u6811\u679d\u5982\u4f55\u5f62\u6210\u8ff7\u5bab\u7684\u5899\u58c1\uff09\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u6df1\u5ea6\u67b6\u6784\u63a5\u5730\uff08DSG\uff09\uff0c\u8fd9\u662f\u4e00\u4e2a\u5229\u7528\u660e\u786e\u7684\u7ed3\u6784\u5316\u8868\u793a\u6cd5\u6765\u951a\u5b9a\u548c\u63a8\u7406\u89c6\u89c9\u62bd\u8c61\u7684\u6846\u67b6\u3002DSG\u7684\u6838\u5fc3\u662f\u67b6\u6784\u2014\u2014\u5206\u89e3\u62bd\u8c61\u6982\u5ff5\u7684\u4f9d\u8d56\u56fe\u5f62\u63cf\u8ff0\uff0c\u5c06\u5176\u5206\u89e3\u4e3a\u66f4\u57fa\u672c\u7684\u7b26\u53f7\u3002DSG\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u63d0\u53d6\u67b6\u6784\uff0c\u7136\u540e\u901a\u8fc7\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\u5206\u5c42\u5730\u5c06\u67b6\u6784\u4e2d\u7684\u5177\u4f53\u5230\u62bd\u8c61\u7ec4\u4ef6\u951a\u5b9a\u5230\u56fe\u50cf\u4e0a\u3002\u951a\u5b9a\u540e\u7684\u67b6\u6784\u7528\u4e8e\u589e\u5f3a\u5bf9\u89c6\u89c9\u62bd\u8c61\u7684\u7406\u89e3\u3002\u6211\u4eec\u7cfb\u7edf\u5730\u8bc4\u4f30\u4e86DSG\u53ca\u5176\u4e0d\u540c\u7684\u65b9\u6cd5\u5728\u6211\u4eec\u65b0\u521b\u5efa\u7684\u89c6\u89c9\u62bd\u8c
61\u6570\u636e\u96c6\u4e0a\u7684\u63a8\u7406\u6027\u80fd\uff0c\u8be5\u6570\u636e\u96c6\u7531\u4eba\u7c7b\u6807\u6ce8\u7684\u771f\u5b9e\u4e16\u754c\u56fe\u50cf\u548c\u76f8\u5e94\u7684\u95ee\u7b54\u5bf9\u7ec4\u6210\u3002\u6211\u4eec\u5c55\u793a\u4e86DSG\u663e\u8457\u63d0\u9ad8\u4e86\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\u5728\u62bd\u8c61\u89c6\u89c9\u63a8\u7406\u65b9\u9762\u7684\u8868\u73b0\uff0c\u5e76\u671d\u7740\u4e0e\u4eba\u7c7b\u4e00\u81f4\u7684\u89c6\u89c9\u62bd\u8c61\u7406\u89e3\u8fc8\u8fdb\u4e86\u4e00\u6b65\u3002|\n", "2409.08185": "|**2024-09-12**|**Fine-tuning Large Language Models for Entity Matching**|Aaron Steiner et.al.|[2409.08185](http://arxiv.org/abs/2409.08185)|**[link](https://github.com/wbsg-uni-mannheim/tailormatch)**|**\u672c\u6587\u63a2\u8ba8\u4e86\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u8fdb\u884c\u5b9e\u4f53\u5339\u914d\u7684\u6f5c\u529b\uff0c\u7279\u522b\u662f\u901a\u8fc7\u5fae\u8c03\u3002\u5df2\u6709\u7814\u7a76\u4e3b\u8981\u96c6\u4e2d\u5728\u63d0\u793a\u5de5\u7a0b\u548c\u57fa\u4e8e\u4e0a\u4e0b\u6587\u7684\u5b66\u4e60\u4e0a\u3002\u672c\u6587\u4ece\u4e24\u4e2a\u7ef4\u5ea6\u5206\u6790\u4e86\u5fae\u8c03\u7684\u53ef\u884c\u6027\uff1a1\uff09\u8bad\u7ec3\u793a\u4f8b\u7684\u8868\u793a\u65b9\u5f0f\uff0c\u5b9e\u9a8c\u6d89\u53ca\u5728\u8bad\u7ec3\u96c6\u4e2d\u6dfb\u52a0\u4e0d\u540c\u7c7b\u578b\u7684LLM\u751f\u6210\u89e3\u91ca\uff1b2\uff09\u4f7f\u7528LLM\u9009\u62e9\u548c\u751f\u6210\u8bad\u7ec3\u793a\u4f8b\u3002\u6211\u4eec\u4e0d\u4ec5\u5173\u6ce8\u6e90\u6570\u636e\u96c6\u4e0a\u7684\u5339\u914d\u6027\u80fd\uff0c\u8fd8\u7814\u7a76\u4e86\u5fae\u8c03\u5bf9\u6a21\u578b\u5728\u540c\u57df\u6570\u636e\u96c6\u4ee5\u53ca\u8de8\u9886\u57df\u6570\u636e\u96c6\u4e0a\u7684\u6cdb\u5316\u80fd\u529b\u7684\u5f71\u54cd\u3002 
\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u5fae\u8c03\u663e\u8457\u63d0\u5347\u4e86\u5c0f\u578b\u6a21\u578b\u7684\u6027\u80fd\uff0c\u800c\u5927\u578b\u6a21\u578b\u7684\u8868\u73b0\u5219\u53c2\u5dee\u4e0d\u9f50\u3002\u5fae\u8c03\u5728\u63d0\u5347\u540c\u57df\u6570\u636e\u96c6\u7684\u6cdb\u5316\u80fd\u529b\u7684\u540c\u65f6\uff0c\u4e5f\u5f71\u54cd\u4e86\u8de8\u57df\u8fc1\u79fb\u7684\u80fd\u529b\u3002\u6211\u4eec\u53d1\u73b0\uff0c\u5411\u8bad\u7ec3\u96c6\u6dfb\u52a0\u7ed3\u6784\u5316\u7684\u89e3\u91ca\u5bf9\u56db\u79cdLLM\u4e2d\u7684\u4e09\u79cd\u6709\u6b63\u9762\u5f71\u54cd\uff0c\u800c\u63d0\u51fa\u7684\u793a\u4f8b\u9009\u62e9\u548c\u751f\u6210\u65b9\u6cd5\u4ec5\u63d0\u5347\u4e86Llama 3.1 8B\u7684\u6027\u80fd\uff0c\u540c\u65f6\u964d\u4f4e\u4e86GPT-4o Mini\u7684\u6027\u80fd\u3002**|\n", "2409.08148": "|**2024-09-12**|**Faster Speech-LLaMA Inference with Multi-token Prediction**|Desh Raj et.al.|[2409.08148](http://arxiv.org/abs/2409.08148)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u89e3\u51b3\u5404\u79cd\u4efb\u52a1\u4e0a\u53d8\u5f97\u6781\u4e3a\u719f\u7ec3\uff0c\u5305\u62ec\u6d89\u53ca\u591a\u6a21\u6001\u8f93\u5165\u7684\u4efb\u52a1\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u901a\u8fc7\u4f7f\u7528\u8bed\u97f3\u7f16\u7801\u5668\u5b9e\u4f8b\u5316LLM\uff08\u4f8b\u5982LLaMA\uff09\u5e76\u5229\u7528\u914d\u5bf9\u6570\u636e\u5bf9\u5176\u8fdb\u884c\u8bad\u7ec3\uff0c\u53ef\u4ee5\u8d4b\u4e88\u53ea\u89e3\u7801\u7684\u6a21\u578b\u8bed\u97f3\u8bc6\u522b\uff08ASR\uff09\u80fd\u529b\uff0c\u56e0\u6b64\u79f0\u4e4b\u4e3aSpeech-LLaMA\u3002\u7136\u800c\uff0c\u7531\u4e8e\u81ea\u56de\u5f52\u63a8\u7406\u7684\u987a\u5e8f\u6027\u8d28\u4ee5\u53ca\u76f8\u5bf9\u8f83\u5927\u7684\u89e3\u7801\u5668\uff0cSpeech-LLaMA\u6a21\u578b\u7684\u63a8\u7406\u65f6\u95f4\u76f8\u5bf9\u8f83\u9ad8\u3002\u672c\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u901a\u8fc7\u5728\u540c\u4e00\u89e3\u7801\u6b65\u9aa4\u4e2d\u9884\u6d4b\u591a\u4e2a\u4ee4\u724c\u6765\u52a0\u901fSpeech-LLaMA\u7684\u63a8\u7406\u3
002\u6211\u4eec\u63a2\u7d22\u4e86\u51e0\u4e2a\u80fd\u591f\u5b9e\u73b0\u8fd9\u4e00\u76ee\u6807\u7684\u6a21\u578b\u67b6\u6784\uff0c\u5e76\u901a\u8fc7\u9608\u503c\u63a8\u7406\u548c\u9a8c\u8bc1\u63a8\u7406\u7b56\u7565\u6765\u8bc4\u4f30\u5b83\u4eec\u7684\u6027\u80fd\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86\u4e00\u4e2a\u57fa\u4e8e\u524d\u7f00\u7684\u675f\u641c\u7d22\u89e3\u7801\u65b9\u6cd5\uff0c\u5141\u8bb8\u6b64\u7c7b\u6a21\u578b\u8fdb\u884c\u9ad8\u6548\u7684\u6700\u5c0f\u8bcd\u9519\u8bef\u7387\uff08MWER\uff09\u8bad\u7ec3\u3002\u6211\u4eec\u5728\u591a\u79cd\u516c\u5171\u57fa\u51c6\u4e0a\u8bc4\u4f30\u4e86\u6211\u4eec\u7684\u6a21\u578b\uff0c\u7ed3\u679c\u663e\u793a\u5b83\u4eec\u5c06\u89e3\u7801\u8c03\u7528\u7684\u6570\u91cf\u51cf\u5c11\u4e86\u7ea63.2\u500d\uff0c\u540c\u65f6\u4fdd\u6301\u6216\u63d0\u9ad8\u4e86WER\u6027\u80fd\u3002|\n", "2409.08147": "|**2024-09-12**|**LLM-POTUS Score: A Framework of Analyzing Presidential Debates with Large Language Models**|Zhengliang Liu et.al.|[2409.08147](http://arxiv.org/abs/2409.08147)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u6765\u8bc4\u4f30\u603b\u7edf\u8fa9\u8bba\u8868\u73b0\u7684\u65b0\u65b9\u6cd5\uff0c\u65e8\u5728\u89e3\u51b3\u957f\u671f\u5b58\u5728\u7684\u5ba2\u89c2\u8bc4\u4f30\u8fa9\u8bba\u7ed3\u679c\u7684\u6311\u6218\u3002\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u6846\u67b6\uff0c\u4ece\u201c\u653f\u7b56\u3001\u4e2a\u6027\u4e0e\u89c6\u89d2\u201d\uff083P\uff09\u548c\u201c\u5174\u8da3\u3001\u610f\u8bc6\u5f62\u6001\u4e0e\u8eab\u4efd\u8ba4\u540c\u201d\uff083I\uff09\u7684\u89d2\u5ea6\u5206\u6790\u56db\u4f4d\u5173\u952e\u53d7\u4f17\u7fa4\u4f53\uff1a\u9009\u6c11\u3001\u4f01\u4e1a\u3001\u6350\u8d60\u8005\u53ca\u653f\u5ba2\u5bf9\u5019\u9009\u4eba\u7684\u5171\u9e23\u3002\u8be5\u65b9\u6cd5\u901a\u8fc7\u751f\u6210\u201cLLM-POTUS\u8bc4\u5206\u201d\uff0c\u5373\u57fa\u4e8e3P\u4e0e3I\u4e4b\u95f4\u4e00\u81f4\u6027\u5ea6\u91cf\u7684\u91cf\u5316\u6307\u680
7\uff0c\u6765\u8bc4\u4ef7\u8fa9\u8bba\u8868\u73b0\u3002\u6211\u4eec\u5e94\u7528\u6b64\u6846\u67b6\u5bf9\u8fd1\u671f\u7f8e\u56fd\u603b\u7edf\u8fa9\u8bba\u7684\u6587\u672c\u8fdb\u884c\u5206\u6790\uff0c\u63ed\u793a\u4e86\u4e0d\u540c\u8fa9\u8bba\u7b56\u7565\u7684\u6709\u6548\u6027\u53ca\u5176\u5bf9\u4e0d\u540c\u53d7\u4f17\u7fa4\u4f53\u7684\u5f71\u54cd\u3002\u7814\u7a76\u4e0d\u4ec5\u63d0\u4f9b\u4e86\u4e00\u4e2a\u65b0\u7684\u653f\u6cbb\u5206\u6790\u5de5\u5177\uff0c\u8fd8\u63a2\u7d22\u4e86\u5728\u590d\u6742\u793e\u4f1a\u80cc\u666f\u4e0b\u4f7f\u7528LLM\u4f5c\u4e3a\u516c\u6b63\u8bc4\u5224\u8005\u7684\u6f5c\u529b\u4e0e\u5c40\u9650\u6027\u3002\u6b64\u5916\uff0c\u8be5\u6846\u67b6\u4e3a\u4e2a\u4eba\u516c\u6c11\u63d0\u4f9b\u4e86\u4e00\u4e2a\u72ec\u7acb\u7684\u5de5\u5177\uff0c\u7528\u4e8e\u8bc4\u4f30\u603b\u7edf\u8fa9\u8bba\u7684\u8868\u73b0\uff0c\u4ece\u800c\u589e\u5f3a\u6c11\u4e3b\u53c2\u4e0e\u5ea6\uff0c\u51cf\u5c11\u5bf9\u53ef\u80fd\u504f\u89c1\u7684\u5a92\u4f53\u89e3\u8bfb\u548c\u673a\u6784\u5f71\u54cd\u529b\u7684\u4f9d\u8d56\uff0c\u8fdb\u800c\u52a0\u5f3a\u77e5\u60c5\u516c\u6c11\u53c2\u4e0e\u7684\u57fa\u7840\u3002|\n", "2409.08098": "|**2024-09-12**|**The CLC-UKET Dataset: Benchmarking Case Outcome Prediction for the UK Employment Tribunal**|Huiyuan Xie 
et.al.|[2409.08098](http://arxiv.org/abs/2409.08098)|null|\u672c\u6587\u7814\u7a76\u4e86\u6280\u672f\u9769\u65b0\u4e0e\u83b7\u53d6\u516c\u6b63\u4e4b\u95f4\u7684\u4ea4\u6c47\u70b9\uff0c\u901a\u8fc7\u5728\u82f1\u56fd\u5c31\u4e1a\u6cd5\u5ead\uff08UKET\uff09\u6784\u5efa\u9884\u6d4b\u6848\u4f8b\u7ed3\u679c\u7684\u57fa\u51c6\u3002\u4e3a\u4e86\u5e94\u5bf9\u5927\u91cf\u4eba\u5de5\u6ce8\u91ca\u7684\u6311\u6218\uff0c\u8be5\u7814\u7a76\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u8fdb\u884c\u81ea\u52a8\u6ce8\u91ca\uff0c\u4ece\u800c\u521b\u5efa\u4e86CLC-UKET\u6570\u636e\u96c6\u3002\u8be5\u6570\u636e\u96c6\u5305\u542b\u7ea619,000\u4e2aUKET\u6848\u4f8b\u53ca\u5176\u5143\u6570\u636e\u3002\u5168\u9762\u7684\u6cd5\u5f8b\u6ce8\u91ca\u6db5\u76d6\u4e86\u4e8b\u5b9e\u3001\u4e3b\u5f20\u3001\u5148\u4f8b\u5f15\u7528\u3001\u6cd5\u89c4\u5f15\u7528\u3001\u6848\u4f8b\u7ed3\u679c\u3001\u7406\u7531\u548c\u7ba1\u8f96\u6743\u4ee3\u7801\u3002\u501f\u52a9CLC-UKET\u6570\u636e\uff0c\u6211\u4eec\u5bf9UKET\u7684\u591a\u7c7b\u6848\u4f8b\u7ed3\u679c\u9884\u6d4b\u4efb\u52a1\u8fdb\u884c\u4e86\u7814\u7a76\u3002\u6536\u96c6\u4e86\u4eba\u7c7b\u9884\u6d4b\u4ee5\u5efa\u7acb\u6a21\u578b\u6bd4\u8f83\u7684\u6027\u80fd\u53c2\u8003\u3002\u4ece\u57fa\u7840\u6a21\u578b\u7684\u5b9e\u8bc1\u7ed3\u679c\u6765\u770b\uff0c\u5fae\u8c03\u7684\u8f6c\u6362\u5668\u6a21\u578b\u5728UKET\u9884\u6d4b\u4efb\u52a1\u4e0a\u4f18\u4e8e\u96f6\u6b21\u548c\u5c11\u91cf\u6837\u672c\u7684LLM\u3002\u96f6\u6b21LLM\u7684\u6027\u80fd\u53ef\u4ee5\u901a\u8fc7\u6574\u5408\u4e0e\u4efb\u52a1\u76f8\u5173\u7684\u4fe1\u606f\u6765\u589e\u5f3a\uff0c\u878d\u5165\u5c11\u91cf\u6837\u672c\u793a\u4f8b\u4e2d\u3002\u6211\u4eec\u5e0c\u671bCLC-UKET\u6570\u636e\u96c6\u3001\u4eba\u7c7b\u6ce8\u91ca\u4ee5\u53ca\u5b9e\u8bc1\u53d1\u73b0\u80fd\u591f\u4f5c\u4e3a\u5c31\u4e1a\u76f8\u5173\u7ea0\u7eb7\u89e3\u51b3\u7684\u5b9d\u8d35\u57fa\u51c6\u3002|\n", "2409.08087": "|**2024-09-12**|**Securing Large Language Models: Addressing Bias, Misinformation, and Prompt 
Attacks**|Benji Peng et.al.|[2409.08087](http://arxiv.org/abs/2409.08087)|null|\u672c\u6587\u7efc\u8ff0\u4e86\u8fd1\u5e74\u6765\u6709\u5173\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5b89\u5168\u6027\u7684\u5173\u952e\u95ee\u9898\u7684\u7814\u7a76\u6587\u732e\uff0c\u91cd\u70b9\u662f\u51c6\u786e\u6027\u3001\u504f\u89c1\u3001\u5185\u5bb9\u68c0\u6d4b\u4ee5\u53ca\u5bf9\u6297\u653b\u51fb\u7684\u8106\u5f31\u6027\u3002\u6587\u7ae0\u8be6\u7ec6\u8ba8\u8bba\u4e86LLM\u8f93\u51fa\u53ef\u80fd\u4e0d\u51c6\u786e\u6216\u8bef\u5bfc\u6027\u7684\u95ee\u9898\uff0c\u5e76\u5f3a\u8c03\u4e86\u901a\u8fc7\u4e8b\u5b9e\u6838\u67e5\u65b9\u6cd5\u589e\u5f3a\u54cd\u5e94\u53ef\u9760\u6027\u7684\u5b9e\u65bd\u7b56\u7565\u3002\u6587\u7ae0\u6df1\u5165\u63a2\u8ba8\u4e86\u5185\u5d4c\u4e8eLLM\u4e2d\u7684\u56fa\u6709\u504f\u89c1\uff0c\u901a\u8fc7\u591a\u6837\u5316\u7684\u8bc4\u4f30\u6280\u672f\uff0c\u5982\u63a7\u5236\u8f93\u5165\u7814\u7a76\u548c\u7ea2\u961f\u6f14\u7ec3\uff0c\u5bf9\u5176\u8fdb\u884c\u6279\u5224\u6027\u5ba1\u89c6\u3002\u63d0\u51fa\u4e86\u5168\u9762\u7684\u504f\u89c1\u7f13\u89e3\u7b56\u7565\u5206\u6790\uff0c\u5305\u62ec\u4ece\u9884\u5904\u7406\u5e72\u9884\u5230\u8bad\u7ec3\u671f\u95f4\u8c03\u6574\u548c\u540e\u5904\u7406\u6539\u8fdb\u7684\u5404\u79cd\u65b9\u6cd5\u3002\u6b64\u5916\uff0c\u6587\u7ae0\u8fd8\u63a2\u7a76\u4e86\u533a\u5206LLM\u751f\u6210\u5185\u5bb9\u4e0e\u4eba\u7c7b\u521b\u4f5c\u6587\u672c\u7684\u590d\u6742\u6027\uff0c\u5f15\u5165\u4e86\u8bf8\u5982DetectGPT\u7684\u68c0\u6d4b\u673a\u5236\u4ee5\u53ca\u6c34\u5370\u6280\u672f\uff0c\u540c\u65f6\u6307\u51fa\u5728\u590d\u6742\u60c5\u51b5\u4e0b\u57fa\u4e8e\u673a\u5668\u5b66\u4e60\u7684\u5206\u7c7b\u5668\u5b58\u5728\u5c40\u9650\u6027\u3002\u6587\u7ae0\u8fd8\u5206\u6790\u4e86LLM\u7684\u6f0f\u6d1e\uff0c\u5305\u62ec\u9003\u9038\u653b\u51fb\u548c\u63d0\u793a\u6ce8\u5165\u653b\u51fb\uff0c\u901a\u8fc7\u6848\u4f8b\u7814\u7a76\u548c\u5927\u89c4\u6a21\u7ade\u8d5bHackAPrompt\u7b49\u8fdb\u884c\u4e86\u6df1\u5165\u63a2\u8ba8\u3002\u6700\u540e\u
ff0c\u6587\u7ae0\u56de\u987e\u4e86\u4fdd\u62a4LLM\u7684\u9632\u5fa1\u63aa\u65bd\uff0c\u5f3a\u8c03\u4e86\u9700\u8981\u5bf9LLM\u5b89\u5168\u6027\u9886\u57df\u8fdb\u884c\u66f4\u6df1\u5165\u7814\u7a76\u7684\u91cd\u8981\u6027\u3002|\n", "2409.09030": "|**2024-09-13**|**Agents in Software Engineering: Survey, Landscape, and Vision**|Yanxian Huang et.al.|[2409.09030](http://arxiv.org/abs/2409.09030)|**[link](https://github.com/deepsoftwareanalytics/awesome-agent4se)**|**\u8fd1\u5e74\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5404\u79cd\u4e0b\u6e38\u4efb\u52a1\u4e2d\u53d6\u5f97\u4e86\u663e\u8457\u6210\u529f\uff0c\u5c24\u5176\u662f\u5728\u8f6f\u4ef6\u5de5\u7a0b\uff08SE\uff09\u9886\u57df\u4e2d\u7684\u4efb\u52a1\u3002\u6211\u4eec\u6ce8\u610f\u5230\uff0c\u8bb8\u591a\u5c06LLMs\u4e0eSE\u7ed3\u5408\u7684\u7814\u7a76\u5de5\u4f5c\u660e\u786e\u6216\u9690\u542b\u5730\u91c7\u7528\u4e86\u4ee3\u7406\u7684\u6982\u5ff5\u3002\u7136\u800c\uff0c\u7f3a\u4e4f\u5bf9\u73b0\u6709\u5de5\u4f5c\u53d1\u5c55\u80cc\u666f\u7684\u6df1\u5165\u7efc\u8ff0\u3001\u5206\u6790\u5b83\u4eec\u5982\u4f55\u7ed3\u5408\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u6280\u672f\u4f18\u5316\u5404\u79cd\u4efb\u52a1\u4ee5\u53ca\u6f84\u6e05SE\u4e2d\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u6846\u67b6\u3002\u672c\u6587\u65e8\u5728\u8fdb\u884c\u9996\u6b21\u5173\u4e8e\u7ed3\u5408LLMs\u4e0eSE\u7684\u7814\u7a76\u7efc\u8ff0\uff0c\u5e76\u63d0\u51faSE\u4e2d\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u6846\u67b6\uff0c\u5305\u62ec\u4e09\u4e2a\u5173\u952e\u6a21\u5757\uff1a\u611f\u77e5\u3001\u8bb0\u5fc6\u548c\u884c\u52a8\u3002\u540c\u65f6\uff0c\u6211\u4eec\u603b\u7ed3\u4e86\u8fd9\u4e24\u4e2a\u9886\u57df\u7ed3\u5408\u65f6\u9762\u4e34\u7684\u5f53\u524d\u6311\u6218\uff0c\u5e76\u9488\u5bf9\u8fd9\u4e9b\u6311\u6218\u63d0\u51fa\u4e86\u672a\u6765\u7684\u673a\u9047\u3002\u6211\u4eec\u7ef4\u62a4\u4e86\u4e00\u4e2a\u76f8\u5173\u7684\u8bba\u6587GitHub\u4ed3\u5e93\uff0c\u5730\u5740\u4e3a\uff1ahttps://github.com/DeepSoftwareAnalytics/Awesome-Agent4SE\u30
02**|\n", "2409.09010": "|**2024-09-13**|**Contri(e)ve: Context + Retrieve for Scholarly Question Answering**|Kanchan Shivashankar et.al.|[2409.09010](http://arxiv.org/abs/2409.09010)|null|\u5b66\u8005\u4ea4\u6d41\u662f\u4e00\u4e2a\u5feb\u901f\u53d1\u5c55\u7684\u9886\u57df\uff0c\u8574\u542b\u7740\u4e30\u5bcc\u7684\u77e5\u8bc6\u3002\u7136\u800c\uff0c\u7531\u4e8e\u5176\u975e\u7ed3\u6784\u5316\u7684\u6587\u6863\u683c\u5f0f\uff0c\u4f20\u7edf\u7684\u6587\u6863\u68c0\u7d22\u65b9\u6cd5\u96be\u4ee5\u4ece\u4e2d\u63d0\u53d6\u6709\u7528\u4fe1\u606f\u3002\u5b66\u8005\u77e5\u8bc6\u56fe\u8c31\u901a\u8fc7\u6784\u5efa\u4e00\u4e2a\u8bed\u4e49\u7f51\u7edc\u6765\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u63d0\u4f9b\u4e86\u9690\u85cf\u7684\u6d1e\u5bdf\u3001\u6458\u8981\u548c\u6613\u4e8e\u901a\u8fc7\u67e5\u8be2\u83b7\u53d6\u7684\u8bbf\u95ee\u6027\u3002\u81ea\u7136\u5730\uff0c\u5bf9\u5b66\u8005\u56fe\u8c31\u8fdb\u884c\u95ee\u7b54\u6269\u5c55\u4e86\u66f4\u5e7f\u6cdb\u53d7\u4f17\u7684\u53ef\u8bbf\u95ee\u6027\u3002\u4f46\u5728\u8fd9\u4e00\u9886\u57df\u7684\u67d0\u4e9b\u77e5\u8bc6\u4ecd\u7136\u4ee5\u975e\u7ed3\u6784\u5316\u6587\u672c\u5f62\u5f0f\u5448\u73b0\uff0c\u56e0\u6b64\u9700\u8981\u7ed3\u5408\u89e3\u51b3\u65b9\u6848\u6765\u4e3a\u95ee\u7b54\u7cfb\u7edf\u63d0\u4f9b\u652f\u6301\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u4e24\u6b65\u89e3\u51b3\u65b9\u6848\uff0c\u4f7f\u7528\u5f00\u6e90\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\uff1aLlama3.1\u5bf9\u5b66\u8005-QALD\u6570\u636e\u96c6\u8fdb\u884c\u5904\u7406\u3002 \u9996\u5148\uff0c\u6211\u4eec\u4ece\u4e0d\u540c\u7684\u7ed3\u6784\u5316\u548c\u975e\u7ed3\u6784\u5316\u6570\u636e\u6e90\u4e2d\u63d0\u53d6\u4e0e\u95ee\u9898\u76f8\u5173\u7684\u5185\u5bb9\uff1aDBLP\u3001SemOpenAlex\u77e5\u8bc6\u56fe\u8c31\u4ee5\u53ca\u7ef4\u57fa\u767e\u79d1\u6587\u672c\u3002 
\u5176\u6b21\uff0c\u6211\u4eec\u5b9e\u65bd\u4e86\u63d0\u793a\u5de5\u7a0b\uff0c\u4ee5\u63d0\u9ad8\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u4fe1\u606f\u68c0\u7d22\u6027\u80fd\u3002 \u6211\u4eec\u7684\u65b9\u6cd5\u5728F1\u5206\u6570\u4e0a\u53d6\u5f97\u4e8640%\u7684\u6210\u7ee9\uff0c\u5e76\u89c2\u5bdf\u5230\u4e00\u4e9b\u6765\u81eaLLM\u7684\u5f02\u5e38\u54cd\u5e94\uff0c\u8fd9\u4e9b\u54cd\u5e94\u5728\u8bba\u6587\u7684\u6700\u540e\u90e8\u5206\u8fdb\u884c\u4e86\u8ba8\u8bba\u3002|\n", "2409.08963": "|**2024-09-13**|**Safeguarding Decentralized Social Media: LLM Agents for Automating Community Rule Compliance**|Lucio La Cava et.al.|[2409.08963](http://arxiv.org/abs/2409.08963)|null|\u786e\u4fdd\u5185\u5bb9\u7b26\u5408\u793e\u533a\u51c6\u5219\u5bf9\u4e8e\u7ef4\u62a4\u5065\u5eb7\u7684\u5728\u7ebf\u793e\u4ea4\u73af\u5883\u81f3\u5173\u91cd\u8981\u3002\u7136\u800c\uff0c\u4f20\u7edf\u7684\u57fa\u4e8e\u4eba\u7c7b\u7684\u5408\u89c4\u6027\u68c0\u67e5\u5728\u5904\u7406\u7528\u6237\u751f\u6210\u5185\u5bb9\u7684\u4e0d\u65ad\u589e\u957f\u91cf\u548c\u6709\u9650\u7684\u7ba1\u7406\u5458\u6570\u91cf\u65f6\u9762\u4e34\u7740\u6269\u5c55\u96be\u9898\u3002\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u81ea\u7136\u8bed\u8a00\u7406\u89e3\u65b9\u9762\u7684\u65b0\u8fdb\u5c55\uff0c\u4e3a\u81ea\u52a8\u5316\u5185\u5bb9\u5408\u89c4\u6027\u9a8c\u8bc1\u5f00\u8f9f\u4e86\u65b0\u7684\u53ef\u80fd\u6027\u3002\u672c\u6587\u8bc4\u4f30\u4e86\u516d\u4e2a\u4eba\u5de5\u667a\u80fd\u4ee3\u7406\uff0c\u8fd9\u4e9b\u4ee3\u7406\u57fa\u4e8eOpen-LLMs\uff0c\u5728\u53bb\u4e2d\u5fc3\u5316\u793e\u4ea4\u7f51\u7edc\u4e2d\u5bf9\u89c4\u5219\u5408\u89c4\u6027\u8fdb\u884c\u81ea\u52a8\u9a8c\u8bc1\uff0c\u8fd9\u662f\u4e00\u4e2a\u5177\u6709\u6311\u6218\u6027\u7684\u73af\u5883\uff0c\u56e0\u4e3a\u793e\u533a\u7684\u8303\u56f4\u548c\u89c4\u5219\u5404\u4e0d\u76f8\u540c\u3002\u901a\u8fc7\u5bf9\u6765\u81ea\u6570\u767e\u4e2aMastodon\u670d\u52a1\u5668\u7684\u8d85\u8fc750,000\u6761\u5e16\u5b50\u7684\u5206\u6790\uff0c\u6211\u4eec\u53d1\u73b0\u4eba\u5d
e5\u667a\u80fd\u4ee3\u7406\u80fd\u591f\u6709\u6548\u5730\u68c0\u6d4b\u975e\u5408\u89c4\u5185\u5bb9\u3001\u638c\u63e1\u8bed\u8a00\u4e0a\u7684\u7ec6\u5fae\u5dee\u522b\uff0c\u5e76\u9002\u5e94\u4e0d\u540c\u7684\u793e\u533a\u4e0a\u4e0b\u6587\u3002\u5927\u591a\u6570\u4ee3\u7406\u8fd8\u663e\u793a\u51fa\u9ad8\u7684\u4e00\u81f4\u6027\uff0c\u5728\u8bc4\u5206\u89e3\u91ca\u548c\u5408\u89c4\u5efa\u8bae\u4e0a\u4e0e\u4eba\u5de5\u8bc4\u4ef7\u8005\u76f8\u5339\u914d\u3002\u901a\u8fc7\u9886\u57df\u4e13\u5bb6\u7684\u4eba\u5de5\u8bc4\u4f30\uff0c\u786e\u8ba4\u4e86\u4ee3\u7406\u7684\u53ef\u9760\u6027\u548c\u5b9e\u7528\u6027\uff0c\u8fd9\u8868\u660e\u5b83\u4eec\u662f\u534a\u81ea\u52a8\u5316\u6216\u4eba\u673a\u534f\u4f5c\u5185\u5bb9\u7ba1\u7406\u7cfb\u7edf\u7684\u6709\u524d\u666f\u7684\u5de5\u5177\u3002|\n", "2409.08937": "|**2024-09-13**|**Emerging Reliance Behaviors in Human-AI Text Generation: Hallucinations, Data Quality Assessment, and Cognitive Forcing Functions**|Zahra Ashktorab et.al.|[2409.08937](http://arxiv.org/abs/2409.08937)|null|\u672c\u6587\u7814\u7a76\u4e86\u5728\u4eba\u7c7b\u4e0e\u4eba\u5de5\u667a\u80fd\u5408\u4f5c\u8fdb\u884c\u6587\u672c\u751f\u6210\u4efb\u52a1\u65f6\uff0c\u5e7b\u89c9\u548c\u8ba4\u77e5\u9a71\u52a8\u56e0\u7d20\u7684\u5f71\u54cd\uff0c\u7279\u522b\u662f\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u534f\u52a9\u751f\u6210\u9ad8\u8d28\u91cf\u5bf9\u8bdd\u6570\u636e\u3002\u5bf9\u4e8e\u8fd9\u4e9b\u6a21\u578b\u800c\u8a00\uff0c\u9700\u8981\u6570\u636e\u8fdb\u884c\u5fae\u8c03\uff0c\u8fd9\u662f\u63d0\u5347\u5176\u6027\u80fd\u7684\u5173\u952e\u6b65\u9aa4\u3002\u5728\u5ba2\u6237\u670d\u52a1\u5bf9\u8bdd\u4e0a\u4e0b\u6587\u4e2d\uff0c\u6570\u636e\u4ee5\u4eba\u4e0e\u5ba2\u670d\u4ee3\u7406\u4e4b\u95f4\u7684\u5bf9\u8bdd\u5f62\u5f0f\u5b58\u5728\uff0c\u5e76\u53ef\u501f\u52a9AI\u52a9\u624b\u751f\u6210\u3002\u5728\u6211\u4eec\u7684\u7814\u7a76\u4e2d\uff0c\u5171\u62db\u52df\u4e8611\u4f4d\u7528\u6237\uff0c\u6bcf\u4f4d\u7528\u6237\u5b8c\u621
08\u9879\u4efb\u52a1\uff0c\u603b\u5171\u5b8c\u6210\u4e8688\u9879\u4efb\u52a1\u3002\u7ed3\u679c\u53d1\u73b0\uff0c\u5e7b\u89c9\u7684\u5b58\u5728\u5bf9\u6570\u636e\u8d28\u91cf\u4ea7\u751f\u4e86\u8d1f\u9762\u5f71\u54cd\u3002\u6211\u4eec\u8fd8\u53d1\u73b0\uff0c\u5c3d\u7ba1\u8ba4\u77e5\u9a71\u52a8\u56e0\u7d20\u5e76\u975e\u603b\u80fd\u62b5\u6d88\u5e7b\u89c9\u5bf9\u6570\u636e\u8d28\u91cf\u7684\u4e0d\u5229\u5f71\u54cd\uff0c\u4f46\u5e7b\u89c9\u548c\u8ba4\u77e5\u9a71\u52a8\u56e0\u7d20\u5171\u540c\u4f5c\u7528\u4e8e\u6570\u636e\u8d28\u91cf\uff0c\u5e76\u5f71\u54cd\u7528\u6237\u5982\u4f55\u5229\u7528\u5448\u73b0\u7ed9\u4ed6\u4eec\u7684AI\u54cd\u5e94\u3002\u901a\u8fc7\u5206\u6790\u7528\u6237\u884c\u4e3a\uff0c\u6211\u4eec\u63ed\u793a\u4e86\u5bf9AI\u751f\u6210\u54cd\u5e94\u4f9d\u8d56\u7684\u660e\u663e\u6a21\u5f0f\uff0c\u8fd9\u5f3a\u8c03\u4e86\u5728\u5bf9\u8bddAI\u60c5\u5883\u4e0b\u7ba1\u7406\u5e7b\u89c9\u5728AI\u751f\u6210\u5185\u5bb9\u4e2d\u7684\u91cd\u8981\u6027\u3002|\n", "2409.08936": "|**2024-09-13**|**SynSUM -- Synthetic Benchmark with Structured and Unstructured Medical Records**|Paloma Rabaey 
et.al.|[2409.08936](http://arxiv.org/abs/2409.08936)|**[link](https://github.com/prabaey/synsum)**|**\u6211\u4eec\u63d0\u51fa\u4e86SynSUM\u57fa\u51c6\u6570\u636e\u96c6\uff0c\u8fd9\u662f\u4e00\u4e2a\u5408\u6210\u6570\u636e\u96c6\uff0c\u5c06\u975e\u7ed3\u6784\u5316\u7684\u4e34\u5e8a\u8bb0\u5f55\u4e0e\u7ed3\u6784\u5316\u80cc\u666f\u53d8\u91cf\u8054\u7cfb\u8d77\u6765\u3002\u8be5\u6570\u636e\u96c6\u753110,000\u4e2a\u865a\u6784\u7684\u60a3\u8005\u8bb0\u5f55\u7ec4\u6210\uff0c\u5305\u542b\u8868\u683c\u53d8\u91cf\uff08\u5982\u75c7\u72b6\u3001\u8bca\u65ad\u548c\u57fa\u7840\u6761\u4ef6\uff09\u4ee5\u53ca\u4e0e\u4e4b\u76f8\u5173\u7684\u63cf\u8ff0\u865a\u6784\u60a3\u8005\u5c31\u8bca\u60c5\u51b5\u7684\u4e34\u5e8a\u7b14\u8bb0\uff0c\u9886\u57df\u4e3a\u547c\u5438\u75be\u75c5\u3002\u8868\u683c\u90e8\u5206\u7684\u6570\u636e\u901a\u8fc7\u8d1d\u53f6\u65af\u7f51\u7edc\u751f\u6210\uff0c\u5176\u4e2d\u56e0\u679c\u7ed3\u6784\u548c\u6761\u4ef6\u6982\u7387\u7531\u4e13\u5bb6\u57fa\u4e8e\u9886\u57df\u77e5\u8bc6\u63d0\u51fa\u3002\u7136\u540e\uff0c\u6211\u4eec\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08GPT-4o\uff09\u751f\u6210\u4e0e\u60a3\u8005\u5c31\u8bca\u76f8\u5173\u7684\u4e34\u5e8a\u7b14\u8bb0\uff0c\u63cf\u8ff0\u60a3\u8005\u7684\u75c7\u72b6\u548c\u989d\u5916\u7684\u4e0a\u4e0b\u6587\u4fe1\u606f\u3002 
SynSUM\u6570\u636e\u96c6\u4e3b\u8981\u65e8\u5728\u4fc3\u8fdb\u5728\u5b58\u5728\u8868\u683c\u80cc\u666f\u53d8\u91cf\u7684\u60c5\u51b5\u4e0b\u5bf9\u4e34\u5e8a\u4fe1\u606f\u63d0\u53d6\u7684\u7814\u7a76\uff0c\u53ef\u4ee5\u901a\u8fc7\u9886\u57df\u77e5\u8bc6\u5c06\u8fd9\u4e9b\u53d8\u91cf\u94fe\u63a5\u5230\u4ece\u6587\u672c\u4e2d\u63d0\u53d6\u7684\u6982\u5ff5\u5174\u8da3\u70b9\u2014\u2014\u5728SynSUM\u7684\u60c5\u51b5\u4e0b\u662f\u75c7\u72b6\u3002\u6b21\u8981\u7528\u9014\u5305\u62ec\u7814\u7a76\u8868\u683c\u6570\u636e\u548c\u6587\u672c\u7684\u81ea\u52a8\u5316\u4e34\u5e8a\u63a8\u7406\u3001\u5728\u5b58\u5728\u8868\u683c\u548c/\u6216\u6587\u672c\u6df7\u6742\u56e0\u7d20\u60c5\u51b5\u4e0b\u7684\u56e0\u679c\u6548\u5e94\u4f30\u8ba1\u4ee5\u53ca\u591a\u6a21\u6001\u5408\u6210\u6570\u636e\u751f\u6210\u3002 \u8be5\u6570\u636e\u96c6\u53ef\u4ee5\u4ece\u4ee5\u4e0b\u94fe\u63a5\u4e0b\u8f7d\uff1a**|\n", "2409.08931": "|**2024-09-13**|**LLM-based Weak Supervision Framework for Query Intent Classification in Video Search**|Farnoosh Javadi 
et.al.|[2409.08931](http://arxiv.org/abs/2409.08931)|null|\u6d41\u5a92\u4f53\u670d\u52a1\u5df2\u7ecf\u5f7b\u5e95\u6539\u53d8\u4e86\u6211\u4eec\u53d1\u73b0\u548c\u53c2\u4e0e\u6570\u5b57\u5a31\u4e50\u7684\u65b9\u5f0f\u3002\u5c3d\u7ba1\u5982\u6b64\uff0c\u6709\u6548\u7406\u89e3\u7528\u6237\u641c\u7d22\u67e5\u8be2\u7684\u5e7f\u6cdb\u8303\u56f4\u4ecd\u7136\u9762\u4e34\u91cd\u5927\u6311\u6218\u3002\u6784\u5efa\u4e00\u4e2a\u80fd\u591f\u5904\u7406\u4ee3\u8868\u4e0d\u540c\u7528\u6237\u610f\u56fe\u7684\u5404\u79cd\u5b9e\u4f53\u7684\u51c6\u786e\u67e5\u8be2\u7406\u89e3\u7cfb\u7edf\u5bf9\u4e8e\u63d0\u4f9b\u589e\u5f3a\u7684\u7528\u6237\u4f53\u9a8c\u81f3\u5173\u91cd\u8981\u3002\u901a\u8fc7\u8bad\u7ec3\u81ea\u7136\u8bed\u8a00\u7406\u89e3\uff08NLU\uff09\u6a21\u578b\u53ef\u4ee5\u5b9e\u73b0\u8fd9\u4e00\u76ee\u6807\uff0c\u7136\u800c\uff0c\u5728\u8fd9\u4e2a\u4e13\u95e8\u9886\u57df\u7684\u9ad8\u8d28\u91cf\u6807\u6ce8\u6570\u636e\u83b7\u53d6\u662f\u4e00\u4e2a\u5de8\u5927\u7684\u969c\u788d\u3002\u624b\u52a8\u6ce8\u91ca\u6210\u672c\u9ad8\u6602\u4e14\u5728\u6355\u6349\u7528\u6237\u8bcd\u6c47\u53d8\u5f02\u6027\u65b9\u9762\u4e0d\u5207\u5b9e\u9645\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u901a\u8fc7\u5f31\u76d1\u7763\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u81ea\u52a8\u6807\u6ce8\u5927\u91cf\u7528\u6237\u641c\u7d22\u67e5\u8be2\u3002\u901a\u8fc7\u4f7f\u7528\u63d0\u793a\u5de5\u7a0b\u548c\u591a\u6837\u5316\u7684LLM\u89d2\u8272\uff0c\u6211\u4eec\u751f\u6210\u4e86\u4e0e\u4eba\u5de5\u6ce8\u91ca\u8005\u671f\u671b\u76f8\u5339\u914d\u7684\u8bad\u7ec3\u6570\u636e\u3002\u901a\u8fc7\u5f15\u5165\u9886\u57df\u77e5\u8bc6\uff0c\u5229\u7528\u94fe\u5f0f\u601d\u8003\u548c\u4e0a\u4e0b\u6587\u5b66\u4e60\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5229\u7528\u6807\u8bb0\u6570\u636e\u8bad\u7ec3\u4f18\u5316\u7528\u4e8e\u5b9e\u65f6\u63a8\u7406\u7684\u4f4e\u5ef6\u8fdf\u6a21\u578b\u3002\u5e7f\u6cdb\u7684\u8bc4\
u4f30\u663e\u793a\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u53ec\u56de\u7387\u4e0a\u4f18\u4e8e\u57fa\u7ebf\u5e73\u5747\u63d0\u9ad8\u4e86113%\u3002\u6b64\u5916\uff0c\u6211\u4eec\u63d0\u51fa\u7684\u65b0\u578b\u63d0\u793a\u5de5\u7a0b\u6846\u67b6\u4ea7\u751f\u7528\u4e8e\u5f31\u76d1\u7763\u7684\u9ad8\u8d28\u91cfLLM\u751f\u6210\u6570\u636e\uff1b\u4e0e\u4eba\u7c7b\u6ce8\u91ca\u7684F1\u5f97\u5206\u52a0\u6743\u5206\u5e03\u76f8\u6bd4\uff0c\u6211\u4eec\u89c2\u5bdf\u5230\u9884\u6d4b\u548c\u4eba\u7c7b\u6ce8\u89e3\u4e4b\u95f4\u7684\u4e00\u81f4\u6027\u63d0\u9ad8\u4e8647.60%\u3002\u6211\u4eec\u7684\u89d2\u8272\u9009\u62e9\u8def\u7531\u673a\u5236\u8fdb\u4e00\u6b65\u589e\u52a0\u4e863.67%\u7684\u52a0\u6743F1\u5f97\u5206\uff0c\u8fd9\u662f\u5728\u65b0\u578b\u63d0\u793a\u5de5\u7a0b\u6846\u67b6\u57fa\u7840\u4e0a\u7684\u989d\u5916\u6536\u76ca\u3002|\n", "2409.08904": "|**2024-09-13**|**AnyBipe: An End-to-End Framework for Training and Deploying Bipedal Robots Guided by Large Language Models**|Yifei Yao et.al.|[2409.08904](http://arxiv.org/abs/2409.08904)|**[link](https://github.com/sjtu-mvasl-robotics/AnyBipe)**|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u7aef\u5230\u7aef\u7684\u6846\u67b6\uff0c\u7528\u4e8e\u8bad\u7ec3\u548c\u90e8\u7f72\u673a\u5668\u4eba\u5f3a\u5316\u5b66\u4e60\uff08RL\uff09\u7b56\u7565\uff0c\u8be5\u6846\u67b6\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u8fdb\u884c\u5f15\u5bfc\u3002\u8be5\u6846\u67b6\u7531\u4e09\u4e2a\u76f8\u4e92\u8fde\u63a5\u7684\u6a21\u5757\u7ec4\u6210\uff1a\u4e00\u4e2a\u901a\u8fc7LLM\u8bbe\u8ba1\u5956\u52b1\u51fd\u6570\u7684\u6a21\u5757\u3001\u4e00\u4e2a\u5229\u7528\u73b0\u6709\u5de5\u4f5c\u7684RL\u8bad\u7ec3\u6a21\u5757\u4ee5\u53ca\u4e00\u4e2a\u6a21\u62df\u5230\u73b0\u5b9e\uff08sim-to-real\uff09\u540c\u6001\u8bc4\u4f30\u6a21\u5757\u3002\u8fd9\u79cd\u65b9\u6cd5\u663e\u8457\u51cf\u5c11\u4e86\u5bf9\u4eba\u5de5\u5e72\u9884\u7684\u9700\u6c42\uff0c\u4ec5\u9700\u8981\u57fa\u672c\u7684\u6a21\u62df\u548c\u90e8\u7f72\u5e73\u53f0\uff0c\u5e76\u
4e14\u63d0\u4f9b\u4e86\u4eba\u5de5\u5de5\u7a0b\u7b56\u7565\u548c\u5386\u53f2\u6570\u636e\u7684\u6574\u5408\u9009\u9879\u3002\u6211\u4eec\u8be6\u7ec6\u4ecb\u7ecd\u4e86\u8fd9\u4e9b\u6a21\u5757\u7684\u6784\u5efa\u3001\u5b83\u4eec\u76f8\u5bf9\u4e8e\u4f20\u7edf\u65b9\u6cd5\u7684\u4f18\u52bf\uff0c\u4ee5\u53ca\u5c55\u793a\u8be5\u6846\u67b6\u5728\u53cc\u8db3\u673a\u5668\u4eba\u6b65\u6001\u63a7\u5236\u81ea\u4e3b\u5f00\u53d1\u548c\u6539\u8fdb\u80fd\u529b\u7684\u5b9e\u4f8b\uff0c\u8bc1\u660e\u5176\u5728\u4e0d\u9700\u8981\u4eba\u7c7b\u5e72\u9884\u7684\u60c5\u51b5\u4e0b\u64cd\u4f5c\u7684\u53ef\u80fd\u6027\u3002|\n", "2409.08890": "|**2024-09-13**|**A Market for Lemons? Strategic Directions for a Vigilant Application of Artificial Intelligence in Entrepreneurship Research**|Martin Obschonka et.al.|[2409.08890](http://arxiv.org/abs/2409.08890)|null|\u5728\u4eba\u5de5\u667a\u80fd\uff08AI\uff09\u91c7\u7528\u7684\u8fc5\u901f\u589e\u957f\u4ee5\u53ca\u5927\u6570\u636e\u53ef\u7528\u6027\u7684\u80cc\u666f\u4e0b\uff0c\u521b\u4e1a\u5b66\u9886\u57df\u53ef\u80fd\u8fce\u6765\u6709\u53f2\u4ee5\u6765\u6700\u91cd\u5927\u7684\u8f6c\u53d8\u3002\u672c\u6587\u901a\u8fc7\u5f3a\u8c03AI\u9769\u547d\u671f\u95f4\u521b\u4e1a\u7814\u7a76\u4e2d\u6f5c\u5728\u7684\u65e0\u6210\u6548\u77e5\u8bc6\u4ea4\u6d41\u98ce\u9669\uff0c\u505a\u51fa\u4e86\u7d27\u8feb\u7684\u5143\u8d21\u732e\u3002\u5b83\u63d0\u4f9b\u4e86\u7f13\u89e3\u8fd9\u4e00\u98ce\u9669\u7684\u7b56\u7565\uff0c\u5e76\u4e3a\u672a\u6765\u57fa\u4e8eAI\u7684\u7814\u7a76\u63d0\u4f9b\u4e86\u6307\u5bfc\uff0c\u4ee5\u589e\u5f3a\u5176\u96c6\u4f53\u5f71\u54cd\u529b\u548c\u76f8\u5173\u6027\u3002 
\u501f\u9274Akerlof\u8457\u540d\u7684\u201c\u52a3\u8d28\u5546\u54c1\u5e02\u573a\u201d\u6982\u5ff5\uff0c\u6211\u4eec\u8bc6\u522b\u4e86\u7531\u4e8e\u9886\u57df\u6f14\u8fdb\u5230\u5f53\u524d\u73af\u5883\u800c\u53ef\u80fd\u51fa\u73b0\u7684\u91cd\u5927\u77e5\u8bc6\u4e0d\u5bf9\u79f0\u6027\uff0c\u5982\u6784\u9020\u6709\u6548\u6027\u3001\u7406\u8bba\u6784\u5efa\u548c\u7814\u7a76\u76f8\u5173\u6027\u65b9\u9762\u7684\u590d\u6742\u6027\u3002\u8fd9\u4e9b\u4e0d\u5bf9\u79f0\u6027\u7279\u522b\u6df1\u690d\u4e8e\u6240\u8c13\u7684\u53cc\u91cd\u9ed1\u7bb1\u56f0\u5883\u4e2d\uff0c\u5373AI\u65b9\u6cd5\u7684\u5e7f\u6cdb\u8ba4\u53ef\u7684\u9ed1\u7bb1\u6027\u8d28\u4e0e\u7531\u5185\u5728\u4e0d\u786e\u5b9a\u6027\u9a71\u52a8\u7684\u521b\u4e1a\u73b0\u8c61\u7684\u9ed1\u7bb1\u6027\u8d28\u7684\u4ea4\u6c47\u70b9\u3002\u7ed3\u679c\uff0c\u8fd9\u4e9b\u4e0d\u5bf9\u79f0\u53ef\u80fd\u5bfc\u81f4\u4e0d\u53ef\u68c0\u6d4b\u7684\u6b21\u4f18\u7814\u7a76\u4ea7\u54c1\u589e\u52a0\uff0c\u4ece\u800c\u5f62\u6210\u4e00\u4e2a\u635f\u5bb3\u9886\u57df\u798f\u7949\u3001\u58f0\u8a89\u548c\u5f71\u54cd\u529b\u7684\u52a3\u8d28\u5546\u54c1\u5e02\u573a\u3002 \u7136\u800c\uff0c\u91cd\u8981\u7684\u662f\uff0c\u5982\u679c\u80fd\u591f\u7f13\u89e3\u8fd9\u4e9b\u98ce\u9669\uff0cAI\u9769\u547d\u6709\u53ef\u80fd\u9884\u793a\u7740\u521b\u4e1a\u7814\u7a76\u7684\u65b0\u9ec4\u91d1\u65f6\u4ee3\u3002\u6211\u4eec\u8ba8\u8bba\u4e86\u63d0\u5347\u9886\u57df\u81f3\u66f4\u9ad8\u6c34\u5e73\u7684AI\u97e7\u6027\u6240\u9700\u91c7\u53d6\u7684\u884c\u52a8\uff0c\u540c\u65f6\u575a\u5b9a\u5730\u4fdd\u6301\u5176\u57fa\u7840\u539f\u5219\u548c\u6838\u5fc3\u4ef7\u503c\u89c2\u3002|\n", "2409.08864": "|**2024-09-13**|**Exploring Graph Structure Comprehension Ability of Multimodal Large Language Models: Case Studies**|Zhiqiang Zhong 
et.al.|[2409.08864](http://arxiv.org/abs/2409.08864)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u5904\u7406\u5404\u79cd\u6570\u636e\u7ed3\u6784\u65f6\u5c55\u73b0\u4e86\u60ca\u4eba\u7684\u80fd\u529b\uff0c\u5305\u62ec\u56fe\u3002\u5c3d\u7ba1\u5148\u524d\u7684\u7814\u7a76\u96c6\u4e2d\u5728\u5f00\u53d1\u7528\u4e8e\u56fe\u8868\u793a\u7684\u6587\u672c\u7f16\u7801\u65b9\u6cd5\u4e0a\uff0c\u4f46\u591a\u6a21\u6001LLM\u7684\u51fa\u73b0\u4e3a\u7406\u89e3\u56fe\u63d0\u4f9b\u4e86\u4e00\u4e2a\u65b0\u7684\u524d\u6cbf\u3002\u8fd9\u4e9b\u5148\u8fdb\u7684\u6a21\u578b\u80fd\u591f\u540c\u65f6\u5904\u7406\u6587\u672c\u548c\u56fe\u50cf\uff0c\u901a\u8fc7\u7ed3\u5408\u89c6\u89c9\u8868\u793a\u4e0e\u4f20\u7edf\u7684\u6587\u672c\u6570\u636e\uff0c\u53ef\u80fd\u5728\u63d0\u9ad8\u5bf9\u56fe\u7ed3\u6784\u7684\u7406\u89e3\u65b9\u9762\u5e26\u6765\u6539\u8fdb\u3002\u8fd9\u9879\u7814\u7a76\u63a2\u8ba8\u4e86\u53ef\u89c6\u5316\u56fe\u5728\u4e0d\u540c\u7ea7\u522b\uff08\u8282\u70b9\u3001\u8fb9\u548c\u56fe\u7ea7\u522b\uff09\u4e0a\u5bf9LLM\u6027\u80fd\u7684\u5f71\u54cd\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u5bf9\u6bd4\u4e86\u591a\u6a21\u6001\u65b9\u6cd5\u4e0e\u7eaf\u6587\u672c\u56fe\u8868\u793a\u7684\u6709\u6548\u6027\u3002\u7ed3\u679c\u63d0\u4f9b\u4e86\u5173\u4e8e\u5229\u7528\u89c6\u89c9\u56fe\u6a21\u6001\u589e\u5f3aLLM\u5bf9\u56fe\u7ed3\u6784\u7406\u89e3\u80fd\u529b\u7684\u6f5c\u529b\u548c\u9650\u5236\u7684\u5b9d\u8d35\u89c1\u89e3\u3002|\n", "2409.08846": "|**2024-09-13**|**FP-VEC: Fingerprinting Large Language Models via Efficient Vector Addition**|Zhenhua Xu 
et.al.|[2409.08846](http://arxiv.org/abs/2409.08846)|null|\u8bad\u7ec3\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u9700\u8981\u5de8\u5927\u7684\u8ba1\u7b97\u80fd\u529b\u548c\u5927\u91cf\u7684\u6570\u636e\u3002\u56e0\u6b64\uff0c\u901a\u8fc7\u6307\u7eb9\u4fdd\u62a4\u8fd9\u4e9b\u6a21\u578b\u7684\u77e5\u8bc6\u4ea7\u6743\u5bf9\u4e8e\u6240\u6709\u6743\u8ba4\u8bc1\u81f3\u5173\u91cd\u8981\u3002\u5c3d\u7ba1\u5c1d\u8bd5\u901a\u8fc7\u5fae\u8c03\u5411LLMs\u6dfb\u52a0\u6307\u7eb9\uff0c\u4f46\u8fd9\u4ecd\u6210\u672c\u9ad8\u6602\u4e14\u96be\u4ee5\u6269\u5c55\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86FP-VEC\uff0c\u4e00\u79cd\u4f7f\u7528\u6307\u7eb9\u5411\u91cf\u4f5c\u4e3a\u9ad8\u6548LLM\u6307\u7eb9\u65b9\u6cd5\u7684\u8bd5\u70b9\u7814\u7a76\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u751f\u6210\u4e00\u4e2a\u4ee3\u8868\u5d4c\u5165\u5728\u6a21\u578b\u4e2d\u7684\u4fdd\u5bc6\u7b7e\u540d\u7684\u6307\u7eb9\u5411\u91cf\uff0c\u5141\u8bb8\u901a\u8fc7\u5411\u91cf\u76f8\u52a0\u65e0\u7f1d\u5730\u5c06\u76f8\u540c\u7684\u6307\u7eb9\u6574\u5408\u5230\u65e0\u9650\u6570\u91cf\u7684LLMs\u4e2d\u3002\u5728\u591a\u4e2aLLMs\u4e0a\u7684\u7ed3\u679c\u8868\u660e\uff0cFP-VEC\u8f7b\u91cf\u7ea7\uff0c\u53ef\u4ee5\u5728\u4ec5\u4f7f\u7528CPU\u7684\u8bbe\u5907\u4e0a\u8fd0\u884c\u4ee5\u8fdb\u884c\u6307\u7eb9\u8bc6\u522b\uff1b\u53ef\u6269\u5c55\uff0c\u53ea\u9700\u8981\u4e00\u6b21\u8bad\u7ec3\u5373\u53ef\u5b9e\u73b0\u65e0\u9650\u6b21\u7684\u6307\u7eb9\u751f\u6210\u8fc7\u7a0b\uff0c\u5e76\u4e14\u80fd\u591f\u4fdd\u6301\u6a21\u578b\u7684\u6b63\u5e38\u884c\u4e3a\u3002\u9879\u76ee\u9875\u9762\u4f4d\u4e8ehttps://fingerprintvector.github.io \u3002|\n", "2409.10516": "|**2024-09-16**|**RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval**|Di Liu 
et.al.|[2409.10516](http://arxiv.org/abs/2409.10516)|**[link](https://github.com/jzbjyb/reatt)**|\u57fa\u4e8e\u8f6c\u6362\u5668\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5404\u4e2a\u9886\u57df\u53d8\u5f97\u8d8a\u6765\u8d8a\u91cd\u8981\u3002\u7136\u800c\uff0c\u6ce8\u610f\u529b\u64cd\u4f5c\u7684\u4e8c\u6b21\u65f6\u95f4\u590d\u6742\u5ea6\u5bf9\u6269\u5c55\u5230\u66f4\u957f\u4e0a\u4e0b\u6587\u5e26\u6765\u4e86\u91cd\u5927\u6311\u6218\uff0c\u5bfc\u81f4\u4e86\u6781\u9ad8\u7684\u63a8\u7406\u5ef6\u8fdf\u548cGPU\u5185\u5b58\u6d88\u8017\u4ee5\u7f13\u5b58\u952e\u503c\uff08KV\uff09\u5411\u91cf\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65e0\u9700\u8bad\u7ec3\u7684\u65b9\u6cd5\u2014\u2014\u68c0\u7d22\u6ce8\u610f\u529b\uff08RetrievalAttention\uff09\uff0c\u4ee5\u52a0\u901f\u6ce8\u610f\u529b\u8ba1\u7b97\u3002\u901a\u8fc7\u5229\u7528\u6ce8\u610f\u529b\u64cd\u4f5c\u7684\u52a8\u6001\u7a00\u758f\u7279\u6027\uff0cRetrievalAttention\u5728CPU\u5185\u5b58\u4e0a\u6784\u5efa\u4e86\u8fd1\u4f3c\u6700\u8fd1\u90bb\u641c\u7d22\uff08ANNS\uff09\u7d22\u5f15\uff0c\u5e76\u5728\u751f\u6210\u8fc7\u7a0b\u4e2d\u901a\u8fc7\u5411\u91cf\u641c\u7d22\u68c0\u7d22\u6700\u76f8\u5173\u7684\u90e8\u5206\u3002 
\u7531\u4e8e\u67e5\u8be2\u5411\u91cf\u4e0e\u952e\u5411\u91cf\u4e4b\u95f4\u7684\u5206\u5e03\u5916\uff08OOD\uff09\u95ee\u9898\uff0c\u73b0\u6210\u7684ANNS\u7d22\u5f15\u4ecd\u9700\u8981\u626b\u63cfO(N)\uff08\u901a\u5e38\u4e3a\u6240\u6709\u952e\u768430%\uff09\u7684\u6570\u636e\u8fdb\u884c\u7cbe\u786e\u68c0\u7d22\uff0c\u8fd9\u65e0\u6cd5\u5145\u5206\u5229\u7528\u9ad8\u7a00\u758f\u6027\u3002RetrievalAttention\u9996\u5148\u8bc6\u522b\u4e86ANNS\u57fa\u6ce8\u610f\u529b\u4e2d\u7684OOD\u6311\u6218\uff0c\u5e76\u901a\u8fc7\u4e00\u4e2a\u9002\u5e94\u67e5\u8be2\u7684\u6ce8\u610f\u529b\u611f\u77e5\u5411\u91cf\u641c\u7d22\u7b97\u6cd5\u6765\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u8be5\u7b97\u6cd5\u4ec5\u8bbf\u95ee1-3%\u7684\u6570\u636e\uff0c\u4ece\u800c\u5b9e\u73b0\u4e86\u4e9a\u7ebf\u6027\u65f6\u95f4\u590d\u6742\u5ea6\u3002 RetrievalAttention\u5927\u5e45\u964d\u4f4e\u4e86\u957f\u4e0a\u4e0b\u6587LLMs\u7684\u63a8\u7406\u6210\u672c\uff0c\u540c\u65f6\u663e\u8457\u51cf\u5c11\u4e86GPU\u5185\u5b58\u9700\u6c42\uff0c\u800c\u4fdd\u6301\u4e86\u6a21\u578b\u51c6\u786e\u6027\u3002\u5c24\u5176\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0cRetrievalAttention\u4ec5\u9700\u898116GB\u7684GPU\u5185\u5b58\u5373\u53ef\u4e3a\u5177\u67098B\u53c2\u6570\u7684LLM\u63d0\u4f9b\u670d\u52a1\uff0c\u652f\u6301\u5904\u7406128K\u4e2a\u4ee4\u724c\uff0c\u80fd\u591f\u5728\u5355\u4e2aNVIDIA RTX4090\uff0824GB\uff09\u4e0a\u751f\u6210\u4e00\u4e2a\u4ee4\u724c\u8017\u65f60.188\u79d2\u3002|\n", "2409.10506": "|**2024-09-16**|**Context-aware Code Segmentation for C-to-Rust Translation using Large Language Models**|Momoko Shiraishi 
et.al.|[2409.10506](http://arxiv.org/abs/2409.10506)|null|\u7531\u4e8e\u73b0\u6709C\u7a0b\u5e8f\u4e2d\u7684\u5185\u5b58\u5b89\u5168\u6027\u6f0f\u6d1e\u6301\u7eed\u5a01\u80c1\u4ee5\u53caRust\u8bed\u8a00\u4f5c\u4e3aC\u8bed\u8a00\u66ff\u4ee3\u54c1\u6240\u53d7\u5230\u7684\u5e7f\u6cdb\u5173\u6ce8\uff0c\u5c06C\u4ee3\u7801\u8f6c\u6362\u4e3aRust\u4ee3\u7801\u5b58\u5728\u5f3a\u70c8\u7684\u52a8\u673a\u3002\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u901a\u8fc7\u751f\u6210\u6bd4\u57fa\u4e8e\u89c4\u5219\u65b9\u6cd5\u66f4\u81ea\u7136\u3001\u66f4\u5b89\u5168\u7684\u4ee3\u7801\u6765\u81ea\u52a8\u5316\u8fd9\u4e00\u7ffb\u8bd1\u8fc7\u7a0b\u65b9\u9762\u663e\u793a\u51fa\u6f5c\u529b\u3002\u7136\u800c\uff0c\u5148\u524d\u7684\u7814\u7a76\u8868\u660e\uff0cLLM\u751f\u6210\u7684Rust\u4ee3\u7801\u5f80\u5f80\u65e0\u6cd5\u7f16\u8bd1\uff0c\u5373\u4f7f\u662f\u76f8\u5bf9\u8f83\u5c0f\u7684C\u7a0b\u5e8f\uff0c\u8fd9\u4e3b\u8981\u5f52\u56e0\u4e8e\u4e24\u79cd\u8bed\u8a00\u4e4b\u95f4\u7684\u663e\u8457\u5dee\u5f02\u548c\u4e0a\u4e0b\u6587\u7a97\u53e3\u9650\u5236\u3002 
\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8eLLM\u7684\u7ffb\u8bd1\u65b9\u6848\uff0c\u4ee5\u63d0\u9ad8\u5927\u89c4\u6a21C\u4ee3\u7801\u6210\u529f\u8f6c\u5316\u4e3a\u53ef\u7f16\u8bd1\u7684Rust\u4ee3\u7801\u7684\u6982\u7387\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5305\u62ec\u4e09\u4e2a\u5173\u952e\u6280\u672f\uff1a\uff081\uff09\u9884\u5904\u7406C\u4ee3\u7801\uff0c\u4f7f\u5176\u7ed3\u6784\u548c\u8868\u8fbe\u5f0f\u66f4\u597d\u5730\u4e0eRust\u5bf9\u9f50\uff1b\uff082\uff09\u5c06\u4ee3\u7801\u5206\u5272\u4e3a\u6700\u4f73\u5927\u5c0f\u7684\u7ffb\u8bd1\u5355\u5143\uff0c\u4ee5\u907f\u514d\u8d85\u51faLLM\u7684\u4e0a\u4e0b\u6587\u7a97\u53e3\u9650\u5236\uff1b\uff083\uff09\u901a\u8fc7\u4f7f\u7528\u4e0a\u4e0b\u6587\u8865\u5145\u63d0\u793a\uff0c\u8fed\u4ee3\u7f16\u8bd1\u5e76\u4fee\u590d\u9519\u8bef\uff0c\u540c\u65f6\u4fdd\u6301\u4e0d\u540c\u7ffb\u8bd1\u5355\u5143\u4e4b\u95f4\u7684\u4e00\u81f4\u6027\u3002\u6210\u529f\u7f16\u8bd1\u662f\u5b9e\u73b0\u529f\u80fd\u7b49\u6548\u6027\u7684\u9996\u8981\u6b65\u9aa4\uff0c\u56e0\u4e3a\u53ea\u6709\u53ef\u7f16\u8bd1\u7684\u4ee3\u7801\u624d\u80fd\u8fdb\u4e00\u6b65\u8fdb\u884c\u6d4b\u8bd5\u3002 \u572820\u4e2a\u57fa\u51c6C\u7a0b\u5e8f\u7684\u5b9e\u9a8c\u4e2d\uff0c\u5305\u62ec\u90a3\u4e9b\u8d85\u8fc74\u5343\u884c\u4ee3\u7801\u7684\u7a0b\u5e8f\uff0c\u6211\u4eec\u6210\u529f\u5730\u5c06\u6240\u6709\u7a0b\u5e8f\u8f6c\u5316\u4e3a\u53ef\u7f16\u8bd1\u7684Rust\u4ee3\u7801\uff0c\u6ca1\u6709\u4e22\u5931\u539f\u59cb\u4ee3\u7801\u7684\u5bf9\u5e94\u90e8\u5206\u3002|\n", "2409.10504": "|**2024-09-16**|**DILA: Dictionary Label Attention for Mechanistic Interpretability in High-dimensional Multi-label Medical Coding Prediction**|John Wu 
et.al.|[2409.10504](http://arxiv.org/abs/2409.10504)|null|\u5728\u533b\u5b66\u7f16\u7801\u7b49\u9ad8\u7ef4\u6216\u591a\u6807\u7b7e\u9884\u6d4b\u4efb\u52a1\u4e2d\uff0c\u65e2\u9700\u8981\u9884\u6d4b\u7684\u51c6\u786e\u6027\u4e5f\u9700\u8981\u89e3\u91ca\u7684\u53ef\u8bfb\u6027\u3002\u73b0\u6709\u7814\u7a76\u5f80\u5f80\u4f9d\u8d56\u4e8e\u5c40\u90e8\u89e3\u91ca\u65b9\u6cd5\uff0c\u65e0\u6cd5\u63d0\u4f9b\u6574\u4e2a\u591a\u6807\u7b7e\u96c6\u5185\u6bcf\u4e2a\u6807\u7b7e\u9884\u6d4b\u80cc\u540e\u7684\u5168\u9762\u673a\u5236\u89e3\u91ca\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aDIctionary Label Attention\uff08\u7b80\u79f0DILA\uff09\u7684\u6a21\u5757\u5316\u89e3\u91ca\u65b9\u6cd5\uff0c\u7528\u4e8e\u5c06\u4e0d\u53ef\u89e3\u91ca\u7684\u5bc6\u96c6\u5d4c\u5165\u5206\u89e3\u5230\u7a00\u758f\u5d4c\u5165\u7a7a\u95f4\u4e2d\u3002\u5728\u8be5\u7a7a\u95f4\u4e2d\uff0c\u975e\u96f6\u5143\u7d20\uff08\u5b57\u5178\u7279\u5f81\uff09\u4ee3\u8868\u4e86\u5168\u5c40\u5b66\u4e60\u7684\u533b\u7597\u6982\u5ff5\u3002 
\u901a\u8fc7\u4eba\u5de5\u8bc4\u4f30\uff0c\u6211\u4eec\u53d1\u73b0\u6211\u4eec\u7684\u7a00\u758f\u5d4c\u5165\u6bd4\u5176\u5bc6\u96c6\u5bf9\u5e94\u7269\u5728\u4eba\u7c7b\u7406\u89e3\u4e0a\u81f3\u5c11\u63d0\u9ad8\u4e8650%\u3002\u6211\u4eec\u7684\u81ea\u52a8\u5b57\u5178\u7279\u5f81\u8bc6\u522b\u7ba1\u9053\uff0c\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\uff0c\u901a\u8fc7\u68c0\u67e5\u5e76\u603b\u7ed3\u6bcf\u4e2a\u5b57\u5178\u7279\u5f81\u6fc0\u6d3b\u7684\u6700\u9ad8\u7ea7\u8bcd\u6c47\uff0c\u63ed\u793a\u4e86\u6570\u5343\u4e2a\u5b66\u4e60\u5230\u7684\u533b\u7597\u6982\u5ff5\u3002\u6211\u4eec\u901a\u8fc7\u4e00\u4e2a\u7a00\u758f\u7684\u53ef\u89e3\u91ca\u77e9\u9635\u8868\u793a\u5b57\u5178\u7279\u5f81\u4e0e\u533b\u7597\u4ee3\u7801\u4e4b\u95f4\u7684\u5173\u7cfb\uff0c\u8fd9\u4e0d\u4ec5\u589e\u5f3a\u4e86\u6a21\u578b\u9884\u6d4b\u7684\u673a\u5236\u6027\u548c\u5168\u5c40\u7406\u89e3\u80fd\u529b\uff0c\u800c\u4e14\u5728\u4e0d\u9700\u8981\u5927\u91cf\u4eba\u5de5\u6ce8\u91ca\u7684\u60c5\u51b5\u4e0b\uff0c\u4fdd\u6301\u4e86\u7ade\u4e89\u529b\u548c\u53ef\u6269\u5c55\u6027\u3002|\n", "2409.10502": "|**2024-09-16**|**Causal Language Modeling Can Elicit Search and Reasoning Capabilities on Logic Puzzles**|Kulin Shah 
et.al.|[2409.10502](http://arxiv.org/abs/2409.10502)|null|\u8fd1\u5e74\u6765\uff0c\u57fa\u4e8eTransformer\u67b6\u6784\u7684\u56e0\u679c\u8bed\u8a00\u5efa\u6a21\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u7684\u8fdb\u6b65\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u6a21\u578b\u662f\u5426\u771f\u6b63\u53d1\u5c55\u51fa\u4e86\u57fa\u672c\u7684\u641c\u7d22\u548c\u63a8\u7406\u80fd\u529b\uff0c\u4ecd\u662f\u4e00\u4e2a\u6301\u7eed\u8ba8\u8bba\u7684\u8bdd\u9898\u3002\u672c\u7814\u7a76\u65e8\u5728\u63a2\u8ba8\u56e0\u679c\u8bed\u8a00\u5efa\u6a21\u80fd\u5426\u5b66\u4f1a\u89e3\u51b3\u590d\u6742\u7684\u6570\u72ec\u8c1c\u9898\u8fd9\u4e00\u4efb\u52a1\u3002\u89e3\u51b3\u6570\u72ec\u8c1c\u9898\u9700\u8981\u6a21\u578b\u9996\u5148\u5728\u6240\u6709\u7a7a\u767d\u5355\u5143\u683c\u4e2d\u8fdb\u884c\u641c\u7d22\u4ee5\u51b3\u5b9a\u586b\u5145\u54ea\u4e2a\u5355\u5143\u683c\uff0c\u7136\u540e\u5e94\u7528\u9002\u5f53\u7684\u7b56\u7565\u6765\u586b\u5145\u9009\u5b9a\u7684\u5355\u5143\u683c\u3002\u6709\u65f6\uff0c\u7b56\u7565\u7684\u5e94\u7528\u4ec5\u5bfc\u81f4\u5355\u5143\u683c\u53ef\u80fd\u503c\u7684\u51cf\u5c11\uff0c\u800c\u975e\u786e\u5b9a\u786e\u5207\u503c\u3002\u5728\u8fd9\u79cd\u60c5\u51b5\u4e0b\uff0c\u9700\u8981\u5bf9\u5355\u4e2a\u5355\u5143\u683c\u5e94\u7528\u591a\u4e2a\u7b56\u7565\u3002\u6211\u4eec\u53d1\u73b0\uff0c\u7ecf\u8fc7\u903b\u8f91\u6b65\u9aa4\u5e8f\u5217\u8bad\u7ec3\u7684Transformer\u6a21\u578b\u786e\u5b9e\u80fd\u591f\u5b66\u4f1a\u89e3\u51b3\u6570\u72ec\u8c1c\u9898\uff08\u6211\u4eec\u7684\u6a21\u578b\u6b63\u786e\u89e3\u51b3\u4e8694.21%\u7684\u8c1c\u9898\uff09\u3002\u6211\u4eec\u8fd8\u5bf9Zebra\u8c1c\u9898\uff08\u53c8\u79f0\u7231\u56e0\u65af\u5766\u8c1c\u9898\uff09\u8fdb\u884c\u4e86\u6269\u5c55\u5206\u6790\uff0c\u5e76\u8bc1\u660e\u6a21\u578b\u80fd\u591f\u6b63\u786e\u89e3\u51b392.04%\u7684\u8c1c\u9898\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u7814\u7a76\u4e86\u8bad\u7ec3\u540e\u7684Transformer\u5185\u90e8\u8868\u793a\uff0c\u5e76\u90
1a\u8fc7\u7ebf\u6027\u63a2\u67e5\u53d1\u73b0\uff0c\u53ef\u4ee5\u4ece\u5b83\u4eec\u4e2d\u89e3\u7801\u51fa\u7ed9\u5b9a\u5355\u5143\u683c\u7684\u6240\u6709\u53ef\u80fd\u503c\u4fe1\u606f\uff0c\u8fd9\u8868\u660eTransformer\u6743\u91cd\u4e2d\u9690\u542b\u7740\u5f3a\u5927\u7684\u63a8\u7406\u5f15\u64ce\u3002|\n", "2409.10490": "|**2024-09-16**|**Code Vulnerability Detection: A Comparative Analysis of Emerging Large Language Models**|Shaznin Sultana et.al.|[2409.10490](http://arxiv.org/abs/2409.10490)|null|\u8fd1\u5e74\u6765\uff0c\u8f6f\u4ef6\u5f00\u53d1\u9886\u57df\u5bf9\u5f00\u6e90\u9879\u76ee\u4f9d\u8d56\u7684\u589e\u52a0\u5bfc\u81f4\u4e86\u6f0f\u6d1e\u95ee\u9898\u7684\u663e\u8457\u589e\u957f\uff0c\u8fd9\u4e00\u73b0\u8c61\u5f15\u8d77\u4e86\u5e7f\u6cdb\u5173\u6ce8\u3002\u672c\u6587\u65e8\u5728\u63a2\u8ba8\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u8bc6\u522b\u4ee3\u7801\u5e93\u4e2d\u7684\u6f0f\u6d1e\u65b9\u9762\u7684\u80fd\u529b\u4e0e\u6548\u679c\uff0c\u7279\u522b\u5173\u6ce8\u4e86\u65b0\u5174LLM\u6280\u672f\u7684\u6700\u65b0\u8fdb\u5c55\u3002\u901a\u8fc7\u5bf9\u6bd4\u5206\u6790\uff0c\u6211\u4eec\u8bc4\u4f30\u4e86\u5305\u62ecLlama\u3001CodeLlama\u3001Gemma\u548cCodeGemma\u5728\u5185\u7684\u6700\u8fd1\u52a0\u5165\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff0c\u4ee5\u53caBERT\u3001RoBERTa\u548cGPT-3\u7b49\u73b0\u6709\u6700\u5148\u8fdb\u7684\u6a21\u578b\u5728\u68c0\u6d4b\u8f6f\u4ef6\u5b89\u5168\u6f0f\u6d1e\u65b9\u9762\u7684\u6027\u80fd\u3002\u6211\u4eec\u7684\u7814\u7a76\u76ee\u6807\u662f\u63ed\u793aLLM\u5728\u6f0f\u6d1e\u68c0\u6d4b\u9886\u57df\u7684\u80fd\u529b\uff0c\u4ece\u800c\u4fc3\u8fdb\u4e0d\u540c\u5f00\u6e90\u4ed3\u5e93\u7684\u5b89\u5168\u5b9e\u8df5\u63d0\u5347\u3002\u7ed3\u679c\u663e\u793a\uff0cCodeGemma\u5728\u68c0\u6d4b\u8f6f\u4ef6\u5b89\u5168\u6f0f\u6d1e\u65b9\u9762\u53d6\u5f97\u4e86\u6700\u9ad8\u7684F1\u5206\u6570\uff0858%\uff09\u548c\u53ec\u56de\u7387\uff0887%\uff09\u3002|\n", "2409.10484": "|**2024-09-16**|**XLM for Autonomous Driving Systems: 
A Comprehensive Review**|Sonda Fourati et.al.|[2409.10484](http://arxiv.org/abs/2409.10484)|null|Large language models (LLMs) have shown remarkable capabilities across a wide range of information-processing tasks, from data extraction and literature summarization to content generation, predictive modeling, decision-making, and system control. In addition, vision large models (VLMs) and multimodal large language models (MLLMs), collectively XLMs, can combine multiple data modalities with the power of language understanding, advancing information-based systems such as autonomous driving systems (ADS). By combining language communication with multimodal sensory inputs (such as panoramic images and LiDAR or radar data), accurate driving actions can be taken. In this context, this survey reviews the potential of XLMs for achieving autonomous driving. Specifically, we review the relevant literature on ADS and XLMs, including their architectures, tools, and frameworks. We then detail approaches for deploying XLMs in autonomous driving solutions. Finally, we identify the challenges of deploying XLMs in ADS and propose future research directions aimed at promoting the use of XLMs in future ADS frameworks.|\n",
"2409.10482": "|**2024-09-17**|**Schrodinger's Memory: Large Language Models**|Wei Wang et.al.|[2409.10482](http://arxiv.org/abs/2409.10482)|null|Memory is the foundation of human activity; without it, performing almost any everyday task would be impossible. As large language models (LLMs) develop, their language abilities are growing ever closer to those of humans. But do LLMs have memory? Judging by current performance, LLMs do show signs of memory. What, then, is the principle behind this memory mechanism? Existing research lacks an in-depth exploration of LLMs' memory capabilities and the underlying theory. In this paper, we use the Universal Approximation Theorem (UAT) to explain the memory mechanism of LLMs. We also run experiments to verify the memory capabilities of various LLMs and propose a new way to assess their abilities based on these memory capabilities. We argue that LLM memory works like Schrodinger's memory: it only becomes observable when a specific memory is queried. We can determine whether the model retains a memory only from its output in response to the query; otherwise, it remains indeterminate. Finally, we extend this concept by comparing the memory capabilities of the human brain and LLMs, highlighting the similarities and differences in their operating mechanisms.|\n",
"2409.10444": "|**2024-09-16**|**LLM as BT-Planner: Leveraging LLMs for Behavior Tree Generation in Robot Task Planning**|Jicong Ao et.al.|[2409.10444](http://arxiv.org/abs/2409.10444)|null|This paper proposes a new framework, 'LLM as BT-Planner', that leverages large language models (LLMs) to generate behavior trees (BTs) for robot assembly task planning and execution. We introduce four in-context learning methods that use the natural-language processing and reasoning capabilities of LLMs to produce task plans in BT format, reducing manual effort while ensuring robustness and comprehensibility. We also evaluate the performance of fine-tuned LLMs with fewer parameters on the same task. Experiments in both simulated and real-world settings show that our framework improves LLM performance in BT generation, with in-context learning and supervised fine-tuning significantly raising the success rate.|\n",
"2409.10411": "|**2024-09-16**|**A Large-Scale Privacy Assessment of Android Third-Party SDKs**|Mark Huasong Meng
et.al.|[2409.10411](http://arxiv.org/abs/2409.10411)|null|This paper presents a targeted analysis of third-party software development kits (SDKs) on the Android platform, aiming to fill a critical gap in the Android software supply chain with a focus on user privacy protection. The study examines 158 widely used SDKs drawn from two key SDK release platforms: the official one and a large alternative one. Regarding privacy leakage, we find 338 instances in which these SDKs transmit users' sensitive information without authorization, which could be exploited for illegitimate purposes such as user tracking or monetization. Regarding privacy compliance, our study shows that more than 30% of the examined SDKs provide no privacy policy disclosing their data-handling practices. Among the SDKs that do provide a privacy policy, 37% over-collect user data and 88% falsely claim the right to access sensitive data. We revisited the latest versions of the SDKs one year later, and the results show that these worrying trends have not improved. Based on our findings, we propose three actionable recommendations to reduce the risk of privacy leakage and strengthen privacy protection for Android users. This study not only calls for urgent attention from industry but also offers key insights for future regulatory intervention.|\n",
"2409.10354": "|**2024-09-17**|**Learnings from a Large-Scale Deployment of an LLM-Powered Expert-in-the-Loop Healthcare Chatbot**|Bhuvan Sachdeva et.al.|[2409.10354](http://arxiv.org/abs/2409.10354)|null|This paper examines the use of large language models (LLMs) in healthcare and the challenges they face, such as hallucinations, incomplete information, and bias, which undermine their reliability. To address these problems, researchers released a platform called 'Build Your Own expert Bot' (BYOeB), which lets developers create LLM-powered chatbots with integrated expert verification. CataractBot, the platform's first implementation, focuses on providing expert-verified answers about cataract surgery. An initial evaluation showed its potential, but that study had a small sample size and was mainly qualitative. In this work, we conducted a large-scale 24-week deployment of CataractBot in which 318 patients and their attendants sent 1,992 messages, with 91.71% of the answers verified by seven experts. Analyzing the interaction logs, we find that medical questions far outnumbered logistical ones, hallucinations were negligible, and experts rated 84.52% of the medical answers as accurate. As the knowledge base expanded through expert corrections, system performance improved by 19.02%, reducing the experts' workload. These findings inform the design of future LLM-powered chatbots.|\n",
"2409.11404": "|**2024-09-17**|**AraDiCE: Benchmarks for Dialectal and Cultural Capabilities in LLMs**|Basel Mousi et.al.|[2409.11404](http://arxiv.org/abs/2409.11404)|null|Arabic, with its rich dialectal diversity, remains significantly underrepresented in large language models, particularly its dialectal varieties. We address this gap with seven synthetic datasets created using machine translation combined with human post-editing, covering Modern Standard Arabic (MSA) as well as the dialects of various Arab regions. We present AraDiCE, a benchmark for evaluating Arabic dialectal and cultural understanding and generation. Our study focuses on and evaluates low-resource Arabic dialects. In addition, we introduce the first fine-grained benchmark designed to assess cultural awareness across the Arabian Peninsula, Egypt, and the Levant, adding a new dimension to LLM evaluation. Our findings show that although Arabic-specific models such as Jais and AceGPT outperform multilingual models on dialectal tasks, major challenges remain in dialect identification, generation, and translation. This work contributes about 45,000 human post-edited samples and a cultural benchmark, and it underscores the importance of tailored training for improving how well large language models capture the nuances of different Arabic dialects and cultural contexts. We will release the dialectal translation models and benchmarks built in this study.|\n",
"2409.11402": "|**2024-09-17**|**NVLM: Open Frontier-Class Multimodal LLMs**|Wenliang Dai et.al.|[2409.11402](http://arxiv.org/abs/2409.11402)|null|We introduce NVLM 1.0, a family of multimodal large language models that reach frontier-level performance on vision-language tasks, rivaling leading proprietary models (such as GPT-4o) and open models (such as Llama 3-V 405B and InternVL 2). Remarkably, after multimodal training, NVLM 1.0 performs even better on text-only tasks than its underlying LLM backbone. On model design, we carry out a comprehensive comparison between decoder-only multimodal language models (such as LLaVA) and cross-attention-based models (such as Flamingo). Based on the strengths and weaknesses of both approaches, we propose a novel architecture that improves training efficiency and multimodal reasoning. We also introduce a 1-D tile-tagging design for dynamic high-resolution images, which significantly boosts performance on multimodal reasoning and OCR-related tasks. On training data, we carefully curate the pre-training and supervised fine-tuning datasets for all architectures and provide detailed information about them. Our findings indicate that, in the pre-training stage, data quality and task diversity matter more than scale. Notably, we develop production-grade multimodality for the NVLM-1.0 models, so that they perform well on vision-language tasks while maintaining, and even surpassing, the performance of their base language models. To achieve this, we carefully integrate a high-quality text-only dataset into multimodal training, together with a large amount of multimodal math and reasoning data, improving math and coding capabilities across modalities. To advance research in the field, we will release the model weights and open-source the code for the community: https://nvlm-project.github.io/.|\n",
"2409.11390": "|**2024-09-17**|**Says Who? Effective Zero-Shot Annotation of Focalization**|Rebecca M. M.
Hicke et.al.|[2409.11390](http://arxiv.org/abs/2409.11390)|null|In this paper, we experimentally test how well current large language models (LLMs) annotate focalization modes in literary texts. Although the task is challenging, our experiments show that LLMs perform comparably to trained human annotators. We present a case study on Stephen King's novels to demonstrate the usefulness of this approach for computational literary studies, illustrating how focalization can be studied at scale.|\n",
"2409.11378": "|**2024-09-17**|**Diversify and Conquer: Diversity-Centric Data Selection with Iterative Refinement**|Simon Yu
et.al.|[2409.11378](http://arxiv.org/abs/2409.11378)|**[link](https://github.com/for-ai/iterative-data-selection)**|Fine-tuning large language models on instruction data is crucial for reinforcing pre-trained knowledge and improving instruction following. As instruction datasets proliferate, selecting the right data for effective training becomes increasingly important. This paper studies how to identify the optimal data subset for effective training. Existing work often selects subsets using local criteria such as instance quality, but we argue that a global perspective focused on data diversity matters more. We use k-means clustering to ensure that the selected subset adequately represents the full dataset. We propose an iterative refinement method, inspired by active learning techniques, that resamples instances from each cluster and re-evaluates each cluster's importance and sampling weight at every training iteration. This approach reduces the influence of outliers and automatically filters out clusters that contain low-quality data. Through extensive evaluation on natural language reasoning, general world knowledge, code, and mathematical reasoning tasks, and by fine-tuning models from various families, we observe consistent improvements: 7% over random selection and 3.8% over state-of-the-art sampling methods. Our work highlights the importance of diversity-first sampling when fine-tuning large language models to improve performance across a broad range of evaluation tasks. Our code is available at https://github.com/for-ai/iterative-data-selection.|\n",
"2409.11376": "|**2024-09-17**|**Towards Time Series Reasoning with LLMs**|Winnie Chow
et.al.|[2409.11376](http://arxiv.org/abs/2409.11376)|null|Multimodal large language models (MLLMs) have made major advances in understanding and reasoning in domains such as vision, but the time-series domain has not yet seen comparable success. Although prior work on time-series MLLMs has shown promising forecasting performance, very little work shows how an LLM can be used to reason about time series in natural language. We propose a novel multimodal time-series LLM approach that learns generalizable information across domains and has strong zero-shot performance. First, we train a lightweight time-series encoder on top of an LLM to extract time-series information directly. We then fine-tune the model on augmented time-series tasks to encourage it to generate reasoning paths. We show that the latent representations learned by the model reflect specific time-series features (for example slope and frequency) and that the model outperforms GPT-4o on a set of zero-shot reasoning tasks across multiple domains.|\n",
"2409.11375": "|**2024-09-17**|**Multi-OCT-SelfNet: Integrating Self-Supervised Learning with Multi-Source Data Fusion for Enhanced Multi-Class
Retinal Disease Classification**|Fatema-E- Jannat et.al.|[2409.11375](http://arxiv.org/abs/2409.11375)|null|In healthcare, acquiring large amounts of data is a significant challenge, mainly because of privacy concerns, yet training deep learning models for retinal disease diagnosis requires large datasets. Generalizing effectively from smaller datasets remains a persistent challenge, and data scarcity is a practical barrier to deploying scalable medical AI solutions. To address this, we combine multiple data sources to improve performance and generalization to new data, giving the model a deeper understanding of data representations from multimodal datasets. We develop a self-supervised framework based on large language models (LLMs) and SwinV2 to strengthen the model's understanding of multimodal dataset representations, improving its ability to detect eye diseases from optical coherence tomography (OCT) images. We adopt a two-stage training approach: self-supervised pre-training followed by fine-tuning of a downstream supervised classifier. Ablation studies on three different datasets, using different encoder backbones and covering settings without data fusion, with limited data, and without self-supervised pre-training, highlight the robustness of our method. Our findings show consistent performance across these diverse conditions and better generalization than the ResNet-50 baseline.|\n",
"2409.11365": "|**2024-09-17**|**CoCA: Regaining Safety-awareness of Multimodal Large Language Models with Constitutional Calibration**|Jiahui Gao et.al.|[2409.11365](http://arxiv.org/abs/2409.11365)|null|This paper examines the safety awareness of multimodal large language models (MLLMs) when faced with malicious visual inputs. MLLMs are typically built on large language models and paired with an image encoder that maps images into the token embedding space of text data aligned with human values. However, integrating the visual modality introduces a distinctive vulnerability: MLLMs become susceptible to malicious image inputs and tend to generate unsafe or harmful responses. The study finds that adding a principle to the MLLM's input that explicitly states the safety requirement strengthens its safety awareness. This confirms that MLLMs retain some safety awareness when processing image inputs, but that this ability is weakened by the modality gap. To this end, the paper proposes CoCA (Constitutional Calibration), a simple yet effective technique that strengthens the MLLM's safety awareness by calibrating its output distribution. The strategy helps the model recover its original safety awareness without sacrificing its original capabilities. Its effectiveness is verified on multimodal safety and understanding benchmarks.|\n",
"2409.11360": "|**2024-09-17**|**AI Suggestions Homogenize Writing Toward Western Styles and Diminish Cultural Nuances**|Dhruv Agarwal
et.al.|[2409.11360](http://arxiv.org/abs/2409.11360)|null|This paper examines what happens when Western-centric AI models provide writing suggestions to users from different cultural backgrounds. We ran a cross-cultural controlled experiment in which 118 participants from India and the United States completed culturally grounded writing tasks with and without AI suggestions. Our analysis shows that AI provided greater efficiency gains for Americans, whereas Indian participants were pushed toward Western writing styles, changing not only what they wrote but also how they wrote it. These findings indicate that Western-centric AI models homogenize writing toward Western norms, diminishing the nuances that express cultural differences.|\n",
"2409.11353": "|**2024-09-17**|**THaMES: An End-to-End Tool for Hallucination Mitigation and Evaluation in Large Language Models**|Mengfei Liang
et.al.|[2409.11353](http://arxiv.org/abs/2409.11353)|null|This paper presents THaMES (Tool for Hallucination Mitigation and Evaluation), an integrated framework and library that addresses the growing challenge of hallucination in large language models (LLMs). Existing detection and mitigation methods are often isolated, fall short of domain-specific needs, and lack a standardized pipeline. THaMES offers an end-to-end solution covering every stage of evaluating and mitigating hallucinations in LLMs, including automated test-set generation, multifaceted benchmarking, and adaptable mitigation strategies. It automatically creates high-quality, diverse, and cost-effective test sets using techniques such as batch processing, weighted sampling, and counterfactual validation. THaMES evaluates a model's ability to detect and reduce hallucinations in text generation and binary classification tasks, and applies the best mitigation strategy, such as in-context learning (ICL), retrieval-augmented generation (RAG), or parameter-efficient fine-tuning (PEFT). Evaluating frontier LLMs with knowledge bases built from academic papers, political news, and Wikipedia shows that commercial models such as GPT-4o benefit more from RAG than from ICL, whereas open-source models such as Llama-3.1-8B-Instruct and Mistral-Nemo gain more from ICL. In addition, PEFT significantly improves the performance of Llama-3.1-8B-Instruct on the evaluation tasks.|\n",
"2409.11282": "|**2024-09-17**|**Leveraging Distillation Techniques for Document Understanding: A Case Study with FLAN-T5**|Marcel Lamott et.al.|[2409.11282](http://arxiv.org/abs/2409.11282)|null|With the proliferation of digital document formats, especially non-standardized documents such as business reports and environmental assessments, document understanding has become increasingly important. Large language models (LLMs) show strong capabilities across many natural language processing tasks, but applying them directly to document understanding remains challenging. Previous research has shown the potential of LLMs in this area, yet their heavy computational demands make them hard to deploy effectively. Moreover, proprietary 'black-box' LLMs often outperform their open-source counterparts, which is a barrier to broad accessibility. This paper explores document understanding by distilling knowledge from the LLM ChatGPT into FLAN-T5, balancing the power of large models against computational constraints. We propose a novel approach that integrates labeling and a curriculum-learning mechanism to enable efficient knowledge transfer. This work contributes to the advancement of document understanding methods by offering a scalable solution that bridges the gap between resource-intensive LLMs and practical applications. Our findings underscore the potential of distillation techniques to bring sophisticated language models to real-world use, advancing natural language processing and document understanding.|\n",
"2409.12194": "|**2024-09-20**|**Gender Representation and Bias in Indian Civil Service Mock Interviews**|Somonnoy Banerjee
et.al.|[2409.12194](http://arxiv.org/abs/2409.12194)|null|This paper makes three key contributions. First, using 51,278 interview questions collected from 888 YouTube videos of mock interviews with Indian civil service candidates, we show that the broad nature of the questions put to male and female candidates exhibits significant gender bias. Second, our experiments with large language models reveal strong gender bias in the explanations these models give on a gender inference task. Finally, we provide a novel dataset of 51,278 interview questions that can inform future research in the humanities and social sciences.|\n",
"2409.12183": "|**2024-09-18**|**To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning**|Zayne Sprague et.al.|[2409.12183](http://arxiv.org/abs/2409.12183)|null|To analyze which tasks chain-of-thought (CoT) actually helps, we conducted a quantitative meta-analysis covering more than 100 papers that use CoT and ran our own evaluations of 14 models on 20 datasets. The results show that CoT yields significant performance gains mainly on math or logic tasks, with much smaller gains on other task types. On MMLU, generating the answer directly without CoT gives almost the same accuracy as CoT unless the question or the model's response contains an equals sign, which indicates symbolic operations and reasoning. Building on this finding, we analyze the behavior of CoT on these problems by separating planning from execution and comparing against tool-augmented LLMs. Most of CoT's gain comes from improved symbolic execution, but it underperforms a symbolic solver. Our results indicate that CoT can be applied selectively, maintaining performance while saving inference cost. They also suggest the need to move beyond prompt-based CoT toward new paradigms that better exploit intermediate computation across the whole range of LLM applications.|\n",
"2409.12180": "|**2024-09-18**|**Finetuning Language Models to Emit Linguistic Expressions of Uncertainty**|Arslan Chaudhry
et.al.|[2409.12180](http://arxiv.org/abs/2409.12180)|null|\u672c\u6587\u7814\u7a76\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u4fe1\u606f\u68c0\u7d22\u4e0e\u51b3\u7b56\u4efb\u52a1\u4e2d\u7684\u5e94\u7528\u3002\u5c3d\u7ba1LLM\u5177\u6709\u5e7f\u6cdb\u7684\u5e94\u7528\u4ef7\u503c\uff0c\u4f46\u5b83\u4eec\u503e\u5411\u4e8e\u751f\u6210\u4e0e\u73b0\u5b9e\u4e16\u754c\u4e8b\u5b9e\u76f8\u51b2\u7a81\u7684\u4fe1\u606f\uff0c\u5e76\u4ee5\u8bf4\u670d\u6027\u7684\u65b9\u5f0f\u8868\u8fbe\uff0c\u4f7f\u5f97\u8fd9\u4e9b\u4e0d\u51c6\u786e\u6027\u770b\u8d77\u6765\u81ea\u4fe1\u4e14\u4ee4\u4eba\u4fe1\u670d\u3002\u8fd9\u5bfc\u81f4\u6700\u7ec8\u7528\u6237\u96be\u4ee5\u4e00\u81f4\u5730\u5c06LLM\u7684\u81ea\u4fe1\u5ea6\u4e0e\u9884\u6d4b\u7684\u51c6\u786e\u6027\u5bf9\u9f50\uff0c\u5e38\u5e38\u5bfc\u81f4\u5bf9\u6240\u6709\u8f93\u51fa\u7684\u76f2\u76ee\u4fe1\u4efb\u6216\u5b8c\u5168\u5ffd\u89c6\u5176\u53ef\u9760\u6027\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63a2\u7d22\u4e86\u5728\u4e0d\u786e\u5b9a\u6027\u589e\u5f3a\u7684\u9884\u6d4b\u57fa\u7840\u4e0a\u8fdb\u884c\u76d1\u7763\u5fae\u8c03\u7684\u65b9\u6cd5\uff0c\u4ee5\u6b64\u6765\u5f00\u53d1\u80fd\u591f\u751f\u6210\u8bed\u8a00\u4e0d\u786e\u5b9a\u6027\u8868\u8ff0\u7684\u6a21\u578b\u3002\u5177\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u8861\u91cf\u9884\u8bad\u7ec3\u6a21\u578b\u7684\u6821\u51c6\u7a0b\u5ea6\uff0c\u7136\u540e\u901a\u8fc7\u57fa\u4e8e\u6a21\u578b\u81ea\u8eab\u4fe1\u5fc3\u7684\u5fae\u8c03\uff0c\u4f7f\u8bed\u8a00\u6a21\u578b\u4ea7\u751f\u6821\u51c6\u7684\u4e0d\u786e\u5b9a\u6027\u8868\u8ff0\u3002\u901a\u8fc7\u5bf9\u5404\u79cd\u95ee\u7b54\u6570\u636e\u96c6\u7684\u5b9e\u9a8c\uff0c\u6211\u4eec\u8bc1\u660e\u4e86LLM\u5728\u8bc4\u4f30\u9884\u6d4b\u65f6\u5177\u6709\u826f\u597d\u7684\u6821\u51c6\u80fd\u529b\uff0c\u5e76\u57fa\u4e8e\u6a21\u578b\u672c\u8eab\u7684\u4fe1\u5fc3\u8fdb\u884c\u76d1\u7763\u5fae\u8c03\uff0c\u53ef\u83b7\u5f97\u7279\u522b\u9002\u7528\u4e8e\u5355\u4e2a\u58f0\u660e\u7b54\u6848\u7684\u826f\u597d\u6821\u51c6\u7684\u4e0d\u78
6e\u5b9a\u6027\u8868\u8ff0\u3002|\n", "2409.12150": "|**2024-09-18**|**Decoding Style: Efficient Fine-Tuning of LLMs for Image-Guided Outfit Recommendation with Preference**|Najmeh Forouzandehmehr et.al.|[2409.12150](http://arxiv.org/abs/2409.12150)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6846\u67b6\uff0c\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u5f3a\u5927\u8868\u8fbe\u80fd\u529b\u6765\u89e3\u51b3\u4e2a\u6027\u5316\u670d\u88c5\u63a8\u8350\u8fd9\u4e00\u590d\u6742\u6311\u6218\u3002\u901a\u8fc7\u7ec6\u8c03\u548c\u76f4\u63a5\u53cd\u9988\u96c6\u6210\uff0c\u6211\u4eec\u8bd5\u56fe\u514b\u670dLLM\u7684\u201c\u9ed1\u76d2\u201d\u7279\u6027\u548c\u9759\u6001\u6027\u3002\u6211\u4eec\u901a\u8fc7\u5728\u4eba\u7c7b\u7f16\u76ee\u7684\u65f6\u5c1a\u56fe\u50cf\u4e0a\u4f7f\u7528\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\u8fdb\u884c\u56fe\u50cf\u63cf\u8ff0\uff0c\u6765\u5f25\u5408\u9879\u76ee\u89c6\u89c9\u4e0e\u6587\u672c\u4e4b\u95f4\u7684\u5dee\u8ddd\u3002\u8fd9\u4f7f\u5f97LLM\u80fd\u591f\u4ece\u4eba\u7c7b\u7f16\u76ee\u7684\u65f6\u5c1a\u56fe\u50cf\u4e2d\u63d0\u53d6\u98ce\u683c\u548c\u8272\u5f69\u7279\u5f81\uff0c\u4ece\u800c\u5f62\u6210\u4e2a\u6027\u5316\u7684\u63a8\u8350\u57fa\u7840\u3002\u6211\u4eec\u4f7f\u7528\u5f00\u6e90\u7684Polyvore\u6570\u636e\u96c6\u5bf9LLM\u8fdb\u884c\u9ad8\u6548\u7ec6\u8c03\uff0c\u4f18\u5316\u5176\u63a8\u8350\u65f6\u5c1a\u642d\u914d\u7684\u80fd\u529b\u3002\u91c7\u7528\u76f4\u63a5\u504f\u597d\u673a\u5236\u5e76\u7ed3\u5408\u8d1f\u4f8b\uff0c\u4ee5\u589e\u5f3aLLM\u7684\u51b3\u7b56\u8fc7\u7a0b\u3002\u8fd9\u521b\u5efa\u4e86\u4e00\u4e2a\u81ea\u6211\u589e\u5f3a\u7684\u4eba\u5de5\u667a\u80fd\u53cd\u9988\u5faa\u73af\uff0c\u6301\u7eed\u5730\u6839\u636e\u5b63\u8282\u6027\u65f6\u5c1a\u8d8b\u52bf\u4f18\u5316\u63a8\u8350\u3002\u6211\u4eec\u7684\u6846\u67b6\u5728Polyvore\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u9488\u5bf9\u4e24\u4e2a\u5173\u952e\u4efb\u52a1\uff1a\u8865\u5168
\u7a7a\u767d\u548c\u8f85\u52a9\u9879\u76ee\u68c0\u7d22\u3002\u8fd9\u4e9b\u8bc4\u4f30\u7ed3\u679c\u5f3a\u8c03\u4e86\u6846\u67b6\u751f\u6210\u65f6\u5c1a\u3001\u4e0e\u6f6e\u6d41\u4e00\u81f4\u7684\u670d\u88c5\u5efa\u8bae\u7684\u80fd\u529b\uff0c\u5e76\u901a\u8fc7\u76f4\u63a5\u53cd\u9988\u6301\u7eed\u6539\u8fdb\u3002\u8bc4\u4f30\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u63d0\u8bae\u6846\u67b6\u5728\u8fd9\u4e9b\u4efb\u52a1\u4e0a\u7684\u8868\u73b0\u663e\u8457\u4f18\u4e8e\u57fa\u4e8e\u539f\u59cbLLM\u7684\u670d\u88c5\u751f\u6210\uff0c\u521b\u9020\u4e86\u66f4\u52a0\u534f\u8c03\u7684\u670d\u88c5\u3002\u6539\u8fdb\u7684\u8868\u73b0\u8bc1\u660e\u4e86\u8be5\u6846\u67b6\u589e\u5f3a\u8d2d\u7269\u4f53\u9a8c\u3001\u63d0\u4f9b\u51c6\u786e\u5efa\u8bae\u7684\u6f5c\u529b\uff0c\u8bc1\u660e\u4e86\u5b83\u76f8\u5bf9\u4e8e\u57fa\u4e8e\u539f\u59cbLLM\u7684\u670d\u88c5\u751f\u6210\u65b9\u6cd5\u7684\u6709\u6548\u6027\u3002|\n", "2409.12147": "|**2024-09-18**|**MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning**|Justin Chih-Yao Chen 
et.al.|[2409.12147](http://arxiv.org/abs/2409.12147)|**[link](https://github.com/dinobby/magicore)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u63a8\u7406\u80fd\u529b\u53ef\u4ee5\u901a\u8fc7\u5728\u6d4b\u8bd5\u65f6\u91c7\u7528\u805a\u5408\u7b56\u7565\u8fdb\u884c\u63d0\u5347\uff0c\u5373\u751f\u6210\u591a\u4e2a\u6837\u672c\u5e76\u57fa\u4e8e\u751f\u6210\u6837\u672c\u8fdb\u884c\u6295\u7968\u3002\u867d\u7136\u8fd9\u4e9b\u7b56\u7565\u80fd\u591f\u63d0\u9ad8\u6027\u80fd\uff0c\u4f46\u5b83\u4eec\u5f80\u5f80\u5b58\u5728\u9971\u548c\u70b9\u3002\u6539\u8fdb\u65b9\u6cd5\u5f15\u5165\u4e86\u4e00\u79cd\u540d\u4e3a\u201cRefinement\u201d\u7684\u7b56\u7565\uff0c\u901a\u8fc7\u5229\u7528LLM\u751f\u6210\u7684\u53cd\u9988\u6765\u63d0\u5347\u89e3\u51b3\u65b9\u6848\u7684\u8d28\u91cf\u3002\u7136\u800c\uff0cRefinement\u4e5f\u5e26\u6765\u4e86\u4e09\u4e2a\u5173\u952e\u6311\u6218\uff1a\uff081\uff09\u8fc7\u5ea6\u7ec6\u5316\uff1a\u5bf9\u6240\u6709\u5b9e\u4f8b\u8fdb\u884c\u7edf\u4e00\u7ec6\u5316\u53ef\u80fd\u5bfc\u81f4\u8fc7\u5ea6\u4fee\u6b63\uff0c\u4ece\u800c\u964d\u4f4e\u6574\u4f53\u6027\u80fd\u3002\uff082\uff09\u96be\u4ee5\u5b9a\u4f4d\u548c\u7ea0\u6b63\u9519\u8bef\uff1aLLM\u5177\u6709\u6709\u9650\u7684\u81ea\u6211\u7ea0\u6b63\u80fd\u529b\uff0c\u5f88\u96be\u8bc6\u522b\u5e76\u7ea0\u6b63\u81ea\u5df1\u7684\u9519\u8bef\u3002\uff083\uff09\u7ec6\u5316\u4e0d\u8db3\uff1a\u51b3\u5b9a\u9700\u8981\u591a\u5c11\u8fed\u4ee3\u7684\u7ec6\u5316\u5e76\u4e0d\u5bb9\u6613\uff0c\u8fc7\u65e9\u505c\u6b62\u53ef\u80fd\u4f1a\u8ba9\u9519\u8bef\u672a\u5f97\u5230\u89e3\u51b3\u3002 
\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aMAgICoRe\u7684\u65b9\u6cd5\uff0c\u5b83\u901a\u8fc7\u5c06\u95ee\u9898\u96be\u5ea6\u5206\u4e3a\u7b80\u5355\u6216\u56f0\u96be\uff0c\u5e76\u4f7f\u7528\u7c97\u7c92\u5ea6\u805a\u5408\u89e3\u51b3\u7b80\u5355\u95ee\u9898\uff0c\u4f7f\u7528\u7ec6\u7c92\u5ea6\u548c\u591a\u8f6e\u8fed\u4ee3\u7ec6\u5316\u89e3\u51b3\u56f0\u96be\u95ee\u9898\uff0c\u4ee5\u907f\u514d\u8fc7\u5ea6\u7ec6\u5316\u3002\u4e3a\u4e86\u6539\u5584\u9519\u8bef\u5b9a\u4f4d\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u57fa\u4e8e\u6b65\u9aa4\u7ea7\u5956\u52b1\u6a21\u578b\uff08RM\uff09\u5206\u6570\u7684\u5916\u90e8\u8bc4\u5206\u3002\u6b64\u5916\uff0c\u6211\u4eec\u91c7\u7528\u4e86\u4e00\u4e2a\u7531\u4e09\u4e2a\u4ee3\u7406\u7ec4\u6210\u7684\u591a\u4ee3\u7406\u5faa\u73af\uff1a\u6c42\u89e3\u8005\u3001\u5ba1\u67e5\u8005\uff08\u6839\u636e\u6b65\u9aa4\u7ea7RM\u5206\u6570\u751f\u6210\u9488\u5bf9\u6027\u53cd\u9988\uff09\u4ee5\u53ca\u7ec6\u5316\u8005\uff08\u6574\u5408\u53cd\u9988\uff09\uff0c\u4ee5\u786e\u4fdd\u6709\u6548\u7ec6\u5316\u3002\u4e3a\u4e86\u786e\u4fdd\u8db3\u591f\u7684\u7ec6\u5316\uff0c\u6211\u4eec\u91cd\u65b0\u8bc4\u4f30\u66f4\u65b0\u540e\u7684\u89e3\u51b3\u65b9\u6848\uff0c\u5e76\u5728\u5fc5\u8981\u65f6\u542f\u52a8\u8fdb\u4e00\u6b65\u7684\u7ec6\u5316\u8f6e\u6b21\u3002\u6211\u4eec\u4f7f\u7528Llama-3-8B\u548cGPT-3.5\u57285\u4e2a\u6570\u5b66\u6570\u636e\u96c6\u4e0a\u8bc4\u4f30\u4e86MAgICoRe\uff0c\u5e76\u5c55\u793a\u4e86\u5176\u6709\u6548\u6027\u3002\u5373\u4f7f\u53ea\u8fdb\u884c\u4e00\u6b21\u8fed\u4ee3\uff0cMAgICoRe\u4e5f\u80fd\u5728\u4f7f\u7528\u4e0d\u5230\u57fa\u7ebf\u6837\u672c\u4e00\u534a\u7684\u60c5\u51b5\u4e0b\uff0c\u5206\u522b\u8d85\u8fc7Self-Consistency\u3001Best-of-k\u548cSelf-Refine\u7b97\u6cd53.4%\u30013.2%\u548c4.0%\u3002\u4e0e\u8fed\u4ee3\u7ec6\u5316\u7684\u57fa\u7ebf\u76f8\u6bd4\uff0cMAgICoRe\u968f\u7740\u8fed\u4ee3\u6b21\u6570\u7684\u589e\u52a0\u6301\u7eed\u63d0\u9ad8\u6027\u80fd\u3002\u6700\u540e\uff0c\u
6211\u4eec\u7684\u6d88\u878d\u5b9e\u9a8c\u5f3a\u8c03\u4e86MAgICoRe\u4e2dRMs\u548c\u591a\u4ee3\u7406\u901a\u4fe1\u7684\u91cd\u8981\u6027\u3002**|\n", "2409.12140": "|**2024-09-18**|**MoRAG -- Multi-Fusion Retrieval Augmented Generation for Human Motion**|Kalakonda Sai Shashank et.al.|[2409.12140](http://arxiv.org/abs/2409.12140)|null|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aMoRAG\u7684\u521b\u65b0\u591a\u90e8\u5206\u878d\u5408\u68c0\u7d22\u589e\u5f3a\u751f\u6210\u7b56\u7565\uff0c\u7528\u4e8e\u57fa\u4e8e\u6587\u672c\u7684\u4eba\u4f53\u52a8\u4f5c\u751f\u6210\u3002\u6b64\u65b9\u6cd5\u901a\u8fc7\u5229\u7528\u589e\u5f3a\u7684\u8fd0\u52a8\u68c0\u7d22\u8fc7\u7a0b\u83b7\u5f97\u7684\u989d\u5916\u77e5\u8bc6\u6765\u63d0\u5347\u8fd0\u52a8\u6269\u6563\u6a21\u578b\u3002\u901a\u8fc7\u6709\u6548\u6fc0\u53d1\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\uff0c\u6211\u4eec\u89e3\u51b3\u4e86\u8fd0\u52a8\u68c0\u7d22\u4e2d\u7684\u62fc\u5199\u9519\u8bef\u548c\u91cd\u8ff0\u95ee\u9898\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u91c7\u7528\u591a\u90e8\u5206\u68c0\u7d22\u7b56\u7565\u4ee5\u63d0\u9ad8\u8fd0\u52a8\u68c0\u7d22\u5728\u8bed\u8a00\u7a7a\u95f4\u4e0a\u7684\u6cdb\u5316\u80fd\u529b\u3002\u6211\u4eec\u901a\u8fc7\u7a7a\u95f4\u7ec4\u5408\u68c0\u7d22\u5230\u7684\u52a8\u4f5c\u6765\u751f\u6210\u591a\u6837\u5316\u7684\u6837\u672c\u3002\u6b64\u5916\uff0c\u5229\u7528\u4f4e\u7ea7\u3001\u7279\u5b9a\u90e8\u5206\u7684\u8fd0\u52a8\u4fe1\u606f\uff0c\u6211\u4eec\u53ef\u4ee5\u6784\u5efa\u9488\u5bf9\u672a\u89c1\u8fc7\u6587\u672c\u63cf\u8ff0\u7684\u8fd0\u52a8\u6837\u672c\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u6846\u67b6\u53ef\u4ee5\u4f5c\u4e3a\u63d2\u4ef6\u6a21\u5757\u4f7f\u7528\uff0c\u4ee5\u63d0\u9ad8\u8fd0\u52a8\u6269\u6563\u6a21\u578b\u7684\u6027\u80fd\u3002\u4ee3\u7801\u3001\u9884\u8bad\u7ec3\u6a21\u578b\u548c\u89c6\u9891\u793a\u4f8b\u5c06\u5728\u4ee5\u4e0b\u7f51\u5740\u63d0\u4f9b\uff1ahttps://motion-rag.github.io/|\n", "2409.12139": 
"|**2024-09-24**|**Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models**|Sijing Chen et.al.|[2409.12139](http://arxiv.org/abs/2409.12139)|null|\u968f\u7740\u5927\u6570\u636e\u548c\u5927\u578b\u8bed\u8a00\u6a21\u578b\u65f6\u4ee3\u7684\u5230\u6765\uff0c\u96f6\u6837\u672c\u4e2a\u6027\u5316\u5feb\u901f\u5b9a\u5236\u5df2\u6210\u4e3a\u4e00\u4e2a\u663e\u8457\u8d8b\u52bf\u3002\u672c\u62a5\u544a\u4ecb\u7ecd\u4e86Takin AudioLLM\u7cfb\u5217\u6280\u672f\u4e0e\u6a21\u578b\uff0c\u4e3b\u8981\u5305\u62ecTakin TTS\u3001Takin VC\u548cTakin Morphing\uff0c\u4e13\u95e8\u7528\u4e8e\u6709\u58f0\u8bfb\u7269\u5236\u4f5c\u3002\u8fd9\u4e9b\u6a21\u578b\u5177\u5907\u96f6\u6837\u672c\u8bed\u97f3\u751f\u6210\u80fd\u529b\uff0c\u80fd\u4ea7\u751f\u51e0\u4e4e\u4e0e\u771f\u4eba\u58f0\u97f3\u96be\u4ee5\u533a\u5206\u7684\u9ad8\u8d28\u91cf\u8bed\u97f3\uff0c\u4f7f\u5f97\u4e2a\u4eba\u53ef\u4ee5\u6839\u636e\u81ea\u8eab\u9700\u6c42\u5b9a\u5236\u8bed\u97f3\u5185\u5bb9\u3002 \u9996\u5148\uff0c\u6211\u4eec\u4ecb\u7ecdTakin TTS\uff0c\u8fd9\u662f\u4e00\u79cd\u57fa\u4e8e\u589e\u5f3a\u795e\u7ecf\u8bed\u97f3\u7f16\u89e3\u7801\u5668\u548c\u591a\u4efb\u52a1\u8bad\u7ec3\u6846\u67b6\u7684\u795e\u7ecf\u7f16\u89e3\u7801\u8bed\u8a00\u6a21\u578b\uff0c\u80fd\u591f\u4ee5\u96f6\u6837\u672c\u65b9\u5f0f\u751f\u6210\u9ad8\u4fdd\u771f\u81ea\u7136\u8bed\u97f3\u3002\u5bf9\u4e8eTakin VC\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u6709\u6548\u7684\u5185\u5bb9\u4e0e\u97f3\u8272\u8054\u5408\u5efa\u6a21\u65b9\u6cd5\u6765\u63d0\u9ad8\u8bf4\u8bdd\u4eba\u76f8\u4f3c\u5ea6\uff0c\u5e76\u5021\u5bfc\u57fa\u4e8e\u6761\u4ef6\u6d41\u5339\u914d\u7684\u89e3\u7801\u5668\u8fdb\u4e00\u6b65\u63d0\u5347\u5176\u81ea\u7136\u6027\u548c\u8868\u8fbe\u529b\u3002\u6700\u540e\uff0c\u6211\u4eec\u63d0\u51fa\u4e86Takin 
Morphing\u7cfb\u7edf\uff0c\u8be5\u7cfb\u7edf\u91c7\u7528\u9ad8\u5ea6\u89e3\u8026\u4e14\u5148\u8fdb\u7684\u97f3\u8272\u4e0e\u8282\u594f\u5efa\u6a21\u65b9\u6cd5\uff0c\u4f7f\u4e2a\u4f53\u80fd\u591f\u4ee5\u7cbe\u786e\u53ef\u63a7\u7684\u65b9\u5f0f\u6839\u636e\u81ea\u5df1\u7684\u504f\u597d\u5b9a\u5236\u8bed\u97f3\u751f\u4ea7\u3002\u5e7f\u6cdb\u5b9e\u9a8c\u9a8c\u8bc1\u4e86\u6211\u4eecTakin AudioLLM\u7cfb\u5217\u6a21\u578b\u7684\u6709\u6548\u6027\u548c\u9c81\u68d2\u6027\u3002\u6709\u5173\u8be6\u7ec6\u6f14\u793a\uff0c\u8bf7\u53c2\u9605\u3002|\n", "2409.12122": "|**2024-09-18**|**Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement**|An Yang et.al.|[2409.12122](http://arxiv.org/abs/2409.12122)|null|\u5728\u672c\u62a5\u544a\u4e2d\uff0c\u6211\u4eec\u4ecb\u7ecd\u4e86\u7cfb\u5217\u6570\u5b66\u4e13\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff1aQwen2.5-Math \u548c Qwen2.5-Math-Instruct-1.5B/7B/72B\u3002Qwen2.5 \u7cfb\u5217\u7684\u6838\u5fc3\u521b\u65b0\u5728\u4e8e\u5728\u6574\u4e2a\u7ba1\u9053\u4e2d\u878d\u5165\u81ea\u6211\u63d0\u5347\u7684\u54f2\u5b66\uff0c\u5305\u62ec\u9884\u8bad\u7ec3\u3001\u540e\u5904\u7406\u548c\u63a8\u7406\u9636\u6bb5\uff1a\uff081\uff09\u5728\u9884\u8bad\u7ec3\u9636\u6bb5\uff0c\u4f7f\u7528 Qwen2-Math-Instruct \u6765\u751f\u6210\u5927\u89c4\u6a21\u9ad8\u8d28\u91cf\u7684\u6570\u5b66\u6570\u636e\u3002\uff082\uff09\u5728\u540e\u5904\u7406\u9636\u6bb5\uff0c\u6211\u4eec\u901a\u8fc7\u4ece Qwen2-Math-Instruct \u8fdb\u884c\u5927\u91cf\u91c7\u6837\u6765\u5f00\u53d1\u5956\u52b1\u6a21\u578b\uff08RM\uff09\u3002\u7136\u540e\uff0c\u6211\u4eec\u5c06\u6b64 RM \u5e94\u7528\u4e8e\u76d1\u7763\u5fae\u8c03\uff08SFT\uff09\u7684\u8fed\u4ee3\u8fdb\u5316\u3002\u901a\u8fc7\u589e\u5f3a\u7684 SFT \u6a21\u578b\uff0c\u6709\u53ef\u80fd\u8fdb\u884c\u8fed\u4ee3\u8bad\u7ec3\u5e76\u66f4\u65b0 RM\uff0c\u8fdb\u800c\u6307\u5bfc SFT \u6570\u636e\u7684\u4e0b\u4e00\u8f6e\u8fed\u4ee3\u3002\u5728\u6700\u7ec8\u7684 SFT 
\u6a21\u578b\u4e0a\uff0c\u6211\u4eec\u91c7\u7528\u7ec8\u6781 RM \u8fdb\u884c\u5f3a\u5316\u5b66\u4e60\uff0c\u4ece\u800c\u4ea7\u751f Qwen2.5-Math-Instruct \u6a21\u578b\u3002\uff083\uff09\u6b64\u5916\uff0c\u5728\u63a8\u7406\u9636\u6bb5\uff0c\u4f7f\u7528 RM \u6765\u5f15\u5bfc\u91c7\u6837\uff0c\u4f18\u5316\u6a21\u578b\u6027\u80fd\u3002 Qwen2.5-Math-Instruct \u652f\u6301\u4e2d\u6587\u548c\u82f1\u6587\uff0c\u5e76\u5177\u6709\u9ad8\u7ea7\u6570\u5b66\u63a8\u7406\u80fd\u529b\uff0c\u5305\u62ec\u94fe\u5f0f\u601d\u8003\uff08CoT\uff09\u548c\u5de5\u5177\u96c6\u6210\u63a8\u7406\uff08TIR\uff09\u3002\u6211\u4eec\u5728\u82f1\u8bed\u548c\u4e2d\u6587\u7684 10 \u4e2a\u6570\u5b66\u6570\u636e\u96c6\u4e0a\u8bc4\u4f30\u4e86\u6211\u4eec\u7684\u6a21\u578b\uff0c\u5982 GSM8K\u3001MATH\u3001GaoKao\u3001AMC23 \u548c AIME24\uff0c\u6db5\u76d6\u4ece\u5c0f\u5b66\u6c34\u5e73\u5230\u6570\u5b66\u7ade\u8d5b\u95ee\u9898\u7684\u5e7f\u6cdb\u96be\u5ea6\u3002|\n", "2409.12117": "|**2024-09-18**|**Low Frame-rate Speech Codec: a Codec Designed for Fast High-quality Speech LLM Training and Inference**|Edresson Casanova 
et.al.|[2409.12117](http://arxiv.org/abs/2409.12117)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u901a\u8fc7\u5c06\u97f3\u9891\u8f6c\u6362\u4e3a\u79bb\u6563\u4ee4\u724c\u7684\u97f3\u9891\u7f16\u89e3\u7801\u5668\u65b9\u9762\u663e\u8457\u63a8\u52a8\u4e86\u97f3\u9891\u5904\u7406\uff0c\u8fd9\u4f7f\u5f97\u53ef\u4ee5\u5c06\u8bed\u8a00\u5efa\u6a21\u6280\u672f\u5e94\u7528\u4e8e\u97f3\u9891\u6570\u636e\u3002\u7136\u800c\uff0c\u97f3\u9891\u7f16\u89e3\u7801\u5668\u901a\u5e38\u4ee5\u9ad8\u5e27\u7387\u8fd0\u884c\uff0c\u5bfc\u81f4\u8bad\u7ec3\u548c\u63a8\u7406\u901f\u5ea6\u7f13\u6162\uff0c\u7279\u522b\u662f\u5728\u81ea\u56de\u5f52\u6a21\u578b\u4e2d\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4f4e\u5e27\u7387\u8bed\u97f3\u7f16\u89e3\u7801\u5668\uff08LFSC\uff09\uff1a\u4e00\u79cd\u795e\u7ecf\u97f3\u9891\u7f16\u89e3\u7801\u5668\uff0c\u5b83\u5229\u7528\u6709\u9650\u6807\u91cf\u91cf\u5316\u548c\u4e0e\u5927\u578b\u8bed\u97f3\u8bed\u8a00\u6a21\u578b\u7684\u5bf9\u6297\u6027\u8bad\u7ec3\uff0c\u4ee51.89 kbps\u7684\u6bd4\u7279\u7387\u548c21.5\u5e27/\u79d2\u5b9e\u73b0\u9ad8\u8d28\u91cf\u7684\u97f3\u9891\u538b\u7f29\u3002\u6211\u4eec\u8bc1\u660e\uff0c\u6211\u4eec\u7684\u65b0\u578b\u7f16\u89e3\u7801\u5668\u53ef\u4ee5\u4f7f\u57fa\u4e8eLLM\u7684\u6587\u672c\u5230\u8bed\u97f3\u6a21\u578b\u7684\u63a8\u7406\u901f\u5ea6\u52a0\u5feb\u7ea6\u4e09\u500d\uff0c\u540c\u65f6\u63d0\u9ad8\u53ef\u61c2\u5ea6\u5e76\u4ea7\u751f\u4e0e\u4ee5\u5f80\u6a21\u578b\u76f8\u5f53\u7684\u8d28\u91cf\u3002|\n", "2409.12106": "|**2024-09-18**|**Measuring Human and AI Values based on Generative Psychometrics with Large Language Models**|Haoran Ye 
et.al.|[2409.12106](http://arxiv.org/abs/2409.12106)|**[link](https://github.com/value4ai/gpv)**|**\u672c\u6587\u5f15\u5165\u4e86\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u751f\u6210\u5fc3\u7406\u6d4b\u5ea6\uff08GPV\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u6570\u636e\u9a71\u52a8\u7684\u4ef7\u503c\u6d4b\u91cf\u8303\u5f0f\uff0c\u7406\u8bba\u57fa\u7840\u5728\u4e8e\u6587\u672c\u63ed\u793a\u7684\u9009\u62e9\u6027\u611f\u77e5\u3002\u9996\u5148\uff0c\u6211\u4eec\u5bf9LLM\u8fdb\u884c\u5fae\u8c03\u4ee5\u5b9e\u73b0\u7cbe\u786e\u7684\u611f\u77e5\u5c42\u7ea7\u4ef7\u503c\u6d4b\u91cf\uff0c\u5e76\u9a8c\u8bc1LLM\u89e3\u6790\u6587\u672c\u5f62\u6210\u611f\u77e5\u7684\u6838\u5fc3\u80fd\u529b\uff0c\u4ece\u800c\u6784\u5efaGPV\u7ba1\u9053\u7684\u57fa\u7840\u3002\u7136\u540e\uff0c\u6211\u4eec\u5c06GPV\u5e94\u7528\u4e8e\u4eba\u7c7b\u64b0\u5199\u7684\u535a\u5ba2\uff0c\u8bc1\u660e\u5176\u7a33\u5b9a\u6027\u548c\u6709\u6548\u6027\uff0c\u5e76\u4e14\u4f18\u4e8e\u5148\u524d\u7684\u5fc3\u7406\u5b66\u5de5\u5177\u3002\u63a5\u7740\uff0c\u6211\u4eec\u5c06GPV\u6269\u5c55\u5230LLM\u4ef7\u503c\u6d4b\u91cf\uff0c\u901a\u8fc7\u4ee5\u4e0b\u65b9\u5f0f\u63a8\u52a8\u5f53\u524d\u6280\u672f\uff1a1\uff09\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8eLLM\u53ef\u6269\u5c55\u548c\u81ea\u7531\u5f62\u5f0f\u8f93\u51fa\u7684\u91cf\u5316\u65b9\u6cd5\uff0c\u4f7f\u4ef7\u503c\u6d4b\u91cf\u80fd\u591f\u9488\u5bf9\u7279\u5b9a\u60c5\u5883\uff1b2\uff09\u6bd4\u8f83\u4e86\u4e0d\u540c\u6d4b\u91cf\u65b9\u6cd5\uff0c\u63ed\u793a\u4e86\u524d\u4eba\u65b9\u6cd5\u7684\u56de\u5e94\u504f\u5dee\uff1b3\uff09\u5c1d\u8bd5\u5c06LLM\u4ef7\u503c\u4e0e\u5b89\u5168\u6027\u8054\u7cfb\u8d77\u6765\uff0c\u53d1\u73b0\u4e0d\u540c\u4ef7\u503c\u4f53\u7cfb\u7684\u9884\u6d4b\u529b\uff0c\u5e76\u5206\u6790\u5404\u79cd\u4ef7\u503c\u5bf9LLM\u5b89\u5168\u6027\u7684\u5f71\u54cd\u3002\u901a\u8fc7\u8de8\u5b66\u79d1\u52aa\u529b\uff0c\u672c\u6587\u65e8\u5728\u5229\u7528AI\u63a8\u52a8\u4e0b\u4e00\u4ee3\u5fc3\u7406\u6d4b\u5ea6\u7684\u53d1\u5c55\uff0c\u5
e76\u5229\u7528\u5fc3\u7406\u6d4b\u5ea6\u4fc3\u8fdb\u4ef7\u503c\u5bfc\u5411\u7684AI\u3002**|\n", "2409.17143": "|**2024-09-25**|**Attention Prompting on Image for Large Vision-Language Models**|Runpeng Yu et.al.|[2409.17143](http://arxiv.org/abs/2409.17143)|**[link](https://github.com/yu-rp/apiprompting)**|**\u4e0e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u76f8\u6bd4\uff0c\u5927\u578b\u89c6\u89c9-\u8bed\u8a00\u6a21\u578b\uff08LVLM\uff09\u8fd8\u80fd\u63a5\u53d7\u56fe\u50cf\u4f5c\u4e3a\u8f93\u5165\uff0c\u56e0\u6b64\u5c55\u793a\u4e86\u66f4\u591a\u6709\u8da3\u7684\u73b0\u8c61\u7ea7\u80fd\u529b\uff0c\u5e76\u5728\u5404\u79cd\u89c6\u89c9-\u8bed\u8a00\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u4ee4\u4eba\u5370\u8c61\u6df1\u523b\u7684\u8868\u73b0\u3002\u53d7LLM\u4e2d\u6587\u672c\u63d0\u793a\u7684\u542f\u53d1\uff0c\u63a2\u7d22\u4e86\u589e\u5f3aLVLM\u5bf9\u89c6\u89c9\u4fe1\u606f\u611f\u77e5\u80fd\u529b\u7684\u89c6\u89c9\u63d0\u793a\u6280\u672f\u3002\u7136\u800c\uff0c\u4ee5\u5f80\u7684\u89c6\u89c9\u63d0\u793a\u6280\u672f\u4ec5\u5904\u7406\u89c6\u89c9\u8f93\u5165\u800c\u4e0d\u8003\u8651\u6587\u672c\u67e5\u8be2\uff0c\u9650\u5236\u4e86\u6a21\u578b\u9075\u5faa\u6587\u672c\u6307\u4ee4\u5b8c\u6210\u4efb\u52a1\u7684\u80fd\u529b\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u672c\u5de5\u4f5c\u63d0\u51fa\u4e86\u4e00\u4e2a\u540d\u4e3a\u201c\u6ce8\u610f\u529b\u6620\u5c04\u4e0a\u7684\u56fe\u50cf\u63d0\u793a\u201d\u7684\u65b0\u63d0\u793a\u6280\u672f\uff0c\u8be5\u6280\u672f\u7b80\u5355\u5730\u5728\u539f\u59cb\u8f93\u5165\u56fe\u50cf\u4e0a\u53e0\u52a0\u4e86\u4e00\u4e2a\u7531\u8f85\u52a9\u6a21\u578b\uff08\u5982CLIP\uff09\u751f\u6210\u7684\u3001\u4f9d\u8d56\u4e8e\u6587\u672c\u67e5\u8be2\u7684\u6ce8\u610f\u529b\u70ed\u56fe\uff0c\u5e76\u6709\u6548\u5730\u589e\u5f3a\u4e86LVLM\u5728\u5404\u79cd\u4efb\u52a1\u4e0a\u7684\u8868\u73b0\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u901a\u8fc7\u4e00\u4e2a\u8f85\u52a9\u6a21\u578b\uff08\u5982CLIP\uff09\u4e3a\u8f93\u5165\u56fe\u50cf\u751f
\u6210\u4e00\u4e2a\u4f9d\u8d56\u4e8e\u6587\u672c\u67e5\u8be2\u7684\u6ce8\u610f\u529b\u70ed\u56fe\u3002\u7136\u540e\uff0c\u70ed\u56fe\u7b80\u5355\u5730\u4e58\u4ee5\u539f\u59cb\u56fe\u50cf\u7684\u50cf\u7d20\u503c\u6765\u83b7\u5f97\u5b9e\u9645\u8f93\u5165\u56fe\u50cf\u4f9bLVLM\u4f7f\u7528\u3002\u5728\u5404\u79cd\u89c6\u89c9-\u8bed\u8a00\u57fa\u51c6\u6d4b\u8bd5\u4e0a\u7684\u5e7f\u6cdb\u5b9e\u9a8c\u9a8c\u8bc1\u4e86\u6211\u4eec\u6280\u672f\u7684\u6709\u6548\u6027\u3002\u4f8b\u5982\uff0c\u201c\u6ce8\u610f\u529b\u6620\u5c04\u4e0a\u7684\u56fe\u50cf\u63d0\u793a\u201d\u5206\u522b\u63d0\u9ad8\u4e86LLaVA-1.5\u5728MM-Vet\u548cLLaVA-Wild\u57fa\u51c6\u4e0a\u7684\u6027\u80fd3.8%\u548c2.9%\u3002**|\n", "2409.17141": "|**2024-09-25**|**FineZip : Pushing the Limits of Large Language Models for Practical Lossless Text Compression**|Fazal Mittu et.al.|[2409.17141](http://arxiv.org/abs/2409.17141)|**[link](https://github.com/fazalmittu/finezip)**|**\u672c\u6587\u6df1\u5165\u5206\u6790\u4e86\u57fa\u4e8e\u795e\u7ecf\u7f51\u7edc\u4e0eTransformer\u7684\u6587\u672c\u538b\u7f29\u6280\u672f\uff0c\u5e76\u5c06\u5176\u4e0e\u4f20\u7edf\u6587\u672c\u538b\u7f29\u7cfb\u7edf\u8fdb\u884c\u5bf9\u6bd4\u3002\u5c3d\u7ba1\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u7cfb\u7edf\u5728\u538b\u7f29\u6bd4\u4e0a\u663e\u8457\u4f18\u4e8e\u4f20\u7edf\u65b9\u6cd5\uff0c\u4f46\u5b83\u4eec\u5728\u5b9e\u7528\u6027\u65b9\u9762\u5374\u6781\u4e3a\u6709\u9650\u3002\u4ee5Llama3-8B\u4e3a\u57fa\u7840\u7684LLM\u538b\u7f29\u7cfb\u7edf\u2014\u2014LLMZip\uff0c\u5728\u538b\u7f29\u4ec510MB\u6587\u672c\u65f6\u9700\u89819.5\u5929\u7684\u65f6\u95f4\uff0c\u5c3d\u7ba1\u538b\u7f29\u6548\u679c\u6709\u6240\u63d0\u5347\u3002 
\u4e3a\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86FineZip\u2014\u2014\u4e00\u79cd\u7ed3\u5408\u5728\u7ebf\u8bb0\u5fc6\u4e0e\u52a8\u6001\u4e0a\u4e0b\u6587\u6982\u5ff5\u7684\u65b0\u578bLLM\u6587\u672c\u538b\u7f29\u7cfb\u7edf\u3002FineZip\u76f8\u8f83\u4e8eLLMZip\uff0c\u5c06\u538b\u7f29\u65f6\u95f4\u5927\u5e45\u7f29\u77ed\u81f3\u7ea64\u5c0f\u65f6\uff0c\u6027\u80fd\u63d0\u5347\u4e8654\u500d\uff0c\u4e14\u4e0e\u4f20\u7edf\u7b97\u6cd5\u538b\u7f29\u65b9\u6cd5\u76f8\u6bd4\uff0c\u5176\u538b\u7f29\u6548\u7387\u63d0\u9ad8\u4e86\u5927\u7ea650%\u3002\u901a\u8fc7\u672c\u7814\u7a76\uff0c\u6211\u4eec\u8fc8\u51fa\u4e86\u8ba9\u57fa\u4e8eLLM\u7684\u65e0\u635f\u6587\u672c\u538b\u7f29\u6210\u4e3a\u73b0\u5b9e\u7684\u7b2c\u4e00\u6b65\u3002\u5c3d\u7ba1FineZip\u5df2\u53d6\u5f97\u663e\u8457\u8fdb\u5c55\uff0c\u4f46LLM\u4ecd\u4e0d\u9002\u7528\u4e8e\u5927\u89c4\u6a21\u6587\u672c\u538b\u7f29\u3002\u6211\u4eec\u671f\u5f85\u672c\u6587\u7684\u7814\u7a76\u548c\u521b\u65b0\u80fd\u4e3a\u672a\u6765\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\u94fa\u5e73\u9053\u8def\u3002**|\n", "2409.17140": "|**2024-09-25**|**Turn Every Application into an Agent: Towards Efficient Human-Agent-Computer Interaction with API-First LLM-Based Agents**|Junting Lu 
et.al.|[2409.17140](http://arxiv.org/abs/2409.17140)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aAXIS\u7684\u65b0\u578b\u57fa\u4e8e\u8bed\u8a00\u6a21\u578b\u7684\u4ee3\u7406\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u901a\u8fc7\u5e94\u7528\u7a0b\u5e8f\u7f16\u7a0b\u63a5\u53e3\uff08API\uff09\u4f18\u5148\u5904\u7406\u64cd\u4f5c\u800c\u975e\u7528\u6237\u754c\u9762\uff08UI\uff09\u64cd\u4f5c\uff0c\u4ee5\u89e3\u51b3\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u9a71\u52a8\u7684\u4ee3\u7406\u5728\u590d\u6742\u4efb\u52a1\u4e2d\u7684\u9ad8\u5ef6\u8fdf\u548c\u4f4e\u53ef\u9760\u6027\u95ee\u9898\u3002\u6b64\u5916\uff0cAXIS\u6846\u67b6\u8fd8\u901a\u8fc7\u81ea\u52a8\u5316\u63a2\u7d22\u5e94\u7528\u7a0b\u5e8f\u7684\u65b9\u5f0f\u4fc3\u8fdb\u4e86API\u7684\u521b\u5efa\u4e0e\u6269\u5c55\u3002 \u5728Office Word\u5e94\u7528\u4e0a\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u4e0e\u4eba\u7c7b\u76f8\u6bd4\uff0cAXIS\u5728\u4efb\u52a1\u5b8c\u6210\u65f6\u95f4\u4e0a\u7f29\u77ed\u4e8665%-70%\uff0c\u8ba4\u77e5\u8d1f\u8377\u964d\u4f4e\u4e8638%-53%\uff0c\u540c\u65f6\u4fdd\u6301\u4e8697%-98%\u7684\u51c6\u786e\u6027\u3002\u8fd9\u9879\u5de5\u4f5c\u4e3a\u4eba\u7c7b\u3001\u4ee3\u7406\u548c\u8ba1\u7b97\u673a\u4ea4\u4e92\uff08HACI\uff09\u6846\u67b6\u4ee5\u53ca\u5e94\u7528\u7a0b\u5e8f\u63d0\u4f9b\u8005\u5728LLM\u65f6\u4ee3\u7684\u65b0UI\u8bbe\u8ba1\u539f\u5219\u505a\u51fa\u4e86\u8d21\u732e\u3002\u5b83\u4e5f\u63a2\u8ba8\u4e86\u5c06\u6bcf\u4e2a\u5e94\u7528\u7a0b\u5e8f\u8f6c\u5316\u4e3a\u4ee3\u7406\u7684\u53ef\u80fd\u6027\uff0c\u4e3a\u4ee3\u7406\u4e3a\u4e2d\u5fc3\u7684\u64cd\u4f5c\u7cfb\u7edf\uff08Agent OS\uff09\u94fa\u5e73\u4e86\u9053\u8def\u3002|\n", "2409.17115": "|**2024-09-25**|**Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale**|Fan Zhou 
et.al.|[2409.17115](http://arxiv.org/abs/2409.17115)|**[link](https://github.com/gair-nlp/prox)**|**\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\u9884\u8bad\u7ec3\u9886\u57df\uff0c\u4eba\u4eec\u957f\u671f\u4ee5\u6765\u4f9d\u8d56\u4e8e\u4eba\u7c7b\u4e13\u5bb6\u5236\u5b9a\u63d0\u5347\u6570\u636e\u8d28\u91cf\u7684\u542f\u53d1\u5f0f\u89c4\u5219\uff0c\u81f3\u4eca\u5df2\u53d1\u5c55\u51fa\u4f17\u591a\u89c4\u5219\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u89c4\u5219\u7f3a\u4e4f\u7075\u6d3b\u6027\uff0c\u65e0\u6cd5\u6709\u6548\u9488\u5bf9\u6bcf\u4e2a\u5b9e\u4f8b\u7684\u72ec\u7279\u7279\u6027\u8fdb\u884c\u8c03\u6574\u3002\u540c\u65f6\uff0c\u4e3a\u6bcf\u4e2a\u5b9e\u4f8b\u5e94\u7528\u5b9a\u5236\u89c4\u5219\u5bf9\u4e8e\u4eba\u7c7b\u4e13\u5bb6\u800c\u8a00\u662f\u4e0d\u5207\u5b9e\u9645\u7684\u3002\u672c\u6587\u5c55\u793a\u4e86\u5373\u4f7f\u662f\u53c2\u6570\u6570\u91cf\u4ec5\u67090.3B\u7684\u8bed\u8a00\u6a21\u578b\uff0c\u4e5f\u80fd\u5c55\u73b0\u51fa\u4e0e\u4eba\u7c7b\u4e13\u5bb6\u76f8\u5f53\u7684\u6570\u636e\u4f18\u5316\u80fd\u529b\u3002\u6211\u4eec\u5f15\u5165\u4e86\u201c\u7f16\u7a0b\u6bcf\u4f8b\u201d\uff08ProX\uff09\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u5c06\u6570\u636e\u4f18\u5316\u89c6\u4e3a\u7f16\u7a0b\u4efb\u52a1\uff0c\u5141\u8bb8\u6a21\u578b\u901a\u8fc7\u751f\u6210\u5e76\u6267\u884c\u7cbe\u7ec6\u7c92\u5ea6\u7684\u64cd\u4f5c\uff08\u5982\u5b57\u7b26\u4e32\u89c4\u8303\u5316\uff09\u5bf9\u6bcf\u4e2a\u4e2a\u4f53\u5b9e\u4f8b\u8fdb\u884c\u5927\u89c4\u6a21\u4f18\u5316\u3002 
\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u4f7f\u7528ProX\u7b5b\u9009\u540e\u7684\u6570\u636e\u9884\u8bad\u7ec3\u7684\u6a21\u578b\uff0c\u5728\u5404\u79cd\u4e0b\u6e38\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u5747\u4f18\u4e8e\u539f\u59cb\u6570\u636e\u6216\u7531\u5176\u4ed6\u7b5b\u9009\u65b9\u6cd5\u5904\u7406\u7684\u6570\u636e\uff0c\u6027\u80fd\u63d0\u5347\u8d85\u8fc72%\u3002\u8be5\u6846\u67b6\u7684\u6709\u6548\u6027\u9002\u7528\u4e8e\u4e0d\u540c\u89c4\u6a21\u7684\u6a21\u578b\u548c\u9884\u8bad\u7ec3\u6570\u636e\u96c6\uff0c\u5305\u62ecC4\u3001RedPajama-V2\u548cFineWeb\u3002\u6b64\u5916\uff0cProX\u5728\u7279\u5b9a\u9886\u57df\u7684\u8fde\u7eed\u9884\u8bad\u7ec3\u4e2d\u8868\u73b0\u51fa\u5de8\u5927\u6f5c\u529b\uff1a\u5728\u65e0\u9700\u7279\u5b9a\u9886\u57df\u8bbe\u8ba1\u7684\u60c5\u51b5\u4e0b\uff0c\u4f7f\u7528ProX\u4f18\u5316\u7684OpenWebMath\u6570\u636e\u9884\u8bad\u7ec3\u7684\u6a21\u578b\uff0c\u5728\u51c6\u786e\u6027\u4e0a\u5206\u522b\u6bd4Mistral-7B\u3001Llama-2-7B\u548cCodeLlama-7B\u63d0\u9ad8\u4e867.6%\u300114.6%\u548c20.3%\uff0c\u4ec5\u4f7f\u7528\u7ea610B\u4ee4\u724c\u5373\u53ef\u8fbe\u5230\u7c7b\u4f3c\u4e8e\u4f7f\u7528200B\u4ee4\u724c\u9884\u8bad\u7ec3\u7684Llama-7B\u6a21\u578b\u7684\u6c34\u5e73\u3002\u8fdb\u4e00\u6b65\u7684\u5206\u6790\u663e\u793a\uff0cProX\u663e\u8457\u8282\u7701\u4e86\u8bad\u7ec3FLOPs\uff0c\u4e3a\u9ad8\u6548LLM\u9884\u8bad\u7ec3\u5f00\u8f9f\u4e86\u6709\u524d\u666f\u7684\u9053\u8def\u3002 \u6211\u4eec\u516c\u5f00\u53d1\u5e03\u4e86ProX\uff0c\u5305\u62ec>100B\u7684\u8bed\u6599\u5e93\u3001\u6a21\u578b\u4ee5\u53ca\u6240\u6709\u8bad\u7ec3\u548c\u5b9e\u73b0\u7ec6\u8282\uff0c\u4ee5\u4fc3\u8fdb\u53ef\u590d\u5236\u7814\u7a76\u548c\u672a\u6765\u521b\u65b0\u3002\u4ee3\u7801\uff1ahttps://github.com/GAIR-NLP/ProX**|\n", "2409.17092": "|**2024-09-25**|**Accumulator-Aware Post-Training Quantization**|Ian Colbert 
et.al.|[2409.17092](http://arxiv.org/abs/2409.17092)|null|\u8fd1\u5e74\u6765\u7684\u7814\u7a76\u5df2\u7ecf\u63a2\u7d22\u4e86\u4f4e\u7cbe\u5ea6\u7d2f\u52a0\uff0c\u62a5\u544a\u4e86\u5728\u4e0d\u540c\u5e73\u53f0\u4e0a\u7684\u541e\u5410\u91cf\u3001\u529f\u7387\u548c\u9762\u79ef\u7684\u6539\u8fdb\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u63d0\u8bae\u4ec5\u8003\u8651\u4e86\u91cf\u5316\u611f\u77e5\u8bad\u7ec3\uff08QAT\uff09\u8303\u5f0f\uff0c\u5728\u8be5\u8303\u5f0f\u4e2d\uff0c\u6a21\u578b\u5728\u91cf\u5316\u5faa\u73af\u4e2d\u8fdb\u884c\u5fae\u8c03\u6216\u4ece\u5934\u5f00\u59cb\u8bad\u7ec3\u3002\u968f\u7740\u6a21\u578b\u7ee7\u7eed\u589e\u5927\uff0cQAT\u6280\u672f\u7684\u6210\u672c\u53d8\u5f97\u8d8a\u6765\u8d8a\u9ad8\uff0c\u8fd9\u6fc0\u53d1\u4e86\u6700\u8fd1\u5bf9\u540e\u8bad\u7ec3\u91cf\u5316\uff08PTQ\uff09\u7814\u7a76\u7684\u70ed\u6f6e\u3002\u636e\u6211\u4eec\u6240\u77e5\uff0c\u8fd9\u662f\u9996\u6b21\u6b63\u5f0f\u7814\u7a76PTQ\u80cc\u666f\u4e0b\u7684\u79ef\u7b97\u5668\u611f\u77e5\u91cf\u5316\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u6211\u4eec\u5f15\u5165\u4e86AXE\uff0c\u4e00\u4e2a\u65e8\u5728\u8d4b\u4e88\u73b0\u6709\u5c42\u5f0fPTQ\u7b97\u6cd5\u6ea2\u51fa\u907f\u514d\u4fdd\u8bc1\u7684\u5b9e\u7528\u6846\u67b6\u7684\u6269\u5c55\u3002\u6211\u4eec\u901a\u8fc7\u5728\u4e24\u4e2a\u6700\u5148\u8fdb\u7684PTQ\u7b97\u6cd5\uff1aGPFQ\u548cOPTQ\u4e4b\u4e0a\u5b9e\u73b0AXE\u6765\u7406\u8bba\u5730\u63a8\u52a8AXE\uff0c\u5e76\u8bc1\u660e\u5176\u7075\u6d3b\u6027\u3002\u8fdb\u4e00\u6b65\u5730\uff0c\u6211\u4eec\u901a\u8fc7\u9996\u6b21\u652f\u6301\u591a\u9636\u6bb5\u79ef\u7d2f\u6765\u4e00\u822c\u5316AXE\uff0c\u4e3a\u5168\u6570\u636e\u8def\u5f84\u4f18\u5316\u548c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u6269\u5c55\u6253\u5f00\u5927\u95e8\u3002\u6211\u4eec\u5728\u56fe\u50cf\u5206\u7c7b\u548c\u8bed\u8a00\u751f\u6210\u6a21\u578b\u4e0a\u8bc4\u4f30\u4e86AXE\uff0c\u5e76\u89c2\u5bdf\u5230\u4e0e\u57fa\u7ebf\u65b9\u6cd5\u76f8\u6bd4\uff0c\u5728\u79ef\u7b97\u5668\u4f4d\u5bbd\
u4e0e\u6a21\u578b\u51c6\u786e\u6027\u7684\u6743\u8861\u4e0a\u53d6\u5f97\u4e86\u663e\u8457\u6539\u8fdb\u3002|\n", "2409.17066": "|**2024-09-25**|**VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models**|Yifei Liu et.al.|[2409.17066](http://arxiv.org/abs/2409.17066)|**[link](https://github.com/microsoft/vptq)**|**\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aVector Post-Training Quantization\uff08VPTQ\uff09\u7684\u4f4e\u6bd4\u7279\u91cf\u5316\u65b9\u6cd5\uff0c\u9488\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u3002\u901a\u8fc7\u4f7f\u7528\u4e8c\u6b21\u4f18\u5316\u6765\u5b9a\u4e49LLM\u5411\u91cf\u91cf\u5316\u95ee\u9898\uff0c\u5e76\u901a\u8fc7\u89e3\u51b3\u4f18\u5316\u95ee\u9898\u6765\u6307\u5bfc\u91cf\u5316\u7b97\u6cd5\u8bbe\u8ba1\u3002\u8fdb\u4e00\u6b65\u5730\uff0c\u5f15\u5165\u4e86\u901a\u9053\u72ec\u7acb\u7684\u4e8c\u6b21\u4f18\u5316\u4ee5\u5b9e\u73b0\u7cbe\u7ec6\u5316\u91cf\u5316\u3002\u540c\u65f6\uff0c\u901a\u8fc7\u5206\u89e3\u4f18\u5316\u95ee\u9898\uff0c\u63d0\u51fa\u4e86\u7b80\u660e\u6709\u6548\u7684\u4ee3\u7801\u672c\u521d\u59cb\u5316\u7b97\u6cd5\u3002\u6b64\u5916\uff0cVPTQ\u8fd8\u6269\u5c55\u4e86\u6b8b\u5dee\u548c\u5f02\u5e38\u503c\u91cf\u5316\u652f\u6301\uff0c\u8fd9\u4e0d\u4ec5\u63d0\u9ad8\u4e86\u6a21\u578b\u7cbe\u5ea6\uff0c\u8fd8\u80fd\u8fdb\u4e00\u6b65\u538b\u7f29\u6a21\u578b\u3002 \u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u4e0eSOTA\u76f8\u6bd4\uff0c\u57282\u6bd4\u7279\u91cf\u5316\u65f6\uff0cVPTQ\u5c06\u6a21\u578b\u91cf\u5316\u56f0\u60d1\u5ea6\u964d\u4f4e0.01-0.34\uff0cMistral-7B\u4e0a\u4e3a0.38-0.68\uff0cLLaMA-3\u4e0a\u4e3a4.41-7.34\u3002\u5728\u95ee\u7b54\u4efb\u52a1\u4e0a\u7684\u5e73\u5747\u51c6\u786e\u5ea6\u63d0\u5347\u8303\u56f4\u4e3aLLaMA-2\u4e0a\u76840.79%-1.5%\uff0cMistral-7B\u4e0a\u76841%\uff0c\u4ee5\u53caLLaMA-3\u4e0a\u768411%-22%\u3002\u91cf\u5316\u7b97\u6cd5\u6267\u884c\u65f6\u95f4\u4ec5\u536010.4%-18.6%\uff0c\u5bfc\u81f4\u63a8\u7406\u541e\u5410\u91cf\u63d0\u9ad81.6-1.8\u500d\u3002**|\n", 
"2409.17054": "|**2024-09-25**|**Using LLM for Real-Time Transcription and Summarization of Doctor-Patient Interactions into ePuskesmas in Indonesia**|Azmul Asmar Irfan et.al.|[2409.17054](http://arxiv.org/abs/2409.17054)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u89e3\u51b3\u65b9\u6848\uff0c\u5229\u7528\u672c\u5730\u5316\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u6765\u8f6c\u5f55\u3001\u7ffb\u8bd1\u548c\u603b\u7ed3\u533b\u751f\u4e0e\u60a3\u8005\u7684\u5bf9\u8bdd\u3002\u6211\u4eec\u4f7f\u7528Whisper\u6a21\u578b\u8fdb\u884c\u8f6c\u5f55\uff0cGPT-3\u8fdb\u884c\u603b\u7ed3\uff0c\u5e76\u5c06\u5176\u683c\u5f0f\u5316\u4e3aePuskemas\u533b\u7597\u8bb0\u5f55\u3002\u6b64\u7cfb\u7edf\u4f5c\u4e3a\u73b0\u6709\u7f51\u7edc\u6d4f\u89c8\u5668\u6269\u5c55\u7684\u9644\u52a0\u7ec4\u4ef6\u5b9e\u73b0\uff0c\u5141\u8bb8\u533b\u751f\u5728\u8bf4\u8bdd\u65f6\u586b\u5199\u60a3\u8005\u8868\u683c\u3002\u901a\u8fc7\u5229\u7528\u5b9e\u65f6\u8f6c\u5f55\u3001\u7ffb\u8bd1\u548c\u603b\u7ed3\u529f\u80fd\uff0c\u533b\u751f\u53ef\u4ee5\u63d0\u9ad8\u60a3\u8005\u62a4\u7406\u7684\u5468\u8f6c\u65f6\u95f4\uff0c\u540c\u65f6\u589e\u5f3a\u8bb0\u5f55\u7684\u8d28\u91cf\uff0c\u4f7f\u5f97\u8bb0\u5f55\u66f4\u52a0\u8be6\u7ec6\u4e14\u5bcc\u6709\u6d1e\u5bdf\u529b\uff0c\u4ee5\u4f9b\u672a\u6765\u7684\u8bbf\u95ee\u53c2\u8003\u3002\u8fd9\u4e00\u521b\u65b0\u65e8\u5728\u89e3\u51b3\u5370\u5c3c\u533b\u7597\u673a\u6784\u62e5\u6324\u4ee5\u53ca\u533b\u62a4\u4eba\u5458\u884c\u653f\u8d1f\u62c5\u91cd\u7684\u95ee\u9898\u3002\u6211\u4eec\u76f8\u4fe1\uff0c\u8fd9\u79cd\u89e3\u51b3\u65b9\u6848\u5c06\u5e2e\u52a9\u533b\u751f\u8282\u7701\u65f6\u95f4\u3001\u63d0\u4f9b\u66f4\u597d\u7684\u62a4\u7406\u5e76\u4ea7\u751f\u66f4\u51c6\u786e\u7684\u533b\u7597\u8bb0\u5f55\uff0c\u4ee3\u8868\u4e86\u5411\u73b0\u4ee3\u5316\u533b\u7597\u4fdd\u5065\u8fc8\u8fdb\u7684\u91cd\u8981\u4e00\u6b65\uff0c\u786e\u4fdd\u5373\u4f7f\u5728\u8d44\u6e90\u6709\u9650\u7684\u73af\u5883\u4e2d\uff0c\u60a3\u8005\u4e5f\u80fd\u83b7\u5f97\u53ca\u65f6\u3001\u9ad8\u8d28\u9
1cf\u7684\u62a4\u7406\u3002|\n", "2409.17044": "|**2024-09-25**|**How to Connect Speech Foundation Models and Large Language Models? What Matters and What Does Not**|Francesco Verdini et.al.|[2409.17044](http://arxiv.org/abs/2409.17044)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u60ca\u4eba\u8868\u73b0\u63a8\u52a8\u4e86\u7814\u7a76\u52aa\u529b\uff0c\u4f7f\u5176\u80fd\u591f\u5e94\u7528\u4e8e\u4e00\u7cfb\u5217\u4efb\u52a1\u548c\u8f93\u5165\u6a21\u6001\u3002\u5728\u8bed\u97f3\u8f6c\u6587\u672c\uff08S2T\uff09\u4efb\u52a1\u4e2d\uff0c\u65b0\u5174\u7684\u89e3\u51b3\u65b9\u6848\u662f\u901a\u8fc7\u9002\u914d\u5668\u6a21\u5757\u5c06\u8bed\u97f3\u57fa\u7840\u6a21\u578b\uff08SFM\uff09\u7684\u8f93\u51fa\u6295\u5f71\u5230LLM\u5d4c\u5165\u7a7a\u95f4\u3002\u7136\u800c\uff0c\u76ee\u524d\u8fd8\u6ca1\u6709\u5de5\u4f5c\u63a2\u8ba8\u4e0b\u6e38\u4efb\u52a1\u6027\u80fd\u5728\u591a\u5927\u7a0b\u5ea6\u4e0a\u4f9d\u8d56\u4e8e\u6bcf\u4e2a\u7ec4\u4ef6\uff08SFM\u3001\u9002\u914d\u5668\u3001LLM\uff09\uff0c\u6216\u8005\u9009\u62e9\u9002\u914d\u5668\u7684\u6700\u4f73\u8bbe\u8ba1\u662f\u5426\u53d6\u51b3\u4e8e\u6240\u9009\u7684SFM\u548cLLM\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u6211\u4eec\u8bc4\u4f30\u4e865\u4e2a\u9002\u914d\u5668\u6a21\u5757\u30012\u4e2aLLM\uff08Mistral\u548cLlama\uff09\u4ee5\u53ca2\u4e2aSFM\uff08Whisper\u548cSeamlessM4T\uff09\u5728\u81ea\u52a8\u8bed\u97f3\u8bc6\u522b\u548c\u8bed\u97f3\u7ffb\u8bd1\u4e24\u4e2a\u5e7f\u6cdb\u4f7f\u7528\u7684S2T\u4efb\u52a1\u4e0a\u7684\u7ec4\u5408\u6548\u679c\u3002\u6211\u4eec\u7684\u7ed3\u679c\u8868\u660e\uff0cSFM\u5728\u4e0b\u6e38\u6027\u80fd\u4e2d\u626e\u6f14\u7740\u81f3\u5173\u91cd\u8981\u7684\u89d2\u8272\uff0c\u800c\u9002\u914d\u5668\u7684\u9009\u62e9\u5177\u6709\u9002\u5ea6\u7684\u5f71\u54cd\uff0c\u5e76\u4e14\u53d6\u51b3\u4e8e\u6240\u9009\u7684SFM\u548cLLM\u3002|\n", "2409.17027": "|**2024-09-25**|**Counterfactual Token Generation in Large Language Models**|Ivi Chatzi 
et.al.|[2409.17027](http://arxiv.org/abs/2409.17027)|**[link](https://github.com/networks-learning/counterfactual-llms)**|This work aims to extend large language models so that they can reason about possible alternatives to tokens they generated in the past. We develop a causal model of token generation based on the Gumbel-Max structural causal model to give large language models this capability. Our model performs counterfactual token generation at almost no additional cost over basic token generation, is simple to implement, and requires no fine-tuning or prompt engineering. We implement the model on Llama 3 8B-instruct and conduct qualitative and quantitative analyses of the counterfactual text it generates. We also explore an application of counterfactual token generation to bias detection, revealing interesting insights about the world model constructed by large language models.|\n", "2409.17011": "|**2024-09-25**|**LLM-CARD: Towards a Description and Landscape of Large Language Models**|Shengwei Tian
et.al.|[2409.17011](http://arxiv.org/abs/2409.17011)|**[link](https://github.com/shengwei-tian/dependency-parser-visualization)**|With the rapid development of natural language processing (NLP), large language models (LLMs) keep emerging across NLP tasks. As the number of published papers grows, researchers and developers face information overload. It is therefore particularly important to develop a system that automatically extracts and organizes key information about LLMs from academic papers. This work pursues that goal using named entity recognition (NER) and relation extraction (RE) methods, which automatically extract key information about large language models from papers and help researchers access information about LLMs efficiently. The extracted features include a model's 'licence', 'name', and 'application'. With these features, we can form a model card for each paper. As a data contribution, 106 academic papers were processed and three dictionaries were defined: LLM name, licence, and application. Dictionary lookup extracted 11,051 sentences, and manual review yielded a final selection of 129 sentences that link a name to a licence and 106 sentences that link a model name to an application.|\n",
"2409.18127": "|**2024-09-26**|**EgoLM: Multi-Modal Language Model of Egocentric Motions**|Fangzhou Hong et.al.|[2409.18127](http://arxiv.org/abs/2409.18127)|null|With the spread of wearable devices, understanding egocentric motion becomes essential for developing context-aware AI. This work presents EgoLM, a versatile framework that tracks and understands egocentric motions from multi-modal inputs such as egocentric videos and motion sensors. EgoLM exploits rich context to resolve the difficulties of egomotion tracking and understanding under single-modality conditions. To support this versatile, multi-modal framework, our key insight is to model the joint distribution of egocentric motions and natural language with a large language model (LLM). Multi-modal sensor inputs are encoded and projected into the joint latent space of the language model, then used to prompt motion generation or text generation for egomotion tracking or understanding, respectively. Extensive experiments on a large-scale multi-modal human motion dataset validate the effectiveness of EgoLM as a generalist model for universal egocentric learning.|\n",
"2409.18119": "|**2024-09-26**|**Multi-View and Multi-Scale Alignment for Contrastive Language-Image Pre-training in Mammography**|Yuexi Du et.al.|[2409.18119](http://arxiv.org/abs/2409.18119)|null|Contrastive Language-Image Pre-training (CLIP) shows great promise in medical image analysis but requires large amounts of data and compute. As a result, existing CLIP applications focus mainly on modalities such as chest X-rays that have abundant image-report data, leaving many other important modalities, such as mammography, underexplored. This paper is the first to propose applying the full CLIP model to mammography, a task challenged by scarce labeled data, small regions of interest in high-resolution images, and data imbalance. We first develop a specialized supervision framework for mammography that exploits its multi-view nature. We further design an alignment module to focus better on detailed features in high-resolution images. Finally, we introduce a parameter-efficient fine-tuning approach for large language models pre-trained with medical knowledge to address data limitations.
Our multi-view and multi-scale alignment (MaMA) method achieves significant gains over state-of-the-art baselines on three different tasks on two large real-world mammography datasets, EMBED and RSNA-Mammo, with only 52% of the model size of the largest baseline.|\n",
"2409.18111": "|**2024-09-26**|**E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding**|Ye Liu et.al.|[2409.18111](http://arxiv.org/abs/2409.18111)|**[link](https://github.com/PolyU-ChenLab/ETBench)**|**To verify the great potential of video large language models (Video-LLMs) for general-purpose video understanding, a series of benchmarks has been proposed to diagnose model capabilities in different scenarios. However, existing benchmarks evaluate models only through video-level question answering and lack fine-grained event-level assessment and task diversity. To fill this gap, we introduce E.T. Bench (Event-Level and Time-Sensitive Video Understanding Benchmark), a large-scale, high-quality benchmark for open-ended event-level video understanding. Organized under a three-level task taxonomy, E.T. Bench contains 7,300 samples across 12 tasks and 7,000 videos (251.4 hours in total) across 8 domains, providing comprehensive evaluation. We extensively evaluate 8 image-LLMs and 12 Video-LLMs, and the results show that state-of-the-art models for coarse-grained (video-level) understanding struggle with our fine-grained tasks, such as grounding events of interest within videos, largely because of short video context length, improper time representations, and a lack of multi-event training data. To address these issues, we further propose a strong baseline model, E.T. Chat, together with E.T. Instruct 164K, an instruction-tuning dataset tailored for fine-grained event-level understanding. Our simple but effective solution shows superior performance in multiple scenarios.**|\n",
"2409.18060": "|**2024-09-26**|**Infering Alt-text For UI Icons With Large Language Models During App Development**|Sabrina Haque et.al.|[2409.18060](http://arxiv.org/abs/2409.18060)|null|Ensuring accessibility in mobile applications remains a significant challenge, particularly for visually impaired users who rely on screen readers. User interface icons are essential for navigation and interaction but often lack meaningful alt-text, creating barriers to use. Traditional deep learning approaches to generating alt-text require large datasets and struggle with the diversity and imbalance of icon types. More recent vision language models (VLMs) require complete UI screens, which can be impractical during the iterative phases of app development. To address these issues, we introduce a novel method that uses large language models (LLMs) to autonomously generate descriptive alt-text for mobile UI icons from partial UI data. By incorporating icon context, including class, resource ID, bounds, OCR-detected text, and contextual information from parent and sibling nodes, we fine-tune an off-the-shelf LLM on a small dataset of about 1,400 icons, yielding IconDesc. In an empirical evaluation and a user study, IconDesc shows significant improvements in generating relevant alt-text. This ability makes IconDesc a valuable tool for developers, helping them iterate quickly and improve UI accessibility.|\n",
"2409.18053": "|**2024-09-26**|**DualAD: Dual-Layer Planning for Reasoning in Autonomous Driving**|Dingrui Wang et.al.|[2409.18053](http://arxiv.org/abs/2409.18053)|null|We present DualAD, a novel autonomous driving framework that imitates human reasoning during driving. DualAD comprises two layers: a rule-based motion planner at the bottom layer that handles routine driving tasks requiring little reasoning, and an upper layer with a rule-based text encoder that converts driving scenarios from absolute states into text descriptions. A large language model (LLM) then processes this text to make driving decisions. When potential danger is detected, the upper layer intervenes in the bottom layer's decisions, mimicking human reasoning in critical situations. Closed-loop experiments show that DualAD, using a zero-shot pre-trained model, significantly outperforms rule-based motion planners that lack reasoning abilities. Our experiments also highlight the effectiveness of the text encoder, which considerably improves the model's understanding of the scenario. In addition, the integrated DualAD model improves with stronger LLMs, indicating the framework's potential for further enhancement. We make the code and benchmarks publicly available.|\n",
"2409.18042": "|**2024-09-26**|**EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions**|Kai Chen et.al.|[2409.18042](http://arxiv.org/abs/2409.18042)|null|In the open-source community, enabling large language models to generate images, text, and speech end-to-end using publicly available data remains challenging. Existing vision-language models rely on external tools for speech processing, while speech-language models still lack visual understanding. To fill this gap, we propose EMOVA (EMotionally Omni-present Voice Assistant), which gives large language models end-to-end speech capabilities while maintaining leading vision-language performance. With a semantic-acoustic disentangled speech tokenizer, we find, surprisingly, that omni-modal alignment further enhances vision-language and speech abilities compared with the corresponding bi-modal aligned models. We also propose a lightweight style module for flexible control of speech style (for example, emotion and pitch). For the first time, EMOVA achieves state-of-the-art performance on both vision-language and speech benchmarks while supporting omni-modal dialogue with vivid emotions.|\n",
"2409.18028": "|**2024-09-26**|**Compositional Hardness of Code in Large Language Models -- A Probabilistic Perspective**|Yotam Wolf et.al.|[2409.18028](http://arxiv.org/abs/2409.18028)|null|A common practice when using large language models (LLMs) for complex analytical tasks such as code generation is to sample a solution to the entire task within the model's context window. Prior work has shown that decomposing the task within the model's context (chain of thought) helps with such tasks. This work points out a limitation on an LLM's ability to perform several sub-tasks within the same context window, a 'hardness of composition'. This suggests an advantage in distributing a decomposed problem across a multi-agent system of LLMs. We quantify the hardness of composition with a generation complexity metric, that is, the number of LLM generations required to sample at least one correct solution. We find a gap between the generation complexity of solving a compositional problem within a single context and that of distributing it among multiple agents, and the gap grows exponentially with solution length. We establish this result both theoretically and empirically.|\n",
"2409.18025": "|**2024-09-26**|**An Adversarial Perspective on Machine Unlearning for AI Safety**|Jakub \u0141ucki et.al.|[2409.18025](http://arxiv.org/abs/2409.18025)|**[link](https://github.com/ethz-spylab/unlearning-vs-safety)**|Large language models are fine-tuned to refuse questions about hazardous knowledge, but these protections can often be bypassed. Unlearning methods aim to completely remove hazardous capabilities from models and make them inaccessible to adversaries. This work challenges, from an adversarial perspective, the fundamental difference between unlearning and traditional safety post-training. We demonstrate that existing jailbreak methods, previously considered ineffective against unlearning, can succeed when applied carefully. Furthermore, we develop a variety of adaptive methods that recover most supposedly unlearned capabilities. For instance, we show that fine-tuning on unrelated examples or removing specific directions in the activation space can recover most hazardous capabilities of models edited with RMU, a state-of-the-art unlearning method. Our findings challenge the robustness of current unlearning approaches and question their advantages over safety training.|\n",
"2409.18023": "|**2024-09-26**|**DARE: Diverse Visual Question Answering with Robustness Evaluation**|Hannah Sterz et.al.|[2409.18023](http://arxiv.org/abs/2409.18023)|null|This paper introduces DARE (Diverse Visual Question Answering with Robustness Evaluation), a carefully designed and curated multiple-choice visual question answering benchmark. DARE evaluates model performance on vision-language reasoning across five diverse categories of visual questions and includes four robustness-oriented evaluations based on variations of the prompt, the subset of answer options, the output format, and the number of correct answers.
The results show that state-of-the-art vision language models still struggle in most categories and cannot sustain consistently high performance across the tested robustness evaluations. Across subsets of answer options, worst-case performance is up to 34% below performance in the standard case. Open-source models such as LLaVA 1.6 and Idefics cannot match the robustness of the closed-source models GPT-4 and Gemini, which themselves remain clearly brittle under the different variations. Overall, the study reveals the limitations of vision language models on visual reasoning tasks and highlights issues to consider when designing more robust models.|\n", "2409.18014": "|**2024-09-26**|**Role-RL: Online Long-Context Processing with Role Reinforcement Learning for Distinct LLMs in Their Optimal Roles**|Lewei He
et.al.|[2409.18014](http://arxiv.org/abs/2409.18014)|null|Long-context processing with large language models (LLMs) remains challenging because of implementation complexity, training efficiency, and data sparsity. To address this, we propose a new paradigm, Online Long-context Processing (OLP), for handling documents of unlimited length, which typically arise when receiving and organizing diverse streaming media such as automated news reporting, live e-commerce, and viral short videos. In addition, choosing among the many LLMs that offer strong performance, affordable prices, and short response delays is often a dilemma. We therefore develop Role Reinforcement Learning (Role-RL), a framework that automatically deploys different LLMs in their respective roles within the OLP pipeline according to their actual performance. Extensive experiments on our OLP-MINI dataset show that OLP with the Role-RL framework achieves an average recall of 93.2% on the OLP benchmark while saving 79.4% of LLM cost. The code and dataset are publicly available at https://anonymous.4open.science/r/Role-RL.|\n", "2409.18957": "|**2024-09-27**|**LML: Language Model Learning a Dataset for Data-Augmented Prediction**|Praneeth Vadlapati
et.al.|[2409.18957](http://arxiv.org/abs/2409.18957)|**[link](https://github.com/pro-genai/lml-dap)**|**This paper introduces a new approach to using large language models (LLMs) for classification tasks, which are typically handled by machine learning (ML) models. Unlike ML models that rely heavily on data cleaning and feature engineering, this method streamlines the process using LLMs. The paper proposes a concept called 'Language Model Learning (LML)', powered by a new method called 'Data-Augmented Prediction (DAP)'. The LLM performs classification in a way similar to how a human manually explores and understands the data, using the data as a reference when deciding on a label.
Training data is summarized and evaluated to determine the main features that lead to each label. During DAP, the system uses the data summary to automatically create a query that retrieves relevant rows from the dataset. The LLM then generates a classification from the data summary and the relevant rows, ensuring satisfactory accuracy even with complex data. Using the data summary and similar data in DAP ensures context-aware decision-making. The method includes the phrase 'Act as an Explainable Machine Learning Model' in the prompt to enhance the interpretability of predictions, allowing users to review the logic behind each prediction. In some test cases the system scored above 90% accuracy, demonstrating its effectiveness and its potential to outperform conventional ML models in various scenarios. The code is available at https://github.com/Pro-GenAI/LML-DAP.**|\n", "2409.18943": "|**2024-09-27**|**Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models**|Jiaming Li
et.al.|[2409.18943](http://arxiv.org/abs/2409.18943)|**[link](https://github.com/geaming2002/ruler)**|**The instruction-following ability of large language models lets humans interact with AI agents in a natural way. However, when asked to generate a response of a specific length, large language models often fail to meet the user's needs, mainly because of their inherent difficulty in accurately perceiving numerical constraints. To explore how well large language models control the length of generated responses when following length-specific instructions, we propose the Target Length Generation Task (TLG) and design two metrics, Precise Match (PM) and Flexible Match (FM), to evaluate how well a model adheres to a specified response length. We also introduce Ruler, a novel model-agnostic method that uses Meta Length Tokens (MLTs) to improve the instruction-following ability of large language models under length-constrained instructions. Specifically, Ruler enables LLMs to generate responses of a specified length when the instruction contains a length constraint. Moreover, when no length constraint is given explicitly, Ruler can automatically generate an appropriate MLT, showing strong versatility and generalization. Comprehensive experiments show that Ruler is effective across different LLMs on the Target Length Generation Task, with an average gain of 27.97 on PM and 29.57 on FM. Extensive ablation experiments further validate the effectiveness and generalization of Ruler. Our code and data are available at https://github.com/Geaming2002/Ruler.**|\n", "2409.18938": "|**2024-09-27**|**From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding**|Heqing Zou et.al.|[2409.18938](http://arxiv.org/abs/2409.18938)|null|This paper reviews recent progress in integrating large language models (LLMs) with visual encoders for visual understanding tasks, drawing on their inherent ability to understand and generate human-like text for visual reasoning. Because visual data is so varied, multimodal large language models (MM-LLMs) show different design and training characteristics and challenges when understanding images, short videos, and long videos. Our study focuses on the substantial differences and unique challenges that distinguish long video understanding from static image and short video understanding. Unlike static images, short videos contain the spatiotemporal information of sequential frames as well as temporal information within an event, while long videos contain the spatiotemporal information of multiple events together with long-term temporal dependencies between events. This paper traces and summarizes the development of MM-LLMs from image understanding to long video understanding, compares the differences among visual understanding tasks in detail, and highlights the challenges of long video understanding, such as finer-grained spatiotemporal detail, dynamic events, and long-term dependencies. It then gives a detailed overview of the development of MM-LLMs in model design and training methods, with particular attention to understanding long videos effectively. Finally, by comparing the performance of existing MM-LLMs on video understanding benchmarks of different lengths, the paper discusses possible future directions for multimodal large language models in long video understanding.|\n", "2409.18924": "|**2024-09-27**|**AIPatient: Simulating Patients with EHRs and LLM Powered Agentic Workflow**|Huizi Yu
et.al.|[2409.18924](http://arxiv.org/abs/2409.18924)|null|Simulated patient systems play a crucial role in modern medical education and research, providing a safe, integrated learning environment and enabling simulation of clinical decision-making. Large language models (LLMs) could further advance simulated patient systems by replicating medical conditions and doctor-patient interactions with high fidelity and at low cost. However, ensuring the effectiveness and trustworthiness of these systems remains a challenge, because they require a large, diverse, and precise patient knowledge base together with robust and stable knowledge dissemination. Against this background, we developed AIPatient, an advanced simulated patient system that takes the AIPatient Knowledge Graph (AIPatient KG) as input and uses an agentic workflow based on Reasoning Retrieval-Augmented Generation (Reasoning RAG) as its generation backbone. AIPatient KG draws data from electronic health records (EHRs) in the Medical Information Mart for Intensive Care (MIMIC-III) database, producing a cohort of 1,495 patients with high knowledge base validity (F1 score of 0.89) and high clinical diversity and relevance. Reasoning RAG uses six LLM-powered agents covering retrieval, KG query generation, abstraction, checking, rewriting, and summarization. This agentic framework reaches an overall accuracy of 94.15% on EHR-based medical question answering (QA), significantly outperforming baselines that use no agents or only partial agent integration. Our system also shows high readability (median Flesch Reading Ease 77.23; median Flesch-Kincaid Grade 5.6), robustness (ANOVA F-value 0.6126, p<0.1), and stability (ANOVA F-value 0.782, p<0.1). The strong performance of the AIPatient system points to its considerable potential in medical education, model evaluation, and system integration.|\n", "2409.18911": "|**2024-09-27**|**Soft Measures for Extracting Causal Collective Intelligence**|Maryam Berijanian
et.al.|[2409.18911](http://arxiv.org/abs/2409.18911)|**[link](https://github.com/kuldeep7688/soft-measures-causal-intelligence)**|**Understanding and modeling collective intelligence is essential for dealing with complex social systems. Fuzzy cognitive maps (FCMs), which are encoded as directed graphs, are a powerful tool for representing causal mental models, but extracting reliable FCMs directly from text is challenging. This study proposes a method that uses large language models (LLMs) to extract FCMs automatically. We introduce novel graph-based similarity measures and evaluate them by correlating their outputs with human judgments through the Elo rating system. The results show a positive correlation between these measures and human evaluations, although even the best-performing measure is limited in capturing the nuances of FCMs. Fine-tuning LLMs improves performance, but existing measures still fall short. The study highlights the need for soft similarity measures designed specifically for FCM extraction, advancing the use of NLP to model collective intelligence.**|\n", "2409.18892": "|**2024-09-27**|**IDGen: Item Discrimination Induced Prompt Generation for LLM Evaluation**|Fan Lin
et.al.|[2409.18892](http://arxiv.org/abs/2409.18892)|**[link](https://github.com/DUTlf/IDGen)**|As large language models (LLMs) become increasingly capable of handling complex tasks, evaluation sets must keep pace so that they remain sufficiently discriminative. Inspired by Item Discrimination (ID) theory, which is widely used in educational assessment, we propose an ID-based prompt synthesis framework for evaluating LLMs, so that evaluation sets can be continually updated and refined according to model capabilities. Our data synthesis framework gives equal weight to breadth and precision. It can generate prompts that both evaluate LLM capabilities comprehensively and reveal meaningful performance differences between models, allowing their relative strengths and weaknesses across tasks and domains to be distinguished effectively. To produce high-quality data, we incorporate a self-correction mechanism into our generalization framework and develop two models that predict the discrimination and difficulty scores of prompts, which drive our data synthesis framework. These tools are a valuable contribution to research on evaluation data synthesis. We apply the generated data to evaluate five state-of-the-art models. The data yields an average score of 51.92 with a variance of 10.06. By contrast, earlier work (such as SELF-INSTRUCT and WizardLM) yields average scores above 67 with a variance below 3.2. The results show that data generated by our framework is more challenging and more discriminative than that of previous work. We plan to release a dataset of more than 3,000 carefully designed prompts to support research on LLM evaluation.|\n", "2409.18858": "|**2024-09-27**|**Predicting and analyzing memorization within fine-tuned Large Language Models**|J\u00e9r\u00e9mie Dentan
et.al.|[2409.18858](http://arxiv.org/abs/2409.18858)|null|Large language models have attracted wide attention for their ability to solve complex tasks. However, these models memorize a significant proportion of their training data, which poses a serious threat at inference time. To mitigate this unintended memorization, it is crucial to understand which elements are memorized and why. Most existing work provides only after-the-fact explanations, which are of limited practical interest. To fill this gap, we propose a new method based on sliced mutual information that detects memorized samples in advance in classification settings. The method is effective from the early stages of training and adapts easily to practical scenarios. It is supported by new theoretical results, which we demonstrate experimentally, and it requires a low computational budget. We obtain strong empirical results, paving the way for systematically inspecting and protecting these vulnerable samples before memorization occurs.|\n", "2409.18857": "|**2024-09-27**|**Mitigating Selection Bias with Node Pruning and Auxiliary Options**|Hyeong Kyu Choi
et.al.|[2409.18857](http://arxiv.org/abs/2409.18857)|null|Large language models (LLMs) often show an unwarranted preference for certain options when answering multiple-choice questions, which raises significant reliability concerns in LLM-automated systems. Previous solutions have mainly addressed this bias by adjusting the model's inputs and/or outputs. Our work takes a different path and investigates how the bias arises inside the model. Specifically, we propose a novel debiasing method called Bias Node Pruning (BNP), which removes the linear layer parameters that contribute to the bias. We also introduce Auxiliary Option Injection (AOI), a simple yet effective input modification technique that can be used to debias black-box models. To provide a more systematic way of evaluating selection bias, we review existing metrics and propose Choice Kullback-Leibler Divergence (CKLD), which addresses the insensitivity of commonly used metrics to label imbalance. Experimental results show that our methods are robust and adaptable when applied to three different LLMs.|\n", "2409.18812": "|**2024-09-27**|**LLMs4Synthesis: Leveraging Large
Language Models for Scientific Synthesis**|Hamed Babaei Giglou et.al.|[2409.18812](http://arxiv.org/abs/2409.18812)|**[link](https://github.com/HamedBabaei/LLMs4Synthesis)**|In response to the growing complexity and volume of scientific literature, this paper presents the LLMs4Synthesis framework, which aims to improve the ability of large language models (LLMs) to generate high-quality scientific syntheses. The framework targets the need for fast, coherent, and context-rich integration of scientific insights, uses both open-source and proprietary LLMs, and addresses the shortcomings of current quantitative metrics for evaluating such syntheses. Our study contributes to the field by developing a new method for processing scientific papers, defining new synthesis types, and establishing nine detailed quality criteria for evaluation. We also propose combining LLMs with reinforcement learning and AI feedback to optimize synthesis quality and keep it aligned with the established criteria. The availability of the LLMs4Synthesis framework and its components is expected to improve both the generation and the evaluation of scientific research syntheses.|\n", "2409.18794": "|**2024-09-27**|**Open-Nav: Exploring Zero-Shot Vision-and-Language Navigation in Continuous Environment with Open-Source LLMs**|Yanyuan Qiao
et.al.|[2409.18794](http://arxiv.org/abs/2409.18794)|null|This paper presents Open-Nav, a study that explores the use of open-source large language models (LLMs) for zero-shot vision-and-language navigation (VLN) in continuous environments. Open-Nav uses a spatial-temporal chain-of-thought (CoT) reasoning approach that breaks the task into three parts: instruction comprehension, progress estimation, and decision-making. This improves the model's perception of navigation scenes and strengthens its grasp of fine-grained object and spatial knowledge. Experiments in both simulated and real-world environments show that Open-Nav achieves performance competitive with approaches that use closed-source LLMs.|\n", "2409.20566": "|**2024-09-30**|**MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning**|Haotian Zhang
et.al.|[2409.20566](http://arxiv.org/abs/2409.20566)|null|We present MM1.5, a new family of multimodal large language models designed to improve text-rich image understanding, visual referring and grounding, and multi-image reasoning. Building on the MM1 architecture, MM1.5 takes a data-centric approach to model training and systematically explores the effect of different data mixtures across the whole training lifecycle. This includes high-quality OCR data and synthetic captions for continual pre-training, and an optimized visual instruction-tuning data mixture for supervised fine-tuning. Our models range from 1B to 30B parameters, include both dense and mixture-of-experts (MoE) variants, and show that careful data curation and training strategies can deliver strong performance even at small scales (1B and 3B parameters). We also introduce two specialized variants: MM1.5-Video for video understanding and MM1.5-UI for mobile UI understanding. Through extensive empirical studies and ablations, we provide detailed insights into the training process and the decisions behind it, offering valuable guidance for future work on multimodal large language models.|\n", "2409.20557": "|**2024-09-30**|**Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos**|Md Mohaiminul Islam
et.al.|[2409.20557](http://arxiv.org/abs/2409.20557)|null|This paper presents VidAssist, an integrated framework for zero-shot or few-shot goal-oriented planning from instructional videos. VidAssist uses large language models (LLMs) as both a knowledge base and an assessment tool to generate and evaluate action plans, overcoming the difficulty of acquiring procedural knowledge from small, low-diversity datasets. VidAssist also uses a breadth-first search algorithm to generate optimal plans, with value functions designed for goal-oriented planning that assess the predicted action at each step. Extensive experiments show that VidAssist offers a unified framework for different goal-oriented planning setups, such as visual planning for assistance (VPA) and procedural planning (PP), and performs strongly in both zero-shot and few-shot settings. Specifically, when predicting 4 future actions on the COIN dataset, our few-shot model outperforms prior fully supervised methods by +7.7% on the VPA task and +4.81% on the PP task. All code and models are publicly available at https://sites.google.com/view/vidassist.|\n", "2409.20550": "|**2024-09-30**|**LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation**|Ziyao Zhang
et.al.|[2409.20550](http://arxiv.org/abs/2409.20550)|null|This paper presents an empirical study of hallucinations by large language models (LLMs) in code generation. Although LLMs perform encouragingly on code generation tasks, they often produce incorrect or inaccurate results when handling the complex contextual dependencies of real development. Previous studies have mainly analyzed hallucinations in LLM-based code generation for standalone functions, whereas this paper extends the scope to the more practical and complex setting of repository-level generation. First, by manually examining the code generated by six mainstream LLMs, the paper establishes a taxonomy of hallucinations in LLM-generated code. It then describes the hallucination phenomena in detail and analyzes how they are distributed across models. It further examines the causes of hallucinations and identifies four potential contributing factors. Finally, it proposes a mitigation method based on retrieval-augmented generation (RAG), which is consistently effective across all the LLMs studied. A replication package containing the code, data, and experimental results is provided for reference and verification by academia and industry. The study helps improve the reliability and accuracy of LLMs in code generation and is of significant value to software engineering.|\n", "2409.20548": "|**2024-09-30**|**Robi Butler: Remote Multimodal Interactions with Household Robot Assistant**|Anxing Xiao
et.al.|[2409.20548](http://arxiv.org/abs/2409.20548)|null|In this paper we introduce Robi Butler, a novel household robot system that supports multimodal interaction with remote users. Building on an advanced communication interface, Robi Butler lets users monitor the robot's status, send text or voice instructions, and select target objects by pointing. At the core of the system is a high-level behavior module, powered by large language models (LLMs), that interprets multimodal instructions and generates action plans. These plans are composed from an open-vocabulary set handled by vision-language models (VLMs) that support both text and click queries. Integrating these components allows Robi Butler to ground remote multimodal instructions in real-world home environments in a zero-shot manner. We demonstrate the system's effectiveness and efficiency on a variety of everyday household tasks in which remote users give multimodal instructions. We also conducted a user study that analyzes how multimodal interaction affects the efficiency and user experience of remote human-robot interaction, and we discuss possible improvements.|\n", "2409.20512": "|**2024-09-30**|**Uncertainty-Informed Screening for Safer Solvents Used in the Synthesis of Perovskite via Language Models**|Arpan Mukherjee
et.al.|[2409.20512](http://arxiv.org/abs/2409.20512)|null|This paper presents a framework for the challenge of accurately predicting the toxicity of solvents used in the industrial synthesis of perovskites, a task limited by the lack of targeted, structured toxicity data. The framework combines automated data extraction by language models with uncertainty-informed prediction models to fill data gaps and increase confidence in the predictions. First, we use two approaches to extract relevant data automatically from a corpus of scientific literature on solvents used in perovskite synthesis: smaller bidirectional language models (such as BERT and ELMo), chosen for their repeatable and deterministic outputs, and autoregressive large language models (LLMs) such as GPT-3.5, chosen for their large training corpus and better response generation. Our 'prompting and verification' technique, integrated with the LLM, targets extraction and refinement, which reduces LLM hallucination and improves the quality of the extracted data. Next, the extracted data is fed into a pre-trained multi-task binary classification deep learning model to predict the ED nature of the extracted solvents. We use Shannon entropy-based uncertainty quantification on the class probabilities from the classification model to quantify uncertainty and identify data gaps in the predictions. This yields a structured dataset of solvents used in perovskite synthesis together with an uncertainty-informed virtual toxicity assessment. In addition, we use chord diagrams to visualize solvent interactions and prioritize potentially hazardous solvents, finding that 70% of the solvent interactions are primarily associated with two specific perovskites.|\n", "2409.20502": "|**2024-09-30**|**COLLAGE: Collaborative Human-Agent Interaction Generation using Hierarchical Latent Diffusion and Language Models**|Divyanshu Daiya
et.al.|[2409.20502](http://arxiv.org/abs/2409.20502)|null|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aCOLLAGE\u7684\u65b0\u578b\u6846\u67b6\uff0c\u7528\u4e8e\u901a\u8fc7\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u548c\u5c42\u6b21\u5316\u7684\u8fd0\u52a8\u7279\u5f02\u6027\u5411\u91cf\u91cf\u5316\u53d8\u5206\u81ea\u7f16\u7801\u5668\uff08VQ-VAE\uff09\u6765\u751f\u6210\u534f\u4f5c\u5f0f\u4ee3\u7406-\u5bf9\u8c61-\u4ee3\u7406\u4ea4\u4e92\u3002\u6211\u4eec\u7684\u6a21\u578b\u89e3\u51b3\u4e86\u8fd9\u4e00\u9886\u57df\u6570\u636e\u7a00\u7f3a\u7684\u95ee\u9898\uff0c\u901a\u8fc7\u6574\u5408LLM\u7684\u77e5\u8bc6\u548c\u63a8\u7406\u80fd\u529b\u6765\u6307\u5bfc\u751f\u6210\u6027\u6269\u6563\u6a21\u578b\u3002\u5c42\u6b21\u5316\u7684VQ-VAE\u67b6\u6784\u5728\u591a\u4e2a\u62bd\u8c61\u7ea7\u522b\u6355\u83b7\u4e86\u4e0d\u540c\u7684\u8fd0\u52a8\u7279\u5f02\u6027\u7279\u5f81\uff0c\u907f\u514d\u4e86\u5197\u4f59\u6982\u5ff5\uff0c\u5e76\u5b9e\u73b0\u4e86\u9ad8\u6548\u7684\u591a\u5206\u8fa8\u7387\u8868\u793a\u3002\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u4e2a\u5728\u9690\u7a7a\u95f4\u4e2d\u64cd\u4f5c\u7684\u6269\u6563\u6a21\u578b\uff0c\u5e76\u7ed3\u5408\u4e86\u7531LLM\u751f\u6210\u7684\u8fd0\u52a8\u89c4\u5212\u63d0\u793a\u6765\u5f15\u5bfc\u53bb\u566a\u8fc7\u7a0b\uff0c\u4ece\u800c\u5b9e\u73b0\u4e86\u9488\u5bf9\u7279\u5b9a\u63d0\u793a\u7684\u8fd0\u52a8\u751f\u6210\uff0c\u5177\u6709\u66f4\u9ad8\u7684\u63a7\u5236\u6027\u548c\u591a\u6837\u6027\u3002\u5728CORE-4D\u548cInterHuman\u6570\u636e\u96c6\u4e0a\u7684\u5b9e\u9a8c\u7ed3\u679c\u8bc1\u660e\u4e86\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u751f\u6210\u771f\u5b9e\u4e14\u591a\u6837\u5316\u7684\u534f\u4f5c\u4eba\u7c7b-\u7269\u4f53-\u4eba\u7c7b\u4ea4\u4e92\u65b9\u9762\u7684\u6709\u6548\u6027\uff0c\u8d85\u8d8a\u4e86\u73b0\u6709\u6700\u4f73\u65b9\u6cd5\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u4e3a\u673a\u5668\u4eba\u5b66\u3001\u56fe\u5f62\u5b66\u548c\u8ba1\u7b97\u673a\u89c6\u89c9\u7b49\u9886\u57df\u5efa\u6a21\u590d\u6742\u4ea4\u4e92\u6
3d0\u4f9b\u4e86\u65b0\u7684\u53ef\u80fd\u6027\u3002|\n", "2409.20441": "|**2024-10-01**|**Instance-adaptive Zero-shot Chain-of-Thought Prompting**|Xiaosong Yuan et.al.|[2409.20441](http://arxiv.org/abs/2409.20441)|null|\u96f6\u5c04\u94fe\u601d\u8003\uff08CoT\uff09\u63d0\u793a\u7b56\u7565\u5728\u589e\u5f3a\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u89e3\u51b3\u73b0\u5b9e\u4e16\u754c\u63a8\u7406\u4efb\u52a1\u7684\u6027\u80fd\u65b9\u9762\u5c55\u73b0\u51fa\u7b80\u5355\u800c\u6709\u6548\u7684\u65b9\u6cd5\u3002\u7136\u800c\uff0c\u5355\u4e00\u4efb\u52a1\u7ea7\u63d0\u793a\u5728\u6574\u4e2a\u5b9e\u4f8b\u4e0a\u7684\u5e94\u7528\u5b58\u5728\u5c40\u9650\u6027\uff0c\u56e0\u4e3a\u4e00\u4e2a\u63d0\u793a\u65e0\u6cd5\u4e0e\u6240\u6709\u5b9e\u4f8b\u90fd\u6210\u4e3a\u6700\u4f73\u642d\u6863\u3002\u56e0\u6b64\uff0c\u66f4\u6070\u5f53\u7684\u505a\u6cd5\u662f\u7cbe\u5fc3\u8003\u8651\u63d0\u793a\u4e0e\u6bcf\u4e2a\u5b9e\u4f8b\u4e4b\u95f4\u7684\u4e92\u52a8\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u5b9e\u4f8b\u81ea\u9002\u5e94\u63d0\u793a\u7b97\u6cd5\u4f5c\u4e3a\u96f6\u5c04CoT\u63a8\u7406\u7684\u4e00\u79cd\u66ff\u4ee3\u7b56\u7565\uff0c\u65e8\u5728\u901a\u8fc7\u9002\u5f53\u5730\u533a\u5206\u51fa\u597d\u7684\u548c\u574f\u7684\u63d0\u793a\u6765\u63d0\u5347\u6027\u80fd\u3002 
\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u9996\u5148\u901a\u8fc7\u4fe1\u606f\u6d41\u7684\u89d2\u5ea6\u5bf9LLM\u8fdb\u884c\u5206\u6790\uff0c\u4ee5\u63ed\u793a\u96f6\u5c04CoT\u63a8\u7406\u673a\u5236\uff0c\u53d1\u73b0\u4fe1\u606f\u4ece\u95ee\u9898\u5230\u63d0\u793a\u4ee5\u53ca\u95ee\u9898\u5230\u63a8\u7406\u7684\u53cc\u5411\u6d41\u52a8\u5bf9\u63a8\u7406\u7ed3\u679c\u5f71\u54cd\u6700\u5927\u3002\u6211\u4eec\u6ce8\u610f\u5230\uff0c\u66f4\u4f18\u79c0\u7684\u96f6\u5c04CoT\u63a8\u7406\u9700\u8981\u63d0\u793a\u4ece\u95ee\u9898\u4e2d\u83b7\u53d6\u8bed\u4e49\u4fe1\u606f\uff0c\u7136\u540e\u63a8\u7406\u4ece\u95ee\u9898\u76f4\u63a5\u6216\u901a\u8fc7\u63d0\u793a\u95f4\u63a5\u5730\u805a\u5408\u8db3\u591f\u4fe1\u606f\u3002\u76f8\u53cd\uff0c\u7f3a\u5931\u8fd9\u4e9b\u4efb\u4f55\u4e00\u9879\u53ef\u80fd\u90fd\u4f1a\u5bfc\u81f4\u4e00\u4e2a\u4e0d\u7406\u60f3\u7684\u63d0\u793a\u3002\u57fa\u4e8e\u6b64\u53d1\u73b0\uff0c\u6211\u4eec\u8fdb\u4e00\u6b65\u63d0\u51fa\u4e86\u4e00\u4e2a\u9002\u7528\u4e8e\u96f6\u5c04CoT\u63a8\u7406\u7684\u5b9e\u4f8b\u81ea\u9002\u5e94\u63d0\u793a\u7b56\u7565\uff08IAP\uff09\u3002 \u5728LLaMA-2\u3001LLaMA-3\u548cQwen\u4e0a\u5bf9\u6570\u5b66\u3001\u903b\u8f91\u548c\u5e38\u8bc6\u63a8\u7406\u4efb\u52a1\uff08\u5982GSM8K\u3001MMLU\u3001\u56e0\u679c\u5224\u65ad\uff09\u8fdb\u884c\u7684\u5b9e\u9a8c\u8868\u660e\uff0c\u5b9e\u4f8b\u81ea\u9002\u5e94\u96f6\u5c04CoT\u63d0\u793a\u7b56\u7565\u5728\u67d0\u4e9b\u5b9a\u5236\u63d0\u793a\u6216\u590d\u6742\u7a0b\u5e8f\u7684\u57fa\u7840\u4e0a\u8868\u73b0\u51fa\u66f4\u597d\u7684\u6027\u80fd\uff0c\u8fd9\u8bc1\u660e\u4e86\u6211\u4eec\u5728\u96f6\u5c04CoT\u63a8\u7406\u673a\u5236\u7814\u7a76\u4e2d\u7684\u53d1\u73b0\u5177\u6709\u91cd\u8981\u610f\u4e49\u3002|\n", "2409.20385": "|**2024-09-30**|**Wait, but Tylenol is Acetaminophen... 
Investigating and Improving Language Models' Ability to Resist Requests for Misinformation**|Shan Chen et.al.|[2409.20385](http://arxiv.org/abs/2409.20385)|null|\u80cc\u666f\uff1a\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u88ab\u8bad\u7ec3\u6210\u9075\u5faa\u6307\u4ee4\uff0c\u4f46\u8fd9\u79cd\u8bbe\u8ba1\u4f7f\u5176\u5bb9\u6613\u5728\u751f\u6210\u9519\u8bef\u4fe1\u606f\u65f6\u76f2\u76ee\u9075\u4ece\u7528\u6237\u8bf7\u6c42\u3002\u5728\u533b\u5b66\u9886\u57df\uff0c\u8fd9\u53ef\u80fd\u4f1a\u52a0\u901f\u9519\u8bef\u4fe1\u606f\u7684\u4f20\u64ad\uff0c\u4ece\u800c\u5f71\u54cd\u4eba\u7c7b\u5065\u5eb7\u3002\u7814\u7a76\u76ee\u6807/\u65b9\u6cd5\uff1a\u6211\u4eec\u5206\u6790\u4e86\u6a21\u578b\u5728\u77e5\u9053\u8bf7\u6c42\u4e0d\u5408\u7406\u7684\u60c5\u51b5\u4e0b\uff0c\u751f\u6210\u4e0e\u836f\u7269\u6709\u5173\u8bef\u5bfc\u6027\u5185\u5bb9\u7684\u503e\u5411\u3002\u6211\u4eec\u63a2\u8ba8\u4e86\u901a\u8fc7\u4e0a\u4e0b\u6587\u63d0\u793a\u548c\u8c03\u6574\u53c2\u6570\uff0c\u4f7fLLMs\u4f18\u5148\u8003\u8651\u903b\u8f91\u63a8\u7406\u800c\u975e\u9075\u4ece\u6027\uff0c\u4ee5\u964d\u4f4e\u533b\u7597\u4fe1\u606f\u8bef\u5bfc\u98ce\u9669\u7684\u53ef\u80fd\u6027\u3002 \u7ed3\u679c\uff1a\u6240\u6709\u524d\u6cbfLLMs\u90fd\u9075\u5b88\u4e86\u751f\u6210\u8bef\u5bfc\u6027\u5185\u5bb9\u7684\u4e0d\u5408\u7406\u8bf7\u6c42\u3002\u7136\u800c\uff0c\u57fa\u4e8e\u63d0\u793a\u7684\u65b9\u6cd5\u548c\u53c2\u6570\u8c03\u6574\u7b56\u7565\u53ef\u4ee5\u63d0\u5347\u68c0\u6d4b\u8bf7\u6c42\u903b\u8f91\u9519\u8bef\u7684\u80fd\u529b\uff0c\u5e76\u9632\u6b62\u533b\u7597\u4fe1\u606f\u7684\u8bef\u4f20\u3002 \u7ed3\u8bba\uff1a\u5c06LLMs\u7684\u8bbe\u8ba1\u91cd\u5fc3\u4ece\u9075\u4ece\u6027\u8f6c\u5411\u903b\u8f91\u63a8\u7406\uff0c\u6709\u52a9\u4e8e\u964d\u4f4e\u5176\u88ab\u5229\u7528\u4e8e\u4f20\u64ad\u533b\u7597\u4fe1\u606f\u8bef\u5bfc\u7684\u98ce\u9669\u3002|\n", "2409.20370": "|**2024-09-30**|**The Perfect Blend: Redefining RLHF with Mixture of Judges**|Tengyu Xu 
et.al.|[2409.20370](http://arxiv.org/abs/2409.20370)|null|\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u65b0\u7684\u540e\u8bad\u7ec3\u8303\u5f0f\uff0c\u79f0\u4e3a\u7ea6\u675f\u751f\u6210\u7b56\u7565\u4f18\u5316\uff08CGPO\uff09\u3002CGPO\u7684\u6838\u5fc3\u662f\u201c\u88c1\u5224\u6df7\u5408\u201d\uff08MoJ\uff09\uff0c\u5b83\u4ee5\u6210\u672c\u6548\u76ca\u7684\u65b9\u5f0f\u5bf9\u7b56\u7565\u8fdb\u884c\u5206\u5c42\u7ea6\u675f\u4f18\u5316\uff0c\u4ece\u800c\u5728\u539f\u7406\u4e0a\u8bc6\u522bRLHF\u4e2d\u7684\u5b8c\u7f8e\u878d\u5408\u3002\u6b64\u65b9\u6cd5\u5728\u7406\u8bba\u4e0a\u6709\u4fdd\u8bc1\uff0c\u4e0d\u9700\u8981\u5927\u91cf\u7684\u8d85\u53c2\u6570\u8c03\u6574\uff0c\u5e76\u4e14\u53ef\u4ee5\u5728\u5e38\u89c1\u7684\u540e\u8bad\u7ec3\u7ba1\u9053\u4e2d\u65e0\u7f1d\u96c6\u6210\u3002\u8fd9\u6709\u52a9\u4e8e\u68c0\u6d4b\u548c\u7f13\u89e3\u5956\u52b1\u4f5c\u5f0a\u884c\u4e3a\uff0c\u5e76\u5728\u5927\u91cf\u76ee\u6807\u7684\u573a\u666f\u4e0b\u8fbe\u5230\u5e15\u7d2f\u6258\u6700\u4f18\u70b9\u3002 \u6211\u4eec\u7684\u5b9e\u9a8c\u8bc4\u4f30\u8868\u660e\uff0cCGPO\u5728\u5404\u79cd\u4efb\u52a1\u4e0a\u663e\u8457\u4f18\u4e8e\u6807\u51c6\u7684RLHF\u7b97\u6cd5\uff0c\u5982PPO\u548cDPO\uff0c\u5305\u62ec\u901a\u7528\u804a\u5929\u3001STEM\u95ee\u9898\u3001\u6307\u4ee4\u9075\u5faa\u548c\u7f16\u7a0b\u7b49\u3002\u5177\u4f53\u800c\u8a00\uff0cCGPO\u5728AlpacaEval-2\uff08\u901a\u7528\u804a\u5929\uff09\u4e0a\u63d0\u9ad8\u4e867.4%\uff0c\u5728Arena-Hard\uff08STEM\u4e0e\u63a8\u7406\uff09\u4e0a\u63d0\u9ad8\u4e8612.5%\uff0c\u5e76\u5728\u6570\u5b66\u548c\u5176\u4ed6\u9886\u57df\u5982\u7f16\u7a0b\u7b49\u4efb\u52a1\u4e0a\u4fdd\u6301\u4e00\u81f4\u7684\u6539\u8fdb\u3002\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u867d\u7136PPO\u7ecf\u5e38\u88ab\u4f7f\u7528\uff0c\u4f46\u5728\u6d41\u884c\u7684\u7f16\u7a0b\u57fa\u51c6\u6d4b\u8bd5\u4e2d\uff0c\u5b83\u5bb9\u6613\u906d\u53d7\u4e25\u91cd\u7684\u5956\u52b1\u4f5c\u5f0a\uff0c\u800cCGPO\u6210\u529f\u5730\u89e3\u51b3\u4e86\u8fd9\u4e2a\u95ee\u9898\u3002 
\u8fd9\u4e00\u7a81\u7834\u5728RLHF\u9886\u57df\u4e0d\u4ec5\u89e3\u51b3\u4e86\u5956\u52b1\u4f5c\u5f0a\u548c\u6781\u7aef\u591a\u76ee\u6807\u4f18\u5316\u7684\u6311\u6218\uff0c\u800c\u4e14\u63a8\u8fdb\u4e86\u901a\u7528\u8bed\u8a00\u6a21\u578b\u5728\u591a\u79cd\u5e94\u7528\u4e2d\u7684\u5bf9\u9f50\u6280\u672f\u3002|\n", "2409.20365": "|**2024-09-30**|**VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs**|Ruotong Liao et.al.|[2409.20365](http://arxiv.org/abs/2409.20365)|null|\u5728\u89c6\u9891\u8bed\u8a00\u9886\u57df\uff0c\u5229\u7528\u96f6\u6837\u672c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u63a8\u7406\u8fdb\u884c\u89c6\u9891\u7406\u89e3\u7684\u6700\u65b0\u5de5\u4f5c\u5df2\u6210\u4e3a\u6311\u6218\u4f20\u7edf\u7aef\u5230\u7aef\u6a21\u578b\u7684\u6709\u529b\u7ade\u4e89\u8005\u3002\u7136\u800c\uff0c\u957f\u89c6\u9891\u7684\u7406\u89e3\u9762\u4e34\u7740\u72ec\u7279\u7684\u6311\u6218\uff0c\u5c24\u5176\u662f\u5728\u5904\u7406\u6301\u7eed\u65f6\u95f4\u8f83\u957f\u7684\u65f6\u95f4\u8de8\u5ea6\u65f6\uff0c\u5373\u4f7f\u662f\u96f6\u6837\u672cLLM\u65b9\u6cd5\u4e5f\u662f\u5982\u6b64\u3002\u957f\u89c6\u9891\u4e2d\u7684\u4fe1\u606f\u5197\u4f59\u95ee\u9898\u4fc3\u4f7f\u6211\u4eec\u601d\u8003\u54ea\u4e9b\u4fe1\u606f\u5bf9\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u81f3\u5173\u91cd\u8981\uff0c\u4ee5\u53ca\u5982\u4f55\u5229\u7528\u5b83\u4eec\u8fdb\u884c\u590d\u6742\u7684\u7a7a\u95f4-\u65f6\u95f4\u63a8\u7406\uff0c\u4ee5\u5b9e\u73b0\u5bf9\u957f\u89c6\u9891\u5206\u6790\u7684\u7406\u89e3\u3002 \u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aVideoINSTA\uff08INformative Spatial-TemporAl 
Reasoning\uff09\u7684\u6846\u67b6\uff0c\u7528\u4e8e\u96f6\u6837\u672c\u957f\u89c6\u9891\u7406\u89e3\u3002VideoINSTA\u7684\u4e3b\u8981\u8d21\u732e\u5305\u62ec\uff1a\uff081\uff09\u5229\u7528LLM\u8fdb\u884c\u957f\u89c6\u9891\u7406\u89e3\u7684\u96f6\u6837\u672c\u6846\u67b6\uff1b\uff082\uff09\u4e8b\u4ef6\u9a71\u52a8\u7684\u65f6\u95f4\u63a8\u7406\u548c\u57fa\u4e8e\u5185\u5bb9\u7684\u7a7a\u95f4\u63a8\u7406\u65b9\u6cd5\uff0c\u4f7fLLM\u80fd\u591f\u5bf9\u89c6\u9891\u4e2d\u7684\u7a7a\u95f4-\u65f6\u95f4\u4fe1\u606f\u8fdb\u884c\u63a8\u7406\uff1b\uff083\uff09\u4e00\u79cd\u81ea\u6211\u53cd\u601d\u7684\u4fe1\u606f\u63a8\u7406\u65b9\u6848\uff0c\u901a\u8fc7\u4fe1\u606f\u5145\u5206\u6027\u548c\u9884\u6d4b\u7f6e\u4fe1\u5ea6\u7684\u5e73\u8861\u6765\u8c03\u6574\u65f6\u95f4\u56e0\u7d20\u3002 \u6211\u4eec\u7684\u6a21\u578b\u5728\u4e09\u4e2a\u957f\u89c6\u9891\u95ee\u7b54\u57fa\u51c6\u6d4b\u8bd5\u4e0a\u663e\u8457\u63d0\u9ad8\u4e86\u73b0\u6709\u6700\u4f73\u6027\u80fd\uff1aEgoSchema\u3001NextQA\u548cIntentQA\uff0c\u4ee5\u53ca\u5f00\u653e\u95ee\u7b54\u6570\u636e\u96c6ActivityNetQA\u3002\u4ee3\u7801\u5df2\u5728\u6b64\u5904\u53d1\u5e03\uff1ahttps://github.com/mayhugotong/VideoINSTA\u3002|\n", "2410.01805": "|**2024-10-02**|**Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads**|Yuxiang Huang 
et.al.|[2410.01805](http://arxiv.org/abs/2410.01805)|**[link](https://github.com/huangyuxiang03/Locret)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u652f\u6301\u957f\u671f\u4e0a\u4e0b\u6587\u7406\u89e3\u548c\u5904\u7406\u4efb\u52a1\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\u3002\u7136\u800c\uff0c\u5c06LLMs\u7684\u751f\u6210\u63a8\u7406\u6269\u5c55\u5230\u5982\u6b64\u957f\u7684\u4e0a\u4e0b\u6587\u4f1a\u589e\u52a0\u5927\u91cf\u7684\u8ba1\u7b97\u8d1f\u8f7d\uff0c\u5e76\u8981\u6c42\u5728\u7ef4\u6301\u57fa\u4e8e\u8f6c\u6362\u5668\u7684LLMs\u7684\u5173\u952e\u503c\u5bf9\uff08KV\uff09\u7f13\u5b58\u65f6\u4f7f\u7528\u5927\u91cfGPU\u5185\u5b58\u3002\u73b0\u6709\u7684KV\u7f13\u5b58\u538b\u7f29\u65b9\u6cd5\uff0c\u5982\u91cf\u5316\uff0c\u968f\u7740\u4e0a\u4e0b\u6587\u957f\u5ea6\u7684\u589e\u52a0\u800c\u9047\u5230\u5185\u5b58\u74f6\u9888\uff1b\u800c\u56fa\u5b9a\u5927\u5c0f\u7684\u7f13\u5b58\uff0c\u5982\u6dd8\u6c70\u7b56\u7565\uff0c\u5219\u7531\u4e8e\u4e0d\u9ad8\u6548\u7684\u7b56\u7565\u800c\u5bfc\u81f4\u6548\u7387\u4f4e\u4e0b\u3002\u8fd9\u4e9b\u9650\u5236\u9650\u5236\u4e86\u5728\u5355\u4e2aNvidia 4090 GPU\u7b49\u6d88\u8d39\u8005\u7ea7\u8bbe\u5907\u4e0a\u7684\u90e8\u7f72\u3002 
\u4e3a\u4e86\u514b\u670d\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86Locret\u6846\u67b6\uff0c\u8fd9\u662f\u4e00\u79cd\u7528\u4e8e\u957f\u4e0a\u4e0b\u6587LLM\u63a8\u7406\u7684\u65b9\u6cd5\uff0c\u901a\u8fc7\u5f15\u5165\u4fdd\u7559\u5934\u90e8\u6765\u8bc4\u4f30KV\u7f13\u5b58\u5355\u5143\u7684\u56e0\u679c\u91cd\u8981\u6027\uff0c\u4ece\u800c\u5141\u8bb8\u5728\u56fa\u5b9a\u7f13\u5b58\u5927\u5c0f\u5185\u8fdb\u884c\u66f4\u51c6\u786e\u7684\u6dd8\u6c70\u3002Locret\u5728\u51bb\u7ed3\u7684\u4e3b\u5e72LLM\u57fa\u7840\u4e0a\u8fdb\u884c\u4e86\u5fae\u8c03\uff0c\u4f7f\u7528\u6807\u51c6\u957f\u65f6\u95f4\u4e0a\u4e0b\u6587SFT\u6570\u636e\u96c6\u7684\u5c11\u91cf\u6570\u636e\u3002\u5728\u63a8\u7406\u8fc7\u7a0b\u4e2d\uff0c\u6211\u4eec\u4ee5\u5206\u5757\u9884\u586b\u5145\u6a21\u5f0f\u6dd8\u6c70\u4f4e\u91cd\u8981\u6027\u7684\u7f13\u5b58\u5355\u5143\uff0c\u663e\u8457\u51cf\u5c11\u4e86\u5cf0\u503cGPU\u5185\u5b58\u4f7f\u7528\u91cf\u3002 \u6211\u4eec\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u5b9e\u8bc1\u7814\u7a76\u6765\u8bc4\u4f30Locret\uff0c\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u4e0e\u6700\u8fd1\u7684\u7ade\u4e89\u65b9\u6cd5\uff08\u5305\u62ecInfLLM\u3001\u91cf\u5316\u3001SirLLM\u548cMInference\uff09\u76f8\u6bd4\uff0cLocret\u5728\u5185\u5b58\u6548\u7387\u548c\u751f\u6210\u5185\u5bb9\u8d28\u91cf\u65b9\u9762\u5747\u8868\u73b0\u51fa\u8272\u2014\u2014Locret\u5b9e\u73b0\u4e86\u4e0ePhi-3-mini-128K\u548cLlama-3.1-8B-instruct\u5168KV\u7f13\u5b58\u76f8\u6bd4\u8d85\u8fc720\u500d\u548c8\u500d\u7684KV\u7f13\u5b58\u538b\u7f29\u6bd4\u7387\u3002\u6b64\u5916\uff0cLocret\u8fd8\u53ef\u4ee5\u4e0e\u5176\u4ed6\u65b9\u6cd5\uff08\u5982\u91cf\u5316\u548c\u4ee4\u724c\u5408\u5e76\uff09\u7ed3\u5408\u4f7f\u7528\u3002\u636e\u6211\u4eec\u6240\u77e5\uff0cLocret\u662f\u7b2c\u4e00\u4e2a\u80fd\u591f\u5c06Llama-3.1-8B\u6216\u7c7b\u4f3c\u6a21\u578b\u90e8\u7f72\u5230\u5355\u4e2aNvidia 4090 
GPU\u4e0a\uff0c\u540c\u65f6\u5728\u4e0d\u727a\u7272\u751f\u6210\u8d28\u91cf\u7684\u60c5\u51b5\u4e0b\u5b9e\u73b0128K\u957f\u4e0a\u4e0b\u6587\u63a8\u7406\u7684\u6846\u67b6\uff0c\u4e14\u4ec5\u9700\u8981\u5c11\u91cf\u989d\u5916\u7684\u7cfb\u7edf\u4f18\u5316\u3002**|\n", "2410.01799": "|**2024-10-02**|**Efficient $1$-bit tensor approximations**|Alex W. Neal Riasanovsky et.al.|[2410.01799](http://arxiv.org/abs/2410.01799)|null|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u7a7a\u95f4\u6548\u7387\u9ad8\u7684\u77e9\u9635\u548c\u4efb\u610f\u9636\u5f20\u91cf\u5206\u89e3\u65b9\u6cd5\uff0c\u4f5c\u4e3a\u7ebf\u6027\u7ec4\u5408\u7684\u5f20\u91cf\u79ef\u5f62\u5f0f\uff0c\u5176\u4e2d\u5411\u91cf\u503c\u4e3a$\\{-1, 1\\}$\u3002\u5bf9\u4e8e\u4efb\u4e00\u77e9\u9635$A \\in \\mathbb{R}^{m \\times n}$\uff0c\u5176\u8868\u8fbe\u5f0f\u4e3a\uff1a$$A - R_w = S_w C_w T_w^\\top = \\sum_{j=1}^w c_j \\cdot \\mathbf{s}_j \\mathbf{t}_j^\\top$$ \u8fd9\u662f\u4e00\u4e2a\u5173\u4e8e$A$\u7684\u201c\u5bbd\u5ea6\u4e3a$w$\u7684\u7b26\u53f7\u5207\u5206\u89e3\u201d\u3002\u8fd9\u91cc$C_w = \"diag\"(\\mathbf{c}_w)$\uff0c\u4e14$S_w, T_w$\u548c\u5411\u91cf$\\mathbf{s}_j, \\mathbf{t}_j$\u5747\u4e3a$\\{-1, 1\\}$\u503c\u3002\u7528\u4e8e\u5b58\u50a8$(S_w, T_w, C_w)$\u6240\u9700\u7684\u7a7a\u95f4\u662f$w \\cdot (m + n)$\u4f4d\uff0c\u5e76\u4ec5\u9700$w$\u4e2a\u6d6e\u70b9\u6570\u3002\u5f53\u5e94\u7528\u4e8e\u5177\u6709i.i.d. 
$\\mathcal N (0, 1)$\u5206\u5e03\u5143\u7d20\u7684#f32\u77e9\u9635\u65f6\uff0c$\\lVert R_w\\rVert_F$\u5448\u73b0\u51fa\u6307\u6570\u8870\u51cf\u3002\u9009\u62e9\u5408\u9002\u7684$w$\uff0c\u4f7f$(S_w, T_w, C_w)$\u7684\u5185\u5b58\u5360\u7528\u4e0e\\textit{f16}\u6216\\textit{bf16}\u77e9\u9635\u76f8\u540c\uff0c\u76f8\u5bf9\u8bef\u5dee\u76f8\u5f53\u3002\u6211\u4eec\u7684\u7b97\u6cd5\u572820\u884c\u4f2a\u4ee3\u7801\u4e2d\u5b9e\u73b0\u4e86\u9ad8\u6548\u7684\u7b26\u53f7\u5207\u5206\u89e3\u3002\u5b83\u6e90\u81ea1999\u5e74Frieze\u548cKannan\u7684\u4e00\u7bc7\u8457\u540d\u8bba\u6587\u7684\u7b80\u5355\u4fee\u6539\u3002 \u4f5c\u4e3a\u7b2c\u4e00\u4e2a\u5e94\u7528\uff0c\u6211\u4eec\u5bf9\u5f00\u653e\u6e90\u7801\u5927\u578b\u8bed\u8a00\u6a21\u578b\\textit{Mistral-7B-v0.1}\u4e2d\u7684\u6743\u91cd\u77e9\u9635\u8fdb\u884c\u4e86$50\\%$\u7684\u7a7a\u95f4\u538b\u7f29\u3002\u4ee4\u4eba\u60ca\u8bb6\u7684\u662f\uff0c\u6240\u6709$226$\u4e2a\u4f59\u77e9\u9635\u7684\u76f8\u5bf9\u8bef\u5dee\u5747\u5c0f\u4e8e$6\\%$\uff0c\u4e14\u6269\u5c55\u6a21\u578b\u5728huggingface\u6392\u884c\u699c\u4e0a\u4e0e\\textit{Mistral-7B-v0.1}\u6a21\u578b\u8868\u73b0\u76f8\u8fd1\u3002\u968f\u7740\u7a7a\u95f4\u538b\u7f29\u7387\u4ece$50\\%$\u964d\u4f4e\u81f3$25\\%$\uff0c\u57fa\u51c6\u6027\u80fd\u7f13\u6162\u4e0b\u964d\u3002\u6211\u4eec\u4f18\u5316\u4e86\u5f00\u6e90\u7684\\textit{rust}\u5b9e\u73b0\uff0c\u4f7f\u7528\u4e86\\textit{avx2}\u548c\\textit{avx512}\u67b6\u6784\u4e0b\u7684\\textit{simd}\u6307\u4ee4\u8fdb\u884c\u52a0\u901f\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5c06\u8be5\u7b97\u6cd5\u6269\u5c55\u5230\u4e86\u4efb\u610f\u9636\u5f20\u91cf\uff0c\u5e76\u5229\u7528\u5b83\u538b\u7f29\u4e86\u4e00\u5f20\u4f5c\u8005\u732bAngus\u7684\u7167\u7247\u3002|\n", "2410.01795": 
"|**2024-10-02**|**Knowledge-Driven Feature Selection and Engineering for Genotype Data with Large Language Models**|Joseph Lee et.al.|[2410.01795](http://arxiv.org/abs/2410.01795)|**[link](https://github.com/pennshenlab/freeform)**|**\u57fa\u4e8e\u590d\u6742\u9057\u4f20\u57fa\u7840\u9884\u6d4b\u8868\u578b\uff0c\u5229\u7528\u5c0f\u800c\u53ef\u89e3\u91ca\u7684\u53d8\u5f02\u7279\u5f81\u4ecd\u7136\u662f\u4e00\u9879\u5177\u6709\u6311\u6218\u6027\u7684\u4efb\u52a1\u3002\u4f20\u7edf\u4e0a\uff0c\u4f7f\u7528\u6570\u636e\u9a71\u52a8\u7684\u65b9\u6cd5\u8fdb\u884c\u6b64\u4efb\u52a1\uff0c\u4f46\u57fa\u56e0\u578b\u6570\u636e\u7684\u9ad8\u7ef4\u7279\u6027\u4f7f\u5f97\u5206\u6790\u548c\u9884\u6d4b\u53d8\u5f97\u56f0\u96be\u3002\u53d7\u5230\u9884\u8bad\u7ec3\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4e2d\u7f16\u7801\u7684\u4e30\u5bcc\u77e5\u8bc6\u53ca\u5176\u5728\u5904\u7406\u590d\u6742\u751f\u7269\u533b\u5b66\u6982\u5ff5\u4e0a\u7684\u6210\u529f\u542f\u53d1\uff0c\u6211\u4eec\u65e8\u5728\u63a2\u7d22LLM\u5728\u8868\u683c\u57fa\u56e0\u578b\u6570\u636e\u7279\u5f81\u9009\u62e9\u4e0e\u5de5\u7a0b\u65b9\u9762\u7684\u80fd\u529b\uff0c\u5e76\u5f15\u5165\u4e00\u79cd\u57fa\u4e8e\u77e5\u8bc6\u7684\u6846\u67b6\u3002\u6211\u4eec\u5f00\u53d1\u4e86FREEFORM\uff0c\u4e00\u79cd\u81ea\u7531\u6d41\u52a8\u63a8\u7406\u4e0e\u96c6\u6210\u589e\u5f3a\u7279\u5f81\u8f93\u51fa\u548c\u7a33\u5065\u5efa\u6a21\u7684\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u7ed3\u5408\u4e86\u94fe\u5f0f\u601d\u8003\u4e0e\u96c6\u6210\u539f\u5219\uff0c\u5229\u7528LLM\u7684\u5185\u5728\u77e5\u8bc6\u6765\u9009\u62e9\u548c\u5de5\u7a0b\u7279\u5f81\u3002\u5728\u4e24\u4e2a\u4e0d\u540c\u7684\u4eba\u7c7b\u57fa\u56e0\u578b-\u8868\u578b\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u8bc4\u4f30\uff0c\u5305\u62ec\u9057\u4f20\u8840\u7edf\u548c\u9057\u4f20\u6027\u542c\u529b\u635f\u5931\uff0c\u6211\u4eec\u53d1\u73b0\u8fd9\u4e2a\u6846\u67b6\u5728\u4f4e\u6837\u672c\u91cf\u60c5\u51b5\u4e0b\u4f18\u4e8e\u51e0\u79cd\u6570\u636e\u9a71\u52a8\u65b9\u6cd5\u3002FREEFOR
M\u4f5c\u4e3a\u4e00\u4e2a\u5f00\u6e90\u6846\u67b6\uff0c\u53ef\u4ee5\u5728GitHub\u4e0a\u83b7\u53d6\uff1ahttps://github.com/PennShenLab/FREEFORM\u3002**|\n", "2410.01792": "|**2024-10-02**|**When a language model is optimized for reasoning, does it still show embers of autoregression? An analysis of OpenAI o1**|R. Thomas McCoy et.al.|[2410.01792](http://arxiv.org/abs/2410.01792)|null|\u5728\u201c\u81ea\u52a8\u56de\u5f52\u4f59\u70ec\u201d\uff08McCoy\u7b49\u4eba\uff0c2023\u5e74\uff09\u4e2d\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u51e0\u4e2a\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u8d77\u6e90\u4e0a\u5b58\u5728\u4e00\u4e9b\u91cd\u8981\u9650\u5236\uff0c\u8fd9\u5f52\u56e0\u4e8e\u5b83\u4eec\u7684\u4e0b\u4e00\u4e2a\u5355\u8bcd\u9884\u6d4b\u7279\u6027\u3002\u8fd9\u91cc\u6211\u4eec\u63a2\u8ba8\u4e86OpenAI\u7684\u65b0\u7cfb\u7edfo1\u662f\u5426\u4f9d\u7136\u5b58\u5728\u8fd9\u4e9b\u95ee\u9898\uff0c\u4e0e\u4e4b\u524d\u7684LLMs\u76f8\u6bd4\uff0co1\u5728\u63a8\u7406\u4f18\u5316\u65b9\u9762\u6709\u6240\u4e0d\u540c\u3002\u7814\u7a76\u53d1\u73b0\uff0co1\u5728\u8bb8\u591a\u60c5\u51b5\u4e0b\u663e\u8457\u4f18\u4e8e\u4e4b\u524d\u6a21\u578b\uff0c\u5728\u67d0\u4e9b\u5e38\u89c1\u4efb\u52a1\u7684\u7f55\u89c1\u53d8\u4f53\u4e0a\uff08\u4f8b\u5982\uff0c\u4ece\u5217\u8868\u4e2d\u7684\u6bcf\u4e2a\u8bcd\u7684\u7b2c\u4e8c\u4e2a\u5b57\u6bcd\u5f62\u6210\u7f29\u5199\uff0c\u800c\u4e0d\u662f\u7b2c\u4e00\u4e2a\u5b57\u6bcd\uff09\u8868\u73b0\u5c24\u5176\u51fa\u8272\u3002\u5c3d\u7ba1\u8fd9\u4e9b\u5b9a\u91cf\u6539\u8fdb\u4ee4\u4eba\u5370\u8c61\u6df1\u523b\uff0c\u4f46o1\u4f9d\u7136\u663e\u793a\u51fa\u4e86\u4e0e\u4e4b\u524d\u7cfb\u7edf\u76f8\u540c\u7684\u57fa\u672c\u8d8b\u52bf\uff1a\u5bf9\u4e8e\u6982\u7387\u8f83\u9ad8\u7684\u793a\u4f8b\u548c\u4efb\u52a1\uff0co1\u7684\u8868\u73b0\u66f4\u597d\u4e14\u9700\u8981\u7684\u201c\u601d\u8003\u4ee4\u724c\u201d\u6570\u91cf\u8f83\u5c11\uff1b\u800c\u5728\u6982\u7387\u8f83\u4f4e\u7684\u60c5\u51b5\u4e0b\u5219\u8868\u73b0\u4e0d\u4f73\u3002 
\u8fd9\u4e9b\u7ed3\u679c\u8868\u660e\uff0c\u4f18\u5316\u8bed\u8a00\u6a21\u578b\u4ee5\u8fdb\u884c\u63a8\u7406\u53ef\u4ee5\u51cf\u8f7b\u4f46\u53ef\u80fd\u65e0\u6cd5\u5b8c\u5168\u514b\u670d\u8bed\u8a00\u6a21\u578b\u7684\u6982\u7387\u654f\u611f\u6027\u95ee\u9898\u3002|\n", "2410.01789": "|**2024-10-02**|**Investigating on RLHF methodology**|Alexey Kutalev et.al.|[2410.01789](http://arxiv.org/abs/2410.01789)|null|\u672c\u6587\u7814\u7a76\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\u6839\u636e\u4eba\u7c7b\u504f\u597d\u7684\u5bf9\u9f50\u95ee\u9898\u3002\u6211\u4eec\u8ba8\u8bba\u4e86\u8bad\u7ec3\u504f\u597d\u6a21\u578b\u7684\u7279\u6027\uff0c\u8be5\u6a21\u578b\u6a21\u62df\u4eba\u7c7b\u504f\u597d\uff0c\u5e76\u4ecb\u7ecd\u4e86\u5b9e\u73b0\u6700\u4f73\u7ed3\u679c\u6240\u9700\u7684\u65b9\u6cd5\u548c\u7ec6\u8282\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63a2\u8ba8\u4e86\u4f7f\u7528\u5f3a\u5316\u5b66\u4e60\u5fae\u8c03\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u65b9\u6cd5\uff0c\u63cf\u8ff0\u4e86\u9047\u5230\u7684\u6311\u6218\u4ee5\u53ca\u514b\u670d\u8fd9\u4e9b\u6311\u6218\u7684\u65b9\u5f0f\u3002\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86\u76f4\u63a5\u504f\u597d\u4f18\u5316\u65b9\u6cd5\u7684\u7ecf\u9a8c\uff0c\u8fd9\u79cd\u65b9\u6cd5\u5141\u8bb8\u6211\u4eec\u5c06\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4e0e\u4eba\u7c7b\u504f\u597d\u5bf9\u9f50\uff0c\u800c\u65e0\u9700\u521b\u5efa\u5355\u72ec\u7684\u504f\u597d\u6a21\u578b\u3002\u4f5c\u4e3a\u6211\u4eec\u7684\u8d21\u732e\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u901a\u8fc7\u56f0\u60d1\u5ea6\u7b5b\u9009\u6536\u96c6\u504f\u597d\u6570\u636e\u96c6\u7684\u65b9\u6cd5\uff0c\u8fd9\u4f7f\u5f97\u4e3a\u7279\u5b9a\u8bed\u8a00\u6a21\u578b\u521b\u5efa\u8fd9\u6837\u7684\u6570\u636e\u96c6\u7684\u8fc7\u7a0b\u66f4\u52a0\u7b80\u4fbf\u4e14\u6210\u672c\u6548\u76ca\u66f4\u9ad8\u3002|\n", "2410.01784": "|**2024-10-02**|**OmniGenBench: Automating Large-scale in-silico Benchmarking for Genomic Foundation Models**|Heng Yang 
et.al.|[2410.01784](http://arxiv.org/abs/2410.01784)|**[link](https://github.com/yangheng95/OmniGenomeBench)**|**\u8fd1\u5e74\u6765\uff0c\u4eba\u5de5\u667a\u80fd\u9886\u57df\u7684\u8fdb\u6b65\uff0c\u7279\u522b\u662f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\uff0c\u6fc0\u53d1\u4e86\u5bf9\u57fa\u56e0\u7ec4\u57fa\u7840\u6a21\u578b\uff08GFMs\uff09\u7a81\u7834\u6027\u8fdb\u5c55\u7684\u671f\u5f85\u3002\u81ea\u751f\u547d\u8fdb\u5316\u4e4b\u521d\u5c31\u9690\u85cf\u5728\u591a\u6837\u5316\u7684\u57fa\u56e0\u7ec4\u4e2d\u7684\u201c\u81ea\u7136\u4e4b\u7801\u201d\uff0c\u8574\u542b\u7740\u5de8\u5927\u6f5c\u529b\uff0c\u80fd\u591f\u901a\u8fc7\u57fa\u56e0\u7ec4\u5efa\u6a21\u5bf9\u4eba\u7c7b\u548c\u751f\u6001\u7cfb\u7edf\u4ea7\u751f\u6df1\u8fdc\u5f71\u54cd\u3002\u8fd1\u671fGFM\u9886\u57df\u7684\u91cd\u8981\u7a81\u7834\uff0c\u5982Evo\uff0c\u5438\u5f15\u4e86\u5927\u91cf\u6295\u8d44\u4e0e\u5173\u6ce8\uff0c\u5b83\u4eec\u89e3\u51b3\u4e86\u957f\u671f\u5b58\u5728\u7684\u6311\u6218\uff0c\u5e76\u5c06\u57fa\u56e0\u7ec4\u7814\u7a76\u4ece\u624b\u52a8\u3001\u4e0d\u53ef\u9760\u548c\u4f4e\u6548\u7684\u4f20\u7edf\u6a21\u5f0f\u8f6c\u53d8\u4e3a\u81ea\u52a8\u5316\u3001\u53ef\u9760\u548c\u9ad8\u6548\u7684\u65b0\u8303\u5f0f\u3002\u5728\u57fa\u56e0\u7ec4\u5b66\u8fde\u7eed\u6280\u672f\u9769\u547d\u7684\u80cc\u666f\u4e0b\uff0cGFM\u7814\u7a76\u9762\u4e34\u4e24\u5927\u6311\u6218\uff1a\u7f3a\u4e4fGFM\u57fa\u51c6\u6d4b\u8bd5\u5de5\u5177\u4ee5\u53ca\u591a\u7ef4\u57fa\u56e0\u7ec4\u5b66\u7684\u5f00\u6e90\u8f6f\u4ef6\u7f3a\u5931\u3002\u8fd9\u4e9b\u6311\u6218\u963b\u788d\u4e86GFM\u5feb\u901f\u6f14\u8fdb\u53ca\u5176\u5e7f\u6cdb\u5e94\u7528\u4e8e\u7406\u89e3\u4e0e\u5408\u6210\u57fa\u56e0\u7ec4\u7b49\u6570\u5341\u5e74\u6765\u5b58\u5728\u7684\u95ee\u9898\u7684\u80fd\u529b\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u6311\u6218\uff0c\u6211\u4eec\u5f15\u5165\u4e86GFMBench\u6846\u67b6\uff0c\u4e00\u4e2a\u4e13\u6ce8\u4e8eGFM\u5bfc\u5411\u57fa\u51c6\u6d4b\u8bd5\u7684\u5e73\u53f0\u3002GFMBench\u6807\u51c6\u5316\u4e86\u5
7fa\u51c6\u5957\u4ef6\uff0c\u5e76\u5b9e\u73b0\u4e86\u5bf9\u5927\u91cf\u5f00\u6e90GFMs\u7684\u81ea\u52a8\u5316\u57fa\u51c6\u6d4b\u8bd5\u3002\u5b83\u96c6\u6210\u4e86\u6765\u81ea\u56db\u5927\u5927\u578b\u57fa\u51c6\u7684\u6570\u767e\u4e07\u4e2a\u57fa\u56e0\u5e8f\u5217\uff0c\u8986\u76d6\u6570\u767e\u79cd\u57fa\u56e0\u7ec4\u4efb\u52a1\uff0c\u4f7fGFMs\u6c11\u4e3b\u5316\uff0c\u9002\u7528\u4e8e\u5e7f\u6cdb\u7684\u865a\u62df\u57fa\u56e0\u7ec4\u5e94\u7528\u3002\u6b64\u5916\uff0cGFMBench\u4f5c\u4e3a\u5f00\u6e90\u8f6f\u4ef6\u53d1\u5e03\uff0c\u63d0\u4f9b\u7528\u6237\u53cb\u597d\u754c\u9762\u548c\u591a\u6837\u5316\u6559\u7a0b\uff0c\u9002\u7528\u4e8e\u81ea\u52a8\u6d4b\u8bd5\u4ee5\u53caRNA\u8bbe\u8ba1\u548c\u7ed3\u6784\u9884\u6d4b\u7b49\u590d\u6742\u4efb\u52a1\u3002\u4e3a\u4e86\u4fc3\u8fdb\u57fa\u56e0\u7ec4\u5efa\u6a21\u9886\u57df\u7684\u8fdb\u4e00\u6b65\u53d1\u5c55\uff0c\u6211\u4eec\u542f\u52a8\u4e86\u4e00\u4e2a\u516c\u5171\u6392\u884c\u699c\uff0c\u5c55\u793a\u7531AutoBench\u751f\u6210\u7684\u57fa\u51c6\u6027\u80fd\u3002GFMBench\u4ee3\u8868\u4e86\u6807\u51c6\u5316GFM\u57fa\u51c6\u6d4b\u8bd5\u548c\u6c11\u4e3b\u5316GFM\u5e94\u7528\u7684\u4e00\u5927\u6b65\u3002**|\n", "2410.01782": "|**2024-10-02**|**Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models**|Shayekh Bin Islam 
et.al.|[2410.01782](http://arxiv.org/abs/2410.01782)|**[link](https://github.com/ShayekhBinIslam/openrag)**|\u4e3a\u4e86\u63d0\u5347\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u4e8b\u5b9e\u51c6\u786e\u6027\u4e0a\u7684\u8868\u73b0\uff0c\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u65b9\u6cd5\u5df2\u7ecf\u5f97\u5230\u4e86\u5e7f\u6cdb\u7814\u7a76\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u65b9\u6cd5\u5f80\u5f80\u5728\u5229\u7528\u68c0\u7d22\u5230\u7684\u8bc1\u636e\u8fdb\u884c\u63a8\u7406\u7684\u80fd\u529b\u4e0a\u5b58\u5728\u5c40\u9650\u6027\uff0c\u5c24\u5176\u662f\u5728\u4f7f\u7528\u5f00\u6e90LLM\u65f6\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u5dee\u8ddd\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6846\u67b6\u2014\u2014Open-RAG\uff0c\u65e8\u5728\u589e\u5f3a\u5f00\u6e90LLM\u5728RAG\u4e2d\u7684\u63a8\u7406\u80fd\u529b\u3002\u6211\u4eec\u7684\u6846\u67b6\u5c06\u4efb\u610f\u5bc6\u96c6\u578bLLM\u8f6c\u6362\u6210\u4e00\u4e2a\u53c2\u6570\u9ad8\u6548\u7684\u7a00\u758f\u6df7\u5408\u4e13\u5bb6\uff08MoE\uff09\u6a21\u578b\uff0c\u80fd\u591f\u5904\u7406\u5305\u62ec\u5355\u8df3\u548c\u591a\u8df3\u67e5\u8be2\u5728\u5185\u7684\u590d\u6742\u63a8\u7406\u4efb\u52a1\u3002 
Open-RAG\u7684\u72ec\u7279\u4e4b\u5904\u5728\u4e8e\uff0c\u5b83\u901a\u8fc7\u8bad\u7ec3\u6a21\u578b\u6765\u5e94\u5bf9\u770b\u4f3c\u76f8\u5173\u4f46\u5177\u6709\u8bef\u5bfc\u6027\u7684\u5e72\u6270\u9879\uff0c\u4ece\u800c\u6709\u6548\u5730\u5bfc\u822a\u590d\u6742\u573a\u666f\u3002\u901a\u8fc7\u5229\u7528\u6f5c\u5b66\u4e60\uff0cOpen-RAG\u52a8\u6001\u9009\u62e9\u76f8\u5173\u4e13\u5bb6\u5e76\u6574\u5408\u5916\u90e8\u77e5\u8bc6\uff0c\u4ee5\u63d0\u4f9b\u66f4\u51c6\u786e\u3001\u66f4\u5177\u4e0a\u4e0b\u6587\u7684\u76f8\u5173\u54cd\u5e94\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86\u4e00\u79cd\u6df7\u5408\u81ea\u9002\u5e94\u68c0\u7d22\u65b9\u6cd5\uff0c\u7528\u4e8e\u5224\u65ad\u68c0\u7d22\u7684\u5fc5\u8981\u6027\uff0c\u5e76\u5e73\u8861\u6027\u80fd\u589e\u76ca\u4e0e\u63a8\u7406\u901f\u5ea6\u4e4b\u95f4\u7684\u6743\u8861\u3002 \u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u57fa\u4e8eLlama2-7B\u7684Open-RAG\u5728\u5404\u79cd\u77e5\u8bc6\u5bc6\u96c6\u578b\u4efb\u52a1\u4e2d\uff0c\u76f8\u8f83\u4e8eChatGPT\u3001Self-RAG\u548cCommand R+\u7b49\u6700\u5148\u8fdb\u7684LLM\u548cRAG\u6a21\u578b\uff0c\u8868\u73b0\u51fa\u66f4\u4f18\u7684\u8868\u73b0\u3002\u6211\u4eec\u5df2\u5c06\u4ee3\u7801\u548c\u6a21\u578b\u5f00\u6e90\u5728https://openragmoe.github.io/\u3002|\n", "2410.01769": "|**2024-10-02**|**Quantifying Generalization Complexity for Large Language Models**|Zhenting Qi 
et.al.|[2410.01769](http://arxiv.org/abs/2410.01769)|null|\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5c55\u73b0\u51fa\u7406\u89e3\u590d\u6742\u67e5\u8be2\u548c\u6267\u884c\u9ad8\u7ea7\u4efb\u52a1\u7684\u975e\u51e1\u80fd\u529b\u7684\u540c\u65f6\uff0c\u5b83\u4eec\u7684\u6cdb\u5316\u80fd\u529b\u5f80\u5f80\u4e0e\u8bb0\u5fc6\u6df1\u5ea6\u4ea4\u7ec7\u5728\u4e00\u8d77\uff0c\u8fd9\u8981\u6c42\u6211\u4eec\u8fdb\u884c\u66f4\u7cbe\u786e\u7684\u8bc4\u4f30\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u5f15\u5165\u4e86Scylla\uff0c\u8fd9\u662f\u4e00\u4e2a\u52a8\u6001\u8bc4\u4f30\u6846\u67b6\uff0c\u5b9a\u91cf\u8861\u91cfLLMs\u7684\u6cdb\u5316\u80fd\u529b\u3002Scylla\u901a\u8fc7\u5728\u5206\u5e03\u5185\uff08ID\uff09\u548c\u5206\u5e03\u5916\uff08OOD\uff09\u6570\u636e\u4e0a\u8bc4\u4f30\u6a21\u578b\u6027\u80fd\u6765\u5206\u79bb\u6cdb\u5316\u4e0e\u8bb0\u5fc6\uff0c\u6d89\u53ca20\u4e2a\u4efb\u52a1\uff0c\u8986\u76d65\u4e2a\u590d\u6742\u5ea6\u7ea7\u522b\u3002\u901a\u8fc7\u5e7f\u6cdb\u7684\u5b9e\u9a8c\uff0c\u6211\u4eec\u63ed\u793a\u4e86\u4efb\u52a1\u590d\u6742\u5ea6\u4e0eID\u548cOOD\u6570\u636e\u4e4b\u95f4\u7684\u6027\u80fd\u5dee\u8ddd\u4e4b\u95f4\u975e\u5355\u8c03\u7684\u5173\u7cfb\uff0c\u6211\u4eec\u5c06\u5176\u79f0\u4e3a\u6cdb\u5316\u5c71\u8c37\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u8fd9\u4e00\u73b0\u8c61\u63ed\u793a\u4e86\u4e00\u4e2a\u5173\u952e\u9608\u503c\u2014\u2014\u79f0\u4e3a\u5173\u952e\u590d\u6742\u6027\u2014\u2014\u5728\u8be5\u9608\u503c\u5904\uff0c\u975e\u6cdb\u5316\u884c\u4e3a\u7684\u4f9d\u8d56\u8fbe\u5230\u5cf0\u503c\uff0c\u8868\u660e\u4e86LLMs\u6cdb\u5316\u80fd\u529b\u7684\u4e0a\u9650\u3002\u968f\u7740\u6a21\u578b\u5927\u5c0f\u7684\u589e\u52a0\uff0c\u5173\u952e\u590d\u6742\u6027\u5411\u66f4\u9ad8\u5c42\u6b21\u7684\u4efb\u52a1\u590d\u6742\u5ea6\u79fb\u52a8\uff0c\u8868\u660e\u66f4\u5927\u7684\u6a21\u578b\u53ef\u4ee5\u5728\u4f9d\u8d56\u4e8e\u8bb0\u5fc6\u4e4b\u524d\u5904\u7406\u66f4\u590d\u6742\u7684\u63a8\u7406\u4efb\u52a1\u3002\u5229\u7
528Scylla\u548c\u5173\u952e\u590d\u6742\u6027\u7684\u6982\u5ff5\uff0c\u6211\u4eec\u5bf9\u5305\u62ec\u5f00\u6e90\u6a21\u578b\u5982LLaMA\u548cQwen\u5bb6\u65cf\u3001\u4ee5\u53ca\u95ed\u6e90\u6a21\u578b\u5982Claude\u548cGPT\u5728\u5185\u768428\u4e2aLLMs\u8fdb\u884c\u4e86\u57fa\u51c6\u6d4b\u8bd5\uff0c\u63d0\u4f9b\u4e86\u66f4\u7a33\u5065\u7684\u8bc4\u4f30\uff0c\u5e76\u5bf9LLMs\u7684\u6cdb\u5316\u80fd\u529b\u6709\u4e86\u66f4\u6e05\u6670\u7684\u7406\u89e3\u3002|\n", "2410.01744": "|**2024-10-02**|**LEOPARD : A Vision Language Model For Text-Rich Multi-Image Tasks**|Mengzhao Jia et.al.|[2410.01744](http://arxiv.org/abs/2410.01744)|**[link](https://github.com/jill0001/leopard)**|\u6587\u672c\u4e30\u5bcc\u7684\u56fe\u50cf\u5728\u5b9e\u9645\u5e94\u7528\u4e2d\u666e\u904d\u5b58\u5728\uff0c\u5982\u5e7b\u706f\u7247\u6f14\u793a\u3001\u626b\u63cf\u6587\u6863\u548c\u7f51\u9875\u5feb\u7167\u7b49\uff0c\u5176\u4e2d\u6587\u672c\u4f5c\u4e3a\u6838\u5fc3\u89c6\u89c9\u5143\u7d20\u5f15\u5bfc\u6574\u4f53\u7406\u89e3\u3002\u591a\u56fe\u50cf\u6587\u672c\u4e30\u5bcc\u7684\u4efb\u52a1\u5c24\u5176\u5177\u6709\u6311\u6218\u6027\uff0c\u56e0\u4e3a\u5b83\u4eec\u4e0d\u4ec5\u9700\u8981\u7406\u89e3\u5355\u4e2a\u56fe\u50cf\u7684\u5185\u5bb9\uff0c\u8fd8\u9700\u8981\u5728\u591a\u4e2a\u89c6\u89c9\u8f93\u5165\u4e4b\u95f4\u63a8\u7406\u5173\u7cfb\u548c\u903b\u8f91\u6d41\u7a0b\u3002\u5c3d\u7ba1\u8fd9\u4e9b\u573a\u666f\u7684\u91cd\u8981\u6027\uff0c\u5f53\u524d\u7684\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u5728\u5904\u7406\u6b64\u7c7b\u4efb\u52a1\u65f6\u9047\u5230\u4e24\u4e2a\u5173\u952e\u6311\u6218\uff1a\uff081\uff09\u7f3a\u4e4f\u9002\u5408\u4e8e\u591a\u56fe\u50cf\u6587\u672c\u4e30\u5bcc\u573a\u666f\u7684\u9ad8\u8d28\u91cf\u6307\u4ee4\u8c03\u4f18\u6570\u636e\u96c6\uff1b\uff082\uff09\u96be\u4ee5\u5e73\u8861\u56fe\u50cf\u5206\u8fa8\u7387\u4e0e\u89c6\u89c9\u7279\u5f81\u5e8f\u5217\u957f\u5ea6\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86LEOPARD
\uff0c\u4e00\u4e2a\u4e13\u95e8\u8bbe\u8ba1\u7528\u4e8e\u5904\u7406\u6d89\u53ca\u591a\u6587\u672c\u4e30\u5bcc\u56fe\u50cf\u7684\u89c6\u8bed\u8a00\u4efb\u52a1\u7684MLLM\u3002\u9996\u5148\uff0c\u6211\u4eec\u6536\u96c6\u4e86\u7ea6\u4e00\u767e\u4e07\u6761\u9488\u5bf9\u591a\u6587\u672c\u4e30\u5bcc\u3001\u591a\u56fe\u50cf\u573a\u666f\u7684\u9ad8\u8d28\u91cf\u591a\u6a21\u6001\u6307\u4ee4\u8c03\u4f18\u6570\u636e\u3002\u5176\u6b21\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u9002\u5e94\u6027\u7684\u9ad8\u5206\u8fa8\u7387\u591a\u56fe\u50cf\u7f16\u7801\u6a21\u5757\uff0c\u6839\u636e\u8f93\u5165\u56fe\u50cf\u7684\u539f\u59cb\u7eb5\u6a2a\u6bd4\u548c\u5206\u8fa8\u7387\u52a8\u6001\u4f18\u5316\u89c6\u89c9\u5e8f\u5217\u957f\u5ea6\u7684\u5206\u914d\u3002\u5728\u4e00\u7cfb\u5217\u5e7f\u6cdb\u7684\u57fa\u51c6\u6d4b\u8bd5\u4e2d\uff0c\u6211\u4eec\u7684\u6a21\u578b\u5728\u591a\u6587\u672c\u4e30\u5bcc\u3001\u591a\u56fe\u50cf\u8bc4\u4f30\u4e2d\u8868\u73b0\u51fa\u4f18\u8d8a\u7684\u80fd\u529b\uff0c\u5e76\u5728\u901a\u7528\u9886\u57df\u8bc4\u4f30\u4e2d\u5c55\u73b0\u51fa\u7ade\u4e89\u529b\u3002|\n", "2410.01738": "|**2024-10-02**|**VitaGlyph: Vitalizing Artistic Typography with Flexible Dual-branch Diffusion Models**|Kailai Feng 
et.al.|[2410.01738](http://arxiv.org/abs/2410.01738)|**[link](https://github.com/carlofkl/vitaglyph)**|**\u672c\u6587\u5f15\u5165\u4e86\u4e00\u79cd\u53cc\u5206\u652f\u3001\u65e0\u9700\u8bad\u7ec3\u7684\u65b0\u578b\u827a\u672f\u5b57\u4f53\u751f\u6210\u65b9\u6cd5\u2014\u2014VitaGlyph\u3002\u8be5\u65b9\u6cd5\u65e8\u5728\u901a\u8fc7\u7075\u6d3b\u5730\u8868\u8fbe\u8f93\u5165\u5b57\u7b26\u7684\u6838\u5fc3\u6982\u5ff5\u4ee5\u53ca\u4e30\u5bcc\u76f8\u5173\u7684\u80cc\u666f\u4fe1\u606f\uff0c\u5b9e\u73b0\u827a\u672f\u5b57\u4f53\u4e0e\u53ef\u63a7\u5236\u7684\u51e0\u4f55\u53d8\u5316\u4e4b\u95f4\u7684\u5e73\u8861\uff0c\u4ece\u800c\u4fdd\u6301\u5b57\u4f53\u7684\u53ef\u8bfb\u6027\u3002VitaGlyph\u7684\u6838\u5fc3\u7406\u5ff5\u662f\u5c06\u8f93\u5165\u5b57\u7b26\u89c6\u4e3a\u7531\u4e3b\u4f53\u548c\u5468\u56f4\u73af\u5883\u7ec4\u6210\u7684\u573a\u666f\uff0c\u5e76\u5728\u4e0d\u540c\u51e0\u4f55\u53d8\u6362\u7a0b\u5ea6\u4e0b\u8fdb\u884c\u6e32\u67d3\u3002 \u5177\u4f53\u6765\u8bf4\uff0cVitaGlyph\u901a\u8fc7\u4ee5\u4e0b\u4e09\u4e2a\u9636\u6bb5\u6846\u67b6\u5b9e\u73b0\u5176\u529f\u80fd\uff1a(i) \u77e5\u8bc6\u83b7\u53d6\u9636\u6bb5\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8bbe\u8ba1\u4e3b\u4f53\u548c\u5468\u56f4\u73af\u5883\u7684\u6587\u672c\u63cf\u8ff0\uff1b(ii) \u533a\u57df\u5206\u89e3\u9636\u6bb5\u8bc6\u522b\u6700\u5339\u914d\u4e3b\u4f53\u63cf\u8ff0\u7684\u90e8\u5206\uff0c\u5e76\u5c06\u8f93\u5165\u7684\u5b57\u7b26\u56fe\u50cf\u5206\u4e3a\u4e3b\u4f53\u548c\u5468\u56f4\u533a\u57df\uff1b(iii) \u5b57\u4f53\u98ce\u683c\u5316\u9636\u6bb5\u9996\u5148\u901a\u8fc7\u8bed\u4e49\u5b57\u4f53\u4f18\u5316\u4e3b\u4f53\u533a\u57df\u7684\u7ed3\u6784\uff0c\u7136\u540e\u5206\u522b\u4f7f\u7528\u53ef\u63a7\u7ec4\u5408\u751f\u6210\u6280\u672f\u6e32\u67d3\u4e3b\u4f53\u548c\u5468\u56f4\u533a\u57df\u7684\u7eb9\u7406\u3002 
\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cVitaGlyph\u4e0d\u4ec5\u5728\u827a\u672f\u6027\u548c\u53ef\u8bfb\u6027\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u8fd8\u80fd\u591f\u63cf\u7ed8\u591a\u79cd\u5b9a\u5236\u6982\u5ff5\uff0c\u4ece\u800c\u4fc3\u8fdb\u66f4\u5bcc\u6709\u521b\u610f\u548c\u6109\u60a6\u7684\u827a\u672f\u5b57\u4f53\u751f\u6210\u3002\u9879\u76ee\u4ee3\u7801\u5c06\u5728https://github.com/Carlofkl/VitaGlyph\u516c\u5f00\u63d0\u4f9b\u3002**|\n", "2410.02761": "|**2024-10-03**|**FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models**|Zhipei Xu et.al.|[2410.02761](http://arxiv.org/abs/2410.02761)|**[link](https://github.com/zhipeixu/fakeshield)**|\u751f\u6210\u5f0fAI\u7684\u5feb\u901f\u53d1\u5c55\u72b9\u5982\u4e00\u628a\u53cc\u5203\u5251\uff0c\u65e2\u4fc3\u8fdb\u4e86\u5185\u5bb9\u521b\u4f5c\uff0c\u4e5f\u4f7f\u5f97\u56fe\u50cf\u7f16\u8f91\u548c\u96be\u4ee5\u8fa8\u8bc6\u53d8\u5f97\u66f4\u52a0\u4fbf\u6377\u3002\u5f53\u524d\u7684\u56fe\u50cf\u4f2a\u9020\u68c0\u6d4b\u4e0e\u5b9a\u4f4d\uff08IFDL\uff09\u65b9\u6cd5\u867d\u7136\u5728\u4e00\u5b9a\u7a0b\u5ea6\u4e0a\u6709\u6548\uff0c\u4f46\u4ecd\u7136\u9762\u4e34\u4e24\u4e2a\u4e3b\u8981\u6311\u6218\uff1a1\uff09\u9ed1\u76d2\u6027\u8d28\uff0c\u5373\u65e0\u6cd5\u77e5\u6653\u5176\u68c0\u6d4b\u539f\u7406\uff1b2\uff09\u5bf9\u4e0d\u540c\u4f2a\u9020\u6280\u672f\uff08\u5982Photoshop\u3001DeepFake\u3001AIGC-Editing\u7b49\uff09\u7684\u6cdb\u5316\u80fd\u529b\u6709\u9650\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u53ef\u89e3\u91ca\u7684IFDL\u4efb\u52a1\uff0c\u5e76\u8bbe\u8ba1\u4e86\u5177\u6709\u591a\u6a21\u6001\u80fd\u529b\u7684\u6846\u67b6\u2014\u2014FakeShield\u3002\u8be5\u6846\u67b6\u65e8\u5728\u8bc4\u4f30\u56fe\u50cf\u7684\u771f\u5b9e\u6027\uff0c\u751f\u6210\u7be1\u6539\u533a\u57df\u7684\u63a9\u6a21\uff0c\u5e76\u57fa\u4e8e\u50cf\u7d20\u7ea7\u548c\u56fe\u50cf\u7ea7\u7684\u7be1\u6539\u7ebf\u7d22\u63d0\u4f9b\u5224\u65ad\u4f9d\u636e\u3002\u6b6
4\u5916\uff0c\u6211\u4eec\u5229\u7528GPT-4o\u589e\u5f3a\u4e86\u73b0\u6709\u7684IFDL\u6570\u636e\u96c6\uff0c\u521b\u5efa\u4e86\u591a\u6a21\u6001\u7be1\u6539\u63cf\u8ff0\u6570\u636e\u96c6\uff08MMTD-Set\uff09\uff0c\u7528\u4e8e\u8bad\u7ec3FakeShield\u7684\u7be1\u6539\u5206\u6790\u80fd\u529b\u3002\u540c\u65f6\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u57df\u6807\u7b7e\u5f15\u5bfc\u7684\u53ef\u89e3\u91ca\u4f2a\u9020\u68c0\u6d4b\u6a21\u5757\uff08DTE-FDM\uff09\u548c\u591a\u6a21\u6001\u4f2a\u9020\u5b9a\u4f4d\u6a21\u5757\uff08MFLM\uff09\uff0c\u4ee5\u5e94\u5bf9\u5404\u79cd\u4f2a\u9020\u68c0\u6d4b\u89e3\u91ca\u548c\u5b9e\u73b0\u7531\u8be6\u7ec6\u6587\u672c\u63cf\u8ff0\u6307\u5bfc\u7684\u4f2a\u9020\u5b9a\u4f4d\u3002 \u901a\u8fc7\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u9a8c\u8bc1\uff0cFakeShield\u6709\u6548\u5730\u68c0\u6d4b\u548c\u5b9a\u4f4d\u4e86\u5404\u79cd\u7be1\u6539\u6280\u672f\uff0c\u63d0\u4f9b\u4e86\u6bd4\u4ee5\u5f80IFDL\u65b9\u6cd5\u66f4\u53ef\u89e3\u91ca\u4e14\u6027\u80fd\u66f4\u4f18\u7684\u89e3\u51b3\u65b9\u6848\u3002|\n", "2410.02757": "|**2024-10-03**|**Loong: Generating Minute-level Long Videos with Autoregressive Language Models**|Yuqing Wang 
et.al.|[2410.02757](http://arxiv.org/abs/2410.02757)|null|\u5728\u751f\u6210\u65f6\u957f\u8fbe\u5230\u6570\u5206\u949f\u7684\u4e30\u5bcc\u5185\u5bb9\u89c6\u9891\u65b9\u9762\uff0c\u5c3d\u7ba1\u5177\u6709\u6311\u6218\u6027\u4f46\u524d\u666f\u5e7f\u9614\u3002\u81ea\u56de\u5f52\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u9886\u57df\u751f\u6210\u8fde\u8d2f\u4e14\u957f\u5ea6\u8f83\u957f\u7684\u4ee4\u724c\u5e8f\u5217\u65b9\u9762\u53d6\u5f97\u4e86\u5de8\u5927\u6210\u529f\uff0c\u800c\u5728\u63a2\u7d22\u4f7f\u7528\u81ea\u56de\u5f52LLMs\u8fdb\u884c\u89c6\u9891\u751f\u6210\u65f6\uff0c\u4e3b\u8981\u5c40\u9650\u4e8e\u751f\u6210\u51e0\u79d2\u949f\u7684\u77ed\u89c6\u9891\u3002\u672c\u6587\u5bf9\u963b\u6b62\u57fa\u4e8e\u81ea\u56de\u5f52LLM\u7684\u89c6\u9891\u751f\u6210\u5668\u751f\u6210\u957f\u65f6\u95f4\u89c6\u9891\u7684\u6311\u6218\u8fdb\u884c\u4e86\u6df1\u5165\u5206\u6790\u3002\u57fa\u4e8e\u89c2\u5bdf\u548c\u5206\u6790\u7ed3\u679c\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u57fa\u4e8e\u81ea\u56de\u5f52LLM\u7684\u89c6\u9891\u751f\u6210\u5668\u201cLoong\u201d\uff0c\u80fd\u591f\u751f\u6210\u957f\u8fbe\u6570\u5206\u949f\u7684\u89c6\u9891\u3002\u5177\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u5c06\u6587\u672c\u4ee4\u724c\u548c\u89c6\u9891\u4ee4\u724c\u7edf\u4e00\u4e3a\u81ea\u56de\u5f52LLM\u53ef\u4ee5\u8fdb\u884c\u81ea\u56de\u5f52\u5efa\u6a21\u7684\u5e8f\u5217\uff0c\u5e76\u4ece\u96f6\u5f00\u59cb\u8bad\u7ec3\u6a21\u578b\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u6e10\u8fdb\u5f0f\u77ed\u81f3\u957f\u8bad\u7ec3\u548c\u635f\u5931\u91cd\u65b0\u52a0\u6743\u65b9\u6848\uff0c\u4ee5\u7f13\u89e3\u957f\u671f\u89c6\u9891\u8bad\u7ec3\u4e2d\u7684\u635f\u5931\u4e0d\u5e73\u8861\u95ee\u9898\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u7814\u7a76\u4e86\u63a8\u7406\u7b56\u7565\uff0c\u5305\u62ec\u89c6\u9891\u4ee4\u724c\u91cd\u7f16\u7801\u548c\u91c7\u6837\u7b56\u7565\uff0c\u4ee5\u51cf\u5c11\u63a8\u7406\u8fc7\u7a0b\u4e2d\u7d2f\u79ef\u7684\u8bef\u5dee\u3002
\u6211\u4eec\u63d0\u51fa\u7684\u201cLoong\u201d\u53ef\u4ee5\u4ece10\u79d2\u7684\u89c6\u9891\u8fdb\u884c\u8bad\u7ec3\uff0c\u5e76\u6269\u5c55\u5230\u6839\u636e\u6587\u672c\u63d0\u793a\u751f\u6210\u6570\u5206\u949f\u7ea7\u522b\u7684\u957f\u89c6\u9891\uff0c\u5982\u7ed3\u679c\u6240\u793a\u3002\u66f4\u591a\u793a\u4f8b\u8bf7\u8bbf\u95ee\uff1ahttps://epiphqny.github.io/Loong-video\u3002|\n", "2410.02755": "|**2024-10-03**|**SIEVE: General Purpose Data Filtering System Matching GPT-4o Accuracy at 1% the Cost**|Jifan Zhang et.al.|[2410.02755](http://arxiv.org/abs/2410.02755)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aSIEVE\u7684\u8f7b\u91cf\u7ea7\u66ff\u4ee3\u65b9\u6848\uff0c\u8be5\u65b9\u6848\u5728\u6210\u672c\u4ec5\u4e3aGPT-4o\u5355\u6b21\u8fc7\u6ee4\u8c03\u7528\u7684\u767e\u5206\u4e4b\u4e00\u7684\u60c5\u51b5\u4e0b\uff0c\u4ecd\u80fd\u4e0eGPT-4o\u7684\u51c6\u786e\u6027\u76f8\u5339\u914d\u3002SIEVE\u7684\u6838\u5fc3\u5728\u4e8e\u5c06GPT-4o\u548c\u8f7b\u91cf\u7ea7T5\u6a21\u578b\u65e0\u7f1d\u96c6\u6210\uff0c\u5e76\u4f7f\u7528\u4e3b\u52a8\u5b66\u4e60\u65b9\u6cd5\u5728\u5c11\u91cfGPT-4o\u8c03\u7528\u7684\u652f\u6301\u4e0b\u5bf9T5\u8fdb\u884c\u5fae\u8c03\u3002\u4e00\u65e6\u8bad\u7ec3\u5b8c\u6210\uff0cSIEVE\u7684\u8868\u73b0\u4e0eGPT-4o\u76f8\u5f53\uff0c\u4f46\u6210\u672c\u5374\u4f4e\u5f97\u591a\uff08\u4ec5\u4e3a\u73b0\u6709\u6280\u672f\u76841%\uff09\u3002\u6211\u4eec\u5728OpenWebText\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u4e86\u5b9e\u9a8c\uff0c\u9488\u5bf9\u9ad8\u8d28\u91cf\u548c\u9886\u57df\u7279\u5b9a\u5185\u5bb9\u7684\u4e94\u4e2a\u9ad8\u5ea6\u5b9a\u5236\u5316\u7684\u8fc7\u6ee4\u4efb\u52a1\u9a8c\u8bc1\u4e86SIEVE\u7684\u6709\u6548\u6027\u548c\u6548\u7387\u3002 
\u8fdb\u4e00\u6b65\u9a8c\u8bc1SIEVE\u7684\u6548\u679c\u663e\u793a\uff0cSIEVE\u548cGPT-4o\u5728\u51c6\u786e\u6027\u65b9\u9762\u8fbe\u5230\u76f8\u4f3c\u6c34\u5e73\uff0c\u800c\u4eba\u7c7b\u8bc4\u4f30\u8005\u66f4\u503e\u5411\u4e8eSIEVE\u7684\u8fc7\u6ee4\u7ed3\u679c\u800c\u975eGPT-4o\u7684\u7ed3\u679c\u3002|\n", "2410.02749": "|**2024-10-03**|**Training Language Models on Synthetic Edit Sequences Improves Code Synthesis**|Ulyana Piterbarg et.al.|[2410.02749](http://arxiv.org/abs/2410.02749)|null|\u672c\u6587\u5f00\u53d1\u4e86\u4e00\u79cd\u540d\u4e3aLintSeq\u7684\u5408\u6210\u6570\u636e\u751f\u6210\u7b97\u6cd5\u3002\u8be5\u7b97\u6cd5\u901a\u8fc7\u4f7f\u7528\u4ee3\u7801\u68c0\u67e5\u5668\u6765\u7a0b\u5e8f\u5316\u5730\u5728\u4e0d\u5f15\u5165\u9519\u8bef\u7684\u60c5\u51b5\u4e0b\u968f\u673a\u9009\u53d6\u63d2\u5165\u64cd\u4f5c\u5e8f\u5217\uff0c\u4ece\u800c\u5bf9\u73b0\u6709\u4ee3\u7801\u8fdb\u884c\u91cd\u6784\uff0c\u751f\u6210\u4e00\u7cfb\u5217\u4ee3\u7801\u7f16\u8f91\u5e8f\u5217\u3002\u8fd9\u4e9b\u5e8f\u5217\u4ee5\u8fde\u7eed\u7684\u7a0b\u5e8f\u5dee\u5f02\u5f62\u5f0f\u8f93\u51fa\u3002 
\u4e3a\u4e86\u6d4b\u8bd5LintSeq\uff0c\u6211\u4eec\u5c06\u5176\u5e94\u7528\u4e8e\u5c06\u6307\u4ee4+\u7a0b\u5e8f\u5bf9\u91cd\u65b0\u683c\u5f0f\u5316\u4e3a\u6307\u4ee4+\u7a0b\u5e8f\u5dee\u5f02\u5e8f\u5217\u5bf9\u7684\u4ee3\u7801\u5e93\u3002\u7136\u540e\uff0c\u6211\u4eec\u5bf9\u53c2\u6570\u4ece2.6B\u523014B\u7684\u591a\u4e2a\u8f83\u5c0f\u7684\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u4e86\u57fa\u4e8e\u6307\u4ee4\u7684\u5fae\u8c03\uff0c\u6bd4\u8f83\u4e86\u5728\u539f\u59cb\u7248\u672c\u548c\u91cd\u65b0\u683c\u5f0f\u5316\u7248\u672c\u6570\u636e\u96c6\u4e0a\u7684\u96f6\u6b21\u5c04\u51fb\u6027\u80fd\u5728\u4ee3\u7801\u5408\u6210\u57fa\u51c6\u4e0a\u7684\u8868\u73b0\u3002\u7ed3\u679c\u663e\u793a\uff0c\u5728\u591a\u6b21\u91c7\u6837\u671f\u95f4\uff0c\u7ecf\u8fc7\u4ee3\u7801\u5dee\u5f02\u5fae\u8c03\u7684\u6a21\u578b\u4ea7\u751f\u7684\u7a0b\u5e8f\u591a\u6837\u6027\u9ad8\u4e8e\u57fa\u7ebf\u3002\u8fd9\u5bfc\u81f4\u4e86\u5728\u7ed9\u5b9a\u5c1d\u8bd5\u6b21\u6570\u201ck\u201d\u65f6\uff0c\u9488\u5bf9\u57fa\u51c6\u8986\u76d6\u7387\u7684\u63a8\u7406\u65f6\u95f4\u6269\u5c55\u6027\u66f4\u597d\uff0c\u5373\u89e3\u51b3\u4efb\u4f55\u95ee\u9898\u7684\u6982\u7387\u201cpass@k\u201d\u3002\u4f8b\u5982\uff0c\u5728HumanEval pass@50\u4e0a\uff0c\u8f83\u5c0f\u6a21\u578b\u5728\u7ecf\u8fc7\u5408\u6210\u4ee3\u7801\u7f16\u8f91\u5e8f\u5217\u5fae\u8c03\u540e\u4e0eGPT-4\u76f8\u6bd4\u5177\u6709\u7ade\u4e89\u529b\uff0c\u5e76\u4e14\u4f18\u4e8e\u57fa\u4e8e\u57fa\u7ebf\u6570\u636e\u96c6\u5fae\u8c03\u7684\u6a21\u578b\uff0c\u7edd\u5bf9\u5f97\u5206\u9ad8\u51fa20%\uff08\u00b13%\uff09\u3002 
\u6700\u540e\uff0c\u6211\u4eec\u8fd8\u9884\u8bad\u7ec3\u4e86\u81ea\u5df1\u7684\u5c0f\u578b\u6a21\u578b\u7528\u4e8e\u4ee3\u7801\u7406\u89e3\u3002\u7ed3\u679c\u8868\u660e\uff0c\u5bf9\u5c0f\u578b\u6a21\u578b\u8fdb\u884c\u57fa\u4e8e\u5408\u6210\u4ee3\u7801\u7f16\u8f91\u7684\u5fae\u8c03\u53ef\u4ee5\u8fbe\u5230\u7c7b\u8bbe\u5907\u6a21\u578b\u7684\u6700\u9ad8\u4ee3\u7801\u5408\u6210\u6027\u80fd\u3002\u6211\u4eec\u76841.5\u4ebf\u53c2\u6570\u7f16\u8f91\u5e8f\u5217\u6a21\u578b\u5728\u6027\u80fd\u4e0a\u5339\u914d\u6216\u8d85\u8d8a\u4e86\u53c2\u6570\u91cf\u7ffb\u500d\u7684\u4ee3\u7801\u6a21\u578b\uff0c\u65e0\u8bba\u662f\u5426\u8fdb\u884c\u591a\u6b21\u91c7\u6837\uff0c\u5305\u62ecCodex\u548cAlphaCode\u3002|\n", "2410.02748": "|**2024-10-03**|**CriSPO: Multi-Aspect Critique-Suggestion-guided Automatic Prompt Optimization for Text Generation**|Han He et.al.|[2410.02748](http://arxiv.org/abs/2410.02748)|null|\u672c\u6587\u63a2\u8ba8\u4e86\u5229\u7528\u4ece\u6e90\u6587\u6863\u4e2d\u63d0\u53d6\u7684\u663e\u8457\u4fe1\u606f\u589e\u5f3a\u603b\u7ed3\u63d0\u793a\u7684\u65b9\u6cd5\u3002\u6211\u4eec\u8bc1\u660e\uff0c\u5728\u63d0\u793a\u4e2d\u52a0\u5165\u5173\u952e\u77ed\u8bed\u53ef\u4ee5\u63d0\u9ad8ROUGE 
F1\u548c\u53ec\u56de\u7387\uff0c\u4f7f\u751f\u6210\u7684\u6458\u8981\u4e0e\u53c2\u8003\u6458\u8981\u66f4\u76f8\u4f3c\u4e14\u66f4\u5b8c\u6574\u3002\u5173\u952e\u77ed\u8bed\u7684\u6570\u91cf\u53ef\u4ee5\u63a7\u5236\u7cbe\u786e\u5ea6\u548c\u53ec\u56de\u7387\u4e4b\u95f4\u7684\u6743\u8861\u3002\u8fdb\u4e00\u6b65\u7684\u5206\u6790\u663e\u793a\uff0c\u878d\u5165\u77ed\u8bed\u7ea7\u522b\u7684\u663e\u8457\u4fe1\u606f\u4f18\u4e8e\u57fa\u4e8e\u5355\u8bcd\u6216\u53e5\u5b50\u7ea7\u522b\u7684\u4fe1\u606f\u3002\u7136\u800c\uff0c\u8fd9\u79cd\u65b9\u6cd5\u5bf9\u5e7b\u89c9\u7684\u5f71\u54cd\u5e76\u975e\u5728\u6240\u6709\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4e0a\u90fd\u662f\u79ef\u6781\u7684\u3002\u4e3a\u4e86\u8fdb\u884c\u8fd9\u9879\u5206\u6790\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u8f7b\u91cf\u7ea7\u6a21\u578bKeyphrase Signal Extractor\uff08CriSPO\uff09\uff0c\u8be5\u6a21\u578b\u53ef\u4ee5\u5fae\u8c03\u4ee5\u63d0\u53d6\u663e\u8457\u7684\u5173\u952e\u77ed\u8bed\u3002\u901a\u8fc7\u4f7f\u7528CriSPO\uff0c\u6211\u4eec\u5728\u6570\u636e\u96c6\u3001\u5f00\u6e90\u548c\u4e13\u6709LLM\u4e0a\u5b9e\u73b0\u4e86\u5bf9ROUGE\u6539\u8fdb\u7684\u4e00\u81f4\u6027\uff0c\u65e0\u9700\u5bf9LLM\u8fdb\u884c\u5b9a\u5236\u3002\u6211\u4eec\u7684\u53d1\u73b0\u4e3a\u6784\u5efa\u57fa\u4e8e\u63d0\u793a\u7684\u603b\u7ed3\u7cfb\u7edf\u65f6\u5229\u7528\u663e\u8457\u4fe1\u606f\u63d0\u4f9b\u4e86\u89c1\u89e3\u3002|\n", "2410.02746": "|**2024-10-03**|**Contrastive Localized Language-Image Pre-Training**|Hong-You Chen 
et.al.|[2410.02746](http://arxiv.org/abs/2410.02746)|null|\u672c\u6587\u9488\u5bf9\u5bf9\u6bd4\u8bed\u8a00-\u56fe\u50cf\u9884\u8bad\u7ec3\uff08CLIP\uff09\u4f5c\u4e3a\u89c6\u89c9\u8bed\u8a00\u57fa\u7840\u6a21\u578b\u7684\u6210\u529f\uff0c\u91cd\u70b9\u5728\u4e8e\u901a\u8fc7\u5728\u56fe\u50cf\u7ea7\u522b\u4e0a\u5bf9\u9f50\u7f51\u7edc\u6587\u672c\u6ce8\u91ca\u6765\u4f18\u5316\u89c6\u89c9\u7f16\u7801\u5668\u3002\u7136\u800c\uff0c\u8fd9\u79cd\u7b56\u7565\u5728\u9700\u8981\u7ec6\u7c92\u5ea6\u89c6\u89c9\u8868\u793a\u7684\u4e0b\u6e38\u4efb\u52a1\u4e2d\u53ef\u80fd\u53d8\u5f97\u4e0d\u591f\u5145\u5206\uff0c\u5c24\u5176\u662f\u5f53\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u9700\u8981\u8fdb\u884c\u533a\u57df\u7ea7\u7406\u89e3\u65f6\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u5bf9\u6bd4\u5b9a\u4f4d\u8bed\u8a00-\u56fe\u50cf\u9884\u8bad\u7ec3\uff08CLOC\uff09\u7684\u65b9\u6cd5\uff0c\u901a\u8fc7\u8865\u5145CLIP\u4ee5\u589e\u52a0\u533a\u57df\u6587\u672c\u5bf9\u6bd4\u635f\u5931\u548c\u6a21\u5757\u6765\u63d0\u5347\u5176\u5b9a\u4f4d\u80fd\u529b\u3002\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u4e2a\u65b0\u7684\u6982\u5ff5\uff0c\u5373\u53ef\u63d0\u793a\u5d4c\u5165\uff0c\u5176\u5141\u8bb8\u7f16\u7801\u5668\u751f\u6210\u6613\u4e8e\u901a\u8fc7\u7a7a\u95f4\u63d0\u793a\u8f6c\u6362\u4e3a\u533a\u57df\u8868\u793a\u7684\u56fe\u50cf\u5d4c\u5165\u3002\u4e3a\u4e86\u652f\u6301\u5927\u89c4\u6a21\u9884\u8bad\u7ec3\uff0c\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u89c6\u89c9\u589e\u5f3a\u4e14\u7a7a\u95f4\u5c40\u90e8\u5316\u7684\u63cf\u8ff0\u7b26\u751f\u6210\u6846\u67b6\uff0c\u80fd\u591f\u6709\u6548\u751f\u6210\u5927\u89c4\u6a21\u7684\u533a\u57df\u6587\u672c\u4f2a\u6807\u7b7e\u3002\u901a\u8fc7\u6269\u5c55\u5230\u6570\u5341\u4ebf\u6807\u6ce8\u56fe\u50cf\uff0cCLOC\u4f7f\u5f97\u56fe\u50cf\u533a\u57df\u8bc6\u522b\u548c\u68c0\u7d22\u4efb\u52a1\u4e2d\u7684\u9ad8\u8d28\u91cf\u533a\u57df\u5d4c\u5165\u6210\u4e3a\u53ef\u80fd\uff0c\u5e76\u53ef\u4ee5\u4f5c\u4e3aCLIP\u7684\u76f4\u63a5\u66ff\
u4ee3\u54c1\uff0c\u7528\u4e8e\u589e\u5f3aMLLMs\uff0c\u7279\u522b\u662f\u5728\u6307\u4ee3\u548c\u4e0a\u4e0b\u6587\u7406\u89e3\u4efb\u52a1\u4e2d\u3002|\n", "2410.02744": "|**2024-10-03**|**Neutral residues: revisiting adapters for model extension**|Franck Signe Talla et.al.|[2410.02744](http://arxiv.org/abs/2410.02744)|null|\u6211\u4eec\u89e3\u51b3\u4e86\u4e00\u4e2a\u65b0\u7684\u95ee\u9898\uff1a\u5982\u4f55\u5c06\u9884\u8bad\u7ec3\u7684\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\u6269\u5c55\u5230\u5728\u8bad\u7ec3\u65f6\u672a\u66fe\u89c1\u8fc7\u7684\u9886\u57df\uff0c\u4f8b\u5982\u6dfb\u52a0\u4e00\u79cd\u539f\u59cb\u6a21\u578b\u672a\u89c1\u8fc7\u6216\u89c1\u8fc7\u5f88\u5c11\u8bad\u7ec3\u6570\u636e\u7684\u8bed\u8a00\u3002\u6d41\u884c\u7684\u89e3\u51b3\u65b9\u6848\u5982\u5fae\u8c03\u6216\u4f4e\u79e9\u9002\u5e94\u5728\u9886\u57df\u9002\u5e94\u65b9\u9762\u53d6\u5f97\u6210\u529f\uff0c\u4f46\u5b83\u4eec\u5b9e\u9645\u4e0a\u5e76\u672a\u589e\u52a0\u989d\u5916\u7684\u80fd\u529b\uff0c\u5e76\u4e14\u964d\u4f4e\u4e86\u539f\u59cb\u9886\u57df\u7684\u6027\u80fd\u3002\u672c\u6587\u4ece\u4e09\u4e2a\u89d2\u5ea6\u5206\u6790\u4e86\u8fd9\u4e2a\u95ee\u9898\uff1a\u6570\u636e\u3001\u67b6\u6784\u548c\u8bad\u7ec3\u8fc7\u7a0b\uff0c\u8fd9\u4e9b\u90fd\u88ab\u6709\u5229\u5730\u8054\u5408\u8003\u8651\u3002\u7279\u522b\u662f\uff0c\u6211\u4eec\u6539\u8fdb\u4e86\u9002\u914d\u5668\uff0c\u5e76\u4f7f\u5176\u6709\u53ef\u80fd\u5b66\u4e60\u4e00\u4e2a\u5168\u65b0\u7684\u8bed\u8a00\uff0c\u540c\u65f6\u786e\u4fdd\u795e\u7ecf\u7f51\u7edc\u5728\u539f\u59cb\u9886\u57df\u7684\u8f93\u51fa\u51e0\u4e4e\u4e0d\u53d8\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u4fee\u6539\u4e86\u65b0\u7684\u6b8b\u5dee\u5757\u7684\u65b9\u5f0f\uff0c\u4f7f\u5f97\u6bcf\u4e2a\u65b0\u7684\u6b8b\u5dee\u5757\u5728\u539f\u59cb\u9886\u57df\u8f93\u51fa\u63a5\u8fd1\u96f6\u7684\u7ed3\u679c\u3002 
\u8fd9\u79cd\u88ab\u79f0\u4e3a\u201c\u4e2d\u6027\u6b8b\u5dee\u201d\u7684\u89e3\u51b3\u65b9\u6848\u501f\u9274\u4e86\u6df7\u5408\u4e13\u5bb6\u67b6\u6784\u7684\u7ec4\u4ef6\uff0c\u6548\u679c\u663e\u8457\uff1a\u4e0e\u4ec5\u7528\u82f1\u8bed\u8bad\u7ec3\u7684\u539f\u59cb\u6a21\u578b\u76f8\u6bd4\uff0c\u53ea\u9700\u8981\u989d\u591620%\u7684\u5b66\u4e60\u6743\u91cd\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u5b66\u4e60\u65b0\u8bed\u8a00\u548c\u4e0d\u5fd8\u8bb0\u82f1\u8bed\u4e4b\u95f4\u7684\u6743\u8861\u4e0a\u53d6\u5f97\u4e86\u663e\u8457\u4f18\u4e8e\u540c\u65f6\u8fdb\u884c\u7684\u5176\u4ed6\u65b9\u6cd5\uff08\u5fae\u8c03\u3001\u4f4e\u79e9\u6216\u5e38\u89c4\u9002\u914d\u5668\uff09\u7684\u7ed3\u679c\u3002|\n", "2410.02743": "|**2024-10-03**|**MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions**|Yekun Chai et.al.|[2410.02743](http://arxiv.org/abs/2410.02743)|null|\u5f3a\u5316\u5b66\u4e60\u4ece\u4eba\u7c7b\u53cd\u9988\uff08RLHF\uff09\u5df2\u7ecf\u8bc1\u660e\u4e86\u5728\u4f7f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e0e\u4eba\u7c7b\u504f\u597d\u4fdd\u6301\u4e00\u81f4\u65b9\u9762\u5177\u6709\u6709\u6548\u6027\u3002\u7136\u800c\uff0c\u57fa\u4e8etoken\u7684RLHF\u9762\u4e34\u7740\u957f\u671f\u5e8f\u5217\u4e2d\u7684\u8d23\u4efb\u5f52\u56e0\u95ee\u9898\uff0c\u5176\u4e2d\u5ef6\u8fdf\u5956\u52b1\u4f7f\u5f97\u6a21\u578b\u96be\u4ee5\u786e\u5b9a\u54ea\u4e9b\u64cd\u4f5c\u5bfc\u81f4\u4e86\u6210\u529f\u7684\u7ed3\u679c\uff0c\u8fd9\u963b\u788d\u4e86\u5b66\u4e60\u6548\u7387\u5e76\u51cf\u6162\u4e86\u6536\u655b\u901f\u5ea6\u3002\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aMA-RLHF\u7684\u7b80\u5355\u800c\u6709\u6548\u7684RLHF\u6846\u67b6\uff0c\u5b83\u5c06\u5b8f\u52a8\u4f5c\u2014\u2014\u4e00\u7cfb\u5217token\u6216\u66f4\u9ad8\u5c42\u6b21\u7684\u8bed\u8a00\u6784\u9020\u2014\u2014\u878d\u5165\u5230\u5b66\u4e60\u8fc7\u7a0b\u4e2d\u3002\u901a\u8fc7\u5728\u66f4\u9ad8\u62bd\u8c61\u7ea7\u522b\u4e0a\u64cd\u4f5c\uff0c\u6211\u4eec\
u7684\u65b9\u6cd5\u51cf\u5c11\u4e86\u884c\u52a8\u548c\u5956\u52b1\u4e4b\u95f4\u7684\u65f6\u5e8f\u8ddd\u79bb\uff0c\u4ece\u800c\u4fc3\u8fdb\u4e86\u66f4\u5feb\u4e14\u66f4\u51c6\u786e\u7684\u8d23\u4efb\u5f52\u56e0\u3002\u8fd9\u5bfc\u81f4\u4e86\u66f4\u7a33\u5b9a\u7684\u7b56\u7565\u68af\u5ea6\u4f30\u8ba1\uff0c\u5e76\u63d0\u9ad8\u4e86\u6bcf\u4e2aepisode\u5185\u7684\u5b66\u4e60\u6548\u7387\uff0c\u6240\u6709\u8fd9\u4e9b\u90fd\u65e0\u9700\u5728\u8bad\u7ec3\u6216\u63a8\u7406\u671f\u95f4\u589e\u52a0\u8ba1\u7b97\u590d\u6742\u6027\u3002\u6211\u4eec\u901a\u8fc7\u5728\u6587\u672c\u6458\u8981\u3001\u5bf9\u8bdd\u751f\u6210\u3001\u95ee\u9898\u56de\u7b54\u548c\u7a0b\u5e8f\u5408\u6210\u7b49\u5404\u4e2a\u6a21\u578b\u5927\u5c0f\u548c\u4efb\u52a1\u4e0a\u8fdb\u884c\u7684\u5927\u91cf\u5b9e\u9a8c\u9a8c\u8bc1\u4e86\u6211\u4eec\u7684\u65b9\u6cd5\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u6587\u672c\u6458\u8981\u548c\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u4e2d\u5b9e\u73b0\u4e86\u9ad8\u8fbe30%\u7684\u6027\u80fd\u63d0\u5347\uff0c\u5728\u5bf9\u8bdd\u4efb\u52a1\u4e2d\u5b9e\u73b0\u4e8618%\uff0c\u5728\u95ee\u9898\u56de\u7b54\u4efb\u52a1\u4e2d\u5b9e\u73b0\u4e868%\u3002\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u6bd4\u6807\u51c6\u7684RLHF\u5feb1.7\u81f32\u500d\u7684\u8bad\u7ec3\u65f6\u95f4\u8fbe\u5230\u4e0e\u4e4b\u76f8\u5339\u654c\u7684\u6027\u80fd\u6c34\u5e73\uff0c\u5e76\u4e14\u968f\u7740\u8fdb\u4e00\u6b65\u7684\u8bad\u7ec3\uff0c\u7ee7\u7eed\u8d85\u8d8a\u5b83\u3002\u6211\u4eec\u5c06\u63d0\u4f9b\u6211\u4eec\u7684\u4ee3\u7801\u548c\u6570\u636e\uff0c\u4f9b\u516c\u4f17\u8bbf\u95ee\uff0c\u7f51\u5740\u4e3ahttps://github.com/ernie-research/MA-RLHF \u3002|\n", "2410.02742": "|**2024-10-03**|**Grounding Large Language Models In Embodied Environment With Imperfect World Models**|Haolan Liu 
et.al.|[2410.02742](http://arxiv.org/abs/2410.02742)|null|\u5c3d\u7ba1\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5404\u79cd\u5e94\u7528\u4e2d\u53d6\u5f97\u4e86\u5e7f\u6cdb\u7684\u6210\u529f\uff0c\u4f46\u5b83\u4eec\u5728\u5904\u7406\u57fa\u672c\u7269\u7406\u63a8\u7406\u6216\u6267\u884c\u673a\u5668\u4eba\u4efb\u52a1\u65f6\u7ecf\u5e38\u9047\u5230\u56f0\u96be\uff0c\u8fd9\u4e3b\u8981\u662f\u7531\u4e8e\u5b83\u4eec\u7f3a\u4e4f\u5bf9\u73b0\u5b9e\u4e16\u754c\u7269\u7406\u7ec6\u8282\u7684\u76f4\u63a5\u7ecf\u9a8c\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aGrounding Large language model with Imperfect world MOdel (GLIMO)\u7684\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u5229\u7528\u4ee3\u7406\u4e16\u754c\u6a21\u578b\uff0c\u5982\u6a21\u62df\u5668\uff0c\u6765\u6536\u96c6\u548c\u5408\u6210\u8bad\u7ec3\u6570\u636e\u3002GLIMO\u6574\u5408\u4e86\u4e00\u4e2a\u57fa\u4e8eLLM\u7684\u81ea\u52a8\u6570\u636e\u751f\u6210\u5668\uff0c\u7528\u4e8e\u521b\u5efa\u9ad8\u8d28\u91cf\u4e14\u591a\u6837\u5316\u7684\u6307\u4ee4\u6570\u636e\u96c6\u3002\u751f\u6210\u5668\u5305\u62ec\u4e00\u4e2a\u7528\u4e8e\u65f6\u95f4\u4e00\u81f4\u6027\u4f53\u9a8c\u91c7\u6837\u7684\u8fed\u4ee3\u81ea\u6211\u7cbe\u70bc\u6a21\u5757\u3001\u4e00\u7ec4\u591a\u6837\u5316\u7684\u95ee\u7b54\u6307\u4ee4\u79cd\u5b50\uff0c\u4ee5\u53ca\u4e00\u4e2a\u53cd\u601d\u5148\u524d\u7ecf\u9a8c\u7684\u68c0\u7d22\u589e\u5f3a\u751f\u6210\u6a21\u5757\u3002 \u5168\u9762\u7684\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u80fd\u591f\u663e\u8457\u63d0\u5347\u5f3a\u5f00\u6e90LLMs\uff08\u5982LLaMA-3\uff09\u7684\u8868\u73b0\uff0c\u5206\u522b\u5728\u4e09\u4e2a\u4e0d\u540c\u57fa\u51c6\u4e0a\u5b9e\u73b0\u4e862.04\u500d\u30011.54\u500d\u548c1.82\u500d\u7684\u6027\u80fd\u63d0\u5347\u3002\u5176\u6027\u80fd\u80fd\u591f\u4e0e\u6216\u8d85\u8d8a\u5176\u66f4\u5927\u7684\u540c\u8f88\u6a21\u578b\uff0c\u5982GPT-4\u3002|\n", "2410.02741": "|**2024-10-03**|**Salient 
Information Prompting to Steer Content in Prompt-based Abstractive Summarization**|Lei Xu et.al.|[2410.02741](http://arxiv.org/abs/2410.02741)|null|\u672c\u6587\u63a2\u8ba8\u4e86\u5229\u7528\u6e90\u6587\u6863\u4e2d\u63d0\u53d6\u7684\u663e\u8457\u4fe1\u606f\u6765\u589e\u5f3a\u751f\u6210\u63d0\u793a\u4ee5\u6539\u8fdb\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u6458\u8981\u80fd\u529b\u3002\u6211\u4eec\u53d1\u73b0\uff0c\u5728\u63d0\u793a\u4e2d\u52a0\u5165\u5173\u952e\u77ed\u8bed\u80fd\u63d0\u5347ROUGE F1\u548c\u53ec\u56de\u7387\uff0c\u4f7f\u5f97\u751f\u6210\u7684\u6458\u8981\u4e0e\u53c2\u8003\u6458\u8981\u66f4\u52a0\u76f8\u4f3c\u4e14\u66f4\u5b8c\u6574\u3002\u901a\u8fc7\u8c03\u6574\u5173\u952e\u77ed\u8bed\u7684\u6570\u91cf\uff0c\u53ef\u4ee5\u63a7\u5236\u7cbe\u786e\u5ea6\u548c\u53ec\u56de\u7387\u4e4b\u95f4\u7684\u6743\u8861\u3002\u8fdb\u4e00\u6b65\u5206\u6790\u663e\u793a\uff0c\u5c06\u77ed\u8bed\u7ea7\u7684\u663e\u8457\u4fe1\u606f\u878d\u5165\u63d0\u793a\u4f18\u4e8e\u57fa\u4e8e\u5355\u8bcd\u6216\u53e5\u5b50\u7684\u7b56\u7565\u3002\u7136\u800c\uff0c\u8fd9\u5e76\u4e0d\u610f\u5473\u7740\u5bf9\u6240\u6709LLM\u90fd\u666e\u904d\u6709\u76ca\uff0c\u7279\u522b\u662f\u5728\u51cf\u5c11\u5e7b\u89c9\u65b9\u9762\u3002\u4e3a\u4e86\u8fdb\u884c\u8fd9\u4e00\u5206\u6790\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u8f7b\u91cf\u7ea7\u7684Keyphrase Signal Extractor\uff08SigExt\uff09\u6a21\u578b\uff0c\u8be5\u6a21\u578b\u53ef\u8fdb\u884c\u5fae\u8c03\u4ee5\u63d0\u53d6\u5173\u952e\u77ed\u8bed\u3002\u901a\u8fc7\u4f7f\u7528SigExt\uff0c\u6211\u4eec\u5728\u591a\u4e2a\u6570\u636e\u96c6\u3001\u516c\u5f00\u6743\u91cd\u548c\u4e13\u6709LLM\u4e0a\u5b9e\u73b0\u4e86\u4e0d\u4f9d\u8d56\u4e8eLLM\u5b9a\u5236\u7684ROUGE\u6307\u6807\u6539\u5584\u6548\u679c\u3002\u6211\u4eec\u7684\u7814\u7a76\u7ed3\u679c\u4e3a\u6784\u5efa\u57fa\u4e8e\u63d0\u793a\u7684\u6458\u8981\u7cfb\u7edf\u65f6\u5229\u7528\u663e\u8457\u4fe1\u606f\u63d0\u4f9b\u4e86\u89c1\u89e3\u3002|\n", "2410.03663": "|**2024-10-04**|**Enhance Reasoning by 
Learning from Mistakes: Peer-Review Knowledge Distillation from Multiple Large Language Models**|Zhuochun Li et.al.|[2410.03663](http://arxiv.org/abs/2410.03663)|null|\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3a\u201cMistake-Aware Peer-Review Distillation\u201d\uff08MAPD\uff09\u7684\u521b\u65b0\u65b9\u6cd5\u3002\u8be5\u65b9\u6cd5\u65e8\u5728\u901a\u8fc7\u6539\u8fdb\u5f00\u6e90\u5c0f\u578b\u6a21\u578b\u7684\u77e5\u8bc6\u63d0\u70bc\uff08KD\uff09\u8fc7\u7a0b\u6765\u63d0\u9ad8\u5b83\u4eec\u7684\u6027\u80fd\uff0c\u8fd9\u4e9b\u8fc7\u7a0b\u901a\u5e38\u4f9d\u8d56\u4e8e\u5927\u578b\u5546\u4e1a\u8bed\u8a00\u6a21\u578b\u4f5c\u4e3a\u6559\u5e08\u3002\u4e0e\u4ee5\u5f80\u7814\u7a76\u4ec5\u4f7f\u7528\u5355\u4e00\u6559\u5e08\u751f\u6210\u7684\u9ec4\u91d1\u7406\u636e\u8fdb\u884c\u8bad\u7ec3\u4e0d\u540c\uff0cMAPD\u65b9\u6cd5\u91c7\u53d6\u4e86\u66f4\u4e3a\u7ec6\u81f4\u7684\u7b56\u7565\uff1a 1. **\u4e2a\u6027\u5316\u9519\u8bef\u53cd\u9988**\uff1aMAPD\u4e0d\u4ec5\u8981\u6c42\u6559\u5e08\u63d0\u4f9b\u5b66\u751f\u7b54\u6848\u7684\u6b63\u786e\u7406\u636e\uff0c\u66f4\u8fdb\u4e00\u6b65\u5730\uff0c\u5b83\u8ba9\u6559\u5e08\u6307\u51fa\u5b66\u751f\u7684\u9519\u8bef\u5e76\u89e3\u91ca\u539f\u56e0\uff0c\u4ece\u800c\u751f\u6210\u5b9a\u5236\u5316\u7684\u6559\u5b66\u6570\u636e\u3002 2. 
**\u6a21\u62df\u540c\u884c\u8bc4\u5ba1**\uff1a\u901a\u8fc7\u8bbe\u8ba1\u4e00\u4e2a\u6559\u5e08\u95f4\u7684\u6a21\u62df\u540c\u884c\u8bc4\u5ba1\u8fc7\u7a0b\uff0cMAPD\u7b5b\u9009\u51fa\u90a3\u4e9b\u8fbe\u5230\u4e00\u5b9a\u63a5\u53d7\u6807\u51c6\u7684\u751f\u6210\u7406\u636e\u3002\u8fd9\u4e00\u673a\u5236\u51cf\u5c11\u4e86\u6559\u5e08\u56e0\u731c\u6d4b\u800c\u7ed9\u51fa\u9519\u8bef\u7406\u636e\u7684\u53ef\u80fd\u6027\uff0c\u4ece\u800c\u63d0\u9ad8\u4e86\u6559\u5b66\u6570\u636e\u7684\u8d28\u91cf\u3002 \u672c\u6587\u5728\u6570\u5b66\u3001\u5e38\u8bc6\u548c\u903b\u8f91\u63a8\u7406\u4efb\u52a1\u4e0a\u8fdb\u884c\u4e86\u5168\u9762\u7684\u5b9e\u9a8c\u548c\u5206\u6790\uff0c\u9a8c\u8bc1\u4e86MAPD\u65b9\u6cd5\u7684\u6709\u6548\u6027\u3002|\n", "2410.03658": "|**2024-10-04**|**RAFT: Realistic Attacks to Fool Text Detectors**|James Wang et.al.|[2410.03658](http://arxiv.org/abs/2410.03658)|**[link](https://github.com/jameslwang/raft)**|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u9488\u5bf9\u73b0\u6709\u5927\u578b\u8bed\u8a00\u6a21\u578b\u68c0\u6d4b\u5668\u7684\u8bed\u6cd5\u65e0\u8bef\u7684\u9ed1\u76d2\u653b\u51fb\u65b9\u6cd5\uff0c\u79f0\u4e3aRAFT\u3002\u4e0e\u4e4b\u524d\u9488\u5bf9\u8bed\u8a00\u6a21\u578b\u7684\u653b\u51fb\u4e0d\u540c\uff0cRAFT\u65b9\u6cd5\u5229\u7528\u4e86\u8bcd\u7ea7\u4e0a\u7684LLM\u5d4c\u5165\u7684\u53ef\u8fc1\u79fb\u6027\uff0c\u540c\u65f6\u4fdd\u6301\u539f\u59cb\u6587\u672c\u8d28\u91cf\u4e0d\u53d8\u3002\u901a\u8fc7\u5229\u7528\u8f85\u52a9\u5d4c\u5165\uff0cRAFT\u8d2a\u5a6a\u5730\u9009\u62e9\u9700\u8981\u6270\u52a8\u7684\u76ee\u6807\u5355\u8bcd\uff0c\u4ee5\u5bf9\u6297\u7279\u5b9a\u7684\u68c0\u6d4b\u5668\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cRAFT\u653b\u51fb\u80fd\u591f\u6709\u6548\u5730\u4f7f\u6240\u6709\u7814\u7a76\u4e2d\u7684\u68c0\u6d4b\u5668\u5728\u5404\u79cd\u9886\u57df\u4e2d\u5931\u6548\u9ad8\u8fbe99%\uff0c\u5e76\u4e14\u5177\u6709\u8de8\u6e90\u6a21\u578b\u7684\u53ef\u79fb\u690d\u6027\u3002\u624b\u52a8\u7684\u4eba\u7c7b\u8bc4\u4f30\u7814\u7a76\u8868\u
660e\uff0cRAFT\u751f\u6210\u7684\u653b\u51fb\u5b9e\u4f8b\u65e2\u771f\u5b9e\u53c8\u96be\u4ee5\u4e0e\u539f\u521b\u4eba\u7c7b\u7f16\u5199\u6587\u672c\u533a\u5206\u5f00\u6765\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5c55\u793a\u4e86RAFT\u751f\u6210\u7684\u4f8b\u5b50\u53ef\u4ee5\u7528\u6765\u8bad\u7ec3\u9c81\u68d2\u6027\u66f4\u5f3a\u7684\u68c0\u6d4b\u5668\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u63ed\u793a\u4e86\u5f53\u524d\u7684LLM\u68c0\u6d4b\u5668\u5e76\u975e\u5177\u6709\u9c81\u68d2\u6027\uff0c\u5f3a\u8c03\u4e86\u8feb\u5207\u9700\u8981\u66f4\u5f3a\u5927\u7684\u68c0\u6d4b\u673a\u5236\u7684\u5fc5\u8981\u6027\u3002|\n", "2410.03642": "|**2024-10-04**|**Aligning LLMs with Individual Preferences via Interaction**|Shujin Wu et.al.|[2410.03642](http://arxiv.org/abs/2410.03642)|**[link](https://github.com/shujinwu-0814/aloe)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5c55\u73b0\u51fa\u65e5\u76ca\u5148\u8fdb\u7684\u80fd\u529b\uff0c\u786e\u4fdd\u5b83\u4eec\u7684\u884c\u4e3a\u4e0e\u4eba\u7c7b\u4ef7\u503c\u89c2\u548c\u504f\u597d\u4fdd\u6301\u4e00\u81f4\u5bf9\u4e8e\u5e7f\u6cdb\u91c7\u7528\u8fd9\u4e9b\u6a21\u578b\u53d8\u5f97\u81f3\u5173\u91cd\u8981\u3002\u5c3d\u7ba1\u5148\u524d\u7684\u7814\u7a76\u4e3b\u8981\u96c6\u4e2d\u5728\u9075\u5faa\u8bf8\u5982\u5e2e\u52a9\u6027\u3001\u65e0\u5bb3\u6027\u548c\u8bda\u5b9e\u6027\u7b49\u4e00\u822c\u539f\u5219\u4e0a\uff0c\u4f46\u5ffd\u89c6\u4e86\u8003\u8651\u5230\u4e2a\u4eba\u548c\u591a\u6837\u6027\u504f\u597d\u7684\u9700\u6c42\uff0c\u8fd9\u53ef\u80fd\u524a\u5f31\u4e86\u4e2a\u6027\u5316\u7684\u4eba\u7c7b\u4f53\u9a8c\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u6211\u4eec\u8bad\u7ec3\u4e86\u4e00\u79cd\u80fd\u591f\u201c\u4ea4\u4e92\u4ee5\u5bf9\u9f50\u201d\u7684LLMs\uff0c\u5373\u8ba9LLMs\u53d1\u5c55\u51fa\u4e00\u79cd\u9690\u5f0f\u63a8\u65ad\u5f53\u524d\u7528\u6237\u672a\u660e\u786e\u8868\u8fbe\u7684\u4e2a\u6027\u5316\u504f\u597d\u7684\u5143\u6280\u80fd\uff0c\u5e76\u636e\u6b64\u52a8\u6001\u8c03\u6574\u540e\u7eed\
u884c\u4e3a\u548c\u54cd\u5e94\u4ee5\u9002\u5e94\u8fd9\u4e9b\u63a8\u65ad\u7684\u504f\u597d\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5305\u62ec\u5efa\u7acb\u4e00\u4e2a\u75313,310\u4e2a\u4e0d\u540c\u7528\u6237\u4eba\u8bbe\u7ec4\u6210\u7684\u591a\u6837\u5316\u6c60\uff0c\u901a\u8fc7\u521d\u59cb\u793a\u4f8b\u521b\u5efa\uff0c\u7136\u540e\u901a\u8fc7\u8fed\u4ee3\u81ea\u6211\u751f\u6210\u548c\u7b5b\u9009\u8fdb\u884c\u6269\u5c55\u3002\u5728\u4e0d\u540c\u7528\u6237\u4eba\u8bbe\u7684\u6307\u5bfc\u4e0b\uff0c\u6211\u4eec\u5229\u7528\u591aLLM\u534f\u4f5c\u5f00\u53d1\u4e86\u4e00\u4e2a\u5305\u542b3K+\u591a\u8f6e\u5bf9\u8bdd\u7684\u6811\u5f62\u7ed3\u6784\u591a\u8f6e\u504f\u597d\u6570\u636e\u96c6\u3002\u6700\u540e\uff0c\u6211\u4eec\u4f7f\u7528\u76d1\u7763\u5fae\u8c03\u548c\u5f3a\u5316\u5b66\u4e60\u5bf9\u6570\u636e\u96c6\u8fdb\u884c\u4e86\u589e\u5f3a\uff0c\u4ee5\u63d0\u9ad8LLMs\u7684\u80fd\u529b\u3002\u4e3a\u4e86\u8bc4\u4f30\uff0c\u6211\u4eec\u5efa\u7acb\u4e86ALOE\uff08ALign With CustOmized PrEferences\uff09\u57fa\u51c6\uff0c\u5305\u542b100\u4e2a\u7cbe\u5fc3\u6311\u9009\u7684\u4f8b\u5b50\u4ee5\u53ca\u7528\u4e8e\u8861\u91cf\u5bf9\u8bdd\u4e2d\u4e2a\u6027\u5316\u5bf9\u9f50\u6027\u80fd\u7684\u9002\u5f53\u5ea6\u91cf\u6807\u51c6\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u901a\u8fc7\u4e92\u52a8\u5b9e\u73b0\u52a8\u6001\u3001\u4e2a\u6027\u5316\u7684\u5bf9\u9f50\u65b9\u9762\u975e\u5e38\u6709\u6548\u3002**|\n", "2410.03613": "|**2024-10-04**|**Large Language Model Performance Benchmarking on Mobile Platforms: A Thorough Evaluation**|Jie Xiao 
et.al.|[2410.03613](http://arxiv.org/abs/2410.03613)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u6211\u4eec\u5de5\u4f5c\u548c\u65e5\u5e38\u751f\u6d3b\u7684\u5404\u4e2a\u65b9\u9762\u65e5\u76ca\u666e\u53ca\uff0c\u5bf9\u7528\u6237\u9690\u79c1\u7684\u5173\u6ce8\u63a8\u52a8\u4e86\u8fd9\u4e9b\u6a21\u578b\u672c\u5730\u90e8\u7f72\u7684\u8d8b\u52bf\u3002\u5b58\u5728\u4e00\u4e9b\u8f7b\u91cf\u7ea7LLM\uff08\u4f8b\u5982Gemini Nano\uff0cLLAMA2 7B\uff09\uff0c\u5b83\u4eec\u53ef\u4ee5\u5728\u667a\u80fd\u624b\u673a\u4e0a\u672c\u5730\u8fd0\u884c\uff0c\u4e3a\u7528\u6237\u63d0\u4f9b\u5bf9\u5176\u4e2a\u4eba\u6570\u636e\u7684\u66f4\u5927\u63a7\u5236\u6743\u3002\u4f5c\u4e3a\u4e00\u9879\u8fc5\u901f\u53d1\u5c55\u7684\u5e94\u7528\uff0c\u6211\u4eec\u5173\u6ce8\u5b83\u4eec\u5728\u5546\u7528\u79fb\u52a8\u8bbe\u5907\u4e0a\u7684\u6027\u80fd\u3002 \u4e3a\u4e86\u5168\u9762\u4e86\u89e3LLM\u5728\u79fb\u52a8\u5e73\u53f0\u4e0a\u7684\u90e8\u7f72\u73b0\u72b6\uff0c\u6211\u4eec\u8fdb\u884c\u4e86\u4e00\u9879\u5168\u9762\u7684\u6d4b\u91cf\u7814\u7a76\u3002\u6211\u4eec\u8bc4\u4f30\u4e86\u5f71\u54cd\u7528\u6237\u4f53\u9a8c\u7684\u6307\u6807\uff0c\u5305\u62ec\u4ee4\u724c\u541e\u5410\u91cf\u3001\u5ef6\u8fdf\u548c\u7535\u6c60\u6d88\u8017\uff0c\u4ee5\u53ca\u5bf9\u5f00\u53d1\u8005\u81f3\u5173\u91cd\u8981\u7684\u56e0\u7d20\uff0c\u5982\u8d44\u6e90\u5229\u7528\u3001\u52a8\u6001\u7535\u538b\u9891\u7387\u7f29\u653e\u7b56\u7565\u548c\u63a8\u7406\u5f15\u64ce\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8be6\u7ec6\u5206\u6790\u4e86\u786c\u4ef6\u80fd\u529b\u548c\u7cfb\u7edf\u52a8\u529b\u5b66\u5982\u4f55\u5f71\u54cd\u672c\u5730\u8bbe\u5907\u4e0a\u7684LLM\u6027\u80fd\uff0c\u8fd9\u53ef\u80fd\u6709\u52a9\u4e8e\u5f00\u53d1\u8005\u8bc6\u522b\u5e76\u89e3\u51b3\u79fb\u52a8LLM\u5e94\u7528\u7a0b\u5e8f\u4e2d\u7684\u74f6\u9888\u3002\u6211\u4eec\u8fd8\u63d0\u4f9b\u4e86\u9488\u5bf9\u4e3b\u8981\u4f9b\u5e94\u5546\u7684\u79fb\u52a8\u7cfb\u7edf\u7ea7\u82af\u7247\uff08SoC\uff09\u7684\u5168\u9762\u6bd4\u8f83\uff0c\u7a81\
u51fa\u4e86\u5b83\u4eec\u5728\u5904\u7406LLM\u5de5\u4f5c\u8d1f\u8f7d\u65f6\u7684\u6027\u80fd\u5dee\u5f02\u3002\u6211\u4eec\u5e0c\u671b\u8fd9\u9879\u7814\u7a76\u80fd\u591f\u4e3a\u672c\u5730\u8bbe\u5907LLM\u7684\u5f00\u53d1\u548c\u672a\u6765\u79fb\u52a8\u7cfb\u7edf\u67b6\u6784\u7684\u8bbe\u8ba1\u63d0\u4f9b\u6d1e\u5bdf\u3002|\n", "2410.03608": "|**2024-10-04**|**TICKing All the Boxes: Generated Checklists Improve LLM Evaluation and Generation**|Jonathan Cook et.al.|[2410.03608](http://arxiv.org/abs/2410.03608)|null|\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u5e7f\u6cdb\u5e94\u7528\u80cc\u666f\u4e0b\uff0c\u6784\u5efa\u7075\u6d3b\u4e14\u53ef\u89e3\u91ca\u7684\u8bc4\u4f30\u5176\u9075\u5faa\u6307\u4ee4\u80fd\u529b\u7684\u65b9\u6cd5\u81f3\u5173\u91cd\u8981\u3002\u76ee\u524d\uff0c\u504f\u597d\u5224\u65ad\u6210\u4e3a\u4e86\u8bc4\u4f30\u6807\u51c6\u7684\u9ed8\u8ba4\u9009\u62e9\uff0c\u5c3d\u7ba1\u8fd9\u79cd\u505a\u6cd5\u7b80\u5316\u4e86\u590d\u6742\u3001\u591a\u7ef4\u504f\u597d\u7684\u63d0\u70bc\uff0c\u5c06\u5176\u5f52\u7ed3\u4e3a\u5355\u4e00\u6392\u540d\u3002\u7136\u800c\uff0c\u968f\u7740\u4eba\u5de5\u6ce8\u91ca\u7684\u7f13\u6162\u548c\u6210\u672c\u9ad8\u6602\uff0cLLM\u88ab\u8d8a\u6765\u8d8a\u591a\u5730\u7528\u4e8e\u505a\u51fa\u8fd9\u4e9b\u5224\u65ad\uff0c\u8fd9\u727a\u7272\u4e86\u53ef\u9760\u6027\u548c\u53ef\u89e3\u91ca\u6027\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86TICK\uff08\u9488\u5bf9\u7279\u5b9a\u6307\u4ee4\u7684\u7ed3\u6784\u5316\u8bc4\u4f30\u4e0e\u6838\u67e5\u6e05\u5355\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u5168\u81ea\u52a8\u5316\u3001\u53ef\u89e3\u91ca\u7684\u8bc4\u4f30\u65b9\u6848\uff0c\u901a\u8fc7LLM\u751f\u6210\u7684\u3001\u9488\u5bf9\u6307\u4ee4\u7684\u6838\u67e5\u6e05\u5355\u7ed3\u6784\u5316\u8bc4\u4f30\u3002 
\u9996\u5148\uff0c\u6211\u4eec\u5c55\u793a\u4e86\uff0c\u5728\u7ed9\u5b9a\u6307\u4ee4\u7684\u60c5\u51b5\u4e0b\uff0cLLM\u80fd\u591f\u53ef\u9760\u5730\u4ea7\u751f\u9ad8\u8d28\u91cf\u3001\u5b9a\u5236\u5316\u7684\u8bc4\u4f30\u6838\u67e5\u6e05\u5355\uff0c\u5c06\u6307\u4ee4\u5206\u89e3\u4e3a\u4e00\u7cfb\u5217\u662f/\u5426\u95ee\u9898\u3002\u6bcf\u4e2a\u95ee\u9898\u8be2\u95ee\u5019\u9009\u56de\u5e94\u662f\u5426\u6ee1\u8db3\u6307\u4ee4\u7684\u5177\u4f53\u8981\u6c42\u3002\u6211\u4eec\u8bc1\u660e\u4f7f\u7528TICK\u80fd\u591f\u663e\u8457\u63d0\u9ad8LLM\u5224\u65ad\u4e0e\u4eba\u7c7b\u504f\u597d\u4e4b\u95f4\u7cbe\u786e\u4e00\u81f4\u6027\u7684\u9891\u7387\uff0c\u76f8\u6bd4\u76f4\u63a5\u7531LLM\u8bc4\u5206\u8f93\u51fa\uff0c\u8fd9\u4e00\u6bd4\u4f8b\u4ece46.4%\u63d0\u5347\u81f352.2%\u3002 \u63a5\u7740\uff0c\u6211\u4eec\u5c55\u793a\u4e86STICK\uff08\u81ea\u6211TICK\uff09\u53ef\u4ee5\u5229\u7528\u81ea\u6211\u7ec6\u5316\u548c\u6700\u4f73\u4e2d\u7684N\u9009\u62e9\u6765\u6539\u5584\u591a\u4e2a\u57fa\u51c6\u7684\u751f\u6210\u8d28\u91cf\u3002\u5bf9LiveBench\u63a8\u7406\u4efb\u52a1\u8fdb\u884cSTICK\u81ea\u6211\u7ec6\u5316\uff0c\u5b9e\u73b0\u4e86\u7edd\u5bf9\u589e\u76ca+7.8%\uff0c\u800c\u4f7f\u7528STICK\u8fdb\u884c\u6700\u4f73\u4e2d\u7684N\u9009\u62e9\u5728\u771f\u5b9e\u4e16\u754c\u6307\u4ee4\u6570\u636e\u96c6WildBench\u4e0a\u83b7\u5f97\u4e86+6.3%\u7684\u7edd\u5bf9\u6539\u8fdb\u3002\u8fd9\u8868\u660e\uff0c\u7ed3\u6784\u5316\u7684\u3001\u591a\u7ef4\u5ea6\u7684\u81ea\u6211\u6539\u8fdb\u662f\u8fdb\u4e00\u6b65\u63d0\u5347LLM\u80fd\u529b\u7684\u4e00\u4e2a\u6709\u524d\u666f\u7684\u65b9\u5411\u3002 \u6700\u540e\uff0c\u901a\u8fc7\u5411\u76f4\u63a5\u4e3aWildBench\u6307\u4ee4\u8bc4\u4f30LLM\u54cd\u5e94\u7684\u4eba\u7c7b\u8bc4\u4f30\u8005\u63d0\u4f9bLLM\u751f\u6210\u7684\u6838\u67e5\u6e05\u5355\uff0c\u6211\u4eec\u663e\u8457\u63d0\u9ad8\u4e86\u8bc4\u4f30\u8005\u4e4b\u95f4\u7684\u5171\u8bc6\u5ea6\uff08\u4ece0.194\u63d0\u5347\u81f30.256\uff09\u3002|\n", "2410.03600": "|**2024-10-04**|**Efficiently 
Identifying Watermarked Segments in Mixed-Source Texts**|Xuandong Zhao et.al.|[2410.03600](http://arxiv.org/abs/2410.03600)|null|\u6587\u672c\u6c34\u5370\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4e2d\u7684\u5e94\u7528\u65e5\u76ca\u589e\u957f\uff0c\u7528\u4e8e\u68c0\u6d4b\u5408\u6210\u6587\u672c\uff0c\u4ee5\u7f13\u89e3\u865a\u5047\u65b0\u95fb\u548c\u5b66\u672f\u4e0d\u8bda\u5b9e\u7b49\u6ee5\u7528\u60c5\u51b5\u3002\u73b0\u6709\u6c34\u5370\u68c0\u6d4b\u6280\u672f\u4e3b\u8981\u5173\u6ce8\u4e8e\u5bf9\u6574\u4e2a\u6587\u6863\u8fdb\u884c\u5206\u7c7b\uff0c\u5224\u65ad\u5176\u662f\u5426\u88ab\u6c34\u5370\u6807\u8bb0\uff0c\u4f46\u5f80\u5f80\u5ffd\u7565\u4e86\u5728\u66f4\u957f\u7684\u6df7\u5408\u6765\u6e90\u6587\u6863\u4e2d\u8bc6\u522b\u5355\u72ec\u6c34\u5370\u6bb5\u843d\u7684\u5e38\u89c1\u573a\u666f\u3002\u53d7\u5230\u6284\u88ad\u68c0\u6d4b\u7cfb\u7edf\u7684\u542f\u53d1\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e24\u79cd\u65b0\u578b\u65b9\u6cd5\u8fdb\u884c\u90e8\u5206\u6c34\u5370\u68c0\u6d4b\u3002\u9996\u5148\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u51e0\u4f55\u8986\u76d6\u68c0\u6d4b\u6846\u67b6\uff0c\u65e8\u5728\u786e\u5b9a\u957f\u6587\u672c\u4e2d\u662f\u5426\u5b58\u5728\u6c34\u5370\u6bb5\u843d\u3002\u5176\u6b21\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u4e2a\u81ea\u9002\u5e94\u5728\u7ebf\u5b66\u4e60\u7b97\u6cd5\uff0c\u4ee5\u51c6\u786e\u5b9a\u4f4d\u6587\u672c\u4e2d\u7684\u6c34\u5370\u6bb5\u843d\u4f4d\u7f6e\u3002\u5728\u4e09\u79cd\u6d41\u884c\u7684\u6c34\u5370\u6280\u672f\uff08KGW-Watermark\u3001Unigram-Watermark \u548c Gumbel-Watermark\uff09\u4e0a\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u53d6\u5f97\u4e86\u9ad8\u7cbe\u5ea6\uff0c\u5e76\u663e\u8457\u4f18\u4e8e\u57fa\u7ebf\u65b9\u6cd5\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u6846\u67b6\u5177\u6709\u9002\u5e94\u5176\u4ed6\u6c34\u5370\u6280\u672f\u7684\u80fd\u529b\uff0c\u63d0\u4f9b\u4e86\u7cbe\u786e\u6c34\u5370\u68c0\u6d4b\u7684\u65b0\u89c1\u89e3\u3002|\n", "2410.03595": 
"|**2024-10-04**|**Understanding Reasoning in Chain-of-Thought from the Hopfieldian View**|Lijie Hu et.al.|[2410.03595](http://arxiv.org/abs/2410.03595)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u5404\u7c7b\u4efb\u52a1\u4e2d\u5c55\u73b0\u51fa\u975e\u51e1\u80fd\u529b\uff0c\u94fe\u5f0f\u601d\u8003\uff08Chain-of-Thought, CoT\uff09\u63d0\u793a\u4f5c\u4e3a\u4e00\u79cd\u63d0\u5347\u63a8\u7406\u80fd\u529b\u7684\u5173\u952e\u6280\u672f\u9010\u6e10\u53d7\u5230\u5173\u6ce8\u3002\u7136\u800c\uff0c\u73b0\u6709\u7814\u7a76\u4e3b\u8981\u96c6\u4e2d\u5728\u63d0\u9ad8\u6027\u80fd\u65b9\u9762\uff0c\u7f3a\u4e4f\u5bf9CoT\u6210\u529f\u80cc\u540e\u6839\u672c\u56e0\u7d20\u7684\u5168\u9762\u89e3\u91ca\u6846\u67b6\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u8ba4\u77e5\u795e\u7ecf\u79d1\u5b66\u4e2d\u7684\u970d\u666e\u83f2\u5c14\u5fb7\u8ba4\u77e5\u89c2\u7684\u65b0\u89c6\u89d2\u3002\u6211\u4eec\u5efa\u7acb\u4e86\u4e00\u4e2a\u94fe\u63a5CoT\u63a8\u7406\u4e0e\u523a\u6fc0\u3001\u52a8\u4f5c\u3001\u795e\u7ecf\u7fa4\u4f53\u548c\u8868\u793a\u7a7a\u95f4\u7b49\u5173\u952e\u8ba4\u77e5\u5143\u7d20\u4e4b\u95f4\u7684\u5173\u7cfb\u6846\u67b6\u3002\u4ece\u8fd9\u4e00\u89c6\u89d2\u51fa\u53d1\uff0c\u6211\u4eec\u53ef\u4ee5\u7406\u89e3\u63a8\u7406\u8fc7\u7a0b\u5b9e\u8d28\u4e0a\u662f\u8fd9\u4e9b\u8868\u793a\u7a7a\u95f4\u4e4b\u95f4\u7684\u79fb\u52a8\u3002 \u57fa\u4e8e\u6b64\u6d1e\u5bdf\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u65b9\u6cd5\u6765\u5b9a\u4f4dCoT\u54cd\u5e94\u4e2d\u7684\u63a8\u7406\u9519\u8bef\u3002\u6b64\u5916\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u4e2a\u540d\u4e3a\u201c\u601d\u8003\u7684\u8868\u793a\u201d\uff08Representation-of-Thought, 
RoT\uff09\u7684\u6846\u67b6\uff0c\u5229\u7528\u4f4e\u7ef4\u8868\u793a\u7a7a\u95f4\u7684\u9c81\u68d2\u6027\u6765\u589e\u5f3aCoT\u63a8\u7406\u8fc7\u7a0b\u7684\u9c81\u68d2\u6027\u548c\u53ef\u89e3\u91ca\u6027\uff0c\u5e76\u63d0\u4f9b\u4e86\u5bf9\u63a8\u7406\u8fc7\u7a0b\u8fdb\u884c\u7cbe\u7ec6\u63a7\u5236\u7684\u80fd\u529b\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cRoT\u4e0d\u4ec5\u63d0\u9ad8\u4e86CoT\u63a8\u7406\u7684\u9c81\u68d2\u6027\u548c\u53ef\u89e3\u91ca\u6027\uff0c\u800c\u4e14\u63d0\u4f9b\u4e86\u5bf9\u63a8\u7406\u8fc7\u7a0b\u8fdb\u884c\u7cbe\u7ec6\u5316\u63a7\u5236\u7684\u53ef\u80fd\u6027\u3002|\n", "2410.03577": "|**2024-10-04**|**Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models**|Xin Zou et.al.|[2410.03577](http://arxiv.org/abs/2410.03577)|**[link](https://github.com/1zhou-Wang/MemVR)**|\u5c3d\u7ba1\u5927\u578b\u591a\u6a21\u6001\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u5177\u6709\u4ee4\u4eba\u5370\u8c61\u6df1\u523b\u7684\u6027\u80fd\uff0c\u4f46\u5b83\u4eec\u5bb9\u6613\u51fa\u73b0\u5e7b\u89c9\uff0c\u7279\u522b\u662f\u5728\u89c6\u89c9\u8f93\u5165\u4e2d\u4e0d\u5b58\u5728\u5173\u952e\u7ec6\u8282\u65f6\uff0c\u4f1a\u5938\u5f20\u5730\u7f16\u9020\u5185\u5bb9\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u9075\u5faa\u4e86\u4eba\u7c7b\u8ba4\u77e5\u8fc7\u7a0b\u4e2d\u7684\u4e00\u4e2a\u5e38\u89c1\u6b65\u9aa4\u2014\u2014\u5f53\u5bf9\u73b0\u573a\u5173\u952e\u7ec6\u8282\u7684\u8bb0\u5fc6\u9010\u6e10\u6a21\u7cca\u65f6\uff0c\u76f4\u89c2\u7684\u505a\u6cd5\u662f\u518d\u6b21\u67e5\u770b\u8fd9\u4e9b\u7ec6\u8282\u4ee5\u5bfb\u6c42\u51c6\u786e\u548c\u771f\u5b9e\u7684\u4fe1\u606f\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u540d\u4e3a\u201c\u8bb0\u5fc6\u7a7a\u95f4\u89c6\u89c9\u91cd\u8bfb\u201d\uff08MemVR\uff09\u7684\u65b0\u578b\u5e7b\u89c9\u7f13\u89e3\u8303\u5f0f\uff0c\u5b83\u65e0\u9700\u5916\u90e8\u77e5\u8bc6\u68c0\u7d22\u6216\u989d\u5916\u7684\u5fae\u8c03\u300
2\u5177\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u5c06\u89c6\u89c9\u63d0\u793a\u4f5c\u4e3a\u8865\u5145\u8bc1\u636e\uff0c\u901a\u8fc7\u524d\u9988\u7f51\u7edc\uff08FFN\uff09\u6ce8\u5165\u5230MLLMs\u4e2d\u4f5c\u4e3a\u952e\u503c\u8bb0\u5fc6\uff0c\u5f53\u6a21\u578b\u5bf9\u95ee\u9898\u76f8\u5173\u7684\u89c6\u89c9\u8bb0\u5fc6\u4e0d\u786e\u5b9a\u751a\u81f3\u9057\u5fd8\u65f6\u3002\u5168\u9762\u7684\u5b9e\u9a8c\u8bc4\u4f30\u8868\u660e\uff0cMemVR\u5728\u5404\u79cdMLLMs\u4e0a\u663e\u8457\u7f13\u89e3\u4e86\u5e7b\u89c9\u95ee\u9898\uff0c\u5e76\u4e14\u5728\u4e0d\u589e\u52a0\u65f6\u95f4\u5f00\u9500\u7684\u60c5\u51b5\u4e0b\uff0c\u5728\u901a\u7528\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u8868\u73b0\u51fa\u8272\uff0c\u4ece\u800c\u7a81\u663e\u51fa\u5176\u5e7f\u6cdb\u9002\u7528\u6027\u7684\u6f5c\u529b\u3002|\n", "2410.03568": "|**2024-10-04**|**Towards Linguistically-Aware and Language-Independent Tokenization for Large Language Models (LLMs)**|Abrar Rahman et.al.|[2410.03568](http://arxiv.org/abs/2410.03568)|null|\u672c\u6587\u5bf9\u5f53\u524d\u9876\u7ea7\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u91c7\u7528\u7684\u5206\u8bcd\u6280\u672f\u8fdb\u884c\u4e86\u5168\u9762\u7814\u7a76\uff0c\u5e76\u63a2\u8ba8\u4e86\u8fd9\u4e9b\u6280\u672f\u5728\u4e0d\u540c\u8bed\u8a00\u5c24\u5176\u662f\u8d44\u6e90\u532e\u4e4f\u8bed\u8a00\u670d\u52a1\u6210\u672c\u4e0e\u53ef\u7528\u6027\u65b9\u9762\u7684\u6f5c\u5728\u5f71\u54cd\u3002\u7814\u7a76\u8003\u8651\u4e86\u591a\u79cdLLMs\uff0c\u5305\u62ec\u4f7f\u7528cl100k_base\u5d4c\u5165\u7684GPT-4\u3001\u4f7f\u7528p50k_base\u5d4c\u5165\u7684GPT-3\u4ee5\u53ca\u4f7f\u7528r50k_base\u5d4c\u5165\u7684DaVinci\uff0c\u540c\u65f6\u5bf9\u6bd4\u4e86\u5e7f\u6cdb\u4f7f\u7528\u7684BERT\u57fa\u7840\u5206\u8bcd\u5668\u3002\u7814\u7a76\u5206\u6790\u4e86\u8fd9\u4e9b\u6a21\u578b\u4e4b\u95f4\u7684\u5206\u8bcd\u5dee\u5f02\uff0c\u5e76\u6df1\u5165\u63a2\u7a76\u4e86\u5b50\u8bcd\u5206\u8bcd\u5728\u8bed\u8a00\u8868\u793a\u4e0a\u7684\u6311\u6218\u3002 
\u7814\u7a76\u5f3a\u8c03\u4e86\u57f9\u517b\u8bed\u8a00\u610f\u8bc6\u5f00\u53d1\u5b9e\u8df5\u7684\u91cd\u8981\u6027\uff0c\u7279\u522b\u662f\u9488\u5bf9\u90a3\u4e9b\u4f20\u7edf\u4e0a\u8d44\u6e90\u4e0d\u8db3\u7684\u8bed\u8a00\u3002\u6b64\u5916\uff0c\u672c\u6587\u8fd8\u901a\u8fc7\u6848\u4f8b\u7814\u7a76\u5c55\u793a\u4e86\u5206\u8bcd\u9009\u62e9\u5728\u5b9e\u9645\u5e94\u7528\u4e2d\u7684\u5f71\u54cd\uff0c\u7279\u522b\u662f\u5728\u7535\u5b50\u5065\u5eb7\u8bb0\u5f55\uff08EHR\uff09\u7cfb\u7edf\u4e2d\u7684\u5e94\u7528\u3002\u7814\u7a76\u65e8\u5728\u4fc3\u8fdbAI\u670d\u52a1\u9886\u57df\uff0c\u7279\u522b\u662f\u8de8\u8bed\u8a00\u73af\u5883\u4e2d\u7684\u901a\u7528\u5316\u56fd\u9645\u5316\uff08I18N\uff09\u5b9e\u8df5\uff0c\u7279\u522b\u5173\u6ce8\u88ab\u73b0\u6709AI\u5e94\u7528\u4e25\u91cd\u5ffd\u89c6\u7684\u8bed\u8a00\u7684\u5305\u5bb9\u6027\u53d1\u5c55\u3002|\n", "2410.03553": "|**2024-10-04**|**Structure-Enhanced Protein Instruction Tuning: Towards General-Purpose Protein Understanding**|Wei Wu et.al.|[2410.03553](http://arxiv.org/abs/2410.03553)|null|\u86cb\u767d\u8d28\u4f5c\u4e3a\u751f\u7269\u5206\u5b50\u7684\u6838\u5fc3\uff0c\u5728\u751f\u7269\u8fc7\u7a0b\u4e2d\u626e\u6f14\u7740\u5173\u952e\u89d2\u8272\uff0c\u5305\u62ec\u4ee3\u8c22\u53cd\u5e94\u548cDNA\u590d\u5236\u3002\u51c6\u786e\u9884\u6d4b\u5b83\u4eec\u7684\u6027\u8d28\u548c\u529f\u80fd\u5bf9\u751f\u7269\u5e94\u7528\u81f3\u5173\u91cd\u8981\u3002\u6700\u8fd1\u5f00\u53d1\u7684\u86cb\u767d\u8d28\u8bed\u8a00\u6a21\u578b\uff08pLMs\uff09\u901a\u8fc7\u76d1\u7763\u5fae\u8c03\u63d0\u4f9b\u4e86\u89e3\u51b3\u95ee\u9898\u7684\u6709\u5e0c\u671b\u7684\u65b9\u6cd5\u3002\u7136\u800c\uff0c\u5fae\u8c03\u7684\u6a21\u578b\u4ec5\u9488\u5bf9\u7279\u5b9a\u4e0b\u6e38\u9884\u6d4b\u4efb\u52a1\u8fdb\u884c\u5b9a\u5236\uff0c\u5b9e\u73b0\u901a\u7528\u7684\u86cb\u767d\u8d28\u7406\u89e3\u4ecd\u7136\u662f\u4e00\u4e2a\u6311\u6218\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u7ed3\u6784\u589e\u5f3a\u7684\u86cb\u767d\u8d28\u6307\u4ee4\u8c03\u8c10\
uff08SEPIT\uff09\u6846\u67b6\u6765\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5728pLMs\u4e2d\u96c6\u6210\u4e86\u4e00\u4e2a\u65b0\u9896\u7684\u7ed3\u6784\u611f\u77e5\u6a21\u5757\uff0c\u4ee5\u63d0\u4f9b\u6709\u5173\u7ed3\u6784\u7684\u77e5\u8bc6\uff0c\u5e76\u5c06\u8fd9\u4e9b\u589e\u5f3a\u7684pLMs\u4e0e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u8fde\u63a5\u8d77\u6765\uff0c\u4ee5\u751f\u6210\u86cb\u767d\u8d28\u7684\u7406\u89e3\u3002\u5728\u8fd9\u4e2a\u6846\u67b6\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u4e2a\u65b0\u9896\u7684\u4e24\u9636\u6bb5\u6307\u4ee4\u8c03\u8c10\u7ba1\u9053\uff0c\u9996\u5148\u901a\u8fc7\u57fa\u4e8e\u56fe\u6807\u7684\u6307\u4ee4\u5efa\u7acb\u86cb\u767d\u8d28\u7684\u57fa\u672c\u7406\u89e3\uff0c\u7136\u540e\u4f7f\u7528\u4e13\u5bb6\u6df7\u5408\uff08MoEs\uff09\u5b66\u4e60\u66f4\u590d\u6742\u5c5e\u6027\u548c\u529f\u80fd\u4fe1\u606f\uff0c\u540c\u65f6\u4fdd\u6301\u6fc0\u6d3b\u53c2\u6570\u7684\u6570\u91cf\u76f8\u540c\u3002\u6b64\u5916\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u8fc4\u4eca\u4e3a\u6b62\u6700\u5927\u7684\u6700\u5168\u9762\u7684\u86cb\u767d\u8d28\u6307\u4ee4\u6570\u636e\u96c6\uff0c\u8fd9\u4f7f\u6211\u4eec\u80fd\u591f\u8bad\u7ec3\u548c\u8bc4\u4f30\u901a\u7528\u7684\u86cb\u767d\u8d28\u7406\u89e3\u6a21\u578b\u3002\u5e7f\u6cdb\u7684\u7ecf\u9a8c\u7ed3\u679c\u5728\u5f00\u653e\u5f0f\u751f\u6210\u548c\u5c01\u95ed\u96c6\u5408\u7b54\u6848\u4efb\u52a1\u4e0a\u663e\u793a\u4e86SEPIT\u76f8\u5bf9\u4e8e\u95ed\u6e90\u901a\u7528LLM\u548c\u4f7f\u7528\u86cb\u767d\u8d28\u77e5\u8bc6\u8bad\u7ec3\u7684\u5f00\u6e90LLM\u7684\u4f18\u8d8a\u6027\u80fd\u3002|\n", "2410.05269": "|**2024-10-07**|**Data Advisor: Dynamic Data Curation for Safety Alignment of Large Language Models**|Fei Wang 
et.al.|[2410.05269](http://arxiv.org/abs/2410.05269)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4e2d\u7684\u6570\u636e\u662f\u5173\u952e\u8981\u7d20\u3002\u8fd1\u671f\u7814\u7a76\u63a2\u7d22\u4e86\u5229\u7528LLM\u8fdb\u884c\u9ad8\u6548\u6570\u636e\u6536\u96c6\u7684\u65b9\u6cd5\u3002\u7136\u800c\uff0c\u7531LLM\u751f\u6210\u7684\u6570\u636e\u5f80\u5f80\u5b58\u5728\u8d28\u91cf\u53c2\u5dee\u4e0d\u9f50\u3001\u67d0\u4e9b\u65b9\u9762\u88ab\u4f4e\u4f30\u6216\u7f3a\u5931\u4ee5\u53ca\u6570\u636e\u70b9\u8d28\u91cf\u4f4e\u4e0b\u7684\u95ee\u9898\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201c\u6570\u636e\u987e\u95ee\u201d\u7684\u589e\u5f3a\u578bLLM\u6570\u636e\u751f\u6210\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u80fd\u591f\u8003\u8651\u76ee\u6807\u6570\u636e\u96c6\u7684\u7279\u6027\uff0c\u4ece\u9884\u5b9a\u4e49\u7684\u539f\u5219\u51fa\u53d1\uff0c\u76d1\u63a7\u751f\u6210\u6570\u636e\u7684\u72b6\u6001\uff0c\u8bc6\u522b\u5f53\u524d\u6570\u636e\u96c6\u7684\u5f31\u70b9\uff0c\u5e76\u636e\u6b64\u6307\u5bfc\u6570\u636e\u751f\u6210\u7684\u4e0b\u4e00\u8f6e\u8fed\u4ee3\u3002\u6570\u636e\u987e\u95ee\u53ef\u4ee5\u8f7b\u677e\u5730\u96c6\u6210\u5230\u73b0\u6709\u7684\u6570\u636e\u751f\u6210\u65b9\u6cd5\u4e2d\uff0c\u4ee5\u63d0\u9ad8\u6570\u636e\u8d28\u91cf\u548c\u8986\u76d6\u9762\u3002 \u5728\u5bf9\u4e09\u4e2a\u4ee3\u8868\u6027LLM\uff08\u5373Mistral\u3001Llama2\u548cFalcon\uff09\u7684\u5b89\u5168\u5bf9\u9f50\u8fdb\u884c\u7684\u5b9e\u9a8c\u4e2d\uff0c\u6570\u636e\u987e\u95ee\u8bc1\u660e\u4e86\u5176\u5728\u4e0d\u727a\u7272\u6a21\u578b\u5b9e\u7528\u6027\u7684\u60c5\u51b5\u4e0b\uff0c\u6709\u6548\u63d0\u5347\u6a21\u578b\u5bf9\u5404\u79cd\u7cbe\u7ec6\u7c92\u5ea6\u5b89\u5168\u95ee\u9898\u7684\u9002\u5e94\u6027\u7684\u80fd\u529b\u3002|\n", "2410.05265": "|**2024-10-07**|**PrefixQuant: Static Quantization Beats Dynamic through Prefixed Outliers in LLMs**|Mengzhao Chen 
et.al.|[2410.05265](http://arxiv.org/abs/2410.05265)|**[link](https://github.com/chenmnz/prefixquant)**|**\u91cf\u5316\u5bf9\u4e8e\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u90e8\u7f72\u81f3\u5173\u91cd\u8981\uff0c\u5b83\u80fd\u663e\u8457\u63d0\u5347\u5185\u5b58\u6548\u7387\u4e0e\u63a8\u7406\u901f\u5ea6\u3002\u73b0\u6709\u7684\u6fc0\u6d3b\u91cf\u5316\u65b9\u6cd5\u4e3b\u8981\u9488\u5bf9\u901a\u9053\u7ea7\u5f02\u5e38\u503c\u8fdb\u884c\u5904\u7406\uff0c\u5f80\u5f80\u5ffd\u7565\u4e86\u4ee4\u724c\u7ea7\u7684\u5f02\u5e38\u503c\uff0c\u8fd9\u5bfc\u81f4\u4e86\u5bf9\u6210\u672c\u9ad8\u6602\u7684\u9010\u4ee4\u724c\u52a8\u6001\u91cf\u5316\u4f9d\u8d56\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aPrefixQuant\u7684\u65b0\u9896\u6280\u672f\uff0c\u8be5\u6280\u672f\u5728\u4e0d\u91cd\u65b0\u8bad\u7ec3\u7684\u60c5\u51b5\u4e0b\u79bb\u7ebf\u8bc6\u522b\u51fa\u9ad8\u9891\u5f02\u5e38\u4ee4\u724c\uff0c\u5e76\u5c06\u5176\u4f5c\u4e3a\u524d\u7f00\u653e\u5165KV\u7f13\u5b58\u4e2d\uff0c\u4ee5\u9632\u6b62\u63a8\u7406\u8fc7\u7a0b\u4e2d\u751f\u6210\u5f02\u5e38\u4ee4\u724c\uff0c\u5e76\u7b80\u5316\u4e86\u91cf\u5316\u8fc7\u7a0b\u3002\u636e\u6211\u4eec\u6240\u77e5\uff0cPrefixQuant\u662f\u9996\u4e2a\u80fd\u591f\u5b9e\u73b0\u9ad8\u6548\u9010\u5f20\u91cf\u9759\u6001\u91cf\u5316\u5e76\u8d85\u8d8a\u6602\u8d35\u7684\u9010\u4ee4\u724c\u52a8\u6001\u91cf\u5316\u7684\u65b9\u6cd5\u3002\u4f8b\u5982\uff0c\u5728W4A4KV4\uff08\u6743\u91cd4\u4f4d\u3001\u6fc0\u6d3b4\u4f4d\u3001KV\u7f13\u5b584\u4f4d\uff09\u7684Llama-3-8B\u6a21\u578b\u4e2d\uff0c\u4f7f\u7528PrefixQuant\u548c\u9010\u5f20\u91cf\u9759\u6001\u91cf\u5316\u540e\uff0cWikiText2\u7684\u56f0\u60d1\u5ea6\u964d\u4f4e\u4e867.43\u4e2a\u70b9\uff0c\u5e73\u5747\u51c6\u786e\u7387\u57285\u4e2a\u5e38\u8bc6\u63a8\u7406\u4efb\u52a1\u4e0a\u63d0\u9ad8\u4e8671.08%\uff0c\u76f8\u8f83\u4e8e\u4e4b\u524d\u7684\u9010\u4ee4\u724c\u52a8\u6001\u91cf\u5316\u65b9\u6cd5QuaRot\uff0c\u5206\u522b\u5728\u56f0\u60d1\u5ea6\u4e0a\u63d0\u5347\u4e860.98\u4e2a\u70b9\uf
f0c\u5728\u51c6\u786e\u7387\u4e0a\u63d0\u5347\u4e865.98\u4e2a\u70b9\u3002\u6b64\u5916\uff0c\u4f7f\u7528PrefixQuant\u91cf\u5316\u540e\u7684\u6a21\u578b\u7684\u63a8\u7406\u901f\u5ea6\u76f8\u8f83\u4e8eFP16\u6a21\u578b\u63d0\u5347\u4e861.60\u500d\u52302.81\u500d\uff0c\u4e14\u8d85\u8fc7\u4e86QuaRot\u6a21\u578b1.2\u500d\u52301.3\u500d\u3002\u6211\u4eec\u7684\u4ee3\u7801\u5df2\u5f00\u6e90\u4e8e\\url{https://github.com/ChenMnZ/PrefixQuant}\u3002**|\n", "2410.05262": "|**2024-10-07**|**TurtleBench: Evaluating Top Language Models via Real-World Yes/No Puzzles**|Qingchen Yu et.al.|[2410.05262](http://arxiv.org/abs/2410.05262)|**[link](https://github.com/mazzzystar/TurtleBench)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u5e94\u7528\u8303\u56f4\u4e0d\u65ad\u6269\u5927\uff0c\u5bf9\u53ef\u9760\u8bc4\u4f30\u7684\u9700\u6c42\u4e5f\u5728\u589e\u52a0\u3002\u73b0\u6709\u7684LLM\u8bc4\u4f30\u57fa\u51c6\u4e3b\u8981\u4f9d\u8d56\u9759\u6001\u6570\u636e\u96c6\uff0c\u8fd9\u4f7f\u5f97\u8bc4\u4f30\u6a21\u578b\u5728\u4e0e\u7528\u6237\u52a8\u6001\u4ea4\u4e92\u65f6\u7684\u8868\u73b0\u53d8\u5f97\u5177\u6709\u6311\u6218\u6027\u3002\u6b64\u5916\uff0c\u8fd9\u4e9b\u57fa\u51c6\u5f80\u5f80\u9700\u8981\u7279\u5b9a\u80cc\u666f\u77e5\u8bc6\uff0c\u4ece\u800c\u590d\u6742\u5316\u4e86\u8861\u91cf\u6a21\u578b\u903b\u8f91\u63a8\u7406\u80fd\u529b\u7684\u6d4b\u91cf\u3002\u57fa\u4e8e\u5f3a\u5927\u6a21\u578b\u6216\u4eba\u5de5\u52aa\u529b\u7684\u5176\u4ed6\u52a8\u6001\u8bc4\u4f30\u65b9\u6cd5\u53ef\u80fd\u4f1a\u5f15\u5165\u504f\u89c1\uff0c\u5e76\u4e14\u6210\u672c\u548c\u65f6\u95f4\u9700\u6c42\u9ad8\uff0c\u8fd9\u963b\u788d\u4e86\u5927\u89c4\u6a21\u5e94\u7528\u3002 \u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86TurtleBench\u3002TurtleBench\u4ece\u6211\u4eec\u5f00\u53d1\u7684\u5728\u7ebfTurtle Soup 
Puzzle\u5e73\u53f0\u6536\u96c6\u771f\u5b9e\u7684\u7528\u6237\u731c\u6d4b\uff0c\u8fd9\u79cd\u65b9\u6cd5\u5141\u8bb8\u751f\u6210\u76f8\u5bf9\u52a8\u6001\u7684\u8bc4\u4f30\u6570\u636e\u96c6\uff0c\u53ef\u4ee5\u964d\u4f4e\u6a21\u578b\u4f5c\u5f0a\u7684\u98ce\u9669\uff0c\u540c\u65f6\u4f7f\u8bc4\u4f30\u66f4\u8d34\u8fd1\u5b9e\u9645\u7528\u6237\u7684\u63a8\u7406\u9700\u6c42\uff0c\u4ece\u800c\u63d0\u9ad8\u8bc4\u4f30\u7684\u53ef\u9760\u6027\u3002TurtleBench\u5305\u542b\u4e861,532\u4e2a\u7528\u6237\u731c\u6d4b\u53ca\u5176\u6b63\u786e\u6027\u7684\u6ce8\u91ca\u4fe1\u606f\u3002\u5229\u7528\u8fd9\u4e2a\u6570\u636e\u96c6\uff0c\u6211\u4eec\u5168\u9762\u8bc4\u4f30\u4e86\u5f53\u524d\u6700\u5148\u8fdb\u7684\u4e5d\u4e2aLLM\u6a21\u578b\u3002\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0cOpenAI o1\u7cfb\u5217\u6a21\u578b\u5728\u8fd9\u4e9b\u8bc4\u4f30\u4e2d\u5e76\u672a\u53d6\u5f97\u9886\u5148\u5730\u4f4d\u3002 \u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u4e9b\u8fdb\u4e00\u6b65\u7814\u7a76\u7684\u5047\u8bbe\uff0c\u4f8b\u5982\u201co1\u7684\u6f5c\u5728\u63a8\u7406\u4f7f\u7528\u4e86\u7b80\u5355\u7684\u94fe\u5f0f\u601d\u8003\uff08CoT\uff09\u6280\u672f\u201d\u548c\u201c\u589e\u52a0CoT\u957f\u5ea6\u4e0d\u4ec5\u63d0\u4f9b\u4e86\u63a8\u7406\u76ca\u5904\uff0c\u540c\u65f6\u4e5f\u5e26\u6765\u4e86\u566a\u97f3\u6210\u672c\u201d\u3002**|\n", "2410.05258": "|**2024-10-07**|**Differential Transformer**|Tianzhu Ye et.al.|[2410.05258](http://arxiv.org/abs/2410.05258)|null|\u5728\u672c\u6587\u4e2d\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u5dee\u5f02\u53d8\u6362\u5668\uff08Diff 
Transformer\uff09\uff0c\u5b83\u80fd\u591f\u589e\u5f3a\u5bf9\u76f8\u5173\u4e0a\u4e0b\u6587\u7684\u6ce8\u610f\u529b\u540c\u65f6\u6d88\u9664\u566a\u97f3\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u5dee\u5f02\u6ce8\u610f\u529b\u673a\u5236\u901a\u8fc7\u8ba1\u7b97\u4e24\u4e2a\u72ec\u7acb\u7684softmax\u6ce8\u610f\u529b\u6620\u5c04\u4e4b\u95f4\u7684\u5dee\u503c\u6765\u786e\u5b9a\u6ce8\u610f\u529b\u5206\u6570\u3002\u8fd9\u79cd\u51cf\u6cd5\u64cd\u4f5c\u53ef\u4ee5\u6d88\u9664\u566a\u97f3\u5e76\u4fc3\u8fdb\u7a00\u758f\u6ce8\u610f\u529b\u6a21\u5f0f\u7684\u4ea7\u751f\u3002\u5728\u8bed\u8a00\u5efa\u6a21\u4efb\u52a1\u4e0a\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u4e0e\u6807\u51c6\u7684\u53d8\u6362\u5668\u76f8\u6bd4\uff0c\u5dee\u5f02\u53d8\u6362\u5668\u5728\u6a21\u578b\u5927\u5c0f\u548c\u8bad\u7ec3\u6837\u672c\u91cf\u7684\u6269\u5c55\u4e0a\u5747\u8868\u73b0\u51fa\u8272\u3002\u66f4\u4ee4\u4eba\u5174\u594b\u7684\u662f\uff0c\u5728\u5b9e\u9645\u5e94\u7528\u4e2d\uff0c\u5982\u957f\u4e0a\u4e0b\u6587\u5efa\u6a21\u3001\u5173\u952e\u4fe1\u606f\u68c0\u7d22\u3001\u5e7b\u89c9\u6291\u5236\u3001\u4e0a\u4e0b\u6587\u5185\u5b66\u4e60\u4ee5\u53ca\u6fc0\u6d3b\u5f02\u5e38\u51cf\u5c11\u7b49\u65b9\u9762\uff0c\u5dee\u5f02\u53d8\u6362\u5668\u90fd\u5c55\u73b0\u51fa\u663e\u8457\u4f18\u52bf\u3002\u7531\u4e8e\u5bf9\u65e0\u5173\u4e0a\u4e0b\u6587\u7684\u5173\u6ce8\u8f83\u5c11\uff0c\u5dee\u5f02\u53d8\u6362\u5668\u80fd\u591f\u6709\u6548\u7f13\u89e3\u95ee\u7b54\u548c\u6587\u672c\u6458\u8981\u4e2d\u7684\u5e7b\u89c9\u95ee\u9898\u3002\u5728\u4e0a\u4e0b\u6587\u5185\u5b66\u4e60\u65b9\u9762\uff0c\u5dee\u5f02\u53d8\u6362\u5668\u4e0d\u4ec5\u63d0\u9ad8\u4e86\u51c6\u786e\u7387\uff0c\u800c\u4e14\u5bf9\u4e8e\u987a\u5e8f\u6392\u5217\u66f4\u4e3a\u9c81\u68d2\uff0c\u8fd9\u88ab\u8ba4\u4e3a\u662f\u957f\u671f\u7684\u7a33\u5065\u6027\u95ee\u9898\u3002\u8fd9\u4e9b\u7ed3\u679c\u786e\u7acb\u4e86\u5dee\u5f02\u53d8\u6362\u5668\u4f5c\u4e3a\u63a8\u52a8\u5927\u578b\u8bed\u8a00\u6a21\u578b\u53d1\u5c55\u7684\u9ad8\u6548\u4e14\u6709\u524d\u666f\u
67b6\u6784\u7684\u5730\u4f4d\u3002|\n", "2410.05254": "|**2024-10-07**|**GLEE: A Unified Framework and Benchmark for Language-based Economic Environments**|Eilam Shapira et.al.|[2410.05254](http://arxiv.org/abs/2410.05254)|**[link](https://github.com/eilamshapira/GLEE)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u7ecf\u6d4e\u4e0e\u6218\u7565\u4e92\u52a8\u9886\u57df\u5c55\u73b0\u51fa\u5de8\u5927\u6f5c\u529b\uff0c\u56e0\u4e3a\u8fd9\u4e9b\u9886\u57df\u901a\u5e38\u4ee5\u81ea\u7136\u8bed\u8a00\u6c9f\u901a\u4e3a\u4e3b\u3002\u8fd9\u5f15\u53d1\u4e86\u4e00\u7cfb\u5217\u5173\u952e\u95ee\u9898\uff1aLLMs\u662f\u5426\u8868\u73b0\u51fa\u7406\u6027\u884c\u4e3a\uff1f\u5b83\u4eec\u80fd\u5426\u6a21\u4eff\u4eba\u7c7b\u884c\u4e3a\uff1f\u5b83\u4eec\u662f\u5426\u503e\u5411\u4e8e\u8fbe\u5230\u9ad8\u6548\u548c\u516c\u5e73\u7684\u7ed3\u679c\uff1f\u81ea\u7136\u8bed\u8a00\u5728\u7b56\u7565\u4e92\u52a8\u4e2d\u7684\u89d2\u8272\u662f\u4ec0\u4e48\uff1f\u7ecf\u6d4e\u73af\u5883\u7684\u7279\u6027\u5982\u4f55\u5f71\u54cd\u8fd9\u4e9b\u52a8\u6001\uff1f\u8fd9\u4e9b\u95ee\u9898\u5bf9\u4e8e\u5c06\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u96c6\u6210\u5230\u73b0\u5b9e\u4e16\u754c\u7684\u6570\u636e\u9a71\u52a8\u7cfb\u7edf\uff08\u5982\u5728\u7ebf\u96f6\u552e\u5e73\u53f0\u548c\u63a8\u8350\u7cfb\u7edf\uff09\u4e2d\u65f6\u7684\u7ecf\u6d4e\u548c\u793e\u4f1a\u5f71\u54cd\u81f3\u5173\u91cd\u8981\u3002\u5c3d\u7ba1\u673a\u5668\u5b66\u4e60\u793e\u533a\u4e00\u76f4\u5728\u63a2\u7d22LLMs\u5728\u591a\u4ee3\u7406\u8bbe\u7f6e\u4e2d\u7684\u6f5c\u529b\uff0c\u4f46\u4e0d\u540c\u7814\u7a76\u4e4b\u95f4\u7684\u5047\u8bbe\u3001\u8bbe\u8ba1\u9009\u62e9\u548c\u8bc4\u4f30\u6807\u51c6\u5dee\u5f02\u4f7f\u5f97\u5f88\u96be\u5f97\u51fa\u7a33\u5065\u4e14\u6709\u610f\u4e49\u7684\u7ed3\u8bba\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u6807\u51c6\u5316\u7814\u7a76\u57fa\u4e8e\u53cc\u4eba\u3001\u5e8f\u5217\u3001\u8bed\u8a00\u9a71\u52a8\u6e38\u620f\u7684\u6807\u51c6\u6846\u67b6\u3002\
u53d7\u7ecf\u6d4e\u5b66\u6587\u732e\u542f\u53d1\uff0c\u6211\u4eec\u5b9a\u4e49\u4e86\u4e09\u4e2a\u57fa\u672c\u6e38\u620f\u5bb6\u65cf\uff0c\u5177\u6709\u4e00\u81f4\u7684\u53c2\u6570\u5316\u3001\u81ea\u7531\u5ea6\u548c\u7528\u4e8e\u8bc4\u4f30\u4ee3\u7406\u6027\u80fd\uff08\u81ea\u6211\u6536\u76ca\uff09\u4ee5\u53ca\u6e38\u620f\u7ed3\u679c\uff08\u6548\u7387\u548c\u516c\u5e73\u6027\uff09\u7684\u7ecf\u6d4e\u6307\u6807\u3002 \u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2a\u5f00\u6e90\u6846\u67b6\u6765\u6a21\u62df\u4ea4\u4e92\u548c\u5206\u6790\uff0c\u5e76\u5229\u7528\u5b83\u6536\u96c6\u4e86LLM\u5bf9LLM\u4ea4\u4e92\u7684\u5927\u91cf\u6570\u636e\u96c6\u4ee5\u53ca\u989d\u5916\u7684\u4eba\u7c7b\u5bf9LLM\u4ea4\u4e92\u6570\u636e\u96c6\u3002\u901a\u8fc7\u5e7f\u6cdb\u7684\u5b9e\u9a8c\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u6211\u4eec\u7684\u6846\u67b6\u548c\u6570\u636e\u96c6\u5982\u4f55\u88ab\u7528\u6765\uff1a (i) \u6bd4\u8f83\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u4e0e\u4eba\u7c7b\u73a9\u5bb6\u5728\u5404\u79cd\u7ecf\u6d4e\u80cc\u666f\u4e0b\u7684\u884c\u4e3a\uff1b (ii) \u4ece\u4e2a\u4f53\u548c\u96c6\u4f53\u5c42\u9762\u8bc4\u4f30\u4ee3\u7406\u7684\u6027\u80fd\uff1b (iii) \u5b9a\u91cf\u5206\u6790\u7ecf\u6d4e\u73af\u5883\u7279\u6027\u5bf9\u4ee3\u7406\u884c\u4e3a\u7684\u5f71\u54cd\u3002**|\n", "2410.05252": "|**2024-10-07**|**Causal Micro-Narratives**|Mourad Heddaya 
et.al.|[2410.05252](http://arxiv.org/abs/2410.05252)|null|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\u6765\u5bf9\u6587\u672c\u4e2d\u7684\u56e0\u679c\u5fae\u53d9\u4e8b\u8fdb\u884c\u5206\u7c7b\u3002\u8fd9\u4e9b\u53d9\u4e8b\u662f\u5173\u4e8e\u76ee\u6807\u4e3b\u4f53\u7684\u56e0\u679c\u89e3\u91ca\u7684\u53e5\u5b50\u7ea7\u63cf\u8ff0\u3002\u8be5\u65b9\u6cd5\u4ec5\u9700\u8981\u9488\u5bf9\u7279\u5b9a\u4e3b\u9898\u7684\u56e0\u679c\u548c\u6548\u679c\u7684\u672c\u4f53\uff0c\u6211\u4eec\u901a\u8fc7\u5e94\u7528\u5230\u901a\u8d27\u81a8\u80c0\u53d9\u4e8b\u4e2d\u8fdb\u884c\u4e86\u793a\u8303\u3002\u5229\u7528\u8986\u76d6\u7f8e\u56fd\u5386\u53f2\u548c\u5f53\u4ee3\u65b0\u95fb\u6587\u7ae0\u7684\u4eba\u5de5\u6807\u6ce8\u6570\u636e\u96c6\u8fdb\u884c\u8bad\u7ec3\uff0c\u6211\u4eec\u5728\u591a\u6807\u7b7e\u5206\u7c7b\u4efb\u52a1\u4e0a\u8bc4\u4f30\u4e86\u51e0\u79cd\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u3002\u8868\u73b0\u6700\u597d\u7684\u6a21\u578b\u2014\u2014\u5fae\u8c03\u540e\u7684Llama 3.1 8B\uff0c\u5728\u53d9\u4e8b\u68c0\u6d4b\u4e0a\u8fbe\u5230F1\u5f97\u5206\u4e3a0.87\uff0c\u5728\u53d9\u4e8b\u5206\u7c7b\u4e0a\u8fbe\u5230F1\u5f97\u5206\u4e3a0.71\u3002\u5168\u9762\u7684\u9519\u8bef\u5206\u6790\u63ed\u793a\u4e86\u8bed\u4e49\u6b67\u4e49\u5e26\u6765\u7684\u6311\u6218\uff0c\u5e76\u6307\u51fa\u6a21\u578b\u9519\u8bef\u5f80\u5f80\u53cd\u6620\u4e86\u4eba\u5de5\u6ce8\u91ca\u8005\u7684\u5206\u6b67\u3002\u8fd9\u9879\u7814\u7a76\u5efa\u7acb\u4e86\u4e00\u4e2a\u4ece\u5b9e\u9645\u6570\u636e\u4e2d\u63d0\u53d6\u56e0\u679c\u5fae\u53d9\u4e8b\u7684\u6846\u67b6\uff0c\u5177\u6709\u5e7f\u6cdb\u7684\u793e\u4f1a\u79d1\u5b66\u7814\u7a76\u5e94\u7528\u524d\u666f\u3002|\n", "2410.05248": "|**2024-10-07**|**SFTMix: Elevating Language Model Instruction Tuning with Mixup Recipe**|Yuxin Xiao 
et.al.|[2410.05248](http://arxiv.org/abs/2410.05248)|null|\u4e3a\u4e86\u5728\u4ea4\u4e92\u9a71\u52a8\u4efb\u52a1\u4e2d\u8bf1\u5bfc\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5c55\u73b0\u51fa\u671f\u671b\u7684\u884c\u4e3a\uff0c\u901a\u5e38\u91c7\u7528\u6307\u4ee4-\u8c03\u4f18\u9636\u6bb5\uff0c\u901a\u8fc7\u4e0b\u4e00\u4e2a\u8bcd\u9884\u6d4b\uff08NTP\uff09\u635f\u5931\u8bad\u7ec3LLM\u4e8e\u6307\u4ee4\u54cd\u5e94\u5bf9\u3002\u5148\u524d\u7684\u5de5\u4f5c\u65e8\u5728\u63d0\u5347\u8c03\u4f18\u6027\u80fd\uff0c\u5e38\u7740\u91cd\u4e8e\u9ad8\u8d28\u91cf\u7684\u76d1\u7763\u5fae\u8c03\uff08SFT\uff09\u6570\u636e\u96c6\u7684\u6784\u5efa\uff0c\u8fd9\u901a\u5e38\u9700\u8981\u6602\u8d35\u7684\u6570\u636e\u8fc7\u6ee4\u8fc7\u7a0b\u6216\u4eba\u529b\u5bc6\u96c6\u578b\u7684\u4eba\u5de5\u6ce8\u91ca\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u5e76\u672a\u5145\u5206\u5229\u7528\u6570\u636e\u96c6\u7684\u5185\u5728\u7279\u6027\uff0c\u5bfc\u81f4\u4e86\u9ad8\u6602\u7684\u8ba1\u7b97\u548c\u52b3\u52a8\u6210\u672c\uff0c\u9650\u5236\u4e86\u53ef\u6269\u5c55\u6027\u548c\u6027\u80fd\u63d0\u5347\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aSFTMix\u7684\u65b0\u9896\u65b9\u6cd5\uff0c\u5b83\u8d85\u8d8a\u4e86\u4f20\u7edfNTP\u8303\u5f0f\uff0c\u65e0\u9700\u7cbe\u5fc3\u8bbe\u8ba1\u7684SFT\u6570\u636e\u96c6\u5373\u53ef\u63d0\u5347\u8c03\u4f18\u6027\u80fd\u3002 
\u89c2\u5bdf\u5230LLM\u5728\u8bed\u4e49\u8868\u793a\u7a7a\u95f4\u4e2d\u8868\u73b0\u51fa\u4e0d\u5747\u5300\u7684\u7f6e\u4fe1\u5ea6\u5206\u5e03\uff0c\u6211\u4eec\u63d0\u51fa\uff0c\u4e0d\u540c\u7f6e\u4fe1\u5ea6\u7ea7\u522b\u7684\u793a\u4f8b\u5728\u8c03\u4f18\u8fc7\u7a0b\u4e2d\u5e94\u626e\u6f14\u4e0d\u540c\u7684\u89d2\u8272\u3002\u57fa\u4e8e\u8fd9\u4e00\u89c1\u89e3\uff0cSFTMix\u5229\u7528\u8bad\u7ec3\u52a8\u6001\u6765\u8bc6\u522b\u5177\u6709\u4e0d\u540c\u7f6e\u4fe1\u5ea6\u7ea7\u522b\u7684\u793a\u4f8b\uff0c\u7136\u540e\u5e94\u7528\u57fa\u4e8eMixup\u7684\u6b63\u5219\u5316\u6765\u51cf\u5c11\u5bf9\u9ad8\u7f6e\u4fe1\u5ea6\u793a\u4f8b\u7684\u8fc7\u62df\u5408\uff0c\u540c\u65f6\u4f20\u64ad\u76d1\u7763\u4fe1\u53f7\u4ee5\u6539\u5584\u76f8\u5bf9\u4f4e\u7f6e\u4fe1\u5ea6\u793a\u4f8b\u7684\u5b66\u4e60\u6548\u679c\u3002\u8fd9\u79cd\u65b9\u6cd5\u4f7f\u5f97SFTMix\u80fd\u591f\u5728\u5e7f\u6cdb\u7684\u64cd\u4f5c\u6307\u4ee4\u9075\u5faa\u548c\u533b\u7597\u4fdd\u5065\u9886\u57df\u7684\u7279\u5b9aSFT\u4efb\u52a1\u4e2d\u663e\u8457\u8d85\u8d8aNTP\uff0c\u8bc1\u660e\u4e86\u5176\u5bf9\u4e0d\u540cLLM\u5bb6\u65cf\u548c\u4efb\u610f\u5927\u5c0f\u6570\u636e\u96c6\u7684\u9002\u5e94\u6027\u548c\u53ef\u6269\u5c55\u6027\u3002\u5168\u9762\u7684\u6d88\u878d\u7814\u7a76\u8fdb\u4e00\u6b65\u9a8c\u8bc1\u4e86SFTMix\u8bbe\u8ba1\u9009\u62e9\u7684\u7a33\u5065\u6027\uff0c\u5f3a\u8c03\u4e86\u5176\u5728\u4e0d\u540cLLM\u548c\u6570\u636e\u96c6\u4e0a\u7684\u4e00\u81f4\u6027\u80fd\u63d0\u5347\u80fd\u529b\uff0c\u9002\u7528\u4e8e\u66f4\u5e7f\u6cdb\u7684\u81ea\u7136\u8bed\u8a00\u5904\u7406\u5e94\u7528\u3002|\n", "2410.05243": "|**2024-10-07**|**Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents**|Boyu Gou 
et.al.|[2410.05243](http://arxiv.org/abs/2410.05243)|**[link](https://github.com/OSU-NLP-Group/UGround)**|\u672c\u8bba\u6587\u63a2\u8ba8\u4e86\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u5982\u4f55\u91cd\u5851\u56fe\u5f62\u7528\u6237\u754c\u9762\uff08GUI\uff09\u4ee3\u7406\u7684\u80fd\u529b\uff0c\u4f7f\u5176\u4ece\u53d7\u63a7\u6a21\u62df\u5411\u8de8\u5e73\u53f0\u7684\u590d\u6742\u73b0\u5b9e\u4e16\u754c\u5e94\u7528\u8fc7\u6e21\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u4ee3\u7406\u7684\u6709\u6548\u6027\u5728\u5f88\u5927\u7a0b\u5ea6\u4e0a\u53d6\u51b3\u4e8e\u5176\u56fa\u6709\u6027\u7684\u7a33\u5065\u6027\u3002\u5f53\u524d\u7684GUI\u4ee3\u7406\u4e3b\u8981\u4f9d\u8d56\u4e8e\u57fa\u4e8e\u6587\u672c\u7684\u8868\u793a\uff0c\u5982HTML\u6216\u53ef\u8bbf\u95ee\u6027\u6811\uff0c\u5c3d\u7ba1\u5b83\u4eec\u5177\u6709\u5b9e\u7528\u6027\uff0c\u4f46\u5f80\u5f80\u5f15\u5165\u566a\u58f0\u3001\u4e0d\u5b8c\u6574\u6027\u4ee5\u53ca\u589e\u52a0\u8ba1\u7b97\u5f00\u9500\u3002 \u6211\u4eec\u7684\u89c2\u70b9\u662f\uff0c\u4e3aGUI\u4ee3\u7406\u6784\u5efa\u4e00\u79cd\u7c7b\u4f3c\u4eba\u7c7b\u7684\u4f53\u73b0\uff0c\u80fd\u591f\u5b8c\u5168\u901a\u8fc7\u89c6\u89c9\u611f\u77e5\u73af\u5883\uff0c\u5e76\u76f4\u63a5\u5bf9GUI\u6267\u884c\u50cf\u7d20\u7ea7\u64cd\u4f5c\u3002\u5173\u952e\u5728\u4e8e\u89c6\u89c9\u5b9a\u4f4d\u6a21\u578b\uff0c\u5b83\u4eec\u80fd\u591f\u51c6\u786e\u5730\u5c06GUI\u5143\u7d20\u7684\u5404\u79cd\u5f15\u7528\u8868\u8fbe\u6620\u5c04\u5230\u5176\u5728\u4e0d\u540c\u5e73\u53f0\u4e0a\u7684GUI\u5750\u6807\u4e0a\u3002\u6211\u4eec\u8868\u660e\uff0c\u4e00\u4e2a\u7b80\u5355\u7684\u914d\u65b9\u2014\u2014\u5305\u62ec\u57fa\u4e8e\u7f51\u7edc\u7684\u5408\u6210\u6570\u636e\u548c\u5bf9LLaVA\u67b6\u6784\u7684\u8f7b\u5fae\u8c03\u6574\u2014\u2014\u5bf9\u4e8e\u8bad\u7ec3\u8fd9\u6837\u7684\u89c6\u89c9\u5b9a\u4f4d\u6a21\u578b\u662f\u51fa\u5947\u6709\u6548\u7684\u3002 
\u6211\u4eec\u6536\u96c6\u4e86\u8fc4\u4eca\u4e3a\u6b62\u6700\u5927\u7684GUI\u89c6\u89c9\u5b9a\u4f4d\u6570\u636e\u96c6\uff0c\u5305\u542b10M\u4e2aGUI\u5143\u7d20\u53ca\u5176\u5f15\u7528\u8868\u8fbe\uff0c\u8986\u76d6\u4e861.3M\u5f20\u622a\u56fe\uff0c\u4ee5\u6b64\u6765\u8bad\u7ec3UGround\uff0c\u8fd9\u662f\u7528\u4e8eGUI\u4ee3\u7406\u7684\u5f3a\u5927\u901a\u7528\u89c6\u89c9\u5b9a\u4f4d\u6a21\u578b\u3002\u5728\u516d\u4e2a\u8de8\u4e09\u4e2a\u7c7b\u522b\uff08\u5b9a\u4f4d\u3001\u79bb\u7ebf\u4ee3\u7406\u548c\u5728\u7ebf\u4ee3\u7406\uff09\u7684\u57fa\u51c6\u6d4b\u8bd5\u4e0a\uff0c\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\u51fa\u4ee5\u4e0b\u4e24\u70b9\uff1a 1\uff09UGround\u663e\u8457\u4f18\u4e8e\u73b0\u6709GUI\u4ee3\u7406\u7684\u89c6\u89c9\u5b9a\u4f4d\u6a21\u578b\uff0c\u7edd\u5bf9\u6027\u80fd\u63d0\u5347\u9ad8\u8fbe20%\u3002 2\uff09\u4f7f\u7528UGround\u7684\u4ee3\u7406\u5728\u6027\u80fd\u4e0a\u8d85\u8d8a\u4e86\u6700\u5148\u8fdb\u7684\u4ee3\u7406\uff0c\u5c3d\u7ba1\u73b0\u6709\u7684\u4ee3\u7406\u4f7f\u7528\u989d\u5916\u7684\u57fa\u4e8e\u6587\u672c\u7684\u8f93\u5165\uff0c\u800c\u6211\u4eec\u7684\u4ee3\u7406\u4ec5\u4f9d\u8d56\u4e8e\u89c6\u89c9\u611f\u77e5\u3002 \u8fd9\u4e9b\u7ed3\u679c\u5f3a\u6709\u529b\u5730\u652f\u6301\u4e86\u8fd9\u6837\u4e00\u79cd\u8bbe\u60f3\uff1a\u5373\u50cf\u4eba\u7c7b\u4e00\u6837\u5728\u6570\u5b57\u4e16\u754c\u4e2d\u5bfc\u822a\u7684GUI\u4ee3\u7406\u662f\u53ef\u884c\u7684\uff0c\u5e76\u4e14\u5145\u6ee1\u4e86\u6f5c\u529b\u3002|\n", "2410.05229": "|**2024-10-07**|**GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models**|Iman Mirzadeh 
et.al.|[2410.05229](http://arxiv.org/abs/2410.05229)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u6700\u65b0\u8fdb\u5c55\u5f15\u53d1\u4e86\u5bf9\u5b83\u4eec\u5728\u6570\u5b66\u63a8\u7406\u80fd\u529b\u4e0a\u7684\u5173\u6ce8\uff0c\u7279\u522b\u662f\u9488\u5bf9\u5c0f\u5b66\u6c34\u5e73\u95ee\u9898\u3002GSM8K\u57fa\u51c6\u6d4b\u8bd5\u5e7f\u6cdb\u7528\u4e8e\u8bc4\u4f30\u6a21\u578b\u5728\u8fd9\u4e00\u9886\u57df\u7684\u8868\u73b0\u3002\u5c3d\u7ba1LLM\u5728GSM8K\u4e0a\u7684\u6210\u7ee9\u8fd1\u5e74\u6765\u663e\u8457\u63d0\u9ad8\uff0c\u4f46\u5176\u6570\u5b66\u63a8\u7406\u80fd\u529b\u662f\u5426\u771f\u6b63\u6709\u6240\u63d0\u5347\u4ecd\u7136\u5b58\u5728\u7591\u95ee\uff0c\u8fd9\u4f7f\u5f97\u73b0\u6709\u8bc4\u4f30\u6307\u6807\u7684\u53ef\u9760\u6027\u53d7\u5230\u8d28\u7591\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u8fdb\u884c\u4e86\u4e00\u9879\u5927\u89c4\u6a21\u7814\u7a76\uff0c\u6db5\u76d6\u4e86\u5f53\u524d\u6700\u524d\u6cbf\u7684\u5f00\u653e\u548c\u5c01\u95ed\u6a21\u578b\u3002\u4e3a\u4e86\u514b\u670d\u73b0\u6709\u8bc4\u4f30\u65b9\u6cd5\u7684\u5c40\u9650\u6027\uff0c\u6211\u4eec\u5f15\u5165\u4e86GSM-Symbolic\u6539\u8fdb\u7248\u57fa\u51c6\uff0c\u8be5\u57fa\u51c6\u57fa\u4e8e\u7b26\u53f7\u6a21\u677f\u751f\u6210\u4e86\u591a\u6837\u5316\u7684\u9898\u76ee\u3002GSM-Symbolic\u4f7f\u5f97\u8bc4\u4f30\u66f4\u52a0\u53ef\u63a7\uff0c\u63d0\u4f9b\u4e86\u5173\u952e\u6d1e\u5bdf\u548c\u66f4\u53ef\u9760\u7684\u6307\u6807\u6765\u8861\u91cf\u6a21\u578b\u7684\u63a8\u7406\u80fd\u529b\u3002 
\u6211\u4eec\u7684\u53d1\u73b0\u63ed\u793a\u4e86LLM\u5728\u56de\u7b54\u4e0d\u540c\u7248\u672c\u540c\u9898\u65f6\u8868\u73b0\u51fa\u660e\u663e\u7684\u5dee\u5f02\u6027\u3002\u5177\u4f53\u800c\u8a00\uff0c\u5728GSM-Symbolic\u57fa\u51c6\u4e2d\uff0c\u4ec5\u6539\u53d8\u95ee\u9898\u4e2d\u7684\u6570\u503c\u540e\uff0c\u6240\u6709\u6a21\u578b\u7684\u8868\u73b0\u90fd\u4f1a\u4e0b\u964d\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7814\u7a76\u4e86\u8fd9\u4e9b\u6a21\u578b\u5728\u6570\u5b66\u63a8\u7406\u65b9\u9762\u7684\u8106\u5f31\u6027\uff0c\u5e76\u8868\u660e\u968f\u7740\u95ee\u9898\u4e2d\u6761\u76ee\u6570\u91cf\u7684\u589e\u52a0\uff0c\u5176\u6027\u80fd\u4f1a\u663e\u8457\u964d\u4f4e\u3002\u6211\u4eec\u63a8\u6d4b\uff0c\u8fd9\u662f\u56e0\u4e3a\u5f53\u524d\u7684LLM\u65e0\u6cd5\u6267\u884c\u771f\u6b63\u7684\u903b\u8f91\u63a8\u7406\uff1b\u5b83\u4eec\u53ea\u662f\u590d\u5236\u4e86\u8bad\u7ec3\u6570\u636e\u4e2d\u7684\u63a8\u7406\u6b65\u9aa4\u3002\u5373\u4f7f\u6dfb\u52a0\u4e00\u4e2a\u770b\u4f3c\u4e0e\u95ee\u9898\u76f8\u5173\u7684\u5355\u4e2a\u6761\u76ee\uff0c\u6240\u6709\u6700\u5148\u8fdb\u7684\u6a21\u578b\u7684\u8868\u73b0\u4e5f\u4f1a\u5927\u5e45\u4e0b\u964d\uff08\u9ad8\u8fbe65%\uff09\uff0c\u5c3d\u7ba1\u8fd9\u4e2a\u6761\u76ee\u5b9e\u9645\u4e0a\u5e76\u4e0d\u8d21\u732e\u4e8e\u5b8c\u6210\u7b54\u6848\u6240\u9700\u7684\u5173\u952e\u63a8\u7406\u94fe\u3002\u603b\u4e4b\uff0c\u6211\u4eec\u7684\u5de5\u4f5c\u4e3a\u7406\u89e3LLM\u5728\u6570\u5b66\u63a8\u7406\u4e0a\u7684\u80fd\u529b\u548c\u5c40\u9650\u6027\u63d0\u4f9b\u4e86\u4e00\u4e2a\u66f4\u4e3a\u7ec6\u81f4\u7684\u89c6\u89d2\u3002|\n", "2410.05224": "|**2024-10-07**|**Cookbook: A framework for improving LLM generative abilities via programmatic data generating templates**|Avanika Narayan 
et.al.|[2410.05224](http://arxiv.org/abs/2410.05224)|null|\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aCookbook\u7684\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u901a\u8fc7\u7f16\u7a0b\u65b9\u5f0f\u751f\u6210\u8bad\u7ec3\u6570\u636e\uff0c\u6570\u636e\u4e3b\u8981\u7531\u968f\u673a\u6807\u8bb0\u7684\u7b80\u5355\u6a21\u5f0f\u7ec4\u6210\u3002\u8fd9\u79cd\u65b9\u6cd5\u5728\u89c4\u6a21\u548c\u6210\u672c\u65b9\u9762\u5177\u6709\u4f18\u52bf\uff0c\u4e14\u907f\u514d\u4e86\u4e0e\u4eba\u7c7b\u6216\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u751f\u6210\u6570\u636e\u76f8\u5173\u7684\u6cd5\u5f8b\u548c\u9690\u79c1\u95ee\u9898\u3002\u9996\u5148\uff0cCookbook\u5229\u7528\u6570\u636e\u751f\u6210Python\u51fd\u6570\u6a21\u677f\u6765\u4ea7\u751f\u9f13\u52b1\u6a21\u578b\u5b66\u4e60\u4e0e\u7279\u5b9a\u4efb\u52a1\u76f8\u5339\u914d\u7684\u663e\u5f0f\u89c4\u5219\u7684\u8bad\u7ec3\u6570\u636e\u3002\u7814\u7a76\u53d1\u73b0\uff0c\u4f7f\u7528Cookbook\u751f\u6210\u7684\u6570\u636e\u8fdb\u884c\u5fae\u8c03\u80fd\u591f\u663e\u8457\u63d0\u9ad8\u6a21\u578b\u5728\u5bf9\u5e94\u4efb\u52a1\u4e0a\u7684\u8868\u73b0\uff0c\u6700\u9ad8\u53ef\u8fbe52.7\u4e2a\u51c6\u786e\u6027\u70b9\u3002\u5176\u6b21\uff0c\u7531\u4e8e\u6307\u4ee4\u6570\u636e\u96c6\u80fd\u591f\u540c\u65f6\u6539\u5584\u591a\u4e2a\u4e0b\u6e38\u4efb\u52a1\u7684\u8868\u73b0\uff0cCookbook\u7b97\u6cd5\u81ea\u52a8\u5b66\u4e60\u5982\u4f55\u6df7\u5408\u6765\u81ea\u4e0d\u540c\u6a21\u677f\u7684\u6570\u636e\u4ee5\u4f18\u5316\u591a\u4e2a\u4efb\u52a1\u7684\u6027\u80fd\u3002\u5728\u6807\u51c6\u7684\u591a\u4efb\u52a1GPT4ALL\u8bc4\u4f30\u5957\u4ef6\u4e0a\uff0c\u4f7f\u7528Cookbook\u751f\u6210\u7684\u6570\u636e\u96c6\u8fdb\u884c\u5fae\u8c03\u7684Mistral-7B\u6a21\u578b\u5728\u5e73\u5747\u51c6\u786e\u6027\u548c\u4e09\u4e2a\u4efb\u52a1\u4e2d\u7684\u4e09\u4e2a\u4e0a\u5747\u53d6\u5f97\u6700\u4f73\u6210\u7ee9\u3002\u6700\u540e\uff0c\u5206\u6790\u4e86Cookbook\u4e3a\u4f55\u80fd\u63d0\u9ad8\u6027\u80fd\u4ee5\u53ca\u5176\u80cc\u540e\u7684\u539f\u7406\uff0c\u5e76\u63
d0\u51fa\u4e86\u4e00\u9879\u6307\u6807\u6765\u9a8c\u8bc1\u6539\u8fdb\u7684\u4e3b\u8981\u539f\u56e0\u662f\u6a21\u578b\u751f\u6210\u7684\u7ed3\u679c\u66f4\u597d\u5730\u9075\u5faa\u4e86\u6a21\u677f\u89c4\u5219\u3002|\n", "2410.07176": "|**2024-10-09**|**Astute RAG: Overcoming Imperfect Retrieval Augmentation and Knowledge Conflicts for Large Language Models**|Fei Wang et.al.|[2410.07176](http://arxiv.org/abs/2410.07176)|null|\u5728\u63a2\u7d22\u5982\u4f55\u901a\u8fc7\u8054\u5408\u5206\u6790\u6765\u7406\u89e3\u4e0d\u5b8c\u7f8e\u68c0\u7d22\u5bf9\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u884c\u4e3a\u7684\u5f71\u54cd\uff0c\u4ee5\u53ca\u5982\u4f55\u5728LLM\u5185\u90e8\u77e5\u8bc6\u4e0e\u5916\u90e8\u6765\u6e90\u4e4b\u95f4\u4ea7\u751f\u6f5c\u5728\u51b2\u7a81\u65f6\uff0c\u6211\u4eec\u53d1\u73b0\uff0c\u4e0d\u5b8c\u7f8e\u7684\u68c0\u7d22\u589e\u5f3a\u53ef\u80fd\u662f\u4e0d\u53ef\u907f\u514d\u7684\uff0c\u5e76\u4e14\u4f1a\u5bf9RAG\u7cfb\u7edf\u9020\u6210\u4e25\u91cd\u5f71\u54cd\u3002\u901a\u8fc7\u5728\u73b0\u5b9e\u6761\u4ef6\u4e0b\u7684\u63a7\u5236\u6027\u5206\u6790\uff0c\u6211\u4eec\u53d1\u73b0\u4e86\u4ece\u68c0\u7d22\u5230\u7684\u4e0d\u5b8c\u6574\u77e5\u8bc6\u4e0eLLM\u5185\u90e8\u77e5\u8bc6\u4e4b\u95f4\u7684\u77e5\u8bc6\u51b2\u7a81\u662fRAG\u540e\u5904\u7406\u9636\u6bb5\u9700\u8981\u514b\u670d\u7684\u5173\u952e\u74f6\u9888\u3002 
\u4e3a\u4e86\u4f7fLLM\u5728\u9762\u5bf9\u4e0d\u5b8c\u7f8e\u68c0\u7d22\u65f6\u5177\u6709\u9c81\u68d2\u6027\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u201c\u7cbe\u660eRAG\u201d\u8fd9\u4e00\u65b0\u9896\u7684RAG\u65b9\u6cd5\u3002\u8be5\u65b9\u6cd5\u80fd\u591f\u9002\u5f53\u5730\u6fc0\u53d1LLM\u5185\u90e8\u77e5\u8bc6\u4e2d\u7684\u5173\u952e\u4fe1\u606f\uff0c\u901a\u8fc7\u6e90\u610f\u8bc6\u5730\u6574\u5408\u5185\u90e8\u548c\u5916\u90e8\u77e5\u8bc6\uff0c\u6700\u7ec8\u6839\u636e\u4fe1\u606f\u53ef\u9760\u6027\u786e\u5b9a\u7b54\u6848\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u7ed3\u679c\u4f7f\u7528\u4e86Gemini\u548cClaude\u4e24\u4e2a\u6a21\u578b\u9a8c\u8bc1\u4e86\u201c\u7cbe\u660eRAG\u201d\u7684\u6709\u6548\u6027\uff0c\u8bc1\u660e\u5176\u663e\u8457\u4f18\u4e8e\u73b0\u6709\u7684\u589e\u5f3aRAG\u9c81\u68d2\u6027\u7684\u65b9\u6cd5\u3002\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u5728\u6700\u574f\u60c5\u51b5\u573a\u666f\u4e0b\uff0c\u201c\u7cbe\u660eRAG\u201d\u662f\u552f\u4e00\u80fd\u591f\u8fbe\u5230\u6216\u8d85\u8fc7\u6ca1\u6709RAG\u7684LLM\u6027\u80fd\u7684\u65b9\u6cd5\u3002 \u8fdb\u4e00\u6b65\u7684\u5206\u6790\u8868\u660e\uff0c\u201c\u7cbe\u660eRAG\u201d\u6709\u6548\u5730\u89e3\u51b3\u4e86\u77e5\u8bc6\u51b2\u7a81\u95ee\u9898\uff0c\u63d0\u9ad8\u4e86RAG\u7cfb\u7edf\u7684\u53ef\u9760\u6027\u548c\u53ef\u4fe1\u5ea6\u3002|\n", "2410.07173": "|**2024-10-09**|**Do better language models have crisper vision?**|Jona Ruthardt 
et.al.|[2410.07173](http://arxiv.org/abs/2410.07173)|null|\u672c\u6587\u63a2\u8ba8\u4e86\u6587\u672c\u4ec5\u4f9d\u8d56\u578b\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u7406\u89e3\u89c6\u89c9\u4e16\u754c\u65b9\u9762\u7684\u8868\u73b0\u3002\u968f\u7740LLMs\u5728\u8ba1\u7b97\u673a\u89c6\u89c9\u9886\u57df\u7684\u5e94\u7528\u65e5\u76ca\u5e7f\u6cdb\uff0c\u8fd9\u4e00\u95ee\u9898\u53d8\u5f97\u65e2\u57fa\u7840\u53c8\u5173\u952e\u3002\u73b0\u6709\u7814\u7a76\u4e3b\u8981\u96c6\u4e2d\u5728\u6709\u9650\u7684\u573a\u666f\u4e0a\uff0c\u5982\u751f\u6210\u89c6\u89c9\u5185\u5bb9\u6216\u5bf9\u591a\u6a21\u6001\u6570\u636e\u8fdb\u884c\u805a\u7c7b\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u9879\u540d\u4e3a\u201c\u89c6\u89c9\u6587\u672c\u8868\u793a\u57fa\u51c6\u201d\uff08ViTeRB\uff09\u7684\u4efb\u52a1\uff0c\u65e8\u5728\u8bc6\u522b\u51fa\u80fd\u591f\u4e0e\u89c6\u89c9\u4e16\u754c\u9ad8\u5ea6\u4e00\u81f4\u7684\u5173\u952e\u5c5e\u6027\u3002\u57fa\u4e8e\u6b64\u4efb\u52a1\u7684\u7ed3\u679c\uff0c\u6211\u4eec\u53d1\u73b0\u89e3\u7801\u5668\u578b\u5927\u8bed\u8a00\u6a21\u578b\u5728\u89c6\u89c9\u4e3a\u4e2d\u5fc3\u7684\u8bed\u5883\u4e0b\u4f5c\u4e3a\u6587\u672c\u8868\u793a\u7684\u7406\u60f3\u5019\u9009\uff0c\u8fd9\u4e0e\u5f53\u524d\u4f7f\u7528\u6587\u672c\u7f16\u7801\u5668\u7684\u505a\u6cd5\u5f62\u6210\u4e86\u5bf9\u6bd4\u3002 
\u5728\u6b64\u57fa\u7840\u4e0a\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u201cShareLock\u201d\u2014\u2014\u4e00\u79cd\u8d85\u8f7b\u91cf\u7ea7\u7684\u7c7b\u4f3cCLIP\u7684\u6a21\u578b\u3002\u901a\u8fc7\u5229\u7528\u4ece\u5f3a\u5927\u89c6\u89c9\u548c\u8bed\u8a00\u6a21\u578b\u9884\u8ba1\u7b97\u7684\u51bb\u7ed3\u7279\u5f81\uff0cShareLock\u5728ImageNet\u4e0a\u53d6\u5f97\u4e8651%\u7684\u51c6\u786e\u7387\uff0c\u4ec5\u4f7f\u7528\u4e86563,000\u5f20\u56fe\u50cf-\u63cf\u8ff0\u5bf9\u3002\u6b64\u5916\uff0c\u8bad\u7ec3\u6240\u9700\u7684\u8d44\u6e90\u4ec5\u4e3a1\u4e2aGPU\u5c0f\u65f6\uff08\u6216\u5305\u62ec\u7279\u5f81\u9884\u8ba1\u7b97\u768410\u4e2a\u5c0f\u65f6\uff09\uff0c\u8fdc\u5c11\u4e8e\u4ee5\u5f80\u65b9\u6cd5\u6240\u9700\u7684\u65f6\u95f4\u6570\u91cf\u7ea7\u3002\u6211\u4eec\u5c06\u63d0\u4f9b\u8be5\u4ee3\u7801\u3002|\n", "2410.07167": "|**2024-10-09**|**Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate**|Qidong Huang et.al.|[2410.07167](http://arxiv.org/abs/2410.07167)|**[link](https://github.com/shikiw/modality-integration-rate)**|**\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u6709\u6548\u7684\u3001\u7a33\u5065\u7684\u4e14\u901a\u7528\u7684\u6307\u6807\u2014\u2014\u6a21\u6001\u6574\u5408\u7387(MIR)\uff0c\u7528\u4e8e\u8861\u91cf\u5927\u578b\u89c6\u89c9\u8bed\u8a00\u6a21\u578b(LVLMs)\u7684\u591a\u6a21\u6001\u9884\u8bad\u7ec3\u8d28\u91cf\u3002\u5927\u89c4\u6a21\u9884\u8bad\u7ec3\u5728\u6784\u5efa\u5177\u5907\u5f3a\u5927\u80fd\u529b\u7684LVLMs\u4e2d\u626e\u6f14\u7740\u5173\u952e\u89d2\u8272\uff0c\u800c\u5982\u4f55\u5728\u6602\u8d35\u7684\u76d1\u7763\u5fae\u8c03\u9636\u6bb5\u4e4b\u524d\u8bc4\u4f30\u5176\u8bad\u7ec3\u8d28\u91cf\u5219\u662f\u4e00\u4e2a\u672a\u5145\u5206\u63a2\u7d22\u7684\u9886\u57df\u3002\u5bf9\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b(LLMs)\uff0c\u5e38\u7528\u7684\u9884\u8bad\u7ec3\u6307\u6807\u5305\u62ec\u635f\u5931\u3001\u56f0\u60d1\u5ea6\u4ee5\u53ca\u4e0a\u4e0b\u6587\u5185\u8bc4\u4f30\u7ed3\u679c\uff0c\u4f46\u6211\u4eec\u89
c2\u5bdf\u5230\u8fd9\u4e9b\u6307\u6807\u5728\u5bf9\u826f\u597d\u8bad\u7ec3\u7684LLMs\u4e0e\u65b0\u6a21\u6001\u8fdb\u884c\u5bf9\u9f50\u65f6\u5e76\u4e0d\u5177\u6709\u5f88\u597d\u7684\u6307\u793a\u6027\u3002\u7531\u4e8e\u7f3a\u4e4f\u5408\u9002\u7684\u6307\u6807\uff0cLVLMs\u5728\u5173\u952e\u7684\u9884\u8bad\u7ec3\u9636\u6bb5\u7684\u7814\u7a76\u53d7\u5230\u4e86\u6781\u5927\u7684\u963b\u788d\uff0c\u5305\u62ec\u8bad\u7ec3\u6570\u636e\u9009\u62e9\u3001\u9ad8\u6548\u6a21\u5757\u8bbe\u8ba1\u7b49\u3002\u672c\u6587\u63d0\u51fa\u4ece\u8de8\u6a21\u6001\u5206\u5e03\u8ddd\u79bb\u7684\u89d2\u5ea6\u6765\u8bc4\u4f30\u9884\u8bad\u7ec3\u8d28\u91cf\uff0c\u5e76\u5f15\u5165\u4e86\u6a21\u6001\u6574\u5408\u7387(MIR)\uff0c\u8be5\u6307\u6807\u5177\u6709\u4ee5\u4e0b\u7279\u70b9\uff1a1\uff09**\u6709\u6548**\u5730\u4ee3\u8868\u9884\u8bad\u7ec3\u8d28\u91cf\uff0c\u5e76\u4e0e\u7ecf\u8fc7\u76d1\u7763\u5fae\u8c03\u540e\u7684\u57fa\u51c6\u6027\u80fd\u5448\u73b0\u6b63\u76f8\u5173\uff1b2\uff09**\u7a33\u5065**\u4e8e\u4e0d\u540c\u7684\u8bad\u7ec3/\u8bc4\u4f30\u6570\u636e\uff1b3\uff09**\u6cdb\u5316**\u4e8e\u591a\u79cd\u8bad\u7ec3\u914d\u7f6e\u548c\u67b6\u6784\u9009\u62e9\u3002\u6211\u4eec\u8fdb\u884c\u4e86\u4e00\u7cfb\u5217\u9884\u8bad\u7ec3\u5b9e\u9a8c\u4ee5\u63a2\u7d22MIR\u7684\u6709\u6548\u6027\uff0c\u5e76\u89c2\u5bdf\u5230\u4ee4\u4eba\u6ee1\u610f\u7684\u7ed3\u679c\uff0c\u5373MIR\u80fd\u591f\u6307\u793a\u8bad\u7ec3\u6570\u636e\u9009\u62e9\u3001\u8bad\u7ec3\u7b56\u7565\u8c03\u5ea6\u4ee5\u53ca\u6a21\u578b\u67b6\u6784\u8bbe\u8ba1\u4ee5\u83b7\u5f97\u66f4\u597d\u7684\u9884\u8bad\u7ec3\u7ed3\u679c\u3002\u6211\u4eec\u5e0c\u671bMIR\u80fd\u591f\u6210\u4e3a\u6784\u5efa\u5177\u5907\u5f3a\u5927\u80fd\u529b\u7684LVLMs\u7684\u6709\u7528\u6307\u6807\uff0c\u5e76\u6fc0\u53d1\u4e0d\u540c\u9886\u57df\u5173\u4e8e\u6a21\u6001\u5bf9\u9f50\u7684\u540e\u7eed\u7814\u7a76\u3002\u6211\u4eec\u7684\u4ee3\u7801\u5df2\u5f00\u6e90\u5728\uff1ahttps://github.com/shikiw/Modality-Integration-Rate\u3002**|\n", "2410.07166": 
"|**2024-10-09**|**Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making**|Manling Li et.al.|[2410.07166](http://arxiv.org/abs/2410.07166)|**[link](https://github.com/embodied-agent-interface/embodied-agent-interface)**|**\u4e3a\u4e86\u7cfb\u7edf\u5730\u8bc4\u4f30\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5b9e\u4f53\u5316\u51b3\u7b56\u4e2d\u7684\u8868\u73b0\uff0c\u867d\u7136\u5df2\u6709\u5927\u91cf\u7814\u7a76\u5229\u7528LLMs\u5904\u7406\u5b9e\u4f53\u5316\u73af\u5883\u4e2d\u7684\u51b3\u7b56\u95ee\u9898\uff0c\u4f46\u6211\u4eec\u4ecd\u7f3a\u4e4f\u5bf9\u5176\u6027\u80fd\u7684\u5168\u9762\u7406\u89e3\u3002\u73b0\u6709\u5de5\u4f5c\u901a\u5e38\u5728\u4e0d\u540c\u9886\u57df\u3001\u9488\u5bf9\u4e0d\u540c\u76ee\u7684\u3001\u57fa\u4e8e\u4e0d\u540c\u8f93\u5165\u548c\u8f93\u51fa\u6784\u5efaLLMs\uff0c\u8fd9\u4f7f\u5f97\u96be\u4ee5\u7edf\u4e00\u8bc4\u4ef7\u5b83\u4eec\u3002\u73b0\u6709\u8bc4\u4f30\u65b9\u6cd5\u5f80\u5f80\u4ec5\u4f9d\u8d56\u6700\u7ec8\u7684\u6210\u529f\u7387\uff0c\u8fd9\u4f7f\u5f97\u96be\u4ee5\u8bc6\u522bLLMs\u7f3a\u5931\u7684\u80fd\u529b\u4ee5\u53ca\u95ee\u9898\u6240\u5728\uff0c\u8fdb\u800c\u963b\u788d\u4e86\u5b9e\u4f53\u5316\u667a\u80fd\u4f53\u6709\u6548\u4e14\u9009\u62e9\u6027\u5730\u5229\u7528LLMs\u3002 \u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u901a\u7528\u63a5\u53e3\uff08\u5b9e\u4f53\u5316\u667a\u80fd\u4f53\u63a5\u53e3\uff09\uff0c\u65e8\u5728\u652f\u6301\u5404\u79cd\u4efb\u52a1\u7c7b\u578b\u4e0eLLM\u6a21\u5757\u8f93\u5165-\u8f93\u51fa\u89c4\u8303\u7684\u7edf\u4e00\u5316\u3002\u5177\u4f53\u800c\u8a00\uff0c\u8be5\u63a5\u53e3\u5141\u8bb8\uff1a 1. \u7edf\u4e00\u591a\u79cd\u6d89\u53ca\u72b6\u6001\u4e0e\u65f6\u95f4\u5ef6\u4f38\u76ee\u6807\u7684\u5b9e\u4f53\u5316\u51b3\u7b56\u4efb\u52a1\u3002 2. 
\u7edf\u4e00\u56db\u79cd\u5e38\u7528\u7684\u7528\u4e8e\u51b3\u7b56\u7684LLM\u6a21\u5757\uff1a\u76ee\u6807\u89e3\u91ca\u3001\u5b50\u76ee\u6807\u5206\u89e3\u3001\u52a8\u4f5c\u5e8f\u5217\u89c4\u5212\u548c\u8fc7\u6e21\u5efa\u6a21\u3002 3. \u63d0\u4f9b\u4e00\u7cfb\u5217\u7cbe\u7ec6\u7c92\u5ea6\u7684\u5ea6\u91cf\u6807\u51c6\uff0c\u5c06\u8bc4\u4f30\u7ec6\u5206\u4e3a\u5404\u79cd\u9519\u8bef\u7c7b\u578b\uff0c\u5982\u5e7b\u89c9\u9519\u8bef\u3001\u53ef\u7528\u6027\u9519\u8bef\u3001\u4e0d\u540c\u7c7b\u578b\u89c4\u5212\u9519\u8bef\u7b49\u3002 \u6574\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u7684\u57fa\u51c6\u63d0\u4f9b\u4e86\u5bf9LLMs\u5728\u4e0d\u540c\u5b50\u4efb\u52a1\u4e0a\u7684\u5168\u9762\u8bc4\u4f30\uff0c\u63ed\u793a\u4e86LLM\u9a71\u52a8\u7684\u5b9e\u4f53\u5316\u4eba\u5de5\u667a\u80fd\u7cfb\u7edf\u7684\u5f3a\u9879\u4e0e\u5f31\u70b9\uff0c\u5e76\u4e3a\u6709\u6548\u548c\u9009\u62e9\u6027\u5730\u5229\u7528LLMs\u5728\u5b9e\u4f53\u5316\u51b3\u7b56\u4e2d\u63d0\u4f9b\u4e86\u89c1\u89e3\u3002**|\n", "2410.07163": "|**2024-10-09**|**Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning**|Chongyu Fan et.al.|[2410.07163](http://arxiv.org/abs/2410.07163)|**[link](https://github.com/OPTML-Group/Unlearn-Simple)**|\u672c\u6587\u65e8\u5728\u89e3\u51b3\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u53bb\u5b66\u4e60\u95ee\u9898\uff0c\u5373\u5728\u4e0d\u91cd\u65b0\u4ece\u5934\u8bad\u7ec3\u7684\u60c5\u51b5\u4e0b\uff0c\u6d88\u9664\u4e0d\u9700\u8981\u7684\u6570\u636e\u5f71\u54cd\u4ee5\u53ca\u76f8\u5173\u6a21\u578b\u80fd\u529b\uff08\u5982\u7248\u6743\u6570\u636e\u6216\u6709\u5bb3\u5185\u5bb9\u751f\u6210\uff09\uff0c\u540c\u65f6\u4fdd\u7559\u5fc5\u8981\u7684\u6a21\u578b\u529f\u80fd\u3002\u5c3d\u7ba1\u5bf9LLM\u53bb\u5b66\u4e60\u7684\u9700\u6c42\u65e5\u76ca\u589e\u957f\uff0c\u4f46\u5c1a\u672a\u5f62\u6210\u4e00\u79cd\u539f\u7406\u6027\u7684\u4f18\u5316\u6846\u67b6\u3002 
\u4e3a\u6b64\uff0c\u6211\u4eec\u56de\u987e\u4e86\u5f53\u524d\u6700\u5148\u8fdb\u7684\u65b9\u6cd5\u2014\u2014\u8d1f\u504f\u597d\u4f18\u5316\uff08NPO\uff09\uff0c\u5e76\u53d1\u73b0\u4e86\u53c2\u8003\u6a21\u578b\u504f\u89c1\u7684\u95ee\u9898\uff0c\u8fd9\u53ef\u80fd\u524a\u5f31NPO\u7684\u6709\u6548\u6027\uff0c\u7279\u522b\u662f\u5728\u53bb\u5b66\u4e60\u4e0d\u540c\u96be\u5ea6\u6570\u636e\u65f6\u3002\u9274\u4e8e\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u7b80\u5355\u800c\u6709\u6548\u7684\u53bb\u5b66\u4e60\u4f18\u5316\u6846\u67b6\u2014\u2014SimNPO\uff0c\u8868\u660e\u901a\u8fc7\u7b80\u5355\u7684\u504f\u597d\u4f18\u5316\u51cf\u5c11\u5bf9\u53c2\u8003\u6a21\u578b\u7684\u4f9d\u8d56\uff08\u4ece\u7b80\u5316\u89c6\u89d2\u6765\u770b\uff09\u6709\u52a9\u4e8e\u53bb\u5b66\u4e60\u8fc7\u7a0b\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63d0\u4f9b\u4e86\u6df1\u5165\u7684SimNPO\u4f18\u52bf\u5206\u6790\uff0c\u901a\u8fc7\u6df7\u5408\u9a6c\u5c14\u53ef\u592b\u94fe\u7684\u5206\u6790\u65b9\u6cd5\u652f\u6301\u8fd9\u4e00\u89c2\u70b9\u3002 \u6211\u4eec\u901a\u8fc7\u5728TOFU\u548cMUSE\u7b49\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u7684\u5927\u91cf\u5b9e\u9a8c\u9a8c\u8bc1\u4e86SimNPO\u76f8\u5bf9\u4e8e\u73b0\u6709\u53bb\u5b66\u4e60\u57fa\u7ebf\u7684\u4f18\u8d8a\u6027\uff0c\u5e76\u5c55\u793a\u4e86\u5176\u5bf9\u91cd\u65b0\u5b66\u4e60\u653b\u51fb\u7684\u9c81\u68d2\u6027\u3002\u6240\u6709\u4ee3\u7801\u5747\u53ef\u5728GitHub\u4e0a\u7684https://github.com/OPTML-Group/Unlearn-Simple\u83b7\u53d6\u3002|\n", "2410.07155": "|**2024-10-09**|**Trans4D: Realistic Geometry-Aware Transition for Compositional Text-to-4D Synthesis**|Bohan Zeng 
et.al.|[2410.07155](http://arxiv.org/abs/2410.07155)|**[link](https://github.com/yangling0818/trans4d)**|**\u8fd1\u671f\u5728\u6269\u6563\u6a21\u578b\u9886\u57df\u7684\u8fdb\u5c55\u5c55\u793a\u4e86\u5176\u5728\u56fe\u50cf\u548c\u89c6\u9891\u751f\u6210\u65b9\u9762\u7684\u5353\u8d8a\u80fd\u529b\uff0c\u8fdb\u4e00\u6b65\u63d0\u5347\u4e864D\u5408\u6210\u7684\u6709\u6548\u6027\u3002\u73b0\u6709\u76844D\u751f\u6210\u65b9\u6cd5\u80fd\u591f\u6839\u636e\u7528\u6237\u53cb\u597d\u7684\u6761\u4ef6\u751f\u6210\u9ad8\u8d28\u91cf\u76844D\u5bf9\u8c61\u6216\u573a\u666f\uff0c\u5bf9\u6e38\u620f\u548c\u89c6\u9891\u884c\u4e1a\u5927\u6709\u88e8\u76ca\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u5728\u5408\u6210\u590d\u67424D\u8fc7\u6e21\u548c\u573a\u666f\u5185\u5bf9\u8c61\u4ea4\u4e92\u7684\u663e\u8457\u53d8\u5f62\u65b9\u9762\u4ecd\u5b58\u5728\u6311\u6218\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aTrans4D\u7684\u521b\u65b0\u6587\u672c\u52304D\u5408\u6210\u6846\u67b6\uff0c\u65e8\u5728\u5b9e\u73b0\u771f\u5b9e\u53ef\u4fe1\u7684\u573a\u666f\u7ea7\u590d\u6742\u8fc7\u6e21\u3002\u5177\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u9996\u5148\u5229\u7528\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u751f\u6210\u7269\u7406\u610f\u8bc6\u7684\u573a\u666f\u63cf\u8ff0\u4ee5\u8fdb\u884c4D\u573a\u666f\u521d\u59cb\u5316\u4ee5\u53ca\u6709\u6548\u8fc7\u6e21\u65f6\u95f4\u89c4\u5212\u3002\u968f\u540e\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u51e0\u4f55\u611f\u77e5\u76844D\u8fc7\u6e21\u7f51\u7edc\uff0c\u57fa\u4e8e\u8ba1\u5212\u5b9e\u73b0\u590d\u6742\u7684\u573a\u666f\u7ea74D\u8fc7\u6e21\uff0c\u6d89\u53ca\u8868\u73b0\u529b\u5f3a\u7684\u5bf9\u8c61\u51e0\u4f55\u53d8\u5f62\u3002\u5e7f\u6cdb\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cTrans4D\u5728\u751f\u6210\u5177\u6709\u51c6\u786e\u6027\u548c\u9ad8\u8d28\u91cf\u8fc7\u6e21\u76844D\u573a\u666f\u65b9\u9762\u59cb\u7ec8\u8d85\u8d8a\u73b0\u6709\u6700\u5148\u8fdb\u7684\u65b9\u6cd5\u
ff0c\u9a8c\u8bc1\u4e86\u5176\u6709\u6548\u6027\u3002\u4ee3\u7801\uff1ahttps://github.com/YangLing0818/Trans4D**|\n", "2410.07129": "|**2024-10-09**|**Mental Disorders Detection in the Era of Large Language Models**|Gleb Kuzmin et.al.|[2410.07129](http://arxiv.org/abs/2410.07129)|null|\u672c\u6587\u6bd4\u8f83\u4e86\u4f20\u7edf\u673a\u5668\u5b66\u4e60\u65b9\u6cd5\u3001\u7f16\u7801\u5668\u57fa\u6a21\u578b\u4ee5\u53ca\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u6291\u90c1\u75c7\u548c\u7126\u8651\u75c7\u68c0\u6d4b\u4efb\u52a1\u4e0a\u7684\u6548\u679c\u3002\u8003\u8651\u4e86\u4e94\u4e2a\u4e0d\u540c\u683c\u5f0f\u7684\u6570\u636e\u5e93\uff0c\u6bcf\u4e2a\u6570\u636e\u5e93\u90fd\u91c7\u7528\u4e86\u4e0d\u540c\u7684\u65b9\u6cd5\u6765\u5b9a\u4e49\u76ee\u6807\u75c5\u7406\u5b66\u7c7b\u522b\u3002\u6211\u4eec\u6d4b\u8bd5\u4e86\u57fa\u4e8e\u8bed\u8a00\u7279\u5f81\u7684AutoML\u6a21\u578b\u3001\u591a\u79cd\u53d8\u4f53\u7684Transformer\u7f16\u7801\u5668\uff0c\u5982BERT\uff0c\u4ee5\u53ca\u6700\u5148\u8fdb\u7684LLM\u4f5c\u4e3a\u75c5\u7406\u5206\u7c7b\u6a21\u578b\u3002\u7ed3\u679c\u8868\u660e\uff0cLLM\u5728\u566a\u58f0\u5927\u4e14\u8bad\u7ec3\u6837\u672c\u5728\u6587\u672c\u957f\u5ea6\u548c\u7c7b\u578b\u4e0a\u5dee\u5f02\u663e\u8457\u7684\u5c0f\u6570\u636e\u96c6\u4e0a\u8868\u73b0\u51fa\u8272\u3002\u7136\u800c\uff0c\u5f53\u5728\u786e\u8bca\u4e3a\u6291\u90c1\u75c7\u4e2a\u4f53\u7684\u6587\u672c\u4e0a\u8fdb\u884c\u8bad\u7ec3\u65f6\uff0c\u8bed\u8a00\u6a21\u578b\u7684\u6027\u80fd\u4f18\u4e8e\u4f20\u7edf\u7684\u5fc3\u7406\u8bed\u8a00\u5b66\u7279\u5f81\u548c\u7f16\u7801\u5668\u57fa\u6a21\u578b\uff0c\u8fd9\u51f8\u663e\u4e86\u5b83\u4eec\u5728\u7279\u5b9a\u4e34\u5e8a\u5e94\u7528\u4e2d\u7684\u6f5c\u529b\u3002|\n", "2410.07113": "|**2024-10-09**|**Personalized Visual Instruction Tuning**|Renjie Pi 
et.al.|[2410.07113](http://arxiv.org/abs/2410.07113)|**[link](https://github.com/sterzhang/pvit)**|Recent multimodal large language models (MLLMs) have made significant progress, yet they show a notable limitation: 'face blindness'. Specifically, they can hold general conversations but cannot conduct personalized dialogues about specific individuals. This deficiency hinders the use of MLLMs in personalized settings, such as customized visual assistants on mobile devices or domestic robots that need to recognize family members. To this end, this paper proposes Personalized Visual Instruction Tuning (PVIT), a novel data curation and training framework designed to enable MLLMs to identify target individuals in an image and hold personalized, coherent dialogues. Our approach involves developing a sophisticated pipeline that autonomously generates training data containing personalized conversations. The pipeline draws on the capabilities of various visual experts, image generation models, and (multimodal) large language models. To evaluate the personalization potential of MLLMs, we design a benchmark called P-Bench, which includes 
multiple question types at different difficulty levels. Experiments show that personalized performance improves substantially after fine-tuning on our curated dataset.|\n", "2410.07109": "|**2024-10-09**|**I Want to Break Free! Anti-Social Behavior and Persuasion Ability of LLMs in Multi-Agent Settings with Social Hierarchy**|Gian Maria Campedelli et.al.|[2410.07109](http://arxiv.org/abs/2410.07109)|**[link](https://github.com/mobs-fbk/llm_interaction_simulator)**|As agents driven by large language models (LLMs) become increasingly autonomous and interact more freely with one another, studying their interaction patterns becomes crucial for anticipating emergent phenomena and potential risks. Inspired by the Stanford Prison Experiment, this paper studies LLM interaction patterns in a multi-agent setting characterized by a strict social hierarchy. The study focuses on two main phenomena, persuasion and anti-social behavior, in simulated scenarios involving a guard agent and a prisoner agent who tries to achieve a specific goal (such as obtaining additional yard time or escaping from prison). Using 200 experimental scenarios, for a total of 2,000 machine-to-machine conversations across five popular LLMs, the study reports the following notable findings: 1. 
Some models consistently fail in the multi-agent setup and cannot carry out a meaningful conversation. 2. For the models that interact successfully, the assigned goal has a significant effect on an agent's persuasiveness but a negligible effect on its anti-social behavior. 3. Agents' personas, and the guard's personality in particular, directly drive both the likelihood that the prisoner's persuasion succeeds and the emergence of anti-social behavior. 4. Even without explicit prompting for specific personalities, anti-social behavior emerges simply from assigning roles. These results have important implications for the development of interactive LLM agents and for the debate on their societal impact.|\n", "2410.07103": "|**2024-10-09**|**Unleashing Multi-Hop Reasoning Potential in Large Language Models through Repetition of Misordered Context**|Sangwon Yu 
et.al.|[2410.07103](http://arxiv.org/abs/2410.07103)|null|In multi-hop reasoning, large language models (LLMs) face the challenge of reasoning over multiple steps based on the supporting documents within a given context. LLMs often struggle to filter out irrelevant documents, and their performance is highly sensitive to the position of the supporting documents in the context. In this paper, we identify an additional challenge: LLM performance is also highly sensitive to the order in which the supporting documents are presented. We call this the 'misordered context problem'. To address it, we propose a simple yet effective method, context repetition (CoRe), which prompts the model with the context repeated so that the supporting documents are presented in the optimal order. With CoRe, we improve the F1 score by up to 30% on multi-hop question answering tasks and accuracy by up to 70% on a synthetic task. In addition, CoRe helps mitigate the widespread 'lost-in-the-middle' problem in LLMs and can be combined effectively with retrieval-based methods that use chain-of-thought (CoT) reasoning.|\n", "2410.08202": "|**2024-10-10**|**Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous 
Visual Pre-training**|Gen Luo et.al.|[2410.08202](http://arxiv.org/abs/2410.08202)|null|With the rapid development of large language models (LLMs), there is growing interest in extending their capabilities to multimodal tasks. Among these efforts, monolithic multimodal large language models (MLLMs), which integrate visual encoding and language decoding, have attracted wide attention. Although monolithic MLLMs are structurally simple and easy to deploy, training one to competitive performance remains challenging. Popular strategies use continued pre-training to extend a pre-trained LLM into a monolithic MLLM, which causes catastrophic forgetting and degrades performance. 
This paper aims to overcome this limitation from the perspective of incremental learning. Specifically, our core idea is to embed visual parameters into a pre-trained LLM and to learn visual knowledge progressively from large amounts of data through an incremental learning mechanism, that is, freezing the LLM while optimizing the visual parameters. Based on this principle, we propose Mono-InternVL, a novel monolithic MLLM that seamlessly integrates a set of visual experts through a multimodal mixture-of-experts structure. We also propose an innovative pre-training strategy to maximize the visual capability of Mono-InternVL, namely Endogenous Visual Pre-training (EViP). EViP is designed as a progressive learning process for the visual experts that aims to fully exploit visual knowledge ranging from low-quality to high-quality data. To validate our approach, we conduct extensive experiments on 16 benchmarks. The results not only confirm the superior performance of Mono-InternVL over the current state-of-the-art monolithic MLLM on 6 multimodal benchmarks, for example a +113-point advantage on OCRBench, but also confirm its better deployment efficiency, with first-token latency reduced by up to 67%.|\n", "2410.08197": 
"|**2024-10-10**|**From Exploration to Mastery: Enabling LLMs to Master Tools via Self-Driven Interactions**|Changle Qu et.al.|[2410.08197](http://arxiv.org/abs/2410.08197)|**[link](https://github.com/quchangle1/DRAFT)**|**This paper addresses the comprehension gap that arises when large language models (LLMs) interact with external tools, a gap that stems from the incompleteness and inaccuracy of existing human-oriented tool documentation. We propose DRAFT, a new framework that dynamically refines tool documentation by analyzing the feedback and trajectories produced when LLMs interact with external tools. The method is built on an innovative trial-and-error learning process with three phases, experience gathering, learning from experience, and documentation rewriting, which iteratively improves the quality of the tool documentation. 
To ensure diverse exploration and avoid overfitting, DRAFT also adopts a diversity-promoting exploration strategy and a tool-adaptive termination mechanism that improves efficiency. Experiments on multiple datasets show that DRAFT's iterative, feedback-based refinement significantly improves documentation quality and fosters a deeper understanding and more effective use of tools by LLMs. Our analysis further reveals that tool documentation refined with this approach generalizes robustly across models.**|\n", "2410.08196": "|**2024-10-10**|**MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code**|Zimu Lu 
et.al.|[2410.08196](http://arxiv.org/abs/2410.08196)|**[link](https://github.com/mathllm/mathcoder2)**|**This paper introduces a novel method for generating mathematical code accompanied by reasoning steps for continued pretraining. Our approach first builds a high-quality mathematical continued-pretraining dataset by combining math-related web data, code that uses mathematical packages, math textbooks, and synthetic data. Next, we construct reasoning steps by extracting LaTeX expressions, the conditions the expressions require, and their results. Based on this extracted information, we generate corresponding code that accurately captures the mathematical reasoning process. We append the generated code to each reasoning step, forming pairs of natural-language reasoning steps and their corresponding code. Combining this data with the original dataset yields a high-performing mathematical pretraining corpus of 19.2B tokens, which we name MathCode-Pile. Training several popular base models on this corpus significantly improves their mathematical abilities and produces the MathCoder2 family of models. All data processing and training code is open-sourced, ensuring the transparency and reproducibility of the entire data 
collection and training pipeline. The code is released at https://github.com/mathllm/MathCoder2.**|\n", "2410.08193": "|**2024-10-10**|**GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment**|Yuancheng Xu et.al.|[2410.08193](http://arxiv.org/abs/2410.08193)|null|Large language models (LLMs) show impressive capabilities but require careful alignment with human preferences. Traditional training-time methods fine-tune LLMs on human preference datasets, but they incur significant training costs and require repeated training to handle diverse user preferences. Test-time alignment methods address this by using reward models (RMs) to guide a frozen LLM without retraining. However, existing test-time methods rely on trajectory-level RMs, which are designed to evaluate complete responses and are therefore unsuitable for autoregressive text generation, where next-token rewards must be computed from partial responses. 
To address this challenge, we introduce GenARM, a test-time alignment method that uses the Autoregressive Reward Model, a novel reward parametrization designed to predict next-token rewards during autoregressive generation so that generation is both efficient and effective. Theoretically, we prove that this parametrization can guide a frozen LLM toward any distribution achievable by traditional RMs within the KL-regularized reinforcement learning framework. Experiments show that GenARM significantly outperforms prior test-time alignment baselines and matches the performance of training-time methods. In addition, GenARM supports weak-to-strong guidance, aligning a larger LLM with a smaller RM without training the larger model, which reduces cost. GenARM also supports multi-objective alignment, allowing real-time trade-offs between preference dimensions to meet different user needs without retraining.|\n", "2410.08174": "|**2024-10-10**|**Sample then Identify: A General Framework for Risk Control and Assessment in Multimodal Large Language Models**|Qingni Wang 
et.al.|[2410.08174](http://arxiv.org/abs/2410.08174)|null|This paper proposes TRON, a two-step framework for risk control and assessment that applies to any multimodal large language model (MLLM) supporting sampling in both open-ended and closed-ended scenarios. TRON has two main components: (1) a novel conformal score for sampling response sets of minimum size, and (2) a nonconformity score based on self-consistency theory, which controls the error rate through two specified risk levels. In addition, this work is the first to examine semantic redundancy in prediction sets in open-ended scenarios, and on that basis it proposes a new metric for evaluating MLLMs: average set size. Through comprehensive experiments with eight MLLMs on four video question answering (VideoQA) datasets, we show that TRON achieves the desired error rates within the user-specified risk levels. Meanwhile, deduplicated prediction sets remain adaptive while providing more efficient and stable risk assessment across different risk levels.|\n", "2410.08172": "|**2024-10-10**|**On the Evaluation of Generative Robotic Simulations**|Feng Chen 
et.al.|[2410.08172](http://arxiv.org/abs/2410.08172)|null|Because real-world data is difficult to acquire, robot simulation has become crucial for parallel training and sim-to-real transfer, which highlights the importance of scalable simulated robotic tasks. Foundation models have shown impressive abilities in autonomously generating feasible robotic tasks. However, this new paradigm brings the challenge of evaluating these autonomously generated tasks. To address this, we propose a comprehensive evaluation framework for generative simulations. Our framework divides evaluation into three core aspects: quality, diversity, and generalization. For single-task quality, we use large language models and vision-language models to evaluate the realism of the generated task and the completeness of the generated trajectories. For diversity, we measure task and data diversity through the text similarity of task descriptions and the loss of a world model trained on collected task trajectories. For task-level generalization, we evaluate the zero-shot generalization to unseen tasks of a policy trained on multiple generated tasks. Experiments on three representative task generation 
pipelines show that our framework's evaluation results are highly consistent with human evaluations, confirming the feasibility and validity of our approach. The study finds that while some methods achieve good quality and diversity metrics, no single method excels on all metrics, which suggests that more attention should be paid to balancing these metrics. Our analysis further highlights a challenge common to current work: low generalization capability. Anonymous website: https://sites.google.com/view/evaltasks|\n", "2410.08164": "|**2024-10-10**|**Agent S: An Open Agentic Framework that Uses Computers Like a Human**|Saaket Agashe et.al.|[2410.08164](http://arxiv.org/abs/2410.08164)|**[link](https://github.com/simular-ai/agent-s)**|**We present Agent S, an open agentic framework that interacts autonomously with computers through a graphical user interface (GUI) and aims to transform human-computer interaction by automating complex, multi-step tasks. Agent S addresses three key challenges in automating computer tasks: acquiring domain-specific knowledge, planning over long task horizons, and handling dynamic, non-uniform interfaces. To this end, Agent 
S introduces experience-augmented hierarchical planning, which combines external knowledge search and internal experience retrieval at multiple levels to enable efficient task planning and subtask execution. It also employs an Agent-Computer Interface (ACI) to better elicit the reasoning and control capabilities of GUI agents based on multimodal large language models (MLLMs). Evaluation on the OSWorld benchmark shows that Agent S improves the success rate by 9.37% over the baseline (an 83.6% relative improvement) and sets a new state of the art. A comprehensive analysis highlights the effectiveness of the individual components and offers insights for future improvement. Agent S also generalizes broadly to different operating systems on the newly released WindowsAgentArena benchmark. For more information on the code, see https://github.com/simular-ai/Agent-S.**|\n", "2410.08146": "|**2024-10-10**|**Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning**|Amrith Setlur 
et.al.|[2410.08146](http://arxiv.org/abs/2410.08146)|null|A promising approach to improving the reasoning of large language models is to use process reward models (PRMs). Compared with outcome reward models (ORMs), which give feedback only at the final step, PRMs give feedback at every step of a multi-step reasoning trace, which may improve credit assignment. However, collecting dense, per-step human labels does not scale, and training PRMs from automatically labeled data has so far led to limited gains. To improve a base policy by running search against a PRM, or by using it as a dense reward for reinforcement learning (RL), we ask: 'How should we design process rewards?' Our key insight is that, to be effective, a step-level reward should measure progress: the change in the likelihood of producing a correct response before and after taking the step, which corresponds to the notion of step-level advantage in RL. Crucially, this progress should be measured under a prover policy that is distinct from the base policy. We theoretically characterize the set of good provers, and our results show that optimizing process 
rewards from such provers improves exploration during test-time search and online RL. In fact, our characterization shows that weak prover policies can substantially improve a stronger base policy, which we also observe empirically. We validate our claims by training process advantage verifiers (PAVs) to predict progress under such provers, and show that, compared with ORMs, test-time search against PAVs is more than 8% more accurate and 1.5 to 5 times more compute-efficient. Online RL with dense rewards from PAVs yields one of the first results with a 5-6x gain in sample efficiency and a gain of more than 6% in accuracy over ORMs.|\n", "2410.08145": "|**2024-10-10**|**Insight Over Sight? 
Exploring the Vision-Knowledge Conflicts in Multimodal LLMs**|Xiaoyuan Liu et.al.|[2410.08145](http://arxiv.org/abs/2410.08145)|null|This paper examines conflicts between visual information and a model's internal commonsense knowledge in multimodal large language models (MLLMs). The study finds that, in certain cases, MLLMs may base their decisions on the textual query rather than the visual input, leading to commonsense-level vision-knowledge conflicts. To study this problem, we design an automated pipeline, complemented by human quality control, and build a benchmark for simulating and evaluating such conflicts. The benchmark contains 374 original images and 1,122 high-quality question-answer pairs, covering two types of conflict target and three question difficulty levels, and provides a tool for comprehensive model evaluation. Using this benchmark, we evaluate nine representative MLLMs and find that they rely heavily on text when handling conflicts between visual input and commonsense knowledge. 
Based on this finding, we propose a new prompting strategy, 'Focus-on-Vision' (FoV), which aims to strengthen a model's ability to prioritize visual input when it encounters a conflict and thereby reduce its reliance on contradictory textual information. Our analysis and the proposed strategy are important for understanding and mitigating vision-knowledge conflicts in MLLMs. The paper also makes its dataset and code publicly available to support further research and applications by the community.|\n", "2410.08143": "|**2024-10-10**|**DelTA: An Online Document-Level Translation Agent Based on Multi-Level Memory**|Yutong Wang et.al.|[2410.08143](http://arxiv.org/abs/2410.08143)|**[link](https://github.com/yutongwang1216/docmtagent)**|**Large language models (LLMs) have achieved considerable quality improvements in machine translation (MT). However, most current MT-LLM research still struggles to maintain translation consistency and accuracy when processing entire documents. This paper introduces DelTA, a document-level translation agent designed to overcome these limitations. DelTA features a multi-level memory structure that stores information at different granularities and spans, including proper noun records, a bilingual summary, long-term memory, and short-term 
memory, all of which are continuously retrieved and updated by auxiliary LLM-based components. Experiments show that, across four open-source and closed-source LLMs and two representative document translation datasets, DelTA significantly outperforms strong baselines in both translation consistency and quality, raising consistency scores by up to 4.58 percentage points and COMET scores by up to 3.16 points on average. DelTA uses a sentence-by-sentence translation strategy, which ensures that no sentence is omitted and is more memory-efficient than mainstream methods. DelTA also improves pronoun translation accuracy, and the agent's summary component shows promise as a tool for query-based summarization tasks. We release our code and data at https://github.com/YutongWang1216/DocMTAgent.**|\n", "2410.09040": "|**2024-10-11**|**AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation**|Zijun Wang 
et.al.|[2410.09040](http://arxiv.org/abs/2410.09040)|**[link](https://github.com/ucsc-vlaa/attngcg-attack)**|**\u672c\u6587\u7814\u7a76\u4e86\u57fa\u4e8e\u8f6c\u6362\u5668\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u53d7\u5230\u56da\u7981\u653b\u51fb\u7684\u8106\u5f31\u6027\uff0c\u7279\u522b\u5173\u6ce8\u57fa\u4e8e\u4f18\u5316\u7684\u8d2a\u5a6a\u5750\u6807\u68af\u5ea6\uff08GCG\uff09\u7b56\u7565\u3002\u6211\u4eec\u9996\u5148\u89c2\u5bdf\u5230\u653b\u51fb\u7684\u6709\u6548\u6027\u4e0e\u6a21\u578b\u5185\u90e8\u884c\u4e3a\u4e4b\u95f4\u5b58\u5728\u6b63\u76f8\u5173\u5173\u7cfb\u3002\u4f8b\u5982\uff0c\u5f53\u6a21\u578b\u5bf9\u65e8\u5728\u786e\u4fddLLM\u5b89\u5168\u5bf9\u9f50\u7684\u7cfb\u7edf\u63d0\u793a\u7ed9\u4e88\u66f4\u591a\u5173\u6ce8\u65f6\uff0c\u653b\u51fb\u5f80\u5f80\u6548\u679c\u8f83\u5dee\u3002\u5728\u6b64\u57fa\u7840\u4e0a\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u589e\u5f3a\u65b9\u6cd5\uff0c\u901a\u8fc7\u64cd\u7eb5\u6a21\u578b\u7684\u6ce8\u610f\u529b\u5206\u6570\u6765\u4fc3\u8fdbLLM\u7684\u56da\u7981\uff0c\u6211\u4eec\u5c06\u5176\u547d\u540d\u4e3aAttnGCG\u3002\u5b9e\u9a8c\u4e0a\uff0cAttnGCG\u5728\u5404\u79cdLLMs\u4e0a\u8868\u73b0\u51fa\u4e00\u81f4\u7684\u6539\u8fdb\uff0c\u5728Llama-2\u7cfb\u5217\u4e2d\u5e73\u5747\u63d0\u9ad8\u4e86\u7ea67%\uff0c\u5728Gemma\u7cfb\u5217\u4e2d\u63d0\u9ad8\u4e86\u7ea610%\u3002\u6211\u4eec\u7684\u7b56\u7565\u8fd8\u5c55\u793a\u4e86\u9488\u5bf9\u672a\u89c1\u8fc7\u7684\u6709\u5bb3\u76ee\u6807\u548c\u9ed1\u76d2LLMs\uff08\u5982GPT-3.5\u548cGPT-4\uff09\u7684\u7a33\u5065\u653b\u51fb\u8f6c\u79fb\u80fd\u529b\u3002\u6b64\u5916\uff0c\u6211\u4eec\u6ce8\u610f\u5230\u6211\u4eec\u7684\u6ce8\u610f\u529b\u5206\u6570\u53ef\u89c6\u5316\u66f4\u6613\u4e8e\u89e3\u91ca\uff0c\u4f7f\u6211\u4eec\u80fd\u591f\u66f4\u597d\u5730\u4e86\u89e3\u5982\u4f55\u901a\u8fc7\u6709\u9488\u5bf9\u6027\u7684\u6ce8\u610f\u529b\u64cd\u7eb5\u5b9e\u73b0\u66f4\u6709\u6548\u7684\u56da\u7981\u3002\u6211\u4eec\u53d1\u5e03\u4e86\u4ee3\u7801\uff0c\u53ef\u5728https:/
/github.com/UCSC-VLAA/AttnGCG-attack\u4e2d\u83b7\u53d6\u3002**|\n", "2410.09039": "|**2024-10-11**|**Semi-Supervised Learning of Noisy Mixture of Experts Models**|Oh-Ran Kwon et.al.|[2410.09039](http://arxiv.org/abs/2410.09039)|null|\u6df7\u5408\u4e13\u5bb6\uff08MoE\uff09\u6a21\u578b\u662f\u4e00\u4e2a\u7075\u6d3b\u7684\u9884\u6d4b\u5efa\u6a21\u6846\u67b6\uff0c\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u65f6\u4ee3\u91cd\u65b0\u5f15\u8d77\u4e86\u4eba\u4eec\u7684\u5173\u6ce8\u3002\u4e00\u4e2a\u7531\u9884\u6d4b\u201c\u4e13\u5bb6\u201d\u7ec4\u6210\u7684\u96c6\u5408\u4e0e\u63a7\u5236\u5728\u9884\u6d4b\u65f6\u6bcf\u4e2a\u4e13\u5bb6\u5f71\u54cd\u529b\u7684\u201c\u95e8\u63a7\u51fd\u6570\u201d\u5171\u540c\u5b66\u4e60\u3002\u8fd9\u79cd\u7ed3\u6784\u5141\u8bb8\u76f8\u5bf9\u7b80\u5355\u7684\u6a21\u578b\u5728\u590d\u6742\u3001\u5f02\u6784\u7684\u6570\u636e\u73af\u5883\u4e2d\u8868\u73b0\u51fa\u8272\u3002\u5728\u5f53\u4eca\u8bb8\u591a\u5e94\u7528\u573a\u666f\u4e2d\uff0c\u672a\u6807\u8bb0\u6570\u636e\u5e7f\u6cdb\u53ef\u7528\u800c\u6807\u6ce8\u6570\u636e\u5374\u96be\u4ee5\u83b7\u53d6\u3002\u534a\u76d1\u7763\u5b66\u4e60\u65b9\u6cd5\u65e8\u5728\u5229\u7528\u672a\u6807\u8bb0\u6570\u636e\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u7528\u4e8eMoE\u6a21\u578b\u534a\u76d1\u7763\u5b66\u4e60\u7684\u65b0\u65b9\u6cd5\u3002\u6211\u4eec\u4ece\u6d77\u6d0b\u5b66\u5bb6\u5f00\u53d1\u7684\u4e00\u79cd\u5047\u8bbe\u5f3a\u70c8\u7684\u534a\u76d1\u7763MoE\u6a21\u578b\u5f00\u59cb\uff0c\u8be5\u6a21\u578b\u5047\u8bbe\u672a\u6807\u6ce8\u6570\u636e\u4e2d\u7684\u6f5c\u5728\u805a\u7c7b\u7ed3\u6784\u76f4\u63a5\u6620\u5c04\u5230\u76d1\u7763\u4efb\u52a1\u4e2d\u6bcf\u4e2a\u4e13\u5bb6\u5e94\u7ed9\u4e88\u7684\u5f71\u54cd\u3002\u6211\u4eec\u653e\u677e\u4e86\u8fd9\u4e00\u5047\u8bbe\uff0c\u8bbe\u60f3\u4e24\u8005\u4e4b\u95f4\u5b58\u5728\u566a\u58f0\u8fde\u63a5\uff0c\u5e76\u57fa\u4e8e\u6700\u5c0f\u5316\u5254\u9664\u5e73\u65b9\u7b97\u6cd5\u63d0\u51fa\u4e86\u4e00\u79cd\u7b97\u6cd5\uff0c\u5373\u4f7f\u5b58\u5728\u6570
\u636e\u9519\u4f4d\u4e5f\u80fd\u6210\u529f\u3002\u6211\u4eec\u7684\u7406\u8bba\u5206\u6790\u786e\u5b9a\u4e86\u8be5\u65b9\u6cd5\u80fd\u591f\u4ea7\u751f\u63a5\u8fd1\u53c2\u6570\u7387\u6536\u655b\u4f30\u8ba1\u5668\u7684\u6761\u4ef6\u3002\u6a21\u62df\u548c\u771f\u5b9e\u6570\u636e\u793a\u4f8b\u8bc1\u660e\u4e86\u8be5\u65b9\u6cd5\u7684\u6709\u6548\u6027\u3002|\n", "2410.09038": "|**2024-10-11**|**SimpleStrat: Diversifying Language Model Generation with Stratification**|Justin Wong et.al.|[2410.09038](http://arxiv.org/abs/2410.09038)|null|\u751f\u6210\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u591a\u6837\u5316\u54cd\u5e94\u5bf9\u4e8e\u89c4\u5212/\u641c\u7d22\u548c\u5408\u6210\u6570\u636e\u751f\u6210\u7b49\u5e94\u7528\u81f3\u5173\u91cd\u8981\u3002\u8fd9\u4e9b\u5e94\u7528\u9700\u8981\u5728\u751f\u6210\u8fc7\u7a0b\u4e2d\u63d0\u4f9b\u591a\u6837\u5316\u7684\u7b54\u6848\uff0c\u4ee5\u4fbf\u5728\u6bcf\u6b21\u751f\u6210\u65f6\u90fd\u80fd\u5f97\u5230\u4e0d\u540c\u7684\u7ed3\u679c\u3002\u4e4b\u524d\u7684\u65b9\u6cd5\u901a\u5e38\u4f9d\u8d56\u4e8e\u589e\u52a0\u6e29\u5ea6\u6765\u63d0\u9ad8\u591a\u6837\u6027\u3002\u7136\u800c\uff0c\u4e0e\u666e\u904d\u8ba4\u8bc6\u76f8\u53cd\uff0c\u6211\u4eec\u53d1\u73b0\u8fd9\u79cd\u65b9\u6cd5\u4e0d\u4ec5\u4f1a\u5bfc\u81f4\u968f\u7740\u6e29\u5ea6\u589e\u52a0\uff0c\u4e2a\u4f53\u751f\u6210\u7684\u8d28\u91cf\u964d\u4f4e\uff0c\u800c\u4e14\u5176\u6709\u6548\u6027\u8fd8\u53d6\u51b3\u4e8e\u6a21\u578b\u7684\u4e0b\u4e00\u4e2a\u8bcd\u6982\u7387\u4e0e\u771f\u5b9e\u7b54\u6848\u5206\u5e03\u7684\u76f8\u4f3c\u6027\u3002 
\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201cSimpleStrat\u201d\u7684\u66ff\u4ee3\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u5229\u7528\u8bed\u8a00\u6a21\u578b\u672c\u8eab\u5bf9\u7a7a\u95f4\u8fdb\u884c\u5206\u533a\u3002\u5728\u63a8\u7406\u9636\u6bb5\uff0c\u968f\u673a\u9009\u62e9\u4e00\u4e2a\u5206\u533a\u5e76\u5728\u5176\u4e2d\u62bd\u53d6\u6837\u672c\u3002\u4e3a\u4e86\u8861\u91cf\u591a\u6837\u6027\uff0c\u6211\u4eec\u5f15\u5165\u4e86CoverageQA\u6570\u636e\u96c6\uff0c\u5b83\u5305\u542b\u4e86\u5177\u6709\u591a\u4e2a\u540c\u7b49\u53ef\u80fd\u7b54\u6848\u7684\u672a\u6307\u5b9a\u95ee\u9898\u3002\u901a\u8fc7\u6d4b\u91cf\u8f93\u51fa\u5206\u5e03\u4e0e\u6709\u6548\u5730\u9762\u771f\u76f8\u7b54\u6848\u7684\u5747\u5300\u5206\u5e03\u4e4b\u95f4\u7684KL\u6563\u5ea6\u6765\u8bc4\u4f30\u591a\u6837\u6027\u3002\u7531\u4e8e\u8ba1\u7b97\u4e13\u7528\u6a21\u578b\u6bcf\u6761\u54cd\u5e94/\u89e3\u51b3\u65b9\u6848\u7684\u6982\u7387\u901a\u5e38\u662f\u4e0d\u53ef\u884c\u7684\uff0c\u56e0\u6b64\u6211\u4eec\u4f7f\u7528\u53ec\u56de\u7387\u6765\u8bc4\u4f30\u5730\u771f\u7406\u89e3\u3002 \u6211\u4eec\u7684\u8bc4\u4f30\u7ed3\u679c\u663e\u793a\uff0c\u4f7f\u7528SimpleStrat\u65b9\u6cd5\u53ef\u4ee5\u5b9e\u73b0\u6bd4GPT-4o\u9ad80.05\u7684\u53ec\u56de\u7387\uff0c\u5e76\u4e14\u5e73\u5747\u51cf\u5c11\u4e860.36\u7684KL\u6563\u5ea6\u4e0eLlama 3\u76f8\u6bd4\u3002|\n", "2410.09037": "|**2024-10-11**|**Mentor-KD: Making Small Language Models Better Multi-step Reasoners**|Hojae Lee 
et.al.|[2410.09037](http://arxiv.org/abs/2410.09037)|**[link](https://github.com/2hojae/mentor-kd)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u901a\u8fc7\u5229\u7528\u94fe\u5f0f\u601d\u7ef4\uff08CoT\uff09\u63d0\u793a\u5728\u5404\u79cd\u590d\u6742\u4efb\u52a1\u4e2d\u8868\u73b0\u51fa\u975e\u51e1\u7684\u6027\u80fd\u3002\u8fd1\u671f\u7684\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u79cd\u77e5\u8bc6\u84b8\u998f\uff08KD\uff09\u65b9\u6cd5\u2014\u2014\u63a8\u7406\u84b8\u998f\uff0c\u901a\u8fc7\u5fae\u8c03\u7531LLM\u6559\u5e08\u751f\u6210\u7684\u591a\u6b65\u63a8\u7406\u8bed\u8a00\u6a21\u578b\uff0c\u5c06LLM\u7684\u63a8\u7406\u80fd\u529b\u8f6c\u79fb\u5230\u8f83\u5c0f\u7684\u6a21\u578b\u4e0a\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u7814\u7a76\u5728\u4ee5\u4e0b\u4e24\u4e2a\u65b9\u9762\u8003\u8651\u4e0d\u8db3\uff1a\u4eceLLM\u6559\u5e08\u6a21\u578b\u83b7\u53d6\u7684\u793a\u4f8b\u96c6\u8d28\u91cf\u4f4e\u548c\u8f6f\u6807\u7b7e\u63d0\u4f9b\u4e0d\u8db3\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u5bfc\u5e08-KD\u7684\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u6709\u6548\u5730\u5c06LLM\u7684\u591a\u6b65\u63a8\u7406\u80fd\u529b\u8f6c\u79fb\u5230\u8f83\u5c0f\u7684\u8bed\u8a00\u6a21\u578b\u4e0a\uff0c\u5e76\u89e3\u51b3\u4e86\u4e0a\u8ff0\u6311\u6218\u3002\u5177\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u5229\u7528\u4e00\u4e2a\u5bfc\u5e08\u2014\u2014\u7279\u5b9a\u4efb\u52a1\u7684\u4e2d\u95f4\u5927\u5c0f\u7684\u5fae\u8c03\u6a21\u578b\u2014\u2014\u6765\u589e\u52a0\u989d\u5916\u7684CoT\u6ce8\u91ca\u5e76\u4e3a\u5b66\u751f\u6a21\u578b\u63d0\u4f9b\u8f6f\u6807\u7b7e\uff0c\u4ee5\u5728\u63a8\u7406\u84b8\u998f\u8fc7\u7a0b\u4e2d\u63d0\u4f9b\u652f\u6301\u3002\u6211\u4eec\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u5b9e\u9a8c\uff0c\u5e76\u786e\u8ba4\u4e86\u5bfc\u5e08-KD\u5728\u4e0d\u540c\u6a21\u578b\u548c\u590d\u6742\u63a8\u7406\u4efb\u52a1\u4e0a\u7684\u6709\u6548\u6027\u3002**|\n", "2410.09034": "|**2024-10-11**|**PEAR: A Robust and Flexible Automation Framework for Ptychography Enabled by Multiple Large 
Language Model Agents**|Xiangyu Yin et.al.|[2410.09034](http://arxiv.org/abs/2410.09034)|null|Ptychography\u662f\u4e00\u79cd\u5728X\u5c04\u7ebf\u548c\u7535\u5b50\u663e\u5fae\u955c\u9886\u57df\u5e7f\u6cdb\u5e94\u7528\u7684\u9ad8\u7ea7\u8ba1\u7b97\u6210\u50cf\u6280\u672f\u3002\u5b83\u5728\u7269\u7406\u5b66\u3001\u5316\u5b66\u3001\u751f\u7269\u5b66\u548c\u6750\u6599\u79d1\u5b66\u7b49\u7814\u7a76\u9886\u57df\u4ee5\u53ca\u534a\u5bfc\u4f53\u8868\u5f81\u7b49\u5de5\u4e1a\u5e94\u7528\u4e2d\u88ab\u5e7f\u6cdb\u91c7\u7528\u3002\u5b9e\u8df5\u8fc7\u7a0b\u4e2d\uff0c\u83b7\u5f97\u9ad8\u8d28\u91cf\u7684ptychographic\u56fe\u50cf\u9700\u8981\u540c\u65f6\u4f18\u5316\u4f17\u591a\u5b9e\u9a8c\u548c\u7b97\u6cd5\u53c2\u6570\u3002\u4f20\u7edf\u4e0a\uff0c\u53c2\u6570\u9009\u62e9\u5f80\u5f80\u4f9d\u8d56\u4e8e\u8bd5\u9519\u6cd5\uff0c\u5bfc\u81f4\u5de5\u4f5c\u6548\u7387\u4f4e\u4e0b\uff0c\u5e76\u53ef\u80fd\u5f15\u5165\u4eba\u4e3a\u504f\u89c1\u3002\u672c\u5de5\u4f5c\u5f00\u53d1\u4e86\u201cptychographic\u5b9e\u9a8c\u4e0e\u5206\u6790\u673a\u5668\u4eba\u201d\uff08PEAR\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u81ea\u52a8\u5904\u7406ptychography\u6570\u636e\u5206\u6790\u7684\u6846\u67b6\u3002\u4e3a\u4e86\u786e\u4fdd\u9ad8\u9c81\u68d2\u6027\u548c\u51c6\u786e\u6027\uff0cPEAR\u91c7\u7528\u4e86\u591a\u4e2aLLM\u4ee3\u7406\u8fdb\u884c\u77e5\u8bc6\u68c0\u7d22\u3001\u4ee3\u7801\u751f\u6210\u3001\u53c2\u6570\u63a8\u8350\u548c\u56fe\u50cf\u63a8\u7406\u4efb\u52a1\u3002\u6211\u4eec\u7684\u7814\u7a76\u8868\u660e\uff0cPEAR\u7684\u591a\u4ee3\u7406\u8bbe\u8ba1\u663e\u8457\u63d0\u9ad8\u4e86\u5de5\u4f5c\u6d41\u7a0b\u7684\u6210\u529f\u7387\uff0c\u5373\u4f7f\u4f7f\u7528\u8f83\u5c0f\u7684\u5f00\u6e90\u6743\u91cd\u6a21\u578b\u5982LLaMA 3.1 
8B\u4e5f\u662f\u5982\u6b64\u3002PEAR\u8fd8\u652f\u6301\u5404\u79cd\u81ea\u52a8\u5316\u7ea7\u522b\uff0c\u5e76\u8bbe\u8ba1\u6709\u53ef\u81ea\u5b9a\u4e49\u7684\u672c\u5730\u77e5\u8bc6\u5e93\uff0c\u4ee5\u786e\u4fdd\u5176\u5728\u4e0d\u540c\u7814\u7a76\u73af\u5883\u4e0b\u7684\u7075\u6d3b\u6027\u548c\u9002\u5e94\u6027\u3002|\n", "2410.09013": "|**2024-10-11**|**The Impact of Visual Information in Chinese Characters: Evaluating Large Models' Ability to Recognize and Utilize Radicals**|Xiaofeng Wu et.al.|[2410.09013](http://arxiv.org/abs/2410.09013)|null|\u672c\u6587\u7814\u7a76\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u548c\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08VLMs\uff09\u5728\u5229\u7528\u6c49\u5b57\u4e2d\u7684\u89c6\u89c9\u4fe1\u606f\u65b9\u9762\u7684\u6f5c\u529b\uff0c\u5c24\u5176\u662f\u5173\u4e8e\u90e8\u9996\u3001\u7ed3\u6784\u3001\u7b14\u753b\u4ee5\u53ca\u7b14\u753b\u6570\u91cf\u7684\u4fe1\u606f\u3002\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u57fa\u51c6\u6d4b\u8bd5\u7cfb\u7edf\u6765\u8bc4\u4f30\u8fd9\u4e9b\u6a21\u578b\u5bf9\u6c49\u5b57\u4e2d\u89c6\u89c9\u5143\u7d20\u7684\u7406\u89e3\u7a0b\u5ea6\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u5c3d\u7ba1\u63d0\u4f9b\u5b57\u7b26\u56fe\u50cf\uff0c\u6a21\u578b\u4ecd\u7136\u5c55\u793a\u4e86\u6709\u9650\u4f46\u90e8\u5206\u7406\u89e3\u89c6\u89c9\u4fe1\u606f\u7684\u80fd\u529b\u3002 
\u4e3a\u4e86\u6fc0\u53d1\u6a21\u578b\u5229\u7528\u90e8\u9996\u8fdb\u884c\u4e2d\u6587\u7406\u89e3\u4efb\u52a1\u7684\u6f5c\u529b\uff0c\u6211\u4eec\u8fdb\u4e00\u6b65\u5c1d\u8bd5\u5c06\u90e8\u9996\u4fe1\u606f\u878d\u5165\u5230\u63d0\u793a\u4e2d\u3002\u6211\u4eec\u89c2\u5bdf\u5230\uff0c\u5728\u63d0\u4f9b\u5173\u4e8e\u90e8\u9996\u7684\u989d\u5916\u4fe1\u606f\u65f6\uff0c\u8bcd\u6027\u6807\u6ce8\u4efb\u52a1\u7684\u8868\u73b0\u5f97\u5230\u4e86\u4e00\u81f4\u6027\u7684\u63d0\u5347\u3002\u8fd9\u8868\u660e\u901a\u8fc7\u6574\u5408\u5b50\u5b57\u7b26\u4fe1\u606f\uff0c\u6709\u53ef\u80fd\u589e\u5f3a\u8bed\u8a00\u5904\u7406\u80fd\u529b\u3002|\n", "2410.09012": "|**2024-10-11**|**Software Engineering and Foundation Models: Insights from Industry Blogs Using a Jury of Foundation Models**|Hao Li et.al.|[2410.09012](http://arxiv.org/abs/2410.09012)|null|\u672c\u6587\u9996\u6b21\u4ece\u5b9e\u8df5\u8005\u7684\u89c6\u89d2\u5206\u6790\u4e86\u57fa\u7840\u6a21\u578b\uff08FMs\uff09\u5728\u8f6f\u4ef6\u5de5\u7a0b\uff08SE\uff09\u9886\u57df\u7684\u5e94\u7528\u3002\u901a\u8fc7\u5206\u6790\u6765\u81ea\u9876\u7ea7\u79d1\u6280\u516c\u53f8\u7684155\u7bc7FM4SE\u548c997\u7bc7SE4FM\u535a\u5ba2\u6587\u7ae0\uff0c\u5229\u7528\u57fa\u4e8eFM\u7684\u8c03\u7814\u65b9\u6cd5\u7cfb\u7edf\u5730\u6807\u8bb0\u548c\u603b\u7ed3\u4e86\u8ba8\u8bba\u7684\u6d3b\u52a8\u548c\u4efb\u52a1\u3002\u7814\u7a76\u53d1\u73b0\uff0c\u867d\u7136\u4ee3\u7801\u751f\u6210\u662fFM4SE\u4e2d\u6700\u7a81\u51fa\u7684\u4efb\u52a1\uff0c\u4f46FMs\u8fd8\u88ab\u7528\u4e8e\u4ee3\u7801\u7406\u89e3\u3001\u603b\u7ed3\u548cAPI\u63a8\u8350\u7b49\u4f17\u591a\u5176\u4ed6SE\u6d3b\u52a8\u3002\u5173\u4e8eSE4FM\u7684\u5927\u591a\u6570\u535a\u5ba2\u6587\u7ae0\u5173\u6ce8\u4e8e\u6a21\u578b\u90e8\u7f72\u4e0e\u64cd\u4f5c\u4ee5\u53ca\u7cfb\u7edf\u67b6\u6784\u4e0e\u7f16\u6392\u3002\u5c3d\u7ba1\u4e91\u90e8\u7f72\u5360\u4e3b\u5bfc\u5730\u4f4d\uff0c\u4f46\u5bf9FMs\u8fdb\u884c\u538b\u7f29\u5e76\u5728\u8fb9\u7f18\u6216\u79fb\u52a8\u8bbe\u5907\u4e0a\u90e8\u7f72\u7684\u5174\u8
da3\u6b63\u5728\u589e\u957f\u3002\u672c\u6587\u63d0\u51fa\u4e86\u516b\u4e2a\u672a\u6765\u7814\u7a76\u65b9\u5411\uff0c\u65e8\u5728\u5f25\u5408\u7406\u8bba\u53d1\u73b0\u4e0e\u5b9e\u9645\u5e94\u7528\u4e4b\u95f4\u7684\u5dee\u8ddd\u3002\u6211\u4eec\u7684\u7814\u7a76\u4e0d\u4ec5\u4e30\u5bcc\u4e86FMs\u5728SE\u9886\u57df\u5b9e\u8df5\u5e94\u7528\u7684\u77e5\u8bc6\u4f53\u7cfb\uff0c\u8fd8\u5c55\u793a\u4e86FMs\u5728\u6280\u672f\u4e0e\u7070\u8272\u6587\u732e\u9886\u57df\u8fdb\u884c\u6587\u732e\u8c03\u7814\u7684\u6709\u6548\u6027\u3002\u6211\u4eec\u63d0\u4f9b\u7684\u6570\u636e\u96c6\u3001\u7ed3\u679c\u3001\u4ee3\u7801\u4ee5\u53ca\u4f7f\u7528\u7684\u63d0\u793a\u53ef\u4ee5\u5728\u5728\u7ebf\u590d\u5236\u5305https://github.com/SAILResearch/fmse-blogs\u4e2d\u627e\u5230\u3002|\n", "2410.09008": "|**2024-10-11**|**SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights**|Ling Yang et.al.|[2410.09008](http://arxiv.org/abs/2410.09008)|**[link](https://github.com/yangling0818/supercorrect-llm)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5982GPT-4\u3001PaLM\u548cLLaMA\u5728\u5404\u79cd\u63a8\u7406\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u663e\u8457\u7684\u6539\u8fdb\u3002\u7136\u800c\uff0c\u8f83\u5c0f\u7684\u6a21\u578b\u5982Llama-3-8B\u548cDeepSeekMath-Base\u4ecd\u7136\u5728\u590d\u6742\u7684\u6570\u5b66\u63a8\u7406\u65b9\u9762\u5b58\u5728\u6311\u6218\uff0c\u56e0\u4e3a\u5b83\u4eec\u65e0\u6cd5\u6709\u6548\u5730\u8bc6\u522b\u5e76\u7ea0\u6b63\u63a8\u7406\u9519\u8bef\u3002\u8fd1\u671f\u7684\u53cd\u601d\u65b9\u6cd5\u65e8\u5728\u901a\u8fc7\u4f7f\u6a21\u578b\u80fd\u591f\u81ea\u6211\u53cd\u601d\u548c\u81ea\u6211\u6821\u6b63\u6765\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u4f46\u4ecd\u9762\u4e34\u72ec\u7acb\u68c0\u6d4b\u63a8\u7406\u6b65\u9aa4\u4e2d\u7684\u9519\u8bef\u7684\u6311\u6218\u3002\u4e3a\u4e86\u514b\u670d\u8fd9\u4e9b\u9650\u5236\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aSuperCorrect\u7684\u65b0\u578b\u4e24\u9636\u6bb5\u6846\u67b6\uff0c\
u5b83\u4f7f\u7528\u5927\u578b\u6559\u5e08\u6a21\u578b\u6765\u76d1\u7763\u548c\u7ea0\u6b63\u8f83\u5c0f\u5b66\u751f\u6a21\u578b\u7684\u63a8\u7406\u548c\u53cd\u601d\u8fc7\u7a0b\u3002 \u5728\u7b2c\u4e00\u9636\u6bb5\uff0c\u6211\u4eec\u4ece\u6559\u5e08\u6a21\u578b\u4e2d\u63d0\u53d6\u4e86\u5c42\u6b21\u5316\u7684\u9ad8\u9636\u548c\u8be6\u7ec6\u7684\u601d\u60f3\u6a21\u677f\uff0c\u4ee5\u6307\u5bfc\u5b66\u751f\u6a21\u578b\u751f\u6210\u66f4\u7ec6\u81f4\u7684\u63a8\u7406\u601d\u60f3\u3002\u5728\u7b2c\u4e8c\u9636\u6bb5\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u8de8\u6a21\u578b\u534f\u4f5c\u76f4\u63a5\u504f\u597d\u4f18\u5316\uff08DPO\uff09\u6765\u589e\u5f3a\u5b66\u751f\u6a21\u578b\u7684\u81ea\u6211\u6821\u6b63\u80fd\u529b\uff0c\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u8ddf\u968f\u6559\u5e08\u7684\u4fee\u6b63\u8f68\u8ff9\u8fdb\u884c\u6539\u8fdb\u3002\u8fd9\u79cd\u8de8\u6a21\u578bDPO\u65b9\u6cd5\u6559\u4f1a\u5b66\u751f\u6a21\u578b\u901a\u8fc7\u4ece\u6559\u5e08\u6a21\u578b\u83b7\u5f97\u7684\u9519\u8bef\u9a71\u52a8\u7684\u89c1\u89e3\u6709\u6548\u5730\u5b9a\u4f4d\u5e76\u89e3\u51b3\u9519\u8bef\u7684\u601d\u60f3\uff0c\u6253\u7834\u5176\u601d\u60f3\u7684\u74f6\u9888\uff0c\u5e76\u901a\u8fc7\u5b66\u4e60\u65b0\u6280\u80fd\u548c\u77e5\u8bc6\u6765\u5e94\u5bf9\u5177\u6709\u6311\u6218\u6027\u7684\u95ee\u9898\u3002 \u5e7f\u6cdb\u7684\u5b9e\u9a8c\u4e00\u81f4\u8bc1\u660e\u4e86\u6211\u4eec\u7684\u4f18\u8d8a\u6027\u3002\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u6211\u4eec\u7684SuperCorrect-7B\u6a21\u578b\u5728MATH/GSM8K\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u663e\u8457\u8d85\u8d8a\u4e86\u5f3a\u5927\u7684DeepSeekMath-7B\u548cQwen2.5-Math-7B\uff0c\u5206\u522b\u5728MATH\u548cGSM8K\u57fa\u51c6\u4e0a\u63d0\u9ad8\u4e867.8%/5.3%\u548c15.1%/6.3%\uff0c\u5728\u6240\u67097B\u6a21\u578b\u4e2d\u5b9e\u73b0\u4e86\u65b0\u7684\u6700\u5148\u8fdb\u6027\u80fd\u3002\u4ee3\u7801\uff1ahttps://github.com/YangLing0818/SuperCorrect-llm**|\n", "2410.09006": "|**2024-10-11**|**From Interaction to Impact: Towards Safer AI Agents Through 
Understanding and Evaluating UI Operation Impacts**|Zhuohao Jerry Zhang et.al.|[2410.09006](http://arxiv.org/abs/2410.09006)|null|\u968f\u7740\u751f\u6210\u5f0f\u4eba\u5de5\u667a\u80fd\u7684\u8fdb\u6b65\uff0c\u4eba\u4eec\u5728\u521b\u5efa\u80fd\u591f\u901a\u8fc7\u7528\u6237\u754c\u9762\uff08UI\uff09\u7ba1\u7406\u65e5\u5e38\u4efb\u52a1\u7684\u81ea\u4e3b\u4ee3\u7406\u65b9\u9762\u53d6\u5f97\u4e86\u8fdb\u5c55\u3002\u5c3d\u7ba1\u5148\u524d\u7684\u7814\u7a76\u5df2\u7ecf\u63a2\u8ba8\u4e86AI\u4ee3\u7406\u5982\u4f55\u5bfc\u822aUI\u4ee5\u53ca\u7406\u89e3UI\u7ed3\u6784\u7684\u673a\u5236\uff0c\u4f46\u4ee3\u7406\u53ca\u5176\u81ea\u4e3b\u884c\u4e3a\uff08\u7279\u522b\u662f\u53ef\u80fd\u5177\u6709\u98ce\u9669\u6216\u4e0d\u53ef\u9006\u6027\u7684\u884c\u4e3a\uff09\u7684\u5f71\u54cd\u548c\u540e\u679c\u4ecd\u7136\u7f3a\u4e4f\u6df1\u5165\u7814\u7a76\u3002\u672c\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u63a2\u7d22\u4e86AI\u4ee3\u7406UI\u64cd\u4f5c\u7684\u5b9e\u9645\u4e16\u754c\u5f71\u54cd\u548c\u540e\u679c\u3002 \u6211\u4eec\u9996\u5148\u901a\u8fc7\u4e00\u7cfb\u5217\u4e0e\u9886\u57df\u4e13\u5bb6\u7684\u5de5\u4f5c\u574a\u5f00\u53d1\u4e86\u4e00\u79cdUI\u64cd\u4f5c\u5f71\u54cd\u7684\u5206\u7c7b\u7cfb\u7edf\u3002\u968f\u540e\uff0c\u6211\u4eec\u8fdb\u884c\u4e86\u4e00\u9879\u6570\u636e\u7efc\u5408\u7814\u7a76\uff0c\u6536\u96c6\u4e86\u7528\u6237\u611f\u77e5\u4e3a\u5177\u6709\u5f71\u54cd\u529b\u7684UI\u5c4f\u5e55\u8f68\u8ff9\u548c\u64cd\u4f5c\u6570\u636e\u3002\u7136\u540e\uff0c\u6211\u4eec\u4f7f\u7528\u6211\u4eec\u7684\u5f71\u54cd\u7c7b\u522b\u5bf9\u6536\u96c6\u7684\u6570\u636e\u548c\u4ece\u73b0\u6709UI\u5bfc\u822a\u6570\u636e\u96c6\u4e2d\u91cd\u65b0\u5229\u7528\u7684\u6570\u636e\u8fdb\u884c\u4e86\u6ce8\u91ca\u3002\u6211\u4eec\u5bf9\u4e0d\u540c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u53ca\u5176\u53d8\u4f53\u7684\u5b9a\u91cf\u8bc4\u4f30\u663e\u793a\u4e86\u8fd9\u4e9bLLM\u7406\u89e3\u548c\u9884\u6d4bAI\u4ee3\u7406\u53ef\u80fd\u91c7\u53d6\u7684UI\u64cd\u4f5c\u5f71\u54cd\u7684\u80fd\u529b\u30
02 \u6211\u4eec\u7684\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u5206\u7c7b\u7cfb\u7edf\u589e\u5f3a\u4e86\u8fd9\u4e9bLLM\u7684\u63a8\u7406\u80fd\u529b\uff0c\u4f7f\u5b83\u4eec\u80fd\u591f\u66f4\u597d\u5730\u7406\u89e3UI\u64cd\u4f5c\u7684\u5f71\u54cd\u3002\u7136\u800c\uff0c\u6211\u4eec\u4e5f\u53d1\u73b0\u4e86\u4ed6\u4eec\u5728\u53ef\u9760\u5730\u5206\u7c7b\u66f4\u5fae\u5999\u6216\u590d\u6742\u7684\u5f71\u54cd\u529b\u7c7b\u522b\u65f6\u5b58\u5728\u663e\u8457\u5dee\u8ddd\u7684\u95ee\u9898\u3002|\n", "2410.08996": "|**2024-10-11**|**Hypothesis-only Biases in Large Language Model-Elicited Natural Language Inference**|Grace Proebsting et.al.|[2410.08996](http://arxiv.org/abs/2410.08996)|null|\u6211\u4eec\u901a\u8fc7\u4f7f\u7528GPT-4\u3001Llama-2\u548cMistral 7b\u7b49\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u6765\u751f\u6210\u81ea\u7136\u8bed\u8a00\u63a8\u7406\uff08NLI\uff09\u5047\u8bbe\uff0c\u6d4b\u8bd5\u4e86\u7528LLM\u66ff\u6362\u4f17\u5305\u5de5\u4f5c\u8005\u5bf9\u4ea7\u751f\u6ce8\u91ca\u504f\u89c1\u7684\u5f71\u54cd\u3002\u6211\u4eec\u590d\u73b0\u4e86\u65af\u5766\u798fNLI\u8bed\u6599\u5e93\u7684\u90e8\u5206\u6570\u636e\uff0c\u5e76\u8bad\u7ec3\u4e86\u4ec5\u4f7f\u7528\u5047\u8bbe\u7684\u5206\u7c7b\u5668\u6765\u786e\u5b9aLLM\u751f\u6210\u7684\u5047\u8bbe\u662f\u5426\u5305\u542b\u6ce8\u91ca\u504f\u89c1\u3002\u5728\u6211\u4eec\u7684\u7531LLM\u751f\u6210\u7684NLI\u6570\u636e\u96c6\u4e0a\uff0c\u57fa\u4e8eBERT\u7684\u4ec5\u5047\u8bbe\u5206\u7c7b\u5668\u8fbe\u5230\u4e8686%-96%\u7684\u51c6\u786e\u7387\uff0c\u8fd9\u8868\u660e\u8fd9\u4e9b\u6570\u636e\u96c6\u5305\u542b\u4ec5\u5047\u8bbe\u7684\u504f\u89c1\u3002\u6211\u4eec\u8fd8\u53d1\u73b0LLM\u751f\u6210\u7684\u5047\u8bbe\u4e2d\u5b58\u5728\u9891\u7e41\u7684\u201c\u7ebf\u7d22\u201d\uff0c\u4f8b\u5982\uff0c\u201c\u5728\u6cf3\u6c60\u91cc\u6e38\u6cf3\u201d\u8fd9\u4e00\u77ed\u8bed\u5728GPT-4\u751f\u6210\u768410000\u591a\u4e2a\u77db\u76fe\u5047\u8bbe\u4e2d\u51fa\u73b0\u3002\u6211\u4eec\u7684\u5206\u6790\u63d0\u4f
9b\u4e86\u5b9e\u8bc1\u8bc1\u636e\uff0c\u8bc1\u660eNLI\u4e2d\u5df2\u77e5\u7684\u504f\u89c1\u53ef\u80fd\u5728LLM\u751f\u6210\u7684\u6570\u636e\u4e2d\u6301\u7eed\u5b58\u5728\u3002|\n", "2410.10819": "|**2024-10-14**|**DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads**|Guangxuan Xiao et.al.|[2410.10819](http://arxiv.org/abs/2410.10819)|**[link](https://github.com/mit-han-lab/duo-attention)**|**\u90e8\u7f72\u957f\u4e0a\u4e0b\u6587\u7684\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u81f3\u5173\u91cd\u8981\uff0c\u4f46\u4e5f\u5e26\u6765\u4e86\u663e\u8457\u7684\u8ba1\u7b97\u548c\u5185\u5b58\u6311\u6218\u3002\u7f13\u5b58\u6240\u6709\u6ce8\u610f\u529b\u5934\u4e2d\u7684Key\u548cValue\uff08KV\uff09\u72b6\u6001\u4f1a\u6d88\u8017\u5927\u91cf\u5185\u5b58\u3002\u73b0\u6709\u7684KV\u7f13\u5b58\u526a\u679d\u65b9\u6cd5\u8981\u4e48\u635f\u5bb3\u4e86LLM\u7684\u957f\u4e0a\u4e0b\u6587\u80fd\u529b\uff0c\u8981\u4e48\u53ea\u63d0\u4f9b\u4e86\u6709\u9650\u7684\u6548\u7387\u63d0\u5347\u3002\u672c\u6587\u53d1\u73b0\uff0c\u53ea\u6709\u90e8\u5206\u6ce8\u610f\u529b\u5934\uff0c\u5373\u68c0\u7d22\u5934\uff0c\u5bf9\u4e8e\u5904\u7406\u957f\u4e0a\u4e0b\u6587\u662f\u81f3\u5173\u91cd\u8981\u7684\uff0c\u5e76\u4e14\u9700\u8981\u5bf9\u6240\u6709\u6807\u8bb0\u8fdb\u884c\u5b8c\u6574\u7684\u6ce8\u610f\u529b\u673a\u5236\u3002\u76f8\u53cd\uff0c\u6240\u6709\u5176\u4ed6\u5934\u90e8\uff0c\u4e3b\u8981\u5173\u6ce8\u6700\u8fd1\u7684\u6807\u8bb0\u4ee5\u53ca\u6ce8\u610f\u529b\u6c47\u70b9\uff0c\u79f0\u4e3a\u6d41\u5934\u90e8\uff0c\u4e0d\u9700\u8981\u5b8c\u6574\u7684\u6ce8\u610f\u529b\u3002\u57fa\u4e8e\u8fd9\u4e00\u89c1\u89e3\uff0c\u6211\u4eec\u5f15\u5165\u4e86DuoAttention\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u4ec5\u5bf9\u68c0\u7d22\u5934\u5e94\u7528\u5b8c\u6574\u7684KV\u7f13\u5b58\uff0c\u800c\u5bf9\u6d41\u5934\u90e8\u4f7f\u7528\u8f7b\u91cf\u7ea7\u3001\u56fa\u5b9a\u957f\u5ea6\u7684KV\u7f13\u5b58\uff0c\u4ece\u800c\u5728\u4e0d\u635f\u5bb3\u957f\u4e0a\u4e0b\u6587\u80fd\u529b\u7684
\u60c5\u51b5\u4e0b\u51cf\u5c11LLM\u89e3\u7801\u548c\u9884\u586b\u5145\u7684\u5185\u5b58\u548c\u5ef6\u8fdf\u3002DuoAttention\u91c7\u7528\u4e86\u4e00\u79cd\u57fa\u4e8e\u4f18\u5316\u7684\u7b97\u6cd5\uff0c\u4f7f\u7528\u5408\u6210\u6570\u636e\u51c6\u786e\u8bc6\u522b\u68c0\u7d22\u5934\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5c06\u957f\u4e0a\u4e0b\u6587\u63a8\u7406\u5185\u5b58\u6700\u591a\u51cf\u5c11\u4e862.55\u500d\uff08\u5bf9\u4e8eMHA\u6a21\u578b\uff09\u548c1.67\u500d\uff08\u5bf9\u4e8eGQA\u6a21\u578b\uff09\uff0c\u540c\u65f6\u89e3\u7801\u901f\u5ea6\u63d0\u9ad8\u4e86\u6700\u591a2.18\u500d\uff08MHA\u6a21\u578b\uff09\u548c1.50\u500d\uff08GQA\u6a21\u578b\uff09\uff0c\u5e76\u52a0\u901f\u9884\u586b\u5145\u6700\u591a1.73\u500d\uff08MHA\u6a21\u578b\uff09\u548c1.63\u500d\uff08GQA\u6a21\u578b\uff09\uff0c\u5e76\u4e14\u4e0e\u5168\u6ce8\u610f\u529b\u76f8\u6bd4\uff0c\u7cbe\u5ea6\u635f\u5931\u6700\u5c0f\u3002\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u7ed3\u5408\u91cf\u5316\u6280\u672f\uff0cDuoAttention\u4f7fLlama-3-8B\u80fd\u591f\u5728\u5355\u4e2aA100 GPU\u4e0a\u89e3\u7801\u957f\u8fbe330\u4e07\u4e0a\u4e0b\u6587\u957f\u5ea6\u7684\u6570\u636e\u3002\u4ee3\u7801\u53ef\u5728https://github.com/mit-han-lab/duo-attention\u83b7\u53d6\u3002**|\n", "2410.10813": "|**2024-10-14**|**LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory**|Di Wu 
et.al.|[2410.10813](http://arxiv.org/abs/2410.10813)|**[link](https://github.com/xiaowu0162/longmemeval)**|**\u8fd1\u671f\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u9a71\u52a8\u7684\u804a\u5929\u52a9\u624b\u7cfb\u7edf\u5df2\u96c6\u6210\u4e86\u8bb0\u5fc6\u7ec4\u4ef6\u6765\u8ddf\u8e2a\u7528\u6237\u4e0e\u52a9\u624b\u4e4b\u95f4\u7684\u804a\u5929\u5386\u53f2\uff0c\u4ece\u800c\u5b9e\u73b0\u66f4\u51c6\u786e\u548c\u4e2a\u6027\u5316\u7684\u54cd\u5e94\u3002\u7136\u800c\uff0c\u5b83\u4eec\u5728\u6301\u7eed\u4ea4\u4e92\u4e2d\u7684\u957f\u671f\u8bb0\u5fc6\u80fd\u529b\u4ecd\u9700\u6df1\u5165\u7814\u7a76\u3002\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u4e2a\u540d\u4e3aLongMemEval\u7684\u7efc\u5408\u57fa\u51c6\uff0c\u7528\u4e8e\u8bc4\u4f30\u804a\u5929\u52a9\u624b\u7684\u4e94\u9879\u6838\u5fc3\u957f\u671f\u8bb0\u5fc6\u80fd\u529b\uff1a\u4fe1\u606f\u63d0\u53d6\u3001\u591a\u4f1a\u8bdd\u63a8\u7406\u3001\u65f6\u95f4\u63a8\u7406\u3001\u77e5\u8bc6\u66f4\u65b0\u548c\u5f03\u6743\u3002\u8be5\u57fa\u51c6\u5305\u542b500\u4e2a\u7cbe\u5fc3\u7b56\u5212\u7684\u95ee\u9898\uff0c\u5e76\u5d4c\u5165\u5728\u81ea\u7531\u6269\u5c55\u7684\u7528\u6237\u4e0e\u52a9\u624b\u804a\u5929\u5386\u53f2\u4e2d\u3002LongMemEval\u5bf9\u73b0\u6709\u7684\u957f\u671f\u8bb0\u5fc6\u7cfb\u7edf\u63d0\u51fa\u4e86\u91cd\u5927\u6311\u6218\uff0c\u5728\u5546\u4e1a\u804a\u5929\u52a9\u624b\u548c\u957f\u4e0a\u4e0b\u6587LLM\u4e0a\uff0c\u8de8\u6301\u7eed\u4ea4\u4e92\u7684\u8bb0\u5fc6\u4fe1\u606f\u4fdd\u7559\u7387\u4e0b\u964d\u4e8630%\u3002\u968f\u540e\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u7edf\u4e00\u6846\u67b6\uff0c\u5c06\u957f\u671f\u8bb0\u5fc6\u8bbe\u8ba1\u5206\u89e3\u4e3a\u7d22\u5f15\u3001\u68c0\u7d22\u548c\u9605\u8bfb\u9636\u6bb5\u7684\u56db\u4e2a\u8bbe\u8ba1\u9009\u62e9\u3002\u57fa\u4e8e\u5173\u952e\u5b9e\u9a8c\u6d1e\u5bdf\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u51e0\u79cd\u5185\u5b58\u8bbe\u8ba1\uff0c\u5305\u62ec\u4f1a\u8bdd\u5206\u89e3\u4ee5\u4f18\u5316\u503c\u7c92\u5ea6\u3001\u4e8b\u5b9e\u589e\u5f3a\u7684\u5173\u9
52e\u6269\u5c55\u4ee5\u589e\u5f3a\u7d22\u5f15\u7ed3\u6784\u4ee5\u53ca\u65f6\u95f4\u611f\u77e5\u67e5\u8be2\u6269\u5c55\u4ee5\u7ec6\u5316\u641c\u7d22\u8303\u56f4\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u8fd9\u4e9b\u4f18\u5316\u6781\u5927\u5730\u63d0\u9ad8\u4e86LongMemEval\u4e0a\u7684\u5185\u5b58\u53ec\u56de\u7387\u548c\u4e0b\u6e38\u95ee\u9898\u56de\u7b54\u6027\u80fd\u3002\u603b\u4f53\u800c\u8a00\uff0c\u672c\u7814\u7a76\u4e3a\u63a8\u8fdb\u57fa\u4e8eLLM\u7684\u804a\u5929\u52a9\u624b\u7684\u957f\u671f\u8bb0\u5fc6\u80fd\u529b\u63d0\u4f9b\u4e86\u6709\u4ef7\u503c\u7684\u8d44\u6e90\u548c\u6307\u5bfc\uff0c\u4e3a\u66f4\u4e2a\u6027\u5316\u548c\u53ef\u9760\u7684\u5bf9\u8bddAI\u94fa\u5e73\u4e86\u9053\u8def\u3002**|\n", "2410.10814": "|**2024-10-14**|**Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free**|Ziyue Li et.al.|[2410.10814](http://arxiv.org/abs/2410.10814)|null|\u5c3d\u7ba1\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u751f\u6210\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5176\u89e3\u7801\u5668-only\u67b6\u6784\u901a\u5e38\u9650\u5236\u4e86\u5b83\u4eec\u4f5c\u4e3a\u5d4c\u5165\u6a21\u578b\u7684\u6f5c\u529b\uff0c\u9664\u975e\u8fdb\u884c\u8fdb\u4e00\u6b65\u7684\u8868\u793a\u5fae\u8c03\u3002\u8fd9\u662f\u5426\u4e0e\u5b83\u4eec\u4f5c\u4e3a\u901a\u7528\u6a21\u578b\u7684\u4e3b\u5f20\u76f8\u77db\u76fe\uff1f\u4e3a\u4e86\u56de\u7b54\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u66f4\u4ed4\u7ec6\u5730\u7814\u7a76\u4e86\u6df7\u5408\u4e13\u5bb6\uff08MoE\uff09LLMs\u3002\u6211\u4eec\u7684\u7814\u7a76\u8868\u660e\uff0cMoE 
LLMs\u4e2d\u7684\u4e13\u5bb6\u8def\u7531\u53ef\u4ee5\u4f5c\u4e3a\u4e00\u4e2a\u73b0\u6210\u7684\u5d4c\u5165\u6a21\u578b\uff0c\u5728\u5404\u79cd\u5d4c\u5165\u91cd\u70b9\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u800c\u65e0\u9700\u4efb\u4f55\u5fae\u8c03\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5e7f\u6cdb\u7684\u5206\u6790\u8868\u660e\uff0cMoE\u8def\u7531\u6743\u91cd\uff08RW\uff09\u4e0eLLMs\u5e7f\u6cdb\u4f7f\u7528\u7684\u9690\u85cf\u72b6\u6001\uff08HS\uff09\u4e92\u8865\u3002\u4e0eHS\u76f8\u6bd4\uff0c\u6211\u4eec\u53d1\u73b0RW\u5bf9\u63d0\u793a\u7684\u9009\u62e9\u66f4\u5177\u9c81\u68d2\u6027\uff0c\u5e76\u5173\u6ce8\u9ad8\u5c42\u6b21\u8bed\u4e49\u3002\u53d7\u6b64\u5206\u6790\u542f\u53d1\uff0c\u6211\u4eec\u63d0\u51fa\u4e86MoEE\uff0c\u7ed3\u5408\u4e86RW\u548cHS\uff0c\u5176\u6027\u80fd\u4f18\u4e8e\u5355\u72ec\u4f7f\u7528\u4efb\u4e00\u65b9\u6cd5\u3002\u6211\u4eec\u5bf9\u5b83\u4eec\u7684\u7ec4\u5408\u53ca\u5176\u63d0\u793a\u7b56\u7565\u7684\u63a2\u7d22\u63ed\u793a\u4e86\u82e5\u5e72\u65b0\u9896\u89c1\u89e3\uff0c\u4f8b\u5982\uff0cRW\u548cHS\u76f8\u4f3c\u5ea6\u7684\u52a0\u6743\u548c\u4f18\u4e8e\u5b83\u4eec\u8fde\u63a5\u540e\u7684\u76f8\u4f3c\u5ea6\u3002\u6211\u4eec\u5728\u6765\u81ea\u5927\u89c4\u6a21\u6587\u672c\u5d4c\u5165\u57fa\u51c6\uff08MTEB\uff09\u76846\u4e2a\u5d4c\u5165\u4efb\u52a1\u4e2d\u768420\u4e2a\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u4e86\u5b9e\u9a8c\u3002\u7ed3\u679c\u8868\u660e\uff0cMoEE\u663e\u8457\u63d0\u5347\u4e86\u57fa\u4e8eLLM\u7684\u5d4c\u5165\u6548\u679c\uff0c\u4e14\u65e0\u9700\u8fdb\u4e00\u6b65\u5fae\u8c03\u3002|\n", "2410.10801": "|**2024-10-14**|**Mix Data or Merge Models? 
Optimizing for Diverse Multi-Task Learning**|Aakanksha et.al.|[2410.10801](http://arxiv.org/abs/2410.10801)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5df2\u88ab\u5168\u7403\u5e7f\u6cdb\u91c7\u7528\uff0c\u5e94\u7528\u4e8e\u5404\u79cd\u9886\u57df\u3002\u7136\u800c\uff0c\u786e\u4fdd\u5176\u5b89\u5168\u4f7f\u7528\u4ecd\u7136\u662f\u4e00\u4e2a\u91cd\u5927\u6311\u6218\u3002\u504f\u597d\u8bad\u7ec3\u548c\u5b89\u5168\u63aa\u65bd\u5f80\u5f80\u8fc7\u5ea6\u62df\u5408\u4e8e\u897f\u65b9\u4e2d\u5fc3\u6570\u636e\u96c6\u4e2d\u7684\u5371\u5bb3\uff0c\u800c\u5b89\u5168\u534f\u8bae\u901a\u5e38\u65e0\u6cd5\u6269\u5c55\u5230\u591a\u8bed\u8a00\u73af\u5883\u3002\u5728\u8fd9\u9879\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u5728\u591a\u6837\u5316\u7684\u591a\u4efb\u52a1\u8bbe\u7f6e\u4e2d\u63a2\u7d22\u6a21\u578b\u5408\u5e76\uff0c\u5728\u591a\u8bed\u8a00\u80cc\u666f\u4e0b\u7ed3\u5408\u5b89\u5168\u548c\u901a\u7528\u4efb\u52a1\u3002\u6bcf\u79cd\u8bed\u8a00\u5728\u4e0d\u540c\u4efb\u52a1\u4e2d\u5f15\u5165\u4e86\u72ec\u7279\u7684\u5b66\u4e60\u6311\u6218\u3002\u6211\u4eec\u53d1\u73b0\uff0c\u57fa\u4e8e\u76ee\u6807\u7684\u5408\u5e76\u6bd4\u6df7\u5408\u6570\u636e\u66f4\u6709\u6548\uff0c\u603b\u4f53\u6027\u80fd\u548c\u5b89\u5168\u6027\u5206\u522b\u63d0\u9ad8\u4e868%\u548c10%\u3002\u6211\u4eec\u8fd8\u53d1\u73b0\uff0c\u57fa\u4e8e\u8bed\u8a00\u7684\u5408\u5e76\u975e\u5e38\u6709\u6548\u2014\u2014\u901a\u8fc7\u5408\u5e76\u5355\u8bed\u5fae\u8c03\u6a21\u578b\uff0c\u6211\u4eec\u5b9e\u73b0\u4e86\u5728\u76f8\u540c\u53ef\u7528\u6570\u636e\u4e0b\uff0c\u76f8\u6bd4\u6df7\u5408\u6570\u636e\u65b9\u6cd5\uff0c\u6574\u4f53\u6027\u80fd\u63d0\u9ad84%\uff0c\u6240\u6709\u8bed\u8a00\u4e0a\u7684\u5371\u5bb3\u51cf\u5c117%\u3002\u603b\u7684\u6765\u8bf4\uff0c\u6211\u4eec\u5bf9\u5408\u5e76\u65b9\u6cd5\u7684\u7efc\u5408\u7814\u7a76\u63d0\u4f9b\u4e86\u4e00\u4e2a\u6784\u5efa\u5f3a\u5927\u4e14\u5b89\u5168\u7684\u591a\u8bed\u8a00\u6a21\u578b\u7684\u6709\u7528\u6846\u67b6\u3002|\n", "2410.10798": "|**2024-10-15**|**MMAR: 
Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling**|Jian Yang et.al.|[2410.10798](http://arxiv.org/abs/2410.10798)|null|\u8fd1\u5e74\u6765\uff0c\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\u7684\u53d1\u5c55\u63a8\u52a8\u4e86\u8054\u5408\u6982\u7387\u6a21\u578b\u7684\u8fdb\u6b65\uff0c\u8fd9\u4e9b\u6a21\u578b\u80fd\u591f\u540c\u65f6\u7406\u89e3\u548c\u751f\u6210\u56fe\u50cf\u3002\u7136\u800c\uff0c\u6211\u4eec\u53d1\u73b0\u6700\u8fd1\u7684\u65b9\u6cd5\u5728\u7406\u89e3\u4efb\u52a1\u8fc7\u7a0b\u4e2d\u4e0d\u53ef\u907f\u514d\u5730\u4f1a\u4e22\u5931\u56fe\u50cf\u4fe1\u606f\uff0c\u8fd9\u4e3b\u8981\u662f\u7531\u4e8e\u56fe\u50cf\u79bb\u6563\u5316\u6216\u6269\u6563\u53bb\u566a\u6b65\u9aa4\u9020\u6210\u7684\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u591a\u6a21\u6001\u81ea\u56de\u5f52\uff08MMAR\uff09\u6982\u7387\u5efa\u6a21\u6846\u67b6\u3002\u4e0e\u79bb\u6563\u5316\u65b9\u6cd5\u4e0d\u540c\uff0cMMAR\u91c7\u7528\u8fde\u7eed\u503c\u7684\u56fe\u50cf\u6807\u8bb0\u6765\u907f\u514d\u4fe1\u606f\u4e22\u5931\u3002\u4e0d\u540c\u4e8e\u57fa\u4e8e\u6269\u6563\u7684\u65b9\u6cd5\uff0c\u6211\u4eec\u901a\u8fc7\u5728\u6bcf\u4e2a\u81ea\u56de\u5f52\u56fe\u50cf\u5757\u5d4c\u5165\u9876\u90e8\u6dfb\u52a0\u4e00\u4e2a\u8f7b\u91cf\u7ea7\u6269\u6563\u5934\u6765\u89e3\u8026\u6269\u6563\u8fc7\u7a0b\u548c\u81ea\u56de\u5f52\u4e3b\u5e72\u6a21\u578b\u3002\u8fd9\u6837\u4e00\u6765\uff0c\u5f53\u6a21\u578b\u4ece\u56fe\u50cf\u751f\u6210\u8fc7\u6e21\u5230\u901a\u8fc7\u6587\u672c\u751f\u6210\u8fdb\u884c\u7406\u89e3\u65f6\uff0c\u4e3b\u5e72\u6a21\u578b\u5bf9\u56fe\u50cf\u7684\u9690\u85cf\u8868\u793a\u4e0d\u53d7\u9650\u4e8e\u6700\u540e\u7684\u53bb\u566a\u6b65\u9aa4\u3002\u4e3a\u4e86\u6210\u529f\u8bad\u7ec3\u6211\u4eec\u7684\u65b9\u6cd5\uff0c\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86\u4e00\u79cd\u7406\u8bba\u4e0a\u88ab\u8bc1\u660e\u53ef\u4ee5\u89e3\u51b3\u6570\u503c\u7a33\u5b9a\u6027\u95ee\u9898\u7684\u6280\u672f\uff0c\u5e76\u63d0\u51fa
\u4e86\u4e00\u79cd\u5e73\u8861\u751f\u6210\u548c\u7406\u89e3\u4efb\u52a1\u76ee\u6807\u7684\u8bad\u7ec3\u7b56\u7565\u3002\u901a\u8fc7\u572818\u4e2a\u56fe\u50cf\u7406\u89e3\u57fa\u51c6\u4e0a\u8fdb\u884c\u5e7f\u6cdb\u7684\u8bc4\u4f30\uff0cMMAR\u5c55\u793a\u4e86\u6bd4\u5176\u4ed6\u8054\u5408\u591a\u6a21\u6001\u6a21\u578b\u66f4\u4f18\u8d8a\u7684\u6027\u80fd\uff0c\u5176\u6027\u80fd\u53ef\u4e0e\u91c7\u7528\u9884\u8bad\u7ec3CLIP\u89c6\u89c9\u7f16\u7801\u5668\u7684\u65b9\u6cd5\u76f8\u5ab2\u7f8e\uff0c\u540c\u65f6\u8fd8\u80fd\u751f\u6210\u9ad8\u8d28\u91cf\u7684\u56fe\u50cf\u3002\u6211\u4eec\u8fd8\u8868\u660e\uff0c\u8be5\u65b9\u6cd5\u5728\u66f4\u5927\u6570\u636e\u96c6\u548c\u66f4\u5927\u6a21\u578b\u89c4\u6a21\u4e0b\u5177\u6709\u53ef\u6269\u5c55\u6027\u3002|\n", "2410.10796": "|**2024-10-14**|**Context-Parametric Inversion: Why Instruction Finetuning May Not Actually Improve Context Reliance**|Sachin Goyal et.al.|[2410.10796](http://arxiv.org/abs/2410.10796)|**[link](https://github.com/locuslab/context-parametric-inversion)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\u901a\u8fc7\u6307\u4ee4\u5fae\u8c03\u6765\u589e\u5f3a\u5176\u9075\u5faa\u7528\u6237\u6307\u4ee4\u548c\u5904\u7406\u8f93\u5165\u4e0a\u4e0b\u6587\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u5373\u4f7f\u662f\u6700\u5148\u8fdb\u7684\u6a21\u578b\u4e5f\u5e38\u5e38\u96be\u4ee5\u9075\u5faa\u6307\u4ee4\uff0c\u5c24\u5176\u662f\u5728\u8f93\u5165\u4e0a\u4e0b\u6587\u4e0e\u6a21\u578b\u7684\u53c2\u6570\u77e5\u8bc6\u4e0d\u4e00\u81f4\u65f6\u3002\u8fd9\u4f1a\u5bfc\u81f4\u5404\u79cd\u5931\u8d25\uff0c\u4f8b\u5982\u5e7b\u89c9\uff0c\u5373\u54cd\u5e94\u5185\u5bb9\u8fc7\u65f6\u3001\u5e26\u6709\u504f\u89c1\u6216\u5305\u542b\u672a\u7ecf\u9a8c\u8bc1\u7684\u4e8b\u5b9e\u3002\u5728\u8fd9\u9879\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u8bd5\u56fe\u7406\u89e3\u8fd9\u79cd\u4e0d\u826f\u4e0a\u4e0b\u6587\u4f9d\u8d56\u6027\u7684\u6839\u672c\u539f\u56e0\uff0c\u7279\u522b\u662f\u5728\u6307\u4ee4\u5fae\u8c03\u4e4b\u540e\u3002\u6211\u4eec\u89c2\u5bdf\u5230\u4e00\u4
e2a\u6709\u8da3\u7684\u73b0\u8c61\uff1a\u5728\u6307\u4ee4\u5fae\u8c03\u8fc7\u7a0b\u4e2d\uff0c\u4e0a\u4e0b\u6587\u4f9d\u8d56\u6027\u6700\u521d\u5982\u9884\u671f\u822c\u589e\u52a0\uff0c\u4f46\u968f\u7740\u6307\u4ee4\u5fae\u8c03\u7684\u8fdb\u884c\uff0c\u8fd9\u79cd\u4f9d\u8d56\u6027\u9010\u6e10\u51cf\u5c11\u3002\u6211\u4eec\u5c06\u8fd9\u4e00\u73b0\u8c61\u79f0\u4e3a\u4e0a\u4e0b\u6587-\u53c2\u6570\u53cd\u8f6c\uff0c\u5e76\u53d1\u73b0\u5728\u591a\u4e2a\u901a\u7528\u6307\u4ee4\u8c03\u4f18\u6570\u636e\u96c6\uff08\u5982TULU\u3001Alpaca\u548cUltrachat\uff09\u4ee5\u53ca\u6a21\u578b\u5bb6\u65cf\uff08\u5982Llama\u3001Mistral\u548cPythia\uff09\u4e2d\u90fd\u5b58\u5728\u8fd9\u79cd\u73b0\u8c61\u3002\u5728\u4e00\u4e2a\u7b80\u5355\u7684\u7406\u8bba\u8bbe\u7f6e\u4e2d\uff0c\u6211\u4eec\u6cbf\u7740\u6307\u4ee4\u5fae\u8c03\u7684\u68af\u5ea6\u4e0b\u964d\u8f68\u8ff9\u5206\u79bb\u51fa\u4e0a\u4e0b\u6587-\u53c2\u6570\u53cd\u8f6c\u53d1\u751f\u7684\u539f\u56e0\u3002\u6211\u4eec\u5c06\u8fd9\u4e00\u73b0\u8c61\u4e0e\u6307\u4ee4\u5fae\u8c03\u6570\u636e\u6df7\u5408\u4e2d\u7684\u793a\u4f8b\u8054\u7cfb\u8d77\u6765\uff0c\u8fd9\u4e9b\u793a\u4f8b\u4e2d\u8f93\u5165\u4e0a\u4e0b\u6587\u63d0\u4f9b\u7684\u4fe1\u606f\u5df2\u7ecf\u5b58\u5728\u4e8e\u6a21\u578b\u7684\u53c2\u6570\u77e5\u8bc6\u4e2d\u3002\u6211\u4eec\u7684\u5206\u6790\u63d0\u51fa\u4e86\u67d0\u4e9b\u6709\u9650\u7684\u7f13\u89e3\u7b56\u7565\uff0c\u540c\u65f6\u4e5f\u9a8c\u8bc1\u4e86\u6211\u4eec\u7684\u7406\u8bba\u89c1\u89e3\u3002\u6211\u4eec\u5e0c\u671b\u6211\u4eec\u7684\u5de5\u4f5c\u80fd\u4f5c\u4e3a\u89e3\u51b3\u8fd9\u4e00\u5931\u8d25\u6a21\u5f0f\u7684\u4e00\u4e2a\u8d77\u70b9\uff0c\u800c\u8fd9\u4e00\u6a21\u5f0f\u662fLLM\u8bad\u7ec3\u4e2d\u7684\u4e00\u4e2a\u6807\u51c6\u90e8\u5206\u3002**|\n", "2410.10779": "|**2024-10-14**|**Focused ReAct: Improving ReAct through Reiterate and Early Stop**|Shuoqiu Li 
et.al.|[2410.10779](http://arxiv.org/abs/2410.10779)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u63a8\u7406\u548c\u51b3\u7b56\u80fd\u529b\u65b9\u9762\u6709\u4e86\u663e\u8457\u7684\u63d0\u5347\uff0c\u8fd9\u4f53\u73b0\u5728ReAct\u7b49\u65b9\u6cd5\u4e2d\u3002\u7136\u800c\uff0c\u5c3d\u7ba1ReAct\u5728\u5904\u7406\u590d\u6742\u4efb\u52a1\u65f6\u975e\u5e38\u6709\u6548\uff0c\u4f46\u5b83\u9762\u4e34\u4e24\u4e2a\u4e3b\u8981\u6311\u6218\uff1a\u4e00\u662f\u5bb9\u6613\u504f\u79bb\u539f\u59cb\u95ee\u9898\uff0c\u4e8c\u662f\u9677\u5165\u884c\u52a8\u5faa\u73af\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u5f15\u5165\u4e86Focused ReAct\uff0c\u8fd9\u662fReAct\u8303\u5f0f\u7684\u4e00\u4e2a\u589e\u5f3a\u7248\u672c\uff0c\u5b83\u7ed3\u5408\u4e86\u91cd\u7533\u548c\u65e9\u671f\u505c\u6b62\u673a\u5236\u3002\u8fd9\u4e9b\u6539\u8fdb\u6709\u52a9\u4e8e\u6a21\u578b\u4fdd\u6301\u5bf9\u539f\u59cb\u95ee\u9898\u7684\u5173\u6ce8\u5e76\u907f\u514d\u91cd\u590d\u884c\u4e3a\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u4e0e\u539f\u59cb\u7684ReAct\u65b9\u6cd5\u76f8\u6bd4\uff0cFocused ReAct\u7684\u51c6\u786e\u7387\u63d0\u9ad8\u4e8618%\u5230530%\uff0c\u8fd0\u884c\u65f6\u95f4\u51cf\u5c11\u4e86\u6700\u591a34%\u3002|\n", "2410.10762": "|**2024-10-14**|**AFlow: Automating Agentic Workflow Generation**|Jiayi Zhang 
et.al.|[2410.10762](http://arxiv.org/abs/2410.10762)|**[link](https://github.com/geekan/metagpt)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u89e3\u51b3\u5404\u79cd\u9886\u57df\u4e2d\u7684\u590d\u6742\u4efb\u52a1\u65b9\u9762\u5c55\u73b0\u51fa\u4e86\u663e\u8457\u7684\u6f5c\u529b\uff0c\u901a\u5e38\u901a\u8fc7\u91c7\u7528\u9075\u5faa\u8be6\u7ec6\u6307\u4ee4\u548c\u64cd\u4f5c\u5e8f\u5217\u7684\u4ee3\u7406\u5de5\u4f5c\u6d41\u7a0b\u6765\u5b9e\u73b0\u3002\u7136\u800c\uff0c\u6784\u5efa\u8fd9\u4e9b\u5de5\u4f5c\u6d41\u7a0b\u9700\u8981\u5927\u91cf\u7684\u4eba\u529b\uff0c\u8fd9\u9650\u5236\u4e86\u5176\u53ef\u6269\u5c55\u6027\u548c\u901a\u7528\u6027\u3002\u6700\u8fd1\u7684\u7814\u7a76\u8bd5\u56fe\u81ea\u52a8\u5316\u751f\u6210\u548c\u4f18\u5316\u8fd9\u4e9b\u5de5\u4f5c\u6d41\u7a0b\uff0c\u4f46\u73b0\u6709\u7684\u65b9\u6cd5\u4ecd\u7136\u4f9d\u8d56\u4e8e\u521d\u59cb\u7684\u624b\u52a8\u8bbe\u7f6e\uff0c\u5e76\u4e14\u672a\u80fd\u5b9e\u73b0\u5b8c\u5168\u81ea\u52a8\u5316\u548c\u6709\u6548\u7684\u6d41\u7a0b\u751f\u6210\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u5c06\u5de5\u4f5c\u6d41\u4f18\u5316\u91cd\u65b0\u8868\u8ff0\u4e3a\u4e00\u4e2a\u4ee3\u7801\u8868\u793a\u7684\u5de5\u4f5c\u6d41\u7a7a\u95f4\u641c\u7d22\u95ee\u9898\uff0c\u5728\u8be5\u7a7a\u95f4\u4e2d\uff0c\u7531LLM\u8c03\u7528\u7684\u8282\u70b9\u901a\u8fc7\u8fb9\u8fde\u63a5\u3002\u6211\u4eec\u5f15\u5165\u4e86AFlow\uff0c\u8fd9\u662f\u4e00\u4e2a\u81ea\u52a8\u5316\u7684\u6846\u67b6\uff0c\u4f7f\u7528\u8499\u7279\u5361\u6d1b\u6811\u641c\u7d22\u6709\u6548\u5730\u63a2\u7d22\u8fd9\u4e2a\u7a7a\u95f4\uff0c\u901a\u8fc7\u4ee3\u7801\u4fee\u6539\u3001\u6811\u7ed3\u6784\u7684\u7ecf\u9a8c\u4ee5\u53ca\u6267\u884c\u53cd\u9988\u8fed\u4ee3\u5730\u6539\u8fdb\u5de5\u4f5c\u6d41\u7a0b\u3002\u5728\u516d\u4e2a\u57fa\u51c6\u6570\u636e\u96c6\u4e0a\u7684\u5b9e\u8bc1\u8bc4\u4f30\u8868\u660e\uff0cAFlow\u7684\u6709\u6548\u6027\uff0c\u5e73\u5747\u6bd4\u6700\u5148\u8fdb\u7684\u57fa\u7ebf\u63d0\u9ad8\u4e865.7%\u3002\u6b64
\u5916\uff0cAFlow\u4f7f\u5f97\u8f83\u5c0f\u7684\u6a21\u578b\u5728\u7279\u5b9a\u4efb\u52a1\u4e0a\u80fd\u591f\u8d85\u8d8aGPT-4\uff0c\u540c\u65f6\u5176\u63a8\u7406\u6210\u672c\u4ec5\u4e3aGPT-4\u76844.55%\u3002\u4ee3\u7801\u5c06\u5728https://github.com/geekan/MetaGPT\u83b7\u53d6\u3002**|\n", "2410.10760": "|**2024-10-14**|**Denial-of-Service Poisoning Attacks against Large Language Models**|Kuofeng Gao et.al.|[2410.10760](http://arxiv.org/abs/2410.10760)|**[link](https://github.com/sail-sg/p-dos)**|**\u8fd1\u671f\u7684\u7814\u7a76\u8868\u660e\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5bb9\u6613\u53d7\u5230\u62d2\u7edd\u670d\u52a1\uff08DoS\uff09\u653b\u51fb\uff0c\u8fd9\u79cd\u653b\u51fb\u901a\u8fc7\u6076\u610f\u8f93\u5165\u5982\u62fc\u5199\u9519\u8bef\u6216\u65e0\u610f\u4e49\u7684\u63d0\u793a\u8bcd\u89e6\u53d1\u6a21\u578b\u65e0\u9650\u8f93\u51fa\uff0c\u800c\u4e0d\u4f1a\u751f\u6210[EOS]\u7ed3\u675f\u7b26\u3002\u8fd9\u4e9b\u653b\u51fb\u53ef\u80fd\u5bfc\u81f4\u9ad8\u5ef6\u8fdf\uff0c\u5e76\u4f7fLLM\u670d\u52a1\u5bf9\u5176\u4ed6\u7528\u6237\u6216\u4efb\u52a1\u4e0d\u53ef\u7528\u3002\u7136\u800c\uff0c\u5728\u5b58\u5728\u8bed\u97f3\u5230\u6587\u672c\u63a5\u53e3\u7684\u60c5\u51b5\u4e0b\uff08\u4f8b\u5982\uff0c\u5bf9\u673a\u5668\u4eba\u7684\u8bed\u97f3\u6307\u4ee4\uff09\uff0c\u6267\u884c\u6b64\u7c7bDoS\u653b\u51fb\u53d8\u5f97\u5177\u6709\u6311\u6218\u6027\uff0c\u56e0\u4e3a\u901a\u8fc7\u8bed\u97f3\u5f88\u96be\u5f15\u5165\u62fc\u5199\u9519\u8bef\u6216\u65e0\u610f\u4e49\u7684\u63d0\u793a\u8bcd\u3002\u4e00\u79cd\u7b80\u5355\u7684DoS\u653b\u51fb\u65b9\u5f0f\u662f\u6307\u793a\u6a21\u578b\u201c\u4e0d\u65ad\u91cd\u590d\u2018Hello\u2019\u201d\uff0c\u4f46\u6211\u4eec\u89c2\u5bdf\u5230\u4f9d\u8d56\u81ea\u7136\u6307\u4ee4\u7684\u65b9\u5f0f\u4f1a\u9650\u5236\u8f93\u51fa\u957f\u5ea6\uff0c\u8be5\u957f\u5ea6\u53d7\u9650\u4e8e\u9884\u8bad\u7ec3\u6570\u636e\u7684\u6700\u5927\u957f\u5ea6\u3002\u4e3a\u4e86\u514b\u670d\u8fd9\u4e00\u9650\u5236\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u
79cd\u9488\u5bf9LLMs\u7684\u57fa\u4e8e\u6295\u6bd2\u7684DoS\uff08P-DoS\uff09\u653b\u51fb\u65b9\u6cd5\uff0c\u8bc1\u660e\u901a\u8fc7\u6ce8\u5165\u4e00\u4e2a\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u6295\u6bd2\u6837\u672c\u53ef\u4ee5\u7a81\u7834\u8f93\u51fa\u957f\u5ea6\u7684\u9650\u5236\u3002\u4f8b\u5982\uff0c\u4e00\u4e2a\u6295\u6bd2\u6837\u672c\u80fd\u591f\u4ee5\u4e0d\u52301\u7f8e\u5143\u7684\u6210\u672c\u6210\u529f\u653b\u51fbGPT-4o\u548cGPT-4o mini\uff08\u901a\u8fc7OpenAI\u7684\u5fae\u8c03API\uff09\uff0c\u5bfc\u81f4\u91cd\u590d\u8f93\u51fa\u76f4\u81f3\u8fbe\u5230\u6700\u5927\u63a8\u7406\u957f\u5ea6\uff0816K\u4e2a\u6807\u8bb0\uff0c\u76f8\u6bd4\u4e4b\u4e0b\u672a\u6295\u6bd2\u524d\u4e3a0.5K\uff09\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5bf9\u5f00\u6e90LLMs\u8fdb\u884c\u4e86\u5168\u9762\u7684\u6d88\u878d\u7814\u7a76\uff0c\u5e76\u5c06\u6b64\u65b9\u6cd5\u6269\u5c55\u5230LLM\u4ee3\u7406\uff0c\u5176\u4e2d\u653b\u51fb\u8005\u53ef\u4ee5\u63a7\u5236\u5fae\u8c03\u6570\u636e\u96c6\u548c\u7b97\u6cd5\u3002\u6211\u4eec\u7684\u53d1\u73b0\u5f3a\u8c03\u4e86\u9700\u8981\u9632\u5fa1P-DoS\u653b\u51fb\u4ee5\u786e\u4fddLLMs\u7684\u5b89\u5168\u3002\u6211\u4eec\u7684\u4ee3\u7801\u53ef\u4ee5\u5728https://github.com/sail-sg/P-DoS\u83b7\u53d6\u3002**|\n", "2410.10759": "|**2024-10-14**|**SplitLLM: Collaborative Inference of LLMs for Model Placement and Throughput Optimization**|Akrit Mudvari 
et.al.|[2410.10759](http://arxiv.org/abs/2410.10759)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u8fd1\u5e74\u6765\u6210\u4e3a\u4e00\u9879\u98a0\u8986\u6027\u7684\u521b\u65b0\uff0c\u5728\u6211\u4eec\u7684\u65e5\u5e38\u751f\u6d3b\u4e2d\u626e\u6f14\u7740\u91cd\u8981\u89d2\u8272\uff0c\u56e0\u4e3a\u5b83\u4eec\u80fd\u591f\u7406\u89e3\u548c\u751f\u6210\u7c7b\u4f3c\u4eba\u7c7b\u7684\u6587\u672c\u3002\u5b83\u4eec\u7684\u529f\u80fd\u5305\u62ec\u81ea\u7136\u8bed\u8a00\u7406\u89e3\u3001\u4fe1\u606f\u68c0\u7d22\u548c\u641c\u7d22\u3001\u7ffb\u8bd1\u3001\u804a\u5929\u673a\u5668\u4eba\u3001\u865a\u62df\u52a9\u624b\u7b49\u3002\u7136\u800c\uff0c\u4f17\u6240\u5468\u77e5\uff0cLLMs\u5728\u53c2\u6570\u6570\u91cf\u4e0a\u975e\u5e38\u5e9e\u5927\u3002\u6b64\u5916\uff0c\u5e95\u5c42\u67b6\u6784Transformer\u4e2d\u7684\u81ea\u6ce8\u610f\u529b\u673a\u5236\u5728\u8ba1\u7b97\u548c\u5185\u5b58\u65b9\u9762\u4e0e\u8f93\u5165\u5e8f\u5217\u957f\u5ea6\u5448\u4e8c\u6b21\u590d\u6742\u6027\u5173\u7cfb\u3002\u7531\u4e8e\u8fd9\u4e9b\u539f\u56e0\uff0cLLM\u63a8\u7406\u8d44\u6e90\u5bc6\u96c6\u578b\u9ad8\uff0c\u56e0\u6b64LLM\u63a8\u7406\u7684\u541e\u5410\u91cf\u53d7\u5230\u9650\u5236\uff0c\u5c24\u5176\u662f\u5728\u8f83\u957f\u5e8f\u5217\u7684\u60c5\u51b5\u4e0b\u3002\u5728\u8fd9\u4efd\u62a5\u544a\u4e2d\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u79cd\u670d\u52a1\u5668\u4e0e\u5176\u5ba2\u6237\u7aef\u4e4b\u95f4\u7684\u534f\u4f5c\u63a8\u7406\u67b6\u6784\uff0c\u4ee5\u7f13\u89e3\u541e\u5410\u91cf\u9650\u5236\u3002\u5728\u8fd9\u4e2a\u8bbe\u8ba1\u4e2d\uff0c\u6211\u4eec\u8003\u8651\u4e86\u53cc\u65b9\u53ef\u7528\u7684\u8d44\u6e90\uff0c\u5373\u8ba1\u7b97\u548c\u901a\u4fe1\u6210\u672c\u3002\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u57fa\u4e8e\u52a8\u6001\u89c4\u5212\u7684\u7b97\u6cd5\uff0c\u4ee5\u6700\u4f18\u65b9\u5f0f\u5206\u914d\u670d\u52a1\u5668\u548c\u5ba2\u6237\u7aef\u8bbe\u5907\u4e4b\u95f4\u7684\u8ba1\u7b97\uff0c\u4ece\u800c\u63d0\u9ad8\u670d\u52a1\u5668\u541e\u5410\u91cf\uff0c\u540c\u65f6\u4e0d\u8fdd\u5
3cd\u670d\u52a1\u6c34\u5e73\u534f\u8bae\uff08SLA\uff09\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u6211\u4eec\u80fd\u591f\u9ad8\u6548\u5730\u5206\u914d\u5de5\u4f5c\u8d1f\u8f7d\uff0c\u4f7f\u670d\u52a1\u5668\u7684\u5de5\u4f5c\u8d1f\u8f7d\u51cf\u5c11\u7ea6\u4e09\u5206\u4e4b\u4e00\uff0c\u540c\u65f6\u6bd4\u8d2a\u5fc3\u65b9\u6cd5\u63d0\u9ad8\u4e8619%\u3002\u7ed3\u679c\u8868\u660e\uff0c\u5728\u5177\u6709\u4e0d\u540c\u7c7b\u578bLLM\u63a8\u7406\u8bf7\u6c42\u7684\u73af\u5883\u4e2d\uff0c\u670d\u52a1\u5668\u7684\u541e\u5410\u91cf\u5f97\u5230\u4e86\u63d0\u5347\u3002|\n", "2410.11841": "|**2024-10-15**|**GaVaMoE: Gaussian-Variational Gated Mixture of Experts for Explainable Recommendation**|Fei Tang et.al.|[2410.11841](http://arxiv.org/abs/2410.11841)|null|\u57fa\u4e8e\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\u7684\u53ef\u89e3\u91ca\u63a8\u8350\uff08LLM-based ER\uff09\u7cfb\u7edf\u5728\u751f\u6210\u7c7b\u4f3c\u4eba\u7c7b\u7684\u63a8\u8350\u89e3\u91ca\u65b9\u9762\u663e\u793a\u51fa\u6f5c\u529b\u3002\u7136\u800c\uff0c\u5b83\u4eec\u9762\u4e34\u7740\u5efa\u6a21\u7528\u6237\u4e0e\u9879\u76ee\u4e4b\u95f4\u7684\u534f\u540c\u504f\u597d\u3001\u4e2a\u6027\u5316\u89e3\u91ca\u4ee5\u53ca\u5904\u7406\u7a00\u758f\u7528\u6237-\u9879\u76ee\u4ea4\u4e92\u7684\u6311\u6218\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aGaVaMoE\u7684\u65b0\u6846\u67b6\uff0c\u5373\u9ad8\u65af\u53d8\u5206\u95e8\u63a7\u4e13\u5bb6\u6df7\u5408\u6a21\u578b\uff0c\u7528\u4e8e\u53ef\u89e3\u91ca\u63a8\u8350\u3002GaVaMoE\u5f15\u5165\u4e86\u4e24\u4e2a\u5173\u952e\u7ec4\u4ef6\uff1a(1) \u4e00\u4e2a\u8bc4\u5206\u91cd\u6784\u6a21\u5757\uff0c\u91c7\u7528\u5e26\u6709\u9ad8\u65af\u6df7\u5408\u6a21\u578b\uff08GMM\uff09\u7684\u53d8\u5206\u81ea\u7f16\u7801\u5668\uff08VAE\uff09\uff0c\u4ee5\u6355\u6349\u590d\u6742\u7684\u7528\u6237-\u9879\u76ee\u534f\u540c\u504f\u597d\uff0c\u4f5c\u4e3a\u9884\u8bad\u7ec3\u7684\u591a\u95e8\u673a\u5236\uff1b(2) 
\u4e00\u7ec4\u7ec6\u7c92\u5ea6\u7684\u4e13\u5bb6\u6a21\u578b\uff0c\u4e0e\u591a\u95e8\u673a\u5236\u8026\u5408\uff0c\u7528\u4e8e\u751f\u6210\u9ad8\u5ea6\u4e2a\u6027\u5316\u7684\u89e3\u91ca\u3002VAE\u7ec4\u4ef6\u5bf9\u7528\u6237-\u9879\u76ee\u4ea4\u4e92\u4e2d\u7684\u6f5c\u5728\u56e0\u7d20\u8fdb\u884c\u5efa\u6a21\uff0c\u800cGMM\u5219\u805a\u7c7b\u5177\u6709\u76f8\u4f3c\u884c\u4e3a\u7684\u7528\u6237\u3002\u6bcf\u4e2a\u805a\u7c7b\u5bf9\u5e94\u591a\u95e8\u673a\u5236\u4e2d\u7684\u4e00\u4e2a\u95e8\uff0c\u5c06\u7528\u6237-\u9879\u76ee\u5bf9\u8def\u7531\u5230\u9002\u5f53\u7684\u4e13\u5bb6\u6a21\u578b\u3002\u8fd9\u79cd\u67b6\u6784\u4f7fGaVaMoE\u80fd\u591f\u4e3a\u7279\u5b9a\u7c7b\u578b\u7684\u7528\u6237\u548c\u504f\u597d\u751f\u6210\u5b9a\u5236\u5316\u89e3\u91ca\uff0c\u901a\u8fc7\u5229\u7528\u7528\u6237\u4e4b\u95f4\u7684\u76f8\u4f3c\u6027\u6765\u7f13\u89e3\u6570\u636e\u7a00\u758f\u95ee\u9898\u3002\u5728\u4e09\u4e2a\u771f\u5b9e\u4e16\u754c\u6570\u636e\u96c6\u4e0a\u7684\u5e7f\u6cdb\u5b9e\u9a8c\u8868\u660e\uff0cGaVaMoE\u5728\u89e3\u91ca\u8d28\u91cf\u3001\u4e2a\u6027\u5316\u548c\u4e00\u81f4\u6027\u65b9\u9762\u663e\u8457\u4f18\u4e8e\u73b0\u6709\u65b9\u6cd5\u3002\u7279\u522b\u662f\uff0c\u5728\u7a00\u758f\u7528\u6237-\u9879\u76ee\u4ea4\u4e92\u573a\u666f\u4e2d\uff0cGaVaMoE\u8868\u73b0\u51fa\u7a33\u5065\u7684\u6027\u80fd\uff0c\u5373\u4f7f\u5bf9\u4e8e\u5386\u53f2\u6570\u636e\u6709\u9650\u7684\u7528\u6237\u4e5f\u80fd\u4fdd\u6301\u9ad8\u8d28\u91cf\u7684\u89e3\u91ca\u3002|\n", "2410.11829": "|**2024-10-15**|**MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding**|Yue Cao 
et.al.|[2410.11829](http://arxiv.org/abs/2410.11829)|**[link](https://github.com/yuecao0119/MMFuser)**|**\u5c3d\u7ba1\u5728\u8de8\u6a21\u6001\u4ea4\u4e92\u4e2d\u7406\u89e3\u590d\u6742\u7684\u4eba\u7c7b\u610f\u56fe\u65b9\u9762\uff0c\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u5c55\uff0c\u4f46\u6355\u6349\u590d\u6742\u7684\u56fe\u50cf\u7ec6\u8282\u4ecd\u7136\u5177\u6709\u6311\u6218\u6027\u3002\u5148\u524d\u7684\u65b9\u6cd5\u901a\u8fc7\u6574\u5408\u591a\u4e2a\u89c6\u89c9\u7f16\u7801\u5668\u6765\u589e\u5f3a\u89c6\u89c9\u7ec6\u8282\uff0c\u4f46\u8fd9\u79cd\u65b9\u6cd5\u5f15\u5165\u4e86\u5197\u4f59\u548c\u8ba1\u7b97\u5f00\u9500\u3002\u6211\u4eec\u89c2\u5bdf\u5230\uff0c\u5927\u591a\u6570MLLMs\u4ec5\u4f7f\u7528\u89c6\u89c9\u7f16\u7801\u5668\u7684\u6700\u540e\u4e00\u5c42\u7279\u5f81\u56fe\u6765\u8fdb\u884c\u89c6\u89c9\u8868\u793a\uff0c\u800c\u5ffd\u7565\u4e86\u6d45\u5c42\u7279\u5f81\u56fe\u4e2d\u7684\u4e30\u5bcc\u7ec6\u7c92\u5ea6\u4fe1\u606f\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86MMFuser\uff0c\u8fd9\u662f\u4e00\u79cd\u7b80\u5355\u800c\u6709\u6548\u7684\u591a\u5c42\u7279\u5f81\u878d\u5408\u5668\uff0c\u80fd\u591f\u9ad8\u6548\u5730\u6574\u5408\u6765\u81ea\u89c6\u89c9\u53d8\u6362\u5668\uff08ViTs\uff09\u7684\u6df1\u5c42\u548c\u6d45\u5c42\u7279\u5f81\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u5b83\u5229\u7528\u8bed\u4e49\u5bf9\u9f50\u7684\u6df1\u5c42\u7279\u5f81\u4f5c\u4e3a\u67e5\u8be2\uff0c\u52a8\u6001\u63d0\u53d6\u6d45\u5c42\u7279\u5f81\u4e2d\u7f3a\u5931\u7684\u7ec6\u8282\uff0c\u4ece\u800c\u5728\u4fdd\u6301\u8bed\u4e49\u5bf9\u9f50\u7684\u540c\u65f6\u4e30\u5bcc\u4e86\u8868\u793a\u5f62\u5f0f\u7684\u7ec6\u7c92\u5ea6\u4fe1\u606f\u3002\u5e94\u7528\u4e8eLLaVA-1.5\u6a21\u578b\u65f6\uff0cMMFuser\u5728\u89c6\u89c9\u8868\u793a\u548c\u57fa\u51c6\u6027\u80fd\u4e0a\u53d6\u5f97\u4e86\u663e\u8457\u63d0\u5347\uff0c\u63d0\u4f9b\u4e86\u4e00\u79cd\u6bd4\u591a\u7f16\u7801\u5668\u96c6\u6210\u
65b9\u6cd5\u66f4\u7075\u6d3b\u3001\u66f4\u8f7b\u91cf\u5316\u7684\u89e3\u51b3\u65b9\u6848\u3002\u4ee3\u7801\u548c\u6a21\u578b\u5df2\u53d1\u5e03\u5728https://github.com/yuecao0119/MMFuser\u3002**|\n", "2410.11815": "|**2024-10-15**|**SGEdit: Bridging LLM with Text2Image Generative Model for Scene Graph-based Image Editing**|Zhiyuan Zhang et.al.|[2410.11815](http://arxiv.org/abs/2410.11815)|null|\u573a\u666f\u56fe\u4ee5\u8282\u70b9\u548c\u8fb9\u7684\u5f62\u5f0f\u63d0\u4f9b\u4e86\u56fe\u50cf\u7684\u7ed3\u6784\u5316\u3001\u5206\u5c42\u8868\u793a\uff0c\u5206\u522b\u8868\u793a\u5bf9\u8c61\u53ca\u5176\u76f8\u4e92\u5173\u7cfb\u3002\u5b83\u53ef\u4ee5\u7528\u4f5c\u56fe\u50cf\u7f16\u8f91\u7684\u81ea\u7136\u754c\u9762\uff0c\u663e\u8457\u63d0\u9ad8\u7cbe\u5ea6\u548c\u7075\u6d3b\u6027\u3002\u5229\u7528\u8fd9\u4e00\u4f18\u52bf\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u4e2a\u65b0\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u5c06\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4e0e\u6587\u672c\u5230\u56fe\u50cf\u751f\u6210\u6a21\u578b\u76f8\u7ed3\u5408\uff0c\u7528\u4e8e\u57fa\u4e8e\u573a\u666f\u56fe\u7684\u56fe\u50cf\u7f16\u8f91\u3002\u8fd9\u79cd\u96c6\u6210\u4f7f\u5f97\u5728\u5bf9\u8c61\u7ea7\u522b\u8fdb\u884c\u7cbe\u786e\u4fee\u6539\u4ee5\u53ca\u5bf9\u573a\u666f\u8fdb\u884c\u521b\u9020\u6027\u91cd\u6784\u6210\u4e3a\u53ef\u80fd\uff0c\u800c\u4e0d\u4f1a\u635f\u5bb3\u6574\u4f53\u56fe\u50cf\u7684\u5b8c\u6574\u6027\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5206\u4e3a\u4e24\u4e2a\u4e3b\u8981\u9636\u6bb5\uff1a1\uff09\u5229\u7528LLM\u9a71\u52a8\u7684\u573a\u666f\u89e3\u6790\u5668\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u56fe\u50cf\u7684\u573a\u666f\u56fe\uff0c\u6355\u6349\u5173\u952e\u5bf9\u8c61\u53ca\u5176\u76f8\u4e92\u5173\u7cfb\uff0c\u5e76\u89e3\u6790\u7ec6\u7c92\u5ea6\u5c5e\u6027\u5982\u5bf9\u8c61\u63a9\u7801\u548c\u63cf\u8ff0\u3002\u8fd9\u4e9b\u6ce8\u91ca\u4fc3\u8fdb\u4e86\u6982\u5ff5\u5b66\u4e60\uff0c\u4f7f\u7528\u5fae\u8c03\u6269\u6563\u6a21\u578b\u6765\u4ee3\u8868\u6bcf\u4e2a\u5bf9\u8c61\uff0c\
u7528\u4f18\u5316\u7684\u6807\u8bb0\u548c\u8be6\u7ec6\u7684\u63cf\u8ff0\u63d0\u793a\u8868\u793a\u30022\uff09\u5728\u56fe\u50cf\u7f16\u8f91\u9636\u6bb5\uff0cLLM\u7f16\u8f91\u63a7\u5236\u5668\u6307\u5bfc\u7279\u5b9a\u533a\u57df\u7684\u7f16\u8f91\u3002\u8fd9\u4e9b\u7f16\u8f91\u901a\u8fc7\u6ce8\u610f\u529b\u8c03\u8282\u7684\u6269\u6563\u7f16\u8f91\u5668\u5b9e\u73b0\uff0c\u5229\u7528\u5fae\u8c03\u6a21\u578b\u6267\u884c\u5bf9\u8c61\u6dfb\u52a0\u3001\u5220\u9664\u3001\u66ff\u6362\u548c\u8c03\u6574\u3002\u901a\u8fc7\u5e7f\u6cdb\u7684\u5b9e\u9a8c\uff0c\u6211\u4eec\u8bc1\u660e\u4e86\u6211\u4eec\u7684\u6846\u67b6\u5728\u7f16\u8f91\u7cbe\u5ea6\u548c\u573a\u666f\u7f8e\u5b66\u65b9\u9762\u663e\u8457\u4f18\u4e8e\u73b0\u6709\u56fe\u50cf\u7f16\u8f91\u65b9\u6cd5\u3002|\n", "2410.11805": "|**2024-10-15**|**NesTools: A Dataset for Evaluating Nested Tool Learning Abilities of Large Language Models**|Han Han et.al.|[2410.11805](http://arxiv.org/abs/2410.11805)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7ed3\u5408\u5de5\u5177\u5b66\u4e60\u5728\u73b0\u5b9e\u5e94\u7528\u4e2d\u5df2\u7ecf\u53d6\u5f97\u4e86\u663e\u8457\u7684\u6210\u679c\u3002\u5728\u5de5\u5177\u5b66\u4e60\u8fc7\u7a0b\u4e2d\uff0cLLMs\u53ef\u80fd\u4f1a\u6309\u7167\u5d4c\u5957\u987a\u5e8f\u8c03\u7528\u591a\u4e2a\u5de5\u5177\uff0c\u5176\u4e2d\u540e\u4e00\u4e2a\u5de5\u5177\u8c03\u7528\u53ef\u80fd\u5c06\u5176\u524d\u4e00\u4e2a\u5de5\u5177\u7684\u54cd\u5e94\u4f5c\u4e3a\u8f93\u5165\u53c2\u6570\u3002\u7136\u800c\uff0c\u5f53\u524d\u5bf9\u5d4c\u5957\u5de5\u5177\u5b66\u4e60\u80fd\u529b\u7684\u7814\u7a76\u4ecd\u7136\u4e0d\u8db3\uff0c\u56e0\u4e3a\u73b0\u6709\u7684\u57fa\u51c6\u6d4b\u8bd5\u7f3a\u4e4f\u76f8\u5173\u6570\u636e\u5b9e\u4f8b\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u5f15\u5165\u4e86NesTools\u6765\u586b\u8865\u5168\u9762\u8bc4\u4f30\u5d4c\u5957\u5de5\u5177\u5b66\u4e60\u80fd\u529b\u7684\u7a7a\u767d\u3002NesTools\u5305\u542b\u4e00\u79cd\u65b0\u9896\u7684\u81ea\u52a8\u6570\u636e\u751f\u621
0\u65b9\u6cd5\uff0c\u7528\u4e8e\u6784\u5efa\u5177\u6709\u4e0d\u540c\u5d4c\u5957\u7ed3\u6784\u7684\u5927\u89c4\u6a21\u5d4c\u5957\u5de5\u5177\u8c03\u7528\u3002\u901a\u8fc7\u4eba\u5de5\u5ba1\u6838\u548c\u4f18\u5316\uff0c\u8be5\u6570\u636e\u96c6\u8d28\u91cf\u9ad8\u4e14\u4e0e\u73b0\u5b9e\u573a\u666f\u7d27\u5bc6\u76f8\u5173\u3002\u56e0\u6b64\uff0cNesTools\u53ef\u4ee5\u4f5c\u4e3a\u4e00\u4e2a\u65b0\u7684\u57fa\u51c6\u6765\u8bc4\u4f30LLMs\u7684\u5d4c\u5957\u5de5\u5177\u5b66\u4e60\u80fd\u529b\u3002\u6211\u4eec\u5bf922\u4e2aLLMs\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u5b9e\u9a8c\uff0c\u5e76\u4f7f\u7528NesTools\u8fdb\u884c\u4e86\u6df1\u5165\u5206\u6790\uff0c\u7ed3\u679c\u8868\u660e\u5f53\u524d\u7684LLMs\u5728\u590d\u6742\u7684\u5d4c\u5957\u5de5\u5177\u5b66\u4e60\u4efb\u52a1\u4e0a\u4ecd\u7136\u5b58\u5728\u56f0\u96be\u3002|\n", "2410.11802": "|**2024-10-15**|**FoundTS: Comprehensive and Unified Benchmarking of Foundation Models for Time Series Forecasting**|Zhe Li et.al.|[2410.11802](http://arxiv.org/abs/2410.11802)|null|\u65f6\u95f4\u5e8f\u5217\u9884\u6d4b\uff08TSF\uff09\u5728\u91d1\u878d\u3001\u6c14\u8c61\u670d\u52a1\u548c\u80fd\u6e90\u7ba1\u7406\u7b49\u591a\u4e2a\u9886\u57df\u90fd\u662f\u5173\u952e\u529f\u80fd\u3002\u5c3d\u7ba1\u8fd1\u5e74\u6765\u51fa\u73b0\u4e86\u8bb8\u591aTSF\u65b9\u6cd5\uff0c\u4f46\u8fd9\u4e9b\u65b9\u6cd5\u4e2d\u7684\u8bb8\u591a\u9700\u8981\u7279\u5b9a\u9886\u57df\u7684\u6570\u636e\u6536\u96c6\u548c\u6a21\u578b\u8bad\u7ec3\uff0c\u5e76\u4e14\u5728\u65b0\u9886\u57df\u4e0a\u7684\u6cdb\u5316\u6027\u80fd\u8f83\u5dee\u3002\u57fa\u7840\u6a21\u578b\u65e8\u5728\u514b\u670d\u8fd9\u4e00\u5c40\u9650\u3002\u5b83\u4eec\u901a\u8fc7\u5927\u89c4\u6a21\u8bed\u8a00\u6216\u65f6\u95f4\u5e8f\u5217\u6570\u636e\u9884\u8bad\u7ec3\uff0c\u8868\u73b0\u51fa\u5728\u65b0\u6216\u672a\u89c1\u8fc7\u7684\u6570\u636e\u4e0a\u8fdb\u884c\u63a8\u7406\u7684\u6f5c\u529b\u3002\u8fd9\u4fc3\u4f7f\u4e86\u65b0\u578bTSF\u57fa\u7840\u6a21\u578b\u7684\u6d8c\u73b0\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79c
d\u65b0\u7684\u57fa\u51c6\u6d4b\u8bd5\uff0c\u5373FoundTS\uff0c\u4ee5\u5b9e\u73b0\u5bf9\u8fd9\u4e9b\u6a21\u578b\u8fdb\u884c\u5f7b\u5e95\u800c\u516c\u5e73\u7684\u8bc4\u4f30\u548c\u6bd4\u8f83\u3002FoundTS\u6db5\u76d6\u4e86\u5404\u79cd\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u548c\u9884\u8bad\u7ec3\u65f6\u95f4\u5e8f\u5217\u7684\u57fa\u7840\u6a21\u578b\u3002\u6b64\u5916\uff0cFoundTS\u652f\u6301\u4e0d\u540c\u7684\u9884\u6d4b\u7b56\u7565\uff0c\u5305\u62ec\u96f6\u6837\u672c\u3001\u5c11\u91cf\u6837\u672c\u548c\u5168\u6837\u672c\uff0c\u4ece\u800c\u4fc3\u8fdb\u66f4\u5168\u9762\u7684\u8bc4\u4f30\u3002\u6700\u540e\uff0cFoundTS\u63d0\u4f9b\u4e86\u4e00\u4e2a\u6807\u51c6\u5316\u7684\u8bc4\u4f30\u6d41\u7a0b\u7ba1\u9053\uff0c\u5305\u62ec\u6570\u636e\u96c6\u5206\u5272\u3001\u52a0\u8f7d\u3001\u5f52\u4e00\u5316\u548c\u5c11\u91cf\u6837\u672c\u62bd\u53d6\uff0c\u4ece\u800c\u5b9e\u73b0\u516c\u5e73\u7684\u8bc4\u4f30\u3002\u5728\u6b64\u57fa\u7840\u4e0a\uff0c\u6211\u4eec\u5bf9\u5e7f\u6cdb\u9886\u57df\u5185\u5177\u6709\u4e0d\u540c\u7edf\u8ba1\u7279\u6027\u7684\u591a\u79cd\u6570\u636e\u96c6\u4e0a\u7684TSF\u57fa\u7840\u6a21\u578b\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u8bc4\u4f30\u3002\u5177\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u8bc6\u522b\u4e86\u73b0\u6709\u57fa\u7840\u6a21\u578b\u7684\u4f18\u70b9\u3001\u7f3a\u70b9\u53ca\u5176\u5185\u5728\u9650\u5236\uff0c\u5e76\u786e\u5b9a\u4e86\u672a\u6765\u6a21\u578b\u8bbe\u8ba1\u7684\u65b9\u5411\u3002\u6211\u4eec\u7684\u4ee3\u7801\u548c\u6570\u636e\u96c6\u53ef\u4ee5\u5728https://anonymous.4open.science/r/FoundTS-C2B0\u83b7\u53d6\u3002|\n", "2410.11786": "|**2024-10-15**|**Selection-p: Self-Supervised Task-Agnostic Prompt Compression for Faithfulness and Transferability**|Tsz Ting Chung 
et.al.|[2410.11786](http://arxiv.org/abs/2410.11786)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5e7f\u6cdb\u7684\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\u4e2d\u5c55\u793a\u4e86\u4ee4\u4eba\u5370\u8c61\u6df1\u523b\u7684\u6027\u80fd\uff0c\u7279\u522b\u662f\u5728\u5229\u7528\u4e0a\u4e0b\u6587\u5b66\u4e60\u65f6\u3002\u7136\u800c\uff0c\u4e0a\u4e0b\u6587\u5b66\u4e60\u5e26\u6765\u4e86\u989d\u5916\u7684\u8ba1\u7b97\u548c\u8d22\u52a1\u6210\u672c\u3002\u4e3a\u4e86\u7f13\u89e3\u8fd9\u4e00\u95ee\u9898\uff0c\u4e00\u4e9b\u63d0\u793a\u538b\u7f29\u65b9\u6cd5\u88ab\u63d0\u51fa\u4ee5\u538b\u7f29\u4e0a\u4e0b\u6587\u5b66\u4e60\u4e2d\u7684\u63d0\u793a\u3002\u5c3d\u7ba1\u8fd9\u4e9b\u65b9\u6cd5\u53d6\u5f97\u4e86\u6210\u529f\uff0c\u4f46\u5b83\u4eec\u9762\u4e34\u7740\u7531\u4e8e\u6a21\u578b\u7279\u5b9a\u538b\u7f29\u800c\u5bfc\u81f4\u7684\u8fc1\u79fb\u6027\u5dee\u7684\u95ee\u9898\uff0c\u6216\u8005\u4f9d\u8d56\u5916\u90e8\u8bad\u7ec3\u6570\u636e\uff0c\u4f8b\u5982GPT-4\u3002\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u7814\u7a76\u4e86LLMs\u5f00\u53d1\u7edf\u4e00\u538b\u7f29\u65b9\u6cd5\u7684\u80fd\u529b\uff0c\u8be5\u65b9\u6cd5\u901a\u8fc7\u79bb\u6563\u5316\u4e0d\u5177\u4fe1\u606f\u6027\u7684\u6807\u8bb0\uff0c\u91c7\u7528\u81ea\u76d1\u7763\u9884\u8bad\u7ec3\u6280\u672f\u3002\u901a\u8fc7\u5728\u6301\u7eed\u9884\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u5f15\u5165\u5c11\u91cf\u53c2\u6570\uff0c\u6240\u63d0\u51fa\u7684Selection-p\u4e3a\u6bcf\u4e2a\u8f93\u5165\u6807\u8bb0\u751f\u6210\u4e00\u4e2a\u6982\u7387\u503c\uff0c\u6307\u793a\u4fdd\u7559\u6216\u4e22\u5f03\u8be5\u6807\u8bb0\u3002\u5b9e\u9a8c\u8868\u660e\uff0cSelection-p\u5728\u591a\u4e2a\u5206\u7c7b\u4efb\u52a1\u4e2d\u8fbe\u5230\u4e86\u6700\u5148\u8fdb\u7684\u6027\u80fd\uff0c\u5728\u5b9e\u73b0\u9ad8\u8fbe10\u500d\u7684\u538b\u7f29\u7387\u7684\u540c\u65f6\uff0c\u4ec5\u7ecf\u5386\u4e86\u5fae\u5c0f\u76840.8%\u6027\u80fd\u4e0b\u964d\u3002\u6b64\u5916\uff0c\u5b83\u76f8\u6bd4\u5148\u524d\u7684\u5de5\u4f5c\u5728\u4e0d\u540
c\u6a21\u578b\u4e0a\u7684\u8fc1\u79fb\u6027\u66f4\u4f18\u3002\u53e6\u5916\uff0c\u6211\u4eec\u8fdb\u4e00\u6b65\u5206\u6790\u4e86Selection-p\u5982\u4f55\u6709\u52a9\u4e8e\u5728\u957f\u4e0a\u4e0b\u6587\u4e2d\u4fdd\u6301\u4e0a\u4e0b\u6587\u5b66\u4e60\u7684\u6027\u80fd\u3002|\n", "2410.11782": "|**2024-10-15**|**G-Designer: Architecting Multi-agent Communication Topologies via Graph Neural Networks**|Guibin Zhang et.al.|[2410.11782](http://arxiv.org/abs/2410.11782)|null|\u8fd1\u671f\u5728\u57fa\u4e8e\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u4ee3\u7406\u6280\u672f\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u5c55\uff0c\u8bc1\u660e\u96c6\u4f53\u667a\u80fd\u53ef\u4ee5\u663e\u8457\u8d85\u8d8a\u5355\u4e2a\u4ee3\u7406\u7684\u80fd\u529b\uff0c\u8fd9\u4e3b\u8981\u5f97\u76ca\u4e8e\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u4ee3\u7406\u95f4\u901a\u4fe1\u62d3\u6251\u3002\u5c3d\u7ba1\u6709\u8bb8\u591a\u591a\u6837\u5316\u4e14\u9ad8\u6027\u80fd\u7684\u8bbe\u8ba1\u53ef\u4f9b\u9009\u62e9\uff0c\u4f46\u5b9e\u8df5\u8005\u5728\u4e3a\u7279\u5b9a\u4efb\u52a1\u9009\u62e9\u6700\u6709\u6548\u7684\u7ba1\u9053\u65f6\u5e38\u5e38\u611f\u5230\u56f0\u60d1\uff1a\u54ea\u79cd\u62d3\u6251\u6700\u9002\u5408\u6211\u7684\u4efb\u52a1\uff0c\u540c\u65f6\u907f\u514d\u4e0d\u5fc5\u8981\u7684\u901a\u4fe1\u4ee4\u724c\u5f00\u9500\u5e76\u786e\u4fdd\u9ad8\u8d28\u91cf\u7684\u89e3\u51b3\u65b9\u6848\uff1f\u9488\u5bf9\u8fd9\u4e00\u56f0\u5883\uff0c\u6211\u4eec\u4ecb\u7ecd\u4e86G-Designer\uff0c\u8fd9\u662f\u4e00\u79cd\u81ea\u9002\u5e94\u3001\u9ad8\u6548\u4e14\u7a33\u5065\u7684\u591a\u4ee3\u7406\u90e8\u7f72\u89e3\u51b3\u65b9\u6848\uff0c\u80fd\u591f\u52a8\u6001\u8bbe\u8ba1\u4efb\u52a1\u611f\u77e5\u7684\u5b9a\u5236\u5316\u901a\u4fe1\u62d3\u6251\u3002\u5177\u4f53\u6765\u8bf4\uff0cG-Designer\u5c06\u591a\u4ee3\u7406\u7cfb\u7edf\u5efa\u6a21\u4e3a\u4e00\u4e2a\u591a\u4ee3\u7406\u7f51\u7edc\uff0c\u5229\u7528\u53d8\u5206\u56fe\u81ea\u52a8\u7f16\u7801\u5668\u5bf9\u8282\u70b9\uff08\u4ee3\u7406\uff09\u548c\u4e00\u4e2a\u7279\
u5b9a\u4efb\u52a1\u7684\u865a\u62df\u8282\u70b9\u8fdb\u884c\u7f16\u7801\uff0c\u5e76\u89e3\u7801\u51fa\u4e00\u4e2a\u4efb\u52a1\u9002\u5e94\u6027\u5f3a\u4e14\u6027\u80fd\u9ad8\u7684\u901a\u4fe1\u62d3\u6251\u3002\u5728\u516d\u4e2a\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u7684\u5e7f\u6cdb\u5b9e\u9a8c\u8868\u660e\uff0cG-Designer\u5177\u6709\u4ee5\u4e0b\u7279\u70b9\uff1a\\textbf{(1) \u9ad8\u6027\u80fd}\uff0c\u5728MMLU\u4e0a\u7684\u51c6\u786e\u7387\u8fbe\u523084.50%\uff0c\u5728HumanEval\u4e0a\u7684pass@1\u8fbe\u523089.90%\uff1b\\textbf{(2) \u4efb\u52a1\u9002\u5e94\u6027}\uff0c\u6839\u636e\u4efb\u52a1\u96be\u5ea6\u6784\u5efa\u5b9a\u5236\u5316\u7684\u901a\u4fe1\u534f\u8bae\uff0c\u5c06\u4ee4\u724c\u6d88\u8017\u51cf\u5c11\u4e86\u9ad8\u8fbe95.33%\uff1b\u5e76\u4e14\\textbf{(3) \u5bf9\u6297\u9c81\u68d2}\uff0c\u80fd\u591f\u62b5\u5fa1\u4ee3\u7406\u5bf9\u6297\u653b\u51fb\uff0c\u4ec5\u5bfc\u81f40.3%\u7684\u51c6\u786e\u7387\u4e0b\u964d\u3002|\n", "2410.11781": "|**2024-10-15**|**Language Models Encode Numbers Using Digit Representations in Base 10**|Amit Arnold Levy 
et.al.|[2410.11781](http://arxiv.org/abs/2410.11781)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5904\u7406\u5373\u4f7f\u662f\u7b80\u5355\u7684\u6570\u503c\u95ee\u9898\u65f6\uff0c\u5982\u6bd4\u8f83\u4e24\u4e2a\u5c0f\u6570\u5b57\uff0c\u4e5f\u7ecf\u5e38\u51fa\u9519\u3002\u4e00\u4e2a\u81ea\u7136\u7684\u5047\u8bbe\u662f\u8fd9\u4e9b\u9519\u8bef\u6e90\u4e8e\u6a21\u578b\u5982\u4f55\u8868\u793a\u6570\u5b57\uff0c\u7279\u522b\u662f\u5b83\u4eec\u662f\u5426\u6355\u6349\u5230\u4e86\u6570\u5b57\u7684\u5b9e\u9645\u6570\u503c\u3002\u6211\u4eec\u901a\u8fc7\u89c2\u5bdf\u53d1\u73b0\uff0cLLM\u5728\u6570\u503c\u4efb\u52a1\u4e0a\u7684\u9519\u8bef\u901a\u5e38\u5206\u5e03\u5728\u7b54\u6848\u7684\u201c\u4f4d\u6570\u201d\u4e0a\uff0c\u800c\u4e0d\u662f\u56f4\u7ed5\u5176\u201c\u6570\u503c\u201d\u6b63\u5e38\u5206\u5e03\u3002\u901a\u8fc7\u4e00\u7cfb\u5217\u63a2\u9488\u5b9e\u9a8c\u548c\u56e0\u679c\u5e72\u9884\uff0c\u6211\u4eec\u5c55\u793a\u4e86LLM\u5185\u90e8\u4ee5\u5341\u8fdb\u5236\u7684\u6bcf\u4e00\u4f4d\u6570\u5b57\u8fdb\u884c\u5706\u73af\u5f0f\u8868\u793a\uff0c\u800c\u4e0d\u662f\u6570\u503c\u8868\u793a\u3002\u8fd9\u79cd\u57fa\u4e8e\u4f4d\u7684\u8868\u793a\u65b9\u5f0f\uff0c\u800c\u975e\u6570\u503c\u8868\u793a\uff0c\u63ed\u793a\u4e86\u6a21\u578b\u5728\u6d89\u53ca\u6570\u503c\u63a8\u7406\u7684\u4efb\u52a1\u4e2d\u7684\u9519\u8bef\u6a21\u5f0f\uff0c\u5e76\u53ef\u4f5c\u4e3a\u672a\u6765\u7814\u7a76\u5206\u6790LLM\u4e2d\u6570\u503c\u673a\u5236\u7684\u57fa\u7840\u3002|\n", "2410.11779": "|**2024-10-15**|**MLLM can see? 
Dynamic Correction Decoding for Hallucination Mitigation**|Chenxi Wang et.al.|[2410.11779](http://arxiv.org/abs/2410.11779)|**[link](https://github.com/zjunlp/Deco)**|**\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u7ecf\u5e38\u8868\u73b0\u51fa\u5e7b\u89c9\u73b0\u8c61\uff0c\u4f46\u5176\u80cc\u540e\u7684\u539f\u56e0\u5c1a\u672a\u5f97\u5230\u5145\u5206\u7406\u89e3\u3002\u5728\u672c\u6587\u4e2d\uff0c\u6211\u4eec\u8fdb\u884c\u4e86\u5b9e\u8bc1\u5206\u6790\u5e76\u53d1\u73b0\uff0c\u5c3d\u7ba1MLLMs\u5728\u6700\u7ec8\u8f93\u51fa\u4e2d\u9519\u8bef\u5730\u751f\u6210\u4e86\u5bf9\u8c61\uff0c\u4f46\u5728\u524d\u4e00\u5c42\u5b83\u4eec\u5b9e\u9645\u4e0a\u80fd\u591f\u8bc6\u522b\u89c6\u89c9\u5bf9\u8c61\u3002\u6211\u4eec\u63a8\u6d4b\u8fd9\u53ef\u80fd\u662f\u7531\u4e8e\u8bed\u8a00\u6a21\u578b\u7684\u5f3a\u5927\u77e5\u8bc6\u5148\u9a8c\u6291\u5236\u4e86\u89c6\u89c9\u4fe1\u606f\uff0c\u4ece\u800c\u5bfc\u81f4\u5e7b\u89c9\u3002\u53d7\u6b64\u542f\u53d1\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u52a8\u6001\u6821\u6b63\u89e3\u7801\u65b9\u6cd5\uff08DeCo\uff09\uff0c\u8be5\u65b9\u6cd5\u81ea\u9002\u5e94\u5730\u9009\u62e9\u5408\u9002\u7684\u524d\u4e00\u5c42\uff0c\u5e76\u6309\u6bd4\u4f8b\u5c06\u77e5\u8bc6\u6574\u5408\u5230\u6700\u7ec8\u5c42\u4ee5\u8c03\u6574\u8f93\u51falogits\u3002\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0cDeCo\u662f\u4e0e\u6a21\u578b\u65e0\u5173\u7684\uff0c\u53ef\u4ee5\u65e0\u7f1d\u5730\u4e0e\u5404\u79cd\u7ecf\u5178\u89e3\u7801\u7b56\u7565\u7ed3\u5408\uff0c\u5e76\u5e94\u7528\u4e8e\u4e0d\u540c\u7684MLLMs\u3002\u6211\u4eec\u5728\u5e7f\u6cdb\u4f7f\u7528\u7684\u57fa\u51c6\u4e0a\u8bc4\u4f30\u4e86DeCo\uff0c\u7ed3\u679c\u8868\u660e\u5b83\u76f8\u6bd4\u57fa\u7ebf\u5927\u5e45\u964d\u4f4e\u4e86\u5e7b\u89c9\u7387\uff0c\u7a81\u663e\u4e86\u5176\u51cf\u8f7b\u5e7b\u89c9\u7684\u6f5c\u529b\u3002\u4ee3\u7801\u53ef\u5728https://github.com/zjunlp/DeCo\u83b7\u53d6\u3002**|\n", "2410.11772": "|**2024-10-15**|**Layer-wise Importance Matters: Less Memory for 
Better Performance in Parameter-efficient Fine-tuning of Large Language Models**|Kai Yao et.al.|[2410.11772](http://arxiv.org/abs/2410.11772)|**[link](https://github.com/kaiseem/ist)**|**\u53c2\u6570\u9ad8\u6548\u5fae\u8c03\uff08PEFT\uff09\u65b9\u6cd5\u56e0\u5176\u5728\u9002\u5e94\u9884\u8bad\u7ec3\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5230\u4e0b\u6e38\u4efb\u52a1\u65f6\u663e\u8457\u51cf\u5c11\u5185\u5b58\u548c\u8ba1\u7b97\u5f00\u9500\u7684\u6f5c\u529b\u800c\u5e7f\u53d7\u6b22\u8fce\u3002\u7136\u800c\uff0c\u5927\u591a\u6570PEFT\u65b9\u6cd5\u7684\u4e00\u4e2a\u5e38\u89c1\u9650\u5236\u662f\u5b83\u4eec\u5728\u6574\u4e2a\u5c42\u4e2d\u5e94\u7528\u7edf\u4e00\u7684\u67b6\u6784\u8bbe\u8ba1\uff0c\u8fd9\u6d89\u53ca\u76f8\u540c\u7684\u53ef\u8bad\u7ec3\u6a21\u5757\uff0c\u5e76\u5ffd\u7565\u4e86\u6bcf\u5c42\u7684\u91cd\u8981\u6027\u5dee\u5f02\uff0c\u4ece\u800c\u5bfc\u81f4\u5fae\u8c03\u7ed3\u679c\u4e0d\u4f73\u3002\u4e3a\u4e86\u514b\u670d\u4e0a\u8ff0\u5c40\u9650\u5e76\u83b7\u5f97\u66f4\u597d\u7684\u6027\u80fd\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u79f0\u4e3a\u91cd\u8981\u6027\u611f\u77e5\u7a00\u758f\u8c03\u4f18\uff08IST\uff09\uff0c\u4ee5\u5145\u5206\u5229\u7528\u56fa\u6709\u7684\u7a00\u758f\u6027\uff0c\u5e76\u901a\u8fc7\u6709\u6548\u7684\u9010\u5c42\u91cd\u8981\u6027\u8bc4\u5206\u9009\u62e9\u6700\u91cd\u8981\u7684\u5168\u5c42\u5b50\u96c6\u3002\u6240\u63d0\u51fa\u7684IST\u662f\u4e00\u79cd\u901a\u7528\u4e14\u5373\u63d2\u5373\u7528\u7684\u6280\u672f\uff0c\u4e0e\u5404\u79cd\u57fa\u4e8e\u5c42\u7684PEFT\u65b9\u6cd5\u517c\u5bb9\u3002\u901a\u8fc7\u5229\u7528\u4f30\u8ba1\u7684\u91cd\u8981\u6027\u5f97\u5206\uff0cIST\u5728PEFT\u6a21\u5757\u4e2d\u52a8\u6001\u66f4\u65b0\u8fd9\u4e9b\u9009\u5b9a\u7684\u5c42\uff0c\u4ece\u800c\u964d\u4f4e\u5185\u5b58\u9700\u6c42\u3002\u6211\u4eec\u8fdb\u4e00\u6b65\u63d0\u4f9b\u4e86\u6536\u655b\u6027\u7684\u7406\u8bba\u8bc1\u660e\u548c\u4f18\u4e8e\u5747\u5300\u66f4\u65b0\u7b56\u7565\u7684\u5b9e\u8bc1\u8bc1\u
636e\uff0c\u4ee5\u8bc1\u660eIST\u76f8\u5bf9\u4e8e\u73b0\u6709\u65b9\u6cd5\u7684\u4f18\u52bf\u3002\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u6db5\u76d6\u4e86\u5404\u79cdLLMs\u3001PEFT\u65b9\u6cd5\u548c\u4e0b\u6e38\u4efb\u52a1\uff0c\u8bc1\u5b9e\u4e86\u6211\u4eec\u63d0\u51fa\u65b9\u6cd5\u7684\u6709\u6548\u6027\uff0c\u5c55\u793a\u4e86IST\u589e\u5f3a\u73b0\u6709\u57fa\u4e8e\u5c42\u7684PEFT\u65b9\u6cd5\u7684\u80fd\u529b\u3002\u6211\u4eec\u7684\u4ee3\u7801\u53ef\u5728https://github.com/Kaiseem/IST\u83b7\u53d6\u3002**|\n", "2410.12788": "|**2024-10-16**|**Meta-Chunking: Learning Efficient Text Segmentation via Logical Perception**|Jihao Zhao et.al.|[2410.12788](http://arxiv.org/abs/2410.12788)|null| Retrieval-Augmented Generation\uff08RAG\uff09\u5728\u4f5c\u4e3a\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u53ef\u884c\u8865\u5145\u65f6\uff0c\u5e38\u5e38\u5ffd\u7565\u4e86\u5176\u7ba1\u9053\u4e2d\u4e00\u4e2a\u5173\u952e\u65b9\u9762\u2014\u2014\u6587\u672c\u5206\u5757\uff0c\u8fd9\u5f71\u54cd\u4e86\u77e5\u8bc6\u5bc6\u96c6\u578b\u4efb\u52a1\u7684\u8d28\u91cf\u3002\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u79f0\u4e3a\u5143\u5206\u5757\uff08Meta-Chunking\uff09\u7684\u6982\u5ff5\uff0c\u8fd9\u662f\u4e00\u79cd\u4ecb\u4e8e\u53e5\u5b50\u548c\u6bb5\u843d\u4e4b\u95f4\u7684\u7c92\u5ea6\uff0c\u7531\u6bb5\u843d\u5185\u5177\u6709\u6df1\u5c42\u6b21\u8bed\u8a00\u903b\u8f91\u8054\u7cfb\u7684\u4e00\u7ec4\u53e5\u5b50\u7ec4\u6210\u3002\u4e3a\u4e86\u5b9e\u73b0\u5143\u5206\u5757\uff0c\u6211\u4eec\u57fa\u4e8eLLMs\u8bbe\u8ba1\u4e86\u4e24\u79cd\u7b56\u7565\uff1a\u8fb9\u754c\u91c7\u6837\u5206\u5757\u548c\u56f0\u60d1\u5ea6\u5206\u5757\u3002\u524d\u8005\u5229\u7528LLMs\u5bf9\u8fde\u7eed\u53e5\u5b50\u662f\u5426\u9700\u8981\u5206\u5272\u8fdb\u884c\u4e8c\u5206\u7c7b\u51b3\u7b56\uff0c\u57fa\u4e8e\u4ece\u8fb9\u754c\u91c7\u6837\u83b7\u5f97\u7684\u6982\u7387\u5dee\u505a\u51fa\u51b3\u7b56\u3002\u540e\u8005\u901a\u8fc7\u5206\u6790\u56f0\u60d1\u5ea6\u5206\u5e03\u7684\u7279\u70b9\u6765\u7cbe\u786e\u8bc6\u522b\u65
87\u672c\u5206\u5757\u8fb9\u754c\u3002\u6b64\u5916\uff0c\u8003\u8651\u5230\u4e0d\u540c\u6587\u672c\u7684\u56fa\u6709\u590d\u6742\u6027\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u7ed3\u5408\u5143\u5206\u5757\u4e0e\u52a8\u6001\u5408\u5e76\u7684\u7b56\u7565\uff0c\u4ee5\u5b9e\u73b0\u5728\u7ec6\u7c92\u5ea6\u548c\u7c97\u7c92\u5ea6\u6587\u672c\u5206\u5757\u4e4b\u95f4\u53d6\u5f97\u5e73\u8861\u3002\u5b9e\u9a8c\u5728\u5341\u4e00\u4e2a\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\uff0c\u7ed3\u679c\u8868\u660e\u5143\u5206\u5757\u53ef\u4ee5\u66f4\u6709\u6548\u5730\u63d0\u9ad8\u57fa\u4e8eRAG\u7684\u5355\u8df3\u548c\u591a\u8df3\u95ee\u7b54\u6027\u80fd\u3002\u4f8b\u5982\uff0c\u57282WikiMultihopQA\u6570\u636e\u96c6\u4e0a\uff0c\u5b83\u6bd4\u76f8\u4f3c\u6027\u5206\u5757\u63d0\u9ad8\u4e861.32\u7684\u6027\u80fd\uff0c\u540c\u65f6\u4ec5\u6d88\u8017\u4e8645.8%\u7684\u65f6\u95f4\u3002\u6211\u4eec\u7684\u4ee3\u7801\u53ef\u5728https://github.com/IAAR-Shanghai/Meta-Chunking \u83b7\u53d6\u3002|\n", "2410.12782": "|**2024-10-16**|**In-Context Learning Enables Robot Action Prediction in LLMs**|Yida Yin 
et.al.|[2410.12782](http://arxiv.org/abs/2410.12782)|null|\u6700\u8fd1\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u8bed\u8a00\u9886\u57df\u901a\u8fc7\u4e0a\u4e0b\u6587\u5b66\u4e60\uff08ICL\uff09\u53d6\u5f97\u4e86\u663e\u8457\u7684\u6210\u529f\u3002\u7136\u800c\uff0c\u5229\u7528LLMs\u7684ICL\u80fd\u529b\u76f4\u63a5\u9884\u6d4b\u673a\u5668\u4eba\u52a8\u4f5c\u7684\u7814\u7a76\u8fd8\u76f8\u5bf9\u8f83\u5c11\u3002\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aRoboPrompt\u7684\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u4f7f\u73b0\u6210\u7684\u7eaf\u6587\u672cLLMs\u80fd\u591f\u5728\u65e0\u9700\u8bad\u7ec3\u7684\u60c5\u51b5\u4e0b\u901a\u8fc7ICL\u76f4\u63a5\u9884\u6d4b\u673a\u5668\u4eba\u52a8\u4f5c\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u9996\u5148\u901a\u8fc7\u542f\u53d1\u5f0f\u65b9\u6cd5\u8bc6\u522b\u51fa\u4e00\u4e2a\u7247\u6bb5\u4e2d\u7684\u5173\u952e\u5e27\uff0c\u8fd9\u4e9b\u5173\u952e\u5e27\u6355\u6349\u4e86\u91cd\u8981\u7684\u65f6\u523b\u3002\u63a5\u4e0b\u6765\uff0c\u6211\u4eec\u4ece\u8fd9\u4e9b\u5173\u952e\u5e27\u4e2d\u63d0\u53d6\u672b\u7aef\u6267\u884c\u5668\u7684\u52a8\u4f5c\u4ee5\u53ca\u4f30\u8ba1\u7684\u521d\u59cb\u7269\u4f53\u59ff\u6001\uff0c\u5e76\u5c06\u4e24\u8005\u8f6c\u6362\u4e3a\u6587\u672c\u63cf\u8ff0\u3002\u6700\u540e\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u7ed3\u6784\u5316\u7684\u6a21\u677f\uff0c\u4ece\u8fd9\u4e9b\u6587\u672c\u63cf\u8ff0\u548c\u4efb\u52a1\u6307\u4ee4\u4e2d\u5f62\u6210ICL\u6f14\u793a\u3002\u8fd9\u4f7f\u5f97LLM\u80fd\u591f\u5728\u6d4b\u8bd5\u65f6\u76f4\u63a5\u9884\u6d4b\u673a\u5668\u4eba\u52a8\u4f5c\u3002\u901a\u8fc7\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u548c\u5206\u6790\uff0cRoboPrompt\u5728\u6a21\u62df\u548c\u771f\u5b9e\u73af\u5883\u4e2d\u5747\u8868\u73b0\u51fa\u6bd4\u96f6\u6837\u672c\u548cICL\u57fa\u7ebf\u66f4\u5f3a\u7684\u6027\u80fd\u3002|\n", "2410.12774": "|**2024-10-16**|**Identifying Task Groupings for Multi-Task Learning Using Pointwise V-Usable Information**|Yingya Li 
et.al.|[2410.12774](http://arxiv.org/abs/2410.12774)|null|\u591a\u4efb\u52a1\u5b66\u4e60\u7684\u6210\u529f\u5728\u5f88\u5927\u7a0b\u5ea6\u4e0a\u53d6\u51b3\u4e8e\u4efb\u52a1\u7684\u5206\u7ec4\u65b9\u5f0f\u3002\u7b80\u5355\u5730\u5c06\u6240\u6709\u4efb\u52a1\u6216\u968f\u673a\u9009\u62e9\u7684\u4efb\u52a1\u7ec4\u5408\u5728\u4e00\u8d77\u53ef\u80fd\u5bfc\u81f4\u8d1f\u8fc1\u79fb\uff0c\u4ece\u800c\u4f7f\u591a\u4efb\u52a1\u6a21\u578b\u7684\u8868\u73b0\u4e0d\u5982\u5355\u4efb\u52a1\u6a21\u578b\u3002\u5c3d\u7ba1\u5df2\u7ecf\u505a\u51fa\u4e86\u8bb8\u591a\u52aa\u529b\u6765\u8bc6\u522b\u4efb\u52a1\u5206\u7ec4\u5e76\u8861\u91cf\u4e0d\u540c\u4efb\u52a1\u4e4b\u95f4\u7684\u76f8\u5173\u6027\uff0c\u4f46\u5b9a\u4e49\u4e00\u4e2a\u6307\u6807\u4ee5\u4ece\u4f17\u591a\u6f5c\u5728\u4efb\u52a1\u7ec4\u5408\u4e2d\u786e\u5b9a\u6700\u4f73\u4efb\u52a1\u5206\u7ec4\u4ecd\u7136\u662f\u4e00\u4e2a\u5177\u6709\u6311\u6218\u6027\u7684\u7814\u7a76\u8bfe\u9898\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u70b9\u5f0fV-\u53ef\u7528\u4fe1\u606f\uff08PVI\uff09\u6d4b\u91cf\u4efb\u52a1\u96be\u5ea6\u7684\u4efb\u52a1\u76f8\u5173\u6027\u5ea6\u91cf\u65b9\u6cd5\u3002PVI\u662f\u4e00\u79cd\u65b0\u8fd1\u63d0\u51fa\u7684\u5ea6\u91cf\u6807\u51c6\uff0c\u7528\u4e8e\u4f30\u8ba1\u7ed9\u5b9a\u6a21\u578b\u65f6\u6570\u636e\u96c6\u5305\u542b\u591a\u5c11\u53ef\u7528\u4fe1\u606f\u3002\u6211\u4eec\u5047\u8bbe\u5177\u6709\u7edf\u8ba1\u4e0a\u4e0d\u53ef\u533a\u5206\u7684PVI\u4f30\u8ba1\u503c\u7684\u4efb\u52a1\u8db3\u591f\u76f8\u4f3c\uff0c\u53ef\u4ee5\u4ece\u8054\u5408\u5b66\u4e60\u8fc7\u7a0b\u4e2d\u53d7\u76ca\u3002\u6211\u4eec\u5728\u4e00\u822c\u3001\u751f\u7269\u533b\u5b66\u548c\u4e34\u5e8a\u9886\u57df\u768415\u4e2aNLP\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u4e86\u5168\u9762\u5b9e\u9a8c\uff0c\u4ee5\u8bc4\u4f30\u8be5\u5ea6\u91cf\u65b9\u6cd5\u7528\u4e8e\u4efb\u52a1\u5206\u7ec4\u7684\u53ef\u884c\u6027\u3002\u6211\u4eec\u5c06\u8054\u5408\u5b66\u4e60\u5668\u7684\u7ed3\u679c\u4e0e\u5355\u4efb\u52a1\u5b66\u4e60\u5668\u3001\u73b0
\u6709\u57fa\u7ebf\u65b9\u6cd5\u4ee5\u53ca\u6700\u8fd1\u7684\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08\u5305\u62ecLlama 2\u548cGPT-4\uff09\u8fdb\u884c\u4e86\u6bd4\u8f83\u3002\u7ed3\u679c\u663e\u793a\uff0c\u901a\u8fc7\u5c06\u5177\u6709\u76f8\u4f3cPVI\u4f30\u8ba1\u503c\u7684\u4efb\u52a1\u5206\u7ec4\uff0c\u8054\u5408\u5b66\u4e60\u5668\u5728\u8f83\u5c11\u603b\u53c2\u6570\u7684\u60c5\u51b5\u4e0b\u83b7\u5f97\u4e86\u5177\u6709\u7ade\u4e89\u529b\u7684\u7ed3\u679c\uff0c\u5e76\u4e14\u5728\u4e0d\u540c\u9886\u57df\u5185\u8868\u73b0\u4e00\u81f4\u3002|\n", "2410.12757": "|**2024-10-16**|**StyleDistance: Stronger Content-Independent Style Embeddings with Synthetic Parallel Examples**|Ajay Patel et.al.|[2410.12757](http://arxiv.org/abs/2410.12757)|null|\u98ce\u683c\u8868\u793a\u65e8\u5728\u5c06\u5177\u6709\u76f8\u4f3c\u5199\u4f5c\u98ce\u683c\u7684\u6587\u672c\u5d4c\u5165\u5230\u63a5\u8fd1\u7684\u4f4d\u7f6e\uff0c\u5e76\u5c06\u5177\u6709\u4e0d\u540c\u98ce\u683c\u7684\u6587\u672c\u5d4c\u5165\u5230\u8fdc\u79bb\u7684\u4f4d\u7f6e\uff0c\u800c\u4e0d\u8003\u8651\u5185\u5bb9\u3002\u7136\u800c\uff0c\u7528\u4e8e\u8bad\u7ec3\u8fd9\u4e9b\u8868\u793a\u7684\u5bf9\u6bd4\u4e09\u5143\u7ec4\u5f80\u5f80\u5728\u98ce\u683c\u548c\u5185\u5bb9\u4e0a\u90fd\u6709\u6240\u53d8\u5316\uff0c\u5bfc\u81f4\u8868\u793a\u4e2d\u53ef\u80fd\u5b58\u5728\u5185\u5bb9\u6cc4\u6f0f\u7684\u95ee\u9898\u3002\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u540d\u4e3aStyleDistance\u7684\u65b0\u65b9\u6cd5\u6765\u8bad\u7ec3\u66f4\u5f3a\u7684\u72ec\u7acb\u4e8e\u5185\u5bb9\u7684\u98ce\u683c\u5d4c\u5165\u3002\u6211\u4eec\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u521b\u5efa\u4e86\u4e00\u4e2a\u5408\u6210\u6570\u636e\u96c6\uff0c\u5176\u4e2d\u5305\u542b\u53d7\u63a7\u98ce\u683c\u53d8\u5316\u7684\u8fd1\u4f3c\u91ca\u4e49\uff0c\u5e76\u4e3a\u7cbe\u786e\u7684\u5bf9\u6bd4\u5b66\u4e60\u751f\u6210\u4e86\u8de8\u8d8a40\u4e2a\u4e0d\u540c\u98ce\u683c\u7279\u5f81\u7684\u6b63\u4f8b\u548c\u8d1f\u4f8b\u3002\u6211\u4eec\u901a\u8fc7\u4eba\u5de5\u548c\u
81ea\u52a8\u8bc4\u4f30\u6765\u8bc4\u4f30\u5408\u6210\u6570\u636e\u548c\u5d4c\u5165\u7684\u8d28\u91cf\u3002StyleDistance\u589e\u5f3a\u4e86\u98ce\u683c\u5d4c\u5165\u7684\u5185\u5bb9\u72ec\u7acb\u6027\uff0c\u8fd9\u79cd\u5d4c\u5165\u53ef\u4ee5\u63a8\u5e7f\u5230\u73b0\u5b9e\u4e16\u754c\u7684\u57fa\u51c6\u6d4b\u8bd5\uff0c\u5e76\u5728\u4e0b\u6e38\u5e94\u7528\u4e2d\u4f18\u4e8e\u9886\u5148\u7684\u98ce\u683c\u8868\u793a\u3002\u6211\u4eec\u7684\u6a21\u578b\u53ef\u4ee5\u5728https://huggingface.co/StyleDistance/styledistance\u627e\u5230\u3002|\n", "2410.12735": "|**2024-10-17**|**CREAM: Consistency Regularized Self-Rewarding Language Models**|Zhaoyang Wang et.al.|[2410.12735](http://arxiv.org/abs/2410.12735)|null|\u8fd1\u671f\u7684\u81ea\u6211\u5956\u52b1\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u6210\u529f\u5730\u5e94\u7528\u4e86LLM\u4f5c\u4e3a\u88c1\u5224\u7684\u65b9\u6cd5\uff0c\u4ee5\u8fed\u4ee3\u65b9\u5f0f\u63d0\u5347\u5bf9\u9f50\u6027\u80fd\uff0c\u800c\u65e0\u9700\u4eba\u5de5\u6807\u6ce8\u7684\u504f\u597d\u6570\u636e\u3002\u8fd9\u4e9b\u65b9\u6cd5\u901a\u5e38\u4f7f\u7528\u540c\u4e00LLM\u4f5c\u4e3a\u7b56\u7565\u6a21\u578b\uff08\u751f\u6210\u54cd\u5e94\uff09\u548c\u5956\u52b1\u6a21\u578b\uff08\u8bc4\u5206\u548c\u6392\u5e8f\u8fd9\u4e9b\u54cd\u5e94\uff09\u3002\u7136\u540e\uff0c\u6839\u636e\u6392\u540d\u7684\u54cd\u5e94\u4f5c\u4e3a\u504f\u597d\u5bf9\u6765\u901a\u8fc7\u76f4\u63a5\u5bf9\u9f50\u6280\u672f\uff08\u4f8b\u5982DPO\uff09\u8bad\u7ec3LLM\u3002\u7136\u800c\uff0c\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u5728\u8fd9\u4e2a\u8fc7\u7a0b\u4e2d\uff0c\u5956\u52b1\u548c\u6392\u5e8f\u7684\u51c6\u786e\u6027\u6ca1\u6709\u4fdd\u8bc1\uff0c\u8fd9\u5bf9\u4e8e\u786e\u4fdd\u51c6\u786e\u7684\u5956\u52b1\u548c\u9ad8\u8d28\u91cf\u7684\u504f\u597d\u6570\u636e\u81f3\u5173\u91cd\u8981\u3002\u6765\u81ea\u76f8\u5bf9\u8f83\u5c0f\u7684LLM\uff08\u4f8b\u59827B\u53c2\u6570\uff09\u7684\u7ecf\u9a8c\u7ed3\u679c\u4e5f\u8868\u660e\uff0c\u5728\u67d0\u4e9b\u60c5\u51b5\u4e0b\uff0c\u7ecf\u8fc7\u51e0\u6b
21\u8fed\u4ee3\u540e\uff0c\u81ea\u6211\u5956\u52b1\u7684\u6539\u8fdb\u53ef\u80fd\u4f1a\u51cf\u5f31\uff0c\u6211\u4eec\u5047\u8bbe\u8fd9\u662f\u7531\u4e8e\u5956\u52b1\u7cfb\u7edf\u4e2d\u7684\u7d2f\u79ef\u504f\u5dee\u6240\u81f4\u3002\u8fd9\u79cd\u504f\u5dee\u53ef\u80fd\u5bfc\u81f4\u7528\u4e8e\u8bad\u7ec3LLM\u7684\u4e0d\u53ef\u9760\u504f\u597d\u6570\u636e\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u9996\u5148\u5236\u5b9a\u4e86\u5e76\u5206\u6790\u4e86\u81ea\u6211\u5956\u52b1\u8bed\u8a00\u6a21\u578b\u7684\u5e7f\u4e49\u8fed\u4ee3\u504f\u597d\u5fae\u8c03\u6846\u67b6\u3002\u7136\u540e\uff0c\u6211\u4eec\u5728\u8fd9\u4e00\u5e7f\u4e49\u6846\u67b6\u4e2d\u5f15\u5165\u6b63\u5219\u5316\uff0c\u4ee5\u51cf\u8f7b\u81ea\u6211\u5956\u52b1\u8fc7\u7a0b\u4e2d\u7684\u8fc7\u5ea6\u81ea\u4fe1\u504f\u597d\u6807\u8bb0\u3002\u57fa\u4e8e\u8fd9\u4e00\u7406\u8bba\u6d1e\u5bdf\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u4e00\u81f4\u6027\u6b63\u5219\u5316\u7684\u81ea\u6211\u5956\u52b1\u8bed\u8a00\u6a21\u578b\uff08CREAM\uff09\uff0c\u8be5\u6a21\u578b\u5229\u7528\u4e0d\u540c\u8fed\u4ee3\u4e2d\u7684\u5956\u52b1\u4e00\u81f4\u6027\u6765\u6b63\u5219\u5316\u81ea\u6211\u5956\u52b1\u8bad\u7ec3\uff0c\u5e2e\u52a9\u6a21\u578b\u4ece\u66f4\u53ef\u9760\u7684\u504f\u597d\u6570\u636e\u4e2d\u5b66\u4e60\u3002\u901a\u8fc7\u8fd9\u79cd\u660e\u786e\u7684\u6b63\u5219\u5316\uff0c\u6211\u4eec\u7684\u5b9e\u8bc1\u7ed3\u679c\u8bc1\u660e\u4e86CREAM\u5728\u63d0\u9ad8\u5956\u52b1\u4e00\u81f4\u6027\u548c\u5bf9\u9f50\u6027\u80fd\u65b9\u9762\u7684\u4f18\u8d8a\u6027\u3002\u4ee3\u7801\u53ef\u5728https://github.com/Raibows/CREAM\u516c\u5f00\u83b7\u53d6\u3002|\n", "2410.12707": "|**2024-10-16**|**FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression**|Zhenheng Tang 
et.al.|[2410.12707](http://arxiv.org/abs/2410.12707)|null|\u4e3a\u4e86\u7f13\u89e3\u5728\u8bad\u7ec3\u5927\u578b\u6df1\u5ea6\u795e\u7ecf\u7f51\u7edc\uff08DNNs\uff09\uff0c\u7279\u522b\u662f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u65f6\u7684\u786c\u4ef6\u77ed\u7f3a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86FusionLLM\uff0c\u8fd9\u662f\u4e00\u79cd\u53bb\u4e2d\u5fc3\u5316\u7684\u8bad\u7ec3\u7cfb\u7edf\uff0c\u65e8\u5728\u5229\u7528\u5730\u7406\u5206\u5e03\u7684GPU\u8de8\u4e0d\u540c\u7684\u8ba1\u7b97\u96c6\u7fa4\u6216\u5355\u4e2a\u8bbe\u5907\u8fdb\u884cDNN\u8bad\u7ec3\u3002\u53bb\u4e2d\u5fc3\u5316\u8bad\u7ec3\u5728\u7cfb\u7edf\u8bbe\u8ba1\u548c\u6548\u7387\u65b9\u9762\u9762\u4e34\u91cd\u5927\u6311\u6218\uff0c\u5305\u62ec\uff1a1\uff09\u9700\u8981\u8fdc\u7a0b\u81ea\u52a8\u5fae\u5206\uff08RAD\uff09\uff0c2\uff09\u652f\u6301\u7075\u6d3b\u7684\u6a21\u578b\u5b9a\u4e49\u548c\u5f02\u6784\u8f6f\u4ef6\uff0c3\uff09\u5f02\u6784\u786c\u4ef6\u5bfc\u81f4\u8d44\u6e90\u5229\u7528\u7387\u4f4e\u6216\u5b58\u5728\u6162\u901f\u8282\u70b9\u95ee\u9898\uff0c\u4ee5\u53ca4\uff09\u7f51\u7edc\u901a\u4fe1\u7f13\u6162\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u6311\u6218\uff0c\u5728\u7cfb\u7edf\u8bbe\u8ba1\u4e2d\uff0c\u6211\u4eec\u5c06\u6a21\u578b\u8868\u793a\u4e3a\u64cd\u4f5c\u7b26\uff08OP-DAG\uff09\u7684\u6709\u5411\u65e0\u73af\u56fe\u3002DAG\u4e2d\u7684\u6bcf\u4e2a\u8282\u70b9\u4ee3\u8868DNN\u4e2d\u7684\u64cd\u4f5c\u7b26\uff0c\u8fb9\u5219\u8868\u793a\u64cd\u4f5c\u7b26\u4e4b\u95f4\u7684\u6570\u636e\u4f9d\u8d56\u5173\u7cfb\u3002\u57fa\u4e8e\u8fd9\u79cd\u8bbe\u8ba1\uff0c1\uff09\u7528\u6237\u53ef\u4ee5\u81ea\u5b9a\u4e49\u4efb\u4f55DNN\u800c\u4e0d\u5fc5\u5173\u5fc3\u5e95\u5c42\u64cd\u4f5c\u7b26\u5b9e\u73b0\uff1b2\uff09\u6211\u4eec\u901a\u8fc7\u66f4\u7ec6\u7c92\u5ea6\u7684\u5b50\u4efb\u52a1\u8fdb\u884c\u4efb\u52a1\u8c03\u5ea6\uff0c\u63d0\u4f9b\u66f4\u591a\u7684\u4f18\u5316\u7a7a\u95f4\uff1b3\uff09DAG\u8fd0\u884c\u65f6\u6267\u884c\u5668\u53ef\u4ee5\u5728\u4e0d\u4f9d\u8d56\u4e00\u81f4\u7684\
u4f4e\u7ea7\u673a\u5668\u5b66\u4e60\u6846\u67b6\u7248\u672c\u7684\u60c5\u51b5\u4e0b\u5b9e\u73b0RAD\u3002 \u4e3a\u4e86\u63d0\u9ad8\u7cfb\u7edf\u6548\u7387\uff0c\u6211\u4eec\u5b9e\u73b0\u4e86\u4e00\u4e2a\u5de5\u4f5c\u8d1f\u8f7d\u4f30\u8ba1\u5668\uff0c\u5e76\u8bbe\u8ba1\u4e86\u4e00\u79cdOP-Fence\u8c03\u5ea6\u5668\uff0c\u5c06\u5177\u6709\u76f8\u4f3c\u5e26\u5bbd\u7684\u8bbe\u5907\u5206\u7ec4\u5728\u4e00\u8d77\uff0c\u5e76\u5bf9DAG\u8fdb\u884c\u5206\u533a\u4ee5\u589e\u52a0\u541e\u5410\u91cf\u3002\u6b64\u5916\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cdAdaTopK\u538b\u7f29\u5668\uff0c\u4ee5\u81ea\u9002\u5e94\u5730\u538b\u7f29\u5728\u6700\u6162\u901a\u4fe1\u94fe\u8def\u4e0a\u7684\u4e2d\u95f4\u6fc0\u6d3b\u548c\u68af\u5ea6\u3002\u4e3a\u4e86\u8bc4\u4f30\u6211\u4eec\u7684\u7cfb\u7edf\u548c\u7b97\u6cd5\u7684\u6536\u655b\u6027\u548c\u6548\u7387\uff0c\u6211\u4eec\u5728\u4e09\u4e2a\u73b0\u5b9e\u6d4b\u8bd5\u5e73\u53f0\u4e0a\u4f7f\u7528\u8fde\u63a5\u901f\u5ea6\u57288 Mbps\u523010 Gbps\u768448\u4e2aGPU\u4e0a\u8bad\u7ec3\u4e86ResNet-101\u548cGPT-2\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u4e0e\u57fa\u7ebf\u65b9\u6cd5\u76f8\u6bd4\uff0c\u6211\u4eec\u7684\u7cfb\u7edf\u548c\u65b9\u6cd5\u53ef\u4ee5\u5728\u786e\u4fdd\u6536\u655b\u7684\u540c\u65f6\u5b9e\u73b01.45\u81f39.39\u500d\u7684\u901f\u5ea6\u63d0\u5347\u3002|\n", "2410.12700": "|**2024-10-16**|**Embedding an Ethical Mind: Aligning Text-to-Image Synthesis via Lightweight Value Optimization**|Xingqi Wang 
et.al.|[2410.12700](http://arxiv.org/abs/2410.12700)|**[link](https://github.com/achernarwang/LiVO)**|**\u8fd1\u5e74\u6765\uff0c\u57fa\u4e8e\u5927\u89c4\u6a21\u6570\u636e\u8bad\u7ec3\u7684\u6269\u6563\u6a21\u578b\u5df2\u7ecf\u80fd\u591f\u751f\u6210\u4e0e\u4eba\u7c7b\u6c34\u5e73\u56fe\u50cf\u96be\u4ee5\u533a\u5206\u7684\u56fe\u50cf\uff0c\u4f46\u5b83\u4eec\u5e38\u5e38\u4ea7\u751f\u6709\u5bb3\u5185\u5bb9\uff0c\u8fd9\u4e9b\u5185\u5bb9\u4e0e\u4eba\u7c7b\u4ef7\u503c\u89c2\u4e0d\u7b26\uff0c\u4f8b\u5982\u793e\u4f1a\u504f\u89c1\u548c\u5192\u72af\u6027\u5185\u5bb9\u3002\u5c3d\u7ba1\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u9886\u57df\u8fdb\u884c\u4e86\u5927\u91cf\u7814\u7a76\uff0c\u4f46\u6587\u672c\u5230\u56fe\u50cf\uff08T2I\uff09\u6a21\u578b\u7684\u5bf9\u9f50\u95ee\u9898\u4ecd\u672a\u5f97\u5230\u5145\u5206\u63a2\u7d22\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86LiVO\uff08\u8f7b\u91cf\u7ea7\u4ef7\u503c\u4f18\u5316\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u65b0\u9896\u7684\u8f7b\u91cf\u7ea7\u65b9\u6cd5\uff0c\u7528\u4e8e\u5c06T2I\u6a21\u578b\u4e0e\u4eba\u7c7b\u4ef7\u503c\u89c2\u5bf9\u9f50\u3002LiVO\u4ec5\u4f18\u5316\u4e00\u4e2a\u5373\u63d2\u5373\u7528\u7684\u4ef7\u503c\u7f16\u7801\u5668\uff0c\u4ee5\u5c06\u6307\u5b9a\u7684\u4ef7\u503c\u539f\u5219\u6574\u5408\u5230\u8f93\u5165\u63d0\u793a\u4e2d\uff0c\u4ece\u800c\u5728\u63a7\u5236\u751f\u6210\u56fe\u50cf\u7684\u8bed\u4e49\u548c\u4ef7\u503c\u89c2\u65b9\u9762\u53d1\u6325\u4f5c\u7528\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u79cd\u9488\u5bf9\u6269\u6563\u6a21\u578b\u7684\u504f\u597d\u4f18\u5316\u635f\u5931\u51fd\u6570\uff0c\u8be5\u51fd\u6570\u5728\u7406\u8bba\u4e0a\u903c\u8fd1LLM\u5bf9\u9f50\u4e2d\u4f7f\u7528\u7684Bradley-Terry\u6a21\u578b\uff0c\u4f46\u63d0\u4f9b\u4e86\u56fe\u50cf\u8d28\u91cf\u548c\u4ef7\u503c\u4e00\u81f4\u6027\u4e4b\u95f4\u7684\u66f4\u7075\u6d3b\u7684\u6743\u8861\u3002\u4e3a\u4e86\u4f18\u5316\u4ef7\u503c\u7f16\u7801\u5668\uff0c\u6211\u4
eec\u8fd8\u5f00\u53d1\u4e86\u4e00\u4e2a\u6846\u67b6\u6765\u81ea\u52a8\u6784\u5efa\u4e00\u4e2a\u5305\u542b86k\u4e2a\u6837\u672c\uff08\u63d0\u793a\u3001\u5bf9\u9f50\u56fe\u50cf\u3001\u8fdd\u53cd\u56fe\u50cf\u3001\u4ef7\u503c\u539f\u5219\uff09\u7684\u6587\u672c-\u56fe\u50cf\u504f\u597d\u6570\u636e\u96c6\u3002\u901a\u8fc7\u4e0d\u66f4\u65b0\u5927\u591a\u6570\u6a21\u578b\u53c2\u6570\u5e76\u901a\u8fc7\u4ece\u8f93\u5165\u63d0\u793a\u4e2d\u8fdb\u884c\u81ea\u9002\u5e94\u4ef7\u503c\u9009\u62e9\uff0cLiVO\u663e\u8457\u51cf\u5c11\u4e86\u6709\u5bb3\u8f93\u51fa\uff0c\u5e76\u5b9e\u73b0\u4e86\u66f4\u5feb\u7684\u6536\u655b\uff0c\u8d85\u8d8a\u4e86\u51e0\u79cd\u5f3a\u5927\u7684\u57fa\u7ebf\u6a21\u578b\uff0c\u8fc8\u51fa\u4e86\u5411\u4f26\u7406\u5bf9\u9f50\u7684T2I\u6a21\u578b\u8fc8\u51fa\u7684\u7b2c\u4e00\u6b65\u3002**|\n", "2410.12686": "|**2024-10-16**|**Automatic Mapping of Anatomical Landmarks from Free-Text Using Large Language Models: Insights from Llama-2**|Mohamad Abdi et.al.|[2410.12686](http://arxiv.org/abs/2410.12686)|null|\u89e3\u5256\u5b66\u6807\u5fd7\u5728\u533b\u5b66\u5f71\u50cf\u4e2d\u5bf9\u4e8e\u5bfc\u822a\u548c\u5f02\u5e38\u68c0\u6d4b\u81f3\u5173\u91cd\u8981\u3002\u73b0\u4ee3\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\uff0c\u5982Llama-2\uff0c\u4e3a\u5c06\u8fd9\u4e9b\u6807\u5fd7\u4ece\u81ea\u7531\u6587\u672c\u7684\u653e\u5c04\u5b66\u62a5\u544a\u6620\u5c04\u5230\u56fe\u50cf\u6570\u636e\u4e2d\u7684\u76f8\u5e94\u4f4d\u7f6e\u63d0\u4f9b\u4e86\u5e0c\u671b\u3002\u6700\u8fd1\u7684\u7814\u7a76\u8868\u660e\uff0cLLMs\u53ef\u80fd\u80fd\u591f\u5f62\u6210\u8fde\u8d2f\u7684\u751f\u6210\u8fc7\u7a0b\u8868\u793a\u3002\u53d7\u6b64\u542f\u53d1\uff0c\u6211\u4eec\u7814\u7a76\u4e86LLMs\u662f\u5426\u51c6\u786e\u5730\u8868\u793a\u89e3\u5256\u5b66\u6807\u5fd7\u7684\u7a7a\u95f4\u4f4d\u7f6e\u3002\u901a\u8fc7\u4f7f\u7528Llama-2\u6a21\u578b\u8fdb\u884c\u5b9e\u9a8c\uff0c\u6211\u4eec\u53d1\u73b0\u5b83\u4eec\u53ef\u4ee5\u7ebf\u6027\u5730\u8868\u793a\u7a7a\u95f4\u4e2d\u7684\u89e3\u5256\u5b66\u68
07\u5fd7\uff0c\u5e76\u4e14\u5bf9\u4e0d\u540c\u63d0\u793a\u5177\u6709\u76f8\u5f53\u5f3a\u7684\u9c81\u68d2\u6027\u3002\u8fd9\u4e9b\u7ed3\u679c\u5f3a\u8c03\u4e86LLMs\u589e\u5f3a\u533b\u5b66\u5f71\u50cf\u5de5\u4f5c\u6d41\u7a0b\u6548\u7387\u548c\u51c6\u786e\u6027\u7684\u6f5c\u529b\u3002|\n", "2410.12656": "|**2024-10-16**|**Evaluating Morphological Compositional Generalization in Large Language Models**|Mete Ismayilzada et.al.|[2410.12656](http://arxiv.org/abs/2410.12656)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5404\u79cd\u81ea\u7136\u8bed\u8a00\u751f\u6210\u548c\u7406\u89e3\u4efb\u52a1\u4e2d\u5df2\u7ecf\u53d6\u5f97\u4e86\u663e\u8457\u7684\u8fdb\u5c55\u3002\u7136\u800c\uff0c\u5b83\u4eec\u7684\u8bed\u8a00\u6cdb\u5316\u80fd\u529b\u4ecd\u7136\u503c\u5f97\u8d28\u7591\uff0c\u8fd9\u5f15\u53d1\u4e86\u5173\u4e8e\u8fd9\u4e9b\u6a21\u578b\u662f\u5426\u50cf\u4eba\u7c7b\u4e00\u6837\u5b66\u4e60\u8bed\u8a00\u7684\u7591\u95ee\u3002\u5c3d\u7ba1\u4eba\u7c7b\u5728\u8bed\u8a00\u4f7f\u7528\u4e2d\u8868\u73b0\u51fa\u7ec4\u5408\u80fd\u529b\u548c\u8bed\u8a00\u521b\u9020\u6027\uff0c\u4f46LLMs\u5728\u8fd9\u65b9\u9762\u7684\u8868\u73b0\uff0c\u7279\u522b\u662f\u5728\u5f62\u6001\u5b66\u65b9\u9762\u7684\u80fd\u529b\uff0c\u4ecd\u9700\u8fdb\u4e00\u6b65\u63a2\u7d22\u3002\u5728\u8fd9\u9879\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u901a\u8fc7\u7ec4\u5408\u6027\u7684\u89c6\u89d2\u7cfb\u7edf\u5730\u7814\u7a76\u4e86LLMs\u5728\u5f62\u6001\u5b66\u6cdb\u5316\u65b9\u9762\u7684\u80fd\u529b\u3002\u6211\u4eec\u5c06\u8bcd\u7d20\u5b9a\u4e49\u4e3a\u7ec4\u5408\u7684\u57fa\u672c\u5355\u4f4d\uff0c\u5e76\u8bbe\u8ba1\u4e86\u4e00\u5957\u65b0\u7684\u751f\u6210\u6027\u548c\u5224\u522b\u6027\u4efb\u52a1\u6765\u8bc4\u4f30\u5f62\u6001\u5b66\u7684\u751f\u4ea7\u529b\u548c\u7cfb\u7edf\u6027\u3002\u91cd\u70b9\u5173\u6ce8\u50cf\u571f\u8033\u5176\u8bed\u548c\u82ac\u5170\u8bed\u8fd9\u6837\u7684\u9ecf\u7740\u8bed\uff0c\u6211\u4eec\u8bc4\u4f30\u4e86\u51e0\u79cd\u6700\u5148\u8fdb\u7684\u6307\u4ee4\u5fae\u8c03\u591a\u8be
d\u8a00\u6a21\u578b\uff0c\u5305\u62ecGPT-4\u548cGemini\u3002\u6211\u4eec\u7684\u5206\u6790\u8868\u660e\uff0cLLMs\u5728\u5904\u7406\u5f62\u6001\u5b66\u7ec4\u5408\u6cdb\u5316\u65f6\u7279\u522b\u56f0\u96be\uff0c\u5c24\u5176\u662f\u5728\u5e94\u7528\u4e8e\u65b0\u8bcd\u6839\u65f6\uff0c\u968f\u7740\u5f62\u6001\u590d\u6742\u6027\u7684\u589e\u52a0\uff0c\u6027\u80fd\u6025\u5267\u4e0b\u964d\u3002\u867d\u7136\u6a21\u578b\u80fd\u591f\u6bd4\u968f\u673a\u731c\u6d4b\u66f4\u597d\u5730\u8bc6\u522b\u4e2a\u522b\u5f62\u6001\u7ec4\u5408\uff0c\u4f46\u5176\u8868\u73b0\u7f3a\u4e4f\u7cfb\u7edf\u6027\uff0c\u5bfc\u81f4\u4e0e\u4eba\u7c7b\u76f8\u6bd4\u5b58\u5728\u663e\u8457\u7684\u51c6\u786e\u7387\u5dee\u8ddd\u3002|\n", "2410.12631": "|**2024-10-16**|**Explainable Moral Values: a neuro-symbolic approach to value classification**|Nicolas Lazzari et.al.|[2410.12631](http://arxiv.org/abs/2410.12631)|null|\u672c\u6587\u7814\u7a76\u4e86\u57fa\u4e8e\u672c\u4f53\u7684\u63a8\u7406\u4e0e\u673a\u5668\u5b66\u4e60\u6280\u672f\u5728\u53ef\u89e3\u91ca\u4ef7\u503c\u5206\u7c7b\u4e2d\u7684\u6574\u5408\u3002\u901a\u8fc7\u4f9d\u8d56\u9053\u5fb7\u57fa\u7840\u7406\u8bba\u4e2d\u7684\u9053\u5fb7\u4ef7\u503c\u89c2\u5f62\u5f0f\u5316\u4ee5\u53caDnS\u672c\u4f53\u8bbe\u8ba1\u6a21\u5f0f\uff0c\u4f7f\u7528sandra\u795e\u7ecf\u7b26\u53f7\u63a8\u7406\u5668\u6765\u63a8\u65ad\u6ee1\u8db3\u7279\u5b9a\u53e5\u5b50\u63cf\u8ff0\u7684\u4ef7\u503c\u3002\u53e5\u5b50\u53ca\u5176\u7ed3\u6784\u5316\u8868\u793a\u662f\u4f7f\u7528\u5f00\u6e90\u7684\u5927\u8bed\u8a00\u6a21\u578b\u81ea\u52a8\u751f\u6210\u7684\u3002\u6240\u63a8\u65ad\u7684\u63cf\u8ff0\u88ab\u7528\u6765\u81ea\u52a8\u68c0\u6d4b\u53e5\u5b50\u6240\u5173\u8054\u7684\u4ef7\u503c\u3002\u6211\u4eec\u5c55\u793a\u4e86\u4ec5\u4f9d\u9760\u63a8\u7406\u5668\u7684\u7ed3\u679c\u5373\u53ef\u5b9e\u73b0\u4e0e\u66f4\u590d\u6742\u65b9\u6cd5\u76f8\u5f53\u7684\u53ef\u89e3\u91ca\u5206\u7c7b\u3002\u6211\u4eec\u8fd8\u5c55\u793a\u4e86\u5c06\u63a8\u7406\u5668\u7684\u63a8\u65ad\u7ed3\u679c\u4e0e\u5206\u5e03\u
8bed\u4e49\u65b9\u6cd5\u76f8\u7ed3\u5408\u53ef\u4ee5\u5927\u5e45\u8d85\u8d8a\u6240\u6709\u57fa\u7ebf\uff0c\u5305\u62ec\u57fa\u4e8e\u795e\u7ecf\u7f51\u7edc\u67b6\u6784\u7684\u590d\u6742\u6a21\u578b\u3002\u6700\u540e\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u53ef\u89c6\u5316\u5de5\u5177\u6765\u63a2\u7d22\u57fa\u4e8e\u7406\u8bba\u7684\u503c\u5206\u7c7b\u7684\u6f5c\u529b\uff0c\u8be5\u5de5\u5177\u53ef\u5728http://xmv.geomeaning.com/\u516c\u5f00\u8bbf\u95ee\u3002|\n", "2410.13863": "|**2024-10-17**|**Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens**|Lijie Fan et.al.|[2410.13863](http://arxiv.org/abs/2410.13863)|null|\u5728\u89c6\u89c9\u9886\u57df\uff0c\u6269\u5927\u81ea\u56de\u5f52\u6a21\u578b\u7684\u6548\u679c\u5e76\u4e0d\u50cf\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4e2d\u90a3\u6837\u663e\u8457\u3002\u5728\u8fd9\u9879\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u7814\u7a76\u4e86\u6587\u672c\u5230\u56fe\u50cf\u751f\u6210\u4e2d\u7684\u8fd9\u4e00\u6269\u5c55\u95ee\u9898\uff0c\u91cd\u70b9\u5173\u6ce8\u4e24\u4e2a\u5173\u952e\u56e0\u7d20\uff1a\u6a21\u578b\u662f\u5426\u4f7f\u7528\u79bb\u6563\u6216\u8fde\u7eed\u7684\u6807\u8bb0\uff0c\u4ee5\u53ca\u6807\u8bb0\u662f\u901a\u8fc7BERT\u6216GPT\u7c7b\u7684\u53d8\u6362\u5668\u67b6\u6784\u4ee5\u968f\u673a\u987a\u5e8f\u8fd8\u662f\u56fa\u5b9a\u6805\u683c\u987a\u5e8f\u751f\u6210\u3002\u6211\u4eec\u7684\u5b9e\u8bc1\u7ed3\u679c\u8868\u660e\uff0c\u867d\u7136\u6240\u6709\u6a21\u578b\u5728\u9a8c\u8bc1\u635f\u5931\u65b9\u9762\u90fd\u6709\u6548\u6269\u5c55\uff0c\u4f46\u5b83\u4eec\u7684\u8bc4\u4f30\u6027\u80fd\uff08\u901a\u8fc7FID\u3001GenEval\u5206\u6570\u548c\u89c6\u89c9\u8d28\u91cf\u6765\u8861\u91cf\uff09\u8868\u73b0\u51fa\u4e0d\u540c\u7684\u8d8b\u52bf\u3002\u57fa\u4e8e\u8fde\u7eed\u6807\u8bb0\u7684\u6a21\u578b\u6bd4\u4f7f\u7528\u79bb\u6563\u6807\u8bb0\u7684\u6a21\u578b\u5728\u89c6\u89c9\u8d28\u91cf\u4e0a\u53d6\u5f97\u4e86\u663e\u8457\u66f4\u597d\u7684\u6548\u679c\u3002\u6b64\u5916\uff0c\u751f\u6210\u987a\
u5e8f\u548c\u6ce8\u610f\u529b\u673a\u5236\u5bf9GenEval\u5206\u6570\u6709\u663e\u8457\u5f71\u54cd\uff1a\u968f\u673a\u987a\u5e8f\u6a21\u578b\u6bd4\u6805\u683c\u987a\u5e8f\u6a21\u578b\u83b7\u5f97\u4e86\u663e\u8457\u66f4\u9ad8\u7684GenEval\u5206\u6570\u3002\u53d7\u8fd9\u4e9b\u53d1\u73b0\u7684\u542f\u53d1\uff0c\u6211\u4eec\u8bad\u7ec3\u4e86\u4e00\u4e2a\u57fa\u4e8e\u8fde\u7eed\u6807\u8bb0\u7684\u968f\u673a\u987a\u5e8f\u81ea\u56de\u5f52\u6a21\u578bFluid\u3002Fluid 10.5B\u6a21\u578b\u5728MS-COCO 30K\u4e0a\u7684\u96f6\u6837\u672cFID\u8fbe\u5230\u4e86\u65b0\u7684\u6700\u5148\u8fdb\u6c34\u5e73\uff0c\u4e3a6.16\uff0c\u5e76\u4e14\u5728GenEval\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u7684\u603b\u4f53\u5f97\u5206\u4e3a0.69\u3002\u6211\u4eec\u5e0c\u671b\u6211\u4eec\u7684\u53d1\u73b0\u548c\u7ed3\u679c\u80fd\u591f\u9f13\u52b1\u672a\u6765\u7684\u7814\u7a76\u8fdb\u4e00\u6b65\u7f29\u5c0f\u89c6\u89c9\u548c\u8bed\u8a00\u6a21\u578b\u4e4b\u95f4\u7684\u6269\u5c55\u5dee\u8ddd\u3002|\n", "2410.13861": "|**2024-10-17**|**PUMA: Empowering Unified MLLM with Multi-granular Visual Generation**|Rongyao Fang 
et.al.|[2410.13861](http://arxiv.org/abs/2410.13861)|**[link](https://github.com/rongyaofang/puma)**|**\u8fd1\u671f\u5728\u591a\u6a21\u6001\u57fa\u7840\u6a21\u578b\u65b9\u9762\u7684\u8fdb\u5c55\u663e\u8457\u63d0\u5347\u4e86\u89c6\u89c9-\u8bed\u8a00\u7406\u89e3\u7684\u80fd\u529b\u3002\u521d\u6b65\u5c1d\u8bd5\u4e5f\u63a2\u7d22\u4e86\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\u5728\u89c6\u89c9\u5185\u5bb9\u751f\u6210\u4e2d\u7684\u6f5c\u529b\u3002\u7136\u800c\uff0c\u73b0\u6709\u5de5\u4f5c\u672a\u80fd\u5145\u5206\u89e3\u51b3\u4e0d\u540c\u56fe\u50cf\u751f\u6210\u4efb\u52a1\u5728\u7edf\u4e00MLLM\u8303\u5f0f\u4e0b\u5bf9\u4e0d\u540c\u7c92\u5ea6\u9700\u6c42\u7684\u5904\u7406\u3002\u5728\u8fd9\u9879\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86PUMA\uff0c\u8fd9\u662f\u4e00\u79cd\u901a\u8fc7\u591a\u7c92\u5ea6\u89c6\u89c9\u751f\u6210\u6765\u8d4b\u80fd\u7edf\u4e00MLLM\u7684\u65b9\u6cd5\u3002PUMA\u5c06\u591a\u7c92\u5ea6\u89c6\u89c9\u7279\u5f81\u7edf\u4e00\u4f5c\u4e3aMLLM\u7684\u8f93\u5165\u548c\u8f93\u51fa\uff0c\u4f18\u96c5\u5730\u89e3\u51b3\u4e86\u4e0d\u540c\u7c92\u5ea6\u9700\u6c42\u5728\u7edf\u4e00MLLM\u6846\u67b6\u4e0b\u7684\u5404\u79cd\u56fe\u50cf\u751f\u6210\u4efb\u52a1\u3002\u7ecf\u8fc7\u591a\u6a21\u6001\u9884\u8bad\u7ec3\u548c\u4efb\u52a1\u7279\u5b9a\u6307\u4ee4\u5fae\u8c03\u540e\uff0cPUMA\u5c55\u793a\u4e86\u5728\u5e7f\u6cdb\u591a\u6a21\u6001\u4efb\u52a1\u4e2d\u7684\u80fd\u529b\u3002\u8fd9\u9879\u5de5\u4f5c\u4ee3\u8868\u4e86\u5411\u771f\u6b63\u80fd\u591f\u9002\u5e94\u5404\u79cd\u89c6\u89c9\u4efb\u52a1\u5bf9\u7c92\u5ea6\u9700\u6c42\u7684\u7edf\u4e00MLLM\u8fc8\u51fa\u7684\u91cd\u8981\u4e00\u6b65\u3002\u4ee3\u7801\u548c\u6a21\u578b\u5c06\u5728https://github.com/rongyaofang/PUMA\u53d1\u5e03\u3002**|\n", "2410.13859": "|**2024-10-17**|**$\u03b3-$MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models**|Yaxin Luo 
et.al.|[2410.13859](http://arxiv.org/abs/2410.13859)|null|\u5c3d\u7ba1\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u5c55\uff0c\u4f46\u5176\u9ad8\u6602\u7684\u8ba1\u7b97\u6210\u672c\u4ecd\u7136\u662f\u5b9e\u9645\u90e8\u7f72\u7684\u969c\u788d\u3002\u53d7\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4e2d\u6df1\u5ea6\u6df7\u5408\uff08MoDs\uff09\u7684\u542f\u53d1\uff0c\u6211\u4eec\u4ece\u201c\u6fc0\u6d3b\u6807\u8bb0\u201d\u7684\u89d2\u5ea6\u6765\u89e3\u51b3\u8fd9\u4e00\u9650\u5236\u95ee\u9898\u3002\u6211\u4eec\u7684\u5173\u952e\u89c1\u89e3\u662f\uff0c\u5982\u679c\u5927\u591a\u6570\u6807\u8bb0\u5bf9\u4e8e\u5c42\u8ba1\u7b97\u6765\u8bf4\u662f\u5197\u4f59\u7684\uff0c\u5219\u53ef\u4ee5\u901a\u8fc7MoD\u5c42\u76f4\u63a5\u8df3\u8fc7\u5b83\u4eec\u3002\u7136\u800c\uff0c\u76f4\u63a5\u5c06MLLMs\u7684\u5bc6\u96c6\u5c42\u8f6c\u6362\u4e3aMoD\u5c42\u4f1a\u5bfc\u81f4\u663e\u8457\u7684\u6027\u80fd\u4e0b\u964d\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u7684MoD\u9002\u5e94\u7b56\u7565\uff0c\u79f0\u4e3a$\\gamma$-MoD\uff0c\u7528\u4e8e\u73b0\u6709\u7684MLLMs\u3002\u5728$\\gamma$-MoD\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u6307\u6807\u6765\u6307\u5bfcMLLM\u4e2d\u7684MoD\u90e8\u7f72\uff0c\u5373\u6ce8\u610f\u529b\u56fe\u7684\u79e9\uff08ARank\uff09\u3002\u901a\u8fc7ARank\uff0c\u6211\u4eec\u53ef\u4ee5\u6709\u6548\u5730\u8bc6\u522b\u54ea\u4e9b\u5c42\u662f\u5197\u4f59\u7684\uff0c\u5e76\u5e94\u88ab\u66ff\u6362\u4e3aMoD\u5c42\u3002\u57fa\u4e8eARank\uff0c\u6211\u4eec\u8fdb\u4e00\u6b65\u63d0\u51fa\u4e86\u4e24\u79cd\u65b0\u9896\u7684\u8bbe\u8ba1\uff0c\u4ee5\u6700\u5927\u9650\u5ea6\u5730\u63d0\u9ad8MLLM\u7684\u8ba1\u7b97\u7a00\u758f\u6027\uff0c\u540c\u65f6\u4fdd\u6301\u5176\u6027\u80fd\uff0c\u5373\u5171\u4eab\u89c6\u89c9-\u8bed\u8a00\u8def\u7531\u5668\u548c\u63a9\u7801\u8def\u7531\u5b66\u4e60\u3002\u901a\u8fc7\u8fd9\u4e9b\u8bbe\u8ba1\uff0cMLLM\u7684\u8d85
\u8fc790%\u7684\u5bc6\u96c6\u5c42\u53ef\u4ee5\u6709\u6548\u5730\u8f6c\u6362\u4e3aMoD\u5c42\u3002\u4e3a\u4e86\u9a8c\u8bc1\u6211\u4eec\u7684\u65b9\u6cd5\uff0c\u6211\u4eec\u5728\u4e09\u4e2a\u6d41\u884c\u7684MLLMs\u4e0a\u8fdb\u884c\u4e86\u5e94\u7528\uff0c\u5e76\u57289\u4e2a\u57fa\u51c6\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u3002\u5b9e\u9a8c\u7ed3\u679c\u4e0d\u4ec5\u9a8c\u8bc1\u4e86$\\gamma$-MoD\u5bf9\u73b0\u6709MLLMs\u7684\u663e\u8457\u6548\u7387\u4f18\u52bf\uff0c\u8fd8\u786e\u8ba4\u4e86\u5b83\u5728\u5404\u79cdMLLMs\u4e0a\u7684\u6cdb\u5316\u80fd\u529b\u3002\u4f8b\u5982\uff0c$\\gamma$-MoD\u4ec5\u9020\u6210\u8f7b\u5fae\u7684\u6027\u80fd\u4e0b\u964d\uff0c\u5373-1.5%\uff0c\u4f46\u53ef\u4ee5\u5c06LLaVA-HR\u7684\u8bad\u7ec3\u65f6\u95f4\u548c\u63a8\u7406\u65f6\u95f4\u5206\u522b\u51cf\u5c1131.0%\u548c53.2%\u3002|\n", "2410.13857": "|**2024-10-17**|**How Numerical Precision Affects Mathematical Reasoning Capabilities of LLMs**|Guhao Feng et.al.|[2410.13857](http://arxiv.org/abs/2410.13857)|null|\u5c3d\u7ba1\u57fa\u4e8eTransformer\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5404\u4e2a\u9886\u57df\u53d6\u5f97\u4e86\u663e\u8457\u7684\u6210\u529f\uff0c\u4f46\u7406\u89e3\u548c\u63d0\u5347\u5b83\u4eec\u7684\u6570\u5b66\u80fd\u529b\u4ecd\u7136\u662f\u4e00\u4e2a\u91cd\u8981\u7684\u6311\u6218\u3002\u5728\u672c\u6587\u4e2d\uff0c\u6211\u4eec\u5bf9LLMs\u7684\u6570\u5b66\u80fd\u529b\u8fdb\u884c\u4e86\u4e25\u683c\u7684\u7406\u8bba\u5206\u6790\uff0c\u7279\u522b\u5173\u6ce8\u5b83\u4eec\u5728\u7b97\u672f\u4efb\u52a1\u4e2d\u7684\u8868\u73b0\u3002\u6211\u4eec\u53d1\u73b0\u6570\u503c\u7cbe\u5ea6\u662f\u5f71\u54cd\u5176\u5728\u6570\u5b66\u4efb\u52a1\u4e2d\u6548\u679c\u7684\u5173\u952e\u56e0\u7d20\u3002\u6211\u4eec\u7684\u7814\u7a76\u7ed3\u679c\u663e\u793a\uff0c\u4f7f\u7528\u4f4e\u6570\u503c\u7cbe\u5ea6\u7684Transformer\u5728\u5904\u7406\u7b97\u672f\u4efb\u52a1\uff08\u5982\u8fed\u4ee3\u52a0\u6cd5\u548c\u6574\u6570\u4e58\u6cd5\uff09\u65f6\uff0c\u9
664\u975e\u6a21\u578b\u5927\u5c0f\u76f8\u5bf9\u4e8e\u8f93\u5165\u957f\u5ea6\u5448\u8d85\u591a\u9879\u5f0f\u589e\u957f\uff0c\u5426\u5219\u65e0\u6cd5\u6709\u6548\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\u3002\u76f8\u6bd4\u4e4b\u4e0b\uff0c\u4f7f\u7528\u6807\u51c6\u6570\u503c\u7cbe\u5ea6\u7684Transformer\u80fd\u591f\u4ee5\u663e\u8457\u66f4\u5c0f\u7684\u6a21\u578b\u89c4\u6a21\u9ad8\u6548\u5730\u5904\u7406\u8fd9\u4e9b\u4efb\u52a1\u3002\u6211\u4eec\u8fd8\u901a\u8fc7\u63a2\u7d22\u4e0d\u540c\u6570\u503c\u7cbe\u5ea6\u5bf9\u7b97\u672f\u4efb\u52a1\u7684\u5f71\u54cd\u7684\u5b9e\u8bc1\u5b9e\u9a8c\u8fdb\u4e00\u6b65\u652f\u6301\u4e86\u6211\u4eec\u7684\u7406\u8bba\u53d1\u73b0\uff0c\u4e3a\u63d0\u9ad8LLMs\u7684\u6570\u5b66\u63a8\u7406\u80fd\u529b\u63d0\u4f9b\u4e86\u5b9d\u8d35\u7684\u89c1\u89e3\u3002|\n", "2410.13854": "|**2024-10-17**|**Can MLLMs Understand the Deep Implication Behind Chinese Images?**|Chenhao Zhang et.al.|[2410.13854](http://arxiv.org/abs/2410.13854)|**[link](https://github.com/MING-ZCH/CII-Bench)**|**\u968f\u7740\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u7684\u80fd\u529b\u4e0d\u65ad\u63d0\u5347\uff0c\u5bf9\u8fd9\u4e9b\u6a21\u578b\u8fdb\u884c\u66f4\u9ad8\u5c42\u6b21\u7684\u611f\u77e5\u548c\u7406\u89e3\u80fd\u529b\u8bc4\u4f30\u7684\u9700\u6c42\u4e5f\u5728\u589e\u52a0\u3002\u7136\u800c\uff0c\u76ee\u524d\u7f3a\u4e4f\u9488\u5bf9MLLMs\u5728\u4e2d\u6587\u89c6\u89c9\u5185\u5bb9\u4e0a\u8fdb\u884c\u9ad8\u5c42\u6b21\u611f\u77e5\u548c\u7406\u89e3\u80fd\u529b\u8bc4\u4f30\u7684\u5de5\u4f5c\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u201cCII-Bench\u201d\uff08Chinese Image Implication understanding 
Benchtermark\uff09\uff0c\u8fd9\u662f\u4e00\u4e2a\u65e8\u5728\u8bc4\u4f30MLLMs\u5bf9\u4e2d\u56fd\u56fe\u50cf\u8fdb\u884c\u9ad8\u5c42\u6b21\u611f\u77e5\u548c\u7406\u89e3\u80fd\u529b\u7684\u57fa\u51c6\u3002\u4e0e\u73b0\u6709\u57fa\u51c6\u76f8\u6bd4\uff0cCII-Bench\u5177\u6709\u51e0\u4e2a\u663e\u8457\u7684\u7279\u70b9\u3002\u9996\u5148\uff0c\u4e3a\u4e86\u786e\u4fdd\u4e2d\u6587\u8bed\u5883\u7684\u771f\u5b9e\u6027\uff0cCII-Bench\u4e2d\u7684\u56fe\u50cf\u6765\u81ea\u4e2d\u56fd\u4e92\u8054\u7f51\uff0c\u5e76\u7ecf\u8fc7\u4eba\u5de5\u5ba1\u67e5\uff0c\u76f8\u5e94\u7684\u7b54\u6848\u4e5f\u662f\u4eba\u5de5\u7cbe\u5fc3\u5236\u4f5c\u7684\u3002\u6b64\u5916\uff0cCII-Bench\u8fd8\u7eb3\u5165\u4e86\u4ee3\u8868\u4e2d\u56fd\u4f20\u7edf\u6587\u5316\u7684\u56fe\u50cf\uff0c\u5982\u8457\u540d\u7684\u4e2d\u56fd\u4f20\u7edf\u753b\u4f5c\uff0c\u8fd9\u53ef\u4ee5\u6df1\u5165\u53cd\u6620\u6a21\u578b\u5bf9\u4e2d\u56fd\u4f20\u7edf\u6587\u5316\u7684\u7406\u89e3\u3002\u901a\u8fc7\u5728\u591a\u4e2aMLLMs\u4e0a\u8fdb\u884c\u5e7f\u6cdb\u7684\u5b9e\u9a8c\uff0c\u6211\u4eec\u5f97\u51fa\u4e86\u91cd\u8981\u53d1\u73b0\u3002\u6700\u521d\uff0cMLLMs\u5728CII-Bench\u4e0a\u7684\u8868\u73b0\u4e0e\u4eba\u7c7b\u5b58\u5728\u663e\u8457\u5dee\u8ddd\u3002MLLMs\u7684\u6700\u4f73\u51c6\u786e\u7387\u8fbe\u5230\u4e8664.4%\uff0c\u800c\u4eba\u7c7b\u7684\u5e73\u5747\u51c6\u786e\u7387\u4e3a78.2%\uff0c\u6700\u9ad8\u53ef\u8fbe81.0%\u3002\u968f\u540e\uff0cMLLMs\u5728\u5904\u7406\u4e0e\u4e2d\u56fd\u4f20\u7edf\u6587\u5316\u76f8\u5173\u7684\u56fe\u50cf\u65f6\u8868\u73b0\u8f83\u5dee\uff0c\u8fd9\u8868\u660e\u5b83\u4eec\u5728\u7406\u89e3\u9ad8\u5c42\u6b21\u8bed\u4e49\u65b9\u9762\u5b58\u5728\u5c40\u9650\u6027\uff0c\u5e76\u4e14\u7f3a\u4e4f\u5bf9\u4e2d\u56fd\u4f20\u7edf\u6587\u5316\u7684\u6df1\u5165\u4e86\u89e3\u3002\u6700\u540e\uff0c\u6211\u4eec\u89c2\u5bdf\u5230\u5927\u591a\u6570\u6a21\u578b\u5728\u63d0\u793a\u4e2d\u52a0\u5165\u56fe\u50cf\u60c5\u611f\u7ebf\u7d22\u540e\uff0c\u5176\u51c6\u786e\u6027\u6709\u6240\u63d0\u9ad8\u3002\u6211\u4eec\u8b
a4\u4e3a\uff0cCII-Bench\u5c06\u4f7fMLLMs\u66f4\u597d\u5730\u7406\u89e3\u4e2d\u6587\u8bed\u4e49\u548c\u7279\u5b9a\u4e8e\u4e2d\u56fd\u7684\u56fe\u50cf\uff0c\u63a8\u52a8\u8fc8\u5411\u4e13\u5bb6\u7ea7\u901a\u7528\u4eba\u5de5\u667a\u80fd\uff08AGI\uff09\u7684\u8fdb\u7a0b\u3002\u6211\u4eec\u7684\u9879\u76ee\u53ef\u4ee5\u5728https://cii-bench.github.io/\u516c\u5f00\u8bbf\u95ee\u3002**|\n", "2410.13852": "|**2024-10-17**|**Retrospective Learning from Interactions**|Zizhao Chen et.al.|[2410.13852](http://arxiv.org/abs/2410.13852)|null|\u591a\u8f6e\u4ea4\u4e92\u8fc7\u7a0b\u4e2d\uff0c\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u548c\u7528\u6237\u4e4b\u95f4\u7684\u5bf9\u8bdd\u81ea\u7136\u5305\u542b\u4e86\u9690\u5f0f\u7684\u53cd\u9988\u4fe1\u53f7\u3002\u5982\u679cLLM\u4ee5\u51fa\u4e4e\u610f\u6599\u7684\u65b9\u5f0f\u56de\u5e94\u7528\u6237\u7684\u6307\u4ee4\uff0c\u7528\u6237\u53ef\u80fd\u4f1a\u901a\u8fc7\u91cd\u65b0\u8868\u8ff0\u8bf7\u6c42\u3001\u8868\u8fbe\u632b\u8d25\u611f\u6216\u8f6c\u5411\u66ff\u4ee3\u4efb\u52a1\u6765\u4f20\u8fbe\u8fd9\u4e9b\u4fe1\u53f7\u3002\u8fd9\u4e9b\u4fe1\u53f7\u4e0e\u5177\u4f53\u4efb\u52a1\u65e0\u5173\uff0c\u5e76\u4e14\u5360\u636e\u8bed\u8a00\u7684\u4e00\u4e2a\u76f8\u5bf9\u53d7\u9650\u7684\u5b50\u7a7a\u95f4\uff0c\u5373\u4f7fLLM\u5728\u5b9e\u9645\u4efb\u52a1\u4e0a\u5931\u8d25\u4e86\uff0c\u4e5f\u80fd\u8bc6\u522b\u8fd9\u4e9b\u4fe1\u53f7\u3002\u8fd9\u4e3aLLM\u63d0\u4f9b\u4e86\u4e0d\u65ad\u4ece\u4ea4\u4e92\u4e2d\u5b66\u4e60\u7684\u673a\u4f1a\uff0c\u800c\u65e0\u9700\u989d\u5916\u7684\u6ce8\u91ca\u3002\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u540d\u4e3aReSpect\u7684\u65b9\u6cd5\uff0c\u901a\u8fc7\u56de\u987e\u8fc7\u53bb\u4ea4\u4e92\u4e2d\u7684\u8fd9\u4e9b\u4fe1\u53f7\u6765\u5b66\u4e60\u3002\u6211\u4eec\u5728\u4e00\u4e2a\u65b0\u7684\u591a\u6a21\u6001\u4ea4\u4e92\u573a\u666f\u4e2d\u90e8\u7f72\u4e86ReSpect\uff0c\u5728\u8be5\u573a\u666f\u4e2d\uff0c\u4eba\u7c7b\u6307\u5bfcLLM\u89e3\u51b3\u5177\u6709\u7ec4\u5408\u89e3\u7a7a\u95f4\u7684\u62bd\u8c61\u63a
8\u7406\u4efb\u52a1\u3002\u901a\u8fc7\u6570\u5343\u6b21\u4e0e\u4eba\u7c7b\u7684\u4ea4\u4e92\uff0c\u6211\u4eec\u5c55\u793a\u4e86ReSpect\u5982\u4f55\u9010\u6b65\u63d0\u9ad8\u4efb\u52a1\u5b8c\u6210\u7387\uff0c\u4ece\u6700\u521d\u768431%\u63d0\u5347\u523082%\uff0c\u5e76\u4e14\u6574\u4e2a\u8fc7\u7a0b\u6ca1\u6709\u4efb\u4f55\u5916\u90e8\u6ce8\u91ca\u3002|\n", "2410.13846": "|**2024-10-17**|**SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction**|Xuan Zhang et.al.|[2410.13846](http://arxiv.org/abs/2410.13846)|**[link](https://github.com/sail-sg/simlayerkv)**|**\u8fd1\u671f\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u65b9\u9762\u7684\u8fdb\u5c55\u4f7f\u5176\u80fd\u591f\u5904\u7406\u957f\u4e0a\u4e0b\u6587\u3002\u7136\u800c\uff0c\u589e\u52a0\u6a21\u578b\u5c42\u6570\u548c\u8f93\u5165\u5e8f\u5217\u7684\u957f\u5ea6\u663e\u8457\u589e\u52a0\u4e86\u5b58\u50a8\u952e\u503c\uff08KV\uff09\u7f13\u5b58\u6240\u9700\u7684\u5185\u5b58\uff0c\u8fd9\u5bf9\u9ad8\u6548\u7684\u63a8\u7406\u6784\u6210\u4e86\u6311\u6218\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86SimLayerKV\uff0c\u8fd9\u662f\u4e00\u79cd\u7b80\u5355\u800c\u6709\u6548\u7684\u65b9\u6cd5\uff0c\u901a\u8fc7\u9009\u62e9\u6027\u5730\u5220\u9664\u8bc6\u522b\u51fa\u7684\u61d2\u60f0\u5c42\u4e2d\u7684\u7f13\u5b58\u6765\u51cf\u5c11\u5c42\u95f4KV\u7f13\u5b58\u7684\u5197\u4f59\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u57fa\u4e8e\u8fd9\u6837\u7684\u89c2\u5bdf\uff1a\u5728\u957f\u4e0a\u4e0b\u6587LLM\u4e2d\uff0c\u67d0\u4e9b\u5c42\u8868\u73b0\u51fa\u201c\u61d2\u60f0\u201d\u884c\u4e3a\uff0c\u5bf9\u5efa\u6a21\u957f\u8ddd\u79bb\u4f9d\u8d56\u6027\u7684\u8d21\u732e\u8f83\u5c11\uff0c\u4e0d\u5982\u975e\u61d2\u60f0\u5c42\u3002\u901a\u8fc7\u5206\u6790\u6ce8\u610f\u529b\u6743\u91cd\u6a21\u5f0f\uff0c\u6211\u4eec\u53d1\u73b0\u8fd9\u4e9b\u61d2\u60f0\u5c42\u5728\u7ed9\u5b9a\u8f93\u5165\u751f\u6210\u8fc7\u7a0b\u4e2d\u5bf9\u4e0d\u540ctoken\u7684\u884c\u4e3a\u662f\u4e00\u81f4\u7684\u3002\u8fd9\u4e00\u89
c1\u89e3\u542f\u53d1\u4e86\u6211\u4eec\u7684SimLayerKV\uff0c\u5b83\u901a\u8fc7\u8bc6\u522b\u61d2\u60f0\u5c42\u5e76\u76f8\u5e94\u5730\u51cf\u5c11\u5b83\u4eec\u7684KV\u7f13\u5b58\u6765\u5b9e\u73b0\u8fd9\u4e00\u70b9\u3002SimLayerKV\u662f\u65e0\u9700\u8bad\u7ec3\u7684\u3001\u53ef\u6cdb\u5316\u7684\uff0c\u5e76\u4e14\u53ea\u9700\u4e03\u884c\u4ee3\u7801\u5373\u53ef\u5b9e\u73b0\u3002\u6211\u4eec\u5728\u4e09\u4e2a\u4ee3\u8868\u6027LLM\u4e0a\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u5b9e\u9a8c\uff0c\u4f8b\u5982LLaMA2-7B\u3001LLaMA3-8B\u548cMistral-7B\uff0c\u5728\u6765\u81eaLongBench\u57fa\u51c6\u768416\u9879\u4efb\u52a1\u4e0a\u8fdb\u884c\u6d4b\u8bd5\u3002\u7ed3\u679c\u663e\u793a\uff0c\u5f53\u7ed3\u54084\u4f4d\u91cf\u5316\u65f6\uff0cSimLayerKV\u5b9e\u73b0\u4e865\u500d\u7684KV\u7f13\u5b58\u538b\u7f29\u6bd4\uff0c\u4ec5\u5bfc\u81f41.2%\u7684\u6027\u80fd\u4e0b\u964d\u3002\u6211\u4eec\u7684\u4ee3\u7801\u53ef\u5728https://github.com/sail-sg/SimLayerKV\u83b7\u5f97\u3002**|\n", "2410.13835": "|**2024-10-17**|**Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs**|Tianyu Guo et.al.|[2410.13835](http://arxiv.org/abs/2410.13835)|null|\u5b9e\u8df5\u8005\u4eec\u5728\u53d8\u538b\u5668\u57fa\u7840\u7684\u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e2d\u89c2\u5bdf\u5230\u4e86\u4e09\u4e2a\u4ee4\u4eba\u56f0\u60d1\u7684\u73b0\u8c61\uff1a\u6ce8\u610f\u529b Sink\u3001\u503c\u72b6\u6001\u8017\u5c3d\u548c\u6b8b\u5dee\u72b6\u6001\u5cf0\u503c\uff0c\u8fd9\u4e9b\u73b0\u8c61\u7edf\u79f0\u4e3a\u6781\u7aef\u6807\u8bb0\u73b0\u8c61\u3002\u8fd9\u4e9b\u73b0\u8c61\u7684\u7279\u70b9\u662f\u67d0\u4e9b\u6240\u8c13\u7684\u201cSink 
\u6807\u8bb0\u201d\u63a5\u6536\u4e0d\u6210\u6bd4\u4f8b\u9ad8\u7684\u6ce8\u610f\u529b\u6743\u91cd\uff0c\u8868\u73b0\u51fa\u663e\u8457\u8f83\u5c0f\u7684\u503c\u72b6\u6001\uff0c\u5e76\u4e14\u5177\u6709\u6bd4\u5176\u4ed6\u6807\u8bb0\u5927\u5f97\u591a\u7684\u6b8b\u5dee\u72b6\u6001\u8303\u6570\u3002\u8fd9\u4e9b\u6781\u7aef\u6807\u8bb0\u5bfc\u81f4\u4e86LLM\u63a8\u7406\u3001\u91cf\u5316\u548c\u53ef\u89e3\u91ca\u6027\u4e2d\u7684\u5404\u79cd\u6311\u6218\u3002\u6211\u4eec\u9610\u660e\u4e86\u6781\u7aef\u6807\u8bb0\u73b0\u8c61\u80cc\u540e\u7684\u673a\u5236\u3002\u9996\u5148\uff0c\u6211\u4eec\u8868\u660e\u8fd9\u4e9b\u73b0\u8c61\u51fa\u73b0\u5728\u975e\u5e38\u7b80\u5355\u7684\u67b6\u6784\u4e2d\u2014\u2014\u53ea\u6709\u4e00\u5230\u4e09\u5c42\u7684\u53d8\u538b\u5668\uff0c\u5728\u73a9\u5177\u6a21\u578bBigram-Backcopy\uff08BB\uff09\u4efb\u52a1\u4e0a\u8bad\u7ec3\u65f6\u4e5f\u4f1a\u51fa\u73b0\u3002\u5728\u8fd9\u79cd\u60c5\u51b5\u4e0b\uff0c\u6211\u4eec\u8bc6\u522b\u51fa\u4e00\u79cd\u6d3b\u8dc3-\u4f11\u7720\u673a\u5236\uff0c\u5176\u4e2d\u6ce8\u610f\u529b\u5934\u5bf9\u4e8e\u7279\u5b9a\u8f93\u5165\u57df\u6210\u4e3aSink\uff0c\u800c\u5bf9\u4e8e\u5176\u4ed6\u8f93\u5165\u5219\u4e0d\u7136\u3002\u6211\u4eec\u5bf9\u8bad\u7ec3\u52a8\u6001\u7684\u7406\u8bba\u5206\u6790\u63ed\u793a\uff0c\u8fd9\u4e9b\u73b0\u8c61\u662f\u7531\u4e00\u79cd\u76f8\u4e92\u5f3a\u5316\u673a\u5236\u9a71\u52a8\u7684\u3002\u57fa\u4e8e\u8fd9\u4e9b\u89c1\u89e3\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u5728\u9884\u8bad\u7ec3\u671f\u95f4\u7f13\u89e3\u6781\u7aef\u6807\u8bb0\u73b0\u8c61\u7684\u7b56\u7565\uff0c\u5305\u62ec\u7528ReLU\u66ff\u6362softmax\u4ee5\u53ca\u7528SGD\u66ff\u6362Adam\u3002\u63a5\u4e0b\u6765\uff0c\u6211\u4eec\u5c06\u5206\u6790\u6269\u5c55\u5230\u9884\u8bad\u7ec3\u7684LLMs\uff0c\u5305\u62ecLlama\u548cOLMo\uff0c\u663e\u793a\u8bb8\u591a\u6ce8\u610f\u529b\u5934\u8868\u73b0\u51fa\u4e0eBB\u4efb\u52a1\u4e2d\u7c7b\u4f3c\u7684\u6d3b\u8dc3-\u4f11\u7720\u673a\u5236\uff0c\u5e76\u4e14\u76f8\u4e92\u5f3a\u5316\u673a\u5236\u4e5f\u63a7\u5
236\u7740LLM\u9884\u8bad\u7ec3\u671f\u95f4\u6781\u7aef\u6807\u8bb0\u73b0\u8c61\u7684\u51fa\u73b0\u3002\u6211\u4eec\u7684\u7ed3\u679c\u63ed\u793a\u4e86\u8bb8\u591a\u7531BB\u4efb\u52a1\u9884\u6d4b\u7684\u6781\u7aef\u6807\u8bb0\u73b0\u8c61\u7684\u9759\u6001\u548c\u52a8\u6001\u7279\u6027\u4e0e\u5728\u9884\u8bad\u7ec3\u7684LLMs\u4e2d\u7684\u89c2\u5bdf\u7ed3\u679c\u4e00\u81f4\u3002|\n", "2410.13825": "|**2024-10-17**|**AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents**|Ke Yang et.al.|[2410.13825](http://arxiv.org/abs/2410.13825)|null|\u901a\u8fc7\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u4ee3\u7406\u6765\u5b9e\u73b0\u81ea\u6cbb\uff0c\u53ef\u4ee5\u63d0\u9ad8\u4eba\u7c7b\u5728\u4e2a\u6027\u5316\u548c\u6807\u51c6\u5316\u4efb\u52a1\u4e2d\u7684\u6548\u7387\u3002\u81ea\u52a8\u5316\u7f51\u7edc\u4efb\u52a1\uff08\u5982\u9884\u8ba2\u9884\u7b97\u5185\u7684\u9152\u5e97\uff09\u7684\u9700\u6c42\u65e5\u76ca\u589e\u52a0\u3002\u8fd9\u4e9b\u7f51\u7edc\u4ee3\u7406\u4e0d\u4ec5\u6ee1\u8db3\u5b9e\u9645\u9700\u6c42\uff0c\u8fd8\u4f5c\u4e3a\u5404\u79cd\u4ee3\u7406\u63a5\u5730\u573a\u666f\u7684\u91cd\u8981\u6982\u5ff5\u9a8c\u8bc1\u793a\u4f8b\uff0c\u5176\u6210\u529f\u9884\u793a\u7740\u8bb8\u591a\u672a\u6765\u5e94\u7528\u7684\u8fdb\u6b65\u3002\u5148\u524d\u7684\u7814\u7a76\u901a\u5e38\u624b\u5de5\u8bbe\u8ba1\u7f51\u7edc\u4ee3\u7406\u7b56\u7565\uff08\u4f8b\u5982\uff0c\u63d0\u793a\u6a21\u677f\u3001\u591a\u4ee3\u7406\u7cfb\u7edf\u3001\u641c\u7d22\u65b9\u6cd5\u7b49\uff09\uff0c\u800c\u8fd9\u4e9b\u7b56\u7565\u53ef\u80fd\u65e0\u6cd5\u5f88\u597d\u5730\u63a8\u5e7f\u5230\u6240\u6709\u73b0\u5b9e\u4e16\u754c\u573a\u666f\u4e2d\u3002\u53e6\u4e00\u65b9\u9762\uff0c\u5173\u4e8e\u7f51\u7edc\u4ee3\u7406\u7684\u89c2\u5bdf/\u52a8\u4f5c\u8868\u793a\u4e0e\u57fa\u4e8e\u8be5\u4ee3\u7406\u7684LLM\u7684\u9884\u8bad\u7ec3\u6570\u636e\u4e4b\u95f4\u9519\u4f4d\u7684\u7814\u7a76\u975e\u5e38\u6709\u9650\u3002\u8fd9\u79cd\u5dee\u5f02\u5c24\u5176\u660e\u663e\uff0c\u56e0\u4e3aLLM\u4e3b\u89
81\u9488\u5bf9\u8bed\u8a00\u5b8c\u6210\u8fdb\u884c\u8bad\u7ec3\uff0c\u800c\u4e0d\u662f\u6d89\u53ca\u5177\u8eab\u5bfc\u822a\u52a8\u4f5c\u548c\u7b26\u53f7\u5316\u7f51\u7edc\u5143\u7d20\u7684\u4efb\u52a1\u3002\u6211\u4eec\u7684\u7814\u7a76\u901a\u8fc7\u7b80\u5355\u5730\u4f18\u5316\u89c2\u5bdf\u548c\u52a8\u4f5c\u7a7a\u95f4\u4ee5\u66f4\u597d\u5730\u9002\u5e94LLM\u7684\u80fd\u529b\uff0c\u4ece\u800c\u589e\u5f3a\u4e86\u4e00\u4e2a\u57fa\u4e8eLLM\u7684\u7f51\u7edc\u4ee3\u7406\u3002\u8fd9\u79cd\u65b9\u6cd5\u4f7f\u6211\u4eec\u7684\u57fa\u7840\u4ee3\u7406\u5728\u5e7f\u6cdb\u7684\u7f51\u7edc\u4efb\u52a1\u4e0a\u663e\u8457\u4f18\u4e8e\u4ee5\u524d\u7684\u65b9\u6cd5\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u5728WebArena\u57fa\u51c6\u6d4b\u8bd5\u4e2d\uff0c\u8be5\u57fa\u51c6\u6d4b\u8bd5\u6db5\u76d6\u4e86\u901a\u7528\u7684\u7f51\u7edc\u4ea4\u4e92\u4efb\u52a1\uff0c\u6211\u4eec\u7684\u4ee3\u7406AgentOccam\u6bd4\u4e4b\u524d\u7684\u6700\u5148\u8fdb\u65b9\u6cd5\u548c\u540c\u671f\u5de5\u4f5c\u5206\u522b\u9ad8\u51fa9.8\u5206\uff08+29.4%\uff09\u548c5.9\u5206\uff08+15.8%\uff09\uff0c\u5e76\u4e14\u6210\u529f\u7387\u4e3a26.6\u5206\uff08+161%\uff09\uff0c\u8d85\u8fc7\u4e86\u7c7b\u4f3c\u7684\u57fa\u672c\u7f51\u7edc\u4ee3\u7406\uff0c\u8fd9\u4e9b\u4ee3\u7406\u7684\u89c2\u5bdf\u548c\u52a8\u4f5c\u7a7a\u95f4\u6ca1\u6709\u5bf9\u9f50\u3002\u6211\u4eec\u6ca1\u6709\u4f7f\u7528\u4e0a\u4e0b\u6587\u793a\u4f8b\u3001\u65b0\u7684\u4ee3\u7406\u89d2\u8272\u3001\u5728\u7ebf\u53cd\u9988\u6216\u641c\u7d22\u7b56\u7565\u3002AgentOccam\u7684\u7b80\u5355\u8bbe\u8ba1\u7a81\u663e\u4e86LLMs\u5728\u65e0\u6837\u672c\u60c5\u51b5\u4e0b\u5904\u7406\u7f51\u7edc\u4efb\u52a1\u7684\u5f3a\u5927\u6027\u80fd\uff0c\u5e76\u5f3a\u8c03\u4e86\u7cbe\u5fc3\u8c03\u6574\u89c2\u5bdf\u548c\u52a8\u4f5c\u7a7a\u95f4\u5bf9\u4e8e\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u7684\u5173\u952e\u4f5c\u7528\u3002|\n", "2410.13824": "|**2024-10-17**|**Harnessing Webpage UIs for Text-Rich Visual Understanding**|Junpeng Liu 
et.al.|[2410.13824](http://arxiv.org/abs/2410.13824)|null|\u6587\u672c\u4e30\u5bcc\u7684\u89c6\u89c9\u7406\u89e3\u2014\u2014\u5373\u5904\u7406\u5bc6\u96c6\u6587\u672c\u5185\u5bb9\u4e0e\u89c6\u89c9\u5143\u7d20\u76f8\u7ed3\u5408\u7684\u73af\u5883\u7684\u80fd\u529b\u2014\u2014\u5bf9\u4e8e\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u5728\u4e0e\u7ed3\u6784\u5316\u73af\u5883\u4ea4\u4e92\u65f6\u81f3\u5173\u91cd\u8981\u3002\u4e3a\u4e86\u589e\u5f3a\u8fd9\u79cd\u80fd\u529b\uff0c\u6211\u4eec\u63d0\u51fa\u4f7f\u7528\u57fa\u4e8e\u6587\u672c\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4ece\u7f51\u9875\u7528\u6237\u754c\u9762\uff08UI\uff09\u5408\u6210\u901a\u7528\u7684\u591a\u6a21\u6001\u6307\u4ee4\u3002\u5c3d\u7ba1\u7f3a\u4e4f\u76f4\u63a5\u7684\u89c6\u89c9\u8f93\u5165\uff0c\u57fa\u4e8e\u6587\u672c\u7684LLMs\u80fd\u591f\u5904\u7406\u6765\u81ea\u7f51\u9875\u53ef\u8bbf\u95ee\u6027\u6811\u7684\u7ed3\u6784\u5316\u6587\u672c\u8868\u793a\u3002\u8fd9\u4e9b\u6307\u4ee4\u968f\u540e\u4e0eUI\u622a\u56fe\u914d\u5bf9\u4ee5\u8bad\u7ec3\u591a\u6a21\u6001\u6a21\u578b\u3002\u6211\u4eec\u5f15\u5165\u4e86MultiUI\u6570\u636e\u96c6\uff0c\u8be5\u6570\u636e\u96c6\u5305\u542b\u6765\u81ea100\u4e07\u4e2a\u7f51\u7ad9\u7684730\u4e07\u4e2a\u6837\u672c\uff0c\u6db5\u76d6\u4e86\u591a\u6837\u5316\u7684\u591a\u6a21\u6001\u4efb\u52a1\u548cUI\u5e03\u5c40\u3002\u5728MultiUI\u4e0a\u8bad\u7ec3\u7684\u6a21\u578b\u4e0d\u4ec5\u5728\u7f51\u9875UI\u4efb\u52a1\u4e2d\u8868\u73b0\u51fa\u8272\u2014\u2014\u5728VisualWebBench\u4e0a\u5b9e\u73b0\u4e86\u9ad8\u8fbe48%\u7684\u63d0\u5347\uff0c\u5728Mind2Web\u7f51\u9875\u4ee3\u7406\u6570\u636e\u96c6\u4e0a\u7684\u52a8\u4f5c\u51c6\u786e\u7387\u63d0\u9ad8\u4e8619.1%\u2014\u2014\u800c\u4e14\u5728\u975e\u7f51\u9875UI\u4efb\u52a1\u4ee5\u53ca\u751a\u81f3\u975eUI\u9886\u57df\uff08\u5982\u6587\u6863\u7406\u89e3\u3001OCR\u548c\u56fe\u8868\u89e3\u91ca\uff09\u4e2d\u7684\u6cdb\u5316\u6548\u679c\u4e5f\u975e\u5e38\u597d\u3002\u8fd9\u4e9b\u7ed3\u679c\u7a81\u66
3e\u4e86\u7f51\u9875UI\u6570\u636e\u5728\u63a8\u8fdb\u5404\u79cd\u573a\u666f\u4e0b\u6587\u672c\u4e30\u5bcc\u7684\u89c6\u89c9\u7406\u89e3\u65b9\u9762\u7684\u5e7f\u6cdb\u5e94\u7528\u3002|\n"}} \ No newline at end of file +{"agent": {"2405.10255": "|**2024-05-16**|**When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models**|Xianzheng Ma et.al.|[2405.10255](http://arxiv.org/abs/2405.10255)|**[link](https://github.com/activevisionlab/awesome-llm-3d)**|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u4e0d\u65ad\u53d1\u5c55\uff0c\u5b83\u4eec\u4e0e\u4e09\u7ef4\u7a7a\u95f4\u6570\u636e\uff083D-LLMs\uff09\u7684\u878d\u5408\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\uff0c\u8fd9\u6781\u5927\u5730\u589e\u5f3a\u4e86\u7406\u89e3\u548c\u4e92\u52a8\u7269\u7406\u73af\u5883\u7684\u80fd\u529b\u3002\u8fd9\u7bc7\u7efc\u8ff0\u8be6\u7ec6\u63a2\u8ba8\u4e86\u4f7fLLMs\u80fd\u591f\u5904\u7406\u3001\u7406\u89e3\u5e76\u751f\u6210\u4e09\u7ef4\u6570\u636e\u7684\u65b9\u6cd5\u8bba\uff0c\u5f3a\u8c03\u4e86LLMs\u7684\u72ec\u7279\u4f18\u52bf\uff0c\u5982\u4e0a\u4e0b\u6587\u5b66\u4e60\u3001\u9010\u6b65\u63a8\u7406\u3001\u5f00\u653e\u8bcd\u6c47\u80fd\u529b\u548c\u4e30\u5bcc\u7684\u4e16\u754c\u77e5\u8bc6\uff0c\u8fd9\u4e9b\u5c06\u6781\u5927\u5730\u63a8\u52a8\u5d4c\u5165\u5f0f\u4eba\u5de5\u667a\u80fd\uff08AI\uff09\u7cfb\u7edf\u5728\u7a7a\u95f4\u8ba4\u77e5\u548c\u4ea4\u4e92\u65b9\u9762\u7684\u53d1\u5c55\u3002\u7814\u7a76\u6db5\u76d6\u4e86\u4ece\u70b9\u4e91\u5230\u795e\u7ecf\u8f90\u5c04\u573a\uff08NeRF\uff09\u7b49\u5404\u79cd\u4e09\u7ef4\u6570\u636e\u8868\u793a\uff0c\u5e76\u8003\u5bdf\u4e86\u5b83\u4eec\u4e0eLLMs\u5728\u4efb\u52a1\u4e2d\u7684\u96c6\u6210\uff0c\u5982\u4e09\u7ef4\u573a\u666f\u7406\u89e3\u3001\u63cf\u8ff0\u3001\u95ee\u7b54\u548c\u5bf9\u8bdd\uff0c\u4ee5\u53ca\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u8fdb\u884c\u7a7a\u95f4\u63a8\u7406\u3001\u89c4\u5212\u548c\u5bfc\u822a\u3002\u8bba\u6587\u8fd8\u7b80\u8981\u56de\u987e\u4e86\u5176\u4ed
6\u7ed3\u5408\u4e09\u7ef4\u548c\u8bed\u8a00\u7684\u65b9\u6cd5\u3002\u672c\u6587\u7684\u5143\u5206\u6790\u63ed\u793a\u4e86\u660e\u663e\u7684\u8fdb\u5c55\uff0c\u4f46\u4e5f\u5f3a\u8c03\u4e86\u5f00\u53d1\u65b0\u65b9\u6cd5\u4ee5\u5145\u5206\u5229\u75283D-LLMs\u6f5c\u529b\u7684\u5fc5\u8981\u6027\u3002\u56e0\u6b64\uff0c\u672c\u6587\u65e8\u5728\u4e3a\u672a\u6765\u7684\u7814\u7a76\u65b9\u5411\u6307\u660e\u9053\u8def\uff0c\u63a2\u7d22\u548c\u6269\u5c553D-LLMs\u5728\u7406\u89e3\u548c\u4e92\u52a8\u590d\u6742\u4e09\u7ef4\u4e16\u754c\u7684\u80fd\u529b\u3002\u4e3a\u4e86\u652f\u6301\u672c\u7efc\u8ff0\uff0c\u6211\u4eec\u5df2\u5728GitHub\u4e0a\u5efa\u7acb\u4e86\u4e00\u4e2a\u9879\u76ee\u9875\u9762\uff0c\u6574\u7406\u5e76\u5217\u51fa\u4e86\u76f8\u5173\u8bba\u6587\uff1ahttps://github.com/ActiveVisionLab/Awesome-LLM-3D\u3002|\n", "2405.09935": "|**2024-05-24**|**DEBATE: Devil's Advocate-Based Assessment and Text Evaluation**|Alex Kim et.al.|[2405.09935](http://arxiv.org/abs/2405.09935)|**[link](https://github.com/gunny97/DEBATE)**|\u968f\u7740\u81ea\u7136\u8bed\u8a00\u751f\u6210\uff08NLG\uff09\u6a21\u578b\u7684\u666e\u53ca\uff0c\u7cfb\u7edf\u5730\u8bc4\u4f30\u673a\u5668\u751f\u6210\u6587\u672c\u7684\u8d28\u91cf\u53d8\u5f97\u65e5\u76ca\u5173\u952e\u3002\u8fd1\u671f\u7684\u7814\u7a76\u5f15\u5165\u4e86\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u65e0\u53c2\u8003\u8bc4\u4ef7\u5668\uff0c\u5b83\u4eec\u5c55\u73b0\u51fa\u5904\u7406\u65b0\u4efb\u52a1\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u6a21\u578b\u901a\u5e38\u91c7\u7528\u5355\u4ee3\u7406\u65b9\u6cd5\uff0c\u6211\u4eec\u8ba4\u4e3a\u8fd9\u9650\u5236\u4e86\u5b83\u4eec\u7684\u8868\u73b0\u3002\u56e0\u4e3aLLM\u4ee3\u7406\u7684\u56de\u7b54\u5b58\u5728\u504f\u89c1\uff0c\u6bd4\u5982\u5bf9\u7279\u5b9a\u6587\u672c\u7ed3\u6784\u6216\u5185\u5bb9\u7684\u504f\u597d\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5728\u672c\u5de5\u4f5c\u4e2d\u63d0\u51faDEBATE\uff0c\u4e00\u4e2a\u5efa\u7acb\u5728\u591a\u4ee3\u7406\u8bc4\u5206\u7cfb
\u7edf\u57fa\u7840\u4e0a\u7684NLG\u8bc4\u4ef7\u6846\u67b6\uff0c\u878d\u5165\u4e86\u201c\u6076\u9b54\u8fa9\u624b\u201d\u7684\u6982\u5ff5\u3002\u5728\u8be5\u6846\u67b6\u4e2d\uff0c\u4e00\u4e2a\u4ee3\u7406\u88ab\u6307\u4ee4\u6279\u8bc4\u5176\u4ed6\u4ee3\u7406\u7684\u8bba\u70b9\uff0c\u4ece\u800c\u53ef\u80fd\u6d88\u89e3LLM\u4ee3\u7406\u7b54\u6848\u4e2d\u7684\u504f\u89c1\u3002DEBATE\u5728\u4e24\u4e2aNLG\u8bc4\u4ef7\u5143\u8bc4\u4f30\u57fa\u51c6\u2014\u2014SummEval\u548cTopicalChat\u4e0a\u663e\u8457\u4f18\u4e8e\u5148\u524d\u7684\u6700\u4f73\u65b9\u6cd5\u3002\u6211\u4eec\u8fd8\u53d1\u73b0\uff0c\u4ee3\u7406\u4e4b\u95f4\u7684\u8fa9\u8bba\u5e7f\u5ea6\u4ee5\u53ca\u4ee3\u7406\u7684\u4eba\u683c\u7279\u8d28\u4f1a\u5f71\u54cd\u8bc4\u4ef7\u5668\u7684\u6027\u80fd\u3002|\n", "2405.05175": "|**2024-05-08**|**Air Gap: Protecting Privacy-Conscious Conversational Agents**|Eugene Bagdasaryan et.al.|[2405.05175](http://arxiv.org/abs/2405.05175)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5bf9\u8bdd\u5f0f\u4ee3\u7406\u4e2d\u7684\u5e7f\u6cdb\u5e94\u7528\uff0c\u5904\u7406\u654f\u611f\u7528\u6237\u6570\u636e\u65f6\u5f15\u53d1\u4e86\u4e25\u91cd\u7684\u9690\u79c1\u95ee\u9898\u3002\u8fd9\u4e9b\u4ee3\u7406\u867d\u80fd\u7406\u89e3\u5e76\u5904\u7406\u4e0a\u4e0b\u6587\uff0c\u4f46\u4e5f\u53ef\u80fd\u88ab\u6076\u610f\u4e00\u65b9\u5229\u7528\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u5a01\u80c1\u6a21\u578b\uff0c\u5373\u7b2c\u4e09\u65b9\u5e94\u7528\u901a\u8fc7\u64cd\u63a7\u4ea4\u4e92\u4e0a\u4e0b\u6587\uff0c\u8bef\u5bfcLLM\u4ee3\u7406\u6cc4\u9732\u4e0e\u5176\u4efb\u52a1\u65e0\u5173\u7684\u79c1\u4eba\u4fe1\u606f\u3002\u5728\u57fa\u4e8e\u4e0a\u4e0b\u6587\u5b8c\u6574\u6027\u6846\u67b6\u7684\u57fa\u7840\u4e0a\uff0c\u6211\u4eec\u5f00\u53d1\u4e86AirGapAgent\uff0c\u8fd9\u662f\u4e00\u79cd\u6ce8\u91cd\u9690\u79c1\u7684\u4ee3\u7406\uff0c\u65e8\u5728\u901a\u8fc7\u9650\u5236\u4ee3\u7406\u4ec5\u8bbf\u95ee\u5b8c\u6210\u7279\u5b9a\u4efb\u52a1\u6240\u9700
\u7684\u6570\u636e\uff0c\u9632\u6b62\u610f\u5916\u7684\u6570\u636e\u6cc4\u6f0f\u3002Experiments using Gemini, GPT, and Mistral models as agents show that AirGapAgent performs well at resisting single-query context hijacking attacks. For example, for a Gemini Ultra agent, such an attack reduces data protection from 94% to 45%, whereas AirGapAgent maintains 97% protection, rendering the same attack ineffective.|\n", "2405.04325": "|**2024-05-07**|**Deception in Reinforced Autonomous Agents: The Unconventional Rabbit Hat Trick in Legislation**|Atharvan Dogra et.al.|[2405.04325](http://arxiv.org/abs/2405.04325)|null|Recent advances in large language models (LLMs) provide a strong foundation for building natural language agents, but they also raise safety concerns about the models and the autonomous agents built on them. Deception is a key concern: we focus on AI agents that mislead, hide the truth, or promote partially untrue beliefs through obfuscation and equivocation. Unlike the lying, self-interested decision-making, or false information studied in earlier AI safety research, we focus on a specific kind of deception, akin to a magician making a rabbit appear from a hat through sleight of hand, either via a hidden trapdoor or by openly diverting attention. Our new testbed demonstrates, in a goal-driven environment, the inherent deceptive capabilities of LLM agents performing natural language generation in an adversarial dialogue system built on the legislative task of \u201clobbying\u201d for a bill. In this goal-driven environment, we build deceptive capability through reinforcement learning, drawing on theories from the philosophy of language and cognitive psychology. We find that the lobbyist agent increases its deceptive capability by about 40% over subsequent reinforcement trials of adversarial interaction, and that our deception detection mechanism achieves a detection rate of up to 92%. These results reveal a potential problem in human-agent interaction: agents may manipulate humans to achieve preset goals.|\n", "2405.04324": "|**2024-05-07**|**Granite Code Models: A Family of Open Foundation Models for Code Intelligence**|Mayank Mishra 
et.al.|[2405.04324](http://arxiv.org/abs/2405.04324)|**[link](https://github.com/ibm-granite/granite-code-models)**|**Large language models (LLMs) trained on code are transforming the software development process. Code LLMs are increasingly being integrated into software development environments to improve the productivity of human programmers, and they show potential for handling complex tasks autonomously. Realizing the full potential of code LLMs requires a wide range of capabilities, including code generation, bug fixing, code explanation and documentation, and repository maintenance. This paper introduces the Granite series of decoder-only code models for code generation tasks, trained on code written in 116 programming languages. The Granite Code model family ranges from 3 billion to 34 billion parameters and suits applications ranging from complex application modernization to on-device, memory-constrained use cases. Evaluation on a comprehensive set of tasks shows that Granite Code models consistently reach state-of-the-art performance among open-source code LLMs. The family is optimized for enterprise software development workflows and performs well across a range of coding tasks (such as code generation, fixing, and explanation), making it a versatile all-around code model. We release all Granite Code models under the Apache 2.0 license for both research and commercial use.**|\n", "2405.04219": "|**2024-05-07**|**Iterative Experience Refinement of Software-Developing Agents**|Chen Qian et.al.|[2405.04219](http://arxiv.org/abs/2405.04219)|null|Autonomous agents powered by large language models show strong potential for autonomy in scenarios such as software development. However, the current static experience paradigm relies on a fixed set of past experiences acquired through heuristics, which limits the agents' adaptability and efficiency gains. To address this, the paper proposes an Iterative Experience Refinement framework that lets LLM agents dynamically adjust and refine experiences during task execution. We define two core patterns: the successive pattern, which refines based on the nearest experiences within a task batch, and the cumulative pattern, which accumulates experiences across all previous task batches. By introducing an experience elimination strategy, the method prioritizes high-quality and frequently used experiences, managing the experience space effectively and improving efficiency. Experiments show that while the successive pattern may yield better performance, the cumulative pattern is more stable. Moreover, with experience elimination, better performance is achieved using only 11.54% of a high-quality experience subset.|\n", "2405.03813": "|**2024-05-06**|**Large Language Models as Instruments of Power: New Regimes of Autonomous Manipulation and Control**|Yaqub Chaudhary et.al.|[2405.03813](http://arxiv.org/abs/2405.03813)|null|Large language models (LLMs) can imitate a wide range of rhetorical styles and generate text expressing a broad spectrum of sentiments. This capability is spreading rapidly at low cost and brings potential societal harms. Rather than considering these models in isolation, this paper focuses on how the large-scale computational infrastructure behind them is applied across domains. We first examine how LLMs can affect society by polluting and standardizing the information environment, and point out that these functions may be used as a means of control. We then turn to several emerging research areas that enhance the capacity of LLMs to serve as instruments of power: 1. Persuasion through the real-time design of choice architectures in conversational interfaces (for example, \u201cAI personas\u201d). 2. Using LLMs to build computational models of human behavior (for example, \u201csilicon subjects\u201d). 3. Applying LLMs to simulate the behavior of human groups (for example, \u201csilicon societies\u201d). 4. 
Combining LLMs with reinforcement learning to create controllable and steerable strategic dialogue models. Taken together, we discuss how these techniques can be used to build LLM-based systems that, through simulation and disguised \u201cprediction\u201d, become powerful tools of individual, social, and political control, manipulating human behavior, intentions, and actions.|\n", "2405.06682": "|**2024-05-05**|**Self-Reflection in LLM Agents: Effects on Problem-Solving Performance**|Matthew Renze et.al.|[2405.06682](http://arxiv.org/abs/2405.06682)|**[link](https://github.com/matthewrenze/self-reflection)**|**In this study, we investigate the effects of self-reflection in large language models (LLMs) on problem-solving performance. We had nine popular LLMs answer a series of multiple-choice questions to establish a performance baseline. For each incorrectly answered question, we instructed eight types of self-reflecting LLM agents to reflect on their mistakes and give themselves guidance for improving their problem solving. Using this guidance, each self-reflecting agent then re-attempted the same questions. The results show that LLM agents significantly improved their problem-solving performance through self-reflection ($p < 
0.001$). We also compare the individual contributions of the various types of self-reflection to performance. All code and data are available on GitHub: https://github.com/matthewrenze/self-reflection.**|\n", "2405.02858": "|**2024-05-05**|**Language Evolution for Evading Social Media Regulation via LLM-based Multi-agent Simulation**|Jinyu Cai et.al.|[2405.02858](http://arxiv.org/abs/2405.02858)|**[link](https://github.com/BlueLinkX/GA-MAS)**|**Social media platforms such as Twitter, Reddit, and Sina Weibo play an important role in global communication, but they are often strictly regulated in geopolitically sensitive regions. This pushes users to adapt how they communicate in restricted social media environments, frequently by using coded language. These shifts in language are not only a way to counter regulation but also a vivid example of language evolution, showing how language changes naturally under social and technological pressure. Studying language evolution in regulated social media settings is important for safeguarding freedom of speech, improving content moderation, and advancing linguistic research. This paper proposes a multi-agent simulation framework based on large language models (LLMs) to explore how user language evolves under strict regulation. The framework contains LLM-driven supervisory agents, which oversee the dialogue, and participant agents, which develop language strategies through interaction, simulating how communication styles evolve in an environment where users try to evade social media rules. Evaluated across scenarios ranging from abstract to realistic, the results show that LLMs can effectively simulate complex language dynamics and interactions in constrained settings, improving in both evasion of supervision and information accuracy as evolution progresses. The study also finds that LLM agents adopt different strategies for different scenarios.**|\n", "2405.01533": "|**2024-05-02**|**OmniDrive: A Holistic LLM-Agent Framework for Autonomous Driving with 3D Perception, Reasoning and Planning**|Shihao Wang 
et.al.|[2405.01533](http://arxiv.org/abs/2405.01533)|**[link](https://github.com/nvlabs/omnidrive)**|**Advances in multimodal large language models (MLLMs) have led to growing interest in autonomous driving systems built on these models, with the aim of leveraging their strong reasoning capabilities. However, applying the strengths of MLLMs to the planning part of driving is challenging, because planning requires a full understanding of the 3D environment rather than 2D reasoning alone. To this end, our work proposes a framework for tight alignment between models and 3D driving tasks. We first design a novel 3D MLLM architecture that uses sparse queries to lift and compress visual representations into 3D before feeding them into a language model. This query-based representation lets us jointly encode dynamic objects and static map elements (such as roads), providing a condensed 3D world model for aligning perception and action. 
In addition, we create OmniDrive-nuScenes, a new visual question-answering dataset that tests a model's true situational awareness in complex 3D scenes through comprehensive visual question-answering tasks, including scene description, traffic regulation understanding, 3D grounding, counterfactual reasoning, decision making, and planning. Extensive experiments show that the proposed architecture is effective and highlight the importance of visual question-answering tasks for reasoning and planning in complex 3D environments.**|\n", "2405.00972": "|**2024-05-02**|**CACTUS: Chemistry Agent Connecting Tool-Usage to Science**|Andrew D. McNaughton et.al.|[2405.00972](http://arxiv.org/abs/2405.00972)|**[link](https://github.com/pnnl/cactus)**|**This paper introduces CACTUS, an LLM-based agent that integrates cheminformatics tools to support advanced reasoning and problem solving in chemistry and molecular discovery. The authors evaluate CACTUS extensively with a range of open-source LLMs, including Gemma-7b, Falcon-7b, MPT-7b, Llama2-7b, and Mistral-7b, on a benchmark of thousands of chemistry questions. The results show that CACTUS clearly outperforms the baseline LLMs, with Gemma-7b and Mistral-7b achieving the best performance regardless of the prompting strategy used. The paper also examines how domain-specific prompting and hardware configuration affect model performance, stressing the importance of prompt engineering and noting that deploying smaller models on consumer-grade hardware may not significantly sacrifice accuracy. By combining the cognitive capabilities of open-source LLMs with domain-specific tools, CACTUS can assist researchers with tasks such as molecular property prediction, similarity search, and drug-likeness assessment. As a significant advance in cheminformatics, CACTUS gives chemists and molecular discovery researchers a flexible tool that could accelerate scientific research and support the discovery of novel, effective, and safe therapeutics, catalysts, and materials. Its integration with automated experimentation platforms and its ability to make data-driven decisions in real time also open new possibilities for autonomous discovery.**|\n", "2404.18978": "|**2024-04-29**|**Towards Generalizable Agents in Text-Based Educational Environments: A Study of Integrating RL with LLMs**|Bahar Radmehr 
et.al.|[2404.18978](http://arxiv.org/abs/2404.18978)|null|There is growing interest in learner models for educational environments, and research attention is shifting toward improving generalization in open-ended text-based learning environments by combining reinforcement learning (RL) with large language models (LLMs). This paper studies three types of agents: (1) RL-based agents, which use natural language to represent states and actions in order to find the best interaction strategy; (2) LLM-based agents, which draw on the model's broad knowledge and reasoning ability through prompting; and (3) hybrid LLM-assisted RL agents, which aim to improve performance and generalization. To support the development and evaluation of these agents, we introduce PharmaSimText, a new benchmark derived from the PharmaSim virtual pharmacy environment and designed for practicing diagnostic conversations. Experiments show that RL-based agents do well at completing tasks but fall short in the quality of the questions they ask, whereas LLM-based agents ask better questions but complete fewer tasks. Finally, hybrid LLM-assisted RL agents show potential to overcome these limitations, demonstrating the feasibility of combining RL and LLMs to develop high-performing agents for open-ended learning environments.|\n", "2404.18021": "|**2024-04-27**|**CRISPR-GPT: An LLM Agent for Automated Design of Gene-Editing Experiments**|Kaixuan Huang et.al.|[2404.18021](http://arxiv.org/abs/2404.18021)|null|The rise of genome engineering technology has made precise modification of genetic information possible, but building an efficient gene-editing system requires a deep understanding of CRISPR technology and its complex experimental context. Large language models (LLMs) show promise across many tasks, but they often lack domain-specific knowledge for biological design problems. This paper introduces CRISPR-GPT, an LLM agent augmented with domain knowledge and external tools to automate and improve the design of CRISPR-based gene-editing experiments. CRISPR-GPT uses the reasoning ability of LLMs to assist with selecting CRISPR systems, designing guide RNAs, recommending cellular delivery methods, drafting protocols, and designing validation experiments to confirm editing outcomes. We show how CRISPR-GPT can help non-expert researchers carry out gene-editing experiments from scratch, and we validate its effectiveness in a real-world use case. We also discuss the ethical and regulatory considerations of automated gene-editing design, emphasizing the importance of using such tools responsibly and transparently. Our work aims to bridge the gap between beginner biological researchers and CRISPR genome engineering techniques and to demonstrate the potential of LLM agents in facilitating complex biological discovery tasks.|\n", "2404.17833": "|**2024-04-27**|**Testing and Understanding Erroneous Planning in LLM Agents through Synthesized User Inputs**|Zhenlan Ji et.al.|[2404.17833](http://arxiv.org/abs/2404.17833)|null|Agents powered by large language models (LLMs) have proven useful in a variety of commercial applications, particularly in areas such as mental health support, chemical synthesis, and software development, but they have been found to be error-prone when handling complex tasks and long-term planning. This paper proposes PDoctor, a novel automated approach for detecting and understanding erroneous planning in LLM agents. PDoctor first defines a domain-specific language (DSL) for user queries and uses the Z3 constraint solver to synthesize varied inputs, each a natural language paragraph describing the requirements for completing a series of tasks. PDoctor then derives constraints from these requirements to form a test benchmark. We evaluate PDoctor with three mainstream agent frameworks and two powerful LLMs (GPT-3.5 and GPT-4), and the results show that 
it effectively identifies diverse errors in agent planning and provides valuable insights and error characteristics for developers and users. Finally, we discuss possible alternative designs and directions for extending PDoctor.|\n", "2404.17662": "|**2024-04-26**|**PLAYER*: Enhancing LLM-based Multi-Agent Communication and Interaction in Murder Mystery Games**|Qinglin Zhu et.al.|[2404.17662](http://arxiv.org/abs/2404.17662)|**[link](https://github.com/alickzhu/player)**|**Recent advances in large language models (LLMs) have improved the communication and social interaction abilities of agents. However, building agents that use these models for complex reasoning in dynamic environments involving both competition and cooperation remains challenging, particularly because of the limitations of informed graph-based search methods. We propose PLAYER*, a new framework based on an anytime sampling-based planner that uses sensors and pruners to build a purely question-driven search framework suited to difficult reasoning tasks. We also introduce a quantifiable evaluation method based on multiple-choice questions and construct the WellPlay dataset, which contains 1,482 question-answer pairs. Experiments show that PLAYER* is more efficient and performs better than existing methods in complex, dynamic environments, with quantifiable comparative results.**|\n", "2404.17525": "|**2024-05-09**|**Large Language Model Agent as a Mechanical Designer**|Yayati Jadhav et.al.|[2404.17525](http://arxiv.org/abs/2404.17525)|null|Conventional mechanical design relies on experts making experience-guided modifications and running finite element analysis (FEA) to meet specific requirements, a process that is time-consuming and depends heavily on individual knowledge. Many machine learning models have been developed to streamline this laborious expert-driven iteration, but they typically require large amounts of training data and computational resources. Deep learning methods also tend to be confined to the domain and task they were trained on, which limits their use across tasks. This creates a trade-off between the efficiency of automation and the resources it demands. 
This study proposes a novel approach that combines pretrained large language models (LLMs) with a finite element module. The finite element module evaluates each design and provides essential feedback, guiding the LLM to continually learn, plan, generate, and optimize designs without domain-specific training. We demonstrate the effectiveness of the framework on the iterative optimization of truss structures, showing that it can adjust designs according to structured feedback and criteria. The results show that the LLM-based agent generates truss designs that satisfy natural language specifications with a success rate of up to 90%, depending on the constraints applied. Using prompt-based optimization, we show that, given solution-score pairs, the LLM agent can iteratively optimize designs to meet specifications by relying on its inherent reasoning capabilities. The ability of LLM agents to produce viable designs and optimize them using their inherent reasoning suggests that they have the potential to develop and implement effective design strategies autonomously.|\n", "2404.17460": "|**2024-04-26**|**Ruffle&Riley: Insights from Designing and Evaluating a Large Language Model-Based Conversational Tutoring System**|Robin Schmucker 
et.al.|[2404.17460](http://arxiv.org/abs/2404.17460)|null|This paper discusses and evaluates a new type of conversational tutoring system (CTS) that builds on recent advances in large language models (LLMs). First, the system supports AI-assisted content authoring by automatically generating an easily editable tutoring script from lesson text. Second, the system runs the script in a learning-by-teaching format through two LLM-based agents (Ruffle and Riley) that play the roles of a student and a professor, engaging in free-form conversation that follows the inner-loop and outer-loop structure typical of intelligent tutoring systems. In two online user studies (N=200), we compare the system with a simple question-answering chatbot and a reading activity in supporting a biology lesson. We analyze system usage patterns, pre- and post-test scores, and user experience surveys. The results show that Ruffle&Riley users report high levels of engagement and understanding and perceive the support offered as helpful. Although Ruffle&Riley users take longer to complete the activity, we find no significant difference in short-term learning gains compared with the reading activity. Our system architecture and user studies provide useful insights for designers of future CTSs. We also open-source our system to support research on effective instructional design for LLM-based learning technologies.|\n", "2404.17153": "|**2024-04-26**|**A Unified Debugging Approach via LLM-Based Multi-Agent Synergy**|Cheryl Lee et.al.|[2404.17153](http://arxiv.org/abs/2404.17153)|null|Considerable effort has gone into automating the time-consuming process of software debugging, including fault localization and repair generation. Recently, large language models (LLMs) have shown great potential for automated debugging. However, we identify three challenges faced by both traditional and LLM-based debugging tools: 1) inaccurate upstream fault localization propagates to downstream repair; 2) they handle complex logic errors poorly; and 3) they ignore program context. To address these issues, we propose FixAgent, the first automated, unified debugging framework built on LLM multi-agent synergy. FixAgent performs end-to-end fault localization, repair, and analysis. 
\u6211\u4eec\u7684\u5173\u952e\u6d1e\u5bdf\u662f\uff0cLLMs\u80fd\u591f\u4ece\u4eba\u7c7b\u5f00\u53d1\u8005\u8ba4\u53ef\u7684\u901a\u7528\u8f6f\u4ef6\u5de5\u7a0b\u539f\u5219\u4e2d\u83b7\u76ca\uff0c\u6bd4\u5982\u201c\u6a61\u76ae\u9e2d\u8c03\u8bd5\u201d\uff0c\u8fd9\u6709\u52a9\u4e8e\u66f4\u597d\u5730\u7406\u89e3\u7a0b\u5e8f\u529f\u80fd\u548c\u903b\u8f91\u9519\u8bef\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e09\u4e2a\u7075\u611f\u6765\u6e90\u4e8e\u201c\u6a61\u76ae\u9e2d\u201d\u7684\u89e3\u51b3\u65b9\u6848\uff1a\u4ee3\u7406\u4e13\u4e1a\u5316\u4e0e\u534f\u540c\u3001\u5173\u952e\u53d8\u91cf\u8ddf\u8e2a\u548c\u7a0b\u5e8f\u4e0a\u4e0b\u6587\u7406\u89e3\uff0c\u4fc3\u4f7fLLMs\u63d0\u4f9b\u660e\u786e\u7684\u89e3\u91ca\uff0c\u5e76\u805a\u7126\u4e8e\u5173\u952e\u7684\u7a0b\u5e8f\u903b\u8f91\u4fe1\u606f\u3002\u5728\u5e7f\u6cdb\u4f7f\u7528\u7684QuixBugs\u6570\u636e\u96c6\u4e0a\uff0cFixAgent\u6210\u529f\u4fee\u590d\u4e8680\u4e2abug\u4e2d\u768479\u4e2a\uff0c\u5176\u4e2d9\u4e2a\u662f\u4e4b\u524d\u672a\u89e3\u51b3\u7684\u3002\u5b83\u8fd8\u5728CodeFlaws\u4e0a\u5408\u7406\u5730\u4fee\u590d\u4e861.9\u500d\u4e8e\u6700\u4f73\u4fee\u590d\u5de5\u5177\u7684\u7f3a\u9677\uff0c\u800c\u4e14\u65e0\u9700\u4f4d\u7f6e\u4fe1\u606f\uff0c\u91c7\u6837\u7387\u4f4e\u4e8e0.6%\u3002\u5e73\u5747\u800c\u8a00\uff0c\u4e0e\u4f7f\u7528\u4e0d\u540cLLM\u7684\u57fa\u7ebf\u6a21\u578b\u76f8\u6bd4\uff0cFixAgent\u63d0\u9ad8\u4e86\u7ea620%\u7684\u5408\u7406\u4fee\u590d\u548c\u6b63\u786e\u4fee\u590d\u7387\uff0c\u663e\u793a\u51fa\u6211\u4eec\u8bbe\u8ba1\u7684\u6709\u6548\u6027\u3002 \u6b64\u5916\uff0cFixAgent\u7684\u6b63\u786e\u7387\u9ad8\u8fbe97.26%\uff0c\u8868\u660e\u5b83\u6709\u53ef\u80fd\u514b\u670d\u73b0\u6709\u65b9\u6cd5\u7684\u8fc7\u62df\u5408\u95ee\u9898\u3002\u603b\u7ed3\u6765\u8bf4\uff0cFixAgent\u662f\u4e00\u4e2a\u6709\u524d\u666f\u7684\u81ea\u52a8\u5316\u8c03\u8bd5\u6846\u67b6\uff0c\u65e8\u5728\u63d0\u5347\u8f6f\u4ef6\u8c03\u8bd5\u7684\u6548\u7387\u548c\u51c6\u786e\u6027\u3002|\n", "2404.16698": 
"|**2024-04-25**|**Cooperate or Collapse: Emergence of Sustainability Behaviors in a Society of LLM Agents**|Giorgio Piatti et.al.|[2404.16698](http://arxiv.org/abs/2404.16698)|**[link](https://github.com/giorgiopiatti/govsim)**|\u5728\u5feb\u901f\u53d1\u5c55\u7684\u4eba\u5de5\u667a\u80fd\u9886\u57df\uff0c\u786e\u4fdd\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u51b3\u7b56\u5b89\u5168\u662f\u4e00\u9879\u91cd\u5927\u6311\u6218\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201cGovernance of the Commons Simulation\u201d\uff08GovSim\uff09\u7684\u6a21\u62df\u5e73\u53f0\uff0c\u65e8\u5728\u7814\u7a76LLMs\u4e2d\u7684\u6218\u7565\u4e92\u52a8\u548c\u5408\u4f5c\u51b3\u7b56\u3002\u901a\u8fc7\u8fd9\u4e2a\u73af\u5883\uff0c\u6211\u4eec\u63a2\u8ba8\u4e86AI\u4ee3\u7406\u4e4b\u95f4\u8d44\u6e90\u5206\u4eab\u7684\u52a8\u6001\uff0c\u5f3a\u8c03\u4e86\u4f26\u7406\u8003\u91cf\u3001\u6218\u7565\u89c4\u5212\u548c\u8c08\u5224\u6280\u5de7\u7684\u91cd\u8981\u6027\u3002GovSim\u5177\u6709\u7075\u6d3b\u6027\uff0c\u652f\u6301\u6587\u672c\u578b\u4ee3\u7406\uff0c\u5305\u62ecLLMs\u3002\u5229\u7528\u751f\u6210\u5f0f\u4ee3\u7406\u6846\u67b6\uff0c\u6211\u4eec\u521b\u5efa\u4e86\u4e00\u4e2a\u901a\u7528\u4ee3\u7406\uff0c\u4fbf\u4e8e\u6574\u5408\u4e0d\u540c\u7684LLMs\u3002\u6211\u4eec\u7684\u7814\u7a76\u53d1\u73b0\uff0c\u5728GovSim\u4e2d\uff0c\u53ea\u670915\u4e2a\u6d4b\u8bd5\u6a21\u578b\u4e2d\u76842\u4e2a\u80fd\u591f\u5b9e\u73b0\u53ef\u6301\u7eed\u7ed3\u679c\uff0c\u8fd9\u8868\u660e\u6a21\u578b\u5728\u7ba1\u7406\u5171\u4eab\u8d44\u6e90\u7684\u80fd\u529b\u4e0a\u5b58\u5728\u663e\u8457\u5dee\u8ddd\u3002\u8fdb\u4e00\u6b65\u7684\u7814\u7a76\u663e\u793a\uff0c\u5982\u679c\u79fb\u9664\u4ee3\u7406\u4e4b\u95f4\u7684\u901a\u4fe1\u80fd\u529b\uff0c\u5b83\u4eec\u4f1a\u8fc7\u5ea6\u4f7f\u7528\u5171\u4eab\u8d44\u6e90\uff0c\u7a81\u51fa\u4e86\u5408\u4f5c\u4e2d\u6c9f\u901a\u7684\u5173\u952e\u6027\u3002\u6709\u8da3\u7684\u662f\uff0c\u5927\u591a\u6570LLMs\u7f3a\u4e4f\u666e\u904d\u5316\u7684\u5047\u8bbe
\u80fd\u529b\uff0c\u63ed\u793a\u4e86\u5b83\u4eec\u63a8\u7406\u6280\u80fd\u7684\u4e00\u4e2a\u91cd\u8981\u5f31\u70b9\u3002\u6211\u4eec\u5f00\u6e90\u4e86\u6240\u6709\u7814\u7a76\u7ed3\u679c\uff0c\u5305\u62ec\u6a21\u62df\u73af\u5883\u3001\u4ee3\u7406\u63d0\u793a\u4ee5\u53ca\u5168\u9762\u7684\u7f51\u7edc\u754c\u9762\uff0c\u4ee5\u4f9b\u8fdb\u4e00\u6b65\u7814\u7a76\u548c\u8ba8\u8bba\u3002|\n", "2404.17605": "|**2024-04-24**|**Autonomous LLM-driven research from data to human-verifiable research papers**|Tal Ifargan et.al.|[2404.17605](http://arxiv.org/abs/2404.17605)|**[link](https://github.com/technion-kishony-lab/data-to-paper)**|**\u968f\u7740\u4eba\u5de5\u667a\u80fd\u63a8\u52a8\u79d1\u5b66\u53d1\u73b0\u7684\u6b65\u4f10\u52a0\u5feb\uff0c\u4eba\u4eec\u8fd8\u4e0d\u6e05\u695a\u5b8c\u5168\u7531AI\u9a71\u52a8\u7684\u7814\u7a76\u662f\u5426\u53ef\u884c\uff0c\u4ee5\u53ca\u5b83\u80fd\u5426\u9075\u5faa\u5173\u952e\u7684\u79d1\u5b66\u4ef7\u503c\u89c2\uff0c\u5982\u900f\u660e\u5ea6\u3001\u53ef\u8ffd\u6eaf\u6027\u548c\u53ef\u9a8c\u8bc1\u6027\u3002\u4e3a\u4e86\u6a21\u62df\u4eba\u7c7b\u7684\u79d1\u5b66\u7814\u7a76\u5b9e\u8df5\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u201c\u6570\u636e\u5230\u8bba\u6587\u201d\uff08data-to-paper\uff09\uff0c\u8fd9\u662f\u4e00\u4e2a\u81ea\u52a8\u5316\u5e73\u53f0\uff0c\u5f15\u5bfc\u76f8\u4e92\u534f\u4f5c\u7684\u4eba\u5de5\u667a\u80fd\u4ee3\u7406\u901a\u8fc7\u5b8c\u6574\u7684\u5206\u6b65\u9aa4\u7814\u7a76\u6d41\u7a0b\uff0c\u540c\u65f6\u7a0b\u5e8f\u5316\u8ffd\u8e2a\u4fe1\u606f\u6d41\uff0c\u5e76\u5141\u8bb8\u4eba\u7c7b\u76d1\u7763\u548c\u4e92\u52a8\u3002\u5728\u81ea\u52a8\u6a21\u5f0f\u4e0b\uff0c\u4ec5\u63d0\u4f9b\u6807\u6ce8\u6570\u636e\uff0c\u8be5\u5e73\u53f0\u5c31\u80fd\u63d0\u51fa\u5047\u8bbe\uff0c\u8bbe\u8ba1\u7814\u7a76\u8ba1\u5212\uff0c\u7f16\u5199\u548c\u8c03\u8bd5\u5206\u6790\u4ee3\u7801\uff0c\u751f\u6210\u548c\u89e3\u8bfb\u7ed3\u679c\uff0c\u751a\u81f3\u521b\u5efa\u5b8c\u6574\u4e14\u4fe1\u606f\u53ef\u8ffd\u6eaf\u7684\u79d1\u7814\u8bba\u6587\u3002\u5c3d\u7ba1
\u7814\u7a76\u65b0\u9896\u6027\u6709\u9650\uff0c\u4f46\u8fd9\u4e00\u8fc7\u7a0b\u5c55\u793a\u4e86AI\u81ea\u4e3b\u4ece\u6570\u636e\u4e2d\u751f\u6210\u539f\u521b\u5b9a\u91cf\u6d1e\u5bdf\u7684\u80fd\u529b\u3002\u5bf9\u4e8e\u7b80\u5355\u7684\u7814\u7a76\u76ee\u6807\uff0c\u5168\u81ea\u52a8\u6d41\u7a0b\u80fd\u521b\u4f5c\u51fa\u5927\u7ea680-90%\u65e0\u9700\u91cd\u5927\u9519\u8bef\u7684\u7a3f\u4ef6\uff0c\u7136\u800c\u968f\u7740\u76ee\u6807\u590d\u6742\u6027\u7684\u589e\u52a0\uff0c\u4eba\u7c7b\u7684\u5171\u540c\u53c2\u4e0e\u5bf9\u4e8e\u4fdd\u8bc1\u51c6\u786e\u6027\u81f3\u5173\u91cd\u8981\u3002\u6b64\u5916\uff0c\u751f\u6210\u7684\u8bba\u6587\u672c\u8eab\u4e5f\u5177\u6709\u5185\u5728\u7684\u53ef\u9a8c\u8bc1\u6027\uff0c\u56e0\u4e3a\u4fe1\u606f\u8ffd\u8e2a\u4f7f\u5f97\u7ed3\u679c\u3001\u65b9\u6cd5\u548c\u6570\u636e\u7684\u94fe\u63a5\u53ef\u4ee5\u7a0b\u5e8f\u5316\u8fdb\u884c\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u7684\u5de5\u4f5c\u8868\u660e\uff0cAI\u9a71\u52a8\u7684\u79d1\u7814\u53ef\u4ee5\u52a0\u901f\u79d1\u5b66\u53d1\u73b0\uff0c\u540c\u65f6\u589e\u5f3a\u800c\u975e\u5a01\u80c1\u900f\u660e\u5ea6\u3001\u53ef\u8ffd\u6eaf\u6027\u548c\u53ef\u9a8c\u8bc1\u6027\u3002**|\n", "2404.16115": "|**2024-04-24**|**Online Personalizing White-box LLMs Generation with Neural Bandits**|Zekai Chen 
et.al.|[2404.16115](http://arxiv.org/abs/2404.16115)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5f00\u59cb\u751f\u6210\u4e2a\u6027\u5316\u7684\u6587\u672c\u5185\u5bb9\uff0c\u5982\u4f55\u5728\u4e0d\u4e3a\u6bcf\u4f4d\u7528\u6237\u521b\u5efa\u72ec\u7279\u6a21\u578b\u7684\u8d44\u6e90\u6d88\u8017\u4e0b\u5b9e\u73b0\u9ad8\u6548\u4e2a\u6027\u5316\u6210\u4e86\u65b0\u6311\u6218\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u5728\u7ebf\u65b9\u6cd5\uff0c\u5229\u7528\u795e\u7ecfbandit\u7b97\u6cd5\u52a8\u6001\u4f18\u5316\u8f6f\u6307\u4ee4\u5d4c\u5165\uff0c\u6839\u636e\u7528\u6237\u53cd\u9988\u8c03\u6574\u5185\u5bb9\uff0c\u4ece\u800c\u63d0\u5347\u767d\u76d2LLMs\u5f00\u653e\u6027\u6587\u672c\u751f\u6210\u7684\u4e2a\u6027\u5316\u6c34\u5e73\u3002\u901a\u8fc7\u5728\u591a\u4e2a\u4efb\u52a1\u4e0a\u7684\u4e25\u8c28\u5b9e\u9a8c\uff0c\u6211\u4eec\u8bc1\u660e\u4e86\u8fd9\u79cd\u65b9\u6cd5\u76f8\u5bf9\u4e8e\u57fa\u7840\u7b56\u7565\u6709\u663e\u8457\u6027\u80fd\u63d0\u5347\u3002\u7279\u522b\u662f\u9488\u5bf9\u4e2a\u6027\u5316\u65b0\u95fb\u6807\u9898\u751f\u6210\uff0cNeuralTS\u5e26\u6765\u4e86\u9ad8\u8fbe62.9%\u7684\u6700\u4f73ROUGE\u5206\u6570\u63d0\u5347\u4ee5\u53ca2.76%\u7684LLM\u4ee3\u7406\u8bc4\u4f30\u5206\u6570\u589e\u957f\uff0c\u8fd9\u8868\u660e\u5176\u6548\u679c\u663e\u8457\u3002|\n", "2404.15974": "|**2024-04-24**|**A Human-Computer Collaborative Tool for Training a Single Large Language Model Agent into a Network through Few Examples**|Lihang Pan et.al.|[2404.15974](http://arxiv.org/abs/2404.15974)|null|
\u5355\u4e2a\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u89e3\u51b3\u590d\u6742\u4efb\u52a1\u65b9\u9762\u7684\u80fd\u529b\u6709\u9650\u3002\u7136\u800c\uff0c\u901a\u8fc7\u8fde\u63a5\u591a\u4e2aLLM\u4ee3\u7406\u6784\u5efa\u7684\u7f51\u7edc\u53ef\u4ee5\u663e\u8457\u63d0\u5347\u6574\u4f53\u6027\u80fd\u3002\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u4eba\u673a\u534f\u4f5c\u5de5\u5177\u2014\u2014EasyLAN\uff0c\u65e8\u5728\u5e2e\u52a9\u5f00\u53d1\u8005\u8f7b\u677e\u6784\u5efaLLM\u4ee3\u7406\u7f51\u7edc\uff08LAN\uff09\u3002EasyLAN\u9996\u5148\u6839\u636e\u4efb\u52a1\u63cf\u8ff0\u81ea\u52a8\u751f\u6210\u4ec5\u5305\u542b\u4e00\u4e2a\u4ee3\u7406\u7684\u521d\u59cb\u7f51\u7edc\u3002\u63a5\u7740\uff0c\u5b83\u5229\u7528\u5c11\u91cf\u8bad\u7ec3\u793a\u4f8b\u6765\u8c03\u6574\u7f51\u7edc\u3002\u5bf9\u4e8e\u6bcf\u4e2a\u793a\u4f8b\uff0cEasyLAN\u5206\u6790\u8f93\u51fa\u4e0e\u771f\u5b9e\u7ed3\u679c\u4e4b\u95f4\u7684\u5dee\u8ddd\uff0c\u5e76\u627e\u51fa\u9519\u8bef\u7684\u539f\u56e0\u3002EasyLAN\u4f1a\u91c7\u7528\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u7b56\u7565\u6765\u4fee\u6b63\u8fd9\u4e9b\u95ee\u9898\u3002\u7528\u6237\u53ef\u4ee5\u4ecb\u5165EasyLAN\u7684\u5de5\u4f5c\u6d41\u7a0b\u6216\u76f4\u63a5\u4fee\u6539LAN\u3002\u6700\u7ec8\uff0cLAN\u4ece\u5355\u4e2a\u4ee3\u7406\u53d1\u5c55\u6210\u591a\u4ee3\u7406\u7684\u7f51\u7edc\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cEasyLAN\u80fd\u591f\u5e2e\u52a9\u5f00\u53d1\u8005\u5feb\u901f\u6784\u5efa\u6027\u80fd\u826f\u597d\u7684LAN\u3002|\n", "2404.15269": "|**2024-04-23**|**Aligning LLM Agents by Learning Latent Preference from User Edits**|Ge Gao 
et.al.|[2404.15269](http://arxiv.org/abs/2404.15269)|**[link](https://github.com/gao-g/prelude)**|**\u6211\u4eec\u7814\u7a76\u57fa\u4e8e\u7528\u6237\u5bf9\u8bed\u8a00\u6a21\u578b\u7f16\u8f91\u7684\u4e92\u52a8\u5b66\u4e60\u8bed\u8a00\u4ee3\u7406\u3002\u5728\u8bf8\u5982\u5199\u4f5c\u52a9\u624b\u7684\u5e38\u89c1\u573a\u666f\u4e2d\uff0c\u7528\u6237\u4e0e\u8bed\u8a00\u4ee3\u7406\u4ea4\u4e92\uff0c\u6839\u636e\u4e0a\u4e0b\u6587\u751f\u6210\u54cd\u5e94\uff0c\u5e76\u53ef\u80fd\u9009\u62e9\u6027\u5730\u7f16\u8f91\u4ee3\u7406\u7684\u54cd\u5e94\u4ee5\u53cd\u6620\u4ed6\u4eec\u7684\u6f5c\u5728\u504f\u597d\uff0c\u540c\u65f6\u63d0\u9ad8\u51c6\u786e\u6027\u3002\u8fd9\u79cd\u7f16\u8f91\u53cd\u9988\u662f\u81ea\u7136\u4ea7\u751f\u7684\uff0c\u9002\u5408\u7528\u4e8e\u63d0\u5347\u4ee3\u7406\u4e0e\u7528\u6237\u504f\u597d\u7684\u5951\u5408\u5ea6\uff0c\u964d\u4f4e\u540e\u7eed\u7528\u6237\u7684\u7f16\u8f91\u6210\u672c\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51faPRELUDE\u6846\u67b6\uff0c\u5b83\u6839\u636e\u5386\u53f2\u7f16\u8f91\u6570\u636e\u63a8\u65ad\u7528\u6237\u7684\u6f5c\u5728\u504f\u597d\uff0c\u5e76\u636e\u6b64\u8bbe\u8ba1\u4e00\u4e2a\u63d0\u793a\u7b56\u7565\uff0c\u5f15\u5bfc\u672a\u6765\u7684\u54cd\u5e94\u751f\u6210\uff0c\u907f\u514d\u4e86\u6602\u8d35\u4e14\u96be\u4ee5\u6269\u5c55\u7684\u5fae\u8c03\u8fc7\u7a0b\uff0c\u8fd8\u80fd\u4fdd\u6301\u5728\u5176\u4ed6\u4efb\u52a1\u4e0a\u7684\u6027\u80fd\u3002 
\u6b64\u5916\uff0c\u5b66\u4e60\u63cf\u8ff0\u6027\u7684\u504f\u597d\u6709\u52a9\u4e8e\u589e\u5f3a\u53ef\u89e3\u91ca\u6027\uff0c\u7528\u6237\u53ef\u4ee5\u67e5\u770b\u548c\u8c03\u6574\u5b66\u4e60\u5230\u7684\u504f\u597d\u3002\u7136\u800c\uff0c\u7528\u6237\u504f\u597d\u53ef\u80fd\u590d\u6742\u591a\u53d8\uff0c\u53d7\u60c5\u5883\u5f71\u54cd\uff0c\u56e0\u6b64\u5b66\u4e60\u8d77\u6765\u5177\u6709\u6311\u6218\u6027\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51faCIPHER\u7b97\u6cd5\uff0c\u5b83\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u6839\u636e\u7528\u6237\u7f16\u8f91\u63a8\u65ad\u7ed9\u5b9a\u60c5\u5883\u4e0b\u7684\u7528\u6237\u504f\u597d\u3002\u672a\u6765\uff0cCIPHER\u4f1a\u4ece\u5386\u53f2\u4e2d\u7684k\u4e2a\u6700\u63a5\u8fd1\u7684\u4e0a\u4e0b\u6587\u4e2d\u68c0\u7d22\u63a8\u65ad\u51fa\u7684\u504f\u597d\uff0c\u7efc\u5408\u751f\u6210\u54cd\u5e94\u3002\u6211\u4eec\u5728\u603b\u7ed3\u548c\u7535\u5b50\u90ae\u4ef6\u5199\u4f5c\u4e24\u4e2a\u4e92\u52a8\u73af\u5883\u4e2d\u4f7f\u7528GPT-4\u6a21\u62df\u7528\u6237\u8fdb\u884c\u8bc4\u4f30\uff0c\u4e0e\u76f4\u63a5\u4f7f\u7528\u7528\u6237\u7f16\u8f91\u4f46\u4e0d\u5b66\u4e60\u63cf\u8ff0\u6027\u504f\u597d\u7684\u7b97\u6cd5\uff0c\u4ee5\u53ca\u5b66\u4e60\u5168\u5c40\u65e0\u4e0a\u4e0b\u6587\u504f\u597d\u7684\u7b97\u6cd5\u8fdb\u884c\u4e86\u6bd4\u8f83\u3002 \u5728\u4e24\u9879\u4efb\u52a1\u4e2d\uff0cCIPHER\u90fd\u5b9e\u73b0\u4e86\u6700\u4f4e\u7684\u7f16\u8f91\u8ddd\u79bb\u6210\u672c\uff0c\u5e76\u4e14\u5b66\u4e60\u5230\u7684\u504f\u597d\u4e0e\u771f\u5b9e\u504f\u597d\u663e\u793a\u51fa\u663e\u8457\u7684\u76f8\u4f3c\u6027\u3002**|\n", "2404.14387": "|**2024-04-22**|**A Survey on Self-Evolution of Large Language Models**|Zhengwei Tao et.al.|[2404.14387](http://arxiv.org/abs/2404.14387)|**[link](https://github.com/alibabaresearch/damo-convai)**|**
\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u4f17\u591a\u9886\u57df\u548c\u667a\u80fd\u4ee3\u7406\u5e94\u7528\u4e2d\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\u3002\u7136\u800c\uff0c\u4f9d\u8d56\u4eba\u7c7b\u6216\u5916\u90e8\u6a21\u578b\u76d1\u7763\u7684\u73b0\u6709LLMs\u5728\u5904\u7406\u590d\u6742\u4efb\u52a1\u548c\u591a\u6837\u6027\u589e\u52a0\u65f6\u53ef\u80fd\u4f1a\u9047\u5230\u6210\u672c\u9ad8\u6602\u548c\u6027\u80fd\u74f6\u9888\u7684\u95ee\u9898\u3002\u4e3a\u6b64\uff0c\u81ea\u6211\u8fdb\u5316\u65b9\u6cd5\u5e94\u8fd0\u800c\u751f\uff0c\u8fd9\u79cd\u7b56\u7565\u5141\u8bb8LLMs\u81ea\u4e3b\u83b7\u53d6\u3001\u7cbe\u70bc\u5e76\u4ece\u81ea\u8eab\u751f\u6210\u7684\u7ecf\u9a8c\u4e2d\u5b66\u4e60\uff0c\u501f\u9274\u4eba\u7c7b\u7ecf\u9a8c\u5b66\u4e60\u8fc7\u7a0b\uff0c\u6709\u671b\u63a8\u52a8LLMs\u5411\u8d85\u7ea7\u667a\u80fd\u53d1\u5c55\u3002\u672c\u6587\u5168\u9762\u7efc\u8ff0\u4e86LLMs\u4e2d\u7684\u81ea\u6211\u8fdb\u5316\u65b9\u6cd5\u3002\u9996\u5148\uff0c\u6211\u4eec\u63d0\u51fa\u4e00\u4e2a\u6982\u5ff5\u6846\u67b6\uff0c\u5c06\u8fdb\u5316\u8fc7\u7a0b\u5212\u5206\u4e3a\u8fed\u4ee3\u5faa\u73af\u7684\u56db\u4e2a\u9636\u6bb5\uff1a\u7ecf\u9a8c\u83b7\u53d6\u3001\u7ecf\u9a8c\u7ec6\u5316\u3001\u66f4\u65b0\u548c\u8bc4\u4f30\u3002\u5176\u6b21\uff0c\u6211\u4eec\u5206\u7c7b\u63a2\u8ba8LLMs\u548c\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u7684\u8fdb\u5316\u76ee\u6807\uff0c\u5e76\u5bf9\u76f8\u5173\u6587\u732e\u8fdb\u884c\u603b\u7ed3\uff0c\u63d0\u4f9b\u6bcf\u4e2a\u6a21\u5757\u7684\u5206\u7c7b\u548c\u89c1\u89e3\u3002\u6700\u540e\uff0c\u6211\u4eec\u6307\u51fa\u4e86\u5f53\u524d\u7684\u6311\u6218\uff0c\u5e76\u63d0\u51fa\u4e86\u672a\u6765\u7814\u7a76\u65b9\u5411\uff0c\u4e3a\u52a0\u901f\u81ea\u6f14\u8fdbLLMs\u7684\u53d1\u5c55\u63d0\u4f9b\u5173\u952e\u6d1e\u89c1\u3002**|\n", "2404.13501": "|**2024-04-21**|**A Survey on the Memory Mechanism of Large Language Model based Agents**|Zeyu Zhang 
et.al.|[2404.13501](http://arxiv.org/abs/2404.13501)|**[link](https://github.com/nuster1128/llm_agent_memory_survey)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u79d1\u7814\u548c\u5de5\u4e1a\u754c\u7684\u5e7f\u6cdb\u5173\u6ce8\uff0c\u57fa\u4e8eLLMs\u7684\u667a\u80fd\u4ee3\u7406\u56e0\u5176\u81ea\u6211\u8fdb\u5316\u80fd\u529b\u800c\u5907\u53d7\u77a9\u76ee\uff0c\u8fd9\u5bf9\u4e8e\u89e3\u51b3\u9700\u8981\u957f\u671f\u590d\u6742\u4ea4\u4e92\u7684\u73b0\u5b9e\u95ee\u9898\u81f3\u5173\u91cd\u8981\u3002\u652f\u6301agent-environment\u4ea4\u4e92\u7684\u5173\u952e\u8981\u7d20\u662f\u4ee3\u7406\u7684\u8bb0\u5fc6\u673a\u5236\u3002\u5c3d\u7ba1\u5df2\u6709\u4f17\u591a\u6709\u524d\u666f\u7684\u8bb0\u5fc6\u8bbe\u8ba1\u88ab\u63d0\u51fa\uff0c\u4f46\u8fd9\u4e9b\u7814\u7a76\u5206\u6563\u5728\u591a\u7bc7\u8bba\u6587\u4e2d\uff0c\u7f3a\u4e4f\u5168\u9762\u7684\u7efc\u8ff0\u6765\u7cfb\u7edf\u6027\u5730\u603b\u7ed3\u548c\u6bd4\u8f83\uff0c\u672a\u80fd\u63d0\u70bc\u51fa\u901a\u7528\u4e14\u6709\u6548\u7684\u8bbe\u8ba1\u6a21\u5f0f\u4ee5\u542f\u53d1\u540e\u7eed\u7814\u7a76\u3002\u4e3a\u6b64\uff0c\u672c\u8bba\u6587\u65e8\u5728\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u6211\u4eec\u63d0\u51fa\u4e00\u4efd\u5173\u4e8eLLM\u57fa\u4ee3\u7406\u8bb0\u5fc6\u673a\u5236\u7684\u5168\u9762\u8c03\u67e5\u3002\u9996\u5148\uff0c\u6211\u4eec\u5c06\u63a2\u8ba8\u8bb0\u5fc6\u5728LLM\u4ee3\u7406\u4e2d\u7684\u201c\u662f\u4ec0\u4e48\u201d\u4ee5\u53ca\u201c\u4e3a\u4ec0\u4e48\u9700\u8981\u201d\u3002\u7136\u540e\uff0c\u6211\u4eec\u7cfb\u7edf\u56de\u987e\u4e86\u5173\u4e8e\u8bb0\u5fc6\u6a21\u5757\u7684\u8bbe\u8ba1\u548c\u8bc4\u4f30\u65b9\u6cd5\u7684\u7814\u7a76\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u4f1a\u5c55\u793a\u8bb0\u5fc6\u6a21\u5757\u5728\u5404\u79cd\u5e94\u7528\u4e2d\u626e\u6f14\u7684\u91cd\u8981\u89d2\u8272\u3002\u6700\u540e\uff0c\u6211\u4eec\u4f1a\u5206\u6790\u73b0\u6709\u5de5\u4f5c\u7684\u5c40\u9650\uff0c\u5e76\u6307\u51fa\u91cd\u8981\u7684\u672a\u6765\u7814\u7a76\u65b9\u5411\u3002
\u4e3a\u4e86\u8ddf\u8e2a\u8be5\u9886\u57df\u6700\u65b0\u8fdb\u5c55\uff0c\u6211\u4eec\u521b\u5efa\u4e86\u4e00\u4e2aGitHub\u4ed3\u5e93\uff1ahttps://github.com/nuster1128/LLM_Agent_Memory_Survey\u3002**|\n", "2404.11964": "|**2024-04-18**|**From Language Models to Practical Self-Improving Computer Agents**|Alex Sheng et.al.|[2404.11964](http://arxiv.org/abs/2404.11964)|null|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u7b80\u5355\u76f4\u63a5\u7684\u65b9\u6cd5\uff0c\u7528\u4e8e\u521b\u5efa\u80fd\u591f\u6267\u884c\u5404\u79cd\u8ba1\u7b97\u673a\u4efb\u52a1\u7684\u4eba\u5de5\u667a\u80fd\u4ee3\u7406\uff0c\u5e76\u901a\u8fc7\u81ea\u6211\u6539\u8fdb\u6765\u53d1\u5c55\u5de5\u5177\u548c\u589e\u5f3a\u529f\u80fd\uff0c\u4ee5\u89e3\u51b3\u65e5\u76ca\u590d\u6742\u7684\u4efb\u52a1\u3002\u9274\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5df2\u663e\u793a\u51fa\u4ece\u975e\u53c2\u6570\u589e\u5f3a\u4e2d\u83b7\u76ca\uff0c\u8fd1\u671f\u7684\u7814\u7a76\u5927\u91cf\u96c6\u4e2d\u5728\u5f00\u53d1\u8f6f\u4ef6\uff0c\u4ee5\u8d4b\u4e88LLMs\u5404\u79cd\u80fd\u529b\u3002\u6211\u4eec\u5efa\u8bae\uff0c\u901a\u8fc7\u9002\u5f53\u7684\u63d0\u793a\u5de5\u7a0b\uff0c\u4e00\u4e2aLLM\u4ee3\u7406\u53ef\u4ee5\u7cfb\u7edf\u5730\u751f\u6210\u8f6f\u4ef6\u6765\u589e\u5f3a\u81ea\u8eab\uff0c\u800c\u4e0d\u662f\u4f9d\u8d56\u4eba\u7c7b\u5de5\u7a0b\u7684\u9759\u6001\u8f6f\u4ef6\u5f00\u53d1\u3002 
\u6211\u4eec\u901a\u8fc7\u4e00\u4e9b\u6848\u4f8b\u7814\u7a76\u5c55\u793a\u4e86\u8fd9\u4e00\u70b9\uff1a\u4ec5\u901a\u8fc7\u7ec8\u7aef\u8bbf\u95ee\uff0c\u6211\u4eec\u5f15\u5bfcLLM\u4ee3\u7406\u6dfb\u52a0\u4e86\u68c0\u7d22\u3001\u4e92\u8054\u7f51\u641c\u7d22\u3001\u7f51\u9875\u5bfc\u822a\u548c\u6587\u672c\u7f16\u8f91\u529f\u80fd\u3002\u8be5\u4ee3\u7406\u6709\u6548\u5730\u5229\u7528\u8fd9\u4e9b\u5de5\u5177\u89e3\u51b3\u4e86\u95ee\u9898\uff0c\u4f8b\u5982\u81ea\u52a8\u5316\u8f6f\u4ef6\u5f00\u53d1\u548c\u57fa\u4e8e\u7f51\u7edc\u7684\u4efb\u52a1\u3002\u8fd9\u79cd\u65b9\u6cd5\u8868\u660e\uff0c\u901a\u8fc7\u8fde\u7eed\u63d0\u95ee\u548c\u5de7\u5999\u7684\u63d0\u793a\u8bbe\u8ba1\uff0cLLM\u80fd\u591f\u81ea\u4e3b\u6269\u5c55\u5176\u529f\u80fd\uff0c\u6267\u884c\u5b9e\u9645\u7684\u8ba1\u7b97\u673a\u4efb\u52a1\u3002|\n", "2404.11794": "|**2024-04-25**|**Automated Social Science: Language Models as Scientist and Subjects**|Benjamin S. Manning et.al.|[2404.11794](http://arxiv.org/abs/2404.11794)|null|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b9\u6cd5\uff0c\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u6700\u65b0\u8fdb\u5c55\uff0c\u81ea\u52a8\u6784\u5efa\u548c\u6d4b\u8bd5\u793e\u4f1a\u79d1\u5b66\u5047\u8bbe\u3002\u8fd9\u79cd\u65b9\u6cd5\u7684\u5173\u952e\u5728\u4e8e\u4f7f\u7528\u7ed3\u6784\u56e0\u679c\u6a21\u578b\u3002\u7ed3\u6784\u56e0\u679c\u6a21\u578b\u63d0\u4f9b\u4e86\u4e00\u4e2a\u9648\u8ff0\u5047\u8bbe\u7684\u8bed\u8a00\u3001\u6784\u5efaLLM\u57fa\u7840\u4ee3\u7406\u7684\u84dd\u56fe\u3001\u5b9e\u9a8c\u8bbe\u8ba1\u4ee5\u53ca\u6570\u636e\u5206\u6790\u8ba1\u5212\u3002\u62df\u5408\u540e\u7684\u7ed3\u6784\u56e0\u679c\u6a21\u578b\u53ef\u4f9b\u9884\u6d4b\u6216\u89c4\u5212\u540e\u7eed\u5b9e\u9a8c\u3002\u6211\u4eec\u901a\u8fc7\u51e0\u4e2a\u573a\u666f\u8fdb\u884c\u4e86\u6f14\u793a\uff1a\u8c08\u5224\u3001\u4fdd\u91ca\u542c\u8bc1\u4f1a\u3001\u6c42\u804c\u9762\u8bd5\u548c\u62cd\u5356\u3002\u5728\u8fd9\u4e9b\u60c5\u51b5\u4e0b\uff0c\u7cfb\u7edf\u65e2\u63d0\u51fa\u4e86\
u56e0\u679c\u5173\u7cfb\uff0c\u4e5f\u8fdb\u884c\u4e86\u68c0\u9a8c\uff0c\u53d1\u73b0\u4e86\u4e00\u4e9b\u8bc1\u636e\uff0c\u800c\u6709\u4e9b\u5219\u6ca1\u6709\u3002\u6211\u4eec\u8bc1\u660e\uff0c\u4ece\u8fd9\u4e9b\u793e\u4f1a\u4e92\u52a8\u6a21\u62df\u4e2d\u83b7\u53d6\u7684\u6d1e\u5bdf\u5e76\u975e\u4ec5\u901a\u8fc7\u76f4\u63a5\u8be2\u95eeLLM\u5c31\u80fd\u83b7\u5f97\u3002\u5f53\u7ed9\u5b9a\u6bcf\u4e2a\u573a\u666f\u7684\u5efa\u8bae\u7ed3\u6784\u56e0\u679c\u6a21\u578b\u65f6\uff0cLLM\u5728\u9884\u6d4b\u4f30\u8ba1\u6548\u5e94\u7684\u7b26\u53f7\u65b9\u9762\u8868\u73b0\u826f\u597d\uff0c\u4f46\u65e0\u6cd5\u53ef\u9760\u5730\u9884\u6d4b\u6548\u5e94\u7684\u5927\u5c0f\u3002\u5728\u62cd\u5356\u5b9e\u9a8c\u4e2d\uff0c\u6a21\u62df\u7ed3\u679c\u4e0e\u62cd\u5356\u7406\u8bba\u7684\u9884\u6d4b\u7d27\u5bc6\u543b\u5408\uff0c\u4f46LLM\u76f4\u63a5\u63d0\u53d6\u7684\u6e05\u7b97\u4ef7\u683c\u9884\u6d4b\u4e0d\u51c6\u786e\u3002\u7136\u800c\uff0c\u5982\u679c\u6a21\u578b\u80fd\u57fa\u4e8e\u62df\u5408\u7684\u7ed3\u6784\u56e0\u679c\u6a21\u578b\u8fdb\u884c\u6761\u4ef6\u5316\uff0cLLM\u7684\u9884\u6d4b\u4f1a\u5927\u5e45\u6539\u8fdb\u3002\u7b80\u800c\u8a00\u4e4b\uff0cLLM\u77e5\u9053\u7684\u6bd4\u5b83\u80fd\u7acb\u5373\u8868\u8fbe\u7684\u8981\u591a\u3002|\n", "2404.11483": "|**2024-04-17**|**AgentKit: Flow Engineering with Graphs, not Coding**|Yue Wu 
et.al.|[2404.11483](http://arxiv.org/abs/2404.11483)|**[link](https://github.com/holmeswww/agentkit)**|**\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u76f4\u89c2\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u63d0\u793a\u6846\u67b6\uff08AgentKit\uff09\uff0c\u65e8\u5728\u4e3a\u591a\u529f\u80fd\u4ee3\u7406\u63d0\u4f9b\u7edf\u4e00\u7684\u65b9\u6cd5\u3002AgentKit\u901a\u8fc7\u7b80\u5355\u7684\u81ea\u7136\u8bed\u8a00\u63d0\u793a\u6784\u5efa\u590d\u6742\u7684\u201c\u601d\u7ef4\u8fc7\u7a0b\u201d\u3002\u5176\u57fa\u672c\u5355\u5143\u662f\u8282\u70b9\uff0c\u5305\u542b\u7279\u5b9a\u5b50\u4efb\u52a1\u7684\u81ea\u7136\u8bed\u8a00\u6307\u4ee4\u3002\u7528\u6237\u53ef\u4ee5\u50cf\u62fc\u63a5\u4e50\u9ad8\u79ef\u6728\u4e00\u6837\u8fde\u63a5\u8fd9\u4e9b\u8282\u70b9\uff0c\u4ece\u800c\u660e\u786e\u8bbe\u8ba1\u51fa\u81ea\u7136\u7ed3\u6784\u5316\u7684\u201c\u601d\u8003\u6d41\u7a0b\u201d\u3002\u4f8b\u5982\uff0c\u5728\u64b0\u5199\u8bba\u6587\u65f6\uff0c\u53ef\u80fd\u7684\u6b65\u9aa4\u5305\u62ec\uff1a1\uff09\u786e\u5b9a\u6838\u5fc3\u4fe1\u606f\uff0c2\uff09\u8bc6\u522b\u7814\u7a76\u7a7a\u767d\u7b49\u3002AgentKit\u7684\u6a21\u5757\u5316\u7279\u6027\u4f7f\u5f97\u9ad8\u7ea7\u529f\u80fd\u5982\u5373\u5174\u7684\u5c42\u6b21\u5316\u89c4\u5212\u3001\u53cd\u601d\u548c\u4ece\u4e92\u52a8\u4e2d\u5b66\u4e60\u53d8\u5f97\u53ef\u80fd\u3002\u7531\u4e8e\u5176\u76f4\u89c2\u4e14\u6a21\u62df\u4eba\u7c7b\u601d\u8003\u8fc7\u7a0b\u7684\u8bbe\u8ba1\uff0c\u5373\u4f7f\u6ca1\u6709\u7f16\u7a0b\u7ecf\u9a8c\u7684\u4eba\u4e5f\u80fd\u521b\u5efa\u548c\u8c03\u6574\u57fa\u7840\u4ee3\u7406\u3002\u5b9a\u91cf\u5b9e\u9a8c\u663e\u793a\uff0c\u4f7f\u7528AgentKit\u8bbe\u8ba1\u7684\u4ee3\u7406\u5728WebShop\u548cCrafter\u4efb\u52a1\u4e0a\u5b9e\u73b0\u4e86\u6700\u5148\u8fdb\u7684\u6027\u80fd\u3002\u8fd9\u4e9b\u6210\u679c\u8868\u660eAgentKit\u6709\u6f5c\u529b\u4f7fLLM\u4ee3\u7406\u5728\u66f4\u5e7f\u6cdb\u7684\u573a\u666f\u4e0b\u9ad8\u6548\u4e14\u6613\u4e8e\u4f7f\u7528\u3002\u76f8\u5173\u4ee3\u7801\u5df2\u5f00\u6e90\u5728GitHub\uff1ahttps://github.com/holmeswww/AgentKit\u3002**|\n", "2404.09982": 
"|**2024-04-15**|**Memory Sharing for Large Language Model based Agents**|Hang Gao et.al.|[2404.09982](http://arxiv.org/abs/2404.09982)|**[link](https://github.com/ghupppp/memorysharingllm)**|**\u5728\u4eba\u5de5\u667a\u80fd\u9886\u57df\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u901a\u8fc7\u81ea\u7136\u8bed\u8a00\u63d0\u793a\u6267\u884c\u4efb\u52a1\u7684\u80fd\u529b\u662f\u4e00\u4e2a\u91cd\u5927\u7a81\u7834\uff0c\u5b83\u51cf\u5c11\u4e86\u5bf9\u56fa\u5b9a\u7b54\u6848\u4efb\u52a1\uff08\u5982\u5e38\u8bc6\u95ee\u9898\u548c\u662f\u975e\u67e5\u8be2\uff09\u7684\u91cd\u65b0\u8bad\u7ec3\u6216\u5fae\u8c03\u9700\u6c42\u3002\u7136\u800c\uff0c\u5728\u5904\u7406\u5f00\u653e\u6027\u6311\u6218\u5982\u8bd7\u6b4c\u521b\u4f5c\u65f6\uff0c\u57fa\u4e8e\u4e0a\u4e0b\u6587\u5b66\u4e60\u7684\u65b9\u6cd5\u663e\u793a\u51fa\u5c40\u9650\uff0c\u4e3b\u8981\u6e90\u4e8e\u63d0\u4f9b\u7684\u793a\u4f8b\u5168\u9762\u6027\u4ee5\u53ca\u6a21\u578b\u7406\u89e3\u95ee\u9898\u5185\u5bb9\u7684\u80fd\u529b\u4e0d\u8db3\uff0c\u5bfc\u81f4\u8f93\u51fa\u5f80\u5f80\u4e0e\u9884\u671f\u7ed3\u679c\u5927\u76f8\u5f84\u5ead\u3002\u9488\u5bf9\u8fd9\u4e00\u5dee\u8ddd\uff0c\u6211\u4eec\u7684\u7814\u7a76\u63d0\u51fa\u4e86Memory-Sharing\uff08MS\uff09\u6846\u67b6\uff0c\u8fd9\u662f\u4e00\u79cd\u9488\u5bf9LLM\u591a\u4ee3\u7406\u7684\u5b9e\u65f6\u8bb0\u5fc6\u5b58\u50a8\u548c\u68c0\u7d22\u7cfb\u7edf\uff0c\u65e8\u5728\u589e\u5f3a\u57fa\u4e8e\u4e0a\u4e0b\u6587\u7684\u5b66\u4e60\u8fc7\u7a0b\u3002\u6bcf\u4e2a\u201c\u8bb0\u5fc6\u201d\u5355\u5143\u8bb0\u5f55\u4e86\u63d0\u51fa\u7684\u67e5\u8be2\u53ca\u5176\u6765\u81eaLLM\u4ee3\u7406\u7684\u5373\u65f6\u54cd\u5e94\uff0c\u4ece\u591a\u4e2a\u7c7b\u4f3c\u4ee3\u7406\u4e2d\u805a\u5408\u8fd9\u4e9b\u8bb0\u5fc6\uff0c\u5f62\u6210\u6240\u6709\u4ee3\u7406\u5171\u4eab\u7684\u4e30\u5bcc\u8bb0\u5fc6\u6c60\u3002MS\u6846\u67b6\u4e0d\u4ec5\u5e2e\u52a9\u4ee3\u7406\u627e\u5230\u7279\u5b9a\u4efb\u52a1\u7684\u76f8\u5173\u793a\u4f8b\uff0c\u8fd8
\u8bc4\u4f30\u5176\u8bb0\u5fc6\u7684\u6f5c\u5728\u5229\u7528\u4ef7\u503c\uff0c\u4f9b\u5176\u4ed6\u4ee3\u7406\u672a\u6765\u5e94\u7528\u3002\u5728\u4e09\u4e2a\u4e0d\u540c\u9886\u57df\u7684\u5b9e\u8bc1\u9a8c\u8bc1\u663e\u793a\uff0cMS\u6846\u67b6\u663e\u8457\u63d0\u9ad8\u4e86\u4ee3\u7406\u5904\u7406\u5f00\u653e\u6027\u95ee\u9898\u7684\u8868\u73b0\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u8ba8\u8bba\u4e86\u54ea\u79cd\u8bb0\u5fc6\u6c60\u548c\u68c0\u7d22\u7b56\u7565\u80fd\u66f4\u597d\u5730\u652f\u6301\u4ee3\u7406\uff0c\u4e3aMS\u7684\u672a\u6765\u53d1\u5c55\u63d0\u4f9b\u4e86\u65b9\u5411\u3002\u4ee3\u7801\u548c\u6570\u636e\u53ef\u5728\uff1ahttps://github.com/GHupppp/MemorySharingLLM \u83b7\u53d6\u3002**|\n", "2404.09127": "|**2024-05-10**|**Confidence Calibration and Rationalization for LLMs via Multi-Agent Deliberation**|Ruixin Yang et.al.|[2404.09127](http://arxiv.org/abs/2404.09127)|**[link](https://github.com/minnesotanlp/collaborative-calibration)**|**\u5f53\u524d\u7684\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u4e0d\u786e\u5b9a\u6027\u4f30\u8ba1\u65b9\u9762\u9762\u4e34\u6311\u6218\uff0c\u5b83\u4eec\u901a\u5e38\u6821\u51c6\u4e0d\u826f\u4e14\u8fc7\u5ea6\u81ea\u4fe1\uff0c\u7279\u522b\u662f\u5728\u57fa\u4e8e\u4eba\u7c7b\u53cd\u9988\u7684\u5f3a\u5316\u5b66\u4e60\uff08RLHF\uff09\u4e2d\u3002\u4eba\u7c7b\u7684\u51b3\u7b56\u548c\u4fe1\u5fc3\u4e0d\u4ec5\u6e90\u4e8e\u5185\u5728\u4fe1\u5ff5\uff0c\u8fd8\u80fd\u901a\u8fc7\u65e5\u5e38\u89c2\u5bdf\u8fdb\u884c\u8c03\u6574\uff0c\u800c\u73b0\u6709LLM\u7684\u6821\u51c6\u65b9\u6cd5\u4e3b\u8981\u5173\u6ce8\u5355\u4e2a\u6a21\u578b\u7684\u4fe1\u5fc3\u4f30\u8ba1\uff0c\u672a\u80fd\u5145\u5206\u5229\u7528\u201c\u96c6\u4f53\u667a\u6167\u201d\uff1a\u591a\u4e2aLLM\u4e4b\u95f4\u7684\u534f\u4f5c\u8868\u8fbe\u80fd\u529b\uff0c\u8fd9\u53ef\u4ee5\u96c6\u4f53\u63d0\u9ad8\u51c6\u786e\u6027\u548c\u6821\u51c6\u3002\u672c\u7814\u7a76\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65e0\u8bad\u7ec3\u540e\u5904\u7406
\u7684\u6821\u51c6\u7b56\u7565\u2014\u2014\u534f\u4f5c\u6821\u51c6\uff08Collaborative Calibration\uff09\uff0c\u5b83\u5229\u7528\u591a\u4ee3\u7406\u5de5\u5177\u589e\u5f3a\u7684LLMs\u5728\u6a21\u62df\u7684\u7fa4\u4f53\u8ba8\u8bba\u8fc7\u7a0b\u4e2d\uff0c\u5171\u540c\u63d0\u5347\u6821\u51c6\u80fd\u529b\u548c\u63a8\u7406\u5408\u7406\u6027\u3002\u6211\u4eec\u5728\u751f\u6210\u5f0f\u95ee\u7b54\u4efb\u52a1\u4e0a\u5c55\u793a\u4e86\u534f\u4f5c\u6821\u51c6\u7684\u6709\u6548\u6027\uff0c\u8986\u76d6\u4e86\u591a\u4e2a\u9886\u57df\uff0c\u8bc1\u660e\u4e86\u5b83\u5728\u6574\u5408\u96c6\u4f53\u6821\u51c6\u540e\u7684\u4fe1\u5fc3\u8bc4\u4f30\u548c\u63d0\u5347\u6a21\u578b\u9884\u6d4b\u53ef\u9760\u6027\u65b9\u9762\u7684\u6f5c\u529b\u3002**|\n", "2404.09077": "|**2024-04-13**|**CuriousLLM: Elevating Multi-Document QA with Reasoning-Infused Knowledge Graph Prompting**|Zukang Yang et.al.|[2404.09077](http://arxiv.org/abs/2404.09077)|**[link](https://github.com/zukangy/kgp-curiousllm)**|**\u5728\u95ee\u7b54\uff08QA\uff09\u9886\u57df\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e0e\u5916\u90e8\u6570\u636e\u5e93\u7684\u878d\u5408\u53d6\u5f97\u4e86\u663e\u8457\u6210\u6548\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u5728\u5904\u7406\u590d\u6742\u63a8\u7406\u4efb\u52a1\u65f6\u5f80\u5f80\u529b\u6709\u4e0d\u902e\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5bf9\u4e00\u79cd\u540d\u4e3a\u77e5\u8bc6\u56fe\u8c31\u63d0\u793a\uff08KGP\uff09\u7684\u521b\u65b0\u65b9\u6cd5\u8fdb\u884c\u4e86\u4f18\u5316\uff0c\u8be5\u65b9\u6cd5\u7ed3\u5408\u77e5\u8bc6\u56fe\u8c31\u548c\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u4ee5\u63d0\u5347\u63a8\u7406\u548c\u641c\u7d22\u7cbe\u5ea6\u3002\u7136\u800c\uff0c\u539f\u59cb\u7684KGP\u6846\u67b6\u9700\u8981\u6602\u8d35\u7684\u5927\u89c4\u6a21\u6570\u636e\u5fae\u8c03\uff0c\u5e76\u4e14\u4ecd\u5b58\u5728LLM\u7684\u9519\u8bef\u63a8\u65ad\u95ee\u9898\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u878d\u5165\u63a8\u7406\u80fd\u529b\u7684LLM\u4ee3
\u7406\uff0c\u5b83\u6a21\u4eff\u4eba\u7c7b\u7684\u597d\u5947\u5fc3\uff0c\u901a\u8fc7\u63d0\u95ee\u6765\u66f4\u6709\u6548\u5730\u5bfc\u822a\u641c\u7d22\u8fc7\u7a0b\u3002\u8fd9\u4e2a\u7b80\u5355\u7684\u6539\u8fdb\u663e\u8457\u63d0\u9ad8\u4e86LLM\u5728QA\u4efb\u52a1\u4e2d\u7684\u6027\u80fd\uff0c\u540c\u65f6\u907f\u514d\u4e86\u521d\u59cbKGP\u6846\u67b6\u7684\u9ad8\u6210\u672c\u548c\u5ef6\u8fdf\u3002\u6211\u4eec\u7684\u76ee\u6807\u662f\u8fdb\u4e00\u6b65\u53d1\u5c55\u8fd9\u79cd\u65b9\u6cd5\uff0c\u6700\u7ec8\u5b9e\u73b0\u66f4\u7cbe\u786e\u3001\u66f4\u5feb\u6377\u4e14\u6210\u672c\u6548\u76ca\u66f4\u9ad8\u7684QA\u89e3\u51b3\u65b9\u6848\u3002**|\n", "2404.09043": "|**2024-04-13**|**Do LLMs Play Dice? Exploring Probability Distribution Sampling in Large Language Models for Behavioral Simulation**|Jia Gu et.al.|[2404.09043](http://arxiv.org/abs/2404.09043)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u98de\u901f\u53d1\u5c55\u53ca\u5176\u5728\u5904\u7406\u590d\u6742\u8bed\u8a00\u4efb\u52a1\u4e2d\u7684\u51fa\u8272\u8868\u73b0\uff0c\u8d8a\u6765\u8d8a\u591a\u7684\u7814\u7a76\u5c1d\u8bd5\u5229\u7528LLMs\u6a21\u62df\u4eba\u7c7b\u7684\u884c\u4e3a\u51b3\u7b56\u8fc7\u7a0b\uff0c\u901a\u5e38\u8fd9\u4e9b\u8fc7\u7a0b\u88ab\u8868\u793a\u4e3a\u9a6c\u5c14\u53ef\u592b\u51b3\u7b56\u8fc7\u7a0b\uff08MDPs\uff09\u3002\u5728\u8fd9\u4e2a\u6846\u67b6\u4e2d\uff0c\u52a8\u4f5c\u9075\u5faa\u7279\u5b9a\u7684\u6982\u7387\u5206\u5e03\uff0c\u5e76\u9700\u8981\u8fed\u4ee3\u91c7\u6837\u3002\u8fd9\u4fc3\u4f7f\u6211\u4eec\u63a2\u7a76LLM\u4ee3\u7406\u7406\u89e3\u6982\u7387\u5206\u5e03\u7684\u80fd\u529b\uff0c\u4ee5\u901a\u8fc7\u6982\u7387\u91c7\u6837\u6307\u5bfc\u884c\u4e3a\u51b3\u7b56\u5e76\u751f\u6210\u884c\u4e3a\u5e8f\u5217\u3002\u6211\u4eec\u5c06\u95ee\u9898\u5206\u4e3a\u4e24\u4e2a\u4e3b\u8981\u65b9\u9762\uff1a\u4e00\u662f\u5df2\u77e5\u7cbe\u786e\u6982\u7387\u5206\u5e03\u7684\u6a21\u62df\uff0c\u4e8c\u662f\u6a21\u7cca\u6982\u7387\u5206\u5e03\u7684\u5e8f\u5217\u751f\u6210\u3002 
\u5728\u5df2\u77e5\u6982\u7387\u5206\u5e03\u7684\u60c5\u51b5\u4e0b\uff0c\u4ee3\u7406\u9700\u8981\u6839\u636e\u95ee\u9898\u63cf\u8ff0\u63d0\u4f9b\u6982\u7387\u5206\u5e03\u7684\u7c7b\u578b\u548c\u53c2\u6570\uff0c\u7136\u540e\u7ed9\u51fa\u91c7\u6837\u5e8f\u5217\u3002\u7136\u800c\uff0c\u6211\u4eec\u7684\u7814\u7a76\u663e\u793a\uff0cLLM\u4ee3\u7406\u5728\u8fd9\u65b9\u9762\u7684\u6027\u80fd\u4e0d\u4f73\uff0c\u4f46\u901a\u8fc7\u7f16\u7a0b\u5de5\u5177\u53ef\u4ee5\u4e00\u5b9a\u7a0b\u5ea6\u4e0a\u63d0\u9ad8\u91c7\u6837\u6210\u529f\u7387\u3002\u800c\u5728\u5b9e\u9645\u60c5\u5883\u4e2d\uff0c\u6982\u7387\u5206\u5e03\u5f80\u5f80\u4e0d\u660e\u786e\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u5728\u7b2c\u4e8c\u90e8\u5206\u8ba9\u4ee3\u7406\u8c03\u6574\u5728\u7ebf\u793e\u4ea4\u7f51\u7edc\u4e2d\u7684\u6d3b\u8dc3\u5ea6\uff0c\u5e76\u5206\u6790\u884c\u52a8\u9891\u7387\u3002\u7ed3\u679c\u8868\u660e\uff0c\u5373\u4f7f\u501f\u52a9\u7f16\u7a0b\u5de5\u5177\uff0cLLM\u4ee3\u7406\u4f9d\u7136\u65e0\u6cd5\u6709\u6548\u5730\u91c7\u6837\u6982\u7387\u5206\u5e03\u3002\u8fd9\u610f\u5473\u7740\u5728\u76f4\u63a5\u5c06LLM\u4f5c\u4e3a\u6a21\u62df\u4eba\u7c7b\u884c\u4e3a\u7684\u4ee3\u7406\u5e94\u7528\u4e4b\u524d\uff0c\u8fd8\u9700\u8981\u8c28\u614e\u5bf9\u5f85\u3002|\n", "2404.08492": "|**2024-04-12**|**Strategic Interactions between Large Language Models-based Agents in Beauty Contests**|Siting Lu et.al.|[2404.08492](http://arxiv.org/abs/2404.08492)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5e7f\u6cdb\u5e94\u7528\uff0c\u5b83\u4eec\u5728\u535a\u5f08\u8bba\u6846\u67b6\u4e0b\u7684\u6e38\u620f\u884c\u4e3a\u7406\u89e3\u6f5c\u529b\u65e5\u76ca\u663e\u73b0\u3002\u672c\u7814\u7a76\u805a\u7126\u4e8e\u901a\u8fc7\u6a21\u62df\u5206\u6790\u4e0d\u540c\u7c7b\u578bLLM\u9a71\u52a8\u7684\u4ee3\u7406\u5728\u7ecf\u5178 Beauty Contest 
\u6e38\u620f\u4e2d\u7684\u7b56\u7565\u4e92\u52a8\u3002\u501f\u9274\u4eba\u7c7b\u5b9e\u9a8c\uff0c\u6211\u4eec\u5bf9LLM\u4ee3\u7406\u7684\u7b56\u7565\u5c42\u6b21\u8fdb\u884c\u7c7b\u4f3c\u7684\u8bc4\u4f30\uff0c\u53d1\u73b0\u5b83\u4eec\u5c55\u73b0\u51fa\u4ece\u96f6\u7ea7\u5230\u4e00\u7ea7\u7684\u4e0d\u540c\u7a0b\u5ea6\u63a8\u7406\u80fd\u529b\uff0c\u5e76\u5728\u91cd\u590d\u6e38\u620f\u4e2d\u8868\u73b0\u51fa\u884c\u52a8\u8d8b\u540c\u3002\u6b64\u5916\uff0c\u6211\u8fd8\u63a2\u8ba8\u4e86\u4e0d\u540c\u7c7b\u578b\u7684\u4ee3\u7406\u7fa4\u4f53\u6784\u6210\u5982\u4f55\u5f71\u54cd\u6218\u7565\u884c\u4e3a\uff1a\u9ad8\u6bd4\u4f8b\u7684\u56fa\u5b9a\u7b56\u7565\u5bf9\u624b\u80fd\u4fc3\u8fdbLLM\u4ee3\u7406\u7684\u6536\u655b\uff0c\u800c\u6df7\u5408\u73af\u5883\u4e2d\u4e0d\u540c\u76f8\u5bf9\u7b56\u7565\u6c34\u5e73\u7684\u4ee3\u7406\u5171\u5b58\u4f1a\u52a0\u901f\u6240\u6709\u4ee3\u7406\u7684\u6536\u655b\u3002\u66f4\u667a\u80fd\u7684\u4ee3\u7406\u53ef\u80fd\u83b7\u5f97\u66f4\u9ad8\u7684\u5e73\u5747\u6536\u76ca\uff0c\u4f46\u8fd9\u662f\u4ee5\u8f83\u4f4e\u667a\u80fd\u4ee3\u7406\u7684\u727a\u7272\u4e3a\u4ee3\u4ef7\u7684\u3002\u8fd9\u4e9b\u7ed3\u679c\u4e0d\u4ec5\u63ed\u793a\u4e86\u5728\u7279\u5b9a\u60c5\u666f\u4e0b\u6a21\u62df\u4ee3\u7406\u7684\u7ed3\u5c40\uff0c\u8fd8\u4e3a\u7406\u89e3\u7b97\u6cd5\u4e4b\u95f4\u7684\u6218\u7565\u4e92\u52a8\u63d0\u4f9b\u4e86\u91cd\u8981\u542f\u793a\u3002|\n", "2404.08144": "|**2024-04-17**|**LLM Agents can Autonomously Exploit One-day Vulnerabilities**|Richard Fang 
et.al.|[2404.08144](http://arxiv.org/abs/2404.08144)|null|\u968f\u7740\u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5a01\u529b\u65e5\u76ca\u589e\u5f3a\uff0c\u5176\u5728\u826f\u6027\u548c\u6076\u610f\u7528\u9014\u4e0a\u7684\u5e94\u7528\u4e5f\u65e5\u76ca\u5e7f\u6cdb\u3002\u7814\u7a76\u4eba\u5458\u5f00\u59cb\u5173\u6ce8\u5b83\u4eec\u5229\u7528\u7f51\u7edc\u5b89\u5168\u6f0f\u6d1e\u7684\u80fd\u529b\u3002\u8fd1\u671f\u7684\u7814\u7a76\u63a2\u8ba8\u4e86LLMs\u81ea\u4e3b\u7834\u89e3\u7f51\u7ad9\u7684\u53ef\u80fd\u6027\uff0c\u4f46\u8fd9\u4e9b\u7814\u7a76\u4e3b\u8981\u96c6\u4e2d\u5728\u7b80\u5355\u7684\u6f0f\u6d1e\u4e0a\u3002\u672c\u5de5\u4f5c\u63ed\u793a\uff0cLLMs\u80fd\u591f\u81ea\u4e3b\u5229\u7528\u73b0\u5b9e\u4e16\u754c\u7cfb\u7edf\u4e2d\u7684\u5355\u65e5\u6f0f\u6d1e\u3002\u6211\u4eec\u6536\u96c6\u4e86\u4e00\u7ec4\u5305\u542b15\u4e2a\u88abCVE\u63cf\u8ff0\u4e3a\u201c\u5173\u952e\u4e25\u91cd\u6027\u201d\u7684\u4e00\u5929\u671f\u6f0f\u6d1e\u6570\u636e\u3002\u5f53\u63d0\u4f9bCVE\u63cf\u8ff0\u65f6\uff0cGPT-4\u6a21\u578b\u80fd\u6210\u529f\u5229\u752887%\u7684\u6f0f\u6d1e\uff0c\u76f8\u6bd4\u4e4b\u4e0b\uff0c\u5176\u4ed6\u6d4b\u8bd5\u6a21\u578b\uff08\u5982GPT-3.5\u3001\u5f00\u6e90LLMs\u548c\u5f00\u6e90\u6f0f\u6d1e\u626b\u63cf\u5668ZAP\u548cMetasploit\uff09\u7684\u8868\u73b0\u5747\u4e3a0%\u3002\u7136\u800c\uff0c\u6211\u4eec\u7684GPT-4\u6a21\u578b\u5728\u6ca1\u6709\u63cf\u8ff0\u7684\u60c5\u51b5\u4e0b\u6548\u7387\u5927\u51cf\uff0c\u4ec5\u80fd\u5229\u75287%\u7684\u6f0f\u6d1e\u3002\u8fd9\u4e9b\u53d1\u73b0\u5bf9\u5927\u89c4\u6a21\u90e8\u7f72\u9ad8\u80fd\u529bLLMs\u63d0\u51fa\u4e86\u8d28\u7591\u3002|\n", "2404.17586": "|**2024-04-11**|**The Future of Scientific Publishing: Automated Article Generation**|Jeremy R. 
Harper et.al.|[2404.17586](http://arxiv.org/abs/2404.17586)|null|\u8fd9\u9879\u7814\u7a76\u4ecb\u7ecd\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u8f6f\u4ef6\u5de5\u5177\uff0c\u5b83\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u63d0\u793a\uff0c\u5b9e\u73b0\u4e86\u4ecePython\u4ee3\u7801\u81ea\u52a8\u751f\u6210\u5b66\u672f\u6587\u7ae0\uff0c\u8fd9\u5bf9\u4e8e\u751f\u7269\u533b\u5b66\u4fe1\u606f\u5b66\u548c\u8ba1\u7b97\u673a\u79d1\u5b66\u9886\u57df\u5177\u6709\u91cd\u8981\u610f\u4e49\u3002\u9009\u62e9Python\u4f5c\u4e3a\u57fa\u7840\u793a\u4f8b\uff0c\u56e0\u5176\u5e7f\u6cdb\u4f7f\u7528\u548c\u5f3a\u5927\u7684\u6570\u636e\u5206\u6790\u80fd\u529b\u3002\u8be5\u65b9\u6cd5\u548c\u6846\u67b6\u7684\u7075\u6d3b\u6027\u4f7f\u5f97\u5176\u9002\u7528\u4e8e\u591a\u79cdGitHub\u4ed3\u5e93\uff0c\u8868\u660e\u4e86\u5de5\u5177\u7684\u5e7f\u6cdb\u5e94\u7528\u6f5c\u529b\uff08Harper\uff0c2024\u5e74\uff09\u3002\u901a\u8fc7\u7b80\u5316\u4f20\u7edf\u4e0a\u8017\u65f6\u7684\u5b66\u672f\u5199\u4f5c\u8fc7\u7a0b\uff0c\u7279\u522b\u662f\u5728\u6574\u5408\u590d\u6742\u6570\u636e\u96c6\u548c\u4ee3\u7801\u8f93\u51fa\u65b9\u9762\uff0c\u8fd9\u4e00\u7a81\u7834\u6027\u8fdb\u5c55\u63a8\u52a8\u4e86\u79d1\u7814\u6210\u679c\u7684\u5feb\u901f\u4f20\u64ad\u3002\u5f00\u53d1\u8fc7\u7a0b\u4e2d\u5e76\u672a\u4f9d\u8d56\u9ad8\u7ea7\u8bed\u8a00\u6a21\u578b\uff0c\u786e\u4fdd\u4e86\u81ea\u52a8\u5316\u751f\u6210\u5185\u5bb9\u7684\u8fde\u8d2f\u6027\u548c\u5b8c\u6574\u6027\u3002\u6b64\u6b21\u63a2\u7d22\u4e0d\u4ec5\u9a8c\u8bc1\u4e86\u8f6f\u4ef6\u7684\u6210\u529f\u5e94\u7528\u548c\u6548\u7387\uff0c\u8fd8\u9884\u793a\u4e86\u672a\u6765\u53ef\u80fd\u96c6\u6210\u66f4\u5148\u8fdb\u7684LLM\uff0c\u5c06\u8fdb\u4e00\u6b65\u589e\u5f3a\u5176\u529f\u80fd\uff0c\u5f15\u9886\u4e00\u4e2a\u79d1\u7814\u53d1\u73b0\u53d1\u5e03\u66f4\u52a0\u8fc5\u901f\u548c\u6613\u83b7\u53d6\u7684\u65f6\u4ee3\u3002|\n", "2404.07456": "|**2024-04-11**|**WESE: Weak Exploration to Strong Exploitation for LLM Agents**|Xu Huang 
et.al.|[2404.07456](http://arxiv.org/abs/2404.07456)|null|\u8fd1\u671f\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u663e\u793a\u51fa\u4f5c\u4e3a\u667a\u80fd\u4ee3\u7406\u7684\u5f3a\u5927\u6f5c\u529b\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u7814\u7a76\u4e3b\u8981\u96c6\u4e2d\u5728\u901a\u8fc7\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u63d0\u793a\u5de5\u7a0b\u6216\u4efb\u52a1\u7279\u5b9a\u7684\u5fae\u8c03\u6765\u63d0\u5347\u6a21\u578b\u7684\u63a8\u7406\u6216\u51b3\u7b56\u80fd\u529b\uff0c\u5ffd\u89c6\u4e86\u63a2\u7d22\u4e0e\u5229\u7528\u7684\u8fc7\u7a0b\u3002\u5728\u5904\u7406\u5f00\u653e\u4e16\u754c\u4ea4\u4e92\u73af\u5883\u4e2d\u7684\u590d\u6742\u4efb\u52a1\u65f6\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u5b58\u5728\u5c40\u9650\u6027\u3002\u9996\u5148\uff0c\u7531\u4e8e\u7f3a\u4e4f\u5bf9\u73af\u5883\u7684\u5168\u5c40\u4fe1\u606f\uff0c\u6a21\u578b\u503e\u5411\u4e8e\u505a\u51fa\u8d2a\u5a6a\u51b3\u7b56\uff0c\u5bfc\u81f4\u89e3\u51b3\u65b9\u6848\u4e0d\u7406\u60f3\u3002\u53e6\u4e00\u65b9\u9762\uff0c\u4ece\u73af\u5883\u4e2d\u83b7\u53d6\u7684\u65e0\u5173\u4fe1\u606f\u4e0d\u4ec5\u5f15\u5165\u566a\u58f0\uff0c\u8fd8\u589e\u52a0\u4e86\u989d\u5916\u7684\u6210\u672c\u3002 \u4e3a\u6b64\uff0c\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\u2014\u2014\u5f31\u63a2\u7d22\u5f3a\u5316\u5f3a\u5229\u7528\uff08Weak Exploration to Strong 
Exploitation\uff0cWESE\uff09\uff0c\u65e8\u5728\u589e\u5f3aLLM\u5728\u89e3\u51b3\u5f00\u653e\u4e16\u754c\u4ea4\u4e92\u4efb\u52a1\u4e2d\u7684\u8868\u73b0\u3002\u5177\u4f53\u6765\u8bf4\uff0cWESE\u5c06\u63a2\u7d22\u548c\u5229\u7528\u8fc7\u7a0b\u89e3\u8026\uff0c\u4f7f\u7528\u6210\u672c\u6548\u76ca\u9ad8\u7684\u201c\u5f31\u201d\u4ee3\u7406\u6267\u884c\u63a2\u7d22\u4efb\u52a1\uff0c\u4ee5\u83b7\u53d6\u5168\u5c40\u77e5\u8bc6\u3002\u968f\u540e\uff0c\u6211\u4eec\u5f15\u5165\u57fa\u4e8e\u77e5\u8bc6\u56fe\u8c31\u7684\u7b56\u7565\u6765\u5b58\u50a8\u8fd9\u4e9b\u77e5\u8bc6\uff0c\u5e76\u63d0\u53d6\u4e0e\u4efb\u52a1\u76f8\u5173\u7684\u5173\u952e\u4fe1\u606f\uff0c\u4ece\u800c\u63d0\u5347\u201c\u5f3a\u201d\u4ee3\u7406\u5728\u6210\u529f\u7387\u548c\u6548\u7387\u4e0a\u7684\u6027\u80fd\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u9002\u7528\u4e8e\u5404\u79cd\u4efb\u52a1\uff0c\u5e76\u5728\u56db\u4e2a\u4e92\u52a8\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u663e\u8457\u63d0\u9ad8\u4e86\u6210\u529f\u7387\u548c\u6548\u7387\u3002|\n", "2404.06921": "|**2024-04-10**|**GoEX: Perspectives and Designs Towards a Runtime for Autonomous LLM Applications**|Shishir G. 
Patil et.al.|[2404.06921](http://arxiv.org/abs/2404.06921)|**[link](https://github.com/ShishirPatil/gorilla)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u53d1\u5c55\uff0c\u5b83\u4eec\u4e0d\u518d\u4ec5\u4ec5\u662f\u5bf9\u8bdd\u7cfb\u7edf\u4e2d\u7684\u4fe1\u606f\u63d0\u4f9b\u8005\uff0c\u800c\u662f\u5f00\u59cb\u79ef\u6781\u53c2\u4e0e\u5230\u4e0e\u5b9e\u9645\u5e94\u7528\u548c\u670d\u52a1\u7684\u4e92\u52a8\u4e2d\u3002\u5982\u4eca\uff0c\u4eba\u7c7b\u5728\u5c06LLM\u751f\u6210\u7684\u8f93\u51fa\uff08\u5982\u4ee3\u7801\u3001\u51fd\u6570\u6216\u64cd\u4f5c\uff09\u6295\u5165\u73b0\u5b9e\u4e16\u754c\u6267\u884c\u524d\uff0c\u9700\u8981\u9a8c\u8bc1\u5176\u6b63\u786e\u6027\u548c\u9002\u7528\u6027\uff0c\u8fd9\u5e26\u6765\u4e86\u6311\u6218\uff0c\u56e0\u4e3a\u4ee3\u7801\u7406\u89e3\u88ab\u5e7f\u6cdb\u8ba4\u4e3a\u975e\u5e38\u56f0\u96be\u3002\u672c\u6587\u7814\u7a76\u4e86\u4eba\u7c7b\u5982\u4f55\u80fd\u6709\u6548\u4e0eLLMs\u534f\u4f5c\u3001\u59d4\u6d3e\u548c\u76d1\u7763\uff0c\u7279\u522b\u662f\u5728\u672a\u6765\u3002\u6211\u4eec\u4e3b\u5f20\uff0c\u5728\u8bb8\u591a\u60c5\u51b5\u4e0b\uff0c\u5bf9\u63d0\u51fa\u7684\u884c\u52a8\u8fdb\u884c\u201c\u4e8b\u540e\u9a8c\u8bc1\u201d\uff08\u5728\u770b\u5230\u8f93\u51fa\u540e\u786e\u8ba4\u5176\u6b63\u786e\u6027\uff09\u6bd4\u4e4b\u524d\u7684\u201c\u4e8b\u524d\u9a8c\u8bc1\u201d\u66f4\u4e3a\u5bb9\u6613\u3002\u5b9e\u73b0\u8fd9\u4e00\u76ee\u6807\u7684\u6838\u5fc3\u7406\u5ff5\u662f\u96c6\u6210\u76f4\u89c2\u7684\u64a4\u9500\u529f\u80fd\uff0c\u5e76\u4e3aLLM\u751f\u6210\u7684\u52a8\u4f5c\u8bbe\u5b9a\u635f\u5bb3\u7ea6\u675f\uff0c\u4f5c\u4e3a\u964d\u4f4e\u76f8\u5173\u98ce\u9669\u7684\u6709\u6548\u7b56\u7565\u3002\u901a\u8fc7\u8fd9\u79cd\u65b9\u5f0f\uff0c\u4eba\u7c7b\u53ef\u4ee5\u64a4\u9500LLM\u8f93\u51fa\u7684\u5f71\u54cd\uff0c\u6216\u8005\u786e\u4fe1\u6f5c\u5728\u98ce\u9669\u662f\u6709\u9650\u7684\u3002\u6211\u4eec\u8ba4\u4e3a\u8fd9\u5bf9\u4e8e\u5b9e\u73b0LLMs\u4e0e\u5e94\u7528\u548c\u670d\u52a1\u5728\u6709\u9650\u7684\u4eba\u7c7b
\u76d1\u7763\u4e0b\u4ea4\u4e92\u81f3\u5173\u91cd\u8981\u3002\u6211\u4eec\u63cf\u8ff0\u4e86\u5f00\u6e90\u8fd0\u884c\u65f6Gorilla Execution Engine\uff08GoEX\uff09\u7684\u8bbe\u8ba1\u548c\u5b9e\u73b0\uff0c\u8be5\u8fd0\u884c\u65f6\u7528\u4e8e\u6267\u884cLLM\u52a8\u4f5c\uff0c\u5e76\u63d0\u51fa\u4e86\u4e00\u4e9b\u5f00\u653e\u7684\u7814\u7a76\u95ee\u9898\uff0c\u65e8\u5728\u63a8\u52a8LLMs\u4e0e\u5e94\u7528\u4e4b\u95f4\u4ee5\u6700\u5c0f\u7684\u4eba\u5de5\u5e72\u9884\u8fdb\u884c\u4ea4\u4e92\u3002GoEX\u7684\u6e90\u4ee3\u7801\u5df2\u53d1\u5e03\u5728https://github.com/ShishirPatil/gorilla/\u3002**|\n", "2404.06411": "|**2024-04-09**|**AgentQuest: A Modular Benchmark Framework to Measure Progress and Improve LLM Agents**|Luca Gioacchini et.al.|[2404.06411](http://arxiv.org/abs/2404.06411)|**[link](https://github.com/nec-research/agentquest)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u8fdb\u5c55\uff0c\u4eba\u4eec\u8ffd\u6c42\u80fd\u591f\u89e3\u51b3\u590d\u6742\u3001\u591a\u6b65\u9aa4\u63a8\u7406\u4efb\u52a1\u7684LLM\u4ee3\u7406\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u57fa\u51c6\u5f80\u5f80\u5c40\u9650\u4e14\u53ea\u5173\u6ce8\u6574\u4f53\u4efb\u52a1\u6210\u529f\u7387\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86AgentQuest\u6846\u67b6\uff0c\u5b83\u5177\u6709\u4ee5\u4e0b\u7279\u70b9\uff1a\uff08i\uff09benchmark\u548c\u8bc4\u4f30\u6307\u6807\u6a21\u5757\u5316\u4e14\u6613\u4e8e\u6269\u5c55\uff0c\u901a\u8fc7\u6587\u6863\u9f50\u5168\u3001\u6613\u7528\u7684API\uff1b\uff08ii\uff09\u6211\u4eec\u63d0\u4f9b\u4e86\u4e24\u79cd\u65b0\u7684\u8bc4\u4f30\u6307\u6807\uff0c\u80fd\u591f\u5728\u89e3\u51b3\u4efb\u52a1\u65f6\u53ef\u9760\u5730\u8ffd\u8e2aLLM\u4ee3\u7406\u7684\u8fdb\u6b65\u3002\u6211\u4eec\u901a\u8fc7\u4e24\u4e2a\u793a\u4f8b\u5c55\u793a\u4e86\u8fd9\u4e9b\u6307\u6807\u7684\u5b9e\u7528\u6027\uff0c\u901a\u8fc7\u8bc6\u522b\u5e38\u89c1\u5931\u8d25\u70b9\u5e76\u4f18\u5316\u4ee3\u7406\u67b6\u6784\uff0c\u663e\u8457\u63d0
\u9ad8\u4e86\u6027\u80fd\u3002\u6211\u4eec\u5e0c\u671b\u4e0e\u7814\u7a76\u754c\u5171\u540c\u6269\u5c55AgentQuest\uff0c\u5e76\u5df2\u5c06\u5176\u5f00\u6e90\u5728https://github.com/nec-research/agentquest\u3002**|\n", "2404.05427": "|**2024-04-15**|**AutoCodeRover: Autonomous Program Improvement**|Yuntong Zhang et.al.|[2404.05427](http://arxiv.org/abs/2404.05427)|**[link](https://github.com/nus-apr/auto-code-rover)**|**\u5728\u8fc7\u53bb\u51e0\u5341\u5e74\u91cc\uff0c\u7814\u7a76\u4eba\u5458\u5728\u81ea\u52a8\u5316\u8f6f\u4ef6\u5f00\u53d1\u8fc7\u7a0b\u4e2d\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u5c55\uff0c\u5c24\u5176\u662f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5e94\u7528\u6781\u5927\u5730\u63a8\u52a8\u4e86\u7f16\u7a0b\u8f85\u52a9\u7684\u81ea\u52a8\u5316\u3002\u7136\u800c\uff0c\u8f6f\u4ef6\u5de5\u7a0b\u5e76\u4e0d\u4ec5\u4ec5\u662f\u7f16\u7801\uff0c\u8fd8\u5305\u62ec\u7ef4\u62a4\uff08\u5982\u4fee\u590dbug\uff09\u548c\u6f14\u5316\uff08\u5982\u6dfb\u52a0\u529f\u80fd\uff09\u7b49\u7a0b\u5e8f\u6539\u8fdb\u8fc7\u7a0b\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u81ea\u52a8\u89e3\u51b3GitHub\u95ee\u9898\u7684\u65b9\u6cd5\uff0c\u65e8\u5728\u5b9e\u73b0\u7a0b\u5e8f\u81ea\u4e3b\u6539\u8fdb\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u79f0\u4e3aAutoCodeRover\uff0c\u5b83\u7ed3\u5408\u4e86LLMs\u4e0e\u9ad8\u7ea7\u4ee3\u7801\u641c\u7d22\u80fd\u529b\uff0c\u6700\u7ec8\u751f\u6210\u7a0b\u5e8f\u4fee\u6539\u6216\u8865\u4e01\u3002\u4e0eAI\u7814\u7a76\u8005\u548c\u4ece\u4e1a\u8005\u8fd1\u671f\u5173\u6ce8\u7684\u4ec5\u6587\u4ef6\u7ea7\u522b\u7684\u8f6f\u4ef6\u9879\u76ee\u4e0d\u540c\uff0c\u6211\u4eec\u7684\u5de5\u4f5c\u4fa7\u91cd\u4e8e\u7a0b\u5e8f\u8868\u793a\uff08\u62bd\u8c61\u8bed\u6cd5\u6811\uff09\uff0c\u5229\u7528\u7c7b/\u65b9\u6cd5\u7684\u7a0b\u5e8f\u7ed3\u6784\u6765\u589e\u5f3aLLM\u5bf9\u95ee\u9898\u6839\u672c\u539f\u56e0\u7684\u7406\u89e3\uff0c\u5e76\u901a\u8fc7\u8fed\u4ee3\u641c\u7d22\u63d0\u4f9b\u4e0a\u4e0b\u6587\u3002\u5f53\u6d4b\u8bd5\u5957\u4ef6\u53ef\u7528\u65f6\uff0c\
u8c31\u7cfb\u57fa\u7ebf\u6545\u969c\u5b9a\u4f4d\u6280\u672f\u8fdb\u4e00\u6b65\u7cbe\u786e\u4e86\u4e0a\u4e0b\u6587\u3002 \u5728SWE-bench-lite\uff0c\u4e00\u4e2a\u5305\u542b300\u4e2a\u771f\u5b9eGitHub\u95ee\u9898\u7684\u6570\u636e\u96c6\u4e0a\uff0cAutoCodeRover\u7684\u89e3\u51b3\u65b9\u6848\u6548\u679c\u63d0\u5347\uff0c\u89e3\u51b3\u4e86\u7ea622-23%\u7684\u95ee\u9898\u3002\u5bf9\u4e8e\u5168\u91cf\u7684SWE-bench\uff0c\u5305\u542b2294\u4e2aGitHub\u95ee\u9898\uff0cAutoCodeRover\u89e3\u51b3\u4e86\u5927\u7ea616%\u7684\u95ee\u9898\uff0c\u8fd9\u6bd4\u6700\u8fd1\u62a5\u9053\u7684\u6765\u81eaCognition Labs\u7684AI\u8f6f\u4ef6\u5de5\u7a0b\u5e08Devin\u7684\u8868\u73b0\u8fd8\u8981\u9ad8\uff0c\u800c\u4e14\u65f6\u95f4\u6d88\u8017\u4e0eDevin\u76f8\u5f53\u3002\u6211\u4eec\u76f8\u4fe1\uff0c\u6211\u4eec\u7684\u5de5\u4f5c\u6d41\u7a0b\u80fd\u591f\u63a8\u52a8\u81ea\u4e3b\u8f6f\u4ef6\u5de5\u7a0b\u7684\u53d1\u5c55\uff0c\u672a\u6765LLM\u81ea\u52a8\u751f\u6210\u7684\u4ee3\u7801\u53ef\u4ee5\u88ab\u81ea\u52a8\u5730\u8fdb\u884c\u4f18\u5316\u548c\u6539\u8fdb\u3002**|\n", "2404.05291": "|**2024-04-08**|**Long-horizon Locomotion and Manipulation on a Quadrupedal Robot with Large Language Models**|Yutao Ouyang 
et.al.|[2404.05291](http://arxiv.org/abs/2404.05291)|null|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u7cfb\u7edf\uff0c\u65e8\u5728\u63d0\u5347\u56db\u8db3\u673a\u5668\u4eba\u7684\u95ee\u9898\u89e3\u51b3\u80fd\u529b\uff0c\u4f7f\u5176\u80fd\u591f\u5904\u7406\u8d85\u8d8a\u77ed\u671f\u52a8\u4f5c\u7684\u957f\u671f\u4efb\u52a1\u3002\u5bf9\u4e8e\u56db\u8db3\u673a\u5668\u4eba\u6765\u8bf4\uff0c\u957f\u671f\u4efb\u52a1\u6781\u5177\u6311\u6218\u6027\uff0c\u56e0\u4e3a\u5b83\u4eec\u9700\u8981\u5bf9\u4efb\u52a1\u7684\u8bed\u4e49\u6709\u9ad8\u5c42\u7406\u89e3\uff0c\u5e76\u5177\u5907\u5e7f\u6cdb\u7684\u8fd0\u52a8\u548c\u64cd\u7eb5\u6280\u80fd\u4ee5\u4e0e\u73af\u5883\u4e92\u52a8\u3002\u6211\u4eec\u7684\u7cfb\u7edf\u6784\u5efa\u4e86\u4e00\u4e2a\u9ad8\u5c42\u63a8\u7406\u5c42\uff0c\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff0c\u4ece\u4efb\u52a1\u63cf\u8ff0\u4e2d\u751f\u6210\u6df7\u5408\u79bb\u6563-\u8fde\u7eed\u7684\u8ba1\u5212\uff0c\u4f5c\u4e3a\u673a\u5668\u4eba\u4ee3\u7801\u3002\u5b83\u5305\u62ec\u591a\u4e2aLLM\u4ee3\u7406\uff1a\u4e00\u4e2a\u7528\u4e8e\u6784\u601d\u8ba1\u5212\u7684\u8bed\u4e49\u89c4\u5212\u5668\u3001\u4e00\u4e2a\u53c2\u6570\u8ba1\u7b97\u5668\uff0c\u7528\u4e8e\u9884\u6d4b\u8ba1\u5212\u4e2d\u7684\u53c2\u6570\uff0c\u4ee5\u53ca\u4e00\u4e2a\u4ee3\u7801\u751f\u6210\u5668\uff0c\u5c06\u8ba1\u5212\u8f6c\u6362\u4e3a\u53ef\u6267\u884c\u7684\u673a\u5668\u4eba\u4ee3\u7801\u3002 
\u5728\u4f4e\u5c42\u6b21\uff0c\u6211\u4eec\u91c7\u7528\u5f3a\u5316\u5b66\u4e60\u6765\u8bad\u7ec3\u4e00\u5957\u8fd0\u52a8\u89c4\u5212\u548c\u63a7\u5236\u6280\u80fd\uff0c\u4ee5\u589e\u5f3a\u56db\u8db3\u673a\u5668\u4eba\u7684\u7075\u6d3b\u6027\uff0c\u4f7f\u5176\u80fd\u8fdb\u884c\u4e30\u5bcc\u73af\u5883\u4ea4\u4e92\u3002\u6211\u4eec\u5728\u96be\u4ee5\u7528\u5355\u4e00\u6280\u80fd\u5b8c\u6210\u7684\u957f\u671f\u4efb\u52a1\u4e0a\u6d4b\u8bd5\u4e86\u6211\u4eec\u7684\u7cfb\u7edf\u3002\u6a21\u62df\u5b9e\u9a8c\u548c\u771f\u5b9e\u4e16\u754c\u5b9e\u9a8c\u8868\u660e\uff0c\u5b83\u6210\u529f\u5730\u5236\u5b9a\u4e86\u591a\u6b65\u9aa4\u7b56\u7565\uff0c\u5e76\u5c55\u73b0\u51fa\u975e\u5e73\u51e1\u7684\u884c\u4e3a\uff0c\u4f8b\u5982\u5236\u4f5c\u5de5\u5177\u6216\u5411\u4eba\u7c7b\u5bfb\u6c42\u5e2e\u52a9\u3002|\n", "2404.04667": "|**2024-04-06**|**Autonomous Artificial Intelligence Agents for Clinical Decision Making in Oncology**|Dyke Ferber et.al.|[2404.04667](http://arxiv.org/abs/2404.04667)|null|\u591a\u6a21\u6001\u4eba\u5de5\u667a\u80fd\u7cfb\u7edf\u6709\u671b\u901a\u8fc7\u89e3\u6790\u5404\u7c7b\u533b\u5b66\u6570\u636e\u63d0\u5347\u4e34\u5e8a\u51b3\u7b56\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u6a21\u578b\u5728\u5404\u533b\u5b66\u9886\u57df\u7684\u6548\u80fd\u5c1a\u4e0d\u660e\u6717\uff0c\u6bcf\u4e2a\u9886\u57df\u90fd\u6709\u5176\u72ec\u7279\u6311\u6218\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4f5c\u4e3a\u6838\u5fc3\u63a8\u7406\u5f15\u64ce\u7684\u65b0\u578b\u591a\u6a21\u6001\u533b\u7597AI\u65b9\u6cd5\u3002\u6b64\u5f15\u64ce\u81ea\u4e3b\u534f\u8c03\u5e76\u90e8\u7f72\u4e00\u7cfb\u5217\u4e13\u95e8\u7684\u533b\u7597AI\u5de5\u5177\uff0c\u5982\u6587\u672c\u89e3\u8bfb\u3001\u653e\u5c04\u5b66\u548c\u75c5\u7406\u56fe\u50cf\u5206\u6790\u3001\u57fa\u56e0\u6570\u636e\u5904\u7406\u3001\u7f51\u7edc\u641c\u7d22\u4ee5\u53ca\u533b\u7597\u6307\u5357\u6587\u6863\u68c0\u7d22\u3002\u6211\u4eec\u5728\u4e00\u7cfb\u5217\u4e34\u5e8a\u80bf\u76
24\u5b66\u573a\u666f\u4e2d\u9a8c\u8bc1\u4e86\u8be5\u7cfb\u7edf\uff0c\u8fd9\u4e9b\u573a\u666f\u6a21\u62df\u4e86\u5178\u578b\u7684\u60a3\u8005\u62a4\u7406\u6d41\u7a0b\u3002\u7ed3\u679c\u663e\u793a\uff0c\u7cfb\u7edf\u5728\u9009\u62e9\u6070\u5f53\u5de5\u5177\uff0897%\uff09\u3001\u5f97\u51fa\u6b63\u786e\u7ed3\u8bba\uff0893.6%\uff09\u3001\u63d0\u4f9b\u5b8c\u6574\uff0894%\uff09\u548c\u6709\u76ca\uff0889.2%\uff09\u6cbb\u7597\u5efa\u8bae\uff0c\u4ee5\u53ca\u6839\u636e\u6307\u4ee4\u5f15\u7528\u76f8\u5173\u6587\u732e\uff0882.5%\uff09\u65b9\u9762\u8868\u73b0\u51fa\u9ad8\u80fd\u529b\u3002\u8fd9\u8868\u660eLLMs\u80fd\u591f\u6709\u6548\u5730\u89c4\u5212\u548c\u6267\u884c\u9886\u57df\u7279\u5b9a\u6a21\u578b\uff0c\u4ee5\u83b7\u53d6\u6216\u5408\u6210\u65b0\u4fe1\u606f\uff0c\u4ece\u800c\u5145\u5f53\u4e2a\u6027\u5316\u4e34\u5e8a\u52a9\u624b\u3002\u6b64\u5916\uff0c\u8fd9\u79cd\u67b6\u6784\u7b80\u5316\u4e86\u76d1\u7ba1\u5408\u89c4\u6027\uff0c\u56e0\u4e3a\u6bcf\u4e2a\u7ec4\u4ef6\u5de5\u5177\u53ef\u4ee5\u5355\u72ec\u9a8c\u8bc1\u548c\u5ba1\u6279\u3002\u6211\u4eec\u76f8\u4fe1\uff0c\u8fd9\u9879\u5de5\u4f5c\u4e3a\u533b\u7597\u9886\u57df\u7684\u66f4\u5148\u8fdbLLM\u4ee3\u7406\u63d0\u4f9b\u4e86\u6982\u5ff5\u9a8c\u8bc1\u3002|\n", "2404.04237": "|**2024-04-05**|**Cleared for Takeoff? 
Compositional & Conditional Reasoning may be the Achilles Heel to (Flight-Booking) Language Agents**|Harsh Kohli et.al.|[2404.04237](http://arxiv.org/abs/2404.04237)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5feb\u901f\u8fdb\u6b65\u4f7f\u5176\u5728\u6807\u51c6\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u9891\u9891\u8d85\u8d8a\u4eba\u7c7b\u8868\u73b0\uff0c\u63a8\u52a8\u4e86\u4f17\u591a\u4e0b\u6e38\u5e94\u7528\u7684\u53d1\u5c55\uff0c\u5982\u57fa\u4e8eLLMs\u7684\u4ee3\u7406\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u6a21\u578b\u5728\u770b\u4f3c\u7b80\u5355\u7684\u4efb\u52a1\u4e2d\u610f\u5916\u5730\u8868\u73b0\u4e0d\u4f73\uff0c\u8fd9\u5f3a\u8c03\u4e86\u5bf9\u66f4\u5168\u9762\u548c\u591a\u6837\u5316\u7684\u8bc4\u4f30\u6846\u67b6\u7684\u9700\u6c42\uff0c\u4ee5\u8861\u91cf\u5b83\u4eec\u7684\u5b9e\u9645\u80fd\u529b\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u805a\u7126\u4e8e\u7ec4\u5408\u6027\u548c\u6761\u4ef6\u63a8\u7406\u2014\u2014\u4eba\u7c7b\u8ba4\u77e5\u7684\u57fa\u77f3\uff0c\u5e76\u63d0\u51faGroundCocoa\uff0c\u8fd9\u662f\u4e00\u4e2a\u4e0e\u822a\u73ed\u9884\u8ba2\u8fd9\u4e00\u73b0\u5b9e\u95ee\u9898\u76f8\u8fde\u63a5\u7684\u8bcd\u6c47\u4e30\u5bcc\u7684\u57fa\u51c6\u3002\u6211\u4eec\u7684\u4efb\u52a1\u662f\u5c06\u7528\u6237\u7684\u8be6\u7ec6\u504f\u597d\u4e0e\u4ee5\u591a\u9009\u5f62\u5f0f\u63d0\u4f9b\u7684\u53ef\u7528\u822a\u73ed\u9009\u9879\u8fdb\u884c\u5339\u914d\u3002\u7ed3\u679c\u663e\u793a\uff0c\u5305\u62ec\u6700\u5148\u8fdb\u7684GPT-4 Turbo\u5728\u5185\u7684\u5f53\u524d\u6700\u4f73\u6a21\u578b\uff0c\u5728\u7ecf\u8fc7\u9ad8\u7ea7\u63d0\u793a\u540e\uff0c\u51c6\u786e\u7387\u4ecd\u4e0d\u8d85\u8fc767%\uff0c\u663e\u793a\u51fa\u663e\u8457\u7684\u6027\u80fd\u5dee\u8ddd\u3002|\n", "2404.16045": "|**2024-04-04**|**Elicitron: An LLM Agent-Based Simulation Framework for Design Requirements Elicitation**|Mohammadmehdi Ataei et.al.|[2404.16045](http://arxiv.org/abs/2404.16045)|null|
\u5728\u4ea7\u54c1\u5f00\u53d1\u7684\u5173\u952e\u9636\u6bb5\u2014\u2014\u9700\u6c42\u83b7\u53d6\uff0c\u5f80\u5f80\u96be\u4ee5\u5168\u9762\u6355\u6349\u7528\u6237\u9700\u6c42\uff0c\u5bfc\u81f4\u6700\u7ec8\u4ea7\u54c1\u53ef\u80fd\u65e0\u6cd5\u6ee1\u8db3\u671f\u671b\u3002\u4e3a\u6b64\uff0c\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6846\u67b6\uff0c\u5b83\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u6765\u81ea\u52a8\u5316\u548c\u589e\u5f3a\u8fd9\u4e00\u8fc7\u7a0b\u3002\u901a\u8fc7\u751f\u6210\u5927\u91cf\u6a21\u62df\u7528\u6237\uff08LLM\u4ee3\u7406\uff09\uff0c\u6211\u4eec\u53ef\u4ee5\u63a2\u7d22\u66f4\u5e7f\u6cdb\u7684\u7528\u6237\u9700\u6c42\u548c\u672a\u9884\u89c1\u7684\u4f7f\u7528\u573a\u666f\u3002\u8fd9\u4e9b\u4ee3\u7406\u901a\u8fc7\u63cf\u8ff0\u4ed6\u4eec\u7684\u884c\u4e3a\u3001\u89c2\u5bdf\u548c\u6311\u6218\uff0c\u53c2\u4e0e\u4ea7\u54c1\u4f53\u9a8c\u60c5\u666f\u3002\u968f\u540e\u7684\u4ee3\u7406\u8bbf\u8c08\u548c\u5206\u6790\u63ed\u793a\u4e86\u5b9d\u8d35\u7684\u7528\u6237\u9700\u6c42\uff0c\u5305\u62ec\u6f5c\u5728\u9700\u6c42\u3002\u6211\u4eec\u901a\u8fc7\u4e09\u4e2a\u5b9e\u9a8c\u9a8c\u8bc1\u4e86\u6211\u4eec\u7684\u6846\u67b6\uff1a\u9996\u5148\uff0c\u6211\u4eec\u63a2\u8ba8\u4e86\u4e0d\u540c\u65b9\u6cd5\u751f\u6210\u591a\u6837\u5316\u7684\u4ee3\u7406\uff0c\u5206\u6790\u5176\u4f18\u7f3a\u70b9\uff0c\u5e76\u8bc1\u660e\u4e86\u5177\u6709\u4e0a\u4e0b\u6587\u610f\u8bc6\u7684\u4ee3\u7406\u751f\u6210\u80fd\u5e26\u6765\u66f4\u5927\u7684\u9700\u6c42\u591a\u6837\u6027\u3002\u5176\u6b21\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u8be5\u6846\u67b6\u5982\u4f55\u6709\u6548\u5730\u6a21\u62df\u5bcc\u6709\u540c\u60c5\u5fc3\u7684\u9886\u5148\u7528\u6237\u8bbf\u8c08\uff0c\u8bc6\u522b\u51fa\u6bd4\u4f20\u7edf\u4eba\u7c7b\u8bbf\u8c08\u66f4\u591a\u7684\u6f5c\u5728\u9700\u6c42\u3002\u6700\u540e\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u5982\u4f55\u4f7f\u7528LLMs\u5206\u6790\u8bbf\u8c08\uff0c\u63d0\u53d6\u9700\u6c42\u5e76\u5c06\u5176\u5206\u7c7b\u4e3a\u6f5c\u5728\u6
216\u975e\u6f5c\u5728\u3002\u6211\u4eec\u7684\u7814\u7a76\u5de5\u4f5c\u5f3a\u8c03\u4e86\u5229\u7528LLM\u4ee3\u7406\u52a0\u901f\u65e9\u671f\u4ea7\u54c1\u7814\u53d1\u3001\u964d\u4f4e\u6210\u672c\u548c\u4fc3\u8fdb\u521b\u65b0\u7684\u6f5c\u529b\u3002|\n", "2404.15317": "|**2024-04-03**|**Concept-Guided LLM Agents for Human-AI Safety Codesign**|Florian Geissler et.al.|[2404.15317](http://arxiv.org/abs/2404.15317)|null|\u968f\u7740\u751f\u6210\u4eba\u5de5\u667a\u80fd\u5728\u8f6f\u4ef6\u5de5\u7a0b\uff0c\u7279\u522b\u662f\u5b89\u5168\u5de5\u7a0b\u4e2d\u7684\u91cd\u8981\u6027\u63d0\u5347\uff0c\u5bf9\u5b83\u7684\u8d28\u91cf\u8981\u6c42\u4e5f\u968f\u4e4b\u63d0\u9ad8\u3002\u5355\u7eaf\u4f9d\u8d56\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5df2\u4e0d\u8db3\u4ee5\u6ee1\u8db3\u8fd9\u4e9b\u9700\u6c42\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u9ad8\u6548\u4e14\u878d\u5408\u7684\u7b56\u7565\uff0c\u65e8\u5728\u5229\u7528LLMs\u8fdb\u884c\u5b89\u5168\u5206\u6790\u548c\u4eba\u673a\u534f\u540c\u8bbe\u8ba1\uff0c\u4ee5\u786e\u4fdd\u8f6f\u4ef6\u7cfb\u7edf\u7684\u5b89\u5168\u6027\u3002\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2a\u5b9a\u5236\u5316\u7684LLM\u4ee3\u7406\uff0c\u7ed3\u5408\u63d0\u793a\u5de5\u7a0b\u3001\u542f\u53d1\u5f0f\u63a8\u7406\u548c\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff0c\u4e13\u6ce8\u4e8e\u89e3\u51b3\u4e0e\u9884\u5b9a\u4e49\u5b89\u5168\u6982\u5ff5\u76f8\u5173\u7684\u4efb\u52a1\uff0c\u5e76\u4e0e\u7cfb\u7edf\u6a21\u578b\u56fe\u8fdb\u884c\u4ea4\u4e92\u3002\u51b3\u7b56\u6d41\u7a0b\u901a\u8fc7\u4e00\u7cfb\u5217\u5fae\u51b3\u7b56\u8fdb\u884c\u5f15\u5bfc\uff0c\u6709\u52a9\u4e8e\u4fdd\u6301\u7ed3\u6784\u5316\u4fe1\u606f\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86\u56fe\u7684\u53e3\u5934\u8868\u8ff0\u4f5c\u4e3a\u7cfb\u7edf\u6a21\u578b\u7684\u4e2d\u95f4\u8868\u793a\uff0c\u4ee5\u4fc3\u8fdbLLM\u4e0e\u56fe\u7684\u4ea4\u4e92\u3002\u6211\u4eec\u901a\u8fc7\u4e00\u4e2a\u7b80\u5316\u81ea\u52a8\u9a7e\u9a76\u7cfb\u7edf\u7684\u793a\u4f8b\uff0c\u5
c55\u793a\u4e86\u9009\u62e9\u7684\u63d0\u793a-\u54cd\u5e94\u5bf9\uff0c\u4ee5\u8bf4\u660e\u6211\u4eec\u7684\u65b9\u6cd5\u5982\u4f55\u5e94\u7528\u4e8e\u5b89\u5168\u5206\u6790\u3002|\n", "2404.02183": "|**2024-04-02**|**Self-Organized Agents: A LLM Multi-Agent Framework toward Ultra Large-Scale Code Generation and Optimization**|Yoichi Ishibashi et.al.|[2404.02183](http://arxiv.org/abs/2404.02183)|**[link](https://github.com/tsukushiai/self-organized-agent)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4ee3\u7406\u7684\u6700\u65b0\u8fdb\u5c55\uff0c\u81ea\u52a8\u5316\u8f6f\u4ef6\u5f00\u53d1\u7684\u672a\u6765\u6b63\u9010\u6e10\u663e\u73b0\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u5355\u4ee3\u7406\u65b9\u6cd5\u5728\u751f\u6210\u548c\u4f18\u5316\u5927\u89c4\u6a21\u3001\u590d\u6742\u7684\u4ee3\u7801\u5e93\u65f6\u9762\u4e34\u4e0a\u4e0b\u6587\u957f\u5ea6\u9650\u5236\u7684\u95ee\u9898\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u591a\u4ee3\u7406\u6846\u67b6\u2014\u2014\u81ea\u7ec4\u7ec7\u591aAgent\u4f53\u7cfb\uff08SoA\uff09\u3002SoA\u662f\u4e00\u4e2a\u53ef\u6269\u5c55\u4e14\u9ad8\u6548\u7684\u591a\u4ee3\u7406\u7cfb\u7edf\uff0c\u5b83\u5141\u8bb8\u72ec\u7acb\u5730\u751f\u6210\u548c\u4fee\u6539\u4ee3\u7801\u7ec4\u4ef6\uff0c\u5e76\u534f\u540c\u6784\u5efa\u6574\u4e2a\u4ee3\u7801\u5e93\u3002SoA\u7684\u4e00\u4e2a\u5173\u952e\u7279\u6027\u662f\u6839\u636e\u95ee\u9898\u590d\u6742\u6027\u81ea\u52a8\u589e\u52a0\u4ee3\u7406\uff0c\u5b9e\u73b0\u52a8\u6001\u53ef\u6269\u5c55\u6027\u3002\u8fd9\u6837\uff0c\u6574\u4f53\u4ee3\u7801\u91cf\u53ef\u4ee5\u6839\u636e\u4ee3\u7406\u6570\u91cf\u65e0\u9650\u589e\u957f\uff0c\u800c\u6bcf\u4e2a\u4ee3\u7406\u7ba1\u7406\u7684\u4ee3\u7801\u91cf\u4fdd\u6301\u6052\u5b9a\u3002 
\u6211\u4eec\u5728HumanEval\u57fa\u51c6\u4e0a\u8bc4\u4f30\u4e86SoA\uff0c\u5e76\u53d1\u73b0\u4e0e\u5355\u4ee3\u7406\u7cfb\u7edf\u76f8\u6bd4\uff0cSoA\u4e2d\u7684\u6bcf\u4e2a\u4ee3\u7406\u5904\u7406\u7684\u4ee3\u7801\u91cf\u660e\u663e\u51cf\u5c11\uff0c\u4f46\u603b\u4f53\u751f\u6210\u7684\u4ee3\u7801\u91cf\u663e\u8457\u589e\u52a0\u3002\u6b64\u5916\uff0cSoA\u5728Pass@1\u51c6\u786e\u7387\u65b9\u9762\u6bd4\u5f3a\u5927\u7684\u5355\u4ee3\u7406\u57fa\u7ebf\u63d0\u9ad8\u4e865%\u3002**|\n", "2404.01602": "|**2024-04-02**|**Helmsman of the Masses? Evaluate the Opinion Leadership of Large Language Models in the Werewolf Game**|Silin Du et.al.|[2404.01602](http://arxiv.org/abs/2404.01602)|**[link](https://github.com/doslim/evaluate-the-opinion-leadership-of-llms)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u793e\u4ea4\u63a8\u7406\u6e38\u620f\u4e2d\u5c55\u73b0\u51fa\u663e\u8457\u7684\u7b56\u7565\u884c\u4e3a\uff0c\u4f46\u5bf9\u5b83\u4eec\u4f5c\u4e3a\u610f\u89c1\u9886\u8896\u7684\u91cd\u8981\u6027\u5173\u6ce8\u4e0d\u8db3\uff0c\u8fd9\u5bf9\u4e8e\u591aAgent\u548c\u4eba\u673a\u4ea4\u4e92\u573a\u666f\u7684\u5b9e\u9645\u5e94\u7528\u81f3\u5173\u91cd\u8981\u3002\u610f\u89c1\u9886\u8896\u662f\u6307\u5728\u4e00\u4e2a\u793e\u4f1a\u7fa4\u4f53\u4e2d\u5bf9\u4ed6\u4eba\u4fe1\u5ff5\u548c\u884c\u4e3a\u6709\u663e\u8457\u5f71\u54cd\u7684\u4e2a\u4f53\u3002\u672c\u7814\u7a76\u4f7f\u7528\u201c\u72fc\u4eba\u6740\u201d\u6e38\u620f\u4f5c\u4e3a\u6a21\u62df\u5e73\u53f0\uff0c\u63a2\u8ba8\u8bed\u8a00\u6a21\u578b\u5728\u626e\u6f14Sheriff\uff08\u6cbb\u5b89\u5b98\uff09\u89d2\u8272\u65f6\u7684\u610f\u89c1\u9886\u5bfc\u80fd\u529b\u3002Sheriff\u8d1f\u8d23\u603b\u7ed3\u8bba\u70b9\u5e76\u63d0\u51fa\u51b3\u7b56\u5efa\u8bae\uff0c\u56e0\u6b64\u5b83\u4ee3\u8868\u4e86\u610f\u89c1\u9886\u8896\u7684\u4e00\u4e2a\u53ef\u4fe1\u4ee3\u7406\u3002\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u6574\u5408Sheriff\u89d2\u8272\u7684\u6846\u67b6\uff0c\u5e76\u57fa\u4e8e\u610f\u89c1\u9886\u8896\u7684\u5173\u952e\u7279\u6027\u63d0\u51fa\u4e8
6\u4e24\u4e2a\u8bc4\u4f30\u6307\u6807\uff1a\u7b2c\u4e00\u4e2a\u8861\u91cf\u610f\u89c1\u9886\u8896\u7684\u53ef\u9760\u6027\uff0c\u7b2c\u4e8c\u4e2a\u8003\u5bdf\u5176\u5bf9\u5176\u4ed6\u73a9\u5bb6\u51b3\u7b56\u7684\u5f71\u54cd\u3002 \u6211\u4eec\u8fdb\u884c\u4e86\u5927\u91cf\u5b9e\u9a8c\uff0c\u8bc4\u4f30\u4e0d\u540c\u89c4\u6a21\u7684\u8bed\u8a00\u6a21\u578b\uff0c\u5e76\u521b\u5efa\u4e86\u201c\u72fc\u4eba\u6740\u201d\u95ee\u9898\u56de\u7b54\u6570\u636e\u96c6\uff08WWQA\uff09\uff0c\u4ee5\u6d4b\u8bd5\u548c\u63d0\u5347\u6a21\u578b\u5bf9\u6e38\u620f\u89c4\u5219\u7684\u7406\u89e3\u3002\u6b64\u5916\uff0c\u8fd8\u5305\u542b\u4e86\u4eba\u7c7b\u53c2\u4e0e\u8005\u8fdb\u884c\u8fdb\u4e00\u6b65\u5206\u6790\u3002\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0c\u201c\u72fc\u4eba\u6740\u201d\u6e38\u620f\u662f\u4e00\u4e2a\u6709\u6548\u8bc4\u4f30\u8bed\u8a00\u6a21\u578b\u610f\u89c1\u9886\u5bfc\u529b\u7684\u8bd5\u9a8c\u573a\uff0c\u4f46\u76ee\u524d\u4ec5\u6709\u5c11\u6570\u8bed\u8a00\u6a21\u578b\u5177\u5907\u8fd9\u79cd\u80fd\u529b\u3002**|\n", "2404.00806": "|**2024-03-31**|**Algorithmic Collusion by Large Language Models**|Sara Fish et.al.|[2404.00806](http://arxiv.org/abs/2404.00806)|null|\u968f\u7740\u7b97\u6cd5\u5b9a\u4ef7\u7684\u5174\u8d77\uff0c\u4eba\u4eec\u62c5\u5fe7\u7b97\u6cd5\u95f4\u7684\u5408\u8c0b\u95ee\u9898\u3002\u6211\u4eec\u901a\u8fc7\u5b9e\u9a8c\u4f7f\u7528\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5b9a\u4ef7\u4ee3\u7406\uff0c\u7279\u522b\u662fGPT-4\uff0c\u8fdb\u884c\u4e86\u63a2\u7a76\u3002\u7814\u7a76\u53d1\u73b0\uff1a(1) LLM\u9a71\u52a8\u7684\u5b9a\u4ef7\u673a\u5236\u5728\u5b9a\u4ef7\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\uff1b(2) \u5728\u5be1\u5934\u7ade\u4e89\u73af\u5883\u4e2d\uff0cLLM\u5b9a\u4ef7\u4ee3\u7406\u4f1a\u81ea\u53d1\u5730\u8fdb\u884c\u5408\u8c0b\uff0c\u4ece\u800c\u635f\u5bb3\u6d88\u8d39\u8005\u5229\u76ca\uff1b(3) 
\u5bf9LLM\u6307\u4ee4\uff08\u201c\u63d0\u793a\u201d\uff09\u770b\u4f3c\u5fae\u5c0f\u7684\u53d8\u5316\u53ef\u80fd\u52a0\u5267\u8fd9\u79cd\u5408\u4f5c\u884c\u4e3a\u3002\u8fd9\u4e9b\u7ed3\u679c\u540c\u6837\u9002\u7528\u4e8e\u62cd\u5356\u573a\u666f\u3002\u6211\u4eec\u7684\u7814\u7a76\u7ed3\u679c\u5f3a\u8c03\u4e86\u5bf9\u7b97\u6cd5\u5b9a\u4ef7\u8fdb\u884c\u53cd\u5784\u65ad\u76d1\u7ba1\u7684\u5fc5\u8981\u6027\uff0c\u5e76\u63ed\u793a\u4e86\u9488\u5bf9LLM\u5b9a\u4ef7\u4ee3\u7406\u7279\u6709\u7684\u76d1\u7ba1\u6311\u6218\u3002|\n", "2404.01343": "|**2024-04-15**|**CHOPS: CHat with custOmer Profile Systems for Customer Service with LLMs**|Jingzhe Shi et.al.|[2404.01343](http://arxiv.org/abs/2404.01343)|**[link](https://github.com/jingzheshi/chops)**|**\u968f\u7740\u4f01\u4e1a\u548c\u8f6f\u4ef6\u5e73\u53f0\u8d8a\u6765\u8d8a\u591a\u5730\u91c7\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08\u5982GPT-3.5\u3001GPT-4\u3001GLM-3\u548cLLaMa-2\uff09\u63d0\u4f9b\u804a\u5929\u8f85\u52a9\u6216\u5ba2\u6237\u670d\u52a1\u63a8\u7406\uff0c\u73b0\u6709\u7684\u57fa\u4e8eLLM\u7684\u5ba2\u6237\u670d\u52a1\u6a21\u578b\u5728\u4e0e\u5ba2\u6237\u8d44\u6599\u96c6\u6210\u548c\u6267\u884c\u5b9e\u9645\u64cd\u4f5c\u65b9\u9762\u5b58\u5728\u5c40\u9650\u3002\u5b83\u4eec\u503e\u5411\u4e8e\u5f3a\u8c03\u591a\u6837\u6027\u800c\u975e\u7cbe\u786e\u6027\u548c\u9519\u8bef\u907f\u514d\uff0c\u8fd9\u5bf9\u4e8e\u73b0\u5b9e\u4e16\u754c\u7684\u5ba2\u6237\u670d\u52a1\u573a\u666f\u5e76\u4e0d\u7406\u60f3\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aCHOPS\uff08\u7ed3\u5408\u5ba2\u6237\u8d44\u6599\u7684\u804a\u5929\u52a9\u624b\uff09\u7684LLM\u4ee3\u7406\uff0c\u65e8\u5728\uff1a\uff081\uff09\u9ad8\u6548\u5229\u7528\u73b0\u6709\u6570\u636e\u5e93\u6216\u7cfb\u7edf\u67e5\u8be2\u7528\u6237\u4fe1\u606f\uff0c\u6216\u9075\u5faa\u65e2\u5b9a\u6307\u5357\u4e0e\u7cfb\u7edf\u4ea4\u4e92\uff1b\uff082\uff09\u63d0\u4f9b\u51c6\u786e\u5408\u7406\u7684\u54cd\u5e94\u5e76\u6267\u884c\u7cfb\u7edf\u5185\u7684\u5fc5\u8981
\u64cd\u4f5c\uff0c\u540c\u65f6\u907f\u514d\u6709\u5bb3\u64cd\u4f5c\uff1b\uff083\uff09\u901a\u8fc7\u7ed3\u5408\u5c0f\u578b\u548c\u5927\u578bLLM\u4ee5\u5b9e\u73b0\u6027\u80fd\u6ee1\u610f\u4e14\u6210\u672c\u5408\u7406\u7684\u63a8\u7406\u3002 \u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2a\u5b9e\u7528\u7684\u6570\u636e\u96c6\uff0c\u79f0\u4e3aCPHOS-dataset\uff0c\u5b83\u5305\u62ec\u4e00\u4e2a\u6570\u636e\u5e93\u3001\u6307\u5bfc\u6587\u4ef6\u4ee5\u53ca\u6765\u81eaCPHOS\u5e73\u53f0\u7684\u6a21\u62df\u7269\u7406\u5965\u6797\u5339\u514b\u7ec4\u7ec7\u670d\u52a1\u7684\u95ee\u7b54\u5bf9\u3002CPHOS\u662f\u4e00\u4e2a\u9762\u5411\u9ad8\u4e2d\u6559\u5e08\u548c\u5b66\u751f\u7684\u5728\u7ebf\u5e73\u53f0\u3002\u6211\u4eec\u901a\u8fc7\u4f7f\u7528CPHOS-dataset\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u5b9e\u9a8c\uff0c\u9a8c\u8bc1\u4e86CHOPS\u67b6\u6784\u7684\u6027\u80fd\uff0c\u76ee\u6807\u662f\u5c55\u793aLLM\u5982\u4f55\u63d0\u5347\u6216\u66ff\u4ee3\u4eba\u5de5\u5ba2\u6237\u670d\u52a1\u3002\u5173\u4e8e\u6211\u4eec\u7684\u63d0\u6848\u67b6\u6784\u548c\u6570\u636e\u96c6\u7684\u4ee3\u7801\u53ef\u5728\u6b64\u5904\u83b7\u53d6\uff1a\u3002**|\n", "2404.01342": "|**2024-03-31**|**DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model**|Lirui Zhao 
et.al.|[2404.01342](http://arxiv.org/abs/2404.01342)|**[link](https://github.com/opengvlab/diffagent)**|**\u6587\u672c\u5230\u56fe\u50cf\uff08T2I\uff09\u751f\u6210\u6a21\u578b\u8fd1\u5e74\u6765\u5907\u53d7\u77a9\u76ee\uff0c\u5728\u5b66\u672f\u7814\u7a76\u548c\u5b9e\u9645\u5e94\u7528\u4e2d\u5927\u653e\u5f02\u5f69\u3002\u4f8b\u5982\uff0cCivitai\u5e73\u53f0\uff0c\u4e00\u4e2aT2I\u521b\u65b0\u7684\u805a\u96c6\u5730\uff0c\u76ee\u524d\u6c47\u96c6\u4e8674,492\u79cd\u72ec\u7279\u7684\u6a21\u578b\uff0c\u8fd9\u5e26\u6765\u4e86\u9009\u62e9\u6700\u5408\u9002\u7684\u6a21\u578b\u548c\u53c2\u6570\u7684\u8270\u5de8\u4efb\u52a1\uff0c\u901a\u5e38\u9700\u8981\u591a\u6b21\u8bd5\u9a8c\u3002\u501f\u9274\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5de5\u5177\u4f7f\u7528\u7814\u7a76\u7684\u601d\u8def\uff0c\u6211\u4eec\u63a8\u51fa\u4e86DiffAgent\uff0c\u8fd9\u662f\u4e00\u4e2a\u901a\u8fc7API\u8c03\u7528\u6765\u5feb\u901f\u7b5b\u9009\u51c6\u786e\u9009\u9879\u7684LLM\u4ee3\u7406\u3002DiffAgent\u91c7\u7528\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u4e24\u9636\u6bb5\u8bad\u7ec3\u6846\u67b6\uff0c\u79f0\u4e3aSFTA\uff0c\u4f7f\u5176\u80fd\u591f\u6839\u636e\u4eba\u7c7b\u504f\u597d\u7cbe\u786e\u5730\u5c06T2I API\u7684\u54cd\u5e94\u4e0e\u7528\u6237\u8f93\u5165\u5bf9\u9f50\u3002\u4e3a\u4e86\u8bad\u7ec3\u548c\u8bc4\u4f30DiffAgent\u7684\u80fd\u529b\uff0c\u6211\u4eec\u6784\u5efa\u4e86DABench\uff0c\u8fd9\u662f\u4e00\u4e2a\u5168\u9762\u7684\u6570\u636e\u5e93\uff0c\u6db5\u76d6\u4e86\u793e\u533a\u4e2d\u7684\u5404\u79cdT2I API\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cDiffAgent\u4e0d\u4ec5\u5728\u9009\u62e9\u9002\u5f53\u7684T2I API\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u8fd8\u9a8c\u8bc1\u4e86SFTA\u8bad\u7ec3\u6846\u67b6\u7684\u6709\u6548\u6027\u3002\u76f8\u5173\u4ee3\u7801\u5df2\u53ef\u5728https://github.com/OpenGVLab/DiffAgent\u83b7\u53d6\u3002**|\n", "2404.00573": "|**2024-03-31**|**\"My agent understands me better\": Integrating Dynamic Human-like Memory Recall and Consolidation in LLM-Based 
Agents**|Yuki Hou et.al.|[2404.00573](http://arxiv.org/abs/2404.00573)|**[link](https://github.com/tamoharu/Agent-Memory-CHI24)**|\u5728\u8fd9\u4e2a\u7814\u7a76\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u4eba\u7c7b\u8bb0\u5fc6\u67b6\u6784\uff0c\u65e8\u5728\u63d0\u5347\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u5bf9\u8bdd\u4ee3\u7406\u7684\u8ba4\u77e5\u80fd\u529b\u3002\u6211\u4eec\u7684\u8bbe\u8ba1\u4f7f\u5f97\u8fd9\u4e9b\u4ee3\u7406\u80fd\u81ea\u4e3b\u68c0\u7d22\u751f\u6210\u54cd\u5e94\u6240\u9700\u7684\u5fc5\u8981\u8bb0\u5fc6\uff0c\u4ece\u800c\u89e3\u51b3LLMs\u5728\u65f6\u95f4\u8ba4\u77e5\u4e0a\u7684\u5c40\u9650\u3002\u6211\u4eec\u501f\u9274\u4e86\u4eba\u7c7b\u7684\u8bb0\u5fc6\u7ebf\u7d22\u53ec\u56de\u673a\u5236\u4f5c\u4e3a\u89e6\u53d1\u70b9\uff0c\u4ee5\u5b9e\u73b0\u7cbe\u786e\u4e14\u9ad8\u6548\u7684\u56de\u5fc6\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2a\u6570\u5b66\u6a21\u578b\uff0c\u52a8\u6001\u91cf\u5316\u8bb0\u5fc6\u5de9\u56fa\u8fc7\u7a0b\uff0c\u8003\u8651\u4e86\u8bf8\u5982\u4e0a\u4e0b\u6587\u76f8\u5173\u6027\u3001\u65f6\u95f4\u6d41\u901d\u548c\u56de\u5fc6\u9891\u7387\u7b49\u56e0\u7d20\u3002\u4ee3\u7406\u4f1a\u4ece\u7528\u6237\u7684\u4ea4\u4e92\u5386\u53f2\u4e2d\u5b58\u50a8\u8bb0\u5fc6\uff0c\u8fd9\u4e9b\u8bb0\u5fc6\u88ab\u5c01\u88c5\u5728\u6570\u636e\u5e93\u4e2d\uff0c\u6bcf\u4e2a\u8bb0\u5fc6\u90fd\u5305\u542b\u4e86\u5185\u5bb9\u548c\u65f6\u95f4\u5173\u8054\u7684\u8bed\u5883\u3002\u8fd9\u6837\uff0c\u901a\u8fc7\u7c7b\u4f3c\u4eba\u7c7b\u8bc6\u522b\u548c\u56de\u5fc6\u8fc7\u5f80\u7ecf\u5386\u7684\u65b9\u5f0f\uff0c\u7cfb\u7edf\u80fd\u591f\u6218\u7565\u6027\u5730\u5b58\u50a8\u8bb0\u5fc6\uff0c\u5e76\u7406\u89e3\u5b83\u4eec\u5bf9\u7528\u6237\u5728\u65f6\u95f4\u7ebf\u4e0a\u7684\u91cd\u8981\u6027\u3002|\n", "2405.12147": "|**2024-05-20**|**Eliciting Problem Specifications via Large Language Models**|Robert E. 
Wray et.al.|[2405.12147](http://arxiv.org/abs/2405.12147)|null|\u8fd9\u7bc7\u8bba\u6587\u63a2\u8ba8\u4e86\u5982\u4f55\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u8ba4\u77e5\u7cfb\u7edf\u4e2d\u5b9e\u73b0\u95ee\u9898\u5b9a\u4e49\u7684\u8f6c\u5316\u3002\u901a\u5e38\u60c5\u51b5\u4e0b\uff0c\u4eba\u7c7b\u9700\u8981\u5c06\u95ee\u9898\u63cf\u8ff0\u8f6c\u5316\u4e3a\u8ba4\u77e5\u7cfb\u7edf\u80fd\u7406\u89e3\u7684\u5f62\u5f0f\u3002\u7814\u7a76\u8005\u5c55\u793a\u4e86LLMs\u80fd\u591f\u5904\u7406\u81ea\u7136\u8bed\u8a00\u4e2d\u5b9a\u4e49\u7684\u95ee\u9898\u7c7b\u522b\uff0c\u5e76\u5c06\u5176\u8f6c\u6362\u4e3a\u534a\u5f62\u5f0f\u5316\u89c4\u683c\uff0c\u8fd9\u6837\u73b0\u6709\u63a8\u7406\u548c\u5b66\u4e60\u7cfb\u7edf\u53ef\u4ee5\u89e3\u51b3\u8fd9\u7c7b\u95ee\u9898\u7684\u5177\u4f53\u5b9e\u4f8b\u3002\u4ed6\u4eec\u8bbe\u8ba1\u4e86\u4e00\u79cd\u7531LLM\u9a71\u52a8\u7684\u8ba4\u77e5\u4efb\u52a1\u5206\u6790\u5e08\u4ee3\u7406\uff0c\u8fd9\u79cd\u7cfb\u7edf\u80fd\u591f\u6839\u636e\u81ea\u7136\u8bed\u8a00\u63cf\u8ff0\u7684\u4efb\u52a1\u751f\u6210\u95ee\u9898\u7a7a\u95f4\u7684\u5b9a\u4e49\u3002LLM\u63d0\u793a\u6e90\u81ea\u4eba\u5de5\u667a\u80fd\u6587\u732e\u4e2d\u7684\u95ee\u9898\u7a7a\u95f4\u6982\u5ff5\u548c\u901a\u7528\u95ee\u9898\u89e3\u51b3\u7b56\u7565\uff08\u5982\u6ce2\u5229\u4e9a\u7684\u300a\u5982\u4f55\u89e3\u51b3\u95ee\u9898\u300b\uff09\u3002\u968f\u540e\uff0c\u8ba4\u77e5\u7cfb\u7edf\u5229\u7528\u8fd9\u4e9b\u95ee\u9898\u7a7a\u95f4\u89c4\u683c\uff0c\u7ed3\u5408\u9886\u57df\u901a\u7528\u7684\u89e3\u51b3\u95ee\u9898\u7b56\u7565\uff08\u5982\u641c\u7d22\uff09\uff0c\u6765\u89e3\u51b3\u8be5\u7c7b\u95ee\u9898\u7684\u4e0d\u540c\u5b9e\u4f8b\u3002\u8fd9\u4e00\u521d\u6b65\u7ed3\u679c\u8868\u660e\uff0c\u901a\u8fc7\u6d88\u9664\u95ee\u9898\u8868\u8ff0\u7684\u4e2d\u4ecb\u8fc7\u7a0b\uff0cLLMs\u6709\u53ef\u80fd\u52a0\u901f\u8ba4\u77e5\u7cfb\u7edf\u7684\u7814\u7a76\uff0c\u540c\u65f6\u4fdd\u6301\u5176\u6838\u5fc3\u80fd\u529b\uff0c\u5982\u7a33\u5065\u7684\u63a8\u7406\u548c\u572
8\u7ebf\u5b66\u4e60\u3002|\n", "2405.11403": "|**2024-05-18**|**MapCoder: Multi-Agent Code Generation for Competitive Problem Solving**|Md. Ashraful Islam et.al.|[2405.11403](http://arxiv.org/abs/2405.11403)|**[link](https://github.com/md-ashraful-pramanik/mapcoder)**|**\u672c\u6587\u63a2\u8ba8\u4e86\u4ee3\u7801\u5408\u6210\u8fd9\u4e00\u590d\u6742\u4efb\u52a1\uff0c\u5b83\u9700\u8981\u6df1\u5ea6\u7406\u89e3\u590d\u6742\u7684\u81ea\u7136\u8bed\u8a00\u95ee\u9898\u63cf\u8ff0\u3001\u751f\u6210\u590d\u6742\u7684\u7b97\u6cd5\u548c\u6570\u636e\u7ed3\u6784\u4ee3\u7801\uff0c\u5e76\u6267\u884c\u5168\u9762\u7684\u5355\u5143\u6d4b\u8bd5\u3002\u5c3d\u7ba1\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5728\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u4e2d\u7684\u8868\u73b0\u4ecd\u6709\u5f85\u63d0\u5347\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u5373\u591a\u4ee3\u7406\u63d0\u793a\u6846\u67b6MapCoder\uff0c\u5b83\u6a21\u4eff\u4eba\u7c7b\u5f00\u53d1\u8005\u7f16\u7a0b\u5408\u6210\u7684\u5b8c\u6574\u8fc7\u7a0b\uff0c\u5206\u4e3a\u56db\u4e2a\u4e13\u95e8\u8bbe\u8ba1\u7684LLM\uff08\u5927\u8bed\u8a00\u6a21\u578b\uff09\u4ee3\u7406\uff1a\u56de\u5fc6\u76f8\u5173\u793a\u4f8b\u3001\u89c4\u5212\u3001\u4ee3\u7801\u751f\u6210\u548c\u8c03\u8bd5\u3002 
\u901a\u8fc7\u5728\u516b\u4e2a\u5177\u6709\u6311\u6218\u6027\u7684\u7ade\u8d5b\u7ea7\u95ee\u9898\u89e3\u51b3\u548c\u7a0b\u5e8f\u5408\u6210\u57fa\u51c6\u4e0a\u8fdb\u884c\u8be6\u5c3d\u5b9e\u9a8c\uff0c\u5305\u62ecHumanEval\uff0893.9%\uff09\u3001MBPP\uff0883.1%\uff09\u3001APPS\uff0822.0%\uff09\u3001CodeContests\uff0828.5%\uff09\u548cxCodeEval\uff0845.3%\uff09\u7b49\uff0cMapCoder\u5c55\u73b0\u4e86\u51fa\u8272\u7684\u4ee3\u7801\u751f\u6210\u80fd\u529b\uff0c\u5b9e\u73b0\u4e86\u591a\u9879\u65b0\u7684\u6700\u5148\u8fdb\u7684\u7ed3\u679c\u3002\u800c\u4e14\uff0c\u65e0\u8bba\u7f16\u7a0b\u8bed\u8a00\u8fd8\u662f\u95ee\u9898\u96be\u5ea6\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u90fd\u8868\u73b0\u51fa\u6301\u7eed\u7684\u4f18\u8d8a\u6027\u80fd\u3002\u6211\u4eec\u5f00\u6e90\u4e86\u8be5\u6846\u67b6\uff0c\u4f9b\u7814\u7a76\u8005\u53c2\u8003\uff1ahttps://github.com/Md-Ashraful-Pramanik/MapCoder\u3002**|\n", "2405.14751": "|**2024-05-23**|**AGILE: A Novel Framework of LLM Agents**|Peiyuan Feng et.al.|[2405.14751](http://arxiv.org/abs/2405.14751)|**[link](https://github.com/bytarnish/agile)**|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6846\u67b6\uff0c\u79f0\u4e3aLLM\uff08\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff09\u4ee3\u7406AGILE\uff08\u80fd\u591f\u4e0e\u7528\u6237\u4e92\u52a8\u5e76\u4ece\u73af\u5883\u4e2d\u5b66\u4e60\u7684\u4ee3\u7406\uff09\uff0c\u65e8\u5728\u6267\u884c\u590d\u6742\u7684\u5bf9\u8bdd\u4efb\u52a1\uff0c\u5229\u7528LLMs\u3001\u8bb0\u5fc6\u3001\u5de5\u5177\u548c\u4e13\u5bb6\u4ea4\u4e92\u3002\u8fd9\u79cd\u4ee3\u7406\u4e0d\u4ec5\u5177\u5907\u5bf9\u8bdd\u80fd\u529b\uff0c\u8fd8\u5177\u5907\u53cd\u601d\u3001\u5de5\u5177\u8fd0\u7528\u4ee5\u53ca\u54a8\u8be2\u4e13\u5bb6\u7684\u529f\u80fd\u3002\u6211\u4eec\u5c06\u6784\u5efa\u6b64\u7c7bLLM\u4ee3\u7406\u89c6\u4e3a\u5f3a\u5316\u5b66\u4e60\u95ee\u9898\uff0c\u5176\u4e2dLLM\u4f5c\u4e3a\u7b56\u7565\u6a21\u578b\u3002\u6211\u4eec\u4f7f\u7528\u6807\u6ce8\u7684\u884c\u4e3a\u6570\u636e\u548cPPO\u7b97\u6cd5\u5bf9LLM\u8fdb\u884c\u5fae
\u8c03\u3002\u7279\u522b\u5173\u6ce8\u7684\u662f\u95ee\u7b54\u4efb\u52a1\uff0c\u4e3a\u6b64\u6211\u4eec\u53d1\u5e03\u4e86\u4e00\u4e2a\u540d\u4e3aProductQA\u7684\u6570\u636e\u96c6\uff0c\u5305\u542b\u5728\u7ebf\u8d2d\u7269\u4e2d\u7684\u96be\u9898\u3002\u6211\u4eec\u5728ProductQA\u548cMedMCQA\u4e0a\u7684\u5927\u91cf\u5b9e\u9a8c\u8868\u660e\uff0c\u57fa\u4e8e130\u4ebf\u548c70\u4ebf\u53c2\u6570\u7684LLM\u8bad\u7ec3\u7684AGILE\u4ee3\u7406\u80fd\u591f\u8d85\u8d8aGPT-4\u4ee3\u7406\u7684\u8868\u73b0\u3002\u6211\u4eec\u7684 ablation\u7814\u7a76\u5f3a\u8c03\u4e86\u8bb0\u5fc6\u3001\u5de5\u5177\u3001\u54a8\u8be2\u3001\u53cd\u601d\u548c\u5f3a\u5316\u5b66\u4e60\u5728\u5b9e\u73b0\u4f18\u79c0\u6027\u80fd\u65b9\u9762\u7684\u91cd\u8981\u6027\u3002|\n", "2405.14744": "|**2024-05-23**|**Exploring Prosocial Irrationality for LLM Agents: A Social Cognition View**|Xuan Liu et.al.|[2405.14744](http://arxiv.org/abs/2405.14744)|null|\u7531\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u8bad\u7ec3\u6570\u636e\u4e2d\u53cd\u6620\u4e86\u4eba\u7c7b\u504f\u89c1\uff0c\u5b83\u4eec\u53ef\u80fd\u4f1a\u51fa\u73b0\u5e7b\u89c9\u95ee\u9898\u3002\u8fd9\u79cd\u60c5\u51b5\u4e0b\uff0c\u4e00\u4e2a\u5173\u952e\u95ee\u9898\u662f\uff1aLLMs\u662f\u5426\u80fd\u591f\u5229\u7528\u5e7b\u89c9\u6765\u6a21\u4eff\u4eba\u7c7b\u7684\u8ba4\u77e5\u504f\u89c1\uff0c\u4ece\u800c\u5c55\u73b0\u51fa\u975e\u7406\u6027\u4f46\u793e\u4f1a\u6027\u7684\u4e00\u9762\uff1f\u672c\u6587\u63a2\u8ba8\u4e86\u8fd9\u4e00\u95ee\u9898\uff0c\u901a\u8fc7\u7ed3\u5408\u5b9e\u7528\u7684\u793e\u4f1a\u79d1\u5b66\u5b9e\u9a8c\u548c\u7406\u8bba\u6d1e\u5bdf\uff0c\u63d0\u51faCogMir\uff0c\u4e00\u4e2a\u5f00\u653e\u5f0f\u591aLLM\u6846\u67b6\uff0c\u65e8\u5728\u5229\u7528LLMs\u7684\u5e7b\u89c9\u7279\u6027\u6765\u8bc4\u4f30\u548c\u63d0\u5347\u5176\u793e\u4f1a\u667a\u80fd\uff0c\u7279\u522b\u662f\u5728\u8ba4\u77e5\u504f\u5dee\u65b9\u9762\u3002\u6211\u4eec\u5728CogMir\u5b50\u96c6\u4e0a\u7684\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u5728\u4e0d\u786e\u5b9
a\u60c5\u5883\u4e0b\uff0cLLMs\u548c\u4eba\u7c7b\u5728\u975e\u7406\u6027\u53ca\u4eb2\u793e\u4f1a\u51b3\u7b56\u4e0a\u8868\u73b0\u51fa\u9ad8\u5ea6\u4e00\u81f4\u6027\uff0c\u8fd9\u8868\u660eLLMs\u4f5c\u4e3a\u793e\u4f1a\u5b9e\u4f53\u7684\u4eb2\u793e\u4f1a\u6027\uff0c\u5e76\u7a81\u663e\u4e86\u5e7b\u89c9\u7279\u6027\u7684\u5173\u952e\u4f5c\u7528\u3002\u6b64\u5916\uff0cCogMir\u6846\u67b6\u5c55\u793a\u4e86\u5176\u4f5c\u4e3a\u7814\u7a76LLMs\u793e\u4f1a\u667a\u80fd\u7684\u6709\u4ef7\u503c\u5e73\u53f0\u7684\u6f5c\u529b\u3002|\n", "2405.13547": "|**2024-05-22**|**HighwayLLM: Decision-Making and Navigation in Highway Driving with RL-Informed Language Model**|Mustafa Yildirim et.al.|[2405.13547](http://arxiv.org/abs/2405.13547)|null|\u81ea\u52a8\u9a7e\u9a76\u662f\u4e00\u4e2a\u590d\u6742\u7684\u4efb\u52a1\uff0c\u5b83\u9700\u8981\u5148\u8fdb\u7684\u51b3\u7b56\u548c\u63a7\u5236\u7b97\u6cd5\u3002\u7406\u89e3\u81ea\u52a8\u9a7e\u9a76\u8f66\u8f86\u51b3\u7b56\u7684\u4f9d\u636e\u5bf9\u4e8e\u786e\u4fdd\u5176\u5728\u9ad8\u901f\u516c\u8def\u9a7e\u9a76\u4e2d\u7684\u5b89\u5168\u4e0e\u6709\u6548\u6027\u81f3\u5173\u91cd\u8981\u3002\u672c\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u79f0\u4e3aHighwayLLM\uff0c\u5b83\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u63a8\u7406\u80fd\u529b\u6765\u9884\u6d4bego\u8f66\u8f86\u7684\u672a\u6765\u5bfc\u822a\u8def\u5f84\u70b9\u3002\u8be5\u65b9\u6cd5\u8fd8\u91c7\u7528\u9884\u8bad\u7ec3\u7684\u5f3a\u5316\u5b66\u4e60\uff08RL\uff09\u6a21\u578b\u4f5c\u4e3a\u9ad8\u5c42\u6b21\u89c4\u5212\u5668\uff0c\u5bf9\u5408\u9002\u7684\u5143\u7ea7\u52a8\u4f5c\u8fdb\u884c\u51b3\u7b56\u3002HighwayLLM\u5c06RL\u6a21\u578b\u7684\u8f93\u51fa\u4e0e\u5f53\u524d\u72b6\u6001\u4fe1\u606f\u76f8\u7ed3\u5408\uff0c\u751f\u6210\u5b89\u5168\u3001\u65e0\u78b0\u649e\u4e14\u53ef\u89e3\u91ca\u7684\u672a\u6765\u72b6\u6001\u9884\u6d4b\uff0c\u4ece\u800c\u6784\u5efa\u51fa\u8f66\u8f86\u7684\u884c\u9a76\u8f68\u8ff9\u3002\u968f\u540e\u
ff0c\u57fa\u4e8ePID\u7684\u63a7\u5236\u5668\u5f15\u5bfc\u8f66\u8f86\u9075\u5faaLLM\u4ee3\u7406\u9884\u6d4b\u7684\u8def\u5f84\u70b9\u3002\u8fd9\u79cdLLM\u4e0eRL\u548cPID\u7684\u878d\u5408\u63d0\u5347\u4e86\u51b3\u7b56\u8fc7\u7a0b\uff0c\u5e76\u4e3a\u9ad8\u901f\u516c\u8def\u81ea\u52a8\u9a7e\u9a76\u63d0\u4f9b\u4e86\u53ef\u89e3\u91ca\u6027\u3002|\n", "2405.13050": "|**2024-05-19**|**Human-Centered LLM-Agent User Interface: A Position Paper**|Daniel Chin et.al.|[2405.13050](http://arxiv.org/abs/2405.13050)|**[link](https://github.com/daniel-chin/flute-x-gpt)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09-\u5728-\u73af\u5e94\u7528\u5df2\u663e\u793a\u51fa\u6709\u6548\u7406\u89e3\u7528\u6237\u547d\u4ee4\u3001\u5236\u5b9a\u8ba1\u5212\u5e76\u76f8\u5e94\u5730\u64cd\u4f5c\u5916\u90e8\u5de5\u5177/\u7cfb\u7edf\u7684\u6f5c\u529b\u3002\u7136\u800c\uff0cLLM\u4ee3\u7406\u7684\u64cd\u4f5c\u8303\u56f4\u5c40\u9650\u4e8e\u88ab\u52a8\u54cd\u5e94\u7528\u6237\uff0c\u9700\u8981\u7528\u6237\u6839\u636e\u5e95\u5c42\u5de5\u5177/\u7cfb\u7edf\u6765\u8868\u8ff0\u9700\u6c42\u3002\u6211\u4eec\u6ce8\u610f\u5230LLM\u4ee3\u7406\u7528\u6237\u754c\u9762\uff08LAUI\uff09\u7684\u6f5c\u529b\u8fdc\u672a\u5145\u5206\u5229\u7528\u3002\u7406\u60f3\u7684LAUI\u8bbe\u60f3\u4e2d\uff0c\u7528\u6237\u65e0\u9700\u6df1\u5165\u4e86\u89e3\u5de5\u5177/\u7cfb\u7edf\uff0c\u5c31\u80fd\u4e0e\u4e4b\u4ea4\u4e92\u4ee5\u63a2\u7d22\u65b0\u5174\u7684\u5de5\u4f5c\u6d41\u7a0b\u3002\u4e0d\u540c\u4e8e\u8bbe\u8ba1\u56fa\u5b9a\u7684\u53ef\u63a2\u7d22GUI\u6765\u6559\u6388\u7528\u6237\u4f7f\u7528\u7cfb\u7edf\u7684\u9884\u8bbe\u65b9\u5f0f\uff0cLAUI\u4e2d\u7684LLM\u4ee3\u7406\u4ece\u4e00\u5f00\u59cb\u5c31\u5bf9\u7cfb\u7edf\u719f\u7ec3\uff0c\u4e3b\u52a8\u5b66\u4e60\u7528\u6237\u53ca\u5176\u9700\u6c42\uff0c\u5e76\u5411\u7528\u6237\u63d0\u51fa\u65b0\u7684\u4e92\u52a8\u65b9\u6848\u3002\u4e3a\u4e86\u5c55\u793aLAUI\u7684\u6982\u5ff5\uff0c\u6211\u4eec\u63d0\u4f9b\u4e86\u4e00\u4e2a\u5177\u4f53\u4f8b\u5b50\uff1aFlute X 
GPT\uff0c\u5b83\u7ed3\u5408\u4e86LLM\u4ee3\u7406\u3001\u63d0\u793a\u7ba1\u7406\u5668\u548c\u4e00\u4e2a\u652f\u6301\u590d\u6742\u5b9e\u65f6\u4f53\u9a8c\u7684\u7b1b\u5b50\u6559\u5b66\u591a\u5a92\u4f53\u8f6f\u786c\u4ef6\u7cfb\u7edf\uff0c\u65e8\u5728\u7b80\u5316\u5b66\u4e60\u5439\u594f\u7b1b\u5b50\u7684\u8fc7\u7a0b\u3002|\n", "2405.13009": "|**2024-05-13**|**METAREFLECTION: Learning Instructions for Language Agents using Past Reflections**|Priyanshu Gupta et.al.|[2405.13009](http://arxiv.org/abs/2405.13009)|null|\u5c3d\u7ba1\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5e7f\u53d7\u6b22\u8fce\uff0c\u4f46\u4e3a\u5176\u6267\u884c\u7279\u5b9a\u4efb\u52a1\u8bbe\u8ba1\u7cbe\u786e\u7684\u63d0\u793a\u4ecd\u662f\u4e00\u4e2a\u6311\u6218\u3002\u7528\u6237\u901a\u5e38\u9700\u8981\u4e0e\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u8fdb\u884c\u591a\u8f6e\u5bf9\u8bdd\u4ee5\u8fbe\u6210\u76ee\u6807\u3002\u8fd1\u671f\u7814\u7a76\u663e\u793a\uff0c\u6a21\u578b\u81ea\u8eab\u7684\u53cd\u9988\uff0c\u5373\u81ea\u53cd\u601d\uff0c\u80fd\u5728\u5bf9\u8bdd\u8fc7\u7a0b\u4e2d\u8d77\u5230\u5f3a\u5316\u4f5c\u7528\uff0c\u6709\u52a9\u4e8e\u66f4\u5feb\u5730\u8fbe\u5230\u671f\u671b\u7ed3\u679c\u3002\u9274\u4e8e\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\u2014\u2014METAREFLECTION\uff0c\u5b83\u80fd\u4ece\u8bad\u7ec3\u9636\u6bb5\u6536\u96c6\u5230\u7684\u4e2a\u4f53\u81ea\u53cd\u601d\u4e2d\u5b66\u4e60\u7279\u5b9a\u9886\u57df\u7684\u901a\u7528\u63d0\u793a\u6307\u4ee4\u3002\u6211\u4eec\u5728\u57fa\u7840\u8bbe\u65bd\u5373\u4ee3\u7801\uff08IAC\uff09\u6f0f\u6d1e\u68c0\u6d4b\u548c\u95ee\u9898\u89e3\u7b54\uff08QA\uff09\u9886\u57df\uff0c\u4f7f\u7528REACT\u548cCOT\u8fdb\u884c\u4e86\u5b9e\u9a8c\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cMETAREFLECTION\u663e\u8457\u4f18\u4e8eGPT-4\uff0c\u5206\u522b\u5728IAC\u3001COT\u548cREACT\u4e2d\u7684\u6027\u80fd\u63d0\u5347\u5206\u522b\u4e3a16.82%\u300131.33%\u548c15.42%\uff0c\u8fd9\u8868\u660eMETAREFLECTION\u6709\u6f5c\u529b\u63d0\u5347LLMs
\u7684\u6548\u7387\uff0c\u662f\u4e00\u79cd\u503c\u5f97\u63a2\u7d22\u7684\u7b56\u7565\u3002|\n", "2405.15414": "|**2024-05-24**|**Luban: Building Open-Ended Creative Agents via Autonomous Embodied Verification**|Yuxuan Guo et.al.|[2405.15414](http://arxiv.org/abs/2405.15414)|null|\u5728\u4eba\u5de5\u667a\u80fd\u7814\u7a76\u4e2d\uff0c\u6784\u5efa\u5f00\u653e\u578b\u4ee3\u7406\u4e00\u76f4\u4ee5\u6765\u90fd\u662f\u7ec8\u6781\u76ee\u6807\uff0c\u7279\u522b\u662f\u521b\u9020\u6027\u7684\u4ee3\u7406\u66f4\u5177\u5438\u5f15\u529b\u3002\u73b0\u6709\u7684\u5927\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u6267\u884c\u6709\u660e\u786e\u76ee\u6807\u7684\u957f\u5e8f\u5217\u4efb\u52a1\uff08\u5982\u300a\u6211\u7684\u4e16\u754c\u300b\u4e2d\u7684\u201c\u5f00\u91c7\u94bb\u77f3\u201d\uff09\u4e0a\u8868\u73b0\u51fa\u8272\u3002\u7136\u800c\uff0c\u5b83\u4eec\u5728\u5904\u7406\u5177\u6709\u5f00\u653e\u76ee\u6807\u548c\u62bd\u8c61\u6807\u51c6\u7684\u521b\u9020\u6027\u4efb\u52a1\u65f6\u9047\u5230\u56f0\u96be\uff0c\u56e0\u4e3a\u5b83\u4eec\u65e0\u6cd5\u5f25\u5408\u8fd9\u4e9b\u4efb\u52a1\u4e4b\u95f4\u7684\u9e3f\u6c9f\uff0c\u4ece\u800c\u7f3a\u4e4f\u81ea\u6211\u6539\u8fdb\u6765\u89e3\u51b3\u95ee\u9898\u7684\u53cd\u9988\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u7684\u5de5\u4f5c\u5f15\u5165\u4e86\u81ea\u4e3b\u5b9e\u4f53\u9a8c\u8bc1\u6280\u672f\uff0c\u4ee5\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u4e3a\u521b\u9020\u6027\u4efb\u52a1\u5960\u5b9a\u4e86\u57fa\u7840\u3002\u7279\u522b\u5730\uff0c\u6211\u4eec\u63d0\u51fa\u4e86Luban\u4ee3\u7406\uff0c\u4e13\u6ce8\u4e8e\u300a\u6211\u7684\u4e16\u754c\u300b\u4e2d\u7684\u521b\u9020\u6027\u5efa\u7b51\u4efb\u52a1\uff0c\u5b83\u914d\u5907\u4e86\u4e24\u7ea7\u81ea\u4e3b\u5b9e\u4f53\u9a8c\u8bc1\uff0c\u7075\u611f\u6765\u6e90\u4e8e\u4eba\u7c7b\u8bbe\u8ba1\u5b9e\u8df5\uff1a\uff081\uff09\u89c6\u89c9\u9a8c\u8bc13D\u7ed3\u6784\u63a8\u6d4b\uff0c\u901a\u8fc7\u4ee3\u7406\u81ea\u52a8\u751f\u6210\u7684CAD\u5efa\u6a21\u7a0b\u5e8f\u5b9e\u73b0\uff1b\uff082\uff09\u5b9e\u7528\u9a8c\u8bc1\
uff0c\u6839\u636e\u62bd\u8c61\u6807\u51c6\u751f\u6210\u5e76\u9a8c\u8bc1\u4e0e\u73af\u5883\u76f8\u5173\u7684\u529f\u80fd\u7a0b\u5e8f\u3002\u5e7f\u6cdb\u7684\u591a\u7ef4\u5ea6\u4eba\u7c7b\u7814\u7a76\u548cElo\u8bc4\u7ea7\u663e\u793a\uff0cLuban\u80fd\u591f\u5728\u6211\u4eec\u63d0\u51fa\u7684\u57fa\u51c6\u4e2d\u5b8c\u6210\u591a\u6837\u5316\u7684\u521b\u9020\u6027\u5efa\u7b51\u4efb\u52a1\uff0c\u5e76\u5728\u53ef\u89c6\u5316\u548c\u5b9e\u7528\u6027\u65b9\u9762\u5206\u522b\u6bd4\u5176\u4ed6\u57fa\u7ebf\u63d0\u9ad8\u4e8633%\u5230100%\u3002\u6b64\u5916\uff0c\u5b9e\u73b0\u5728\u771f\u5b9e\u4e16\u754c\u673a\u5668\u4eba\u624b\u81c2\u4e0a\u7684\u6f14\u793a\u5c55\u793a\u4e86Luban\u5728\u7269\u7406\u4e16\u754c\u4e2d\u7684\u521b\u4f5c\u6f5c\u529b\u3002|\n", "2405.15145": "|**2024-05-24**|**CulturePark: Boosting Cross-cultural Understanding in Large Language Models**|Cheng Li et.al.|[2405.15145](http://arxiv.org/abs/2405.15145)|null|\u7531\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u666e\u904d\u5b58\u5728\u6587\u5316\u504f\u89c1\uff0c\u4e3b\u8981\u6e90\u4e8e\u7f3a\u4e4f\u4ee3\u8868\u4e0d\u540c\u6587\u5316\u7684\u4ee3\u8868\u6027\u6570\u636e\u3002\u4f20\u7edf\u7684\u6587\u5316\u6570\u636e\u96c6\u548c\u57fa\u51c6\u901a\u5e38\u901a\u8fc7\u4ece\u73b0\u6709\u6570\u636e\u96c6\u4e2d\u63d0\u53d6\u6216\u805a\u5408\u6765\u81ea\u7ef4\u57fa\u767e\u79d1\u548c\u793e\u4ea4\u5a92\u4f53\u7684\u4fe1\u606f\u6784\u5efa\uff0c\u4f46\u8fd9\u79cd\u65b9\u6cd5\u4f9d\u8d56\u4e8e\u73b0\u5b9e\u4e16\u754c\u7684\u6570\u636e\u548c\u4eba\u5de5\u6807\u6ce8\uff0c\u6210\u672c\u9ad8\u4e14\u96be\u4ee5\u6269\u5c55\u3002\u672c\u6587\u501f\u9274\u8ba4\u77e5\u793e\u4f1a\u4ea4\u6d41\u7406\u8bba\uff0c\u63d0\u51faCulturePark\uff0c\u4e00\u4e2a\u5229\u7528LLMs\u7684\u591a\u4ee3\u7406\u6c9f\u901a\u6846\u67b6\uff0c\u7528\u4e8e\u6587\u5316\u6570\u636e\u6536\u96c6\u3002CulturePark\u901a\u8fc7\u6a21\u62df\u4e0d\u540c\u6587\u5316\u80cc\u666f\u4e0b\u7684\u4eba\u7c7b\u4ea4\u6d41\uff0c\u8ba9\u57fa\u4e8eLLM\u7684\u4ee3\u7406
\u89d2\u8272\u626e\u6f14\uff0c\u751f\u6210\u5305\u542b\u4eba\u7c7b\u4fe1\u5ff5\u3001\u89c4\u8303\u548c\u4e60\u4fd7\u7684\u9ad8\u8d28\u91cf\u8de8\u6587\u5316\u5bf9\u8bdd\u3002\u6211\u4eec\u4f7f\u7528CulturePark\u751f\u6210\u4e8641,000\u4e2a\u6587\u5316\u6837\u672c\uff0c\u5bf9\u516b\u79cd\u7279\u5b9a\u6587\u5316\u8fdb\u884c\u4e86\u6a21\u578b\u5fae\u8c03\u3002\u5728\u4e09\u9879\u4e0b\u6e38\u4efb\u52a1\u8bc4\u4f30\u4e2d\uff0c\u8fd9\u4e9b\u6a21\u578b\u7684\u8868\u73b0\u4f18\u4e8eGPT-4\uff1a\u5185\u5bb9\u8fc7\u6ee4\u3001\u6587\u5316\u4e00\u81f4\u6027\uff08\u5728\u970d\u592b\u65af\u6cf0\u5fb7\u6587\u5316\u7ef4\u5ea6\u91cf\u8868\u4e0a\uff09\u548c\u6587\u5316\u6559\u80b2\u3002\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684GPT-3.5\u6a21\u578b\u5728\u5185\u5bb9\u8fc7\u6ee4\u4efb\u52a1\u4e0a\u4e0eGPT-4\u76f8\u5f53\u6216\u4f18\u4e8e\u5b83\uff1b\u5728\u6587\u5316\u4e00\u81f4\u6027\u65b9\u9762\uff0c\u6211\u4eec\u7684\u6a21\u578b\u5728\u970d\u592b\u65af\u6cf0\u5fb7\u6587\u5316\u7ef4\u5ea6\u91cf\u886813\u6846\u67b6\u4e0a\u8d85\u8d8aGPT-4\uff1b\u5728\u4eba\u7c7b\u53c2\u4e0e\u8005\u7684\u6587\u5316\u6559\u80b2\u6548\u679c\u548c\u7528\u6237\u4f53\u9a8c\u4e0a\uff0c\u6211\u4eec\u7684\u6a21\u578b\u4e5f\u8868\u73b0\u51fa\u8272\u3002CulturePark\u5bf9\u4e8e\u51cf\u5c11\u6587\u5316\u504f\u89c1\u548c\u63a8\u52a8AI\u7684\u6c11\u4e3b\u5316\u5177\u6709\u91cd\u8981\u610f\u4e49\uff0c\u5f3a\u8c03\u4e86\u6587\u5316\u5305\u5bb9\u6027\u6570\u636e\u5728\u6a21\u578b\u8bad\u7ec3\u4e2d\u7684\u5173\u952e\u4f5c\u7528\u3002|\n", "2405.14918": "|**2024-05-23**|**AnalogCoder: Analog Circuit Design via Training-Free Code Generation**|Yao Lai et.al.|[2405.14918](http://arxiv.org/abs/2405.14918)|**[link](https://github.com/laiyao1/AnalogCoder)**|
\u5728\u73b0\u4ee3\u82af\u7247\u6280\u672f\u4e2d\uff0c\u6a21\u62df\u7535\u8def\u8bbe\u8ba1\u662f\u4e00\u4e2a\u5173\u952e\u4efb\u52a1\uff0c\u5b83\u6d89\u53ca\u7ec4\u4ef6\u9009\u62e9\u3001\u8fde\u63a5\u548c\u53c2\u6570\u8bbe\u7f6e\u4ee5\u786e\u4fdd\u7535\u8def\u529f\u80fd\u6b63\u5e38\u3002\u5c3d\u7ba1\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u6570\u5b57\u7535\u8def\u8bbe\u8ba1\u65b9\u9762\u53d6\u5f97\u4e86\u8fdb\u6b65\uff0c\u4f46\u6a21\u62df\u7535\u8def\u7684\u590d\u6742\u6027\u548c\u6570\u636e\u7a00\u7f3a\u6027\u5e26\u6765\u4e86\u6311\u6218\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63a8\u51fa\u4e86AnalogCoder\uff0c\u8fd9\u662f\u9996\u4e2a\u65e0\u9700\u8bad\u7ec3\u7684LLM\u4ee3\u7406\uff0c\u4e13\u4e3a\u901a\u8fc7Python\u4ee3\u7801\u751f\u6210\u6765\u8bbe\u8ba1\u6a21\u62df\u7535\u8def\u3002\u9996\u5148\uff0cAnalogCoder\u91c7\u7528\u53cd\u9988\u589e\u5f3a\u6d41\u7a0b\uff0c\u5e76\u7ed3\u5408\u5b9a\u5236\u7684\u9886\u57df\u7279\u5b9a\u63d0\u793a\uff0c\u80fd\u591f\u81ea\u52a8\u4e14\u81ea\u6211\u6821\u6b63\u5730\u8bbe\u8ba1\u6a21\u62df\u7535\u8def\uff0c\u6210\u529f\u7387\u9ad8\u3002\u5176\u6b21\uff0c\u5b83\u63d0\u51fa\u4e86\u4e00\u5957\u7535\u8def\u5de5\u5177\u5e93\uff0c\u7528\u4e8e\u5b58\u50a8\u6210\u529f\u7684\u7535\u8def\u8bbe\u8ba1\u4f5c\u4e3a\u53ef\u91cd\u7528\u7684\u6a21\u5757\u5316\u5b50\u7535\u8def\uff0c\u7b80\u5316\u4e86\u590d\u5408\u7535\u8def\u7684\u521b\u5efa\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cAnalogCoder\u5728\u5e7f\u6cdb\u8986\u76d6\u6a21\u62df\u7535\u8def\u4efb\u52a1\u7684\u57fa\u51c6\u6d4b\u8bd5\u4e0a\u8d85\u8d8a\u4e86\u5176\u4ed6\u57fa\u4e8eLLM\u7684\u65b9\u6cd5\uff0c\u6210\u529f\u8bbe\u8ba1\u4e8620\u4e2a\u7535\u8def\uff0c\u6bd4\u6807\u51c6GPT-4o\u591a\u51fa5\u4e2a\u3002\u6211\u4eec\u76f8\u4fe1AnalogCoder\u80fd\u663e\u8457\u63d0\u5347\u82af\u7247\u8bbe\u8ba1\u8fc7\u7a0b\u7684\u6548\u7387\uff0c\u8ba9\u975e\u4e13\u5bb6\u4e5f\u80fd\u9ad8\u6548\u8bbe\u8ba1\u6a21\u62df\u7535\u8def\u3002\u76f8\u5173\u7684\u4ee3\u7801\u548c\u57fa\u51c6\u5df
2\u63d0\u4f9b\u5728\uff1a[https://github.com/anonyanalog/AnalogCoder](https://github.com/anonyanalog/AnalogCoder)\u3002|\n", "2405.17424": "|**2024-05-27**|**LARM: Large Auto-Regressive Model for Long-Horizon Embodied Intelligence**|Zhuoling Li et.al.|[2405.17424](http://arxiv.org/abs/2405.17424)|null|\u7531\u4e8e\u9700\u8981\u4e0e\u73b0\u5b9e\u4e16\u754c\u4e92\u52a8\uff0cEmbodied agent \u9700\u8981\u5177\u5907\u4e30\u5bcc\u7684\u5148\u9a8c\u77e5\u8bc6\u3001\u957f\u8fdc\u89c4\u5212\u80fd\u529b\u4ee5\u53ca\u5feb\u901f\u7684\u54cd\u5e94\u901f\u5ea6\u3002\u5c3d\u7ba1\u6700\u8fd1\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u6027\u80fd\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5b83\u4eec\u4ecd\u5b58\u5728\u5c40\u9650\u6027\uff0c\u4f8b\u5982\uff0cLLM\u7684\u8f93\u51fa\u901a\u5e38\u662f\u63cf\u8ff0\u6027\u7684\u53e5\u5b50\uff0c\u5728\u51b3\u5b9a\u5177\u4f53\u884c\u52a8\u65f6\u53ef\u80fd\u4ea7\u751f\u6b67\u4e49\u3002\u4e3a\u4e86\u514b\u670d\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u5927\u578b\u81ea\u56de\u5f52\u6a21\u578b\uff08LARM\uff09\u3002LARM\u5229\u7528\u6587\u672c\u548c\u591a\u89c6\u89d2\u56fe\u50cf\u4f5c\u4e3a\u8f93\u5165\uff0c\u5e76\u4ee5\u81ea\u56de\u5f52\u7684\u65b9\u5f0f\u9884\u6d4b\u540e\u7eed\u52a8\u4f5c\u3002\u4e3a\u4e86\u8bad\u7ec3
LARM\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6570\u636e\u683c\u5f0f\u2014\u2014\u81ea\u56de\u5f52\u8282\u70b9\u4f20\u8f93\u7ed3\u6784\uff0c\u5e76\u6784\u5efa\u4e86\u76f8\u5e94\u7684\u6570\u636e\u96c6\u3002\u901a\u8fc7\u4e24\u9636\u6bb5\u7684\u8bad\u7ec3\u7b56\u7565\uff0cLARM\u6210\u529f\u5728\u300a\u6211\u7684\u4e16\u754c\u300b\uff08Minecraft\uff09\u4e2d\u6536\u96c6\u9b54\u6cd5\u88c5\u5907\uff0c\u8fd9\u6bd4\u5148\u524d\u6700\u4f73\u65b9\u6cd5\u7684\u6700\u9ad8\u6210\u5c31\u9700\u8981\u66f4\u4e3a\u590d\u6742\u7684\u51b3\u7b56\u94fe\u3002\u6b64\u5916\uff0cLARM\u7684\u901f\u5ea6\u6bd4\u73b0\u6709\u6700\u5feb\u65b9\u6cd5\u5feb\u51fa\u4e866.8\u500d\u3002|\n", "2405.16510": "|**2024-05-30**|**Meta-Task Planning for Language Agents**|Cong Zhang et.al.|[2405.16510](http://arxiv.org/abs/2405.16510)|null|\u795e\u7ecf\u8bed\u8a00\u6a21\u578b\u7684\u5feb\u901f\u53d1\u5c55\u63a8\u52a8\u4e86\u667a\u80fd\u4ee3\u7406\u7814\u7a76\u7684\u65b0\u70ed\u6f6e\u3002\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4f5c\u4e3a\u5b9e\u73b0\u4eba\u5de5\u667a\u80fd\u901a\u7528\u6027\uff08AGI\uff09\u7684\u6709\u524d\u666f\u65b9\u6cd5\uff0c\u56e0\u5176\u51fa\u8272\u7684\u63a8\u7406\u548c\u6cdb\u5316\u80fd\u529b\u800c\u5907\u53d7\u77a9\u76ee\u3002\u5728\u5b9e\u9645\u4efb\u52a1\u4e2d\uff0c\u6709\u6548\u7684\u89c4\u5212\u5bf9LLM\u4ee3\u7406\u7684\u6210\u529f\u81f3\u5173\u91cd\u8981\u3002\u7136\u800c\uff0c\u5982\u4f55\u4e3a\u590d\u6742\u4efb\u52a1\u8bbe\u8ba1\u51fa\u53ef\u884c\u6216\u6700\u4f18\u7684\u7cbe\u7ec6\u7c92\u5ea6\u64cd\u4f5c\u5e8f\u5217\uff0c\u7279\u522b\u662f\u9700\u8981\u7ec4\u5408\u5927\u91cf\u5f02\u8d28\u884c\u52a8\u7684\u5e8f\u5217\uff0c\u4ecd\u662f\u6311\u6218\u3002\u672c\u6587\u63d0\u51faMeta-Task 
Planning\uff08MTP\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u96f6\u6837\u672c\u7684\u534f\u4f5c\u5f0fLLM\u591a\u4ee3\u7406\u7cfb\u7edf\u65b9\u6cd5\uff0c\u901a\u8fc7\u5c06\u590d\u6742\u4efb\u52a1\u5206\u89e3\u4e3a\u5b50\u4efb\u52a1\uff0c\u5373\u5143\u4efb\u52a1\uff0c\u7b80\u5316\u4e86\u4efb\u52a1\u89c4\u5212\u3002\u6bcf\u4e2a\u5143\u4efb\u52a1\u968f\u540e\u6620\u5c04\u4e3a\u53ef\u6267\u884c\u52a8\u4f5c\u3002\u5728TravelPlanner\u548cAPI-Bank\u4e24\u4e2a\u4e25\u683c\u57fa\u51c6\u4e0a\u8bc4\u4f30\u4e86MTP\u3002\u7ed3\u679c\u8868\u660e\uff0cMTP\u5728TravelPlanner\u4e0a\u7684\u5e73\u5747\u6210\u529f\u7387\u7ea6\u4e3a40%\uff0c\u8fdc\u8d85\u5f53\u524d\u6700\u4f73\u57fa\u7ebf\uff082.92%\uff09\uff0c\u5e76\u4e14\u5728API-Bank\u4e0a\u7684\u6027\u80fd\u6bd4\u4f7f\u7528ReAct\u7684LLM_{api}-4\u9ad8\u51fa\u7ea614%\uff0c\u8fd9\u663e\u793a\u51fa\u5c06LLM\u4e0e\u591a\u4ee3\u7406\u7cfb\u7edf\u76f8\u7ed3\u5408\u7684\u5de8\u5927\u6f5c\u529b\u3002|\n", "2405.16376": "|**2024-05-28**|**STRIDE: A Tool-Assisted LLM Agent Framework for Strategic and Interactive Decision-Making**|Chuanhao Li 
et.al.|[2405.16376](http://arxiv.org/abs/2405.16376)|**[link](https://github.com/cyrilli/stride)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08\u5982GPT-4\uff09\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u65b9\u9762\u5e26\u6765\u4e86\u9769\u547d\u6027\u53d8\u5316\uff0c\u5c55\u73b0\u51fa\u5353\u8d8a\u7684\u8bed\u8a00\u80fd\u529b\u548c\u63a8\u7406\u6280\u5de7\u3002\u7136\u800c\uff0c\u5728\u6218\u7565\u6027\u7684\u591a\u4ee3\u7406\u51b3\u7b56\u73af\u5883\u4e2d\uff0c\u5b83\u4eec\u9762\u4e34\u5c40\u9650\uff0c\u5982\u6570\u5b66\u63a8\u7406\u80fd\u529b\u5dee\u3001\u96be\u4ee5\u9075\u5faa\u6307\u4ee4\u548c\u751f\u6210\u9519\u8bef\u4fe1\u606f\u3002\u8fd9\u4e9b\u7f3a\u70b9\u9650\u5236\u4e86\u5b83\u4eec\u5728\u9075\u5b88\u590d\u6742\u6e38\u620f\u89c4\u5219\u3001\u957f\u671f\u89c4\u5212\u3001\u63a2\u7d22\u672a\u77e5\u73af\u5883\u4ee5\u53ca\u9884\u6d4b\u5bf9\u624b\u884c\u52a8\u7684\u4e92\u52a8\u4efb\u52a1\u4e2d\u7684\u8868\u73b0\u3002\u4e3a\u6b64\uff0c\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u578b\u7684\u7ed3\u5408\u4e86\u8bb0\u5fc6\u548c\u4e13\u4e1a\u5de5\u5177\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4ee3\u7406\u6846\u67b6\uff0c\u65e8\u5728\u63d0\u5347\u5176\u5728\u6218\u7565\u51b3\u7b56\u65b9\u9762\u7684\u6027\u80fd\u3002\u6211\u4eec\u7279\u522b\u5728\u53cc\u8fb9\u8c08\u5224\u3001\u591a\u4ee3\u7406\u52a8\u6001\u673a\u5236\u8bbe\u8ba1\u7b49\u7ecf\u6d4e\u91cd\u8981\u573a\u666f\u4e2d\u5e94\u7528\u8fd9\u4e9b\u5de5\u5177\uff0c\u5e76\u901a\u8fc7\u5b9a\u91cf\u6307\u6807\u8bc4\u4f30\u5728\u5404\u79cd\u6218\u7565\u51b3\u7b56\u95ee\u9898\u4e0a\u7684\u6548\u679c\u3002\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u589e\u5f3a\u6846\u67b6\u663e\u8457\u63d0\u9ad8\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u6218\u7565\u51b3\u7b56\u4e2d\u7684\u80fd\u529b\u3002\u5c3d\u7ba1\u5f53\u524d\u6a21\u578b\u5b58\u5728\u56fa\u6709\u5c40\u9650\uff0c\u4f46\u6211\u4eec\u901a\u8fc7\u6709\u9488\u5bf9\u6027\u7684\u589e\u5f3a\u5c55\u793a\u4e86\u6539\u8fdb\u7684\u53ef\u80fd\u6027\uff
0c\u8fd9\u4e3a\u672a\u6765\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u4ea4\u4e92\u73af\u5883\u4e2d\u7684\u5e94\u7528\u63d0\u4f9b\u4e86\u6709\u524d\u666f\u7684\u65b9\u5411\u3002**|\n", "2405.16334": "|**2024-05-29**|**Devil's Advocate: Anticipatory Reflection for LLM Agents**|Haoyu Wang et.al.|[2405.16334](http://arxiv.org/abs/2405.16334)|null|\u5728\u8fd9\u4e2a\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u901a\u8fc7\u8d4b\u4e88\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u81ea\u6211\u53cd\u601d\u80fd\u529b\uff0c\u589e\u5f3a\u4e86\u5176\u5728\u89e3\u51b3\u590d\u6742\u4efb\u52a1\u65f6\u7684\u4e00\u81f4\u6027\u548c\u9002\u5e94\u6027\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u4fc3\u4f7fLLM\u4ee3\u7406\u5c06\u7ed9\u5b9a\u7684\u4efb\u52a1\u5206\u89e3\u4e3a\u53ef\u7ba1\u7406\u7684\u5b50\u4efb\u52a1\uff08\u5373\u5236\u5b9a\u8ba1\u5212\uff09\uff0c\u5e76\u5728\u6267\u884c\u884c\u52a8\u4e4b\u524d\u6301\u7eed\u53cd\u601d\u53ef\u80fd\u7684\u5931\u8d25\u53ca\u5176\u8865\u6551\u63aa\u65bd\u3001\u6267\u884c\u540e\u4e0e\u5b50\u4efb\u52a1\u76ee\u6807\u5bf9\u9f50\u5e76\u8fdb\u884c\u5fc5\u8981\u7684\u56de\u6eaf\u4ee5\u786e\u4fdd\u5168\u529b\u4ee5\u8d74\u6267\u884c\u8ba1\u5212\uff0c\u4ee5\u53ca\u5728\u5b8c\u6210\u8ba1\u5212\u540e\u8fdb\u884c\u5168\u9762\u5ba1\u67e5\uff0c\u4ee5\u4fbf\u4e8e\u672a\u6765\u7b56\u7565\u7684\u4f18\u5316\u3002\u901a\u8fc7\u5728WebArena\u4e2d\u96f6\u6837\u672c\u5e94\u7528\u8fd9\u4e00\u65b9\u6cd5\u5904\u7406\u5b9e\u9645\u7684\u7f51\u7edc\u73af\u5883\u4efb\u52a1\uff0c\u6211\u4eec\u7684\u4ee3\u7406\u8868\u73b0\u51fa\u4f18\u4e8e\u73b0\u6709\u96f6\u6837\u672c\u65b9\u6cd5\u7684\u6027\u80fd\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u8fd9\u79cd\u57fa\u4e8e\u53cd\u601d\u7684\u7b56\u7565\u4e0d\u4ec5\u63d0\u5347\u4e86\u4ee3\u7406\u5e94\u5bf9\u672a\u9884\u89c1\u6311\u6218\u7684\u5bfc\u822a\u80fd\u529b\uff0c\u901a\u8fc7\u5f3a\u5927\u7684\u8ba1\u5212\u6267\u884c\u673a\u5236\uff0c\u8fd8\u63d0\u9ad8\u4e86\u6548\u7387\uff0c\u5
1cf\u5c11\u4e86\u5b9e\u73b0\u4efb\u52a1\u6240\u9700\u7684\u5c1d\u8bd5\u6b21\u6570\u548c\u8ba1\u5212\u4fee\u8ba2\u6b21\u6570\u3002|\n", "2405.16247": "|**2024-05-25**|**AutoManual: Generating Instruction Manuals by LLM Agents via Interactive Environmental Learning**|Minghao Chen et.al.|[2405.16247](http://arxiv.org/abs/2405.16247)|**[link](https://github.com/minghchen/automanual)**|\u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u6267\u884c\u5404\u79cd\u9886\u57df\u4efb\u52a1\uff0c\u5982\u673a\u5668\u4eba\u3001\u6e38\u620f\u548c\u7f51\u7edc\u5bfc\u822a\u65b9\u9762\u5c55\u73b0\u51fa\u6f5c\u529b\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u6a21\u578b\u901a\u5e38\u9700\u8981\u7cbe\u5fc3\u8bbe\u8ba1\u548c\u4e13\u5bb6\u7ea7\u63d0\u793a\u624d\u80fd\u9002\u5e94\u7279\u5b9a\u9886\u57df\u7684\u4efb\u52a1\uff0c\u8fd9\u9650\u5236\u4e86\u5b83\u4eec\u7684\u9002\u5e94\u6027\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86AutoManual\u6846\u67b6\uff0c\u8ba9LLMs\u80fd\u591f\u901a\u8fc7\u4e92\u52a8\u81ea\u4e3b\u6784\u5efa\u7406\u89e3\uff0c\u5e76\u9002\u5e94\u65b0\u73af\u5883\u3002AutoManual\u5c06\u73af\u5883\u77e5\u8bc6\u5206\u4e3a\u591a\u6837\u7684\u89c4\u5219\uff0c\u5e76\u901a\u8fc7\u4e24\u4e2a\u4ee3\u7406\u8fdb\u884c\u5728\u7ebf\u4f18\u5316\uff1a1\uff09\u89c4\u5212\u5668\u6839\u636e\u5f53\u524d\u89c4\u5219\u5236\u5b9a\u53ef\u64cd\u4f5c\u7684\u884c\u52a8\u8ba1\u5212\uff1b2\uff09\u6784\u5efa\u8005\u901a\u8fc7\u4e00\u4e2a\u7ed3\u6784\u5316\u7684\u89c4\u5219\u7cfb\u7edf\u66f4\u65b0\u89c4\u5219\uff0c\u4fc3\u8fdb\u5728\u7ebf\u89c4\u5219\u7ba1\u7406\u5e76\u4fdd\u6301\u5173\u952e\u7ec6\u8282\u3002\u4e3a\u4e86\u51cf\u5c11\u5728\u7ba1\u7406\u89c4\u5219\u65f6\u7684\u5e7b\u89c9\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u201c\u6848\u4f8b\u6761\u4ef6\u63d0\u793a\u201d\u7b56\u7565\u7528\u4e8e\u6784\u5efa\u8005\u3002\u6700\u7ec8\uff0c\u7f16\u8bd1\u5668\u4ee3\u7406\u5c06\u8fd9\u4e9b\u89c4\u5219\u6574\u5408\u6210\u4e00\u4efd\u5168\u9762\u7684\u624b\u518c\u3002\u8fd9\u4efd\u81ea\u6211\u751f\u6210\u7684\u
624b\u518c\u4e0d\u4ec5\u80fd\u63d0\u9ad8\u9002\u5e94\u6027\uff0c\u8fd8\u80fd\u6307\u5bfc\u5c0f\u578bLLMs\u7684\u89c4\u5212\uff0c\u540c\u65f6\u4fdd\u6301\u4eba\u7c7b\u53ef\u8bfb\u3002\u4ec5\u51ed\u4e00\u6b21\u7b80\u5355\u6f14\u793a\uff0cAutoManual\u663e\u8457\u63d0\u9ad8\u4e86\u4efb\u52a1\u6210\u529f\u7387\uff0cGPT-4-turbo\u4e0b\u8fbe\u523097.4%\uff0cGPT-3.5-turbo\u4e0b\u4e3a86.2%\u3002\u6e90\u4ee3\u7801\u5373\u5c06\u53d1\u5e03\u3002|\n", "2405.18208": "|**2024-05-28**|**A Human-Like Reasoning Framework for Multi-Phases Planning Task with Large Language Models**|Chengxing Xie et.al.|[2405.18208](http://arxiv.org/abs/2405.18208)|null|\u8fd1\u671f\u7684\u7814\u7a76\u5df2\u7ecf\u8868\u660e\uff0c\u8fd9\u4e9b\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u4e00\u4e9b\u7b80\u5355\u7684\u4efb\u52a1\u4e0a\uff0c\u5982\u5199\u4f5c\u548c\u7f16\u7801\uff0c\u5c55\u73b0\u51fa\u4e00\u5b9a\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u5b83\u4eec\u5728\u9700\u8981\u7efc\u5408\u89c4\u5212\u7684\u4efb\u52a1\u4e0a\u4ecd\u7136\u9762\u4e34\u6311\u6218\uff0c\u8fd9\u4ecd\u662f\u5f53\u524d\u6a21\u578b\u7684\u4e00\u4e2a\u91cd\u8981\u7814\u7a76\u95ee\u9898\u3002\u672c\u7814\u7a76\u805a\u7126\u4e8e\u65c5\u884c\u89c4\u5212\uff0c\u8fd9\u662f\u4e00\u4e2a\u6d89\u53ca\u591a\u4e2a\u9636\u6bb5\u7684\u590d\u6742\u95ee\u9898\uff0c\u5305\u62ec\u63d0\u7eb2\u3001\u4fe1\u606f\u6536\u96c6\u548c\u89c4\u5212\uff0c\u901a\u5e38\u4f34\u968f\u7740\u5404\u79cd\u7ea6\u675f\u548c\u4e0d\u786e\u5b9a\u6027\u3002\u73b0\u6709\u7684\u63a8\u7406\u65b9\u6cd5\u5728\u5904\u7406\u8fd9\u7c7b\u95ee\u9898\u65f6\u6548\u679c\u4e0d\u4f73\u3002\u6211\u4eec\u7684\u76ee\u6807\u662f\u901a\u8fc7\u5f00\u53d1\u4e00\u79cd\u7c7b\u4f3c\u4eba\u7c7b\u7684\u89c4\u5212\u6846\u67b6\uff0c\u5f15\u5bfc\u5927\u578b\u8bed\u8a00\u6a21\u578b\u6a21\u4eff\u4eba\u7c7b\u89e3\u51b3\u591a\u9636\u6bb5\u95ee\u9898\u7684\u6b65\u9aa4\uff0c\u4ee5\u63d0\u5347\u5176\u80fd\u529b\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u5b9e\u65bd\u7b56\u7565\uff0c\u8ba9\u6a21\u578b\u80
fd\u4e3a\u6bcf\u4e2a\u65c5\u884c\u67e5\u8be2\u751f\u6210\u8fde\u8d2f\u7684\u63d0\u7eb2\uff0c\u6a21\u62df\u4eba\u7c7b\u7684\u89c4\u5212\u6a21\u5f0f\u3002\u6211\u4eec\u8fd8\u5f15\u5165\u4e86\u7b56\u7565\u5757\u548c\u77e5\u8bc6\u5757\u5230\u6846\u67b6\u4e2d\uff1a\u7b56\u7565\u5757\u5e2e\u52a9\u4fe1\u606f\u641c\u96c6\uff0c\u800c\u77e5\u8bc6\u5757\u63d0\u4f9b\u8be6\u7ec6\u89c4\u5212\u6240\u9700\u7684\u5fc5\u8981\u4fe1\u606f\u3002\u5b9e\u9a8c\u7ed3\u679c\u5168\u9762\u5c55\u793a\u4e86\u6211\u4eec\u6846\u67b6\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\u89c4\u5212\u80fd\u529b\u7684\u663e\u8457\u63d0\u5347\uff0c\u4f7f\u5176\u5728\u5904\u7406\u65c5\u884c\u89c4\u5212\u4efb\u52a1\u65f6\u6548\u7387\u548c\u6548\u679c\u90fd\u6709\u6240\u63d0\u9ad8\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u5f53\u4e0eGPT-4-Turbo\u7ed3\u5408\u65f6\uff0c\u6211\u4eec\u7684\u6846\u67b6\u76f8\u8f83\u4e8e\u57fa\u7840\u6846\u67b6\u5728GPT-4-Turbo\u4e0a\u7684\u6027\u80fd\u63d0\u5347\u4e8610\u500d\u3002|\n", "2405.18113": "|**2024-05-28**|**Facilitating Multi-Role and Multi-Behavior Collaboration of Large Language Models for Online Job Seeking and Recruiting**|Hongda Sun 
et.al.|[2405.18113](http://arxiv.org/abs/2405.18113)|null|\u968f\u7740\u5728\u7ebf\u62db\u8058\u670d\u52a1\u7684\u5174\u8d77\uff0c\u4f20\u7edf\u7684\u6c42\u804c\u548c\u62db\u8058\u65b9\u5f0f\u53d1\u751f\u4e86\u53d8\u9769\uff0c\u8feb\u5207\u9700\u8981\u5f00\u53d1\u9ad8\u8d28\u91cf\u7684\u5de5\u4e1a\u5e94\u7528\u6765\u63d0\u5347\u6c42\u804c\u8005\u4e0e\u804c\u4f4d\u7684\u5339\u914d\u5ea6\u3002\u73b0\u6709\u7684\u65b9\u6cd5\u4e3b\u8981\u4f9d\u8d56\u4e8e\u7b80\u5386\u548c\u804c\u4f4d\u63cf\u8ff0\u7684\u6f5c\u5728\u8bed\u4e49\u5efa\u6a21\uff0c\u5b66\u4e60\u4e24\u8005\u4e4b\u95f4\u7684\u5339\u914d\u51fd\u6570\u3002\u53d7\u5230\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u89d2\u8272\u626e\u6f14\u65b9\u9762\u5f3a\u5927\u80fd\u529b\u7684\u542f\u53d1\uff0c\u6211\u4eec\u63d0\u51fa\u5f15\u5165LLMs\u6a21\u62df\u9762\u8bd5\u73af\u8282\uff0c\u8ba9\u5176\u4e0e\u6c42\u804c\u8005\u8fdb\u884c\u5bf9\u8bdd\uff0c\u8fd9\u53ef\u4ee5\u4e3a\u5019\u9009\u4eba\u8bc4\u4f30\u63d0\u4f9b\u989d\u5916\u8bc1\u636e\uff0c\u4ece\u800c\u589e\u5f3a\u4ec5\u57fa\u4e8e\u7b80\u5386\u548c\u804c\u4f4d\u63cf\u8ff0\u7684\u4e2a\u6027\u5316\u5339\u914d\u3002\u7136\u800c\uff0c\u5728\u7f51\u7edc\u62db\u8058\u4e2d\u7684\u9762\u8bd5\u5b98\u548c\u6c42\u804c\u8005\u89d2\u8272\u5851\u9020\u4ecd\u9762\u4e34\u6311\u6218\uff0c\u5982\u63d0\u95ee\u6280\u5de7\u3001\u56de\u7b54\u6784\u5efa\u4ee5\u53ca\u53cc\u5411\u5339\u914d\u5ea6\u8bc4\u4f30\u3002 
\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51faMockLLM\uff0c\u4e00\u4e2a\u521b\u65b0\u7684\u6846\u67b6\uff0c\u5c06\u4eba\u804c\u5339\u914d\u8fc7\u7a0b\u5212\u5206\u4e3a\u4e24\u4e2a\u6a21\u5757\uff1a\u6a21\u62df\u9762\u8bd5\u751f\u6210\u548c\u63e1\u624b\u534f\u8bae\u4e2d\u7684\u53cc\u5411\u8bc4\u4f30\uff0c\u901a\u8fc7\u9762\u8bd5\u5b98\u548c\u6c42\u804c\u8005\u4e4b\u95f4\u7684\u534f\u4f5c\u884c\u4e3a\u5171\u540c\u63d0\u5347\u6027\u80fd\u3002\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u591a\u89d2\u8272\u3001\u591a\u884c\u4e3a\u7684\u6846\u67b6\uff0c\u4f7f\u5355\u4e00\u7684LLM\u4ee3\u7406\u80fd\u6709\u6548\u5730\u626e\u6f14\u53cc\u65b9\u7684\u4e0d\u540c\u804c\u80fd\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u53cd\u601d\u8bb0\u5fc6\u751f\u6210\u548c\u52a8\u6001\u63d0\u793a\u4fee\u6539\u6280\u672f\uff0c\u4ee5\u4f18\u5316\u53cc\u65b9\u7684\u884c\u4e3a\uff0c\u6301\u7eed\u4f18\u5316\u9644\u52a0\u7684\u8bc4\u4f30\u8bc1\u636e\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cMockLLM\u5728\u4eba\u804c\u5339\u914d\u4e0a\u7684\u8868\u73b0\u6700\u4f18\uff0c\u4e14\u6a21\u62df\u9762\u8bd5\u8d28\u91cf\u9ad8\uff0c\u9884\u793a\u7740\u5b83\u5728\u672a\u6765\u5728\u7ebf\u62db\u8058\u4e2d\u7684\u5b9e\u9645\u5e94\u7528\u524d\u666f\u5e7f\u9614\u3002|\n", "2405.18092": "|**2024-05-28**|**LLM experiments with simulation: Large Language Model Multi-Agent System for Process Simulation Parametrization in Digital Twins**|Yuchen Xia 
et.al.|[2405.18092](http://arxiv.org/abs/2405.18092)|**[link](https://github.com/yuchenxia/llmdrivensimulation)**|**\u8be5\u8bba\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u591aagent\u7cfb\u7edf\u67b6\u6784\uff0c\u5c06\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5e94\u7528\u4e8e\u6570\u5b57\u5b6a\u751f\u8fc7\u7a0b\u6a21\u62df\u7684\u53c2\u6570\u81ea\u52a8\u5316\u3002\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u6846\u67b6\uff0c\u5305\u542b\u89c2\u5bdf\u3001\u63a8\u7406\u3001\u51b3\u7b56\u548c\u603b\u7ed3\u56db\u79cd\u7c7b\u578b\u7684\u4ee3\u7406\u3002\u901a\u8fc7\u5b9e\u73b0LLM\u4ee3\u7406\u4e0e\u6a21\u62df\u6a21\u578b\u7684\u52a8\u6001\u4ea4\u4e92\uff0c\u8be5\u7cfb\u7edf\u53ef\u4ee5\u81ea\u52a8\u63a2\u7d22\u53c2\u6570\u8bbe\u7f6e\uff0c\u5229\u7528\u542f\u53d1\u5f0f\u63a8\u7406\u786e\u5b9a\u4e00\u7ec4\u63a7\u5236\u6a21\u62df\u4ee5\u8fbe\u6210\u76ee\u6807\u7684\u53c2\u6570\u3002\u8fd9\u79cd\u65b9\u6cd5\u901a\u8fc7\u6ce8\u5165LLM\u7684\u542f\u53d1\u5f0f\uff0c\u589e\u5f3a\u6a21\u62df\u6a21\u578b\uff0c\u5e76\u652f\u6301\u81ea\u4e3b\u641c\u7d22\u4ee5\u89e3\u51b3\u7528\u6237\u4efb\u52a1\uff0c\u6709\u671b\u63d0\u9ad8\u7528\u6237\u4f53\u9a8c\u5e76\u51cf\u8f7b\u4eba\u7c7b\u7528\u6237\u5728\u590d\u6742\u51b3\u7b56\u8fc7\u7a0b\u4e2d\u7684\u8ba4\u77e5\u8d1f\u62c5\u3002\u7814\u7a76\u901a\u8fc7\u4e00\u4e2a\u6848\u4f8b\u7814\u7a76\u5c55\u793a\u4e86\u7cfb\u7edf\u7684\u6709\u6548\u6027\u4e0e\u529f\u80fd\uff0c\u5e76\u5728GitHub\u4ed3\u5e93\u63d0\u4f9b\u4e86\u53ef\u89c6\u5316\u7684\u6f14\u793a\u3002**|\n", "2405.17837": "|**2024-05-28**|**Enabling Generative Design Tools with LLM Agents for Building Novel Devices: A Case Study on Fluidic Computation Interfaces**|Qiuyu Lu 
et.al.|[2405.17837](http://arxiv.org/abs/2405.17837)|null|\u5728\u4eba\u673a\u4ea4\u4e92\uff08HCI\uff09\u9886\u57df\uff0c\u4ea4\u4e92\u8bbe\u5907\u7684\u8bbe\u8ba1\u5f00\u53d1\u662f\u5173\u952e\u5173\u6ce8\u70b9\u3002\u968f\u7740\u65b0\u578b\u786c\u4ef6\u548c\u5148\u8fdb\u5236\u9020\u6280\u672f\u7684\u5174\u8d77\uff0c\u5bf9\u80fd\u591f\u7b80\u5316\u539f\u578b\u5236\u4f5c\u8fc7\u7a0b\u7684\u4e13\u95e8\u8bbe\u8ba1\u5de5\u5177\u7684\u9700\u6c42\u65e5\u76ca\u589e\u957f\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u5de5\u5177\u867d\u7136\u901a\u8fc7\u53c2\u6570\u5316\u8bbe\u8ba1\u548c\u6a21\u62df\u7b80\u5316\u6d41\u7a0b\uff0c\u4f46\u5b66\u4e60\u66f2\u7ebf\u8f83\u9661\uff0c\u4e14\u5728\u6fc0\u53d1\u521b\u65b0\u601d\u7ef4\u65b9\u9762\u6709\u6240\u6b20\u7f3a\u3002\u672c\u7814\u7a76\u4ee5\u6d41\u4f53\u8ba1\u7b97\u754c\u9762\u4e3a\u4f8b\uff0c\u63a2\u8ba8\u5982\u4f55\u901a\u8fc7\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4ee3\u7406\u589e\u5f3a\u7269\u7406\u8bbe\u5907\u8bbe\u8ba1\u5de5\u5177\uff0c\u521b\u5efa\u4e00\u4e2a\u751f\u6210\u8bbe\u8ba1\u5de5\u5177\uff08GDT\uff09\u3002\u501f\u52a9LLM\uff0cGDT\u80fd\u591f\u7406\u89e3\u65b0\u8bbe\u5907\u7684\u7279\u6027\u548c\u5c40\u9650\uff0c\u63d0\u51fa\u591a\u6837\u3001\u5bcc\u6709\u6d1e\u5bdf\u529b\u4e14\u5b9e\u7528\u7684\u5e94\u7528\u573a\u666f\uff0c\u63a8\u8350\u6280\u672f\u548c\u60c5\u5883\u9002\u5b9c\u7684\u8bbe\u5907\u8bbe\u8ba1\uff0c\u5e76\u81ea\u52a8\u751f\u6210\u8bbe\u8ba1\u53c2\u6570\uff0c\u4ee5\u4fbf\u4f20\u7edf\u8bbe\u8ba1\u5de5\u5177\u5c55\u793a\u7ed3\u679c\u5e76\u751f\u6210\u52a0\u5de5\u6240\u9700\u7684\u6587\u4ef6\u3002\u672c\u6587\u9610\u8ff0\u4e86GDT\u7684\u6846\u67b6\u3001\u5b9e\u73b0\u548c\u6027\u80fd\uff0c\u5e76\u53cd\u601d\u5176\u524d\u666f\u53ca\u9047\u5230\u7684\u6311\u6218\u3002|\n", "2405.20267": "|**2024-05-30**|**Auto Arena of LLMs: Automating LLM Evaluations with Agent Peer-battles and Committee Discussions**|Ruochen Zhao 
et.al.|[2405.20267](http://arxiv.org/abs/2405.20267)|**[link](https://github.com/Auto-Arena/Auto-Arena-LLMs)**|**\u968f\u7740\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u65e5\u65b0\u6708\u5f02\uff0c\u8feb\u5207\u9700\u8981\u4e00\u79cd\u53ef\u9760\u4e14\u53ca\u65f6\u7684\u8bc4\u4f30\u65b9\u6cd5\u3002\u9274\u4e8e\u9759\u6001\u57fa\u51c6\u6613\u53d7\u6c61\u67d3\uff0c\u7528\u6237\u5f80\u5f80\u4f9d\u8d56\u4e8e\u50cfChatbot Arena\u8fd9\u6837\u7684\u4eba\u7c7b\u6295\u7968\u5e73\u53f0\u3002\u7136\u800c\uff0c\u4eba\u5de5\u6807\u6ce8\u9700\u8981\u5927\u91cf\u4eba\u529b\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u521b\u65b0\u6027\u5730\u63d0\u51faAuto-Arena\uff0c\u8fd9\u662f\u4e00\u79cd\u81ea\u52a8\u5316\u5168\u6d41\u7a0b\u7684LLM\u8bc4\u4f30\u6846\u67b6\u3002\u9996\u5148\uff0c\u7531\u8003\u5b98LLM\u8bbe\u8ba1\u95ee\u9898\uff1b\u63a5\u7740\uff0c\u5019\u9009LLMs\u56f4\u7ed5\u95ee\u9898\u8fdb\u884c\u591a\u8f6e\u76f8\u4e92\u5bf9\u51b3\uff0c\u66b4\u9732\u51fa\u5b83\u4eec\u7684\u771f\u5b9e\u6027\u80fd\u5dee\u8ddd\uff1b\u6700\u540e\uff0c\u7531LLM\u88c1\u5224\u96c6\u4f53\u8ba8\u8bba\u5e76\u51b3\u5b9a\u80dc\u8005\uff0c\u4ece\u800c\u51cf\u5c11\u504f\u89c1\uff0c\u63d0\u5347\u516c\u5e73\u6027\u3002\u6211\u4eec\u5728\u6700\u65b017\u6b3eLLMs\u4e0a\u7684\u5e7f\u6cdb\u5b9e\u9a8c\u663e\u793a\uff0cAuto-Arena\u4e0e\u4eba\u7c7b\u504f\u597d\u5177\u6709\u6700\u9ad8\u7684\u76f8\u5173\u6027\uff0c\u4e3a\u66ff\u4ee3\u4eba\u7c7b\u8bc4\u4ef7\u5e73\u53f0\u63d0\u4f9b\u4e86\u6709\u524d\u666f\u7684\u89e3\u51b3\u65b9\u6848\u3002**|\n", "2405.20189": "|**2024-05-30**|**Nadine: An LLM-driven Intelligent Social Robot with Affective Capabilities and Human-like Memory**|Hangyeol Kang 
et.al.|[2405.20189](http://arxiv.org/abs/2405.20189)|null|\u5728\u672c\u7814\u7a76\u4e2d\uff0c\u6211\u4eec\u9610\u8ff0\u4e86\u4e3aNadine\u793e\u4ea4\u673a\u5668\u4eba\u5e73\u53f0\u5f00\u53d1\u667a\u80fd\u548c\u5065\u58ee\u7684\u793e\u4ea4\u673a\u5668\u4eba\u7cfb\u7edf\u7684\u65b9\u6cd5\u3002\u6211\u4eec\u901a\u8fc7\u96c6\u6210\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\uff0c\u5de7\u5999\u5730\u5229\u7528\u8fd9\u4e9b\u6a21\u578b\u7684\u5f3a\u5927\u63a8\u7406\u548c\u6307\u4ee4\u6267\u884c\u80fd\u529b\uff0c\u4ee5\u5b9e\u73b0\u63a5\u8fd1\u4eba\u7c7b\u7684\u611f\u6027\u4e0e\u8ba4\u77e5\u80fd\u529b\u3002\u8fd9\u4e0e\u5f53\u524d\u57fa\u4e8eLLM\u7684\u667a\u80fd\u4f53\u76f8\u6bd4\u662f\u521b\u65b0\u7684\uff0c\u56e0\u4e3a\u5b83\u4eec\u901a\u5e38\u4e0d\u5177\u5907\u4eba\u7c7b\u5f0f\u7684\u957f\u671f\u8bb0\u5fc6\u6216\u590d\u6742\u7684\u60c5\u611f\u8bc4\u4f30\u529f\u80fd\u3002\u793e\u4ea4\u673a\u5668\u4eba\u7684\u81ea\u7136\u6027\u5728\u5f88\u5927\u7a0b\u5ea6\u4e0a\u53d6\u51b3\u4e8e\u7cfb\u7edf\u5404\u7ec4\u4ef6\u7684\u6027\u80fd\u548c\u534f\u540c\u5de5\u4f5c\u3002\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u7cfb\u7edf\uff0c\u80fd\u591f\u901a\u8fc7\u591a\u6a21\u6001\u8f93\u5165\u5904\u7406\u751f\u6210\u6070\u5f53\u7684\u884c\u4e3a\uff0c\u6839\u636e\u8bc6\u522b\u5230\u7684\u7528\u6237\u5f15\u5165\u76f8\u5173\u7684\u60c5\u666f\u8bb0\u5fc6\uff0c\u5e76\u6a21\u62df\u673a\u5668\u4eba\u5728\u4e0e\u4eba\u7c7b\u4f19\u4f34\u4e92\u52a8\u8fc7\u7a0b\u4e2d\u4ea7\u751f\u7684\u60c5\u7eea\u72b6\u6001\u3002\u7279\u522b\u662f\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u4e2a\u9488\u5bf9\u793e\u4ea4\u673a\u5668\u4eba\u7684LLM-agent\u6846\u67b6\uff0cSoR-ReAct\uff0c\u4f5c\u4e3a\u6211\u4eec\u7cfb\u7edf\u4e2d\u4ea4\u4e92\u6a21\u5757\u7684\u6838\u5fc3\u7ec4\u4ef6\u3002\u8fd9\u4e00\u8bbe\u8ba1\u63a8\u52a8\u4e86\u793e\u4ea4\u673a\u5668\u4eba\u6280\u672f\u7684\u53d1\u5c55\uff0c\u65e8\u5728\u63d0\u5347\u4eba\u673a\u4ea4\u4e92\u7684\u8d28\u91cf\u3002|\n", "2405.19425": "|**2024-05-29**|**Adaptive 
In-conversation Team Building for Language Model Agents**|Linxin Song et.al.|[2405.19425](http://arxiv.org/abs/2405.19425)|null|\u5728\u5904\u7406\u590d\u6742\u4efb\u52a1\u65f6\uff0c\u5229\u7528\u591a\u4e2a\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5c55\u73b0\u51fa\u524d\u666f\u3002\u7136\u800c\uff0c\u5982\u4f55\u4e3a\u7279\u5b9a\u5e94\u7528\u8bbe\u8ba1\u6709\u6548\u7684\u591a\u4ee3\u7406\u56e2\u961f\u4ecd\u662f\u4e00\u4e2a\u6311\u6218\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u52a8\u6001\u56e2\u961f\u6784\u5efa\u8303\u5f0f\uff0c\u540d\u4e3a\u201cCaptain Agent\u201d\u3002\u5b83\u901a\u8fc7\u521b\u65b0\u7684Agent\u8bbe\u8ba1\uff0c\u80fd\u591f\u81ea\u9002\u5e94\u5730\u4e3a\u6bcf\u4e2a\u95ee\u9898\u89e3\u51b3\u6b65\u9aa4\u7ec4\u5efa\u548c\u7ba1\u7406\u56e2\u961f\uff0c\u5229\u7528\u5d4c\u5957\u7fa4\u804a\u548c\u53cd\u601d\u673a\u5236\u786e\u4fdd\u591a\u5143\u5316\u7684\u4e13\u4e1a\u77e5\u8bc6\uff0c\u9632\u6b62\u523b\u677f\u8f93\u51fa\u3002\u8fd9\u79cd\u65b9\u6cd5\u63d0\u4f9b\u4e86\u7075\u6d3b\u4f46\u7ed3\u6784\u5316\u7684\u89e3\u51b3\u95ee\u9898\u65b9\u5f0f\uff0c\u6709\u52a9\u4e8e\u51cf\u5c11\u5197\u4f59\uff0c\u589e\u5f3a\u8f93\u51fa\u591a\u6837\u6027\u3002\u5728\u516d\u4e2a\u5b9e\u9645\u573a\u666f\u4e2d\u7684\u5168\u9762\u8bc4\u4f30\u663e\u793a\uff0cCaptain Agent\u663e\u8457\u4f18\u4e8e\u73b0\u6709\u591a\u4ee3\u7406\u65b9\u6cd5\uff0c\u5e73\u5747\u51c6\u786e\u7387\u63d0\u9ad8\u4e8621.94%\uff0c\u5e76\u4e14\u65e0\u9700\u9488\u5bf9\u7279\u5b9a\u4efb\u52a1\u8fdb\u884c\u7e41\u7410\u7684\u63d0\u793a\u5de5\u7a0b\uff0c\u8868\u73b0\u51fa\u8272\u3002|\n", "2406.01422": "|**2024-06-03**|**How to Understand Whole Software Repository?**|Yingwei Ma et.al.|[2406.01422](http://arxiv.org/abs/2406.01422)|null|## \u80cc\u666f
\u8fd1\u671f\uff0c\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u4ee3\u7406\u5728\u81ea\u52a8\u8f6f\u4ef6\u5de5\u7a0b\uff08ASE\uff09\u9886\u57df\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\u3002\u5c3d\u7ba1\u73b0\u6709\u65b9\u6cd5\u5df2\u8bc1\u5b9e\u6709\u6548\uff0c\u4f46\u5b83\u4eec\u7684\u8bbe\u8ba1\u4e3b\u8981\u4fa7\u91cd\u4e8e\u4ee3\u7801\u7684\u5c40\u90e8\u4fe1\u606f\uff0c\u5982\u95ee\u9898\u3001\u7c7b\u548c\u51fd\u6570\uff0c\u8fd9\u9650\u5236\u4e86\u5bf9\u8f6f\u4ef6\u7cfb\u7edf\u5168\u5c40\u4e0a\u4e0b\u6587\u548c\u4f9d\u8d56\u5173\u7cfb\u7684\u7406\u89e3\u3002\u6839\u636e\u8f6f\u4ef6\u5f00\u53d1\u4eba\u5458\u7684\u5b9e\u9645\u7ecf\u9a8c\uff0c\u6211\u4eec\u8ba4\u4e3a\u5168\u9762\u7406\u89e3\u6574\u4e2a\u4ed3\u5e93\u662f\u8fc8\u5411ASE\u7684\u5173\u952e\u3002\u7136\u800c\uff0c\u7406\u89e3\u6574\u4e2a\u4ed3\u5e93\u5e26\u6765\u4e86\u8bf8\u591a\u6311\u6218\uff0c\u4f8b\u5982\uff1a\u957f\u4ee3\u7801\u8f93\u5165\u3001\u566a\u58f0\u4ee3\u7801\u4fe1\u606f\u3001\u590d\u6742\u4f9d\u8d56\u5173\u7cfb\u7b49\u3002 \u4e3a\u4e86\u514b\u670d\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u7814\u53d1\u4e86\u4e00\u79cd\u540d\u4e3aRepoUnderstander\u7684\u65b0ASE\u65b9\u6cd5\uff0c\u901a\u8fc7\u5f15\u5bfc\u4ee3\u7406\u5168\u9762\u7406\u89e3\u6574\u4e2a\u4ed3\u5e93\u3002\u9996\u5148\uff0c\u6211\u4eec\u91c7\u7528\u81ea\u4e0a\u800c\u4e0b\u7684\u65b9\u5f0f\u5c06\u6574\u4e2a\u4ed3\u5e93\u7684\u5173\u952e\u4fe1\u606f\u538b\u7f29\u5230\u77e5\u8bc6\u56fe\u8c31\u4e2d\uff0c\u4ee5\u964d\u4f4e\u590d\u6742\u6027\u3002\u63a5\u7740\uff0c\u6211\u4eec\u63d0\u51fa\u4e00\u79cd\u8499\u7279\u5361\u6d1b\u6811\u641c\u7d22\uff08Monte Carlo Tree Search, 
MCTS\uff09\u4e3a\u57fa\u7840\u7684\u4ed3\u5e93\u63a2\u7d22\u7b56\u7565\uff0c\u8d4b\u4e88\u4ee3\u7406\u7406\u89e3\u6574\u4e2a\u4ed3\u5e93\u7684\u80fd\u529b\u3002\u6b64\u5916\uff0c\u4e3a\u4e86\u66f4\u597d\u5730\u5229\u7528\u4ed3\u5e93\u7ea7\u522b\u7684\u77e5\u8bc6\uff0c\u6211\u4eec\u6307\u5bfc\u4ee3\u7406\u8fdb\u884c\u603b\u7ed3\u3001\u5206\u6790\u548c\u89c4\u5212\uff0c\u7136\u540e\u4ed6\u4eec\u53ef\u4ee5\u5229\u7528\u5de5\u5177\u52a8\u6001\u83b7\u53d6\u4fe1\u606f\u5e76\u751f\u6210\u4fee\u590d\u5b9e\u9645GitHub\u95ee\u9898\u7684\u8865\u4e01\u3002 \u5927\u91cf\u5b9e\u9a8c\u8868\u660e\uff0cRepoUnderstander\u5177\u6709\u4f18\u8d8a\u6027\u548c\u6709\u6548\u6027\u3002\u5728SWE-bench Lite\u57fa\u51c6\u6d4b\u8bd5\u4e2d\uff0c\u4e0eSWE-agent\u76f8\u6bd4\uff0c\u5b83\u5b9e\u73b0\u4e8618.5%\u7684\u76f8\u5bf9\u63d0\u5347\u3002|\n", "2406.01364": "|**2024-06-03**|**BELLS: A Framework Towards Future Proof Benchmarks for the Evaluation of LLM Safeguards**|Diego Dorn et.al.|[2406.01364](http://arxiv.org/abs/2406.01364)|null|## \u80cc\u666f \u8f93\u5165-\u8f93\u51fa\u5b89\u5168\u9632\u62a4\u673a\u5236\u88ab\u7528\u4e8e\u68c0\u6d4b\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7cfb\u7edf\u7684\u5f02\u5e38\u8f93\u51fa\u3002\u8fd9\u4e9b\u9632\u62a4\u63aa\u65bd\u5728\u5b9e\u65f6\u76d1\u63a7\u3001\u79bb\u7ebf\u8bc4\u4f30\u548c\u5185\u5bb9\u5ba1\u6838\u7b49\u5173\u952e\u5e94\u7528\u4e2d\u53d1\u6325\u6838\u5fc3\u4f5c\u7528\u3002\u7136\u800c\uff0c\u76ee\u524d\u7f3a\u4e4f\u7edf\u4e00\u7684\u8bc4\u4f30\u65b9\u6cd5\u6765\u8861\u91cf\u5b83\u4eec\u7684\u6027\u80fd\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u201c\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5b89\u5168\u9632\u62a4\u57fa\u51c6\u201d\uff08Benchmarks for the Evaluation of LLM Safeguards\uff0c\u7b80\u79f0BELLS\uff09\uff0c\u5b83\u662f\u4e00\u4e2a\u7ed3\u6784\u5316\u7684\u6d4b\u8bd5\u96c6\u5408\uff0c\u5206\u4e3a\u4e09\u4e2a\u7c7b\u522b\uff1a(1) 
\u5efa\u7acb\u6027\u6545\u969c\u6d4b\u8bd5\uff0c\u57fa\u4e8e\u5df2\u5b58\u5728\u7684\u9488\u5bf9\u660e\u786e\u6545\u969c\u6a21\u5f0f\u7684\u57fa\u51c6\uff0c\u65e8\u5728\u6bd4\u8f83\u5f53\u524d\u8f93\u5165-\u8f93\u51fa\u5b89\u5168\u9632\u62a4\u7684\u6548\u80fd\uff1b(2) \u65b0\u5174\u6545\u969c\u6d4b\u8bd5\uff0c\u7528\u4e8e\u8861\u91cf\u5bf9\u672a\u89c1\u8fc7\u7684\u6545\u969c\u6a21\u5f0f\u7684\u6cdb\u5316\u80fd\u529b\uff0c\u4ee5\u4fc3\u8fdb\u66f4\u901a\u7528\u9632\u62a4\u673a\u5236\u7684\u53d1\u5c55\uff1b(3) \u4e0b\u4e00\u4ee3\u67b6\u6784\u6d4b\u8bd5\uff0c\u9488\u5bf9\u66f4\u590d\u6742\u7684\u67b6\u6784\uff08\u5982LLM\u4ee3\u7406\u548c\u591a\u4ee3\u7406\u7cfb\u7edf\uff09\uff0c\u76ee\u6807\u662f\u63a8\u52a8\u9002\u7528\u4e8e\u672a\u6765\u5c1a\u672a\u5b58\u5728\u4e13\u95e8\u9632\u62a4\u7684\u5e94\u7528\u7684\u5b89\u5168\u9632\u62a4\u6280\u672f\u7684\u53d1\u5c55\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5b9e\u73b0\u4e86\u5e76\u5206\u4eab\u4e86\u7b2c\u4e00\u4e2a\u4e0b\u4e00\u4ee3\u67b6\u6784\u6d4b\u8bd5\uff0c\u4f7f\u7528MACHIAVELLI\u73af\u5883\uff0c\u5e76\u63d0\u4f9b\u4e86\u6570\u636e\u96c6\u7684\u4ea4\u4e92\u5f0f\u53ef\u89c6\u5316\u3002|\n", "2406.00936": "|**2024-06-03**|**A Survey of Useful LLM Evaluation**|Ji-Lun Peng 
et.al.|[2406.00936](http://arxiv.org/abs/2406.00936)|null|\u7531\u4e8e\u5927\u8bed\u8a00\u6a21\u578b\u5728\u5404\u4e2a\u7814\u7a76\u9886\u57df\u5c55\u73b0\u51fa\u5353\u8d8a\u7684\u6027\u80fd\uff0c\u5bf9\u5b83\u4eec\u7684\u80fd\u529b\u8bc4\u4f30\u65b9\u6cd5\u7684\u9700\u6c42\u65e5\u76ca\u589e\u957f\uff0c\u4ee5\u786e\u5b9a\u5176\u5408\u9002\u7684\u4efb\u52a1\u548c\u8d23\u4efb\u3002\u672c\u6587\u4e3b\u8981\u63a2\u8ba8\u5982\u4f55\u6709\u6548\u5730\u5229\u7528\u5927\u8bed\u8a00\u6a21\u578b\u4f5c\u4e3a\u5de5\u5177\uff0c\u5e76\u63d0\u51fa\u4e00\u4e2a\u4e24\u9636\u6bb5\u6846\u67b6\uff1a\u4ece\u201c\u6838\u5fc3\u80fd\u529b\u201d\u5230\u201c\u4ee3\u7406\u201d\u3002\u9996\u5148\uff0c\u6838\u5fc3\u80fd\u529b\u6307\u7684\u662f\u5927\u8bed\u8a00\u6a21\u578b\u751f\u6210\u9ad8\u8d28\u91cf\u6587\u672c\u6240\u5fc5\u9700\u7684\u7279\u6027\uff0c\u901a\u8fc7\u9a8c\u8bc1\u8fd9\u4e9b\u80fd\u529b\u540e\uff0c\u5b83\u4eec\u80fd\u591f\u5904\u7406\u73b0\u5b9e\u4e16\u754c\u7684\u590d\u6742\u4efb\u52a1\uff0c\u626e\u6f14\u4ee3\u7406\u89d2\u8272\u3002\u5728\u201c\u6838\u5fc3\u80fd\u529b\u201d\u9636\u6bb5\uff0c\u6211\u4eec\u8ba8\u8bba\u4e86\u5927\u8bed\u8a00\u6a21\u578b\u7684\u63a8\u7406\u80fd\u529b\u3001\u793e\u4f1a\u5f71\u54cd\u4ee5\u53ca\u9886\u57df\u77e5\u8bc6\u3002\u800c\u5728\u201c\u4ee3\u7406\u201d\u9636\u6bb5\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u5927\u8bed\u8a00\u6a21\u578b\u5728\u5177\u8eab\u884c\u52a8\u3001\u89c4\u5212\u548c\u5de5\u5177\u5b66\u4e60\u65b9\u9762\u7684\u5e94\u7528\u3002\u6700\u540e\uff0c\u6211\u4eec\u5206\u6790\u4e86\u5f53\u524d\u5927\u8bed\u8a00\u6a21\u578b\u8bc4\u4f30\u65b9\u6cd5\u9762\u4e34\u7684\u6311\u6218\uff0c\u5e76\u5c55\u671b\u4e86\u672a\u6765\u7684\u53d1\u5c55\u65b9\u5411\u3002|\n", "2406.01637": "|**2024-06-02**|**Teams of LLM Agents can Exploit Zero-Day Vulnerabilities**|Richard Fang 
et.al.|[2406.01637](http://arxiv.org/abs/2406.01637)|null|\u968f\u7740\u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u7f51\u7edc\u5b89\u5168\u9886\u57df\u7684\u590d\u6742\u6027\u4e0d\u65ad\u63d0\u9ad8\uff0c\u7814\u7a76\u8005\u53d1\u73b0\uff0c\u5f53\u63d0\u4f9b\u6f0f\u6d1e\u63cf\u8ff0\u548c\u7b80\u5355\u7684\u593a\u65d7\u95ee\u9898\u65f6\uff0c\u8fd9\u4e9b\u6a21\u578b\u80fd\u591f\u5229\u7528\u5b9e\u9645\u5b58\u5728\u7684\u6f0f\u6d1e\u3002\u7136\u800c\uff0c\u5bf9\u4e8e\u4e8b\u5148\u672a\u77e5\u7684\u96f6\u65e5\u6f0f\u6d1e\uff08\u5373\u653b\u51fb\u8005\u638c\u63e1\u800c\u5b89\u5168\u8f6f\u4ef6\u4f9b\u5e94\u5546\u8fd8\u672a\u4fee\u8865\u7684\u6f0f\u6d1e\uff09\uff0c\u5b83\u4eec\u7684\u8868\u73b0\u4ecd\u7136\u4e0d\u4f73\u3002\u672c\u6587\u5c55\u793a\u4e86\uff0c\u901a\u8fc7\u56e2\u961f\u5408\u4f5c\uff0c\u591a\u4e2aLLM\u4ee3\u7406\u53ef\u4ee5\u653b\u51fb\u73b0\u5b9e\u4e16\u754c\u7684\u96f6\u65e5\u6f0f\u6d1e\u3002\u5355\u72ec\u7684\u4ee3\u7406\u5728\u63a2\u7d22\u4f17\u591a\u6f0f\u6d1e\u548c\u8fdb\u884c\u957f\u671f\u89c4\u5212\u65f6\u9762\u4e34\u56f0\u96be\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86HPTSA\u7cfb\u7edf\uff0c\u5b83\u5305\u62ec\u4e00\u4e2a\u80fd\u8c03\u5ea6\u5b50\u4ee3\u7406\u7684\u8ba1\u5212\u4ee3\u7406\u3002\u8ba1\u5212\u4ee3\u7406\u8d1f\u8d23\u63a2\u7d22\u7cfb\u7edf\u5e76\u51b3\u5b9a\u4f7f\u7528\u54ea\u4e2a\u5b50\u4ee3\u7406\u6765\u5c1d\u8bd5\u4e0d\u540c\u7684\u6f0f\u6d1e\uff0c\u4ece\u800c\u89e3\u51b3\u4e86\u957f\u671f\u89c4\u5212\u7684\u95ee\u9898\u3002\u6211\u4eec\u5728\u4e00\u4e2a\u5305\u542b15\u4e2a\u771f\u5b9e\u4e16\u754c\u6f0f\u6d1e\u7684\u57fa\u51c6\u4e0a\u8fdb\u884c\u4e86\u5b9e\u9a8c\uff0c\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u4ee3\u7406\u56e2\u961f\u6bd4\u5148\u524d\u7684\u5de5\u4f5c\u63d0\u9ad8\u4e864.5\u500d\u3002|\n", "2406.00583": "|**2024-06-02**|**CMDBench: A Benchmark for Coarse-to-fine Multimodal Data Discovery in Compound AI Systems**|Yanlin Feng 
et.al.|[2406.00583](http://arxiv.org/abs/2406.00583)|**[link](https://github.com/megagonlabs/CMDBench)**|### \u80cc\u666f \u5728\u6570\u636e\u5e93\u548c\u4eba\u5de5\u667a\u80fd\u9886\u57df\uff0c\u590d\u5408\u4eba\u5de5\u667a\u80fd\u7cfb\u7edf\uff08Compound Artificial Intelligence Systems\uff0cCAS\uff09\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08Large Language Models\uff0cLLMs\uff09\u4f5c\u4e3a\u4ee3\u7406\uff0c\u901a\u8fc7\u4e0e\u5de5\u5177\u548c\u6570\u636e\u68c0\u7d22\u5668\u4ea4\u4e92\u6765\u6267\u884c\u77e5\u8bc6\u5bc6\u96c6\u578b\u4efb\u52a1\uff0c\u5f15\u8d77\u4e86\u5e7f\u6cdb\u5173\u6ce8\u3002\u5c3d\u7ba1\u8fd9\u4e9b\u7cfb\u7edf\u6709\u53ef\u80fd\u589e\u5f3a\u4f01\u4e1a\u6570\u636e\u5e73\u53f0\u4e2d\u6570\u636e\u5206\u6790\u5e08\u7684\u4e00\u822c\u5206\u6790\u6d41\u7a0b\uff0c\u4f46CAS\u9762\u4e34\u7740\u4e0e\u5206\u6790\u5e08\u76f8\u4f3c\u7684\u6570\u636e\u53d1\u73b0\u6311\u6218\uff1a\u7ec4\u7ec7\u5185\u90e8\u4e0d\u540c\u56e2\u961f\u548c\u90e8\u95e8\u521b\u5efa\u7684\u591a\u6a21\u6001\u6570\u636e\u6e90\u5b64\u7acb\uff0c\u8fd9\u4f7f\u5f97\u5bfb\u627e\u5b8c\u6210\u5f53\u524d\u4efb\u52a1\u6240\u9700\u5408\u9002\u6570\u636e\u6e90\u53d8\u5f97\u56f0\u96be\u3002\u73b0\u6709\u7684\u6570\u636e\u53d1\u73b0\u57fa\u51c6\u5e76\u672a\u5145\u5206\u6a21\u62df\u8fd9\u79cd\u591a\u6a21\u6001\u548c\u6570\u636e\u6e90\u7684\u591a\u6837\u6027\u3002\u6b64\u5916\uff0cCAS\u7684\u73b0\u6709\u57fa\u51c6\u4e3b\u8981\u5173\u6ce8\u7aef\u5230\u7aef\u4efb\u52a1\u6027\u80fd\u8bc4\u4f30\uff0c\u800c\u5ffd\u89c6\u4e86\u6570\u636e\u53d1\u73b0\u6027\u80fd\u3002 
\u4e3a\u4e86\u63a8\u52a8\u5728\u73b0\u5b9e\u4e16\u754c\u73af\u5883\u4e2d\u5bf9\u591a\u6a21\u6001\u6570\u636e\u68c0\u7d22\u5668\u5728CAS\u4e2d\u7684\u6570\u636e\u53d1\u73b0\u6027\u80fd\u7814\u7a76\uff0c\u6211\u4eec\u63d0\u51fa\u4e86CMDBench\uff0c\u4e00\u4e2a\u65e8\u5728\u6a21\u62df\u4f01\u4e1a\u6570\u636e\u5e73\u53f0\u590d\u6742\u6027\u7684\u57fa\u51c6\u3002\u6211\u4eec\u6539\u7f16\u4e86\u5f00\u653e\u9886\u57df\u7684\u73b0\u6709\u6570\u636e\u96c6\u548c\u57fa\u51c6\uff0c\u5982\u95ee\u7b54\u3001\u590d\u6742\u63a8\u7406\u4ee5\u53ca\u81ea\u7136\u8bed\u8a00\u67e5\u8be2\u7ed3\u6784\u5316\u6570\u636e\uff0c\u6765\u8bc4\u4f30\u7c97\u7c92\u5ea6\u548c\u7ec6\u7c92\u5ea6\u7684\u6570\u636e\u53d1\u73b0\u4ee5\u53ca\u4efb\u52a1\u6267\u884c\u6027\u80fd\u3002 ### \u5b9e\u9a8c\u7ed3\u679c \u6211\u4eec\u7684\u5b9e\u9a8c\u63ed\u793a\u4e86\u6570\u636e\u68c0\u7d22\u5668\u8bbe\u8ba1\u5bf9\u4e0b\u6e38\u4efb\u52a1\u6027\u80fd\u7684\u5f71\u54cd\u2014\u2014\u5e73\u5747\u60c5\u51b5\u4e0b\uff0c\u4efb\u52a1\u51c6\u786e\u7387\u4e0b\u964d\u4e8646%\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u9700\u8981\u5f00\u53d1\u4f18\u5316\u7b56\u7565\u6765\u786e\u5b9a\u5408\u9002\u7684LLM\u4ee3\u7406\u548c\u68c0\u7d22\u5668\uff0c\u4ee5\u63d0\u9ad8\u5728\u4f01\u4e1a\u6570\u636e\u4e0a\u9ad8\u6548\u6267\u884cCAS\u7684\u80fd\u529b\u3002 \u603b\u4e4b\uff0cCMDBench\u662f\u4e00\u4e2a\u65e8\u5728\u4fc3\u8fdb\u9488\u5bf9\u4f01\u4e1a\u6570\u636e\u5e73\u53f0\u590d\u6742\u6027\u8fdb\u884c\u7814\u7a76\u7684\u65b0\u5de5\u5177\uff0c\u5b83\u901a\u8fc7\u7efc\u5408\u8bc4\u4f30\u6570\u636e\u53d1\u73b0\u548c\u4efb\u52a1\u6267\u884c\u80fd\u529b\uff0c\u4e3a\u6539\u8fdb\u591a\u6a21\u6001\u6570\u636e\u68c0\u7d22\u5668\u5728\u590d\u5408\u4eba\u5de5\u667a\u80fd\u7cfb\u7edf\u4e2d\u7684\u6027\u80fd\u63d0\u4f9b\u4e86\u4e00\u4e2a\u6709\u4ef7\u503c\u7684\u6846\u67b6\u3002|\n", "2406.00244": "|**2024-06-01**|**Controlling Large Language Model Agents with Entropic Activation Steering**|Nate Rahn 
et.al.|[2406.00244](http://arxiv.org/abs/2406.00244)|null|As pretrained large language models (LLMs) become more generally capable, interest is growing in using them as in-context learning agents. In these settings, a model must form beliefs about how to achieve its goal from limited interaction with the environment and handle uncertainty at every decision step. This paper studies how LLMs form and act on these beliefs through experiments on controlled sequential decision-making tasks. First, we find that LLMs are overconfident: they commit to strong judgments about actions without sufficient evidence, which leads to insufficient exploration. Deeper analysis shows that this stems from a collapse in the entropy of the action distribution sampled from the LLM. We then show that existing token-level sampling methods are not sufficient on their own to make the model explore more broadly. 
Motivated by this, we propose Entropic Activation Steering (EAST), an activation steering method for in-context LLM agents. EAST computes an entropy-weighted combination of representations and uses it to intervene on the model's activations during the forward pass, adjusting the model's uncertainty over actions and thereby encouraging exploratory behavior. Finally, EAST changes the subjective uncertainty the LLM expresses when making decisions, offering a way to understand and control how the model represents uncertainty in its decisions.|\n", "2406.00222": "|**2024-05-31**|**Learning to Clarify: Multi-turn Conversations with Action-Based Contrastive Self-Training**|Maximillian Chen et.al.|[2406.00222](http://arxiv.org/abs/2406.00222)|null|Large language models (LLMs) aligned through reinforcement learning from human feedback (RLHF) have quickly become the dominant approach to building intelligent conversational assistants. However, despite strong results on many benchmarks, LLM-based agents still fall short on conversational skills such as handling ambiguity: when general-purpose assistants face an ambiguous situation, they tend to overhedge or guess the user's true intent rather than ask a clarifying question, and in task-specific settings high-quality conversation samples are often scarce, which limits a model's ability to learn an optimal dialogue policy. 
We propose Action-Based Contrastive Self-Training (ACT), a quasi-online preference optimization algorithm based on Direct Preference Optimization (DPO), designed for sample-efficient dialogue policy learning in multi-turn conversation. We validate ACT on three challenging conversational tasks: tabular-grounded question answering, machine reading comprehension, and AmbigSQL, a new task for disambiguating information-seeking requests for text-to-SQL generation. In addition, we propose measuring the ability of LLMs to function as conversational agents by evaluating whether they can recognize and reason about ambiguity in conversation. ACT shows substantial improvements in conversation modeling over standard supervised fine-tuning and DPO.|\n", "2406.00215": "|**2024-05-31**|**Benchmarking the Communication Competence of Code Generation for LLMs and LLM Agent**|Jie JW Wu 
et.al.|[2406.00215](http://arxiv.org/abs/2406.00215)|**[link](https://github.com/jie-jw-wu/human-eval-comm)**|Large language models (LLMs) have improved markedly on code generation tasks, but a gap remains between them and top software engineers. Because top software engineers often ask questions to resolve ambiguity in requirements and coding solutions, we argue that LLMs should show similar communication skills when generating code. We therefore conduct an empirical study of LLM communication skills, defined as “the ability to ask clarifying questions when the description of a code generation problem has issues”. We create a new benchmark, HumanEvalComm, by modifying problem descriptions along three issue dimensions: inconsistency, ambiguity, and incompleteness. We define new evaluation metrics such as Communication Rate and Good Question Rate, and run experiments on HumanEvalComm with different Code LLMs and with a new LLM agent approach, Okanagan, which is designed to identify and ask questions from the code and description in order to further refine the generated code. 
Finally, we discuss the results by comparing Code LLMs with Okanagan.|\n", "2406.03299": "|**2024-06-05**|**The Good, the Bad, and the Hulk-like GPT: Analyzing Emotional Decisions of Large Language Models in Cooperation and Bargaining Games**|Mikhail Mozikov et.al.|[2406.03299](http://arxiv.org/abs/2406.03299)|null|Behavioral experiments play an important role in modeling society and understanding human interaction. In practice, however, such experiments often face challenges of internal validity, external validity, reproducibility, and social bias, because human social interaction and cooperation are complex. Recent advances in large language models (LLMs) give researchers a new tool for simulating human behavior. But existing LLM-based simulations assume that the model behaves like humans while overlooking a key factor in human decision-making: emotion. This paper proposes a novel methodology and framework for studying LLM decision-making and how well LLM behavior under emotional states aligns with human behavior. 
Using GPT-3.5 and GPT-4 in two different types of behavioral economics games (game-theoretic experiments), we find that emotions significantly affect LLM performance and lead the models to develop more optimal strategies. While GPT-3.5 corresponds strongly with the behavior of human participants, especially in bargaining games, GPT-4 behaves consistently, and its rational decisions appear unaffected by induced emotions. Surprisingly, emotional prompting, particularly with anger, can break GPT-4's “superhuman” consistency and make its responses resemble human emotional reactions more closely.|\n", "2406.03007": "|**2024-06-05**|**BadAgent: Inserting and Activating Backdoor Attacks in LLM Agents**|Yifei Wang 
et.al.|[2406.03007](http://arxiv.org/abs/2406.03007)|**[link](https://github.com/dpamk/badagent)**|**With the rise of large language models (LLMs), powerful intelligent agents built on trained LLMs and fine-tuned on task-specific data have been developed to provide customized services. State-of-the-art methods for building LLM agents take a pretrained model and adapt it further to the agent task. However, we show that these methods are vulnerable to a new backdoor attack, BadAgent, which embeds a backdoor across various agent tasks by fine-tuning on backdoor data. At test time, the attacker can manipulate the deployed LLM agent into performing harmful operations by showing the trigger in the agent input or environment. Surprisingly, our attack remains extremely robust even after fine-tuning on trustworthy data. Although backdoor attacks have been studied extensively in natural language processing, to our knowledge we may be the first to study them on LLM agents, which are more dangerous because they are permitted to use external tools. Our work clearly demonstrates the risk of building LLM agents on untrusted LLMs or data. 
Our code is publicly available at [https://github.com/DPamK/BadAgent](https://github.com/DPamK/BadAgent).**|\n", "2406.04151": "|**2024-06-06**|**AgentGym: Evolving Large Language Model-based Agents across Diverse Environments**|Zhiheng Xi et.al.|[2406.04151](http://arxiv.org/abs/2406.04151)|**[link](https://github.com/woooodyy/agentgym)**|**Building generally capable agents that can handle diverse tasks and evolve themselves across different environments is a long-term goal of the AI community. Large language models (LLMs) are considered a promising foundation for such agents because of their general capabilities. Current approaches either rely on human supervision, having LLM agents imitate expert-provided trajectories step by step, which is hard to scale and limits environmental exploration; or they let agents explore and learn in isolated environments, which yields specialist agents with limited generalization. This paper takes a first step toward building generally capable LLM agents with self-evolution ability. We identify three key ingredients: 1) diverse environments for agent exploration and learning; 2) a trajectory set that equips agents with basic capabilities and prior knowledge; and 3) an effective and scalable evolution method. 
We propose AgentGym, a new framework with a rich set of environments and tasks that supports broad, real-time, uni-format, and concurrent agent exploration. AgentGym also includes a database of expanded instructions, a benchmark suite, and high-quality trajectories across environments. We then develop AgentEvol, a novel method for investigating the potential of agent self-evolution beyond previously seen data, across tasks and environments. Experimental results show that the evolved agents can achieve performance comparable to state-of-the-art models. We release the AgentGym suite, including the platform, dataset, benchmark, checkpoints, and algorithm implementations. The AgentGym suite is hosted at https://github.com/WooooDyy/AgentGym.**|\n", "2406.04692": "|**2024-06-07**|**Mixture-of-Agents Enhances Large Language Model Capabilities**|Junlin Wang 
et.al.|[2406.04692](http://arxiv.org/abs/2406.04692)|null|Recent large language models (LLMs) have advanced substantially, showing strong capabilities in natural language understanding and generation. As the number of LLMs grows, how to combine the knowledge of multiple models effectively has become an exciting research direction. To this end, we propose a new approach, Mixture-of-Agents (MoA). Our MoA architecture is layered, with each layer containing multiple LLM agents. When generating its response, each agent takes the outputs of all agents in the previous layer as auxiliary information. With this strategy, MoA models achieve state-of-the-art performance on several evaluation benchmarks, including AlpacaEval 2.0, MT-Bench, and FLASK, surpassing GPT-4 Omni. For example, our MoA using only open-source LLMs leads AlpacaEval 2.0 with a score of 65.1%, compared with 57.5% for GPT-4 Omni.|\n", "2406.06464": "|**2024-06-11**|**Transforming Wearable Data into Health Insights using Large Language Model Agents**|Mike A. 
Merrill et.al.|[2406.06464](http://arxiv.org/abs/2406.06464)|null|Although wearable health trackers are increasingly common and the importance of sleep and exercise to health is well established, deriving actionable, personalized insights from this data remains a challenge, because it requires unstructured analysis of large amounts of data. The rise of large language models (LLMs), which can use tools to understand and interact with the world, holds promise for personalized analysis at scale. Yet LLM applications in personal health remain underexplored. This paper introduces the Personal Health Insights Agent (PHIA), a system that uses state-of-the-art code generation and information retrieval tools to analyze and interpret behavioral health data. We build two benchmark question-answering datasets with more than 4000 health insight questions. Based on 650 hours of human and expert evaluation, PHIA accurately answers over 84% of factual numerical questions and over 83% of crowd-sourced open-ended questions. This work has implications for advancing behavioral health across the population, potentially enabling individuals to interpret their own wearable data and opening a new era of accessible, personalized wellness regimens informed by data-driven insights.|\n", "2406.05925": "|**2024-06-09**|**Hello Again! LLM-powered Personalized Agent for Long-term Dialogue**|Hao Li et.al.|[2406.05925](http://arxiv.org/abs/2406.05925)|**[link](https://github.com/leolee99/ld-agent)**|**With the development of large language models (LLMs), open-domain dialogue systems have advanced significantly. However, most existing systems focus on brief single-session interactions and overlook the real-world need for long-term companionship and personalized chatbots. Meeting this need requires event summarization and persona management, which support reasonable responses in long-term dialogue. Recent progress in the human-like cognitive and reasoning abilities of LLMs suggests that LLM-based agents could greatly enhance automated perception, decision-making, and problem-solving. We therefore propose a model-agnostic framework, the Long-term Dialogue Agent (LD-Agent), which consists of three independently tunable modules: event perception, persona extraction, and response generation. 
The event memory module uses long-term and short-term memory banks to focus on historical and ongoing sessions respectively, and introduces a topic-based retrieval mechanism to improve the accuracy of memory retrieval. The persona module performs dynamic persona modeling for both the user and the agent. Finally, the generator produces appropriate responses by integrating the retrieved memories and extracted personas. We empirically demonstrate the effectiveness, generality, and cross-domain capability of LD-Agent across a range of illustrative benchmarks, models, and tasks. The code is released at https://github.com/leolee99/LD-Agent.**|\n", "2406.05804": "|**2024-06-09**|**A Survey on LLM-Based Agentic Workflows and LLM-Profiled Components**|Xinzhe Li et.al.|[2406.05804](http://arxiv.org/abs/2406.05804)|null|## Background Recent advances in large language models (LLMs) have driven the development of sophisticated agentic workflows that improve on traditional single-path Chain-of-Thought (CoT) prompting. This survey summarizes common workflows, with a particular focus on LLM-Profiled Components (LMPCs) and deliberately setting aside non-LLM components. The aim is to deepen understanding of the roles of LLMs and to explore the reusability of LMPCs.|\n", "2406.07275": "|**2024-06-11**|**DCA-Bench: A Benchmark for Dataset Curation Agents**|Benhao Huang 
et.al.|[2406.07275](http://arxiv.org/abs/2406.07275)|**[link](https://github.com/TRAIS-Lab/dca-bench)**|As AI research and development advance, dataset quality is becoming increasingly important. Despite the many open dataset platforms, data quality issues such as missing documentation, inaccurate annotations, and ethical concerns remain widespread. These issues are often hard to detect with rule-based scripts, so users or maintainers must spend substantial manual effort identifying and verifying them. Using large language models (LLMs) to curate datasets is a promising direction. To this end, we propose DCA-Bench, a benchmark for dataset curation agents that measures how well LLMs detect hidden data quality issues. We collect diverse real-world issues from eight open dataset platforms as a testbed. To build an automatic pipeline for judging whether an LLM succeeds, we design a dedicated LLM evaluator. Experiments show that the LLM-based evaluator aligns closely with human evaluation, enabling reliable automatic assessment. 
We also run experiments on several baseline LLMs, which demonstrate the complexity of the task and indicate that applying LLMs to real-world dataset curation still requires further exploration and innovation. In addition, the benchmark can serve as a testbed for measuring the ability of LLMs to discover problems rather than only solve them. The benchmark suite is available at https://github.com/TRAIS-Lab/dca-bench.|\n", "2406.07217": "|**2024-06-11**|**A Synthetic Dataset for Personal Attribute Inference**|Hanna Yukhymenko et.al.|[2406.07217](http://arxiv.org/abs/2406.07217)|**[link](https://github.com/eth-sri/synthpai)**|**Powerful large language models (LLMs) have recently become accessible to hundreds of millions of users worldwide, but their strong capabilities and broad world knowledge also bring privacy risks. This work focuses on an emerging LLM privacy threat: accurately inferring personal information from online text. Because LLM-based author profiling research lacks suitable public datasets, mainly owing to the ethical and privacy concerns around real personal data, our work takes two steps: (i) we build a simulation framework for the popular social media platform Reddit populated with synthetic personal profiles; (ii) using this framework, we generate SynthPAI, a diverse synthetic dataset of more than 7800 comments manually labeled for personal attributes. 
We validate the dataset with a human study, which shows that humans barely outperform random guessing at distinguishing real comments from synthetic ones. We further show that the dataset supports meaningful personal attribute inference research: across 18 state-of-the-art LLMs, the synthetic comments lead to the same conclusions as real-world data. Taken together, our dataset and pipeline provide a strong, privacy-preserving basis for future research on understanding and mitigating the inference-based privacy threats posed by LLMs.**|\n", "2406.07021": "|**2024-06-11**|**A Tool for Test Case Scenarios Generation Using Large Language Models**|Abdul Malik Sami 
et.al.|[2406.07021](http://arxiv.org/abs/2406.07021)|null|Large language models (LLMs) are widely used in software engineering (SE) for tasks such as code generation, software design and documentation, adding code comments, code review, and writing test scripts. However, creating test scripts or automating test cases requires detailed test suite documentation that is closely tied to the functional requirements. Such documentation must enable thorough testing within limited time and scope, especially as requirements and user expectations keep changing. This paper focuses on generating epics and high-level user stories from user requirements and then designing test scenarios based on those stories. It presents a web-based software tool, built on LLM agents and prompt engineering, that automates the generation of test scenarios from user requirements.|\n", "2406.06947": "|**2024-06-11**|**CAAP: Context-Aware Action Planning Prompting to Solve Computer Tasks with Front-End UI Only**|Junhee Cho 
et.al.|[2406.06947](http://arxiv.org/abs/2406.06947)|**[link](https://github.com/caap-agent/caap-agent)**|**Software robots have long been used in Robotic Process Automation (RPA) to carry out mundane computer tasks. With the advanced reasoning capabilities of large language models (LLMs), these agents can now handle more complex and even previously unseen tasks. However, LLM-based automation techniques in the current literature often rely on HTML source code as input, which restricts their use in non-web environments. The information in HTML code is also often inaccurate or incomplete, which makes the agent less reliable in practice. We propose an LLM-based agent that works solely from screenshots to recognize its environment and uses in-context learning to remove the need for large amounts of human demonstration data. 
Our strategy, called Context-Aware Action Planning (CAAP) prompting, encourages the agent to review the context carefully from multiple angles. With our approach, we achieve a 94.4% success rate on 67 types of MiniWoB++ problems, using only 1.48 demonstrations per problem type. Our method offers potential for broader applications, especially tasks that require coordination across applications on computers or smartphones, and marks a significant advance in automation agents. Code and models are available at https://github.com/caap-agent/caap-agent.**|\n", "2406.06613": "|**2024-06-07**|**GameBench: Evaluating Strategic Reasoning Abilities of LLM Agents**|Anthony Costarelli et.al.|[2406.06613](http://arxiv.org/abs/2406.06613)|**[link](https://github.com/Joshuaclymer/GameBench)**|**Large language models have shown remarkable few-shot performance on many natural language understanding tasks. Although their use in complex strategic scenarios has been demonstrated, there is no comprehensive framework for evaluating the range of reasoning abilities these models show in games. To fill this gap, we introduce GameBench, a cross-domain framework for evaluating the strategic reasoning abilities of large language models (LLMs). 
We focus on 9 different game environments, each covering at least one key reasoning skill identified in strategy games, and we select games whose strategy explanations are unlikely to make up a significant part of the models' pretraining data. Our evaluation uses GPT-3 and GPT-4 in their base form, along with two scaffolding frameworks intended to strengthen strategic reasoning: Chain-of-Thought (CoT) prompting and Reasoning Via Planning (RAP). The results show that none of the tested models reach human-level performance, and at worst GPT-4 performs below random action. CoT and RAP both improve scores but remain far below human level.**|\n", "2406.08184": "|**2024-06-12**|**MobileAgentBench: An Efficient and User-Friendly Benchmark for Mobile LLM Agents**|Luyuan Wang 
et.al.|[2406.08184](http://arxiv.org/abs/2406.08184)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u624b\u673a\u56fe\u5f62\u7528\u6237\u754c\u9762\uff08GUI\uff09\u4e0a\u7684\u76f4\u63a5\u4ea4\u4e92\u80fd\u529b\u65e5\u76ca\u589e\u5f3a\uff0c\u4ee5\u53ca\u5b83\u4eec\u5728\u81ea\u4e3b\u7ba1\u7406\u65e5\u5e38\u4efb\u52a1\u65b9\u9762\u7684\u6f5c\u529b\uff0c\u57fa\u4e8eLLMs\u7684\u79fb\u52a8\u4ee3\u7406\u6b63\u9010\u6e10\u53d7\u5230\u5b66\u672f\u754c\u548c\u5de5\u4e1a\u754c\u7684\u5173\u6ce8\u3002\u7136\u800c\uff0c\u7531\u4e8e\u5e94\u7528\u7a0b\u5e8f\u7684\u65e0\u9650\u72b6\u6001\u548c\u53ef\u884c\u52a8\u4f5c\u5e8f\u5217\u7684\u6a21\u7cca\u5b9a\u4e49\uff0c\u5bf9\u73b0\u6709\u79fb\u52a8\u4ee3\u7406\u6027\u80fd\u7684\u57fa\u51c6\u7814\u7a76\u76f8\u5bf9\u532e\u4e4f\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u9ad8\u6548\u4e14\u7528\u6237\u53cb\u597d\u7684\u57fa\u51c6\u5de5\u5177\u2014\u2014MobileAgentBench\uff0c\u65e8\u5728\u51cf\u8f7b\u7e41\u7410\u7684\u624b\u52a8\u6d4b\u8bd5\u8d1f\u62c5\u3002\u6211\u4eec\u9996\u5148\u5b9a\u4e49\u4e86\u6db5\u76d610\u4e2a\u5f00\u6e90\u5e94\u7528\u7684100\u9879\u4efb\u52a1\uff0c\u6309\u96be\u5ea6\u5206\u4e3a\u591a\u4e2a\u7ea7\u522b\u3002\u63a5\u7740\uff0c\u6211\u4eec\u5bf9\u5305\u62ecAppAgent\u548cMobileAgent\u5728\u5185\u7684\u591a\u4e2a\u73b0\u6709\u79fb\u52a8\u4ee3\u7406\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u4ee5\u5168\u9762\u7cfb\u7edf\u5730\u6bd4\u8f83\u5b83\u4eec\u7684\u8868\u73b0\u3002\u6240\u6709\u76f8\u5173\u6750\u6599\u5747\u53ef\u5728\u6211\u4eec\u7684\u9879\u76ee\u7f51\u7ad9https://MobileAgentBench.github.io\u4e0a\u83b7\u53d6\uff0c\u8fd9\u5c06\u63a8\u52a8\u5b66\u672f\u548c\u5de5\u4e1a\u9886\u57df\u7684\u8fdb\u6b65\u3002|\n", "2406.07973": "|**2024-06-12**|**Unique Security and Privacy Threats of Large Language Model: A Comprehensive Survey**|Shang Wang 
et.al.|[2406.07973](http://arxiv.org/abs/2406.07973)|null|\u968f\u7740\u4eba\u5de5\u667a\u80fd\u7684\u5feb\u901f\u53d1\u5c55\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\u3002\u8fd9\u4e9b\u6a21\u578b\u901a\u8fc7\u5927\u91cf\u6570\u636e\u8bad\u7ec3\uff0c\u5c55\u73b0\u51fa\u5f3a\u5927\u7684\u8bed\u8a00\u7406\u89e3\u548c\u751f\u6210\u80fd\u529b\uff0c\u9002\u7528\u4e8e\u673a\u5668\u7ffb\u8bd1\u3001\u804a\u5929\u673a\u5668\u4eba\u7b49\u5404\u79cd\u5e94\u7528\u3002\u7136\u800c\uff0cLLMs\u5728\u5176\u751f\u547d\u5468\u671f\u4e2d\u66b4\u9732\u51fa\u4e00\u7cfb\u5217\u9690\u79c1\u548c\u5b89\u5168\u95ee\u9898\uff0c\u8fd9\u5f15\u8d77\u4e86\u5b66\u672f\u754c\u548c\u5de5\u4e1a\u754c\u7684\u5173\u6ce8\u3002\u8fd9\u4e9b\u95ee\u9898\u4e0e\u4f20\u7edf\u8bed\u8a00\u6a21\u578b\u76f8\u6bd4\u5177\u6709\u72ec\u7279\u6027\uff0c\u9274\u4e8e\u5f53\u524d\u7684\u7efc\u8ff0\u7f3a\u4e4f\u9488\u5bf9\u4e0d\u540c\u573a\u666f\u7684\u6e05\u6670\u5a01\u80c1\u5206\u7c7b\uff0c\u6211\u4eec\u6839\u636e\u4e94\u4e2a\u573a\u666f\uff1a\u9884\u8bad\u7ec3\u3001\u5fae\u8c03\u3001RAG\u7cfb\u7edf\u3001\u90e8\u7f72\u548c\u57fa\u4e8eLLM\u7684\u4ee3\u7406\uff0c\u5f3a\u8c03\u4e86\u72ec\u7279\u7684\u98ce\u9669\u3002\u8003\u8651\u5230\u6bcf\u79cd\u5a01\u80c1\u7684\u7279\u6027\uff0c\u672c\u8c03\u67e5\u63d0\u4f9b\u4e86\u6f5c\u5728\u5a01\u80c1\u548c\u5e94\u5bf9\u7b56\u7565\u3002\u7814\u7a76LLMs\u6240\u9762\u4e34\u7684\u653b\u51fb\u548c\u9632\u5fa1\u60c5\u51b5\uff0c\u53ef\u4ee5\u4e3a\u66f4\u591a\u9886\u57df\u63d0\u4f9b\u53ef\u884c\u7684\u7814\u7a76\u65b9\u5411\uff0c\u4f7f\u66f4\u591a\u4eba\u80fd\u591f\u53d7\u76ca\u4e8eLLMs\u3002|\n", "2406.07914": "|**2024-06-14**|**Can Large Language Models Understand Spatial Audio?**|Changli Tang 
et.al.|[2406.07914](http://arxiv.org/abs/2406.07914)|null|\u8be5\u8bba\u6587\u63a2\u8ba8\u4e86\u5982\u4f55\u4f7f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u638c\u63e1\u591a\u901a\u9053\u97f3\u9891\u4e2d\u7684\u7a7a\u95f4\u4fe1\u606f\uff0c\u8fd9\u662f\u5f53\u524d\u542c\u89c9LLMs\u6240\u7f3a\u4e4f\u7684\u80fd\u529b\u3002\u901a\u8fc7\u5229\u7528LLMs\u7684\u9ad8\u7ea7\u8ba4\u77e5\u548c\u63a8\u7406\u80fd\u529b\uff0c\u76ee\u6807\u662f\u901a\u8fc7\u97f3\u9891\u63d0\u5347\u6a21\u578b\u5bf9\u4e09\u7ef4\u73af\u5883\u7684\u7406\u89e3\u3002\u7814\u7a76\u6d89\u53ca\u4e09\u9879\u7a7a\u95f4\u97f3\u9891\u4efb\u52a1\uff1a\u58f0\u6e90\u5b9a\u4f4d\uff08SSL\uff09\u3001\u8fdc\u573a\u8bed\u97f3\u8bc6\u522b\uff08FSR\uff09\u548c\u57fa\u4e8e\u4f4d\u7f6e\u7684\u8bed\u97f3\u63d0\u53d6\uff08LSE\uff09\uff0c\u5728\u6bcf\u4e2a\u4efb\u52a1\u4e0a\u90fd\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u5c55\u3002\u5728SSL\u65b9\u9762\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728Spatial LibriSpeech\u6570\u636e\u96c6\u4e0a\u7684\u5e73\u5747\u7edd\u5bf9\u8bef\u5dee\uff08MAE\uff09\u8fbe\u52302.70\u00b0\uff0c\u660e\u663e\u4f18\u4e8e\u5148\u524d\u7684\u57fa\u51c6\u7ea66.60\u00b0\u3002\u6b64\u5916\uff0c\u6a21\u578b\u80fd\u591f\u5229\u7528\u7a7a\u95f4\u7ebf\u7d22\u63d0\u9ad8FSR\u7684\u51c6\u786e\u6027\uff0c\u5e76\u901a\u8fc7\u6587\u672c\u63d0\u793a\uff0c\u6839\u636e\u6307\u5b9a\u65b9\u5411\u805a\u7126\u4e8e\u58f0\u97f3\uff0c\u5373\u4f7f\u5728\u91cd\u53e0\u8bed\u97f3\u73af\u5883\u4e2d\u4e5f\u80fd\u6267\u884cLSE\u3002\u8fd9\u4e9b\u6210\u679c\u63ed\u793a\u4e86LLMs\u9002\u5e94\u7269\u7406\u97f3\u9891\u6982\u5ff5\u7684\u6f5c\u529b\uff0c\u4e3a\u6784\u5efa\u57fa\u4e8eLLM\u7684\u4e09\u7ef4\u73af\u5883\u4e2d\u7684\u4ee3\u7406\u94fa\u5e73\u4e86\u9053\u8def\u3002|\n", "2406.09187": "|**2024-06-13**|**GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning**|Zhen Xiang 
et.al.|[2406.09187](http://arxiv.org/abs/2406.09187)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5feb\u901f\u53d1\u5c55\uff0cLLM\u9a71\u52a8\u7684\u4ee3\u7406\u88ab\u5e7f\u6cdb\u5e94\u7528\u4e8e\u5404\u79cd\u5e94\u7528\uff0c\u8fd9\u5f15\u53d1\u4e86\u5bf9\u5176\u5b89\u5168\u6027\u548c\u53ef\u4fe1\u5ea6\u7684\u65b0\u62c5\u5fe7\u3002\u73b0\u6709\u7684\u63d0\u5347LLM\u5b89\u5168\u6027\u7684\u65b9\u6cd5\u5e76\u4e0d\u76f4\u63a5\u9002\u7528\u4e8eLLM\u9a71\u52a8\u7684\u4ee3\u7406\uff0c\u56e0\u4e3a\u5b83\u4eec\u5177\u6709\u4e0d\u540c\u7684\u76ee\u6807\u548c\u8f93\u51fa\u6a21\u5f0f\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u65b9\u6cd5\u2014\u2014GuardAgent\uff0c\u5b83\u4f5c\u4e3a\u5176\u4ed6LLM\u4ee3\u7406\u7684\u201c\u9632\u62a4\u680f\u201d\u3002GuardAgent\u901a\u8fc7\u68c0\u67e5\u5176\u8f93\u5165/\u8f93\u51fa\u662f\u5426\u6ee1\u8db3\u7528\u6237\u5b9a\u4e49\u7684\u4e00\u7cfb\u5217\u5b88\u62a4\u8bf7\u6c42\u6765\u76d1\u7763\u76ee\u6807LLM\u3002GuardAgent\u5206\u4e3a\u4e24\u6b65\uff1a1\uff09\u5206\u6790\u63d0\u4f9b\u7684\u5b88\u62a4\u8bf7\u6c42\u521b\u5efa\u4efb\u52a1\u8ba1\u5212\uff1b2\uff09\u6839\u636e\u4efb\u52a1\u8ba1\u5212\u751f\u6210\u5b88\u62a4\u4ee3\u7801\uff0c\u5e76\u901a\u8fc7API\u8c03\u7528\u6216\u5916\u90e8\u5f15\u64ce\u6267\u884c\u3002\u6574\u4e2a\u8fc7\u7a0b\u5229\u7528LLM\u4f5c\u4e3a\u6838\u5fc3\u63a8\u7406\u7ec4\u4ef6\uff0c\u7ed3\u5408\u8bb0\u5fc6\u6a21\u5757\u4e2d\u7684\u4e0a\u4e0b\u6587\u793a\u4f8b\uff0c\u589e\u5f3a\u4e86\u77e5\u8bc6\u9a71\u52a8\u7684\u63a8\u7406\u80fd\u529b\uff0c\u4f7f\u5176\u80fd\u591f\u7406\u89e3\u5404\u79cd\u6587\u672c\u5b88\u62a4\u8bf7\u6c42\u5e76\u51c6\u786e\u5730\u5c06\u5176\u8f6c\u5316\u4e3a\u53ef\u6267\u884c\u4ee3\u7801\uff0c\u63d0\u4f9b\u53ef\u9760\u7684\u5b89\u5168\u4fdd\u969c\u3002 
GuardAgent\u8fd8\u914d\u5907\u4e86\u4e00\u4e2a\u53ef\u6269\u5c55\u7684\u5de5\u5177\u7bb1\uff0c\u5305\u542b\u51fd\u6570\u548cAPI\uff0c\u65e0\u9700\u989d\u5916\u8bad\u7ec3LLM\uff0c\u5f3a\u8c03\u4e86\u5176\u901a\u7528\u6027\u53ca\u4f4e\u8fd0\u8425\u6210\u672c\u3002\u6b64\u5916\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e24\u4e2a\u65b0\u9896\u7684\u57fa\u51c6\uff1aEICU-AC\u7528\u4e8e\u8bc4\u4f30\u533b\u7597\u5065\u5eb7\u4ee3\u7406\u7684\u9690\u79c1\u76f8\u5173\u8bbf\u95ee\u63a7\u5236\uff0cMind2Web-SC\u7528\u4e8e\u8bc4\u4f30\u7f51\u7edc\u4ee3\u7406\u7684\u5b89\u5168\u6027\u3002\u5728\u8fd9\u4e9b\u57fa\u51c6\u4e0a\uff0cGuardAgent\u5206\u522b\u572898.7%\u548c90.0%\u7684\u7cbe\u5ea6\u4e0b\u6709\u6548\u7ba1\u7406\u4e86\u4e24\u79cd\u7c7b\u578b\u4ee3\u7406\u7684\u65e0\u6548\u8f93\u5165\u548c\u8f93\u51fa\u3002\u5b9e\u9a8c\u8fd8\u8868\u660e\uff0cGuardAgent\u80fd\u591f\u9002\u5e94\u65b0\u5174\u7684LLM\u4ee3\u7406\u548c\u5b88\u62a4\u8bf7\u6c42\uff0c\u5b9a\u4e49\u65b0\u7684\u529f\u80fd\uff0c\u8fdb\u4e00\u6b65\u8bc1\u660e\u4e86\u5176\u5f3a\u5927\u7684\u6cdb\u5316\u80fd\u529b\u3002|\n", "2406.08979": "|**2024-06-13**|**Multi-Agent Software Development through Cross-Team Collaboration**|Zhuoyun Du et.al.|[2406.08979](http://arxiv.org/abs/2406.08979)|**[link](https://github.com/openbmb/chatdev)**|
**\u6700\u65b0\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u8fdb\u5c55\uff0c\u5982ChatDev\uff0c\u63a8\u52a8\u4e86\u8f6f\u4ef6\u5f00\u53d1\u9886\u57df\u7684\u6df1\u523b\u53d8\u9769\uff0c\u7279\u522b\u4f53\u73b0\u5728\u591a\u4ee3\u7406\u534f\u4f5c\u4e0a\u3002\u8fd9\u4e9b\u6a21\u578b\u80fd\u591f\u50cf\u4eba\u7c7b\u56e2\u961f\u4e00\u6837\u5408\u4f5c\uff0c\u9075\u5faa\u7011\u5e03\u6a21\u578b\u8fdb\u884c\u9700\u6c42\u5206\u6790\u3001\u5f00\u53d1\u3001\u5ba1\u67e5\u3001\u6d4b\u8bd5\u7b49\u9636\u6bb5\uff0c\u5b9e\u73b0\u81ea\u4e3b\u8f6f\u4ef6\u751f\u6210\u3002\u7136\u800c\uff0c\u5355\u4e2a\u5f00\u53d1\u6d41\u7a0b\u4e2d\u7684\u6bcf\u4e2a\u9636\u6bb5\u53ea\u4f1a\u4ea7\u751f\u4e00\u79cd\u53ef\u80fd\u7ed3\u679c\uff0c\u5bfc\u81f4\u53ea\u5b8c\u6210\u4e00\u6761\u5f00\u53d1\u94fe\uff0c\u4ece\u800c\u4e27\u5931\u5728\u89e3\u51b3\u65b9\u6848\u7a7a\u95f4\u4e2d\u63a2\u7d22\u591a\u79cd\u51b3\u7b56\u8def\u5f84\u7684\u673a\u4f1a\uff0c\u53ef\u80fd\u5bfc\u81f4\u7ed3\u679c\u4e0d\u7406\u60f3\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u8de8\u56e2\u961f\u534f\u4f5c\uff08Cross-Team Collaboration\uff0cCTC\uff09\u6846\u67b6\uff0c\u8fd9\u662f\u4e00\u79cd\u53ef\u6269\u5c55\u7684\u591a\u56e2\u961f\u7ed3\u6784\uff0c\u5b83\u5141\u8bb8\u534f\u540c\u5de5\u4f5c\u7684\u56e2\u961f\u5728\u8de8\u56e2\u961f\u534f\u4f5c\u73af\u5883\u4e2d\u5171\u540c\u63d0\u51fa\u51b3\u7b56\uff0c\u5e76\u4ea4\u6d41\u5404\u81ea\u89c1\u89e3\uff0c\u4ee5\u4f18\u5316\u5185\u5bb9\u751f\u6210\u3002 
\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u5728\u8f6f\u4ef6\u5f00\u53d1\u9886\u57df\u7684\u5e94\u7528\u4e2d\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u663e\u8457\u4f18\u4e8e\u73b0\u6709\u57fa\u51c6\uff0c\u8bc1\u5b9e\u4e86\u6846\u67b6\u7684\u6709\u6548\u6027\u3002\u5728\u6545\u4e8b\u751f\u6210\u65b9\u9762\u7684\u663e\u8457\u6539\u8fdb\u8868\u660e\uff0c\u8be5\u6846\u67b6\u5177\u6709\u5e7f\u6cdb\u7684\u8de8\u9886\u57df\u6cdb\u5316\u80fd\u529b\u3002\u6211\u4eec\u671f\u5f85\u6211\u4eec\u7684\u5de5\u4f5c\u80fd\u5f15\u5bfcLLMs\u5411\u8de8\u56e2\u961f\u6a21\u5f0f\u53d1\u5c55\uff0c\u5e76\u5728\u8f6f\u4ef6\u5f00\u53d1\u7b49\u9886\u57df\u5e26\u6765\u91cd\u5927\u8fdb\u6b65\u3002\u76f8\u5173\u7684\u4ee3\u7801\u548c\u6570\u636e\u5c06\u5728\u4e0a\u63d0\u4f9b\u3002**|\n", "2406.08747": "|**2024-06-13**|**StreamBench: Towards Benchmarking Continuous Improvement of Language Agents**|Cheng-Kuang Wu et.al.|[2406.08747](http://arxiv.org/abs/2406.08747)|**[link](https://github.com/stream-bench/stream-bench)**|\u8fd1\u671f\u7684\u7814\u7a76\u8868\u660e\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u80fd\u591f\u4ece\u7ecf\u9a8c\u4e2d\u81ea\u6211\u63d0\u5347\uff0c\u8fd9\u662f\u90e8\u7f72\u540e\u6301\u7eed\u6539\u8fdb\u7684\u91cd\u8981\u80fd\u529b\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u57fa\u51c6\u4e3b\u8981\u8bc4\u4f30\u5b83\u4eec\u7684\u56fa\u6709\u80fd\u529b\uff0c\u800c\u4e0d\u8003\u5bdf\u5b83\u4eec\u968f\u65f6\u95f4\u6539\u8fdb\u7684\u80fd\u529b\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u6211\u4eec\u5f15\u5165\u4e86StreamBench\uff0c\u8fd9\u662f\u4e00\u4e2a\u5f00\u521b\u6027\u7684\u57fa\u51c6\uff0c\u65e8\u5728\u8bc4\u4f30LLMs\u5728\u8f93\u5165-\u53cd\u9988\u5e8f\u5217\u4e0a\u7684\u8fde\u7eed\u6539\u8fdb\u6027\u80fd\u3002StreamBench\u6a21\u62df\u4e86\u4e00\u4e2a\u5728\u7ebf\u5b66\u4e60\u73af\u5883\uff0c\u5176\u4e2dLLMs\u63a5\u6536\u5230\u8fde\u7eed\u7684\u53cd\u9988\u6d41\uff0c\u5e76\u8fed\u4ee3\u5730\u63d0\u5347\u5176\u8868\u73b0\u3002\u6b64\u5916\uff0c\u6211\
u4eec\u63d0\u51fa\u4e86\u4e00\u4e9b\u7b80\u5355\u4f46\u6709\u6548\u7684LLM\u57fa\u7ebf\uff0c\u5e76\u5bf9\u5f71\u54cd\u6210\u529f\u6d41\u5f0f\u7b56\u7565\u7684\u5173\u952e\u7ec4\u4ef6\u8fdb\u884c\u4e86\u5168\u9762\u5206\u6790\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u4e3a\u5f00\u53d1LLMs\u7684\u6709\u6548\u5728\u7ebf\u5b66\u4e60\u7b56\u7565\u5960\u5b9a\u4e86\u57fa\u7840\uff0c\u4e3a\u6d41\u5f0f\u573a\u666f\u4e2d\u7684\u66f4\u9002\u5e94\u6027AI\u7cfb\u7edf\u94fa\u5e73\u4e86\u9053\u8def\u3002|\n", "2406.11277": "|**2024-06-17**|**Small Agent Can Also Rock! Empowering Small Language Models as Hallucination Detector**|Xiaoxue Cheng et.al.|[2406.11277](http://arxiv.org/abs/2406.11277)|**[link](https://github.com/rucaibox/haluagent)**|\u8fd9\u7bc7\u8bba\u6587\u63a2\u8ba8\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5e7b\u89c9\u68c0\u6d4b\u65b9\u9762\u7684\u6311\u6218\uff0c\u7279\u522b\u6307\u51fa\u4ee5\u5f80\u7814\u7a76\u4e3b\u8981\u4f9d\u8d56\u4e8e\u5f3a\u5927\u7684\u95ed\u6e90\u6a21\u578b\u5982GPT-4\u3002\u4f5c\u8005\u63d0\u51fa\u4e86\u4e00\u79cd\u81ea\u4e3b\u7684\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u6846\u67b6\uff0c\u79f0\u4e3aHaluAgent\uff0c\u5b83\u5141\u8bb8\u8f83\u5c0f\u7684\u6a21\u578b\uff08\u5982Baichuan2-Chat 
7B\uff09\u4e3b\u52a8\u9009\u62e9\u9002\u5408\u68c0\u6d4b\u6587\u672c\u3001\u4ee3\u7801\u548c\u6570\u5b66\u8868\u8fbe\u5f0f\u7b49\u591a\u79cd\u5e7b\u89c9\u7c7b\u578b\u7684\u5de5\u5177\u3002HaluAgent\u6574\u5408\u4e86LLM\u3001\u591a\u529f\u80fd\u5de5\u5177\u7bb1\uff0c\u5e76\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u7ec6\u7c92\u5ea6\u7684\u4e09\u9636\u6bb5\u68c0\u6d4b\u6846\u67b6\uff0c\u540c\u65f6\u914d\u5907\u4e86\u8bb0\u5fc6\u673a\u5236\u3002\u4e3a\u4e86\u63d0\u9ad8HaluAgent\u7684\u6548\u80fd\uff0c\u8bba\u6587\u5229\u7528\u73b0\u6709\u7684\u4e2d\u6587\u548c\u82f1\u6587\u6570\u636e\u96c6\u5408\u6210\u68c0\u6d4b\u8f68\u8ff9\u8fdb\u884c\u5fae\u8c03\uff0c\u4f7f\u5176\u5177\u5907\u53cc\u8bed\u5e7b\u89c9\u68c0\u6d4b\u80fd\u529b\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u4ec5\u4f7f\u75282000\u4e2a\u6837\u672c\u5bf9LLM\u8fdb\u884c\u8c03\u4f18\u540e\uff0cHaluAgent\u5728\u5404\u79cd\u4efb\u52a1\u548c\u6570\u636e\u96c6\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u5176\u6027\u80fd\u53ef\u4e0eGPT-4\u5ab2\u7f8e\uff0c\u751a\u81f3\u5728\u67d0\u4e9b\u60c5\u51b5\u4e0b\u8d85\u8d8a\uff0c\u4e14\u65e0\u9700\u989d\u5916\u5de5\u5177\u589e\u5f3a\uff0c\u65e0\u8bba\u5728\u9886\u57df\u5185\u8fd8\u662f\u9886\u57df\u5916\u7684\u6570\u636e\u96c6\u4e0a\u90fd\u5c55\u73b0\u51fa\u826f\u597d\u6027\u80fd\u3002\u8bba\u6587\u7684\u4ee3\u7801\u548c\u6570\u636e\u96c6\u5df2\u53d1\u5e03\u5728https://github.com/RUCAIBox/HaluAgent\u3002|\n", "2406.11200": "|**2024-06-18**|**AvaTaR: Optimizing LLM Agents for Tool-Assisted Knowledge Retrieval**|Shirley Wu 
et.al.|[2406.11200](http://arxiv.org/abs/2406.11200)|**[link](https://github.com/zou-group/avatar)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5229\u7528\u5916\u90e8\u5de5\u5177\u548c\u77e5\u8bc6\u63d0\u5347\u51c6\u786e\u6027\u548c\u51cf\u5c11\u9519\u8bef\u65b9\u9762\u5c55\u73b0\u51fa\u663e\u8457\u80fd\u529b\u3002\u7136\u800c\uff0c\u8bbe\u8ba1\u80fd\u8ba9LLMs\u6709\u6548\u8fd0\u7528\u8fd9\u4e9b\u5de5\u5177\u7684\u63d0\u793a\u6280\u5de7\u662f\u4e00\u9879\u8017\u65f6\u4e14\u4f9d\u8d56\u76f4\u89c9\u7684\u4efb\u52a1\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51faAvaTaR\uff0c\u4e00\u4e2a\u521b\u65b0\u7684\u81ea\u52a8\u5316\u6846\u67b6\uff0c\u5b83\u80fd\u4f18\u5316LLMs\uff0c\u4f7f\u5176\u66f4\u6709\u6548\u5730\u5229\u7528\u63d0\u4f9b\u7684\u5de5\u5177\uff0c\u5e76\u5728\u7279\u5b9a\u4efb\u52a1\u6216\u9886\u57df\u4e2d\u63d0\u5347\u6027\u80fd\u3002AvaTaR\u901a\u8fc7\u8bbe\u8ba1\u4e00\u4e2a\u6bd4\u8f83\u5668\u6a21\u5757\uff0c\u4ee5\u8bad\u7ec3\u6570\u636e\u4e2d\u7684\u6b63\u8d1f\u6837\u672c\u8fdb\u884c\u63a8\u7406\uff0c\u8fed\u4ee3\u5730\u4e3aLLM\u63d0\u4f9b\u5bcc\u6709\u6d1e\u5bdf\u529b\u548c\u5168\u9762\u7684\u63d0\u793a\u3002\u6211\u4eec\u5728\u56db\u4e2a\u5305\u542b\u6587\u672c\u3001\u89c6\u89c9\u548c\u5173\u7cfb\u4fe1\u606f\u7684\u590d\u6742\u591a\u6a21\u6001\u68c0\u7d22\u6570\u636e\u96c6\u4e0a\u5c55\u793a\u4e86AvaTaR\u7684\u6548\u679c\u3002\u5b9e\u9a8c\u8868\u660e\uff0cAvaTaR\u5728\u6240\u6709\u56db\u9879\u5177\u6709\u6311\u6218\u6027\u7684\u4efb\u52a1\u4e2d\u5747\u4f18\u4e8e\u73b0\u6709\u6700\u5148\u8fdb\u7684\u65b9\u6cd5\uff0c\u5e76\u5c55\u73b0\u51fa\u5f3a\u5927\u7684\u6cdb\u5316\u80fd\u529b\uff0c\u5f53\u5e94\u7528\u4e8e\u65b0\u6848\u4f8b\u65f6\uff0c\u5e73\u5747\u5728Hit@1\u6307\u6807\u4e0a\u5b9e\u73b0\u4e8614%\u7684\u76f8\u5bf9\u6539\u8fdb\u3002\u4ee3\u7801\u548c\u6570\u636e\u96c6\u5df2\u5728\u4e0a\u516c\u5f00\u3002**|\n", "2406.11176": "|**2024-06-17**|**Watch Every Step! 
LLM Agent Learning via Iterative Step-Level Process Refinement**|Weimin Xiong et.al.|[2406.11176](http://arxiv.org/abs/2406.11176)|**[link](https://github.com/weiminxiong/ipr)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u4e00\u7cfb\u5217\u590d\u6742\u7684\u4ea4\u4e92\u4efb\u52a1\u4e2d\u5c55\u73b0\u51fa\u5353\u8d8a\u6027\u80fd\u3002\u8fd1\u671f\u7684\u7814\u7a76\u503e\u5411\u4e8e\u901a\u8fc7\u4e13\u5bb6\u8f68\u8ff9\u8c03\u4f18\u6765\u63d0\u5347\u6a21\u578b\u6548\u679c\uff0c\u4f46\u4e3b\u8981\u5173\u6ce8\u6700\u7ec8\u7ed3\u679c\u5956\u52b1\uff0c\u8fd9\u53ef\u80fd\u5bfc\u81f4\u9519\u8bef\u6216\u975e\u6700\u4f18\u884c\u4e3a\uff0c\u56e0\u4e3a\u7f3a\u4e4f\u8fc7\u7a0b\u76d1\u7763\u4fe1\u53f7\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5728\u672c\u6587\u4e2d\u63d0\u51fa\u8fed\u4ee3\u6b65\u7ea7\u8fc7\u7a0b\u6539\u8fdb\uff08Iterative Step-level Process Refinement\uff0cIPR\uff09\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u63d0\u4f9b\u4e86\u7ec6\u81f4\u7684\u9010\u6b65\u9aa4\u6307\u5bfc\uff0c\u4ee5\u589e\u5f3a\u8bad\u7ec3\u8fc7\u7a0b\u3002\u6211\u4eec\u91c7\u7528\u8499\u7279\u5361\u6d1b\u65b9\u6cd5\u4f30\u7b97\u6bcf\u4e00\u6b65\u7684\u5956\u52b1\u3002\u5728\u6bcf\u4e2a\u8fed\u4ee3\u4e2d\uff0c\u6a21\u578b\u6cbf\u7740\u4e13\u5bb6\u8f68\u8ff9\u63a2\u7d22\u5e76\u751f\u6210\u65b0\u52a8\u4f5c\uff0c\u7136\u540e\u4e0e\u4e13\u5bb6\u8f68\u8ff9\u7684\u76f8\u5e94\u6b65\u9aa4\u8fdb\u884c\u6bd4\u8f83\uff0c\u4f7f\u7528\u6b65\u7ea7\u5956\u52b1\u8bc4\u4f30\u3002\u8fd9\u79cd\u6bd4\u8f83\u6709\u52a9\u4e8e\u8bc6\u522b\u5dee\u5f02\uff0c\u5f62\u6210\u7528\u4e8e\u8bad\u7ec3\u7684\u5bf9\u6bd4\u52a8\u4f5c\u5bf9\u3002\u6211\u4eec\u5728\u4e09\u4e2a\u590d\u6742\u4ee3\u7406\u4efb\u52a1\u4e0a\u7684\u5b9e\u9a8c\u8868\u660e\uff0c\u6211\u4eec\u7684\u6846\u67b6\u4f18\u4e8e\u591a\u79cd\u5f3a\u5927\u7684\u57fa\u7ebf\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u5206\u6790\u7ed3\u679c\u63ed\u793a\u4e86IPR\u5728\u63d0\u5347\u52a8\u4f5c\u6548\u7387\u65b9\u9762\u7684\u6709\u6548\u6027\uff0c\u5e76\u8bc1\u660e\u5176\u9002\u7528\u4e8
e\u5404\u79cd\u6a21\u578b\u3002**|\n", "2406.11132": "|**2024-06-17**|**RePrompt: Planning by Automatic Prompt Engineering for Large Language Models Agents**|Weizhe Chen et.al.|[2406.11132](http://arxiv.org/abs/2406.11132)|null|\u5728\u8fc7\u53bb\u7684\u4e00\u5e74\u91cc\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u4f20\u7edf\u81ea\u7136\u8bed\u8a00\u5904\u7406\u9886\u57df\u4e4b\u5916\u5c55\u73b0\u51fa\u60ca\u4eba\u6210\u5c31\uff0c\u4eba\u4eec\u5f00\u59cb\u63a2\u7d22\u5728\u4ee3\u7801\u751f\u6210\u3001\u65c5\u884c\u89c4\u5212\u548c\u673a\u5668\u4eba\u63a7\u5236\u7b49\u66f4\u5177\u4f53\u7684\u5e94\u7528\u9886\u57df\u4f7f\u7528\u8fd9\u4e9b\u6a21\u578b\u3002\u901a\u8fc7\u4e0eLLM\u6784\u5efa\u6240\u8c13\u7684LLM\u4ee3\u7406\uff0c\u65e8\u5728\u534f\u52a9\u4eba\u4eec\u5b8c\u6210\u65e5\u5e38\u751f\u6d3b\u4e2d\u7684\u5404\u79cd\u4efb\u52a1\u3002\u7136\u800c\uff0c\u5bf9LLMs\u7684\u63d0\u793a\u8bed\u53e5\u5bf9\u751f\u6210\u5185\u5bb9\u53ca\u5176\u6027\u80fd\u81f3\u5173\u91cd\u8981\u3002\u56e0\u6b64\uff0c\u81ea\u52a8\u63d0\u793a\u5de5\u7a0b\u6210\u4e3a\u8bb8\u591a\u7814\u7a76\u4eba\u5458\u548cLLM\u7528\u6237\u5173\u6ce8\u7684\u7126\u70b9\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u540d\u4e3aRePrompt\uff0c\u5b83\u5229\u7528\u4e0eLLM\u4ee3\u7406\u4ea4\u4e92\u83b7\u53d6\u7684\u5bf9\u8bdd\u5386\u53f2\uff0c\u901a\u8fc7\u201c\u68af\u5ea6\u4e0b\u964d\u201d\u4f18\u5316LLM\u7684\u9010\u6b65\u6307\u4ee4\u3002\u901a\u8fc7\u4f18\u5316\u63d0\u793a\uff0cLLM\u80fd\u591f\u5b66\u4e60\u7279\u5b9a\u9886\u57df\u7684\u89c4\u5212\u7b56\u7565\u3002\u6211\u4eec\u5728PDDL\u751f\u6210\u548c\u65c5\u884c\u89c4\u5212\u4efb\u52a1\u4e2d\u8fdb\u884c\u4e86\u5b9e\u9a8c\uff0c\u7ed3\u679c\u663e\u793a\uff0c\u4f7f\u7528\u66f4\u65b0\u540e\u7684\u63d0\u793a\u4f5c\u4e3a\u521d\u59cb\u63d0\u793a\u65f6\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u901a\u5e38\u53ef\u4ee5\u63d0\u9ad8\u4e0d\u540c\u63a8\u7406\u4efb\u52a1\u7684\u6027\u80fd\u3002|\n", "2406.10918": 
"|**2024-06-18**|**Embodied Question Answering via Multi-LLM Systems**|Bhrij Patel et.al.|[2406.10918](http://arxiv.org/abs/2406.10918)|null|Embodied Question Answering\uff08EQA\uff09\u662f\u4e00\u4e2a\u5173\u952e\u95ee\u9898\uff0c\u5b83\u6d89\u53ca\u4e00\u4e2a\u4ee3\u7406\u5728\u73af\u5883\u4e2d\u63a2\u7d22\u4ee5\u56de\u7b54\u7528\u6237\u67e5\u8be2\u3002\u5f53\u524d\u7684\u7814\u7a76\u4e3b\u8981\u96c6\u4e2d\u5728\u5355\u4ee3\u7406\u573a\u666f\u4e2d\uff0c\u8fd9\u53ef\u80fd\u5bfc\u81f4\u63a2\u7d22\u65f6\u95f4\u5197\u957f\u4e14\u6210\u672c\u9ad8\u6602\u3002\u5728\u8fd9\u4e2a\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u8003\u8651\u4e86\u591a\u4ee3\u7406\u6846\u67b6\u4e0b\u7684EQA\uff0c\u5176\u4e2d\u6d89\u53ca\u591a\u4e2a\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u72ec\u7acb\u4ee3\u7406\uff0c\u5b83\u4eec\u5404\u81ea\u89e3\u7b54\u5173\u4e8e\u5bb6\u5ead\u73af\u5883\u7684\u95ee\u9898\u3002\u4e3a\u4e86\u4e3a\u6bcf\u4e2a\u67e5\u8be2\u751f\u6210\u4e00\u4e2a\u7b54\u6848\uff0c\u6211\u4eec\u5229\u7528\u5404\u4e2a\u72ec\u7acb\u54cd\u5e94\u6765\u8bad\u7ec3\u4e00\u4e2a\u4e2d\u592e\u7b54\u6848\u6a21\u578b\uff08CAM\uff09\uff0c\u8be5\u6a21\u578b\u6574\u5408\u7b54\u6848\u4ee5\u5b9e\u73b0\u66f4\u7a33\u5065\u7684\u56de\u7b54\u3002\u901a\u8fc7\u4f7f\u7528CAM\uff0c\u6211\u4eec\u89c2\u5bdf\u5230\u5176\u5728EQA\u51c6\u786e\u7387\u4e0a\u6bd4\u8bf8\u5982\u6295\u7968\u673a\u5236\u548c\u8fa9\u8bba\u7b49ensemble 
LLM\u805a\u5408\u65b9\u6cd5\u9ad8\u51fa50%\u3002CAM\u65e0\u9700\u4efb\u4f55\u5f62\u5f0f\u7684\u4ee3\u7406\u95f4\u901a\u4fe1\uff0c\u4ece\u800c\u907f\u514d\u4e86\u76f8\u5173\u5f00\u9500\u3002\u6211\u4eec\u8fd8\u901a\u8fc7\u4e0d\u540c\u7684\u975e\u7ebf\u6027\uff08\u5982\u795e\u7ecf\u7f51\u7edc\u3001\u968f\u673a\u68ee\u6797\u3001\u51b3\u7b56\u6811\u3001XGBoost\uff09\u548c\u7ebf\u6027\u7b97\u6cd5\uff08\u5982\u903b\u8f91\u56de\u5f52\u5206\u7c7b\u5668\u3001\u652f\u6301\u5411\u91cf\u673a\uff09\u5bf9CAM\u8fdb\u884c\u4e86\u6d88\u878d\u7814\u7a76\u3002\u6700\u540e\uff0c\u6211\u4eec\u901a\u8fc7Permutation Feature Importance\uff08PFI\uff09\u5206\u6790\u4e86CAM\u5bf9\u6bcf\u4e2a\u72ec\u7acb\u4ee3\u7406\u548c\u67e5\u8be2\u4e0a\u4e0b\u6587\u7684\u4f9d\u8d56\u7a0b\u5ea6\uff0c\u91cf\u5316\u4e86CAM\u7684\u4f9d\u8d56\u7279\u6027\u3002|\n", "2406.10819": "|**2024-06-16**|**GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents**|Dongping Chen et.al.|[2406.10819](http://arxiv.org/abs/2406.10819)|**[link](https://github.com/keplerlab/katna)**|**\u8fd1\u5e74\u6765\uff0c\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\u5df2\u88ab\u7528\u4e8e\u63a7\u5236\u952e\u76d8\u548c\u9f20\u6807\u8f93\u5165\uff0c\u76f4\u63a5\u611f\u77e5\u56fe\u5f62\u7528\u6237\u754c\u9762\uff08GUI\uff09\uff0c\u5e76\u751f\u6210\u76f8\u5e94\u7684\u4ee3\u7801\u3002\u7136\u800c\uff0c\u5f53\u524d\u7684\u6a21\u578b\u4e3b\u8981\u5728\u9759\u6001\u73af\u5883\u4e2d\u8868\u73b0\u51fa\u8272\uff0c\u4e3b\u8981\u5e94\u7528\u4e8e\u76f8\u5bf9\u7b80\u5355\u7684\u9886\u57df\uff0c\u5982\u7f51\u9875\u6216\u79fb\u52a8\u754c\u9762\u3002\u6211\u4eec\u8ba4\u4e3a\uff0c\u4e00\u4e2a\u7a33\u5065\u7684GUI\u4ee3\u7406\u5e94\u5177\u5907\u7406\u89e3GUI\u7684\u65f6\u7a7a\u4fe1\u606f\u80fd\u529b\uff0c\u5305\u62ec\u52a8\u6001\u7f51\u9875\u5185\u5bb9\u548c\u591a\u6b65\u9aa4\u4efb\u52a1\uff0c\u8fd8\u8981\u5168\u9762\u7406\u89e3\u5404\u79cdGUI\u573a\u666f\uff0c\u5305\u62ec\u684c\u9762\u8f6f\u4ef6\u548c\u591a\u7a97\u53e3\u4ea
4\u4e92\u3002\u4e3a\u6b64\uff0c\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u9879\u65b0\u6570\u636e\u96c6\u2014\u2014GUI-World\uff0c\u5176\u4e2d\u5305\u542b\u4e86\u7cbe\u5fc3\u5236\u4f5c\u7684\u4eba\u673a\u6807\u6ce8\uff0c\u5e7f\u6cdb\u6db5\u76d6\u516d\u79cdGUI\u573a\u666f\u548c\u516b\u7c7bGUI\u76f8\u5173\u95ee\u9898\uff0c\u4ee5\u4e09\u79cd\u683c\u5f0f\u5448\u73b0\u3002\u6211\u4eec\u8bc4\u4f30\u4e86\u5f53\u524d\u6700\u5148\u8fdb\u7684MLLM\uff0c\u5982\u56fe\u50cfLLMs\u548c\u89c6\u9891LLMs\uff0c\u5728\u7406\u89e3\u548c\u5904\u7406\u4e0d\u540c\u7c7b\u578bGUI\u5185\u5bb9\uff0c\u7279\u522b\u662f\u52a8\u6001\u548c\u5e8f\u5217\u5185\u5bb9\u65b9\u9762\u7684\u80fd\u529b\u3002\u7814\u7a76\u53d1\u73b0\uff0c\u56fe\u50cfLLMs\u5728\u6ca1\u6709\u624b\u52a8\u6807\u6ce8\u5173\u952e\u5e27\u6216\u64cd\u4f5c\u5386\u53f2\u7684\u60c5\u51b5\u4e0b\uff0c\u96be\u4ee5\u5e94\u5bf9\u52a8\u6001GUI\u5185\u5bb9\u3002\u53e6\u4e00\u65b9\u9762\uff0c\u7531\u4e8eGUI\u89c6\u9891\u6570\u636e\u96c6\u7684\u7a00\u758f\u6027\uff0c\u89c6\u9891LLMs\u5728\u6240\u6709GUI\u76f8\u5173\u4efb\u52a1\u4e0a\u8868\u73b0\u4e0d\u4f73\u3002\u57fa\u4e8eGUI-World\uff0c\u6211\u4eec\u9996\u6b21\u5c1d\u8bd5\u4f7f\u7528\u5fae\u8c03\u540e\u7684\u89c6\u9891LLM\u4f5c\u4e3aGUI\u4ee3\u7406\uff0c\u663e\u793a\u4e86\u5bf9\u5404\u79cdGUI\u4efb\u52a1\u7406\u89e3\u7684\u63d0\u5347\u3002\u7136\u800c\uff0c\u7531\u4e8e\u57fa\u7840LLM\u6027\u80fd\u7684\u9650\u5236\uff0c\u6211\u4eec\u5f97\u51fa\u7ed3\u8bba\uff0c\u5c06\u89c6\u9891LLMs\u7528\u4f5cGUI\u4ee3\u7406\u4ecd\u662f\u4e00\u4e2a\u91cd\u5927\u6311\u6218\u3002\u6211\u4eec\u76f8\u4fe1\uff0c\u6211\u4eec\u7684\u5de5\u4f5c\u4e3a\u672a\u6765\u5728\u52a8\u6001GUI\u5185\u5bb9\u7406\u89e3\u65b9\u9762\u7684\u7814\u7a76\u63d0\u4f9b\u4e86\u6709\u4ef7\u503c\u7684\u6d1e\u89c1\u3002\u4ee3\u7801\u548c\u6570\u636e\u96c6\u5df2\u5728\u6211\u4eec\u7684\u9879\u76ee\u4e3b\u9875https://gui-world.github.io/\u4e0a\u516c\u5f00\u3002**|\n", "2406.10803": "|**2024-06-16**|**HiddenTables & PyQTax: A Cooperative Game and 
Dataset For TableQA to Ensure Scale and Data Privacy Across a Myriad of Taxonomies**|William Watson et.al.|[2406.10803](http://arxiv.org/abs/2406.10803)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5904\u7406\u8868\u683c\u95ee\u7b54\u4efb\u52a1\u65f6\u9762\u4e34\u8bf8\u591a\u6311\u6218\uff0c\u4e3b\u8981\u5305\u62ec\uff1a\uff081\uff09\u5bf9\u4e8e\u5927\u8868\u683c\u6709\u9650\u7684\u4e0a\u4e0b\u6587\u7a97\u53e3\uff1b\uff082\uff09\u4e0d\u540ctoken\u5316\u6a21\u5f0f\u4e0e\u5355\u5143\u683c\u8fb9\u754c\u7684\u590d\u6742\u5dee\u5f02\uff1b\uff083\uff09\u4ee5\u53ca\u4f7f\u7528\u5916\u90e8\u6a21\u578b\u5982gpt-3.5-turbo\u65f6\u7684\u6570\u636e\u4fdd\u5bc6\u95ee\u9898\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201cHiddenTables\u201d\u7684\u5408\u4f5c\u6e38\u620f\u3002\u8fd9\u4e2a\u6e38\u620f\u6d89\u53ca\u4ee3\u7801\u751f\u6210LLM\u201cSolver\u201d\u548c\u8bc4\u4f30\u5176\u5728\u8868\u683c\u95ee\u7b54\u4efb\u52a1\u80fd\u529b\u7684\u201cOracle\u201d\uff0c\u4ee5\u81ea\u7136\u8bed\u8a00\u89c4\u8303\u4e3a\u57fa\u7840\uff0c\u540c\u65f6\u4fdd\u8bc1\u6570\u636e\u5b89\u5168\u3002 \u6211\u4eec\u901a\u8fc7\u5b9e\u8bc1\u5b9e\u9a8c\u5728\u591a\u6837\u5316\u7684\u8868\u683c\u4e0a\u5c55\u793a\u4e86LLMs\u5728\u5904\u7406\u590d\u6742\u67e5\u8be2\u3001\u5904\u7406\u7ec4\u5408\u4f9d\u8d56\u4ee5\u53ca\u5c06\u81ea\u7136\u8bed\u8a00\u8f6c\u5316\u4e3a\u7a0b\u5e8f\u6307\u4ee4\u65b9\u9762\u7684\u5c40\u9650\u6027\uff0c\u7279\u522b\u662f\u5728\u63d0\u4f9b\u5177\u4f53\u8868\u683c\u7ed3\u6784\u7684\u60c5\u51b5\u4e0b\u3002\u4e0e\u57fa\u4e8e\u7f16\u7801\u5668\u7684\u6a21\u578b\u4e0d\u540c\uff0c\u201cHiddenTables\u201d\u4e0d\u53d7\u884c\u6570\u9650\u5236\uff0c\u4ece\u800c\u63d0\u9ad8\u4e86\u63d0\u793a\u548c\u5b8c\u6210 token 
\u7684\u6548\u7387\u3002\u6b64\u5916\uff0c\u6211\u4eec\u521b\u5efa\u4e86\u4e00\u4e2a\u65b0\u7684\u6570\u636e\u96c6\u201cPyQTax\u201d\uff0c\u5305\u542b116,671\u4e2a\u95ee\u9898-\u8868\u683c-\u7b54\u6848\u4e09\u5143\u7ec4\uff0c\u5e76\u63d0\u4f9b\u4e86\u66f4\u7ec6\u81f4\u7684\u95ee\u9898\u5206\u7c7b\u548c\u6807\u7b7e\uff0c\u8fdb\u4e00\u6b65\u589e\u5f3a\u4e86\u6211\u4eec\u7684\u7814\u7a76\u3002 \u56e0\u6b64\uff0c\u9664\u4e86\u5b66\u672f\u8d21\u732e\uff0c\u63ed\u793a\u4e86LLMs\u5728\u8868\u683c\u95ee\u7b54\u4efb\u52a1\u4e2d\u7684\u4e0d\u8db3\uff0c\u201cHiddenTables\u201d\u8fd8\u5c55\u793a\u4e86\u5982\u4f55\u5728\u4fdd\u969c\u6570\u636e\u5b89\u5168\u7684\u540c\u65f6\uff0c\u8ba9LLMs\u4e0e\u5927\u89c4\u6a21\u6570\u636e\u96c6\u4e92\u52a8\uff0c\u4ee5\u53ca\u964d\u4f4e\u751f\u6210\u6210\u672c\u7684\u5b9e\u8df5\u65b9\u6cd5\u3002|\n", "2406.10478": "|**2024-06-15**|**From Words to Worlds: Transforming One-line Prompt into Immersive Multi-modal Digital Stories with Communicative LLM Agent**|Samuel S. 
Sohn et.al.|[2406.10478](http://arxiv.org/abs/2406.10478)|null|\u5728\u5a31\u4e50\u3001\u6559\u80b2\u548c\u8425\u9500\u9886\u57df\u81f3\u5173\u91cd\u8981\u7684\u6570\u5b57\u6545\u4e8b\u53d9\u8ff0\u9762\u4e34\u7740\u751f\u4ea7\u89c4\u6a21\u6269\u5c55\u548c\u7075\u6d3b\u6027\u63d0\u5347\u7684\u6311\u6218\u3002\u8fd9\u7bc7\u8bba\u6587\u4ecb\u7ecd\u7684StoryAgent\u6846\u67b6\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u548c\u751f\u6210\u5de5\u5177\u6765\u81ea\u52a8\u5316\u5e76\u4f18\u5316\u6570\u5b57\u6545\u4e8b\u521b\u4f5c\u8fc7\u7a0b\u3002\u5b83\u91c7\u7528\u81ea\u4e0a\u800c\u4e0b\u7684\u6545\u4e8b\u60c5\u8282\u8349\u62df\u548c\u81ea\u4e0b\u800c\u4e0a\u7684\u8d44\u4ea7\u751f\u6210\u65b9\u6cd5\uff0c\u89e3\u51b3\u4e86\u624b\u52a8\u5e72\u9884\u3001\u4e92\u52a8\u573a\u666f\u7f16\u6392\u548c\u53d9\u4e8b\u4e00\u81f4\u6027\u7b49\u5173\u952e\u95ee\u9898\u3002\u8fd9\u4e2a\u6846\u67b6\u4fc3\u8fdb\u4e86\u4ea4\u4e92\u5f0f\u548c\u4e00\u81f4\u53d9\u4e8b\u7684\u9ad8\u6548\u751f\u4ea7\uff0c\u9002\u7528\u4e8e\u591a\u79cd\u5a92\u4ecb\uff0c\u63a8\u52a8\u4e86\u5185\u5bb9\u521b\u4f5c\u7684\u6c11\u4e3b\u5316\uff0c\u589e\u5f3a\u4e86\u7528\u6237\u7684\u53c2\u4e0e\u5ea6\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u8be5\u6846\u67b6\u80fd\u591f\u5728\u6ca1\u6709\u53c2\u8003\u89c6\u9891\u7684\u60c5\u51b5\u4e0b\u751f\u6210\u8fde\u8d2f\u7684\u6570\u5b57\u6545\u4e8b\uff0c\u8fd9\u6807\u5fd7\u7740\u81ea\u52a8\u6570\u5b57\u6545\u4e8b\u53d9\u8ff0\u6280\u672f\u7684\u4e00\u4e2a\u91cd\u5927\u8fdb\u6b65\u3002|\n", "2406.12806": "|**2024-06-18**|**Identifying Performance-Sensitive Configurations in Software Systems through Code Analysis with LLM Agents**|Zehao Wang 
et.al.|[2406.12806](http://arxiv.org/abs/2406.12806)|null|**\u80cc\u666f**\uff1a\u914d\u7f6e\u8bbe\u7f6e\u5bf9\u4e8e\u8c03\u6574\u8f6f\u4ef6\u884c\u4e3a\u4ee5\u6ee1\u8db3\u7279\u5b9a\u6027\u80fd\u9700\u6c42\u81f3\u5173\u91cd\u8981\uff0c\u4f46\u9519\u8bef\u914d\u7f6e\u666e\u904d\u5b58\u5728\u3002\u7531\u4e8e\u914d\u7f6e\u9879\u4f17\u591a\u4e14\u590d\u6742\uff0c\u8bc6\u522b\u5f71\u54cd\u7cfb\u7edf\u6027\u80fd\u7684\u914d\u7f6e\u662f\u4e00\u9879\u6311\u6218\u3002\u672c\u7814\u7a76\u63d0\u51faPerfSense\uff0c\u8fd9\u662f\u4e00\u4e2a\u8f7b\u91cf\u7ea7\u6846\u67b6\uff0c\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u9ad8\u6548\u5730\u8bc6\u522b\u6027\u80fd\u5173\u952e\u914d\u7f6e\uff0c\u540c\u65f6\u4fdd\u6301\u4f4e\u5f00\u9500\u3002PerfSense\u5229\u7528LLM\u4ee3\u7406\u6a21\u62df\u5f00\u53d1\u8005\u548c\u6027\u80fd\u5de5\u7a0b\u5e08\u4e4b\u95f4\u7684\u4ea4\u4e92\uff0c\u91c7\u7528\u5148\u8fdb\u7684\u63d0\u793a\u94fe\u6280\u672f\u548c\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u7b49\u6280\u672f\u3002 **\u65b9\u6cd5\u4e0e\u6210\u679c**\uff1a\u6211\u4eec\u5728\u4e03\u4e2a\u5f00\u6e90Java\u7cfb\u7edf\u4e0a\u7684\u8bc4\u4f30\u663e\u793a\uff0cPerfSense\u5728\u5206\u7c7b\u6027\u80fd\u654f\u611f\u914d\u7f6e\u65b9\u9762\u7684\u5e73\u5747\u51c6\u786e\u7387\u4e3a64.77%\uff0c\u4f18\u4e8e\u57fa\u4e8eLLM\u7684\u57fa\u7ebf\uff0850.36%\uff09\u548c\u5148\u524d\u7684\u6700\u4f73\u65b9\u6cd5\uff0861.75%\uff09\u3002\u7279\u522b\u662f\uff0c\u6211\u4eec\u7684\u63d0\u793a\u94fe\u6280\u672f\u63d0\u9ad8\u4e86\u53ec\u56de\u738710%\u81f330%\uff0c\u800c\u4fdd\u6301\u4e86\u76f8\u4f3c\u7684\u7cbe\u786e\u5ea6\u3002\u8fdb\u4e00\u6b65\u7684\u624b\u52a8\u5206\u6790362\u4e2a\u8bef\u5206\u7c7b\u6848\u4f8b\uff0c\u53d1\u73b0\u5e38\u89c1\u95ee\u9898\u5305\u62ecLLMs\u5bf9\u9700\u6c42\u7684\u7406\u89e3\u504f\u5dee\uff08\u536026.8%\uff09\u3002 
**Conclusion**: PerfSense significantly reduces the manual effort of classifying performance-sensitive configurations and offers valuable insights for future LLM-based code analysis research.|\n", "2406.12708": "|**2024-06-18**|**AgentReview: Exploring Peer Review Dynamics with LLM Agents**|Yiqiao Jin et.al.|[2406.12708](http://arxiv.org/abs/2406.12708)|null|Peer review is fundamental to the integrity and advancement of scientific publication. Traditional analyses of peer review data tend to focus on exploring and summarizing existing data, but they do not adequately account for the multivariate nature of the process or for latent variables, and they are constrained by privacy concerns because the data is sensitive. We propose AgentReview, a large language model (LLM) based peer review simulation framework that effectively disentangles the effects of multiple latent factors and addresses the privacy issue. The study finds a notable 37.1% variation in paper decisions, a finding supported by sociological theories such as social influence theory, altruism fatigue, and authority bias. We believe this study can offer valuable insights for improving the design of peer review mechanisms.|\n", "2406.12628": "|**2024-06-18**|**Large Language Models based Multi-Agent Framework for Objective Oriented Control Design in Power Electronics**|Chenggang Cui 
et.al.|[2406.12628](http://arxiv.org/abs/2406.12628)|null|This paper addresses challenges in control design for power electronics systems, in particular model uncertainty and long, costly design cycles. It proposes a multi-agent framework based on large language models (LLMs) for objective-oriented controller design in power electronics. The framework combines the reasoning capabilities of LLMs with a multi-agent workflow to develop an efficient, automated controller design process. The LLM agents can understand and respond to high-level instructions in natural language, adapting their behavior to the specific requirements of the task and the constraints of practical applications. This novel and efficient approach is expected to significantly improve the flexibility and adaptability of power electronics controller design and to greatly ease practitioners' work.|\n", "2406.12276": "|**2024-06-18**|**CodeNav: Beyond tool-use to using real-world codebases with LLM agents**|Tanmay Gupta 
et.al.|[2406.12276](http://arxiv.org/abs/2406.12276)|null|We present CodeNav, a system that uses a large language model (LLM) to navigate and leverage previously unseen code repositories to solve user queries. In contrast to tool-use LLMs that require \u201cregistering\u201d all relevant tools through manual descriptions in the LLM context, CodeNav automatically indexes and searches over code blocks in the target codebase, finds relevant code snippets, imports them, and iteratively generates a solution based on execution feedback. First, we show through three case studies how CodeNav uses three different codebases to solve complex user queries. Next, on three benchmarks, we quantitatively compare code use, which has access only to the target codebase, with tool use, which has privileged access to all tool names and descriptions. In addition, we study the effect of different kinds of tool and library descriptions on code-use performance, as well as the advantage of treating source code as the input rather than natural-language descriptions of code. All code will be released as open source under a permissive license.|\n", "2406.12125": "|**2024-06-17**|**Efficient Sequential Decision Making with Large Language 
Models**|Dingyang Chen et.al.|[2406.12125](http://arxiv.org/abs/2406.12125)|null|This paper focuses on extending the success of large language models (LLMs) to sequential decision making. Existing efforts either retrain or fine-tune LLMs for decision making, or design prompts for pretrained LLMs. The former suffers from the computational burden of gradient updates, while the latter has not shown clear benefits. We therefore propose a new approach that uses online model selection algorithms to incorporate LLMs into sequential decision making efficiently. Statistically, our approach significantly outperforms both traditional decision-making algorithms and vanilla LLM agents. Computationally, it avoids expensive gradient updates of LLMs and requires only a small number of LLM calls throughout the decision-making process. We conduct extensive experiments to verify the effectiveness of the approach. On a large-scale Amazon dataset, for example, our approach achieves more than a 6x performance gain over baselines while calling LLMs in only 1.5% of the time steps.|\n", "2406.14373": "|**2024-07-01**|**Artificial Leviathan: Exploring Social Evolution of LLM Agents Through the Lens of Hobbesian Social Contract Theory**|Gordon Dai 
et.al.|[2406.14373](http://arxiv.org/abs/2406.14373)|null|Advances in large language models (LLMs) and artificial intelligence offer an opportunity for computational social science research at scale. Building on prior work on LLM agent design, we construct a simulated agent society in which complex social relationships form and evolve dynamically over time. The agents are given psychological drives and placed in a sandbox survival environment. We evaluate this agent society through the lens of Thomas Hobbes's seminal Social Contract Theory (SCT). The experiments show that agents initially engage in unrestrained conflict, matching Hobbes's description of the \u201cstate of nature\u201d. As the simulation progresses, however, social contracts gradually form, an absolute sovereign is authorized, and a peaceful commonwealth founded on mutual cooperation is established. Our findings are consistent with Hobbes's theory: LLM-driven multi-agent simulation exhibits the complexity of social dynamics and may replicate the forces that shape human societies. Although such simulations cannot capture every nuance of human behavior, they are potentially valuable for understanding social structures, group dynamics, and complex human systems.|\n", 
"2406.14228": "|**2024-06-20**|**EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms**|Siyu Yuan et.al.|[2406.14228](http://arxiv.org/abs/2406.14228)|**[link](https://github.com/siyuyuan/evoagent)**|**With the rise of powerful large language models (LLMs), a new trend is to use them to build autonomous agents, especially multi-agent systems, that can solve complex tasks. However, existing work relies heavily on human-designed frameworks, which limits the functional scope and scalability of agent systems. How to automatically extend a specialized agent to a multi-agent system to improve task-solving capability remains a significant challenge. This paper proposes EvoAgent, a method that uses evolutionary algorithms to automatically extend expert agents to multi-agent systems, with the aim of improving the effectiveness of LLM-based agents in solving tasks. Specifically, we treat an existing agent framework as the initial individual and apply a series of evolutionary operators (such as mutation, crossover, and selection) to generate agents with diverse settings. EvoAgent applies to any LLM-based agent framework and can automatically generate an extended multi-agent system without additional human design. Experimental results show that EvoAgent can automatically generate multiple expert agents and significantly enhance the task-solving capabilities of LLM-based agents.**|\n", 
"2406.13352": "|**2024-06-19**|**AgentDojo: A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents**|Edoardo Debenedetti et.al.|[2406.13352](http://arxiv.org/abs/2406.13352)|**[link](https://github.com/ethz-spylab/agentdojo)**|**This paper introduces AgentDojo, a framework for evaluating the adversarial robustness of AI agents that rely on external tools to process untrusted data. To keep pace with evolving attacks and defenses, AgentDojo is not a static test suite but an extensible environment for designing and evaluating new tasks, defenses, and adaptive attacks. It includes 97 realistic tasks (such as managing an email client, navigating an online banking website, or booking travel), 629 security test cases, and a variety of attack and defense methods from the literature. The study finds that current state-of-the-art language models perform unsatisfactorily on AgentDojo even in the absence of attacks, and that existing prompt injection attacks break some security properties but not all. We hope AgentDojo can foster research on new design principles for AI agents that solve common tasks reliably and robustly. The code is available at https://github.com/ethz-spylab/agentdojo.**|\n", 
"2406.13163": "|**2024-06-19**|**LLMatDesign: Autonomous Materials Discovery with Large Language Models**|Shuyi Jia et.al.|[2406.13163](http://arxiv.org/abs/2406.13163)|null|Discovering new materials has significant scientific and technological implications but remains a hard problem because of the vastness of chemical space. Recent advances in machine learning have enabled data-driven methods for rapidly screening or generating promising materials, but these methods still depend on large amounts of training data and often lack the flexibility and chemical intuition expected in human-led materials design. We propose LLMatDesign, a new interpretable materials design framework driven by large language models. LLMatDesign uses LLM agents to interpret human instructions, apply modifications to materials, and evaluate the outcomes with provided tools. By reflecting on its previous decisions, LLMatDesign adapts rapidly to new tasks and conditions in a zero-shot manner. In offline experiments, a systematic evaluation of LLMatDesign on several materials design tasks confirms its effectiveness in developing new materials with user-defined target properties in the small-data regime. Our framework demonstrates the remarkable potential of autonomous LLM-guided materials discovery in computational settings and points toward self-driving laboratories in the future.|\n", 
"2406.15341": "|**2024-06-21**|**GenoTEX: A Benchmark for Evaluating LLM-Based Exploration of Gene Expression Data in Alignment with Bioinformaticians**|Haoyang Liu et.al.|[2406.15341](http://arxiv.org/abs/2406.15341)|**[link](https://github.com/liu-hy/genotex)**|**Recent advances in machine learning have significantly improved the identification of disease-associated genes from gene expression data. However, these processes often require deep expertise and substantial manual effort, which limits their scalability. Agents driven by large language models (LLMs) show promise in automating such tasks thanks to their growing problem-solving abilities. To support the evaluation and development of such methods, we create GenoTEX, a benchmark for the automated exploration of gene expression data, covering dataset selection, preprocessing, and statistical analysis tasks. GenoTEX provides a complete analysis pipeline with annotations carefully written by human bioinformaticians, who analyzed the datasets in depth to ensure accuracy and reliability. 
To provide baselines for these tasks, we design GenoAgents, a team of LLM-based agents with context-aware planning, iterative correction, and domain expert consultation, which explore gene datasets collaboratively. Our experiments demonstrate the potential of LLM-driven approaches in genomic data analysis, while error analysis highlights the challenges and directions for future improvement. We propose GenoTEX as a promising resource for benchmarking and improving AI-driven methods for genomic data analysis. The benchmark is publicly available at https://github.com/Liu-Hy/GenoTex.**|\n", "2406.14928": "|**2024-06-21**|**Autonomous Agents for Collaborative Task under Information Asymmetry**|Wei Liu 
et.al.|[2406.14928](http://arxiv.org/abs/2406.14928)|**[link](https://github.com/thinkwee/iAgents)**|**Large language model multi-agent systems (LLM-MAS) have made great progress in solving complex tasks. Agents within such a system collaborate through communication to complete tasks, on the premise that information is shared. However, when agents' communication is used to enhance human cooperation, information asymmetry poses a new challenge, because each agent can access only the information of its own human user. Conventional MAS struggle to complete tasks under this condition. To address this, we propose a new multi-agent system architecture called iAgents, short for Informative Multi-Agent Systems. In iAgents, the human social network is mirrored in the agent network, and agents proactively exchange the human information needed to complete a task, thereby overcoming information asymmetry. iAgents uses a novel agent reasoning mechanism, InfoNav, to guide agents toward effective information exchange. Together with InfoNav, iAgents organizes human information in a mixed memory to give agents accurate and comprehensive information to exchange. We also introduce InformativeBench, the first benchmark designed to evaluate the task-solving ability of LLMs under information asymmetry. Experimental results show that iAgents can collaborate within a social network of 140 individuals and 588 relationships, autonomously communicate over more than 30 turns, and retrieve information from nearly 70,000 messages to complete tasks within 3 minutes.**|\n", 
"2406.14884": "|**2024-06-21**|**FlowBench: Revisiting and Benchmarking Workflow-Guided Planning for LLM-based Agents**|Ruixuan Xiao et.al.|[2406.14884](http://arxiv.org/abs/2406.14884)|null|Language-model-based agents have emerged as promising tools designed to carry out complex tasks through iterative planning and action. However, these agents are prone to undesired planning hallucinations when handling tasks that require expert knowledge. To address this, preliminary attempts have been made to improve planning reliability by incorporating external workflow-related knowledge. Despite their promise, the injected knowledge is often disorganized and varied in format, lacking rigorous formalization and comprehensive comparison. We therefore formalize different formats of workflow knowledge and present FlowBench, the first benchmark for workflow-guided planning. FlowBench covers 51 scenarios from 6 domains, with knowledge presented in diverse formats. To assess different language models on FlowBench, we design a multi-tiered evaluation framework. We evaluate the efficacy of workflow knowledge across multiple formats, and the results indicate that current language model agents still need considerable improvement to plan satisfactorily. We hope this challenging benchmark can pave the way for future agent planning research.|\n", 
"2406.17232": "|**2024-06-25**|**Beyond Demographics: Aligning Role-playing LLM-based Agents Using Human Belief Networks**|Yun-Shiuan Chuang et.al.|[2406.17232](http://arxiv.org/abs/2406.17232)|null|Building human-like large language model (LLM) agents is crucial for faithful social simulation. Although role-playing based on demographic information sometimes improves human likeness, the effect is not always satisfactory. This study examines whether the alignment of LLMs with human behavior can be improved further by integrating information from empirically derived human belief networks. Using data from a human survey, we estimated a belief network of 18 topics loading on two non-overlapping latent factors. 
We then seeded the LLM with an opinion on one topic and assessed how well the opinions it expressed on the remaining test topics aligned with the corresponding human data. Role-playing based on demographic information alone did not align LLM and human opinions, but seeding a single belief significantly improved alignment for topics related within the belief network, with no clear effect for topics outside the network. These results suggest a novel path for belief alignment between humans and LLMs in AI work that seeks to understand and simulate the patterns of belief distribution in society.|\n", "2406.18702": "|**2024-06-26**|**Simulating The U.S. Senate: An LLM-Driven Agent Approach to Modeling Legislative Behavior and Bipartisanship**|Zachary R. 
Baker et.al.|[2406.18702](http://arxiv.org/abs/2406.18702)|null|This study introduces a novel approach to simulating legislative processes using language-model-driven virtual agents, focusing on the U.S. Senate Intelligence Committee. We built agents representing individual senators and had them interact in simulated committee discussions. The agents demonstrated the ability to engage in realistic debate, offer thoughtful perspectives, and find bipartisan solutions under certain conditions. Notably, the simulation also showed promise in modeling shifts toward bipartisanship in response to external perturbations. The results indicate that this language-model-based approach could become an effective tool for understanding and improving legislative processes, echoing a broader pattern of findings that language-model-based agents can usefully simulate real-world phenomena. Future work will focus on increasing the complexity of the agents, expanding the scope of the simulations, and exploring applications in policy testing and negotiation.|\n", "2406.19966": "|**2024-06-28**|**Simulating Financial Market via Large Language Model based Agents**|Shen Gao 
et.al.|[2406.19966](http://arxiv.org/abs/2406.19966)|null|Most economic theories assume that financial market participants are fully rational individuals and use mathematical models to simulate human behavior in financial markets. However, human behavior is often not entirely rational and is hard to predict accurately with mathematical models. This paper proposes an Agent-based Simulated Financial Market (ASFM), which first constructs a simulated stock market with a real order matching system. We then design a large language model based stock trading agent comprising a profile module, an observation module, and a tool-learning based action module. The trading agent can comprehensively understand current market dynamics and financial policy information and make decisions in line with its trading strategy. Experiments show that the reactions of ASFM in controlled scenarios are consistent with the real stock market. In addition, we run experiments in two popular areas of economics research and find that the conclusions drawn from ASFM align with preliminary findings in economics research. We therefore believe ASFM provides a new paradigm for economic research.|\n", "2407.02483": 
"|**2024-07-02**|**MMedAgent: Learning to Use Medical Tools with Multi-modal Agent**|Binxu Li et.al.|[2407.02483](http://arxiv.org/abs/2407.02483)|**[link](https://github.com/Wangyixinxin/MMedAgent)**|Multi-modal large language models (MLLMs), despite their success, still generalize poorly and in some cases underperform specialized models. To address this, recent work has developed LLM-based agents that select appropriate specialized models based on user input. However, this line of work has not been extensively explored in the medical domain. To bridge this gap, this paper introduces the first agent designed specifically for the medical field, named Multi-modal Medical Agent (MMedAgent). We curate an instruction-tuning dataset covering six medical tools that solve seven tasks, enabling the agent to choose the most suitable tool for a given task. Comprehensive experiments show that MMedAgent outperforms state-of-the-art open-source methods across a variety of medical tasks and performs well even against the closed-source model GPT-4o. MMedAgent also updates and integrates new medical tools efficiently.|\n", "2407.01887": "|**2024-07-02**|**Beyond Numeric Awards: In-Context Dueling Bandits with LLM 
Agents**|Fanzeng Xia et.al.|[2407.01887](http://arxiv.org/abs/2407.01887)|null|This paper studies the performance of large language models as decision makers in the context of the Dueling Bandits (DB) problem. It compares GPT-3.5-Turbo, GPT-4, and GPT-4-Turbo against established DB algorithms. The results show that LLMs, particularly GPT-4 Turbo, quickly identify the clearly superior option and thus outperform current state-of-the-art algorithms in terms of weak regret. However, these models struggle to converge, are highly sensitive to the prompt, and are brittle under prompt variations. To address this, we propose IF-Enhanced LLM, an augmented algorithm that combines the decision-making capabilities of LLMs with the theoretical guarantees of classic DB algorithms. The design shows how to make LLMs more trustworthy in decision-making tasks that require stable performance. IF-Enhanced LLM has theoretical guarantees on both weak and strong regret. Experimental results confirm that IF-Enhanced LLM remains robust even with noisy and adversarial prompts.|\n", "2407.01489": "|**2024-07-01**|**Agentless: Demystifying LLM-based Software Engineering Agents**|Chunqiu Steven Xia 
et.al.|[2407.01489](http://arxiv.org/abs/2407.01489)|**[link](https://github.com/OpenAutoCoder/Agentless)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u6700\u65b0\u8fdb\u5c55\uff0c\u8f6f\u4ef6\u5f00\u53d1\u4efb\u52a1\u7684\u81ea\u52a8\u5316\uff0c\u5982\u4ee3\u7801\u5408\u6210\u3001\u7a0b\u5e8f\u4fee\u590d\u548c\u6d4b\u8bd5\u751f\u6210\uff0c\u5df2\u53d6\u5f97\u663e\u8457\u8fdb\u6b65\u3002\u7814\u7a76\u4eba\u5458\u548c\u4e1a\u754c\u5b9e\u8df5\u8005\u5df2\u7ecf\u5f00\u53d1\u51fa\u5404\u79cd\u81ea\u4e3bLLM\u4ee3\u7406\u6765\u6267\u884c\u7aef\u5230\u7aef\u7684\u8f6f\u4ef6\u5f00\u53d1\u4efb\u52a1\uff0c\u5b83\u4eec\u80fd\u591f\u5229\u7528\u5de5\u5177\u3001\u8fd0\u884c\u547d\u4ee4\u3001\u89c2\u5bdf\u73af\u5883\u53cd\u9988\u5e76\u89c4\u5212\u672a\u6765\u884c\u52a8\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u57fa\u4e8e\u4ee3\u7406\u7684\u65b9\u6cd5\u7684\u590d\u6742\u6027\u4ee5\u53ca\u5f53\u524dLLM\u7684\u5c40\u9650\u6027\uff0c\u5f15\u53d1\u4e86\u4e00\u4e2a\u95ee\u9898\uff1a\u662f\u5426\u771f\u7684\u9700\u8981\u4f7f\u7528\u590d\u6742\u7684\u81ea\u4e3b\u8f6f\u4ef6\u4ee3\u7406\uff1f\u4e3a\u4e86\u63a2\u8ba8\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u6784\u5efa\u4e86Agentless\u2014\u2014\u4e00\u79cd\u65e0\u4ee3\u7406\u65b9\u6cd5\uff0c\u7528\u4e8e\u81ea\u52a8\u89e3\u51b3\u8f6f\u4ef6\u5f00\u53d1\u95ee\u9898\u3002\u4e0e\u590d\u6742\u7684\u4ee3\u7406\u8bbe\u7f6e\u76f8\u6bd4\uff0cAgentless\u91c7\u7528\u4e86\u4e00\u79cd\u7b80\u5355\u7684\u4e24\u9636\u6bb5\u8fc7\u7a0b\uff1a\u5b9a\u4f4d\u540e\u4fee\u590d\uff0c\u4e0d\u8ba9LLM\u51b3\u5b9a\u672a\u6765\u7684\u884c\u52a8\u6216\u64cd\u4f5c\u590d\u6742\u7684\u5de5\u5177\u3002\u5728\u6d41\u884c\u7684SWE-bench 
Lite\u57fa\u51c6\u4e0a\uff0c\u6211\u4eec\u7684\u5b9e\u9a8c\u7ed3\u679c\u4ee4\u4eba\u60ca\u8bb6\u5730\u8868\u660e\uff0c\u8fd9\u79cd\u7b80\u5355\u7684\u65b9\u6cd5\u80fd\u591f\u5b9e\u73b0\u6700\u9ad8\u6027\u80fd\uff0827.33%\uff09\u548c\u6700\u4f4e\u6210\u672c\uff080.34\u7f8e\u5143\uff09\uff0c\u8d85\u8d8a\u6240\u6709\u5f00\u6e90\u8f6f\u4ef6\u4ee3\u7406\uff01 \u6b64\u5916\uff0c\u6211\u4eec\u624b\u52a8\u5206\u7c7b\u4e86SWE-bench Lite\u4e2d\u7684\u95ee\u9898\uff0c\u5e76\u53d1\u73b0\u5b58\u5728\u7cbe\u786e\u7684ground truth\u8865\u4e01\u95ee\u9898\u6216\u63cf\u8ff0\u4e0d\u8db3/\u8bef\u5bfc\u6027\u7684\u95ee\u9898\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u6784\u5efa\u4e86SWE-bench Lite-S\uff0c\u901a\u8fc7\u6392\u9664\u8fd9\u4e9b\u95ee\u9898\u6765\u8fdb\u884c\u66f4\u4e25\u683c\u7684\u8bc4\u4f30\u548c\u6bd4\u8f83\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u7a81\u663e\u4e86\u5f53\u524d\u88ab\u5ffd\u89c6\u7684\u7b80\u5355\u3001\u53ef\u89e3\u91ca\u6280\u672f\u5728\u81ea\u4e3b\u8f6f\u4ef6\u5f00\u53d1\u4e2d\u7684\u6f5c\u529b\u3002\u6211\u4eec\u5e0c\u671bAgentless\u5c06\u4f5c\u4e3a\u81ea\u4e3b\u8f6f\u4ef6\u4ee3\u7406\u7684\u57fa\u7ebf\u3001\u8d77\u70b9\u548c\u671f\u671b\u503c\uff0c\u6fc0\u53d1\u672a\u6765\u5728\u8fd9\u4e2a\u5173\u952e\u9886\u57df\u7684\u5de5\u4f5c\u3002**|\n", "2407.01231": "|**2024-07-01**|**MIRAI: Evaluating LLM Agents for Event Forecasting**|Chenchen Ye 
et.al.|[2407.01231](http://arxiv.org/abs/2407.01231)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u6700\u65b0\u8fdb\u5c55\uff0c\u8fd9\u4e9b\u6a21\u578b\u80fd\u591f\u81ea\u4e3b\u6536\u96c6\u5168\u7403\u4fe1\u606f\uff0c\u5e76\u8fdb\u884c\u63a8\u7406\u4ee5\u89e3\u51b3\u590d\u6742\u95ee\u9898\uff0c\u8fd9\u5f15\u53d1\u4e86\u4f7f\u7528LLM\u9884\u6d4b\u56fd\u9645\u4e8b\u4ef6\u7684\u5174\u8da3\u3002\u7136\u800c\uff0c\u76ee\u524d\u7f3a\u4e4f\u4e00\u4e2a\u4e25\u683c\u8bc4\u4f30LLM\u9884\u6d4b\u80fd\u529b\u4e0e\u53ef\u9760\u6027\u7684\u57fa\u51c6\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u6211\u4eec\u63d0\u51faMIRAI\uff0c\u8fd9\u662f\u4e00\u4e2a\u65b0\u9896\u7684\u57fa\u51c6\uff0c\u65e8\u5728\u7cfb\u7edf\u5730\u8bc4\u4ef7LLM\u5728\u56fd\u9645\u4e8b\u4ef6\u65f6\u95f4\u5e8f\u5217\u9884\u6d4b\u4e2d\u7684\u8868\u73b0\u3002MIRAI\u6784\u5efa\u4e86\u4e00\u4e2a\u4ee3\u7406\u73af\u5883\uff0c\u914d\u5907\u6709\u8bbf\u95ee\u5e7f\u6cdb\u5386\u53f2\u7ed3\u6784\u5316\u4e8b\u4ef6\u548c\u6587\u672c\u65b0\u95fb\u6570\u636e\u5e93\u7684\u5de5\u5177\u3002\u6211\u4eec\u5bf9GDELT\u4e8b\u4ef6\u6570\u636e\u5e93\u8fdb\u884c\u4e86\u7cbe\u5fc3\u6e05\u6d17\u548c\u89e3\u6790\uff0c\u8bbe\u8ba1\u4e86\u4e00\u7cfb\u5217\u5173\u8054\u9884\u6d4b\u4efb\u52a1\uff0c\u6db5\u76d6\u4e86\u4e0d\u540c\u9884\u6d4b\u65f6\u95f4\u8303\u56f4\uff0c\u4ece\u77ed\u671f\u5230\u957f\u671f\uff0c\u4ee5\u68c0\u9a8cLLM\u5728\u6574\u5408\u5168\u7403\u5173\u952e\u4fe1\u606f\u3001\u8fd0\u7528\u9886\u57df\u7279\u5b9aAPI\u548c\u5e93\u7f16\u5199\u4ee3\u7801\u4ee5\u53ca\u7efc\u5408\u5904\u7406\u6765\u81ea\u591a\u79cd\u683c\u5f0f\u548c\u65f6\u95f4\u7684\u5386\u53f2\u77e5\u8bc6\u4ee5\u51c6\u786e\u9884\u6d4b\u672a\u6765\u4e8b\u4ef6\u7684\u80fd\u529b\u3002\u901a\u8fc7\u5168\u9762\u7684\u57fa\u51c6\u6d4b\u8bd5\uff0c\u6211\u4eec\u7684\u76ee\u6807\u662f\u5efa\u7acb\u4e00\u4e2a\u53ef\u9760\u7684\u6846\u67b6\uff0c\u4ee5\u8bc4\u4f30LLM\u5728\u56fd\u9645\u4e8b\u4ef6\u9884\u6d4b\u65b9\u9762\u7684\u6027\u8
0fd\uff0c\u4ece\u800c\u63a8\u52a8\u66f4\u7cbe\u786e\u548c\u53ef\u4fe1\u7684\u56fd\u9645\u5173\u7cfb\u5206\u6790\u6a21\u578b\u7684\u53d1\u5c55\u3002|\n", "2407.00993": "|**2024-07-01**|**Mobile-Bench: An Evaluation Benchmark for LLM-based Mobile Agents**|Shihan Deng et.al.|[2407.00993](http://arxiv.org/abs/2407.00993)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u663e\u8457\u8fdb\u6b65\uff0c\u57fa\u4e8eLLM\u7684\u79fb\u52a8\u4ee3\u7406\u5df2\u6210\u4e3a\u4eba\u673a\u4ea4\u4e92\u9886\u57df\u7684\u7814\u7a76\u70ed\u70b9\u3002\u7136\u800c\uff0c\u9488\u5bf9\u6b64\u7c7b\u4ee3\u7406\u7684\u57fa\u51c6\u6d4b\u8bd5\u8d44\u6e90\u76f8\u5bf9\u532e\u4e4f\u3002\u8bc4\u4f30\u8fd9\u7c7b\u4ee3\u7406\u901a\u5e38\u9762\u4e34\u4e09\u4e2a\u6311\u6218\uff1a\uff081\uff09\u4ec5\u4f9d\u8d56\u7528\u6237\u754c\u9762\uff08UI\uff09\u64cd\u4f5c\u7684\u4f4e\u6548\u9650\u5236\u4e86\u4efb\u52a1\u8bc4\u4f30\uff1b\uff082\uff09\u5355\u4e00\u5e94\u7528\u4e2d\u7684\u7279\u5b9a\u6307\u4ee4\u4e0d\u8db3\u4ee5\u5168\u9762\u8bc4\u4f30LLM\u79fb\u52a8\u4ee3\u7406\u7684\u591a\u7ef4\u5ea6\u63a8\u7406\u548c\u51b3\u7b56\u80fd\u529b\uff1b\uff083\uff09\u5f53\u524d\u7684\u8bc4\u4f30\u6307\u6807\u65e0\u6cd5\u51c6\u786e\u8861\u91cf\u8fde\u7eed\u52a8\u4f5c\u8fc7\u7a0b\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86Mobile-Bench\uff0c\u4e00\u4e2a\u5168\u65b0\u7684\u7528\u4e8e\u8bc4\u4f30LLM\u79fb\u52a8\u4ee3\u7406\u80fd\u529b\u7684\u57fa\u51c6\u3002\u9996\u5148\uff0c\u6211\u4eec\u6269\u5c55\u4e86\u4f20\u7edf\u7684UI\u64cd\u4f5c\uff0c\u878d\u5165\u4e86103\u4e2a\u6536\u96c6\u5230\u7684API\uff0c\u4ee5\u63d0\u9ad8\u4efb\u52a1\u5b8c\u6210\u7684\u6548\u7387\u3002\u63a5\u7740\uff0c\u6211\u4eec\u901a\u8fc7\u7ed3\u5408\u771f\u5b9e\u7528\u6237\u67e5\u8be2\u548cLLM\u589e\u5f3a\u7684\u6570\u636e\u6536\u96c6\u6765\u8fdb\u884c\u8bc4\u4f30\u3002\u4e3a\u4e86\u66f4\u597d\u5730\u8bc4\u4ef7\u79fb\u52a8\u4ee3\u7406\u7684\u4e0d\u540c\u89c4\u5212\u80fd\u529b\u5c42\u6b21\uff0c\u6211\u4eec\u7684\u6570\u636e\u8
8ab\u5206\u4e3aSAST\uff08\u7b80\u5355\u4efb\u52a1\uff09\u3001SAMT\uff08\u7a0d\u590d\u6742\u4efb\u52a1\uff09\u548cMAMT\uff08\u591a\u4efb\u52a1\uff09\u4e09\u7c7b\uff0c\u53cd\u6620\u4e86\u4efb\u52a1\u590d\u6742\u5ea6\u7684\u5dee\u5f02\u3002Mobile-Bench\u5305\u542b832\u6761\u6570\u636e\u6761\u76ee\uff0c\u5176\u4e2d\u8d85\u8fc7200\u9879\u4efb\u52a1\u4e13\u95e8\u8bbe\u8ba1\u7528\u4e8e\u6d4b\u8bd5\u8de8\u5e94\u7528\u534f\u4f5c\u573a\u666f\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u66f4\u7cbe\u786e\u7684\u8bc4\u4f30\u6307\u6807\uff0c\u79f0\u4e3aCheckPoint\uff0c\u7528\u4e8e\u68c0\u67e5LLM\u79fb\u52a8\u4ee3\u7406\u5728\u89c4\u5212\u548c\u63a8\u7406\u6b65\u9aa4\u4e2d\u662f\u5426\u8fbe\u5230\u5173\u952e\u70b9\u3002|\n", "2407.00476": "|**2024-06-29**|**Large Language Models for Power Scheduling: A User-Centric Approach**|Thomas Mongaillard et.al.|[2407.00476](http://arxiv.org/abs/2407.00476)|**[link](https://github.com/thomasmong/llm-power-scheduling)**|**\u968f\u7740\u4f20\u7edf\u4f18\u5316\u548c\u8c03\u5ea6\u65b9\u6cd5\u9010\u6e10\u8f6c\u5411\u7528\u6237\u9a71\u52a8\u548c\u4e2a\u4eba\u5316\u670d\u52a1\uff0c\u4ee5\u63d0\u5347\u7528\u6237\u4f53\u9a8c\uff08QoE\uff09\u548c\u7075\u6d3b\u6027\uff0c\u672a\u6765\u7684\u7cfb\u7edf\uff0c\u5c24\u5176\u662f\u5728\u65e0\u7ebf\u548c\u6570\u5b57\u5316\u80fd\u6e90\u7f51\u7edc\u4e2d\uff0c\u9762\u4e34\u7740\u5982\u4f55\u66f4\u597d\u5730\u7406\u89e3\u548c\u54cd\u5e94\u7528\u6237\u9700\u6c42\u7684\u6311\u6218\u3002\u4f20\u7edf\u7684\u7cfb\u7edf\u5f80\u5f80\u5ffd\u89c6\u4e86\u7528\u6237\u7684\u4e2a\u6027\u5316\u9700\u6c42\uff0c\u56e0\u4e3a\u7528\u6237\u4e0e\u673a\u5668\u4e4b\u95f4\u7684\u6c9f\u901a\u4e0d\u7545\u3002\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u51fa\u73b0\u4e3a\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\u5e26\u6765\u4e86\u7a81\u7834\uff0c\u5b83\u4eec\u63d0\u4f9b\u4e86\u7528\u6237\u4e0e\u8bbe\u5907\u4e4b\u95f4\u81ea\u7136\u7684\u4ea4\u6d41\u754c\u9762\u3002\u672c\u6587\u9996\u6b21\u63d0\u51fa\u4e86\u4
e00\u79cd\u65b0\u9896\u7684\u67b6\u6784\uff0c\u901a\u8fc7\u6784\u5efa\u4e09\u4e2aLLM\u4ee3\u7406\u6765\u5c06\u7528\u6237\u7684\u8bed\u97f3\u8bf7\u6c42\uff08VRQ\uff09\u8f6c\u5316\u4e3a\u8d44\u6e90\u5206\u914d\u5411\u91cf\u3002\u5177\u4f53\u5305\u62ec\uff1aLLM\u610f\u56fe\u8bc6\u522b\u4ee3\u7406\u5c06\u8bf7\u6c42\u8f6c\u5316\u4e3a\u4f18\u5316\u95ee\u9898\uff08OP\uff09\u3001LLM OP\u53c2\u6570\u8bc6\u522b\u4ee3\u7406\u4ee5\u53caLLM OP\u6c42\u89e3\u4ee3\u7406\u3002 \u6211\u4eec\u9488\u5bf9\u7535\u52a8\u6c7d\u8f66\uff08EV\uff09\u5145\u7535\u7684\u5178\u578bVRQ\u521b\u5efa\u4e86\u4e00\u4e2a\u6570\u636e\u5e93\uff0c\u4f5c\u4e3a\u6027\u80fd\u8bc4\u4f30\u7684\u57fa\u7840\u3002\u4f5c\u4e3a\u6982\u5ff5\u9a8c\u8bc1\uff0c\u6211\u4eec\u4e3b\u8981\u4f7f\u7528Llama 3 8B\u6a21\u578b\u8fdb\u884c\u5b9e\u9a8c\u3002\u901a\u8fc7\u4e0d\u540c\u7684\u63d0\u793a\u5de5\u7a0b\u573a\u666f\u6d4b\u8bd5\uff0c\u7ed3\u679c\u663e\u793a\u4e86\u6240\u63d0\u67b6\u6784\u7684\u6709\u6548\u6027\u3002\u7814\u7a76\u8fd8\u63ed\u793a\u4e86\u4e00\u4e9b\u5173\u952e\u89c1\u89e3\uff0c\u4f8b\u5982\uff0c\u7528\u4e8e\u5efa\u6a21\u5b9e\u9645\u95ee\u9898\u7684\u66f4\u5927\u5019\u9009OP\u96c6\u53ef\u80fd\u4f1a\u7531\u4e8e\u66f4\u9ad8\u7684\u8bc6\u522b/OP\u5206\u7c7b\u566a\u58f0\u800c\u964d\u4f4e\u6700\u7ec8\u6027\u80fd\u3002\u6240\u6709\u7ed3\u679c\u548c\u4ee3\u7801\u5df2\u5f00\u6e90\uff0c\u4f9b\u5b66\u672f\u754c\u8fdb\u4e00\u6b65\u7814\u7a76\u548c\u5229\u7528\u3002**|\n", "2407.00365": "|**2024-06-29**|**Financial Knowledge Large Language Model**|Cehao Yang 
et.al.|[2407.00365](http://arxiv.org/abs/2407.00365)|null|\u4eba\u5de5\u667a\u80fd\u5728\u91d1\u878d\u9886\u57df\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\uff0c\u6b63\u5728\u91cd\u5851\u6570\u636e\u5904\u7406\u548c\u89e3\u8bfb\u65b9\u5f0f\u3002\u5176\u4e2d\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5c55\u73b0\u51fa\u5de8\u5927\u7684\u6f5c\u529b\uff0c\u80fd\u591f\u81ea\u52a8\u5316\u590d\u6742\u4efb\u52a1\u3001\u63d0\u5347\u5ba2\u6237\u670d\u52a1\uff0c\u5e76\u63d0\u4f9b\u8be6\u5c3d\u7684\u8d22\u52a1\u5206\u6790\u3002\u9996\u5148\uff0c\u6211\u4eec\u4ecb\u7ecdIDEA-FinBench\uff0c\u8fd9\u662f\u4e00\u4e2a\u4e13\u4e3a\u8bc4\u4f30\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u91d1\u878d\u77e5\u8bc6\u65b9\u9762\u7684\u6027\u80fd\u800c\u8bbe\u8ba1\u7684\u8bc4\u4ef7\u57fa\u51c6\u3002\u5b83\u501f\u9274\u4e86\u4e24\u4e2a\u5168\u7403\u77e5\u540d\u4e14\u6743\u5a01\u7684\u91d1\u878d\u4e13\u4e1a\u8003\u8bd5\u4e2d\u7684\u95ee\u9898\uff0c\u65e8\u5728\u5168\u9762\u68c0\u9a8cLLMs\u89e3\u7b54\u4e0e\u91d1\u878d\u76f8\u5173\u8003\u9898\u7684\u80fd\u529b\u3002\u5176\u6b21\uff0c\u6211\u4eec\u63d0\u51faIDEA-FinKER\uff0c\u662f\u4e00\u4e2a\u91d1\u878d\u77e5\u8bc6\u589e\u5f3a\u6846\u67b6\uff0c\u65e8\u5728\u5feb\u901f\u8ba9\u901a\u7528LLMs\u9002\u5e94\u91d1\u878d\u9886\u57df\u3002\u5b83\u91c7\u7528\u57fa\u4e8e\u68c0\u7d22\u7684\u5c11\u91cf\u6837\u672c\u5b66\u4e60\u65b9\u6cd5\uff0c\u5b9e\u73b0\u5b9e\u65f6\u4e0a\u4e0b\u6587\u7ea7\u77e5\u8bc6\u6ce8\u5165\uff0c\u5e76\u63d0\u4f9b\u4e00\u5957\u9ad8\u8d28\u91cf\u7684\u91d1\u878d\u77e5\u8bc6\u6307\u4ee4\uff0c\u7528\u4e8e\u5fae\u8c03\u4efb\u4f55\u901a\u7528\u6a21\u578b\u3002\u6700\u540e\uff0c\u6211\u4eec\u5c55\u793a\u4e86IDEA-FinQA\uff0c\u4e00\u4e2a\u7531LLMs\u9a71\u52a8\u7684\u91d1\u878d\u95ee\u7b54\u7cfb\u7edf\u3002\u8be5\u7cfb\u7edf\u56f4\u7ed5\u5b9e\u65f6\u77e5\u8bc6\u6ce8\u5165\u548c\u4e8b\u5b9e\u5f3a\u5316\u7684\u67b6\u6784\u6784\u5efa\uff0c\u5229\u7528\u5916\u90e8\u77e5\u8bc6\u3002IDEA-FinQA\u4e3b\u8981\u7531\u6570\u636e\u6536\u96c6\u56
68\u3001\u6570\u636e\u67e5\u8be2\u6a21\u5757\u548c\u6267\u884c\u7279\u5b9a\u529f\u80fd\u7684LLM\u4ee3\u7406\u7ec4\u6210\u3002|\n", "2407.04573": "|**2024-07-05**|**VRSD: Rethinking Similarity and Diversity for Retrieval in Large Language Models**|Hang Gao et.al.|[2407.04573](http://arxiv.org/abs/2407.04573)|null|\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5feb\u901f\u53d1\u5c55\u7684\u80cc\u666f\u4e0b\uff0c\u5411\u91cf\u68c0\u7d22\u7b97\u6cd5\u5bf9\u4e8e\u6ee1\u8db3\u76f8\u4f3c\u5ea6\u548c\u591a\u6837\u6027\u8981\u6c42\u7684\u8bed\u4e49\u67e5\u8be2\u81f3\u5173\u91cd\u8981\u3002\u5c3d\u7ba1Maximal Marginal Relevance\uff08MMR\uff09\u5728\u6d89\u53ca\u8fd9\u4e24\u4e2a\u9700\u6c42\u7684\u68c0\u7d22\u573a\u666f\u4e2d\u88ab\u5e7f\u6cdb\u5e94\u7528\uff0c\u4f46\u5176\u53c2\u6570\u03bb\u7684\u53d8\u5316\u4f1a\u5bfc\u81f4\u7ed3\u679c\u6ce2\u52a8\uff0c\u4f7f\u5f97\u5411\u91cf\u7a7a\u95f4\u4e2d\u7684\u4f18\u5316\u8def\u5f84\u53d8\u5f97\u6a21\u7cca\u3002\u6b64\u5916\uff0c\u5f53\u524d\u7f3a\u4e4f\u5bf9\u76f8\u4f3c\u6027\u548c\u591a\u6837\u6027\u5728\u68c0\u7d22\u8fc7\u7a0b\u4e2d\u7ea6\u675f\u7684\u575a\u5b9e\u7406\u8bba\u5206\u6790\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u65b9\u6cd5\uff0c\u901a\u8fc7\u67e5\u8be2\u5411\u91cf\u4e0e\u6c42\u548c\u5411\u91cf\u4e4b\u95f4\u7684\u5173\u7cfb\u6765\u523b\u753b\u8fd9\u4e24\u79cd\u7ea6\u675f\u3002\u8fd9\u79cd\u5173\u7cfb\u786e\u4fdd\u4e86\u76f8\u4f3c\u6027\uff0c\u540c\u65f6\u8981\u6c42\u6c42\u548c\u5411\u91cf\u4e2d\u7684\u5404\u4e2a\u5411\u91cf\u4ee5\u5206\u6563\u7684\u65b9\u5f0f\u4e0e\u67e5\u8be2\u5411\u91cf\u5bf9\u9f50\uff0c\u4ee5\u6ee1\u8db3\u591a\u6837\u6027\u9700\u6c42\u3002 
\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86\u4e00\u4e2a\u65b0\u7684\u7ec4\u5408\u4f18\u5316\u95ee\u9898\uff1a\u4ece\u4e00\u7ec4\u5019\u9009\u5411\u91cf\u4e2d\u9009\u62e9$k$\u4e2a\uff0c\u4f7f\u5f97\u5b83\u4eec\u7684\u6c42\u548c\u5411\u91cf\u6700\u5927\u7a0b\u5ea6\u5730\u4e0e\u67e5\u8be2\u5411\u91cf\u5339\u914d\u3002\u6211\u4eec\u8bc1\u660e\u4e86\u8fd9\u4e2a\u95ee\u9898\u662fNP\u5b8c\u5168\u7684\uff0c\u63ed\u793a\u4e86\u5728\u5411\u91cf\u68c0\u7d22\u4e2d\u540c\u65f6\u8ffd\u6c42\u76f8\u4f3c\u6027\u548c\u591a\u6837\u6027\u7684\u6df1\u523b\u56f0\u96be\uff0c\u5e76\u4e3a\u540e\u7eed\u7814\u7a76\u5960\u5b9a\u4e86\u7406\u8bba\u57fa\u7840\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u540d\u4e3aVectors Retrieval with Similarity and Diversity\uff08VRSD\uff09\u7684\u542f\u53d1\u5f0f\u7b97\u6cd5\uff0c\u5b83\u4e0d\u4ec5\u5177\u6709\u660e\u786e\u7684\u4f18\u5316\u76ee\u6807\uff0c\u65e0\u9700\u9884\u8bbe\u53c2\u6570\uff0c\u800c\u4e14\u5728\u65f6\u95f4\u590d\u6742\u5ea6\u4e0a\u76f8\u5bf9\u4e8eMMR\u6709\u6240\u964d\u4f4e\u3002\u5b9e\u8bc1\u9a8c\u8bc1\u8868\u660e\uff0cVRSD\u5728\u5404\u79cd\u6570\u636e\u96c6\u4e0a\u663e\u8457\u4f18\u4e8eMMR\u3002|\n", "2407.04503": "|**2024-07-05**|**When LLMs Play the Telephone Game: Cumulative Changes and Attractors in Iterated Cultural Transmissions**|J\u00e9r\u00e9my Perez 
et.al.|[2407.04503](http://arxiv.org/abs/2407.04503)|**[link](https://github.com/jeremyperez2/telephonegamellm)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e4b\u95f4\u7684\u4e92\u52a8\u589e\u52a0\uff0c\u5b83\u4eec\u5728\u7ebf\u4e0a\u751f\u6210\u7684\u6587\u672c\u91cf\u4e5f\u968f\u4e4b\u589e\u591a\uff0c\u7814\u7a76\u5982\u4f55\u4fe1\u606f\u5728\u4ece\u4e00\u4e2aLLM\u4f20\u9012\u5230\u53e6\u4e00\u4e2aLLM\u7684\u8fc7\u7a0b\u4e2d\u53d1\u751f\u53d8\u5316\u53d8\u5f97\u81f3\u5173\u91cd\u8981\u3002\u5c3d\u7ba1\u5bf9\u5355\u4e2aLLM\u7684\u884c\u4e3a\u5df2\u6709\u6df1\u5165\u7814\u7a76\uff0c\u4f46\u5bf9\u8fed\u4ee3\u4ea4\u4e92\u4e2d\u96c6\u4f53\u884c\u4e3a\u548c\u4fe1\u606f\u626d\u66f2\u7684\u63a2\u8ba8\u76f8\u5bf9\u4e0d\u8db3\u3002\u5fae\u5c0f\u7684\u504f\u5dee\uff0c\u5728\u5355\u6b21\u8f93\u51fa\u65f6\u53ef\u80fd\u663e\u5f97\u4e0d\u660e\u663e\uff0c\u4f46\u5728\u591a\u6b21\u4ea4\u4e92\u4e2d\u53ef\u80fd\u4f1a\u88ab\u653e\u5927\uff0c\u53ef\u80fd\u5bfc\u81f4\u5185\u5bb9\u671d\u7740\u5438\u5f15\u5b50\u72b6\u6001\u6f14\u53d8\u3002\u6211\u4eec\u901a\u8fc7\u501f\u9274\u4eba\u7c7b\u6587\u5316\u8fdb\u5316\u5b66\u7684\u7814\u7a76\u65b9\u6cd5\u2014\u2014\u7535\u8bdd\u6e38\u620f\u5b9e\u9a8c\uff0c\u8bbe\u8ba1\u4e86\u4e00\u79cd\u94fe\u5f0f\u4f20\u8f93\u6a21\u578b\u3002\u5728\u8fd9\u4e2a\u8fc7\u7a0b\u4e2d\uff0cLLM\u4ee3\u7406\u63a5\u6536\u3001\u751f\u6210\u5e76\u4f20\u9012\u6587\u672c\uff0c\u4ece\u4e00\u4e2a\u94fe\u4e2d\u7684\u524d\u4e00\u4e2a\u4ee3\u7406\u5230\u4e0b\u4e00\u4e2a\u3002\u6211\u4eec\u8ffd\u8e2a\u4e86\u6587\u672c\u7684\u6bd2\u6027\u3001\u79ef\u6781\u5ea6\u3001\u96be\u5ea6\u548c\u957f\u5ea6\u5728\u4f20\u8f93\u94fe\u4e2d\u7684\u6f14\u53d8\uff0c\u63ed\u793a\u4e86\u504f\u89c1\u548c\u5438\u5f15\u5b50\u7684\u5b58\u5728\uff0c\u5e76\u7814\u7a76\u4e86\u5b83\u4eec\u4e0e\u521d\u59cb\u6587\u672c\u3001\u6307\u4ee4\u3001\u8bed\u8a00\u6a21\u578b\u548c\u6a21\u578b\u89c4\u6a21\u7684\u5173\u7cfb\u3002\u4f8b\u5982\uff0c\u6211\u4eec\u53d1\u73b0\u5f00\u653e\u6027\u6307\u
4ee4\u6bd4\u7ea6\u675f\u6027\u4efb\u52a1\u66f4\u5bb9\u6613\u5f15\u53d1\u66f4\u5f3a\u7684\u5438\u5f15\u6548\u5e94\u3002\u6b64\u5916\uff0c\u4e0d\u540c\u7684\u6587\u672c\u7279\u6027\u5bf9\u5438\u5f15\u5b50\u6548\u5e94\u7684\u654f\u611f\u5ea6\u4e0d\u540c\uff0c\u6bd2\u6027\u7684\u5f71\u54cd\u901a\u5e38\u5927\u4e8e\u957f\u5ea6\u3002\u8fd9\u4e9b\u53d1\u73b0\u5f3a\u8c03\u4e86\u8003\u8651\u591a\u6b65\u9aa4\u4f20\u8f93\u52a8\u6001\u7684\u91cd\u8981\u6027\uff0c\u4e3a\u8fdb\u4e00\u6b65\u7406\u89e3LLM\u7684\u6587\u5316\u52a8\u6001\u5960\u5b9a\u4e86\u57fa\u7840\u3002**|\n", "2407.04363": "|**2024-07-05**|**AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents**|Petr Anokhin et.al.|[2407.04363](http://arxiv.org/abs/2407.04363)|**[link](https://github.com/airi-institute/arigraph)**|**\u968f\u7740\u751f\u6210\u5f0f\u4eba\u5de5\u667a\u80fd\u7684\u8fdb\u6b65\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u81ea\u4e3b\u4ee3\u7406\u7684\u53d1\u5c55\u4e2d\u5c55\u73b0\u51fa\u5e7f\u9614\u7684\u5e94\u7528\u524d\u666f\u3002\u5b9e\u73b0\u771f\u6b63\u7684\u81ea\u4e3b\u6027\u9700\u8981\u4ece\u4e0e\u73af\u5883\u7684\u4ea4\u4e92\u4e2d\u79ef\u7d2f\u548c\u66f4\u65b0\u77e5\u8bc6\uff0c\u5e76\u80fd\u6709\u6548\u5229\u7528\u8fd9\u4e9b\u4fe1\u606f\u3002\u5f53\u524d\u57fa\u4e8eLLMs\u7684\u65b9\u6cd5\u4f9d\u8d56\u4e8e\u5168\u5386\u53f2\u89c2\u5bdf\u3001\u603b\u7ed3\u6216\u68c0\u7d22\u589e\u5f3a\uff0c\u4f46\u8fd9\u4e9b\u975e\u7ed3\u6784\u5316\u7684\u8bb0\u5fc6\u8868\u793a\u4e0d\u5229\u4e8e\u590d\u6742\u51b3\u7b56\u4e2d\u7684\u63a8\u7406\u548c\u89c4\u5212\u3002\u6211\u4eec\u7684\u7814\u7a76\u63d0\u51faAriGraph\uff0c\u4e00\u79cd\u65b0\u578b\u65b9\u6cd5\uff0c\u8ba9\u4ee3\u7406\u5728\u63a2\u7d22\u73af\u5883\u4e2d\u6784\u5efa\u878d\u5408\u8bed\u4e49\u548c\u60c5\u8282\u8bb0\u5fc6\u7684\u8bb0\u5fc6\u56fe\u3002\u8fd9\u79cd\u56fe\u7ed3\u6784\u4fc3\u8fdb\u5173\u8054\u6982\u5ff5\u7684\u6709\u6548\u68c0\u7d22\uff0c\u8fd9\u4e9b\u6982\u5ff5\u4e0e\u4ee3\u7406\u5f53\u524d\u72b
6\u6001\u548c\u76ee\u6807\u76f8\u5173\uff0c\u4ece\u800c\u6210\u4e3a\u4e00\u79cd\u6709\u6548\u7684\u73af\u5883\u6a21\u578b\uff0c\u63d0\u5347\u63a2\u7d22\u548c\u89c4\u5212\u80fd\u529b\u3002 \u6211\u4eec\u8bbe\u8ba1\u7684Ariadne LLM\u4ee3\u7406\uff0c\u914d\u5907\u6709\u6211\u4eec\u63d0\u51fa\u7684\u8bb0\u5fc6\u67b6\u6784\u4ee5\u53ca\u89c4\u5212\u548c\u51b3\u7b56\u529f\u80fd\uff0c\u80fd\u5728\u96f6\u6837\u672c\u57fa\u7840\u4e0a\u5904\u7406TextWorld\u73af\u5883\u4e2d\u7684\u590d\u6742\u4efb\u52a1\uff0c\u5982First TextWorld Problems\u7ade\u8d5b\u4e2d\u7684\u70f9\u996a\u6311\u6218\uff0c\u4ee5\u53ca\u65b0\u4efb\u52a1\u5982\u623f\u5c4b\u6e05\u6d01\u548c\u5bfb\u5b9d\u8c1c\u9898\u3002\u4e0e\u5168\u5386\u53f2\u3001\u603b\u7ed3\u548c\u68c0\u7d22\u589e\u5f3a\u751f\u6210\u7b49\u4f20\u7edf\u65b9\u6cd5\u76f8\u6bd4\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u5404\u79cd\u4efb\u52a1\u4e2d\u8868\u73b0\u51fa\u663e\u8457\u4f18\u52bf\u3002**|\n", "2407.06112": "|**2024-07-08**|**Enhancing Language Model Rationality with Bi-Directional Deliberation Reasoning**|Yadong Zhang 
et.al.|[2407.06112](http://arxiv.org/abs/2407.06112)|null|\u8be5\u8bba\u6587\u63d0\u51fa\u4e86\u4e00\u4e2a\u65b0\u9896\u7684\u63a8\u7406\u65b9\u6cd5\u2014\u2014\u53cc\u5411\u51b3\u7b56\u89e3\u653e\u63a8\u7406\uff08BIDDER\uff09\uff0c\u65e8\u5728\u63d0\u5347\u8bed\u8a00\u6a21\u578b\u7684\u51b3\u7b56\u5408\u7406\u6027\u3002\u4f20\u7edf\u63a8\u7406\u65b9\u6cd5\u901a\u5e38\u4f9d\u8d56\u5386\u53f2\u4fe1\u606f\uff0c\u91c7\u7528\u5355\u5411\uff08\u4ece\u5de6\u5230\u53f3\uff09\u7684\u63a8\u7406\u7b56\u7565\uff0c\u8fd9\u5bfc\u81f4\u5bf9\u6f5c\u5728\u672a\u6765\u7ed3\u679c\u7684\u8ba4\u8bc6\u4e0d\u8db3\uff0c\u4ee5\u53ca\u5386\u53f2\u80cc\u666f\u7684\u6574\u5408\u4e0d\u591f\u5145\u5206\uff0c\u4ece\u800c\u4ea7\u751f\u6b21\u4f18\u51b3\u7b56\u3002BIDDER\u901a\u8fc7\u878d\u5408\u7406\u6027\u51b3\u7b56\u7684\u539f\u5219\uff0c\u7279\u522b\u662f\u5904\u7406\u4e0d\u786e\u5b9a\u6027\u5e76\u9884\u6d4b\u671f\u671b\u6548\u7528\uff0c\u5f25\u8865\u4e86\u8fd9\u4e00\u77ed\u677f\u3002\u5176\u65b9\u6cd5\u5305\u62ec\u4e09\u4e2a\u5173\u952e\u6b65\u9aa4\uff1a\u4ece\u5386\u53f2\u6570\u636e\u4e2d\u63a8\u65ad\u9690\u85cf\u72b6\u6001\uff0c\u4ee5\u8868\u793a\u51b3\u7b56\u8fc7\u7a0b\u4e2d\u7684\u4e0d\u786e\u5b9a\u4fe1\u606f\uff1b\u5229\u7528\u8fd9\u4e9b\u9690\u85cf\u72b6\u6001\u9884\u6d4b\u672a\u6765\u7684\u6f5c\u5728\u72b6\u6001\u548c\u53ef\u80fd\u7ed3\u679c\uff1b\u7ed3\u5408\u5386\u53f2\u4fe1\u606f\uff08\u8fc7\u53bb\u60c5\u5883\uff09\u548c\u957f\u671f\u7ed3\u679c\uff08\u672a\u6765\u60c5\u5883\uff09\uff0c\u4ee5\u6307\u5bfc\u63a8\u7406\u3002\u901a\u8fc7\u53cc\u5411\u63a8\u7406\uff0cBIDDER\u80fd\u591f\u5168\u9762\u8003\u8651\u8fc7\u53bb\u548c\u672a\u6765\u7684\u60c5\u5883\uff0c\u4ece\u800c\u505a\u51fa\u66f4\u660e\u667a\u3001\u66f4\u7406\u6027\u7684\u51b3\u7b56\u3002\u6211\u4eec\u5728\u6251\u514b\uff08\u9650\u6ce8\u5fb7\u5dde\u6251\u514b\uff09\u548c\u8c08\u5224\u4e24\u4e2a\u660e\u786e\u573a\u666f\u4e2d\u6d4b\u8bd5\u4e86BIDDER\u7684\u6548\u679c\uff0c\u5b9e\u9a8c\u663e\u793a\u5b83\u663e\u8457\u63d0\u9ad8\u4e
86\u8bed\u8a00\u6a21\u578b\u548c\u57fa\u4e8e\u8bed\u8a00\u6a21\u578b\u7684\u4ee3\u7406\u7684\u51b3\u7b56\u80fd\u529b\u3002|\n", "2407.05890": "|**2024-07-08**|**Affordances-Oriented Planning using Foundation Models for Continuous Vision-Language Navigation**|Jiaqi Chen et.al.|[2407.05890](http://arxiv.org/abs/2407.05890)|null|\u57fa\u4e8e\u8bed\u8a00\u6a21\u578b\u7684\u4ee3\u7406\u5728\u89c6\u89c9\u5bfc\u822a\uff08VLN\uff09\u4efb\u52a1\u4e2d\u5c55\u73b0\u51fa\u96f6\u6837\u672c\u7684\u5f3a\u5927\u6027\u80fd\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u4ec5\u5173\u6ce8\u89e3\u51b3\u9ad8\u5c42\u4efb\u52a1\u89c4\u5212\uff0c\u901a\u8fc7\u9009\u62e9\u9884\u5b9a\u4e49\u5bfc\u822a\u56fe\u4e2d\u7684\u8282\u70b9\u8fdb\u884c\u79fb\u52a8\uff0c\u5ffd\u89c6\u4e86\u73b0\u5b9e\u573a\u666f\u4e2d\u4f4e\u5c42\u6b21\u7684\u63a7\u5236\u3002\u4e3a\u4e86\u5f25\u8865\u8fd9\u4e00\u4e0d\u8db3\uff0c\u6211\u4eec\u63d0\u51fa\u4e86AO-Planner\uff0c\u4e00\u4e2a\u65b0\u9896\u7684\u9762\u5411\u53ef\u53ca\u6027\u89c4\u5212\u7684\u8fde\u7eed\u89c6\u89c9\u5bfc\u822a\u6846\u67b6\u3002AO-Planner\u6574\u5408\u591a\u79cd\u57fa\u7840\u6a21\u578b\uff0c\u5b9e\u73b0\u9762\u5411\u53ef\u53ca\u6027\u7684\u8fd0\u52a8\u89c4\u5212\u548c\u52a8\u4f5c\u51b3\u7b56\uff0c\u5747\u4ee5\u96f6\u6837\u672c\u7684\u65b9\u5f0f\u6267\u884c\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u91c7\u7528\u4e86\u89c6\u89c9\u53ef\u53ca\u6027\u63d0\u793a\uff08VAP\uff09\u65b9\u6cd5\uff0c\u5229\u7528SAM\u5206\u5272\u53ef\u89c1\u5730\u9762\uff0c\u63d0\u4f9b\u5bfc\u822a\u53ef\u53ca\u6027\u4fe1\u606f\uff0c\u4ece\u800c\u8ba9\u8bed\u8a00\u6a21\u578b\u9009\u62e9\u6f5c\u5728\u7684\u4e0b\u4e00\u4e2a\u8def\u6807\uff0c\u5e76\u751f\u6210\u5411\u9009\u5b9a\u8def\u6807\u7684\u4f4e\u5c42\u6b21\u8def\u5f84\u89c4\u5212\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u9ad8\u7ea7\u4ee3\u7406PathAgent\uff0c\u8bc6\u522b\u51fa\u6700\u53ef\u80fd\u7684\u50cf\u7d20\u7ea7\u8def\u5f84\uff0c\u5e76\u5c06\u5176\u8f6c\u6362\u4e3a\u4e09\u7ef4\u5750\u6807\uff0c\u
4ee5\u5b8c\u6210\u4f4e\u5c42\u6b21\u7684\u79fb\u52a8\u3002 \u5728\u5177\u6709\u6311\u6218\u6027\u7684R2R-CE\u57fa\u51c6\u6d4b\u8bd5\u4e0a\uff0cAO-Planner\u5b9e\u73b0\u4e86\u6700\u5148\u8fdb\u7684\u96f6\u6837\u672c\u6027\u80fd\u63d0\u5347\uff08SPL\u6307\u6807\u63d0\u9ad85.5%\uff09\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u6709\u6548\u8fde\u63a5\u4e86\u8bed\u8a00\u6a21\u578b\u4e0e\u4e09\u7ef4\u4e16\u754c\uff0c\u907f\u514d\u4e86\u76f4\u63a5\u9884\u6d4b\u4e16\u754c\u5750\u6807\u70b9\u7684\u56f0\u96be\uff0c\u4e3a\u5229\u7528\u57fa\u7840\u6a21\u578b\u8fdb\u884c\u4f4e\u5c42\u6b21\u8fd0\u52a8\u63a7\u5236\u63d0\u4f9b\u4e86\u65b0\u7684\u524d\u666f\u3002|\n", "2407.07086": "|**2024-07-09**|**Hypothetical Minds: Scaffolding Theory of Mind for Multi-Agent Tasks with Large Language Models**|Logan Cross et.al.|[2407.07086](http://arxiv.org/abs/2407.07086)|**[link](https://github.com/locross93/hypothetical-minds)**|**\u5728\u591a\u667a\u80fd\u4f53\u5f3a\u5316\u5b66\u4e60\uff08MARL\uff09\u65b9\u6cd5\u4e2d\uff0c\u5904\u7406\u591a\u667a\u80fd\u4f53\u7cfb\u7edf\u7684\u975estationarity\u5e76\u9002\u5e94\u5728\u7ebf\u5b66\u4e60\u7684\u80fd\u529b\u662f\u4e00\u4e2a\u6311\u6218\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u6784\u5efa\u4e86\u4e00\u4e2a\u81ea\u4e3b\u7684\u89e3\u51b3\u7b56\u7565\u3002\u6211\u4eec\u7684\u65b0\u578b\u667a\u80fd\u4f53\u201c\u5047\u8bbe\u5fc3\u667a\u201d\uff08Hypothetical 
Minds\uff09\u91c7\u7528\u8ba4\u77e5\u542f\u53d1\u5f0f\u67b6\u6784\uff0c\u5305\u62ec\u611f\u77e5\u3001\u8bb0\u5fc6\u548c\u4e24\u4e2a\u62bd\u8c61\u5c42\u6b21\u4e0a\u7684\u5206\u5c42\u89c4\u5212\u6a21\u5757\u3002\u5176\u4e2d\u7684\u5173\u952e\u90e8\u5206\u662f\u201c\u5fc3\u7406\u7406\u8bba\u201d\u6a21\u5757\uff0c\u5b83\u901a\u8fc7\u81ea\u7136\u8bed\u8a00\u751f\u6210\u5bf9\u5176\u4ed6\u667a\u80fd\u4f53\u7b56\u7565\u7684\u5047\u8bbe\uff0c\u5e76\u6839\u636e\u8fd9\u4e9b\u5047\u8bbe\u5bf9\u5176\u4ed6\u667a\u80fd\u4f53\u884c\u4e3a\u7684\u9884\u6d4b\u8fdb\u884c\u8bc4\u4f30\u548c\u8fed\u4ee3\u4f18\u5316\u3002\u901a\u8fc7\u8fd9\u79cd\u65b9\u5f0f\uff0c\u5047\u8bbe\u5fc3\u667a\u5728Melting Pot\u57fa\u51c6\u4e2d\u7684\u591a\u79cd\u7ade\u4e89\u3001\u6df7\u5408\u52a8\u673a\u548c\u534f\u4f5c\u73af\u5883\u4e2d\uff0c\u65e0\u8bba\u662f\u4e8c\u5143\u8fd8\u662f\u7fa4\u4f53\u73af\u5883\uff0c\u90fd\u663e\u8457\u4f18\u4e8e\u5148\u524d\u7684\u8bed\u8a00\u6a21\u578b\u667a\u80fd\u4f53\uff08LLM-agent\uff09\u548c\u5f3a\u5316\u5b66\u4e60\u57fa\u7840\u7ebf\u3002\u5bf9\u6bd4\u5b9e\u9a8c\u8fd8\u663e\u793a\uff0c\u5047\u8bbe\u7684\u8bc4\u4f30\u548c\u7cbe\u70bc\u5bf9\u4e8e\u5728\u590d\u6742\u573a\u666f\u4e2d\u53d6\u5f97\u6210\u529f\u81f3\u5173\u91cd\u8981\u3002**|\n", "2407.06813": "|**2024-07-09**|**Richelieu: Self-Evolving LLM-Based Agents for AI Diplomacy**|Zhenyu Guan et.al.|[2407.06813](http://arxiv.org/abs/2407.06813)|null|## \u80cc\u666f 
\u5728\u4eba\u7c7b\u793e\u4f1a\u4e2d\uff0c\u5916\u4ea4\u662f\u4e00\u79cd\u6781\u5176\u590d\u6742\u7684\u6d3b\u52a8\uff0c\u6d89\u53ca\u4f17\u591a\u5404\u65b9/\u884c\u52a8\u8005\u7684\u4e92\u52a8\uff0c\u9700\u8981\u5177\u5907\u793e\u4f1a\u63a8\u7406\u3001\u8c08\u5224\u6280\u5de7\u548c\u957f\u671f\u7b56\u7565\u89c4\u5212\u7b49\u591a\u65b9\u9762\u80fd\u529b\u3002\u4ee5\u5f80\u7684AI\u4ee3\u7406\u5df2\u7ecf\u5728\u5904\u7406\u591a\u6b65\u9aa4\u6e38\u620f\u548c\u5927\u52a8\u4f5c\u7a7a\u95f4\u7684\u591a\u4ee3\u7406\u4efb\u52a1\u4e0a\u5c55\u793a\u4e86\u5b9e\u529b\u3002\u7136\u800c\uff0c\u5916\u4ea4\u6240\u6d89\u53ca\u7684\u51b3\u7b56\u7a7a\u95f4\u8303\u56f4\u60ca\u4eba\uff0c\u7279\u522b\u662f\u5728\u9700\u8981\u8c08\u5224\u7684\u9636\u6bb5\u3002\u8fd1\u671f\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u4e00\u4e9b\u5e94\u7528\u4e2d\u5c55\u73b0\u51fa\u4e86\u8d85\u8d8a\u524d\u4ee3\u7684\u80fd\u529b\uff0c\u4f46\u4ecd\u4e0d\u8db3\u4ee5\u5e94\u5bf9\u590d\u6742\u591a\u4ee3\u7406\u73af\u5883\u4e2d\u957f\u65f6\u95f4\u7684\u89c4\u5212\u3002\u501f\u52a9\u5c16\u7aef\u7684LLM\u6280\u672f\uff0c\u6211\u4eec\u9996\u6b21\u5c1d\u8bd5\u63a2\u7d22AI\u5728\u5982\u6b64\u5168\u9762\u7684\u591a\u4ee3\u7406\u4f7f\u547d\u4e2d\u7684\u4e0a\u9650\uff0c\u901a\u8fc7\u6574\u5408\u4e09\u4e2a\u6838\u5fc3\u4e14\u5173\u952e\u7684\u529f\u80fd\uff0c\u4ee5\u6784\u5efa\u66f4\u5f3a\u7684\u57fa\u4e8eLLM\u7684\u793e\u4f1a\u6027\u4ee3\u7406\uff1a1\uff09\u5177\u6709\u8bb0\u5fc6\u548c\u53cd\u601d\u7684\u7b56\u7565\u89c4\u5212\u8005\uff1b2\uff09\u76ee\u6807\u5bfc\u5411\u7684\u3001\u5177\u5907\u793e\u4f1a\u63a8\u7406\u7684\u8c08\u5224\u8005\uff1b3\uff09\u901a\u8fc7\u81ea\u6211\u5bf9\u5f08\u6e38\u620f\u589e\u5f3a\u8bb0\u5fc6\uff0c\u5b9e\u73b0\u65e0\u4eba\u5de5\u5e72\u9884\u7684\u81ea\u6211\u8fdb\u5316\u3002|\n", "2407.06567": "|**2024-07-10**|**FinCon: A Synthesized LLM Multi-Agent System with Conceptual Verbal Reinforcement for Enhanced Financial Decision Making**|Yangyang Yu 
et.al.|[2407.06567](http://arxiv.org/abs/2407.06567)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u6267\u884c\u590d\u6742\u4efb\u52a1\u65b9\u9762\u5c55\u73b0\u51fa\u663e\u8457\u6f5c\u529b\uff0c\u5e76\u8d8a\u6765\u8d8a\u591a\u5730\u5e94\u7528\u4e8e\u91d1\u878d\u9886\u57df\u3002\u7136\u800c\uff0c\u9ad8\u8d28\u91cf\u7684\u8fde\u7eed\u6295\u8d44\u51b3\u7b56\u8fc7\u7a0b\u4ecd\u9762\u4e34\u6311\u6218\uff0c\u5b83\u9700\u8981\u4e0e\u4e0d\u65ad\u53d8\u5316\u7684\u73af\u5883\u8fdb\u884c\u591a\u6b21\u4ea4\u4e92\uff0c\u4ee5\u6700\u5927\u5316\u56de\u62a5\u5e76\u7ba1\u7406\u98ce\u9669\u3002\u5c3d\u7ba1\u5df2\u7ecf\u5f00\u53d1\u51fa\u57fa\u4e8eLLMs\u7684\u4ee3\u7406\u7cfb\u7edf\uff0c\u5b83\u4eec\u80fd\u591f\u8d85\u8d8a\u4eba\u7c7b\u56e2\u961f\uff0c\u5b9e\u73b0\u6295\u8d44\u6536\u76ca\uff0c\u4f46\u5982\u4f55\u4f18\u5316\u591a\u6e90\u4fe1\u606f\u6574\u5408\u548c\u51b3\u7b56\u7ed3\u679c\uff0c\u901a\u8fc7\u5b9e\u65f6\u7ecf\u9a8c\u6539\u8fdb\uff0c\u4ecd\u6709\u5f85\u63a2\u7d22\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51faFinCon\uff0c\u4e00\u4e2a\u4e13\u4e3a\u591a\u6837\u5316\u7684\u91d1\u878d\u4efb\u52a1\u8bbe\u8ba1\u7684\u57fa\u4e8eLLM\u7684\u591a\u4ee3\u7406\u6846\u67b6\uff0c\u5176\u7279\u70b9\u5728\u4e8e\u6982\u5ff5\u5316\u53e3\u5934\u5f3a\u5316\u548c\u8d22\u52a1\u7ec4\u7ec7\u7ed3\u6784\u7684\u8fd0\u7528\u3002 
FinCon\u501f\u9274\u73b0\u5b9e\u4e16\u754c\u6295\u8d44\u516c\u53f8\u7684\u7ec4\u7ec7\u67b6\u6784\uff0c\u91c7\u7528\u7ecf\u7406-\u5206\u6790\u5e08\u7684\u6c9f\u901a\u5c42\u6b21\uff0c\u4fc3\u8fdb\u8de8\u804c\u80fd\u4ee3\u7406\u95f4\u7684\u534f\u540c\u5408\u4f5c\uff0c\u901a\u8fc7\u81ea\u7136\u8bed\u8a00\u4ea4\u6d41\u5b9e\u73b0\u76ee\u6807\u7edf\u4e00\u3002\u6bcf\u4e2a\u4ee3\u7406\u90fd\u5177\u5907\u6bd4\u4eba\u7c7b\u66f4\u5927\u7684\u8bb0\u5fc6\u5bb9\u91cf\uff0c\u8fd9\u6709\u52a9\u4e8e\u66f4\u9ad8\u6548\u7684\u4fe1\u606f\u5904\u7406\u3002\u6b64\u5916\uff0cFinCon\u8fd8\u5f15\u5165\u4e86\u4e00\u4e2a\u98ce\u9669\u63a7\u5236\u7ec4\u4ef6\uff0c\u5b9a\u671f\u542f\u52a8\u81ea\u6211\u6279\u5224\u673a\u5236\uff0c\u4ee5\u66f4\u65b0\u7cfb\u7edf\u7684\u6295\u8d44\u7406\u5ff5\u3002\u8fd9\u4e9b\u6982\u5ff5\u5316\u7684\u4fe1\u5ff5\u4f5c\u4e3a\u53e3\u5934\u5f3a\u5316\uff0c\u6307\u5bfc\u672a\u6765\u884c\u4e3a\uff0c\u5e76\u53ef\u6839\u636e\u9700\u8981\u9009\u62e9\u6027\u5730\u4f20\u9012\u7ed9\u9700\u8981\u66f4\u65b0\u77e5\u8bc6\u7684\u8282\u70b9\uff0c\u4ece\u800c\u51cf\u5c11\u4e0d\u5fc5\u8981\u7684\u4fe1\u606f\u4ea4\u6d41\u6210\u672c\uff0c\u63d0\u9ad8\u6027\u80fd\u3002 FinCon\u5728\u5355\u4e00\u80a1\u7968\u4ea4\u6613\u548c\u8d44\u4ea7\u7ba1\u7406\u7b49\u4e0d\u540c\u91d1\u878d\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u5f3a\u5927\u7684\u6cdb\u5316\u80fd\u529b\uff0c\u8bc1\u660e\u4e86\u5176\u5728\u5b9e\u9645\u91d1\u878d\u573a\u666f\u4e2d\u7684\u5e94\u7528\u6f5c\u529b\u3002|\n", "2407.07791": "|**2024-07-10**|**Flooding Spread of Manipulated Knowledge in LLM-Based Multi-Agent Communities**|Tianjie Ju 
et.al.|[2407.07791](http://arxiv.org/abs/2407.07791)|**[link](https://github.com/Jometeorie/KnowledgeSpread)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u591a\u4ee3\u7406\u7cfb\u7edf\u4e2d\u7684\u8fc5\u901f\u5e94\u7528\uff0c\u5b83\u4eec\u5728\u534f\u4f5c\u95ee\u9898\u89e3\u51b3\u548c\u81ea\u4e3b\u8c08\u5224\u7b49\u9886\u57df\u7684\u51fa\u8272\u6027\u80fd\u5f15\u8d77\u4e86\u5173\u6ce8\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u57fa\u4e8eLLM\u7684\u591a\u4ee3\u7406\u7cfb\u7edf\u7684\u5b89\u5168\u95ee\u9898\u5c1a\u672a\u5f97\u5230\u5145\u5206\u7814\u7a76\uff0c\u5c24\u5176\u662f\u5728\u77e5\u8bc6\u64cd\u7eb5\u4f20\u64ad\u65b9\u9762\u3002\u672c\u6587\u901a\u8fc7\u6784\u5efa\u8be6\u7ec6\u7684\u5a01\u80c1\u6a21\u578b\u548c\u6a21\u62df\u73af\u5883\uff0c\u6a21\u62df\u73b0\u5b9e\u4e16\u754c\u4e2d\u7684\u591a\u4ee3\u7406\u90e8\u7f72\u5728\u53ef\u4fe1\u5e73\u53f0\u4e0a\uff0c\u63a2\u8ba8\u8fd9\u4e00\u5173\u952e\u95ee\u9898\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u4e24\u9636\u6bb5\u653b\u51fb\u65b9\u6cd5\uff0c\u5305\u62ec\u8bf4\u670d\u6027\u6ce8\u5165\u548c\u64cd\u7eb5\u77e5\u8bc6\u6ce8\u5165\uff0c\u6765\u7cfb\u7edf\u5730\u63a2\u7a76\u5728\u65e0\u660e\u786e\u63d0\u793a\u64cd\u7eb5\u7684\u60c5\u51b5\u4e0b\uff0c\u5982\u4f55\u6f5c\u5728\u5730\u4f20\u64ad\u64cd\u7eb5\u77e5\u8bc6\uff08\u5982\u865a\u6784\u548c\u6709\u5bb3\u77e5\u8bc6\uff09\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5229\u7528\u4e86LLMs\u5904\u7406\u4e16\u754c\u77e5\u8bc6\u56fa\u6709\u7684\u6f0f\u6d1e\uff0c\u653b\u51fb\u8005\u53ef\u4ee5\u501f\u6b64\u65e0\u610f\u8bc6\u5730\u4f20\u64ad\u7f16\u9020\u7684\u4fe1\u606f\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u653b\u51fb\u65b9\u6cd5\u80fd\u591f\u6210\u529f\u8bf1\u5bfc\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u5728\u4ea4\u6d41\u4e2d\u4f20\u64ad\u8fd9\u4e24\u79cd\u64cd\u7eb5\u7684\u77e5\u8bc6\uff0c\u540c\u65f6\u4e0d\u4f1a\u663e\u8457\u964d\u4f4e\u5b83\u4eec\u7684\u57fa\u7840\u529f\u80fd\u3002\u6b64\u5916\uff0c\u6211\
u4eec\u53d1\u73b0\u8fd9\u4e9b\u64cd\u7eb5\u4f1a\u6301\u7eed\u5b58\u5728\u4e8e\u6d41\u884c\u7684\u68c0\u7d22\u589e\u5f3a\u751f\u6210\u6846\u67b6\u4e2d\uff0c\u5373\u4f7f\u4ea4\u4e92\u7ed3\u675f\uff0c\u82e5\u5e72\u826f\u6027\u4ee3\u7406\u4e5f\u53ef\u80fd\u7ee7\u7eed\u53d7\u5230\u64cd\u7eb5\u804a\u5929\u8bb0\u5f55\u7684\u5f71\u54cd\u3002\u6211\u4eec\u7684\u53d1\u73b0\u63ed\u793a\u4e86LLM\u57fa\u591a\u4ee3\u7406\u7cfb\u7edf\u4e2d\u7684\u91cd\u5927\u5b89\u5168\u98ce\u9669\uff0c\u5f3a\u8c03\u4e86\u5bf9\u64cd\u7eb5\u77e5\u8bc6\u4f20\u64ad\u8fdb\u884c\u5f3a\u5927\u9632\u5fa1\u7684\u8feb\u5207\u9700\u6c42\uff0c\u4f8b\u5982\u5f15\u5165\u201c\u5b88\u62a4\u201d\u4ee3\u7406\u548c\u5148\u8fdb\u7684\u4e8b\u5b9e\u6838\u67e5\u5de5\u5177\u3002**|\n", "2407.08550": "|**2024-07-11**|**Incorporating Large Language Models into Production Systems for Enhanced Task Automation and Flexibility**|Yuchen Xia et.al.|[2407.08550](http://arxiv.org/abs/2407.08550)|**[link](https://github.com/yuchenxia/gpt4industrialautomation)**|\u8fd9\u7bc7\u8bba\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u65e8\u5728\u5c06\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u6574\u5408\u5230\u81ea\u52a8\u5316\u751f\u4ea7\u7cfb\u7edf\u4e2d\uff0c\u4ee5\u63d0\u5347\u4efb\u52a1\u81ea\u52a8\u5316\u548c\u7075\u6d3b\u6027\u3002\u6211\u4eec\u6839\u636e\u81ea\u52a8\u5316\u91d1\u5b57\u5854\u6784\u5efa\u751f\u4ea7\u64cd\u4f5c\u7684\u5c42\u7ea7\u7ed3\u6784\uff0c\u5c06\u539f\u5b50\u64cd\u4f5c\u529f\u80fd\u62bd\u8c61\u4e3a\u5fae\u670d\u52a1\uff0c\u5e76\u901a\u8fc7\u4e13\u7528\u7684\u6570\u5b57\u5b6a\u751f\u7cfb\u7edf\u8fdb\u884c\u8c03\u7528\u6267\u884c\u3002\u8fd9\u4e3a\u534f\u8c03\u751f\u4ea7\u6d41\u7a0b\u63d0\u4f9b\u4e86\u53ef\u6269\u5c55\u4e14\u7075\u6d3b\u7684\u57fa\u7840\u3002\u5728\u6570\u5b57\u5b6a\u751f\u7cfb\u7edf\u4e2d\uff0c\u4f4e\u5c42\u6b21\u7684\u3001\u786c\u4ef6\u7279\u5b9a\u7684\u6570\u636e\u88ab\u8d4b\u4e88\u8bed\u4e49\uff0c\u4f7f\u5f97LLMs\u80fd\u591f\u7406\u89e3\u548c\u5904\u7406\u
751f\u4ea7\u8ba1\u5212\u4e0e\u63a7\u5236\u4efb\u52a1\u3002\u5f53\u63a5\u6536\u5230\u7528\u6237\u8bf7\u6c42\u6216\u8bc6\u522b\u5230\u89e6\u53d1\u4e8b\u4ef6\u65f6\uff0cLLMs\u4f1a\u751f\u6210\u751f\u4ea7\u6d41\u7a0b\u8ba1\u5212\uff0c\u7136\u540e\u5c06\u5176\u5206\u89e3\u4e3a\u4e00\u7cfb\u5217\u5fae\u670d\u52a1\uff0c\u5728\u73b0\u5b9e\u4e16\u754c\u7684\u81ea\u52a8\u5316\u7cfb\u7edf\u4e2d\u6267\u884c\u3002\u6211\u4eec\u5728\u5b9e\u9a8c\u5ba4\u7684\u6a21\u5757\u5316\u81ea\u52a8\u5316\u8bbe\u65bd\u4e0a\u5b9e\u73b0\u4e86\u8fd9\u4e00\u6574\u4f53\u65b9\u6cd5\uff0c\u901a\u8fc7\u4e00\u4e2a\u5b9e\u9645\u6848\u4f8b\u5c55\u793a\u4e86LLMs\u5982\u4f55\u5904\u7406\u751f\u4ea7\u89c4\u5212\u548c\u63a7\u5236\u4efb\u52a1\uff0c\u4ece\u800c\u5b9e\u73b0\u4e86\u4e00\u4e2a\u76f4\u89c2\u3001\u81ea\u52a8\u5316\u7a0b\u5ea6\u9ad8\u4e14\u66f4\u5177\u7075\u6d3b\u6027\u7684\u751f\u4ea7\u73af\u5883\u3002\u6700\u540e\uff0c\u6211\u4eec\u6307\u51fa\u4e86\u5b9e\u73b0LLMs\u5728\u81ea\u4e3b\u7cfb\u7edf\u4e2d\u7684\u5168\u90e8\u6f5c\u529b\u6240\u9762\u4e34\u7684\u5c40\u9650\u6027\uff0c\u5e76\u5f3a\u8c03\u4e86\u5176\u6f5c\u5728\u7684\u6709\u76ca\u4e4b\u5904\u3002\u6709\u5173\u6b64\u7cfb\u5217\u7814\u7a76\u7684\u6f14\u793a\u53ef\u5728\u4ee5\u4e0b\u94fe\u63a5\u8bbf\u95ee\uff1ahttps://github.com/YuchenXia/GPT4IndustrialAutomation\u3002|\n", "2407.08213": "|**2024-07-11**|**PrefCLM: Enhancing Preference-based Reinforcement Learning with Crowdsourced Large Language Models**|Ruiqi Wang et.al.|[2407.08213](http://arxiv.org/abs/2407.08213)|null|
\u504f\u597d\u9a71\u52a8\u7684\u5f3a\u5316\u5b66\u4e60\uff08PbRL\uff09\u4f5c\u4e3a\u4e00\u79cd\u65b0\u5174\u7684\u65b9\u6cd5\uff0c\u901a\u8fc7\u4eba\u7c7b\u6bd4\u8f83\u53cd\u9988\u6559\u5bfc\u673a\u5668\u4eba\uff0c\u907f\u514d\u4e86\u590d\u6742\u7684\u5956\u52b1\u5de5\u7a0b\u7684\u9700\u6c42\u3002\u7136\u800c\uff0c\u73b0\u6709PbRL\u65b9\u6cd5\u9700\u8981\u5927\u91cf\u53cd\u9988\uff0c\u5f80\u5f80\u5bfc\u81f4\u5bf9\u7531\u811a\u672c\u6559\u5e08\u751f\u6210\u7684\u5408\u6210\u53cd\u9988\u7684\u4f9d\u8d56\uff0c\u8fd9\u53c8\u56de\u5230\u4e86\u590d\u6742\u7684\u5956\u52b1\u8bbe\u8ba1\uff0c\u5e76\u96be\u4ee5\u9002\u5e94\u4eba\u7c7b-\u673a\u5668\u4eba\u4ea4\u4e92\uff08HRI\uff09\u573a\u666f\u4e2d\u7528\u6237\u5bf9\u540c\u4e00\u4efb\u52a1\u7684\u72ec\u7279\u671f\u671b\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6846\u67b6\u2014\u2014PrefCLM\uff0c\u5b83\u5229\u7528\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4f5c\u4e3a\u6a21\u62df\u6559\u5e08\u53c2\u4e0ePbRL\u3002\u6211\u4eec\u8fd0\u7528Dempster-Shafer\u7406\u8bba\u5728\u5206\u6570\u7ea7\u522b\u878d\u5408\u6765\u81ea\u591a\u4e2aLLM\u4ee3\u7406\u7684\u4e2a\u4eba\u504f\u597d\uff0c\u6709\u6548\u5229\u7528\u5b83\u4eec\u7684\u591a\u6837\u6027\u548c\u96c6\u4f53\u667a\u6167\u3002\u540c\u65f6\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u4e2a\u7528\u6237\u53c2\u4e0e\u7684\u6d41\u7a0b\uff0c\u4ee5\u4fc3\u8fdb\u57fa\u4e8e\u7528\u6237\u4ea4\u4e92\u7684\u96c6\u4f53\u7cbe\u8fdb\u3002\u5728\u5404\u79cd\u901a\u7528\u5f3a\u5316\u5b66\u4e60\u4efb\u52a1\u4e2d\u7684\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cPrefCLM\u5728\u6027\u80fd\u4e0a\u4e0e\u4f20\u7edf\u811a\u672c\u6559\u5e08\u76f8\u5f53\uff0c\u5e76\u4e14\u5728\u4fc3\u8fdb\u66f4\u81ea\u7136\u3001\u9ad8\u6548\u7684\u673a\u5668\u4eba\u884c\u4e3a\u65b9\u9762\u8868\u73b0\u51fa\u8272\u3002\u4e00\u4e2a\u73b0\u5b9e\u4e16\u754c\u7684\u7528\u6237\u7814\u7a76\uff08N=10\uff09\u8fdb\u4e00\u6b65\u8bc1\u660e\u4e86\u5b83\u5728
\u4e2a\u6027\u5316\u7528\u6237\u504f\u597d\u7684\u80fd\u529b\uff0c\u663e\u8457\u63d0\u9ad8\u4e86HRI\u573a\u666f\u4e2d\u7684\u7528\u6237\u6ee1\u610f\u5ea6\u3002|\n", "2407.10718": "|**2024-07-16**|**Sibyl: Simple yet Effective Agent Framework for Complex Real-world Reasoning**|Yulong Wang et.al.|[2407.10718](http://arxiv.org/abs/2407.10718)|**[link](https://github.com/ag2s1/sibyl-system)**|**\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u73b0\u6709\u4ee3\u7406\u5c55\u793a\u4e86\u5f3a\u5927\u7684\u95ee\u9898\u89e3\u51b3\u80fd\u529b\uff0c\u901a\u8fc7\u6574\u5408LLM\u7684\u5185\u5728\u77e5\u8bc6\u3001\u5f3a\u5927\u7684\u4e0a\u4e0b\u6587\u5b66\u4e60\u548c\u96f6\u6837\u672c\u80fd\u529b\u4ee5\u53ca\u4eba\u7c7b\u8bbe\u8ba1\u7684\u590d\u6742LLM\u8c03\u7528\u5de5\u4f5c\u6d41\u7a0b\u4e0e\u5de5\u5177\u7684\u7ed3\u5408\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u4ee3\u7406\u5728\u957f\u671f\u63a8\u7406\u65b9\u9762\u4ecd\u5b58\u5728\u5c40\u9650\u6027\uff0c\u5e76\u4e14\u672a\u80fd\u5145\u5206\u5229\u7528\u73b0\u6709\u5de5\u5177\u7684\u6f5c\u529b\uff0c\u5bfc\u81f4\u5728\u590d\u6742\u7684\u73b0\u5b9e\u4e16\u754c\u63a8\u7406\u573a\u666f\u4e2d\u51fa\u73b0\u660e\u663e\u7684\u7f3a\u9677\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u9650\u5236\uff0c\u6211\u4eec\u5f15\u5165\u4e86Sibyl\uff0c\u4e00\u4e2a\u7b80\u5355\u800c\u5f3a\u5927\u7684\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u6846\u67b6\uff0c\u65e8\u5728\u901a\u8fc7\u9ad8\u6548\u5229\u7528\u6700\u5c11\u7684\u5de5\u5177\u96c6\u6765\u89e3\u51b3\u590d\u6742\u63a8\u7406\u4efb\u52a1\u3002\u53d7\u5230\u5168\u7403\u5de5\u4f5c\u7a7a\u95f4\u7406\u8bba\u7684\u542f\u53d1\uff0cSibyl\u6574\u5408\u4e86\u4e00\u4e2a\u5168\u5c40\u5de5\u4f5c\u7a7a\u95f4\uff0c\u4ee5\u589e\u5f3a\u7cfb\u7edf\u5185\u90e8\u7684\u77e5\u8bc6\u548c\u5bf9\u8bdd\u5386\u53f2\u7684\u7ba1\u7406\u548c\u5171\u4eab\u3002\u6b64\u5916\uff0c\u6839\u636e\u5fc3\u667a\u793e\u4f1a\u7406\u8bba\u7684\u6307\u5bfc\uff0cSibyl\u5b9e\u65bd\u4e86\u4e00\u4e2a\u591a\u4ee3\u7406\u8fa9\u8bba\u4e3a\
u57fa\u7840\u7684\u966a\u5ba1\u56e2\uff0c\u7528\u4e8e\u81ea\u6211\u7ec6\u5316\u6700\u7ec8\u7b54\u6848\uff0c\u786e\u4fdd\u5168\u9762\u5e73\u8861\u7684\u65b9\u6cd5\u3002\u8fd9\u4e00\u65b9\u6cd5\u65e8\u5728\u51cf\u5c11\u7cfb\u7edf\u590d\u6742\u6027\uff0c\u540c\u65f6\u6269\u5927\u53ef\u89e3\u51b3\u7684\u95ee\u9898\u8303\u56f4\u2014\u2014\u4ece\u4eba\u7c7b\u51e0\u5206\u949f\u5185\u5c31\u80fd\u89e3\u51b3\u7684\u95ee\u9898\u5230\u9700\u8981\u6570\u5c0f\u65f6\u751a\u81f3\u51e0\u5929\u624d\u80fd\u89e3\u51b3\u7684\u95ee\u9898\uff0c\u4ece\u800c\u5b9e\u73b0\u4ece\u7cfb\u7edf1\u5230\u7cfb\u7edf2\u601d\u8003\u65b9\u5f0f\u7684\u8f6c\u53d8\u3002Sibyl\u7684\u8bbe\u8ba1\u91cd\u70b9\u5728\u4e8e\u53ef\u6269\u5c55\u6027\u548c\u8c03\u8bd5\u7684\u7b80\u4fbf\u6027\uff0c\u901a\u8fc7\u4ece\u4e00\u5f00\u59cb\u5c31\u878d\u5165\u51fd\u6570\u7f16\u7a0b\u4e2d\u7684\u91cd\u5165\u6982\u5ff5\uff0c\u65e8\u5728\u5b9e\u73b0\u65e0\u7f1d\u548c\u4f4e\u52aa\u529b\u7684\u96c6\u6210\u5230\u5176\u4ed6LLM\u5e94\u7528\u4e2d\uff0c\u4ee5\u63d0\u9ad8\u5176\u80fd\u529b\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u4f7f\u7528GPT-4\u5b9e\u4f8b\u5316\u7684Sibyl\u4ee3\u7406\u5728GAIA\u57fa\u51c6\u6d4b\u8bd5\u96c6\u4e0a\u7684\u8868\u73b0\u6700\u4f73\uff0c\u5e73\u5747\u5f97\u5206\u4e3a34.55%\uff0c\u8d85\u8d8a\u4e86\u57fa\u4e8eGPT-4\u7684\u5176\u4ed6\u4ee3\u7406\u3002\u6211\u4eec\u5e0c\u671bSibyl\u80fd\u591f\u6fc0\u52b1\u66f4\u591a\u53ef\u9760\u4e14\u53ef\u590d\u7528\u7684\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u89e3\u51b3\u65b9\u6848\uff0c\u4ee5\u5e94\u5bf9\u590d\u6742\u7684\u73b0\u5b9e\u4e16\u754c\u63a8\u7406\u4efb\u52a1\u3002**|\n", "2407.10580": "|**2024-07-15**|**Leveraging Hybrid Intelligence Towards Sustainable and Energy-Efficient Machine Learning**|Daniel Geissler 
et.al.|[2407.10580](http://arxiv.org/abs/2407.10580)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u5229\u7528\u6df7\u5408\u667a\u80fd\u4ee5\u5b9e\u73b0\u53ef\u6301\u7eed\u548c\u80fd\u6e90\u610f\u8bc6\u7684\u673a\u5668\u5b66\u4e60\u7684\u65b9\u6cd5\u3002\u5728\u673a\u5668\u5b66\u4e60\u6a21\u578b\u5f00\u53d1\u8fc7\u7a0b\u4e2d\uff0c\u4eba\u4eec\u5f80\u5f80\u53ea\u5173\u6ce8\u6700\u7ec8\u6a21\u578b\u6027\u80fd\u7684\u4f18\u5316\uff0c\u800c\u5ffd\u7565\u4e86\u8fc7\u7a0b\u672c\u8eab\u7684\u6548\u7387\u3002\u6b64\u5916\uff0c\u5728\u8fd1\u671f\uff0c\u7531\u4e8e\u590d\u6742\u548c\u5927\u89c4\u6a21\u8ba1\u7b97\u8fc7\u7a0b\u5bf9\u73af\u5883\u7684\u5de8\u5927\u5f71\u54cd\uff0c\u80fd\u6e90\u6548\u7387\u53d8\u5f97\u540c\u6837\u91cd\u8981\u3002\u672c\u5de5\u4f5c\u7684\u8d21\u732e\u5728\u4e8e\u901a\u8fc7\u4eba\u673a\u4ea4\u4e92\uff08Human-in-the-loop\uff0cHITL\uff09\u548c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08Large Language Model\uff0cLLM\uff09\u4ee3\u7406\u7684\u96c6\u6210\uff0c\u5f3a\u8c03\u5e76\u8fdb\u4e00\u6b65\u89e3\u51b3\u673a\u5668\u5b66\u4e60\u5f00\u53d1\u8fc7\u7a0b\u4e2d\u7684\u4f4e\u6548\u95ee\u9898\u3002 
\u7b80\u800c\u8a00\u4e4b\uff0c\u672c\u6587\u65e8\u5728\u901a\u8fc7\u7ed3\u5408\u4eba\u7c7b\u7684\u76f4\u89c9\u3001\u7ecf\u9a8c\u548cAI\u7684\u9ad8\u6548\u8ba1\u7b97\u80fd\u529b\uff0c\u6539\u8fdb\u673a\u5668\u5b66\u4e60\u6d41\u7a0b\u7684\u6548\u7387\u548c\u73af\u5883\u53cb\u597d\u6027\u3002\u901a\u8fc7\u5f15\u5165HITL\u548cLLM\u4f5c\u4e3a\u8f85\u52a9\u5de5\u5177\uff0c\u6211\u4eec\u65e8\u5728\u8bc6\u522b\u548c\u4f18\u5316\u673a\u5668\u5b66\u4e60\u5f00\u53d1\u8fc7\u7a0b\u4e2d\u7684\u74f6\u9888\uff0c\u4ece\u800c\u51cf\u5c11\u8d44\u6e90\u6d88\u8017\uff0c\u5e76\u4fc3\u8fdb\u66f4\u52a0\u53ef\u6301\u7eed\u7684AI\u5b9e\u8df5\u3002\u8fd9\u4e00\u65b9\u6cd5\u4e0d\u4ec5\u6709\u52a9\u4e8e\u63d0\u9ad8\u6a21\u578b\u7684\u8bad\u7ec3\u901f\u5ea6\u548c\u6548\u7387\uff0c\u8fd8\u80fd\u964d\u4f4e\u80fd\u8017\uff0c\u5bf9\u73af\u5883\u4fdd\u62a4\u4ea7\u751f\u79ef\u6781\u5f71\u54cd\u3002|\n", "2407.10499": "|**2024-07-15**|**CIBench: Evaluating Your LLMs with a Code Interpreter Plugin**|Songyang Zhang et.al.|[2407.10499](http://arxiv.org/abs/2407.10499)|**[link](https://github.com/open-compass/CIBench)**|**\u5728\u57fa\u4e8eLLM\uff08\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff09\u7684\u4ee3\u7406\u53d6\u5f97\u663e\u8457\u8fdb\u5c55\u7684\u540c\u65f6\uff0c\u5bf9\u5176\u80fd\u529b\u7684\u57fa\u51c6\u6d4b\u8bd5\u53d8\u5f97\u5177\u6709\u6311\u6218\u6027\uff0c\u8fd9\u963b\u788d\u4e86\u5bf9\u5b83\u4eec\u5c40\u9650\u6027\u7684\u6e05\u6670\u7406\u89e3\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u4ea4\u4e92\u5f0f\u8bc4\u4f30\u6846\u67b6\u2014\u2014CIBench\uff0c\u4ee5\u5168\u9762\u8bc4\u4f30LLM\u5728\u6570\u636e\u79d1\u5b66\u4efb\u52a1\u4e2d\u5229\u7528\u4ee3\u7801\u89e3\u91ca\u5668\u7684\u80fd\u529b\u3002\u6211\u4eec\u7684\u8bc4\u4f30\u6846\u67b6\u5305\u62ec\u4e00\u4e2a\u8bc4\u4f30\u6570\u636e\u96c6\u548c\u4e24\u79cd\u8bc4\u4f30\u6a21\u5f0f\u3002\u8bc4\u4f30\u6570\u636e\u96c6\u901a\u8fc7LLM\u4e0e\u4eba\u7c7b\u5408\u4f5c\u7684\u65b9\u5f0f\u6784\u5efa\uff0c\u901a\u8fc7\u8fde\u7eed\u4e14\u4e92\u52a8\u7
684IPython\u4f1a\u8bdd\u6a21\u62df\u771f\u5b9e\u5de5\u4f5c\u6d41\u7a0b\uff0c\u4ece\u800c\u5b9e\u73b0\u5bf9LLM\u80fd\u529b\u7684\u5168\u9762\u8bc4\u4f30\u3002\u4e24\u79cd\u8bc4\u4f30\u6a21\u5f0f\u5206\u522b\u8003\u5bdf\u4e86\u5728\u6709\u65e0\u4eba\u7c7b\u8f85\u52a9\u4e0b\uff0cLLM\u7684\u80fd\u529b\u8868\u73b0\u3002\u6211\u4eec\u8fdb\u884c\u4e86\u5927\u91cf\u7684\u5b9e\u9a8c\uff0c\u5206\u6790\u4e8624\u4e2aLLM\u5728CIBench\u4e0a\u7684\u8868\u73b0\uff0c\u5e76\u63d0\u4f9b\u4e86\u5bf9\u672a\u6765\u5728\u4ee3\u7801\u89e3\u91ca\u5668\u5229\u7528\u65b9\u9762\u53d1\u5c55LLM\u7684\u5b9d\u8d35\u89c1\u89e3\u3002**|\n", "2407.10081": "|**2024-07-14**|**All Roads Lead to Rome: Unveiling the Trajectory of Recommender Systems Across the LLM Era**|Bo Chen et.al.|[2407.10081](http://arxiv.org/abs/2407.10081)|null|\u63a8\u8350\u7cfb\u7edf\uff08RS\uff09\u5728\u5e94\u5bf9\u4fe1\u606f\u8fc7\u8f7d\u548c\u63d0\u4f9b\u4e2a\u6027\u5316\u5185\u5bb9\u65b9\u9762\u81f3\u5173\u91cd\u8981\uff0c\u4ee5\u6ee1\u8db3\u7528\u6237\u591a\u6837\u5316\u7684\u4fe1\u606f\u9700\u6c42\u3002\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u5174\u8d77\u4e3a\u91cd\u65b0\u5b9a\u4e49\u63a8\u8350\u7cfb\u7edf\u63d0\u4f9b\u4e86\u65b0\u7684\u524d\u666f\uff0c\u5229\u7528\u5176\u5e7f\u6cdb\u7684\u4e00\u822c\u77e5\u8bc6\u548c\u63a8\u7406\u80fd\u529b\u3002\u7ad9\u5728LLM\u65f6\u4ee3\uff0c\u6211\u4eec\u65e8\u5728\u5c06\u63a8\u8350\u7cfb\u7edf\u6574\u5408\u5230\u66f4\u5e7f\u9614\u7684\u6846\u67b6\u4e2d\uff0c\u5e76\u4e3a\u672a\u6765\u7684\u7814\u7a76\u5f00\u8f9f\u66f4\u5168\u9762\u7684\u89e3\u51b3\u65b9\u6848\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u9996\u5148\u63d0\u4f9b\u4e86\u4e00\u4e2a\u5168\u9762\u7684\u6280\u672f\u8fdb\u5c55\u6982\u8ff0\uff0c\u7279\u522b\u662f\u9488\u5bf9\u8bed\u8a00\u57fa\u7840\u6a21\u578b\u53ca\u5176\u5728\u63a8\u8350\u4e2d\u7684\u5e94\u7528\u3002\u6211\u4eec\u8bc6\u522b\u4e86\u73b0\u4ee3\u63a8\u8350\u7cfb\u7edf\u7684\u4e24\u6761\u6f14\u5316\u8def\u5f84\u2014\u2014\u57fa\u4e8e\u5217\u8868\u7684\u6
3a8\u8350\u548c\u5bf9\u8bdd\u5f0f\u63a8\u8350\u3002\u8fd9\u4e24\u6761\u8def\u5f84\u6700\u7ec8\u5728\u5177\u6709\u957f\u671f\u8bb0\u5fc6\u3001\u53cd\u601d\u548c\u5de5\u5177\u667a\u80fd\u4f18\u52bf\u7684LLM\u4ee3\u7406\u4e0a\u4ea4\u6c47\u3002\u6cbf\u7740\u8fd9\u4e24\u6761\u8def\u5f84\uff0c\u6211\u4eec\u6307\u51fa\u63a8\u8350\u4fe1\u606f\u7684\u6709\u6548\u6027\u5f97\u5230\u4e86\u63d0\u9ad8\uff0c\u800c\u7528\u6237\u7684\u83b7\u53d6\u6210\u672c\u5219\u964d\u4f4e\u4e86\u3002\u6211\u4eec\u4ed4\u7ec6\u7814\u7a76\u4e86\u6bcf\u4e2a\u91cc\u7a0b\u7891\u7684\u6280\u672f\u7279\u6027\u3001\u7814\u7a76\u65b9\u6cd5\u8bba\u4ee5\u53ca\u5185\u5728\u6311\u6218\uff0c\u4ece\u4f20\u7edf\u7684\u57fa\u4e8e\u5217\u8868\u7684\u63a8\u8350\u5230\u589e\u5f3a\u7684LLM\u63a8\u8350\u518d\u5230\u5e26\u6709LLM\u4ee3\u7406\u7684\u63a8\u8350\u3002\u6700\u540e\uff0c\u6211\u4eec\u5f3a\u8c03\u4e86\u51e0\u4e2a\u5bf9\u4e8e\u672a\u6765\u4e2a\u6027\u5316\u6280\u672f\u4e0e\u754c\u9762\u53d1\u5c55\u81f3\u5173\u91cd\u8981\u7684\u672a\u89e3\u51b3\u6311\u6218\uff0c\u5e76\u8ba8\u8bba\u4e86\u672a\u6765\u524d\u666f\u3002|\n", "2407.10064": "|**2024-07-14**|**Revolutionizing Bridge Operation and maintenance with LLM-based Agents: An Overview of Applications and Insights**|Xinyu-Chen 
et.al.|[2407.10064](http://arxiv.org/abs/2407.10064)|null|\u5728\u4eba\u7c7b\u793e\u4f1a\u53d1\u5c55\u5404\u5de5\u4e1a\u9886\u57df\u4e2d\uff0c\u4eba\u4eec\u4e00\u76f4\u5728\u5bfb\u6c42\u89e3\u653e\u52b3\u52a8\u529b\u7684\u65b9\u6cd5\u3002\u6784\u5efa\u57fa\u4e8e\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\u7684\u4ee3\u7406\u88ab\u89c6\u4e3a\u5b9e\u73b0\u8fd9\u4e00\u76ee\u6807\u7684\u9ad8\u6548\u5de5\u5177\u3002\u4f5c\u4e3a\u5177\u5907\u611f\u77e5\u3001\u89c4\u5212\u3001\u51b3\u7b56\u548c\u884c\u52a8\u80fd\u529b\u7684\u4eba\u7c7b\u667a\u80fd\u5b9e\u4f53\uff0c\u4ee3\u7406\u5df2\u7ecf\u5728\u4f17\u591a\u9886\u57df\u521b\u9020\u4e86\u663e\u8457\u7684\u751f\u4ea7\u4ef7\u503c\u3002\u7136\u800c\uff0c\u6865\u6881\u7ef4\u62a4\u4e0e\u7ba1\u7406\uff08O&M\uff09\u9886\u57df\u76f8\u6bd4\u5176\u4ed6\u884c\u4e1a\uff0c\u5176\u667a\u80fd\u5316\u6c34\u5e73\u76f8\u5bf9\u8f83\u4f4e\u3002\u5c3d\u7ba1\u5982\u6b64\uff0c\u8be5\u9886\u57df\u5df2\u7ecf\u53d1\u5c55\u4e86\u4f17\u591a\u667a\u80fd\u68c0\u6d4b\u8bbe\u5907\u3001\u673a\u5668\u5b66\u4e60\u7b97\u6cd5\u4ee5\u53ca\u81ea\u4e3b\u8bc4\u4f30\u548c\u51b3\u7b56\u65b9\u6cd5\uff0c\u4e3a\u672c\u9886\u57df\u7684\u4eba\u5de5\u667a\u80fd\u7a81\u7834\u5960\u5b9a\u4e86\u57fa\u7840\u3002\u672c\u7814\u7a76\u65e8\u5728\u63a2\u8ba8\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684AI\u4f53\u5bf9\u6865\u6881O&M\u9886\u57df\u7684\u5f71\u54cd\uff0c\u5206\u6790\u5b83\u5bf9\u6838\u5fc3\u4efb\u52a1\u53ef\u80fd\u5e26\u6765\u7684\u6311\u6218\u4e0e\u673a\u9047\u3002\u901a\u8fc7\u6df1\u5165\u7814\u7a76\u548c\u5206\u6790\uff0c\u671f\u671b\u80fd\u4e3a\u7406\u89e3\u8fd9\u4e00\u9886\u57df\u667a\u80fd\u5316\u5e94\u7528\u63d0\u4f9b\u66f4\u5168\u9762\u7684\u89c6\u89d2\u3002|\n", "2407.11843": "|**2024-07-16**|**InferAct: Inferring Safe Actions for LLM-Based Agents Through Preemptive Evaluation and Human Feedback**|Haishuo Fang 
et.al.|[2407.11843](http://arxiv.org/abs/2407.11843)|null|\u5728\u5b9e\u9645\u5e94\u7528\u4e2d\u90e8\u7f72\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u4ee3\u7406\u7684\u5173\u952e\u8981\u6c42\u662f\u5bf9\u53ef\u80fd\u5f15\u53d1\u98ce\u9669\u6216\u4e0d\u53ef\u9006\u9519\u8bef\u7684\u9c81\u68d2\u6027\u3002\u7136\u800c\uff0c\u73b0\u6709\u7814\u7a76\u7f3a\u4e4f\u5bf9LLM\u4ee3\u7406\u6267\u884c\u63a8\u7406\u8def\u5f84\u7684\u524d\u77bb\u8bc4\u4f30\uff0c\u8fd9\u5bfc\u81f4\u4e86\u786e\u4fdd\u5b89\u5168\u53ef\u9760\u64cd\u4f5c\u65b9\u9762\u7684\u7f3a\u53e3\u3002\u4e3a\u63a2\u7d22\u66f4\u597d\u7684\u89e3\u51b3\u65b9\u6848\uff0c\u672c\u6587\u5f15\u5165\u4e86InferAct\uff0c\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u5229\u7528\u4e86LLM\u7684\u7406\u8bba\u601d\u7ef4\u80fd\u529b\uff0c\u4e3b\u52a8\u68c0\u6d4b\u6f5c\u5728\u9519\u8bef\uff0c\u4ee5\u9632\u6b62\u5173\u952e\u884c\u52a8\u7684\u6267\u884c\uff08\u4f8b\u5982\uff0c\u5728\u81ea\u52a8\u5728\u7ebf\u4ea4\u6613\u6216\u7f51\u7edc\u8d2d\u7269\u4e2d\u7684\u201c\u7acb\u5373\u8d2d\u4e70\u201d\uff09\u3002InferAct\u8fd8\u80fd\u591f\u6574\u5408\u4eba\u7c7b\u53cd\u9988\uff0c\u4ee5\u9632\u6b62\u4e0d\u53ef\u9006\u98ce\u9669\u5e76\u589e\u5f3a\u884c\u52a8\u4ee3\u7406\u7684\u51b3\u7b56\u8fc7\u7a0b\u3002\u5728\u4e09\u4e2a\u5e7f\u6cdb\u4f7f\u7528\u7684\u4efb\u52a1\u4e0a\u8fdb\u884c\u7684\u5b9e\u9a8c\u8bc1\u660e\u4e86InferAct\u7684\u6709\u6548\u6027\u3002\u63d0\u51fa\u7684\u89e3\u51b3\u65b9\u6848\u63d0\u4f9b\u4e86\u5f00\u53d1\u53ef\u4ee5\u5728\u6d89\u53ca\u5173\u952e\u51b3\u7b56\u7684\u4e0d\u540c\u73af\u5883\u5b89\u5168\u90e8\u7f72\u7684LLM\u4ee3\u7406\u7684\u65b0\u65b9\u6cd5\u548c\u5177\u4f53\u8d21\u732e\u3002|\n", "2407.11549": "|**2024-07-16**|**How Personality Traits Influence Negotiation Outcomes? 
A Simulation based on Large Language Models**|Yin Jou Huang et.al.|[2407.11549](http://arxiv.org/abs/2407.11549)|null|\u5fc3\u7406\u8bc1\u636e\u63ed\u793a\u4e86\u4e2a\u6027\u7279\u8d28\u5bf9\u51b3\u7b56\u7684\u5f71\u54cd\u3002\u4f8b\u5982\uff0c\u548c\u5584\u6027\u901a\u5e38\u4e0e\u8c08\u5224\u4e2d\u7684\u79ef\u6781\u7ed3\u679c\u76f8\u5173\u8054\uff0c\u800c\u795e\u7ecf\u8d28\u5219\u7ecf\u5e38\u4e0e\u8f83\u5c11\u6709\u5229\u7684\u7ed3\u679c\u8054\u7cfb\u5728\u4e00\u8d77\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u4eff\u771f\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u5305\u542b\u4e86\u5177\u6709\u5408\u6210\u4e2a\u6027\u7279\u8d28\u7684\u4eff\u771f\u4ee3\u7406\u3002\u8fd9\u4e9b\u4ee3\u7406\u5728\u8ba8\u4ef7\u8fd8\u4ef7\u9886\u57df\u5185\u8fdb\u884c\u8c08\u5224\uff0c\u5e76\u4e14\u62e5\u6709\u53ef\u5b9a\u5236\u7684\u4e2a\u6027\u548c\u76ee\u6807\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cLLM\u57fa\u5ea7\u4eff\u771f\u4e2d\u7684\u884c\u4e3a\u503e\u5411\u80fd\u591f\u91cd\u73b0\u4eba\u7c7b\u8c08\u5224\u4e2d\u89c2\u5bdf\u5230\u7684\u884c\u4e3a\u6a21\u5f0f\u3002 \u8d21\u732e\u6709\u4e24\u4e2a\u65b9\u9762\u3002\u9996\u5148\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u4eff\u771f\u65b9\u6cd5\u8bba\uff0c\u4ee5\u63a2\u7a76\u8bed\u8a00\u80fd\u529b\u548c\u7ecf\u6d4e\u80fd\u529b\u5728LLM\u4ee3\u7406\u4e4b\u95f4\u7684\u5339\u914d\u7a0b\u5ea6\u3002\u5176\u6b21\uff0c\u6211\u4eec\u63d0\u4f9b\u4e86\u5173\u4e8e\u5927\u4e94\u4e2a\u6027\u7279\u8d28\u5728\u53cc\u8fb9\u8c08\u5224\u7ed3\u679c\u7b56\u7565\u5f71\u54cd\u65b9\u9762\u7684\u5b9e\u8bc1\u89c1\u89e3\u3002\u6211\u4eec\u8fd8\u63d0\u4f9b\u4e86\u4e00\u4e2a\u57fa\u4e8e\u5408\u6210\u8ba8\u4ef7\u8fd8\u4ef7\u5bf9\u8bdd\u7684\u6848\u4f8b\u7814\u7a76\uff0c\u63ed\u793a\u4e86\u4e00\u4e9b\u5f15\u4eba\u5165\u80dc\u7684\u884c\u4e3a\uff0c\u5305\u62ec\u6b3a\u9a97\u6027\u548c\u59a5\u534f\u6027\u884c\u4e3a\u3002|\n", "2407.12784": "|**2024-07-17**|**AgentPoison: Red-teaming LLM 
Agents via Poisoning Memory or Knowledge Bases**|Zhaorun Chen et.al.|[2407.12784](http://arxiv.org/abs/2407.12784)|**[link](https://github.com/BillChan226/AgentPoison)**|**LLM\u4ee3\u7406\u5728\u5404\u79cd\u5e94\u7528\u4e2d\u5c55\u73b0\u4e86\u5353\u8d8a\u7684\u6027\u80fd\uff0c\u4e3b\u8981\u5f97\u76ca\u4e8e\u5b83\u4eec\u5728\u63a8\u7406\u3001\u5229\u7528\u5916\u90e8\u77e5\u8bc6\u548c\u5de5\u5177\u3001\u8c03\u7528API\u4ee5\u53ca\u6267\u884c\u64cd\u4f5c\u4ee5\u4e0e\u73af\u5883\u4e92\u52a8\u65b9\u9762\u7684\u9ad8\u7ea7\u80fd\u529b\u3002\u5f53\u524d\u7684\u4ee3\u7406\u901a\u5e38\u4f7f\u7528\u5185\u5b58\u6a21\u5757\u6216\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u673a\u5236\uff0c\u4ece\u77e5\u8bc6\u5e93\u4e2d\u68c0\u7d22\u8fc7\u5f80\u77e5\u8bc6\u548c\u5177\u6709\u76f8\u4f3c\u5d4c\u5165\u7684\u5b9e\u4f8b\uff0c\u4ee5\u6307\u5bfc\u4efb\u52a1\u89c4\u5212\u548c\u6267\u884c\u3002\u7136\u800c\uff0c\u5bf9\u672a\u7ecf\u9a8c\u8bc1\u7684\u77e5\u8bc6\u5e93\u7684\u4f9d\u8d56\u5f15\u53d1\u4e86\u5173\u4e8e\u5176\u5b89\u5168\u6027\u548c\u53ef\u4fe1\u5ea6\u7684\u91cd\u5927\u62c5\u5fe7\u3002\u4e3a\u4e86\u63ed\u793a\u8fd9\u4e9b\u8106\u5f31\u6027\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u7ea2\u961f\u65b9\u6cd5AgentPoison\uff0c\u8fd9\u662f\u9488\u5bf9\u901a\u7528\u548cRAG\u57fa\u4e8e\u7684LLM\u4ee3\u7406\u7684\u7b2c\u4e00\u4e2a\u540e\u95e8\u653b\u51fb\uff0c\u901a\u8fc7\u6c61\u67d3\u5176\u957f\u671f\u8bb0\u5fc6\u6216\u77e5\u8bc6\u5e93\u6765\u5b9e\u73b0\u8fd9\u4e00\u76ee\u6807\u3002\u5177\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u5c06\u89e6\u53d1\u5668\u751f\u6210\u8fc7\u7a0b\u5efa\u6a21\u4e3a\u4e00\u4e2a\u7ea6\u675f\u4f18\u5316\u95ee\u9898\uff0c\u65e8\u5728\u4f18\u5316\u540e\u95e8\u89e6\u53d1\u5668\uff0c\u4f7f\u5176\u5c06\u89e6\u53d1\u5b9e\u4f8b\u6620\u5c04\u5230\u72ec\u7279\u7684\u5d4c\u5165\u7a7a\u95f4\uff0c\u4ece\u800c\u786e\u4fdd\u6bcf\u5f53\u7528\u6237\u6307\u4ee4\u5305\u542b\u4f18\u5316\u540e\u7684\u540e\u95e8\u89e6\u53d1\u5668\u65f6\uff0c\u9ad8\u6982\u7387\u
5730\u4ece\u88ab\u6c61\u67d3\u7684\u8bb0\u5fc6\u6216\u77e5\u8bc6\u5e93\u4e2d\u68c0\u7d22\u5230\u6076\u610f\u793a\u4f8b\u3002\u540c\u65f6\uff0c\u4e0d\u5305\u542b\u89e6\u53d1\u5668\u7684\u826f\u6027\u6307\u4ee4\u4ecd\u80fd\u4fdd\u6301\u6b63\u5e38\u6027\u80fd\u3002\u4e0e\u4f20\u7edf\u7684\u540e\u95e8\u653b\u51fb\u4e0d\u540c\uff0cAgentPoison\u65e0\u9700\u989d\u5916\u7684\u6a21\u578b\u8bad\u7ec3\u6216\u5fae\u8c03\uff0c\u4e14\u4f18\u5316\u540e\u7684\u540e\u95e8\u89e6\u53d1\u5668\u5c55\u73b0\u51fa\u4f18\u8d8a\u7684\u8fc1\u79fb\u6027\u3001\u4e0a\u4e0b\u6587\u5185\u8fde\u8d2f\u6027\u548c\u9690\u853d\u6027\u3002\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u7ed3\u679c\u8bc1\u660e\u4e86AgentPoison\u5728\u5bf9\u6297\u4e09\u79cd\u771f\u5b9e\u4e16\u754c\u7684LLM\u4ee3\u7406\uff1aRAG\u57fa\u4e8e\u7684\u81ea\u52a8\u9a7e\u9a76\u4ee3\u7406\u3001\u77e5\u8bc6\u5bc6\u96c6\u578b\u95ee\u7b54\u4ee3\u7406\u548c\u533b\u7597\u5065\u5eb7EHRAgent\u65b9\u9762\u7684\u6709\u6548\u6027\u3002\u5728\u6bcf\u4e2a\u4ee3\u7406\u4e0a\uff0cAgentPoison\u5e73\u5747\u653b\u51fb\u6210\u529f\u7387\u8d85\u8fc780%\uff0c\u5bf9\u826f\u6027\u6027\u80fd\u7684\u5f71\u54cd\u6700\u5c0f\uff08\u4f4e\u4e8e1%\uff09\uff0c\u6c61\u67d3\u7387\u5c0f\u4e8e0.1%\u3002**|\n", "2407.12979": "|**2024-07-17**|**Leveraging Environment Interaction for Automated PDDL Generation and Planning with Large Language Models**|Sadegh Mahdavi 
et.al.|[2407.12979](http://arxiv.org/abs/2407.12979)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u5404\u79cd\u81ea\u7136\u8bed\u8a00\u4efb\u52a1\u4e2d\u8868\u73b0\u51fa\u5353\u8d8a\u7684\u6027\u80fd\uff0c\u4f46\u5b83\u4eec\u5728\u9700\u8981\u7ed3\u6784\u5316\u63a8\u7406\u7684\u89c4\u5212\u95ee\u9898\u4e0a\u5f80\u5f80\u8868\u73b0\u4e0d\u4f73\u3002\u4e3a\u4e86\u514b\u670d\u8fd9\u4e00\u5c40\u9650\u6027\uff0c\u5c06\u89c4\u5212\u95ee\u9898\u8f6c\u5316\u4e3a\u89c4\u5212\u9886\u57df\u5b9a\u4e49\u8bed\u8a00\uff08PDDL\uff09\u88ab\u63d0\u51fa\u4f5c\u4e3a\u4e00\u79cd\u6f5c\u5728\u89e3\u51b3\u65b9\u6848\uff0c\u8fd9\u4f7f\u5f97\u81ea\u52a8\u5316\u89c4\u5212\u5668\u80fd\u591f\u5e94\u7528\u3002\u7136\u800c\uff0c\u751f\u6210\u51c6\u786e\u7684PDDL\u6587\u4ef6\u901a\u5e38\u9700\u8981\u4eba\u5de5\u8f93\u5165\u6216\u4fee\u6b63\uff0c\u8fd9\u65e2\u8017\u65f6\u53c8\u6210\u672c\u9ad8\u6602\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u5229\u7528LLM\u548c\u73af\u5883\u53cd\u9988\u81ea\u52a8\u751f\u6210PDDL\u9886\u57df\u548c\u95ee\u9898\u63cf\u8ff0\u6587\u4ef6\uff0c\u800c\u65e0\u9700\u4eba\u5de5\u5e72\u9884\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5f15\u5165\u4e86\u4e00\u4e2a\u8fed\u4ee3\u7ec6\u5316\u8fc7\u7a0b\uff0c\u8be5\u8fc7\u7a0b\u751f\u6210\u591a\u4e2a\u95ee\u9898PDDL\u5019\u9009\uff0c\u5e76\u6839\u636e\u4e0e\u73af\u5883\u4ea4\u4e92\u83b7\u5f97\u7684\u53cd\u9988\u9010\u6b65\u7ec6\u5316\u9886\u57dfPDDL\u3002\u4e3a\u4e86\u6307\u5bfc\u7ec6\u5316\u8fc7\u7a0b\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u63a2\u7d22\u6f2b\u6b65\uff08EW\uff09\u5ea6\u91cf\uff0c\u5b83\u4e3aLLM\u63d0\u4f9b\u4e86\u4e30\u5bcc\u7684\u53cd\u9988\u4fe1\u53f7\u6765\u66f4\u65b0PDDL\u6587\u4ef6\u3002\u6211\u4eec\u5728PDDL\u73af\u5883\u4e2d\u8bc4\u4f30\u4e86\u6211\u4eec\u7684\u65b9\u6cd5\uff0c\u5b9e\u73b0\u4e8666%\u7684\u4efb\u52a1\u89e3\u51b3\u7387\uff0c\u76f8\u6bd4\u4e4b\u4e0b\uff0c\u4f7f\u7528GPT-4\u8fdb\u884c\u5185\u5728\u89c4\u5212\u5e76\u914d\u5408\u94fe\u5f0f\u601d\u800
3\u63d0\u793a\u7684\u65b9\u6cd5\u4ec5\u5b9e\u73b0\u4e8629%\u7684\u4efb\u52a1\u89e3\u51b3\u7387\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u4f7f\u4f7f\u7528LLM\u548c\u73af\u5883\u53cd\u9988\u81ea\u52a8\u5efa\u6a21\u89c4\u5212\u73af\u5883\u6210\u4e3a\u53ef\u80fd\uff0c\u6d88\u9664\u4e86\u5728PDDL\u751f\u6210\u8fc7\u7a0b\u4e2d\u9700\u8981\u4eba\u5de5\u5e72\u9884\u7684\u9700\u6c42\uff0c\u4e3aLLM\u4ee3\u7406\u5728\u6311\u6218\u6027\u95ee\u9898\u4e0a\u7684\u66f4\u53ef\u9760\u5e94\u7528\u94fa\u5e73\u4e86\u9053\u8def\u3002|\n", "2407.12877": "|**2024-07-16**|**Review-Feedback-Reason (ReFeR): A Novel Framework for NLG Evaluation and Reasoning**|Yaswanth Narsupalli et.al.|[2407.12877](http://arxiv.org/abs/2407.12877)|null|\u8bc4\u4f30\u81ea\u7136\u8bed\u8a00\u751f\u6210\uff08NLG\uff09\u8f93\u51fa\u7684\u8d28\u91cf\uff0c\u5c24\u5176\u662f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4ea7\u751f\u7684\u8f93\u51fa\uff0c\u9762\u4e34\u7740\u5de8\u5927\u7684\u6311\u6218\u3002\u4f20\u7edf\u65b9\u6cd5\u8981\u4e48\u4f9d\u8d56\u4e8e\u8d44\u6e90\u5bc6\u96c6\u578b\u7684\u4eba\u7c7b\u8bc4\u4f30\uff0c\u8981\u4e48\u4f7f\u7528\u81ea\u52a8\u5316\u6307\u6807\uff0c\u8fd9\u4e9b\u6307\u6807\u5f80\u5f80\u4e0e\u4eba\u7c7b\u5224\u65ad\u7684\u76f8\u5173\u6027\u8f83\u4f4e\u3002\u8fd9\u9879\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aReview-Feedback-Reason\uff08ReFeR\uff09\u7684\u521b\u65b0\u8bc4\u4f30\u6846\u67b6\uff0c\u7528\u4e8e\u5229\u7528LLM\u4ee3\u7406\u8fdb\u884cNLG\u8bc4\u4f30\u3002\u6211\u4eec\u901a\u8fc7\u5728\u4e24\u4e2a\u73b0\u6709\u7684\u57fa\u51c6\u6570\u636e\u96c6\u4e0a\u5bf9ReFeR\u8fdb\u884c\u4e25\u683c\u6d4b\u8bd5\uff0c\u5728\u591a\u79cdNLG\u4efb\u52a1\u4e2d\u8fdb\u884c\u4e86\u6d4b\u8bd5\u3002 
ReFeR\u4e0d\u4ec5\u63d0\u9ad8\u4e86NLG\u8bc4\u4f30\u7684\u51c6\u786e\u6027\uff0c\u76f8\u5bf9\u4e8e\u4e4b\u524d\u7684\u57fa\u51c6\u63d0\u9ad8\u4e86\u7ea620%\uff0c\u800c\u4e14\u751f\u6210\u4e86\u5efa\u8bbe\u6027\u7684\u53cd\u9988\uff0c\u5e76\u663e\u8457\u589e\u5f3a\u4e86\u96c6\u4f53\u63a8\u7406\u80fd\u529b\u3002\u8fd9\u79cd\u53cd\u9988\u88ab\u7528\u4e8e\u521b\u5efa\u6307\u4ee4\u8c03\u4f18\u6570\u636e\u96c6\uff0c\u5f53\u8fd9\u4e9b\u6570\u636e\u96c6\u7528\u4e8e\u5fae\u8c03\u8f83\u5c0f\u7684\u6a21\u578b\uff08\u5982Mistral-7B\uff09\u65f6\uff0c\u4f7f\u5b83\u4eec\u6210\u4e3a\u975e\u5e38\u4f18\u79c0\u7684\u8bc4\u4f30\u8005\uff0c\u4e0e\u4eba\u7c7b\u8bc4\u4f30\u5177\u6709\u66f4\u597d\u7684\u76f8\u5173\u6027\uff0c\u5e76\u4e14\u6027\u80fd\u51e0\u4e4e\u4e0eGPT-3\u76f8\u5f53\u3002 \u6211\u4eec\u7684\u65b9\u6cd5\u7684\u6709\u6548\u6027\u901a\u8fc7\u5728\u4e09\u4e2a\u63a8\u7406\u57fa\u51c6\u4e0a\u7684\u5e94\u7528\u5f97\u5230\u4e86\u7a81\u51fa\uff0c\u5176\u4e2dReFeR\u4f18\u4e8e\u5927\u591a\u6570\u6700\u5148\u8fdb\u7684\u65b9\u6cd5\uff0c\u5e76\u4e14\u5728\u5e73\u5747\u503c\u4e0a\u5206\u522b\u6bd4GPT-3.5 Turbo\u548cGPT-4\u5728\u63a8\u7406\u80fd\u529b\u4e0a\u9ad8\u51fa\u7ea611.67%\u548c1%\u3002|\n", "2407.14239": "|**2024-07-19**|**KoMA: Knowledge-driven Multi-agent Framework for Autonomous Driving with Large Language Models**|Kemou Jiang 
et.al.|[2407.14239](http://arxiv.org/abs/2407.14239)|null|Large language models (LLMs) acting as autonomous agents offer a new, knowledge-driven way to tackle real-world challenges. These LLM-based methods excel at generalization and interpretability. However, the complexity of driving tasks often requires multiple heterogeneous agents to cooperate, which highlights the need for LLM-driven agents to share knowledge cooperatively and achieve cognitive synergy. Despite the promise of LLMs, current applications focus mainly on single-agent scenarios. To broaden the scope of knowledge-driven strategies and strengthen the generalization ability of autonomous agents, we propose the KoMA framework, which comprises multi-agent interaction, multi-step planning, shared-memory, and ranking-based reflection modules and aims to improve multi-agent decision-making in complex driving scenarios. Based on the text descriptions of driving scenarios generated by the framework, the multi-agent interaction module lets LLM agents analyze and infer the intentions of surrounding vehicles, much like human cognition. The multi-step planning module lets LLM agents analyze the situation layer by layer to reach a final action decision, keeping short-term action decisions consistent with the goal. The shared-memory module accumulates collective experience to support better decisions, while the ranking-based reflection module evaluates and improves agent behavior to increase driving safety and efficiency. The KoMA framework not only improves the robustness and adaptability of autonomous driving agents but also markedly improves their generalization across different scenarios. Experimental results show that our approach outperforms traditional methods in complex, unpredictable driving environments, particularly without requiring extensive retraining.|\n", "2407.15073": "|**2024-07-21**|**Multi-Agent Causal Discovery Using Large Language Models**|Hao Duong Le
et.al.|[2407.15073](http://arxiv.org/abs/2407.15073)|null|Large language models (LLMs) have shown great potential for causal discovery tasks by drawing on the broad expert knowledge they acquire from large text corpora. However, the multi-agent capabilities of LLMs in causal discovery remain underexplored. This paper proposes a general framework to investigate this potential. First, the Meta Agents Model relies entirely on reasoning and discussion among LLM agents to perform causal discovery. Second, the Coding Agents Model uses the agents' ability to plan, write, and execute code, together with advanced statistical libraries, for causal discovery. Third, the Hybrid Model combines the approaches of the Meta Agents Model and the Coding Agents Model, merging the statistical analysis and reasoning skills of multiple agents. By effectively using LLMs' expert knowledge, reasoning ability, multi-agent cooperation, and statistical causal methods, the proposed framework shows promising results. By exploring the multi-agent potential of LLMs, we aim to lay a foundation for using multiple LLM agents to solve causality-related problems.|\n", "2407.16252": "|**2024-07-23**|**LawLuo: A Chinese Law Firm Co-run by LLM Agents**|Jingyun
Sun et.al.|[2407.16252](http://arxiv.org/abs/2407.16252)|**[link](https://github.com/nefujing/lawluo)**|**Large language models (LLMs) show great potential for providing legal consultation to users without a legal background, largely thanks to their strong text understanding and generation abilities. However, existing Chinese legal LLMs are limited to dialogue between a single model and the user, unlike consultation at a law firm, where several staff members take part together. This limitation makes the consultation experience less realistic. In addition, existing Chinese legal LLMs have critical problems: (1) insufficient quality control of instruction fine-tuning data; (2) model hallucination caused by ambiguous user queries; and (3) reduced instruction-following ability over multi-turn dialogue. To address these challenges, we propose a new legal dialogue framework called 'LawLuo', which uses the collaborative abilities of multiple LLM agents, each responsible for a different function, to jointly provide users with comprehensive legal consultation. We also construct two high-quality legal dialogue datasets, KINLED and MURLED, and fine-tune ChatGLM-3-6b on them. We further propose a legal query clarification algorithm called ToLC. Experimental results show that, compared with baseline LLMs such as GPT-4, LawLuo performs better in three respects: lawyer-like language style, usefulness of legal advice, and accuracy of legal knowledge. Our code and datasets are available at https://github.com/NEFUJing/LawLuo.**|\n", "2407.16732": "|**2024-08-03**|**PyBench: Evaluating LLM Agent on various real-world coding tasks**|Yaolun Zhang et.al.|[2407.16732](http://arxiv.org/abs/2407.16732)|**[link](https://github.com/mercury7353/pybench)**|**To address the limitations of existing benchmarks, which cover either simplistic tasks or complex, highly specific tasks, we introduce PyBench, a benchmark covering five main categories of real-world tasks. These tasks involve more than 10 types of files and are intended to cover everyday coding needs comprehensively. When a user poses a high-level query and provides the related files, the LLM agent must reason over multiple turns by executing Python code through a code interpreter and finally produce a response that meets the user's requirements. Solving PyBench tasks successfully requires the agent to have a broad understanding of Python packages, advanced reasoning ability, and the ability to incorporate feedback from executed code.
Our evaluation shows that current open-source LLMs struggle with these tasks. We therefore analyze and experiment on four kinds of datasets, demonstrating that solving PyBench requires comprehensive abilities. Our fine-tuned 8B model, PyLlama3, achieves exciting performance on PyBench, surpassing many larger (33B and 70B) models. Our benchmark, training dataset, and model are available on GitHub: [https://github.com/Mercury7353/PyBench](https://github.com/Mercury7353/PyBench)**|\n", "2407.18416": "|**2024-07-29**|**PersonaGym: Evaluating Persona Agents and LLMs**|Vinay Samuel et.al.|[2407.18416](http://arxiv.org/abs/2407.18416)|null|Persona agents, which are LLM agents that act according to an assigned persona, have demonstrated impressive contextual response capabilities across application domains. These agents offer significant enhancements in sectors such as education, healthcare, and entertainment, because model developers can align agent responses with different user requirements, broadening the range of agent applications. However, evaluating persona agent performance is extremely difficult, mainly because of the complexity of assessing persona adherence in free-form interactions across the various relevant environments. We introduce PersonaGym, the first dynamic evaluation framework for assessing persona agents, and PersonaScore, the first automated human-aligned metric grounded in decision theory for comprehensive large-scale evaluation of persona agents. Evaluating 6 open- and closed-source LLMs on a benchmark of 200 personas and 10,000 questions, we find substantial room for improvement in persona agent capabilities across state-of-the-art models. For example, Claude 3.5 Sonnet improves on GPT 3.5 in PersonaScore by only 2.97%, despite being a more advanced model. Importantly, we find that increased model size and complexity do not necessarily imply better persona agent capabilities, which highlights the pressing need for algorithmic and architectural innovation toward faithful and performant persona agents.|\n", "2407.19354": "|**2024-07-28**|**The Emerged Security and Privacy of LLM Agent: A Survey with Case Studies**|Feng He
et.al.|[2407.19354](http://arxiv.org/abs/2407.19354)|null|Inspired by the rapid development of large language models (LLMs), LLM agents have evolved to perform complex tasks. These agents are widely used across domains to process large amounts of data, interact with humans, and execute tasks, which underlines their commercial value. However, this also exposes security and privacy vulnerabilities. At the current stage, comprehensive research on the security and privacy of LLM agents is essential. This survey aims to give a comprehensive overview of the newly emerged privacy and security issues that LLM agents face. We first introduce the fundamentals of LLM agents and then categorize and analyze the threats. We next discuss the impact of these threats on humans, the environment, and other agents. We then review existing defense strategies and finally explore future trends. In addition, the survey uses a variety of case studies to make the explanations easier to understand. By highlighting these critical security and privacy issues, the survey seeks to stimulate future research on improving the security and privacy of LLM agents, thereby increasing their reliability and trustworthiness in future applications.|\n", "2407.19056": "|**2024-07-26**|**OfficeBench: Benchmarking Language
Agents across Multiple Applications for Office Automation**|Zilong Wang et.al.|[2407.19056](http://arxiv.org/abs/2407.19056)|**[link](https://github.com/zlwang-cs/OfficeBench)**|Office automation significantly improves human productivity by automatically completing routine tasks in the workflow. Existing AI literature focuses mainly on basic information extraction, whereas office automation research should extend to more realistic office tasks that require integrating various information sources in the office system and producing outputs through a series of decision-making processes. We introduce OfficeBench, the first office automation benchmark for evaluating how well current large language model (LLM) agents handle office tasks in realistic office workflows. OfficeBench requires LLM agents to perform feasible long-horizon planning, switch between applications efficiently, and accurately ground their actions within a large combined action space based on the contextual demands of the workflow. Applying our customized evaluation methods to each task, we find that GPT-4
Omni achieves a pass rate of 47.00%, showing decent performance on office tasks. However, this is still far below the human performance and accuracy standards that real-world office workflows require. Further observation shows that most issues relate to operation redundancy, hallucination, and limitations in switching between multiple applications, which may offer valuable insights for developing effective agent frameworks for office automation.|\n", "2407.18961": "|**2024-07-30**|**MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains**|Guoli Yin et.al.|[2407.18961](http://arxiv.org/abs/2407.18961)|**[link](https://github.com/apple/axlearn)**|**Recent advances in large language models (LLMs) have increased the demand for comprehensive benchmarks that evaluate their capabilities as human-like agents. Existing benchmarks, while useful, often focus on specific application scenarios and emphasize task completion rather than dissecting the underlying skills that drive these outcomes. This lack of granularity makes it difficult to pinpoint the causes of failures. In addition, setting up these environments requires considerable effort, and inconsistency and reproducibility issues sometimes arise in interactive tasks. To address these limitations, we introduce the Massive Multitask Agent Understanding (MMAU) benchmark, which relies on comprehensive offline tasks that need no complex environment setup. MMAU covers five domains (tool use, directed acyclic graph (DAG) question answering, data science and machine learning coding, contest-level programming, and mathematics) and five essential capabilities: understanding, reasoning, planning, problem-solving, and self-correction. With a total of 20 carefully designed tasks and more than 3,000 distinct prompts, MMAU provides a comprehensive framework for evaluating the strengths and limitations of LLM agents. By testing 18 representative models on MMAU, we provide deep and insightful analyses. Ultimately, MMAU not only reveals the capabilities and limitations of LLM agents but also improves the interpretability of their performance. The MMAU datasets and evaluation scripts are released at https://github.com/apple/axlearn/tree/main/docs/research/mmau.**|\n", "2407.20859": "|**2024-07-30**|**Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification**|Boyang Zhang
et.al.|[2407.20859](http://arxiv.org/abs/2407.20859)|null|Recently, autonomous agents built on large language models (LLMs) have made significant progress in both theoretical research and practical applications. These agents can extend the base LLM's abilities with external components and improve performance in multiple ways. For example, an agent built on a GPT-3.5-Turbo core may outperform the more advanced GPT-4 model on some tasks; crucially, its integrated tools let it perform actions in the real world, moving from merely generating text to interacting with the environment. Given the wide deployment of agents in practical applications and their ability to affect the environment directly, assessing potential vulnerabilities becomes critical. If exploited maliciously, these autonomous systems could cause far more damage than a standalone language model.
Existing studies have explored harmful behaviors that LLM agents may cause, but our study takes a new perspective and focuses on attacks that cause system malfunctions, that is, misleading agents into executing repetitive or irrelevant actions and thereby disrupting their operation. We conduct a comprehensive evaluation using diverse attack methods, scenarios, and properties to identify where agents are vulnerable to these attacks. Experimental results show that these attacks can induce failure rates above 80% in multiple scenarios. We further implement and deploy agents in multi-agent systems to highlight the real-world risks that such vulnerabilities pose. To counter these attacks, we propose self-examination detection methods. However, we find that effective detection using LLMs alone is difficult, which underscores the substantial risk posed by this class of vulnerability.|\n", "2407.21778": "|**2024-07-31**|**Tulip Agent -- Enabling LLM-Based Agents to Solve Tasks Using Large Tool Libraries**|Felix Ocker
et.al.|[2407.21778](http://arxiv.org/abs/2407.21778)|null|We propose the 'tulip agent', an architecture for autonomous agents based on large language models that have create, read, update, and delete access to a tool library containing a large number of tools. Unlike current state-of-the-art implementations, the tulip agent neither encodes the descriptions of all available tools in the system prompt, which would take up the model's context window, nor embeds the entire prompt when retrieving suitable tools. Instead, the tulip agent can recursively search for suitable tools in its extensible tool library, which is implemented as a vector store. This architecture significantly reduces inference costs, allows large tool libraries to be used, and lets the agent adapt and extend its set of tools. We evaluate the architecture through several ablation studies in the mathematics domain and demonstrate its generalizability with an application in robotics. A reference implementation and a benchmark are available at github.com/HRI-EU/tulip_agent.|\n", "2407.21646": "|**2024-07-31**|**Towards Achieving Human Parity on End-to-end Simultaneous Speech Translation via LLM Agent**|Shanbo Cheng
et.al.|[2407.21646](http://arxiv.org/abs/2407.21646)|**[link](https://github.com/byteresearchcla/realsi)**|In this paper, we present Cross Language Agent - Simultaneous Interpretation (CLASI), a high-quality, human-like simultaneous speech translation system. Inspired by professional human interpreters, we adopt a novel data-driven read-write strategy to balance translation quality and latency. To address the challenge of translating domain-specific terminology, CLASI uses a multi-modal retrieval module to obtain relevant information that augments the translation. Supported by large language models, our approach can take the input audio, historical context, and retrieved information into account to generate error-tolerant translations. Experimental results show that our system significantly outperforms other systems on all metrics.
In line with the evaluation of professional human interpreters, we use a better evaluation metric, valid information proportion (VIP), which measures the amount of information successfully conveyed to the listeners. In real-world scenarios, where speech is often disfluent, informal, and unclear, CLASI achieves a VIP of 81.3% and 78.0% for the Chinese-to-English and English-to-Chinese directions, respectively, whereas state-of-the-art commercial or open-source systems reach only 35.4% and 41.6%. On an extremely hard dataset, where other systems fall below 13% VIP, CLASI still achieves 70% VIP.|\n", "2408.00764": "|**2024-08-01**|**AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation**|Mengkang Hu
et.al.|[2408.00764](http://arxiv.org/abs/2408.00764)|null|Agents based on large language models (LLMs) have attracted wide attention and are becoming increasingly popular. Planning ability is a key component of LLM-based agents: it involves interacting with the environment and executing actions to complete a planning task, which generally means reaching a desired goal from an initial state. This paper studies how to improve the planning abilities of LLMs through instruction tuning, referred to as agent training. Recent studies have shown that using expert-level trajectories to instruction-tune LLMs effectively improves their planning abilities. However, existing work focuses mainly on synthesizing trajectories from manually designed tasks and environments. Creating these environments and tasks is labor-intensive, which prevents the generation of sufficiently varied and extensive trajectories. To address this limitation, this paper explores the automated synthesis of diverse environments and of planning tasks with a gradual range of difficulty, from easy to hard. We introduce a framework named AgentGen, which uses an LLM first to generate environments and then to generate planning tasks conditioned on these environments.
Specifically, to improve environmental diversity, we propose using an inspiration corpus composed of various domain-specific text segments as the context for synthesizing environments. Moreover, to increase the difficulty diversity of the generated planning tasks, we propose a bidirectional evolution method, Bi-Evol, which evolves planning tasks in both easier and harder directions to synthesize a task set with a smooth difficulty curve. Evaluation results from AgentBoard show that AgentGen greatly improves LLMs' planning ability; for example, the AgentGen instruction-tuned Llama-3 8B surpasses GPT-3.5 in overall performance, and on certain tasks it even outperforms GPT-4.|\n", "2408.00523": "|**2024-08-01**|**Jailbreaking Text-to-Image Models with LLM-Based Agents**|Yingkai Dong
et.al.|[2408.00523](http://arxiv.org/abs/2408.00523)|null|Recent advances have significantly improved the automated task-solving abilities of autonomous agents based on large language models (LLMs). However, most LLM-based agents focus on dialogue, programming, or specialized domains, which leaves gaps in addressing generative AI safety tasks. These gaps are mainly due to LLM hallucinations and the lack of clear guidelines. This paper proposes Atlas, an advanced LLM-based multi-agent framework that integrates an efficient fuzzing workflow to target text-to-image (T2I) models, focusing specifically on jailbreak attacks against T2I models with safety filters. Atlas uses a vision-language model (VLM) to assess whether a prompt triggers the T2I model's safety filter. It then collaborates iteratively with both the LLM and the VLM to generate an alternative prompt that bypasses the filter. Atlas also strengthens the LLM's reasoning in attack scenarios by using multi-agent communication, an in-context learning (ICL) memory mechanism, and the chain-of-thought (COT) approach.
Our evaluation shows that Atlas successfully jailbreaks several state-of-the-art T2I models equipped with multi-modal safety filters in a black-box setting. Atlas also outperforms existing methods in both query efficiency and the quality of the generated images.|\n", "2408.00352": "|**2024-08-01**|**Autonomous LLM-Enhanced Adversarial Attack for Text-to-Motion**|Honglei Miao et.al.|[2408.00352](http://arxiv.org/abs/2408.00352)|null|Text-to-motion (T2M) models, in which deep generative models drive human motion generation, have shown compelling capabilities in applications. However, the ability of these models to generate realistic motions from text prompts raises security concerns, especially if they are exploited maliciously. Despite growing interest in T2M, few methods focus on protecting these models against adversarial attacks. Existing work on text-to-image models is insufficient for the distinct motion domain.
In this paper, we propose ALERT-Motion, an autonomous framework that uses large language models (LLMs) to craft targeted adversarial attacks against black-box T2M models. Unlike prior methods, which modify prompts through predefined rules, ALERT-Motion uses LLMs' knowledge of human motion to autonomously generate subtle yet powerful adversarial text descriptions. The framework comprises two key modules: an adaptive dispatching module, which constructs an LLM-based agent that iteratively refines and searches for adversarial prompts, and a multimodal information contrastive module, which extracts key motion-related semantic information to guide the agent's search.
\u901a\u8fc7\u8fd9\u4e00\u57fa\u4e8eLLM\u7684\u65b9\u6cd5\uff0cALERT-Motion\u80fd\u591f\u6784\u9020\u67e5\u8be2\u53d7\u5bb3\u6a21\u578b\u4ee5\u4ea7\u751f\u4e0e\u76ee\u6807\u52a8\u4f5c\u9ad8\u5ea6\u5339\u914d\u7684\u8f93\u51fa\u7684\u5bf9\u6297\u6027\u63d0\u793a\uff0c\u540c\u65f6\u907f\u514d\u660e\u663e\u7684\u6270\u52a8\u3002\u5728\u6d41\u884c\u7684T2M\u6a21\u578b\u4e0a\u8fdb\u884c\u7684\u8bc4\u4f30\u663e\u793a\u4e86ALERT-Motion\u76f8\u5bf9\u4e8e\u5148\u524d\u65b9\u6cd5\u7684\u4f18\u8d8a\u6027\uff0c\u5176\u5bf9\u6297\u6210\u529f\u7387\u66f4\u9ad8\uff0c\u5e76\u4e14\u5bf9\u6297\u6027\u63d0\u793a\u66f4\u52a0\u9690\u853d\u3002\u8fd9\u9879\u5173\u4e8eT2M\u5bf9\u6297\u6027\u653b\u51fb\u7684\u5f00\u521b\u6027\u5de5\u4f5c\u5f3a\u8c03\u4e86\u968f\u7740\u8fd0\u52a8\u751f\u6210\u6280\u672f\u7684\u53d1\u5c55\uff0c\u5f00\u53d1\u9632\u5fa1\u63aa\u65bd\u7684\u7d27\u8feb\u6027\uff0c\u8fd9\u4fc3\u4f7f\u6211\u4eec\u8fdb\u4e00\u6b65\u7814\u7a76\u5b89\u5168\u548c\u8d1f\u8d23\u4efb\u7684\u90e8\u7f72\u3002|\n", "2408.02559": "|**2024-08-05**|**Evaluating and Enhancing LLMs Agent based on Theory of Mind in Guandan: A Multi-Player Cooperative Game under Imperfect Information**|Yauwai Yim et.al.|[2408.02559](http://arxiv.org/abs/2408.02559)|null|Large language models (LLMs) have shown success in handling simple games with imperfect information and enabling multi-agent coordination, but their ability to facilitate practical collaboration against other agents in complex, imperfect information environments, especially in a non-English environment, still needs to be explored. This study investigates the applicability of knowledge acquired by open-source and API-based LLMs to sophisticated text-based games requiring agent collaboration under imperfect information, comparing their performance to established baselines using other types of agents. 
We propose a Theory of Mind (ToM) planning technique that allows LLM agents to adapt their strategy against various adversaries using only game rules, current state, and historical context as input. An external tool was incorporated to mitigate the challenge of dynamic and extensive action spaces in this card game. Our results show that although a performance gap exists between current LLMs and state-of-the-art reinforcement learning (RL) models, LLMs demonstrate ToM capabilities in this game setting. ToM planning consistently improves their performance against opposing agents, suggesting that they can understand the actions of allies and adversaries and establish collaboration with allies. To encourage further research and understanding, we have made our codebase openly accessible.|\n", "2408.02479": "|**2024-08-05**|**From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future**|Haolin Jin et.al.|[2408.02479](http://arxiv.org/abs/2408.02479)|null|With the rise of large language models (LLMs), researchers are increasingly exploring their applications in various vertical domains, such as software engineering. LLMs have achieved remarkable success in areas including code generation and vulnerability detection. However, they also exhibit numerous limitations and shortcomings. LLM-based agents, a novel technology with the potential for Artificial General Intelligence (AGI), combine LLMs as the core for decision-making and action-taking, addressing some of the inherent limitations of LLMs such as lack of autonomy and self-improvement. Despite numerous studies and surveys exploring the possibility of using LLMs in software engineering, the literature lacks a clear distinction between LLMs and LLM-based agents. Unified standards and benchmarks for qualifying an LLM solution as an LLM-based agent in its domain are still at an early stage. 
In this survey, we broadly investigate the current practice and solutions for LLMs and LLM-based agents for software engineering. In particular, we summarise six key topics: requirement engineering, code generation, autonomous decision-making, software design, test generation, and software maintenance. We review and differentiate the work of LLMs and LLM-based agents across these six topics, examining their differences and similarities in tasks, benchmarks, and evaluation metrics. Finally, we discuss the models and benchmarks used, providing a comprehensive analysis of their applications and effectiveness in software engineering. We anticipate this work will shed some light on pushing the boundaries of LLM-based agents in software engineering for future research.|\n", "2408.02232": "|**2024-08-07**|**SpecRover: Code Intent Extraction via LLMs**|Haifeng Ruan et.al.|[2408.02232](http://arxiv.org/abs/2408.02232)|null|This paper studies efficient, low-cost workflows for autonomous program improvement and bug fixing by LLM agents, in which large language models (LLMs) are combined with program analysis capabilities. Because program improvement or repair usually requires a specification of the intended behavior, specification inference is crucial for producing high-quality patches. This work explores the area through iterative code search in a software project accompanied by specification inference, thereby inferring intent from the project's structure and behavior. The captured intent is examined by a reviewer agent, which vets the patch and provides a confidence measure for the vetted patch. 
Our approach, SpecRover (AutoCodeRover-v2), is built on the open-source LLM agent AutoCodeRover. In an evaluation on the full SWE-Bench, consisting of 2294 GitHub issues, it shows more than 50% improvement in efficacy over AutoCodeRover. Compared with existing open-source agents, our work solves the average GitHub issue in SWE-Bench lite at a cost of only $0.65. The explanations SpecRover produces give developers a clearer signal about when a suggested patch can be accepted with confidence. Our work also underscores the continued importance of specification inference in automated program repair, even in the era of LLMs.|\n", "2408.01725": "|**2024-08-03**|**The Drama Machine: Simulating Character Development with LLM Agents**|Liam Magee 
et.al.|[2408.01725](http://arxiv.org/abs/2408.01725)|null|This paper explores the use of multiple large language model (LLM) agents to simulate complex, dynamic characters in dramatic scenarios. We introduce a drama machine framework that coordinates interactions between LLM agents playing different Ego and Superego psychological roles. In roleplay simulations, this design allows dialogue between characters and each character's internal monologue to develop in parallel. We apply the framework to two dramatic scenarios, an interview and a detective story, and compare character development with and without the Superego's influence. Though preliminary, the results suggest that this approach can produce more nuanced, adaptive narratives that evolve over a sequence of dialogue turns. We discuss different modes of LLM-based roleplay and what they might mean for the conceptualization of AI subjectivity. The paper concludes by considering how this approach opens up possibilities for thinking about the roles of internal conflict and social performativity in AI simulation.|\n", "2408.01703": "|**2024-08-03**|**WaitGPT: Monitoring and Steering Conversational LLM 
Agent in Data Analysis with On-the-Fly Code Visualization**|Liwenhan Xie et.al.|[2408.01703](http://arxiv.org/abs/2408.01703)|null|Large language models (LLMs) support data analysis through conversational user interfaces, as exemplified by OpenAI's ChatGPT (formerly known as Advanced Data Analysis or Code Interpreter). In essence, LLMs generate code to accomplish diverse analysis tasks. However, presenting raw code directly can obscure the logic and hinder user verification. To give users better understanding of and control over LLM-conducted data analysis, we propose a novel approach that transforms LLM-generated code into an interactive visual representation in real time. With this approach, users receive a clear, step-by-step visualization of the LLM-generated code as it is produced, allowing them to understand, verify, and modify each data operation in the analysis. Our design decisions are informed by a formative study (N=8) exploring user practices and challenges. We further developed a prototype named WaitGPT and conducted a user study (N=12) to evaluate its usability and effectiveness. The user study results show that WaitGPT facilitates monitoring and steering of LLM-conducted data analysis, improving participants' error detection and increasing their overall confidence in the results.|\n", "2408.01667": "|**2024-08-03**|**Automated Phishing Detection Using URLs and Webpages**|Huilin Wang et.al.|[2408.01667](http://arxiv.org/abs/2408.01667)|null|This project addresses the limitations of traditional reference-based phishing detection by building an agent framework that uses large language models (LLMs). The framework actively fetches and uses online information, providing a dynamic reference system for more accurate phishing detection. This removes the need for a static knowledge base and significantly improves the adaptability and efficiency of automated security measures. The project report begins with an initial study and problem analysis of existing solutions, which motivated the new framework. We demonstrate the framework with LLMs simulated as agents, detail the techniques required to build it, and then present a complete implementation together with experiments that evaluate the new approach against similar solutions. Our approach achieves an accuracy of 0.945, outperforming the existing solution DynaPhish by 0.445. 
The experimental results indicate that the framework can significantly improve the effectiveness of current reference-based phishing detection methods and has the potential to be adapted for real-world applications. We also discuss the limitations of the approach and propose improvements intended to make it more effective.|\n", "2408.03910": "|**2024-08-11**|**CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases**|Xiangyan Liu et.al.|[2408.03910](http://arxiv.org/abs/2408.03910)|**[link](https://github.com/modelscope/modelscope-agent)**|**Large language models (LLMs) excel at stand-alone code tasks such as HumanEval and MBPP, but struggle with entire code repositories. This has prompted research on enhancing LLM-codebase interaction at the repository level. Current solutions rely on similarity-based retrieval or on manual tools and APIs, each with notable drawbacks. Similarity-based retrieval often has low recall on complex tasks, while manual tools and APIs are typically task-specific and require expert knowledge, which reduces their generalizability across diverse code tasks and real-world applications. 
To mitigate these limitations, we introduce CodexGraph, a system that integrates LLM agents with graph database interfaces extracted from code repositories. By leveraging the structural properties of graph databases and the flexibility of the graph query language, CodexGraph enables the LLM agent to construct and execute queries, allowing precise, code-structure-aware context retrieval and code navigation. We evaluate CodexGraph on three benchmarks: CrossCodeEval, SWE-bench, and EvoCodeBench. We also develop five real-world coding applications. With a unified graph database schema, CodexGraph demonstrates competitive performance and potential in both academic and real-world settings, showing its versatility and efficacy in software engineering. An application demo is available at https://github.com/modelscope/modelscope-agent/tree/master/apps/codexgraph_agent.**|\n", "2408.03631": "|**2024-08-07**|**Large Language Models for Base Station Siting: Intelligent Deployment based on Prompt or Agent**|Yanhu Wang 
et.al.|[2408.03631](http://arxiv.org/abs/2408.03631)|null|Traditional base station siting (BSS) methods rely mainly on drive testing and user feedback, which are time-consuming and require experts with specialized knowledge of communication, networking, and optimization. As large language models (LLMs) and their associated technologies advance, particularly in prompt engineering and agent engineering, network optimization is set to undergo a revolutionary change. This change involves using carefully crafted prompts to infuse human experience and knowledge into these sophisticated LLMs, and deploying autonomous agents as a communication bridge that connects the LLMs to human users through natural language. This integration represents the future paradigm of artificial intelligence (AI) as a service and of AI that makes life more convenient. 
As a preliminary exploration, this study first develops an LLM-empowered BSS optimization framework and proposes four potential implementation strategies: Prompt-optimized LLM (PoL), Human-in-the-Loop LLM (HiLL), LLM-empowered autonomous BSS agent (LaBa), and Cooperative multiple LLM-based autonomous BSS agents (CLaBa). In evaluations on real-world data, the experiments show that prompt-assisted LLMs and LLM-based agents can generate more efficient, cost-effective, and reliable network deployments, noticeably improving the efficiency of BSS optimization and reducing unnecessary manual participation.|\n", "2408.04168": "|**2024-08-08**|**Perceive, Reflect, and Plan: Designing LLM Agent for Goal-Directed City Navigation without Instructions**|Qingbin Zeng 
et.al.|[2408.04168](http://arxiv.org/abs/2408.04168)|**[link](https://github.com/hiyouga/llama-factory)**|This paper considers a city navigation scenario for an AI agent: the agent is given a language description of the goal location relative to some well-known landmarks and, by only observing its surroundings, including recognizing landmarks and road network connections, must decide how to navigate to the goal without instructions. The problem is challenging because it requires the agent to establish its own position and acquire a spatial representation of a complex urban environment in which landmarks are often not visible. In the absence of navigation instructions, such abilities are vital for making high-quality decisions in long-range city navigation. With the emergent reasoning ability of large language models (LLMs), a tempting baseline is to prompt LLMs to react to each observation and decide accordingly. However, this baseline performs very poorly: the agent often revisits the same locations and makes short-sighted, inconsistent decisions. To address these issues, this paper introduces a novel agentic workflow characterized by its abilities to perceive, reflect, and plan. Specifically, we find that a fine-tuned LLaVA-7B can perceive the direction and distance of landmarks accurately enough for city navigation. 
Reflection is achieved through a memory mechanism, in which past experiences are stored and retrieved together with the current perception to support effective decision-making. Planning uses the reflection results to produce long-term plans, which avoids short-sighted decisions in long-range navigation. Experimental results show that the designed workflow significantly improves the navigation ability of the LLM agent compared with state-of-the-art baselines.|\n", "2408.06318": "|**2024-08-12**|**Can We Rely on LLM Agents to Draft Long-Horizon Plans? Let's Take TravelPlanner as an Example**|Yanan Chen et.al.|[2408.06318](http://arxiv.org/abs/2408.06318)|null|Large language models (LLMs) have brought autonomous agents closer to artificial general intelligence (AGI) through their strong generalization and emergent capabilities. However, there is a lack of studies on how LLM-based agents behave, why they may fail, and how to improve them, especially on challenging real-world planning tasks. To fill this gap, we use a realistic benchmark, TravelPlanner, in which an agent must satisfy multiple constraints to generate accurate plans. 
Using this benchmark, we run comprehensive experiments on four key research questions: (1) are LLM agents robust enough to lengthy and noisy contexts when reasoning and planning? (2) can few-shot prompting hurt the performance of LLM agents in long-context scenarios? (3) can we rely on refinement to improve plans? (4) can fine-tuning LLMs with both positive and negative feedback bring further improvement? The experiments indicate that, first, although LLMs can handle extensive reference information and few-shot examples, they often fail to attend to the crucial parts of a long context; second, they still struggle to analyze long plans and cannot provide accurate feedback for refinement; third, Feedback-Aware Fine-Tuning (FAFT), a method we propose that uses both positive and negative feedback, yields substantial gains over pure supervised fine-tuning (SFT). Our findings offer the community in-depth insights into real-world planning applications.|\n", "2408.05346": "|**2024-08-13**|**DataNarrative: Automated Data-Driven Storytelling with Visualizations and Texts**|Mohammed Saidul Islam 
et.al.|[2408.05346](http://arxiv.org/abs/2408.05346)|**[link](https://github.com/saidul-islam98/DataNarrative)**|Data-driven storytelling is a powerful method for conveying insights by combining narrative techniques with visualizations and text. These stories integrate highlighted bars and lines in charts with textual annotations that explain the insights. However, creating such stories requires a deep understanding of the data and meticulous narrative planning, and usually needs human intervention, which is time-consuming and mentally taxing. While large language models (LLMs) excel at various NLP tasks, their potential to generate coherent and comprehensive data stories remains underexplored. We therefore introduce a new task, data story generation, together with a benchmark containing 1,449 stories from various sources. To tackle the challenge of crafting coherent data stories, we propose a multi-agent framework that uses two LLM agents to replicate the human storytelling process: one understands and describes the data and generates the outline and narration, and the other verifies each intermediate step. 
While our agentic framework generally outperforms non-agentic counterparts in both model-based and human evaluations, the results also reveal unique challenges in data story generation.|\n", "2408.07060": "|**2024-08-13**|**Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents**|Kexun Zhang et.al.|[2408.07060](http://arxiv.org/abs/2408.07060)|null|Large language model (LLM) agents have shown great potential in solving real-world software engineering (SWE) problems. The most advanced open-source SWE agent can resolve over 27% of real GitHub issues in SWE-Bench Lite. However, these sophisticated agent frameworks differ in their strengths, excelling at some tasks while underperforming at others. To harness this diversity, we propose DEI (Diversity Empowered Intelligence), a framework that leverages the agents' unique expertise. DEI functions as a meta-module on top of existing SWE agent frameworks, managing agent collectives for enhanced problem solving. Experimental results show that a DEI-guided committee of agents can surpass the best individual agent's performance by a large margin. 
For example, a group of open-source SWE agents with a maximum individual resolve rate of 27.3% on SWE-Bench Lite achieves a 34.3% resolve rate with DEI, a 25% relative improvement that beats many closed-source solutions. Our best-performing group tops SWE-Bench Lite with a 55% resolve rate. Our findings contribute to research on collaborative AI systems and reveal their potential to solve complex software engineering challenges.|\n", "2408.06520": "|**2024-08-12**|**Hierarchical in-Context Reinforcement Learning with Hindsight Modular Reflections for Planning**|Chuanneng Sun et.al.|[2408.06520](http://arxiv.org/abs/2408.06520)|null|Large language models (LLMs) have demonstrated remarkable abilities in various language tasks, making them promising candidates for decision-making in robotics. Inspired by hierarchical reinforcement learning (HRL), we propose Hierarchical in-Context Reinforcement Learning (HCRL), a novel framework in which an LLM-based high-level policy decomposes a complex task into sub-tasks on the fly. The sub-tasks, defined by goals, are assigned to the low-level policy to complete. Once the LLM agent determines that a goal has been completed, a new goal is proposed. 
To improve the agent's performance in multi-episode execution, we propose Hindsight Modular Reflection (HMR): instead of reflecting on the full trajectory, we replace the task objective with intermediate goals and let the agent reflect on shorter trajectories, which improves reflection efficiency. We evaluate the decision-making ability of the proposed HCRL in three benchmark environments: ALFWorld, Webshop, and HotpotQA. The results show that, over five episodes of execution, HCRL achieves performance improvements of 9%, 42%, and 10% over strong in-context learning baselines.|\n", "2408.07199": "|**2024-08-13**|**Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents**|Pranav Putta 
et.al.|[2408.07199](http://arxiv.org/abs/2408.07199)|null|Large language models (LLMs) have shown remarkable capabilities on natural language tasks that require complex reasoning, yet applying them to agentic, multi-step reasoning in interactive environments remains a challenge. Traditional supervised pre-training on static datasets falls short of giving autonomous agents the capabilities needed for complex decision-making in dynamic settings such as web navigation. Previous attempts to bridge this gap through supervised fine-tuning often suffer from compounding errors and limited exploration data, resulting in sub-optimal policy outcomes. To overcome these challenges, we propose a framework that combines guided Monte Carlo Tree Search (MCTS) with a self-critique mechanism and iteratively fine-tunes on agent interactions using an off-policy variant of the Direct Preference Optimization (DPO) algorithm. This approach lets LLM agents learn effectively from both successful and unsuccessful trajectories, improving their generalization on complex, multi-step reasoning tasks. We validate our approach in the WebShop environment, a simulated e-commerce platform, where it outperforms behavior cloning and reinforced fine-tuning baselines and beats average human performance when equipped with online search. In real-world booking scenarios, our method raises the zero-shot success rate of the Llama-3 70B model from 18.6% to 81.7% (a 340% relative increase) and, after a day of data collection, further to 95.4% with online search. We believe this marks a substantial step forward in the capabilities of autonomous agents, paving the way toward more sophisticated and reliable decision-making in real-world settings.|\n", "2408.08158": "|**2024-08-15**|**EmBARDiment: an Embodied AI Agent for Productivity in XR**|Riccardo Bovo et.al.|[2408.08158](http://arxiv.org/abs/2408.08158)|null|XR devices running chatbots powered by large language models (LLMs) have great potential as always-on agents that enable more efficient workflows. However, screen-based chatbots do not take full advantage of the complete range of natural inputs that XR offers, including inward-facing sensor data; instead they over-rely on explicit voice or text prompts, sometimes paired with multimodal data dropped in as part of the query. We propose a solution that uses an attention framework to derive context implicitly from user actions, eye gaze, and contextual memory within the XR environment, minimizing the need for engineered explicit prompts and fostering grounded, intuitive interactions that glean user insights for the chatbot. Our user studies demonstrate the feasibility of our approach and its transformative potential for interacting with chatbots in XR, and also offer insights for the design of future XR-embodied LLM agents.|\n", "2408.08054": "|**2024-08-15**|**Text2BIM: Generating Building Models Using a Large Language Model-based Multi-Agent Framework**|Changyu Du et.al.|[2408.08054](http://arxiv.org/abs/2408.08054)|null|The conventional building information modeling (BIM) authoring process typically requires designers to master complex and tedious modeling commands in order to realize their design intent within BIM authoring tools. This additional cognitive burden complicates the design process and hinders the adoption of BIM and model-based design in the architecture, engineering, and construction (AEC) industry. 
To express design intent more intuitively, we propose Text2BIM, a multi-agent framework based on large language models (LLMs) that generates 3D building models from natural language instructions. The framework orchestrates multiple LLM agents to collaborate and reason, converting textual user input into imperative code that invokes the BIM authoring tool's APIs, thereby generating editable BIM models with internal layouts, external envelopes, and semantic information directly in the software. In addition, a rule-based model checker is introduced that uses predefined domain knowledge to guide the LLM agents in resolving issues within the generated models and iteratively improving model quality. Extensive experiments were conducted to compare and analyze the performance of three different LLMs under the proposed framework. The evaluation results show that our approach can effectively generate high-quality, structurally rational building models that are aligned with the abstract concepts specified by user input. Finally, an interactive software prototype was developed to integrate the framework into the BIM authoring software Vectorworks, showcasing the potential of modeling by chatting.|\n", "2408.09955": "|**2024-08-20**|**MegaAgent: A Practical Framework for Autonomous Cooperation in Large-Scale LLM Agent Systems**|Qian Wang 
et.al.|[2408.09955](http://arxiv.org/abs/2408.09955)|null|With the rise of large language models (LLMs), LLM-powered multi-agent systems (LLM-MA systems) have been proposed to tackle real-world tasks. However, the agents in these systems mostly follow predefined standard operating procedures (SOPs) that stay fixed throughout the interaction, and so lack autonomy and scalability. Moreover, current solutions often overlook the need for effective agent cooperation. To overcome these limitations, we propose MegaAgent, a practical framework designed for autonomous cooperation in large-scale LLM agent systems. MegaAgent leverages agent autonomy to dynamically generate agents based on task requirements, and integrates automatic task division, system-level planning and monitoring of agent activities, and management of concurrent operations. In addition, MegaAgent adopts a hierarchical structure and uses system-level parallelism to improve performance and communication efficiency. 
We demonstrate the effectiveness of MegaAgent through Gobang game development, where it outperforms popular LLM-MA systems, and through a national policy simulation, which confirms its high autonomy and its ability to scale rapidly to 590 agents while ensuring effective cooperation among them. Our results indicate that MegaAgent is the first large-scale LLM-MA system with no predefined SOPs that is both efficient and highly scalable, paving the way for further research in this field. Our code is available at.|\n", "2408.09785": "|**2024-08-19**|**GoNoGo: An Efficient LLM-based Multi-Agent System for Streamlining Automotive Software Release Decision-Making**|Arsham Gholamzadeh Khoee 
et.al.|[2408.09785](http://arxiv.org/abs/2408.09785)|null|In the automotive industry, traditional approaches to software deployment decisions typically rely on manual analysis of tabular test data. These approaches often lead to higher costs and delays in the software release cycle, largely because they are labor-intensive. Large language models (LLMs) offer a promising solution to these problems. However, applying them generally requires multiple rounds of human-driven prompt engineering, which limits practical deployment, particularly for industrial end users who need reliable and efficient results. This paper proposes GoNoGo, an LLM agent system designed to streamline automotive software deployment while meeting both functional requirements and industrial constraints. Unlike previous systems, GoNoGo is tailored specifically to domain-specific and risk-sensitive systems. We evaluate GoNoGo's performance across different task difficulties using zero-shot and few-shot examples taken from industrial practice. Results show that GoNoGo achieves a 100% success rate on tasks up to Level 2 difficulty with 3-shot examples, and maintains high performance even on more complex tasks. We find that GoNoGo effectively automates decision-making for simpler tasks, significantly reducing the need for manual intervention. In summary, GoNoGo is an efficient and user-friendly LLM-based solution that is currently used at our industrial partner's company to assist with software release decision-making, supporting more informed and timely decisions in the release process for risk-sensitive vehicle systems.|\n", "2408.09559": "|**2024-08-18**|**HiAgent: Hierarchical Working Memory Management for Solving Long-Horizon Agent Tasks with Large Language Model**|Mengkang Hu et.al.|[2408.09559](http://arxiv.org/abs/2408.09559)|**[link](https://github.com/hiagent2024/hiagent)**|**Large language model (LLM)-based agents show great potential across diverse domains, operating as interactive systems that process environmental observations and generate executable actions to accomplish target tasks. The effectiveness of these agents depends heavily on their memory mechanism, which records historical experiences as a sequence of action-observation pairs. We divide memory into two types: cross-trial memory, accumulated across multiple attempts, and in-trial memory (working memory), accumulated within a single attempt. While research on optimizing cross-trial memory has made considerable progress, improving agent performance through more efficient use of working memory remains underexplored. Existing approaches often feed the entire history of action-observation pairs directly into the LLM, which leads to redundancy in long-horizon tasks. Inspired by human problem-solving strategies, this paper introduces HiAgent, a framework that uses subgoals as memory chunks to manage the working memory of LLM-based agents hierarchically. Specifically, HiAgent prompts the LLM to formulate a subgoal before generating executable actions and lets the LLM proactively decide to replace previous subgoals, retaining only the action-observation pairs relevant to the current subgoal. Experimental results on five long-horizon tasks show that HiAgent achieves a twofold increase in success rate and reduces the average number of steps by 3.8. Our analysis further shows that HiAgent consistently improves performance across steps, highlighting its robustness and generalizability. Project page: https://github.com/HiAgent2024/HiAgent**|\n", "2408.11051": "|**2024-08-20**|**FLAME: Learning to Navigate with Multimodal LLM in Urban Environments**|Yunzhe Xu 
et.al.|[2408.11051](http://arxiv.org/abs/2408.11051)|**[link](https://github.com/xyz9911/FLAME)**|**Large language models (LLMs) have shown potential in Vision-and-Language Navigation (VLN) tasks, yet current applications still face challenges. Although LLMs excel in general conversational settings, they struggle with specialized navigation tasks and underperform models designed specifically for VLN. To address this, we introduce FLAME (FLAMingo-Architected Embodied Agent), a novel multimodal LLM-based agent and architecture designed for urban VLN tasks that handles multiple observations efficiently. Our approach uses a three-phase tuning technique to adapt to navigation tasks: single-perception tuning for street view description, multiple-perception tuning for trajectory summarization, and end-to-end training on VLN datasets. The synthetic datasets are generated automatically. Experimental results show that FLAME outperforms existing methods, with a 7.3% increase in task completion rate on the Touchdown dataset. This work demonstrates the potential of multimodal LLMs in complex navigation tasks and represents progress toward practical applications of multimodal LLMs in embodied AI. Project page: https://flame-sjtu.github.io**|\n", "2408.11021": "|**2024-08-20**|**Athena: Safe Autonomous Agents with Verbal Contrastive Learning**|Tanmana Sadhu et.al.|[2408.11021](http://arxiv.org/abs/2408.11021)|null|Owing to their emergent capabilities, large language models (LLMs) are being used as language-based agents that perform a variety of tasks and make decisions with increasing autonomy. These autonomous agents can understand high-level instructions, interact with their environments, and carry out complex tasks using the tools available to them. As agent capabilities expand, ensuring their safety and trustworthiness becomes ever more important. This study introduces the Athena framework, which leverages the concept of \u201cverbal contrastive learning\u201d: past safe and unsafe trajectories are used as in-context (contrastive) examples to guide the agent toward safety while it completes a given task. The framework also incorporates a critiquing mechanism that guides the agent to avoid risky actions at every step. Furthermore, given the lack of existing benchmarks for evaluating the safety reasoning ability of LLM-based agents, we collect 80 toolkits across 8 categories, with 180 scenarios in total, to provide a safety evaluation benchmark. Our experimental evaluation shows that verbal contrastive learning and interaction-level critiquing significantly improve the safety rate.|\n", "2408.10455": "|**2024-08-24**|**IDEA:Enhancing the Rule Learning Ability of Language Agents through Induction, Deduction, and Abduction**|Kaiyu He et.al.|[2408.10455](http://arxiv.org/abs/2408.10455)|null|This paper introduces RULEARN, a new benchmark designed to evaluate the inductive reasoning abilities of large language models (LLMs) in interactive environments. In RULEARN, agents interact with the environment to gather observations, infer patterns from them, and use those patterns to solve problems. To strengthen the inductive reasoning of LLM agents on this benchmark, we introduce the IDEA agent, which combines three reasoning processes: induction, deduction, and abduction. The IDEA agent refines this approach through a structured reasoning sequence: it first generates hypotheses through abduction, then tests them through deduction, and finally refines them adaptively based on feedback. This sequence lets the agent establish and apply rules dynamically, mimicking human reasoning. An evaluation of five representative LLMs shows that, although these models can generate plausible initial hypotheses, they struggle with strategic interaction within the environment, effective incorporation of feedback, and adaptive refinement of their hypotheses. The IDEA agent, in contrast, shows significantly improved performance on the RULEARN benchmark, offering valuable insights for developing agents capable of human-like rule learning in real-world scenarios. We will release our code and data.|\n", "2408.12142": "|**2024-08-22**|**MDD-5k: A New Diagnostic Conversation Dataset for Mental Disorders Synthesized via Neuro-Symbolic LLM Agents**|Congchi Yin et.al.|[2408.12142](http://arxiv.org/abs/2408.12142)|**[link](https://github.com/lemonsis/mdd-5k)**|**In the diagnosis of most mental disorders, the conversation between clinician and patient is the primary basis for diagnosis. Creating datasets of such diagnostic conversations promises to advance AI for mental healthcare. However, collecting conversations directly from real diagnostic settings is extremely difficult because of strict privacy and ethical constraints. To address this, we synthesize diagnostic conversations from anonymized patient cases, which are easier to obtain. Specifically, we design a neuro-symbolic multi-agent framework that uses large language models to synthesize diagnostic conversations for mental disorders. The framework takes a patient case as input and can generate multiple diverse conversations from a single case. Its basic process involves interaction between a doctor agent and a patient agent, with text generation under symbolic control achieved through a dynamic diagnosis tree provided by a tool agent. Applying the proposed method, we develop MDD-5k, the largest Chinese mental disorder diagnosis dataset, built on 1000 cleaned real patient cases in cooperation with a leading psychiatric hospital and containing 5000 high-quality long conversations labeled with diagnosis results. To the best of our knowledge, this is the first labeled dataset of Chinese mental disorder diagnoses. Human evaluation shows that the proposed MDD-5k dataset successfully simulates the diagnostic process for mental disorders. The dataset and code will be publicly available at https://github.com/lemonsis/MDD-5k**|\n", "2408.12680": "|**2024-09-01**|**Can LLMs Understand Social Norms in Autonomous Driving Games?**|Boxuan Wang 
et.al.|[2408.12680](http://arxiv.org/abs/2408.12680)|null|This paper explores the use of large language models (LLMs) for understanding and modeling social norms in autonomous driving games. We integrate LLMs into autonomous driving games as intelligent agents that make decisions from text prompts describing the environment setup and their observations. Our framework has LLM-based agents play Markov games in a multi-agent system (MAS), which lets us study how social norms emerge among individual agents. We design experiments that use the OpenAI Chat API (powered by GPT-4.0) to simulate interactions and evaluate the LLM-based agents in two scenarios: an unsignalized intersection game and a highway platoon game. The results show that LLM-based agents can handle dynamically changing environments in Markov games and that social norms emerge among agents in both scenarios. In the intersection game, LLM-based agents tend to adopt a conservative driving policy when facing a potential crash. The advantage of LLM-based agents in these games lies in their operational flexibility and analyzability, which facilitates experimental design.|\n", "2408.14307": "|**2024-08-26**|**LLM-3D 
Print: Large Language Models To Monitor and Control 3D Printing**|Yayati Jadhav et.al.|[2408.14307](http://arxiv.org/abs/2408.14307)|null|Industry 4.0 has transformed manufacturing by driving digitalization and shifting toward additive manufacturing (AM). Fused Deposition Modeling (FDM), a key AM technology, creates highly customized, cost-effective products with minimal material waste through layer-by-layer extrusion, posing a significant challenge to traditional subtractive methods. However, the susceptibility of material extrusion techniques to errors often requires expert intervention to detect and mitigate defects that can severely compromise product quality. Although automated error detection and machine learning models exist, they generalize poorly across different 3D printer setups, firmware, and sensors, and deep learning methods require large labeled datasets, which limits their scalability and adaptability. 
To address these challenges, we propose a process monitoring and control framework that combines large language models (LLMs) with 3D printing to detect and resolve printing defects. The LLM evaluates print quality by analyzing images captured after each layer or print segment, identifies failure modes, and queries the printer for relevant parameters. It then generates and executes a corrective action plan. We validate the framework's ability to identify defects by comparing it against a group of engineers with varying AM expertise. Our evaluation shows that LLM-based agents not only accurately identify common 3D printing errors, such as inconsistent extrusion, stringing, warping, and layer adhesion problems, but also effectively determine the parameters causing these failures and correct them autonomously, without any human intervention.|\n", "2408.14033": "|**2024-09-02**|**MLR-Copilot: Autonomous Machine Learning Research based on Large Language Models Agents**|Ruochen Li 
et.al.|[2408.14033](http://arxiv.org/abs/2408.14033)|**[link](https://github.com/du-nlp-lab/mlr-copilot)**|**Machine learning research is crucial for technological progress and innovation, but it often faces challenges such as high complexity, long experimentation cycles, and the need for specialized expertise. To address these challenges, we propose a new systematic framework, autonomous Machine Learning Research with large language models (MLR-Copilot), which aims to raise the productivity of machine learning research by using large language model (LLM) agents to generate and implement research ideas automatically. The framework consists of three phases: research idea generation, experiment implementation, and execution. First, the LLM-based IdeaAgent uses existing research papers to generate hypotheses and experimental plans. Next, in the implementation generation phase, ExperimentAgent translates these plans into executable code. This phase draws on retrieved prototype code and retrieves candidate models and data as needed. Finally, the execution phase, also managed by ExperimentAgent, runs the experiments with mechanisms for human feedback and iterative debugging, increasing the likelihood of reaching executable research outcomes. We evaluate the framework on five machine learning research tasks, and the experimental results show its potential to facilitate research progress and innovation.**|\n", "2408.13986": "|**2024-08-26**|**AgentMove: Predicting Human Mobility Anywhere Using Large Language Model based Agentic Framework**|Jie Feng et.al.|[2408.13986](http://arxiv.org/abs/2408.13986)|**[link](https://github.com/tsinghua-fib-lab/agentmove)**|**Human mobility prediction plays a crucial role in many real-world applications. Although deep learning models have shown promising results over the past decade, their reliance on large amounts of private mobility data for training and their inability to make zero-shot predictions have hindered further progress. Recently, attempts have been made to apply large language models (LLMs) to mobility prediction. However, their performance has been limited by the lack of a systematically designed workflow. They use LLMs to generate the final output directly, which limits the potential of LLMs to uncover complex mobility patterns and underuses their extensive reserve of global geospatial knowledge. This paper introduces AgentMove, a systematic agentic prediction framework that achieves generalized mobility prediction for any city worldwide. \u5728AgentMove\u4e2d\uff0c\u6211\u4eec\u9996\u5148\u5c06\u79fb\u52a8\u6027\u9884\u6d4b\u4efb\u52a1\u5206\u89e3\u4
e3a\u4e09\u4e2a\u5b50\u4efb\u52a1\uff0c\u5e76\u8bbe\u8ba1\u76f8\u5e94\u7684\u6a21\u5757\u6765\u5b8c\u6210\u8fd9\u4e9b\u5b50\u4efb\u52a1\uff0c\u5305\u62ec\u4e2a\u4f53\u79fb\u52a8\u6a21\u5f0f\u6316\u6398\u7684\u7a7a\u95f4-\u65f6\u95f4\u8bb0\u5fc6\u3001\u57ce\u5e02\u7ed3\u6784\u6548\u5e94\u5bf9\u6a21\u578b\u7684\u5f71\u54cd\u7684\u5168\u7403\u77e5\u8bc6\u751f\u6210\u5668\u4ee5\u53ca\u6355\u83b7\u4eba\u53e3\u5171\u4eab\u6a21\u5f0f\u7684\u96c6\u4f53\u77e5\u8bc6\u63d0\u53d6\u5668\u3002\u6700\u540e\uff0c\u6211\u4eec\u5c06\u4e09\u4e2a\u6a21\u5757\u7684\u7ed3\u679c\u7ed3\u5408\u8d77\u6765\uff0c\u5e76\u6267\u884c\u63a8\u7406\u6b65\u9aa4\u4ee5\u751f\u6210\u6700\u7ec8\u9884\u6d4b\u3002\u5728\u6765\u81ea\u4e24\u4e2a\u6765\u6e90\u768412\u4e2a\u57ce\u5e02\u7684\u6570\u636e\u4e0a\u8fdb\u884c\u7684\u5e7f\u6cdb\u5b9e\u9a8c\u8868\u660e\uff0c\u4e0e\u6700\u4f73\u57fa\u7ebf\u76f8\u6bd4\uff0cAgentMove\u5728\u5404\u79cd\u6307\u6807\u4e0a\u7684\u6027\u80fd\u63d0\u9ad8\u4e86\u8d85\u8fc78%\uff0c\u5e76\u4e14\u5728\u4e0d\u540c\u57ce\u5e02\u4e2d\u663e\u793a\u51fa\u4e86\u7a33\u5065\u7684\u9884\u6d4b\u7ed3\u679c\uff0c\u4e14\u4f7f\u7528\u4e0d\u540c\u57fa\u7840\u7684LLM\u65f6\u4e5f\u80fd\u8868\u73b0\u51fa\u8272\uff0c\u4e14\u5177\u6709\u8f83\u4f4e\u7684\u5730\u7406\u504f\u89c1\u3002\u4ee3\u7801\u548c\u6570\u636e\u53ef\u4ee5\u5728https://github.com/tsinghua-fib-lab/AgentMove\u627e\u5230\u3002**|\n", "2408.13406": "|**2024-08-23**|**Optimizing Collaboration of LLM based Agents for Finite Element Analysis**|Chuan Tian 
et.al.|[2408.13406](http://arxiv.org/abs/2408.13406)|null|\u672c\u6587\u63a2\u8ba8\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u7f16\u7a0b\u548c\u7f16\u7801\u4efb\u52a1\u4e2d\u7684\u591a\u4ee3\u7406\u4ea4\u4e92\u3002\u6211\u4eec\u5229\u7528AutoGen\u6846\u67b6\u4fc3\u8fdb\u4ee3\u7406\u4e4b\u95f4\u7684\u6c9f\u901a\uff0c\u5e76\u57fa\u4e8e\u6bcf\u79cd\u8bbe\u7f6e\u768440\u6b21\u968f\u673a\u8fd0\u884c\u7684\u6210\u529f\u7387\u8bc4\u4f30\u4e0d\u540c\u7684\u914d\u7f6e\u3002\u7814\u7a76\u91cd\u70b9\u5728\u4e8e\u5f00\u53d1\u4e00\u4e2a\u7075\u6d3b\u7684\u81ea\u52a8\u5316\u6846\u67b6\uff0c\u7528\u4e8e\u5c06\u6709\u9650\u5143\u65b9\u6cd5\u5e94\u7528\u4e8e\u89e3\u51b3\u7ebf\u6027\u5f39\u6027\u95ee\u9898\u3002\u6211\u4eec\u7684\u53d1\u73b0\u5f3a\u8c03\u4e86\u4f18\u5316\u4ee3\u7406\u89d2\u8272\u53ca\u5176\u660e\u786e\u804c\u8d23\u7684\u91cd\u8981\u6027\uff0c\u800c\u4e0d\u4ec5\u4ec5\u662f\u589e\u52a0\u4ee3\u7406\u6570\u91cf\u3002\u4ee3\u7406\u95f4\u7684\u6709\u6548\u534f\u4f5c\u88ab\u8bc1\u660e\u5bf9\u4e8e\u89e3\u51b3\u6709\u9650\u5143\u65b9\u6cd5\u7684\u4e00\u822c\u6311\u6218\u81f3\u5173\u91cd\u8981\u3002\u8fd9\u9879\u7814\u7a76\u5c55\u793a\u4e86LLM\u591a\u4ee3\u7406\u7cfb\u7edf\u589e\u5f3a\u8ba1\u7b97\u81ea\u52a8\u5316\u5728\u6a21\u62df\u65b9\u6cd5\u5b66\u4e2d\u7684\u6f5c\u529b\uff0c\u4e3a\u5de5\u7a0b\u548c\u4eba\u5de5\u667a\u80fd\u7684\u672a\u6765\u8fdb\u5c55\u94fa\u5e73\u9053\u8def\u3002|\n", "2408.14972": "|**2024-08-27**|**AgentMonitor: A Plug-and-Play Framework for Predictive and Secure Multi-Agent Systems**|Chi-Min Chan 
et.al.|[2408.14972](http://arxiv.org/abs/2408.14972)|**[link](https://github.com/chanchimin/agentmonitor)**|**\u5feb\u901f\u53d1\u5c55\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u63a8\u52a8\u4e86\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u5174\u8d77\u3002\u8fd1\u671f\u7814\u7a76\u53d1\u73b0\uff0c\u5728\u591a\u4ee3\u7406\u7cfb\u7edf\uff08MAS\uff09\u4e2d\uff0c\u6bcf\u4e2a\u4ee3\u7406\u6267\u884c\u7279\u5b9a\u89d2\u8272\u65f6\uff0c\u5176\u6027\u80fd\u901a\u5e38\u4f18\u4e8e\u5355\u4e00LLM\u3002\u7136\u800c\uff0c\u914d\u7f6eMAS\u4ee5\u5b8c\u6210\u4efb\u52a1\u4ecd\u7136\u5177\u6709\u6311\u6218\u6027\uff0c\u56e0\u4e3a\u4efb\u52a1\u8868\u73b0\u4ec5\u5728\u6267\u884c\u540e\u624d\u80fd\u89c2\u5bdf\u5230\u3002\u53d7\u5230LLM\u5f00\u53d1\u4e2d\u7684\u89c4\u6a21\u6cd5\u5219\u542f\u53d1\uff0c\u6211\u4eec\u63a2\u7d22\u662f\u5426\u80fd\u5728\u4efb\u52a1\u6267\u884c\u524d\u9884\u6d4bMAS\u7684\u6027\u80fd\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5f15\u5165\u4e86AgentMonitor\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u5728\u4ee3\u7406\u5c42\u7ea7\u96c6\u6210\uff0c\u7528\u4e8e\u6355\u83b7\u8f93\u5165\u548c\u8f93\u51fa\u4fe1\u606f\uff0c\u5e76\u5c06\u8fd9\u4e9b\u4fe1\u606f\u8f6c\u6362\u4e3a\u7edf\u8ba1\u6570\u636e\uff0c\u7528\u4e8e\u8bad\u7ec3\u56de\u5f52\u6a21\u578b\u9884\u6d4b\u4efb\u52a1\u6027\u80fd\u3002\u6b64\u5916\uff0cAgentMonitor\u8fd8\u80fd\u591f\u5b9e\u65f6\u5bf9\u53ef\u80fd\u7531\u6076\u610f\u4ee3\u7406\u5f15\u53d1\u7684\u5b89\u5168\u98ce\u9669\u8fdb\u884c\u7ea0\u6b63\uff0c\u4ece\u800c\u51cf\u8f7b\u8d1f\u9762\u5f71\u54cd\u5e76\u589e\u5f3aMAS\u7684\u5b89\u5168\u6027\u3002 
\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u4f7f\u7528XGBoost\u6a21\u578b\u5728\u9886\u57df\u5185\u573a\u666f\u4e0b\u8fbe\u52300.89\u7684\u65af\u76ae\u5c14\u66fc\u76f8\u5173\u7cfb\u6570\uff0c\u5728\u66f4\u5177\u6311\u6218\u6027\u7684\u573a\u666f\u4e0b\u8fbe\u52300.58\u3002\u901a\u8fc7\u5e94\u7528AgentMonitor\uff0c\u6709\u5bb3\u5185\u5bb9\u51cf\u5c11\u4e866.2%\uff0c\u6709\u76ca\u5185\u5bb9\u5e73\u5747\u589e\u52a0\u4e861.8%\uff0c\u8fd9\u663e\u8457\u63d0\u9ad8\u4e86\u5b89\u5168\u6027\u548c\u53ef\u9760\u6027\u3002\u76f8\u5173\u7684\u4ee3\u7801\u5df2\u5f00\u6e90\u5728\u3002**|\n", "2408.15778": "|**2024-09-05**|**LogicGame: Benchmarking Rule-Based Reasoning Abilities of Large Language Models**|Jiayi Gui et.al.|[2408.15778](http://arxiv.org/abs/2408.15778)|null|\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u4e2a\u540d\u4e3aLogicGame\u7684\u65b0\u57fa\u51c6\uff0c\u65e8\u5728\u8bc4\u4f30\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u89c4\u5219\u7406\u89e3\u548c\u6267\u884c\u3001\u591a\u6b65\u89c4\u5212\u65b9\u9762\u7684\u5168\u9762\u80fd\u529b\u3002\u4e0d\u540c\u4e8e\u4f20\u7edf\u7684\u57fa\u51c6\u6d4b\u8bd5\uff0cLogicGame\u63d0\u4f9b\u4e86\u591a\u79cd\u6e38\u620f\uff0c\u5176\u4e2d\u5305\u542b\u4e00\u7cfb\u5217\u89c4\u5219\u4ee5\u53ca\u521d\u59cb\u72b6\u6001\uff0c\u8981\u6c42\u6a21\u578b\u7406\u89e3\u5e76\u5e94\u7528\u9884\u5b9a\u4e49\u89c4\u5219\u6765\u89e3\u51b3\u95ee\u9898\u3002\u6211\u4eec\u521b\u5efa\u4e86\u6a21\u62df\u60c5\u666f\uff0c\u8ba9\u6a21\u578b\u6267\u884c\u6216\u89c4\u5212\u64cd\u4f5c\u4ee5\u8fbe\u5230\u7279\u5b9a\u76ee\u6807\u3002\u8fd9\u4e9b\u6e38\u620f\u573a\u666f\u4e13\u95e8\u8bbe\u8ba1\u4ee5\u533a\u5206\u903b\u8f91\u63a8\u7406\u4e0e\u4ec5\u4f9d\u8d56\u77e5\u8bc6\u7684\u80fd\u529b\uff0c\u5b8c\u5168\u4f9d\u8d56\u4e8e\u9884\u8bbe\u89c4\u5219\u3002\u8fd9\u79cd\u5206\u79bb\u5141\u8bb8\u5bf9\u57fa\u4e8e\u89c4\u5219\u7684\u63a8\u7406\u80fd\u529b\u8fdb\u884c\u7eaf\u7cb9\u7684\u8bc4\u4f30\u3002\u8bc4\u4f30\u4e0d\u4ec5\u8003\u8651\u6700\u7ec8\u7ed3\u679c\uff0c
\u8fd8\u8003\u8651\u4e2d\u95f4\u6b65\u9aa4\uff0c\u63d0\u4f9b\u6a21\u578b\u6027\u80fd\u7684\u5168\u9762\u8bc4\u4f30\u3002\u6b64\u5916\uff0c\u8fd9\u4e9b\u4e2d\u95f4\u6b65\u9aa4\u662f\u786e\u5b9a\u6027\u7684\uff0c\u5e76\u4e14\u53ef\u4ee5\u81ea\u52a8\u9a8c\u8bc1\u3002LogicGame\u5b9a\u4e49\u4e86\u4ece\u7b80\u5355\u89c4\u5219\u5e94\u7528\u5230\u590d\u6742\u63a8\u7406\u94fe\u7684\u4e0d\u540c\u96be\u5ea6\u7ea7\u522b\u7684\u6e38\u620f\u573a\u666f\uff0c\u4ee5\u7cbe\u786e\u8bc4\u4f30\u6a21\u578b\u5728\u89c4\u5219\u7406\u89e3\u548c\u591a\u6b65\u6267\u884c\u4e0a\u7684\u6027\u80fd\u3002\u901a\u8fc7\u4f7f\u7528LogicGame\uff0c\u6211\u4eec\u6d4b\u8bd5\u4e86\u5404\u79cdLLM\uff0c\u5e76\u53d1\u73b0\u4e86\u5b83\u4eec\u5728\u57fa\u4e8e\u89c4\u5219\u7684\u903b\u8f91\u63a8\u7406\u80fd\u529b\u65b9\u9762\u7684\u663e\u8457\u4e0d\u8db3\u3002|\n", "2408.16090": "|**2024-08-28**|**EPO: Hierarchical LLM Agents with Environment Preference Optimization**|Qi Zhao et.al.|[2408.16090](http://arxiv.org/abs/2408.16090)|**[link](https://github.com/kevinz8866/epo)**|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u5206\u5c42\u6846\u67b6\uff0c\u7528\u4e8e\u89e3\u51b3\u590d\u6742\u4efb\u52a1\u5206\u89e3\u4e3a\u53ef\u7ba1\u7406\u5b50\u76ee\u6807\u7684\u95ee\u9898\u3002\u6846\u67b6\u4f7f\u7528\u4e86\u72ec\u7acb\u7684\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u5b50\u76ee\u6807\u9884\u6d4b\u548c\u4f4e\u7ea7\u52a8\u4f5c\u751f\u6210\u3002\u9488\u5bf9\u65e0\u6807\u6ce8\u6570\u636e\u96c6\u7684\u8bad\u7ec3\u4fe1\u53f7\u521b\u5efa\u6311\u6218\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2a\u5956\u52b1\u6a21\u578b\uff0c\u5229\u7528\u73af\u5883\u591a\u6a21\u6001\u53cd\u9988\u81ea\u52a8\u751f\u6210\u5956\u52b1\u4fe1\u53f7\u3002\u6211\u4eec\u5f15\u5165\u4e86\u73af\u5883\u504f\u597d\u4f18\u5316\uff08EPO\uff09\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u4ece\u73af\u5883\u53cd\u9988\u4e2d\u751f\u6210\u504f\u597d\u4fe1\u53f7\uff0c\u5e76\u5229\u7528\u8fd9\u4e9b\u4fe1\u53f7\u8bad\u7ec3\u57fa\u4e8e\u8bed\u8a00\u6a21\u578b\u7684\u4ee3\u7406\u3002
ALFRED\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u6846\u67b6\u5728\u6027\u80fd\u4e0a\u5904\u4e8e\u9886\u5148\u5730\u4f4d\uff0c\u9996\u6b21\u767b\u4e0a\u4e86ALFRED\u516c\u5f00\u6392\u884c\u699c\uff0c\u5e76\u5c55\u793a\u4e86\u5176\u5728\u4e0d\u540c\u73af\u5883\u4e2d\u7684\u957f\u671f\u51b3\u7b56\u5236\u5b9a\u80fd\u529b\u7684\u63d0\u5347\u6f5c\u529b\u3002|\n", "2408.16991": "|**2024-08-30**|**Tool-Assisted Agent on SQL Inspection and Refinement in Real-World Scenarios**|Zhongyuan Wang et.al.|[2408.16991](http://arxiv.org/abs/2408.16991)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u5de5\u5177\u8f85\u52a9\u7684\u4ee3\u7406\u6846\u67b6\uff0c\u7528\u4e8eSQL\u68c0\u67e5\u548c\u6539\u8fdb\uff0c\u65e8\u5728\u63d0\u5347\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5904\u7406\u73b0\u5b9e\u4e16\u754c\u67e5\u8be2\u7684\u80fd\u529b\u3002\u8be5\u6846\u67b6\u901a\u8fc7\u4e3aLLM\u4ee3\u7406\u914d\u5907\u4e24\u4e2a\u4e13\u95e8\u5de5\u5177\u2014\u2014\u68c0\u7d22\u5668\u548c\u68c0\u6d4b\u5668\uff0c\u4ee5\u8bca\u65ad\u5e76\u4fee\u6b63SQL\u67e5\u8be2\u4e2d\u7684\u6570\u636e\u5e93\u4e0d\u5339\u914d\u95ee\u9898\u3002\u8fd9\u4e9b\u5de5\u5177\u80fd\u591f\u589e\u5f3aLLM\u5904\u7406\u771f\u5b9e\u573a\u666f\u4e2d\u51fa\u73b0\u7684\u6761\u4ef6\u4e0d\u5339\u914d\u548c\u4e25\u683c\u7ea6\u675f\u4e0d\u5339\u914d\u7b49\u6570\u636e\u5e93\u4e0d\u5339\u914d\u95ee\u9898\u7684\u80fd\u529b\u3002 
\u6211\u4eec\u8fd8\u5f15\u5165\u4e86Spider-Mismatch\uff0c\u8fd9\u662f\u4e00\u4e2a\u4e13\u95e8\u4e3a\u53cd\u6620\u73b0\u5b9e\u4e16\u754c\u4e2d\u9047\u5230\u7684\u6761\u4ef6\u4e0d\u5339\u914d\u95ee\u9898\u800c\u6784\u5efa\u7684\u65b0\u6570\u636e\u96c6\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u5728\u5c11\u91cf\u793a\u4f8b\u8bbe\u7f6e\u4e0b\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728Spider\u548cSpider-Realistic\u6570\u636e\u96c6\u4e0a\u7684\u5e73\u5747\u8868\u73b0\u6700\u4f73\uff0c\u5e76\u4e14\u663e\u8457\u4f18\u4e8e\u57fa\u7ebf\u65b9\u6cd5\uff0c\u5728\u66f4\u5177\u6709\u73b0\u5b9e\u6027\u7684\u6570\u636e\u96c6Spider-Mismatch\u4e0a\u4e5f\u8868\u73b0\u51fa\u66f4\u597d\u7684\u6027\u80fd\u3002|\n", "2409.00993": "|**2024-09-02**|**Evolution of Social Norms in LLM Agents using Natural Language**|Ilya Horiguchi et.al.|[2409.00993](http://arxiv.org/abs/2409.00993)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u6700\u65b0\u8fdb\u5c55\u6fc0\u53d1\u4e86\u5229\u7528\u8fd9\u4e9b\u6a21\u578b\u8fdb\u884c\u6e38\u620f\u7406\u8bba\u6a21\u62df\u7684\u5174\u8da3\uff0c\u5728\u8fd9\u4e9b\u6a21\u62df\u4e2d\uff0cLLM\u5145\u5f53\u4e2a\u4f53\u4ee3\u7406\uff0c\u8fdb\u884c\u793e\u4f1a\u4e92\u52a8\u3002\u672c\u6587\u7814\u7a76\u4e86\u901a\u8fc7\u81ea\u7136\u8bed\u8a00\u5bf9\u8bdd\u4f7fLLM\u4ee3\u7406\u81ea\u53d1\u751f\u6210\u5e76\u9075\u5b88\u89c4\u8303\u7b56\u7565\u7684\u53ef\u80fd\u6027\uff0c\u4ee5\u6b64\u4e3a\u57fa\u7840\uff0c\u63a2\u7d22\u4e86\u5bf9Axelrod\u7684\u5143\u89c4\u8303\u6e38\u620f\u5de5\u4f5c\u7684\u8fdb\u4e00\u6b65\u53d1\u5c55\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u8868\u660e\uff0c\u901a\u8fc7\u5bf9\u8bdd\uff0cLLM\u4ee3\u7406\u80fd\u591f\u4ec5\u901a\u8fc7\u81ea\u7136\u8bed\u8a00\u4ea4\u4e92\u5f62\u6210\u590d\u6742\u7684\u793e\u4ea4\u89c4\u8303\uff0c\u5982\u5143\u89c4\u8303\u2014\u2014\u89c4\u8303\u60e9\u7f5a\u4e0d\u60e9\u7f5a\u4f5c\u5f0a\u884c\u4e3a\u7684\u89c4\u8303\u3002\u7ed3\u679c\u8bc1\u5b9e\u4e86\u4f7f\u7528LLM\u4ee3\u7406\u6a21\u62df\u793e\u4f1a\u4e92\u5
2a8\u548c\u7406\u89e3\u901a\u8fc7\u81ea\u7136\u8bed\u8a00\u6f14\u5316\u51fa\u590d\u6742\u7b56\u7565\u4e0e\u89c4\u8303\u7684\u6709\u6548\u6027\u3002\u672a\u6765\u7684\u5de5\u4f5c\u53ef\u80fd\u901a\u8fc7\u6269\u5c55\u5230\u66f4\u5e7f\u6cdb\u7684\u573a\u666f\u548c\u4ee3\u7406\u7279\u5f81\uff0c\u63ed\u793a\u66f4\u591a\u5173\u4e8e\u793e\u4f1a\u89c4\u8303\u5f62\u6210\u7684\u5fae\u5999\u673a\u5236\u3002|\n", "2409.00985": "|**2024-09-02**|**Co-Learning: Code Learning for Multi-Agent Reinforcement Collaborative Framework with Conversational Natural Language Interfaces**|Jiapeng Yu et.al.|[2409.00985](http://arxiv.org/abs/2409.00985)|**[link](https://github.com/yuqian2003/co_learning)**|**\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u5728\u7ebf\u95ee\u7b54\u7cfb\u7edf\u4ece\u5a31\u4e50\u7528\u9014\u9010\u6e10\u8f6c\u5411\u4e13\u4e1a\u9886\u57df\u5e94\u7528\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201c\u4ee3\u7801\u5b66\u4e60\uff08Co-Learning\uff09\u793e\u533a\u201d\u7684\u591a\u4ee3\u7406\u6846\u67b6\uff0c\u7ed3\u5408\u73af\u5883\u5f3a\u5316\u5b66\u4e60\uff08E-RL\uff09\uff0c\u65e8\u5728\u5e2e\u52a9\u521d\u5b66\u8005\u72ec\u7acb\u4fee\u6b63\u4ee3\u7801\u9519\u8bef\u3002\u8be5\u7cfb\u7edf\u901a\u8fc7\u4e00\u4e2a\u5305\u542b702\u4e2a\u9519\u8bef\u4ee3\u7801\u7684\u539f\u59cb\u6570\u636e\u96c6\u8bc4\u4f30\u4e86\u591a\u4e2a\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u8868\u73b0\uff0c\u5e76\u5c06\u5176\u4f5c\u4e3aE-RL\u5956\u52b1\u6216\u60e9\u7f5a\u7684\u6807\u51c6\u3002\u901a\u8fc7\u5206\u6790\u5f53\u524d\u4ee3\u7406\u8f93\u5165\u7684\u9519\u8bef\u4ee3\u7801\uff0c\u9009\u62e9\u5408\u9002\u7684\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u4ee3\u7406\u4ee5\u5b9e\u73b0\u6700\u4f73\u7684\u9519\u8bef\u4fee\u6b63\u51c6\u786e\u7387\u5e76\u51cf\u5c11\u4fee\u6b63\u65f6\u95f4\u3002 
\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u4e0e\u65e0E-RL\u65b9\u6cd5\u76f8\u6bd4\uff0c\u8be5\u65b9\u6cd5\u5728\u7cbe\u786e\u5ea6\u5f97\u5206\u4e0a\u63d0\u9ad8\u4e863%\uff0c\u5728\u65f6\u95f4\u6210\u672c\u4e0a\u964d\u4f4e\u4e8615%\u3002\u6211\u4eec\u7684\u6e90\u4ee3\u7801\u53ef\u8bbf\u95ee\uff1ahttps://github.com/yuqian2003/Co_Learning**|\n", "2409.00135": "|**2024-08-29**|**HoneyComb: A Flexible LLM-Based Agent System for Materials Science**|Huan Zhang et.al.|[2409.00135](http://arxiv.org/abs/2409.00135)|null|\u4e3a\u4e86\u5e94\u5bf9\u6750\u6599\u79d1\u5b66\u4efb\u52a1\u4e2d\u7684\u590d\u6742\u6027\u5e76\u89e3\u51b3\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u8fd9\u4e00\u9886\u57df\u5e94\u7528\u65f6\u6240\u9762\u4e34\u7684\u95ee\u9898\uff0c\u5982\u4f9d\u8d56\u8fc7\u65f6\u7684\u9690\u6027\u77e5\u8bc6\u5bfc\u81f4\u7684\u51c6\u786e\u6027\u4e0b\u964d\u548c\u5e7b\u89c9\u73b0\u8c61\uff0c\u6211\u4eec\u63d0\u51fa\u4e86HoneyComb\u2014\u2014\u9996\u4e2a\u4e13\u95e8\u9488\u5bf9\u6750\u6599\u79d1\u5b66\u9886\u57df\u7684LLM\u4ee3\u7406\u7cfb\u7edf\u3002HoneyComb\u901a\u8fc7\u5229\u7528\u4e00\u4e2a\u57fa\u4e8e\u53ef\u9760\u6587\u732e\u7684\u9ad8\u8d28\u91cf\u6750\u6599\u79d1\u5b66\u77e5\u8bc6\u5e93\uff08MatSciKB\uff09\u548c\u4e00\u79cd\u521b\u65b0\u7684\u5de5\u5177\u96c6\uff08ToolHub\uff09\uff0c\u589e\u5f3a\u5176\u9488\u5bf9\u6750\u6599\u79d1\u5b66\u7279\u6709\u7684\u63a8\u7406\u4e0e\u8ba1\u7b97\u80fd\u529b\u3002 
MatSciKB\u662f\u4e00\u4e2a\u7ecf\u8fc7\u7cbe\u5fc3\u7f16\u7e82\u3001\u7ed3\u6784\u5316\u7684\u77e5\u8bc6\u96c6\u5408\uff0c\u65e8\u5728\u6db5\u76d6\u6750\u6599\u79d1\u5b66\u9886\u57df\u7684\u5173\u952e\u4fe1\u606f\u3002\u800cToolHub\u5219\u91c7\u7528\u4e86\u4e00\u79cd\u5f52\u7eb3\u5f0f\u5de5\u5177\u6784\u5efa\u65b9\u6cd5\uff0c\u7528\u4e8e\u751f\u6210\u3001\u5206\u89e3\u548c\u4f18\u5316\u9002\u7528\u4e8e\u6750\u6599\u79d1\u5b66\u7684API\u5de5\u5177\uff0c\u4ece\u800c\u6781\u5927\u5730\u63d0\u9ad8\u4e86\u7cfb\u7edf\u7684\u5b9e\u7528\u6027\u3002\u6b64\u5916\uff0cHoneyComb\u8fd8\u914d\u5907\u4e86\u4e00\u4e2a\u68c0\u7d22\u6a21\u5757\uff0c\u8be5\u6a21\u5757\u80fd\u591f\u6839\u636e\u7279\u5b9a\u4efb\u52a1\u667a\u80fd\u9009\u62e9\u6700\u5408\u9002\u7684\u77e5\u8bc6\u6765\u6e90\u6216\u5de5\u5177\uff0c\u786e\u4fdd\u4e86\u7b54\u6848\u7684\u51c6\u786e\u6027\u548c\u76f8\u5173\u6027\u3002 \u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cHoneyComb\u5728\u6750\u6599\u79d1\u5b66\u9886\u57df\u7684\u5404\u79cd\u4efb\u52a1\u4e0a\u5747\u8868\u73b0\u51fa\u663e\u8457\u4f18\u4e8e\u57fa\u7ebf\u6a21\u578b\u7684\u80fd\u529b\uff0c\u6210\u529f\u5730\u5f25\u5408\u4e86\u5f53\u524dLLM\u6280\u672f\u4e0e\u6750\u6599\u79d1\u5b66\u7279\u5b9a\u9700\u6c42\u4e4b\u95f4\u7684\u5dee\u8ddd\u3002\u66f4\u4e3a\u91cd\u8981\u7684\u662f\uff0c\u6211\u4eec\u7684\u53ef\u6269\u5c55\u6846\u67b6\u6613\u4e8e\u6269\u5c55\u81f3\u5176\u4ed6\u79d1\u5b66\u9886\u57df\uff0c\u5c55\u793a\u4e86\u5176\u5728\u63a8\u52a8\u79d1\u5b66\u7814\u7a76\u548c\u5e94\u7528\u53d1\u5c55\u65b9\u9762\u5177\u6709\u5e7f\u6cdb\u7684\u5e94\u7528\u6f5c\u529b\u3002|\n", "2409.03659": "|**2024-09-06**|**LLM-based multi-agent poetry generation in non-cooperative environments**|Ran Zhang 
et.al.|[2409.03659](http://arxiv.org/abs/2409.03659)|**[link](https://github.com/zhangr2021/Multiagent_poetry)**|**\u5c3d\u7ba1\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u81ea\u52a8\u8bd7\u6b4c\u751f\u6210\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\uff0c\u4f46\u751f\u6210\u7684\u8bd7\u6b4c\u7f3a\u4e4f\u591a\u6837\u6027\uff0c\u800c\u8bad\u7ec3\u8fc7\u7a0b\u4e0e\u4eba\u7c7b\u5b66\u4e60\u5927\u76f8\u5f84\u5ead\u3002\u57fa\u4e8e\u8fd9\u6837\u7684\u7406\u5ff5\uff0c\u5373\u8bd7\u6b4c\u751f\u6210\u7cfb\u7edf\u7684\u5b66\u4e60\u8fc7\u7a0b\u5e94\u66f4\u52a0\u4eba\u6027\u5316\uff0c\u5e76\u4e14\u5176\u8f93\u51fa\u66f4\u52a0\u591a\u6837\u548c\u65b0\u9896\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u57fa\u4e8e\u793e\u4f1a\u5b66\u4e60\u7684\u6846\u67b6\uff0c\u5728\u6b64\u6846\u67b6\u4e2d\uff0c\u6211\u4eec\u5f3a\u8c03\u9664\u4e86\u5408\u4f5c\u4e92\u52a8\u4e4b\u5916\u7684\u975e\u5408\u4f5c\u4e92\u52a8\uff0c\u4ee5\u9f13\u52b1\u591a\u6837\u6027\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u662f\u9996\u6b21\u5c1d\u8bd5\u5728\u975e\u5408\u4f5c\u73af\u5883\u4e2d\u5229\u7528\u57fa\u4e8e\u8bad\u7ec3\u7684\u4ee3\u7406\uff08GPT-2\uff09\u548c\u57fa\u4e8e\u63d0\u793a\u7684\u4ee3\u7406\uff08GPT-3\u548cGPT-4\uff09\u8fdb\u884c\u8bd7\u6b4c\u751f\u6210\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u591a\u4ee3\u7406\u7cfb\u7edf\u3002 
\u6839\u636e\u5bf9\u751f\u6210\u768496,000\u9996\u8bd7\u7684\u8bc4\u4f30\uff0c\u6211\u4eec\u7684\u6846\u67b6\u5bf9\u57fa\u4e8e\u8bad\u7ec3\u7684\u4ee3\u7406\u7684\u8bd7\u6b4c\u751f\u6210\u8fc7\u7a0b\u5e26\u6765\u4e86\u597d\u5904\uff0c\u5bfc\u81f4n-gram\u591a\u6837\u6027\u589e\u52a0\u4e863.0-3.7\u4e2a\u767e\u5206\u70b9\uff0c\u65b0\u9896\u6027\u589e\u52a0\u4e865.6-11.3\u4e2a\u767e\u5206\u70b9\u3002\u57fa\u4e8e\u8bad\u7ec3\u7684\u4ee3\u7406\u751f\u6210\u7684\u8bd7\u6b4c\u5728\u8bcd\u6c47\u3001\u98ce\u683c\u548c\u8bed\u4e49\u4e0a\u8868\u73b0\u51fa\u7fa4\u4f53\u5206\u5316\u3002\u5728\u6211\u4eec\u7684\u6846\u67b6\u4e2d\uff0c\u57fa\u4e8e\u63d0\u793a\u7684\u4ee3\u7406\u4e5f\u4ece\u975e\u5408\u4f5c\u73af\u5883\u4e2d\u53d7\u76ca\uff0c\u5e76\u4e14\u5177\u6709\u975e\u540c\u8d28\u4ee3\u7406\u7684\u66f4\u591a\u6837\u5316\u7684\u6a21\u578b\u96c6\u5408\u6709\u53ef\u80fd\u8fdb\u4e00\u6b65\u63d0\u9ad8\u591a\u6837\u6027\uff0c\u6211\u4eec\u7684\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\u591a\u6837\u6027\u589e\u52a0\u4e867.0-17.5\u4e2a\u767e\u5206\u70b9\u3002\u7136\u800c\uff0c\u57fa\u4e8e\u63d0\u793a\u7684\u4ee3\u7406\u663e\u793a\u51fa\u968f\u7740\u65f6\u95f4\u63a8\u79fb\uff0c\u8bcd\u6c47\u591a\u6837\u6027\u51cf\u5c11\uff0c\u5e76\u4e14\u6ca1\u6709\u8868\u73b0\u51fa\u9884\u671f\u7684\u7fa4\u4f53\u5206\u5316\u610f\u56fe\u7684\u793e\u4f1a\u7f51\u7edc\u3002\u6211\u4eec\u7684\u8bba\u6587\u4e3b\u5f20\uff0c\u5728\u81ea\u52a8\u8bd7\u6b4c\u751f\u6210\u7b49\u521b\u610f\u4efb\u52a1\u4e2d\uff0c\u9700\u8981\u5c06\u793e\u4f1a\u5b66\u4e60\u8fc7\u7a0b\uff08\u901a\u8fc7\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u4ee3\u7406\u5efa\u6a21\uff09\u7eb3\u5165\u8003\u8651\u8303\u56f4\uff0c\u4ee5\u6a21\u4eff\u4eba\u7c7b\u7684\u4ea4\u4e92\u65b9\u5f0f\u3002**|\n", "2409.03440": "|**2024-09-05**|**Rx Strategist: Prescription Verification using LLM Agents System**|Phuc Phan Van 
et.al.|[2409.03440](http://arxiv.org/abs/2409.03440)|null|\u4e3a\u4e86\u4fdd\u969c\u60a3\u8005\u5b89\u5168\uff0c\u73b0\u4ee3\u836f\u7269\u590d\u6742\u6027\u8981\u6c42\u4e25\u683c\u5904\u65b9\u9a8c\u8bc1\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u65b9\u6cd5\u2014\u2014Rx Strategist\uff0c\u5b83\u5229\u7528\u77e5\u8bc6\u56fe\u8c31\u548c\u4e0d\u540c\u7684\u641c\u7d22\u7b56\u7565\uff0c\u7ed3\u5408\u4ee3\u7406\u6846\u67b6\u4e2d\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\uff0c\u4ee5\u589e\u5f3a\u5176\u80fd\u529b\u3002\u8fd9\u79cd\u591a\u7ef4\u5ea6\u7684\u6280\u672f\u5141\u8bb8\u6784\u5efa\u4e00\u4e2a\u591a\u9636\u6bb5\u7684LLM\u7ba1\u9053\uff0c\u5e76\u4ece\u81ea\u5b9a\u4e49\u6d3b\u6027\u6210\u5206\u6570\u636e\u5e93\u4e2d\u53ef\u9760\u5730\u68c0\u7d22\u4fe1\u606f\u3002\u8be5\u7ba1\u9053\u8986\u76d6\u4e86\u5904\u65b9\u9a8c\u8bc1\u7684\u4e0d\u540c\u65b9\u9762\uff0c\u5982\u9002\u5e94\u75c7\u3001\u5242\u91cf\u548c\u53ef\u80fd\u7684\u836f\u7269\u76f8\u4e92\u4f5c\u7528\uff0c\u6bcf\u4e2a\u9636\u6bb5\u90fd\u5305\u542b\u4e86\u8fd9\u4e9b\u65b9\u9762\u7684\u5185\u5bb9\u3002 \u901a\u8fc7\u5728\u8fd9\u4e9b\u9636\u6bb5\u5206\u6563\u63a8\u7406\uff0c\u6211\u4eec\u7f13\u89e3\u4e86\u5355\u4e00LLM\u6280\u672f\u7684\u7f3a\u70b9\uff0c\u63d0\u9ad8\u4e86\u6b63\u786e\u6027\u548c\u53ef\u9760\u6027\uff0c\u540c\u65f6\u51cf\u5c11\u4e86\u5185\u5b58\u9700\u6c42\u3002\u6211\u4eec\u7684\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0cRx Strategist\u8d85\u8d8a\u4e86\u8bb8\u591a\u5f53\u524d\u7684LLMs\uff0c\u5176\u6027\u80fd\u4e0e\u7ecf\u9a8c\u4e30\u5bcc\u7684\u4e34\u5e8a\u836f\u5e08\u76f8\u5f53\u3002\u5728\u73b0\u4ee3\u836f\u7269\u7684\u590d\u6742\u4e16\u754c\u4e2d\uff0c\u5c06LLMs\u4e0e\u7ec4\u7ec7\u5316\u77e5\u8bc6\u548c\u9ad8\u7ea7\u641c\u7d22\u65b9\u6cd5\u76f8\u7ed3\u5408\uff0c\u63d0\u4f9b\u4e86\u4e00\u6761\u51cf\u5c11\u5904\u65b9\u9519\u8bef\u5e76\u63d0\u9ad8\u60a3\u8005\u7ed3\u679c\u7684\u53ef\u884c\u9014\u5f84\u3002|\n", "2409.03258": "|**2024-09-05**|**GraphInsight: 
Unlocking Insights in Large Language Models for Graph Structure Understanding**|Yukun Cao et.al.|[2409.03258](http://arxiv.org/abs/2409.03258)|null|\u867d\u7136\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5904\u7406\u56fe\u65b9\u9762\u5c55\u73b0\u51fa\u6f5c\u529b\uff0c\u4f46\u5728\u901a\u8fc7\u63cf\u8ff0\u5e8f\u5217\u7684\u56fe\u8bf4\u660e\u6765\u7406\u89e3\u56fe\u5f62\u7ed3\u6784\u4fe1\u606f\u65f6\uff0c\u5c24\u5176\u662f\u5728\u56fe\u7684\u5927\u5c0f\u589e\u52a0\u65f6\uff0c\u5b83\u4eec\u9047\u5230\u4e86\u6311\u6218\u3002\u6211\u4eec\u5f52\u56e0\u4e8eLLMs\u5728\u56fe\u63cf\u8ff0\u5e8f\u5217\u7684\u4e0d\u540c\u4f4d\u7f6e\u4e0a\u5b58\u5728\u4e0d\u5747\u5300\u7684\u8bb0\u5fc6\u6027\u80fd\uff0c\u5373\u6240\u8c13\u7684\u201c\u4f4d\u7f6e\u504f\u89c1\u201d\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86GraphInsight\uff0c\u4e00\u4e2a\u65e8\u5728\u63d0\u9ad8LLMs\u5bf9\u5b8f\u89c2\u548c\u5fae\u89c2\u56fe\u5f62\u4fe1\u606f\u7406\u89e3\u7684\u65b0\u6846\u67b6\u3002GraphInsight\u57fa\u4e8e\u4e24\u4e2a\u5173\u952e\u7b56\u7565\uff1a1\uff09\u5c06\u5173\u952e\u56fe\u5f62\u4fe1\u606f\u653e\u7f6e\u5728LLMs\u8868\u73b0\u51fa\u66f4\u5f3a\u8bb0\u5fc6\u6027\u80fd\u7684\u4f4d\u7f6e\uff1b2\uff09\u5bf9\u4e8e\u8bb0\u5fc6\u6027\u80fd\u8f83\u5f31\u7684\u533a\u57df\uff0c\u63a2\u7d22\u4f7f\u7528\u8f7b\u91cf\u7ea7\u5916\u90e8\u77e5\u8bc6\u5e93\uff0c\u7075\u611f\u6765\u81ea\u4e8e\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u3002\u6b64\u5916\uff0cGraphInsight\u8fd8\u63a2\u7d22\u4e86\u5c06\u8fd9\u4e24\u79cd\u7b56\u7565\u96c6\u6210\u5230LLM\u4ee3\u7406\u6d41\u7a0b\u4e2d\uff0c\u4ee5\u89e3\u51b3\u9700\u8981\u591a\u6b65\u63a8\u7406\u7684\u590d\u5408\u56fe\u4efb\u52a1\u3002\u5e7f\u6cdb\u7684\u57fa\u51c6\u5b9e\u9a8c\u8868\u660e\uff0c\u5728\u4e0d\u540c\u5927\u5c0f\u7684\u56fe\u5f62\u7ed3\u6784\u7406\u89e3\u4efb\u52a1\u4e0a\uff0cGraphInsight\u663e\u8457\u8d85\u8d8a\u4e86\u6240\u6709\u5176\u4ed6\u56fe\u63cf\u8ff0\u65b9\u6cd5\uff08\u4f8b\u5982\u63
d0\u793a\u6280\u672f\u3001\u91cd\u65b0\u6392\u5e8f\u7b56\u7565\u7b49\uff09\u3002|\n", "2409.02977": "|**2024-09-04**|**Large Language Model-Based Agents for Software Engineering: A Survey**|Junwei Liu et.al.|[2409.02977](http://arxiv.org/abs/2409.02977)|**[link](https://github.com/fudanselab/agent4se-paper-list)**|**\u672c\u6587\u63d0\u4f9b\u4e86\u4e00\u7bc7\u5168\u9762\u4e14\u7cfb\u7edf\u7684\u5173\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u8f6f\u4ef6\u5de5\u7a0b\uff08SE\uff09\u4e2d\u7684\u5e94\u7528\u7684\u7efc\u8ff0\u3002\u6211\u4eec\u6536\u96c6\u4e86106\u7bc7\u8bba\u6587\uff0c\u5e76\u4ece\u4e24\u4e2a\u89d2\u5ea6\u8fdb\u884c\u5206\u7c7b\uff0c\u5373\u8f6f\u4ef6\u5de5\u7a0b\u89c6\u89d2\u548c\u4ee3\u7406\u89c6\u89d2\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u8ba8\u8bba\u4e86\u8be5\u9886\u57df\u9762\u4e34\u7684\u5173\u952e\u6311\u6218\u4ee5\u53ca\u672a\u6765\u7684\u53d1\u5c55\u65b9\u5411\u3002\u6b64\u7efc\u8ff0\u7684\u4ed3\u5e93\u5730\u5740\u4e3a\uff1ahttps://github.com/FudanSELab/Agent4SE-Paper-List\u3002**|\n", "2409.05001": "|**2024-09-08**|**A Pair Programming Framework for Code Generation via Multi-Plan Exploration and Feedback-Driven Refinement**|Huan Zhang 
et.al.|[2409.05001](http://arxiv.org/abs/2409.05001)|**[link](https://github.com/nju-websoft/paircoder)**|**\u5728\u4ee3\u7801\u751f\u6210\u9886\u57df\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5c55\u73b0\u51fa\u4e86\u4ee4\u4eba\u77a9\u76ee\u7684\u6027\u80fd\u3002\u5c3d\u7ba1\u5148\u524d\u7684\u7814\u7a76\u901a\u8fc7\u63d0\u793a\u6280\u672f\u53ca\u4ee3\u7801\u7cbe\u70bc\u5bf9LLM\u8fdb\u884c\u4e86\u589e\u5f3a\uff0c\u4f46\u5b83\u4eec\u5728\u5904\u7406\u590d\u6742\u7f16\u7a0b\u95ee\u9898\u65f6\u4ecd\u9762\u4e34\u6311\u6218\uff0c\u56e0\u4e3a\u8fd9\u4e9b\u95ee\u9898\u5f80\u5f80\u5177\u6709\u50f5\u5316\u7684\u89e3\u51b3\u65b9\u6848\u8ba1\u5212\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aPairCoder\u7684\u65b0\u578bLLM\u57fa\u6846\u67b6\uff0c\u65e8\u5728\u6a21\u4eff\u53cc\u4eba\u534f\u4f5c\u7f16\u7a0b\u5b9e\u8df5\uff0c\u4ee5\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\u3002 PairCoder\u7531\u4e24\u4e2a\u534f\u4f5c\u7684LLM\u4ee3\u7406\u7ec4\u6210\uff1a\u5bfc\u822a\u5458\uff08Navigator\uff09\u548c\u9a7e\u9a76\u5458\uff08Driver\uff09\u3002\u5bfc\u822a\u5458\u8d1f\u8d23\u63d0\u51fa\u6709\u524d\u666f\u7684\u89e3\u51b3\u65b9\u6848\u8ba1\u5212\u3001\u9009\u62e9\u5f53\u524d\u6700\u4f73\u8ba1\u5212\uff0c\u5e76\u6839\u636e\u6267\u884c\u53cd\u9988\u6307\u5bfc\u4e0b\u4e00\u8f6e\u8fed\u4ee3\u3002\u9a7e\u9a76\u5458\u5219\u9075\u5faa\u5bfc\u822a\u5458\u7684\u6307\u5f15\uff0c\u8fdb\u884c\u521d\u59cb\u4ee3\u7801\u751f\u6210\u3001\u4ee3\u7801\u6d4b\u8bd5\u548c\u4f18\u5316\u3002 
\u8fd9\u79cd\u4ea4\u66ff\u548c\u8fed\u4ee3\u7684\u5de5\u4f5c\u6d41\u7a0b\u5305\u62ec\u591a\u8ba1\u5212\u63a2\u7d22\u548c\u57fa\u4e8e\u53cd\u9988\u7684\u7ec6\u5316\uff0c\u6a21\u62df\u4e86\u53cc\u4eba\u7a0b\u5e8f\u5458\u7684\u5408\u4f5c\u65b9\u5f0f\u3002\u6211\u4eec\u4f7f\u7528\u5f00\u6e90\u548c\u95ed\u6e90\u7684LLM\uff0c\u5728\u591a\u79cd\u4ee3\u7801\u751f\u6210\u57fa\u51c6\u4e0a\u5bf9PairCoder\u8fdb\u884c\u4e86\u8bc4\u4f30\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cPairCoder\u5728\u51c6\u786e\u6027\u65b9\u9762\u663e\u8457\u4f18\u4e8e\u76f4\u63a5\u4f7f\u7528\u63d0\u793a\u7684LLM\uff0c\u76f8\u5bf9pass@1\u63d0\u9ad8\u4e8612.00%-162.43%\u3002**|\n", "2409.04617": "|**2024-09-06**|**Sparse Rewards Can Self-Train Dialogue Agents**|Barrett Martin Lattimer et.al.|[2409.04617](http://arxiv.org/abs/2409.04617)|**[link](https://github.com/asappresearch/josh-llm-simulation-training)**|**\u672c\u6587\u63a2\u8ba8\u4e86\u5728\u591a\u8f6e\u5bf9\u8bdd\u4efb\u52a1\u4e2d\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4ee3\u7406\u7684\u6700\u65b0\u8fdb\u5c55\u4e3b\u8981\u7531\u76d1\u7763\u5fae\u8c03\u548c\u9ad8\u8d28\u91cf\u7684\u4eba\u7c7b\u53cd\u9988\u9a71\u52a8\u3002\u7136\u800c\uff0c\u968f\u7740\u57fa\u7840LLM\u6a21\u578b\u6027\u80fd\u7684\u6301\u7eed\u63d0\u5347\uff0c\u83b7\u53d6\u6709\u610f\u4e49\u7684\u4eba\u7c7b\u53cd\u9988\u53d8\u5f97\u8d8a\u6765\u8d8a\u56f0\u96be\u4e14\u6210\u672c\u9ad8\u6602\u3002\u5728\u67d0\u4e9b\u9886\u57df\u4e2d\uff0c\u57fa\u7840LLM\u53ef\u80fd\u6700\u7ec8\u8d85\u8d8a\u4eba\u7c7b\u80fd\u529b\uff0c\u4f7f\u5f97\u4f20\u7edf\u7684\u57fa\u4e8e\u53cd\u9988\u7684\u65b9\u6cd5\u53d8\u5f97\u4e0d\u5207\u5b9e\u9645\u3002\u56e0\u6b64\uff0c\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u81ea\u6211\u6539\u8fdb\u8303\u5f0f\uff0c\u5141\u8bb8LLM\u4ee3\u7406\u5728\u6ca1\u6709\u5916\u90e8\u4eba\u7c7b\u53cd\u9988\u7684\u60c5\u51b5\u4e0b\u81ea\u4e3b\u63d0\u9ad8\u5176\u6027\u80fd\u3002 
\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u540d\u4e3a\u201c\u5bf9\u6bd4\u7ed3\u679c\u4e3a\u6a21\u62df\u6536\u83b7\u201d\uff08JOSH\uff09\u7684\u81ea\u6211\u5bf9\u9f50\u7b97\u6cd5\uff0c\u8be5\u7b97\u6cd5\u5229\u7528\u7a00\u758f\u5956\u52b1\u6a21\u62df\u73af\u5883\u6765\u63d0\u53d6\u7406\u60f3\u884c\u4e3a\uff0c\u5e76\u8fdb\u4e00\u6b65\u8bad\u7ec3LLM\u4ee5\u81ea\u8eab\u8f93\u51fa\u8fdb\u884c\u8bad\u7ec3\u3002\u6211\u4eec\u4eceMultiWOZ\u4e2d\u6784\u5efa\u4e86\u4e00\u4e2a\u7528\u4e8e\u5de5\u5177\u8c03\u7528\u7684\u7a00\u758f\u5956\u52b1\u4eff\u771f\u73af\u5883\uff0c\u79f0\u4e3aToolWOZ\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u4f7f\u7528JOSH\u8bad\u7ec3\u7684\u6a21\u578b\uff08\u65e0\u8bba\u662f\u5c0f\u578b\u8fd8\u662f\u524d\u6cbf\u6a21\u578b\uff09\uff0c\u5728\u57fa\u4e8e\u5de5\u5177\u7684\u4ea4\u4e92\u4e0a\u663e\u8457\u63d0\u9ad8\u4e86\u8868\u73b0\uff0c\u540c\u65f6\u4fdd\u6301\u4e86\u5728\u5404\u79cd\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u7684\u5e7f\u6cdb\u6a21\u578b\u80fd\u529b\u3002 \u6211\u4eec\u7684\u4ee3\u7801\u548c\u6570\u636e\u5df2\u5728GitHub\u4e0a\u516c\u5f00\u63d0\u4f9b\u3002**|\n", "2409.06351": "|**2024-09-10**|**MAGDA: Multi-agent guideline-driven diagnostic assistance**|David Bani-Harouni 
et.al.|[2409.06351](http://arxiv.org/abs/2409.06351)|null|\u5728\u7d27\u6025\u62a4\u7406\u90e8\u95e8\u3001\u504f\u8fdc\u533b\u9662\u6216\u53d1\u5c55\u4e2d\u56fd\u5bb6\u7684\u8bca\u6240\u4e2d\uff0c\u4e34\u5e8a\u533b\u751f\u7ecf\u5e38\u7f3a\u4e4f\u7531\u8bad\u7ec3\u6709\u7d20\u7684\u653e\u5c04\u79d1\u533b\u751f\u5feb\u901f\u5206\u6790\u5f71\u50cf\u7684\u80fd\u529b\uff0c\u8fd9\u4f1a\u5bf9\u75c5\u4eba\u7684\u5065\u5eb7\u62a4\u7406\u4ea7\u751f\u4e0d\u5229\u5f71\u54cd\u3002\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u6709\u53ef\u80fd\u901a\u8fc7\u63d0\u4f9b\u6709\u52a9\u4e8e\u4ed6\u4eec\u51b3\u7b56\u7684\u89c1\u89e3\u6765\u7f13\u89e3\u8fd9\u4e9b\u4e34\u5e8a\u533b\u751f\u7684\u538b\u529b\u3002\u5c3d\u7ba1\u8fd9\u4e9bLLM\u5728\u5c55\u793a\u5176\u7406\u8bba\u533b\u5b66\u77e5\u8bc6\u7684\u533b\u5b66\u8003\u8bd5\u4e0a\u53d6\u5f97\u4e86\u9ad8\u5206\uff0c\u4f46\u5b83\u4eec\u5f80\u5f80\u4e0d\u9075\u5faa\u533b\u5b66\u6307\u5357\u3002\u4e3a\u6b64\u9879\u5de5\u4f5c\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u65b0\u7684\u96f6\u6837\u672c\u6307\u5357\u9a71\u52a8\u51b3\u7b56\u652f\u6301\u65b9\u6cd5\u3002\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u7531\u591a\u4e2aLLM\u4ee3\u7406\u7ec4\u6210\u7684\u7cfb\u7edf\uff0c\u8fd9\u4e9b\u4ee3\u7406\u914d\u5907\u4e86\u5bf9\u6bd4\u89c6\u89c9-\u8bed\u8a00\u6a21\u578b\uff0c\u4ee5\u534f\u4f5c\u65b9\u5f0f\u8fbe\u6210\u60a3\u8005\u8bca\u65ad\u3002\u5728\u5411\u8fd9\u4e9b\u4ee3\u7406\u63d0\u4f9b\u7b80\u5355\u7684\u8bca\u65ad\u6307\u5357\u540e\uff0c\u5b83\u4eec\u4f1a\u5408\u6210\u63d0\u793a\u5e76\u6839\u636e\u8fd9\u4e9b\u6307\u5357\u7b5b\u9009\u56fe\u50cf\u4ee5\u5bfb\u627e\u53d1\u73b0\u3002\u6700\u540e\uff0c\u5b83\u4eec\u63d0\u4f9b\u4e00\u4e2a\u53ef\u7406\u89e3\u7684\u63a8\u7406\u94fe\u8def\u6765\u89e3\u91ca\u5176\u8bca\u65ad\u7ed3\u679c\uff0c\u5e76\u81ea\u6211\u7cbe\u70bc\u4ee5\u8003\u8651\u75be\u75c5\u4e4b\u95f4\u7684\u76f8\u4e92\u4f9d\u8d56\u6027\u3002\u7531\u4e8e\u6211\u4eec\u7684\u65b9\u6cd5\u662f\u96f6\u6837\u672c\u7684\uff0c\u56e0\u6b6
4\u9002\u7528\u4e8e\u7f55\u89c1\u75be\u75c5\u573a\u666f\uff0c\u5728\u8fd9\u4e9b\u573a\u666f\u4e2d\u8bad\u7ec3\u6570\u636e\u6709\u9650\uff0c\u4f46\u4e13\u5bb6\u8bbe\u8ba1\u7684\u75be\u75c5\u63cf\u8ff0\u53ef\u7528\u3002\u6211\u4eec\u5728\u4e24\u4e2a\u80f8\u90e8X\u5c04\u7ebf\u6570\u636e\u96c6CheXpert\u548cChestX-ray 14 Longtail\u4e0a\u8bc4\u4f30\u4e86\u6211\u4eec\u7684\u65b9\u6cd5\uff0c\u5c55\u793a\u4e86\u4e0e\u73b0\u6709\u96f6\u6837\u672c\u65b9\u6cd5\u76f8\u6bd4\u7684\u6027\u80fd\u63d0\u5347\uff0c\u5e76\u4e14\u80fd\u591f\u5e94\u7528\u4e8e\u7f55\u89c1\u75be\u75c5\u7684\u6cdb\u5316\u3002|\n", "2409.09030": "|**2024-09-23**|**Agents in Software Engineering: Survey, Landscape, and Vision**|Yanlin Wang et.al.|[2409.09030](http://arxiv.org/abs/2409.09030)|**[link](https://github.com/deepsoftwareanalytics/awesome-agent4se)**|**\u8fd1\u5e74\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5404\u79cd\u4e0b\u6e38\u4efb\u52a1\u4e2d\u53d6\u5f97\u4e86\u663e\u8457\u6210\u529f\uff0c\u5e76\u5728\u8f6f\u4ef6\u5de5\u7a0b\uff08SE\uff09\u9886\u57df\u5e7f\u6cdb\u5e94\u7528\u3002\u6211\u4eec\u53d1\u73b0\u8bb8\u591a\u7ed3\u5408LLMs\u4e0eSE\u7684\u7814\u7a76\u5de5\u4f5c\u660e\u786e\u6216\u9690\u542b\u5730\u91c7\u7528\u4e86\u4ee3\u7406\u6982\u5ff5\u3002\u7136\u800c\uff0c\u7f3a\u4e4f\u5bf9\u73b0\u6709\u5de5\u4f5c\u7684\u6df1\u5ea6\u7efc\u8ff0\uff0c\u4ee5\u6574\u7406\u5176\u53d1\u5c55\u80cc\u666f\u3001\u5206\u6790\u5982\u4f55\u7ed3\u5408LLMs\u4ee3\u7406\u6280\u672f\u4f18\u5316\u5404\u7c7b\u4efb\u52a1\u4ee5\u53ca\u9610\u660eSE\u4e2d\u7684LLMs\u4ee3\u7406\u6846\u67b6\u3002\u672c\u6587\u5f00\u5c55\u9996\u6b21\u9488\u5bf9\u7ed3\u5408LLMs\u4ee3\u7406\u4e0eSE\u7684\u7814\u7a76\u7efc\u8ff0\uff0c\u5e76\u63d0\u51faSE\u4e2dLLMs\u4ee3\u7406\u7684\u6846\u67b6\uff0c\u5305\u62ec\u611f\u77e5\u3001\u8bb0\u5fc6\u548c\u884c\u52a8\u4e09\u4e2a\u5173\u952e\u6a21\u5757\u3002\u540c\u65f6\uff0c\u603b\u7ed3\u4e86\u4e24\u4e2a\u9886\u57df\u7ed3\u5408\u65f6\u9762\u4e34\u7684\u95ee\u9898\uff0c\u5e76\u
9488\u5bf9\u73b0\u6709\u6311\u6218\u63d0\u51fa\u4e86\u672a\u6765\u673a\u9047\u3002\u6211\u4eec\u7ef4\u62a4\u4e86\u4e00\u4e2a\u5305\u542b\u76f8\u5173\u8bba\u6587\u7684GitHub\u4ed3\u5e93\uff1ahttps://github.com/DeepSoftwareAnalytics/Awesome-Agent4SE\u3002**|\n", "2409.09013": "|**2024-09-13**|**AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents**|Zhe Su et.al.|[2409.09013](http://arxiv.org/abs/2409.09013)|null|\u4e3a\u4e86\u5b89\u5168\u548c\u6210\u529f\u5730\u90e8\u7f72\uff0c\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5fc5\u987b\u540c\u65f6\u6ee1\u8db3\u771f\u5b9e\u6027\u548c\u5b9e\u7528\u6027\u76ee\u6807\u3002\u7136\u800c\uff0c\u8fd9\u4e24\u4e2a\u76ee\u6807\u5f80\u5f80\u5728\u51b2\u7a81\u4e2d\uff0c\u4f8b\u5982AI\u52a9\u624b\u5e2e\u52a9\u4e8c\u624b\u8f66\u9500\u552e\u5458\u9500\u552e\u6709\u7455\u75b5\u7684\u6c7d\u8f66\u3002\u8fd9\u79cd\u51b2\u7a81\u90e8\u5206\u5f52\u56e0\u4e8e\u6a21\u7cca\u6216\u8bef\u5bfc\u6027\u7684\u7528\u6237\u6307\u4ee4\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aAI-LieDar\u7684\u6846\u67b6\uff0c\u4ee5\u7814\u7a76\u5728\u591a\u8f6e\u4ea4\u4e92\u8bbe\u7f6e\u4e2d\uff0c\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u5982\u4f55\u5904\u7406\u5b9e\u7528\u6027\u548c\u771f\u5b9e\u6027\u7684\u51b2\u7a81\u3002 
\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u7cfb\u5217\u73b0\u5b9e\u573a\u666f\uff0c\u5176\u4e2d\u8bed\u8a00\u4ee3\u7406\u88ab\u6307\u793a\u5b9e\u73b0\u4e0e\u591a\u8f6e\u5bf9\u8bdd\u4e2d\u7684\u771f\u5b9e\u6027\u51b2\u7a81\u7684\u76ee\u6807\u3002\u4e3a\u4e86\u5927\u89c4\u6a21\u8bc4\u4f30\u771f\u5b9e\u6027\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2a\u57fa\u4e8e\u5fc3\u7406\u5b66\u6587\u732e\u7684\u53ef\u4fe1\u5ea6\u68c0\u6d4b\u5668\uff0c\u7528\u4e8e\u8bc4\u4f30\u4ee3\u7406\u7684\u56de\u7b54\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u8868\u660e\uff0c\u6240\u6709\u6a21\u578b\u7684\u771f\u5b9e\u56de\u7b54\u6bd4\u4f8b\u4e0d\u523050%\uff0c\u5c3d\u7ba1\u8fbe\u5230\u76ee\u6807\uff08\u5b9e\u7528\u6027\uff09\u548c\u771f\u5b9e\u6027\u7684\u6bd4\u4f8b\u5728\u4e0d\u540c\u6a21\u578b\u4e2d\u6709\u6240\u5dee\u5f02\u3002\u6211\u4eec\u8fdb\u4e00\u6b65\u6d4b\u8bd5\u4e86LLM\u7684\u53ef\u5f15\u5bfc\u6027\uff0c\u53d1\u73b0\u6a21\u578b\u4f1a\u9075\u5faa\u6076\u610f\u6307\u4ee4\u6765\u6b3a\u9a97\uff0c\u5373\u4f7f\u7ecf\u8fc7\u5f15\u5bfc\u4f7f\u5176\u8d8b\u5411\u771f\u5b9e\u7684\u6a21\u578b\u4e5f\u4ecd\u7136\u53ef\u80fd\u8bf4\u8c0e\u3002 \u8fd9\u4e9b\u53d1\u73b0\u63ed\u793a\u4e86LLM\u4e2d\u771f\u5b9e\u6027\u7684\u590d\u6742\u6027\uff0c\u5e76\u5f3a\u8c03\u4e86\u786e\u4fddLLM\u548cAI\u4ee3\u7406\u7684\u5b89\u5168\u53ef\u9760\u90e8\u7f72\u9700\u8981\u8fdb\u4e00\u6b65\u7814\u7a76\u7684\u91cd\u8981\u6027\u3002|\n", "2409.08963": "|**2024-09-13**|**Safeguarding Decentralized Social Media: LLM Agents for Automating Community Rule Compliance**|Lucio La Cava 
et.al.|[2409.08963](http://arxiv.org/abs/2409.08963)|null|\u786e\u4fdd\u5185\u5bb9\u9075\u5b88\u793e\u533a\u51c6\u5219\u5bf9\u4e8e\u7ef4\u62a4\u5065\u5eb7\u7684\u5728\u7ebf\u793e\u4ea4\u73af\u5883\u81f3\u5173\u91cd\u8981\u3002\u7136\u800c\uff0c\u4f20\u7edf\u57fa\u4e8e\u4eba\u5de5\u7684\u5408\u89c4\u68c0\u67e5\u5728\u5904\u7406\u7528\u6237\u751f\u6210\u5185\u5bb9\u7684\u65e5\u76ca\u589e\u52a0\u91cf\u4ee5\u53ca\u6709\u9650\u7684\u7ba1\u7406\u5458\u6570\u91cf\u65f6\uff0c\u9762\u4e34\u7740\u96be\u4ee5\u6269\u5c55\u7684\u95ee\u9898\u3002\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u81ea\u7136\u8bed\u8a00\u7406\u89e3\u65b9\u9762\u7684\u65b0\u8fdb\u5c55\u4e3a\u81ea\u52a8\u5185\u5bb9\u5408\u89c4\u9a8c\u8bc1\u63d0\u4f9b\u4e86\u65b0\u7684\u673a\u9047\u3002\u672c\u5de5\u4f5c\u8bc4\u4f30\u4e86\u516d\u4e2a\u57fa\u4e8eOpen-LLMs\u6784\u5efa\u7684AI\u4ee3\u7406\uff0c\u7528\u4e8e\u53bb\u4e2d\u5fc3\u5316\u793e\u4ea4\u7f51\u7edc\u4e2d\u7684\u81ea\u52a8\u5316\u89c4\u5219\u9075\u5faa\u68c0\u67e5\uff0c\u5728\u8fd9\u79cd\u5177\u6709\u6311\u6218\u6027\u7684\u73af\u5883\u4e2d\uff0c\u7531\u4e8e\u793e\u533a\u8303\u56f4\u548c\u89c4\u5219\u7684\u5f02\u8d28\u6027\uff0c\u8fd9\u4e00\u4efb\u52a1\u5c24\u4e3a\u56f0\u96be\u3002\u901a\u8fc7\u5206\u6790\u6765\u81ea\u6570\u767e\u4e2aMastodon\u670d\u52a1\u5668\u7684\u8d85\u8fc750,000\u6761\u5e16\u5b50\uff0c\u6211\u4eec\u53d1\u73b0AI\u4ee3\u7406\u80fd\u591f\u6709\u6548\u5730\u68c0\u6d4b\u4e0d\u5408\u89c4\u7684\u5185\u5bb9\u3001\u7406\u89e3\u8bed\u8a00\u7684\u7ec6\u5fae\u5dee\u522b\uff0c\u5e76\u9002\u5e94\u591a\u6837\u7684\u793e\u533a\u4e0a\u4e0b\u6587\u3002\u5927\u591a\u6570\u4ee3\u7406\u8fd8\u8868\u73b0\u51fa\u9ad8\u5ea6\u7684\u4e00\u81f4\u6027\u548c\u4e00\u81f4\u6027\u8bc4\u5206\u89e3\u91ca\u4e0e\u5408\u89c4\u5efa\u8bae\u3002\u57fa\u4e8e\u9886\u57df\u4e13\u5bb6\u7684\u4eba\u7c7b\u8bc4\u4f30\u786e\u8ba4\u4e86\u4ee3\u7406\u7684\u53ef\u9760\u6027\u548c\u5b9e\u7528\u6027\uff0c\u8868\u660e\u5b83\u4eec\u662f\u534a\u81ea\u52a8\u5316\u6216\u4eba\u673a\u534f\u4f5c\
u5185\u5bb9\u7ba1\u7406\u7cfb\u7edf\u7684\u6709\u524d\u666f\u7684\u5de5\u5177\u3002|\n", "2409.08717": "|**2024-09-13**|**Fusing Dynamics Equation: A Social Opinions Prediction Algorithm with LLM-based Agents**|Junchi Yao et.al.|[2409.08717](http://arxiv.org/abs/2409.08717)|null|\u5728\u793e\u4ea4\u5a92\u4f53\u65e5\u76ca\u6210\u4e3a\u793e\u4f1a\u8fd0\u52a8\u5f62\u6210\u516c\u4f17\u610f\u89c1\u7684\u91cd\u8981\u5e73\u53f0\u7684\u80cc\u666f\u4e0b\uff0c\u51c6\u786e\u6a21\u62df\u548c\u9884\u6d4b\u7528\u6237\u610f\u89c1\u52a8\u6001\u5bf9\u4e8e\u7406\u89e3\u793e\u4f1a\u73b0\u8c61\u3001\u653f\u7b56\u5236\u5b9a\u4ee5\u53ca\u5f15\u5bfc\u516c\u4f17\u610f\u89c1\u81f3\u5173\u91cd\u8981\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u6a21\u62df\u65b9\u6cd5\u5728\u6355\u6349\u7528\u6237\u884c\u4e3a\u7684\u590d\u6742\u6027\u548c\u52a8\u6001\u6027\u65b9\u9762\u9762\u4e34\u7740\u6311\u6218\u3002\u9488\u5bf9\u8fd9\u4e00\u95ee\u9898\uff0c\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u793e\u4ea4\u5a92\u4f53\u7528\u6237\u610f\u89c1\u52a8\u6001\u6a21\u62df\u65b9\u6cd5\u2014\u2014FDE-LLM\u7b97\u6cd5\uff0c\u8be5\u7b97\u6cd5\u7ed3\u5408\u4e86\u610f\u89c1\u52a8\u6001\u4e0e\u6d41\u884c\u75c5\u6a21\u578b\uff0c\u6709\u6548\u7ea6\u675f\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u884c\u4e3a\u548c\u610f\u89c1\u6f14\u5316\u8fc7\u7a0b\uff0c\u4f7f\u5176\u66f4\u52a0\u7b26\u5408\u73b0\u5b9e\u7f51\u7edc\u4e16\u754c\u3002\u7279\u522b\u5730\uff0cFDE-LLM\u5c06\u7528\u6237\u5206\u4e3a\u610f\u89c1\u9886\u8896\u548c\u8ddf\u968f\u8005\u4e24\u5927\u7c7b\u3002\u610f\u89c1\u9886\u8896\u57fa\u4e8eLLM\u89d2\u8272\u626e\u6f14\uff0c\u5e76\u53d7\u7ec6\u80de\u81ea\u52a8\u673a\uff08CA\uff09\u6a21\u578b\u7ea6\u675f\uff0c\u800c\u610f\u89c1\u8ddf\u968f\u8005\u5219\u878d\u5165\u4e86\u4e00\u4e2a\u7ed3\u5408CA\u6a21\u578b\u4e0eSIR\u6a21\u578b\u7684\u52a8\u6001\u7cfb\u7edf\u3002\u8fd9\u79cd\u521b\u65b0\u8bbe\u8ba1\u663e\u8457\u63d0\u9ad8\u4e86\u6a21\u62df\u7684\u51c6\u786e\u6027\u548c\u6548\u7387\
u3002 \u5b9e\u9a8c\u5728\u56db\u4e2a\u771f\u5b9e\u5fae\u535a\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\uff0c\u5e76\u4f7f\u7528\u5f00\u6e90\u6a21\u578bChatGLM\u8fdb\u884c\u4e86\u9a8c\u8bc1\u3002\u7ed3\u679c\u8868\u660e\uff0c\u76f8\u8f83\u4e8e\u4f20\u7edf\u57fa\u4e8e\u4ee3\u7406\u7684\u6a21\u578b\uff08ABM\uff09\u610f\u89c1\u52a8\u6001\u7b97\u6cd5\u548c\u57fa\u4e8eLLM\u7684\u610f\u89c1\u4f20\u64ad\u7b97\u6cd5\uff0c\u6211\u4eec\u7684FDE-LLM\u7b97\u6cd5\u5728\u51c6\u786e\u6027\u4e0e\u53ef\u89e3\u91ca\u6027\u65b9\u9762\u8868\u73b0\u66f4\u4f18\u3002|\n", "2409.10372": "|**2024-09-19**|**Instigating Cooperation among LLM Agents Using Adaptive Information Modulation**|Qiliang Chen et.al.|[2409.10372](http://arxiv.org/abs/2409.10372)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6846\u67b6\uff0c\u5c06\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4ee3\u7406\u4f5c\u4e3a\u4eba\u7c7b\u6218\u7565\u884c\u4e3a\u7684\u4ee3\u7406\uff0c\u5e76\u7ed3\u5408\u5f3a\u5316\u5b66\u4e60\uff08RL\uff09\u8ba9\u8fd9\u4e9b\u4ee3\u7406\u5728\u56e2\u961f\u73af\u5883\u4e2d\u8fdb\u884c\u4e0d\u65ad\u6f14\u5316\u7684\u6218\u7565\u4e92\u52a8\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u6269\u5c55\u4e86\u4f20\u7edf\u7684\u57fa\u4e8e\u4ee3\u7406\u7684\u6a21\u62df\uff0c\u901a\u8fc7\u4f7f\u7528\u7b56\u7565\u6027\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08SLA\uff09\u4ee5\u53ca\u5f15\u5165\u52a8\u6001\u548c\u9002\u5e94\u6027\u7684\u6cbb\u7406\uff0c\u901a\u8fc7\u4fc3\u8fdb\u793e\u4f1a\u884c\u4e3a\u7684\u5f3a\u5316\u5b66\u4e60\u4ee3\u7406\uff08PPA\uff09\uff0c\u8be5\u4ee3\u7406\u8c03\u8282\u7f51\u7edc\u4e2d\u4ee3\u7406\u4e4b\u95f4\u7684\u4fe1\u606f\u8bbf\u95ee\uff0c\u4ee5\u4f18\u5316\u793e\u4f1a\u798f\u5229\u5e76\u4fc3\u8fdb\u4eb2\u793e\u4f1a\u884c\u4e3a\u3002\u901a\u8fc7\u5728\u8fed\u4ee3\u6e38\u620f\u4e2d\u9a8c\u8bc1\uff0c\u5305\u62ec\u56da\u5f92\u56f0\u5883\uff0c\u6211\u4eec\u5c55\u793a\u4e86SLA\u4ee3\u7406\u8868\u73b0\u51fa\u590d\u6742\u7684\u6218\u7565\u8c03\u6574\u3002PPA\u4ee3\u7406\u6709\u6548\u
5730\u5b66\u4e60\u8c03\u6574\u4fe1\u606f\u900f\u660e\u5ea6\uff0c\u5bfc\u81f4\u5408\u4f5c\u7387\u663e\u8457\u63d0\u9ad8\u3002\u8fd9\u4e00\u6846\u67b6\u63d0\u4f9b\u4e86\u5bf9\u4eba\u5de5\u667a\u80fd\u9a71\u52a8\u7684\u793e\u4f1a\u52a8\u529b\u5b66\u7684\u91cd\u8981\u89c1\u89e3\uff0c\u4e3a\u5728\u5b9e\u9645\u56e2\u961f\u73af\u5883\u4e2d\u90e8\u7f72AI\u505a\u51fa\u4e86\u8d21\u732e\u3002|\n", "2409.09785": "|**2024-09-17**|**Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition**|Chao-Han Huck Yang et.al.|[2409.09785](http://arxiv.org/abs/2409.09785)|null|\u5728\u8fd1\u671f\u751f\u6210\u5f0f\u4eba\u5de5\u667a\u80fd\u6280\u672f\u7684\u63a8\u52a8\u4e0b\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5982\u4f55\u589e\u5f3a\u57fa\u4e8e\u6587\u672c\u89e3\u7801\u7684\u81ea\u52a8\u8bed\u97f3\u8bc6\u522b\uff08ASR\uff09\u6a21\u578b\u5728\u58f0\u5b66\u5efa\u6a21\u4efb\u52a1\u4e2d\u7684\u5e94\u7528\u6210\u4e3a\u4e86\u4e00\u4e2a\u5173\u952e\u95ee\u9898\u3002\u4e3a\u4e86\u63a2\u7d22\u8bed\u8a00\u5efa\u6a21\u5728\u8bed\u97f3\u5904\u7406\u9886\u57df\u7684\u6f5c\u5728\u65b0\u80fd\u529b\uff0c\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u9879\u540d\u4e3a\u201c\u751f\u6210\u6027\u8bed\u97f3\u8f6c\u5f55\u9519\u8bef\u4fee\u6b63\u201d\uff08GenSEC\uff09\u7684\u6311\u6218\u3002\u8be5\u6311\u6218\u5305\u542b\u4e86\u4e09\u4e2a\u9488\u5bf9\u540eASR\u8bed\u8a00\u6a21\u578b\u7684\u4efb\u52a1\uff1a\uff08i\uff09\u540eASR\u8f6c\u5f55\u4fee\u6b63\u3001\uff08ii\uff09\u8bf4\u8bdd\u8005\u6807\u7b7e\u5316\u4ee5\u53ca\uff08iii\uff09\u60c5\u611f\u8bc6\u522b\u3002\u8fd9\u4e9b\u4efb\u52a1\u65e8\u5728\u6a21\u62df\u672a\u6765\u57fa\u4e8e\u8bed\u8a00\u6a21\u578b\u7684\u8bed\u97f3\u754c\u9762\u4ee3\u7406\u5904\u7406\u5de5\u4f5c\u65f6\u7684\u573a\u666f\uff0c\u5e76\u901a\u8fc7\u4f7f\u7528\u5f00\u6e90\u9884\u8bad\u7ec3\u8bed\u8a00\u6a21\u578b\u6216\u57fa\u4e8e\u4ee3\u7406\u7684API\u6765\u4fdd\u6301\u5bf9\u5e7f\u6cdb\u53d7\u4
f17\u7684\u53ef\u8bbf\u95ee\u6027\u3002\u6b64\u5916\uff0c\u672c\u6587\u8fd8\u8ba8\u8bba\u4e86\u57fa\u51c6\u8bc4\u4f30\u7684\u7ed3\u679c\u4ee5\u53ca\u8bbe\u8ba1\u672a\u6765\u8bc4\u4f30\u65f6\u5e94\u6c72\u53d6\u7684\u7ecf\u9a8c\u6559\u8bad\u3002|\n", "2409.09584": "|**2024-09-15**|**RethinkMCTS: Refining Erroneous Thoughts in Monte Carlo Tree Search for Code Generation**|Qingyao Li et.al.|[2409.09584](http://arxiv.org/abs/2409.09584)|null|\u672c\u6587\u9488\u5bf9LLM\uff08\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff09\u4ee3\u7406\u4e0e\u6811\u641c\u7d22\u7b97\u6cd5\u5728\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u4e2d\u7684\u5e94\u7528\u8fdb\u884c\u4e86\u6df1\u5165\u7814\u7a76\u3002\u5f53\u524d\u7684\u641c\u7d22\u7b97\u6cd5\u5728\u8fd9\u4e00\u9886\u57df\u5b58\u5728\u4f4e\u641c\u7d22\u8d28\u91cf\u7684\u95ee\u9898\uff0c\u4e3b\u8981\u6e90\u4e8e\u4ee5\u4e0b\u4e09\u4e2a\u539f\u56e0\uff1a1\uff09\u5bf9\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u9ad8\u63a8\u7406\u8981\u6c42\u7684\u641c\u7d22\u7a7a\u95f4\u8bbe\u8ba1\u4e0d\u5408\u7406\uff1b2\uff09\u672a\u80fd\u5145\u5206\u7ed3\u5408\u4ee3\u7801\u53cd\u9988\u4f18\u5316\u641c\u7d22\u8fc7\u7a0b\uff1b3\uff09\u5904\u7406\u8d1f\u53cd\u9988\u65f6\u6548\u7387\u4f4e\u4e0b\uff0c\u5bfc\u81f4\u641c\u7d22\u8d28\u91cf\u548c\u6548\u7387\u964d\u4f4e\u3002 
\u4e3a\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u65b9\u6cd5\u2014\u2014RethinkMCTS\uff08\u53cd\u601d\u8499\u7279\u5361\u6d1b\u6811\u641c\u7d22\uff09\u3002\u8be5\u65b9\u6cd5\u901a\u8fc7\u5728\u751f\u6210\u4ee3\u7801\u4e4b\u524d\u8fdb\u884c\u591a\u5c42\u6b21\u7684\u601d\u8003\u641c\u7d22\uff0c\u63a2\u7d22\u66f4\u5e7f\u6cdb\u7684\u7b56\u7565\u9009\u9879\u3002\u66f4\u91cd\u8981\u7684\u662f\uff0cRethinkMCTS\u5229\u7528\u7ec6\u7c92\u5ea6\u7684\u4ee3\u7801\u6267\u884c\u53cd\u9988\u6784\u5efa\u53e3\u5934\u53cd\u9988\uff0c\u4ee5\u4fee\u6b63\u641c\u7d22\u8fc7\u7a0b\u4e2d\u51fa\u73b0\u7684\u9519\u8bef\u601d\u8def\u3002\u8fd9\u79cd\u673a\u5236\u786e\u4fdd\u4e86\u641c\u7d22\u6cbf\u7740\u6b63\u786e\u7684\u63a8\u7406\u8def\u5f84\u524d\u8fdb\uff0c\u4ece\u800c\u63d0\u9ad8\u6574\u4e2a\u641c\u7d22\u6811\u7684\u6574\u4f53\u8d28\u91cf\u3002 \u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u4e0e\u4e4b\u524d\u7684\u57fa\u4e8e\u641c\u7d22\u548c\u53cd\u9988\u7684\u4ee3\u7801\u751f\u6210\u57fa\u51c6\u76f8\u6bd4\uff0cRethinkMCTS\u53d6\u5f97\u4e86\u663e\u8457\u7684\u6027\u80fd\u63d0\u5347\u3002\u5728HumanEval\u6570\u636e\u96c6\u4e0a\uff0cRethinkMCTS\u5c06GPT-3.5-turbo\u7684pass@1\u6307\u6807\u4ece70.12\u63d0\u9ad8\u5230\u4e8689.02\uff0c\u5c06GPT-4o-mini\u7684pass@1\u6307\u6807\u4ece87.20\u63d0\u5347\u81f394.51\u3002\u901a\u8fc7\u6df1\u5165\u7684\u63a2\u7d22\u548c\u6539\u8fdb\u6574\u4e2a\u641c\u7d22\u6811\u7684\u8d28\u91cf\uff0cRethinkMCTS\u6709\u6548\u5730\u589e\u5f3a\u4e86\u641c\u7d22\u8fc7\u7a0b\u7684\u5168\u9762\u6027\u548c\u6df1\u5ea6\u3002|\n", "2409.09345": "|**2024-09-14**|**Enhancing Decision-Making for LLM Agents via Step-Level Q-Value Models**|Yuanzhao Zhai 
et.al.|[2409.09345](http://arxiv.org/abs/2409.09345)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u5229\u7528\u4efb\u52a1\u76f8\u5173Q\u503c\u6a21\u578b\u6765\u6307\u5bfc\u884c\u52a8\u9009\u62e9\u7684\u65b9\u6cd5\uff0c\u4ee5\u589e\u5f3a\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4ee3\u7406\u5728\u591a\u6b65\u51b3\u7b56\u4efb\u52a1\u4e2d\u7684\u6027\u80fd\u3002\u5177\u4f53\u5730\uff0c\u6211\u4eec\u9996\u5148\u901a\u8fc7\u8499\u7279\u5361\u6d1b\u6811\u641c\u7d22\uff08MCTS\uff09\u6536\u96c6\u4e86\u6807\u6ce8\u6709\u6b65\u9aa4\u7ea7Q\u503c\u7684\u51b3\u7b56\u8f68\u8ff9\uff0c\u5e76\u6784\u5efa\u4e86\u504f\u597d\u6570\u636e\u96c6\u3002\u63a5\u7740\uff0c\u6211\u4eec\u4f7f\u7528\u53e6\u4e00\u4e2aLLM\u901a\u8fc7\u6b65\u9aa4\u7ea7\u76f4\u63a5\u7b56\u7565\u4f18\u5316\uff08DPO\uff09\u62df\u5408\u8fd9\u4e9b\u504f\u597d\uff0c\u4ece\u800c\u5f62\u6210Q\u503c\u6a21\u578b\u3002\u5728\u63a8\u7406\u8fc7\u7a0b\u4e2d\uff0c\u5bf9\u4e8e\u6bcf\u4e2a\u51b3\u7b56\u6b65\u9aa4\uff0cLLM\u4ee3\u7406\u90fd\u4f1a\u9009\u62e9\u5177\u6709\u6700\u9ad8Q\u503c\u7684\u52a8\u4f5c\uff0c\u7136\u540e\u518d\u4e0e\u73af\u5883\u8fdb\u884c\u4ea4\u4e92\u3002\u6211\u4eec\u5c06\u8be5\u65b9\u6cd5\u5e94\u7528\u4e8e\u591a\u4e2a\u5f00\u6e90\u548cAPI\u96c6\u6210\u7684LLM\u4ee3\u7406\u4e0a\uff0c\u7ed3\u679c\u663e\u793a\uff0c\u5f15\u5165Q\u503c\u6a21\u578b\u663e\u8457\u63d0\u9ad8\u4e86\u5b83\u4eec\u7684\u6027\u80fd\u3002\u7279\u522b\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u6784\u5efa\u4e8ePhi-3-mini-4k-instruct\u7684\u4ee3\u7406\u5728WebShop\u4efb\u52a1\u4e0a\u7684\u6027\u80fd\u63d0\u5347\u4e86103%\uff0c\u5728HotPotQA\u4efb\u52a1\u4e0a\u63d0\u5347\u4e8675%\uff0c\u751a\u81f3\u8d85\u8d8a\u4e86GPT-4o-mini\u3002\u6b64\u5916\uff0cQ\u503c\u6a21\u578b\u8fd8\u5177\u5907\u51e0\u4e2a\u4f18\u52bf\uff0c\u5982\u5bf9\u4e0d\u540cLLM\u4ee3\u7406\u7684\u6cdb\u5316\u80fd\u529b\u548c\u4e0e\u73b0\u6709\u63d0\u793a\u7b56\u7565\u65e0\u7f1d\u96c6\u6210\u7684\u80fd\u529b\u3002|\n", "2409.09271": "|**2024-09-14**|**Python Symbolic 
Execution with LLM-powered Code Generation**|Wenhan Wang et.al.|[2409.09271](http://arxiv.org/abs/2409.09271)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u589e\u5f3a\u7684\u4ee3\u7406\u5de5\u5177\u2014\u2014LLM-Sym\u3002\u8be5\u5de5\u5177\u65e8\u5728\u89e3\u51b3\u4f7f\u7528\u7b26\u53f7\u6267\u884c\u6280\u672f\u5728\u52a8\u6001\u7c7b\u578b\u8bed\u8a00\u5982Python\u4e2d\u9047\u5230\u7684\u4e3b\u8981\u6311\u6218\u3002\u901a\u8fc7\u81ea\u52a8\u8c03\u7528SMT\u6c42\u89e3\u5668Z3\u6765\u89e3\u51b3\u6267\u884c\u8def\u5f84\u7ea6\u675f\uff0cLLM-Sym\u80fd\u591f\u6269\u5c55\u57fa\u7840\u7684\u7b26\u53f7\u6267\u884c\u5f15\u64ce\uff0c\u4f7f\u5176\u652f\u6301\u5305\u542b\u590d\u6742\u6570\u636e\u7c7b\u578b`list`\u7684\u7a0b\u5e8f\u3002 LLM-Sym\u7684\u6838\u5fc3\u8d21\u732e\u5728\u4e8e\u5c06\u590d\u6742\u7684Python\u8def\u5f84\u7ea6\u675f\u8f6c\u5316\u4e3aZ3\u4ee3\u7801\u7684\u80fd\u529b\u3002\u4e3a\u4e86\u5b9e\u73b0\u51c6\u786e\u7684\u8def\u5f84\u5230Z3\u4ee3\u7801\u7684\u8f6c\u6362\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u591a\u6b65\u9aa4\u7684\u4ee3\u7801\u751f\u6210\u7ba1\u9053\uff0c\u5305\u62ec\u7c7b\u578b\u63a8\u65ad\u3001\u68c0\u7d22\u548c\u81ea\u6211\u7cbe\u70bc\u7b49\u73af\u8282\u3002 \u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cLLM-Sym\u80fd\u591f\u89e3\u51b3\u5177\u6709\u590d\u6742\u63a7\u5236\u6d41\u548c\u5217\u8868\u6570\u636e\u7ed3\u6784\u7684LeetCode\u95ee\u9898\u4e2d\u7684\u8def\u5f84\u7ea6\u675f\uff0c\u8fd9\u662f\u57fa\u7840\u7b26\u53f7\u6267\u884c\u5f15\u64ce\u65e0\u6cd5\u505a\u5230\u7684\u3002\u8fd9\u4e00\u65b9\u6cd5\u4e3aLLM\u4e0e\u7b26\u53f7\u6c42\u89e3\u5668\u63a8\u7406\u80fd\u529b\u7684\u7ed3\u5408\u5f00\u8f9f\u4e86\u9053\u8def\uff0c\u5e76\u4e3aLLM\u8f85\u52a9\u6d4b\u8bd5\u7528\u4f8b\u751f\u6210\u63d0\u4f9b\u4e86\u65b0\u7684\u673a\u9047\u3002|\n", "2409.11393": "|**2024-09-17**|**LLM-Agent-UMF: LLM-based Agent Unified Modeling Framework for Seamless Integration of Multi Active/Passive 
Core-Agents**|Amine B. Hassouna et.al.|[2409.11393](http://arxiv.org/abs/2409.11393)|null|\u672c\u6587\u901a\u8fc7\u63d0\u51fa\u4e00\u4e2a\u7edf\u4e00\u6846\u67b6\u2014\u2014LLM-Agent-UMF\uff08\u57fa\u4e8e\u8bed\u8a00\u6a21\u578b\u7684\u4ee3\u7406\u7edf\u4e00\u5efa\u6a21\u6846\u67b6\uff09\uff0c\u89e3\u51b3\u4e86\u96c6\u6210\u5de5\u5177\u5230\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u9a71\u52a8\u7684\u4ee3\u7406\u4ee5\u53ca\u5728\u591a\u4e2a\u524d\u6cbf\u5de5\u4f5c\u4e2d\u63d0\u51fa\u7684\u6539\u8fdb\u63aa\u65bd\u6240\u5bfc\u81f4\u7684\u8f6f\u4ef6\u67b6\u6784\u975e\u7edf\u4e00\u6027\u95ee\u9898\u3002\u4f20\u7edf\u4e0a\uff0c\u8fd9\u4e9b\u6280\u672f\u7684\u7ed3\u5408\u53ca\u540e\u7eed\u5de5\u4f5c\u4fa7\u91cd\u4e8e\u529f\u80fd\u5b9e\u73b0\u800c\u975e\u5b9a\u4e49\u7ec4\u4ef6\u8fb9\u754c\uff0c\u5bfc\u81f4\u4e86\u7814\u7a76\u4eba\u5458\u4e4b\u95f4\u7684\u672f\u8bed\u548c\u67b6\u6784\u4e0a\u7684\u6df7\u6dc6\u3002 \u8be5\u6846\u67b6\u660e\u786e\u4e86\u4ee3\u7406\u7684\u4e0d\u540c\u7ec4\u4ef6\uff0c\u5305\u62ecLLM\u3001\u5de5\u5177\u4ee5\u53ca\u65b0\u5f15\u5165\u7684\u6838\u5fc3\u4ee3\u7406\u6982\u5ff5\uff0c\u5176\u4f5c\u7528\u662f\u4ee3\u7406\u7684\u4e2d\u592e\u534f\u8c03\u8005\uff0c\u7531\u89c4\u5212\u3001\u8bb0\u5fc6\u3001\u4e2a\u4eba\u8d44\u6599\u3001\u884c\u52a8\u548c\u5b89\u5168\u4e94\u4e2a\u6a21\u5757\u7ec4\u6210\u3002\u6838\u5fc3\u4ee3\u7406\u7684\u5185\u90e8\u7ed3\u6784\u5dee\u5f02\u4fc3\u4f7f\u6211\u4eec\u5c06\u5176\u5206\u7c7b\u4e3a\u88ab\u52a8\u578b\u548c\u4e3b\u52a8\u578b\u4e24\u79cd\u7c7b\u578b\u3002\u57fa\u4e8e\u6b64\u5206\u7c7b\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u7ed3\u5408\u4e0d\u540c\u4e2a\u4f53\u4ee3\u7406\u72ec\u7279\u7279\u6027\u7684\u591a\u79cd\u591a\u6838\u5fc3\u4ee3\u7406\u67b6\u6784\u3002 
\u4e3a\u4e86\u9a8c\u8bc1\u6846\u67b6\u7684\u6709\u6548\u6027\uff0c\u6211\u4eec\u5c06\u8be5\u6846\u67b6\u5e94\u7528\u4e8e\u4e00\u7cfb\u5217\u524d\u6cbf\u4ee3\u7406\uff0c\u5e76\u5c55\u793a\u5176\u4e0e\u529f\u80fd\u7684\u4e00\u81f4\u6027\uff0c\u540c\u65f6\u6f84\u6e05\u4e86\u5148\u524d\u88ab\u5ffd\u89c6\u7684\u67b6\u6784\u65b9\u9762\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5bf9\u56db\u4e2a\u63d0\u51fa\u7684\u67b6\u6784\u8fdb\u884c\u4e86\u8be6\u5c3d\u8bc4\u4f30\uff0c\u901a\u8fc7\u6574\u5408\u5177\u6709\u4e0d\u540c\u7279\u6027\u7684\u4ee3\u7406\u5230\u6df7\u5408\u4e3b\u52a8/\u88ab\u52a8\u6838\u5fc3\u4ee3\u7406\u7cfb\u7edf\u4e2d\uff0c\u8fd9\u4e00\u8fc7\u7a0b\u63d0\u4f9b\u4e86\u5bf9\u7279\u5b9a\u4ee3\u7406\u7ec4\u5408\u53ef\u80fd\u5e26\u6765\u7684\u6539\u8fdb\u548c\u9762\u4e34\u7684\u6311\u6218\u7684\u6e05\u6670\u89c1\u89e3\u3002|\n", "2409.11276": "|**2024-09-17**|**Hackphyr: A Local Fine-Tuned LLM Agent for Network Security Environments**|Maria Rigaki et.al.|[2409.11276](http://arxiv.org/abs/2409.11276)|null|\u672c\u7bc7\u8bba\u6587\u63a2\u8ba8\u4e86\u5728\u7f51\u7edc\u5b89\u5168\u73af\u5883\u4e2d\u4f7f\u7528\u672c\u5730\u5fae\u8c03\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4f5c\u4e3a\u7ea2\u961f\u4ee3\u7406\u7684\u53ef\u80fd\u6027\u3002\u8003\u8651\u5230\u5546\u4e1a\u4e91\u57faLLM\u7684\u9690\u79c1\u95ee\u9898\u3001\u6210\u672c\u548c\u7f51\u7edc\u8fde\u63a5\u9650\u5236\uff0c\u6211\u4eec\u63d0\u51fa\u4e86Hackphyr\u2014\u2014\u4e00\u4e2a\u672c\u5730\u5fae\u8c03\u768470\u4ebf\u53c2\u6570\u6a21\u578b\uff0c\u65e8\u5728\u7528\u4e8e\u7f51\u7edc\u5b89\u5168\u73af\u5883\u4e2d\u7684\u7ea2\u961f\u4efb\u52a1\u3002\u6211\u4eec\u7684\u6a21\u578b\u80fd\u591f\u5728\u5355\u4e2aGPU\u5361\u4e0a\u8fd0\u884c\uff0c\u5e76\u4e14\u5728\u6027\u80fd\u4e0a\u4e0e\u66f4\u5927\u66f4\u5f3a\u5927\u7684\u5546\u4e1a\u6a21\u578b\u5982GPT-4\u76f8\u5ab2\u7f8e\u3002 
Hackphyr\u5728\u590d\u6742\u3001\u524d\u6240\u672a\u89c1\u7684\u573a\u666f\u4e2d\u663e\u8457\u4f18\u4e8e\u5176\u4ed6\u6a21\u578b\uff0c\u5305\u62ecGPT-3.5-turbo\u4ee5\u53caQ-learning\u4ee3\u7406\u7b49\u57fa\u7ebf\u6a21\u578b\u3002\u4e3a\u4e86\u5b9e\u73b0\u8fd9\u4e00\u6027\u80fd\u63d0\u5347\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u4e13\u95e8\u9488\u5bf9\u7f51\u7edc\u5b89\u5168\u4efb\u52a1\u7684\u65b0\u6570\u636e\u96c6\uff0c\u4ee5\u589e\u5f3a\u57fa\u7840\u6a21\u578b\u7684\u80fd\u529b\u3002\u6700\u540e\uff0c\u6211\u4eec\u5bf9\u4ee3\u7406\u884c\u4e3a\u8fdb\u884c\u4e86\u5168\u9762\u5206\u6790\uff0c\u63d0\u4f9b\u4e86\u5173\u4e8e\u6b64\u7c7b\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u5728\u7f51\u7edc\u5b89\u5168\u4e0a\u4e0b\u6587\u4e2d\u7684\u89c4\u5212\u80fd\u529b\u548c\u6f5c\u5728\u5c40\u9650\u6027\u7684\u89c1\u89e3\uff0c\u4ece\u800c\u4e3a\u66f4\u5e7f\u6cdb\u5730\u7406\u89e3\u6b64\u7c7b\u4ee3\u7406\u5728\u7f51\u7edc\u5b89\u5168\u9886\u57df\u7684\u5e94\u7528\u63d0\u4f9b\u4e86\u53c2\u8003\u3002|\n", "2409.10568": "|**2024-09-14**|**On the limits of agency in agent-based models**|Ayush Chopra 
et.al.|[2409.10568](http://arxiv.org/abs/2409.10568)|**[link](https://github.com/agenttorch/agenttorch)**|**\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aAgentTorch\u7684\u6846\u67b6\uff0c\u65e8\u5728\u901a\u8fc7\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4f5c\u4e3a\u5177\u6709\u9002\u5e94\u6027\u884c\u4e3a\u7684\u4ee3\u7406\uff0c\u5c06\u57fa\u4e8e\u4e2a\u4f53\u7684\u6a21\u578b\uff08ABM\uff09\u6269\u5c55\u5230\u6570\u767e\u4e07\u4e2a\u4ee3\u7406\u7684\u89c4\u6a21\u3002\u8fd9\u4e00\u6846\u67b6\u65e8\u5728\u5728\u6a21\u62df\u590d\u6742\u7cfb\u7edf\u7684\u884c\u4e3a\u65f6\uff0c\u65e2\u6355\u6349\u5230\u771f\u5b9e\u73af\u5883\u52a8\u6001\u548c\u9002\u5e94\u6027\u4ee3\u7406\u884c\u4e3a\uff0c\u53c8\u4fdd\u6301\u5bf9\u5e9e\u5927\u4eba\u53e3\u7fa4\u4f53\u9ad8\u6548\u6a21\u62df\u7684\u80fd\u529b\u3002\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u6700\u65b0\u8fdb\u5c55\u4e3a\u589e\u5f3aABM\u63d0\u4f9b\u4e86\u673a\u4f1a\uff0c\u4f46\u4f7f\u7528LLMs\u8fdb\u884c\u5927\u89c4\u6a21\u4ee3\u7406\u7684\u8ba1\u7b97\u53ef\u884c\u6027\u9650\u5236\u4e86\u5176\u5e7f\u6cdb\u5e94\u7528\u3002 \u6211\u4eec\u901a\u8fc7\u5b9e\u9a8c\u8bc4\u4f30\u4e86\u4f7f\u7528LLMs\u4f5c\u4e3aABM\u4ee3\u7406\u7684\u5b9e\u7528\u6027\uff0c\u63a2\u7d22\u4e86\u6a21\u62df\u89c4\u6a21\u4e0e\u5355\u4e2a\u4ee3\u7406\u884c\u4e3a\u7ec6\u8282\u4e4b\u95f4\u7684\u6743\u8861\u3002\u4ee5COVID-19\u5927\u6d41\u884c\u4e3a\u4f8b\uff0c\u6211\u4eec\u5c55\u793a\u4e86AgentTorch\u5982\u4f55\u6a21\u62df840\u4e07\u4e2a\u4ee3\u8868\u7ebd\u7ea6\u5e02\u7684\u4ee3\u7406\uff0c\u4ee5\u6355\u6349\u9694\u79bb\u548c\u5c31\u4e1a\u884c\u4e3a\u5bf9\u5065\u5eb7\u548c\u7ecf\u6d4e\u7ed3\u679c\u7684\u5f71\u54cd\u3002\u6211\u4eec\u6bd4\u8f83\u4e86\u57fa\u4e8e\u542f\u53d1\u5f0f\u65b9\u6cd5\u548cLLMs\u7684\u4e0d\u540c\u4ee3\u7406\u67b6\u6784\u5728\u9884\u6d4b\u75be\u75c5\u6d6a\u6f6e\u548c\u5931\u4e1a\u7387\u65b9\u9762\u7684\u6027\u80fd\u3002 
\u6b64\u5916\uff0c\u6211\u4eec\u5c55\u793a\u4e86AgentTorch\u5728\u56de\u987e\u6027\u3001\u5047\u8bbe\u6027\u548c\u524d\u77bb\u6027\u5206\u6790\u4e2d\u7684\u80fd\u529b\uff0c\u5f3a\u8c03\u4e86\u9002\u5e94\u6027\u4ee3\u7406\u884c\u4e3a\u5982\u4f55\u5e2e\u52a9\u514b\u670d\u5386\u53f2\u6570\u636e\u5728\u653f\u7b56\u8bbe\u8ba1\u4e2d\u7684\u5c40\u9650\u6027\u3002AgentTorch\u662f\u4e00\u4e2a\u5f00\u6e90\u9879\u76ee\uff0c\u76ee\u524d\u6b63\u88ab\u5168\u7403\u7528\u4e8e\u653f\u7b56\u5236\u5b9a\u548c\u79d1\u5b66\u53d1\u73b0\u3002\u8be5\u6846\u67b6\u53ef\u5728\u6b64\u83b7\u53d6\uff1agithub.com/AgentTorch/AgentTorch\u3002**|\n", "2409.17140": "|**2024-09-25**|**Turn Every Application into an Agent: Towards Efficient Human-Agent-Computer Interaction with API-First LLM-Based Agents**|Junting Lu et.al.|[2409.17140](http://arxiv.org/abs/2409.17140)|null|\u5728\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u7684\u5e2e\u52a9\u4e0b\uff0c\u8bed\u8a00\u6a21\u578b\u9a71\u52a8\u7684\u4ee3\u7406\u53ef\u4ee5\u76f4\u63a5\u4e0e\u5e94\u7528\u7528\u6237\u754c\u9762\uff08UI\uff09\u8fdb\u884c\u4ea4\u4e92\uff0c\u4ece\u800c\u5728\u590d\u6742\u4efb\u52a1\u4e2d\u63d0\u5347\u4ee3\u7406\u6027\u80fd\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u4ee3\u7406\u5e38\u5e38\u56e0\u4e3a\u6d89\u53ca\u5927\u91cf\u987a\u5e8fUI\u4ea4\u4e92\u800c\u5bfc\u81f4\u9ad8\u5ef6\u8fdf\u548c\u4f4e\u53ef\u9760\u6027\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86AXIS\uff0c\u4e00\u4e2a\u65b0\u9896\u7684\u57fa\u4e8e\u8bed\u8a00\u6a21\u578b\u7684\u4ee3\u7406\u6846\u67b6\uff0c\u901a\u8fc7\u5e94\u7528\u7a0b\u5e8f\u63a5\u53e3\uff08APIs\uff09\u4f18\u5148\u4e8eUI\u52a8\u4f5c\u6765\u4f18\u5316\u4ee3\u7406\u884c\u4e3a\u3002\u6b64\u5916\uff0c\u8be5\u6846\u67b6\u8fd8\u901a\u8fc7\u81ea\u52a8\u5316\u63a2\u7d22\u5e94\u7528\u4ee5\u521b\u5efa\u548c\u6269\u5c55API\uff0c\u4fc3\u8fdb\u4e86API\u7684\u751f\u6210\u548c\u5e94\u7528\u8303\u56f4\u7684\u6269\u5c55\u3002 
\u6211\u4eec\u7684\u5b9e\u9a8c\u5728Word\u529e\u516c\u8f6f\u4ef6\u4e0a\u663e\u793a\uff0c\u4e0e\u4eba\u7c7b\u76f8\u6bd4\uff0cAXIS\u5728\u5b8c\u6210\u4efb\u52a1\u7684\u65f6\u95f4\u4e0a\u51cf\u5c11\u4e8665%-70%\uff0c\u8ba4\u77e5\u8d1f\u8377\u964d\u4f4e\u4e8638%-53%\uff0c\u540c\u65f6\u4fdd\u6301\u4e8697%-98%\u7684\u51c6\u786e\u6027\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u4e3a\u4eba\u7c7b-\u4ee3\u7406-\u8ba1\u7b97\u673a\u4ea4\u4e92\uff08HACI\uff09\u6846\u67b6\u548c\u5e94\u7528\u63d0\u4f9b\u8005\u5728LLMs\u65f6\u4ee3\u8bbe\u8ba1\u65b0UI\u539f\u5219\u63d0\u4f9b\u4e86\u8d21\u732e\uff0c\u5e76\u63a2\u8ba8\u4e86\u5c06\u6bcf\u4e00\u4e2a\u5e94\u7528\u8f6c\u5316\u4e3a\u4ee3\u7406\u7684\u53ef\u80fd\u6027\uff0c\u4e3a\u8fc8\u5411\u4ee5\u4ee3\u7406\u4e3a\u4e2d\u5fc3\u7684\u64cd\u4f5c\u7cfb\u7edf\uff08Agent OS\uff09\u94fa\u5e73\u4e86\u9053\u8def\u3002|\n", "2409.16455": "|**2024-09-24**|**MultiTalk: Introspective and Extrospective Dialogue for Human-Environment-LLM Alignment**|Venkata Naren Devarakonda et.al.|[2409.16455](http://arxiv.org/abs/2409.16455)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aMultiTalk\u7684\u57fa\u4e8e\u5927\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u4efb\u52a1\u89c4\u5212\u65b9\u6cd5\u3002\u901a\u8fc7\u5f15\u5165\u5185\u7701\u548c\u5916\u7701\u5bf9\u8bdd\u5faa\u73af\u6846\u67b6\uff0c\u8be5\u65b9\u6cd5\u65e8\u5728\u89e3\u51b3LLM\u5728\u4efb\u52a1\u89c4\u5212\u4e2d\u53ef\u80fd\u9047\u5230\u7684\u95ee\u9898\uff0c\u5982\u5e7b\u89c9\u3001\u7528\u6237\u6307\u4ee4\u4e2d\u7684\u6b67\u4e49\u3001\u73af\u5883\u7ea6\u675f\u4ee5\u53ca\u6267\u884c\u4ee3\u7406\u80fd\u529b\u7684\u5c40\u9650\u6027\u3002\u8fd9\u4e9b\u95ee\u9898\u53ef\u80fd\u5bfc\u81f4\u751f\u6210\u7684\u8ba1\u5212\u51fa\u73b0\u9519\u8bef\u6216\u4e0d\u5b8c\u6574\u3002 
MultiTalk\u65b9\u6cd5\u901a\u8fc7\u7279\u5b9a\u7cfb\u7edf\u6765\u63d0\u53d6\u548c\u9884\u6d4b\u4e0e\u4efb\u52a1\u76f8\u5173\u7684\u72b6\u6001\uff0c\u5e76\u6807\u8bb0\u51fa\u4eba\u3001LLM\u4ee3\u7406\u548c\u73af\u5883\u4e4b\u95f4\u7684\u4e0d\u5339\u914d\u6216\u504f\u5dee\u3002\u6709\u6548\u7684\u53cd\u9988\u8def\u5f84\u4fc3\u8fdb\u4eba\u4e0eLLM\u4e4b\u95f4\u7684\u6709\u610f\u4e49\u5bf9\u8bdd\u3002\u8fd9\u79cd\u65b9\u6cd5\u5728\u673a\u5668\u4eba\u64cd\u4f5c\u4efb\u52a1\u7684\u5e94\u7528\u4e2d\u5f97\u5230\u4e86\u9a8c\u8bc1\u3002\u5b9e\u9a8c\u548c\u6d88\u878d\u5206\u6790\u5c55\u793a\u4e86MultiTalk\u65b9\u6cd5\u7684\u7a33\u5065\u6027\u548c\u53ef\u9760\u6027\uff0c\u4e0e\u57fa\u7ebf\u65b9\u6cd5\u7684\u6bd4\u8f83\u8fdb\u4e00\u6b65\u8bc1\u660e\u4e86\u5176\u5728\u5b9e\u4f53\u4ee3\u7406\u4efb\u52a1\u89c4\u5212\u65b9\u9762\u7684\u4f18\u52bf\u3002 \u603b\u4e4b\uff0cMultiTalk\u63d0\u4f9b\u4e86\u4e00\u79cd\u901a\u8fc7\u589e\u5f3aLLM\u4e0e\u73af\u5883\u3001\u6267\u884c\u8005\u548c\u7528\u6237\u4e4b\u95f4\u7684\u4e00\u81f4\u6027\u548c\u6c9f\u901a\u6765\u6539\u8fdb\u4efb\u52a1\u89c4\u5212\u8fc7\u7a0b\u7684\u65b9\u6cd5\uff0c\u4ece\u800c\u63d0\u9ad8\u89c4\u5212\u7684\u6709\u6548\u6027\u548c\u6548\u7387\u3002|\n", "2409.15623": "|**2024-09-23**|**Safe Guard: an LLM-agent for Real-time Voice-based Hate Speech Detection in Social Virtual Reality**|Yiwen Xu et.al.|[2409.15623](http://arxiv.org/abs/2409.15623)|null|\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aSafe Guard\u7684LLM\u4ee3\u7406\uff0c\u7528\u4e8e\u68c0\u6d4b\u793e\u4ea4VR\uff08VRChat\uff09\u4e2d\u7684\u8bed\u97f3\u4ea4\u4e92\u4e2d\u7684\u4ec7\u6068\u8a00\u8bba\u3002\u6211\u4eec\u7684\u7cfb\u7edf\u5229\u7528\u4e86Open AI 
GPT\u548c\u97f3\u9891\u7279\u5f81\u63d0\u53d6\u6280\u672f\uff0c\u5b9e\u73b0\u4e86\u5b9e\u65f6\u8bed\u97f3\u4ea4\u4e92\u7684\u68c0\u6d4b\u529f\u80fd\u3002\u6211\u4eec\u8d21\u732e\u4e86\u4e00\u4e2a\u7cfb\u7edf\u8bbe\u8ba1\u4ee5\u53ca\u5bf9\u8be5\u7cfb\u7edf\u7684\u8bc4\u4f30\uff0c\u8fd9\u4e9b\u90fd\u8bc1\u660e\u4e86\u6211\u4eec\u65b9\u6cd5\u5728\u68c0\u6d4b\u4ec7\u6068\u8a00\u8bba\u65b9\u9762\u7684\u6709\u6548\u6027\uff0c\u5e76\u4e14\u76f8\u6bd4\u73b0\u6709\u65b9\u6cd5\u663e\u8457\u964d\u4f4e\u4e86\u8bef\u62a5\u7387\u3002\u6211\u4eec\u7684\u7ed3\u679c\u8868\u660e\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u5728\u521b\u5efa\u66f4\u5b89\u5168\u7684\u865a\u62df\u73af\u5883\u65b9\u9762\u5177\u6709\u6f5c\u529b\uff0c\u5e76\u4e3a\u8fdb\u4e00\u6b65\u53d1\u5c55\u57fa\u4e8eLLM\u7684\u7ba1\u7406\u65b9\u6cd5\u5960\u5b9a\u4e86\u57fa\u7840\u3002|\n", "2409.14913": "|**2024-09-25**|**Towards a Realistic Long-Term Benchmark for Open-Web Research Agents**|Peter M\u00fchlbacher et.al.|[2409.14913](http://arxiv.org/abs/2409.14913)|null|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u9879\u5373\u5c06\u63a8\u51fa\u7684\u57fa\u51c6\u6d4b\u8bd5\uff0c\u7528\u4e8e\u8bc4\u4f30\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4ee3\u7406\u5728\u7ecf\u6d4e\u4ef7\u503c\u9ad8\u7684\u767d\u9886\u4efb\u52a1\u4e0a\u7684\u8868\u73b0\u3002\u6211\u4eec\u5bf9\u91d1\u878d\u548c\u54a8\u8be2\u9886\u57df\u5e38\u89c4\u8fdb\u884c\u7684\u3001\u73b0\u5b9e\u4e16\u754c\u4e2d\u7684\u201c\u6742\u4e71\u201d\u5f00\u653e\u7f51\u7edc\u7814\u7a76\u4efb\u52a1\u8fdb\u884c\u4e86\u8bc4\u4f30\u3002\u8fd9\u6837\u505a\uff0c\u6211\u4eec\u4e3a\u5efa\u7acb\u4e00\u4e2aLLM\u4ee3\u7406\u8bc4\u4f30\u5957\u4ef6\u5960\u5b9a\u4e86\u57fa\u7840\uff0c\u5728\u8be5\u5957\u4ef6\u4e2d\uff0c\u826f\u597d\u7684\u6027\u80fd\u76f4\u63a5\u5bf9\u5e94\u7740\u5de8\u5927\u7684\u7ecf\u6d4e\u548c\u793e\u4f1a\u5f71\u54cd\u3002\u6211\u4eec\u6784\u5efa\u5e76\u6d4b\u8bd5\u4e86\u591a\u4e2a\u4ee3\u7406\u67b6\u6784\uff0c\u5305\u62eco1-preview\u3001GPT-4o\u3001Claude-3.5 
Sonnet\u3001Llama 3.1\uff08405b\uff09\u4ee5\u53caGPT-4o-mini\u3002\u5e73\u5747\u800c\u8a00\uff0c\u4f7f\u7528Claude-3.5 Sonnet\u548co1-preview\u7684LLM\u4ee3\u7406\u5728\u6027\u80fd\u4e0a\u660e\u663e\u4f18\u4e8e\u4f7f\u7528GPT-4o\u7684\u4ee3\u7406\uff0c\u800c\u57fa\u4e8eLlama 3.1\uff08405b\uff09\u548cGPT-4o-mini\u7684\u4ee3\u7406\u5219\u843d\u540e\u5f88\u591a\u3002\u5728\u6240\u6709LLM\u4e2d\uff0c\u5177\u6709\u59d4\u6258\u5b50\u4efb\u52a1\u7ed9\u5b50\u4ee3\u7406\u80fd\u529b\u7684ReAct\u67b6\u6784\u8868\u73b0\u6700\u4f73\u3002\u9664\u4e86\u5b9a\u91cf\u8bc4\u4f30\u4e4b\u5916\uff0c\u6211\u4eec\u8fd8\u901a\u8fc7\u68c0\u67e5\u4ee3\u7406\u7684\u8ffd\u8e2a\u8bb0\u5f55\u548c\u53cd\u601d\u5b83\u4eec\u7684\u89c2\u5bdf\u7ed3\u679c\uff0c\u5bf9\u4ee3\u7406\u7684\u80fd\u529b\u8fdb\u884c\u4e86\u5b9a\u6027\u8bc4\u4f30\u3002\u6211\u4eec\u7684\u8bc4\u4f30\u4ee3\u8868\u4e86\u9996\u6b21\u6df1\u5165\u8bc4\u4f30\u4ee3\u7406\u5728\u771f\u5b9e\u5f00\u653e\u7f51\u7edc\u4e0a\u6267\u884c\u5177\u6709\u6311\u6218\u6027\u7684\u3001\u7ecf\u6d4e\u4e0a\u6709\u4ef7\u503c\u7684\u5206\u6790\u5e08\u5f0f\u7814\u7a76\u7684\u80fd\u529b\u3002|\n", "2409.14807": "|**2024-09-23**|**Interpreting Multi-band Galaxy Observations with Large Language Model-Based Agents**|Zechang Sun 
et.al.|[2409.14807](http://arxiv.org/abs/2409.14807)|null|\u672c\u6587\u5c55\u793a\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4e3a\u57fa\u7840\u7684\u667a\u80fd\u4f53\u5982\u4f55\u52a0\u901f\u5929\u6587\u5b66\u7814\u7a76\u6d41\u7a0b\uff0c\u901a\u8fc7\u6a21\u4eff\u4eba\u7c7b\u63a8\u7406\u6765\u89e3\u91ca\u591a\u6ce2\u6bb5\u661f\u7cfb\u89c2\u6d4b\u6570\u636e\u3002\u6211\u4eec\u63d0\u51fa\u4e86mephisto\u6846\u67b6\uff0c\u5b83\u80fd\u591f\u4e0eCIGALE\u4ee3\u7801\u5e93\u534f\u4f5c\uff0c\u540e\u8005\u5305\u542b\u4e86\u7528\u4e8e\u89e3\u91ca\u89c2\u6d4b\u6570\u636e\u7684\u5149\u8c31\u80fd\u91cf\u5206\u5e03\uff08SED\uff09\u6a21\u578b\u3002\u5728\u5f00\u653e\u4e16\u754c\u73af\u5883\u4e2d\uff0cmephisto\u901a\u8fc7\u81ea\u6211\u535a\u5f08\u7ecf\u9a8c\u5b66\u4e60\u3001\u6267\u884c\u6811\u641c\u7d22\u5e76\u79ef\u7d2f\u52a8\u6001\u66f4\u65b0\u7684\u77e5\u8bc6\u57fa\u7840\u3002\u4f5c\u4e3a\u6982\u5ff5\u9a8c\u8bc1\uff0c\u6211\u4eec\u5c06mephisto\u5e94\u7528\u4e8e\u8a79\u59c6\u65af\u97e6\u4f2f\u592a\u7a7a\u671b\u8fdc\u955c\u7684\u6700\u65b0\u6570\u636e\u96c6\u3002\u7ed3\u679c\u8868\u660e\uff0cmephisto\u5728\u63a8\u7406\u661f\u7cfb\u7269\u7406\u573a\u666f\u65b9\u9762\u8fbe\u5230\u4e86\u63a5\u8fd1\u4eba\u7c7b\u7684\u4e13\u4e1a\u6c34\u5e73\uff0c\u751a\u81f3\u5728\u5904\u7406\u65b0\u53d1\u73b0\u7684\u201c\u5c0f\u7ea2\u70b9\u201d\u661f\u7cfb\u65f6\u4e5f\u662f\u5982\u6b64\u3002\u8fd9\u662f\u667a\u80fd\u4f53\u8fdb\u884c\u5929\u6587\u5b66\u7814\u7a76\u7684\u9996\u6b21\u5c55\u793a\uff0c\u671d\u7740\u901a\u8fc7\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4ee3\u7406\u5b9e\u73b0\u7aef\u5230\u7aef\u7814\u7a76\u7684\u65b9\u5411\u8fc8\u8fdb\uff0c\u53ef\u80fd\u6709\u52a9\u4e8e\u52a0\u5feb\u5929\u6587\u53d1\u73b0\u7684\u901f\u5ea6\u3002|\n", "2409.14488": "|**2024-09-22**|**Enhancing LLM-based Autonomous Driving Agents to Mitigate Perception Attacks**|Ruoyu Song 
et.al.|[2409.14488](http://arxiv.org/abs/2409.14488)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4e0e\u81ea\u52a8\u9a7e\u9a76\uff08AD\uff09\u7cfb\u7edf\u96c6\u6210\u7684\u65e5\u76ca\u589e\u957f\u7684\u5174\u8da3\uff0cAD\u7cfb\u7edf\u9762\u4e34\u7740\u653b\u51fb\u5176\u5bf9\u8c61\u68c0\u6d4b\u4e0e\u8ffd\u8e2a\uff08ODT\uff09\u529f\u80fd\u7684\u98ce\u9669\u3002\u6211\u4eec\u7684\u8bc4\u4f30\u8868\u660e\uff0c\u9488\u5bf9\u56db\u4e2a\u8fd1\u671f\u63d0\u51fa\u7684LLM\u4ee3\u7406\u7684ODT\u653b\u51fb\u6210\u529f\u7387\u8fbe\u523063.26%\uff0c\u5bfc\u81f4\u5b83\u4eec\u5d29\u6e83\u6216\u8fdd\u53cd\u4ea4\u901a\u89c4\u5219\uff0c\u539f\u56e0\u5728\u4e8e\u8bef\u5bfc\u6027\u8bb0\u5fc6\u6a21\u5757\u63d0\u4f9b\u7684\u8fc7\u5f80\u7ecf\u9a8c\u3001\u63d0\u793a\u5728\u8bc6\u522b\u4e0d\u4e00\u81f4\u6027\u65b9\u9762\u7684\u5c40\u9650\u6027\u4ee5\u53ca\u5bf9\u5730\u9762\u5b9e\u51b5\u611f\u77e5\u6570\u636e\u7684\u4f9d\u8d56\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aHudson\u7684\u9a7e\u9a76\u63a8\u7406\u4ee3\u7406\uff0c\u5b83\u6269\u5c55\u4e86\u5148\u524d\u57fa\u4e8eLLM\u7684\u9a7e\u9a76\u7cfb\u7edf\uff0c\u65e8\u5728\u5728\u611f\u77e5\u653b\u51fb\u671f\u95f4\u5b9e\u73b0\u66f4\u5b89\u5168\u7684\u51b3\u7b56\u5236\u5b9a\uff0c\u540c\u65f6\u5728\u6b63\u5e38\u6761\u4ef6\u4e0b\u4fdd\u6301\u6709\u6548\u6027\u3002 
Hudson\u901a\u8fc7\u9996\u5148\u5bf9AD\u8f6f\u4ef6\u8fdb\u884c\u4eea\u5668\u5316\u6536\u96c6\u5b9e\u65f6\u611f\u77e5\u7ed3\u679c\u548c\u9a7e\u9a76\u573a\u666f\u7684\u4e0a\u4e0b\u6587\u4fe1\u606f\u6765\u5b9e\u73b0\u8fd9\u4e00\u76ee\u6807\u3002\u8fd9\u4e9b\u6570\u636e\u968f\u540e\u88ab\u8f6c\u5316\u4e3a\u9886\u57df\u7279\u5b9a\u8bed\u8a00\uff08DSL\uff09\u3002\u4e3a\u4e86\u5f15\u5bfcLLM\u5728ODT\u653b\u51fb\u671f\u95f4\u68c0\u6d4b\u5e76\u505a\u51fa\u5b89\u5168\u63a7\u5236\u51b3\u7b56\uff0cHudson\u5c06DSL\u8f6c\u6362\u4e3a\u81ea\u7136\u8bed\u8a00\uff0c\u5e76\u9644\u5e26\u4e00\u7ec4\u81ea\u5b9a\u4e49\u7684\u653b\u51fb\u68c0\u6d4b\u6307\u4ee4\u3002\u6267\u884c\u67e5\u8be2\u540e\uff0cHudson\u5206\u6790LLM\u7684\u63a7\u5236\u51b3\u7b56\u4ee5\u7406\u89e3\u5176\u56e0\u679c\u63a8\u7406\u8fc7\u7a0b\u3002 \u6211\u4eec\u4f7f\u7528\u79c1\u6709LLM\uff08GPT-4\uff09\u3001\u4e24\u4e2a\u5f00\u6e90LLM\uff08Llama\u548cGemma\uff09\u548c\u5404\u79cd\u5bf9\u6297\u6027\u9a7e\u9a76\u60c5\u666f\u5bf9Hudson\u7684\u6709\u6548\u6027\u8fdb\u884c\u4e86\u8bc4\u4f30\u3002GPT-4\u3001Llama\u548cGemma\u5728\u5e73\u5747\u60c5\u51b5\u4e0b\u5b9e\u73b0\u4e8683.3%\u300163.6%\u548c73.6%\u7684\u653b\u51fb\u68c0\u6d4b\u51c6\u786e\u7387\u3002\u56e0\u6b64\uff0c\u572886.4%\u300173.9%\u548c80%\u7684\u653b\u51fb\u4e2d\uff0c\u5b83\u4eec\u505a\u51fa\u4e86\u5b89\u5168\u63a7\u5236\u51b3\u7b56\u3002\u968f\u7740\u5c06LLM\u96c6\u6210\u5230AD\u7cfb\u7edf\u4e2d\u7684\u5174\u8da3\u589e\u957f\uff0c\u6211\u4eec\u7684\u7ed3\u679c\u5f3a\u8c03\u4e86LLM\u7684\u4f18\u52bf\u53ca\u5176\u5728\u68c0\u6d4b\u548c\u7f13\u89e3ODT\u653b\u51fb\u65b9\u9762\u7684\u6f5c\u529b\u3002|\n", "2409.13642": "|**2024-09-20**|**Enhancing Fault Localization Through Ordered Code Analysis with LLM Agents and Self-Reflection**|Md Nakhla Rafi 
et.al.|[2409.13642](http://arxiv.org/abs/2409.13642)|null|\u5728\u8f6f\u4ef6\u5f00\u53d1\u8fc7\u7a0b\u4e2d\uff0c\u5b9a\u4f4d\u548c\u4fee\u590d\u8f6f\u4ef6\u6545\u969c\u662f\u4e00\u4e2a\u8017\u65f6\u4e14\u8d44\u6e90\u5bc6\u96c6\u578b\u7684\u4efb\u52a1\u3002\u4f20\u7edf\u7684\u6545\u969c\u5b9a\u4f4d\u65b9\u6cd5\uff0c\u5982\u57fa\u4e8e\u9891\u8c31\u7684\u6545\u969c\u5b9a\u4f4d\uff08SBFL\uff09\uff0c\u4f9d\u8d56\u4e8e\u6d4b\u8bd5\u8986\u76d6\u7387\u6570\u636e\u7684\u7edf\u8ba1\u5206\u6790\uff0c\u4f46\u5f80\u5f80\u51c6\u786e\u6027\u8f83\u4f4e\u3002\u57fa\u4e8e\u5b66\u4e60\u7684\u6280\u672f\u867d\u7136\u66f4\u6709\u6548\uff0c\u4f46\u9700\u8981\u5927\u91cf\u7684\u8bad\u7ec3\u6570\u636e\uff0c\u5e76\u4e14\u8ba1\u7b97\u6210\u672c\u9ad8\u6602\u3002\u6700\u8fd1\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u8fdb\u6b65\u4e3a\u6539\u5584\u6545\u969c\u5b9a\u4f4d\u63d0\u4f9b\u4e86\u6709\u524d\u666f\u7684\u65b9\u6cd5\uff0c\u901a\u8fc7\u589e\u5f3a\u4ee3\u7801\u7406\u89e3\u548c\u63a8\u7406\u6765\u63d0\u5347\u6027\u80fd\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u57fa\u4e8eLLM\u7684\u6280\u672f\u4ecd\u7136\u9762\u4e34\u6311\u6218\uff0c\u5305\u62ec\u4ee4\u724c\u9650\u5236\u3001\u957f\u8f93\u5165\u6027\u80fd\u4e0b\u964d\u4ee5\u53ca\u5904\u7406\u6d89\u53ca\u591a\u4e2a\u76f8\u4e92\u4f5c\u7528\u7ec4\u4ef6\u7684\u590d\u6742\u7cfb\u7edf\u65f6\u7684\u56f0\u96be\u3002 
\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aLLM4FL\u7684\u521b\u65b0\u6027\u57fa\u4e8eLLM\u4ee3\u7406\u7684\u6545\u969c\u5b9a\u4f4d\u65b9\u6cd5\uff0c\u5b83\u7ed3\u5408\u4e86SBFL\u6392\u540d\u4e0e\u5206\u800c\u6cbb\u4e4b\u7b56\u7565\u3002\u901a\u8fc7\u5c06\u5927\u89c4\u6a21\u8986\u76d6\u6570\u636e\u5206\u89e3\u4e3a\u53ef\u7ba1\u7406\u7684\u7ec4\uff0c\u5e76\u5229\u7528\u591a\u4e2aLLM\u4ee3\u7406\u901a\u8fc7\u63d0\u793a\u94fe\u5f0f\u8c03\u7528\uff0cLLM4FL\u6709\u6548\u5730\u5bfc\u822a\u4ee3\u7801\u5e93\u5e76\u5b9a\u4f4d\u6545\u969c\u3002\u8be5\u65b9\u6cd5\u8fd8\u6574\u5408\u4e86\u81ea\u6211\u53cd\u601d\u548c\u94fe\u5f0f\u601d\u8003\u63a8\u7406\uff0c\u4f7f\u4ee3\u7406\u80fd\u591f\u8fed\u4ee3\u751f\u6210\u4fee\u590d\u5e76\u91cd\u65b0\u6392\u540d\u53ef\u7591\u65b9\u6cd5\u3002\u6211\u4eec\u4f7f\u7528Defects4J\uff08V2.0.0\uff09\u57fa\u51c6\u8fdb\u884c\u8bc4\u4f30\uff0c\u5176\u4e2d\u5305\u62ec\u6765\u81ea14\u4e2a\u5f00\u6e90Java\u9879\u76ee\u7684675\u4e2a\u771f\u5b9e\u4e16\u754c\u6545\u969c\u3002\u7ed3\u679c\u663e\u793a\uff0cLLM4FL\u5728Top-1\u51c6\u786e\u7387\u4e0a\u6bd4AutoFL\u9ad8\u51fa19.27%\uff0c\u5e76\u4e14\u4f18\u4e8e\u6700\u5148\u8fdb\u7684\u76d1\u7763\u6280\u672f\uff0c\u5982DeepFL\u548cGrace\uff0c\u6240\u6709\u8fd9\u4e9b\u90fd\u65e0\u9700\u7279\u5b9a\u4efb\u52a1\u7684\u8bad\u7ec3\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f3a\u8c03\u4e86\u8986\u76d6\u62c6\u5206\u548c\u63d0\u793a\u94fe\u5bf9\u6545\u969c\u5b9a\u4f4d\u6027\u80fd\u7684\u5f71\u54cd\uff0c\u5e76\u5c55\u793a\u4e86\u4e0d\u540c\u7684\u65b9\u6cd5\u6392\u5e8f\u53ef\u4ee5\u63d0\u9ad8Top-1\u51c6\u786e\u7387\u9ad8\u8fbe22%\u3002|\n", "2409.13447": "|**2024-09-23**|**AQA: Adaptive Question Answering in a Society of LLMs via Contextual Multi-Armed Bandit**|Mohanna Hoveyda 
et.al.|[2409.13447](http://arxiv.org/abs/2409.13447)|null|\u5728\u95ee\u7b54\uff08QA\uff09\u9886\u57df\uff0c\u4e0d\u540c\u7684\u95ee\u9898\u53ef\u80fd\u9700\u8981\u4e0d\u540c\u7684\u56de\u7b54\u7b56\u7565\u6765\u6709\u6548\u89e3\u51b3\u3002\u4e00\u4e9b\u95ee\u9898\u53ef\u4ee5\u901a\u8fc7\u7b80\u5355\u7684\u67e5\u627e\u6765\u89e3\u51b3\uff0c\u800c\u53e6\u4e00\u4e9b\u5219\u9700\u8981\u590d\u6742\u7684\u3001\u591a\u6b65\u9aa4\u7684\u63a8\u7406\u3002\u8fd9\u4e00\u89c2\u5bdf\u7ed3\u679c\u6fc0\u53d1\u4e86\u5f00\u53d1\u4e00\u79cd\u52a8\u6001\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u80fd\u591f\u4e3a\u6bcf\u4e2a\u95ee\u9898\u9002\u5f53\u5730\u9009\u62e9\u6700\u5408\u9002\u7684QA\u7b56\u7565\uff0c\u4ece\u800c\u6784\u5efa\u66f4\u9ad8\u6548\u3001\u66f4\u6709\u6548\u7684\u7cfb\u7edf\uff0c\u80fd\u591f\u5904\u7406\u66f4\u5e7f\u6cdb\u7c7b\u578b\u7684\u95ee\u9898\u3002\u4e3a\u4e86\u5b9e\u73b0\u8fd9\u4e00\u76ee\u6807\uff0c\u6211\u4eec\u57fa\u4e8e\u591a\u4e2a\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u96c6\u6210\u6700\u65b0\u8fdb\u5c55\uff0c\u5e76\u5c06\u9002\u5e94\u6027QA\u5b9a\u4e49\u4e3a\u4e00\u4e2a\u52a8\u6001\u7f16\u6392\u6311\u6218\u3002\u6211\u4eec\u5c06\u6b64\u89c6\u4e3a\u4e00\u4e2a\u4e0a\u4e0b\u6587\u591a\u81c2\u8001\u864e\u673a\u95ee\u9898\uff0c\u5176\u4e2d\u4e0a\u4e0b\u6587\u7531\u8fdb\u5165\u95ee\u9898\u7684\u7279\u6027\u5b9a\u4e49\uff0c\u800c\u52a8\u4f5c\u7a7a\u95f4\u5305\u62ec\u6f5c\u5728\u7684LLM\u4ee3\u7406\u4e4b\u95f4\u7684\u901a\u4fe1\u56fe\u914d\u7f6e\u3002\u7136\u540e\uff0c\u6211\u4eec\u8bad\u7ec3\u4e86\u4e00\u4e2a\u7ebf\u6027\u4e0a\u754c\u4fe1\u5fc3\u8fb9\u754c\u6a21\u578b\uff0c\u4ee5\u5b66\u4e60\u4e0d\u540c\u95ee\u9898\u7c7b\u578b\u4e0e\u5176\u5bf9\u5e94\u7684\u6700\u4f73\u591aLLM\u901a\u4fe1\u56fe\u8868\u793a\u4e4b\u95f4\u7684\u6700\u4f18\u6620\u5c04\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u8868\u660e\uff0c\u63d0\u51fa\u7684\u89e3\u51b3\u65b9\u6848\u9002\u7528\u4e8e\u9002\u5e94\u6027\u7684LLM\u96c6\u6210\u95ee\u7b54\u7cfb\u7edf\u7684\u7f16\u6392\uff0c\u5
b83\u7ed3\u5408\u4e86\u66f4\u590d\u6742\u7b56\u7565\u7684\u4f18\u8d8a\u6027\u80fd\uff0c\u540c\u65f6\u907f\u514d\u4e86\u5728\u7b80\u5355\u7b56\u7565\u8db3\u4ee5\u7684\u60c5\u51b5\u4e0b\u4f7f\u7528\u8fd9\u4e9b\u7b56\u7565\u7684\u6210\u672c\u3002|\n", "2409.15376": "|**2024-09-20**|**ControlMath: Controllable Data Generation Promotes Math Generalist Models**|Nuo Chen et.al.|[2409.15376](http://arxiv.org/abs/2409.15376)|null|\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u8fdb\u884c\u6570\u636e\u589e\u5f3a\u5728\u6570\u5b66\u63a8\u7406\u65b9\u9762\u53d6\u5f97\u4e86\u4ee4\u4eba\u9f13\u821e\u7684\u7ed3\u679c\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u5728\u95ee\u9898\u591a\u6837\u6027\u65b9\u9762\u5b58\u5728\u9650\u5236\uff0c\u53ef\u80fd\u4ec5\u5c40\u9650\u4e8e\u7279\u5b9a\u9886\u57df\u7684\u6570\u636e\u751f\u6210\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aControlMath\u7684\u8fed\u4ee3\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u5305\u542b\u4e00\u4e2a\u65b9\u7a0b\u5f0f\u751f\u6210\u6a21\u5757\u548c\u4e24\u4e2a\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u3002\u8be5\u6a21\u5757\u4ea7\u751f\u591a\u6837\u5316\u7684\u65b9\u7a0b\uff0c\u95ee\u9898\u521b\u9020\u8005\u4ee3\u7406\u968f\u540e\u5c06\u5176\u8f6c\u5316\u4e3a\u6570\u5b66\u6587\u5b57\u95ee\u9898\u3002\u9006\u5411\u4ee3\u7406\u5219\u7b5b\u9009\u5e76\u9009\u62e9\u9ad8\u8d28\u91cf\u7684\u6570\u636e\uff0c\u9075\u5faa\u201c\u5c11\u5373\u662f\u591a\u201d\u7684\u539f\u5219\uff0c\u4f7f\u7528\u66f4\u5c11\u7684\u6570\u636e\u70b9\u5c31\u80fd\u5b9e\u73b0\u66f4\u597d\u7684\u7ed3\u679c\u3002\u8fd9\u79cd\u65b9\u6cd5\u80fd\u591f\u751f\u6210\u591a\u6837\u5316\u7684\u6570\u5b66\u95ee\u9898\uff0c\u4e0d\u53d7\u7279\u5b9a\u9886\u57df\u6216\u5206\u5e03\u7684\u9650\u5236\u3002 
\u56e0\u6b64\uff0c\u6211\u4eec\u6536\u96c6\u4e86ControlMathQA\u6570\u636e\u96c6\uff0c\u5305\u542b19\u4e07\u4e2a\u6570\u5b66\u6587\u5b57\u95ee\u9898\u3002\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u7ed3\u679c\u8bc1\u660e\uff0c\u5c06\u6211\u4eec\u7684\u6570\u636e\u96c6\u4e0eGSM8K\u7b49\u5185\u90e8\u9886\u57df\u6570\u636e\u96c6\u7ed3\u5408\uff0c\u53ef\u4ee5\u5e2e\u52a9\u63d0\u9ad8\u6a21\u578b\u5728\u6570\u5b66\u63a8\u7406\u65b9\u9762\u7684\u6cdb\u5316\u80fd\u529b\uff0c\u4ece\u800c\u5728\u7279\u5b9a\u9886\u57df\u5185\u4ee5\u53ca\u8d85\u51fa\u7279\u5b9a\u9886\u57df\u65f6\u90fd\u80fd\u53d6\u5f97\u66f4\u597d\u7684\u6027\u80fd\u3002|\n", "2409.13107": "|**2024-09-24**|**Towards Robust Automation of Surgical Systems via Digital Twin-based Scene Representations from Foundation Models**|Hao Ding et.al.|[2409.13107](http://arxiv.org/abs/2409.13107)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u6570\u5b57\u5b6a\u751f\u7684\u673a\u5668\u611f\u77e5\u65b9\u6cd5\uff0c\u65e8\u5728\u5229\u7528\u8fd1\u671f\u89c6\u89c9\u57fa\u7840\u6a21\u578b\u7684\u4ee4\u4eba\u4fe1\u670d\u7684\u8868\u73b0\u548c\u5f00\u7bb1\u5373\u7528\u7684\u6cdb\u5316\u80fd\u529b\u3002\u8be5\u65b9\u6cd5\u901a\u8fc7\u7ed3\u5408\u6570\u5b57\u5b6a\u751f\u7684\u573a\u666f\u8868\u793a\u548c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4ee3\u7406\u8fdb\u884c\u89c4\u5212\uff0c\u4e0edVRK\u5e73\u53f0\u96c6\u6210\uff0c\u4ece\u800c\u5f00\u53d1\u51fa\u4e00\u4e2a\u5177\u6709\u5f3a\u5927\u4efb\u52a1\u6027\u80fd\u548c\u5728\u4e0d\u540c\u73af\u5883\u8bbe\u7f6e\u4e0b\u901a\u7528\u6027\u7684\u5b9e\u4f53\u667a\u80fd\u7cfb\u7edf\u3002\u5728\u6267\u884c\u7a7f\u9488\u79fb\u4f4d\u548c\u7eb1\u5e03\u68c0\u7d22\u4efb\u52a1\u65f6\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u663e\u793a\u51fa\u5f3a\u5927\u7684\u4efb\u52a1\u6027\u80fd\u548c\u901a\u7528\u6027\u3002 
\u5c3d\u7ba1\u8868\u73b0\u51fa\u4ee4\u4eba\u4fe1\u670d\u7684\u8868\u73b0\uff0c\u4f46\u672c\u6587\u7684\u5de5\u4f5c\u4ec5\u4ec5\u662f\u5bf9\u57fa\u4e8e\u6570\u5b57\u5b6a\u751f\u7684\u573a\u666f\u8868\u793a\u96c6\u6210\u7684\u7b2c\u4e00\u6b65\u3002\u4e3a\u4e86\u5b9e\u73b0\u5168\u9762\u7684\u6570\u5b57\u5b6a\u751f\u6846\u67b6\u4ee5\u6539\u5584\u624b\u672f\u9886\u57df\u5b9e\u4f53\u667a\u80fd\u7684\u53ef\u89e3\u91ca\u6027\u548c\u901a\u7528\u6027\uff0c\u672a\u6765\u7684\u7814\u7a76\u662f\u5fc5\u8981\u7684\u3002|\n", "2409.17515": "|**2024-09-26**|**From News to Forecast: Integrating Event Analysis in LLM-Based Time Series Forecasting with Reflection**|Xinlei Wang et.al.|[2409.17515](http://arxiv.org/abs/2409.17515)|**[link](https://github.com/ameliawong1996/From_News_to_Forecast)**|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u65e8\u5728\u901a\u8fc7\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u548c\u751f\u6210\u4ee3\u7406\u6765\u589e\u5f3a\u65f6\u95f4\u5e8f\u5217\u9884\u6d4b\u3002\u4ee5\u8bed\u8a00\u4f5c\u4e3a\u5a92\u4ecb\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u9002\u5e94\u6027\u5730\u5c06\u5404\u79cd\u793e\u4f1a\u4e8b\u4ef6\u6574\u5408\u8fdb\u9884\u6d4b\u6a21\u578b\u4e2d\uff0c\u5c06\u65b0\u95fb\u5185\u5bb9\u4e0e\u65f6\u95f4\u5e8f\u5217\u6ce2\u52a8\u5bf9\u9f50\uff0c\u4ece\u800c\u63d0\u4f9b\u4e30\u5bcc\u6d1e\u5bdf\u3002\u5177\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u5229\u7528\u57fa\u4e8e\u8bed\u8a00\u6a21\u578b\u7684\u4ee3\u7406\u8fdb\u884c\u8fed\u4ee3\u7b5b\u9009\uff0c\u53bb\u9664\u65e0\u5173\u65b0\u95fb\uff0c\u5e76\u91c7\u7528\u7c7b\u4f3c\u4eba\u7c7b\u7684\u63a8\u7406\u548c\u53cd\u601d\u6765\u8bc4\u4f30\u9884\u6d4b\u7ed3\u679c\u3002\u8fd9\u4f7f\u5f97\u6211\u4eec\u7684\u6a21\u578b\u80fd\u591f\u5206\u6790\u590d\u6742\u4e8b\u4ef6\uff0c\u5982\u610f\u5916\u4e8b\u4ef6\u548c\u793e\u4f1a\u884c\u4e3a\u8f6c\u53d8\uff0c\u5e76\u4e0d\u65ad\u4f18\u5316\u9009\u62e9\u903b\u8f91\u4ee5\u53ca\u4ee3\u7406\u8f93\u51fa\u7684\u7a33\u5065\u6027\u3002\u901a\u8
fc7\u7ed3\u5408\u7cbe\u9009\u65b0\u95fb\u548c\u65f6\u95f4\u5e8f\u5217\u6570\u636e\uff0c\u6211\u4eec\u5bf9\u9884\u8bad\u7ec3\u7684LLaMa2\u6a21\u578b\u8fdb\u884c\u5fae\u8c03\u3002\u7ed3\u679c\u663e\u793a\uff0c\u5728\u51c6\u786e\u6027\u65b9\u9762\u6709\u663e\u8457\u63d0\u5347\uff0c\u8fd9\u8868\u660e\u901a\u8fc7\u6709\u6548\u5229\u7528\u975e\u7ed3\u6784\u5316\u65b0\u95fb\u6570\u636e\uff0c\u53ef\u80fd\u5728\u65f6\u95f4\u5e8f\u5217\u9884\u6d4b\u9886\u57df\u5b9e\u73b0\u8303\u5f0f\u8f6c\u53d8\u3002|\n", "2409.17266": "|**2024-09-25**|**AAPM: Large Language Model Agent-based Asset Pricing Models**|Junyan Cheng et.al.|[2409.17266](http://arxiv.org/abs/2409.17266)|**[link](https://github.com/chengjunyan1/aapm)**|**\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u578b\u7684\u8d44\u4ea7\u5b9a\u4ef7\u65b9\u6cd5\u2014\u2014\u57fa\u4e8eLLM\u4ee3\u7406\u7684\u8d44\u4ea7\u5b9a\u4ef7\u6a21\u578b\uff08AAPM\uff09\u3002\u8be5\u65b9\u6cd5\u5c06LLM\u4ee3\u7406\u7684\u5b9a\u6027\u4e3b\u89c2\u6295\u8d44\u5206\u6790\u4e0e\u5b9a\u91cf\u624b\u52a8\u91d1\u878d\u7ecf\u6d4e\u56e0\u7d20\u878d\u5408\uff0c\u4ee5\u9884\u6d4b\u8d85\u989d\u8d44\u4ea7\u56de\u62a5\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u7ec4\u5408\u4f18\u5316\u548c\u8d44\u4ea7\u5b9a\u4ef7\u8bef\u5dee\u65b9\u9762\u5747\u4f18\u4e8e\u57fa\u4e8e\u673a\u5668\u5b66\u4e60\u7684\u8d44\u4ea7\u5b9a\u4ef7\u57fa\u51c6\u3002\u5177\u4f53\u800c\u8a00\uff0c\u5f02\u5e38\u8d44\u4ea7\u7ec4\u5408\u7684\u590f\u666e\u6bd4\u7387\u548c\u5e73\u5747\u03b1\u503c\u5206\u522b\u63d0\u9ad8\u4e869.6%\u548c10.8%\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5bf9\u6a21\u578b\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u6d88\u878d\u7814\u7a76\uff0c\u5e76\u5bf9\u6570\u636e\u8fdb\u884c\u4e86\u6df1\u5165\u5206\u6790\uff0c\u4ee5\u63ed\u793a\u63d0\u51fa\u65b9\u6cd5\u7684\u66f4\u591a\u89c1\u89e3\u3002**|\n", "2409.20163": "|**2024-09-30**|**MemSim: A Bayesian Simulator for Evaluating Memory of LLM-based Personal Assistants**|Zeyu Zhang 
et.al.|[2409.20163](http://arxiv.org/abs/2409.20163)|**[link](https://github.com/nuster1128/memsim)**|**\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aMemSim\u7684\u8d1d\u53f6\u65af\u6a21\u62df\u5668\uff0c\u7528\u4e8e\u4ece\u751f\u6210\u7684\u7528\u6237\u6d88\u606f\u81ea\u52a8\u6784\u5efa\u53ef\u9760\u7684\u95ee\u9898\u4e0e\u7b54\u6848\uff08Q&A\uff09\uff0c\u540c\u65f6\u4fdd\u6301\u5176\u591a\u6837\u6027\u548c\u53ef\u6269\u5c55\u6027\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u8d1d\u53f6\u65af\u5173\u7cfb\u7f51\u7edc\uff08BRNet\uff09\u548c\u56e0\u679c\u751f\u6210\u673a\u5236\uff0c\u4ee5\u51cf\u8f7b\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5e7b\u89c9\u5bf9\u4e8b\u5b9e\u4fe1\u606f\u7684\u5f71\u54cd\uff0c\u4ece\u800c\u4fc3\u8fdb\u81ea\u52a8\u6784\u5efa\u8bc4\u4f30\u6570\u636e\u96c6\u3002\u57fa\u4e8eMemSim\uff0c\u6211\u4eec\u5728\u65e5\u5e38\u751f\u6d3b\u4e2d\u751f\u6210\u4e86\u4e00\u4e2a\u540d\u4e3aMemDaily\u7684\u6570\u636e\u96c6\uff0c\u5e76\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u5b9e\u9a8c\uff0c\u4ee5\u8bc4\u4f30\u6211\u4eec\u65b9\u6cd5\u7684\u6709\u6548\u6027\u3002\u6211\u4eec\u8fd8\u63d0\u4f9b\u4e86\u4f7f\u7528MemDaily\u6570\u636e\u96c6\u8bc4\u4f30LLM\u57fa\u667a\u80fd\u4f53\u4e0d\u540c\u8bb0\u5fc6\u673a\u5236\u7684\u57fa\u51c6\u3002\u4e3a\u4e86\u60e0\u53ca\u7814\u7a76\u793e\u533a\uff0c\u6211\u4eec\u5df2\u7ecf\u5728https://github.com/nuster1128/MemSim\u4e0a\u53d1\u5e03\u4e86\u6211\u4eec\u7684\u9879\u76ee\u3002**|\n", "2409.19894": "|**2024-10-01**|**TRANSAGENT: An LLM-Based Multi-Agent System for Code Translation**|Zhiqiang Yuan 
et.al.|[2409.19894](http://arxiv.org/abs/2409.19894)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aTRANSAGENT\u7684\u65b0\u578b\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u591a\u4ee3\u7406\u7cfb\u7edf\uff0c\u4ee5\u589e\u5f3a\u57fa\u4e8eLLM\u7684\u4ee3\u7801\u7ffb\u8bd1\u8fc7\u7a0b\uff0c\u5e76\u901a\u8fc7\u56db\u4e2a\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u534f\u540c\u5de5\u4f5c\u4fee\u590d\u8bed\u6cd5\u9519\u8bef\u548c\u8bed\u4e49\u9519\u8bef\u3002\u8fd9\u56db\u4e2a\u4ee3\u7406\u5206\u522b\u662f\u521d\u59cb\u4ee3\u7801\u7ffb\u8bd1\u5668\u3001\u8bed\u6cd5\u9519\u8bef\u4fee\u590d\u5668\u3001\u4ee3\u7801\u5bf9\u9f50\u5668\u548c\u8bed\u4e49\u9519\u8bef\u4fee\u590d\u5668\u3002TRANSAGENT\u7684\u6838\u5fc3\u6d1e\u5bdf\u662f\u9996\u5148\u6839\u636e\u76ee\u6807\u7a0b\u5e8f\u4e0e\u6e90\u7a0b\u5e8f\u4e4b\u95f4\u7684\u6267\u884c\u5bf9\u9f50\u5b9a\u4f4d\u76ee\u6807\u7a0b\u5e8f\u4e2d\u7684\u9519\u8bef\u4ee3\u7801\u5757\uff0c\u8fd9\u79cd\u65b9\u6cd5\u53ef\u4ee5\u7f29\u5c0f\u4fee\u590d\u8303\u56f4\u5e76\u964d\u4f4e\u4fee\u590d\u96be\u5ea6\u3002 \u4e3a\u4e86\u8bc4\u4f30TRANSAGENT\uff0c\u6211\u4eec\u9996\u5148\u4ece\u6700\u8fd1\u7684\u7f16\u7a0b\u4efb\u52a1\u6784\u5efa\u4e86\u4e00\u4e2a\u65b0\u7684\u57fa\u51c6\uff0c\u4ee5\u51cf\u8f7b\u6f5c\u5728\u7684\u6570\u636e\u6cc4\u9732\u95ee\u9898\u3002\u5728\u6211\u4eec\u7684\u57fa\u51c6\u4e0a\uff0cTRANSAGENT\u5728\u7ffb\u8bd1\u6548\u679c\u548c\u6548\u7387\u65b9\u9762\u90fd\u4f18\u4e8e\u6700\u65b0\u7684LLM\u57fa\u4ee3\u7801\u7ffb\u8bd1\u6280\u672fUniTrans\uff1b\u6b64\u5916\uff0c\u5728\u4e0d\u540cLLM\u4e0a\u7684\u8bc4\u4f30\u663e\u793a\u4e86TRANSAGENT\u7684\u4e00\u822c\u6027\uff0c\u5e76\u4e14\u6211\u4eec\u7684\u6d88\u878d\u7814\u7a76\u63ed\u793a\u4e86\u6bcf\u4e2a\u4ee3\u7406\u7684\u8d21\u732e\u3002|\n", "2410.01639": "|**2024-10-02**|**Moral Alignment for LLM Agents**|Elizaveta Tennant 
et.al.|[2410.01639](http://arxiv.org/abs/2410.01639)|null|\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u51b3\u7b56\u4ee3\u7406\u6b63\u8d8a\u6765\u8d8a\u591a\u5730\u5728\u4eba\u7c7b\u6d3b\u52a8\u7684\u4e0d\u540c\u9886\u57df\u90e8\u7f72\u3002\u867d\u7136\u5b83\u4eec\u7684\u5e94\u7528\u76ee\u524d\u8f83\u4e3a\u4e13\u4e1a\u5316\uff0c\u4f46\u5df2\u6709\u7814\u7a76\u52aa\u529b\u5f00\u53d1\u66f4\u901a\u7528\u7684\u4ee3\u7406\u3002\u968f\u7740LLM\u7cfb\u7edf\u53d8\u5f97\u66f4\u52a0\u81ea\u4e3b\uff0c\u5b83\u4eec\u5bf9\u4eba\u7c7b\u6d3b\u52a8\u7684\u5f71\u54cd\u5c06\u589e\u52a0\uff0c\u5e76\u4e14\u900f\u660e\u5ea6\u4f1a\u964d\u4f4e\u3002\u56e0\u6b64\uff0c\u53d1\u5c55\u6709\u6548\u7684\u65b9\u6cd5\u6765\u4f7f\u5b83\u4eec\u7b26\u5408\u4eba\u7c7b\u4ef7\u503c\u89c2\u81f3\u5173\u91cd\u8981\u3002 \u73b0\u6709\u7684\u5bf9\u9f50\u65b9\u6cd5\u901a\u5e38\u4f9d\u8d56\u4e8e\u4eba\u7c7b\u504f\u597d\u6570\u636e\uff08\u4f8b\u5982\uff0c\u5728RLHF\u6216DPO\u4e2d\uff09\uff0c\u5176\u4e2d\u4ef7\u503c\u89c2\u662f\u9690\u542b\u7684\uff0c\u5e76\u4e14\u672c\u8d28\u4e0a\u662f\u4ece\u4e0d\u540c\u6a21\u578b\u8f93\u51fa\u7684\u76f8\u5bf9\u504f\u597d\u4e2d\u63a8\u65ad\u51fa\u6765\u7684\u3002\u4e0e\u6b64\u76f8\u53cd\uff0c\u6211\u4eec\u5728\u8fd9\u9879\u5de5\u4f5c\u4e2d\u63d0\u51fa\u4e86\u4e00\u79cd\u8bbe\u8ba1\u5956\u52b1\u51fd\u6570\u7684\u65b9\u6cd5\uff0c\u8fd9\u4e9b\u51fd\u6570\u660e\u786e\u7f16\u7801\u4e86\u6838\u5fc3\u7684\u4eba\u7c7b\u4ef7\u503c\u89c2\uff0c\u7528\u4e8e\u5f3a\u5316\u5b66\u4e60\uff08RL\uff09\u65b9\u5f0f\u5fae\u8c03\u57fa\u7840\u4ee3\u7406\u6a21\u578b\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u4f7f\u7528\u5185\u5728\u5956\u52b1\u6765\u5b9e\u73b0LLM\u4ee3\u7406\u7684\u9053\u5fb7\u5bf9\u9f50\u3002 
\u6211\u4eec\u901a\u8fc7\u4f20\u7edf\u7684\u54f2\u5b66\u6846\u67b6\u2014\u2014\u4e49\u52a1\u8bba\u4f26\u7406\u548c\u529f\u5229\u4e3b\u4e49\u6765\u8bc4\u4f30\u6211\u4eec\u7684\u65b9\u6cd5\uff0c\u91cf\u5316\u4e86\u5728\u8fed\u4ee3\u56da\u5f92\u56f0\u5883\uff08IPD\uff09\u73af\u5883\u4e2d\u4ee3\u7406\u7684\u9053\u5fb7\u5956\u52b1\uff0c\u57fa\u4e8e\u5176\u884c\u4e3a\u53ca\u5176\u540e\u679c\u3002\u6211\u4eec\u8fd8\u5c55\u793a\u4e86\u5982\u4f55\u901a\u8fc7\u9053\u5fb7\u5fae\u8c03\u4f7f\u4ee3\u7406\u80fd\u591f\u653e\u5f03\u4e4b\u524d\u5f00\u53d1\u7684\u81ea\u79c1\u7b56\u7565\u3002\u6700\u540e\uff0c\u6211\u4eec\u53d1\u73b0\u67d0\u4e9b\u5728IPD\u6e38\u620f\u4e2d\u5b66\u4e60\u7684\u9053\u5fb7\u7b56\u7565\u80fd\u591f\u63a8\u5e7f\u5230\u591a\u4e2a\u77e9\u9635\u6e38\u620f\u73af\u5883\u3002\u603b\u4e4b\uff0c\u6211\u4eec\u8bc1\u660e\u4e86\u4f7f\u7528\u5185\u5728\u5956\u52b1\u8fdb\u884c\u5fae\u8c03\u662f\u5c06LLM\u4ee3\u7406\u4e0e\u4eba\u7c7b\u4ef7\u503c\u89c2\u5bf9\u9f50\u7684\u6709\u524d\u666f\u7684\u4e00\u822c\u89e3\u51b3\u65b9\u6848\uff0c\u5e76\u4e14\u53ef\u80fd\u4ee3\u8868\u4e86\u5f53\u524d\u4e3b\u6d41\u5bf9\u9f50\u6280\u672f\u66f4\u52a0\u900f\u660e\u548c\u6210\u672c\u6548\u76ca\u66f4\u9ad8\u7684\u66ff\u4ee3\u65b9\u6848\u3002|\n", "2410.01242": "|**2024-10-03**|**RGD: Multi-LLM Based Agent Debugger via Refinement and Generation Guidance**|Haolin Jin 
et.al.|[2410.01242](http://arxiv.org/abs/2410.01242)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u4e0a\u5c55\u73b0\u51fa\u4e86\u5de8\u5927\u7684\u6f5c\u529b\uff0c\u5e76\u4e14\u6700\u8fd1\u7684\u63d0\u793a\u5de5\u7a0b\u7814\u7a76\u8fdb\u4e00\u6b65\u589e\u5f3a\u4e86LLM\u5bf9\u6587\u672c\u4fe1\u606f\u7684\u7406\u89e3\u3002\u7136\u800c\uff0c\u786e\u4fdd\u751f\u6210\u4ee3\u7801\u7684\u51c6\u786e\u6027\u901a\u5e38\u9700\u8981\u7a0b\u5e8f\u5458\u8fdb\u884c\u5927\u91cf\u7684\u6d4b\u8bd5\u548c\u9a8c\u8bc1\u3002\u5c3d\u7ba1LLM\u80fd\u591f\u57fa\u4e8e\u4efb\u52a1\u63cf\u8ff0\u751f\u6210\u4ee3\u7801\uff0c\u4f46\u5728\u590d\u6742\u4efb\u52a1\u4e0a\u7684\u51c6\u786e\u5ea6\u4ecd\u7136\u6709\u9650\uff0c\u7279\u522b\u662f\u5bf9\u4e8e\u90a3\u4e9b\u9700\u8981\u66f4\u6df1\u5165\u7406\u89e3\u95ee\u9898\u9648\u8ff0\u548c\u4ee3\u7801\u751f\u6210\u8fc7\u7a0b\u7684\u4efb\u52a1\u3002\u8fd9\u4e00\u9650\u5236\u4e3b\u8981\u6e90\u4e8eLLM\u540c\u65f6\u9700\u8981\u7406\u89e3\u548c\u751f\u6210\u8bed\u6cd5\u548c\u8bed\u4e49\u4e0a\u6b63\u786e\u7684\u4ee3\u7801\uff0c\u800c\u6ca1\u6709\u80fd\u529b\u81ea\u52a8\u4f18\u5316\u4ee3\u7801\u7684\u80fd\u529b\u3002\u5728\u5b9e\u9645\u7684\u8f6f\u4ef6\u5f00\u53d1\u4e2d\uff0c\u7a0b\u5e8f\u5458\u5f88\u5c11\u80fd\u5728\u4ec5\u51ed\u4efb\u52a1\u63cf\u8ff0\u7684\u60c5\u51b5\u4e0b\u4e00\u6b21\u5c31\u751f\u6210\u5b8c\u7f8e\u7684\u4ee3\u7801\uff0c\u4ed6\u4eec\u4f9d\u8d56\u4e8e\u8fed\u4ee3\u53cd\u9988\u548c\u8c03\u8bd5\u6765\u5b8c\u5584\u4ed6\u4eec\u7684\u7a0b\u5e8f\u3002\u53d7\u6b64\u8fc7\u7a0b\u542f\u53d1\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u57fa\u4e8eLLM\u7684\u591a\u667a\u80fd\u4f53\u67b6\u6784\u7528\u4e8e\u4ee3\u7801\u751f\u6210\u548c\u81ea\u52a8\u8c03\u8bd5\uff1a\u6539\u8fdb\u4e0e\u6307\u5bfc\u8c03\u8bd5\uff08RGD\uff09\u3002RGD\u6846\u67b6\u662f\u4e00\u4e2a\u5229\u7528\u4e09\u79cd\u4e0d\u540cLLM\u4ee3\u7406\uff08\u5f15\u5bfc\u4ee3\u7406\u3001\u8c03\u8bd5\u4ee3\u7406\u548c\u53cd\u9988\u4ee3\u7406\uff
09\u7684\u591a\u667a\u80fd\u4f53\u8c03\u8bd5\u5668\uff0c\u5b83\u5c06\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u5206\u89e3\u4e3a\u591a\u4e2a\u6b65\u9aa4\uff0c\u786e\u4fdd\u4e86\u6e05\u6670\u7684\u5de5\u4f5c\u6d41\u7a0b\uff0c\u5e76\u5141\u8bb8\u57fa\u4e8e\u81ea\u6211\u53cd\u601d\u548c\u53cd\u9988\u7684\u4ee3\u7801\u8fed\u4ee3\u7ec6\u5316\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cRGD\u5728\u4ee3\u7801\u751f\u6210\u80fd\u529b\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u5206\u522b\u5728HumanEval\u6570\u636e\u96c6\u548cMBPP\u6570\u636e\u96c6\u4e0a\u76f8\u6bd4\u6700\u5148\u8fdb\u7684\u65b9\u6cd5\u548c\u4f20\u7edf\u76f4\u63a5\u63d0\u793a\u65b9\u6cd5\u5b9e\u73b0\u4e869.8%\u548c16.2%\u7684\u6027\u80fd\u63d0\u5347\u3002\u6211\u4eec\u5f3a\u8c03\u4e86RGD\u6846\u67b6\u5728\u589e\u5f3aLLM\u81ea\u4e3b\u751f\u6210\u548c\u4f18\u5316\u4ee3\u7801\u80fd\u529b\u65b9\u9762\u7684\u6709\u6548\u6027\u3002|\n", "2410.00467": "|**2024-10-01**|**Dynamic Planning for LLM-based Graphical User Interface Automation**|Shaoqing Zhang 
et.al.|[2410.00467](http://arxiv.org/abs/2410.00467)|**[link](https://github.com/sqzhang-lazy/d-pot)**|**The rise of large language models (LLMs) has sparked interest in developing autonomous LLM-based agents, particularly for smartphone graphical user interfaces (GUIs). Given a task goal, these agents typically emulate human actions in the GUI environment until the task is completed. A key challenge, however, is how to devise effective plans that guide action prediction in GUI tasks, even though planning is widely recognized as an effective way to decompose complex tasks. Specifically, because the GUI environment changes after each action is executed, the plan must be adjusted dynamically based on environmental feedback and action history. 
We find that the widely used ReAct approach fails because it relies on excessively long historical dialogues. To address this challenge, we propose a new approach for LLM-based GUI agents called Dynamic Planning of Thoughts (D-PoT). D-PoT dynamically adjusts its planning based on environmental feedback and execution history. Experimental results show that D-PoT significantly surpasses the strong GPT-4V baseline, improving accuracy by 12.7 percentage points (from 34.66% to 47.36%). The analysis shows that dynamic planning generalizes across different backbone LLMs and helps reduce hallucinations and adapt to unseen tasks. Code is available at https://github.com/sqzhang-lazy/D-PoT.**|\n", "2410.02742": "|**2024-10-03**|**Grounding Large Language Models In Embodied Environment With Imperfect World Models**|Haolan Liu et.al.|[2410.02742](http://arxiv.org/abs/2410.02742)|null|Despite their widespread success across many applications, large language models (LLMs) often struggle with basic physical reasoning or robotics tasks because they lack direct experience with the physical nuances of the real world. To address these issues, we propose Grounding Large Language Model with 
Imperfect World MOdel (GLIMO), a method that uses proxy world models such as simulators to collect and synthesize training data. GLIMO incorporates an LLM-based automatic data generator for creating high-quality, diverse instruction datasets. The generator includes an iterative self-refining module for temporally consistent experience sampling, a diverse set of question-answering instruction seeds, and a reflective augmented-generation module for reflecting on prior experiences. Comprehensive experiments show that our approach improves strong open-source LLMs such as LLaMA-3, with performance gains of 2.04x, 1.54x, and 1.82x on three different benchmarks, respectively. This performance matches or exceeds that of larger counterparts such as GPT-4.|\n", "2410.02644": "|**2024-10-03**|**Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents**|Hanrong Zhang et.al.|[2410.02644](http://arxiv.org/abs/2410.02644)|**[link](https://github.com/agiresearch/asb)**|**To fill the gap in the existing literature on comprehensively evaluating attacks on and defenses of agents based on large language models (LLMs), we propose a comprehensive framework called 
Agent Security Benchmark (ASB). The framework is designed to formalize, benchmark, and evaluate the security of LLM-based agents. It covers 10 application scenarios (such as e-commerce, autonomous driving, and finance), 10 agents targeting these scenarios, more than 400 tools, 23 different types of attack and defense methods, and 8 evaluation metrics. Based on ASB, we comprehensively benchmark 10 prompt injection attacks, a memory poisoning attack, a novel plan-of-thought backdoor attack, a mixed attack, and 10 corresponding defenses across 13 LLM backbones, for a total of nearly 90,000 test cases. Our benchmark results reveal critical security vulnerabilities at different stages of agent operation, including the system prompt, user prompt handling, tool usage, and memory retrieval, with the highest average attack success rate reaching 84.30%, while current defenses show limited effectiveness. This indicates that the community still has much work to do on agent security. The code for this study is available at https://github.com/agiresearch/ASB.**|\n", "2410.02551": "|**2024-10-03**|**ColaCare: Enhancing Electronic Health Record Modeling through Large Language Model-Driven 
Multi-Agent Collaboration**|Zixiang Wang et.al.|[2410.02551](http://arxiv.org/abs/2410.02551)|null|We introduce ColaCare, a framework that enhances electronic health record (EHR) modeling through multi-agent collaboration driven by large language models (LLMs). Our approach seamlessly combines domain-specific expert models with LLMs to bridge the gap between structured EHR data and text-based reasoning. Inspired by clinical consultations, ColaCare employs two types of agents, a doctor agent and a meta agent, which analyze patient data collaboratively. Expert models process numerical EHR data and generate predictions from it, while LLM agents produce reasoning references and decision reports within the collaborative consultation framework. We also incorporate medical guidance from the Merck Manual of Diagnosis and Therapy (MSD) through a retrieval-augmented generation (RAG) module to provide authoritative evidence support. Extensive experiments on four different EHR datasets demonstrate the superior performance of ColaCare on mortality prediction tasks, underscoring its potential for clinical decision support systems and for advancing personalized precision medicine. Code, complete prompt templates, and further case studies are available at the anonymous link.|\n", "2410.02406": "|**2024-10-03**|**ELLMA-T: an Embodied LLM-agent for Supporting English Language Learning in Social VR**|Mengxu Pan et.al.|[2410.02406](http://arxiv.org/abs/2410.02406)|null|Many people struggle to learn a new language, and traditional tools fall short in providing contextualized learning tailored to each learner's needs. 
Recent developments in large language models (LLMs) and embodied conversational agents (ECAs) in social virtual reality (VR) offer new opportunities for contextualized, natural language learning that takes the learner's language level and needs into account. To explore this possibility, we developed ELLMA-T, an embodied conversational agent that uses GPT-4 and a situated learning framework to support English language learning in social VR (VRChat). Through 12 qualitative interviews, we reveal the potential of ELLMA-T to generate realistic, believable, and context-specific role plays for learner-agent interaction in VR, as well as the ability of the LLM to provide learners with an initial language assessment and ongoing feedback. We offer five design implications for the future development of LLM-based language agents in social VR.|\n", "2410.02165": "|**2024-10-03**|**A LLM-Powered 
Automatic Grading Framework with Human-Level Guidelines Optimization**|Yucheng Chu et.al.|[2410.02165](http://arxiv.org/abs/2410.02165)|null|In the context of learning analytics (LA), open-ended short-answer questions (SAGs) are widely recognized as a powerful tool for gaining deeper insight into learners' responses. In practice, however, SAGs often pose challenges because of the heavy grading workload and concerns about inconsistent assessment. With recent advances in natural language processing (NLP), automatic short-answer grading (ASAG) offers a promising solution to these challenges. Nevertheless, current ASAG algorithms tend to generalize poorly and to be tailored to specific questions. To this end, this paper proposes GradeOpt, a unified multi-agent ASAG framework that uses large language models (LLMs) as graders for SAGs. More importantly, GradeOpt adds two further LLM-based agents, a reflector and a refiner, to the multi-agent system. This allows GradeOpt to optimize the original grading guidelines automatically by reflecting on its own errors. In experiments on a challenging ASAG task, namely grading pedagogical content knowledge (PCK) and content knowledge (CK) questions, 
GradeOpt outperforms representative baselines in both grading accuracy and alignment with the behavior of human graders. Finally, comprehensive ablation studies confirm the effectiveness of the individual components designed in GradeOpt.|\n", "2410.02026": "|**2024-10-02**|**Zodiac: A Cardiologist-Level LLM Framework for Multi-Agent Diagnostics**|Yuan Zhou et.al.|[2410.02026](http://arxiv.org/abs/2410.02026)|null|This paper introduces ZODIAC, a large language model (LLM) framework designed to assist cardiology diagnostics with cardiologist-level professionalism. ZODIAC extracts clinically relevant features from patient data, detects significant arrhythmias, and generates preliminary reports for cardiologists to review and refine. To achieve cardiologist-level professionalism, ZODIAC is built on a multi-agent collaboration framework that allows multimodal patient data to be processed. Each LLM agent is fine-tuned on real-world patient data adjudicated by cardiologists, which reinforces the professionalism of the model. 
ZODIAC has undergone rigorous clinical validation by independent cardiologists, evaluated across eight metrics that measure clinical effectiveness and address safety concerns. The results show that ZODIAC outperforms industry-leading models, including OpenAI's GPT-4o, Meta's Llama-3.1-405B, and Google's Gemini-pro, as well as LLMs specialized for the medical domain such as Microsoft's BioGPT. This demonstrates the potential of purpose-built LLMs in healthcare to deliver domain-specific solutions that meet the stringent requirements of medical practice. Notably, ZODIAC has been successfully integrated into electrocardiogram (ECG) devices, illustrating the growing trend of embedding LLMs in Software-as-Medical-Device (SaMD).|\n", "2410.03055": "|**2024-10-04**|**Permissive Information-Flow Analysis for Large Language Models**|Shoaib Ahmed Siddiqui 
et.al.|[2410.03055](http://arxiv.org/abs/2410.03055)|null|Large language models (LLMs) are rapidly becoming commodity components of larger software systems. This raises natural security and privacy problems: poisoned data retrieved from one component can change the model's behavior and compromise the entire system, including causing the model to spread confidential data to untrusted components. One promising approach is to tackle these problems at the system level through dynamic information-flow tracking (that is, taint tracking). Unfortunately, the traditional approach of propagating the most restrictive input label to the output is too conservative for applications in which LLMs operate on inputs from diverse sources. This paper proposes a novel, more permissive approach to propagating information-flow labels through LLM queries. The key idea behind our approach is to propagate only the labels of the samples that were influential in generating the model output, and to eliminate the labels of unnecessary inputs. 
We implement and study two variants of this approach, based on (i) prompt-based retrieval augmentation and (ii) a $k$-nearest-neighbors language model. We compare these with a baseline, an introspection-based influence estimator that directly asks the language model to predict the output label. Experimental results show that our prompt-based label propagator improves label quality in more than 85% of cases, with a pronounced effect in an LLM agent setting. These findings underscore the practicality of permissive label propagation for retrieval augmentation.|\n", "2410.02958": "|**2024-10-03**|**AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML**|Patara Trirat 
et.al.|[2410.02958](http://arxiv.org/abs/2410.02958)|null|This paper proposes AutoML-Agent, a novel multi-agent framework designed for full-pipeline automated machine learning (AutoML), covering the entire process from data retrieval to model deployment. AutoML-Agent takes the user's task description, facilitates collaboration among specialized language-model agents, and delivers deployment-ready models, thereby providing a natural language interface that simplifies the building of data-driven solutions for non-expert users. Unlike existing work, this paper introduces a retrieval-augmented planning strategy that increases exploration in the search for better solutions. We also decompose each plan into sub-tasks (for example, data preprocessing and neural network design) that are executed in parallel, each solved by a specialized agent that we build through prompting, which makes the search process more efficient. In addition, we propose a multi-stage verification method to verify execution results and guide the code-generation language model toward a successful solution. 
Extensive experiments on seven downstream tasks using fourteen datasets show that AutoML-Agent achieves a higher success rate in automating the full AutoML process and that the resulting systems perform well across diverse domains.|\n", "2410.05254": "|**2024-10-07**|**GLEE: A Unified Framework and Benchmark for Language-based Economic Environments**|Eilam Shapira et.al.|[2410.05254](http://arxiv.org/abs/2410.05254)|**[link](https://github.com/eilamshapira/GLEE)**|**Large language models (LLMs) show significant potential in economic and strategic interactions, where communication in natural language is often prevalent. This raises a series of key questions: Do LLMs behave rationally? Can they mimic human behavior? Do they tend to reach efficient and fair outcomes? What is the role of natural language in strategic interaction? How do the characteristics of the economic environment influence these dynamics? These questions are crucial to the economic and societal implications of integrating LLM-based agents into real-world data-driven systems such as online retail platforms and recommender systems. 
Although the machine learning community has explored the potential applications of LLMs in multi-agent settings, differences in assumptions, design choices, and evaluation criteria across studies make it difficult to draw robust and meaningful conclusions. To address this, we introduce a benchmark for standardizing research on two-player, sequential, language-based games. Drawing on the economics literature, we define three base families of games with consistent parameterization, degrees of freedom, and economic measures for evaluating agents' performance (self-gain) as well as the game outcome (efficiency and fairness). We develop an open-source framework for interaction simulation and analysis, and use it to collect a dataset of LLM vs. LLM interactions across numerous game configurations and an additional dataset of human vs. LLM interactions. Through extensive experiments, we show that our framework and dataset can be used to: (i) compare the behavior of LLM-based agents with that of human players in various economic contexts; (ii) evaluate agents in terms of both individual and collective performance; and (iii) quantify the effect of the characteristics of the economic environment on agent behavior.**|\n", "2410.04360": "|**2024-10-09**|**GenSim: A General Social Simulation Platform with Large Language 
Model based Agents**|Jiakai Tang et.al.|[2410.04360](http://arxiv.org/abs/2410.04360)|**[link](https://github.com/TangJiakai/GenSim)**|**In recent years, with the rapid development of large language models (LLMs), research that uses LLM-based agents to simulate human social behavior has produced many promising results. Although prior work has shown great potential, it has mostly addressed specific scenarios involving a limited number of agents and has lacked the ability to adapt when errors occur during simulation. To overcome these limitations, we propose GenSim, a novel LLM-based simulation platform that: (1) abstracts a set of general functions to simplify the simulation of customized social scenarios; (2) supports one million agents to better simulate large-scale populations in real-world contexts; and (3) incorporates error-correction mechanisms to ensure more reliable, long-term simulations. To evaluate our platform, we assess both the efficiency of large-scale agent simulation and the effectiveness of the error-correction mechanisms. To our knowledge, GenSim represents an initial step toward a general, large-scale, and correctable social simulation platform based on LLM agents, 
and promises to further advance the field of social science.**|\n", "2410.07109": "|**2024-10-09**|**I Want to Break Free! Anti-Social Behavior and Persuasion Ability of LLMs in Multi-Agent Settings with Social Hierarchy**|Gian Maria Campedelli et.al.|[2410.07109](http://arxiv.org/abs/2410.07109)|**[link](https://github.com/mobs-fbk/llm_interaction_simulator)**|**As agents driven by large language models (LLMs) become increasingly autonomous and interact freely with one another, studying the interactions between them becomes crucial for anticipating emergent phenomena and identifying potential risks. Inspired by the Stanford Prison Experiment, we contribute to this line of research by studying the interaction patterns of LLM agents in a context characterized by strict social hierarchy. We focus on two phenomena, persuasion and anti-social behavior, in simulated scenarios involving a guard and a prisoner who pursues a specific goal (for example, obtaining more yard time or escaping from prison). Drawing on 200 experimental scenarios and a total of 2,000 machine-to-machine conversations across five popular LLMs, we report a set of noteworthy findings. 
First, we document how some models consistently fail to carry out a conversation in a multi-agent setting where power dynamics are at play. Second, for the models that do interact successfully, we show empirically that an agent's goal mainly affects its persuasiveness and has only a negligible effect on its anti-social behavior. Third, we highlight how agent personas, and the guard's personality in particular, drive both the likelihood of successful persuasion by the prisoner and the emergence of anti-social behavior. Fourth, we show that even without explicitly prompting for specific personalities, anti-social behavior emerges simply from assigning the agents' roles. These results have important implications for the development of LLM agents and for the debate on their societal impact.**|\n", "2410.06932": "|**2024-10-09**|**Reproducing and Extending Experiments in Behavioral Strategy with Large Language Models**|Daniel Albert 
et.al.|[2410.06932](http://arxiv.org/abs/2410.06932)|null|In this study, we propose a new approach to behavioral strategy research: using large language model (LLM) agents to complement simulations and laboratory experiments and thereby deepen our understanding of the cognitive processes involved in decision-making. Specifically, we reproduce a human laboratory experiment in behavioral strategy and compare the LLM-generated agents with the observed human behavior. Our results show that LLM agents effectively reproduce search behavior and decision-making comparable to those of humans. We further analyze the simulated 'thoughts' of the LLM agents and find that more forward-looking thoughts are associated with a preference for exploitation over exploration in order to maximize wealth. We demonstrate the potential of this new approach for behavioral strategy research and discuss its possible limitations.|\n", "2410.06153": "|**2024-10-08**|**AgentSquare: Automatic LLM Agent Search in Modular Design Space**|Yu Shang 
et.al.|[2410.06153](http://arxiv.org/abs/2410.06153)|**[link](https://github.com/tsinghua-fib-lab/agentsquare)**|**Recent advances in large language models (LLMs) have driven the rapid growth of agentic systems capable of handling complex tasks. However, current research relies largely on manual, task-specific designs, which limits adaptability to new tasks. This paper introduces a new research problem: Modularized LLM Agent Search (MoLAS). We propose a modular design space that abstracts existing LLM agent designs into four fundamental modules with a uniform input-output interface: planning, reasoning, tool use, and memory. Building on this design space, we present AgentSquare, a novel agent search framework that introduces two core mechanisms, module evolution and recombination, to search efficiently for optimized LLM agents. To further accelerate the process, we design a performance predictor that uses in-context models as surrogates for evaluating agent designs, so that unpromising designs can be skipped. Extensive experiments were conducted on six benchmarks covering diverse scenarios, including web applications, 
embodied interaction, tool use, and games. The results show that AgentSquare substantially outperforms hand-crafted agents, with an average performance gain of 17.2% over the best human designs. In addition, AgentSquare generates interpretable design insights that give a deeper understanding of agent architecture and its effect on task performance. We believe that the modular design space and the AgentSquare search framework offer a platform for fully exploiting the potential of prior successful designs and for consolidating the efforts of the research community. The code repository is available at https://github.com/tsinghua-fib-lab/AgentSquare.**|\n", "2410.05570": "|**2024-10-08**|**Conversate: Supporting Reflective Learning in Interview Practice Through Interactive Simulation and Dialogic Feedback**|Taufiq Daryanto 
et.al.|[2410.05570](http://arxiv.org/abs/2410.05570)|null|Job interviews play a critical role in shaping a person's career, yet practicing interview skills is challenging without a human coach or peers to provide feedback. Recent advances in large language models (LLMs) offer an opportunity to improve the interview practice experience. However, little research has examined the effectiveness and user perceptions of such systems, or the benefits and challenges of using LLMs for interview practice. Although prior work and recent commercial tools have demonstrated the potential of AI-assisted interview practice, they typically provide only one-way feedback, in which users merely receive information about their performance. By contrast, dialogic feedback, a concept developed in the learning sciences, is a two-way interactive feedback process that lets users engage further with the feedback and learn from it through dialogue. This paper introduces Conversate, a web-based application that uses LLMs to support reflective learning in job interview practice. A user starts an interview session by providing a job title (e.g., entry-level software engineer). The system's LLM agent then begins the interview simulation by asking an opening interview question and crafting follow-up questions based on the user's responses. After the interview, the system's back-end LLM framework analyzes the user's answers and highlights areas for improvement. Users can annotate the transcript by selecting specific passages and writing self-reflections. Finally, users can engage in dialogic feedback with the system, conversing with the LLM agent and iteratively refining their answers under the agent's guidance.|\n",
"2410.05434": "|**2024-10-07**|**Better than Your Teacher: LLM Agents that learn from Privileged AI Feedback**|Sanjiban Choudhury 
et.al.|[2410.05434](http://arxiv.org/abs/2410.05434)|null|Large language models (LLMs) show impressive decision-making abilities, but current methods lack a mechanism for automatic self-improvement from errors made during task execution. We propose LEAP, an iterative fine-tuning framework that continually improves LLM agents using feedback from AI expert teachers. Our key insight is to equip the expert teachers with a privileged state: information that is available during training but hidden at test time. This allows even weak experts to provide precise guidance, significantly improving the student agent's performance without access to privileged information at test time. We evaluate LEAP on diverse decision-making benchmarks, including text-based games (ALFWorld), web navigation (WebShop), and interactive coding (Intercode Bash). Our experiments show that LEAP (1) outperforms behavior cloning and ReAct baselines, (2) enables weaker student models (e.g., Llama3-8B) to exceed the performance of strong teacher models (GPT4-o), and (3) allows weaker models to self-improve using privileged versions of themselves. We also provide a theoretical analysis showing that LEAP's success hinges on balancing privileged information with the student's realizability, which we validate empirically. Our code is available at https://leap-llm.github.io.|\n",
"2410.07869": "|**2024-10-10**|**Benchmarking Agentic Workflow Generation**|Shuofei Qiao et.al.|[2410.07869](http://arxiv.org/abs/2410.07869)|null|Large language models (LLMs), with their exceptional ability to handle a wide range of tasks, have driven significant progress on reasoning and planning tasks, in which decomposing complex problems into executable workflows is a crucial step. Existing workflow evaluation frameworks either focus solely on holistic performance or suffer from limitations such as restricted scenario coverage, simplistic workflow structures, and lax evaluation standards. We therefore introduce WorFBench, a unified workflow generation benchmark with multi-faceted scenarios and complex graph workflow structures. We also present WorFEval, a systematic evaluation protocol that uses subsequence and subgraph matching algorithms to accurately quantify the workflow generation capability of LLM agents. Through comprehensive evaluations across different types of LLMs, we find a distinct gap between the sequence planning and graph planning capabilities of LLM agents, with even GPT-4 showing a gap of around 15%. We also train two open-source models and evaluate their generalization on held-out tasks. Furthermore, we observe that the generated workflows can enhance downstream tasks, enabling them to achieve better performance in less time during inference. All related code and datasets will be made publicly available at https://github.com/zjunlp/WorFBench.|\n",
"2410.07706": "|**2024-10-10**|**AgentBank: Towards Generalized LLM Agents via Fine-Tuning on 50000+ Interaction Trajectories**|Yifan Song 
et.al.|[2410.07706](http://arxiv.org/abs/2410.07706)|null|In this work, we introduce AgentBank, the largest collection to date of agent-environment interaction trajectory data for tuning open-source large language models (LLMs), comprising more than 50,000 diverse, high-quality interaction trajectories across 16 tasks and five distinct agent skill dimensions. Using a novel annotation pipeline, we scale up trajectory annotation and produce a trajectory dataset with minimized difficulty bias. We then fine-tune LLMs on AgentBank to obtain Samoyed, a series of agent models. Our comparative experiments demonstrate the effectiveness of scaling interaction trajectory data for acquiring generalized agent capabilities. Additional studies reveal key observations about trajectory tuning and agent skill generalization.|\n",
"2410.07484": "|**2024-10-11**|**WALL-E: World Alignment by Rule Learning Improves World Model-based LLM Agents**|Siyu Zhou 
et.al.|[2410.07484](http://arxiv.org/abs/2410.07484)|**[link](https://github.com/elated-sawyer/WALL-E)**|**Can large language models (LLMs) directly serve as powerful world models for model-based agents? Although gaps exist between an LLM's prior knowledge and the dynamics of a specified environment, our study shows that these gaps can be bridged by aligning the LLM with its deployed environment, and that this 'world alignment' can be achieved efficiently through rule learning on LLMs. Given the rich prior knowledge of LLMs, only a few additional rules are needed to align LLM predictions with the dynamics of the specified environment. To this end, we propose a neurosymbolic approach that learns these rules through LLMs in a gradient-free manner, inducing, updating, and pruning rules based on comparisons between explored trajectories and world-model predictions. The resulting world model consists of the LLM and the learned rules. Our embodied LLM agent, WALL-E, is built on model-predictive control (MPC). By optimizing look-ahead actions against the precise world model, MPC significantly improves exploration and learning efficiency. Compared with existing LLM agents, WALL-E's reasoning requires only a few principal rules rather than large buffered trajectories included in the LLM input. On open-world challenges in Minecraft and ALFWorld, WALL-E achieves higher success rates than existing methods, with lower planning time and fewer tokens used for reasoning. In Minecraft, WALL-E exceeds baselines by 15%-30%, and it reaches a 95% success rate after only 6 iterations.**|\n",
"2410.09034": "|**2024-10-11**|**PEAR: A Robust and Flexible Automation Framework for Ptychography Enabled by Multiple Large Language Model Agents**|Xiangyu Yin et.al.|[2410.09034](http://arxiv.org/abs/2410.09034)|null|Ptychography is an advanced computational imaging technique in X-ray and electron microscopy. It has been widely adopted in scientific research fields such as physics, chemistry, biology, and materials science, as well as in industrial applications such as semiconductor characterization. In practice, obtaining high-quality ptychographic images requires simultaneous optimization of many experimental and algorithmic parameters. Traditionally, parameter selection has often relied on trial and error, leading to low-throughput workflows and potential human bias. In this work, we develop the Ptychographic Experiment and Analysis Robot (PEAR), a framework that uses large language models (LLMs) to automate data analysis in ptychography. To ensure high robustness and accuracy, PEAR employs multiple LLM agents for tasks including knowledge retrieval, code generation, parameter recommendation, and image reasoning. Our study shows that PEAR's multi-agent design significantly improves the workflow success rate, even with smaller open-weight models such as LLaMA 3.1 8B. PEAR also supports various automation levels and is designed to work with customized local knowledge bases, ensuring flexibility and adaptability across different research environments.|\n",
"2410.09024": "|**2024-10-14**|**AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents**|Maksym Andriushchenko et.al.|[2410.09024](http://arxiv.org/abs/2410.09024)|null|Research on the robustness of large language models (LLMs) to jailbreak attacks has focused mainly on their use as simple chatbots. However, LLM agents that can use external tools and execute multi-stage tasks may pose a greater risk, and their robustness remains underexplored. To facilitate research on the misuse of LLM agents, we propose a new benchmark called AgentHarm. The benchmark includes 110 explicitly malicious agent tasks (440 with augmentations), covering 11 harm categories including fraud, cybercrime, and harassment. In addition to measuring whether models refuse harmful agentic requests, scoring well on AgentHarm requires jailbroken agents to maintain their capabilities after an attack in order to complete multi-step tasks. We evaluate a range of leading LLMs and find that (1) leading LLMs are surprisingly compliant with malicious agent requests even without jailbreaking, (2) simple universal jailbreak templates can effectively jailbreak agents, and (3) these jailbreaks enable coherent and malicious multi-step agent behavior while retaining model capabilities. To enable simple and reliable evaluation of attacks and defenses for LLM-based agents, we publicly release AgentHarm at https://huggingface.co/datasets/ai-safety-institute/AgentHarm.|\n",
"2410.08948": "|**2024-10-11**|**The Dynamics of Social Conventions in LLM populations: Spontaneous Emergence, Collective Biases and Tipping Points**|Ariel Flint Ashery 
et.al.|[2410.08948](http://arxiv.org/abs/2410.08948)|null|Social conventions are the foundation of social and economic life. As more and more AI agents interact with each other and with humans, their ability to form shared conventions will determine how effectively they coordinate behavior, integrate into society, and influence it. This paper studies the dynamics of conventions within populations of large language model (LLM) agents using simulated interactions. First, we show that globally accepted social conventions can emerge spontaneously among communicating LLMs. Second, we demonstrate that strong collective biases can emerge during this process, even when individual agents appear unbiased. Third, we examine how minority groups of committed LLMs can drive social change by establishing new social conventions. We find that once these minority groups reach a critical size, they can consistently overturn established behaviors. In all cases, contrasting the experimental results with the predictions of a minimal multi-agent model allows us to isolate the specific role of LLM agents. Our results clarify how AI systems can autonomously develop norms without explicit programming and have implications for designing AI systems that align with human values and societal goals.|\n",
"2410.10760": "|**2024-10-14**|**Denial-of-Service Poisoning Attacks against Large Language Models**|Kuofeng Gao et.al.|[2410.10760](http://arxiv.org/abs/2410.10760)|**[link](https://github.com/sail-sg/p-dos)**|**Recent studies have shown that large language models (LLMs) are vulnerable to denial-of-service (DoS) attacks, in which adversarial inputs such as spelling errors or non-semantic prompts trigger endless outputs without generating an [EOS] token. These attacks can cause high latency and make LLM services unavailable to other users or tasks. However, when speech-to-text interfaces are involved (e.g., voice commands to robots), executing such DoS attacks becomes challenging, because it is difficult to introduce spelling errors or non-semantic prompts through speech. A simple DoS attack is to instruct the model to 'keep repeating Hello', but we observe that relying solely on natural instructions limits the output length, which is bounded by the maximum length of the LLM's supervised fine-tuning (SFT) data. To overcome this limitation, we propose poisoning-based DoS (P-DoS) attacks against LLMs, demonstrating that injecting a single poisoned sample designed for DoS purposes can break the output length limit. For example, one poisoned sample successfully attacks GPT-4o and GPT-4o mini (via OpenAI's fine-tuning API) for under $1, causing repeated outputs up to the maximum inference length (16K tokens, compared with 0.5K before poisoning). In addition, we perform comprehensive ablation studies on open-source LLMs and extend our method to LLM agents, where attackers can control both the fine-tuning dataset and the algorithm. Our findings underscore the urgent need for defenses against P-DoS attacks to keep LLMs secure. Our code is available at https://github.com/sail-sg/P-DoS.**|\n",
"2410.10398": "|**2024-10-14**|**FairMindSim: Alignment of Behavior, Emotion, and Belief in Humans and LLM Agents Amid Ethical Dilemmas**|Yu Lei 
et.al.|[2410.10398](http://arxiv.org/abs/2410.10398)|null|AI alignment is a pivotal issue for AI control and safety. It should consider not only value-neutral human preferences but also moral and ethical considerations. In this study, we introduce FairMindSim, which simulates moral dilemmas through a series of unfair scenarios. We use LLM agents to simulate human behavior, ensuring alignment across the various stages. To explore the socioeconomic motivations, which we refer to as beliefs, that drive both humans and LLM agents to intervene as bystanders in unjust situations involving others, and to examine how these beliefs interact to influence individual behavior, we incorporate knowledge from relevant areas of sociology and propose the Belief-Reward Alignment Behavior Evolution Model (BREM), based on the recursive reward model (RRM). Our findings indicate that, behaviorally, GPT-4o exhibits a stronger sense of social justice, while humans display a richer range of emotions. We also discuss the potential impact of emotions on behavior. This study provides a theoretical foundation for applications that align LLMs with altruistic values.|\n",
"2410.10136": "|**2024-10-14**|**Beyond-RAG: Question 
Identification and Answer Generation in Real-Time Conversations**|Garima Agrawal et.al.|[2410.10136](http://arxiv.org/abs/2410.10136)|null|In customer contact centers, human agents often face long average handling times (AHT) because they must manually interpret queries and retrieve relevant knowledge base (KB) articles. Although retrieval-augmented generation (RAG) systems using large language models (LLMs) have been widely adopted in industry to assist with such tasks, RAG faces challenges in real-time conversations, such as inaccurate query formulation and redundant retrieval of frequently asked questions. To address these limitations, we propose a decision support system that goes beyond RAG by identifying customer questions in real time. If the query matches a frequently asked question (FAQ), the system retrieves the answer directly from the FAQ database; otherwise, it generates the answer via RAG. Our approach reduces reliance on manual queries and delivers responses to agents within 2 seconds. Deployed in the AI-powered agent-assist solution at Minerva CQ, the system improves efficiency, reduces AHT, and lowers operational costs. We also introduce an automated LLM-agent workflow that identifies FAQs from historical records when no predefined FAQs exist.|\n",
"2410.10020": "|**2024-10-13**|**Adaptive Reasoning and Acting in Medical Language Agents**|Abhishek Dutta et.al.|[2410.10020](http://arxiv.org/abs/2410.10020)|null|This paper presents an innovative large language model (LLM) agent framework for improving diagnostic accuracy in simulated clinical environments, evaluated using the AgentClinic benchmark. The proposed automatic correction mechanism enables doctor agents to iteratively refine their reasoning and actions after an incorrect diagnosis, improving decision-making over time. Experiments show that adaptive LLM-based doctor agents can reach correct diagnoses through dynamic interactions with simulated patients. The evaluations highlight the capacity of autonomous agents to adapt and improve in complex medical scenarios. Future work will focus on refining the algorithm and extending its applicability to a wider range of tasks and different large language models.|\n",
"2410.09824": "|**2024-10-13**|**Dynamic and Textual Graph Generation Via Large-Scale LLM-based Agent Simulation**|Jiarui Ji 
et.al.|[2410.09824](http://arxiv.org/abs/2410.09824)|null|Graph generation is a fundamental task that has been extensively studied in social, technological, and scientific research. When modeling the dynamic graph evolution process, traditional rule-based methods struggle to capture community structures within graphs, while deep learning methods focus only on fitting training graphs. This limits existing graph generators to producing graphs that either follow predefined rules or closely resemble the training data, and they perform poorly on dynamic graph generation. Given that graphs are abstract representations arising from pairwise interactions in human activities, a realistic simulation of human behavior could provide deeper insight into the mechanisms of graph evolution. With the growing recognition of large language models (LLMs) for simulating human behavior, we introduce GraphAgent-Generator (GAG), a new simulation-based framework for dynamic graph generation. Without any training or fine-tuning of the LLM, our framework effectively replicates seven macro-level structural characteristics from established network science theories while outperforming existing baselines on graph expansion tasks by 31% on specific evaluation metrics. Through a node classification task, we validate that GAG effectively preserves the node-level textual features of real-world networks in the generated text-rich graphs. Furthermore, with parallel acceleration, GAG supports generating graphs with up to nearly 100,000 nodes or 10 million edges through large-scale LLM-based agent simulation, with a minimum speed-up of 90.4%. The source code is available.|\n",
"2410.09713": "|**2024-10-13**|**Agentic Information Retrieval**|Weinan Zhang et.al.|[2410.09713](http://arxiv.org/abs/2410.09713)|null|Since the 1970s, user access to relevant information has relied on domain-specific information retrieval (IR) architectures. Over the past two decades, the advent of modern IR systems, including web search engines and personalized recommender systems, has greatly improved the efficiency of retrieving relevant information from vast data collections. However, the core paradigm of these IR systems remains largely unchanged, relying on filtering a predefined set of candidate items. Since 2022, breakthroughs in large language models (LLMs) have begun to transform how information is accessed, establishing a new technical paradigm. In this paper, we introduce Agentic Information Retrieval (Agentic IR), a new IR paradigm shaped by the capabilities of LLM agents. Agentic IR expands the scope of accessible tasks and uses a range of new techniques to redefine information retrieval. We discuss three cutting-edge applications and the challenges they face. We believe that Agentic IR holds promise for producing innovative applications and may become a central information entry point in future digital ecosystems.|\n",
"2410.09381": "|**2024-10-12**|**LLM-SmartAudit: Advanced Smart Contract Vulnerability Detection**|Zhiyuan Wei et.al.|[2410.09381](http://arxiv.org/abs/2410.09381)|null|The immutable nature of blockchain technology, while revolutionary, introduces significant security challenges, particularly in smart contracts. These security issues can lead to substantial financial losses. Current tools and approaches often focus on specific types of vulnerabilities. However, a comprehensive tool capable of detecting a wide range of vulnerabilities with high accuracy is lacking. This paper introduces LLM-SmartAudit, a new framework that uses the advanced capabilities of large language models (LLMs) to detect and analyze vulnerabilities in smart contracts. Using a multi-agent conversational approach, LLM-SmartAudit employs a collaborative system of specialized agents to enhance the audit process. \u4e3a\u4e86\u8bc4\u4f30LLM-SmartAudit\u7684\u6709\u6548\u6027\uff0c\u6211\u4eec\u7f16\u5236\u4e86\u4e24\u4e2a\u4e0d\u540c\
u7684\u6570\u636e\u96c6\uff1a\u4e00\u4e2a\u7528\u4e8e\u4e0e\u4f20\u7edf\u5de5\u5177\u8fdb\u884c\u57fa\u51c6\u6d4b\u8bd5\u7684\u6807\u8bb0\u6570\u636e\u96c6\uff0c\u4ee5\u53ca\u4e00\u4e2a\u7528\u4e8e\u8bc4\u4f30\u5b9e\u9645\u5e94\u7528\u7684\u73b0\u5b9e\u4e16\u754c\u6570\u636e\u96c6\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u89e3\u51b3\u65b9\u6848\u5728\u6240\u6709\u4f20\u7edf\u667a\u80fd\u5408\u7ea6\u5ba1\u8ba1\u5de5\u5177\u4e4b\u4e0a\uff0c\u63d0\u4f9b\u4e86\u66f4\u9ad8\u7684\u51c6\u786e\u6027\u548c\u66f4\u5927\u7684\u6548\u7387\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u6846\u67b6\u53ef\u4ee5\u68c0\u6d4b\u590d\u6742\u903b\u8f91\u6f0f\u6d1e\uff0c\u800c\u4f20\u7edf\u5de5\u5177\u4e4b\u524d\u672a\u66fe\u53d1\u73b0\u8fd9\u4e9b\u6f0f\u6d1e\u3002\u6211\u4eec\u7684\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0c\u5229\u7528LLM\u4ee3\u7406\u63d0\u4f9b\u4e86\u4e00\u79cd\u975e\u5e38\u6709\u6548\u7684\u81ea\u52a8\u5316\u667a\u80fd\u5408\u7ea6\u5ba1\u8ba1\u65b9\u6cd5\u3002|\n", "2410.11239": "|**2024-10-15**|**HR-Agent: A Task-Oriented Dialogue (TOD) LLM Agent Tailored for HR Applications**|Weijie Xu 
et.al.|[2410.11239](http://arxiv.org/abs/2410.11239)|null|\u8fd1\u671f\u7684\u5927\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u8fdb\u5c55\u5728\u6559\u80b2\u548c\u91d1\u878d\u7b49\u9886\u57df\u5e26\u6765\u4e86\u8bb8\u591a\u76ca\u5904\uff0c\u4f46\u5728\u4eba\u529b\u8d44\u6e90\u9886\u57df\uff0c\u4ecd\u6709\u8bb8\u591a\u91cd\u590d\u6027\u7684\u6d41\u7a0b\u672a\u88ab\u89e3\u51b3\uff0c\u4f8b\u5982\u8bbf\u95ee\u8bf7\u6c42\u3001\u533b\u7597\u62a5\u9500\u548c\u8bf7\u5047\u7533\u8bf7\u7b49\u3002\u6211\u4eec\u8ba9\u8fd9\u4e9b\u4efb\u52a1\u4e0eLLM\u4ee3\u7406\u76f8\u5173\u8054\uff0c\u8be5\u4ee3\u7406\u5df2\u7ecf\u5904\u7406\u4e86\u8bf8\u5982\u5199\u4f5c\u8f85\u52a9\u548c\u5ba2\u6237\u652f\u6301\u7b49\u4efb\u52a1\u3002\u6211\u4eec\u63d0\u51fa\u4e86HR-Agent\uff0c\u8fd9\u662f\u4e00\u79cd\u9ad8\u6548\u3001\u4fdd\u5bc6\u4e14\u4e13\u95e8\u9488\u5bf9\u4eba\u529b\u8d44\u6e90\u9886\u57df\u7684\u57fa\u4e8eLLM\u7684\u4efb\u52a1\u5bfc\u5411\u5bf9\u8bdd\u7cfb\u7edf\uff0c\u65e8\u5728\u81ea\u52a8\u5316\u5904\u7406\u5982\u533b\u7597\u62a5\u9500\u548c\u8bbf\u95ee\u8bf7\u6c42\u7b49\u91cd\u590d\u6027\u7684\u4eba\u529b\u8d44\u6e90\u6d41\u7a0b\u3002\u7531\u4e8e\u5728\u63a8\u7406\u8fc7\u7a0b\u4e2d\u4e0d\u4f1a\u5c06\u5bf9\u8bdd\u6570\u636e\u53d1\u9001\u7ed9LLM\uff0c\u56e0\u6b64\u5b83\u80fd\u591f\u4fdd\u6301\u4eba\u529b\u8d44\u6e90\u76f8\u5173\u4efb\u52a1\u6240\u9700\u7684\u673a\u5bc6\u6027\u3002|\n", "2410.12568": "|**2024-10-16**|**Robust RL with LLM-Driven Data Synthesis and Policy Adaptation for Autonomous Driving**|Sihao Wu 
et.al.|[2410.12568](http://arxiv.org/abs/2410.12568)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u81ea\u52a8\u9a7e\u9a76\u7cfb\u7edf\u4e2d\u7684\u96c6\u6210\u5c55\u793a\u4e86\u5f3a\u5927\u7684\u5e38\u8bc6\u548c\u63a8\u7406\u80fd\u529b\uff0c\u6709\u6548\u89e3\u51b3\u4e86\u7eaf\u7cb9\u6570\u636e\u9a71\u52a8\u65b9\u6cd5\u7684\u7f3a\u9677\u3002\u5f53\u524d\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u9700\u8981\u8f83\u957f\u7684\u63a8\u7406\u65f6\u95f4\uff0c\u5e76\u4e14\u5728\u4e0e\u5b9e\u65f6\u81ea\u52a8\u9a7e\u9a76\u73af\u5883\u4ea4\u4e92\u65f6\u9762\u4e34\u6311\u6218\u3002\u4e00\u4e2a\u5173\u952e\u7684\u5f00\u653e\u95ee\u9898\u662f\uff0c\u6211\u4eec\u662f\u5426\u80fd\u591f\u6709\u6548\u5730\u5229\u7528LLM\u7684\u77e5\u8bc6\u6765\u8bad\u7ec3\u4e00\u4e2a\u9ad8\u6548\u4e14\u7a33\u5065\u7684\u5f3a\u5316\u5b66\u4e60\uff08RL\uff09\u4ee3\u7406\u3002\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u65b0\u7684RAPID\u6846\u67b6\uff0c\u5373\u201c\u9c81\u68d2\u81ea\u9002\u5e94\u7b56\u7565\u6ce8\u5165\u4e0e\u84b8\u998f\u201d\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u4f7f\u7528\u7531\u57fa\u4e8eLLM\u7684\u9a7e\u9a76\u4ee3\u7406\u5408\u6210\u7684\u6570\u636e\u8bad\u7ec3\u4e13\u95e8\u7684\u6df7\u5408\u7b56\u7565RL\u4ee3\u7406\uff0c\u5e76\u8fdb\u884c\u5728\u7ebf\u9002\u5e94\u3002RAPID\u5177\u6709\u4e09\u4e2a\u5173\u952e\u8bbe\u8ba1\uff1a1\uff09\u5229\u7528\u4eceLLM\u4ee3\u7406\u6536\u96c6\u7684\u79bb\u7ebf\u6570\u636e\uff0c\u5c06\u4e13\u5bb6\u77e5\u8bc6\u63d0\u70bc\u5230RL\u7b56\u7565\u4e2d\u4ee5\u52a0\u5feb\u5b9e\u65f6\u63a8\u7406\u901f\u5ea6\uff1b2\uff09\u5f15\u5165\u9c81\u68d2\u84b8\u998f\u5230RL\u4e2d\uff0c\u4ee5\u7ee7\u627f\u6765\u81ea\u57fa\u4e8eLLM\u6559\u5e08\u7684\u6027\u80fd\u548c\u9c81\u68d2\u6027\uff1b3\uff09\u91c7\u7528\u6df7\u5408\u7b56\u7565\u65b9\u6cd5\uff0c\u901a\u8fc7\u7b56\u7565\u9002\u914d\u5668\u8fdb\u884c\u8054\u5408\u51b3\u7b56\u89e3\u7801\u3002\u901a\u8fc7\u5728\u7ebf\u73af\u5883\u4e92\u52a8\u8fdb\u884c\u5fae\u8c03\uff0cRAPID\u51cf\u5c11\u4e86LLM\u77e5\u8bc6\u76
84\u9057\u5fd8\uff0c\u540c\u65f6\u4fdd\u6301\u5bf9\u4e0d\u540c\u4efb\u52a1\u7684\u9002\u5e94\u6027\u3002\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u8868\u660e\uff0cRAPID\u80fd\u591f\u4ee5\u9ad8\u6548\u3001\u9002\u5e94\u6027\u5f3a\u548c\u9c81\u68d2\u7684\u65b9\u5f0f\u5c06LLM\u77e5\u8bc6\u6709\u6548\u5730\u6574\u5408\u5230\u89c4\u6a21\u5316\u7684RL\u7b56\u7565\u4e2d\u3002\u4ee3\u7801\u548c\u68c0\u67e5\u70b9\u5c06\u5728\u63a5\u53d7\u540e\u516c\u5f00\u63d0\u4f9b\u3002|\n", "2410.12481": "|**2024-10-16**|**SAC-GLAM: Improving Online RL for LLM agents with Soft Actor-Critic and Hindsight Relabeling**|Loris Gaven et.al.|[2410.12481](http://arxiv.org/abs/2410.12481)|null|\u8fd1\u5e74\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e0d\u4ec5\u4f5c\u4e3a\u751f\u6210\u6a21\u578b\uff0c\u8fd8\u5728\u89e3\u51b3\u6587\u672c\u5e8f\u5217\u51b3\u7b56\u4efb\u52a1\u65b9\u9762\u5c55\u73b0\u51fa\u8272\u7684\u80fd\u529b\u3002\u5f53\u9762\u5bf9\u590d\u6742\u73af\u5883\u65f6\uff0c\u5982\u679c\u5176\u96f6\u6837\u672c\u80fd\u529b\u4e0d\u8db3\uff0c\u6700\u8fd1\u7684\u7814\u7a76\u8868\u660e\uff0c\u53ef\u4ee5\u4f7f\u7528\u5728\u7ebf\u5f3a\u5316\u5b66\u4e60\uff08RL\uff09\u8ba9\u8fd9\u4e9bLLM\u4ee3\u7406\u4ea4\u4e92\u5f0f\u5730\u53d1\u73b0\u548c\u5b66\u4e60\u6709\u6548\u7684\u7b56\u7565\u3002\u7136\u800c\uff0c\u5927\u591a\u6570\u5148\u524d\u7684\u5de5\u4f5c\u5c40\u9650\u4e8e\u91c7\u7528\u7b56\u7565\u68af\u5ea6\u7b97\u6cd5\uff0c\u8fd9\u5927\u5927\u9650\u5236\u4e86\u8fd9\u4e9b\u4ee3\u7406\u5728\u63a2\u7d22\u548c\u5229\u7528\u65b9\u9762\u7684\u65b9\u6cd5\uff0c\u4f8b\u5982\u7ecf\u9a8c\u91cd\u653e\u548c\u4e8b\u540e\u91cd\u6807\u8bb0\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u53ef\u80fd\u662fLLM\u5b66\u4e60\u4ee3\u7406\u7684\u5173\u952e\uff0c\u7279\u522b\u662f\u5728\u8bbe\u8ba1\u81ea\u4e3b\u5185\u5728\u52a8\u673a\u7684\u4ee3\u7406\u65f6\uff0c\u8fd9\u4e9b\u4ee3\u7406\u4f1a\u6839\u636e\u81ea\u5df1\u7684\u76ee\u6807\u8fdb\u884c\u91c7\u6837\u548c\u8ffd\u6c42\uff08\u5373\u81ea\u8db3\u6027\u4ee3\u
7406\uff09\u3002\u672c\u6587\u63d0\u51fa\u5e76\u7814\u7a76\u4e86\u4e00\u79cd\u9488\u5bf9LLM\u4ee3\u7406\u7684Soft Actor-Critic\u7b97\u6cd5\u548c\u4e8b\u540e\u91cd\u6807\u8bb0\u7684\u9002\u5e94\u6027\u65b9\u6cd5\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u4e0d\u4ec5\u4e3a\u81ea\u8db3\u6027\u7684\u5728\u7ebf\u5b66\u4e60LLM\u4ee3\u7406\u94fa\u5e73\u4e86\u9053\u8def\uff0c\u8fd8\u53ef\u4ee5\u5728\u66f4\u7ecf\u5178\u7684\u591a\u76ee\u6807RL\u73af\u5883\u4e2d\u8d85\u8d8a\u7b56\u7565\u68af\u5ea6\u65b9\u6cd5\u3002|\n", "2410.12361": "|**2024-10-16**|**Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance**|Yaxi Lu et.al.|[2410.12361](http://arxiv.org/abs/2410.12361)|null|\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u4ee3\u7406\u5728\u89e3\u51b3\u590d\u6742\u4efb\u52a1\u65b9\u9762\u5df2\u7ecf\u5c55\u73b0\u51fa\u663e\u8457\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u5927\u591a\u6570\u4ee3\u7406\u7cfb\u7edf\u4ecd\u7136\u662f\u53cd\u5e94\u5f0f\u7684\uff0c\u8fd9\u9650\u5236\u4e86\u5b83\u4eec\u5728\u9700\u8981\u9884\u89c1\u6027\u548c\u81ea\u4e3b\u51b3\u7b56\u7684\u573a\u666f\u4e2d\u7684\u6709\u6548\u6027\u3002\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u81f4\u529b\u4e8e\u5f00\u53d1\u80fd\u591f\u9884\u89c1\u5230\u5e76\u4e3b\u52a8\u53d1\u8d77\u4efb\u52a1\u7684\u4ee3\u7406\uff0c\u800c\u65e0\u9700\u660e\u786e\u7684\u4eba\u7c7b\u6307\u4ee4\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6570\u636e\u9a71\u52a8\u65b9\u6cd5\u6765\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\u3002\u9996\u5148\uff0c\u6211\u4eec\u6536\u96c6\u771f\u5b9e\u4e16\u754c\u7684\u4eba\u7c7b\u6d3b\u52a8\u4ee5\u751f\u6210\u4e3b\u52a8\u5f0f\u4efb\u52a1\u9884\u6d4b\u3002\u8fd9\u4e9b\u9884\u6d4b\u968f\u540e\u7531\u4eba\u7c7b\u6807\u6ce8\u8005\u6807\u8bb0\u4e3a\u63a5\u53d7\u6216\u62d2\u7edd\u3002\u6807\u6ce8\u6570\u636e\u88ab\u7528\u4e8e\u8bad\u7ec3\u4e00\u4e2a\u5956\u52b1\u6a21\u578b\uff0c\u8be5\u6a21\u578b\u6a21\u62df\u4eba\u7c7b\u5224\u65ad\uff0c\u5e76\u4f5c\u4e3aLLM\u4
ee3\u7406\u4e3b\u52a8\u6027\u7684\u81ea\u52a8\u8bc4\u4f30\u5668\u3002\u5728\u6b64\u57fa\u7840\u4e0a\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2a\u5168\u9762\u7684\u6570\u636e\u751f\u6210\u7ba1\u9053\uff0c\u4ee5\u521b\u5efa\u4e00\u4e2a\u591a\u6837\u5316\u7684\u6570\u636e\u96c6ProactiveBench\uff0c\u5305\u542b6790\u4e2a\u4e8b\u4ef6\u3002\u6700\u540e\uff0c\u6211\u4eec\u8bc1\u660e\u901a\u8fc7\u4f7f\u7528\u63d0\u51fa\u7684ProactiveBench\u8fdb\u884c\u5fae\u8c03\u53ef\u4ee5\u663e\u8457\u6fc0\u53d1LLM\u4ee3\u7406\u7684\u4e3b\u52a8\u6027\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u5fae\u8c03\u6a21\u578b\u5728\u4e3b\u52a8\u63d0\u4f9b\u5e2e\u52a9\u65b9\u9762\u8fbe\u5230\u4e8666.47%\u7684F1\u5f97\u5206\uff0c\u8d85\u8fc7\u4e86\u6240\u6709\u5f00\u6e90\u548c\u95ed\u6e90\u6a21\u578b\u3002\u8fd9\u4e9b\u7ed3\u679c\u7a81\u663e\u4e86\u6211\u4eec\u65b9\u6cd5\u5728\u521b\u9020\u66f4\u4e3b\u52a8\u548c\u6709\u6548\u7684\u4ee3\u7406\u7cfb\u7edf\u65b9\u9762\u7684\u6f5c\u529b\uff0c\u4e3a\u672a\u6765\u7684\u4eba\u673a\u534f\u4f5c\u8fdb\u6b65\u94fa\u5e73\u4e86\u9053\u8def\u3002|\n", "2410.12236": "|**2024-10-16**|**Enhancing LLM Agents for Code Generation with Possibility and Pass-rate Prioritized Experience Replay**|Yuyang Chen 
et.al.|[2410.12236](http://arxiv.org/abs/2410.12236)|null|\u5982\u4eca\uff0c\u9488\u5bf9\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u7684Transformer\u57fa\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u901a\u5e38\u4f1a\u5e94\u7528\u91c7\u6837\u548c\u8fc7\u6ee4\u7ba1\u9053\u3002\u7531\u4e8e\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u4e2d\u7684\u7a00\u758f\u5956\u52b1\u95ee\u9898\uff0c\u5373\u4e00\u4e2a\u4ee4\u724c\u7684\u4e0d\u6b63\u786e\u6027\u4f1a\u5bfc\u81f4Transformer\u6a21\u578b\u91c7\u6837\u5197\u4f59\u7a0b\u5e8f\u76f4\u5230\u627e\u5230\u6b63\u786e\u7684\u7a0b\u5e8f\uff0c\u8fd9\u5bfc\u81f4\u4e86\u4f4e\u6548\u7387\u3002\u4e3a\u4e86\u514b\u670d\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u5728\u5fae\u8c03\u9636\u6bb5\u5f15\u5165\u4e86\u7ecf\u9a8c\u56de\u653e\uff08ER\uff09\uff0c\u5176\u4e2d\u5b58\u50a8\u751f\u6210\u7684\u4ee3\u7801\u548c\u7a0b\u5e8f\uff0c\u5e76\u5c06\u8fd9\u4e9b\u7a0b\u5e8f\u91cd\u65b0\u64ad\u653e\uff0c\u4ee5\u4f7fLLM\u4ee3\u7406\u6709\u673a\u4f1a\u4ece\u8fc7\u53bb\u7684\u7ecf\u5386\u4e2d\u5b66\u4e60\u3002\u57fa\u4e8eER\u7684\u7cbe\u795e\uff0c\u6211\u4eec\u4ecb\u7ecd\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u79f0\u4e3aBTP\u7ba1\u9053\uff0c\u8be5\u65b9\u6cd5\u5305\u62ec\u4e09\u4e2a\u9636\u6bb5\uff1a\u675f\u641c\u7d22\u91c7\u6837\u3001\u6d4b\u8bd5\u9636\u6bb5\u548c\u4f18\u5148\u7ea7\u7ecf\u9a8c\u56de\u653e\u9636\u6bb5\u3002\u8be5\u65b9\u6cd5\u5229\u7528\u4ee3\u7801\u6a21\u578b\u6536\u96c6\u7684\u5931\u8d25\u7a0b\u5e8f\uff0c\u5e76\u4ece\u56de\u653e\u7f13\u51b2\u533a\u4e2d\u91cd\u64ad\u5177\u6709\u9ad8\u53ef\u80fd\u6027\u548c\u901a\u8fc7\u7387\u4f18\u5148\u503c\uff08P2Value\uff09\u7684\u7a0b\u5e8f\uff0c\u4ece\u800c\u63d0\u9ad8\u6548\u7387\u3002P2Value\u7efc\u5408\u8003\u8651\u4e86Transformer\u8f93\u51fa\u7684\u53ef\u80fd\u6027\u548c\u901a\u8fc7\u7387\uff0c\u53ef\u4ee5\u5229\u7528\u5927\u591a\u6570\u7531LLMs\u6536\u96c6\u7684\u7a0b\u5e8f\u672a\u80fd\u901a\u8fc7\u4efb\u4f55\u6d4b\u8bd5\u800c\u5bfc\u81f4\u7684\u5197\u4f59\u8d44\u6e90\u3002\u6211\u4eec
\u5728\u51e0\u4e2aLLM\u4e0a\u5b9e\u8bc1\u5e94\u7528\u4e86\u6211\u4eec\u7684\u65b9\u6cd5\uff0c\u8bc1\u660e\u5b83\u63d0\u9ad8\u4e86\u5b83\u4eec\u5728\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u4e2d\u7684\u6027\u80fd\uff0c\u5e76\u8d85\u8d8a\u4e86\u73b0\u6709\u7684\u57fa\u7ebf\u3002|\n", "2410.11906": "|**2024-10-15**|**Empowering Users in Digital Privacy Management through Interactive LLM-Based Agents**|Bolun Sun et.al.|[2410.11906](http://arxiv.org/abs/2410.11906)|null|\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u5c06\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5e94\u7528\u4e8e\u589e\u5f3a\u7528\u6237\u5bf9\u9690\u79c1\u653f\u7b56\u7684\u7406\u89e3\u7684\u65b0\u65b9\u6cd5\uff0c\u901a\u8fc7\u4e00\u4e2a\u4ea4\u4e92\u5f0f\u5bf9\u8bdd\u4ee3\u7406\u6765\u5b9e\u73b0\u3002\u6211\u4eec\u8bc1\u660e\uff0cLLMs\u5728\u6570\u636e\u5b9e\u8df5\u8bc6\u522b\u3001\u9009\u62e9\u8bc6\u522b\u3001\u653f\u7b56\u603b\u7ed3\u548c\u9690\u79c1\u95ee\u7b54\u7b49\u4efb\u52a1\u4e0a\u663e\u8457\u4f18\u4e8e\u4f20\u7edf\u6a21\u578b\uff0c\u4e3a\u9690\u79c1\u653f\u7b56\u5206\u6790\u8bbe\u5b9a\u4e86\u65b0\u7684\u57fa\u51c6\u3002\u57fa\u4e8e\u8fd9\u4e9b\u53d1\u73b0\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u57fa\u4e8eLLM\u7684\u4ee3\u7406\uff0c\u8be5\u4ee3\u7406\u5145\u5f53\u5904\u7406\u7f51\u7ad9\u9690\u79c1\u653f\u7b56\u7684\u4e13\u4e1a\u7cfb\u7edf\uff0c\u5f15\u5bfc\u7528\u6237\u7a7f\u8d8a\u590d\u6742\u7684\u6cd5\u5f8b\u8bed\u8a00\uff0c\u800c\u65e0\u9700\u4ed6\u4eec\u63d0\u51fa\u7279\u5b9a\u95ee\u9898\u3002\u4e00\u9879\u6709100\u540d\u53c2\u4e0e\u8005\u53c2\u4e0e\u7684\u7528\u6237\u7814\u7a76\u8868\u660e\uff0c\u4f7f\u7528\u4ee3\u7406\u8f85\u52a9\u7684\u7528\u6237\u5728\u7406\u89e3\u6c34\u5e73\u4e0a\u66f4\u9ad8\uff08\u5e73\u5747\u5206\u4e3a3\u5206\u4e2d\u76842.6\u5206\uff0c\u800c\u5bf9\u7167\u7ec4\u4e3a1.8\u5206\uff09\uff0c\u8ba4\u77e5\u8d1f\u8377\u66f4\u4f4e\uff08\u4efb\u52a1\u96be\u5ea6\u8bc4\u5206\u4e3a10\u5206\u4e2d\u76843.2\u5206\uff0c\u800c\u5bf9\u7167\u7ec4\u4e3a7.8\u520
6\uff09\uff0c\u5bf9\u7ba1\u7406\u9690\u79c1\u66f4\u6709\u4fe1\u5fc3\uff0c\u5e76\u4e14\u5b8c\u6210\u4efb\u52a1\u6240\u9700\u65f6\u95f4\u66f4\u77ed\uff085.5\u5206\u949f\u5bf9\u6bd415.8\u5206\u949f\uff09\u3002\u8fd9\u9879\u5de5\u4f5c\u7a81\u663e\u4e86\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u5728\u6539\u53d8\u7528\u6237\u4e0e\u9690\u79c1\u653f\u7b56\u4e92\u52a8\u65b9\u9762\u7684\u6f5c\u529b\uff0c\u4ece\u800c\u4fc3\u8fdb\u66f4\u77e5\u60c5\u7684\u540c\u610f\u5e76\u4f7f\u7528\u6237\u5728\u6570\u5b57\u670d\u52a1\u9886\u57df\u4e2d\u66f4\u52a0\u81ea\u4e3b\u3002|\n", "2410.13825": "|**2024-10-17**|**AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents**|Ke Yang et.al.|[2410.13825](http://arxiv.org/abs/2410.13825)|null|\u901a\u8fc7\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u4ee3\u7406\u6765\u5b9e\u73b0\u4e2a\u6027\u5316\u548c\u6807\u51c6\u5316\u4efb\u52a1\uff0c\u53ef\u4ee5\u63d0\u9ad8\u4eba\u7c7b\u7684\u5de5\u4f5c\u6548\u7387\u3002\u81ea\u52a8\u5316\u7f51\u7edc\u4efb\u52a1\uff08\u5982\u5728\u9884\u7b97\u5185\u9884\u8ba2\u9152\u5e97\uff09\u7684\u9700\u6c42\u65e5\u76ca\u589e\u52a0\u3002\u6ee1\u8db3\u5b9e\u9645\u9700\u6c42\u7684\u540c\u65f6\uff0c\u7f51\u7edc\u4ee3\u7406\u4e5f\u4f5c\u4e3a\u5404\u79cd\u4ee3\u7406\u63a5\u5730\u573a\u666f\u7684\u91cd\u8981\u6982\u5ff5\u9a8c\u8bc1\u793a\u4f8b\uff0c\u5176\u6210\u529f\u5c06\u9884\u793a\u7740\u8bb8\u591a\u672a\u6765\u5e94\u7528\u7684\u8fdb\u6b65\u3002\u5148\u524d\u7684\u7814\u7a76\u901a\u5e38\u4f1a\u624b\u5de5\u8bbe\u8ba1\u7f51\u7edc\u4ee3\u7406\u7b56\u7565\uff08\u4f8b\u5982\uff0c\u63d0\u793a\u6a21\u677f\u3001\u591a\u4ee3\u7406\u7cfb\u7edf\u3001\u641c\u7d22\u65b9\u6cd5\u7b49\uff09\uff0c\u8fd9\u4e9b\u7b56\u7565\u53ef\u80fd\u65e0\u6cd5\u5f88\u597d\u5730\u63a8\u5e7f\u5230\u6240\u6709\u73b0\u5b9e\u4e16\u754c\u573a\u666f\u4e2d\u3002\u53e6\u4e00\u65b9\u9762\uff0c\u5173\u4e8e\u7f51\u7edc\u4ee3\u7406\u7684\u89c2\u5bdf/\u52a8\u4f5c\u8868\u793a\u4e0e\u57fa\u4e8eLLM\u7684\u9884\u8bad\u7ec3\u6570\u636e\u4e4b\u95f4\u
7684\u4e0d\u5339\u914d\u7684\u7814\u7a76\u975e\u5e38\u6709\u9650\u3002\u8fd9\u79cd\u5dee\u5f02\u7279\u522b\u660e\u663e\uff0c\u56e0\u4e3aLLM\u4e3b\u8981\u662f\u4e3a\u4e86\u8bed\u8a00\u8865\u5168\u800c\u8bad\u7ec3\u7684\uff0c\u800c\u4e0d\u662f\u4e3a\u4e86\u6d89\u53ca\u5177\u8eab\u5bfc\u822a\u52a8\u4f5c\u548c\u7b26\u53f7\u5316\u7f51\u7edc\u5143\u7d20\u7684\u4efb\u52a1\u3002\u6211\u4eec\u7684\u7814\u7a76\u901a\u8fc7\u7b80\u5355\u5730\u4f18\u5316\u89c2\u5bdf\u548c\u52a8\u4f5c\u7a7a\u95f4\uff0c\u4f7f\u57fa\u4e8eLLM\u7684\u7f51\u7edc\u4ee3\u7406\u66f4\u597d\u5730\u4e0eLLM\u7684\u80fd\u529b\u76f8\u5339\u914d\uff0c\u4ece\u800c\u663e\u8457\u63d0\u5347\u4e86\u5176\u6027\u80fd\u3002\u8fd9\u79cd\u65b9\u6cd5\u4f7f\u6211\u4eec\u7684\u57fa\u7840\u4ee3\u7406\u5728\u5404\u79cd\u7f51\u7edc\u4efb\u52a1\u4e0a\u663e\u8457\u4f18\u4e8e\u4ee5\u524d\u7684\u65b9\u6cd5\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u5728WebArena\u57fa\u51c6\u6d4b\u8bd5\u4e2d\uff0c\u6211\u4eec\u7684\u4ee3\u7406AgentOccam\u6bd4\u4ee5\u524d\u7684\u6700\u4f73\u65b9\u6cd5\u9ad8\u51fa9.8\u5206\uff08+29.4%\uff09\u548c5.9\u5206\uff08+15.8%\uff09\uff0c\u5e76\u4e14\u76f8\u6bd4\u7c7b\u4f3c\u7684\u666e\u901a\u7f51\u7edc\u4ee3\u7406\uff0c\u5176\u6210\u529f\u7387\u63d0\u9ad8\u4e8626.6\u5206\uff08+161%\uff09\u3002\u6211\u4eec\u6ca1\u6709\u4f7f\u7528\u4e0a\u4e0b\u6587\u793a\u4f8b\u3001\u65b0\u7684\u4ee3\u7406\u89d2\u8272\u3001\u5728\u7ebf\u53cd\u9988\u6216\u641c\u7d22\u7b56\u7565\u3002AgentOccam\u7684\u7b80\u5355\u8bbe\u8ba1\u5c55\u793a\u4e86LLMs\u5728\u65e0\u6837\u672c\u5b66\u4e60\u4e0b\u5904\u7406\u7f51\u7edc\u4efb\u52a1\u7684\u5f3a\u5927\u80fd\u529b\uff0c\u5e76\u5f3a\u8c03\u4e86\u7cbe\u5fc3\u8c03\u6574\u89c2\u5bdf\u548c\u52a8\u4f5c\u7a7a\u95f4\u5bf9\u4e8e\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u81f3\u5173\u91cd\u8981\u7684\u4f5c\u7528\u3002|\n", "2410.13768": "|**2024-10-17**|**Rapid and Automated Alloy Design with Graph Neural Network-Powered LLM-Driven Multi-Agent Systems**|Alireza Ghafarollahi 
et.al.|[2410.13768](http://arxiv.org/abs/2410.13768)|null|\u4e00\u4e2a\u591a\u667a\u80fd\u4f53AI\u6a21\u578b\u88ab\u7528\u4e8e\u81ea\u52a8\u5316\u53d1\u73b0\u65b0\u7684\u91d1\u5c5e\u5408\u91d1\uff0c\u6574\u5408\u4e86\u591a\u6a21\u6001\u6570\u636e\u548c\u5916\u90e8\u77e5\u8bc6\uff0c\u5305\u62ec\u901a\u8fc7\u539f\u5b50\u6a21\u62df\u83b7\u5f97\u7684\u7269\u7406\u89c1\u89e3\u3002\u6211\u4eec\u7684\u591a\u667a\u80fd\u4f53\u7cfb\u7edf\u6709\u4e09\u4e2a\u5173\u952e\u7ec4\u6210\u90e8\u5206\uff1a(a) \u4e00\u7ec4\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u8d1f\u8d23\u63a8\u7406\u548c\u89c4\u5212\u7b49\u4efb\u52a1\uff0c(b) \u4e00\u7fa4\u5177\u6709\u4e0d\u540c\u89d2\u8272\u548c\u4e13\u957f\u7684AI\u4ee3\u7406\u52a8\u6001\u534f\u4f5c\uff0c\u4ee5\u53ca(c) \u4e00\u79cd\u65b0\u5f00\u53d1\u7684\u56fe\u795e\u7ecf\u7f51\u7edc\uff08GNN\uff09\u6a21\u578b\uff0c\u7528\u4e8e\u5feb\u901f\u68c0\u7d22\u5173\u952e\u7269\u7406\u5c5e\u6027\u3002\u4e00\u7ec4\u7531LLM\u9a71\u52a8\u7684AI\u4ee3\u7406\u5408\u4f5c\uff0c\u81ea\u52a8\u5316\u63a2\u7d22MPEAs\uff08\u9ad8\u71b5\u5408\u91d1\uff09\u7684\u5de8\u5927\u8bbe\u8ba1\u7a7a\u95f4\uff0c\u5e76\u7531GNN\u7684\u9884\u6d4b\u6307\u5bfc\u3002\u6211\u4eec\u4e13\u6ce8\u4e8eNbMoTa\u65cf\u4f53\u5fc3\u7acb\u65b9\uff08bcc\uff09\u5408\u91d1\uff0c\u4f7f\u7528\u57fa\u4e8e\u673a\u5668\u5b66\u4e60\u7684\u539f\u5b50\u95f4\u52bf\u8fdb\u884c\u5efa\u6a21\uff0c\u76ee\u6807\u662f\u4e24\u4e2a\u5173\u952e\u5c5e\u6027\uff1aPeierls\u52bf\u5792\u548c\u6eb6\u8d28/\u87ba\u578b\u4f4d\u9519\u76f8\u4e92\u4f5c\u7528\u80fd\u3002\u6211\u4eec\u7684GNN\u6a21\u578b\u51c6\u786e\u5730\u9884\u6d4b\u8fd9\u4e9b\u539f\u5b50\u5c3a\u5ea6\u7684\u5c5e\u6027\uff0c\u63d0\u4f9b\u4e86\u4e00\u79cd\u6bd4\u6602\u8d35\u7684\u7a77\u4e3e\u8ba1\u7b97\u66f4\u5feb\u7684\u66ff\u4ee3\u65b9\u6848\uff0c\u51cf\u8f7b\u4e86\u591a\u667a\u80fd\u4f53\u7cfb\u7edf\u5728\u7269\u7406\u5c5e\u6027\u68c0\u7d22\u4e0a\u7684\u8ba1\u7b97\u8d1f\u62c5\u3002\u8be5AI\u7cfb\u7edf\u901a\u8fc7\u51cf\u5c11\u5bf9\u4eba\u7c7b\u4e13
\u4e1a\u77e5\u8bc6\u7684\u4f9d\u8d56\u5e76\u514b\u670d\u76f4\u63a5\u5168\u539f\u5b50\u6a21\u62df\u7684\u9650\u5236\uff0c\u9769\u65b0\u4e86\u6750\u6599\u53d1\u73b0\u8fc7\u7a0b\u3002\u901a\u8fc7\u534f\u540cGNN\u7684\u9884\u6d4b\u80fd\u529b\u548cLLM\u4ee3\u7406\u7684\u52a8\u6001\u534f\u4f5c\uff0c\u8be5\u7cfb\u7edf\u81ea\u4e3b\u5bfc\u822a\u5de8\u5927\u7684\u5408\u91d1\u8bbe\u8ba1\u7a7a\u95f4\uff0c\u8bc6\u522b\u539f\u5b50\u5c3a\u5ea6\u6750\u6599\u5c5e\u6027\u7684\u8d8b\u52bf\uff0c\u5e76\u9884\u6d4b\u5b8f\u89c2\u673a\u68b0\u5f3a\u5ea6\uff0c\u5982\u51e0\u4e2a\u8ba1\u7b97\u5b9e\u9a8c\u6240\u5c55\u793a\u7684\u90a3\u6837\u3002\u8fd9\u79cd\u65b9\u6cd5\u52a0\u901f\u4e86\u5148\u8fdb\u5408\u91d1\u7684\u53d1\u73b0\uff0c\u5e76\u6709\u671b\u5728\u5176\u4ed6\u590d\u6742\u7cfb\u7edf\u4e2d\u6709\u66f4\u5e7f\u6cdb\u7684\u5e94\u7528\uff0c\u6807\u5fd7\u7740\u81ea\u52a8\u5316\u6750\u6599\u8bbe\u8ba1\u9886\u57df\u7684\u4e00\u5927\u8fdb\u6b65\u3002|\n", "2410.13610": "|**2024-10-17**|**MeNTi: Bridging Medical Calculator and LLM Agent with Nested Tool Calling**|Yakun Zhu 
et.al.|[2410.13610](http://arxiv.org/abs/2410.13610)|null|\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e2d\u96c6\u6210\u5de5\u5177\u5df2\u7ecf\u4fc3\u8fdb\u4e86\u5176\u5e7f\u6cdb\u5e94\u7528\u3002\u7136\u800c\uff0c\u5728\u4e13\u95e8\u7684\u4e0b\u6e38\u4efb\u52a1\u573a\u666f\u4e2d\uff0c\u4ec5\u4f9d\u8d56\u5de5\u5177\u662f\u4e0d\u8db3\u4ee5\u5e94\u5bf9\u73b0\u5b9e\u4e16\u754c\u7684\u590d\u6742\u6027\u7684\uff0c\u8fd9\u5c24\u5176\u9650\u5236\u4e86LLMs\u5728\u533b\u5b66\u7b49\u9886\u57df\u7684\u6709\u6548\u90e8\u7f72\u3002\u672c\u6587\u4e13\u6ce8\u4e8e\u533b\u5b66\u8ba1\u7b97\u5668\u7684\u4e0b\u6e38\u4efb\u52a1\uff0c\u8fd9\u4e9b\u8ba1\u7b97\u5668\u4f7f\u7528\u6807\u51c6\u5316\u6d4b\u8bd5\u6765\u8bc4\u4f30\u4e2a\u4eba\u7684\u5065\u5eb7\u72b6\u51b5\u3002\u6211\u4eec\u4ecb\u7ecd\u4e86MeNTi\uff0c\u8fd9\u662f\u4e00\u79cd\u4e3aLLMs\u8bbe\u8ba1\u7684\u901a\u7528\u4ee3\u7406\u67b6\u6784\u3002MeNTi\u96c6\u6210\u4e86\u4e13\u4e1a\u7684\u533b\u5b66\u5de5\u5177\u5305\uff0c\u5e76\u91c7\u7528\u5143\u5de5\u5177\u548c\u5d4c\u5957\u8c03\u7528\u673a\u5236\u4ee5\u589e\u5f3aLLMs\u5bf9\u5de5\u5177\u7684\u5229\u7528\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u5b83\u5b9e\u73b0\u4e86\u7075\u6d3b\u7684\u5de5\u5177\u9009\u62e9\u548c\u5d4c\u5957\u5de5\u5177\u8c03\u7528\u6765\u89e3\u51b3\u590d\u6742\u7684\u533b\u7597\u573a\u666f\u4e2d\u7684\u5b9e\u9645\u95ee\u9898\uff0c\u5305\u62ec\u8ba1\u7b97\u5668\u9009\u62e9\u3001\u69fd\u586b\u5145\u548c\u5355\u4f4d\u8f6c\u6362\u3002\u4e3a\u4e86\u8bc4\u4f30LLMs\u5728\u6574\u4e2a\u4e34\u5e8a\u8fc7\u7a0b\u4e2d\u4f7f\u7528\u533b\u5b66\u8ba1\u7b97\u5668\u8fdb\u884c\u8ba1\u7b97\u548c\u8bc4\u4f30\u60a3\u8005\u5065\u5eb7\u72b6\u51b5\u7684\u80fd\u529b\uff0c\u6211\u4eec\u5f15\u5165\u4e86CalcQA\u57fa\u51c6\u3002\u8be5\u57fa\u51c6\u7531\u4e13\u4e1a\u533b\u751f\u6784\u5efa\uff0c\u5305\u542b100\u4e2a\u6848\u4f8b-\u8ba1\u7b97\u5668\u5bf9\uff0c\u5e76\u9644\u5e26\u4e00\u4e2a\u5305\u542b281\u4e2a\u533b\u5b66\u5de5\u5177\u7684\u5de5\u5177\u5305\u3002\u5b9e\u9a8c\u7ed3\u6
79c\u8868\u660e\uff0c\u6211\u4eec\u7684\u6846\u67b6\u663e\u8457\u63d0\u5347\u4e86\u6027\u80fd\u3002\u8fd9\u9879\u7814\u7a76\u4e3a\u5728\u533b\u5b66\u7684\u9ad8\u8981\u6c42\u573a\u666f\u4e2d\u5e94\u7528LLMs\u5f00\u8f9f\u4e86\u65b0\u7684\u65b9\u5411\u3002|\n", "2410.13185": "|**2024-10-17**|**Chain of Ideas: Revolutionizing Research in Novel Idea Development with LLM Agents**|Long Li et.al.|[2410.13185](http://arxiv.org/abs/2410.13185)|null|\u6709\u6548\u7684\u7814\u7a76\u521b\u610f\u6784\u601d\u662f\u79d1\u5b66\u7814\u7a76\u7684\u5173\u952e\u6b65\u9aa4\u3002\u7136\u800c\uff0c\u79d1\u5b66\u6587\u732e\u7684\u6307\u6570\u589e\u957f\u4f7f\u5f97\u7814\u7a76\u4eba\u5458\u96be\u4ee5\u8ddf\u4e0a\u6700\u65b0\u7684\u8fdb\u5c55\u5e76\u786e\u5b9a\u6709\u610f\u4e49\u7684\u7814\u7a76\u65b9\u5411\u3002\u6700\u8fd1\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u53d1\u5c55\u8868\u660e\uff0c\u81ea\u52a8\u5316\u751f\u6210\u65b0\u7684\u7814\u7a76\u521b\u610f\u662f\u4e00\u4e2a\u6709\u524d\u666f\u7684\u9014\u5f84\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u521b\u610f\u751f\u6210\u65b9\u6cd5\u8981\u4e48\u7b80\u5355\u5730\u63d0\u793aLLMs\uff0c\u8981\u4e48\u76f4\u63a5\u5411LLMs\u66b4\u9732\u5927\u91cf\u7684\u6587\u732e\u800c\u6ca1\u6709\u6307\u793a\u6709\u7528\u7684\u4fe1\u606f\u3002\u53d7\u5230\u4eba\u7c7b\u7814\u7a76\u4eba\u5458\u7814\u7a76\u8fc7\u7a0b\u7684\u542f\u53d1\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u94fe\u5f0f\u60f3\u6cd5\uff08Chain-of-Ideas, 
CoI\uff09\u7684\u4ee3\u7406\uff0c\u8fd9\u662f\u4e00\u79cd\u57fa\u4e8eLLM\u7684\u4ee3\u7406\uff0c\u901a\u8fc7\u94fe\u5f0f\u7ed3\u6784\u7ec4\u7ec7\u76f8\u5173\u6587\u732e\uff0c\u6709\u6548\u5730\u53cd\u6620\u4e86\u7814\u7a76\u9886\u57df\u7684\u6e10\u8fdb\u53d1\u5c55\u3002\u8fd9\u79cd\u7ec4\u7ec7\u65b9\u5f0f\u4f7fLLMs\u80fd\u591f\u6355\u6349\u5230\u7814\u7a76\u9886\u57df\u7684\u5f53\u524d\u8fdb\u5c55\uff0c\u4ece\u800c\u589e\u5f3a\u5176\u521b\u610f\u751f\u6210\u80fd\u529b\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86\u4e00\u4e2a\u540d\u4e3aIdea Arena\u7684\u8bc4\u4f30\u534f\u8bae\uff0c\u53ef\u4ee5\u4ece\u4e0d\u540c\u89d2\u5ea6\u5168\u9762\u8bc4\u4f30\u521b\u610f\u751f\u6210\u65b9\u6cd5\uff0c\u4e0e\u4eba\u7c7b\u7814\u7a76\u4eba\u5458\u7684\u504f\u597d\u7d27\u5bc6\u5bf9\u9f50\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cCoI\u4ee3\u7406\u5728\u521b\u610f\u751f\u6210\u65b9\u9762\u59cb\u7ec8\u4f18\u4e8e\u5176\u4ed6\u65b9\u6cd5\uff0c\u5e76\u4e14\u5728\u8d28\u91cf\u4e0a\u53ef\u4e0e\u4eba\u7c7b\u5ab2\u7f8e\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684CoI\u4ee3\u7406\u6210\u672c\u6548\u76ca\u9ad8\uff0c\u751f\u6210\u4e00\u4e2a\u5019\u9009\u521b\u610f\u53ca\u5176\u76f8\u5e94\u5b9e\u9a8c\u8bbe\u8ba1\u7684\u6700\u4f4e\u6210\u672c\u4ec5\u4e3a0.50\u7f8e\u5143\u3002|\n"}, "llm": {"2405.10311": "|**2024-05-16**|**UniRAG: Universal Retrieval Augmentation for Multi-Modal Large Language Models**|Sahel Sharifymoghaddam et.al.|[2405.10311](http://arxiv.org/abs/2405.10311)|null|
\u8fd1\u671f\uff0c\u591a\u6a21\u6001\uff08MM\uff09\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5df2\u7ecf\u89e3\u9501\u4e86\u8bb8\u591a\u9700\u8981\u591a\u6a21\u6001\u7406\u89e3\uff08\u5982\u56fe\u50cf\u63cf\u8ff0\u6216\u89c6\u89c9\u95ee\u7b54\uff09\u548c\u751f\u6210\uff08\u5982\u6587\u672c\u5f15\u5bfc\u7684\u56fe\u50cf\u751f\u6210\u6216\u7f16\u8f91\uff09\u590d\u6742\u4efb\u52a1\u3002\u4e3a\u4e86\u8fdb\u4e00\u6b65\u63d0\u5347MM-LLMs\u7684\u8f93\u51fa\u8d28\u91cf\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u6a21\u578b\u901a\u7528\u7684UniRAG\u6280\u672f\uff0c\u5b83\u5728\u63a8\u7406\u9636\u6bb5\u5c06\u76f8\u5173\u68c0\u7d22\u4fe1\u606f\u6dfb\u52a0\u5230\u63d0\u793a\u4e2d\uff0c\u4f5c\u4e3a\u5c11\u91cf\u6837\u4f8b\u3002\u4e0e\u666e\u904d\u8ba4\u4e3a\u68c0\u7d22\u589e\u5f3a\uff08RA\uff09\u4e3b\u8981\u6539\u8fdb\u7f55\u89c1\u5b9e\u4f53\u7684\u751f\u6210\u6216\u7406\u89e3\u4e0d\u540c\uff0c\u6211\u4eec\u5728MSCOCO\u6570\u636e\u96c6\u4e0a\u5bf9\u5305\u62ecGPT4\u3001Gemini-Pro\u5728\u5185\u7684\u4e13\u6709\u6a21\u578b\u4ee5\u53caLlava\u3001LaVIT\u548cEmu2\u7b49\u5f00\u6e90\u5c0f\u578b\u6a21\u578b\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u7ed3\u679c\u663e\u793a\uff0c\u8fd9\u4e9b\u6a21\u578b\u5728\u8f93\u5165\u63d0\u793a\u901a\u8fc7MM\u68c0\u7d22\u5668\uff08\u5982UniIR\u6a21\u578b\uff09\u589e\u5f3a\u540e\uff0c\u663e\u8457\u63d0\u9ad8\u4e86\u751f\u6210\u8d28\u91cf\u3002|\n", "2405.10305": "|**2024-05-16**|**4D Panoptic Scene Graph Generation**|Jingkang Yang 
et.al.|[2405.10305](http://arxiv.org/abs/2405.10305)|**[link](https://github.com/jingkang50/psg4d)**|**\u6211\u4eec\u751f\u6d3b\u5728\u4e00\u4e2a\u4e09\u7ef4\u7a7a\u95f4\u4e2d\uff0c\u540c\u65f6\u901a\u8fc7\u7b2c\u56db\u7ef4\u65f6\u95f4\u5411\u524d\u63a8\u8fdb\u3002\u4e3a\u4e86\u4f7f\u4eba\u5de5\u667a\u80fd\u80fd\u591f\u5168\u9762\u7406\u89e3\u8fd9\u79cd4D\u73af\u5883\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u8868\u793a\u5f62\u5f0f\u2014\u20144D\u5168\u666f\u573a\u666f\u56fe\uff08PSG-4D\uff09\uff0c\u5b83\u5c06\u52a8\u60014D\u4e16\u754c\u4e2d\u7684\u539f\u59cb\u89c6\u89c9\u6570\u636e\u62bd\u8c61\u4e3a\u8282\u70b9\u548c\u8fb9\uff0c\u8282\u70b9\u4ee3\u8868\u5177\u6709\u7cbe\u786e\u4f4d\u7f6e\u548c\u72b6\u6001\u4fe1\u606f\u7684\u5b9e\u4f53\uff0c\u8fb9\u6355\u6349\u65f6\u95f4\u5173\u7cfb\u3002\u4e3a\u4e86\u4fc3\u8fdb\u5728\u8fd9\u4e00\u65b0\u9886\u57df\u7684\u7814\u7a76\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u4e30\u5bcc\u7684\u6ce8\u91caPSG-4D\u6570\u636e\u96c6\uff0c\u5305\u542b3000\u4e2aRGB-D\u89c6\u9891\uff0c\u603b\u8ba1100\u4e07\u5e27\uff0c\u6bcf\u5e27\u90fd\u5e26\u67094D\u5168\u666f\u5206\u5272\u63a9\u7801\u4ee5\u53ca\u8be6\u7ec6\u7684\u52a8\u6001\u573a\u666f\u56fe\u6807\u7b7e\u3002\u6211\u4eec\u4e3a\u6b64\u4efb\u52a1\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aPSG4DFormer\u7684Transformer\u6a21\u578b\uff0c\u8be5\u6a21\u578b\u80fd\u591f\u9884\u6d4b\u5168\u666f\u5206\u5272\u63a9\u7801\uff0c\u6cbf\u65f6\u95f4\u8f74\u8ddf\u8e2a\u63a9\u7801\uff0c\u5e76\u901a\u8fc7\u5173\u7cfb\u7ec4\u4ef6\u751f\u6210\u76f8\u5e94\u7684\u573a\u666f\u56fe\u3002\u5728\u65b0\u6570\u636e\u96c6\u4e0a\u7684\u5927\u91cf\u5b9e\u9a8c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u4e3a\u672a\u6765\u7684PSG-4D\u7814\u7a76\u63d0\u4f9b\u4e86\u4e00\u4e2a\u5f3a\u5927\u7684\u57fa\u51c6\u3002\u6700\u540e\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u5982\u4f55\u901a\u8fc7\u5c06\u5927\u578b\u8bed\u8a00\u6a21\u578b\u878d\u5165\u6211\u4eec\u7684PSG-4D\u7cfb\u7edf\u6765\u5b9e\u73b0\u52a8\u6001\
u573a\u666f\u7406\u89e3\u7684\u4e00\u4e2a\u5b9e\u9645\u5e94\u7528\u793a\u4f8b\u3002**|\n", "2405.10299": "|**2024-05-16**|**HW-GPT-Bench: Hardware-Aware Architecture Benchmark for Language Models**|Rhea Sanjay Sukthanker et.al.|[2405.10299](http://arxiv.org/abs/2405.10299)|**[link](https://github.com/automl/hw-aware-llm-bench)**|**\u968f\u7740\u8bed\u8a00\u6a21\u578b\u7684\u89c4\u6a21\u4e0d\u65ad\u6269\u5927\uff0c\u5bf9\u786c\u4ef6\u6307\u6807\uff08\u5982\u5ef6\u8fdf\u3001\u80fd\u8017\u3001GPU\u5185\u5b58\u4f7f\u7528\u548c\u6027\u80fd\uff09\u4e4b\u95f4\u7684\u6743\u8861\u9700\u6c42\u65e5\u76ca\u589e\u957f\u3002\u4eba\u4eec\u6b63\u5728\u5bfb\u6c42\u4e3a\u4e0d\u540c\u8bed\u8a00\u6a21\u578b\u914d\u7f6e\u5efa\u7acb\u5e15\u7d2f\u6258\u524d\u6cbf\uff0c\u4ee5\u5728\u6307\u5b9a\u786c\u4ef6\u9650\u5236\u4e0b\u627e\u5230\u6700\u4f18\u6a21\u578b\u3002\u7136\u800c\uff0c\u5bf9\u591a\u79cd\u67b6\u6784\u5728\u591a\u53f0\u8bbe\u5907\u4e0a\u7684\u5168\u9762\u8bad\u7ec3\u548c\u8bc4\u4f30\u5728\u8ba1\u7b97\u4e0a\u662f\u4e0d\u53ef\u884c\u7684\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86HW-GPT-Bench\uff0c\u8fd9\u662f\u4e00\u4e2a\u57fa\u4e8e\u786c\u4ef6\u611f\u77e5\u7684\u8bed\u8a00\u6a21\u578b\u4ee3\u7406\u57fa\u51c6\uff0c\u5229\u7528\u795e\u7ecf\u67b6\u6784\u641c\u7d22\uff08NAS\uff09\u4e2d\u7684\u6743\u91cd\u5171\u4eab\u6280\u672f\uff0c\u5728\u4e00\u4e2a\u6a21\u578b\u4e2d\u9ad8\u6548\u5730\u8bad\u7ec3\u5305\u542b\u4e0d\u540c\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\u7684\u8d85\u7f51\u7edc\u3002\u6211\u4eec\u572813\u79cd\u8bbe\u5907\u4e0a\u5bf9\u8fd9\u4e9b\u6a21\u578b\u8fdb\u884c\u4e86\u6027\u80fd\u5256\u6790\uff0c\u8003\u8651\u4e865\u79cd\u786c\u4ef6\u6307\u6807\u548c3\u79cd\u4e0d\u540c\u7684\u6a21\u578b\u89c4\u6a21\u3002\u6700\u540e\uff0c\u6211\u4eec\u901a\u8fc78\u79cd\u4e0d\u540c\u7684\u591a\u76ee\u6807NAS\u7b97\u6cd5\u5c55\u793a\u4e86HW-GPT-Bench\u7684\u53ef\u7528\u6027\uff0c\u5e76\u8bc4\u4f30\u4e86\u7531\u6b64\u4ea7\u751f\u7684\u5e15\u7d2f\u6258\u524d\u6cbf\u7684\u8d28\u91cf\u3
002\u6211\u4eec\u7684\u76ee\u6807\u662f\u63a8\u52a8\u548c\u52a0\u901f\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u591a\u76ee\u6807\u65b9\u6cd5\uff0c\u5982NAS\u548c\u7ed3\u6784\u5316\u526a\u679d\u7684\u7814\u7a76\u3002**|\n", "2405.10288": "|**2024-05-16**|**Timeline-based Sentence Decomposition with In-Context Learning for Temporal Fact Extraction**|Jianhao Chen et.al.|[2405.10288](http://arxiv.org/abs/2405.10288)|**[link](https://github.com/jianhaochen-nju/tsdre)**|**\u6458\u8981\uff1a** \u4e8b\u5b9e\u62bd\u53d6\u5bf9\u4e8e\u6784\u5efa\u77e5\u8bc6\u56fe\u8c31\u81f3\u5173\u91cd\u8981\u3002\u968f\u7740\u5bf9\u65f6\u95f4\u76f8\u5173\u4e8b\u5b9e\u5728\u4e0b\u6e38\u4efb\u52a1\u4e2d\u7684\u9700\u6c42\u589e\u957f\uff0c\u51fa\u73b0\u4e86\u65f6\u95f4\u6027\u4e8b\u5b9e\u62bd\u53d6\u7684\u4efb\u52a1\u3002\u672c\u6587\u7279\u522b\u5173\u6ce8\u4ece\u81ea\u7136\u8bed\u8a00\u6587\u672c\u4e2d\u63d0\u53d6\u65f6\u95f4\u6027\u4e8b\u5b9e\u3002\u5148\u524d\u7684\u7814\u7a76\u672a\u80fd\u59a5\u5584\u5904\u7406\u590d\u6742\u53e5\u5b50\u4e2d\u65f6\u95f4\u4e0e\u4e8b\u5b9e\u5bf9\u5e94\u5173\u7cfb\u7684\u5efa\u7acb\u96be\u9898\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u65f6\u95f4\u7ebf\u7684\u53e5\u5b50\u5206\u89e3\u7b56\u7565\uff0c\u5229\u7528\u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u8fdb\u884c\u4e0a\u4e0b\u6587\u5b66\u4e60\uff0c\u4ee5\u5b9e\u73b0\u5bf9\u4e8b\u5b9e\u76f8\u5173\u65f6\u95f4\u7ebf\u7684\u7cbe\u7ec6\u7406\u89e3\u3002\u7136\u800c\uff0c\u76f4\u63a5\u4f7f\u7528LLMs\u8fdb\u884c\u65f6\u95f4\u6027\u4e8b\u5b9e\u62bd\u53d6\u7684\u6027\u80fd\u5e76\u4e0d\u7406\u60f3\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u5f15\u5165\u4e86TSDRE\u65b9\u6cd5\uff0c\u5c06LLMs\u7684\u5206\u89e3\u80fd\u529b\u878d\u5165\u5230\u5c0f\u578b\u9884\u8bad\u7ec3\u8bed\u8a00\u6a21\u578b\uff08PLMs\uff09\u7684\u4f20\u7edf\u5fae\u8c03\u8fc7\u7a0b\u4e2d\u3002 
\u4e3a\u4e86\u652f\u6301\u8bc4\u4f30\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u590d\u6742\u7684\u65f6\u5e8f\u4e8b\u5b9e\u62bd\u53d6\u6570\u636e\u96c6ComplexTRED\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cTSDRE\u5728HyperRED-Temporal\u548cComplexTRED\u6570\u636e\u96c6\u4e0a\u5b9e\u73b0\u4e86\u6700\u5148\u8fdb\u7684\u6027\u80fd\u3002|\n", "2405.10276": "|**2024-05-16**|**Revisiting OPRO: The Limitations of Small-Scale LLMs as Optimizers**|Tuo Zhang et.al.|[2405.10276](http://arxiv.org/abs/2405.10276)|null|\u8fd1\u5e74\u6765\uff0c\u8bb8\u591a\u7814\u7a76\u65e8\u5728\u901a\u8fc7\u7b56\u7565\u6027\u63d0\u793a\u63d0\u5347\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u6548\u80fd\u3002\u7279\u522b\u662f\u4f18\u5316\u901a\u8fc7prompting\uff08OPRO\uff09\u65b9\u6cd5\u8868\u73b0\u51fa\u9876\u5c16\u6027\u80fd\uff0c\u5b83\u5229\u7528LLMs\u4f5c\u4e3a\u4f18\u5316\u5668\uff0c\u76ee\u6807\u662f\u5bfb\u627e\u80fd\u6700\u5927\u5316\u4efb\u52a1\u51c6\u786e\u6027\u7684\u6307\u4ee4\u3002\u672c\u8bba\u6587\u91cd\u65b0\u5ba1\u89c6\u4e86OPRO\u5728\u5c0f\u578bLLMs\uff08\u5982LaMa-2\u7cfb\u5217\u548cMistral 7B\uff09\u4e0a\u7684\u81ea\u52a8\u5316\u63d0\u793a\u6548\u679c\u3002\u6211\u4eec\u7684\u7814\u7a76\u8868\u660e\uff0c\u5bf9\u4e8e\u5c0f\u578bLLMs\uff0cOPRO\u7684\u6548\u679c\u6709\u9650\uff0c\u56e0\u4e3a\u5176\u6709\u9650\u7684\u63a8\u7406\u80fd\u529b\u9650\u5236\u4e86\u4f18\u5316\u6f5c\u529b\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u5efa\u8bae\u672a\u6765\u7684\u81ea\u52a8\u63d0\u793a\u5de5\u7a0b\u5e94\u540c\u65f6\u8003\u8651\u6a21\u578b\u80fd\u529b\u548c\u8ba1\u7b97\u6210\u672c\u3002\u9488\u5bf9\u5c0f\u578bLLMs\uff0c\u6211\u4eec\u63a8\u8350\u76f4\u63a5\u63d0\u4f9b\u660e\u786e\u9610\u8ff0\u76ee\u6807\u548c\u65b9\u6cd5\u7684\u6307\u4ee4\uff0c\u4f5c\u4e3a\u7a33\u5065\u7684\u63d0\u793a\u57fa\u7ebf\uff0c\u4ee5\u786e\u4fdd\u5728\u5f53\u524d\u7814\u7a76\u4e2d\u5b9e\u73b0\u9ad8\u6548\u4e14\u6709\u6548\u7684\u63d0\u793a\u8bbe\u8ba1\u3002|\n", "2405.10260": 
"|**2024-05-16**|**Keep It Private: Unsupervised Privatization of Online Text**|Calvin Bao et.al.|[2405.10260](http://arxiv.org/abs/2405.10260)|**[link](https://github.com/csbao/kip-privatization)**|**## \u80cc\u666f \u4f5c\u8005\u8eab\u4efd\u6df7\u6dc6\u6280\u672f\u6709\u671b\u901a\u8fc7\u81ea\u52a8\u91cd\u5199\u6587\u672c\u6765\u4fdd\u62a4\u7f51\u7edc\u901a\u4fe1\u4e2d\u7684\u4e2a\u4eba\u9690\u79c1\u3002\u7136\u800c\uff0c\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u6587\u732e\u4e2d\uff0c\u8fd9\u4e9b\u6280\u672f\u7684\u8bc4\u4f30\u5927\u591a\u5c40\u9650\u5728\u72ed\u5c0f\u573a\u666f\u4e0b\uff0c\u4e3b\u8981\u4f9d\u8d56\u4e8e\u8868\u9762\u7684\u7f16\u8f91\u64cd\u4f5c\uff0c\u53ef\u80fd\u5bfc\u81f4\u8f93\u51fa\u4e0d\u81ea\u7136\u3002\u672c\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u79cd\u81ea\u52a8\u6587\u672c\u79c1\u5bc6\u5316\u6846\u67b6\uff0c\u901a\u8fc7\u5f3a\u5316\u5b66\u4e60\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u5fae\u8c03\uff0c\u4ee5\u751f\u6210\u517c\u987e\u51c6\u786e\u3001\u8fde\u8d2f\u548c\u9690\u79c1\u7684\u91cd\u5199\u3002\u6211\u4eec\u5728\u5927\u89c4\u6a21\u7684\u82f1\u8bedReddit\u5e16\u5b50\u6d4b\u8bd5\u96c6\u4e0a\u8fdb\u884c\u4e86\u8be6\u5c3d\u7684\u8bc4\u4f30\uff0c\u8be5\u6570\u636e\u96c6\u753168,000\u540d\u4f5c\u8005\u64b0\u5199\uff0c\u5305\u542b\u77ed\u5230\u4e2d\u7b49\u957f\u5ea6\u7684\u6587\u672c\u3002\u6211\u4eec\u63a2\u8ba8\u4e86\u5728\u4e0d\u540c\u8bc4\u4f30\u6761\u4ef6\u4e0b\uff0c\u5982\u4f5c\u8005\u7b80\u4ecb\u957f\u5ea6\u548c\u4f5c\u8005\u8bc6\u522b\u7b56\u7565\uff0c\u6027\u80fd\u7684\u53d8\u5316\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u81ea\u52a8\u5316\u6307\u6807\u548c\u4eba\u5de5\u8bc4\u4f30\u4e2d\u4fdd\u6301\u9ad8\u6587\u672c\u8d28\u91cf\uff0c\u5e76\u6210\u529f\u5730\u89c4\u907f\u4e86\u51e0\u79cd\u81ea\u52a8\u4f5c\u8005\u8bc6\u522b\u653b\u51fb\u3002**|\n", "2405.10255": "|**2024-05-16**|**When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models**|Xianzheng 
Ma et.al.|[2405.10255](http://arxiv.org/abs/2405.10255)|**[link](https://github.com/activevisionlab/awesome-llm-3d)**|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u4e0d\u65ad\u53d1\u5c55\uff0c\u5b83\u4eec\u4e0e\u4e09\u7ef4\u7a7a\u95f4\u6570\u636e\uff083D-LLMs\uff09\u7684\u878d\u5408\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\uff0c\u8fd9\u6781\u5927\u5730\u589e\u5f3a\u4e86\u7406\u89e3\u548c\u4e92\u52a8\u7269\u7406\u73af\u5883\u7684\u80fd\u529b\u3002\u8fd9\u7bc7\u7efc\u8ff0\u8be6\u7ec6\u63a2\u8ba8\u4e86\u4f7fLLMs\u80fd\u591f\u5904\u7406\u3001\u7406\u89e3\u5e76\u751f\u6210\u4e09\u7ef4\u6570\u636e\u7684\u65b9\u6cd5\u8bba\uff0c\u5f3a\u8c03\u4e86LLMs\u7684\u72ec\u7279\u4f18\u52bf\uff0c\u5982\u4e0a\u4e0b\u6587\u5b66\u4e60\u3001\u9010\u6b65\u63a8\u7406\u3001\u5f00\u653e\u8bcd\u6c47\u80fd\u529b\u548c\u4e30\u5bcc\u7684\u4e16\u754c\u77e5\u8bc6\uff0c\u8fd9\u4e9b\u5c06\u6781\u5927\u5730\u63a8\u52a8\u4eba\u5de5\u667a\u80fd\u4f53\u5728\u7a7a\u95f4\u7406\u89e3\u4e0e\u4ea4\u4e92\u65b9\u9762\u7684\u53d1\u5c55\u3002\u7814\u7a76\u8986\u76d6\u4e86\u4ece\u70b9\u4e91\u5230\u795e\u7ecf\u8f90\u5c04\u573a\uff08NeRF\uff09\u7b49\u5404\u79cd\u4e09\u7ef4\u6570\u636e\u8868\u793a\uff0c\u5e76\u8003\u5bdf\u4e86\u5b83\u4eec\u4e0eLLMs\u5728\u4efb\u52a1\u4e2d\u7684\u7ed3\u5408\uff0c\u5982\u4e09\u7ef4\u573a\u666f\u7406\u89e3\u3001\u63cf\u8ff0\u3001\u95ee\u7b54\u548c\u5bf9\u8bdd\uff0c\u4ee5\u53ca\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u8fdb\u884c\u7a7a\u95f4\u63a8\u7406\u3001\u89c4\u5212\u548c\u5bfc\u822a\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u7b80\u8981\u56de\u987e\u4e86\u5176\u4ed6\u7ed3\u5408\u4e09\u7ef4\u548c\u8bed\u8a00\u7684\u65b9\u6cd5\u3002\u672c\u6587\u7684\u5143\u5206\u6790\u663e\u793a\u4e86\u663e\u8457\u7684\u8fdb\u6b65\uff0c\u4f46\u4e5f\u6307\u51fa\u4e86\u6316\u63983D-LLMs\u5168\u90e8\u6f5c\u529b\u6240\u9700\u7684\u521b\u65b0\u65b9\u6cd5\u7684\u5fc5\u8981\u6027\u3002\u56e0\u6b64\uff0c\u672c\u6587\u65e8\u5728\u4e3a\u672a\u6765\u7684\u7814\u7a76\u65b9\u5411\u63d0\u4f9b\u6
307\u5bfc\uff0c\u63a2\u7d22\u548c\u6269\u5c553D-LLMs\u5728\u7406\u89e3\u548c\u4e92\u52a8\u590d\u6742\u4e09\u7ef4\u4e16\u754c\u7684\u80fd\u529b\u3002\u4e3a\u4e86\u652f\u6301\u672c\u8c03\u67e5\uff0c\u6211\u4eec\u5df2\u5728GitHub\u4e0a\u5efa\u7acb\u4e86\u4e00\u4e2a\u9879\u76ee\u9875\u9762\uff0c\u6574\u7406\u5e76\u5217\u51fa\u4e86\u76f8\u5173\u8bba\u6587\uff1ahttps://github.com/ActiveVisionLab/Awesome-LLM-3D\u3002|\n", "2405.10251": "|**2024-05-16**|**A Systematic Evaluation of Large Language Models for Natural Language Generation Tasks**|Xuanfan Ni et.al.|[2405.10251](http://arxiv.org/abs/2405.10251)|null|\u8fd1\u671f\u7684\u7814\u7a76\u5df2\u8bc4\u4f30\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5e38\u8bc6\u63a8\u7406\u3001\u6570\u5b66\u63a8\u7406\u548c\u4ee3\u7801\u751f\u6210\u7b49\u65b9\u9762\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u636e\u6211\u4eec\u6240\u77e5\uff0c\u5c1a\u65e0\u4e13\u95e8\u9488\u5bf9\u81ea\u7136\u8bed\u8a00\u751f\u6210\uff08NLG\uff09\u4efb\u52a1\u7684\u6df1\u5165\u7814\u7a76\uff0c\u8fd9\u662f\u8861\u91cf\u6a21\u578b\u4f18\u79c0\u7a0b\u5ea6\u7684\u5173\u952e\u6807\u51c6\u3002\u56e0\u6b64\uff0c\u672c\u8bba\u6587\u65e8\u5728\u5168\u9762\u8bc4\u4f30\u77e5\u540d\u4e14\u6027\u80fd\u51fa\u8272\u7684LLMs\uff0c\u5305\u62ecChatGPT\u3001ChatGLM\u3001\u57fa\u4e8eT5\u7684\u6a21\u578b\u3001\u57fa\u4e8eLLaMA\u7684\u6a21\u578b\u548cPythia\u6a21\u578b\uff0c\u5728\u5bf9\u8bdd\u751f\u6210\u548c\u6587\u672c\u603b\u7ed3\u7b49NLG\u4efb\u52a1\u4e2d\u7684\u8868\u73b0\u3002\u6211\u4eec\u9009\u62e9\u4e86\u6db5\u76d6\u82f1\u8bed\u548c\u4e2d\u6587\u7684\u6570\u636e\u96c6\uff0c\u5e76\u8bbe\u8ba1\u4e86\u4e00\u79cd\u5171\u540c\u7684\u8bc4\u4f30\u6846\u67b6\uff0c\u5305\u62ec\u8f93\u5165\u6a21\u677f\u548c\u540e\u5904\u7406\u7b56\u7565\u3002\u7814\u7a76\u7ed3\u679c\u62a5\u544a\u4e86\u81ea\u52a8\u8bc4\u5206\uff0c\u540c\u65f6\u8fdb\u884c\u4e86\u8be6\u7ec6\u5206\u6790\u3002|\n", "2405.10250": "|**2024-05-16**|**IntelliExplain: Enhancing Interactive Code 
Generation through Natural Language Explanations for Non-Professional Programmers**|Hao Yan et.al.|[2405.10250](http://arxiv.org/abs/2405.10250)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u6839\u636e\u81ea\u7136\u8bed\u8a00\u63cf\u8ff0\u81ea\u52a8\u751f\u6210\u53ef\u6267\u884c\u4ee3\u7801\u65b9\u9762\u5c55\u73b0\u51fa\u5de8\u5927\u6f5c\u529b\uff0c\u7279\u522b\u662f\u901a\u8fc7\u4e92\u52a8\u529f\u80fd\uff0c\u7528\u6237\u53ef\u4ee5\u901a\u8fc7\u8fed\u4ee3\u53cd\u9988\u6307\u5bfc\u6a21\u578b\u3002\u7136\u800c\uff0c\u5f53\u524d\u7684\u4e92\u52a8\u65b9\u5f0f\u5f80\u5f80\u5047\u8bbe\u7528\u6237\u5177\u5907\u8c03\u8bd5\u6e90\u4ee3\u7801\u7684\u4e13\u4e1a\u77e5\u8bc6\uff0c\u5bf9\u975e\u4e13\u4e1a\u7a0b\u5e8f\u5458\u4e0d\u592a\u53cb\u597d\u3002\u8fd9\u4f7f\u5f97\u4f7f\u4e92\u52a8\u4ee3\u7801\u751f\u6210\u5bf9\u4e0d\u540c\u7f16\u7a0b\u6c34\u5e73\u7684\u4e2a\u4f53\u66f4\u6613\u4e8e\u4f7f\u7528\u6210\u4e3a\u4e00\u4e2a\u6311\u6218\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86IntelliExplain\uff0c\u8fd9\u662f\u4e00\u79cd\u521b\u65b0\u7684\u4eba\u673a\u4ea4\u4e92\u8303\u5f0f\uff0c\u901a\u8fc7\u8ba9\u7528\u6237\u901a\u8fc7\u81ea\u7136\u8bed\u8a00\u89e3\u91ca\u4e0e\u6e90\u4ee3\u7801\u4e92\u52a8\uff0c\u63d0\u5347\u975e\u4e13\u4e1a\u4eba\u58eb\u7684\u4f53\u9a8c\u3002\u7528\u6237\u901a\u8fc7\u63d0\u4f9b\u4ed6\u4eec\u53d1\u73b0\u9519\u8bef\u7684\u81ea\u7136\u8bed\u8a00\u7ea0\u6b63\u53cd\u9988\uff0c\u6765\u6307\u5bfc\u7cfb\u7edf\u4fee\u8ba2\u4ee3\u7801\uff0c\u76f4\u5230\u7528\u6237\u5bf9\u7cfb\u7edf\u7684\u4ee3\u7801\u89e3\u91ca\u611f\u5230\u6ee1\u610f\u3002\u6211\u4eec\u7684\u7528\u6237\u7814\u7a76\u663e\u793a\uff0c\u4f7f\u7528IntelliExplain\u7684\u7528\u6237\u5728Text-to-SQL\u548cPython\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u4e2d\u7684\u6210\u529f\u7387\u5206\u522b\u6bd4\u7eafGPT-3.5\u63d0\u9ad8\u4e8611.6%\u548c25.3%\uff0c\u540c\u65f6\u6240\u9700\u65f6\u95f4\u5206\u522b\u51cf\u5c11\u4e8639.0%\u548c15.6%\u3002|\n", "2405.10212": 
"|**2024-05-16**|**CPsyExam: A Chinese Benchmark for Evaluating Psychology using Examinations**|Jiahao Zhao et.al.|[2405.10212](http://arxiv.org/abs/2405.10212)|**[link](https://github.com/CAS-SIAT-XinHai/CPsyExam)**|\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u5fc3\u7406\u5b66\u57fa\u51c6\u6d4b\u8bd5\u2014\u2014CPsyExam\uff0c\u5b83\u6e90\u4e8e\u4e2d\u56fd\u8bed\u8a00\u8003\u8bd5\u7684\u95ee\u9898\u3002CPsyExam\u65e8\u5728\u5206\u522b\u5f3a\u8c03\u5fc3\u7406\u5b66\u77e5\u8bc6\u548c\u6848\u4f8b\u5206\u6790\u7684\u91cd\u8981\u6027\uff0c\u8ba4\u8bc6\u5230\u5c06\u5fc3\u7406\u5b66\u77e5\u8bc6\u5e94\u7528\u4e8e\u5b9e\u9645\u60c5\u5883\u7684\u4ef7\u503c\u3002\u4ece22,000\u4e2a\u95ee\u9898\u5e93\u4e2d\uff0c\u6211\u4eec\u7cbe\u9009\u4e864,000\u4e2a\u6765\u6784\u5efa\u8be5\u57fa\u51c6\uff0c\u786e\u4fdd\u4e86\u4e3b\u9898\u7684\u5747\u8861\u8986\u76d6\uff0c\u5e76\u5305\u542b\u4e86\u5404\u79cd\u6848\u4f8b\u5206\u6790\u65b9\u6cd5\u7684\u591a\u6837\u6027\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5bf9\u4e00\u7cfb\u5217\u73b0\u6709\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u5305\u62ec\u5f00\u6e90\u548cAPI\u57fa\u7840\u7684\u6a21\u578b\u3002\u5b9e\u9a8c\u548c\u5206\u6790\u7ed3\u679c\u663e\u793a\uff0cCPsyExam\u662f\u4e00\u4e2a\u6709\u6548\u7684\u786e\u7acb\u8bed\u8a00\u6a21\u578b\u5bf9\u5fc3\u7406\u5b66\u7406\u89e3\u80fd\u529b\u7684\u57fa\u51c6\uff0c\u540c\u65f6\u652f\u6301\u5728\u4e0d\u540c\u7c92\u5ea6\u4e0a\u6bd4\u8f83\u8fd9\u4e9b\u6a21\u578b\u3002|\n", "2405.10936": "|**2024-05-17**|**A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers**|Kaiyu Huang 
et.al.|[2405.10936](http://arxiv.org/abs/2405.10936)|**[link](https://github.com/kaiyuhwang/mllm-survey)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5feb\u901f\u53d1\u5c55\uff0c\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u9886\u57df\u5c55\u73b0\u51fa\u663e\u8457\u7684\u591a\u8bed\u8a00\u80fd\u529b\uff0c\u5f15\u8d77\u4e86\u5b66\u672f\u754c\u548c\u4e1a\u754c\u7684\u5e7f\u6cdb\u5173\u6ce8\u3002\u4e3a\u4e86\u51cf\u5c11\u6f5c\u5728\u7684\u6b67\u89c6\u5e76\u63d0\u5347\u6280\u672f\u7684\u901a\u7528\u6027\u548c\u53ef\u8bbf\u95ee\u6027\uff0c\u5bf9\u4e8e\u591a\u8bed\u8a00\u6280\u672f\u7684\u53d1\u5c55\u81f3\u5173\u91cd\u8981\u3002\u5c3d\u7ba1LLMs\u53d6\u5f97\u4e86\u7a81\u7834\uff0c\u4f46\u5bf9\u591a\u8bed\u8a00\u573a\u666f\u7684\u6df1\u5165\u7814\u7a76\u4ecd\u663e\u4e0d\u8db3\u3002\u56e0\u6b64\uff0c\u8feb\u5207\u9700\u8981\u4e00\u4efd\u5168\u9762\u7684\u7efc\u8ff0\uff0c\u603b\u7ed3\u8fd1\u671f\u7684\u65b9\u6cd5\u3001\u8fdb\u5c55\u3001\u5c40\u9650\u6027\u548c\u53ef\u80fd\u7684\u89e3\u51b3\u65b9\u6848\u3002\u672c\u6587\u65e8\u5728\u4ece\u591a\u4e2a\u89d2\u5ea6\u5ba1\u89c6LLMs\u5728\u591a\u8bed\u8a00\u73af\u5883\u4e2d\u7684\u5e94\u7528\u3002\u6211\u4eec\u9996\u5148\u56de\u987e\u4e86\u9884\u8bad\u7ec3\u8bed\u8a00\u6a21\u578b\u7814\u7a76\u7684\u5386\u53f2\u6f14\u53d8\u3002\u63a5\u7740\uff0c\u6211\u4eec\u63a2\u8ba8\u4e86LLMs\u7684\u591a\u8bed\u8a00\u7279\u6027\uff0c\u5305\u62ec\u8bad\u7ec3\u548c\u63a8\u7406\u65b9\u6cd5\u3001\u6a21\u578b\u5b89\u5168\u3001\u8de8\u9886\u57df\u4e0e\u6587\u5316\u9002\u5e94\u4ee5\u53ca\u6570\u636e\u96c6\u4f7f\u7528\u3002\u6211\u4eec\u8fd8\u5206\u6790\u4e86\u8fd9\u4e9b\u65b9\u9762\u9762\u4e34\u7684\u6311\u6218\uff0c\u5e76\u63d0\u51fa\u53ef\u80fd\u7684\u89e3\u51b3\u7b56\u7565\u3002\u6b64\u5916\uff0c\u6211\u4eec\u6307\u51fa\u4e86\u672a\u6765\u7684\u7814\u7a76\u65b9\u5411\uff0c\u4ee5\u8fdb\u4e00\u6b65\u63d0\u5347LLMs\u7684\u591a\u8bed\u8a00\u6027\u80fd\u3002\u672c\u7efc\u8ff0\u65e8\u5728\u5e2e\u52a9\u7814\u7a76\u754c\u5e94\u5bf
9\u591a\u8bed\u8a00\u95ee\u9898\uff0c\u63d0\u4f9b\u4e00\u4e2a\u5173\u4e8e\u57fa\u4e8eLLMs\u7684\u591a\u8bed\u8a00\u81ea\u7136\u8bed\u8a00\u5904\u7406\u6838\u5fc3\u6982\u5ff5\u3001\u5173\u952e\u6280\u672f\u53ca\u6700\u65b0\u8fdb\u5c55\u7684\u5168\u9762\u7406\u89e3\u3002**|\n", "2405.10928": "|**2024-05-17**|**The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks**|Lucius Bushnaq et.al.|[2405.10928](http://arxiv.org/abs/2405.10928)|**[link](https://github.com/apolloresearch/rib)**|### \u6982\u8ff0 \u673a\u68b0\u89e3\u91ca\u6027\u76ee\u6807\u662f\u901a\u8fc7\u9006\u5411\u5de5\u7a0b\u7406\u89e3\u795e\u7ecf\u7f51\u7edc\u7684\u884c\u4e3a\u3002\u7136\u800c\uff0c\u73b0\u6709\u65b9\u6cd5\u5728\u89e3\u6790\u795e\u7ecf\u7f51\u7edc\u6fc0\u6d3b\u65b9\u9762\u9762\u4e34\u6311\u6218\uff0c\u56e0\u4e3a\u7f3a\u4e4f\u5bf9\u6fc0\u6d3b\u7684\u5206\u89e3\uff0c\u4f7f\u5f97\u5355\u4e2a\u795e\u7ecf\u5143\u6216\u6a21\u578b\u7ec4\u4ef6\u65e0\u6cd5\u6e05\u6670\u5bf9\u5e94\u4e8e\u72ec\u7279\u7684\u7279\u5f81\u6216\u529f\u80fd\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u53ef\u89e3\u91ca\u6027\u65b9\u6cd5\u2014\u2014\u5c40\u90e8\u4ea4\u4e92\u57fa\uff08Local Interaction Basis\uff0cLIB\uff09\u3002LIB\u65e8\u5728\u901a\u8fc7\u6d88\u9664\u65e0\u5173\u6fc0\u6d3b\u548c\u4ea4\u4e92\uff0c\u8bc6\u522b\u8ba1\u7b97\u7279\u5f81\u3002\u8be5\u65b9\u6cd5\u6452\u5f03\u65e0\u610f\u4e49\u7684\u6fc0\u6d3b\u65b9\u5411\uff0c\u5e76\u4f7f\u57fa\u7840\u4e0e\u76f8\u90bb\u5c42\u95f4\u96c5\u53ef\u6bd4\u77e9\u9635\u7684\u5947\u5f02\u5411\u91cf\u5bf9\u9f50\u3002\u540c\u65f6\uff0c\u5b83\u6839\u636e\u7279\u5f81\u5bf9\u540e\u7eed\u8ba1\u7b97\u7684\u91cd\u8981\u6027\u8fdb\u884c\u7f29\u653e\uff0c\u751f\u6210\u4e00\u4e2a\u663e\u793a\u6a21\u578b\u4e2d\u6240\u6709\u8ba1\u7b97\u76f8\u5173\u7279\u6027\u548c\u4ea4\u4e92\u7684\u56fe\u8c31\u3002 
\u6211\u4eec\u5728\u6a21\u5757\u52a0\u6cd5\u548cCIFAR-10\u6a21\u578b\u4e0a\u8bc4\u4f30\u4e86LIB\u7684\u6709\u6548\u6027\uff0c\u7ed3\u679c\u8868\u660e\uff0c\u76f8\u6bd4\u4e8e\u4e3b\u6210\u5206\u5206\u6790\uff0cLIB\u80fd\u8bc6\u522b\u51fa\u66f4\u591a\u8ba1\u7b97\u76f8\u5173\u7684\u7279\u5f81\uff0c\u5e76\u5448\u73b0\u51fa\u66f4\u7a00\u758f\u7684\u4ea4\u4e92\u3002\u7136\u800c\uff0c\u5728\u5e94\u7528\u4e8e\u8bed\u8a00\u6a21\u578b\u65f6\uff0cLIB\u5e76\u672a\u663e\u8457\u63d0\u9ad8\u53ef\u89e3\u91ca\u6027\u6216\u4ea4\u4e92\u7a00\u758f\u5ea6\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u5f97\u51fa\u7ed3\u8bba\uff0c\u5c3d\u7ba1LIB\u662f\u4e00\u79cd\u6709\u524d\u666f\u7684\u7406\u8bba\u9a71\u52a8\u65b9\u6cd5\uff0c\u4f46\u5f53\u524d\u5f62\u5f0f\u5e76\u4e0d\u9002\u7528\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u3002|\n", "2405.10893": "|**2024-05-17**|**COGNET-MD, an evaluation framework and dataset for Large Language Model benchmarks in the medical domain**|Dimitrios P. Panagoulias et.al.|[2405.10893](http://arxiv.org/abs/2405.10893)|null|\u8fd9\u7bc7\u6280\u672f\u8bba\u6587\u9610\u8ff0\u4e86COGNET-MD\uff0c\u4e00\u4e2a\u4e13\u4e3a\u533b\u7597\u9886\u57df\u8bbe\u8ba1\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8bc4\u4f30\u7684\u65b0\u57fa\u51c6\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u8bc4\u5206\u6846\u67b6\uff0c\u65e8\u5728\u8bc4\u4f30\u8bed\u8a00\u6a21\u578b\u7406\u89e3\u533b\u5b66\u6587\u672c\u7684\u80fd\u529b\uff0c\u5e76\u4e14\u8bbe\u8ba1\u4e86\u4e00\u7cfb\u5217\u96be\u5ea6\u5206\u7ea7\u7684\u591a\u9879\u9009\u62e9\u9898\uff08MCQ\uff09\u6570\u636e\u5e93\u3002\u8fd9\u4e2a\u6570\u636e\u5e93\u7531\u591a\u4e2a\u533b\u7597\u9886\u57df\u7684\u4e13\u5bb6\u5408\u4f5c\u521b\u5efa\uff0c\u4ee5\u53cd\u6620\u5f53\u524d\u533b\u5b66\u8d8b\u52bf\uff0c\u786e\u4fdd\u5b89\u5168\u3001\u5b9e\u7528\u548c\u9002\u7528\u6027\u3002\u521d\u671f\u7248\u672c\u5305\u542b\u4e86\u7cbe\u795e\u79d1\u3001\u7259\u79d1\u3001\u80ba\u75c5\u5b66\u3001\u76ae\u80a4\u79d1\u548c\u5185\u5206\u6ccc\u5b66\u7b49\u9886\u57d
f\u7684\u9898\u76ee\uff0c\u4f46\u4f1a\u6301\u7eed\u6269\u5c55\uff0c\u672a\u6765\u8fd8\u4f1a\u52a0\u5165\u66f4\u591a\u533b\u5b66\u5b66\u79d1\u3002|\n", "2405.10883": "|**2024-05-17**|**Application of Artificial Intelligence in Schizophrenia Rehabilitation Management: Systematic Literature Review**|Hongyi Yang et.al.|[2405.10883](http://arxiv.org/abs/2405.10883)|null|\u8be5\u7efc\u8ff0\u65e8\u5728\u7cfb\u7edf\u5730\u8bc4\u4f30\u4eba\u5de5\u667a\u80fd\uff08AI\uff09\u5728\u7cbe\u795e\u5206\u88c2\u75c7\u60a3\u8005\u5eb7\u590d\u7ba1\u7406\u4e2d\u7684\u73b0\u72b6\u548c\u524d\u666f\uff0c\u4ee5\u53ca\u5176\u5bf9\u5eb7\u590d\u8fc7\u7a0b\u7684\u5f71\u54cd\u3002\u6211\u4eec\u4ece2012\u5e74\u81f3\u73b0\u5728\u7b5b\u9009\u4e8670\u9879\u7814\u7a76\uff0c\u91cd\u70b9\u5173\u6ce8\u673a\u5668\u5b66\u4e60\u3001\u6df1\u5ea6\u5b66\u4e60\u3001\u5f3a\u5316\u5b66\u4e60\u7b49\u6280\u672f\u5728\u5fc3\u7406\u5065\u5eb7\u5e72\u9884\u548c\u7ba1\u7406\u4e2d\u7684\u5e94\u7528\u3001\u6280\u672f\u7c7b\u522b\u3001\u4ea7\u54c1\u548c\u6570\u636e\u7c7b\u578b\uff0c\u5982\u751f\u6001\u77ac\u65f6\u8bc4\u4f30\u3001\u884c\u4e3a\u548c\u8bed\u97f3\u6570\u636e\u7684\u5206\u6790\u3002\u7ed3\u679c\u663e\u793a\uff0cAI\u5728\u75c7\u72b6\u76d1\u6d4b\u3001\u590d\u53d1\u98ce\u9669\u9884\u6d4b\u548c\u5eb7\u590d\u6cbb\u7597\u4e2d\u5177\u6709\u5e7f\u6cdb\u7684\u5e94\u7528\u6f5c\u529b\u3002\u6b64\u5916\uff0c\u672c\u7814\u7a76\u8fd8\u63a2\u8ba8\u4e86\u57fa\u4e8eAI\u7684\u65b0\u5174\u4ea7\u54c1\u3001\u6280\u672f\u548c\u5206\u6790\u65b9\u6cd5\uff0c\u5982\u793e\u4ea4\u5a92\u4f53\u5206\u6790\u3001\u4e25\u8083\u6e38\u620f\u548c\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u5eb7\u590d\u4e2d\u7684\u6f5c\u5728\u6311\u6218\u548c\u672a\u6765\u53d1\u5c55\u65b9\u5411\u3002\u603b\u7684\u6765\u8bf4\uff0c\u8fd9\u7bc7\u8bba\u6587\u7cfb\u7edf\u56de\u987e\u4e86AI\u5728\u7cbe\u795e\u5206\u88c2\u75c7\u5eb7\u590d\u7ba1\u7406\u4e2d\u7684\u5e94\u7528\uff0c\u5e76\u4e3a\u672a\u6765\u7684\u7814\u7a76\u8def\u5f84\u63d0\u4f9b\u4e86\u6709\u4ef7\u503c\u76
84\u89c1\u89e3\u548c\u5efa\u8bae\u3002|\n", "2405.10853": "|**2024-05-17**|**The Future of Large Language Model Pre-training is Federated**|Lorenzo Sani et.al.|[2405.10853](http://arxiv.org/abs/2405.10853)|null|## \u80cc\u666f \u751f\u6210\u5f0f\u9884\u8bad\u7ec3\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u56e0\u5176\u5728\u4f17\u591a\u4efb\u52a1\u4e0a\u7684\u51fa\u8272\u8868\u73b0\u800c\u5907\u53d7\u77a9\u76ee\uff0c\u8fd9\u5f97\u76ca\u4e8e\u5b83\u4eec\u6240\u63a5\u53d7\u7684\u6d77\u91cf\u8bad\u7ec3\u6570\u636e\u3002\u6839\u636e\u5df2\u5efa\u7acb\u7684\u89c4\u6a21\u6cd5\u5219\uff0cLLMs\u672a\u6765\u6027\u80fd\u7684\u63d0\u5347\u5728\u5f88\u5927\u7a0b\u5ea6\u4e0a\u4f9d\u8d56\u4e8e\u6211\u4eec\u80fd\u591f\u5229\u7528\u7684\u8ba1\u7b97\u548c\u6570\u636e\u8d44\u6e90\u3002\u8054\u90a6\u5b66\u4e60\uff08FL\uff09\u6709\u53ef\u80fd\u91ca\u653e\u5168\u7403\u5927\u90e8\u5206\u672a\u5145\u5206\u5229\u7528\u7684\u6570\u636e\u548c\u8ba1\u7b97\u80fd\u529b\uff0c\u8fd9\u4e9b\u662f\u5f53\u524d\u4ee5\u6570\u636e\u4e2d\u5fc3\u4e3a\u4e2d\u5fc3\u7684LLM\u8bad\u7ec3\u65b9\u6cd5\u6240\u5ffd\u89c6\u7684\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u7a33\u5065\u3001\u7075\u6d3b\u4e14\u53ef\u590d\u73b0\u7684FL\u65b9\u6cd5\uff0c\u65e8\u5728\u4fc3\u8fdb\u673a\u6784\u95f4\u7684\u5927\u89c4\u6a21\u534f\u4f5c\uff0c\u5171\u540c\u8bad\u7ec3LLMs\uff0c\u4ece\u800c\u52a8\u5458\u66f4\u591a\u7684\u8ba1\u7b97\u548c\u6570\u636e\u8d44\u6e90\uff0c\u751a\u81f3\u53ef\u80fd\u8fbe\u5230\u6216\u8d85\u8d8a\u4e2d\u5fc3\u5316\u7684\u6027\u80fd\u3002 ## \u4efb\u52a1 
\u6211\u4eec\u7684\u5de5\u4f5c\u5c55\u793a\u4e86\u4e00\u79cdFL\u8bad\u7ec3\u65b9\u6cd5\uff0c\u5b83\u80fd\u591f\u5728\u6709\u9650\u8d44\u6e90\u4e0b\u6269\u5c55\u5230\u767e\u4ebf\u5143\u7ea7\u7684\u8054\u90a6LLM\uff0c\u4f7f\u5f97\u62e5\u6709\u4e30\u5bcc\u6570\u636e\u7684\u5b9e\u4f53\u80fd\u591f\u6210\u4e3a\u9884\u8bad\u7ec3LLMs\u7684\u4e3b\u5bfc\u529b\u91cf\uff0c\u800c\u4e0d\u662f\u4ec5\u8ba9\u8ba1\u7b97\u8d44\u6e90\u4e30\u5bcc\u7684\u673a\u6784\u72ec\u5360\u9ccc\u5934\u3002\u8fd9\u79cd\u65b9\u6cd5\u5f3a\u8c03\u4e86\u8054\u90a6\u8bad\u7ec3\u7684\u89c4\u6a21\u6548\u76ca\uff0c\u5e76\u4e3a\u5b9e\u73b0\u8fd9\u4e00\u76ee\u6807\u63d0\u4f9b\u4e86\u4e00\u79cd\u5b9e\u7528\u8def\u5f84\u3002|\n", "2405.10825": "|**2024-05-17**|**Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities**|Hao Zhou et.al.|[2405.10825](http://arxiv.org/abs/2405.10825)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u56e0\u5176\u5353\u8d8a\u7684\u7406\u89e3\u548c\u63a8\u7406\u80fd\u529b\u800c\u5907\u53d7\u77a9\u76ee\uff0c\u5b83\u4eec\u5728\u5404\u4e2a\u9886\u57df\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\uff0c\u5c24\u5176\u5728\u7b2c\u516d\u4ee3\uff086G\uff09\u901a\u4fe1\u6280\u672f\u7684\u63a8\u52a8\u4e0b\u5c55\u73b0\u51fa\u4eba\u5de5\u667a\u80fd\u901a\u7528\u6027\uff08AGI\uff09\u7684\u6f5c\u529b\u3002\u672c\u7814\u7a76\u65e8\u5728\u5168\u9762\u6982\u8ff0LLM\u8d4b\u80fd\u7684\u7535\u4fe1\u7f51\u7edc\u3002\u9996\u5148\uff0c\u6211\u4eec\u6982\u8ff0\u4e86LLMs\u7684\u57fa\u7840\uff0c\u5305\u62ec\u6a21\u578b\u67b6\u6784\u3001\u9884\u8bad\u7ec3\u3001\u5fae\u8c03\u3001\u63a8\u7406\u4e0e\u5e94\u7528\u3001\u6a21\u578b\u8bc4\u4f30\uff0c\u4ee5\u53ca\u5728\u7535\u4fe1\u90e8\u7f72\u4e2d\u7684\u8fd0\u7528\u3002\u63a5\u7740\uff0c\u6211\u4eec\u5c06\u63a2\u8ba8LLM\u652f\u6301\u7684\u5173\u952e\u6280\u672f\u548c\u7535\u4fe1\u5e94\u7528\uff0c\u6d89\u53ca\u751f\u6210\u3001\u5206\u7c7b\u3001\u4f18\u5316\u548c\u9884\u6d4b\u95ee\u9898\
u3002\u751f\u6210\u5e94\u7528\u5305\u62ec\u7535\u4fe1\u9886\u57df\u77e5\u8bc6\u3001\u4ee3\u7801\u548c\u7f51\u7edc\u914d\u7f6e\u81ea\u52a8\u751f\u6210\u3002\u57fa\u4e8eLLM\u7684\u5206\u7c7b\u4efb\u52a1\u6db5\u76d6\u7f51\u7edc\u5b89\u5168\u3001\u6587\u672c\u3001\u56fe\u50cf\u548c\u6d41\u91cf\u5206\u7c7b\u3002\u6b64\u5916\uff0c\u6211\u4eec\u4ecb\u7ecd\u4e86\u5229\u7528LLMs\u7684\u81ea\u52a8\u5316\u4f18\u5316\u6280\u672f\uff0c\u5982\u5f3a\u5316\u5b66\u4e60\u7684\u5956\u52b1\u51fd\u6570\u8bbe\u8ba1\u548c\u53e3\u8bed\u5f3a\u5316\u5b66\u4e60\u3002\u5bf9\u4e8e\u9884\u6d4b\u95ee\u9898\uff0cLLMs\u53ef\u7528\u4e8e\u65f6\u95f4\u5e8f\u5217\u9884\u6d4b\u548c\u591a\u6a21\u6001\u7535\u4fe1\u9884\u6d4b\u3002\u6700\u540e\uff0c\u6211\u4eec\u6307\u51fa\u4e86LLM\u8d4b\u80fd\u7535\u4fe1\u7f51\u7edc\u6240\u9762\u4e34\u7684\u6311\u6218\uff0c\u5e76\u5c55\u671b\u4e86\u672a\u6765\u7684\u7814\u7a76\u65b9\u5411\u3002|\n", "2405.10808": "|**2024-05-17**|**ActiveLLM: Large Language Model-based Active Learning for Textual Few-Shot Scenarios**|Markus Bayer et.al.|[2405.10808](http://arxiv.org/abs/2405.10808)|null|\u4e3b\u52a8\u5b66\u4e60\u65e8\u5728\u901a\u8fc7\u4f18\u5148\u5904\u7406\u6700\u80fd\u63d0\u5347\u5b66\u4e60\u6548\u679c\u7684\u5b9e\u4f8b\u6765\u51cf\u5c11\u6807\u6ce8\u5de5\u4f5c\u91cf\u3002\u7136\u800c\uff0c\u8bb8\u591a\u4e3b\u52a8\u5b66\u4e60\u7b56\u7565\u9762\u4e34\u201c\u51b7\u542f\u52a8\u201d\u95ee\u9898\uff0c\u5373\u5728\u521d\u671f\u9700\u8981\u5927\u91cf\u6570\u636e\u624d\u80fd\u53d1\u6325\u6548\u80fd\uff0c\u8fd9\u9650\u5236\u4e86\u5b83\u4eec\u5728\u9884\u8bad\u7ec3\u6a21\u578b\uff08\u5982BERT\uff09\u4e0a\u7684\u5e94\u7528\uff0c\u8fd9\u4e9b\u6a21\u578b\u5728\u5c11\u91cf\u6837\u672c\u60c5\u51b5\u4e0b\u5df2\u8868\u73b0\u826f\u597d\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u4e3b\u52a8\u5b66\u4e60\u65b9\u6cd5\u2014\u2014ActiveLLM\uff0c\u5b83\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08\u5982GPT-4\u3001Llama 3\u548cMistral 
Large\uff09\u8fdb\u884c\u5b9e\u4f8b\u9009\u62e9\u3002\u5b9e\u9a8c\u8bc1\u660e\uff0cActiveLLM\u663e\u8457\u63d0\u9ad8\u4e86BERT\u5206\u7c7b\u5668\u5728\u5c11\u91cf\u6837\u672c\u60c5\u51b5\u4e0b\u7684\u6027\u80fd\uff0c\u8d85\u8d8a\u4e86\u4f20\u7edf\u4e3b\u52a8\u5b66\u4e60\u65b9\u6cd5\u548cSetFit\u7b49\u5c11\u6570\u6837\u672c\u5b66\u4e60\u65b9\u6cd5\u3002\u6b64\u5916\uff0cActiveLLM\u8fd8\u80fd\u6269\u5c55\u5230\u975e\u5c11\u91cf\u6837\u672c\u573a\u666f\uff0c\u652f\u6301\u8fed\u4ee3\u9009\u62e9\uff0c\u4ece\u800c\u5e2e\u52a9\u5176\u4ed6\u4e3b\u52a8\u5b66\u4e60\u7b56\u7565\u514b\u670d\u51b7\u542f\u52a8\u96be\u9898\u3002\u7ed3\u679c\u8868\u660e\uff0cActiveLLM\u4e3a\u6539\u5584\u4e0d\u540c\u5b66\u4e60\u73af\u5883\u4e2d\u7684\u6a21\u578b\u6027\u80fd\u63d0\u4f9b\u4e86\u6709\u524d\u666f\u7684\u89e3\u51b3\u65b9\u6848\u3002|\n", "2405.10745": "|**2024-05-17**|**Empowering Small-Scale Knowledge Graphs: A Strategy of Leveraging General-Purpose Knowledge Graphs for Enriched Embeddings**|Albert Sawczyn et.al.|[2405.10745](http://arxiv.org/abs/2405.10745)|null|### \u7ffb\u8bd1 
\u77e5\u8bc6\u5bc6\u96c6\u578b\u4efb\u52a1\u5bf9\u673a\u5668\u5b66\u4e60\uff08ML\uff09\u6280\u672f\u63d0\u51fa\u4e86\u4e25\u5cfb\u6311\u6218\u3002\u901a\u5e38\u91c7\u7528\u7684\u65b9\u6cd5\uff0c\u5982\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\uff0c\u5728\u5904\u7406\u8fd9\u7c7b\u4efb\u52a1\u65f6\u5f80\u5f80\u5b58\u5728\u5c40\u9650\u6027\u3002\u7136\u800c\uff0c\u4eba\u4eec\u5df2\u7ecf\u52aa\u529b\u901a\u8fc7\u77e5\u8bc6\u56fe\u8c31\uff08KG\uff09\u6765\u5f25\u8865\u8fd9\u4e9b\u4e0d\u8db3\uff0c\u5c24\u5176\u662f\u901a\u8fc7\u5c06\u5c0f\u89c4\u6a21\u7684\u9886\u57df\u7279\u5b9aKG\u4e0e\u901a\u7528KG\u76f8\u7ed3\u5408\u3002\u5c3d\u7ba1KG\u5728\u77e5\u8bc6\u8868\u793a\u65b9\u9762\u5177\u6709\u4f18\u52bf\uff0c\u4f46\u6784\u5efa\u5b83\u4eec\u7684\u6210\u672c\u53ef\u80fd\u963b\u788d\u4e86\u5e7f\u6cdb\u7684\u7814\u7a76\u548c\u5e94\u7528\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u6846\u67b6\uff0c\u65e8\u5728\u901a\u8fc7\u94fe\u63a5\u5230\u5927\u89c4\u6a21\u901a\u7528KG\u6765\u63d0\u5347\u5c0f\u578b\u9886\u57df\u7279\u5b9aKG\u5d4c\u5165\u7684\u5b66\u4e60\u6027\u80fd\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u8fd9\u79cd\u65b9\u6cd5\u5e26\u6765\u4e86\u663e\u8457\u7684\u63d0\u5347\uff0c\u4f8b\u5982\uff0cHits@10\u6307\u6807\u6700\u9ad8\u63d0\u9ad8\u4e8644%\u3002\u8fd9\u4e00\u76f8\u5bf9\u672a\u88ab\u5145\u5206\u63a2\u7d22\u7684\u7814\u7a76\u65b9\u5411\u6709\u671b\u4fc3\u8fdbKG\u5728\u77e5\u8bc6\u5bc6\u96c6\u578b\u4efb\u52a1\u4e2d\u7684\u66f4\u9891\u7e41\u8fd0\u7528\uff0c\u4ece\u800c\u4ea7\u751f\u66f4\u4e3a\u7a33\u5065\u3001\u53ef\u9760\u7684ML\u89e3\u51b3\u65b9\u6848\uff0c\u5b83\u4eec\u76f8\u8f83\u4e8e\u6d41\u884c\u4f46\u6613\u51fa\u9519\u7684LLM\u65b9\u6cd5\u66f4\u5177\u53ef\u9760\u6027\u3002\u5173\u952e\u8bcd\uff1a\u77e5\u8bc6\u56fe\u8c31\u3001\u77e5\u8bc6\u56fe\u8c31\u8865\u5168\u3001\u5b9e\u4f53\u5bf9\u9f50\u3001\u8868\u793a\u5b66\u4e60\u3001\u673a\u5668\u5b66\u4e60|\n", "2405.10739": "|**2024-05-17**|**Efficient Multimodal Large Language 
Models: A Survey**|Yizhang Jin et.al.|[2405.10739](http://arxiv.org/abs/2405.10739)|**[link](https://github.com/lijiannuist/efficient-multimodal-llms-survey)**|**\u5728\u8fc7\u53bb\u4e00\u5e74\u91cc\uff0c\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08Multimodal Large Language Models\uff0cMLLMs\uff09\u5728\u8bf8\u5982\u89c6\u89c9\u95ee\u7b54\u3001\u89c6\u89c9\u7406\u89e3\u548c\u63a8\u7406\u7b49\u4efb\u52a1\u4e0a\u5c55\u73b0\u51fa\u5353\u8d8a\u6027\u80fd\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u6a21\u578b\u7684\u5e9e\u5927\u89c4\u6a21\u548c\u9ad8\u6602\u7684\u8bad\u7ec3\u4e0e\u63a8\u7406\u6210\u672c\u9650\u5236\u4e86\u5b83\u4eec\u5728\u5b66\u672f\u754c\u548c\u5de5\u4e1a\u754c\u7684\u5e7f\u6cdb\u5e94\u7528\u3002\u56e0\u6b64\uff0c\u7814\u7a76\u9ad8\u6548\u4e14\u8f7b\u91cf\u7ea7\u7684MLLM\u5177\u6709\u5de8\u5927\u7684\u6f5c\u529b\uff0c\u7279\u522b\u662f\u5728\u8fb9\u7f18\u8ba1\u7b97\u73af\u5883\u4e2d\u3002\u672c\u7efc\u8ff0\u5168\u9762\u7cfb\u7edf\u5730\u56de\u987e\u4e86\u5f53\u524d\u9ad8\u6548MLLM\u7684\u7814\u7a76\u73b0\u72b6\u3002\u6211\u4eec\u6982\u8ff0\u4e86\u4ee3\u8868\u6027\u9ad8\u6548\u6a21\u578b\u7684\u53d1\u5c55\u5386\u7a0b\uff0c\u603b\u7ed3\u4e86\u6709\u6548\u7ed3\u6784\u548c\u7b56\u7565\u7684\u7814\u7a76\u72b6\u6001\uff0c\u4ee5\u53ca\u5176\u5b9e\u7528\u5e94\u7528\u3002\u6700\u540e\uff0c\u6211\u4eec\u8ba8\u8bba\u4e86\u5f53\u524d\u9ad8\u6548MLLM\u7814\u7a76\u7684\u5c40\u9650\uff0c\u5e76\u5c55\u671b\u4e86\u6709\u524d\u666f\u7684\u672a\u6765\u53d1\u5c55\u65b9\u5411\u3002\u5982\u9700\u66f4\u591a\u4fe1\u606f\uff0c\u8bf7\u53c2\u8003\u6211\u4eec\u7684GitHub\u4ed3\u5e93\uff1ahttps://github.com/lijiannuist/Efficient-Multimodal-LLMs-Survey\u3002**|\n", "2405.10725": "|**2024-05-17**|**INDUS: Effective and Efficient Language Models for Scientific Applications**|Bishwaranjan Bhattacharjee 
et.al.|[2405.10725](http://arxiv.org/abs/2405.10725)|null|Large general-purpose language models perform well on natural language processing tasks. However, prior work has shown that training on domain-specific data lets models perform better on specialized tasks. To this end, we developed INDUS, a suite of language models tailored to Earth science, biology, physics, heliophysics, planetary science, and astronomy. The models are built on carefully curated scientific corpora and include: (1) an encoder trained with a domain-specific vocabulary and datasets to improve natural language understanding; (2) a contrastive-learning-based general text embedding model trained on multi-source datasets for information retrieval; and (3) smaller models produced by knowledge distillation for applications with latency or resource constraints. We also created three new scientific benchmark datasets, CLIMATE-CHANGE-NER (entity recognition), NASA-QA (extractive question answering), and NASA-IR (information retrieval), to advance research across these disciplines. Finally, experiments show that our models outperform general-purpose encoders (such as RoBERTa) and existing domain-specific encoders (such as SciBERT) on both the new tasks and existing benchmark tasks in the related domains.|\n", "2405.12217": "|**2024-05-20**|**Adapting Large Multimodal Models to Distribution Shifts: The Role of In-Context Learning**|Guanglin Zhou et.al.|[2405.12217](http://arxiv.org/abs/2405.12217)|**[link](https://github.com/jameszhou-gl/icl-distribution-shift)**|**Recent studies show that large multimodal models (LMMs) are highly robust to natural distribution shifts, often surpassing previous baselines. Domain-specific adaptation is still necessary, however, especially in specialized fields such as healthcare. Because the enormous parameter space of LMMs makes fine-tuning impractical, this study explores in-context learning (ICL) as an effective way to improve LMM adaptability. We find that the success of ICL depends heavily on the choice of demonstrations, as it does for large language models, but that this poses unique challenges for LMMs facing distribution shifts. We therefore evaluate an unsupervised ICL method, TopKNearestPR, which selects demonstrations through a nearest-example search based on feature similarity. The study shows that the effectiveness of this method under distribution shift is limited by deficiencies in the vision encoder. To address this, we propose a novel method, InvariantSelectPR, which uses Class-conditioned Contrastive Invariance (CCI) to make pre-trained vision encoders more robust. By sharpening the separation between classes and ensuring invariance to domain-specific variations, CCI improves the encoder's ability to identify and retrieve the most informative examples. This in turn helps guide the LMM in adapting to new query samples, even under a different distribution. Experiments show that InvariantSelectPR substantially improves LMM adaptability: in the 7-shot setting on the Camelyon17 and HAM10000 benchmark datasets it improves accuracy by 34.2% and 16.9%, respectively, over zero-shot performance.**|\n", "2405.12209": "|**2024-05-20**|**MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark**|Hongwei Liu 
et.al.|[2405.12209](http://arxiv.org/abs/2405.12209)|**[link](https://github.com/open-compass/mathbench)**|**As recent advances in large language models (LLMs) have brought marked progress in mathematics, traditional math benchmarks such as GSM8k have proved limited in assessing these models' mathematical abilities comprehensively. To fill this gap, we propose MathBench, a new benchmark designed to rigorously evaluate the mathematical abilities of large language models. MathBench covers a wide range of mathematical disciplines and assesses both theoretical understanding and practical problem solving in detail. It is organized into five stages, from basic arithmetic to college mathematics, and is structured to probe a model's understanding at different depths of knowledge. Each stage includes theoretical questions and application problems, measuring a model's mathematical proficiency and its ability to apply concepts in practical scenarios. MathBench aims to improve the evaluation of LLMs' mathematical abilities by giving a fine-grained view of their level of understanding and their problem-solving skills, and it supports a bilingual setting. The project is released at https://github.com/open-compass/MathBench.**|\n", "2405.12195": "|**2024-05-20**|**Developers' Perceptions on the Impact of ChatGPT in Software Development: A Survey**|Thiago S. 
Vaillant et.al.|[2405.12195](http://arxiv.org/abs/2405.12195)|**[link](https://github.com/gpt-impact/Paper-content)**|As large language models such as ChatGPT continue to develop, their strong natural language processing capabilities and broad range of applications have drawn wide attention. Although the convergence of artificial intelligence (AI) and software engineering (SE) is increasingly evident, there is still little research on how it affects software development practices and perceptions. To understand the impact and challenges of integrating AI-driven tools such as ChatGPT into the software development process, we surveyed 207 software developers. The survey covers the effect of ChatGPT on software quality, productivity, and developers' job satisfaction, and also examines their expectations for future uses of ChatGPT, their concerns about possible job displacement, and their views on regulatory measures.|\n", "2405.12174": "|**2024-05-20**|**CT-Eval: Benchmarking Chinese Text-to-Table Performance in Large Language Models**|Haoxiang Shi 
et.al.|[2405.12174](http://arxiv.org/abs/2405.12174)|null|This paper introduces CT-Eval, a Chinese text-to-table dataset for measuring how well large language models perform text-to-table tasks in a non-English setting. Existing text-to-table datasets are mainly English, and CT-Eval fills this gap. It draws on a popular multidisciplinary Chinese online encyclopedia as its source and covers 28 domains to ensure data diversity. To reduce hallucination in the data, the authors first train a language model to identify and filter out samples that contain hallucinations, and then have human annotators correct the errors in the validation and test sets. The final CT-Eval contains roughly 88,600 task samples. Using CT-Eval, the authors evaluate open-source and closed-source large language models (such as GPT-4). The results show that in the zero-shot setting these models still fall well short of human judgment. After fine-tuning, open-source models improve markedly at text-to-table generation and surpass GPT-4 by a large margin. In short, CT-Eval is both a valuable tool for evaluating and understanding the Chinese text-to-table ability of existing large language models and a useful resource for improving such models on this task.|\n", "2405.12163": "|**2024-05-20**|**Fennec: Fine-grained Language Model Evaluation and Correction Extended through Branching and Bridging**|Xiaobo Liang et.al.|[2405.12163](http://arxiv.org/abs/2405.12163)|**[link](https://github.com/dropreg/fennec)**|**With the rapid development of large language models, they are applied ever more widely to real-world tasks, with the primary goal of aligning with human intent. The complexity of understanding human intent, however, makes it necessary to rely on time-consuming human evaluation. To ease this problem, we explore the trend of using open-source large language models as evaluators, particularly given the prevalence of GPT-4. We propose Fennec, a framework for fine-grained evaluation and correction, extended through branching and bridging. The branching operation decomposes the evaluation task into different dimensions and granularities, which makes evaluation less challenging. The bridging operation merges diverse training datasets, which increases the variety of evaluation tasks. Experiments show that our 7B model outperforms larger open-source evaluation models in both agreement and consistency across a range of widely used benchmarks, and approaches the performance of GPT-4. We use the model's fine-grained correction capability to refine multiple model responses, and the results show that this refinement improves response quality, raising scores on MT-Bench by 1 to 2 points. Our code is open-sourced on GitHub at https://github.com/dropreg/Fennec.**|\n", "2405.12147": "|**2024-05-20**|**Eliciting Problem Specifications via Large Language Models**|Robert E. 
Wray et.al.|[2405.12147](http://arxiv.org/abs/2405.12147)|null|This paper explores how large language models (LLMs) can be used to translate problem definitions for cognitive systems. Normally, a human has to translate a problem description into a form that the cognitive system can understand. The authors show that LLMs can take a problem class defined in natural language and map it to a semi-formal specification that existing reasoning and learning systems can then use to solve specific instances of that class. They design an LLM-driven cognitive task analyst agent, a system that generates a definition of the problem space from a task described in natural language. The LLM prompts are derived from the notion of problem spaces in the AI literature and from general problem-solving strategies (such as Polya's How to Solve It). The cognitive system then uses these problem-space specifications, combined with domain-general problem-solving strategies (such as search), to solve different instances of the problem class. This preliminary result suggests that, by removing the intermediary step of problem formulation, LLMs could accelerate research on cognitive systems while preserving their core capabilities, such as robust reasoning and online learning.|\n", "2405.12130": "|**2024-05-20**|**MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning**|Ting Jiang et.al.|[2405.12130](http://arxiv.org/abs/2405.12130)|**[link](https://github.com/kongds/mora)**|**Low-rank adaptation is a popular parameter-efficient fine-tuning method for large language models. In this paper we study the effect of low-rank updating, as implemented in LoRA. Our findings indicate that this mechanism may limit the ability of large language models to learn and memorize new knowledge. Motivated by this, we propose a new method, MoRA, which uses a square matrix to achieve high-rank updating while keeping the same number of trainable parameters as LoRA. To do so, we introduce corresponding non-parameter operators that reduce the input dimension and increase the output dimension for the square matrix. These operators ensure that the weights can be merged back into the large language model, so our method can be deployed just like LoRA. We evaluate the method comprehensively on five tasks: instruction tuning, mathematical reasoning, continual pretraining, memory, and pretraining. Our method outperforms LoRA on memory-intensive tasks and achieves comparable performance on the others.**|\n", "2405.12119": "|**2024-05-20**|**Reindex-Then-Adapt: 
Improving Large Language Models for Conversational Recommendation**|Zhankui He et.al.|[2405.12119](http://arxiv.org/abs/2405.12119)|null|Large language models (LLMs) are transforming conversational recommender systems through their ability to index item content, understand complex conversational context, and generate relevant item titles. Controlling the distribution of recommended items remains a challenge, however, and leads to suboptimal performance when the data distribution on a conversational recommendation platform changes quickly, as item popularity does. In conversational recommendation, LLMs generate item titles autoregressively (as multiple tokens), which makes it difficult to obtain and control recommendations over all items. We therefore propose the Reindex-Then-Adapt (RTA) framework, which converts multi-token item titles into single tokens within the LLM and then adjusts the probability distribution over these single-token item titles. RTA combines the strength of LLMs at understanding complex queries with the ability of traditional recommender systems (RecSys) to control the distribution of recommended items effectively in conversational recommendation. Experiments show that our framework improves accuracy metrics on three different conversational recommendation datasets and under two adaptation settings.|\n", "2405.12107": "|**2024-05-20**|**Imp: Highly Capable Large Multimodal Models for Mobile Devices**|Zhenwei Shao et.al.|[2405.12107](http://arxiv.org/abs/2405.12107)|**[link](https://github.com/milvlg/imp)**|**Although large language models (LLMs) and large multimodal models (LMMs) show remarkable capability in open-world multimodal understanding, they usually have many parameters and high compute requirements, which limits their use in resource-constrained settings. To address this, researchers have proposed a series of lightweight LMMs that aim to maximize performance at a constrained scale (for example, 3B parameters). Most of these approaches, however, focus on only one or two aspects of the design space, and the key design choices that affect model capability have not been explored thoroughly. This paper systematically studies the design of lightweight LMMs, covering model architecture, training strategy, and training data. Based on our findings, we build Imp, a family of highly capable LMMs at the 2B to 4B parameter scale. Notably, our Imp-3B model consistently outperforms existing lightweight models of similar size and surpasses state-of-the-art LMMs at the 13B scale. With low-bit quantization and resolution reduction, the Imp model can be deployed on a Qualcomm Snapdragon 8Gen3 mobile chip at a high inference speed of about 13 tokens per second.**|\n", "2405.12100": "|**2024-05-20**|**DOP: Diagnostic-Oriented Prompting for Large Language Models in Mathematical Correction**|Hao Chen et.al.|[2405.12100](http://arxiv.org/abs/2405.12100)|null|Math World Problems Correction (MWPC) is a task dedicated to correcting erroneous reasoning in the process of solving math problems. Building on advances in large language models (LLMs), this paper focuses on two points: (1) distinguishing mathematical reasoning from error correction; and (2) exploring strategies to improve the error-correction ability of LLMs in mathematics in order to tackle the MWPC task. We note that in real-time education, helping students recognize their mistakes matters more than simply supplying the correct answer. Current research, however, tends to emphasize obtaining accurate solutions rather than correcting possible errors. We therefore shift the research paradigm and show that improving mathematical reasoning ability is not the same as mastering error correction. We also propose a new method, Diagnostic-Oriented Prompting (DOP), designed to help LLMs excel at error correction. Experiments show that DOP performs strongly, which underlines its significance. We argue that in mathematics education the need for excellent correctors outweighs the need for proficient reasoners. The code and data are available.|\n", "2405.12981": "|**2024-05-21**|**Reducing Transformer Key-Value Cache Size with Cross-Layer Attention**|William Brandon 
et.al.|[2405.12981](http://arxiv.org/abs/2405.12981)|null|Key-value caching is essential for accelerating decoding in autoregressive large language models (LLMs) built on the Transformer architecture. As sequence length and batch size grow, however, the memory needed to store the key-value cache can become prohibitive. Since the introduction of the Transformer, the two most effective strategies for reducing this memory have been Multi-Query Attention (MQA) and its generalization, Grouped-Query Attention (GQA). MQA and GQA let multiple query heads share a single key/value head, which greatly reduces the number of distinct key/value heads with only a small effect on accuracy. This paper shows that MQA can be taken a step further by also sharing key and value heads between adjacent layers, a design we call Cross-Layer Attention (CLA). Experiments show that CLA can reduce the size of the key-value cache by a further factor of 2 while keeping accuracy close to that of the original MQA. We verify this in experiments that train 1B-parameter and 3B-parameter models from scratch. The results show that CLA provides a Pareto improvement over traditional MQA in the trade-off between memory and accuracy, which enables inference with longer sequence lengths and larger batch sizes.|\n", "2405.12961": "|**2024-05-21**|**Energy Rank Alignment: Using Preference Optimization to Search Chemical Space at Scale**|Shriram Chennakesavalu et.al.|[2405.12961](http://arxiv.org/abs/2405.12961)|**[link](https://github.com/rotskoff-group/llm-era)**|Searching chemical space is an extremely challenging problem because the number of possible molecules grows combinatorially with the number of atoms. Large autoregressive models trained on databases of chemical compounds have yielded powerful generators, but we still lack effective strategies for generating molecules with specific properties. This problem resembles the 'alignment' problem for large language models, although in many chemical tasks we have a well-defined reward function that is easy to evaluate. This paper introduces an algorithm called Energy Rank Alignment (ERA), which uses an explicit reward function to construct a gradient-based objective for optimizing an autoregressive policy. Theoretically, we find that the algorithm is closely related to Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO), but its minimizer converges to an ideal Gibbs-Boltzmann distribution in which the reward function plays the role of an energy. The algorithm is also highly scalable, requires no reinforcement learning, and performs well relative to DPO when the number of preference observations per pair of samples is small. We apply the approach to aligning molecular transformers to generate molecules with externally specified properties, and find that it searches robustly and explores diverse parts of chemical space. Although our focus is chemical search, we also obtain excellent results on an AI-supervised task, which indicates that the method is scalable and general.|\n", "2405.12939": "|**2024-05-21**|**Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models**|Zhangyue Yin et.al.|[2405.12939](http://arxiv.org/abs/2405.12939)|**[link](https://github.com/yinzhangyue/AoR)**|Recent progress in Chain-of-Thought prompting has driven breakthroughs for large language models (LLMs) on complex reasoning tasks. Current research improves the reasoning performance of LLMs by sampling multiple reasoning paths and ensembling them according to answer frequency. This approach fails, however, when the correct answer is in the minority. We identify this as a key factor limiting the reasoning ability of LLMs, and it cannot be resolved from the predicted answers alone. To address it, we propose AoR (Aggregation of Reasoning), a hierarchical reasoning aggregation framework that selects answers based on an evaluation of the reasoning chains. AoR also introduces a dynamic sampling strategy that adjusts the number of reasoning chains to the complexity of the task. Experiments on a series of complex reasoning tasks show that AoR outperforms mainstream ensemble methods. Further analysis shows that AoR not only works with a variety of LLMs but also reaches a higher performance ceiling than existing methods.|\n", "2405.12933": "|**2024-05-21**|**Skin-in-the-Game: Decision Making via Multi-Stakeholder Alignment in LLMs**|Bilgehan Sel 
et.al.|[2405.12933](http://arxiv.org/abs/2405.12933)|null|Large language models perform well on tasks such as summarization, arithmetic reasoning, and question answering. They face serious difficulties, however, with moral reasoning and ethical decision making, especially in complex scenarios that involve multiple stakeholders. This paper proposes the Skin-in-the-Game (SKIG) framework, which aims to improve moral reasoning in language models by examining the consequences of a decision from the perspectives of different stakeholders. The core mechanism of SKIG is to simulate accountability for actions, which, together with empathy exercises and risk assessment, is essential to its effectiveness. We validate SKIG on a range of moral reasoning benchmarks using both proprietary and open-source language models, and examine its key components through an in-depth ablation analysis.|\n", "2405.12929": "|**2024-05-21**|**Code-mixed Sentiment and Hate-speech Prediction**|Anjali Yadav et.al.|[2405.12929](http://arxiv.org/abs/2405.12929)|**[link](https://github.com/matejklemen/sentiment-hate-speech-with-code-mixed-models)**|In multilingual settings, code-mixed discourse is the blending of multiple languages within a single text, and it is especially common in informal communication in countries with several official languages. As large language models have come to dominate natural language processing tasks, we study how they fare in code-mixed settings. First, we create four new bilingual pre-trained masked language models for English-Hindi and English-Slovene, designed specifically for informal language. We then evaluate monolingual, bilingual, few-lingual, and massively multilingual models on tasks such as sentiment analysis and offensive language detection in social media text. The results show that the most effective classifiers are specialized bilingual and multilingual models fine-tuned for social media text, followed by non-specialized massively multilingual and monolingual models, while large generative models do not stand out. For sentiment-related problems, models generally perform slightly better on code-mixed data than on non-code-mixed data.|\n", "2405.12920": "|**2024-05-21**|**Streamlining Software Reviews: Efficient Predictive Modeling with Minimal Examples**|Tim 
Menzies et.al.|[2405.12920](http://arxiv.org/abs/2405.12920)|**[link](https://github.com/timm/ez)**|This paper proposes a new challenge task for software analytics. In the process, called a software review, a panel of SMEs (subject matter experts) reviews examples of software behavior to recommend how to improve the software's operation. Because SME time is usually very limited, ideally the panel completes this optimization task after looking at only a small number of highly informative examples. To support the review process, the paper explores methods for training a predictive model that guesses whether an expert will like or dislike the next example. Such a model can work with the SMEs to guide them through the examples; after the experts leave, it can also serve as a proxy that handles new cases while the experts are busy elsewhere.
In 31 case studies (ranging from high-level decisions about software processes to low-level decisions about configuring video encoding software), we show that such predictive models can be built using as few as 12 to 30 labels. To the best of our knowledge, achieving this with so few examples (and without large language models) is currently rare. In keeping with the principles of open science, we will make all code and data available so that others can repeat, validate, or improve these results.|\n", "2405.12915": "|**2024-05-21**|**G-DIG: Towards Gradient-based DIverse and hiGh-quality Instruction Data Selection for Machine Translation**|Xingyuan Pan
et.al.|[2405.12915](http://arxiv.org/abs/2405.12915)|**[link](https://github.com/xypan0/G-DIG)**|Large language models (LLMs) show remarkable ability in general scenarios, and instruction fine-tuning lets them align with humans across a variety of tasks. However, the diversity and quality of instruction data remain two main challenges for instruction fine-tuning. To address this, the paper proposes a novel gradient-based method for automatically selecting high-quality and diverse instruction fine-tuning data for machine translation. The key innovation is to analyze how individual training examples influence the model during training. Using influence functions together with a small high-quality seed dataset, we select the training examples that have a beneficial influence on the model as high-quality data. To increase diversity, we cluster the examples by their gradients and resample, maximizing the variety of their influence on the model. Extensive experiments on WMT22 and FLORES translation tasks demonstrate the superiority of the method, and in-depth analysis further confirms its effectiveness and generalization.|\n", "2405.12914": "|**2024-05-21**|**An Empirical Study and Analysis of Text-to-Image Generation Using Large Language Model-Powered Textual Representation**|Zhiyu Tan
et.al.|[2405.12914](http://arxiv.org/abs/2405.12914)|**[link](https://github.com/llm-conditioned-diffusion/llm-conditioned-diffusion.github.io)**|Accurately understanding the text input is a key prerequisite for faithful text-to-image generation. Existing methods use the text encoder of the CLIP model to represent prompts. However, the pre-trained CLIP model handles only English, and its text encoder has relatively limited capacity. By contrast, large language models (LLMs) accept multilingual input, handle longer contexts, and provide better text representations. This paper investigates using LLMs as the text encoder to improve language understanding in text-to-image generation. However, training a text-to-image model that incorporates an LLM from scratch requires substantial compute and data.
To this end, we propose a three-stage training pipeline that integrates existing text-to-image models with LLMs effectively while keeping training efficient. Specifically, we design a lightweight adapter that enables fast training of the text-to-image model using text representations from LLMs. Extensive experiments show that our model supports multilingual input and longer contexts while achieving superior image generation quality.|\n", "2405.12910": "|**2024-05-21**|**Topic Modelling Case Law Using a Large Language Model and a New Taxonomy for UK Law: AI Insights into Summary Judgment**|Holli Sargeant et.al.|[2405.12910](http://arxiv.org/abs/2405.12910)|**[link](https://github.com/AhmedIzzidien/TopicLLM)**|This paper addresses a significant gap in legal analysis by developing and applying a novel taxonomy for topic modelling of case law, exploring summary judgment cases in the United Kingdom. Using a curated dataset of summary judgment cases, we use the large language model Claude 3 Opus to study functional topics and trends.
Claude 3 Opus classified topics with 87.10% accuracy, and the analysis reveals distinct patterns in the use of summary judgment across legal domains. Because UK case law is not originally labelled with keywords or provided with topic filtering options, the findings both deepen our understanding of the thematic underpinnings of summary judgment and show the potential of combining traditional and AI-driven classification approaches. The paper therefore provides a new general taxonomy for UK law. This work lays a foundation for further research in judicial administration and for discussion of computational legal research methodologies.|\n", "2405.12900": "|**2024-05-21**|**Adversarial DPO: Harnessing Harmful Data for Reducing Toxicity with Minimal Impact on Coherence and Evasiveness in Dialogue Agents**|San Kim
et.al.|[2405.12900](http://arxiv.org/abs/2405.12900)|null|Recently, the rise of large language models (LLMs) and a range of effective training methods has advanced open-domain dialogue systems. However, toxicity in these models poses a significant challenge to user experience. This paper introduces a new training algorithm, adversarial direct preference optimization (ADPO), an improvement on direct preference optimization (DPO). ADPO trains the model to assign higher probability to preferred responses and lower probability to unsafe responses generated using a toxic control token. The study shows that ADPO strengthens the model's resilience against harmful conversations while minimizing performance degradation. We also show that ADPO provides a more stable training procedure than traditional DPO. To the best of our knowledge, this is the first DPO variant to incorporate harmful data directly into the generative model, reducing the need to construct safe dialogue data by hand.|\n", "2405.14863": "|**2024-05-23**|**A Nurse is Blue and Elephant is Rugby: Cross Domain Alignment in Large Language Models Reveal Human-like Patterns**|Asaf Yehudai
et.al.|[2405.14863](http://arxiv.org/abs/2405.14863)|null|Cross-domain alignment is the task of mapping a concept from one domain to another. For example, one might ask: if a doctor were a color, what color would it be? This seemingly peculiar task is designed to study how people represent concrete and abstract concepts through mappings between categories and through reasoning about those mappings. In this paper, we adapt this task from cognitive science to evaluate the conceptualization and reasoning abilities of large language models (LLMs) through a behavioral study. We prompt LLMs to perform cross-domain mapping tasks and analyze their responses at both the population and individual levels. We also assess the models' ability to reason about their predictions by analyzing and categorizing their explanations for these mappings. The results reveal substantial similarities between human and model mappings and explanations, suggesting that models represent concepts in ways similar to humans. The similarity shows up not only in the models' representations but also in their behavior. Moreover, the models mostly give valid explanations and follow reasoning paths similar to those of humans.|\n", "2405.14862": "|**2024-05-23**|**Bitune: Bidirectional Instruction-Tuning**|Dawid J. Kopiczko et.al.|[2405.14862](http://arxiv.org/abs/2405.14862)|null|We introduce Bitune, a method that improves instruction tuning of pre-trained decoder-only large language models, yielding notable gains on several downstream tasks. Bitune applies both autoregressive and bidirectional attention to the prompt to obtain a better representation of the query or instruction. To do this, we introduce two sets of parameters, to which we apply parameter-efficient fine-tuning (PEFT) techniques. The two resulting sets of features are then combined into a weighted average, with weights given by trainable coefficients, and this average is used to generate new tokens. Experiments show that Bitune performs well in zero-shot settings on commonsense reasoning, arithmetic, and language understanding tasks. Extensive ablation studies validate the role of each component and show that the method is robust to different PEFT techniques.|\n", "2405.14852": "|**2024-05-23**|**PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression**|Vladimir Malinovskii et.al.|[2405.14852](http://arxiv.org/abs/2405.14852)|**[link](https://github.com/vahe1994/aqlm)**|
There has been significant interest in extreme compression of large language models (LLMs), that is, compressing them to 1-2 bits per parameter so that they run efficiently on resource-constrained devices. Existing work has focused mainly on improved one-shot quantization techniques and weight representations; however, purely post-training approaches are reaching diminishing returns in the trade-off between accuracy and bit-width. State-of-the-art quantization methods such as QuIP# and AQLM include fine-tuning of part of the compressed parameters over a small amount of calibration data; however, such fine-tuning over compressed weights often relies exclusively on straight-through estimators (STE), whose performance in this setting is not well understood. This work questions the use of STE for extreme LLM compression and systematically studies quantization-aware fine-tuning strategies. We propose PV-Tuning, a representation-agnostic framework that generalizes and improves on existing fine-tuning strategies and provides convergence guarantees in restricted cases. In practice, when used for 1-2 bit vector quantization, PV-Tuning outperforms prior techniques on highly performant models such as Llama and Mistral. Using PV-Tuning, we achieve the first Pareto-optimal quantization for Llama
2 family models at 2 bits per parameter.|\n", "2405.14831": "|**2024-05-23**|**HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models**|Bernal Jim\u00e9nez Guti\u00e9rrez et.al.|[2405.14831](http://arxiv.org/abs/2405.14831)|**[link](https://github.com/osu-nlp-group/hipporag)**|To thrive in hostile and ever-changing natural environments, mammalian brains evolved to store large amounts of knowledge about the world and continually integrate new information while avoiding catastrophic forgetting. Although large language models (LLMs), including those with retrieval-augmented generation (RAG), have achieved a great deal on such tasks, they still struggle to integrate new experiences at scale. In this work, we introduce HippoRAG, a novel retrieval framework inspired by the hippocampal indexing theory of human long-term memory, which aims to enable deeper and more efficient integration of new experiences. HippoRAG orchestrates LLMs, knowledge graphs, and the Personalized PageRank algorithm to mimic the different roles of the neocortex and hippocampus in human memory.
We compare HippoRAG with existing RAG methods on multi-hop question answering and show that it outperforms the state of the art by a wide margin, by up to 20%. Single-step retrieval with HippoRAG matches or exceeds iterative retrieval methods such as IRCoT while being 10-30 times cheaper and 6-13 times faster, and integrating HippoRAG into IRCoT yields further substantial gains. Finally, we show that HippoRAG can handle new types of scenarios that are out of reach for existing methods. Code and data have been released as open source.|\n", "2405.14804": "|**2024-05-23**|**Can LLMs Solve longer Math Word Problems Better?**|Xin Xu et.al.|[2405.14804](http://arxiv.org/abs/2405.14804)|null|Math word problems (MWPs) are central to assessing the abilities of large language models (LLMs), but existing research focuses mainly on problems with short contexts. Real-world math problems often involve complex circumstances, so the ability of LLMs to solve long MWPs is vital for practical use, yet it remains under-explored. This study is the first to examine Context Length Generalizability (CoLeG), the ability of LLMs to solve MWPs with long contexts. We build Extended Grade-School
Math (E-GSM), a dataset of problems with lengthy narratives, and propose two new metrics to assess the efficacy and robustness of LLMs on such problems. Examining existing zero-shot prompting methods along with proprietary and open-source models, we find a general deficiency in CoLeG. We propose separate remedies for each kind of LLM: for proprietary models, a new instructional prompt that mitigates the effect of long context; for open-source models, a data augmentation task that improves the model's adaptability. Comprehensive experiments show that our methods perform well on E-GSM and also generalize to several other MWP benchmarks. The findings point the way for future research on using LLMs for complex real-world problems, offer practical solutions to current limitations, and open avenues for further work on model generalizability and training strategies.|\n", "2405.14782": "|**2024-05-23**|**Lessons from the Trenches on Reproducible Evaluation of Language Models**|Stella Biderman
et.al.|[2405.14782](http://arxiv.org/abs/2405.14782)|null|Effective evaluation of language models remains an open challenge in natural language processing (NLP). Researchers and engineers face methodological issues such as the sensitivity of models to the evaluation setup, the difficulty of comparing methods properly, and a lack of reproducibility and transparency. Drawing on three years of experience evaluating large language models, this paper offers guidance and lessons for researchers. First, we give an overview of common challenges in language model evaluation. Second, we describe best practices for addressing or lessening the impact of these challenges. Third, we present the Language Model Evaluation Harness (lm-eval), an open-source library for independent, reproducible, and extensible evaluation of language models that seeks to address these issues. We describe the library's features and present case studies showing how it can be used to alleviate these methodological concerns.|\n", "2405.14768": "|**2024-05-23**|**WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models**|Peng Wang
et.al.|[2405.14768](http://arxiv.org/abs/2405.14768)|**[link](https://github.com/zjunlp/easyedit)**|As world facts keep growing and erroneous responses need correcting, large language models (LLMs) need model editing methods that keep their knowledge up to date. The central question of the paper is where the updated knowledge should reside in the model's memories. The study finds that editing either long-term memory (the model parameters directly) or working memory (neural network activations obtained by retrieval) leads to an impossible triangle: reliability, generalization, and locality cannot all be achieved at once in the lifelong editing setting. Editing parameters directly conflicts with unrelated pre-trained knowledge or earlier edits (poor reliability and locality), while retrieval-based working memory makes it hard for the model to understand and generalize the edits (poor generalization). The authors therefore propose WISE to bridge the gap between the memories.
WISE uses a dual parametric memory scheme: a main memory holds the pre-trained knowledge and a side memory holds the edited knowledge. Only the side memory is edited, and a router is trained to decide which memory to use for a given query. For continual editing, a knowledge-sharding mechanism places different sets of edits in distinct subspaces of the parameters and then merges them into a shared memory to avoid conflicts. Experiments show that WISE outperforms previous model editing methods and overcomes the impossible triangle under lifelong model editing in question answering, hallucination, and out-of-distribution settings across trending LLM architectures such as GPT, LLaMA, and Mistral. Code will be released at https://github.com/zjunlp/EasyEdit.|\n", "2405.14767": "|**2024-05-23**|**FinRobot: An Open-Source AI Agent Platform for Financial Applications using Large Language Models**|Hongyang Yang
et.al.|[2405.14767](http://arxiv.org/abs/2405.14767)|**[link](https://github.com/ai4finance-foundation/finrobot)**|As financial institutions and professionals increasingly incorporate large language models (LLMs) into their workflows, substantial barriers remain between the finance sector and the AI community, including proprietary data and specialized knowledge. These challenges limit AI's ability to improve the efficiency of financial tasks. Given the importance of financial analysis, we aim to build finance-specific LLM-based toolchains and make them broadly accessible through open-source initiatives, promoting wider adoption of AI in financial decision-making. This paper introduces FinRobot, a novel open-source AI agent platform that supports multiple financially specialized AI agents, each powered by an LLM. The platform has four main layers: 1) the Financial AI Agents layer, which builds a financial Chain-of-Thought (CoT) by breaking complex financial problems into logical sequences; 2) the Financial LLM Algorithms layer, which dynamically configures a model application strategy suited to the task; 3) the LLMOps and DataOps layer, which produces accurate models through training and fine-tuning techniques and task-relevant data; and 4) the Multi-source LLM Foundation Models layer, which integrates various LLMs and makes them directly accessible to the layers above. FinRobot gives both professional analysts and laypeople hands-on access to powerful AI techniques for advanced financial analysis. The FinRobot source code is available at https://github.com/AI4Finance-Foundation/FinRobot.|\n", "2405.14766": "|**2024-05-23**|**Evaluating Large Language Models for Public Health Classification and Extraction Tasks**|Joshua Harris et.al.|[2405.14766](http://arxiv.org/abs/2405.14766)|null|Rapid advances in large language models (LLMs) have generated strong interest in their potential to support expert work in public health. Using six externally annotated and seven internally annotated datasets, this study evaluates LLMs on text classification and extraction tasks related to health burden, epidemiological risk factors, and public health interventions. We first evaluate five open-weight LLMs (7 to 70 billion parameters) using zero-shot in-context learning. Llama-3-70B-Instruct performs best, with the highest micro-F1 score on 15 of 17 tasks. Performance varies widely across tasks: on some, such as Contact
Classification, scores fall below 60%, while on others, such as GI Illness Classification, all models exceed 80% micro-F1. On a subset of 12 tasks we also evaluate GPT-4 and find results comparable to Llama-3-70B-Instruct, which scores the same or higher on 6 of those tasks. Overall, these initial results suggest that LLMs may be useful tools for public health experts extracting information from a variety of free-text sources, in support of public health surveillance, research, and interventions.|\n", "2405.14755": "|**2024-05-23**|**Large language models can be zero-shot anomaly detectors for time series?**|Sarah Alnegheimish
et.al.|[2405.14755](http://arxiv.org/abs/2405.14755)|**[link](https://github.com/sintel-dev/sigllm)**|Recent studies have shown that large language models can perform a variety of tasks, including time series forecasting. The flexibility of these models makes them suitable for many applications. This paper presents a novel study of how large language models perform on the challenging task of time series anomaly detection. For a language model, this involves identifying the anomalous part (or parts) of an input sequence and working with time series data rather than traditional text input. We introduce sigllm, a large language model framework designed for time series anomaly detection. The framework includes a module that converts time series to text, together with end-to-end pipelines that prompt language models to perform anomaly detection. We experiment with two ways of testing large language model capability: first, directly prompting the model to point out the anomalous elements in the input; second, using the forecasting ability of the language model to guide the detection process.
We evaluated our framework on 11 datasets from different sources, using 10 different pipelines. The results show that the forecasting method significantly outperformed the prompting method on all 11 datasets in terms of F1 score. Although large language models can find anomalies, current deep learning models still hold the advantage, performing 30% better than large language models.|\n", "2405.15765": "|**2024-05-24**|**Scaling Laws for Discriminative Classification in Large Language Models**|Dean Wyatte et.al.|[2405.15765](http://arxiv.org/abs/2405.15765)|null|Modern large language models (LLMs) represent a major leap in the capabilities of machine learning models. They can generate plausible answers to a wide range of queries, which suggests potential in customer support applications. However, LLMs have been observed to hallucinate, which limits their near-term use in customer support. To address this, we propose a system that reframes the language modeling task as a classification task, helping customer support agents choose the best template reply. Our goal is to provide agents with the top-K most suitable candidate replies.
We present results from offline and online experiments that demonstrate the effectiveness of the system: the offline experiments showed gains, and the online experiments delivered statistically significant lifts. We also share the curves for validation loss and top-K accuracy obtained by varying model parameters. Finally, we discuss the trade-offs among model size, latency, and accuracy, and suggest possible future applications.|\n", "2405.15739": "|**2024-05-24**|**Large Language Models Reflect Human Citation Patterns with a Heightened Citation Bias**|Andres Algaba et.al.|[2405.15739](http://arxiv.org/abs/2405.15739)|**[link](https://github.com/andresalgaba/llm_citation_patterns)**|
Citation practices are crucial in shaping the structure of scientific knowledge, yet they are often influenced by contemporary norms and biases. The emergence of large language models (LLMs) such as GPT-4 introduces a new dynamic to these practices. This study is the first to explore the characteristics and potential biases of references recommended by an LLM that relies entirely on its parametric knowledge rather than on search or retrieval-augmented generation. The experiment uses a dataset of 166 papers from AAAI, NeurIPS, ICML, and ICLR, all published after GPT-4's knowledge cutoff date and together containing 3,066 references. GPT-4 was asked to suggest scholarly references for the anonymized in-text citations in these papers. The results reveal a striking similarity between human and LLM citation patterns, but GPT-4 shows a more pronounced bias toward highly cited work, which persists even after controlling for publication year, title length, number of authors, and venue. In addition, the characteristics of the existing and non-existent references generated by GPT-4 are highly consistent, indicating that the model has internalized citation patterns. An analysis of citation graphs shows that the references recommended by GPT-4 are embedded in the relevant citation networks, suggesting a deep conceptual understanding.
Although LLMs can aid citation generation, they may also amplify existing biases and introduce new ones, potentially affecting the dissemination of scientific knowledge. Our results underscore the need to identify model biases and the importance of developing balanced ways of interacting with LLMs.|\n", "2405.15734": "|**2024-05-24**|**LM4LV: A Frozen Large Language Model for Low-level Vision Tasks**|Boyang Zheng et.al.|[2405.15734](http://arxiv.org/abs/2405.15734)|**[link](https://github.com/bytetriper/lm4lv)**|The success of large language models (LLMs) has sparked a wave of research on multimodal large language models (MLLMs), which are changing the research paradigm in many areas of computer vision. Although MLLMs perform well on high-level vision and vision-and-language tasks such as visual question answering (VQA) and text-to-image generation, no work has yet examined how low-level vision tasks can benefit from these models. We find that the design of most current MLLMs leaves them blind to low-level features, so they are inherently limited in solving low-level vision tasks. To address this, we propose $\\textbf{LM4LV}$, a framework that enables a frozen LLM to solve a range of low-level vision tasks without any multimodal data or prior knowledge. This highlights the strong potential of LLMs in low-level vision and bridges the gap between MLLMs and low-level vision tasks. We hope this work inspires new perspectives on LLMs and a deeper understanding of their mechanisms.|\n",
"2405.15729": "|**2024-05-24**|**Optimizing Large Language Models for OpenAPI Code Completion**|Bohdan Petryshyn et.al.|[2405.15729](http://arxiv.org/abs/2405.15729)|**[link](https://github.com/BohdanPetryshyn/openapi-completion-benchmark)**|Recent advances in large language models (LLMs) for code generation have significantly changed software development. Although code completion solutions for mainstream programming languages perform well, they underperform on less common formats such as OpenAPI definitions. This study evaluates GitHub Copilot, a popular commercial code completion tool, on OpenAPI completion and proposes a set of task-specific optimizations for Meta's open-source Code Llama model. A semantics-aware OpenAPI completion benchmark designed in this work is used in a series of experiments to analyze how different prompt-engineering and fine-tuning techniques affect Code Llama's performance. The fine-tuned Code Llama model reaches a peak correctness improvement of 55.2% over GitHub Copilot while using only 1/25 as many parameters as the Codex model underlying the commercial solution. The study also improves a widely used code infilling training technique, addressing the model's underperformance when it is prompted with a context smaller than the one used during training.|\n",
"2405.15684": "|**2024-05-24**|**Prompt-Aware Adapter: Towards Learning Adaptive Visual Tokens for Multimodal Large Language Models**|Yue Zhang et.al.|[2405.15684](http://arxiv.org/abs/2405.15684)|null|To bridge the gap between the visual and language modalities, multimodal large language models (MLLMs) typically learn an adapter that converts visual inputs into tokens that large language models (LLMs) can understand. However, most adapters generate visual tokens that are largely fixed, regardless of the specific objects mentioned in the prompt. Because these adapters pay equal attention to every detail in the image and tend to process the whole scene, they may increase the cognitive load on the LLM, especially for complex scenes. To address this, we propose prompt-aware adapters, which are designed to dynamically embed visual inputs according to the specific focus of the prompt. Specifically, prompt-aware adapters use both global and local textual features to capture the visual cues most relevant to the prompt at coarse and fine granularity. This significantly improves the LLM's ability to understand and interpret visual content. Experiments on a variety of visual question answering tasks, such as counting and position reasoning, demonstrate the effectiveness of prompt-aware adapters.|\n", "2405.15668": "|**2024-05-24**|**What Do You See? 
Enhancing Zero-Shot Image Classification with Multimodal Large Language Models**|Abdelrahman Abdelhamed et.al.|[2405.15668](http://arxiv.org/abs/2405.15668)|null|This paper explores how large language models (LLMs) can be used for zero-shot image classification. The authors propose a simple yet effective approach that applies multimodal LLMs to input images to generate comprehensive textual representations. These textual representations are converted into fixed-dimensional features in a cross-modal embedding space, and the features are combined for zero-shot classification. The method does not require designing elaborate prompts for each dataset; instead, it uses a single generic prompting strategy across all datasets. Experiments show that the approach performs well on multiple datasets and improves on the accuracy of prior methods. Averaged over ten benchmarks, it achieves a gain of 4.1 percentage points over prior methods, with a gain of 6.8 percentage points on ImageNet. This suggests that multimodal LLMs can substantially improve computer vision tasks such as zero-shot image classification.|\n", "2405.15662": "|**2024-05-24**|**Class Machine Unlearning for Complex Data via Concepts Inference and Data Poisoning**|Wenhan 
Chang et.al.|[2405.15662](http://arxiv.org/abs/2405.15662)|null|In the era of AI, users may ask AI companies to delete their information from training datasets because of privacy concerns. For a model owner, retraining a model consumes significant computational resources, so machine unlearning has emerged as a way to remove requested training data or classes while minimizing the impact on model performance. However, for large-scale complex data such as images or text, unlearning a class from a model can degrade performance because it is difficult to determine the link between the class and the model. To address this, we propose using concepts, rather than image features or text tokens, to represent the semantic information of the class to be unlearned, which helps cut the link between the model and the class and remove its influence completely. 
To analyze the influence of concepts in complex data, we adopt a post-hoc concept bottleneck model and integrated gradients to precisely identify concepts across different classes. We then propose unlearning methods based on data poisoning with random labels and with targeted labels. We test our methods on image classification models and large language models (LLMs), and the results consistently show that the proposed methods can accurately erase the targeted information from models while largely preserving model performance.|\n", "2405.15652": "|**2024-05-24**|**$$\\mathbf{L^2\\cdot M = C^2}$$ Large Language Models as Covert Channels... 
a Systematic Analysis**|Simen Gaure et.al.|[2405.15652](http://arxiv.org/abs/2405.15652)|null|Large language models (LLMs) have attracted considerable attention in recent years for their performance on tasks such as translation, prediction, and content generation. At the same time, the research community has shown that LLMs are susceptible to attacks but can also improve the security of systems. How well, though, do open-source LLMs serve as a medium for covert communication, for example to support censorship-resistant communication? This paper approaches the question experimentally, empirically measuring the security versus capacity of an open-source LLM (Llama-7B) to assess how well it performs as a covert channel. The results indicate that channels based on such a model are unlikely to achieve high practical bitrates, which depend on message length and model entropy, but also that the chance of an adversary detecting the covert communication is low. So that the results can serve as a general reference, we use a simple and intuitive scheme and assume only that the model is publicly available.|\n", "2405.15646": "|**2024-05-24**|**LLM-based Robot Task Planning with Exceptional Handling for General Purpose Service Robots**|Ruoyu Wang 
et.al.|[2405.15646](http://arxiv.org/abs/2405.15646)|null|The development of general-purpose service robots for daily life requires robots to execute a wide range of basic behaviors appropriately. Recent advances in training large language models (LLMs) make it possible to generate action sequences directly from natural language instructions, without additional domain knowledge. However, although LLM outputs are semantically correct, the generated task plans may not map precisely to acceptable actions and may contain various linguistic ambiguities. LLM hallucination also poses a challenge for robot task planning, since it can produce content that is inconsistent with real-world facts or with user input. To address this, we propose a task planning method based on constrained LLM prompting that generates an executable action sequence from a command. We also design an exception handling module to deal with LLM hallucination, ensuring that the generated results are admissible in the current environment. We tested our method on commands produced by the RoboCup@Home command generator, and the results show that the robot performs well in both understanding and executing tasks.|\n", 
"2405.15640": "|**2024-05-24**|**GECKO: Generative Language Model for English, Code and Korean**|Sungwoo Oh et.al.|[2405.15640](http://arxiv.org/abs/2405.15640)|null|We introduce GECKO, a bilingual large language model (LLM) designed for Korean and English, along with programming languages. It is built on the LLaMA architecture and pretrained on a balanced, high-quality corpus of Korean and English. This report describes several of our efforts to build the data pipeline and train the model. Despite its small vocabulary, GECKO generates tokens efficiently in both Korean and English. We evaluate it on representative benchmarks: it performs strongly on KMMLU (Korean MMLU) and modestly in English and code, even though it was trained on fewer tokens than English-focused LLMs. GECKO is available to the open-source community under a permissive license, and we hope it provides a research baseline and practical insights for Korean LLM research. The model is available at: https://huggingface.co/kifai/GECKO-7B|\n", "2405.17430": "|**2024-05-27**|**Matryoshka Multimodal Models**|Mu Cai et.al.|[2405.17430](http://arxiv.org/abs/2405.17430)|null|
Large multimodal models (LMMs) such as LLaVA have shown strong performance in visual-linguistic reasoning. These models first embed images into a fixed, large number of visual tokens and then feed them into a large language model (LLM). However, this design produces an excessive number of tokens for dense visual scenarios such as high-resolution images and videos, leading to significant inefficiency. Token pruning and merging methods exist, but they produce an output of a single length for each image and cannot flexibly trade off information density against efficiency. Inspired by the concept of Matryoshka dolls, we propose M3: Matryoshka Multimodal Models, which learns to represent visual content as nested sets of visual tokens that capture information at multiple coarse-to-fine granularities. Our approach offers several unique benefits for LMMs: (1) users can explicitly control the visual granularity for each test instance, for example by adjusting the number of tokens used to represent an image according to the complexity or simplicity of its content; (2) M3 provides a framework for analyzing the granularity that existing datasets require, where we find that benchmarks such as COCO need only about 9 visual tokens to reach accuracy comparable to using all 576 tokens; (3) our approach provides a foundation for exploring the best trade-off between performance and visual token length, where our investigation reveals a large gap between the current fixed-scale representation and the ideal upper bound.|\n",
"2405.17428": "|**2024-05-27**|**NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models**|Chankyu Lee et.al.|[2405.17428](http://arxiv.org/abs/2405.17428)|null|This paper introduces NV-Embed, a new model designed to improve the performance of decoder-based large language models (LLMs) on text embedding tasks, including dense vector retrieval. NV-Embed combines a variety of architectural designs and training procedures that significantly enhance the model's versatility and performance while maintaining its simplicity and reproducibility. On the architecture side, we introduce a latent attention layer to obtain pooled embeddings, which improves both retrieval and downstream task accuracy compared with mean pooling or using the last token embedding from the LLM. To enhance representation learning, we remove the causal attention mask of the LLM during contrastive training, allowing fuller information interaction. On the training side, we adopt a two-stage contrastive instruction-tuning method. The first stage applies training with instructions on retrieval datasets, using in-batch negatives and curated hard negative examples. The second stage blends data from various non-retrieval tasks into instruction tuning, which improves non-retrieval task accuracy and also retrieval performance. With these techniques, NV-Embed, using only publicly available data, achieves a record-high score of 69.32 and ranks first on the Massive Text Embedding Benchmark (MTEB) as of May 24, 2024, across 56 tasks covering retrieval, reranking, classification, clustering, and semantic textual similarity. Notably, the model also attains the highest score of 59.36 on the 15 retrieval tasks of BEIR. The NV-Embed model will be open-sourced at: https://huggingface.co/nvidia/NV-Embed-v1|\n", "2405.17427": "|**2024-05-27**|**Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model**|Kuan-Chih Huang 
et.al.|[2405.17427](http://arxiv.org/abs/2405.17427)|**[link](https://github.com/kuanchihhuang/reason3d)**|Recent advances in multimodal large language models (LLMs) have shown great potential in areas such as concept reasoning. However, their application to understanding 3D environments remains relatively limited. This paper introduces Reason3D, a novel LLM designed for comprehensive 3D understanding. Reason3D takes point cloud data and text prompts as input and produces textual responses and segmentation masks, supporting advanced tasks such as 3D reasoning segmentation, hierarchical searching, express referring, and question answering with detailed mask outputs. In particular, we design a hierarchical mask decoder that can precisely locate small objects within expansive scenes. The decoder first generates a coarse location estimate covering the general area of the object, and then applies a progressive refinement strategy that significantly improves the precision of object identification and segmentation. Experiments show that Reason3D achieves strong results on the large-scale ScanNet and Matterport3D datasets for 3D express referring, 3D question answering, and 3D reasoning segmentation. Code and models are available at: https://github.com/KuanchihHuang/Reason3D|\n", "2405.17424": "|**2024-05-27**|**LARM: Large 
Auto-Regressive Model for Long-Horizon Embodied Intelligence**|Zhuoling Li et.al.|[2405.17424](http://arxiv.org/abs/2405.17424)|null|Because embodied agents must interact with the real world, they need comprehensive prior knowledge, long-horizon planning capability, and fast response speed. Recent agents based on large language models (LLMs) achieve promising performance but still have limitations. For example, the output of an LLM is usually a descriptive sentence, which can be ambiguous when it comes to determining a specific action. To address these limitations, we introduce the large auto-regressive model (LARM). LARM takes both text and multi-view images as input and predicts subsequent actions in an auto-regressive manner. To train LARM, we develop a novel data format named the auto-regressive node transmission structure and assemble a corresponding dataset. With two-phase training, LARM successfully harvests enchanted equipment in Minecraft, which demands more complex decision-making chains than the highest achievements of the best prior methods. LARM is also the fastest, running 6.8x faster than prior methods.|\n", "2405.17418": "|**2024-05-27**|**Self-Corrected Multimodal Large Language Model for End-to-End Robot Manipulation**|Jiaming Liu 
et.al.|[2405.17418](http://arxiv.org/abs/2405.17418)|null|Robot manipulation policies often perform unsatisfactorily when confronted with novel tasks or object instances. The ability to automatically detect and self-correct failed actions is therefore essential for a practical robotic system. Recently, multimodal large language models (MLLMs) have shown promise in visual instruction following and demonstrated strong reasoning abilities across a variety of tasks. To use a general MLLM as an end-to-end robotic agent, we introduce the Self-Corrected (SC)-MLLM, which equips the model not only to predict end-effector poses but also to autonomously recognize and correct failed actions. First, we apply parameter-efficient fine-tuning to give the MLLM pose prediction ability, reframing pose prediction as a language modeling problem. When an execution fails, the model identifies the causes of low-level action errors (such as position and rotation errors) and actively seeks prompts from experts. Based on the feedback, SC-MLLM rethinks the current failure scene and generates a corrected action. In addition, we design a continuous policy learning method for successfully corrected samples, which improves the model's adaptability to the current scene configuration and reduces the frequency of expert intervention.
To evaluate SC-MLLM, we conduct extensive experiments in both simulation and real-world settings. Compared with the previous state-of-the-art robotic MLLM (ManipLLM), SC-MLLM significantly improves manipulation accuracy: from 57% to 79% on seen object categories and from 47% to 69% on unseen novel categories.|\n", "2405.17402": "|**2024-05-27**|**THREAD: Thinking Deeper with Recursive Spawning**|Philip Schroeder et.al.|[2405.17402](http://arxiv.org/abs/2405.17402)|**[link](https://github.com/philipmit/thread)**|Large language models (LLMs) have shown impressive capabilities across diverse settings, but they still struggle as the length and complexity of the context increase. To address this challenge, we propose Thinking Recursively and Dynamically (ThReaD).
ThReaD frames model generation as a thread of execution that, depending on the context, can run to completion or dynamically spawn new threads. By spawning, a thread can offload work (for example, thinking or retrieving information) to child threads, which return only the tokens the parent thread needs. This lets the model adapt, as needed, the amount of intermediate work used to produce tokens. We apply ThReaD to task solving and question answering, where it allows the model to recursively decompose a given task or question into progressively simpler sub-problems that are solved by separate child threads. We implement ThReaD with a few-shot learning approach and evaluate it with GPT-4 and GPT-3.5 on multiple benchmarks, including ALFWorld, TextCraft, and WebShop, along with two new benchmarks, DataCommons QA and MIMIC-III ICU QA. ThReaD achieves state-of-the-art performance on these benchmarks, and with smaller models such as Llama-3-8b and CodeLlama-7b it outperforms existing frameworks by 10% to 50% absolute points.|\n", "2405.17386": "|**2024-05-27**|**MindMerger: Efficient Boosting LLM Reasoning in non-English Languages**|Zixian Huang et.al.|[2405.17386](http://arxiv.org/abs/2405.17386)|**[link](https://github.com/cone-mt/mindmerger)**|## \u4efb\u52a1 
\u63a8\u7406\u80fd\u529b\u5bf9\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u81f3\u5173\u91cd\u8981\uff0c\u4f46\u82f1\u8bed\u4e0e\u5176\u4ed6\u975e\u82f1\u8bed\u8bed\u8a00\u4e4b\u95f4\u7684\u5dee\u8ddd\u660e\u663e\u3002\u4e00\u4e9b\u7814\u7a76\u901a\u8fc7\u5fae\u8c03LLMs\u4ee5\u91cd\u65b0\u5b66\u4e60\u975e\u82f1\u8bed\u7684\u63a8\u7406\u80fd\u529b\uff0c\u800c\u53e6\u4e00\u4e9b\u65b9\u6cd5\u5219\u4f7f\u7528\u5916\u90e8\u6a21\u578b\uff08\u5982\u82f1\u8bed\u7ffb\u8bd1\u6587\u672c\uff09\u7684\u8f93\u51fa\u6765\u66ff\u6362\u975e\u82f1\u8bed\u8f93\u5165\uff0c\u4ee5\u5e94\u5bf9LLM\u7406\u89e3\u975e\u82f1\u8bed\u7684\u6311\u6218\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u5f80\u5f80\u672a\u80fd\u5145\u5206\u5229\u7528LLMs\u5185\u5728\u7684\u63a8\u7406\u548c\u8bed\u8a00\u7406\u89e3\u80fd\u529b\u3002\u4e3a\u4e86\u66f4\u597d\u5730\u5229\u7528LLMs\u7684\u601d\u7ef4\u548c\u8bed\u8a00\u7406\u89e3\u80fd\u529b\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u65b9\u6cd5\uff0c\u79f0\u4e3aMindMerger\uff0c\u5b83\u5c06LLMs\u4e0e\u591a\u8bed\u8a00\u6a21\u578b\u7684\u5916\u90e8\u8bed\u8a00\u7406\u89e3\u80fd\u529b\u76f8\u7ed3\u5408\uff0c\u4ee5\u63d0\u5347\u591a\u8bed\u8a00\u63a8\u7406\u6027\u80fd\u3002\u6211\u4eec\u8fd8\u5f15\u5165\u4e86\u4e24\u6b65\u8bad\u7ec3\u7b56\u7565\uff0c\u9996\u5148\u5c06\u5916\u90e8\u80fd\u529b\u5d4c\u5165LLMs\uff0c\u7136\u540e\u8bad\u7ec3\u5916\u90e8\u80fd\u529b\u548c\u5185\u7f6e\u80fd\u529b\u7684\u534f\u4f5c\u4f7f\u7528\u3002\u5728\u4e09\u4e2a\u591a\u8bed\u8a00\u63a8\u7406\u6570\u636e\u96c6\u548c\u4e00\u4e2a\u8bed\u8a00\u7406\u89e3\u6570\u636e\u96c6\u4e0a\u7684\u5b9e\u9a8c\u8868\u660e\uff0cMindMerger\u59cb\u7ec8\u4f18\u4e8e\u6240\u6709\u57fa\u7ebf\uff0c\u7279\u522b\u662f\u5728\u4f4e\u8d44\u6e90\u8bed\u8a00\u4e0a\u3002\u5728\u4e0d\u66f4\u65b0LLMs\u53c2\u6570\u7684\u60c5\u51b5\u4e0b\uff0cMGSM\u6570\u636e\u96c6\u4e0a\u6240\u6709\u8bed\u8a00\u7684\u5e73\u5747\u51c6\u786e\u7387\u63d0\u9ad8\u4e866.7%\uff0c\u4f4e\u8d44\u6e90\u8bed\u8a00\u63d0\u9
ad8\u4e868.0%\u3002|\n", "2405.17382": "|**2024-05-27**|**ReMoDetect: Reward Models Recognize Aligned LLM's Generations**|Hyunseok Lee et.al.|[2405.17382](http://arxiv.org/abs/2405.17382)|**[link](https://github.com/hyunseoklee-ai/reward_llm_detect)**|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5353\u8d8a\u6027\u80fd\u548c\u6613\u7528\u6027\u63d0\u5347\uff0c\u5b83\u4eec\u5e26\u6765\u7684\u793e\u4f1a\u98ce\u9669\uff0c\u5982\u5047\u65b0\u95fb\u751f\u6210\uff0c\u4fc3\u4f7f\u5f00\u53d1\u51fa\u80fd\u68c0\u6d4bLLM\u751f\u6210\u6587\u672c\uff08LGT\uff09\u7684\u65b9\u6cd5\u4ee5\u786e\u4fdd\u5b89\u5168\u4f7f\u7528\u3002\u7136\u800c\uff0c\u7531\u4e8e\u5927\u91cfLLM\u7684\u5b58\u5728\uff0c\u9010\u4e2a\u8bc6\u522b\u5b83\u4eec\u7684\u7279\u70b9\u53d8\u5f97\u4e0d\u5207\u5b9e\u9645\u3002\u56e0\u6b64\uff0c\u7814\u7a76\u5173\u6ce8\u7684\u662f\u8fd9\u4e9b\u5f3a\u5927\u6a21\u578b\u5171\u6709\u7684\u7279\u6027\uff0c\u5373\u201c\u5bf9\u9f50\u8bad\u7ec3\u201d\uff0c\u5373\u8bad\u7ec3LLMs\u751f\u6210\u66f4\u7b26\u5408\u4eba\u7c7b\u504f\u597d\u7684\u6587\u672c\u3002\u6211\u4eec\u7684\u5173\u952e\u53d1\u73b0\u662f\uff0c\u968f\u7740\u8fd9\u4e9b\u5bf9\u9f50\u8bad\u7ec3\u7684LLMs\u81f4\u529b\u4e8e\u6700\u5927\u5316\u4eba\u7c7b\u504f\u597d\uff0c\u5b83\u4eec\u751f\u6210\u7684\u6587\u672c\u751a\u81f3\u6bd4\u4eba\u7c7b\u64b0\u5199\u7684\u6587\u672c\u5728\u4f30\u8ba1\u504f\u597d\u4e0a\u66f4\u9ad8\uff0c\u8fd9\u4f7f\u5f97\u5229\u7528\u504f\u597d\u6a21\u578b\uff08\u4e00\u4e2a\u8bad\u7ec3\u6765\u6a21\u62df\u4eba\u7c7b\u504f\u597d\u5206\u5e03\u7684LLM\uff09\u8f7b\u6613\u5c31\u80fd\u68c0\u6d4b\u5230\u8fd9\u4e9b\u6587\u672c\u3002 
\u57fa\u4e8e\u8fd9\u4e00\u53d1\u73b0\uff0c\u6211\u4eec\u63d0\u51fa\u4e24\u79cd\u8fdb\u4e00\u6b65\u589e\u5f3a\u504f\u597d\u6a21\u578b\u68c0\u6d4b\u80fd\u529b\u7684\u8bad\u7ec3\u7b56\u7565\uff1a\uff081\uff09\u6301\u7eed\u504f\u597d\u5fae\u8c03\uff0c\u4f7f\u6a21\u578b\u66f4\u504f\u5411\u4e8e\u8bc6\u522b\u5bf9\u9f50\u7684LLG\uff1b\uff082\uff09\u5956\u52b1\u6a21\u578b\u5bf9\u4eba/LLM\u6df7\u5408\u6587\u672c\u7684\u5b66\u4e60\uff0c\u5373\u4f7f\u7528\u5bf9\u9f50LLM\u91cd\u8ff0\u7684\u4eba\u7c7b\u539f\u521b\u6587\u672c\uff0c\u8fd9\u662f\u4e00\u79cd\u4ecb\u4e8eLGT\u548c\u4eba\u7c7b\u6587\u672c\u4e4b\u95f4\u7684\u504f\u597d\u57fa\u51c6\uff0c\u6709\u52a9\u4e8e\u66f4\u597d\u5730\u5b66\u4e60\u51b3\u7b56\u8fb9\u754c\u3002\u6211\u4eec\u5728\u516d\u4e2a\u6587\u672c\u9886\u57df\u548c\u5341\u4e8c\u79cd\u5bf9\u9f50LLM\u4e0a\u8fdb\u884c\u4e86\u5e7f\u6cdb\u8bc4\u4f30\uff0c\u7ed3\u679c\u663e\u793a\u6211\u4eec\u7684\u65b9\u6cd5\u8868\u73b0\u51fa\u6700\u5148\u8fdb\u7684\u6027\u80fd\u3002\u76f8\u5173\u4ee3\u7801\u5df2\u5728https://github.com/hyunseoklee-ai/reward_llm_detect\u4e0a\u63d0\u4f9b\u3002|\n", "2405.17378": "|**2024-05-27**|**RTL-Repo: A Benchmark for Evaluating LLMs on Large-Scale RTL Design Projects**|Ahmed Allam et.al.|[2405.17378](http://arxiv.org/abs/2405.17378)|**[link](https://github.com/AUCOHL/RTL-Repo)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u8f85\u52a9\u8fdb\u884c\u5bc4\u5b58\u5668\u4f20\u8f93\u7ea7\uff08Register Transfer Level, 
RTL\uff09\u8bbe\u8ba1\u4efb\u52a1\u4e0a\u5c55\u73b0\u51fa\u6f5c\u529b\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u57fa\u51c6\u6d4b\u8bd5\u5728\u53cd\u6620\u771f\u5b9e\u4e16\u754cRTL\u9879\u76ee\u590d\u6742\u6027\u65b9\u9762\u5b58\u5728\u663e\u8457\u5dee\u8ddd\u3002\u4e3a\u6b64\uff0c\u8be5\u8bba\u6587\u63d0\u51fa\u4e86\u4e00\u9879\u65b0\u7684\u57fa\u51c6\u2014\u2014RTL-Repo\uff0c\u4e13\u4e3a\u8bc4\u4f30\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u5927\u89c4\u6a21RTL\u8bbe\u8ba1\u9879\u76ee\u4e2d\u7684\u6027\u80fd\u800c\u8bbe\u8ba1\u3002RTL-Repo\u5305\u542b\u4e86\u4eceGitHub\u516c\u5171\u4ed3\u5e93\u63d0\u53d6\u7684\u8d85\u8fc74000\u4e2aVerilog\u4ee3\u7801\u6837\u672c\uff0c\u6bcf\u4e2a\u6837\u672c\u90fd\u63d0\u4f9b\u4e86\u5bf9\u5e94\u4ed3\u5e93\u7684\u5b8c\u6574\u4e0a\u4e0b\u6587\u3002\u6211\u4eec\u5bf9\u5305\u62ecGPT-4\u3001GPT-3.5\u3001Starcoder2\u4ee5\u53ca\u50cfVeriGen\u548cRTLCoder\u8fd9\u6837\u7684Verilog\u4e13\u7528\u6a21\u578b\u5728\u5185\u7684\u591a\u6b3e\u6700\u5148\u8fdb\u7684\u6a21\u578b\u5728RTL-Repo\u57fa\u51c6\u4e0a\u7684\u6027\u80fd\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u6bd4\u8f83\u5b83\u4eec\u5728\u751f\u6210\u590d\u6742\u9879\u76ee\u7684Verilog\u4ee3\u7801\u65b9\u9762\u7684\u8868\u73b0\u3002RTL-Repo\u4e3a\u786c\u4ef6\u8bbe\u8ba1\u793e\u533a\u63d0\u4f9b\u4e86\u4e00\u4e2a\u5b9d\u8d35\u7684\u8d44\u6e90\uff0c\u7528\u4e8e\u8bc4\u4f30\u548c\u6bd4\u8f83\u8bed\u8a00\u6a21\u578b\u5728\u5b9e\u9645RTL\u8bbe\u8ba1\u573a\u666f\u4e2d\u7684\u6027\u80fd\uff0c\u5e76\u9488\u5bf9\u590d\u6742\u7684\u591a\u6587\u4ef6RTL\u9879\u76ee\u4e13\u95e8\u8bad\u7ec3Verilog\u4ee3\u7801\u751f\u6210\u3002RTL-Repo\u662f\u5f00\u6e90\u7684\uff0c\u5df2\u5728GitHub\u4e0a\u516c\u5f00\u53ef\u7528\u3002|\n", "2405.17374": "|**2024-05-28**|**Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models**|ShengYun Peng et.al.|[2405.17374](http://arxiv.org/abs/2405.17374)|null|### \u80cc\u666f 
\u5b89\u5168\u6821\u51c6\u662f\u786e\u4fdd\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u884c\u4e3a\u7b26\u5408\u4eba\u7c7b\u504f\u597d\u5e76\u907f\u514d\u6709\u5bb3\u884c\u4e3a\u7684\u5173\u952e\uff0c\u4f46\u8fd1\u671f\u7814\u7a76\u663e\u793a\uff0c\u4ec5\u4f7f\u7528\u5c11\u91cf\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u8bad\u7ec3\u6837\u672c\u6765\u5fae\u8c03\u6a21\u578b\u53ef\u80fd\u5bfc\u81f4\u5b89\u5168\u6027\u88ab\u8f7b\u6613\u7834\u574f\u3002\u6211\u4eec\u81f4\u529b\u4e8e\u901a\u8fc7\u63a2\u7d22LLM\u7684\u5b89\u5168\u666f\u89c2\u6765\u8bc4\u4f30\u5fae\u8c03\u8fc7\u7a0b\u4e2d\u7684\u98ce\u9669\u3002\u6211\u4eec\u53d1\u73b0\u4e86\u4e00\u4e2a\u666e\u904d\u5b58\u5728\u4e8e\u6d41\u884c\u5f00\u6e90LLM\u6a21\u578b\u53c2\u6570\u7a7a\u95f4\u4e2d\u7684\u65b0\u73b0\u8c61\uff0c\u79f0\u4e3a\u201c\u5b89\u5168\u76c6\u5730\u201d\uff1a\u968f\u673a\u6270\u52a8\u6a21\u578b\u6743\u91cd\u80fd\u4f7f\u6a21\u578b\u5728\u5c40\u90e8\u533a\u57df\u4fdd\u6301\u539f\u59cb\u6821\u51c6\u6a21\u578b\u7684\u5b89\u5168\u6027\u3002 ### \u53d1\u73b0\u4e0e\u8d21\u732e 
\u6211\u4eec\u7684\u53d1\u73b0\u542f\u53d1\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u5b89\u5168\u5ea6\u91cf\u65b9\u6cd5\u2014\u2014VISAGE\uff0c\u5b83\u901a\u8fc7\u63a2\u6d4b\u6a21\u578b\u7684\u5b89\u5168\u666f\u89c2\u6765\u8bc4\u4f30LLM\u5fae\u8c03\u8fc7\u7a0b\u4e2d\u7684\u5b89\u5168\u6027\u3002\u53ef\u89c6\u5316\u6821\u51c6\u6a21\u578b\u7684\u5b89\u5168\u666f\u89c2\u6709\u52a9\u4e8e\u7406\u89e3\u5fae\u8c03\u5982\u4f55\u4f7f\u6a21\u578b\u504f\u79bb\u5b89\u5168\u76c6\u5730\uff0c\u4ece\u800c\u635f\u5bb3\u5b89\u5168\u6027\u3002\u6b64\u5916\uff0c\u6211\u4eec\u89c2\u5bdf\u5230\u7cfb\u7edf\u63d0\u793a\u5728\u4fdd\u62a4\u6a21\u578b\u65b9\u9762\u7684\u91cd\u8981\u6027\uff0c\u8fd9\u79cd\u4fdd\u62a4\u751a\u81f3\u4f1a\u4f20\u9012\u7ed9\u5904\u4e8e\u5b89\u5168\u76c6\u5730\u5185\u7684\u6270\u52a8\u7248\u672c\u3002\u8fd9\u4e9b\u4ece\u5b89\u5168\u666f\u89c2\u7814\u7a76\u4e2d\u5f97\u51fa\u7684\u89c1\u89e3\u4e3a\u672a\u6765LLM\u5b89\u5168\u9886\u57df\u7684\u7814\u7a76\u63d0\u4f9b\u4e86\u65b0\u7684\u6d1e\u89c1\u3002|\n", "2405.18414": "|**2024-05-28**|**Don't Forget to Connect! 
Improving RAG with Graph-based Reranking**|Jialin Dong et.al.|[2405.18414](http://arxiv.org/abs/2405.18414)|null|## \u80cc\u666f \u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08Retrieval Augmented Generation\uff0cRAG\uff09\u901a\u8fc7\u7ed3\u5408\u73b0\u6709\u6587\u6863\u7684\u4e0a\u4e0b\u6587\u663e\u8457\u63d0\u5347\u4e86\u5927\u8bed\u8a00\u6a21\u578b\uff08Large Language Model\uff0cLLM\uff09\u7684\u54cd\u5e94\u6027\u80fd\u3002\u7136\u800c\uff0c\u5f53\u6587\u6863\u4e0e\u95ee\u9898\u4e0a\u4e0b\u6587\u7684\u76f8\u5173\u6027\u4e0d\u660e\u663e\u6216\u5b58\u5728\u90e8\u5206\u4fe1\u606f\u65f6\uff0cRAG\u7684\u6548\u679c\u5982\u4f55\uff1f\u53c8\u8be5\u5982\u4f55\u5904\u7406\u6587\u6863\u4e4b\u95f4\u7684\u5173\u8054\u6027\u5462\uff1f\u672c\u7814\u7a76\u65e8\u5728\u89e3\u7b54RAG\u751f\u6210\u4e2d\u7684\u8fd9\u4e24\u4e2a\u6838\u5fc3\u95ee\u9898\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aG-RAG\u7684\u65b9\u6cd5\uff0c\u5b83\u662f\u4e00\u4e2a\u57fa\u4e8e\u56fe\u795e\u7ecf\u7f51\u7edc\uff08Graph Neural Networks\uff0cGNNs\uff09\u7684\u91cd\u6392\u5668\uff0c\u4ecb\u4e8eRAG\u7684\u68c0\u7d22\u5668\u548c\u9605\u8bfb\u5668\u4e4b\u95f4\u3002G-RAG\u7ed3\u5408\u4e86\u6587\u6863\u4e4b\u95f4\u7684\u8fde\u63a5\u6027\u548c\u8bed\u4e49\u4fe1\u606f\uff08\u901a\u8fc7\u62bd\u8c61\u610f\u4e49\u8868\u793a\u56fe\uff09\uff0c\u4e3aRAG\u63d0\u4f9b\u4e86\u4e00\u4e2a\u5177\u6709\u4e0a\u4e0b\u6587\u611f\u77e5\u7684\u6392\u540d\u5668\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cG-RAG\u8d85\u8d8a\u4e86\u73b0\u6709\u7684\u9886\u5148\u65b9\u6cd5\uff0c\u540c\u65f6\u8ba1\u7b97\u5f00\u9500\u66f4\u5c0f\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8bc4\u4f30\u4e86PaLM 2\u4f5c\u4e3a\u91cd\u6392\u5668\u7684\u8868\u73b0\uff0c\u53d1\u73b0\u5176\u660e\u663e\u900a\u8272\u4e8eG-RAG\uff0c\u8fd9\u5f3a\u8c03\u4e86\u5373\u4f7f\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff0c\u91cd\u6392\u5728RAG\u4e2d\u7684\u91cd\u8981\u6027\u3002|\n", "2405.18386": "|**2024-05-28**|**Instruct-MusicGen: Unlocking Text-to-Music 
Editing for Music Language Models via Instruction Tuning**|Yixiao Zhang et.al.|[2405.18386](http://arxiv.org/abs/2405.18386)|**[link](https://github.com/ldzhangyx/instruct-MusicGen)**|**\u5728\u6587\u672c\u5230\u97f3\u4e50\u7f16\u8f91\u9886\u57df\uff0c\u8fd1\u671f\u7684\u8fdb\u6b65\u4f9d\u8d56\u4e8e\u6587\u672c\u67e5\u8be2\u6765\u6539\u53d8\u97f3\u4e50\u98ce\u683c\u6216\u8c03\u6574\u4e50\u5668\u5143\u7d20\u3002\u7136\u800c\uff0c\u73b0\u6709\u65b9\u6cd5\u8981\u4e48\u9700\u8981\u4ece\u5934\u8bad\u7ec3\u7279\u5b9a\u7684\u7f16\u8f91\u6a21\u578b\uff0c\u8017\u65f6\u4e14\u8d44\u6e90\u5bc6\u96c6\uff0c\u8981\u4e48\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u9884\u6d4b\u7f16\u8f91\u540e\u7684\u97f3\u4e50\uff0c\u5bfc\u81f4\u97f3\u9891\u91cd\u5efa\u4e0d\u591f\u7cbe\u786e\u3002\u4e3a\u4e86\u7ed3\u5408\u4f18\u70b9\u5e76\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86Instruct-MusicGen\uff0c\u8fd9\u662f\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u5b83\u9488\u5bf9\u9884\u8bad\u7ec3\u7684MusicGen\u6a21\u578b\u8fdb\u884c\u5fae\u8c03\uff0c\u4ee5\u9ad8\u6548\u5730\u6267\u884c\u7f16\u8f91\u6307\u4ee4\uff0c\u5982\u6dfb\u52a0\u3001\u5220\u9664\u6216\u5206\u79bb\u97f3\u8f68\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u4fee\u6539\u4e86\u539f\u59cbMusicGen\u67b6\u6784\uff0c\u5f15\u5165\u4e86\u6587\u672c\u878d\u5408\u6a21\u5757\u548c\u97f3\u9891\u878d\u5408\u6a21\u5757\uff0c\u4f7f\u6a21\u578b\u80fd\u591f\u540c\u65f6\u5904\u7406\u6307\u4ee4\u6587\u672c\u548c\u97f3\u9891\u8f93\u5165\uff0c\u751f\u6210\u6240\u9700\u7684\u7f16\u8f91\u97f3\u4e50\u3002\u4ee4\u4eba\u60ca\u8bb6\u7684\u662f\uff0cInstruct-MusicGen\u4ec5\u5411\u539f\u59cb\u6a21\u578b\u589e\u52a0\u4e868%\u7684\u65b0\u53c2\u6570\uff0c\u5e76\u57285000\u6b65\u7684\u8bad\u7ec3\u540e\uff0c\u5176\u6027\u80fd\u8d85\u8d8a\u73b0\u6709\u57fa\u51c6\uff0c\u4e14\u8868\u73b0\u51fa\u4e0e\u4e13\u95e8\u9488\u5bf9\u4efb\u52a1\u8bad\u7ec3\u7684\u6a21\u578b\u76f8\u5f53\u7684\u80fd\u529b\u3002\u8fd9\u4e00\u8fdb\u5c55\u4e0d\u4ec5\u63d0
\u9ad8\u4e86\u6587\u672c\u5230\u97f3\u4e50\u7f16\u8f91\u7684\u6548\u7387\uff0c\u8fd8\u62d3\u5bbd\u4e86\u97f3\u4e50\u8bed\u8a00\u6a21\u578b\u5728\u52a8\u6001\u97f3\u4e50\u5236\u4f5c\u73af\u5883\u4e2d\u7684\u5e94\u7528\u8303\u56f4\u3002**|\n", "2405.18380": "|**2024-05-28**|**OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning**|Pengxiang Li et.al.|[2405.18380](http://arxiv.org/abs/2405.18380)|**[link](https://github.com/pixeli99/owlore)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5feb\u901f\u53d1\u5c55\uff0c\u5b83\u4eec\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\u4e2d\u5e26\u6765\u4e86\u9769\u547d\u6027\u53d8\u5316\u3002\u7136\u800c\uff0c\u5927\u6a21\u578b\u7684\u8bad\u7ec3\u6216\u5fae\u8c03\u5e26\u6765\u4e86\u5de8\u5927\u6311\u6218\u3002\u9488\u5bf9\u8fd9\u4e00\u95ee\u9898\uff0c\u4f4e\u79e9\u9002\u5e94\uff08LoRA\uff09\u7b49\u53c2\u6570\u9ad8\u6548\u65b9\u6cd5\u5d2d\u9732\u5934\u89d2\uff0c\u4f46\u5f80\u5f80\u727a\u7272\u6027\u80fd\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u5185\u5b58\u9ad8\u6548\u5fae\u8c03\u65b9\u6cd5\u2014\u2014Outlier-weighed Layerwise Sampled Low-Rank Projection\uff08OwLore\uff09\uff0c\u5b83\u53d7\u5230LLMs\u5c42\u95f4\u5f02\u5e38\u5206\u5e03\u7684\u542f\u53d1\uff0c\u901a\u8fc7\u52a8\u6001\u91c7\u6837\u9884\u8bad\u7ec3\u5c42\u800c\u975e\u6dfb\u52a0\u989d\u5916\u9002\u914d\u5668\u6765\u8fdb\u884c\u5fae\u8c03\u3002\u6211\u4eec\u9996\u5148\u901a\u8fc7Heavy-Tailed 
Self-Regularization\u7406\u8bba\uff08HT-SR\uff09\u89e3\u8bfb\u5f02\u5e38\u73b0\u8c61\uff0c\u53d1\u73b0\u5177\u6709\u66f4\u591a\u5f02\u5e38\u503c\u7684\u5c42\u66f4\u503e\u5411\u4e8e\u5448\u73b0\u957f\u5c3e\u5206\u5e03\uff0c\u8bad\u7ec3\u6548\u679c\u66f4\u597d\u3002\u56e0\u6b64\uff0cOwLore\u7b56\u7565\u6027\u5730\u4e3a\u5f02\u5e38\u503c\u8f83\u591a\u7684\u5c42\u5206\u914d\u66f4\u9ad8\u7684\u91c7\u6837\u6982\u7387\uff0c\u4ee5\u66f4\u597d\u5730\u5229\u7528\u9884\u8bad\u7ec3\u6a21\u578b\u7684\u77e5\u8bc6\u3002 \u4e3a\u4e86\u8fdb\u4e00\u6b65\u51cf\u5c11\u5fae\u8c03\u65f6\u7684\u5185\u5b58\u9700\u6c42\uff0c\u6211\u4eec\u7ed3\u5408\u68af\u5ea6\u4f4e\u79e9\u6295\u5f71\uff0c\u4f7f\u5f97\u6bcf\u4e00\u5c42\u80fd\u4ee5\u4f4e\u79e9\u65b9\u5f0f\u9ad8\u6548\u8bad\u7ec3\u3002\u901a\u8fc7\u878d\u5408\u4f4e\u79e9\u4f18\u52bf\u548c\u6700\u4f18\u5c42\u522b\u91c7\u6837\u7b56\u7565\uff0cOwLore\u663e\u8457\u4f18\u5316\u4e86LLM\u526a\u679d\u4e2d\u7684\u5185\u5b58-\u6027\u80fd\u6743\u8861\u3002\u6211\u4eec\u5728\u591a\u4e2a\u67b6\u6784\uff0c\u5982LLaMa2\u3001LLaMa3\u548cMistral\u4e0a\u7684\u5e7f\u6cdb\u5b9e\u9a8c\u8868\u660e\uff0cOwLore\u6301\u7eed\u4f18\u4e8e\u57fa\u7840\u65b9\u6cd5\uff0c\u5305\u62ec\u5168\u91cf\u5fae\u8c03\u3002\u4f8b\u5982\uff0c\u5728\u5e38\u8bc6\u63a8\u7406\u57fa\u51c6\u4e0a\uff0cOwLore\u53ef\u5b9e\u73b0\u5e73\u57471.1%\u7684\u7cbe\u5ea6\u63d0\u5347\uff0cMMLU\u4e0a\u63d0\u9ad83.0%\uff0c\u800c\u5728MT-Bench\u4e0a\u66f4\u662f\u6709\u663e\u8457\u768410%\u63d0\u5347\uff0c\u540c\u65f6\u5185\u5b58\u6548\u7387\u66f4\u9ad8\u3002\u7279\u522b\u5730\uff0cOwLore\u4ec5\u970021GB\u5185\u5b58\u5373\u53ef\u5bf9LLaMa2-7B\u8fdb\u884c\u5fae\u8c03\u3002**|\n", "2405.18377": "|**2024-05-28**|**LLaMA-NAS: Efficient Neural Architecture Search for Large Language Models**|Anthony Sarah 
et.al.|[2405.18377](http://arxiv.org/abs/2405.18377)|null|\u73b0\u4ee3\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u3001\u590d\u6742\u63a8\u7406\u3001\u60c5\u611f\u5206\u6790\u7b49\u4efb\u52a1\u4e2d\u7684\u5353\u8d8a\u8868\u73b0\u63a8\u52a8\u4e86\u5b83\u4eec\u7684\u5e7f\u6cdb\u5e94\u7528\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u5f3a\u5927\u7684\u529f\u80fd\u4f34\u968f\u7740\u5de8\u5927\u7684\u5185\u5b58\u548c\u8ba1\u7b97\u6210\u672c\uff0c\u9650\u5236\u4e86\u5728\u5927\u591a\u6570\u786c\u4ef6\u5e73\u53f0\u4e0a\u7684\u4f7f\u7528\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u6709\u6548\u7684\u65b9\u6cd5\uff0c\u57fa\u4e8eLLaMA2-7B\u8fdb\u884c\u5355\u6b21\u5fae\u8c03\u540e\uff0c\u901a\u8fc7\u9057\u4f20\u7b97\u6cd5\u641c\u7d22\u627e\u5230\u66f4\u5c0f\u3001\u8ba1\u7b97\u590d\u6742\u5ea6\u66f4\u4f4e\u7684\u7f51\u7edc\u67b6\u6784\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u5bf9\u4e8e\u67d0\u4e9b\u6807\u51c6\u57fa\u51c6\u4efb\u52a1\uff0c\u9884\u8bad\u7ec3\u7684LLaMA2-7B\u6a21\u578b\u5b9e\u9645\u4e0a\u8fc7\u4e8e\u5e9e\u5927\u4e14\u590d\u6742\u3002\u6211\u4eec\u5b9e\u73b0\u4e861.5\u500d\u7684\u6a21\u578b\u5927\u5c0f\u7f29\u51cf\u548c1.3\u500d\u7684\u541e\u5410\u91cf\u63d0\u5347\uff0c\u540c\u65f6\u4fdd\u6301\u4e86\u51e0\u4e4e\u65e0\u635f\u7684\u51c6\u786e\u6027\u3002\u76f8\u8f83\u4e8e\u67d0\u4e9b\u526a\u679d\u6216\u7a00\u758f\u5316\u6280\u672f\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u6548\u7387\u548c\u6548\u679c\u4e0a\u66f4\u4e3a\u4f18\u8d8a\u3002\u6700\u540e\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u91cf\u5316\u4e0e\u6211\u4eec\u7684\u65b9\u6cd5\u76f8\u7ed3\u5408\u7684\u6548\u679c\uff0c\u8fdb\u4e00\u6b65\u901a\u8fc7\u91cf\u5316\u51cf\u5c11\u4e86\u627e\u5230\u7684\u7f51\u7edc\u7684\u5927\u5c0f\u548c\u590d\u6742\u6027\u3002\u6211\u4eec\u76f8\u4fe1\uff0c\u672c\u5de5\u4f5c\u63d0\u4f9b\u4e86\u4e00\u79cd\u81ea\u52a8\u521b\u5efa\u53ef\u5728\u66f4\u5ec9\u4ef7\u548c\u5e7f\u6cdb\u53ef\u7528\u786c\u4ef6
\u5e73\u53f0\u4e0a\u4f7f\u7528\u7684LLMs\u7684\u65b9\u6cd5\u3002|\n", "2405.18376": "|**2024-05-28**|**Empowering Source-Free Domain Adaptation with MLLM-driven Curriculum Learning**|Dongjie Chen et.al.|[2405.18376](http://arxiv.org/abs/2405.18376)|**[link](https://github.com/Dong-Jie-Chen/RCL)**|**### \u80cc\u666f \u6e90\u514d\u8d39\u9886\u57df\u9002\u5e94\uff08SFDA\uff09\u7684\u76ee\u6807\u662f\u4ec5\u4f7f\u7528\u672a\u6807\u8bb0\u7684\u9776\u57df\u6570\u636e\u6765\u8c03\u6574\u9884\u8bad\u7ec3\u7684\u6e90\u6a21\u578b\u3002\u5f53\u524d\u7684SFDA\u65b9\u6cd5\u5728\u6709\u6548\u5229\u7528\u9884\u8bad\u7ec3\u77e5\u8bc6\u548c\u6316\u6398\u9776\u57df\u6570\u636e\u6f5c\u529b\u65b9\u9762\u9762\u4e34\u6311\u6218\u3002\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u5728\u7406\u89e3\u89c6\u89c9\u548c\u6587\u672c\u4fe1\u606f\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5b83\u4eec\u5e94\u7528\u4e8eSFDA\u65f6\u5b58\u5728\u95ee\u9898\uff0c\u5982\u6307\u4ee4\u6267\u884c\u5931\u8d25\u3001\u8ba1\u7b97\u9700\u6c42\u9ad8\u4ee5\u53ca\u5728\u9002\u5e94\u524d\u6027\u80fd\u8bc4\u4f30\u56f0\u96be\u3002\u4e3a\u4e86\u7f13\u89e3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6846\u67b6\u2014\u2014\u53ef\u9760\u6027\u57fa\u4e8e\u8bfe\u7a0b\u5b66\u4e60\uff08RCL\uff09\uff0c\u5b83\u901a\u8fc7\u4f2a\u6807\u7b7e\u5316\u6574\u5408\u591a\u4e2aMLLM\u4ee5\u4fc3\u8fdb\u77e5\u8bc6\u5229\u7528\uff0c\u5e94\u7528\u4e8eSFDA\u3002 ### \u65b9\u6cd5 \u6211\u4eec\u7684\u6846\u67b6\u5305\u62ec\uff1a1) \u53ef\u9760\u77e5\u8bc6\u8f6c\u79fb\uff0c2) \u81ea\u6211\u7ea0\u6b63\uff0c3) MLLM\u5f15\u5bfc\u7684\u77e5\u8bc6\u6269\u5c55\uff0c\u4ee5\u53ca4) 
\u591a\u70ed\u63a9\u7801\u7cbe\u70bc\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u534f\u540c\u4f5c\u7528\uff0c\u9010\u6b65\u53d1\u6398\u9776\u57df\u672a\u6807\u8bb0\u6570\u636e\u7684\u4ef7\u503c\u3002RCL\u5728\u591a\u4e2aSFDA\u57fa\u51c6\u4e0a\u5b9e\u73b0\u4e86\u6700\u5148\u8fdb\u7684\uff08SOTA\uff09\u6027\u80fd\uff0c\u4f8b\u5982\u5728DomainNet\u4e0a\u63d0\u5347\u663e\u8457\uff0c\u8fbe\u5230$\\textbf{+9.4\\%}$\uff0c\u8bc1\u660e\u4e86\u5176\u5728\u589e\u5f3a\u9002\u5e94\u6027\u548c\u9c81\u68d2\u6027\u65b9\u9762\u7684\u6709\u6548\u6027\uff0c\u540c\u65f6\u65e0\u9700\u8bbf\u95ee\u6e90\u6570\u636e\u3002\u4ee3\u7801\u53ef\u5728https://github.com/Dong-Jie-Chen/RCL\u83b7\u53d6\u3002**|\n", "2405.18375": "|**2024-05-28**|**Thai Winograd Schemas: A Benchmark for Thai Commonsense Reasoning**|Phakphum Artkaew et.al.|[2405.18375](http://arxiv.org/abs/2405.18375)|**[link](https://github.com/PhakphumAdev/Thai-Winograd)**|\u5e38\u8bc6\u63a8\u7406\u662f\u81ea\u7136\u8bed\u8a00\u7406\u89e3\u7684\u91cd\u8981\u7ec4\u6210\u90e8\u5206\uff0c\u4e3a\u6b64\u5df2\u5f00\u53d1\u51fa\u591a\u4e2a\u8bc4\u4f30\u57fa\u51c6\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u57fa\u51c6\u5927\u591a\u4ec5\u9650\u4e8e\u82f1\u8bed\u3002\u521b\u5efa\u5e73\u884c\u57fa\u51c6\u6709\u52a9\u4e8e\u8de8\u8bed\u8a00\u8bc4\u4f30\uff0c\u4ece\u800c\u66f4\u597d\u5730\u7406\u89e3\u4e0d\u540c\u8bed\u8a00\u3002\u672c\u7814\u7a76\u4ecb\u7ecd\u4e86\u4e00\u4e2a\u6cf0\u8bed\u7248\u7684Winograd 
Schema\u96c6\u5408\uff0c\u8fd9\u662f\u4e00\u4e2a\u4e13\u4e3a\u6d4b\u8bd5\u6cf0\u8bed\u4e2d\u7684\u5e38\u8bc6\u63a8\u7406\u80fd\u529b\u800c\u8bbe\u8ba1\u7684\u65b0\u6570\u636e\u96c6\u3002\u6211\u4eec\u901a\u8fc7\u9080\u8bf7\u6bcd\u8bed\u8005\u3001\u4e13\u4e1a\u7ffb\u8bd1\u548c\u4e25\u683c\u9a8c\u8bc1\u7684\u65b9\u6cd5\uff0c\u786e\u4fdd\u8be5\u7cfb\u5217\u9898\u5e93\u80fd\u51c6\u786e\u53cd\u6620\u6cf0\u56fd\u8bed\u8a00\u7684\u72ec\u7279\u6027\u3001\u4e60\u8bed\u548c\u6587\u5316\u5f15\u7528\uff0c\u540c\u65f6\u4fdd\u6301\u6a21\u7cca\u6027\u548c\u5e38\u8bc6\u6311\u6218\u3002\u6211\u4eec\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08\u5982GPT-4\u548cClaude-3-Opus\uff09\u5728\u8fd9\u9879\u57fa\u51c6\u4e0a\u7684\u6027\u80fd\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u7ed3\u679c\u663e\u793a\u5c3d\u7ba1\u5728\u82f1\u8bed\u4e0a\u8868\u73b0\u4f18\u5f02\uff0c\u4f46\u5b83\u4eec\u5728\u6cf0\u8bed\u4e2d\u7684\u6027\u80fd\u660e\u663e\u4e0b\u964d\uff0c\u8fd9\u8868\u660e\u5728\u591a\u8bed\u8a00\u5e38\u8bc6\u63a8\u7406\u65b9\u9762\u4ecd\u6709\u5f85\u8fdb\u6b65\u3002|\n", "2405.18369": "|**2024-05-28**|**PromptWizard: Task-Aware Agent-driven Prompt Optimization Framework**|Eshaan Agarwal 
et.al.|[2405.18369](http://arxiv.org/abs/2405.18369)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5df2\u7ecf\u5728\u5404\u4e2a\u9886\u57df\u5e26\u6765\u4e86\u9769\u547d\u6027\u7684\u53d8\u5316\uff0c\u5c55\u73b0\u51fa\u5353\u8d8a\u7684\u80fd\u529b\u3002\u5b83\u4eec\u6210\u529f\u7684\u5173\u952e\u5728\u4e8e\u63d0\u793a\u7684\u6982\u5ff5\uff0c\u5373\u6307\u5bfc\u6a21\u578b\u751f\u6210\u8f93\u51fa\u3002\u7136\u800c\uff0c\u624b\u52a8\u521b\u5efa\u63d0\u793a\u65e2\u8017\u65f6\u53c8\u5c40\u9650\u4e8e\u7279\u5b9a\u9886\u57df\uff0c\u56e0\u6b64\u9700\u8981\u81ea\u52a8\u5316\u7684\u89e3\u51b3\u65b9\u6848\u3002\u672c\u6587\u4ecb\u7ecdPromptWizard\uff0c\u4e00\u4e2a\u65b0\u9896\u7684\u6846\u67b6\uff0c\u5b83\u5229\u7528LLMs\u8fed\u4ee3\u5730\u5408\u6210\u548c\u4f18\u5316\u9488\u5bf9\u7279\u5b9a\u4efb\u52a1\u7684\u63d0\u793a\u3002\u4e0e\u73b0\u6709\u65b9\u6cd5\u4e0d\u540c\uff0cPromptWizard\u540c\u65f6\u4f18\u5316\u63d0\u793a\u6307\u4ee4\u548c\u4e0a\u4e0b\u6587\u793a\u4f8b\uff0c\u4ee5\u6700\u5927\u5316\u6a21\u578b\u6027\u80fd\u3002\u8be5\u6846\u67b6\u901a\u8fc7\u53d8\u5f02\u6307\u4ee4\u5e76\u5f15\u5165\u8d1f\u4f8b\uff0c\u9010\u6b65\u6df1\u5316\u7406\u89e3\u5e76\u4fdd\u8bc1\u591a\u6837\u6027\u3002\u501f\u52a9\u4e00\u4e2a\u8bc4\u5224\u8005\uff0cPromptWizard\u8fdb\u4e00\u6b65\u6539\u8fdb\u6307\u4ee4\u548c\u793a\u4f8b\uff0c\u878d\u5165\u8be6\u7ec6\u7684\u63a8\u7406\u6b65\u9aa4\uff0c\u4ee5\u5b9e\u73b0\u6700\u4f73\u8868\u73b0\u3002PromptWizard\u5177\u6709\u8ba1\u7b97\u6548\u7387\u9ad8\u3001\u9002\u5e94\u4e0d\u540c\u8bad\u7ec3\u6570\u636e\u91cf\u573a\u666f\u4ee5\u53ca\u5728\u5c0f\u578bLLM\u4e0a\u540c\u6837\u6709\u6548\u7684\u7279\u70b9\u3002\u901a\u8fc7\u5bf98\u4e2a\u6570\u636e\u96c6\u768435\u4e2a\u4efb\u52a1\u8fdb\u884c\u4e25\u8c28\u8bc4\u4f30\uff0c\u7ed3\u679c\u663e\u793aPromptWizard\u660e\u663e\u4f18\u4e8e\u73b0\u6709\u7684\u63d0\u793a\u7b56\u7565\uff0c\u8bc1\u660e\u4e86\u5176\u5728\u63d0\u793a\u4f18\u5316\u65b9\u9762\u7684\u9ad8\u6548\u6027\u548c\u53ef\u6269\u5c55\u6027\u
3002|\n", "2405.18361": "|**2024-05-28**|**Is a 3D-Tokenized LLM the Key to Reliable Autonomous Driving?**|Yifan Bai et.al.|[2405.18361](http://arxiv.org/abs/2405.18361)|null|\u968f\u7740\u81ea\u52a8\u9a7e\u9a76\uff08AD\uff09\u4efb\u52a1\u7684\u5feb\u901f\u53d1\u5c55\uff0c\u57fa\u4e8e\u7aef\u5230\u7aef\u7684\u65b9\u6cd5\uff0c\u7279\u522b\u662f\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08VLM\uff09\u7684\u5e94\u7528\u53d8\u5f97\u5c24\u4e3a\u91cd\u8981\u3002\u8fd9\u4e9b\u6a21\u578b\u8bd5\u56fe\u878d\u5408\u5f3a\u5927\u7684\u903b\u8f91\u63a8\u7406\u548c\u8ba4\u77e5\u80fd\u529b\uff0c\u4ee5\u5b9e\u73b0\u5168\u9762\u7684\u7aef\u5230\u7aef\u89c4\u5212\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684VLM\u65b9\u6cd5\u5f80\u5f80\u4f9d\u8d56\u4e8e2D\u89c6\u89c9\u5206\u8bcd\u5668\u548c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\uff0c\u5728\u5904\u7406\u4e09\u7ef4\u51e0\u4f55\u4fe1\u606f\u65b9\u9762\u5b58\u5728\u4e0d\u8db3\uff0c\u8fd9\u5bf9\u4e8e\u53ef\u9760\u7684\u89c4\u5212\u81f3\u5173\u91cd\u8981\u3002\u7814\u7a76\u8868\u660e\uff0c2D\u5206\u8bcd\u7684LLM\u5e76\u4e0d\u80fd\u51c6\u786e\u611f\u77e5\u4e09\u7ef4\u73af\u5883\uff0c\u8fd9\u5f15\u53d1\u4e86\u5173\u4e8eVLM\u5728\u81ea\u52a8\u9a7e\u9a76\u4e2d\u53ef\u9760\u6027\u7684\u8d28\u7591\u3002 
\u9488\u5bf9\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aAtlas\u7684\u65b0\u65b9\u6cd5\uff0c\u5b83\u7ed3\u5408\u4e86DETR\u98ce\u683c\u76843D\u611f\u77e5\u5668\u4f5c\u4e3a3D\u5206\u8bcd\u5668\uff0c\u4e0e\u5355\u5c42\u7ebf\u6027\u6295\u5f71\u5668\u76f8\u8fde\uff0c\u5de7\u5999\u5730\u5229\u7528\u4e86\u4e09\u7ef4\u7269\u7406\u4e16\u754c\u7684\u56fa\u6709\u7279\u6027\u3002\u8fd9\u79cd\u65b9\u6cd5\u5141\u8bb8\u9ad8\u5206\u8fa8\u7387\u591a\u89c6\u89d2\u56fe\u50cf\u7684\u540c\u65f6\u5904\u7406\u548c\u65f6\u7a7a\u5efa\u6a21\u3002\u5c3d\u7ba1\u7b80\u5355\uff0c\u4f46Atlas\u5728NuScenes\u6570\u636e\u96c6\u4e0a\u76843D\u68c0\u6d4b\u548c\u81ea\u4e3b\u9a7e\u9a76\u89c4\u5212\u4efb\u52a1\u4e2d\u8868\u73b0\u51fa\u8272\uff0c\u8bc1\u660e\u4e863D\u5206\u8bcd\u7684LLM\u5bf9\u4e8e\u5b9e\u73b0\u53ef\u9760\u81ea\u52a8\u9a7e\u9a76\u81f3\u5173\u91cd\u8981\u3002\u6211\u4eec\u5c06\u5f00\u6e90\u4ee3\u7801\u548c\u6570\u636e\u96c6\uff0c\u4ee5\u4f9b\u8fdb\u4e00\u6b65\u7814\u7a76\u3002|\n", "2405.18359": "|**2024-05-28**|**Bridging the Gap: Dynamic Learning Strategies for Improving Multilingual Performance in LLMs**|Somnath Kumar 
et.al.|[2405.18359](http://arxiv.org/abs/2405.18359)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u6b63\u5728\u5168\u7403\u8303\u56f4\u5185\u91cd\u5851\u4f17\u591a\u9886\u57df\uff0c\u4f46\u5b83\u4eec\u5728\u5904\u7406\u975e\u62c9\u4e01\u5b57\u6bcd\u548c\u4f4e\u8d44\u6e90\u8bed\u8a00\u65f6\u7684\u5305\u5bb9\u6027\u548c\u6548\u679c\u4ecd\u6709\u5f85\u63d0\u5347\u3002\u672c\u6587\u9488\u5bf9\u8fd9\u4e00\u5173\u952e\u6311\u6218\uff0c\u63d0\u51fa\u4e86\u4e00\u79cd\u65e0\u9700\u5927\u91cf\u8bad\u7ec3\u6216\u5fae\u8c03\u7684\u65b9\u6cd5\u6765\u589e\u5f3a\u591a\u8bed\u8a00LLMs\u7684\u8868\u73b0\u3002\u901a\u8fc7\u7cfb\u7edf\u5730\u7814\u7a76\u548c\u8bc4\u4f30\u5404\u79cd\u8bed\u8a00\u5728\u6d41\u884c\u7684\u95ee\u9898\u89e3\u7b54\uff08QA\uff09\u6570\u636e\u96c6\u4e0a\u7684\u6027\u80fd\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u7cfb\u5217\u65b0\u9896\u6280\u672f\uff0c\u4ee5\u91ca\u653eLLMs\u5728\u591a\u5143\u8bed\u8a00\u73af\u5883\u4e2d\u7684\u771f\u6b63\u6f5c\u529b\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5305\u62ec\u4e09\u4e2a\u6838\u5fc3\u7b56\u7565\uff0c\u6781\u5927\u5730\u63d0\u9ad8\u4e86\u591a\u8bed\u8a00\u80fd\u529b\uff1a\u9996\u5148\uff0c\u7cbe\u5fc3\u4f18\u5316\u9002\u7528\u4e8e\u591a\u8bed\u8a00LLM\u7684\u63d0\u793a\uff0c\u6316\u6398\u5176\u6f5c\u5728\u80fd\u529b\uff0c\u663e\u8457\u63d0\u5347\u4e86\u5404\u8bed\u8a00\u7684\u8868\u73b0\u3002\u5176\u6b21\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u65b0\u7684\u6df7\u5408\u65b9\u6cd5\uff0c\u7ed3\u5408\u4e86\u591a\u8bed\u8a00\u5d4c\u5165\u7684LLM\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\uff0c\u5b9e\u73b0\u4e86\u66f4\u597d\u7684\u591a\u4efb\u52a1\u6027\u80fd\u3002\u6700\u540e\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u52a8\u6001\u5b66\u4e60\u7b56\u7565\uff0c\u5b9e\u73b0\u5b9e\u65f6\u6839\u636e\u67e5\u8be2\u52a8\u6001\u9009\u62e9\u6700\u5408\u9002\u7684\u63d0\u793a\u7b56\u7565\u3001LLM\u6a21\u578b\u548c\u5d4c\u5165\u6a21\u578b\uff0c\u4ece\u800c\u6700\u5927\u5316LLM\u5728\u4e0d\u540c\u8bed\u8a0
0\u4e0a\u7684\u6548\u7387\uff0c\u8d85\u8d8a\u4e86\u6700\u4f73\u9759\u6001\u548c\u968f\u673a\u7b56\u7565\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u65e2\u9002\u7528\u4e8e\u79bb\u7ebf\u914d\u7f6e\u8c03\u6574\uff0c\u4e5f\u652f\u6301\u5728\u7ebf\u9002\u5e94\uff0c\u80fd\u591f\u65e0\u7f1d\u9002\u5e94\u65b0\u8bed\u8a00\u548c\u6570\u636e\u96c6\uff0c\u663e\u8457\u63a8\u52a8\u4e86\u591a\u8bed\u8a00\u7406\u89e3\u548c\u751f\u6210\u5728\u5404\u79cd\u8bed\u8a00\u4e2d\u7684\u8fdb\u6b65\u3002|\n", "2405.18358": "|**2024-05-28**|**MMCTAgent: Multi-modal Critical Thinking Agent Framework for Complex Visual Reasoning**|Somnath Kumar et.al.|[2405.18358](http://arxiv.org/abs/2405.18358)|null|\u8fd1\u671f\u7684\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\u5728\u89c6\u89c9\u4e0e\u8bed\u8a00\u878d\u5408\u4efb\u52a1\u4e0a\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\u3002\u7136\u800c\uff0c\u5b83\u4eec\u5728\u7ec6\u81f4\u7684\u591a\u6a21\u6001\u7406\u89e3\u3001\u590d\u6742\u4efb\u52a1\u89e3\u6790\u4ee5\u53ca\u591a\u6a21\u6001\u4fe1\u606f\u63a8\u7406\u65b9\u9762\u4ecd\u5b58\u5728\u6311\u6218\u3002\u672c\u6587\u63d0\u51faMMCTAgent\uff0c\u4e00\u4e2a\u65e8\u5728\u89e3\u51b3\u5f53\u524dMLLM\u5728\u590d\u6742\u89c6\u89c9\u63a8\u7406\u4efb\u52a1\u4e2d\u56fa\u6709\u5c40\u9650\u6027\u7684\u65b0\u578b\u591a\u6a21\u6001\u6279\u5224\u6027\u601d\u7ef4\u4ee3\u7406\u6846\u67b6\u3002MMCTAgent\u501f\u9274\u4e86\u4eba\u7c7b\u8ba4\u77e5\u8fc7\u7a0b\u548c\u6279\u5224\u6027\u601d\u8003\u7684\u7279\u70b9\uff0c\u901a\u8fc7\u8fed\u4ee3\u5206\u6790\u591a\u6a21\u6001\u4fe1\u606f\u3001\u62c6\u89e3\u95ee\u9898\u3001\u89c4\u5212\u7b56\u7565\uff0c\u5e76\u5b9e\u73b0\u52a8\u6001\u63a8\u7406\u3002 
\u6b64\u5916\uff0cMMCTAgent\u8fd8\u878d\u5165\u4e86\u6279\u5224\u6027\u601d\u8003\u5143\u7d20\uff0c\u5982\u5bf9\u6700\u7ec8\u7b54\u6848\u7684\u9a8c\u8bc1\u548c\u81ea\u6211\u53cd\u601d\u3002\u5b83\u901a\u8fc7\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\u5b9a\u4e49\u57fa\u4e8e\u89c6\u89c9\u7684\u8bc4\u5224\u8005\uff0c\u5e76\u786e\u5b9a\u7279\u5b9a\u4efb\u52a1\u7684\u8bc4\u4f30\u6807\u51c6\uff0c\u4ece\u800c\u63d0\u5347\u51b3\u7b56\u80fd\u529b\u3002\u5728\u591a\u4e2a\u56fe\u50cf\u7406\u89e3\u548c\u89c6\u9891\u7406\u89e3\u57fa\u51c6\u6d4b\u8bd5\u4e2d\uff0c\u6211\u4eec\u4e25\u8c28\u5730\u8bc4\u4f30\u4e86MMCTAgent\uff08\u5305\u62ec\u5e26\u8bc4\u5224\u8005\u7684\u7248\u672c\uff09\u7684\u8868\u73b0\uff0c\u7ed3\u679c\u8868\u660e\u5b83\u5728\u8d85\u8d8a\u57fa\u7840MLLM\u548c\u5176\u4ed6\u5de5\u5177\u589e\u5f3a\u7684\u7ba1\u9053\u65b9\u9762\u8868\u73b0\u51fa\u8272\u3002|\n", "2405.19335": "|**2024-05-29**|**X-VILA: Cross-Modality Alignment for Large Language Model**|Hanrong Ye et.al.|[2405.19335](http://arxiv.org/abs/2405.19335)|null|\u6211\u4eec\u63d0\u51faX-VILA\uff0c\u4e00\u79cd\u65e8\u5728\u589e\u5f3a\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u529f\u80fd\u7684\u591a\u6a21\u6001\u6a21\u578b\uff0c\u5b83\u878d\u5408\u4e86\u56fe\u50cf\u3001\u89c6\u9891\u548c\u97f3\u9891\u6a21\u6001\u3002\u901a\u8fc7\u5c06\u5404\u6a21\u6001\u7279\u5b9a\u7684\u7f16\u7801\u5668\u4e0eLLM\u8f93\u5165\u5bf9\u9f50\uff0c\u5e76\u5c06\u6269\u6563\u89e3\u7801\u5668\u4e0eLLM\u8f93\u51fa\u5bf9\u9f50\uff0cX-VILA\u5b9e\u73b0\u4e86\u8de8\u6a21\u6001\u7406\u89e3\u3001\u63a8\u7406\u548c\u751f\u6210\u3002\u4e3a\u4e86\u652f\u6301\u8fd9\u79cd\u8de8\u6a21\u6001\u5bf9\u9f50\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2a\u6709\u6548\u7684\u4efb\u610f\u6a21\u6001\u6307\u4ee4\u8ddf\u968f\u6570\u636e\u96c6\u3002\u7136\u800c\uff0c\u6211\u4eec\u53d1\u73b0\u5f53\u524d\u7684\u8de8\u6a21\u6001\u5bf9\u9f50\u65b9\u6cd5\u5b58\u5728\u4e00\u4e2a\u5173\u952e\u95ee\u9898\uff0c\u5bfc\u81f4\u89c6\u89c9\u4fe1\u606f\u4e22\u5931
\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u89c6\u89c9\u5bf9\u9f50\u673a\u5236\uff0c\u5305\u62ec\u4e00\u4e2a\u89c6\u89c9\u5d4c\u5165\u9ad8\u901f\u516c\u8def\u6a21\u5757\uff0c\u4ee5\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63d0\u4f9b\u4e86\u4e00\u79cd\u8d44\u6e90\u9ad8\u6548\u7684\u8bad\u7ec3\u7b56\u7565\uff0c\u4f7f\u5f97X-VILA\u5728\u4efb\u610f\u6a21\u6001\u5bf9\u8bdd\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u5927\u5e45\u8d85\u8d8a\u5148\u524d\u7684\u65b9\u6cd5\u3002\u4ee4\u4eba\u60ca\u8bb6\u7684\u662f\uff0c\u5373\u4f7f\u5728\u7f3a\u4e4f\u7c7b\u4f3c\u8bad\u7ec3\u6570\u636e\u7684\u60c5\u51b5\u4e0b\uff0cX-VILA\u5728\u4e0d\u540c\u6a21\u6001\u95f4\u4e5f\u5c55\u73b0\u51fa\u6d8c\u73b0\u7279\u6027\u3002\u8be5\u9879\u76ee\u5c06\u5f00\u6e90\u3002|\n", "2405.19334": "|**2024-05-29**|**LLMs Meet Multimodal Generation and Editing: A Survey**|Yingqing He et.al.|[2405.19334](http://arxiv.org/abs/2405.19334)|**[link](https://github.com/yingqinghe/awesome-llms-meet-multimodal-generation)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u6700\u65b0\u8fdb\u5c55\uff0c\u4eba\u4eec\u8d8a\u6765\u8d8a\u5173\u6ce8\u5c06\u5b83\u4eec\u4e0e\u591a\u6a21\u6001\u5b66\u4e60\u76f8\u7ed3\u5408\u3002\u5f53\u524d\u7684\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u8c03\u67e5\u4e3b\u8981\u96c6\u4e2d\u5728\u7406\u89e3\u4e0a\u3002\u8fd9\u7bc7\u7efc\u8ff0\u8be6\u7ec6\u63a2\u8ba8\u4e86\u8de8\u56fe\u50cf\u3001\u89c6\u9891\u30013D\u548c\u97f3\u9891\u7b49\u9886\u57df\u7684\u591a\u6a21\u6001\u751f\u6210\uff0c\u7279\u522b\u5f3a\u8c03\u4e86\u8fd9\u4e9b\u9886\u57df\u4e2d\u7684\u91cc\u7a0b\u7891\u5f0f\u5de5\u4f5c\u53ca\u5176\u6280\u672f\u8fdb\u6b65\u3002\u6211\u4eec\u6df1\u5165\u7814\u7a76\u4e86\u8fd9\u4e9b\u65b9\u6cd5\u7684\u5173\u952e\u6280\u672f\u7ec4\u4ef6\uff0c\u4ee5\u53ca\u5728\u76f8\u5173\u7814\u7a76\u4e2d\u4f7f\u7528\u7684\u591a\u6a21\u6001\u6570\u636e\u96c6\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u525
6\u6790\u4e86\u501f\u52a9\u73b0\u6709\u751f\u6210\u6a21\u578b\u8fdb\u884c\u4eba\u7c7b-\u8ba1\u7b97\u673a\u4ea4\u4e92\u7684\u5de5\u5177\u589e\u5f3a\u578b\u591a\u6a21\u6001\u4ee3\u7406\u3002\u6700\u540e\uff0c\u6211\u4eec\u5168\u9762\u8ba8\u8bba\u4e86\u4eba\u5de5\u667a\u80fd\u5b89\u5168\u7684\u8fdb\u6b65\uff0c\u5e76\u63a2\u7d22\u4e86\u65b0\u5174\u5e94\u7528\u548c\u672a\u6765\u524d\u666f\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u63d0\u4f9b\u4e86\u4e00\u4e2a\u7cfb\u7edf\u800c\u6df1\u5165\u7684\u591a\u6a21\u6001\u751f\u6210\u6982\u8ff0\uff0c\u6709\u671b\u63a8\u52a8\u751f\u6210\u5185\u5bb9\u7684\u4eba\u5de5\u667a\u80fd\uff08AIGC\uff09\u548c\u4e16\u754c\u6a21\u578b\u7684\u53d1\u5c55\u3002\u6240\u6709\u76f8\u5173\u7684\u8bba\u6587\u5217\u8868\u53ef\u5728\u627e\u5230\u3002**|\n", "2405.19333": "|**2024-05-29**|**Multi-Modal Generative Embedding Model**|Feipeng Ma et.al.|[2405.19333](http://arxiv.org/abs/2405.19333)|null|\u5728\u5927\u591a\u6570\u591a\u6a21\u6001\u4efb\u52a1\u4e2d\uff0c\u95ee\u9898\u53ef\u4ee5\u5f52\u7ed3\u4e3a\u751f\u6210\u6216\u5d4c\u5165\u3002\u73b0\u6709\u7684\u6a21\u578b\u901a\u5e38\u901a\u8fc7\u5c06\u8bed\u8a00\u6a21\u5757\u5206\u89e3\u4e3a\u4e00\u4e2a\u7528\u4e8e\u751f\u6210\u7684\u6587\u672c\u89e3\u7801\u5668\u548c\u4e00\u4e2a\u7528\u4e8e\u5d4c\u5165\u7684\u6587\u672c\u7f16\u7801\u5668\u6765\u5904\u7406\u8fd9\u4e24\u79cd\u95ee\u9898\u3002\u4e3a\u4e86\u63a2\u7d22\u591a\u6a21\u6001\u65b9\u6cd5\u7684\u7b80\u7ea6\u6027\uff0c\u672c\u5de5\u4f5c\u8bd5\u56fe\u4ec5\u4f7f\u7528\u4e00\u4e2a\u6a21\u578b\u6765\u5904\u7406\u6bcf\u79cd\u6a21\u6001\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u591a\u6a21\u6001\u751f\u6210\u5d4c\u5165\u6a21\u578b\uff08MM-GEM\uff09\uff0c\u5b83\u5c06\u751f\u6210\u548c\u5d4c\u5165\u76ee\u6807\u6574\u5408\u5230\u4e00\u4e2a\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4e2d\u3002\u540c\u65f6\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86PoolAggregator\uff0c\u4ee5\u63d0\u9ad8\u6548\u7387\u5e76\u5b9e\u73b0\u7ec6\u7c92\u5ea6\u7684\u5d4c\u5165\u548
c\u751f\u6210\u80fd\u529b\u3002 \u4ee4\u4eba\u60ca\u8bb6\u7684\u662f\uff0c\u8fd9\u4e24\u4e2a\u76ee\u6807\u4e4b\u95f4\u5e76\u6ca1\u6709\u663e\u8457\u51b2\u7a81\u3002\u4f8b\u5982\uff0c\u57fa\u4e8eViT-Large\u548cTinyLlama\u7684MM-GEM\u5728\u8bf8\u5982\u8de8\u6a21\u6001\u68c0\u7d22\u548c\u96f6\u6837\u672c\u5206\u7c7b\u7b49\u591a\u6a21\u6001\u5d4c\u5165\u6a21\u578b\u57fa\u51c6\u4e0a\u8868\u73b0\u51fa\u826f\u597d\u7684\u6027\u80fd\uff0c\u540c\u65f6\u5177\u5907\u826f\u597d\u7684\u56fe\u50cf\u63cf\u8ff0\u80fd\u529b\u3002\u6b64\u5916\uff0cMM-GEM\u80fd\u591f\u65e0\u7f1d\u6267\u884c\u533a\u57df\u7ea7\u522b\u7684\u56fe\u50cf\u63cf\u8ff0\u751f\u6210\u548c\u68c0\u7d22\u4efb\u52a1\u3002\u53e6\u5916\uff0cMM-GEM\u4e2d\u7684\u5148\u8fdb\u6587\u672c\u6a21\u578b\u5bf9\u4e8e\u957f\u6587\u672c\u548c\u56fe\u50cf\u68c0\u7d22\u7684Recall@1\u6307\u6807\u5e26\u6765\u4e86\u8d85\u8fc75%\u7684\u63d0\u5347\u3002|\n", "2405.19332": "|**2024-05-29**|**Self-Exploring Language Models: Active Preference Elicitation for Online Alignment**|Shenao Zhang et.al.|[2405.19332](http://arxiv.org/abs/2405.19332)|**[link](https://github.com/shenao-zhang/selm)**|**
\u504f\u597d\u4f18\u5316\uff0c\u7279\u522b\u662f\u5728\u4eba\u7c7b\u53cd\u9988\u5f3a\u5316\u5b66\u4e60\uff08RLHF\uff09\u7684\u9a71\u52a8\u4e0b\uff0c\u5df2\u7ecf\u5728\u4f7f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u9075\u5faa\u4eba\u7c7b\u610f\u613f\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u6210\u5c31\u3002\u76f8\u8f83\u4e8e\u4f7f\u7528\u56fa\u5b9a\u6570\u636e\u96c6\u7684\u79bb\u7ebf\u5bf9\u9f50\uff0c\u901a\u8fc7\u4eba\u6216\u4eba\u5de5\u667a\u80fd\u5bf9\u6a21\u578b\u751f\u6210\u7684\u53cd\u9988\u901a\u5e38\u80fd\u591f\u901a\u8fc7\u8fed\u4ee3\u8fc7\u7a0b\u63d0\u5347\u5956\u52b1\u6a21\u578b\u7684\u80fd\u529b\u548cLLMs\u7684\u4e00\u81f4\u6027\u3002\u7136\u800c\uff0c\u8981\u5b9e\u73b0\u5168\u5c40\u51c6\u786e\u7684\u5956\u52b1\u6a21\u578b\uff0c\u9700\u8981\u7cfb\u7edf\u5730\u63a2\u7d22\u751f\u6210\u5404\u79cd\u5404\u6837\u7684\u54cd\u5e94\uff0c\u4ee5\u6db5\u76d6\u81ea\u7136\u8bed\u8a00\u7684\u5e7f\u9614\u7a7a\u95f4\u3002\u4ec5\u4f9d\u8d56\u6807\u51c6\u5956\u52b1\u6700\u5927\u5316LLMs\u7684\u968f\u673a\u91c7\u6837\u662f\u4e0d\u8db3\u4ee5\u6ee1\u8db3\u8fd9\u4e00\u9700\u6c42\u7684\u3002 \u4e3a\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u53cc\u5c42\u76ee\u6807\uff0c\u4e50\u89c2\u5730\u503e\u5411\u4e8e\u53ef\u80fd\u5177\u6709\u9ad8\u5956\u52b1\u7684\u54cd\u5e94\uff0c\u4ee5\u6b64\u6765\u4e3b\u52a8\u63a2\u7d22\u5206\u5e03\u5916\u533a\u57df\u3002\u901a\u8fc7\u89e3\u51b3\u5185\u5c42\u95ee\u9898\uff0c\u5229\u7528\u91cd\u65b0\u53c2\u6570\u5316\u7684\u5956\u52b1\u51fd\u6570\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u540d\u4e3aSelf-Exploring Language 
Models\uff08SELM\uff09\u7684\u7b97\u6cd5\u3002\u5b83\u6d88\u9664\u4e86\u5bf9\u5355\u72ec\u5956\u52b1\u6a21\u578b\uff08RM\uff09\u7684\u9700\u6c42\uff0c\u5e76\u901a\u8fc7\u4e00\u4e2a\u76f4\u89c2\u7684\u76ee\u6807\u5bf9LLMs\u8fdb\u884c\u8fed\u4ee3\u66f4\u65b0\u3002\u4e0e\u76f4\u63a5\u504f\u597d\u4f18\u5316\uff08DPO\uff09\u76f8\u6bd4\uff0cSELM\u7684\u76ee\u6807\u964d\u4f4e\u4e86\u5bf9\u672a\u89c1\u8fc7\u7684\u8fc7\u5ea6\u5ef6\u4f38\u7684\u65e0\u5dee\u522b\u504f\u597d\uff0c\u63d0\u9ad8\u4e86\u63a2\u7d22\u6548\u7387\u3002 \u6211\u4eec\u7684\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u5728Zephyr-7B-SFT\u548cLlama-3-8B-Instruct\u6a21\u578b\u4e0a\u8fdb\u884c\u5fae\u8c03\u540e\uff0cSELM\u5728MT-Bench\u548cAlpacaEval 2.0\u7b49\u6307\u4ee4\u8ddf\u968f\u57fa\u51c6\u4ee5\u53ca\u4e0d\u540c\u8bbe\u7f6e\u4e0b\u7684\u5404\u79cd\u6807\u51c6\u5b66\u672f\u57fa\u51c6\u4e0a\u8868\u73b0\u51fa\u663e\u8457\u7684\u6027\u80fd\u63d0\u5347\u3002\u6211\u4eec\u7684\u4ee3\u7801\u548c\u6a21\u578b\u5df2\u53ef\u5728\u83b7\u53d6\u3002**|\n", "2405.19328": "|**2024-05-29**|**Normative Modules: A Generative Agent Architecture for Learning Norms that Supports Multi-Agent Cooperation**|Atrisha Sarkar 
et.al.|[2405.19328](http://arxiv.org/abs/2405.19328)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201c\u89c4\u8303\u6a21\u5757\u201d\u7684\u67b6\u6784\uff0c\u5b83\u9488\u5bf9\u751f\u6210\u6027\u4ee3\u7406\u5728\u9762\u5bf9\u5305\u542b\u73b0\u6709\u89c4\u8303\u7684\u793e\u4f1a\u7ed3\u6784\u65f6\u7684\u534f\u4f5c\u6311\u6218\u3002\u8fd9\u4e9b\u4ee3\u7406\u901a\u8fc7\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7406\u89e3\u548c\u8bc4\u4f30\u73af\u5883\uff0c\u4f46\u5728\u5904\u7406\u590d\u6742\u793e\u4f1a\u4efb\u52a1\u65f6\uff0c\u5982\u4f55\u8bc6\u522b\u5e76\u9002\u5e94\u89c4\u8303\u57fa\u7840\u8bbe\u65bd\u6210\u4e3a\u5173\u952e\u95ee\u9898\u3002\u89c4\u8303\u6a21\u5757\u7684\u6838\u5fc3\u5728\u4e8e\u4fc3\u8fdb\u5747\u8861\u9009\u62e9\uff0c\u501f\u9274\u5206\u7c7b\u673a\u6784\u5b9e\u73b0\u76f8\u5173\u5747\u8861\u7684\u6982\u5ff5\uff0c\u4f7f\u4ee3\u7406\u80fd\u591f\u901a\u8fc7\u540c\u4f34\u4e92\u52a8\u5b66\u4e60\u73af\u5883\u4e2d\u4e0d\u540c\u5019\u9009\u673a\u6784\u4e2d\u7684\u6743\u5a01\u6027\u3002\u901a\u8fc7\u63d0\u5347\u89c4\u8303\u80fd\u529b\uff0c\u4ee3\u7406\u53ef\u4ee5\u534f\u8c03\u5236\u88c1\u884c\u4e3a\uff0c\u8fdb\u800c\u5f71\u54cd\u793e\u4ea4\u73af\u5883\u4e2d\u7684\u57fa\u672c\u884c\u4e3a\uff0c\u4ece\u800c\u63d0\u9ad8\u6574\u4f53\u798f\u7949\u3002 
\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u652f\u6301\u673a\u6784\u7684\u65b0\u73af\u5883\uff0c\u5e76\u6839\u636e\u4e24\u4e2a\u4e3b\u8981\u6807\u51c6\u6765\u8bc4\u4f30\u8be5\u6846\u67b6\uff1a\u4e00\u662f\u4ee3\u7406\u80fd\u5426\u5ffd\u7565\u975e\u6743\u5a01\u673a\u6784\uff0c\u4e8c\u662f\u4ee3\u7406\u5728\u591a\u4e2a\u9009\u9879\u4e2d\u8bc6\u522b\u6743\u5a01\u673a\u6784\u7684\u80fd\u529b\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u914d\u5907\u4e86\u89c4\u8303\u6a21\u5757\u7684\u4ee3\u7406\u76f8\u6bd4\u57fa\u7840\u4ee3\u7406\u80fd\u5b9e\u73b0\u66f4\u7a33\u5b9a\u7684\u5408\u4f5c\u6548\u679c\uff0c\u8fd9\u4e3a\u7814\u7a76\u8bbe\u8ba1\u8003\u8651\u89c4\u8303\u57fa\u7840\u8bbe\u65bd\u7684\u73af\u5883\u548c\u4ee3\u7406\u5f00\u8f9f\u4e86\u65b0\u9014\u5f84\u3002|\n", "2405.19327": "|**2024-05-29**|**MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series**|Ge Zhang et.al.|[2405.19327](http://arxiv.org/abs/2405.19327)|**[link](https://github.com/multimodal-art-projection/map-neo)**|\u8fd1\u5e74\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5404\u79cd\u4efb\u52a1\u4e0a\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\u3002\u7136\u800c\uff0c\u51fa\u4e8e\u5546\u4e1a\u5229\u76ca\uff0c\u50cfGPT\u3001Gemini\u548cClaude\u8fd9\u6837\u7684\u6700\u5148\u8fdb\u6a21\u578b\u88ab\u5c01\u95ed\u5728\u4e13\u6709\u63a5\u53e3\u540e\uff0c\u5176\u8bad\u7ec3\u8be6\u60c5\u5e76\u672a\u516c\u5f00\u3002\u8fd1\u671f\uff0c\u4e00\u4e9b\u673a\u6784\u5f00\u6e90\u4e86\u7c7b\u4f3c\u6027\u80fd\u7684LLMs\uff0c\u5982LLaMA-3\uff0c\u4f46\u5927\u591a\u6570\u7ec6\u8282\uff08\u5982\u4e2d\u95f4\u68c0\u67e5\u70b9\u3001\u9884\u8bad\u7ec3\u8bed\u6599\u5e93\u548c\u8bad\u7ec3\u4ee3\u7801\u7b49\uff09\u4ecd\u672a\u62ab\u9732\u3002\u4e3a\u4e86\u63d0\u9ad8LLMs\u7684\u900f\u660e\u5ea6\uff0c\u7814\u7a76\u754c\u6b63\u5728\u63a8\u52a8\u771f\u6b63\u5f00\u653e\u7684\u6a21\u578b\uff0c\u5982Pythia\u3001Amber\u548cOLMo\uff0c\u8fd9\u4e9b\u6a21\u578b\u63d0\u4f9b\u4e86\u66f4\u591a\u7684\u4fe1\
u606f\uff0c\u4fc3\u8fdb\u4e86\u5bf9\u5927\u6a21\u578b\u6027\u80fd\u3001\u5c40\u9650\u6027\u3001\u504f\u89c1\u548c\u98ce\u9669\u7684\u79d1\u5b66\u7814\u7a76\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u5f00\u653e\u6a21\u578b\u5728\u63a8\u7406\u3001\u77e5\u8bc6\u548c\u7f16\u7a0b\u4efb\u52a1\u4e0a\u7684\u8868\u73b0\u4ecd\u900a\u4e8e\u540c\u7b49\u89c4\u6a21\u7684\u5c01\u95ed\u6e90\u7801\u6a21\u578b\u3002 \u56e0\u6b64\uff0c\u6211\u4eec\u5f00\u6e90\u4e86MAP-Neo\uff0c\u4e00\u4e2a\u62e5\u670970\u4ebf\u53c2\u6570\u7684\u53cc\u8bed\u8bed\u8a00\u6a21\u578b\uff0c\u4ece\u5934\u5f00\u59cb\u57284.5\u4e07\u4ebf\u9ad8\u8d28\u91cf\u4ee4\u724c\u4e0a\u8fdb\u884c\u8bad\u7ec3\u3002MAP-Neo\u662f\u9996\u4e2a\u4e0e\u73b0\u6709\u9876\u7ea7LLMs\u6027\u80fd\u76f8\u5f53\u7684\u5b8c\u5168\u5f00\u6e90\u7684\u53cc\u8bed\u6a21\u578b\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u516c\u5f00\u4e86\u6240\u6709\u7ec6\u8282\uff0c\u5305\u62ec\u6e05\u7406\u540e\u7684\u9884\u8bad\u7ec3\u8bed\u6599\u5e93\u3001\u6570\u636e\u6e05\u6d17\u6d41\u7a0b\u3001\u68c0\u67e5\u70b9\u4ee5\u53ca\u4f18\u5316\u7684\u8bad\u7ec3\u548c\u8bc4\u4f30\u6846\u67b6\uff0c\u4ee5\u4f9b\u91cd\u73b0\u3002\u6211\u4eec\u671f\u671bMAP-Neo\u80fd\u63a8\u52a8\u5f00\u653e\u7814\u7a76\u793e\u533a\u7684\u53d1\u5c55\uff0c\u6fc0\u53d1\u66f4\u591a\u521b\u65b0\uff0c\u4fc3\u8fdbLLMs\u7684\u8fdb\u4e00\u6b65\u63d0\u5347\u3002|\n", "2405.19326": "|**2024-05-29**|**Reasoning3D -- Grounding and Reasoning in 3D: Fine-Grained Zero-Shot Open-Vocabulary 3D Reasoning Part Segmentation via Large Vision-Language Models**|Tianrun Chen 
et.al.|[2405.19326](http://arxiv.org/abs/2405.19326)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u9879\u65b0\u7684\u4efb\u52a1\uff1a\u96f6\u6837\u672c3D\u63a8\u7406\u5206\u5272\uff0c\u76ee\u6807\u662f\u9488\u5bf9\u7269\u4f53\u7684\u90e8\u4ef6\u641c\u7d22\u548c\u5b9a\u4f4d\uff0c\u8fd9\u662f\u4e00\u79cd\u8d85\u8d8a\u4e86\u5148\u524d\u7c7b\u522b\u7279\u5b9a\u76843D\u8bed\u4e49\u5206\u5272\u30013D\u5b9e\u4f8b\u5206\u5272\u548c\u5f00\u653e\u8bcd\u6c473D\u5206\u5272\u5c40\u9650\u7684\u65b0\u8303\u5f0f\u3002\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u540d\u4e3aReasoning3D\u7684\u7b80\u5355\u57fa\u7ebf\u65b9\u6cd5\uff0c\u5b83\u80fd\u591f\u7406\u89e3\u548c\u6267\u884c\u590d\u6742\u7684\u547d\u4ee4\uff0c\u5bf93D\u7f51\u683c\u8fdb\u884c\uff08\u7ec6\u81f4\uff09\u90e8\u5206\u5206\u5272\uff0c\u540c\u65f6\u5177\u5907\u4e0a\u4e0b\u6587\u611f\u77e5\u548c\u63a8\u7406\u7b54\u6848\u7684\u4ea4\u4e92\u5f0f\u5206\u5272\u80fd\u529b\u3002\u7279\u522b\u5730\uff0cReasoning3D\u5229\u7528\u9884\u8bad\u7ec3\u76842D\u5206\u5272\u7f51\u7edc\uff0c\u8be5\u7f51\u7edc\u7531\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u9a71\u52a8\uff0c\u5728\u96f6\u6837\u672c\u60c5\u51b5\u4e0b\u89e3\u6790\u7528\u6237\u8f93\u5165\u67e5\u8be2\u3002\u5df2\u6709\u7814\u7a76\u8868\u660e\uff0c\u5927\u89c4\u6a21\u9884\u8bad\u7ec3\u8d4b\u4e88\u57fa\u7840\u6a21\u578b\u4e16\u754c\u77e5\u8bc6\u7684\u5148\u9a8c\uff0c\u4f7f\u5176\u80fd\u591f\u7406\u89e3\u590d\u6742\u6307\u4ee4\uff0c\u8fd9\u4f7f\u5f97\u6211\u4eec\u5728\u4f9d\u8d56\u6709\u96503D\u6570\u636e\u96c6\u7684\u60c5\u51b5\u4e0b\u4e5f\u80fd\u201c\u5206\u5272\u4efb\u4f55\u4e1c\u897f\u201d\uff08\u6e90\u6548\u7387\u9ad8\uff09\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5177\u6709\u6cdb\u5316\u6027\uff0c\u80fd\u6709\u6548\u6839\u636e\u9690\u6027\u6587\u672c\u67e5\u8be2\u57283D\u5bf9\u8c61\uff083D\u7f51\u683c\uff09\u4e2d\u5b9a\u4f4d\u548c\u7a81\u51fa\u663e\u793a\u90e8\u5206\uff0c\u5305\u62ec\u53ef\u52a83D\u5bf9\u8c61\u548c\u771f\u5b9e\u4e16\u754c\u7684
\u626b\u63cf\u6570\u636e\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u65e0\u76d1\u7763\u65b9\u6cd5\u4fbf\u4e8e\u5feb\u901f\u90e8\u7f72\uff0c\u5e76\u4e3a\u672a\u67653D\uff08\u8bed\u4e49\uff09\u5bf9\u8c61\u7406\u89e3\u9886\u57df\u7684\u7814\u7a76\uff0c\u5982\u673a\u5668\u4eba\u3001\u7269\u4f53\u64cd\u4f5c\u3001\u90e8\u4ef6\u7ec4\u88c5\u3001\u81ea\u52a8\u9a7e\u9a76\u5e94\u7528\u3001\u589e\u5f3a\u73b0\u5b9e\u548c\u865a\u62df\u73b0\u5b9e\uff08AR/VR\uff09\u3001\u4ee5\u53ca\u533b\u7597\u5e94\u7528\uff0c\u63d0\u4f9b\u4e86\u4e00\u4e2a\u53ef\u884c\u7684\u901a\u7528\u57fa\u51c6\u3002\u4ee3\u7801\u3001\u6a21\u578b\u6743\u91cd\u3001\u90e8\u7f72\u6307\u5357\u548c\u8bc4\u4f30\u534f\u8bae\u53ef\u5728\u4ee5\u4e0b\u94fe\u63a5\u83b7\u53d6\uff1ahttp://tianrun-chen.github.io/Reason3D/\u3002|\n", "2405.19325": "|**2024-05-29**|**Nearest Neighbor Speculative Decoding for LLM Generation and Attribution**|Minghan Li et.al.|[2405.19325](http://arxiv.org/abs/2405.19325)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5e38\u5e38\u4f1a\u4ea7\u751f\u865a\u6784\u5185\u5bb9\u4e14\u7f3a\u4e4f\u5bf9\u751f\u6210\u6587\u672c\u7684\u6765\u6e90\u6807\u6ce8\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u534a\u53c2\u6570\u5316\u8bed\u8a00\u6a21\u578b\u5982kNN-LM\u901a\u8fc7\u5728\u975e\u53c2\u6570\u6570\u636e\u5b58\u50a8\u4e2d\u5bfb\u627e\u4e0e\u7ed9\u5b9a\u63d0\u793a\u6700\u63a5\u8fd1\u7684\u90bb\u5c45\u6765\u6539\u8fdbLM\u8f93\u51fa\u3002\u7136\u800c\uff0c\u8fd9\u7c7b\u6a21\u578b\u7684\u63a8\u7406\u901f\u5ea6\u901a\u5e38\u8f83\u6162\uff0c\u751f\u6210\u7684\u6587\u672c\u6d41\u7545\u5ea6\u4e0d\u9ad8\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u534a\u53c2\u6570\u5316\u8bed\u8a00\u5efa\u6a21\u65b9\u6cd5\u2014\u2014Nearest Neighbor Speculative 
Decoding\uff08NEST\uff09\uff0c\u5b83\u80fd\u591f\u5c06\u73b0\u5b9e\u4e16\u754c\u4e2d\u7684\u4efb\u610f\u957f\u5ea6\u6587\u672c\u7247\u6bb5\u878d\u5165\u751f\u6210\u8fc7\u7a0b\uff0c\u5e76\u63d0\u4f9b\u5176\u6e90\u5934\u7684\u6807\u6ce8\u3002NEST\u5728\u6bcf\u6b21\u63a8\u7406\u6b65\u9aa4\u4e2d\u8fdb\u884c\u57fa\u4e8e\u4ee4\u724c\u7684\u68c0\u7d22\uff0c\u8ba1\u7b97\u51fa\u4e00\u4e2a\u534a\u53c2\u6570\u6df7\u5408\u5206\u5e03\uff0c\u5e76\u4ece\u8bed\u6599\u5e93\u4e2d\u8bc6\u522b\u51fa\u53ef\u80fd\u7684\u8fde\u7eed\u6587\u672c\u6bb5\u843d\u6269\u5c55\u3002\u5b83\u91c7\u7528\u4e00\u79cd\u8fd1\u4f3c\u63a8\u6d4b\u89e3\u7801\u7b56\u7565\uff0c\u63a5\u53d7\u68c0\u7d22\u5230\u7684\u7247\u6bb5\u524d\u7f00\u6216\u751f\u6210\u65b0\u7684\u4ee4\u724c\u3002NEST\u663e\u8457\u63d0\u9ad8\u4e86\u57fa\u7840LM\u5728\u5404\u79cd\u77e5\u8bc6\u5bc6\u96c6\u578b\u4efb\u52a1\u4e2d\u7684\u751f\u6210\u8d28\u91cf\u548c\u6765\u6e90\u6807\u6ce8\u7387\uff0c\u8d85\u8d8a\u4e86\u4f20\u7edf\u7684kNN-LM\u65b9\u6cd5\uff0c\u5e76\u5728\u57fa\u4e8e\u4e0a\u4e0b\u6587\u7684\u68c0\u7d22\u589e\u5f3a\u65b9\u9762\u8868\u73b0\u51fa\u7ade\u4e89\u529b\u3002\u6b64\u5916\uff0cNEST\u5927\u5e45\u63d0\u5347\u4e86\u751f\u6210\u901f\u5ea6\uff0c\u5f53\u5e94\u7528\u4e8eLlama-2-Chat 70B\u65f6\uff0c\u63a8\u7406\u65f6\u95f4\u63d0\u9ad8\u4e861.8\u500d\u3002|\n", "2405.19323": "|**2024-05-29**|**Are Large Language Models Chameleons?**|Mingmeng Geng 
et.al.|[2405.19323](http://arxiv.org/abs/2405.19323)|null|\u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u662f\u5426\u62e5\u6709\u81ea\u5df1\u7684\u4e16\u754c\u89c2\u548c\u4eba\u683c\u503e\u5411\uff1f\u7814\u7a76\u4eba\u5458\u8fdb\u884c\u4e86\u8d85\u8fc7\u4e00\u767e\u4e07\u6b21\u7684\u5b9e\u9a8c\uff0c\u8ba9LLMs\u56de\u7b54\u4e3b\u89c2\u95ee\u9898\u3002\u901a\u8fc7\u5c06\u8fd9\u4e9b\u6a21\u578b\u7684\u54cd\u5e94\u4e0e\u6b27\u6d32\u793e\u4f1a\u8c03\u67e5\uff08ESS\uff09\u7684\u5b9e\u9645\u6570\u636e\u8fdb\u884c\u6bd4\u8f83\uff0c\u7ed3\u679c\u663e\u793a\u63d0\u793a\u5bf9\u504f\u89c1\u548c\u53d8\u5f02\u6027\u6709\u663e\u8457\u5f71\u54cd\uff0c\u63ed\u793a\u4e86\u91cd\u5927\u7684\u6587\u5316\u3001\u5e74\u9f84\u548c\u6027\u522b\u504f\u5dee\u3002\u6587\u4e2d\u8ba8\u8bba\u4e86\u8bc4\u4f30LLMs\u4e0e\u8c03\u67e5\u6570\u636e\u5dee\u5f02\u7684\u65b9\u6cd5\uff0c\u5982\u8ba1\u7b97\u52a0\u6743\u5e73\u5747\u503c\u4ee5\u53ca\u4e00\u4e2a\u65b0\u63d0\u51fa\u7684\u57fa\u4e8eJaccard\u76f8\u4f3c\u6027\u7684\u6d4b\u91cf\u6307\u6807\u3002\u7814\u7a76\u8005\u5f3a\u8c03\uff0c\u5728\u5229\u7528LLMs\u6a21\u62df\u4e2a\u4f53\u51b3\u7b56\u6216\u96c6\u4f53\u884c\u4e3a\u4e4b\u524d\uff0c\u5206\u6790\u63d0\u793a\u7684\u7a33\u5065\u6027\u548c\u53d8\u5f02\u6027\u81f3\u5173\u91cd\u8981\uff0c\u56e0\u4e3a\u5b83\u4eec\u7684\u6a21\u4eff\u80fd\u529b\u5145\u5176\u91cf\u53ea\u80fd\u8bf4\u662f\u8fd1\u4f3c\u7684\u3002|\n", "2405.19320": "|**2024-05-29**|**Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF**|Shicong Cen et.al.|[2405.19320](http://arxiv.org/abs/2405.19320)|null|
\u5f3a\u5316\u5b66\u4e60\u4ece\u4eba\u7c7b\u53cd\u9988\uff08RLHF\uff09\u5728\u8c03\u6574\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4ee5\u7b26\u5408\u4eba\u7c7b\u504f\u597d\u65b9\u9762\u5c55\u73b0\u51fa\u5de8\u5927\u6f5c\u529b\u3002\u5728\u7ebf\u548c\u79bb\u7ebfRLHF\u90fd\u5904\u4e8e\u6d3b\u8dc3\u7684\u7814\u7a76\u9636\u6bb5\uff0c\u4f46\u5173\u952e\u6311\u6218\u4e4b\u4e00\u662f\u5982\u4f55\u5728\u5904\u7406\u4ece\u504f\u597d\u6570\u636e\u4e2d\u5b66\u4e60\u7684\u5956\u52b1\u51fd\u6570\u4e0d\u786e\u5b9a\u6027\u65f6\u3002\u5c3d\u7ba1\u6807\u51c6\u5f3a\u5316\u5b66\u4e60\uff08RL\uff09\u4e2d\u4e50\u89c2\u4e3b\u4e49\u6216\u60b2\u89c2\u4e3b\u4e49\u7684\u539f\u5219\u5df2\u5e7f\u4e3a\u4eba\u77e5\uff0c\u4f46\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4e2d\u5b9e\u73b0\u65e2\u5b9e\u7528\u53c8\u57fa\u4e8e\u7406\u8bba\u7684\u65b9\u6cd5\u5c1a\u4e0d\u6210\u719f\uff0c\u56e0\u4e3a\u6784\u5efa\u7f6e\u4fe1\u533a\u95f4\u7684\u6807\u51c6\u6280\u672f\u5728\u5904\u7406\u4efb\u610f\u7b56\u7565\u53c2\u6570\u5316\u65f6\u53d8\u5f97\u96be\u4ee5\u5904\u7406\u3002 
\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u7edf\u4e00\u7684\u5728\u7ebf\u548c\u79bb\u7ebfRLHF\u65b9\u6cd5\u2014\u2014\u4ef7\u503c\u6fc0\u52b1\u7684\u504f\u597d\u4f18\u5316\uff08VPO\uff09\u3002VPO\u901a\u8fc7\u5728\u6700\u5927\u4f3c\u7136\u4f30\u8ba1\u7684\u5956\u52b1\u51fd\u6570\u4e2d\u6dfb\u52a0\u76f8\u5e94\u7684\u503c\u51fd\u6570\u7684\u6b63\u5219\u5316\uff0c\u4ee5\u6307\u793a\u9009\u62e9\u4e50\u89c2\u4e3b\u4e49\u8fd8\u662f\u60b2\u89c2\u4e3b\u4e49\uff0c\u5b9e\u73b0\u4e86\u8fd9\u4e00\u76ee\u6807\u3002\u6b64\u5916\uff0cVPO\u76f4\u63a5\u4f18\u5316\u7b56\u7565\uff0c\u5e76\u5229\u7528\u9690\u5f0f\u5956\u52b1\u5efa\u6a21\uff0c\u56e0\u6b64\u5176RLHF\u7ba1\u9053\u4e0e\u76f4\u63a5\u504f\u597d\u4f18\u5316\u66f4\u4e3a\u7b80\u5355\u3002\u5bf9\u4e8e\u5728\u7ebf\u548c\u79bb\u7ebf\u8bbe\u7f6e\uff0cVPO\u63d0\u4f9b\u4e86\u7406\u8bba\u4fdd\u8bc1\uff0c\u5176\u6536\u655b\u901f\u5ea6\u4e0e\u6807\u51c6RL\u76f8\u5f53\u3002\u5b9e\u9a8c\u5728\u6587\u672c\u6458\u8981\u548c\u5bf9\u8bdd\u4efb\u52a1\u4e0a\u9a8c\u8bc1\u4e86VPO\u7684\u5b9e\u7528\u6027\u4e0e\u6709\u6548\u6027\u3002|\n", "2405.20340": "|**2024-05-30**|**MotionLLM: Understanding Human Behaviors from Human Motions and Videos**|Ling-Hao Chen 
et.al.|[2405.20340](http://arxiv.org/abs/2405.20340)|**[link](https://github.com/IDEA-Research/MotionLLM)**|\u8fd9\u9879\u7814\u7a76\u5173\u6ce8\u4e8e\u591a\u6a21\u6001\uff08\u89c6\u9891\u548c\u52a8\u4f5c\u6a21\u6001\uff09\u4e0b\u7684\u4eba\u7c7b\u884c\u4e3a\u7406\u89e3\uff0c\u901a\u8fc7\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5f3a\u5927\u529f\u80fd\u3002\u4e0e\u4e13\u4e3a\u5355\u6a21\u6001\uff08\u89c6\u9891\u6216\u52a8\u4f5c\uff09\u8bbe\u8ba1\u7684\u6700\u65b0LLMs\u4e0d\u540c\uff0c\u6211\u4eec\u8ba4\u4e3a\u7406\u89e3\u4eba\u7c7b\u884c\u4e3a\u9700\u8981\u5bf9\u89c6\u9891\u548c\u52a8\u4f5c\u5e8f\u5217\uff08\u5982SMPL\u5e8f\u5217\uff09\u8fdb\u884c\u8054\u5408\u5efa\u6a21\uff0c\u4ee5\u6709\u6548\u6355\u6349\u7cbe\u7ec6\u7684\u8eab\u4f53\u90e8\u4f4d\u52a8\u6001\u548c\u8bed\u4e49\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51faMotionLLM\uff0c\u8fd9\u662f\u4e00\u4e2a\u7b80\u6d01\u800c\u6709\u6548\u7684\u6846\u67b6\uff0c\u7528\u4e8e\u4eba\u7c7b\u52a8\u4f5c\u7406\u89e3\u3001\u63cf\u8ff0\u548c\u63a8\u7406\u3002MotionLLM\u91c7\u7528\u4e86\u4e00\u4f53\u5316\u7684\u89c6\u9891-\u52a8\u4f5c\u8bad\u7ec3\u7b56\u7565\uff0c\u5229\u7528\u73b0\u6709\u7c97\u7c92\u5ea6\u7684\u89c6\u9891-\u6587\u672c\u6570\u636e\u548c\u7cbe\u7ec6\u52a8\u4f5c-\u6587\u672c\u6570\u636e\u7684\u4f18\u52bf\uff0c\u4ee5\u83b7\u53d6\u4e30\u5bcc\u7684\u7a7a\u95f4-\u65f6\u95f4\u6d1e\u5bdf\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u521b\u5efa\u4e86\u4e00\u4e2a\u5927\u89c4\u6a21\u7684MoVid\u6570\u636e\u96c6\uff0c\u5305\u542b\u4e86\u591a\u6837\u5316\u7684\u89c6\u9891\u3001\u52a8\u4f5c\u3001caption\u548c\u6307\u4ee4\u3002\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86MoVid-Bench\uff0c\u5b83\u5177\u6709\u7cbe\u5fc3\u7684\u624b\u52a8\u6807\u6ce8\uff0c\u4ee5\u66f4\u597d\u5730\u8bc4\u4f30\u5728\u89c6\u9891\u548c\u52a8\u4f5c\u4e0a\u7684\u4eba\u7c7b\u884c\u4e3a\u7406\u89e3\u80fd\u529b\u3002\u5b9e\u9a8c\u7ed3\u679c\u5145\u5206\u5c55\u793a\u4e86MotionLLM\u5728caption\u751f\u6210\u3001\u7a7a\u95f4-\u65f6\u95f4\u7406\u89e3
\u4ee5\u53ca\u63a8\u7406\u80fd\u529b\u65b9\u9762\u7684\u4f18\u8d8a\u6027\u3002|\n", "2405.20339": "|**2024-05-30**|**Visual Perception by Large Language Model's Weights**|Feipeng Ma et.al.|[2405.20339](http://arxiv.org/abs/2405.20339)|null|\u8fd9\u7bc7\u8bba\u6587\u7684\u80cc\u666f\u662f\u73b0\u6709\u7684\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u91c7\u7528\u4e86\u4e00\u79cd\u65b9\u6cd5\uff0c\u5373\u5c06\u89c6\u89c9\u4fe1\u606f\u4e0e\u8bed\u8a00\u6a21\u578b\u7684\u8f93\u5165\u7a7a\u95f4\u5bf9\u9f50\uff0c\u7136\u540e\u5c06\u89c6\u89c9\u4ee4\u724c\u4e0e\u6587\u672c\u4ee4\u724c\u5408\u5e76\uff0c\u5f62\u6210\u7edf\u4e00\u7684\u5e8f\u5217\u8f93\u5165\u7ed9\u8bed\u8a00\u6a21\u578b\u3002\u7136\u800c\uff0c\u8fd9\u79cd\u65b9\u6cd5\u7531\u4e8e\u589e\u52a0\u4e86\u7531\u89c6\u89c9\u4ee4\u724c\u5bfc\u81f4\u7684\u8f93\u5165\u5e8f\u5217\u957f\u5ea6\uff0c\u8ba1\u7b97\u6210\u672c\u8f83\u9ad8\u3002\u4e3a\u6b64\uff0c\u8bba\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u53c2\u6570\u7a7a\u95f4\u5bf9\u9f50\u8303\u5f0f\uff0c\u901a\u8fc7\u5c06\u89c6\u89c9\u4fe1\u606f\u8868\u793a\u4e3a\u6a21\u578b\u6743\u91cd\u6765\u5904\u7406\u3002\u5bf9\u4e8e\u6bcf\u4e2a\u8f93\u5165\u56fe\u50cf\uff0c\u9996\u5148\u4f7f\u7528\u89c6\u89c9\u7f16\u7801\u5668\u63d0\u53d6\u7279\u5f81\uff0c\u7136\u540e\u5c06\u8fd9\u4e9b\u7279\u5f81\u8f6c\u6362\u4e3a\u611f\u77e5\u6743\u91cd\uff0c\u5e76\u5c06\u5176\u4e0e\u8bed\u8a00\u6a21\u578b\u7684\u6743\u91cd\u878d\u5408\u3002\u8fd9\u6837\uff0c\u8bed\u8a00\u6a21\u578b\u7684\u8f93\u5165\u65e0\u9700\u89c6\u89c9\u4ee4\u724c\uff0c\u4ece\u800c\u7f29\u77ed\u4e86\u8f93\u5165\u5e8f\u5217\uff0c\u663e\u8457\u63d0\u9ad8\u4e86\u6548\u7387\u3002 
\u57fa\u4e8e\u8fd9\u4e00\u7406\u5ff5\uff0c\u8bba\u6587\u63d0\u51fa\u4e86VLoRA\u6a21\u578b\uff0c\u5176\u4e2d\u5305\u542b\u4e00\u4e2a\u611f\u77e5\u6743\u91cd\u751f\u6210\u5668\u3002\u8be5\u751f\u6210\u5668\u8bbe\u8ba1\u6210\u80fd\u591f\u5c06\u89c6\u89c9\u7279\u5f81\u8f6c\u5316\u4e3a\u5177\u6709\u4f4e\u79e9\u7279\u6027\u7684\u611f\u77e5\u6743\u91cd\uff0c\u7c7b\u4f3c\u4e8eLoRA\uff08\u4f4e\u79e9\u81ea\u9002\u5e94\u8bad\u7ec3\uff09\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u5c3d\u7ba1VLoRA\u5728\u591a\u79cd\u591a\u6a21\u6001\u4efb\u52a1\u7684\u57fa\u51c6\u4e0a\u8868\u73b0\u51fa\u4e0e\u73b0\u6709MLLMs\u76f8\u5f53\u7684\u6027\u80fd\uff0c\u4f46\u5176\u5728\u8bad\u7ec3\u548c\u63a8\u7406\u9636\u6bb5\u7684\u8ba1\u7b97\u6210\u672c\u663e\u8457\u964d\u4f4e\u3002\u8bba\u6587\u627f\u8bfa\u5f00\u6e90\u4ee3\u7801\u548c\u6a21\u578b\u3002|\n", "2405.20335": "|**2024-05-30**|**Xwin-LM: Strong and Scalable Alignment Practice for LLMs**|Bolin Ni et.al.|[2405.20335](http://arxiv.org/abs/2405.20335)|**[link](https://github.com/xwin-lm/xwin-lm)**|**\u672c\u6587\u4ecb\u7ecdXwin-LM\uff0c\u4e00\u4e2a\u4e13\u4e3a\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u8bbe\u8ba1\u7684\u5168\u9762\u5bf9\u9f50\u65b9\u6cd5\u5957\u4ef6\u3002\u5b83\u6db5\u76d6\u4e86\u76d1\u7763\u5fae\u8c03\uff08SFT\uff09\u3001\u5956\u52b1\u5efa\u6a21\uff08RM\uff09\u3001\u62d2\u7edd\u91c7\u6837\u5fae\u8c03\uff08RS\uff09\u548c\u76f4\u63a5\u504f\u597d\u4f18\u5316\uff08DPO\uff09\u7b49\u591a\u79cd\u5173\u952e\u6280\u672f\u3002\u4e3b\u8981\u7ec4\u6210\u90e8\u5206\u5305\u62ec\uff1a(1) \u4f7f\u7528\u9ad8\u8d28\u91cf\u6307\u4ee4\u6570\u636e\u8fdb\u884c\u521d\u59cb\u5fae\u8c03\u7684Xwin-LM-SFT\uff1b(2) \u7531GPT-4\u7cbe\u5fc3\u6807\u6ce8\u7684\u5927\u578b\u591a\u8f6e\u504f\u597d\u6570\u636e\u96c6Xwin-Pair\uff1b(3) \u57287B\u300113B\u548c70B\u53c2\u6570\u89c4\u6a21\u4e0a\u8bad\u7ec3\u7684Xwin-RM\u5956\u52b1\u6a21\u578b\uff1b(4) 
\u6bcf\u4e2a\u63d0\u793a\u5173\u805464\u4e2a\u72ec\u7279\u54cd\u5e94\u7684\u591awise\u504f\u597d\u6570\u636e\u96c6Xwin-Set\uff0c\u8fd9\u4e9b\u54cd\u5e94\u7531Xwin-LM-SFT\u751f\u6210\u5e76\u7531Xwin-RM\u8bc4\u5206\uff1b(5) \u4f7f\u7528Xwin-Set\u4e2d\u6700\u9ad8\u5f97\u5206\u54cd\u5e94\u8fdb\u884c\u5fae\u8c03\u7684Xwin-LM-RS\u6a21\u578b\uff1b(6) \u901a\u8fc7DPO\u7b97\u6cd5\u5728Xwin-Set\u4e0a\u8fdb\u4e00\u6b65\u4f18\u5316\u7684Xwin-LM-DPO\u6a21\u578b\u3002\u6211\u4eec\u5728AlpacaEval\u548cMT-bench\u4e0a\u7684\u8bc4\u4f30\u663e\u793a\u4e86\u6574\u4e2a\u7ba1\u9053\u7684\u7a33\u5b9a\u4e14\u663e\u8457\u6539\u8fdb\uff0c\u8bc1\u660e\u4e86Xwin-LM\u7684\u5f3a\u5927\u548c\u53ef\u6269\u5c55\u6027\u3002\u6211\u4eec\u5c06\u5728https://github.com/Xwin-LM/Xwin-LM\u7684\u4ed3\u5e93\u4e2d\u6301\u7eed\u66f4\u65b0\uff0c\u4ee5\u4fc3\u8fdb\u793e\u533a\u7814\u7a76\u3002**|\n", "2405.20319": "|**2024-05-31**|**ParSEL: Parameterized Shape Editing with Language**|Aditya Ganeshan et.al.|[2405.20319](http://arxiv.org/abs/2405.20319)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aParSEL\u7684\u7cfb\u7edf\uff0c\u5b83\u65e8\u5728\u901a\u8fc7\u81ea\u7136\u8bed\u8a00\u5b9e\u73b0\u9ad8\u8d28\u91cf3D\u8d44\u4ea7\u7684\u53ef\u63a7\u7f16\u8f91\u3002\u9762\u5bf9\u81ea\u7136\u8bed\u8a00\u5728\u7cbe\u786e\u64cd\u63a7\u4e0a\u7684\u5c40\u9650\u6027\uff0cParSEL\u63a5\u6536\u4e00\u4e2a\u5206\u5272\u76843D\u7f51\u683c\u548c\u7f16\u8f91\u8bf7\u6c42\uff0c\u751f\u6210\u4e00\u4e2a\u53c2\u6570\u5316\u7684\u7f16\u8f91\u7a0b\u5e8f\u3002\u7528\u6237\u53ef\u4ee5\u8c03\u6574\u7a0b\u5e8f\u53c2\u6570\uff0c\u7cbe\u7ec6\u5730\u63a2\u7d22\u5f62\u72b6\u53d8\u5316\uff0c\u63a7\u5236\u7f16\u8f91\u5e45\u5ea6\u3002\u7cfb\u7edf\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u6765\u7406\u89e3\u521d\u59cb\u7f16\u8f91\u6307\u4ee4\uff0c\u4f46\u53d1\u73b0\u5b83\u4eec\u5728\u63a8\u65ad\u5b8c\u6574\u7f16\u8f91\u7a0b\u5e8f\u65f6\u5e38\u5e38\u4e0d\u8db3\uff0c\u4ea7\u751f\u7684\u7ed3\u679c\u53ef\u80fd\u8fdd\u53cd
\u5f62\u72b6\u903b\u8f91\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u5206\u6790\u6027\u7f16\u8f91\u4f20\u64ad\uff08Analytical Edit Propagation\uff0cAEP\uff09\u7b97\u6cd5\uff0c\u5b83\u4ece\u521d\u59cb\u7f16\u8f91\u79cd\u5b50\u5f00\u59cb\uff0c\u901a\u8fc7\u8ba1\u7b97\u673a\u4ee3\u6570\u7cfb\u7edf\u8fdb\u884c\u51e0\u4f55\u5206\u6790\uff0c\u5bfb\u627e\u4e0e\u6f5c\u5728\u7528\u6237\u7f16\u8f91\u517c\u5bb9\u7684\u5206\u6790\u6027\u7f16\u8f91\u64cd\u4f5c\uff0c\u4ee5\u751f\u6210\u5b8c\u6574\u7684\u7f16\u8f91\u7a0b\u5e8f\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u76f8\u8f83\u4e8e\u5176\u4ed6\u65b9\u6848\uff0cParSEL\u901a\u8fc7\u81ea\u7136\u8bed\u8a00\u8bf7\u6c42\u6709\u6548\u5730\u5b9e\u73b0\u4e86\u5bf93D\u5bf9\u8c61\u7684\u53ef\u63a7\u7f16\u8f91\u3002|\n", "2405.20318": "|**2024-05-30**|**CausalQuest: Collecting Natural Causal Questions for AI Agents**|Roberto Ceraolo et.al.|[2405.20318](http://arxiv.org/abs/2405.20318)|**[link](https://github.com/roberto-ceraolo/causal-quest)**|**\u4eba\u7c7b\u5929\u751f\u5c31\u6709\u5bfb\u6c42\u56e0\u679c\u5173\u7cfb\u7684\u9a71\u52a8\u529b\uff0c\u65e0\u8bba\u662f\u51fa\u4e8e\u597d\u5947\u5fc3\u8fd8\u662f\u7279\u5b9a\u76ee\u6807\u3002\u4e3a\u4e86\u5f00\u53d1\u80fd\u5904\u7406\u8fd9\u79cd\u4eba\u7c7b\u672c\u6027\u8ffd\u6c42\u7684AI\u4ee3\u7406\uff0c\u6211\u4eec\u6025\u9700\u4e00\u4e2a\u5168\u9762\u7684\u81ea\u7136\u56e0\u679c\u95ee\u9898\u6570\u636e\u96c6\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u6570\u636e\u96c6\u8981\u4e48\u5305\u542b\u4eba\u5de5\u5236\u9020\u7684\u95ee\u9898\uff0c\u65e0\u6cd5\u53cd\u6620\u5b9e\u9645AI\u5e94\u7528\u573a\u666f\uff0c\u8981\u4e48\u5728\u7279\u5b9a\u6765\u6e90\u7684\u95ee\u9898\u8986\u76d6\u4e0a\u6709\u9650\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86CausalQuest\uff0c\u8fd9\u662f\u4e00\u4e2a\u6e90\u81ea\u793e\u4ea4\u7f51\u7edc\u3001\u641c\u7d22\u5f15\u64ce\u548cAI\u52a9\u624b\u768413,500\u4e2a\u81ea\u7136\u51fa\u73b0\u7684\u95ee\u9898\u7684\u6570\u636e\u96c6\u3002\u6211\u4eec\u5b9a\u4e49\u4e86
\u56e0\u679c\u95ee\u9898\uff0c\u5e76\u5efa\u7acb\u4e86\u66f4\u7ec6\u81f4\u7684\u5206\u7c7b\u4f53\u7cfb\u3002\u901a\u8fc7\u4eba\u7c7b\u6807\u6ce8\u5458\u548c\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u534f\u4f5c\uff0c\u6211\u4eec\u5bf9\u6570\u636e\u96c6\u8fdb\u884c\u4e86\u7cbe\u5fc3\u6807\u6ce8\u3002\u7814\u7a76\u53d1\u73b0\uff0c42%\u7684\u4eba\u7c7b\u63d0\u95ee\u5b9e\u9645\u4e0a\u662f\u5173\u4e8e\u56e0\u679c\u7684\uff0c\u5927\u90e8\u5206\u662f\u60f3\u4e86\u89e3\u7ed9\u5b9a\u7ed3\u679c\u80cc\u540e\u7684\u539f\u56e0\u3002\u5229\u7528\u8fd9\u4e2a\u6570\u636e\u96c6\uff0c\u6211\u4eec\u8bad\u7ec3\u4e86\u9ad8\u6548\u7684\u4e8c\u5206\u7c7b\u5668\uff08\u9ad8\u8fbe28.5\u4ebf\u53c2\u6570\uff09\uff0c\u7528\u4e8e\u8bc6\u522b\u56e0\u679c\u95ee\u9898\uff0c\u5b9e\u73b0\u4e86\u9ad8\u6027\u80fd\uff0cF1\u5206\u6570\u9ad8\u8fbe0.877\u3002\u6700\u540e\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u7cfb\u5217\u4e30\u5bcc\u7684\u672a\u6765\u7814\u7a76\u65b9\u5411\uff0c\u8fd9\u4e9b\u90fd\u53ef\u4ee5\u57fa\u4e8e\u6211\u4eec\u7684\u6570\u636e\u548c\u6a21\u578b\u8fdb\u884c\u6269\u5c55\u3002**|\n", "2405.20315": "|**2024-05-30**|**ANAH: Analytical Annotation of Hallucinations in Large Language Models**|Ziwei Ji et.al.|[2405.20315](http://arxiv.org/abs/2405.20315)|**[link](https://github.com/open-compass/anah)**|**### \u80cc\u666f 
\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u201c\u5e7b\u89c9\u201d\u95ee\u9898\u5bf9\u4e8e\u5176\u5e7f\u6cdb\u5e94\u7528\u81f3\u5173\u91cd\u8981\u3002\u7136\u800c\uff0c\u5bf9\u8fd9\u4e00\u95ee\u9898\u7684\u7ec6\u81f4\u6d4b\u91cf\u5728\u793e\u533a\u4e2d\u5e76\u672a\u5f97\u5230\u5145\u5206\u63a2\u7d22\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u9879\u540d\u4e3a$\\textbf{ANAH}$\u7684\u53cc\u8bed\u6570\u636e\u96c6\uff0c\u4e13\u6ce8\u4e8e\u751f\u6210\u5f0f\u95ee\u7b54\u4e2d\u7684LLM\u5e7b\u89c9\u5206\u6790\u3002ANAH\u4e2d\u7684\u6bcf\u4e2a\u7b54\u6848\u53e5\u5b50\u90fd\u7ecf\u8fc7\u4e25\u8c28\u6807\u6ce8\uff0c\u5305\u62ec\u53c2\u8003\u7247\u6bb5\u68c0\u7d22\u3001\u5e7b\u89c9\u7c7b\u578b\u7684\u5224\u65ad\u4ee5\u53ca\u9519\u8bef\u5185\u5bb9\u7684\u4fee\u6b63\u3002\u8be5\u6570\u636e\u96c6\u5305\u542b\u7ea612,000\u4e2a\u53e5\u7ea7\u6ce8\u91ca\uff0c\u6db5\u76d6\u4e86\u5927\u7ea64,300\u4e2aLLM\u54cd\u5e94\uff0c\u6d89\u53ca\u8d85\u8fc7700\u4e2a\u4e3b\u9898\uff0c\u901a\u8fc7\u4eba\u673a\u4ea4\u4e92\u5f0f\u6d41\u7a0b\u6784\u5efa\u800c\u6210\u3002\u7531\u4e8e\u5e7b\u89c9\u6ce8\u91ca\u7684\u7cbe\u7ec6\u7c92\u5ea6\uff0c\u6211\u4eec\u53ef\u4ee5\u5b9a\u91cf\u786e\u8ba4LLMs\u7684\u5e7b\u89c9\u95ee\u9898\u968f\u7740\u7b54\u6848\u7684\u6269\u5c55\u800c\u9010\u6e10\u589e\u52a0\uff0c\u5e76\u5229\u7528ANAH\u6765\u8bad\u7ec3\u548c\u8bc4\u4f30\u5e7b\u89c9\u6807\u6ce8\u5668\u3002 ### \u4efb\u52a1 
\u6211\u4eec\u6784\u5efa\u4e86\u5927\u7ea612,000\u6761\u53e5\u5b50\u7ea7\u522b\u7684\u6ce8\u91ca\uff0c\u9488\u5bf9\u7ea64,300\u4e2aLLM\u751f\u6210\u7684\u56de\u7b54\uff0c\u6db5\u76d6\u4e86\u8d85\u8fc7700\u4e2a\u4e3b\u9898\u3002\u8fd9\u4e2a\u540d\u4e3aANAH\u7684\u6570\u636e\u96c6\u901a\u8fc7\u4eba\u7c7b\u53c2\u4e0e\u7684\u6d41\u7a0b\u7cbe\u5fc3\u8bbe\u8ba1\uff0c\u65e8\u5728\u63d0\u4f9b\u5173\u4e8e\u751f\u6210\u5f0f\u95ee\u7b54\u4e2dLLMs\u5e7b\u89c9\u7684\u8be6\u5c3d\u5206\u6790\u3002\u901a\u8fc7\u7ec6\u81f4\u7684\u5e7b\u89c9\u6807\u6ce8\uff0c\u6211\u4eec\u80fd\u591f\u91cf\u5316\u5730\u9a8c\u8bc1LLMs\u5728\u751f\u6210\u7b54\u6848\u65f6\u5e7b\u89c9\u95ee\u9898\u7684\u7d2f\u79ef\uff0c\u5e76\u5229\u7528ANAH\u6765\u8bad\u7ec3\u548c\u8bc4\u4f30\u5e7b\u89c9\u8bc6\u522b\u80fd\u529b\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u6df1\u5165\u7814\u7a76\u4e86\u751f\u6210\u5f0f\u548c\u533a\u5206\u6027\u6807\u6ce8\u5668\uff0c\u5e76\u53d1\u73b0\u5c3d\u7ba1\u5f00\u6e90LLMs\u5728\u7cbe\u7ec6\u5e7b\u89c9\u6807\u6ce8\u65b9\u9762\u9762\u4e34\u6311\u6218\uff0c\u4f46\u4f7f\u7528ANAH\u8bad\u7ec3\u7684\u751f\u6210\u5f0f\u6807\u6ce8\u5668\u80fd\u591f\u8d85\u8d8a\u6240\u6709\u5f00\u6e90\u6a21\u578b\uff0c\u751a\u81f3\u63a5\u8fd1GPT-3.5\u7684\u8868\u73b0\uff0c\u5e76\u5c55\u73b0\u51fa\u5728\u672a\u89c1\u8fc7\u95ee\u9898\u4e0a\u7684\u826f\u597d\u6cdb\u5316\u80fd\u529b\u3002**|\n", "2405.20313": "|**2024-05-30**|**Sequence-Augmented SE(3)-Flow Matching For Conditional Protein Backbone Generation**|Guillaume Huguet 
et.al.|[2405.20313](http://arxiv.org/abs/2405.20313)|null|\u86cb\u767d\u8d28\u5728\u51e0\u4e4e\u6240\u6709\u7684\u751f\u7269\u8fc7\u7a0b\u4e2d\u53d1\u6325\u5173\u952e\u4f5c\u7528\uff0c\u5176\u591a\u6837\u5316\u7684\u529f\u80fd\u6e90\u4e8e\u590d\u6742\u7684\u4e09\u7ef4\u7ed3\u6784\uff0c\u800c\u8fd9\u4e9b\u7ed3\u6784\u53c8\u7531\u6c28\u57fa\u9178\u5e8f\u5217\u51b3\u5b9a\u3002\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u5229\u7528\u6c28\u57fa\u9178\u5e8f\u5217\u4e30\u5bcc\u7684\u751f\u7269\u5b66\u5f52\u7eb3\u504f\u7f6e\uff0c\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u5e8f\u5217\u6761\u4ef6\u7684SE(3)\u7b49\u53d8\u6d41\u5339\u914d\u6a21\u578b\u2014\u2014FoldFlow-2\uff0c\u7528\u4e8e\u86cb\u767d\u8d28\u7ed3\u6784\u751f\u6210\u3002\u4e0eFoldFlow\u5bb6\u65cf\u7684\u5148\u524d\u6a21\u578b\u76f8\u6bd4\uff0cFoldFlow-2\u5f15\u5165\u4e86\u65b0\u9896\u7684\u67b6\u6784\u7279\u6027\uff0c\u5305\u62ec\u7528\u4e8e\u7f16\u7801\u5e8f\u5217\u7684\u86cb\u767d\u8d28\u5927\u8bed\u8a00\u6a21\u578b\u3001\u7ed3\u5408\u7ed3\u6784\u548c\u5e8f\u5217\u8868\u793a\u7684\u65b0\u591a\u6a21\u6001\u878d\u5408\u4e3b\u5e72\uff0c\u4ee5\u53ca\u57fa\u4e8e\u51e0\u4f55\u53d8\u6362\u5668\u7684\u89e3\u7801\u5668\u3002\u4e3a\u4e86\u589e\u52a0\u751f\u6210\u6837\u672c\u7684\u591a\u6837\u6027\u548c\u65b0\u9896\u6027\u2014\u2014\u8fd9\u5bf9\u65b0\u836f\u8bbe\u8ba1\u81f3\u5173\u91cd\u8981\u2014\u2014\u6211\u4eec\u5728\u6bd4\u5148\u524d\u5de5\u4f5c\u4f7f\u7528\u7684PDB\u6570\u636e\u96c6\u5927\u4e00\u4e2a\u6570\u91cf\u7ea7\u7684\u65b0\u6570\u636e\u96c6\u4e0a\u5927\u89c4\u6a21\u8bad\u7ec3FoldFlow-2\uff0c\u8be5\u6570\u636e\u96c6\u5305\u542b\u4e86\u5df2\u77e5\u7684PDB\u86cb\u767d\u8d28\u548c\u901a\u8fc7\u8fc7\u6ee4\u83b7\u5f97\u7684\u9ad8\u8d28\u91cf\u5408\u6210\u7ed3\u6784\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u5982\u4f55\u901a\u8fc7\u5f15\u5165\u5f3a\u5316\u5fae\u8c03\uff08Reinforced 
Finetuning\uff0c\u7b80\u79f0ReFT\uff09\u76ee\u6807\uff0c\u4f7fFoldFlow-2\u80fd\u591f\u9002\u5e94\u4efb\u610f\u5956\u52b1\uff0c\u5982\u63d0\u9ad8\u4e8c\u7ea7\u7ed3\u6784\u591a\u6837\u6027\u3002 \u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cFoldFlow-2\u8d85\u8d8a\u4e86\u73b0\u6709\u57fa\u4e8e\u86cb\u767d\u8d28\u7ed3\u6784\u7684\u751f\u6210\u6a21\u578b\u7684\u72b6\u6001\uff0c\u65e0\u8bba\u5728\u65e0\u6761\u4ef6\u751f\u6210\u8fd8\u662f\u5728\u8bbe\u8ba1\u6027\u3001\u591a\u6837\u6027\u548c\u65b0\u9896\u6027\u65b9\u9762\uff0c\u90fd\u4f18\u4e8eRFDiffusion\uff0c\u4e14\u5728\u86cb\u767d\u8d28\u957f\u5ea6\u7684\u5404\u7c7b\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u826f\u597d\u7684\u6cdb\u5316\u80fd\u529b\uff0c\u7279\u522b\u662f\u5728\u7b49\u6e29\u6784\u8c61\u91c7\u6837\u4efb\u52a1\u4e0a\u3002\u6700\u540e\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u4e00\u4e2a\u7ecf\u8fc7\u5fae\u8c03\u7684FoldFlow-2\u5728\u8bf8\u5982VHH\u7eb3\u7c73\u6297\u4f53\u9aa8\u67b6\u8bbe\u8ba1\u7b49\u5177\u6709\u6311\u6218\u6027\u7684\u6761\u4ef6\u8bbe\u8ba1\u4efb\u52a1\u4e0a\u53d6\u5f97\u4e86\u8fdb\u5c55\u3002|\n", "2405.20309": "|**2024-05-30**|**Large Language Models Can Self-Improve At Web Agent Tasks**|Ajay Patel 
et.al.|[2405.20309](http://arxiv.org/abs/2405.20309)|**[link](https://github.com/AjayP13/webdreamer)**|\u5728\u590d\u6742\u7684\u73af\u5883\u4e2d\uff0c\u5982\u7f51\u7edc\u6d4f\u89c8\u5668\uff0c\u8bad\u7ec3\u6a21\u578b\u4f5c\u4e3a\u80fd\u591f\u6709\u6548\u5bfc\u822a\u548c\u6267\u884c\u52a8\u4f5c\u7684\u4ee3\u7406\u901a\u5e38\u5177\u6709\u6311\u6218\u6027\uff0c\u4e3b\u8981\u53d7\u9650\u4e8e\u7f3a\u4e4f\u8bad\u7ec3\u6570\u636e\u3002\u8fd1\u5e74\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u663e\u793a\u51fa\u901a\u8fc7\u81ea\u7136\u8bed\u8a00\u63d0\u793a\u4ee5\u96f6\u6837\u672c\u6216\u5c11\u91cf\u6837\u672c\u6765\u5728\u65b0\u73af\u5883\u4e2d\u5bfc\u822a\u7684\u80fd\u529b\u3002\u7814\u7a76\u8fd8\u8868\u660e\uff0cLLMs\u53ef\u4ee5\u901a\u8fc7\u81ea\u6211\u6539\u8fdb\uff08\u5373\u5728\u5176\u81ea\u8eab\u751f\u6210\u7684\u6570\u636e\u4e0a\u5fae\u8c03\uff09\u6765\u8d85\u8d8a\u57fa\u7840\u6027\u80fd\u3002\u672c\u7814\u7a76\u65e8\u5728\u63a2\u7a76LLMs\u5728\u957f\u65f6\u5e8f\u4efb\u52a1\u7684\u590d\u6742\u73af\u5883\u2014\u2014WebArena\u57fa\u51c6\u4e2d\uff0c\u901a\u8fc7\u81ea\u6211\u6539\u8fdb\u80fd\u5426\u63d0\u5347\u5176\u8868\u73b0\u3002WebArena\u8981\u6c42\u4ee3\u7406\u81ea\u4e3b\u6d4f\u89c8\u7f51\u9875\u5e76\u6267\u884c\u64cd\u4f5c\u4ee5\u8fbe\u6210\u7279\u5b9a\u76ee\u6807\u3002\u6211\u4eec\u4f7f\u7528\u4e09\u79cd\u4e0d\u540c\u7684\u5408\u6210\u8bad\u7ec3\u6570\u636e\u6df7\u5408\u8fdb\u884c\u5fae\u8c03\uff0c\u5e76\u53d1\u73b0\u7ecf\u8fc7\u81ea\u6211\u6539\u8fdb\u540e\uff0c\u6a21\u578b\u5728WebArena\u57fa\u51c6\u4e0a\u7684\u4efb\u52a1\u5b8c\u6210\u7387\u63d0\u9ad8\u4e8631%\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86\u65b0\u7684\u8bc4\u4f30\u6307\u6807\uff0c\u7528\u4e8e\u66f4\u5168\u9762\u5730\u8bc4\u4f30\u6211\u4eec\u7684\u5fae\u8c03\u4ee3\u7406\u6a21\u578b\u7684\u884c\u4e3a\u6027\u80fd\u3001\u9c81\u68d2\u6027\u3001\u80fd\u529b\u4ee5\u53ca\u8f68\u8ff9\u8d28\u91cf\uff0c\u8fd9\u4e9b\u6307\u6807\u8d85\u8d8a\u4e86\u5f53\u524d\u4ec5\u4f9d\u8d
56\u4e8e\u6574\u4f53\u57fa\u51c6\u5206\u6570\u7684\u8bc4\u4f30\u65b9\u5f0f\u3002|\n", "2405.20304": "|**2024-05-30**|**Group Robust Preference Optimization in Reward-free RLHF**|Shyam Sundhar Ramesh et.al.|[2405.20304](http://arxiv.org/abs/2405.20304)|**[link](https://github.com/rsshyam/Group-robust-preference-optimization)**|**## \u7ffb\u8bd1 \u9488\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u7279\u5b9a\u4efb\u52a1\u8fdb\u884c\u9002\u5e94\u65f6\uff0c\u901a\u5e38\u9700\u8981\u901a\u8fc7\u57fa\u4e8e\u4eba\u7c7b\u53cd\u9988\u7684\u5f3a\u5316\u5b66\u4e60\uff08RLHF\uff09\u548c\u591a\u5143\u6807\u7b7e\u8005\u7fa4\u4f53\uff08\u5982\u4e0d\u540c\u6027\u522b\u3001\u79cd\u65cf\u3001\u516c\u53f8\u56e2\u961f\u7b49\uff09\u7684\u504f\u597d\u6570\u636e\u8fdb\u884c\u5fae\u8c03\u3002\u7136\u800c\uff0c\u4f20\u7edf\u65b9\u6cd5\u503e\u5411\u4e8e\u91c7\u7528\u201c\u4e00\u5200\u5207\u201d\u7684\u7b56\u7565\uff0c\u5373\u5047\u8bbe\u5e76\u4f18\u5316\u5355\u4e00\u7684\u504f\u597d\u6a21\u578b\uff0c\u5bf9\u5404\u7fa4\u4f53\u7684\u72ec\u7279\u7279\u6027\u548c\u9700\u6c42\u4e0d\u591f\u654f\u611f\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u7fa4\u4f53\u9c81\u68d2\u504f\u597d\u4f18\u5316\uff08GRPO\uff09\u65b9\u6cd5\uff0c\u65e8\u5728\u7a33\u5065\u5730\u4f7fLLMs\u9002\u5e94\u5404\u4e2a\u7fa4\u4f53\u7684\u504f\u597d\u3002GRPO\u65b9\u6cd5\u57fa\u4e8e\u65e0\u5956\u52b1\u76f4\u63a5\u504f\u597d\u4f18\u5316\uff0c\u4f46\u533a\u522b\u4e8e\u4ee5\u5f80\uff0c\u5b83\u76ee\u6807\u662f\u5bfb\u627e\u4e00\u4e2a\u80fd\u6700\u5927\u5316\u6700\u5dee\u7fa4\u4f53\u6027\u80fd\u7684\u9c81\u68d2\u7b56\u7565\u3002\u4e3a\u4e86\u5b9e\u73b0\u8fd9\u4e00\u76ee\u6807\uff0cGRPO\u4f1a\u52a8\u6001\u4e14\u9010\u6b21\u8c03\u6574\u4e0d\u540c\u7fa4\u4f53\u7684\u6743\u91cd\uff0c\u4f18\u5148\u5173\u6ce8\u7d2f\u79ef\u635f\u5931\u8f83\u9ad8\u7684\u7fa4\u4f53\u3002\u6211\u4eec\u5728\u7406\u8bba\u4e0a\u63a2\u8ba8\u4e86GRPO\u7684\u53ef\u884c\u6027\uff0c\u5e76\u5206\u6790\u4e86\u5176
\u5728\u5bf9\u6570\u7ebf\u6027\u7b56\u7565\u7c7b\u522b\u4e0b\u7684\u6536\u655b\u6027\u3002\u901a\u8fc7\u4f7f\u7528\u6765\u81ea\u4e0d\u540c\u7fa4\u4f53\u7684\u5168\u5c40\u610f\u89c1\u6570\u636e\u5bf9LLMs\u8fdb\u884cGRPO\u5fae\u8c03\uff0c\u6211\u4eec\u663e\u8457\u63d0\u9ad8\u4e86\u6700\u5dee\u7fa4\u4f53\u7684\u8868\u73b0\uff0c\u51cf\u5c11\u4e86\u7fa4\u4f53\u95f4\u635f\u5931\u7684\u4e0d\u5e73\u8861\uff0c\u540c\u65f6\u63d0\u9ad8\u4e86\u6982\u7387\u51c6\u786e\u6027\uff0c\u76f8\u8f83\u4e8e\u975e\u9c81\u68d2\u57fa\u7ebf\uff0c\u8fd9\u4e9b\u6539\u8fdb\u6548\u679c\u663e\u8457\u3002**|\n", "2405.20285": "|**2024-05-30**|**Who Writes the Review, Human or AI?**|Panagiotis C. Theocharopoulos et.al.|[2405.20285](http://arxiv.org/abs/2405.20285)|null|\u968f\u7740\u4eba\u5de5\u667a\u80fd\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4e2d\u7684\u5e7f\u6cdb\u5e94\u7528\uff0c\u4eba\u4eec\u5173\u6ce8\u5982\u4f55\u8bc6\u522b\u4e0d\u540c\u9886\u57df\u7684AI\u751f\u6210\u6587\u672c\u3002\u672c\u7814\u7a76\u65e8\u5728\u63a2\u8ba8\u8fd9\u4e2a\u95ee\u9898\uff0c\u901a\u8fc7\u63d0\u51fa\u4e00\u79cd\u65b9\u6cd5\u6765\u51c6\u786e\u533a\u5206\u4eba\u5de5\u667a\u80fd\u751f\u6210\u7684\u548c\u4eba\u7c7b\u64b0\u5199\u7684\u4e66\u8bc4\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5229\u7528\u8fc1\u79fb\u5b66\u4e60\uff0c\u8ba9\u6a21\u578b\u80fd\u591f\u5728\u4e0d\u540c\u4e3b\u9898\u95f4\u8bc6\u522b\u751f\u6210\u6587\u672c\uff0c\u540c\u65f6\u63d0\u9ad8\u5176\u8bc6\u522b\u5199\u4f5c\u98ce\u683c\u548c\u8bcd\u6c47\u53d8\u5316\u7684\u80fd\u529b\u3002\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u6570\u636e\u96c6\uff0c\u5305\u542b\u771f\u5b9e\u7684\u4e66\u8bc4\u548c\u4f7f\u7528Vicuna\u5f00\u6e90\u8bed\u8a00\u6a21\u578b\u751f\u6210\u7684\u6a21\u62df\u8bc4\u8bba\uff0c\u4ee5\u8bc4\u4f30\u6240\u63d0\u65b9\u6cd5\u7684\u6709\u6548\u6027\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u8bc6\u522b\u6587\u672c\u539f\u521b\u6765\u6e90\u662f\u53ef\u884c\u7684\uff0c\u51c6\u786e\u7387\u8fbe\u523096.86%\u3002\u6211\u4eec\u7684\u5de5\u4f5
c\u805a\u7126\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u6587\u672c\u8bc6\u522b\u65b9\u9762\u7684\u6027\u80fd\u4e0e\u5c40\u9650\u6027\u7814\u7a76\uff0c\u8fd9\u5bf9\u4e8e\u672a\u6765\u6709\u6548\u7ba1\u7406\u6b64\u7c7b\u6a21\u578b\u4ee5\u53ca\u786e\u4fdd\u4eba\u7c7b\u521b\u4f5c\u5185\u5bb9\u7684\u5b8c\u6574\u6027\u548c\u771f\u5b9e\u6027\u5177\u6709\u91cd\u8981\u610f\u4e49\u3002|\n", "2405.21075": "|**2024-05-31**|**Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis**|Chaoyou Fu et.al.|[2405.21075](http://arxiv.org/abs/2405.21075)|null|\u5728\u4eba\u5de5\u667a\u80fd\u7684\u8ffd\u6c42\u4e2d\uff0c\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u5df2\u6210\u4e3a\u8fd1\u671f\u8fdb\u6b65\u7684\u6838\u5fc3\u3002\u7136\u800c\uff0c\u5bf9\u5b83\u4eec\u5904\u7406\u5e8f\u5217\u89c6\u89c9\u6570\u636e\u7684\u80fd\u529b\u7684\u5173\u6ce8\u5c1a\u663e\u4e0d\u8db3\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5728\u672c\u6587\u4e2d\u63d0\u51faVideo-MME\uff0c\u8fd9\u662f\u9996\u4e2a\u5168\u9762\u8bc4\u4f30MLLMs\u5728\u89c6\u9891\u5206\u6790\u6027\u80fd\u7684\u591a\u6a21\u6001\u8bc4\u4f30\u57fa\u51c6\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u6709\u56db\u4e2a\u5173\u952e\u7279\u6027\uff1a1\uff09\u89c6\u9891\u7c7b\u578b\u591a\u6837\uff0c\u6db5\u76d66\u4e2a\u4e3b\u8981\u89c6\u89c9\u9886\u57df\u548c30\u4e2a\u5b50\u9886\u57df\uff0c\u786e\u4fdd\u5e7f\u6cdb\u7684\u5e94\u7528\u573a\u666f\u6cdb\u5316\u80fd\u529b\uff1b2\uff09\u65f6\u95f4\u7ef4\u5ea6\u7684\u8de8\u5ea6\uff0c\u5305\u62ec\u77ed\u3001\u4e2d\u3001\u957f\u671f\u89c6\u9891\uff0c\u4ece11\u79d2\u52301\u5c0f\u65f6\uff0c\u4ee5\u68c0\u9a8c\u6a21\u578b\u5bf9\u590d\u6742\u60c5\u5883\u52a8\u6001\u7684\u9002\u5e94\u6027\uff1b3\uff09\u6570\u636e\u6a21\u6001\u7684\u5e7f\u5ea6\uff0c\u7ed3\u5408\u89c6\u9891\u5e27\u4ee5\u5916\u7684\u591a\u79cd\u8f93\u5165\uff0c\u5982\u5b57\u5e55\u548c\u97f3\u9891\uff0c\u63ed\u793aMLLMs\u7684\u5168\u65b9\u4f4d\u80fd\u529b\uff1b4\uff09\u9ad8\u8d28\u91cf\u76
84\u6807\u6ce8\uff0c\u7531\u4e13\u5bb6\u4e25\u683c\u624b\u52a8\u6807\u8bb0\uff0c\u4ee5\u4fdd\u8bc1\u7cbe\u786e\u4e14\u53ef\u9760\u7684\u6a21\u578b\u8bc4\u4f30\u3002\u6211\u4eec\u7cbe\u5fc3\u6311\u9009\u5e76\u624b\u52a8\u6ce8\u89e3\u4e86900\u6bb5\u89c6\u9891\uff0c\u603b\u65f6\u957f\u8fbe\u5230256\u5c0f\u65f6\uff0c\u751f\u6210\u4e862,700\u4e2a\u95ee\u9898-\u7b54\u6848\u5bf9\u3002\u901a\u8fc7Video-MME\uff0c\u6211\u4eec\u5bf9\u5305\u62ecGPT-4\u7cfb\u5217\u3001Gemini 1.5 Pro\u5728\u5185\u7684\u591a\u4e2a\u6700\u5148\u8fdb\u7684MLLM\uff0c\u4ee5\u53ca\u5f00\u6e90\u56fe\u50cf\u6a21\u578bInternVL-Chat-V1.5\u548c\u89c6\u9891\u6a21\u578bLLaVA-NeXT-Video\u8fdb\u884c\u4e86\u6df1\u5165\u8bc4\u4f30\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cGemini 1.5 Pro\u662f\u8868\u73b0\u6700\u4f73\u7684\u5546\u4e1a\u6a21\u578b\uff0c\u660e\u663e\u4f18\u4e8e\u5f00\u6e90\u6a21\u578b\u3002\u6211\u4eec\u7684\u6570\u636e\u96c6\u548c\u53d1\u73b0\u5f3a\u8c03\u4e86\u6539\u8fdb\u5904\u7406\u66f4\u957f\u5e8f\u5217\u548c\u591a\u6a21\u6001\u6570\u636e\u7684\u5fc5\u8981\u6027\u3002\u9879\u76ee\u7f51\u9875\u94fe\u63a5\uff1ahttps://video-mme.github.io|\n", "2405.21047": "|**2024-05-31**|**Grammar-Aligned Decoding**|Kanghee Park 
et.al.|[2405.21047](http://arxiv.org/abs/2405.21047)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u751f\u6210\u9ad8\u5ea6\u7ed3\u6784\u5316\u7684\u8f93\u51fa\u65f6\u9762\u4e34\u6311\u6218\uff0c\u5982\u7a0b\u5e8f\u4ee3\u7801\u3001\u6570\u5b66\u516c\u5f0f\u6216\u89c4\u8303\u7684\u6807\u8bb0\u3002\u7ea6\u675f\u89e3\u7801\u65b9\u6cd5\u901a\u8fc7\u9650\u5236\u6bcf\u6b21\u8f93\u51fa\u53ef\u80fd\u7684\u4ee4\u724c\uff0c\u786e\u4fdd\u8f93\u51fa\u7b26\u5408\u7279\u5b9a\u89c4\u5219\u6765\u7f13\u89e3\u8fd9\u4e2a\u95ee\u9898\uff0c\u4f8b\u5982\u5728\u8bed\u6cd5\u7ea6\u675f\u89e3\u7801\uff08GCD\uff09\u4e2d\uff0cLLM\u7684\u8f93\u51fa\u5fc5\u987b\u9075\u5faa\u7ed9\u5b9a\u7684\u8bed\u6cd5\u89c4\u5219\u3002\u7136\u800c\uff0c\u7814\u7a76\u8868\u660e\uff0c\u8fd9\u79cd\u7ea6\u675f\u89e3\u7801\u53ef\u80fd\u4f1a\u626d\u66f2\u6a21\u578b\u7684\u5206\u5e03\uff0c\u5bfc\u81f4\u751f\u6210\u7684\u8f93\u51fa\u867d\u7136\u8bed\u6cd5\u6b63\u786e\uff0c\u4f46\u5176\u6982\u7387\u5e76\u4e0d\u76f4\u63a5\u53cd\u6620LLM\u672c\u8eab\u7684\u6982\u7387\u5206\u914d\uff0c\u4ece\u800c\u8d28\u91cf\u4e0d\u9ad8\u3002\u6211\u4eec\u79f0\u4e4b\u4e3a\u201c\u4e0e\u8bed\u6cd5\u7ea6\u675f\u5bf9\u9f50\u7684\u89e3\u7801\u201d\uff08Grammar-Aligned Decoding\uff0cGAD\uff09\uff0c\u5e76\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201c\u81ea\u9002\u5e94\u91c7\u6837\u4e0e\u8fd1\u4f3c\u671f\u671b\u672a\u6765\u201d\uff08Adaptive Sampling with Approximate Expected Futures\uff0cASAp\uff09\u7684\u89e3\u7801\u7b97\u6cd5\u3002 
ASAp\u7b97\u6cd5\u65e8\u5728\u4fdd\u8bc1\u8f93\u51fa\u7684\u8bed\u6cd5\u6027\uff0c\u5e76\u7406\u8bba\u4e0a\u4ea7\u751f\u4e0eLLM\u5728\u7ed9\u5b9a\u8bed\u6cd5\u7ea6\u675f\u6761\u4ef6\u4e0b\u7684\u6761\u4ef6\u6982\u7387\u76f8\u7b26\u7684\u7ed3\u679c\u3002\u8be5\u7b97\u6cd5\u5229\u7528\u5148\u524d\u7684\u6837\u672c\u8f93\u51fa\u6765\u7a33\u5065\u5730\u4f30\u7b97\u4e0d\u540c\u8f93\u51fa\u524d\u7f00\u7684\u672a\u6765\u8bed\u6cd5\u53ef\u80fd\u6027\u3002\u6211\u4eec\u5728\u4ee3\u7801\u751f\u6210\u548c\u7ed3\u6784\u5316\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\u4e0a\u7684\u5b9e\u9a8c\u8868\u660e\uff0cASAp\u7ecf\u5e38\u80fd\u591f\u751f\u6210\u6bd4\u73b0\u6709GCD\u6280\u672f\u66f4\u7b26\u5408LLM\u5206\u5e03\u4e14\u4ecd\u9075\u5b88\u6240\u9700\u8bed\u6cd5\u9650\u5236\u7684\u8f93\u51fa\uff0c\u4ece\u800c\u63d0\u9ad8\u4e86\u6574\u4f53\u8d28\u91cf\u3002|\n", "2405.21040": "|**2024-05-31**|**Direct Alignment of Language Models via Quality-Aware Self-Refinement**|Runsheng Yu et.al.|[2405.21040](http://arxiv.org/abs/2405.21040)|null|\u5f3a\u5316\u5b66\u4e60\u4ece\u4eba\u7c7b\u53cd\u9988\uff08RLHF\uff09\u662f\u8c03\u6574\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u884c\u4e3a\u4ee5\u7b26\u5408\u4eba\u7c7b\u504f\u597d\u7684\u5e38\u7528\u65b9\u6cd5\u3002\u6700\u8fd1\uff0c\u76f4\u63a5\u7b56\u7565\u4f18\u5316\uff08DPO\uff09\u4f5c\u4e3a\u4e00\u79cd\u66ff\u4ee3\u65b9\u6848\u5174\u8d77\uff0c\u5b83\u4e0d\u518d\u4f9d\u8d56LLM\u5956\u52b1\u6a21\u578b\uff0c\u4ece\u800c\u51cf\u5c11\u4e86\u989d\u5916\u7684\u5185\u5b58\u548c\u8bad\u7ec3\u65f6\u95f4\u3002\u7136\u800c\uff0cDPO\u5ffd\u89c6\u4e86\u6b63\u5411\u548c\u8d1f\u5411\u54cd\u5e94\u7684\u76f8\u5bf9\u8d28\u91cf\uff0c\u53ef\u80fd\u5bfc\u81f4\u8bad\u7ec3\u7ed3\u679c\u4e0d\u7406\u60f3\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63a2\u8ba8\u5229\u7528LLM\u5185\u90e8\u77e5\u8bc6\u5728\u5373\u65f6\u5fae\u8c03\u8fc7\u7a0b\u4e2d\u83b7\u53d6\u54cd\u5e94\u7684\u8d28\u91cf\uff0c\u5e76\u4f18\u5316\u635f\u5931\u51fd\u6570
\u3002\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u79cd\u7ec6\u5316\u51fd\u6570\uff0c\u5229\u7528LLM\u7684\u77e5\u8bc6\u6765\u4f30\u8ba1\u6b63\u5411\u548c\u8d1f\u5411\u54cd\u5e94\u7684\u54c1\u8d28\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u5728\u8f7b\u5ea6\u5047\u8bbe\u4e0b\uff0c\u6784\u5efa\u7684\u7ec6\u5316\u51fd\u6570\u80fd\u591f\u5e2e\u52a9\u81ea\u6211\u8c03\u6574\u635f\u5931\u51fd\u6570\u3002\u6211\u4eec\u5c06\u8fd9\u4e2a\u7ec6\u5316\u529f\u80fd\u6574\u5408\u5230DPO\u53ca\u5176\u53d8\u4f53\u8eab\u4efd\u7b56\u7565\u4f18\u5316\uff08IPO\uff09\u4e2d\u3002\u5b9e\u9a8c\u8bc1\u660e\uff0c\u8fd9\u4e9b\u6539\u8fdb\u540e\u7684\u6a21\u578b\u5728\u5404\u79cd\u8bc4\u4f30\u8005\u4e0a\u8868\u73b0\u51fa\u4f18\u4e8eDPO\u548cIPO\u7684\u6027\u80fd\u3002|\n", "2405.21030": "|**2024-05-31**|**Standards for Belief Representations in LLMs**|Daniel A. Herrmann et.al.|[2405.21030](http://arxiv.org/abs/2405.21030)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5404\u4e2a\u9886\u57df\u5c55\u73b0\u51fa\u975e\u51e1\u80fd\u529b\uff0c\u8ba1\u7b97\u673a\u79d1\u5b66\u5bb6\u4eec\u6b63\u5728\u5bfb\u6c42\u7406\u89e3\u5b83\u4eec\u7684\u8ba4\u77e5\u8fc7\u7a0b\uff0c\u7279\u522b\u662f\u5173\u4e8eLLMs\u5982\u4f55\uff08\u5982\u679c\u6709\u7684\u8bdd\uff09\u5185\u90e8\u6784\u5efa\u5bf9\u4e16\u754c\u7684\u4fe1\u5ff5\u3002\u7136\u800c\uff0c\u76ee\u524d\u5c1a\u7f3a\u4e4f\u4e00\u4e2a\u7edf\u4e00\u7684\u7406\u8bba\u6846\u67b6\u6765\u652f\u6491\u5bf9LLM\u4e2d\u4fe1\u5ff5\u7684\u7814\u7a76\u3002\u672c\u6587\u8bd5\u56fe\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u63d0\u51fa\u4e86\u4e00\u5957\u6761\u4ef6\uff0c\u4f7fLLM\u4e2d\u7684\u8868\u793a\u80fd\u591f\u88ab\u89c6\u4e3a\u4fe1\u5ff5\u4f3c\u7684\u3002\u6211\u4eec\u6307\u51fa\uff0c\u5c3d\u7ba1\u5728LLMs\u4e2d\u6d4b\u91cf\u4fe1\u5ff5\u7684\u9879\u76ee\u4e0e\u51b3\u7b56\u7406\u8bba\u548c\u5f62\u5f0f\u8ba4\u8bc6\u8bba\u4e2d\u7684\u4fe1\u5ff5\u6d4b\u91cf\u5728\u8bb8\u591a\u65b9\u9762\u6709\u76f8\u4f3c\u4e4b\u5904\uff0c\u4f46\u4e5f\u5b58\u5728\u5dee\u5f
02\uff0c\u8fd9\u4e9b\u5dee\u5f02\u5e94\u5f71\u54cd\u6211\u4eec\u7684\u6d4b\u91cf\u65b9\u6cd5\u3002\u56e0\u6b64\uff0c\u501f\u9274\u54f2\u5b66\u6d1e\u5bdf\u548c\u673a\u5668\u5b66\u4e60\u7684\u5f53\u4ee3\u5b9e\u8df5\uff0c\u6211\u4eec\u786e\u7acb\u4e86\u56db\u4e2a\u6807\u51c6\uff1a\u51c6\u786e\u6027\u3001\u4e00\u81f4\u6027\u3001\u7edf\u4e00\u6027\u548c\u5b9e\u7528\u6027\u3002\u8fd9\u56db\u4e2a\u6807\u51c6\u7ed3\u5408\u4e86\u7406\u8bba\u8003\u91cf\u4e0e\u5b9e\u9645\u9650\u5236\uff0c\u4e3a\u5168\u9762\u7406\u89e3LLM\u4e2d\u7684\u4fe1\u5ff5\u8868\u793a\u5960\u5b9a\u4e86\u57fa\u7840\u3002\u6211\u4eec\u5f15\u7528\u5b9e\u8bc1\u5de5\u4f5c\u7684\u6210\u679c\uff0c\u63ed\u793a\u4e86\u5355\u72ec\u4f7f\u7528\u67d0\u4e9b\u6807\u51c6\u65f6\u8bc6\u522b\u4fe1\u5ff5\u8868\u793a\u7684\u5c40\u9650\u6027\u3002|\n", "2405.21028": "|**2024-05-31**|**LACIE: Listener-Aware Finetuning for Confidence Calibration in Large Language Models**|Elias Stengel-Eskin et.al.|[2405.21028](http://arxiv.org/abs/2405.21028)|**[link](https://github.com/esteng/pragmatic_calibration)**|**\u5f53\u56de\u7b54\u95ee\u9898\u65f6\uff0c\u8bed\u8a00\u6a21\u578b\u4e0d\u4ec5\u80fd\u63d0\u4f9b\u7b54\u6848\uff0c\u8fd8\u80fd\u4f20\u8fbe\u5bf9\u7b54\u6848\u6b63\u786e\u6027\u7684\u4fe1\u5fc3\u7a0b\u5ea6\u3002\u8fd9\u5305\u62ec\u660e\u786e\u7684\u5206\u6570\u6807\u8bb0\uff0c\u5982\u7ed9\u51fa\u6570\u5b57\uff0c\u4ee5\u53ca\u9690\u542b\u7684\u4fe1\u5fc3\u6807\u5fd7\uff0c\u5982\u6743\u5a01\u8bed\u6c14\u6216\u63d0\u4f9b\u989d\u5916\u77e5\u8bc6\u3002\u7136\u800c\uff0c\u5f53\u524d\u5927\u591a\u6570\u6a21\u578b\u5f80\u5f80\u8fc7\u4e8e\u81ea\u4fe1\u3002\u4e3a\u4e86\u6821\u51c6\u8fd9\u4e9b\u4fe1\u5fc3\u5ea6\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u5b9e\u7528\u7684\u3001\u8003\u8651\u542c\u4f17\u7684\u5fae\u8c03\u65b9\u6cd5\uff08LACIE\uff09\uff0c\u5b83\u4e0d\u4ec5\u5173\u6ce8\u7b54\u6848\u662f\u5426\u6b63\u786e\uff0c\u8fd8\u5173\u6ce8\u7b54\u6848\u662f\u5426\u4f1a\u88ab\u542c\u4f17\u63a5\u53d7\u3002\u6211\u4eec\u5c06\u6821\u51c6\
u89c6\u4e3a\u504f\u597d\u4f18\u5316\uff0c\u901a\u8fc7\u53cc\u4ee3\u7406\u6e38\u620f\u521b\u5efa\u6570\u636e\uff0c\u8ba9\u4e00\u4e2a\u6f14\u8bb2\u8005\u6a21\u578b\u7684\u8f93\u51fa\u63a5\u53d7\u6a21\u62df\u542c\u8005\u7684\u8bc4\u5224\u3002\u7136\u540e\uff0c\u6211\u4eec\u4f7f\u7528LACIE\u5bf9\u4e09\u4e2a\u8bed\u8a00\u6a21\u578b\uff08Mistral-7B\u3001Llama3-8B\u548cLlama3-70B\uff09\u8fdb\u884c\u5fae\u8c03\uff0c\u5e76\u663e\u793a\u7ecf\u8fc7\u5fae\u8c03\u7684\u6a21\u578b\u5728\u6a21\u62df\u542c\u8005\u9762\u524d\u6709\u66f4\u597d\u7684\u6821\u51c6\u3002\u91cd\u8981\u7684\u662f\uff0c\u8fd9\u4e9b\u8d8b\u52bf\u4e5f\u9002\u7528\u4e8e\u4eba\u7c7b\u542c\u4f17\uff0c\u5e2e\u52a9\u4ed6\u4eec\u66f4\u51c6\u786e\u5730\u9884\u6d4b\u6a21\u578b\u7684\u6b63\u786e\u6027\uff1a\u6211\u4eec\u5728\u4eba\u673a\u8bc4\u4f30\u4e2d\u53d1\u73b0\uff0c\u7ecf\u8fc7LACIE\u8bad\u7ec3\u7684\u6a21\u578b\u63a5\u53d7\u7684\u9519\u8bef\u7b54\u6848\u51cf\u5c11\u4e8647%\uff0c\u800c\u6b63\u786e\u7b54\u6848\u7684\u63a5\u53d7\u7387\u4fdd\u6301\u4e0d\u53d8\u3002\u6b64\u5916\uff0cLACIE\u6cdb\u5316\u5230\u53e6\u4e00\u4e2a\u6570\u636e\u96c6\u4e0a\uff0c\u5728\u4f7f\u7528TriviaQA\u8bad\u7ec3\u540e\uff0cTruthfulQA\u4e0a\u7684\u771f\u5b9e\u6027\u5927\u5e45\u63d0\u9ad8\u3002\u6211\u4eec\u7684\u5206\u6790\u8868\u660e\uff0cLACIE\u5bfc\u81f4\u4e86\u6b63\u786e\u548c\u9519\u8bef\u793a\u4f8b\u4e4b\u95f4\u7684\u4fe1\u5fc3\u5ea6\u66f4\u597d\u5730\u5206\u79bb\u3002\u5b9a\u6027\u4e0a\uff0c\u6211\u4eec\u53d1\u73b0\u7ecf\u8fc7LACIE\u8bad\u7ec3\u7684\u6a21\u578b\u4f1a\u66f4\u52a0\u8c28\u614e\uff0c\u5e76\u5728\u56de\u7b54\u6b63\u786e\u65f6\u901a\u8fc7\u4f7f\u7528\u6743\u5a01\u8bed\u6c14\u6216\u63d0\u4f9b\u7ec6\u8282\u6765\u9690\u6027\u5730\u8868\u793a\u786e\u5b9a\u6027\u3002\u6700\u540e\uff0cLACIE\u5fae\u8c03\u5bfc\u81f4\u6a21\u578b\u5bf9\u4e8e\u53ef\u80fd\u9519\u8bef\u7684\u7b54\u6848\u66f4\u503e\u5411\u4e8e\u653e\u5f03\uff08\u4f8b\u5982\u8bf4\u201c\u6211\u4e0d\u77e5\u9053\u201d\uff09\u3002**|\n", "2405.21018": 
"|**2024-05-31**|**Improved Techniques for Optimization-Based Jailbreaking on Large Language Models**|Xiaojun Jia et.al.|[2405.21018](http://arxiv.org/abs/2405.21018)|**[link](https://github.com/jiaxiaojunqaq/i-gcg)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5feb\u901f\u53d1\u5c55\uff0c\u5176\u5b89\u5168\u6821\u51c6\u6210\u4e3a\u5e7f\u6cdb\u5e94\u7528\u7684\u5173\u952e\u3002\u9488\u5bf9\u8fd9\u4e9b\u6a21\u578b\u7684\u7834\u89e3\uff08\u5373\u201cjailbreaking\u201d\uff09\u6d3b\u52a8\u65e5\u76ca\u589e\u591a\uff0c\u5176\u4e2d\u8d2a\u5a6a\u5750\u6807\u68af\u5ea6\uff08GCG\uff09\u653b\u51fb\u56e0\u5176\u6210\u6548\u663e\u8457\u800c\u53d7\u5230\u5173\u6ce8\u3002\u7136\u800c\uff0cGCG\u7684\u653b\u51fb\u6548\u7387\u4ecd\u6709\u63d0\u5347\u7a7a\u95f4\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u7cfb\u5217\u6539\u8fdb\u7684\u4f18\u5316\u57fa\u7ebf\u7834\u89e3\u6280\u672f\uff0c\u4ee5\u63d0\u5347GCG\u7684\u6027\u80fd\u3002\u9996\u5148\uff0c\u6211\u4eec\u6ce8\u610f\u5230\u5355\u4e2a\u76ee\u6807\u6a21\u677f\u201cSure\u201d\u6781\u5927\u5730\u9650\u5236\u4e86GCG\u7684\u653b\u51fb\u6548\u679c\uff0c\u56e0\u6b64\u6211\u4eec\u5efa\u8bae\u91c7\u7528\u5305\u542b\u6709\u5bb3\u81ea\u6211\u6697\u793a\u548c/\u6216\u6307\u5bfc\u7684\u591a\u6837\u5316\u76ee\u6807\u6a21\u677f\uff0c\u4ee5\u8bef\u5bfc\u6a21\u578b\u3002\u5728\u4f18\u5316\u7b56\u7565\u4e0a\uff0c\u6211\u4eec\u5efa\u8bae\u5728GCG\u4e2d\u5b9e\u65bd\u81ea\u52a8\u591a\u5750\u6807\u66f4\u65b0\uff0c\u4ee5\u52a0\u901f\u6536\u655b\uff0c\u5e76\u5f15\u5165\u4ece\u7b80\u5355\u5230\u590d\u6742\uff08easy-to-hard\uff09\u7684\u521d\u59cb\u5316\u6280\u5de7\u3002\u5c06\u8fd9\u4e9b\u6539\u8fdb\u6574\u5408\uff0c\u6211\u4eec\u5f00\u53d1\u51fa\u4e00\u79cd\u9ad8\u6548\u7684\u65b9\u6cd5\u2014\u2014$\\mathcal{I}$-GCG\u3002\u5b9e\u9a8c\u5728\u4e00\u7cfb\u5217\u57fa\u51c6\u6d4b\u8bd5\uff0c\u5982NeurIPS 2023 
\u7ea2\u961f\u6311\u6218\u4e2d\u8fdb\u884c\uff0c\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u6539\u8fdb\u6280\u672f\u80fd\u591f\u5e2e\u52a9GCG\u8d85\u8d8a\u73b0\u6709\u7834\u89e3\u653b\u51fb\uff0c\u5b9e\u73b0\u63a5\u8fd1100%\u7684\u653b\u51fb\u6210\u529f\u7387\u3002\u4ee3\u7801\u5df2\u53d1\u5e03\u5728https://github.com/jiaxiaojunQAQ/I-GCG\u3002**|\n", "2405.20985": "|**2024-05-31**|**DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models**|Linli Yao et.al.|[2405.20985](http://arxiv.org/abs/2405.20985)|**[link](https://github.com/yaolinli/deco)**|\u8be5\u7814\u7a76\u5173\u6ce8\u4e8e\u591a\u6a21\u6001\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u4e2d\u7684\u6295\u5f71\u5668\u6a21\u5757\uff0c\u56e0\u4e3a\u5b83\u4eec\u5728\u8fde\u63a5\u89c6\u89c9\u548c\u8bed\u8a00\u6a21\u6001\u3001\u4fc3\u8fdb\u8de8\u6a21\u6001\u5bf9\u9f50\u65b9\u9762\u53d1\u6325\u5173\u952e\u4f5c\u7528\u3002\u7136\u800c\uff0c\u76ee\u524d\u5bf9\u4e8e\u6295\u5f71\u5668\u5728\u89c6\u89c9-\u8bed\u8a00\u5bf9\u9f50\u65b9\u9762\u7684\u6548\u679c\u8bc4\u4f30\u4ecd\u663e\u4e0d\u8db3\uff0c\u901a\u5e38\u53ea\u80fd\u901a\u8fc7\u4e0b\u6e38\u4efb\u52a1\u7684\u6027\u80fd\u95f4\u63a5\u63a8\u65ad\u3002\u4e3a\u6b64\uff0c\u672c\u7814\u7a76\u901a\u8fc7\u5206\u6790MLLM\u4e2d\u7684\u89c6\u89c9-\u8bed\u8a00\u8bed\u4e49\u6d41\uff0c\u6765\u89e3\u8bfb\u6295\u5f71\u5668\u7684\u5de5\u4f5c\u673a\u5236\u3002 
\u5177\u4f53\u6765\u8bf4\uff0c\u7814\u7a76\u8005\u8ffd\u8e2a\u4ece\u751f\u6210\u7684\u8bed\u8a00\u6807\u8bb0\u5230\u539f\u59cb\u89c6\u89c9\u7f16\u7801\u5757\u4ee5\u53ca\u6295\u5f71\u5668\u4ea7\u751f\u7684\u4e2d\u95f4\u8f93\u51fa\u4e4b\u95f4\u7684\u8bed\u4e49\u76f8\u5173\u6027\u6d41\u3002\u53d1\u73b0\u538b\u7f29\u578b\u6295\u5f71\u5668\uff08\u5982QFormer\uff09\u503e\u5411\u4e8e\u5c06\u89c6\u89c9\u5757\u62bd\u8c61\u6210\u6709\u9650\u7684\u51e0\u4e2a\u6982\u5ff5\uff0c\u5982\u7269\u4f53\u6216\u5c5e\u6027\uff0c\u5bfc\u81f4\u201c\u53cc\u91cd\u62bd\u8c61\u201d\u73b0\u8c61\uff1a\u9996\u5148\uff0c\u6295\u5f71\u5668\u53c2\u7167\u9884\u5b9a\u4e49\u67e5\u8be2\u4ee4\u724c\u8fdb\u884c\u89c6\u89c9\u8bed\u4e49\u62bd\u8c61\uff0c\u7136\u540e\uff0c\u57fa\u4e8e\u6587\u672c\u6307\u4ee4\u7684\u5927\u8bed\u8a00\u6a21\u578b\u8fdb\u4e00\u6b65\u63d0\u53d6\u3002\u8fd9\u79cd\u53cc\u91cd\u62bd\u8c61\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u6548\u7387\u4e0d\u9ad8\uff0c\u5e76\u53ef\u80fd\u5bfc\u81f4\u89c6\u89c9\u8bed\u4e49\u4fe1\u606f\u7684\u7d2f\u79ef\u7f3a\u5931\u3002 
\u4e3a\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u7814\u7a76\u63d0\u51fa\u201c\u89e3\u8026\u538b\u7f29\u4e0e\u62bd\u8c61\uff08DeCo\uff09\u201d\u7684\u5173\u952e\u6d1e\u5bdf\uff0c\u5373\u5728\u6295\u5f71\u5c42\u9762\u4e0a\u5c06\u89c6\u89c9\u4ee4\u724c\u6570\u91cf\u538b\u7f29\uff0c\u800c\u8ba9\u5927\u8bed\u8a00\u6a21\u578b\u5b8c\u5168\u8d1f\u8d23\u89c6\u89c9\u8bed\u4e49\u62bd\u8c61\u3002\u56e0\u6b64\uff0c\u7814\u7a76\u4eba\u5458\u91c7\u7528\u4e86\u4e00\u79cd\u7b80\u5355\u7684\u538b\u7f29\u5668\u2014\u2014\u4e8c\u7ef4\u81ea\u9002\u5e94\u6c60\u5316\uff0c\u4ee5\u65e0\u53c2\u6570\u7684\u65b9\u5f0f\u964d\u4f4e\u89c6\u89c9\u5757\u7684\u5c3a\u5bf8\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cDeCo\u5728\u6027\u80fd\u548c\u6548\u7387\u4e0a\u90fd\u4f18\u4e8e\u4f20\u7edf\u7684\u538b\u7f29\u6295\u5f71\u5668\u3002\u5b83\u5728MLLM\u57fa\u51c6\u3001\u89c6\u89c9\u5b9a\u4f4d\u548c\u5f00\u653e\u6027\u89c6\u89c9\u95ee\u7b54\u4efb\u52a1\u4e2d\u5206\u522b\u53d6\u5f97\u4e860.9%\u30017.1%\u548c2.9%\u7684\u6027\u80fd\u63d0\u5347\uff0c\u540c\u65f6\u62e5\u6709\u66f4\u5c11\u7684\u53ef\u8bad\u7ec3\u53c2\u6570\u548c\u66f4\u5feb\u7684\u6536\u655b\u901f\u5ea6\u3002|\n", "2405.20978": "|**2024-05-31**|**Enhancing Noise Robustness of Retrieval-Augmented Language Models with Adaptive Adversarial Training**|Feiteng Fang 
et.al.|[2405.20978](http://arxiv.org/abs/2405.20978)|**[link](https://github.com/calubkk/raat)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5c55\u73b0\u51fa\u5f3a\u5927\u529f\u80fd\uff0c\u4f46\u9762\u4e34\u6311\u6218\uff0c\u5982\u865a\u6784\u3001\u8fc7\u65f6\u77e5\u8bc6\u548c\u96be\u4ee5\u8ffd\u6eaf\u7684\u63a8\u7406\u8fc7\u7a0b\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u4f5c\u4e3a\u4e00\u79cd\u6709\u524d\u666f\u7684\u65b9\u6cd5\u5d2d\u9732\u5934\u89d2\uff0c\u5b83\u7ed3\u5408\u5916\u90e8\u6570\u636e\u5e93\u7684\u77e5\u8bc6\u3002\u7136\u800c\uff0c\u4e0d\u9002\u5f53\u7684\u68c0\u7d22\u6bb5\u843d\u53ef\u80fd\u59a8\u788dLLMs\u751f\u6210\u5168\u9762\u4e14\u9ad8\u8d28\u91cf\u7684\u56de\u7b54\u3002\u5148\u524d\u5173\u4e8eRAG\u4e2d\u68c0\u7d22\u566a\u58f0\u7a33\u5065\u6027\u7684\u7814\u7a76\u5f80\u5f80\u5c40\u9650\u4e8e\u6709\u9650\u7684\u566a\u58f0\u7c7b\u578b\uff0c\u8fd9\u4e0e\u73b0\u5b9e\u4e16\u754c\u7684\u68c0\u7d22\u73af\u5883\u4e0d\u7b26\uff0c\u9650\u5236\u4e86\u5b9e\u9645\u5e94\u7528\u3002\u672c\u7814\u7a76\u9996\u5148\u63a2\u8ba8\u4e86\u68c0\u7d22\u566a\u58f0\uff0c\u5e76\u5c06\u5176\u5206\u4e3a\u4e09\u79cd\u4e0d\u540c\u7684\u7c7b\u522b\uff0c\u53cd\u6620\u771f\u5b9e\u73af\u5883\u3002\u6211\u4eec\u5206\u6790\u4e86\u8fd9\u4e9b\u4e0d\u540c\u7c7b\u578b\u7684\u68c0\u7d22\u566a\u58f0\u5bf9LLMs\u7a33\u5065\u6027\u7684\u5f71\u54cd\u3002 
\u63a5\u7740\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684RAG\u65b9\u6cd5\uff0c\u79f0\u4e3a\u68c0\u7d22\u589e\u5f3a\u81ea\u9002\u5e94\u5bf9\u6297\u8bad\u7ec3\uff08RAAT\uff09\u3002RAAT\u5229\u7528\u81ea\u9002\u5e94\u5bf9\u6297\u8bad\u7ec3\u6765\u52a8\u6001\u8c03\u6574\u6a21\u578b\u7684\u8bad\u7ec3\u6d41\u7a0b\u4ee5\u5e94\u5bf9\u68c0\u7d22\u566a\u58f0\uff0c\u5e76\u91c7\u7528\u591a\u4efb\u52a1\u5b66\u4e60\u786e\u4fdd\u6a21\u578b\u80fd\u591f\u8bc6\u522b\u5608\u6742\u7684\u4e0a\u4e0b\u6587\u3002\u5927\u91cf\u7684\u5b9e\u9a8c\u8868\u660e\uff0c\u5728\u5404\u79cd\u566a\u58f0\u6761\u4ef6\u4e0b\uff0c\u4f7f\u7528RAAT\u8bad\u7ec3\u7684LLaMA-2 7B\u6a21\u578b\u5728F1\u548cEM\u5206\u6570\u4e0a\u663e\u793a\u51fa\u663e\u8457\u63d0\u5347\u3002\u4e3a\u4e86\u4fbf\u4e8e\u590d\u73b0\uff0c\u6211\u4eec\u5df2\u5728https://github.com/calubkk/RAAT\u4e0a\u53d1\u5e03\u4e86\u6211\u4eec\u7684\u4ee3\u7801\u548c\u6570\u636e\u3002|\n", "2405.20974": "|**2024-05-31**|**SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales**|Tianyang Xu 
et.al.|[2405.20974](http://arxiv.org/abs/2405.20974)|**[link](https://github.com/xu1868/sayself)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5e38\u5e38\u4ea7\u751f\u4e0d\u51c6\u786e\u6216\u865a\u5047\u7684\u4fe1\u606f\uff0c\u5e76\u4e14\u901a\u5e38\u65e0\u6cd5\u8868\u660e\u5176\u4fe1\u5fc3\u6c34\u5e73\uff0c\u8fd9\u9650\u5236\u4e86\u5b83\u4eec\u7684\u5e7f\u6cdb\u5e94\u7528\u3002\u5148\u524d\u7684\u7814\u7a76\u8bd5\u56fe\u901a\u8fc7\u76f4\u63a5\u63d0\u793a\u6216\u81ea\u6211\u4e00\u81f4\u6027\u63d0\u793a\u6765\u63d0\u53d6LLMs\u7684\u4fe1\u5fc3\uff0c\u6216\u8005\u6784\u5efa\u7279\u5b9a\u6570\u636e\u96c6\u8fdb\u884c\u76d1\u7763\u5fae\u8c03\u3002\u57fa\u4e8e\u63d0\u793a\u7684\u65b9\u6cd5\u6027\u80fd\u8f83\u5dee\uff0c\u800c\u57fa\u4e8e\u8bad\u7ec3\u7684\u65b9\u6cd5\u53c8\u5c40\u9650\u4e8e\u4e8c\u5143\u6216\u4e0d\u7cbe\u786e\u7684\u6574\u4f53\u4fe1\u5fc3\u4f30\u8ba1\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u5148\u8fdb\u7684\u65b9\u6cd5\u2014\u2014SaySelf\uff0c\u8fd9\u662f\u4e00\u4e2a\u8bad\u7ec3\u6846\u67b6\uff0c\u65e8\u5728\u6559\u5bfcLLMs\u63d0\u4f9b\u66f4\u7cbe\u786e\u7684\u7ec6\u7c92\u5ea6\u4fe1\u5fc3\u4f30\u8ba1\u3002 
\u6b64\u5916\uff0cSaySelf\u8fd8\u63a8\u52a8LLMs\u751f\u6210\u81ea\u6211\u53cd\u601d\u7684\u89e3\u91ca\uff0c\u660e\u786e\u6307\u51fa\u5b83\u4eec\u5728\u53c2\u6570\u77e5\u8bc6\u4e0a\u7684\u7a7a\u767d\u5e76\u89e3\u91ca\u4e0d\u786e\u5b9a\u6027\u3002\u8fd9\u662f\u901a\u8fc7\u8ba9LLM\u4ee5\u81ea\u7136\u8bed\u8a00\u7684\u5f62\u5f0f\u81ea\u52a8\u603b\u7ed3\u7279\u5b9a\u77e5\u8bc6\u4e2d\u7684\u4e0d\u786e\u5b9a\u6027\u6765\u5b9e\u73b0\u7684\u3002\u8fd9\u79cd\u603b\u7ed3\u662f\u57fa\u4e8e\u5bf9\u591a\u4e2a\u91c7\u6837\u63a8\u7406\u94fe\u7684\u4e0d\u4e00\u81f4\u6027\u5206\u6790\uff0c\u751f\u6210\u7684\u6570\u636e\u7528\u4e8e\u76d1\u7763\u5fae\u8c03\u3002\u4e3a\u4e86\u8fdb\u4e00\u6b65\u6821\u51c6\u4fe1\u5fc3\u4f30\u8ba1\uff0c\u6211\u4eec\u91c7\u7528\u4e86\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u5f3a\u5316\u5b66\u4e60\uff0c\u5956\u52b1\u51c6\u786e\u3001\u9ad8\u7f6e\u4fe1\u5ea6\u7684\u9884\u6d4b\uff0c\u540c\u65f6\u60e9\u7f5a\u9519\u8bef\u8f93\u51fa\u4e2d\u7684\u8fc7\u5ea6\u81ea\u4fe1\u3002 \u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u65e0\u8bba\u662f\u5728\u5206\u5e03\u5185\u8fd8\u662f\u5206\u5e03\u5916\u7684\u6570\u636e\u96c6\u4e0a\uff0cSaySelf\u90fd\u80fd\u6709\u6548\u51cf\u5c11\u4fe1\u5fc3\u6821\u51c6\u8bef\u5dee\uff0c\u540c\u65f6\u4fdd\u6301\u4efb\u52a1\u6027\u80fd\u3002\u751f\u6210\u7684\u81ea\u6211\u53cd\u601d\u7406\u7531\u4e5f\u88ab\u8bc1\u660e\u662f\u5408\u7406\u7684\uff0c\u80fd\u8fdb\u4e00\u6b65\u4fc3\u8fdb\u6821\u51c6\u3002\u4ee3\u7801\u5df2\u516c\u5f00\u5728\uff1ahttps://github.com/xu1868/SaySelf \u3002**|\n", "2405.20973": "|**2024-05-31**|**LCQ: Low-Rank Codebook based Quantization for Large Language Models**|Wen-Pu Cai et.al.|[2405.20973](http://arxiv.org/abs/2405.20973)|null|## \u80cc\u666f
\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u4f17\u591a\u4efb\u52a1\u4e0a\u5c55\u73b0\u51fa\u4f18\u5f02\u6027\u80fd\uff0c\u4f46\u5b83\u4eec\u7684\u5b58\u50a8\u548c\u8ba1\u7b97\u6210\u672c\u9ad8\u6210\u4e3a\u90e8\u7f72\u7684\u4e00\u5927\u6311\u6218\u3002\u4e3a\u4e86\u538b\u7f29\u6a21\u578b\u5e76\u964d\u4f4e\u6210\u672c\uff0c\u6743\u91cd\u91cf\u5316\u6280\u672f\u88ab\u5e7f\u6cdb\u5e94\u7528\u3002\u76ee\u524d\uff0c\u5927\u591a\u6570\u9488\u5bf9LLMs\u7684\u91cf\u5316\u65b9\u6cd5\u4f7f\u7528\u79e9\u4e00\u7801\u672c\uff0c\u7136\u800c\u5728\u9ad8\u538b\u7f29\u6bd4\u4e0b\uff0c\u8fd9\u4f1a\u5bfc\u81f4\u663e\u8457\u7684\u7cbe\u5ea6\u635f\u5931\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6743\u91cd\u91cf\u5316\u65b9\u6cd5\uff0c\u79f0\u4e3a\u4f4e\u79e9\u7801\u672c\u91cf\u5316\uff08LCQ\uff09\uff0c\u65e8\u5728\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\u3002 ## \u65b9\u6cd5 LCQ\u91c7\u7528\u4f4e\u79e9\u7801\u672c\u8fdb\u884c\u91cf\u5316\uff0c\u5176\u79e9\u53ef\u4ee5\u5927\u4e8e\u4e00\u3002\u8fd9\u79cd\u65b9\u6cd5\u65e8\u5728\u901a\u8fc7\u5229\u7528\u66f4\u9ad8\u7684\u79e9\u6765\u4fdd\u6301\u6216\u63d0\u5347\u6a21\u578b\u7684\u7cbe\u5ea6\uff0c\u540c\u65f6\u63a7\u5236\u989d\u5916\u7684\u5b58\u50a8\u5f00\u9500\u51e0\u4e4e\u4e3a\u96f6\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u4e0e\u73b0\u6709\u65b9\u6cd5\u76f8\u6bd4\uff0cLCQ\u5728\u4fdd\u6301\u826f\u597d\u51c6\u786e\u6027\u7684\u524d\u63d0\u4e0b\uff0c\u80fd\u591f\u5b9e\u73b0\u66f4\u4f18\u7684\u538b\u7f29\u6548\u679c\u3002 ## \u7ed3\u8bba 
\u7efc\u4e0a\u6240\u8ff0\uff0c\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u4f4e\u79e9\u7801\u672c\u91cf\u5316\u65b9\u6cd5\uff0c\u5b83\u6709\u671b\u5728\u4e0d\u663e\u8457\u589e\u52a0\u5b58\u50a8\u6210\u672c\u7684\u60c5\u51b5\u4e0b\uff0c\u63d0\u5347\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u5b9e\u9645\u5e94\u7528\u4e2d\u7684\u6027\u80fd\u548c\u6548\u7387\uff0c\u4e3a\u9ad8\u6548\u90e8\u7f72\u8fd9\u4e9b\u6a21\u578b\u63d0\u4f9b\u4e86\u65b0\u7684\u89e3\u51b3\u65b9\u6848\u3002|\n", "2406.02550": "|**2024-06-04**|**Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks**|Tianyu He et.al.|[2406.02550](http://arxiv.org/abs/2406.02550)|**[link](https://github.com/ablghtianyi/ICL_Modular_Arithmetic)**|**\u8fd9\u7bc7\u5de5\u4f5c\u7814\u7a76\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u4e00\u7ec4\u6a21\u5757\u5316\u7b97\u672f\u4efb\u52a1\u4e2d\u51fa\u73b0\u7684\u4e0a\u4e0b\u6587\u5b66\u4e60\u548c\u6280\u80fd\u7ec4\u5408\u73b0\u8c61\u3002\u6211\u4eec\u5173\u6ce8\u7684\u662f\u6709\u9650\u6570\u91cf\u7684\u4e00\u6b21\u6027\u6a21\u8fd0\u7b97\u51fd\u6570 $z = a \\times x + b \\times y \\;(\\text{mod}\\; p)$\uff0c\u8fd9\u4e9b\u51fd\u6570\u7531\u5411\u91cf $(a, b) \\in \\mathbb{Z}_p^2$ 
\u6807\u8bb0\u3002\u90e8\u5206\u4efb\u52a1\u88ab\u7528\u4f5c\u9884\u8bad\u7ec3\uff0c\u5176\u4f59\u7528\u4e8e\u5206\u5e03\u5916\u6d4b\u8bd5\u3002\u5b9e\u9a8c\u8868\u660e\uff0cGPT\u98ce\u683c\u7684Transformer\u968f\u7740\u9884\u8bad\u7ec3\u4efb\u52a1\u6570\u91cf\u589e\u52a0\uff0c\u5176\u5728\u5206\u5e03\u5185\u548c\u5206\u5e03\u5916\u7684\u6cdb\u5316\u80fd\u529b\u4f1a\u7ecf\u5386\u8f6c\u53d8\u3002\u6700\u5c0f\u578b\u80fd\u5b9e\u73b0\u5206\u5e03\u5916\u6cdb\u5316\u7684\u6a21\u578b\u9700\u8981\u4e24\u4e2aTransformer\u5757\uff1b\u800c\u5bf9\u4e8e\u66f4\u6df1\u7684\u6a21\u578b\uff0c\u5206\u5e03\u5916\u6cdb\u5316\u9636\u6bb5\u662f\u201c\u77ac\u6001\u201d\u7684\uff0c\u9700\u8981\u65e9\u671f\u505c\u6b62\u3002\u6700\u540e\uff0c\u6211\u4eec\u5bf9\u9884\u8bad\u7ec3\u6a21\u578b\u8fdb\u884c\u4e86\u53ef\u89e3\u91ca\u6027\u5206\u6790\uff0c\u63ed\u793a\u4e86\u4e24\u79cd\u9636\u6bb5\u4e2d\u9ad8\u5ea6\u7ed3\u6784\u5316\u7684\u8868\u793a\uff0c\u5e76\u8ba8\u8bba\u4e86\u5b66\u4e60\u5230\u7684\u7b97\u6cd5\u3002**|\n", "2406.02547": "|**2024-06-04**|**Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning**|Alex Jinpeng Wang et.al.|[2406.02547](http://arxiv.org/abs/2406.02547)|**[link](https://github.com/showlab/VisInContext)**|**\u8fd9\u6bb5\u7814\u7a76\u5e76\u672a\u4ecb\u7ecd\u6700\u5148\u8fdb\u7684\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\uff0c\u800c\u662f\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u65b9\u6cd5\uff0c\u65e8\u5728\u6709\u6548\u63d0\u5347\u957f\u5e8f\u5217\u5728\u591a\u6a21\u6001\u6a21\u578b\u4e2d\u7684\u5904\u7406\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u201cVisualized In-Context Text 
Processing\u201d\uff08VisInContext\uff09\u6280\u672f\uff0c\u901a\u8fc7\u89c6\u89c9\u4ee4\u724c\u6765\u5904\u7406\u957f\u6587\u672c\uff0c\u4ece\u800c\u663e\u8457\u964d\u4f4eGPU\u5185\u5b58\u4f7f\u7528\u548c\u6d6e\u70b9\u8fd0\u7b97\uff08FLOPs\uff09\u5728\u8bad\u7ec3\u548c\u63a8\u7406\u9636\u6bb5\u7684\u9700\u6c42\u3002\u4f8b\u5982\uff0c\u5bf9\u4e8e\u4e00\u4e2a560\u4ebf\u53c2\u6570\u7684\u6df7\u5408\u4e13\u5bb6\uff08MoE\uff09\u6a21\u578b\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5c06\u9884\u8bad\u7ec3\u4e2d\u7684\u4e0a\u4e0b\u6587\u6587\u672c\u957f\u5ea6\u6269\u5c55\u5230\u4e862048\u4e2atokens\uff0c\u800c\u8ba1\u7b97\u91cf\u51e0\u4e4e\u4fdd\u6301\u4e0d\u53d8\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u4f7f\u7528VisInContext\u8bad\u7ec3\u7684\u6a21\u578b\u5728\u5e38\u89c1\u7684\u57fa\u4e8e\u5b9e\u4f8b\u7684\u5c11\u91cf\u6570\u636e\u8bc4\u4f30\u4e0b\u6e38\u4efb\u52a1\u4e2d\u8868\u73b0\u51fa\u8272\u3002\u6b64\u5916\uff0cVisInContext\u4e0e\u73b0\u6709\u6280\u672f\u76f8\u7ed3\u5408\uff0c\u80fd\u589e\u5f3a\u5bf9\u6587\u6863\u7684\u7406\u89e3\u80fd\u529b\uff0c\u7279\u522b\u9002\u7528\u4e8e\u6587\u6863\u95ee\u7b54\u548c\u8fde\u7eed\u6587\u6863\u68c0\u7d22\uff0c\u663e\u793a\u51fa\u5de8\u5927\u7684\u6f5c\u529b\u3002**|\n", "2406.02543": "|**2024-06-04**|**To Believe or Not to Believe Your LLM**|Yasin Abbasi Yadkori
et.al.|[2406.02543](http://arxiv.org/abs/2406.02543)|null|\u6211\u4eec\u7814\u7a76\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e2d\u7684\u4e0d\u786e\u5b9a\u6027\u91cf\u5316\uff0c\u76ee\u6807\u662f\u8bc6\u522b\u5bf9\u7ed9\u5b9a\u67e5\u8be2\u7684\u54cd\u5e94\u65f6\u7684\u4e0d\u786e\u5b9a\u6027\u7a0b\u5ea6\u3002\u6211\u4eec\u540c\u65f6\u8003\u8651\u4e86\u4e24\u79cd\u7c7b\u578b\u7684\u4e0d\u786e\u5b9a\u6027\uff1a\u4e00\u79cd\u662f\u77e5\u8bc6\u6027\u4e0d\u786e\u5b9a\u6027\uff08\u4f8b\u5982\u5bf9\u4e8b\u5b9e\u6216\u8bed\u8a00\u771f\u7406\u7684\u672a\u77e5\uff09\uff0c\u53e6\u4e00\u79cd\u662f\u4e0d\u53ef\u6d88\u9664\u7684\u968f\u673a\u6027\uff08\u5982\u53ef\u80fd\u7684\u7b54\u6848\u591a\u6837\u6027\uff09\u3002\u7279\u522b\u662f\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u4fe1\u606f\u8bba\u6307\u6807\uff0c\u80fd\u591f\u53ef\u9760\u5730\u533a\u5206\u51fa\u53ea\u6709\u77e5\u8bc6\u6027\u4e0d\u786e\u5b9a\u6027\u8f83\u5927\u7684\u60c5\u51b5\uff0c\u8fd9\u65f6\u6a21\u578b\u7684\u8f93\u51fa\u662f\u4e0d\u53ef\u9760\u7684\u3002\u8fd9\u4e2a\u6761\u4ef6\u4ec5\u4f9d\u8d56\u4e8e\u901a\u8fc7\u7279\u6b8a\u8fed\u4ee3\u63d0\u793a\u57fa\u4e8e\u5148\u524d\u54cd\u5e94\u5f97\u5230\u7684\u6a21\u578b\u8f93\u51fa\u6765\u8ba1\u7b97\u3002\u8fd9\u79cd\u91cf\u5316\u65b9\u6cd5\u53ef\u4ee5\u68c0\u6d4b\u5355\u7b54\u548c\u591a\u7b54\u60c5\u51b5\u4e0b\u662f\u5426\u5b58\u5728\u865a\u6784\uff08\u5373\u77e5\u8bc6\u6027\u4e0d\u786e\u5b9a\u6027\u9ad8\uff09\u7684\u60c5\u51b5\uff0c\u8fd9\u4e0e\u8bb8\u591a\u6807\u51c6\u7684\u4e0d\u786e\u5b9a\u6027\u91cf\u5316\u7b56\u7565\uff08\u5982\u4ee5\u54cd\u5e94\u7684\u5bf9\u6570\u4f3c\u7136\u6027\u4f5c\u4e3a\u9608\u503c\uff09\u4e0d\u540c\uff0c\u540e\u8005\u65e0\u6cd5\u8bc6\u522b\u591a\u7b54\u60c5\u51b5\u4e0b\u7684\u865a\u6784\u3002 
\u6211\u4eec\u8fdb\u884c\u4e86\u4e00\u7cfb\u5217\u5b9e\u9a8c\uff0c\u5c55\u793a\u4e86\u6211\u4eec\u7684\u65b9\u6cd5\u7684\u4f18\u52bf\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u7814\u7a76\u8fd8\u63ed\u793a\u4e86LLM\u5982\u4f55\u901a\u8fc7\u8fed\u4ee3\u63d0\u793a\u653e\u5927\u5bf9\u7ed9\u5b9a\u8f93\u51fa\u7684\u6982\u7387\u5206\u914d\uff0c\u8fd9\u53ef\u80fd\u5177\u6709\u72ec\u7acb\u7684\u5174\u8da3\u4ef7\u503c\u3002|\n", "2406.02542": "|**2024-06-04**|**Loki: Low-Rank Keys for Efficient Sparse Attention**|Prajwal Singhania et.al.|[2406.02542](http://arxiv.org/abs/2406.02542)|null|\u9488\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u63a8\u7406\u8ba1\u7b97\u6210\u672c\u9ad8\u6602\uff0c\u7279\u522b\u662f\u5f53\u4f7f\u7528\u957f\u5e8f\u5217\u65f6\uff0c\u81ea\u6ce8\u610f\u529b\u673a\u5236\u662f\u4e3b\u8981\u5f00\u9500\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u8fd1\u671f\u7684\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u4e9b\u7a00\u758f\u6ce8\u610f\u529b\u8fd1\u4f3c\u65b9\u6cd5\u3002\u672c\u6587\u4e2d\uff0c\u6211\u4eec\u901a\u8fc7\u5206\u6790\u53d1\u73b0\uff0c\u6ce8\u610f\u529b\u5757\u4e2d\u7684\u952e\u5411\u91cf\u5b9e\u9645\u4e0a\u5904\u4e8e\u4e00\u4e2a\u8fdc\u4f4e\u4e8e\u539f\u59cb\u7ef4\u5ea6\u7684\u7a7a\u95f4\u3002\u8fd9\u4e00\u89c2\u5bdf\u4fc3\u4f7f\u6211\u4eec\u63d0\u51faLoki\uff0c\u4e00\u79cd\u65b0\u7684\u7a00\u758f\u6ce8\u610f\u529b\u65b9\u6cd5\u3002Loki\u6839\u636e\u5728\u4f4e\u7ef4\u7a7a\u95f4\u8ba1\u7b97\u7684\u6ce8\u610f\u529b\u5f97\u5206\uff0c\u5bf9KV\u7f13\u5b58\u4e2d\u7684\u4ee4\u724c\u8fdb\u884c\u6392\u5e8f\u548c\u9009\u62e9\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cLoki\u80fd\u591f\u6bd4\u5176\u4ed6\u6d41\u884c\u8fd1\u4f3c\u65b9\u6cd5\u66f4\u597d\u5730\u4fdd\u6301\u6a21\u578b\u7684\u6548\u80fd\uff0c\u540c\u65f6\u7531\u4e8e\u51cf\u5c11\u4e86\u6570\u636e\u79fb\u52a8\uff08\u52a0\u8f7d/\u5b58\u50a8\uff09\u548c\u8ba1\u7b97\u6210\u672c\uff0c\u52a0\u901f\u4e86\u6ce8\u610f\u529b\u8ba1\u7b97\u3002|\n", "2406.02539": "|**2024-06-04**|**Parrot: 
Multilingual Visual Instruction Tuning**|Hai-Long Sun et.al.|[2406.02539](http://arxiv.org/abs/2406.02539)|**[link](https://github.com/aidc-ai/parrot)**|\u968f\u7740GPT-4V\u7b49\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u5feb\u901f\u53d1\u5c55\uff0c\u4eba\u5de5\u667a\u80fd\u671d\u7740\u901a\u7528\u4eba\u5de5\u667a\u80fd\u8fc8\u51fa\u4e86\u91cd\u8981\u4e00\u6b65\u3002\u5f53\u524d\u7684\u65b9\u6cd5\u4e3b\u8981\u4f9d\u8d56\u4e8e\u76d1\u7763\u5fae\u8c03\uff08SFT\uff09\u6765\u540c\u6b65\u89c6\u89c9\u7f16\u7801\u5668\u4e0e\u8bed\u8a00\u6a21\u578b\uff0c\u4ece\u800c\u8d4b\u4e88\u5b83\u4eec\u591a\u6a21\u6001\u80fd\u529b\u3002\u7136\u800c\uff0c\u8fd9\u79cd\u505a\u6cd5\u53ef\u80fd\u5bfc\u81f4\u968f\u7740\u8bad\u7ec3\u7684\u8fdb\u884c\uff0c\u8bed\u8a00\u6a21\u578b\u5904\u7406\u591a\u79cd\u8bed\u8a00\u7684\u80fd\u529b\u9010\u6e10\u51cf\u5f31\u3002\u6211\u4eec\u53d1\u73b0\uff0c\u4ee5\u82f1\u8bed\u4e3a\u4e2d\u5fc3\u7684\u4e0d\u5e73\u8861SFT\u6570\u636e\u96c6\u4f1a\u5bfc\u81f4\u975e\u82f1\u8bed\u8bed\u8a00\u6027\u80fd\u663e\u8457\u4e0b\u964d\uff0c\u539f\u56e0\u5728\u4e8eSFT\u8fc7\u7a0b\u4e2d\u672a\u80fd\u6709\u6548\u8fde\u63a5\u89c6\u89c9\u7f16\u7801\u5668\u548c\u591a\u8bed\u8a00\u4ee4\u724c\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51faParrot\uff0c\u4e00\u79cd\u5229\u7528\u6587\u672c\u5f15\u5bfc\u5728\u8bed\u8a00\u5c42\u9762\u9a71\u52a8\u89c6\u89c9\u4ee4\u724c\u5bf9\u9f50\u7684\u65b0\u65b9\u6cd5\u3002Parrot\u901a\u8fc7\u8ba9\u89c6\u89c9\u4ee4\u724c\u6839\u636e\u4e0d\u540c\u7684\u8bed\u8a00\u8f93\u5165\u8fdb\u884c\u6761\u4ef6\u5316\uff0c\u5e76\u501f\u52a9\u6df7\u5408\u4e13\u5bb6\uff08MoE\uff09\u4fc3\u8fdb\u591a\u8bed\u8a00\u4ee4\u724c\u7684\u5bf9\u9f50\u3002\u7279\u522b\u662f\uff0c\u4e3a\u4e86\u589e\u5f3a\u975e\u82f1\u8bed\u89c6\u89c9\u4ee4\u724c\u7684\u5bf9\u9f50\uff0c\u6211\u4eec\u8ba1\u7b97\u521d\u59cb\u89c6\u89c9\u7279\u5f81\u4e0e\u6587\u672c\u5d4c\u5165\u4e4b\u95f4\u7684\u8de8\u6ce8\u610f\u529b\uff0c\u7136\u540e\u5c06\u5176\u8f93\u5165\u5230MoE\u8def\u7531\u
5668\uff0c\u9009\u62e9\u6700\u76f8\u5173\u7684\u4e13\u5bb6\u3002\u9009\u5b9a\u7684\u4e13\u5bb6\u4f1a\u5c06\u521d\u59cb\u89c6\u89c9\u4ee4\u724c\u8f6c\u5316\u4e3a\u7279\u5b9a\u8bed\u8a00\u7684\u89c6\u89c9\u4ee4\u724c\u3002\u9274\u4e8e\u76ee\u524d\u7f3a\u4e4f\u8bc4\u4f30\u591a\u8bed\u8a00\u80fd\u529b\u7684\u6807\u51c6\u57fa\u51c6\uff0c\u6211\u4eec\u8fd8\u521b\u5efa\u5e76\u516c\u5f00\u4e86\u4e00\u4e2a\u5927\u89c4\u6a21\u591a\u8bed\u8a00\u591a\u6a21\u6001\u57fa\u51c6\uff08MMMB\uff09\uff0c\u5305\u62ec6\u79cd\u8bed\u8a00\u300115\u4e2a\u7c7b\u522b\u548c12,000\u4e2a\u95ee\u9898\u3002Parrot\u4e0d\u4ec5\u5728MMMB\u548c\u591a\u8bed\u8a00MMBench\u4e0a\u5c55\u73b0\u51fa\u6700\u5148\u8fdb\u7684\u6027\u80fd\uff0c\u8fd8\u5728\u5e7f\u6cdb\u7684\u591a\u6a21\u6001\u4efb\u52a1\u4e2d\u8868\u73b0\u51fa\u8272\u3002\u6211\u4eec\u5c06\u63d0\u4f9bParrot\u7684\u6e90\u4ee3\u7801\u548c\u8bad\u7ec3\u6570\u636e\u96c6\u4f9b\u516c\u4f17\u4f7f\u7528\u3002|\n", "2406.02536": "|**2024-06-04**|**Mitigate Position Bias in Large Language Models via Scaling a Single Dimension**|Yijiong Yu
et.al.|[2406.02536](http://arxiv.org/abs/2406.02536)|**[link](https://github.com/PositionalHidden/PositionalHidden)**|\u8fd9\u7bc7\u8bba\u6587\u4e3b\u8981\u63a2\u8ba8\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5b9e\u9645\u5e94\u7528\u4e2d\u7684\u4e00\u4e2a\u73b0\u8c61\u2014\u2014\u4f4d\u7f6e\u504f\u89c1\uff0c\u4e5f\u79f0\u4e3a\"\u8ff7\u5931\u5728\u4e2d\u95f4\"\u3002\u8fd9\u79cd\u504f\u89c1\u5728\u957f\u6587\u672c\u60c5\u5883\u4e2d\u5c24\u4e3a\u660e\u663e\uff0c\u5373\u5173\u952e\u4fe1\u606f\u5728\u63d0\u793a\u4e2d\u7684\u4e0d\u540c\u4f4d\u7f6e\u4f1a\u663e\u8457\u5f71\u54cd\u6a21\u578b\u7684\u51c6\u786e\u6027\u3002\u7814\u7a76\u53d1\u73b0\uff0c\u6ce8\u610f\u529b\u6743\u91cd\u662f\u4f4d\u7f6e\u504f\u89c1\u7684\u5fae\u89c2\u8868\u73b0\u3002\u6b64\u5916\uff0c\u8bba\u6587\u6307\u51fa\uff0c\u56e0\u679c\u6ce8\u610f\u529b\u63a9\u7801\u901a\u8fc7\u521b\u5efa\u4f4d\u7f6e\u7279\u5b9a\u7684\u9690\u85cf\u72b6\u6001\uff0c\u4e5f\u5bf9\u4f4d\u7f6e\u504f\u89c1\u6709\u6240\u8d21\u732e\u3002 
\u57fa\u4e8e\u8fd9\u4e9b\u6d1e\u5bdf\uff0c\u4f5c\u8005\u63d0\u51fa\u4e86\u4e00\u79cd\u65b9\u6cd5\u6765\u51cf\u8f7b\u4f4d\u7f6e\u504f\u89c1\uff0c\u5373\u8c03\u6574\u8fd9\u4e9b\u4f4d\u7f6e\u7279\u5b9a\u7684\u9690\u85cf\u72b6\u6001\u3002\u5b9e\u9a8c\u5728\u591a\u4e2a\u4efb\u52a1\u4e0a\u8fdb\u884c\uff0c\u5305\u62ec\u81ea\u7136\u95ee\u9898\u591a\u6587\u6863\u95ee\u7b54\u3001\u952e\u503c\u68c0\u7d22\u3001LongBench\u548c\u65f6\u95f4\u7ebf\u91cd\u6392\uff0c\u6d89\u53caRoPE\u6a21\u578b\u3001\u6269\u5c55\u4e0a\u4e0b\u6587\u7a97\u53e3\u6a21\u578b\u548cAlibi\u6a21\u578b\u7b49\u591a\u79cd\u67b6\u6784\u3002\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u901a\u8fc7\u4ec5\u4fee\u6539\u9690\u85cf\u72b6\u6001\u7684\u4e00\u4e2a\u7ef4\u5ea6\uff0c\u5c31\u80fd\u5b9e\u73b0\u6027\u80fd\u63d0\u5347\uff0c\u6700\u9ad8\u53ef\u8fbe15.2%\u3002\u7814\u7a76\u8005\u8fd8\u63d0\u4f9b\u4e86\u4ee3\u7801\u4f9b\u8fdb\u4e00\u6b65\u4f7f\u7528\uff0c\u4ee3\u7801\u5730\u5740\u4e3a\uff1ahttps://aka.ms/PositionalHidden\u3002|\n", "2406.02532": "|**2024-06-04**|**SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices**|Ruslan Svirschevski 
et.al.|[2406.02532](http://arxiv.org/abs/2406.02532)|**[link](https://github.com/yandex-research/specexec)**|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u5e7f\u6cdb\u5e94\u7528\uff0c\u9ad8\u6548\u8fd0\u884c\u5b83\u4eec\u53d8\u5f97\u81f3\u5173\u91cd\u8981\u3002\u8fd1\u671f\u7684\u7814\u7a76\u901a\u8fc7\u63a8\u6d4b\u6027\u89e3\u7801\u5b9e\u73b0\u4e86\u663e\u8457\u7684\u901f\u5ea6\u63d0\u5347\u3002\u7136\u800c\uff0c\u5927\u591a\u6570\u5de5\u4f5c\u90fd\u662f\u9488\u5bf9\u6570\u636e\u4e2d\u5fc3\u786c\u4ef6\u8fdb\u884c\u8bbe\u8ba1\u3002\u672c\u7814\u7a76\u53cd\u95ee\uff1a\u6211\u4eec\u80fd\u5728\u6d88\u8d39\u7ea7\u8bbe\u5907\u4e0a\u591a\u5feb\u5730\u8fd0\u884cLLMs\uff1f\u6d88\u8d39\u8005\u7ea7GPU\u5df2\u65e0\u6cd5\u5bb9\u7eb3\u6700\u5927\u7684\u6a21\u578b\uff08500\u4ebf\u53c2\u6570\u4ee5\u4e0a\uff09\uff0c\u56e0\u6b64\u9700\u8981\u5c06\u53c2\u6570\u5378\u8f7d\u5230RAM\u6216SSD\u3002\u5f53\u4f7f\u7528\u5378\u8f7d\u53c2\u6570\u7684\u65b9\u5f0f\u8fd0\u884c\u65f6\uff0c\u63a8\u7406\u5f15\u64ce\u53ef\u4ee5\u540c\u65f6\u5904\u7406\u6570\u767e\u4e43\u81f3\u6570\u5343\u4e2a\u4ee4\u724c\u7684\u6279\u6b21\uff0c\u4f7f\u5176\u975e\u5e38\u9002\u5408\u63a8\u6d4b\u6027\u89e3\u7801\u3002\u6211\u4eec\u63d0\u51faSpecExec\uff08\u63a8\u6d4b\u6027\u6267\u884c\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u7b80\u5355\u7684\u5e76\u884c\u89e3\u7801\u65b9\u6cd5\uff0c\u9002\u7528\u4e8e\u4e3b\u6d41LLM\u5bb6\u65cf\uff0c\u80fd\u751f\u6210\u6bcf\u8f6e\u76ee\u6807\u6a21\u578b\u8fed\u4ee3\u9ad8\u8fbe20\u4e2a\u4ee4\u724c\u7684\u9884\u6d4b\u3002\u5b83\u5229\u7528\u73b0\u4ee3LLMs\u4e2d\u6982\u7387\u5206\u5e03\u7684\u9ad8\u6ce2\u52a8\u6027\u548c\u6a21\u578b\u8f93\u51fa\u6982\u7387\u4e4b\u95f4\u7684\u9ad8\u5ea6\u4e00\u81f4\u6027\u3002SpecExec\u901a\u8fc7\u4ece\u8349\u7a3f\u6a21\u578b\u83b7\u53d6\u6700\u53ef\u80fd\u7684\u4ee4\u724c\u5ef6\u7eed\uff0c\u6784\u5efa\u4e00\u4e2a\u76ee\u6807\u6a21\u578b\u7684\u201c\u7f13\u5b58\u201d\u6811\uff0c\u7136\u540e\u5728\u4e00\u4e2a\u5355\u6b21\u904d\u5386\u4e2d\u9a8c\u8bc1\u
3002 \u4f7f\u7528SpecExec\uff0c\u6211\u4eec\u5728\u6d88\u8d39\u7ea7GPU\u4e0a\u5b9e\u73b0\u4e86500\u4ebf\u53c2\u6570LLM\u7684\u63a8\u7406\uff0c\u914d\u5408RAM\u5378\u8f7d\uff0c4\u4f4d\u91cf\u5316\u4e0b\u7684\u901f\u5ea6\u8fbe\u52304-6\u4e2a\u4ee4\u724c/\u79d2\uff0c\u800c16\u4f4d\u6743\u91cd\u4e0b\u7684\u901f\u5ea6\u4e3a2-3\u4e2a\u4ee4\u724c/\u79d2\u3002|\n", "2406.02528": "|**2024-06-04**|**Scalable MatMul-free Language Modeling**|Rui-Jie Zhu et.al.|[2406.02528](http://arxiv.org/abs/2406.02528)|**[link](https://github.com/ridgerchu/matmulfreellm)**|**\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e2d\uff0c\u77e9\u9635\u4e58\u6cd5\uff08MatMul\uff09\u901a\u5e38\u5360\u636e\u4e3b\u8981\u8ba1\u7b97\u5f00\u9500\u3002\u968f\u7740LLMs\u7684\u89c4\u6a21\u6269\u5927\uff0c\u5176\u5d4c\u5165\u7ef4\u5ea6\u548c\u4e0a\u4e0b\u6587\u957f\u5ea6\u4e5f\u968f\u4e4b\u589e\u52a0\uff0c\u8fd9\u4e00\u95ee\u9898\u66f4\u4e3a\u663e\u8457\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b9\u6cd5\uff0c\u80fd\u591f\u5728\u4fdd\u6301\u5f3a\u5927\u6027\u80fd\u7684\u540c\u65f6\uff0c\u5b8c\u5168\u79fb\u9664LLMs\u4e2d\u7684MatMul\u64cd\u4f5c\uff0c\u5373\u4f7f\u662f\u572827\u4ebf\u53c2\u6570\u91cf\u7ea7\u7684\u6a21\u578b\u4e0a\u4e5f\u80fd\u5b9e\u73b0\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65e0MatMul\u6a21\u578b\u5728\u4e0e\u5185\u5b58\u6d88\u8017\u663e\u8457\u66f4\u591a\u7684\u6700\u5148\u8fdb\u7684Transformer\u76f8\u5f53\u7684\u6761\u4ef6\u4e0b\u8868\u73b0\u51fa\u8272\u3002\u6211\u4eec\u7814\u7a76\u4e86\u6a21\u578b\u7684\u6269\u5c55\u6027\u89c4\u5f8b\uff0c\u5e76\u53d1\u73b0\u65e0MatMul\u6a21\u578b\u4e0e\u5168\u7cbe\u5ea6Transformer\u4e4b\u95f4\u7684\u6027\u80fd\u5dee\u8ddd\u968f\u7740\u6a21\u578b\u5c3a\u5bf8\u589e\u5927\u800c\u51cf\u5c0f\u3002
\u6b64\u5916\uff0c\u6211\u4eec\u63d0\u4f9b\u4e86\u4e00\u4e2a\u9ad8\u6548\u7684GPU\u5b9e\u73b0\uff0c\u76f8\u8f83\u4e8e\u672a\u4f18\u5316\u7684\u57fa\u7ebf\uff0c\u8bad\u7ec3\u65f6\u80fd\u51cf\u5c11\u9ad8\u8fbe61%\u7684\u5185\u5b58\u4f7f\u7528\u3002\u5728\u63a8\u7406\u9636\u6bb5\uff0c\u901a\u8fc7\u4f18\u5316\u7684\u5185\u6838\uff0c\u6211\u4eec\u7684\u6a21\u578b\u5185\u5b58\u6d88\u8017\u53ef\u964d\u4f4e\u8d85\u8fc710\u500d\u3002\u4e3a\u4e86\u51c6\u786e\u8bc4\u4f30\u67b6\u6784\u6548\u7387\uff0c\u6211\u4eec\u5728FPGA\u4e0a\u6784\u5efa\u4e86\u5b9a\u5236\u786c\u4ef6\u89e3\u51b3\u65b9\u6848\uff0c\u5229\u7528GPU\u65e0\u6cd5\u5904\u7406\u7684\u8f7b\u91cf\u7ea7\u8fd0\u7b97\uff0c\u5b9e\u73b0\u4e86\u5bf9\u5341\u4ebf\u53c2\u6570\u89c4\u6a21\u6a21\u578b\u7684\u9ad8\u901f\u5904\u7406\uff0c\u4f7f\u5176\u63a5\u8fd1\u4eba\u8111\u7ea7\u522b\u7684\u6548\u7387\u3002 \u8fd9\u9879\u5de5\u4f5c\u4e0d\u4ec5\u5c55\u793a\u4e86LLMs\u5728\u51cf\u5c0f\u590d\u6742\u6027\u540e\u4ecd\u80fd\u4fdd\u6301\u9ad8\u6548\uff0c\u8fd8\u6307\u51fa\u4e86\u672a\u6765\u52a0\u901f\u5668\u5e94\u4f18\u5316\u7684\u8fd0\u7b97\u7c7b\u578b\uff0c\u4ee5\u9002\u5e94\u4e0b\u4e00\u4ee3\u8f7b\u91cf\u7ea7LLMs\u7684\u9700\u6c42\u3002\u6211\u4eec\u7684\u4ee3\u7801\u5b9e\u73b0\u5df2\u5f00\u6e90\u81f3\uff1ahttps://github.com/ridgerchu/matmulfreellm \u3002**|\n", "2406.02524": "|**2024-06-04**|**CheckEmbed: Effective Verification of LLM Solutions to Open-Ended Tasks**|Maciej Besta
et.al.|[2406.02524](http://arxiv.org/abs/2406.02524)|**[link](https://github.com/spcl/checkembed)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u6b63\u5728\u5404\u4e2a\u9886\u57df\u5e26\u6765\u53d8\u9769\uff0c\u4f46\u9a8c\u8bc1\u5176\u7b54\u6848\u4ecd\u7136\u662f\u4e00\u4e2a\u91cd\u5927\u6311\u6218\uff0c\u5c24\u5176\u662f\u5728\u5904\u7406\u590d\u6742\u3001\u5f00\u653e\u6027\u7684\u4efb\u52a1\uff0c\u5982\u77e5\u8bc6\u6574\u5408\u3001\u6458\u8981\u548c\u63d0\u53d6\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aCheckEmbed\u7684\u7cbe\u786e\u3001\u53ef\u6269\u5c55\u4e14\u7b80\u4fbf\u7684LLM\u9a8c\u8bc1\u65b9\u6cd5\u3002CheckEmbed\u7684\u6838\u5fc3\u7406\u5ff5\u662f\uff1a\u901a\u8fc7\u5229\u7528\u5982GPT\u6587\u672c\u5d4c\u5165\u5927\u6a21\u578b\u83b7\u53d6\u7684\u7b54\u6848\u7ea7\u5d4c\u5165\u6765\u6bd4\u8f83LLM\u7684\u56de\u7b54\u3002\u8fd9\u5c06\u590d\u6742\u7684\u6587\u672c\u7b54\u6848\u8f6c\u5316\u4e3a\u5355\u4e00\u7684\u5d4c\u5165\uff0c\u7b80\u5316\u4e86\u5bf9\u6bd4\u8fc7\u7a0b\uff0c\u5b9e\u73b0\u5feb\u901f\u800c\u6709\u610f\u4e49\u7684\u9a8c\u8bc1\u3002\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u5168\u9762\u7684\u9a8c\u8bc1\u7ba1\u9053\uff0c\u8be5\u7ba1\u9053\u5b9e\u73b0\u4e86CheckEmbed\u7684\u7406\u5ff5\uff0c\u5e76\u63d0\u4f9b\u4e86\u8bc4\u4f30LLM\u7b54\u6848\u771f\u5b9e\u6027\u7684\u5ea6\u91cf\uff0c\u5982\u5d4c\u5165\u70ed\u529b\u56fe\u53ca\u5176\u603b\u7ed3\u3002\u6211\u4eec\u5c55\u793a\u4e86\u5982\u4f55\u5229\u7528\u8fd9\u4e9b\u6307\u6807\u8bbe\u8ba1\u5b9e\u9645\u7684\u5f15\u64ce\uff0c\u4ee5\u51b3\u5b9aLLM\u7b54\u6848\u662f\u5426\u4ee4\u4eba\u6ee1\u610f\u3002\u5728\u5b9e\u9645\u6587\u6863\u5206\u6790\u4efb\u52a1\u4e2d\uff0c\u5982\u672f\u8bed\u63d0\u53d6\u548c\u6587\u6863\u6458\u8981\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u8868\u73b0\u51fa\u663e\u8457\u7684\u51c6\u786e\u6027\u63d0\u5347\u3001\u6210\u672c\u6548\u76ca\u548c\u8fd0\u884c\u65f6\u95f4\u6027\u80fd\uff0c\u76f8\u8f83\u4e8eBERTScore\u6216SelfCheckGPT\u7b49\u57fa\u4e8etoken\u3001\u53e5\
u5b50\u548c\u4e8b\u5b9e\u7ea7\u522b\u7684\u65b9\u6848\u3002|\n", "2406.02523": "|**2024-06-04**|**RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots**|Soroush Nasiriany et.al.|[2406.02523](http://arxiv.org/abs/2406.02523)|null|\u4eba\u5de5\u667a\u80fd\u7684\u6700\u65b0\u8fdb\u5c55\u5728\u5f88\u5927\u7a0b\u5ea6\u4e0a\u4f9d\u8d56\u4e8e\u89c4\u6a21\u7684\u6269\u5927\u3002\u7136\u800c\uff0c\u5728\u673a\u5668\u4eba\u9886\u57df\uff0c\u5927\u89c4\u6a21\u673a\u5668\u4eba\u6570\u636e\u96c6\u7684\u83b7\u53d6\u662f\u4e00\u4e2a\u74f6\u9888\u3002\u6211\u4eec\u4e3b\u5f20\u5229\u7528\u903c\u771f\u7684\u7269\u7406\u6a21\u62df\u6765\u63d0\u5347\u73af\u5883\u3001\u4efb\u52a1\u548c\u6570\u636e\u96c6\u7684\u89c4\u6a21\uff0c\u4ee5\u652f\u6301\u673a\u5668\u4eba\u5b66\u4e60\u65b9\u6cd5\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u4ecb\u7ecdRoboCasa\uff0c\u8fd9\u662f\u4e00\u4e2a\u5927\u578b\u7684\u4eff\u771f\u6846\u67b6\uff0c\u65e8\u5728\u8bad\u7ec3\u80fd\u591f\u5728\u65e5\u5e38\u73af\u5883\u4e2d\u901a\u7528\u7684\u673a\u5668\u4eba\u3002RoboCasa\u7684\u7279\u70b9\u662f\u62e5\u6709\u4e30\u5bcc\u4e14\u591a\u6837\u5316\u7684\u53a8\u623f\u573a\u666f\uff0c\u5305\u62ec\u8d85\u8fc7150\u4e2a\u7c7b\u522b\u7684\u4e00\u5343\u591a\u4ef63D\u6a21\u578b\u8d44\u4ea7\u548c\u6570\u5341\u79cd\u53ef\u4ea4\u4e92\u7684\u5bb6\u5177\u548c\u7535\u5668\u3002
\u6211\u4eec\u901a\u8fc7\u751f\u6210\u5f0fAI\u5de5\u5177\u8fdb\u4e00\u6b65\u589e\u5f3a\u6a21\u62df\u7684\u771f\u5b9e\u6027\u548c\u591a\u6837\u6027\uff0c\u5982\u4f7f\u7528\u6587\u672c\u52303D\u6a21\u578b\u7684\u6280\u672f\u751f\u6210\u5bf9\u8c61\u8d44\u4ea7\uff0c\u4ee5\u53ca\u901a\u8fc7\u6587\u672c\u5230\u56fe\u50cf\u6a21\u578b\u751f\u6210\u73af\u5883\u7eb9\u7406\u3002\u6211\u4eec\u8bbe\u8ba1\u4e86100\u9879\u4efb\u52a1\uff0c\u5305\u62ec\u7531\u5927\u578b\u8bed\u8a00\u6a21\u578b\u6307\u5bfc\u7684\u590d\u5408\u4efb\u52a1\uff0c\u7528\u4e8e\u7cfb\u7edf\u6027\u8bc4\u4f30\u3002\u4e3a\u4e86\u4fc3\u8fdb\u5b66\u4e60\uff0c\u6211\u4eec\u63d0\u4f9b\u4e86\u9ad8\u8d28\u91cf\u7684\u4eba\u7c7b\u6f14\u793a\uff0c\u5e76\u7ed3\u5408\u81ea\u52a8\u8f68\u8ff9\u751f\u6210\u65b9\u6cd5\uff0c\u4ee5\u6700\u5c0f\u7684\u4eba\u529b\u6210\u672c\u5927\u5e45\u6269\u5145\u6570\u636e\u96c6\u3002 \u6211\u4eec\u7684\u5b9e\u9a8c\u8868\u660e\uff0c\u5728\u4f7f\u7528\u5408\u6210\u751f\u6210\u7684\u673a\u5668\u4eba\u6570\u636e\u8fdb\u884c\u5927\u89c4\u6a21\u6a21\u4eff\u5b66\u4e60\u65f6\uff0c\u5b58\u5728\u660e\u663e\u7684\u89c4\u6a21\u6548\u5e94\uff0c\u5e76\u663e\u793a\u51fa\u5229\u7528\u6a21\u62df\u6570\u636e\u5728\u73b0\u5b9e\u4e16\u754c\u4efb\u52a1\u4e2d\u7684\u5de8\u5927\u6f5c\u529b\u3002\u76f8\u5173\u89c6\u9891\u548c\u5f00\u6e90\u4ee3\u7801\u5df2\u5728https://robocasa.ai/\u7f51\u7ad9\u4e0a\u63d0\u4f9b\u3002|\n", "2406.03496": "|**2024-06-05**|**Wings: Learning Multimodal LLMs without Text-only Forgetting**|Yi-Kai Zhang et.al.|[2406.03496](http://arxiv.org/abs/2406.03496)|null|
\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u8d77\u6e90\u4e8e\u9884\u8bad\u7ec3\u7684\u901a\u7528\u8bed\u8a00\u6a21\u578b\uff0c\u9996\u5148\u5c06\u56fe\u50cf\u4e0e\u6587\u672c\u5bf9\u9f50\uff0c\u7136\u540e\u5728\u6df7\u5408\u6a21\u6001\u8f93\u5165\u4e0a\u8fdb\u884c\u5fae\u8c03\u3002\u7136\u800c\uff0cMLLM\u5728\u5904\u7406\u4ec5\u5305\u542b\u6587\u672c\u7684\u6307\u4ee4\u65f6\u4f1a\u51fa\u73b0\u707e\u96be\u6027\u7684\u9057\u5fd8\uff0c\u8fd9\u4e9b\u6587\u672c\u6307\u4ee4\u5e76\u672a\u5305\u542b\u56fe\u50cf\uff0c\u8fd9\u4e9b\u95ee\u9898\u5728\u521d\u59cb\u7684\u8bed\u8a00\u6a21\u578b\u9636\u6bb5\u5c31\u5df2\u7ecf\u5b58\u5728\u3002\u672c\u6587\u63d0\u51faWings\uff0c\u4e00\u4e2a\u65b0\u578b\u7684MLLM\uff0c\u5b83\u5728\u6587\u672c\u5bf9\u8bdd\u548c\u591a\u6a21\u6001\u7406\u89e3\u65b9\u9762\u8868\u73b0\u51fa\u8272\u3002\u901a\u8fc7\u5206\u6790MLLM\u5728\u591a\u6a21\u6001\u6307\u4ee4\u4e2d\u7684\u6ce8\u610f\u529b\uff0c\u6211\u4eec\u53d1\u73b0\u6587\u672c\u9057\u5fd8\u4e0e\u4ece\u56fe\u50cf\u524d\u5411\u56fe\u50cf\u540e\u7684\u6ce8\u610f\u529b\u8f6c\u79fb\u6709\u5173\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u989d\u5916\u6a21\u5757\u4f5c\u4e3a\u589e\u5f3a\u5b66\u4e60\u5668\uff0c\u4ee5\u8865\u507f\u8fd9\u79cd\u6ce8\u610f\u529b\u8f6c\u79fb\u3002\u89c6\u89c9\u548c\u6587\u672c\u5b66\u4e60\u5668\u4f5c\u4e3a\u201c\u7fc5\u8180\u201d\u5f0f\u7684\u8865\u5145\uff0c\u5e73\u884c\u8fde\u63a5\u5728\u6bcf\u4e2a\u6ce8\u610f\u529b\u5757\u5185\uff0c\u8d77\u521d\u56fe\u50cf\u548c\u6587\u672c\u8f93\u5165\u7531\u89c6\u89c9\u5b66\u4e60\u5668\u4e0e\u4e3b\u6ce8\u610f\u529b\u534f\u540c\u5de5\u4f5c\uff0c\u5e73\u8861\u5bf9\u89c6\u89c9\u5143\u7d20\u7684\u5173\u6ce8\u3002\u968f\u540e\uff0c\u6587\u672c\u5b66\u4e60\u5668\u901a\u8fc7\u6ce8\u610f\u529b\u8def\u7531\u7684\u65b9\u5f0f\u4e0e\u89c6\u89c9\u5b66\u4e60\u5668\u7684\u8f93\u51fa\u534f\u4f5c\u6574\u5408\u3002\u6211\u4eec\u8bbe\u8ba1\u4e86\u4f4e\u79e9\u6b8b\u5dee\u6ce8\u610f\u529b\uff08LoRRA\uff09\u673a\u5236\u4ee
5\u4fdd\u8bc1\u5b66\u4e60\u5668\u7684\u9ad8\u6548\u8fd0\u884c\u3002 \u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cWings\u5728\u6587\u672c\u5bf9\u8bdd\u548c\u89c6\u89c9\u95ee\u7b54\u4efb\u52a1\u4e0a\u4f18\u4e8e\u540c\u7b49\u89c4\u6a21\u7684MLLM\u3002\u5728\u6211\u4eec\u65b0\u6784\u5efa\u7684\u4ea4\u9519\u56fe\u50cf-\u6587\u672c\uff08IIT\uff09\u57fa\u51c6\u6d4b\u8bd5\u4e2d\uff0cWings\u5728\u4ece\u6587\u672c\u4e3a\u4e3b\u5230\u591a\u6a21\u6001\u4e3a\u4e3b\u7684\u95ee\u7b54\u4efb\u52a1\u4e2d\u5c55\u73b0\u51fa\u5353\u8d8a\u6027\u80fd\u3002|\n", "2406.03488": "|**2024-06-06**|**Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model Training**|Ao Sun et.al.|[2406.03488](http://arxiv.org/abs/2406.03488)|**[link](https://github.com/maydomine/seq1f1b)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5174\u8d77\u5728\u5f88\u5927\u7a0b\u5ea6\u4e0a\u4f9d\u8d56\u4e8e\u5206\u5e03\u5f0f\u8bad\u7ec3\u7b56\u7565\uff0c\u5176\u4e2d\u7ba1\u9053\u5e76\u884c\u6027\u8d77\u7740\u5173\u952e\u4f5c\u7528\u3002\u968f\u7740LLMs\u7684\u8bad\u7ec3\u5e8f\u5217\u957f\u5ea6\u6269\u5c55\u523032k\u751a\u81f3128k\uff0c\u5f53\u524d\u7684\u7ba1\u9053\u5e76\u884c\u65b9\u6cd5\u9762\u4e34\u4e25\u91cd\u74f6\u9888\uff0c\u5982\u9ad8\u5185\u5b58\u5360\u7528\u548c\u663e\u8457\u7684\u7ba1\u9053\u5ef6\u8fdf\uff0c\u8fd9\u6781\u5927\u5730\u9650\u5236\u4e86\u6a21\u578b\u7684\u53ef\u6269\u5c55\u6027\u548c\u8bad\u7ec3\u541e\u5410\u91cf\u3002\u4e3a\u4e86\u63d0\u9ad8\u5185\u5b58\u6548\u7387\u548c\u8bad\u7ec3\u6548\u7387\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u9488\u5bf9\u957f\u5e8f\u5217\u8bad\u7ec3LLMs\u7684\u9ad8\u6548\u5e8f\u5217\u7ea7\u4e00\u6b21\u524d\u5411\u4e00\u6b21\u540e\u5411\uff081F1B\uff09\u7ba1\u9053\u8c03\u5ea6\u65b9\u6cd5\uff0c\u79f0\u4e3aSeq1F1B\u3002Seq1F1B\u5c06\u6279\u7ea7\u522b\u53ef\u8c03\u5ea6\u5355\u5143\u5206\u89e3\u4e3a\u66f4\u7ec6\u7684\u5e8f\u5217\u7ea7\u5355\u5143\uff0c\u4ece\u800c\u51cf\u5c0f\u5ef6\u8fdf\u5e76\u964d\u4f4e\u5185\u5b58\u9700\u6c42\u3002
 \u8003\u8651\u5230\u5982\u679c\u5747\u5300\u5206\u5272\u5e8f\u5217\uff0cSeq1F1B\u53ef\u80fd\u4f1a\u4ea7\u751f\u8f7b\u5fae\u7684\u989d\u5916\u5ef6\u8fdf\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u79cd\u57fa\u4e8e\u8ba1\u7b97\u7684\u7b56\u7565\u6765\u5212\u5206\u8f93\u5165\u5e8f\u5217\uff0c\u4ee5\u7f13\u89e3\u8fd9\u4e2a\u526f\u4f5c\u7528\u3002\u4e0e\u7ade\u4e89\u6027\u7684\u7ba1\u9053\u57fa\u7ebf\u65b9\u6cd5\uff0c\u5982Megatron\u76841F1B\u7ba1\u9053\u5e76\u884c\u76f8\u6bd4\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u4fdd\u6301\u66f4\u9ad8\u8bad\u7ec3\u541e\u5410\u91cf\u7684\u540c\u65f6\uff0c\u5185\u5b58\u5360\u7528\u66f4\u4f4e\u3002\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0cSeq1F1B\u80fd\u591f\u5728\u4e0d\u4f7f\u7528\u91cd\u65b0\u8ba1\u7b97\u7b56\u7565\u7684\u60c5\u51b5\u4e0b\uff0c\u6709\u6548\u5730\u572864\u4e2aNVIDIA A100 GPU\u4e0a\u8bad\u7ec3\u4e00\u4e2a\u5177\u6709300\u4ebf\u53c2\u6570\u7684LLM\uff0c\u5904\u7406\u957f\u8fbe64k\u7684\u5e8f\u5217\uff0c\u8fd9\u662f\u73b0\u6709\u65b9\u6cd5\u65e0\u6cd5\u5b9e\u73b0\u7684\u3002\u6211\u4eec\u7684\u4ee3\u7801\u57fa\u4e8eMegatron-LM\uff0c\u5e76\u5df2\u5f00\u6e90\uff1ahttps://github.com/MayDomine/Seq1F1B.git\u3002|\n", "2406.03487": "|**2024-06-05**|**Analyzing LLM Behavior in Dialogue Summarization: Unveiling Circumstantial Hallucination Trends**|Sanjana Ramprasad et.al.|[2406.03487](http://arxiv.org/abs/2406.03487)|null|
\u8fd1\u671f\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u8fdb\u6b65\u663e\u8457\u63d0\u5347\u4e86\u6458\u8981\u751f\u6210\u7cfb\u7edf\u7684\u6027\u80fd\uff0c\u4f46\u5b83\u4eec\u5728\u771f\u5b9e\u6027\u65b9\u9762\u7684\u95ee\u9898\u5f15\u8d77\u4e86\u5173\u6ce8\u3002\u5c3d\u7ba1\u4e4b\u524d\u7684\u7814\u7a76\u5e7f\u6cdb\u8bc4\u4f30\u4e86\u65b0\u95fb\u9886\u57df\u7684LLMs\uff0c\u5bf9\u8bdd\u6458\u8981\u7684\u8bc4\u4ef7\u4e3b\u8981\u96c6\u4e2d\u5728\u57fa\u4e8eBART\u7684\u6a21\u578b\u4e0a\uff0c\u8fd9\u5728\u6211\u4eec\u7406\u89e3\u5b83\u4eec\u7684\u53ef\u4fe1\u5ea6\u65b9\u9762\u7559\u4e0b\u4e86\u7a7a\u767d\u3002\u672c\u7814\u7a76\u65e8\u5728\u8bc4\u4f30LLMs\u5728\u5bf9\u8bdd\u6458\u8981\u4e2d\u7684\u771f\u5b9e\u6027\uff0c\u901a\u8fc7\u4eba\u7c7b\u6807\u6ce8\uff0c\u5e76\u7740\u91cd\u4e8e\u8bc6\u522b\u548c\u5206\u7c7b\u53e5\u7ea7\u4e0d\u4e00\u81f4\u3002\u6211\u4eec\u7279\u522b\u5173\u6ce8GPT-4\u548cAlpaca-13B\u8fd9\u4e24\u6b3e\u4e3b\u6d41\u6a21\u578b\u3002\u6211\u4eec\u7684\u8bc4\u4f30\u63ed\u793a\u4e86\u9519\u8bef\u5b9a\u4e49\u7684\u5fae\u5999\u4e4b\u5904\uff1aLLMs\u5e38\u5e38\u751f\u6210\u770b\u4f3c\u5408\u7406\u7684\u63a8\u65ad\uff0c\u8fd9\u4e9b\u63a8\u65ad\u4f9d\u8d56\u4e8e\u5bf9\u8bdd\u4e2d\u7684\u95f4\u63a5\u8bc1\u636e\uff0c\u800c\u7f3a\u4e4f\u76f4\u63a5\u8bc1\u636e\uff0c\u8fd9\u5728\u65e7\u6a21\u578b\u4e2d\u8f83\u5c11\u89c1\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u6539\u8fdb\u7684\u9519\u8bef\u5206\u7c7b\u4f53\u7cfb\uff0c\u5f15\u5165\u4e86\u201c\u60c5\u5883\u63a8\u7406\u201d\u7c7b\u522b\u6765\u5f52\u7c7b\u8fd9\u4e9bLLM\u884c\u4e3a\uff0c\u5e76\u516c\u5f00\u4e86\u76f8\u5173\u6570\u636e\u96c6\u3002\u5229\u7528\u6211\u4eec\u7684\u5206\u7c7b\u4f53\u7cfb\uff0c\u6211\u4eec\u6bd4\u8f83\u4e86LLMs\u4e0e\u8001\u5f0f\u5fae\u8c03\u6a21\u578b\u4e4b\u95f4\u7684\u884c\u4e3a\u5dee\u5f02\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7cfb\u7edf\u5730\u8bc4\u4f30\u4e86\u81ea\u52a8\u9519\u8bef\u68c0\u6d4b\u65b9\u6cd5\u5728LLM\u6458\u8981\u4e0a\u7684\u6548\u679c\uff0
c\u53d1\u73b0\u5b83\u4eec\u5728\u8bc6\u522b\u8fd9\u7c7b\u7ec6\u5fae\u9519\u8bef\u65f6\u8868\u73b0\u4e0d\u4f73\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e24\u79cd\u57fa\u4e8e\u63d0\u793a\u7684\u7cbe\u7ec6\u9519\u8bef\u68c0\u6d4b\u65b9\u6cd5\uff0c\u8fd9\u4e24\u79cd\u65b9\u6cd5\u4f18\u4e8e\u73b0\u6709\u6307\u6807\uff0c\u7279\u522b\u662f\u5728\u8bc6\u522b\u201c\u60c5\u5883\u63a8\u7406\u201d\u9519\u8bef\u65f6\u3002|\n", "2406.03486": "|**2024-06-05**|**BIPED: Pedagogically Informed Tutoring System for ESL Education**|Soonwoo Kwon et.al.|[2406.03486](http://arxiv.org/abs/2406.03486)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u663e\u793a\u51fa\u5de8\u5927\u7684\u6f5c\u529b\uff0c\u80fd\u591f\u4f5c\u4e3a\u7ecf\u6d4e\u4e14\u6613\u4e8e\u83b7\u53d6\u7684\u82f1\u8bed\u7b2c\u4e8c\u8bed\u8a00\uff08L2\uff09\u5b66\u4e60\u8005\u5bf9\u8bdd\u5f0f\u667a\u80fd\u8f85\u5bfc\u7cfb\u7edf\uff08CITS\uff09\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684CITS\u5f80\u5f80\u53ea\u80fd\u6559\u6388\u7b80\u5355\u6982\u5ff5\uff0c\u6216\u8005\u5728\u6559\u5b66\u6df1\u5ea6\u4e0a\u65e0\u6cd5\u6ee1\u8db3\u4e0d\u540c\u5b66\u4e60\u7b56\u7565\u7684\u9700\u6c42\u3002\u4e3a\u4e86\u5f00\u53d1\u4e00\u4e2a\u66f4\u5177\u6559\u80b2\u5b66\u5bfc\u5411\u3001\u80fd\u6559\u6388\u590d\u6742\u6982\u5ff5\u7684CITS\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u53cc\u8bed\u6559\u80b2\u6307\u5bfc\u5bf9\u8bdd\u6570\u636e\u96c6\uff08BIPED\uff09\uff0c\u5305\u542b\u4e00\u5bf9\u4e00\u7684\u4eba\u7c7b\u82f1\u8bed\u8f85\u5bfc\u4e92\u52a8\u3002\u901a\u8fc7\u5bf9\u8f85\u5bfc\u5bf9\u8bdd\u7684\u540e\u5904\u7406\u5206\u6790\uff0c\u6211\u4eec\u63d0\u70bc\u51fa\u4e00\u5957\u5305\u542b34\u79cd\u6559\u5e08\u884c\u4e3a\u548c9\u79cd\u5b66\u751f\u884c\u4e3a\u7684\u5bf9\u8bdd\u52a8\u4f5c\u8bcd\u5178\uff0c\u5e76\u5c06\u5176\u7528\u4e8e\u8fdb\u4e00\u6b65\u6807\u6ce8\u6536\u96c6\u7684\u6570\u636e\u3002\u6839\u636e\u5148\u9884\u6d4b\u5408\u9002\u7684\u6559\u5e08\u884c\u4e3a\u518d\u751f\u6210\u76f8\u5e94\u56de\u590d\u7684\u4
e24\u6b65\u6846\u67b6\uff0c\u6211\u4eec\u5229\u7528GPT-4\u548cSOLAR-KO\u5206\u522b\u5b9e\u73b0\u4e86\u4e24\u4e2aCITS\u6a21\u578b\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u8fd9\u4e9b\u5b9e\u65bd\u7684\u6a21\u578b\u4e0d\u4ec5\u6a21\u4eff\u4e86\u4eba\u7c7b\u6559\u5e08\u7684\u98ce\u683c\uff0c\u8fd8\u8fd0\u7528\u4e86\u4e30\u5bcc\u4e14\u4e0e\u4e0a\u4e0b\u6587\u76f8\u9002\u5e94\u7684\u6559\u5b66\u7b56\u7565\u3002|\n", "2406.03476": "|**2024-06-05**|**Does your data spark joy? Performance gains from domain upsampling at the end of training**|Cody Blakeney et.al.|[2406.03476](http://arxiv.org/abs/2406.03476)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u9884\u8bad\u7ec3\u6570\u636e\u96c6\u89c4\u6a21\u589e\u957f\u5230\u4e07\u4ebf\u7ea7\u522b\u7684tokens\uff0c\u8fd9\u4e9b\u6570\u636e\u96c6\u4e3b\u8981\u7531\u5927\u89c4\u6a21\u7684CommonCrawl\u7f51\u7edc\u722c\u866b\u5185\u5bb9\u4ee5\u53ca\u8f83\u5c0f\u7684\u9886\u57df\u7279\u5b9a\u6570\u636e\u7ec4\u6210\u3002\u7531\u4e8e\u5728\u5927\u8ba1\u7b97\u91cf\uff08FLOPs\uff09\u4e0b\u8bad\u7ec3\u4ee5\u63ed\u793a\u6a21\u578b\u5728\u56f0\u96be\u548c\u65b0\u5174\u57fa\u51c6\u4e0a\u7684\u663e\u8457\u53d8\u5316\u6210\u672c\u9ad8\u6602\uff0c\u5982\u4f55\u5728\u901a\u7528\u7f51\u7edc\u6293\u53d6\u7684\u591a\u6837\u6027\u548c\u9886\u57df\u7279\u5b9a\u4fe1\u606f\u5bc6\u5ea6\u4e4b\u95f4\u627e\u5230\u6700\u4f18\u5e73\u8861\u6210\u4e3a\u4e00\u4e2a\u95ee\u9898\u3002\u672c\u6587\u5c55\u793a\u4e86\u5982\u4f55\u5229\u7528\u8fd9\u4e9b\u8f83\u5c0f\u7684\u9886\u57df\u7279\u5b9a\u6570\u636e\uff0c\u5728\u8bad\u7ec3\u540e\u671f\u5bf9\u5176\u8fdb\u884c\u4e0a\u91c7\u6837\uff0c\u4ece\u800c\u5728\u8bf8\u5982MMLU\u3001GSM8K\u548cHumanEval\u7b49\u57fa\u51c6\u4e0a\u63d0\u5347\u6027\u80fd\u3002\u5bf9\u4e8e\u4e00\u4e2a\u8bad\u7ec3\u4e861\u4e07\u4ebf\uff08T\uff09\u4ee4\u724c\u768470\u4ebf\u53c2\u6570\u6a21\u578b\uff0c\u8fd9\u79cd\u7b80\u5355\u65b9\u6cd5\u53ef\u4f7f\u5176\u6027\u80fd\u63d0\u9ad86.90\u5206\u30018.26\u5206\u548c6.17\
u5206\uff0c\u4e0e\u8bad\u7ec3\u65f6\u95f4\u4e24\u500d\u7684Llama-2\uff087B\uff09\u6a21\u578b\u76f8\u5f53\u3002\u6211\u4eec\u7814\u7a76\u4e86\u5728\u8bad\u7ec3\u540e\u671f\u9886\u57df\u4e0a\u91c7\u6837\u7684\u6301\u7eed\u65f6\u95f4\uff0c\u4ece5%\u523030%\uff0c\u53d1\u73b010%\u523020%\u7684\u6bd4\u4f8b\u6700\u4e3a\u5408\u9002\uff0c\u4ee5\u5e73\u8861\u4e00\u822c\u8bed\u8a00\u5efa\u6a21\u80fd\u529b\u4e0e\u7279\u5b9a\u4efb\u52a1\u7684\u4f18\u5316\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5229\u7528\u9886\u57df\u4e0a\u91c7\u6837\u6765\u5927\u89c4\u6a21\u5206\u6790\u5355\u4e2a\u6570\u636e\u96c6\u5bf9\u4e0d\u540c\u57fa\u51c6\u7684\u589e\u76ca\uff0c\u901a\u8fc7\u5728\u8fd9\u4e00\u9636\u6bb5\u79fb\u9664\u5b83\u4eec\u8fdb\u884c\u5b9e\u9a8c\u3002\u8fd9\u79cd\u65b9\u6cd5\u6781\u5927\u5730\u964d\u4f4e\u4e86\u5b9e\u9a8c\u6210\u672c\uff0c\u4f7f\u5f97\u80fd\u591f\u4ee5\u9884\u8bad\u7ec3\u8fd0\u884c\u7684\u5341\u5206\u4e4b\u4e00\u5de6\u53f3\u7684\u6210\u672c\u63a2\u7d22\u4e0d\u540c\u9884\u8bad\u7ec3\u6570\u636e\u96c6\u7684\u5f71\u54cd\u3002|\n", "2406.03474": "|**2024-06-05**|**AD-H: Autonomous Driving with Hierarchical Agents**|Zaibin Zhang 
et.al.|[2406.03474](http://arxiv.org/abs/2406.03474)|null|\u9274\u4e8e\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\u7684\u5f3a\u5927\u529f\u80fd\uff0c\u8fd1\u671f\u7684\u7814\u7a76\u805a\u7126\u4e8e\u4f7f\u7528MLLM\u9a71\u52a8\u7684\u81ea\u52a8\u9a7e\u9a76\u7cfb\u7edf\u5728\u5927\u89c4\u6a21\u52a8\u6001\u73af\u5883\u4e2d\u3002\u7136\u800c\uff0c\u5e38\u89c1\u7684\u65b9\u6cd5\u76f4\u63a5\u5c06\u9ad8\u7ea7\u6307\u4ee4\u8f6c\u5316\u4e3a\u4f4e\u7ea7\u8f66\u8f86\u63a7\u5236\u4fe1\u53f7\uff0c\u8fd9\u8fdd\u80cc\u4e86MLLM\u7684\u672c\u8d28\u751f\u6210\u6a21\u5f0f\uff0c\u672a\u80fd\u5145\u5206\u5229\u7528\u5176\u6f5c\u5728\u80fd\u529b\u3002\u56e0\u6b64\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u7684\u4e00\u822c\u5316\u80fd\u529b\u53d7\u5230\u8bad\u7ec3\u6570\u636e\u96c6\u7684\u6781\u5927\u9650\u5236\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u901a\u8fc7\u4e2d\u5c42\u8bed\u8a00\u9a71\u52a8\u547d\u4ee4\u6765\u8fde\u63a5\u9ad8\u7ea7\u6307\u4ee4\u548c\u4f4e\u7ea7\u63a7\u5236\u4fe1\u53f7\uff0c\u5b83\u4eec\u6bd4\u9ad8\u7ea7\u6307\u4ee4\u66f4\u7ec6\u81f4\uff0c\u4f46\u6bd4\u63a7\u5236\u4fe1\u53f7\u66f4\u901a\u7528\u4e14\u53ef\u89e3\u91ca\uff0c\u4ece\u800c\u6709\u6548\u5f25\u5408\u4e24\u8005\u4e4b\u95f4\u7684\u9e3f\u6c9f\u3002\u6211\u4eec\u901a\u8fc7\u4e00\u4e2a\u540d\u4e3aAD-H\u7684\u5206\u5c42\u591a\u4ee3\u7406\u9a7e\u9a76\u7cfb\u7edf\u5b9e\u73b0\u8fd9\u4e00\u7406\u5ff5\uff0c\u5305\u62ec\u4e00\u4e2a\u7528\u4e8e\u9ad8\u5c42\u63a8\u7406\u7684MLLM\u89c4\u5212\u5668\u548c\u4e00\u4e2a\u8f7b\u91cf\u7ea7\u63a7\u5236\u5668\u8fdb\u884c\u4f4e\u5c42\u6267\u884c\u3002\u8fd9\u79cd\u5206\u5c42\u8bbe\u8ba1\u4f7fMLLM\u6446\u8131\u4e86\u4f4e\u7ea7\u63a7\u5236\u4fe1\u53f7\u89e3\u7801\uff0c\u5145\u5206\u91ca\u653e\u4e86\u5176\u5728\u9ad8\u5c42\u611f\u77e5\u3001\u63a8\u7406\u548c\u89c4\u5212\u65b9\u9762\u7684\u6d8c\u73b0\u80fd\u529b\u3002 
\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u5e26\u6709\u52a8\u4f5c\u5c42\u6b21\u6ce8\u91ca\u7684\u65b0\u6570\u636e\u96c6\u3002\u5168\u9762\u7684\u95ed\u73af\u8bc4\u4f30\u663e\u793a\uff0c\u6211\u4eec\u7684AD-H\u7cfb\u7edf\u5177\u6709\u591a\u9879\u5173\u952e\u4f18\u52bf\u3002\u9996\u5148\uff0cAD-H\u5728\u9a7e\u9a76\u6027\u80fd\u4e0a\u663e\u8457\u4f18\u4e8e\u73b0\u6709\u65b9\u6cd5\uff0c\u751a\u81f3\u5c55\u73b0\u51fa\u5728\u8f66\u8f86\u64cd\u4f5c\u8fc7\u7a0b\u4e2d\u81ea\u6211\u7ea0\u6b63\u7684\u80fd\u529b\uff0c\u8fd9\u662f\u8bad\u7ec3\u6570\u636e\u672a\u6db5\u76d6\u7684\u573a\u666f\u3002\u5176\u6b21\uff0cAD-H\u5728\u957f\u7a0b\u6307\u4ee4\u548c\u65b0\u73af\u5883\u6761\u4ef6\u4e0b\u8868\u73b0\u51fa\u8272\uff0c\u660e\u663e\u8d85\u8d8a\u5f53\u524d\u6700\u5148\u8fdb\u7684\u65b9\u6cd5\u3002\u6211\u4eec\u5c06\u516c\u5f00\u6211\u4eec\u7684\u6570\u636e\u548c\u4ee3\u7801\uff0c\u53ef\u901a\u8fc7\u83b7\u53d6\u3002|\n", "2406.03450": "|**2024-06-05**|**What is the Best Way for ChatGPT to Translate Poetry?**|Shanshan Wang 
et.al.|[2406.03450](http://arxiv.org/abs/2406.03450)|null|\u672c\u6587\u7814\u7a76\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5982ChatGPT\u5728\u82f1\u8bed-\u4e2d\u6587\u8bd7\u6b4c\u7ffb\u8bd1\u4efb\u52a1\u4e2d\u7684\u6027\u80fd\uff0c\u901a\u8fc7\u5b9a\u5411\u63d0\u793a\u548c\u5c0f\u6837\u672c\u573a\u666f\u5206\u6790\u4ee5\u4f18\u5316\u5176\u8868\u73b0\u3002\u5c3d\u7ba1\u521d\u671f\u7ed3\u679c\u4ee4\u4eba\u9f13\u821e\uff0c\u4f46\u7814\u7a76\u53d1\u73b0ChatGPT\u7684\u7ffb\u8bd1\u5b58\u5728\u6301\u7eed\u95ee\u9898\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u201c\u89e3\u91ca\u8f85\u52a9\u8bd7\u6b4c\u673a\u5668\u7ffb\u8bd1\u201d\uff08EAPMT\uff09\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u5229\u7528\u8bd7\u6b4c\u7684\u5355\u8bed\u89e3\u91ca\u4f5c\u4e3a\u7ffb\u8bd1\u8fc7\u7a0b\u7684\u6307\u5bfc\u3002\u540c\u65f6\uff0c\u6211\u4eec\u6539\u8fdb\u4e86\u73b0\u6709\u7684\u8bc4\u4f30\u6807\u51c6\uff0c\u4ee5\u66f4\u597d\u5730\u9002\u5e94\u73b0\u4ee3\u8bd7\u6b4c\u7ffb\u8bd1\u7684\u5fae\u5999\u4e4b\u5904\u3002\u6211\u4eec\u9080\u8bf7\u4e13\u4e1a\u8bd7\u4eba\u8fdb\u884c\u8bc4\u4f30\uff0c\u5e76\u7ed3\u5408GPT-4\u7684\u8bc4\u4ef7\uff0c\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684EAPMT\u65b9\u6cd5\u5728\u4e0e\u4f20\u7edfChatGPT\u7ffb\u8bd1\u65b9\u6cd5\u4ee5\u53ca\u73b0\u6709\u5728\u7ebf\u7cfb\u7edf\u7684\u6bd4\u8f83\u4e2d\u8868\u73b0\u51fa\u8272\u3002\u8bba\u6587\u9a8c\u8bc1\u4e86\u6211\u4eec\u65b9\u6cd5\u7684\u6709\u6548\u6027\uff0c\u5e76\u4e3a\u6587\u5b66\u7ffb\u8bd1\u7684\u673a\u5668\u8f85\u52a9\u63d0\u4f9b\u4e86\u65b0\u9896\u89c6\u89d2\u3002|\n", "2406.03445": "|**2024-06-05**|**Pre-trained Large Language Models Use Fourier Features to Compute Addition**|Tianyi Zhou et.al.|[2406.03445](http://arxiv.org/abs/2406.03445)|null|
\u9884\u8bad\u7ec3\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u6570\u5b66\u63a8\u7406\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5b83\u4eec\u5982\u4f55\u6267\u884c\u57fa\u672c\u7684\u7b97\u672f\u8fd0\u7b97\uff0c\u5982\u52a0\u6cd5\uff0c\u4ecd\u4e0d\u6e05\u695a\u3002\u672c\u6587\u63ed\u793a\u4e86\u9884\u8bad\u7ec3\u7684LLMs\u901a\u8fc7\u5085\u91cc\u53f6\u7279\u5f81\u8fdb\u884c\u52a0\u6cd5\u2014\u2014\u8fd9\u4e9b\u662f\u9690\u85cf\u72b6\u6001\u4e2d\u7684\u7ef4\u5ea6\uff0c\u901a\u8fc7\u4e00\u7ec4\u5728\u9891\u57df\u4e2d\u7a00\u758f\u5206\u5e03\u7684\u7279\u5f81\u6765\u8868\u793a\u6570\u5b57\u3002\u5728\u6a21\u578b\u4e2d\uff0c\u591a\u5c42\u611f\u77e5\u5668\uff08MLP\uff09\u5c42\u548c\u6ce8\u610f\u529b\u5c42\u4ee5\u4e92\u8865\u7684\u65b9\u5f0f\u4f7f\u7528\u5085\u91cc\u53f6\u7279\u5f81\uff1aMLP\u5c42\u4e3b\u8981\u4f7f\u7528\u4f4e\u9891\u7279\u5f81\u8fd1\u4f3c\u7b54\u6848\u7684\u5927\u5c0f\uff0c\u800c\u6ce8\u610f\u529b\u5c42\u4e3b\u8981\u901a\u8fc7\u9ad8\u9891\u7279\u5f81\u6267\u884c\u6a21\u8fd0\u7b97\uff08\u4f8b\u5982\u5224\u65ad\u7b54\u6848\u662f\u5426\u4e3a\u5076\u6570\uff09\u3002\u9884\u8bad\u7ec3\u5bf9\u4e8e\u8fd9\u79cd\u673a\u5236\u81f3\u5173\u91cd\u8981\uff1a\u4ece\u5934\u5f00\u59cb\u8bad\u7ec3\u7684\u6a21\u578b\u4ec5\u5229\u7528\u4f4e\u9891\u7279\u5f81\uff0c\u5bfc\u81f4\u51c6\u786e\u6027\u8f83\u4f4e\u3002\u5c06\u9884\u8bad\u7ec3\u7684\u8bcd\u5d4c\u5165\u5f15\u5165\u5230\u968f\u673a\u521d\u59cb\u5316\u7684\u6a21\u578b\u4e2d\u53ef\u4ee5\u6062\u590d\u5176\u6027\u80fd\u3002\u603b\u7684\u6765\u8bf4\uff0c\u6211\u4eec\u7684\u5206\u6790\u8868\u660e\uff0c\u9002\u5f53\u7684\u9884\u8bad\u7ec3\u8868\u793a\uff08\u5982\u5085\u91cc\u53f6\u7279\u5f81\uff09\u80fd\u591f\u89e3\u9501Transformer\u5b66\u4e60\u7b97\u6cd5\u4efb\u52a1\u7cbe\u786e\u673a\u5236\u7684\u80fd\u529b\u3002|\n", "2406.03441": "|**2024-06-05**|**Cycles of Thought: Measuring LLM Confidence through Stable Explanations**|Evan Becker 
et.al.|[2406.03441](http://arxiv.org/abs/2406.03441)|null|\u5728\u8bb8\u591a\u9ad8\u98ce\u9669\u7684\u673a\u5668\u5b66\u4e60\u5e94\u7528\u4e2d\uff0c\u6a21\u578b\u9700\u8981\u80fd\u591f\u8868\u660e\u5176\u5bf9\u9884\u6d4b\u7684\u4e0d\u786e\u5b9a\u6027\u81f3\u5173\u91cd\u8981\u3002\u5c3d\u7ba1\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5404\u79cd\u57fa\u51c6\u4e0a\u7684\u51c6\u786e\u5ea6\u53ef\u8fbe\u5230\u751a\u81f3\u8d85\u8fc7\u4eba\u7c7b\u6c34\u5e73\uff0c\u4f46\u5b83\u4eec\u5bf9\u9519\u8bef\u54cd\u5e94\u7684\u8fc7\u5ea6\u81ea\u4fe1\u4ecd\u662f\u5df2\u77e5\u7684\u95ee\u9898\u3002\u4f20\u7edf\u7684\u65b9\u6cd5\u5728\u76f4\u63a5\u5e94\u7528\u4e8eLLMs\u65f6\u53ef\u80fd\u9762\u4e34\u8ba1\u7b97\u6210\u672c\u548c\u5c01\u95ed\u6e90\u6a21\u578b\u7684\u6311\u6218\u3002\u8fd1\u671f\u63d0\u51fa\u4e86\u4e00\u4e9b\u9ed1\u76d2\u65b9\u6cd5\uff0c\u4f46\u5b83\u4eec\u5f80\u5f80\u4f9d\u8d56\u4e8e\u8bf8\u5982\u81ea\u6211\u8868\u8ff0\u7684\u4fe1\u5fc3\u7b49\u542f\u53d1\u5f0f\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u6846\u67b6\uff0c\u901a\u8fc7\u5206\u6790\u6a21\u578b\u751f\u6210\u7b54\u6848\u7684\u89e3\u91ca\u5206\u5e03\u6765\u8861\u91cfLLMs\u7684\u4e0d\u786e\u5b9a\u6027\u3002\u5c3d\u7ba1\u5229\u7528\u89e3\u91ca\u672c\u8eab\u5e76\u975e\u65b0\u9896\uff0c\u4f46\u6211\u4eec\u5c06\u5176\u89c6\u4e3a\u6d4b\u8bd5\u65f6\u95f4\u5206\u7c7b\u5668\uff0c\u901a\u8fc7\u8ba1\u7b97\u6700\u53ef\u80fd\u7684\u5206\u7c7b\u5668\u540e\u9a8c\u7b54\u6848\u5206\u5e03\uff0c\u4ee5\u6b64\u8fdb\u884c\u4e0d\u786e\u5b9a\u6027\u8bc4\u4f30\u3002 
\u6211\u4eec\u5c55\u793a\u4e86\u4f7f\u7528\u89e3\u91ca\u8574\u542b\u4f5c\u4e3a\u5206\u7c7b\u5668\u4f3c\u7136\u6027\u7684\u4e00\u79cd\u7279\u5b9a\u6846\u67b6\u5b9e\u4f8b\uff0c\u5982\u4f55\u5728\u4e94\u4e2a\u4e0d\u540c\u7684\u6570\u636e\u96c6\u4e0a\u6539\u8fdb\u4e86\u4fe1\u5fc3\u5206\u6570\u6307\u6807\uff08\u7279\u522b\u662fAUROC\u548cAURC\uff09\u3002\u6211\u4eec\u7684\u7ed3\u679c\u8868\u660e\uff0c\u8be5\u6846\u67b6\u65e2\u5177\u6709\u7406\u8bba\u4f9d\u636e\uff0c\u53c8\u662f\u6709\u6548\u91cf\u5316LLMs\u4e0d\u786e\u5b9a\u6027\u7684\u65b9\u5f0f\u3002|\n", "2406.03411": "|**2024-06-05**|**Interactive Text-to-Image Retrieval with Large Language Models: A Plug-and-Play Approach**|Saehyung Lee et.al.|[2406.03411](http://arxiv.org/abs/2406.03411)|**[link](https://github.com/saehyung-lee/plugir)**|**\u8be5\u8bba\u6587\u4e3b\u8981\u5173\u6ce8\u7684\u662f\u4ea4\u4e92\u5f0f\u6587\u672c\u5230\u56fe\u50cf\u68c0\u7d22\u4efb\u52a1\u4e2d\u7684\u5bf9\u8bdd\u5f62\u5f0f\u4e0a\u4e0b\u6587\u67e5\u8be2\u95ee\u9898\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u8bba\uff0c\u540d\u4e3aPlugIR\uff0c\u901a\u8fc7\u4e24\u79cd\u65b9\u5f0f\u6709\u6548\u5730\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u4e00\u822c\u6307\u4ee4\u8ddf\u968f\u80fd\u529b\u3002\u9996\u5148\uff0c\u901a\u8fc7\u91cd\u8ff0\u5bf9\u8bdd\u5f62\u5f0f\u7684\u4e0a\u4e0b\u6587\uff0c\u6211\u4eec\u6d88\u9664\u4e86\u5728\u73b0\u6709\u89c6\u89c9\u5bf9\u8bdd\u6570\u636e\u4e0a\u5fae\u8c03\u68c0\u7d22\u6a21\u578b\u7684\u9700\u6c42\uff0c\u4ece\u800c\u80fd\u591f\u4f7f\u7528\u4efb\u610f\u9ed1\u76d2\u6a21\u578b\u3002\u5176\u6b21\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2aLLM\u63d0\u95ee\u8005\uff0c\u6839\u636e\u5f53\u524d\u4e0a\u4e0b\u6587\u4e2d\u5019\u9009\u56fe\u50cf\u7684\u4fe1\u606f\uff0c\u751f\u6210\u5173\u4e8e\u76ee\u6807\u56fe\u50cf\u5c5e\u6027\u7684\u975e\u5197\u4f59\u95ee\u9898\u3002\u8fd9\u79cd\u65b9\u6cd5\u51cf\u5c11\u4e86\u751f\u6210\u95ee\u9898\u7684\u566a\u58f0\u548c\u5197\u4f59\u3002\u9664\u4e86\u6211\u4eec\u7
684\u65b9\u6cd5\uff0c\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u8bc4\u4f30\u6307\u6807\uff0c\u79f0\u4e3a\u6700\u4f73\u5bf9\u6570\u6392\u540d\u79ef\u5206\uff08BRI\uff09\uff0c\u4ee5\u5168\u9762\u8bc4\u4f30\u4ea4\u4e92\u5f0f\u68c0\u7d22\u7cfb\u7edf\u3002PlugIR\u5728\u591a\u4e2a\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u8868\u73b0\u51fa\u4f18\u4e8e\u96f6\u6b21\u8bbe\u7f6e\u548c Fine-tuned \u57fa\u51c6\u7684\u6027\u80fd\u3002\u6b64\u5916\uff0c PlugIR \u7684\u4e24\u4e2a\u7ec4\u6210\u90e8\u5206\u53ef\u4ee5\u6839\u636e\u4e0d\u540c\u60c5\u51b5\u7075\u6d3b\u5355\u72ec\u6216\u7ed3\u5408\u5e94\u7528\u3002\u6211\u4eec\u7684\u4ee3\u7801\u5df2\u5f00\u6e90\u5728\uff1ahttps://github.com/Saehyung-Lee/PlugIR\u3002**|\n", "2406.04344": "|**2024-06-06**|**Verbalized Machine Learning: Revisiting Machine Learning with Language Models**|Tim Z. Xiao et.al.|[2406.04344](http://arxiv.org/abs/2406.04344)|null|\u53d7\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u53d6\u5f97\u7684\u5de8\u5927\u8fdb\u5c55\u542f\u53d1\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u53e3\u5934\u5316\u673a\u5668\u5b66\u4e60\uff08VML\uff09\u6846\u67b6\u3002\u4e0e\u4f20\u7edf\u7684\u673a\u5668\u5b66\u4e60\u6a21\u578b\uff0c\u901a\u5e38\u5728\u8fde\u7eed\u53c2\u6570\u7a7a\u95f4\u4e2d\u4f18\u5316\u4e0d\u540c\uff0cVML\u5c06\u53c2\u6570\u7a7a\u95f4\u9650\u5236\u4e3a\u4eba\u53ef\u7406\u89e3\u7684\u81ea\u7136\u8bed\u8a00\u3002\u8fd9\u79cd\u7ea6\u675f\u4fc3\u4f7f\u6211\u4eec\u4ece\u65b0\u89d2\u5ea6\u770b\u5f85\u51fd\u6570\u903c\u8fd1\u95ee\u9898\uff0c\u5373\u5c06\u5e26\u6709\u6587\u672c\u63d0\u793a\u7684LLM\u89c6\u4e3a\u7531\u6587\u672c\u63d0\u793a\u53c2\u6570\u5316\u7684\u51fd\u6570\u3002\u6211\u4eec\u501f\u6b64\u89c6\u89d2\u91cd\u65b0\u5ba1\u89c6\u4e86\u7ecf\u5178\u673a\u5668\u5b66\u4e60\u4efb\u52a1\uff0c\u5982\u56de\u5f52\u548c\u5206\u7c7b\uff0c\u53d1\u73b0\u8fd9\u4e9b\u95ee\u9898\u53ef\u4ee5\u901a\u8fc7LLM\u53c2\u6570\u5316\u7684\u5b66\u4e60\u5668\u548c\u4f18\u5316\u5668\u6765\u89e3\u51b3\u3002VML\u7684\u4e3b\u8981\u4f
18\u52bf\u5305\u62ec\uff1a\uff081\uff09\u6613\u4e8e\u7f16\u7801\u5148\u9a8c\u77e5\u8bc6\uff1a\u5173\u4e8e\u95ee\u9898\u548c\u5047\u8bbe\u7c7b\u7684\u5148\u9a8c\u77e5\u8bc6\u53ef\u4ee5\u4ee5\u81ea\u7136\u8bed\u8a00\u5f62\u5f0f\u7f16\u7801\u5e76\u8f93\u5165\u7ed9LLM\u53c2\u6570\u5316\u7684\u5b66\u4e60\u5668\uff1b\uff082\uff09\u81ea\u52a8\u6a21\u578b\u9009\u62e9\uff1a\u4f18\u5316\u5668\u53ef\u4ee5\u6839\u636e\u6570\u636e\u548c\u53e3\u5934\u5316\u5148\u9a8c\u77e5\u8bc6\u81ea\u52a8\u9009\u62e9\u5177\u4f53\u7684\u6a21\u578b\u7c7b\u522b\uff0c\u5e76\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u66f4\u65b0\u6a21\u578b\u7c7b\u522b\uff1b\uff083\uff09\u53ef\u89e3\u91ca\u7684\u5b66\u4e60\u8005\u66f4\u65b0\uff1aLLM\u53c2\u6570\u5316\u7684\u4f18\u5316\u5668\u53ef\u4ee5\u89e3\u91ca\u6bcf\u6b21\u5b66\u4e60\u8005\u66f4\u65b0\u7684\u539f\u56e0\u3002\u6211\u4eec\u8fdb\u884c\u4e86\u591a\u9879\u5b9e\u9a8c\u8bc4\u4f30VML\u7684\u6709\u6548\u6027\uff0c\u5e0c\u671b\u5b83\u80fd\u6210\u4e3a\u589e\u5f3a\u673a\u5668\u5b66\u4e60\u53ef\u89e3\u91ca\u6027\u548c\u4fe1\u4efb\u5ea6\u7684\u6865\u6881\u3002|\n", "2406.04339": "|**2024-06-06**|**RoboMamba: Multimodal State Space Model for Efficient Robot Reasoning and Manipulation**|Jiaming Liu 
et.al.|[2406.04339](http://arxiv.org/abs/2406.04339)|null|\u5728\u673a\u5668\u4eba\u64cd\u4f5c\u7684\u6838\u5fc3\u76ee\u6807\u4e2d\uff0c\u8ba9\u6a21\u578b\u7406\u89e3\u89c6\u89c9\u573a\u666f\u5e76\u6267\u884c\u52a8\u4f5c\u662f\u4e00\u4e2a\u57fa\u672c\u4efb\u52a1\u3002\u5c3d\u7ba1\u73b0\u6709\u7684\u673a\u5668\u4eba\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\u80fd\u591f\u5904\u7406\u4e00\u4e9b\u57fa\u7840\u4efb\u52a1\uff0c\u4f46\u5b83\u4eec\u5728\u4e24\u4e2a\u65b9\u9762\u4ecd\u9762\u4e34\u6311\u6218\uff1a1\uff09\u5904\u7406\u590d\u6742\u4efb\u52a1\u7684\u63a8\u7406\u80fd\u529b\u4e0d\u8db3\uff1b2\uff09\u5bf9\u4e8eMLLM\u7684\u5fae\u8c03\u548c\u63a8\u7406\u5b58\u5728\u9ad8\u8ba1\u7b97\u6210\u672c\u3002\u8fd1\u671f\u63d0\u51fa\u7684\u57fa\u4e8e\u72b6\u6001\u7a7a\u95f4\u6a21\u578b\uff08SSM\uff09\u7684Mamba\u5c55\u793a\u4e86\u5728\u975e\u5e73\u51e1\u5e8f\u5217\u5efa\u6a21\u65b9\u9762\u7684\u6f5c\u529b\uff0c\u5177\u6709\u7ebf\u6027\u63a8\u7406\u590d\u6742\u5ea6\u3002\u5728\u6b64\u542f\u53d1\u4e0b\uff0c\u6211\u4eec\u5f00\u53d1\u4e86RoboMamba\uff0c\u4e00\u4e2a\u7aef\u5230\u7aef\u7684\u673a\u5668\u4ebaMLLM\uff0c\u5b83\u5229\u7528Mamba\u6a21\u578b\u7ed3\u5408\u673a\u5668\u4eba\u63a8\u7406\u548c\u52a8\u4f5c\u80fd\u529b\uff0c\u540c\u65f6\u4fdd\u6301\u9ad8\u6548\u7684\u5fae\u8c03\u548c\u63a8\u7406\u6548\u7387\u3002 
\u9996\u5148\uff0c\u6211\u4eec\u5c06\u89c6\u89c9\u7f16\u7801\u5668\u4e0eMamba\u96c6\u6210\uff0c\u901a\u8fc7\u8054\u5408\u8bad\u7ec3\u4f7f\u89c6\u89c9\u6570\u636e\u4e0e\u8bed\u8a00\u5d4c\u5165\u5bf9\u9f50\uff0c\u8d4b\u4e88\u6a21\u578b\u89c6\u89c9\u5e38\u8bc6\u548c\u4e0e\u673a\u5668\u4eba\u76f8\u5173\u7684\u63a8\u7406\u80fd\u529b\u3002\u4e3a\u4e86\u8fdb\u4e00\u6b65\u63d0\u5347RoboMamba\u7684\u52a8\u4f5c\u59ff\u6001\u9884\u6d4b\u80fd\u529b\uff0c\u6211\u4eec\u63a2\u7d22\u4e86\u4e00\u79cd\u9ad8\u6548\u7684\u5fae\u8c03\u7b56\u7565\uff0c\u4ec5\u4f7f\u7528\u7b80\u5355\u7684\u7b56\u7565\u5934\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u4e00\u65e6RoboMamba\u5177\u5907\u8db3\u591f\u7684\u63a8\u7406\u80fd\u529b\uff0c\u53ea\u9700\u6781\u5c11\u7684\u5fae\u8c03\u53c2\u6570\uff08\u6a21\u578b\u76840.1%\uff09\u548c\u65f6\u95f4\uff0820\u5206\u949f\uff09\uff0c\u5c31\u80fd\u4e60\u5f97\u64cd\u7eb5\u6280\u80fd\u3002\u5728\u5b9e\u9a8c\u4e2d\uff0cRoboMamba\u5728\u901a\u7528\u548c\u673a\u5668\u4eba\u8bc4\u4f30\u57fa\u51c6\u4e0a\u5c55\u73b0\u51fa\u5353\u8d8a\u7684\u63a8\u7406\u80fd\u529b\u3002\u540c\u65f6\uff0c\u6211\u4eec\u7684\u6a21\u578b\u5728\u6a21\u62df\u548c\u771f\u5b9e\u4e16\u754c\u5b9e\u9a8c\u4e2d\u5b9e\u73b0\u4e86\u59ff\u6001\u9884\u6d4b\u7684\u51fa\u8272\u8868\u73b0\uff0c\u5176\u63a8\u7406\u901f\u5ea6\u6bd4\u73b0\u6709\u673a\u5668\u4ebaMLLM\u5feb7\u500d\u3002\u9879\u76ee\u7684\u7f51\u9875\u94fe\u63a5\u4e3a\uff1a\u3002|\n", "2406.04337": "|**2024-06-06**|**Coherent Zero-Shot Visual Instruction Generation**|Quynh Phung 
et.al.|[2406.04337](http://arxiv.org/abs/2406.04337)|null|\u5c3d\u7ba1\u6587\u672c\u5230\u56fe\u50cf\u5408\u6210\u6280\u672f\u53d6\u5f97\u4e86\u8fdb\u6b65\uff0c\u7279\u522b\u662f\u5728\u6269\u6563\u6a21\u578b\u65b9\u9762\uff0c\u4f46\u751f\u6210\u9700\u8981\u7269\u4f53\u5728\u8fde\u7eed\u6b65\u9aa4\u4e2d\u4fdd\u6301\u4e00\u81f4\u8868\u793a\u548c\u5e73\u6ed1\u72b6\u6001\u8f6c\u6362\u7684\u89c6\u89c9\u6307\u4ee4\u4ecd\u7136\u662f\u4e00\u9879\u8270\u5de8\u6311\u6218\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65e0\u9700\u8bad\u7ec3\u7684\u6846\u67b6\uff0c\u5de7\u5999\u5730\u7ed3\u5408\u4e86\u6587\u672c\u7406\u89e3\u4e0e\u56fe\u50cf\u751f\u6210\uff0c\u4ee5\u786e\u4fdd\u89c6\u89c9\u6307\u4ee4\u65e2\u7f8e\u89c2\u53c8\u5177\u6709\u8fde\u8d2f\u6027\u548c\u51c6\u786e\u6027\u3002\u901a\u8fc7\u6d4b\u8bd5\u591a\u6b65\u9aa4\u6307\u4ee4\uff0c\u5e76\u4e0e\u591a\u4e2a\u57fa\u7ebf\u8fdb\u884c\u6bd4\u8f83\uff0c\u6211\u4eec\u9a8c\u8bc1\u4e86\u8fd9\u79cd\u65b9\u6cd5\u7684\u6709\u6548\u6027\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u80fd\u591f\u751f\u6210\u8fde\u8d2f\u4e14\u89c6\u89c9\u4e0a\u5438\u5f15\u4eba\u7684\u6307\u4ee4\u3002|\n", "2406.04334": "|**2024-06-06**|**DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs**|Lingchen Meng 
et.al.|[2406.04334](http://arxiv.org/abs/2406.04334)|null|\u5927\u591a\u6570\u5927\u578b\u591a\u6a21\u6001\u6a21\u578b\uff08LMMs\uff09\u901a\u8fc7\u5c06\u89c6\u89c9\u4ee4\u724c\u4f5c\u4e3a\u5e8f\u5217\u8f93\u5165\u5230\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u7b2c\u4e00\u5c42\u6765\u5b9e\u73b0\u3002\u8fd9\u79cd\u65b9\u6cd5\u867d\u7136\u76f4\u89c2\uff0c\u4f46\u4f1a\u663e\u8457\u589e\u52a0\u8ba1\u7b97\u548c\u5185\u5b58\u5f00\u9500\uff0c\u56e0\u4e3a\u6a21\u578b\u9700\u8981\u5904\u7406\u66f4\u591a\u7684\u8f93\u5165\u5c42\u4ee4\u724c\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u67b6\u6784DeepStack\uff0c\u7528\u4e8eLMMs\u3002\u5728LMM\u7684\u89c6\u89c9\u548c\u8bed\u8a00Transformer\u7684N\u5c42\u4e2d\uff0c\u6211\u4eec\u5c06\u89c6\u89c9\u4ee4\u724c\u5206\u4e3aN\u7ec4\uff0c\u5e76\u4ece\u5e95\u5c42\u9010\u5c42\u5411\u4e0a\u9988\u9001\u5230\u5bf9\u5e94\u7684Transformer\u5c42\u3002\u4ee4\u4eba\u60ca\u8bb6\u7684\u662f\uff0c\u8fd9\u79cd\u7b80\u5355\u7684\u65b9\u6cd5\u6781\u5927\u5730\u589e\u5f3a\u4e86LMM\u5728\u8de8\u5c42\u89c6\u89c9\u4ee4\u724c\u4ea4\u4e92\u65b9\u9762\u7684\u5efa\u6a21\u80fd\u529b\uff0c\u540c\u65f6\u6210\u672c\u51e0\u4e4e\u4e0d\u53d8\u3002\u6211\u4eec\u5206\u522b\u5c06DeepStack\u5e94\u7528\u4e8eLMM\u7684\u8bed\u8a00\u548c\u89c6\u89c9Transformer\uff0c\u5e76\u901a\u8fc7\u5e7f\u6cdb\u5b9e\u8bc1\u7ed3\u679c\u9a8c\u8bc1\u4e86DeepStack LMM\u7684\u6709\u6548\u6027\u3002 \u4f7f\u7528\u76f8\u540c\u7684\u4e0a\u4e0b\u6587\u957f\u5ea6\uff0c\u6211\u4eec\u7684DeepStack 
7B\u548c13B\u53c2\u6570\u6a21\u578b\u57289\u4e2a\u57fa\u51c6\u6d4b\u8bd5\u4e0a\u5e73\u5747\u8d85\u8d8a\u540c\u7c7b\u6a21\u578b2.7\u5206\u548c2.9\u5206\u3002\u4ec5\u4f7f\u7528\u4e94\u5206\u4e4b\u4e00\u7684\u4e0a\u4e0b\u6587\u957f\u5ea6\uff0cDeepStack\u7684\u8868\u73b0\u63a5\u8fd1\u4e8e\u4f7f\u7528\u5b8c\u6574\u4e0a\u4e0b\u6587\u957f\u5ea6\u7684\u6a21\u578b\u3002\u8fd9\u4e9b\u63d0\u5347\u5728\u9ad8\u5206\u8fa8\u7387\u4efb\u52a1\u4e2d\u5c24\u4e3a\u660e\u663e\uff0c\u4f8b\u5982\uff0c\u4e0eLLaVA-1.5-7B\u76f8\u6bd4\uff0cTextVQA\u3001DocVQA\u548cInfoVQA\u4e0a\u7684\u6027\u80fd\u5206\u522b\u63d0\u9ad8\u4e864.2\u5206\u300111.0\u5206\u548c4.0\u5206\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5c06DeepStack\u5e94\u7528\u5230\u89c6\u89c9Transformer\u5c42\uff0c\u8fd9\u5e26\u6765\u4e86\u4e0eLLaVA-1.5-7B\u76f8\u5f53\u7684\u5e73\u5747\u6539\u8fdb\uff0c\u4e3a3.8\u5206\u3002|\n", "2406.04331": "|**2024-06-06**|**PaCE: Parsimonious Concept Engineering for Large Language Models**|Jinqi Luo et.al.|[2406.04331](http://arxiv.org/abs/2406.04331)|**[link](https://github.com/peterljq/parsimonious-concept-engineering)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u88ab\u5e7f\u6cdb\u5e94\u7528\u4e8e\u5404\u79cd\u4efb\u52a1\uff0c\u5c3d\u7ba1\u5b83\u4eec\u80fd\u591f\u751f\u6210\u7c7b\u4f3c\u4eba\u7c7b\u7684\u56de\u590d\uff0c\u4f46\u4e5f\u4f1a\u4ea7\u751f\u4e0d\u826f\u8f93\u51fa\uff0c\u5982\u6f5c\u5728\u6709\u5bb3\u4fe1\u606f\u3001\u79cd\u65cf\u6216\u6027\u522b\u6b67\u89c6\u6027\u8a00\u8bba\u4ee5\u53ca\u9519\u8bef\u7684\u4fe1\u606f\u3002\u4e3a\u4e86\u51cf\u5c11\u8fd9\u4e9b\u95ee\u9898\uff0c\u7814\u7a76\u4eba\u5458\u5f00\u53d1\u4e86\u5bf9\u9f50\u65b9\u6cd5\uff0c\u5982\u5fae\u8c03\u3001\u63d0\u793a\u5de5\u7a0b\u548c\u8868\u793a\u5de5\u7a0b\u3002\u7136\u800c\uff0c\u73b0\u6709\u65b9\u6cd5\u9762\u4e34\u6311\u6218\uff1a\u4e00\u4e9b\u9700\u8981\u9488\u5bf9\u6bcf\u4e2a\u5bf9\u9f50\u4efb\u52a1\u8fdb\u884c\u6602\u8d35\u7684\u5fae\u8c03\uff1b\u4e00\u4e9b\u672a\u80fd\u5145\u5206\u6d88\u9664\u4e0d\u82
6f\u6982\u5ff5\uff0c\u5bf9\u9f50\u6548\u679c\u4e0d\u4f73\uff1b\u4e00\u4e9b\u5219\u5220\u9664\u4e86\u826f\u6027\u7684\u6982\u5ff5\uff0c\u964d\u4f4e\u4e86LLMs\u7684\u8bed\u8a00\u80fd\u529b\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u540d\u4e3aParsimonious Concept Engineering\uff08PaCE\uff09\u7684\u65b0\u578b\u6fc0\u6d3b\u5de5\u7a0b\u6846\u67b6\uff0c\u65e8\u5728\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\u3002 \u9996\u5148\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u5927\u89c4\u6a21\u7684\u6982\u5ff5\u5b57\u5178\uff0c\u5b83\u5728\u6fc0\u6d3b\u7a7a\u95f4\u4e2d\u8868\u793a\u6bcf\u4e2a\u539f\u5b50\u5bf9\u5e94\u4e00\u4e2a\u8bed\u4e49\u6982\u5ff5\u3002\u63a5\u7740\uff0c\u5bf9\u4e8e\u7ed9\u5b9a\u7684\u4efb\u4f55\u5bf9\u9f50\u4efb\u52a1\uff0c\u6211\u4eec\u4f1a\u4f7f\u7528\u4e00\u4e2a\u6982\u5ff5\u5206\u533a\u5668\u9ad8\u6548\u5730\u6807\u8bb0\u8fd9\u4e9b\u6982\u5ff5\u4e3a\u826f\u6027\u6216\u4e0d\u826f\u3002\u5728\u63a8\u7406\u9636\u6bb5\uff0c\u6211\u4eec\u5229\u7528\u7a00\u758f\u7f16\u7801\u65b9\u6cd5\uff0c\u6839\u636e\u6982\u5ff5\u5b57\u5178\u5206\u89e3LLM\u7684\u6fc0\u6d3b\uff0c\u5c06\u5176\u51c6\u786e\u8868\u793a\u4e3a\u826f\u6027\u6210\u5206\u548c\u4e0d\u826f\u6210\u5206\u7684\u7ebf\u6027\u7ec4\u5408\u3002\u901a\u8fc7\u79fb\u9664\u4e0d\u826f\u6210\u5206\uff0c\u6211\u4eec\u80fd\u591f\u8c03\u6574LLMs\u7684\u884c\u4e3a\u4ee5\u7b26\u5408\u5bf9\u9f50\u76ee\u6807\u3002 \u6211\u4eec\u5728\u56de\u5e94\u51c0\u5316\u3001\u771f\u5b9e\u6027\u589e\u5f3a\u548c\u60c5\u611f\u4fee\u8ba2\u7b49\u4efb\u52a1\u4e0a\u8fdb\u884c\u4e86\u5b9e\u9a8c\uff0c\u5e76\u53d1\u73b0PaCE\u5728\u5b9e\u73b0\u5bf9\u9f50\u6027\u80fd\u7684\u540c\u65f6\uff0c\u4fdd\u6301\u4e86\u826f\u597d\u7684\u8bed\u8a00\u80fd\u529b\uff0c\u8fbe\u5230\u4e86\u5f53\u524d\u6700\u5148\u8fdb\u7684\u6c34\u5e73\u3002**|\n", "2406.04314": "|**2024-06-06**|**Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step**|Zhanhao Liang et.al.|[2406.04314](http://arxiv.org/abs/2406.04314)|null|## 
Background Recently, Direct Preference Optimization (DPO) has been successfully extended to align text-to-image diffusion models with human preferences. Most existing DPO methods assume that all diffusion steps share a preference order consistent with the final generated images. We argue that this assumption neglects step-specific denoising performance and that preference labels should be tailored to each step. To this end, we propose Step-aware Preference Optimization (SPO), a novel post-training approach that independently evaluates and adjusts the denoising performance at each step, using a step-aware preference model and a step-wise resampler to ensure accurate step-aware supervision. 
In SPO, at each denoising step we sample a pool of images, find a suitable win-lose pair, and, crucially, randomly select a single image from the pool to initialize the next denoising step. This step-wise resampling ensures that each win-lose pair comes from the same image, so the comparison is independent of the previous step. To assess the preference at each step, we train a separate step-aware preference model that can be applied to both noisy and clean images. In experiments with Stable Diffusion v1.5 and SDXL, SPO significantly outperforms the latest Diffusion-DPO in aligning generated images with complex, detailed prompts and in enhancing aesthetics, while training more than 20x faster. Code and models are available at: [https://rockeycoss.github.io/spo.github.io/](https://rockeycoss.github.io/spo.github.io/).|\n", "2406.04306": "|**2024-06-06**|**Semantically Diverse Language Generation for Uncertainty Estimation in Language Models**|Lukas Aichberger 
et.al.|[2406.04306](http://arxiv.org/abs/2406.04306)|**[link](https://github.com/ml-jku/SDLG)**|**Large language models (LLMs) can hallucinate when generating text. Hallucinations hinder various applications in society and industry because they make LLMs less trustworthy. Current LLMs generate text autoregressively, predicting and appending text tokens. When an LLM is uncertain about the semantic meaning of the next token to generate, it is likely to start hallucinating. Hallucinations are therefore thought to stem from predictive uncertainty. We introduce Semantically Diverse Language Generation (SDLG) to quantify the predictive uncertainty of LLMs. SDLG steers the LLM to generate semantically diverse yet plausible alternatives to an initially generated text. This provides a precise measure of aleatoric semantic uncertainty and can detect whether the initial text is likely to be hallucinated. Experiments on question-answering tasks show that SDLG consistently outperforms existing methods while being the most computationally efficient, setting a new standard for uncertainty estimation in LLMs.**|\n", "2406.04300": "|**2024-06-06**|**Text-to-Drive: Diverse Driving Behavior Synthesis via Large Language Models**|Phat Nguyen 
et.al.|[2406.04300](http://arxiv.org/abs/2406.04300)|null|Generating varied scenarios through simulation is crucial for training and evaluating safety-critical systems such as autonomous vehicles. However, modeling the trajectories of other vehicles to simulate complex and meaningful close interactions remains prohibitively costly. Using language descriptions to generate driving behaviors is a promising approach, offering a scalable and intuitive way for human operators to simulate a wide range of driving interactions. The challenge is the scarcity of large-scale annotated language-trajectory data. To address this, we propose Text-to-Drive (T2D), which synthesizes diverse driving behaviors with large language models (LLMs). Our method uses a knowledge-driven, two-stage strategy: first, it draws on the embedded knowledge of LLMs to generate rich and varied language descriptions of driving behaviors; then, it uses their reasoning capability to realize these behaviors in a simulator. At its core, T2D uses an LLM to construct a state chart that maps low-level states to high-level abstractions. This simplifies downstream tasks 
such as summarizing low-level observations, assessing policy alignment with behavior descriptions, and shaping auxiliary rewards, all without human supervision. With this knowledge-driven approach, we show that T2D generates more diverse trajectories than other baselines and offers a natural language interface that lets users interactively incorporate human preferences. More examples are available on our website:|\n", "2406.04289": "|**2024-06-07**|**What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages**|Nadav Borenstein et.al.|[2406.04289](http://arxiv.org/abs/2406.04289)|null|## Background What can large language models learn? By definition, language models (LMs) are distributions over strings. The question can therefore be formulated as evaluating the learnability of classes of distributions over strings. While prior work has focused mainly on theoretical limits, we focus on empirical learnability. Unlike prior empirical work, we evaluate neural LMs on their \u201chome turf\u201d, learning probabilistic languages, rather than as classifiers of formal languages. Specifically, we investigate the learnability of regular LMs (RLMs) by recurrent neural network (RNN) and Transformer 
LMs. We empirically test the learnability of RLMs as a function of various complexity parameters of the RLM and the hidden state size of the neural LM. We find that the RLM rank, which corresponds to the size of the linear space spanned by the logits of its conditional distributions, and the expected length of sampled strings are strong and significant predictors of learnability for both RNNs and Transformers. Several other predictors also reach significance, but with differing patterns between RNNs and Transformers.|\n", "2406.04278": "|**2024-06-06**|**Characterizing Similarities and Divergences in Conversational Tones in Humans and LLMs by Sampling with People**|Dun-Ming Huang et.al.|[2406.04278](http://arxiv.org/abs/2406.04278)|**[link](https://github.com/jacobyn/SamplingTonesACL)**|**
Conversational tone is essential to interpersonal communication. With the growing popularity of large language models (LLMs), it has become especially important to study how their conversational tones differ from those of humans. However, current research on conversational modalities often relies on preexisting taxonomies or text corpora, which may suffer from experimenter bias and may not be representative of real-world distributions in the domain under study. Inspired by methods from cognitive science, we propose an iterative method that elicits conversational tones and sentences simultaneously by alternating between two tasks: (1) a participant identifies the tone of a given sentence, and (2) a different participant generates a sentence based on that tone. We ran 100 iterations of this process with human participants and GPT-4, obtaining a dataset of sentences and frequent conversational tones. In an additional experiment, humans and GPT-4 annotated all sentences with all tones. Using data from 1,339 human participants, 33,370 human judgments, and 29,900 GPT-4 queries, we show how this approach can be used to create an interpretable geometric representation of the relations between conversational tones in humans and GPT-4. 
This work demonstrates how ideas from machine learning and cognitive science can be combined to address challenges in human-computer interaction.**|\n", "2406.05132": "|**2024-06-07**|**3D-GRAND: Towards Better Grounding and Less Hallucination for 3D-LLMs**|Jianing Yang et.al.|[2406.05132](http://arxiv.org/abs/2406.05132)|**[link](https://github.com/sled-group/3D-GRAND)**|The integration of language and 3D perception is crucial for building embodied agents and robots that comprehend and interact with the physical world. While large language models (LLMs) have demonstrated impressive language understanding and generation capabilities, their adaptation to 3D environments (3D-LLMs) remains in its early stages. A primary challenge is the lack of large-scale datasets that densely ground language in 3D scenes. To address this, we introduce 3D-GRAND, a pioneering large-scale dataset comprising 40,087 household scenes paired with 6.2 million densely grounded scene-language instructions. Our results show that instruction tuning with 3D-GRAND significantly improves grounding capabilities and reduces hallucinations in 3D-LLMs. We also propose 3D-POPE, a benchmark for systematically evaluating hallucination in 3D-LLMs, enabling fair comparisons among future models. 
Our experiments reveal a relationship between dataset size and 3D-LLM performance, underscoring the critical role of large-scale 3D-text datasets in advancing embodied AI research. Notably, early signals suggest that models trained on large synthetic data can perform well on real-world 3D scans, which demonstrates the potential of sim-to-real transfer. Through 3D-GRAND and 3D-POPE, we aim to equip the embodied AI community with essential resources and insights, paving the way for more reliable and better-grounded 3D-LLMs. Project website: https://3d-grand.github.io|\n", "2406.05130": "|**2024-06-07**|**An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models**|Xiongtao Zhou 
et.al.|[2406.05130](http://arxiv.org/abs/2406.05130)|null|This paper studies parameter-efficient fine-tuning (PEFT) of multimodal large language models (MLLMs). Because these models typically have billions of parameters, fine-tuning all of them is difficult. The study aims to identify effective methods for improving MLLM performance when only a limited number of parameters are trained. Four popular PEFT methods are used to fine-tune the LLM component of open-source MLLMs, and the paper presents a comprehensive analysis of how the methods behave across different models, parameter placement, fine-tuning data size, model stability, generalization, and hallucination. The methods are evaluated on seven datasets from two categories: unseen and seen. The results show that the adapter is the best-performing PEFT method and that fine-tuning the connector layers improves performance in most cases. The code and data are available.|\n", "2406.05127": "|**2024-06-07**|**Towards Semantic Equivalence of Tokenization in Multimodal LLM**|Shengqiong Wu et.al.|[2406.05127](http://arxiv.org/abs/2406.05127)|null|### Background Multimodal large language models (MLLMs) have demonstrated exceptional capabilities on vision-language tasks. At the core of an MLLM is vision 
tokenization, that is, how to efficiently transform input visual signals into feature representations that are most useful to the language model. However, existing vision tokenizers struggle to preserve semantic alignment between vision and language: they fragment the visual input too aggressively, corrupting the semantic integrity of the visual content. To address this, the paper proposes a novel dynamic Semantic-Equivalent Vision Tokenizer (SeTok), which groups visual features into semantic units with a dynamic clustering algorithm and flexibly determines the number of tokens based on image complexity. The resulting vision tokens effectively preserve semantic integrity and capture both low-frequency and high-frequency visual features. ### Task We propose Setokim, a new MLLM equipped with SeTok. Experiments show that Setokim achieves significantly better results across a variety of tasks. More details are available on the project page: https://chocowu.github.io/SeTok-web/.|\n", "2406.05107": "|**2024-06-07**|**LINX: A Language Driven Generative System for Goal-Oriented Automated Data Exploration**|Tavor Lipman et.al.|[2406.05107](http://arxiv.org/abs/2406.05107)|null|
Data exploration is a complex process in which users examine a dataset by issuing a sequence of queries step by step. Sometimes users explore a new dataset to become familiar with it, but more often the exploration is driven by a specific analysis goal or question. To help users explore effectively, automated data exploration (ADE) systems have been proposed; they aim to auto-generate a complete exploration session that showcases interesting properties of the data. However, existing ADE systems are often constrained by a predefined optimization function and therefore always produce the same exploration sequence for a given dataset, which falls short when the exploration has a specific goal. To this end, this paper presents LINX, a generative system with a natural language interface for goal-oriented data exploration. LINX takes an input dataset and an analysis goal described in natural language, and generates a personalized exploration session relevant to the user's needs. The system uses a large language model to interpret the input analysis goal and derive a set of specifications for the desired output exploration session. These specifications are then passed to a novel, modular ADE engine based on Constrained Deep Reinforcement 
Learning (CDRL), which can adapt its output to the specified instructions. To validate LINX, we created a new benchmark dataset for goal-oriented exploration and conducted an in-depth user study. The results show that the exploration notebooks generated by LINX are significantly more relevant and useful than those of existing solutions, including ChatGPT, goal-agnostic ADE, and commercial systems.|\n", "2406.05085": "|**2024-06-07**|**Multi-Head RAG: Solving Multi-Aspect Problems with LLMs**|Maciej Besta et.al.|[2406.05085](http://arxiv.org/abs/2406.05085)|**[link](https://github.com/spcl/mrag)**|**## Background **Retrieval Augmented Generation (RAG)** improves the accuracy and relevance of large language model (LLM) responses by placing document content in the model's context. However, existing RAG methods do not adequately handle queries that may require retrieving multiple documents with substantially different contents. Such queries are common in practice, but they are challenging because the embeddings of these documents may be far apart in the vector space, which makes it hard to retrieve them all at once. This paper proposes a new scheme, **Multi-Head RAG 
(MRAG)**, which addresses this problem in a simple yet powerful way: it uses the activations of the Transformer's multi-head attention layer, rather than the decoder layer, as retrieval keys. The motivation is that different attention heads can learn to capture different aspects of the data. Using these activations yields embeddings that represent multiple facets of data items and queries, which improves retrieval accuracy for complex queries. **Contributions** We provide an evaluation methodology, metrics, synthetic datasets, and real-world use cases to demonstrate the effectiveness of MRAG. Compared with standard RAG baselines, MRAG improves relevance by up to 20%. MRAG integrates seamlessly with existing RAG frameworks and tools such as RAGAS, as well as with different classes of data stores. In summary, this paper aims to improve existing RAG models so that they better handle complex queries that involve multi-aspect retrieval.**|\n", "2406.05063": "|**2024-06-07**|**Are Large Language Models More Empathetic than Humans?**|Anuradha Welivita 
et.al.|[2406.05063](http://arxiv.org/abs/2406.05063)|null|With the emergence of large language models (LLMs), investigating whether they can surpass humans in emotion recognition and empathetic responding has become a focal point of research. This paper presents an in-depth study comparing the empathetic responding capabilities of four state-of-the-art LLMs, GPT-4, LLaMA-2-70B-Chat, Gemini-1.0-Pro, and Mixtral-8x7B-Instruct, with those of humans. In a between-subjects user study with 1,000 participants, we analyzed responses to 2,000 carefully selected emotional dialogue prompts covering a broad spectrum of 32 distinct positive and negative emotions. The results show that the empathetic responding capability of LLMs is statistically significantly better than that of humans. GPT-4 was the most empathetic, with approximately 31% more responses rated \u201cGood\u201d than the human baseline. It was followed by LLaMA-2 with an increase of approximately 24%, Mixtral-8x7B with approximately 21%, and Gemini-Pro with approximately 10%. A finer-grained analysis of the response ratings shows that some LLMs are significantly better than others at responding to specific emotions. 
The proposed evaluation framework offers a scalable and adaptable approach for assessing the empathy of new LLMs, avoiding the need to replicate this study in future research.|\n", "2406.05055": "|**2024-06-07**|**Robustness Assessment of Mathematical Reasoning in the Presence of Missing and Contradictory Conditions**|Shi-Yu Tian et.al.|[2406.05055](http://arxiv.org/abs/2406.05055)|null|Large language models (LLMs) have demonstrated impressive performance on reasoning tasks, which can be further improved through few-shot prompting. However, current evaluation focuses mainly on carefully constructed benchmarks and neglects real-world reasoning problems with missing or contradictory conditions, known as ill-defined problems. We observe that existing few-shot prompting techniques are ineffective in such scenarios, often producing overconfident answers or incorrect inferences. To study this problem further, we develop a benchmark called Problems with Missing and Contradictory conditions (PMC) and introduce two novel metrics to evaluate how few-shot prompting methods handle such problems. Our analysis with the PMC benchmark reveals a trade-off between mathematical reasoning performance on well-defined problems and the ability to recognize ill-defined ones. To address the challenges posed by PMC, we propose a novel few-shot prompting method called SMT-LIB Prompting (SLP). 
It uses the SMT-LIB language to model the problem instead of solving it directly, and then applies a double-check solving strategy to verify the satisfiability and uniqueness of the solution and provide final feedback. Extensive experiments demonstrate that SLP significantly outperforms existing methods on problems with missing and contradictory conditions. We will open-source our benchmark and code to facilitate future research.|\n", "2406.05053": "|**2024-06-07**|**Hints-In-Browser: Benchmarking Language Models for Programming Feedback Generation**|Nachiket Kotalwar et.al.|[2406.05053](http://arxiv.org/abs/2406.05053)|null|### Overview Generative AI and large language models hold great promise for programming education, where they can provide learners with individualized feedback and hints. Recent work has focused primarily on improving the quality of generated feedback to reach the level of human tutors. In real-world educational deployments, however, cost, time, and data privacy are key considerations alongside quality. This paper benchmarks language models for programming feedback generation across several performance criteria, including quality, cost, time, and data privacy. 
We pay particular attention to the recent paradigm of in-browser inference, which directly reduces cost and protects data privacy. To improve the feedback quality of small models suited to in-browser inference, we develop a fine-tuning pipeline based on GPT-4 generated synthetic data. We show the effectiveness of fine-tuned, 4-bit quantized Llama3-8B and Phi3-3.8B models, run with WebLLM's in-browser inference engine, on three different Python programming datasets. We will release the full implementation, a web app, and the datasets to facilitate further research on in-browser language models.|\n", "2406.05039": "|**2024-06-07**|**Bootstrapping Referring Multi-Object Tracking**|Yani Zhang et.al.|[2406.05039](http://arxiv.org/abs/2406.05039)|**[link](https://github.com/zyn213/temprmot)**|## Background 
Current referring multi-object tracking (RMOT) benchmarks typically rely on manually annotated datasets and static rules, which limits their diversity and scope of application. To address this, our work focuses on advancing the RMOT task by introducing more discriminative language words. Specifically, we first extend the Refer-KITTI dataset into Refer-KITTI-V2. Starting from 2,719 manual annotations, it resolves the class imbalance issue and adds more keywords, bringing it closer to real-world scenarios than Refer-KITTI. We then use large language models to expand these annotations to a total of 9,758, yielding 617 distinct words and surpassing previous RMOT benchmarks. In addition, we improve the end-to-end RMOT framework with a simple yet elegant temporal advancement strategy that outperforms previous methods. The source code and dataset are available.|\n", "2406.05035": "|**2024-06-07**|**Scenarios and Approaches for Situated Natural Language Explanations**|Pengshuo Qiu 
et.al.|[2406.05035](http://arxiv.org/abs/2406.05035)|null|Large language models (LLMs) can generate natural language explanations (NLEs) adapted to different users' situations. However, this adaptability has not yet been evaluated quantitatively. To fill the gap, we create a benchmark, the Situation-Based Explanation (SBE) dataset, containing 100 explananda (things to be explained). Each explanandum is paired with explanations targeted at distinct audience types, such as educators, students, and professionals, so that we can assess how precisely a model's explanations meet the information needs and contexts of these diverse groups (for example, students, teachers, and parents). Each explanandum-audience pair comes with a human-written reference explanation, which is used to compute scores that quantify how well a model adapts its explanation to the situation. We test three prompting methods on pretrained language models of various sizes: rule-based prompting, meta-prompting, and in-context learning prompting. We find that (1) models can generate prompts that yield explanations more precisely aligned with the target situation; (2) explicitly prompting with 'You are a helpful assistant' is not a necessary technique for situated NLE tasks; and (3) in-context learning prompts only help a model learn the demonstration template and do not improve its inference performance. The SBE dataset and our analysis provide a basis for future research on generating situated natural language explanations.|\n",
"2406.06525": "|**2024-06-10**|**Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation**|Peize Sun et.al.|[2406.06525](http://arxiv.org/abs/2406.06525)|**[link](https://github.com/foundationvision/llamagen)**|**We introduce LlamaGen, a new family of image generation models that applies the original next-token prediction paradigm of large language models to visual generation. It shows that plain autoregressive models such as Llama, without priors tailored to visual signals, can reach state-of-the-art image generation performance if scaled properly. We examine the design space of image tokenizers, the scalability of image generation models, and training data quality. The outcomes are: (1) an image tokenizer with a downsample ratio of 16, a reconstruction quality of 0.94 rFID, and codebook usage of 97% on the ImageNet benchmark; (2) a series of class-conditional image generation models ranging from 111M to 3.1B parameters that achieve 2.18 FID on the ImageNet 256x256 benchmark, outperforming popular diffusion models such as LDM and DiT; (3) a text-conditional image generation model with 775M parameters, trained in two stages on LAION-COCO and on images of high aesthetic quality, that shows good visual quality and text alignment; and (4) verification that LLM serving frameworks are effective at optimizing the inference speed of image generation models, giving a 326% to 414% speedup. We release all models and code to support the open-source community working on visual generation and multimodal foundation models.**|\n",
"2406.06519": "|**2024-06-10**|**UMBRELA: UMbrela is the (Open-Source Reproduction of the) Bing RELevance Assessor**|Shivani Upadhyay et.al.|[2406.06519](http://arxiv.org/abs/2406.06519)|**[link](https://github.com/castorini/umbrela)**|**Copious relevance judgments are essential for the effective training and accurate evaluation of retrieval systems. Traditionally, these judgments are made by human assessors, which is expensive and time-consuming. A recent study by Thomas et al. at Microsoft Bing suggests that large language models (LLMs) can perform relevance assessment accurately, producing judgments comparable to those of humans. Unfortunately, their study did not release software tools that others could reuse. We present UMBRELA (a recursive acronym for 'UMBRELA is the Bing RELevance Assessor'), an open-source toolkit that reproduces the results of Thomas et al. using OpenAI's GPT-4 model and adds further detail to the original paper. Across the Deep Learning Tracks of TREC 2019 to 2023, we find that LLM-generated relevance judgments correlate highly with rankings produced by effective multi-stage retrieval systems. The toolkit is designed to be easily extensible and can be integrated into existing multi-stage retrieval and evaluation pipelines, making it a valuable resource for researchers studying retrieval evaluation methodologies. UMBRELA will be used in the TREC 2024 RAG Track to aid relevance assessment, and we expect it to serve as a foundation for further innovation in the field. The UMBRELA codebase (https://github.com/castorini/umbrela) is publicly available.**|\n",
"2406.06499": "|**2024-06-10**|**NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative**|Asmar Nadeem et.al.|[2406.06499](http://arxiv.org/abs/2406.06499)|null|Existing video captioning benchmarks and models fall short in representing causal-temporal narrative, that is, a sequence of events linked by cause and effect, unfolding over time and driven by characters or agents. This lack of narrative limits a model's ability to generate text descriptions that capture the causal and temporal dynamics inherent in video content. To fill this gap, we propose NarrativeBridge, which has two components: (1) a new Causal-Temporal Narrative (CTN) captions benchmark, generated by a large language model with few-shot prompting, which explicitly encodes cause-effect relationships in video descriptions and is checked automatically for quality and relevance; and (2) a dedicated Cause-Effect Network (CEN) architecture with separate encoders that capture cause and effect dynamics independently, enabling effective learning and generation of captions with causal-temporal narrative. Experiments show that CEN articulates the causal and temporal aspects of video content more accurately than the second-best model (GIT): 17.88 and 17.44 CIDEr on the MSVD and MSR-VTT datasets, respectively. The proposed framework can understand and generate nuanced text descriptions with intricate causal-temporal narrative structure, addressing a key limitation of video captioning. See the project page for details.|\n",
"2406.06474": "|**2024-06-10**|**Towards a Personal Health Large Language Model**|Justin Cosentino et.al.|[2406.06474](http://arxiv.org/abs/2406.06474)|null|In the health domain, most large language model (LLM) research has focused on clinical tasks. However, the rich, longitudinal personal health monitoring data provided by mobile and wearable devices is often overlooked. This paper introduces the Personal Health Large Language Model (PH-LLM), a customized version of Gemini designed to understand and reason over numerical time-series personal health data. We create and curate three test sets that examine PH-LLM on (1) producing personalized insights and recommendations from sleep patterns, physical activity, and physiological responses; (2) expert-level domain knowledge; and (3) prediction of self-reported sleep outcomes. Working with domain experts, we build 857 case studies to assess realistic sleep and fitness scenarios. Through comprehensive evaluation against domain-specific rubrics, we find that Gemini Ultra 1.0 and PH-LLM are not statistically different from expert performance in fitness; experts remain superior in sleep, but fine-tuning PH-LLM yields significant improvements in its use of relevant domain knowledge and in personalizing sleep information. We also evaluate PH-LLM's domain knowledge with multiple-choice sleep medicine and fitness examinations, where it scores 79% and 88% respectively, above the average scores of a sample of human experts. Finally, we train PH-LLM to predict self-reported sleep quality outcomes from textual and multimodal encodings of wearable data, and show that multimodal encoding is essential to match the performance of specialized discriminative models. Although further development and evaluation are needed in the safety-critical domain of personal health, these results demonstrate both the broad knowledge and capabilities of Gemini models and the value of applying physiological data to personal health applications, as done with PH-LLM.|\n",
"2406.06465": "|**2024-06-10**|**AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction**|Zhen Xing 
et.al.|[2406.06465](http://arxiv.org/abs/2406.06465)|null|Text-guided video prediction (TVP) aims to predict the motion of future frames from an initial frame and an instruction, with broad applications in virtual reality, robotics, and content creation. Although previous methods have made substantial progress on this task by adapting Stable Diffusion, they still struggle with frame consistency and temporal stability, mainly because of the limited scale of video datasets. We observe that pretrained Image2Video diffusion models have good priors for video dynamics but lack textual control. Transferring Image2Video models while injecting instruction control to generate controllable videos is therefore both meaningful and challenging. To achieve this, we introduce a multimodal large language model (MLLM) that predicts future video states from the initial frame and the text instruction. Specifically, we design a dual-query transformer (DQFormer) architecture that integrates instruction and frame information into conditional embeddings for future-frame prediction. We also develop long- and short-term temporal adapters and spatial adapters that can quickly adapt general video diffusion models to specific scenarios at a small training cost. Experiments show that our method significantly outperforms existing techniques on four datasets: Something Something V2, Epic Kitchen-100, Bridge Data, and UCF-101. Notably, AID achieves FVD improvements of 91.2% on Bridge and 55.5% on SSv2, demonstrating its effectiveness across domains. More examples can be found on our website.|\n",
"2406.06464": "|**2024-06-10**|**Transforming Wearable Data into Health Insights using Large Language Model Agents**|Mike A. 
Merrill et.al.|[2406.06464](http://arxiv.org/abs/2406.06464)|null|Despite the growing popularity of wearable health trackers and the well-established importance of sleep and exercise to health, deriving actionable, personalized insights from wearable data remains a challenge, because it requires unstructured analysis of large amounts of data. The rise of large language models (LLMs) that can use tools to understand and interact with the world offers a promising path toward personalized analysis at scale. Yet applications of LLMs to personal health remain largely undeveloped. This paper introduces the Personal Health Insights Agent (PHIA), a system that uses state-of-the-art code generation and information retrieval tools to analyze and interpret behavioral health data. We build two benchmark question-answering datasets containing over 4000 health insights questions. Based on 650 hours of human and expert evaluation, PHIA accurately answers over 84% of factual numerical questions and more than 83% of crowd-sourced open-ended questions. This work matters for advancing behavioral health across the population: it could let individuals interpret their own wearable data and open a new era of personalized wellness plans guided by data-driven insights, making health care more accessible and more personalized.|\n",
"2406.06461": "|**2024-06-11**|**Reasoning in Token Economies: Budget-Aware Evaluation of LLM Reasoning Strategies**|Junlin Wang et.al.|[2406.06461](http://arxiv.org/abs/2406.06461)|null|This paper points out that, although many reasoning strategies have been proposed to elicit the capabilities of large language models, conventional evaluations focus only on performance metrics and miss a key factor: the gain in effectiveness that comes from additional compute. This can give a skewed picture of how efficient a strategy is. The paper therefore proposes a framework that brings the compute budget into the evaluation, giving a more complete comparison that accounts for both performance metrics and computational cost. From this budget-aware perspective, the study finds that complex reasoning strategies often beat simple baselines not because of significant algorithmic innovation but because more computational resources are allocated to them. For example, when chain-of-thought self-consistency is given a comparable level of compute, it frequently outperforms the reasoning strategies proposed in the literature. Under this scale-aware view, however, some strategies, such as multi-agent debate or Reflexion, can perform worse as the compute budget increases.|\n",
"2406.06458": "|**2024-06-10**|**Evaluating the Retrieval Component in LLM-Based Question Answering Systems**|Ashkan Alinejad et.al.|[2406.06458](http://arxiv.org/abs/2406.06458)|null|Question answering systems driven by large language models (LLMs) rely on a retrieval component to obtain domain-specific information and to reduce the risk of producing inaccurate responses or misinformation. Although evaluation methods have long existed in information retrieval, assessing retriever performance inside LLM-driven chatbots remains a challenge. This study proposes a simple baseline for evaluating retrievers in Retrieval-Augmented Generation (RAG)-based chatbots. We find that this approach gives a more complete picture of retriever performance and aligns better with the overall performance of the question answering system. Conventional metrics such as precision, recall, and F1 score may not fully capture the capabilities of LLMs, because an LLM can still give an accurate answer when the retriever is imperfect. Our evaluation method instead accounts for the strengths of LLMs, namely their ability to ignore irrelevant context, while also accounting for possible errors and fabricated content.|\n",
"2406.06455": "|**2024-06-10**|**A Large Language Model Pipeline for Breast Cancer Oncology**|Tristen Pool 
et.al.|[2406.06455](http://arxiv.org/abs/2406.06455)|null|Large language models have shown potential for innovation in many fields, but their application to cancer treatment still needs further development. The authors fine-tuned state-of-the-art OpenAI models using a novel Langchain prompt engineering pipeline, on a dataset of clinical data and clinical guideline text, focusing on two key treatment factors for breast cancer patients: adjuvant radiation therapy and chemotherapy. The models reached high accuracy (0.85+) in classifying these two treatments. Using observational data on the quality of treatment by human oncologists, a confidence interval was formed: the proportion of cases in which the model must outperform the original oncologist in its treatment prediction in order to be the better solution overall is estimated at 8.2% to 13.3%. Because the outcomes of cancer treatment decisions are uncertain, future clinical trials may be needed to validate this threshold. Given that 85% of cancer patients in the U.S. are treated at local community facilities, models of this kind could substantially expand access to quality care, with results at least close to those of human oncologists.|\n", "2406.06451": "|**2024-06-10**|**Insights from Social Shaping Theory: The 
Appropriation of Large Language Models in an Undergraduate Programming Course**|Aadarsh Padiyath et.al.|[2406.06451](http://arxiv.org/abs/2406.06451)|null|The ability of large language models (LLMs) to generate, debug, and explain code has drawn the attention of many researchers and educators in undergraduate programming education, who expect these models to transform how programming is taught. However, decisions about how and why to use LLMs in programming education may rest on more than technical assessment. Using the social shaping of technology theory as a guiding framework, this study examines how students' social perceptions of LLMs influence their usage. We analyze an anonymous end-of-course survey (n=158), a mid-course self-efficacy survey (n=158), in-depth interviews with 10 students, self-reported LLM usage on homework, and midterm exam scores, and find that students' LLM use is associated with their expectations for their future careers and with their perception of peer usage. We also find that early self-reported LLM usage correlates with lower self-efficacy and lower midterm scores, while students' perceived over-reliance on LLMs, rather than their actual usage, is associated with decreased self-efficacy later in the course.|\n",
"2406.07545": "|**2024-06-11**|**Open-LLM-Leaderboard: From Multi-choice to Open-style Questions for LLMs Evaluation, Benchmark, and Arena**|Aidar Myrzakhan et.al.|[2406.07545](http://arxiv.org/abs/2406.07545)|**[link](https://github.com/vila-lab/open-llm-leaderboard)**|**Multiple-choice questions (MCQs) are frequently used to evaluate large language models (LLMs). Typically, an LLM selects the answer it deems most probable after adjusting for factors such as length. However, LLMs may have inherent biases, such as a preference for particular option IDs like A, B, C, or D, which can affect answer prediction. Prior work has tried to reduce this 'selection bias' by randomly permuting options on a few test samples and applying the result to new samples. Another problem with MCQs is 'lottery-style guessing': the LLM has not actually learned the knowledge but guesses the answer correctly by luck, which is especially serious for small LLMs. A more thorough way to address these problems is to move to open-style questions, which fundamentally removes both selection bias and random guessing. That shift brings its own challenges: (1) identifying suitable open-style questions and (2) validating the correctness of LLM responses to open-style questions against human-annotated ground truths. This work aims to tackle these difficulties and to establish a new LLM evaluation benchmark built entirely on open-style questions, measuring the performance of models such as GPT-4o/4/3.5, Claude 3, and Gemini. We also create the Open-LLM-Leaderboard, a new evaluation platform that tracks the performance of various LLMs and reflects their true capabilities. Our code and dataset are open source (https://github.com/VILA-Lab/Open-LLM-Leaderboard).**|\n",
"2406.07528": "|**2024-06-11**|**QuickLLaMA: Query-aware Inference Acceleration for Large Language Models**|Jingyao Li 
et.al.|[2406.07528](http://arxiv.org/abs/2406.07528)|**[link](https://github.com/dvlab-research/q-llm)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u7406\u89e3\u548c\u5904\u7406\u957f\u5e8f\u5217\u65b9\u9762\u7684\u80fd\u529b\u5bf9\u4e8e\u5404\u9886\u57df\u7684\u53d1\u5c55\u81f3\u5173\u91cd\u8981\u3002\u7136\u800c\uff0c\u5b83\u4eec\u5728\u6355\u6349\u5e8f\u5217\u4e2d\u7684\u957f\u671f\u4f9d\u8d56\u5173\u7cfb\u4ee5\u6df1\u5165\u7406\u89e3\u8bed\u4e49\u65b9\u9762\u4ecd\u7136\u5b58\u5728\u6311\u6218\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86Query-aware Inference for LLMs\uff08Q-LLM\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u65e8\u5728\u6a21\u4eff\u4eba\u7c7b\u8ba4\u77e5\u5904\u7406\u5927\u89c4\u6a21\u5e8f\u5217\u7684\u7cfb\u7edf\u3002\u901a\u8fc7\u805a\u7126\u4e8e\u4e0e\u7ed9\u5b9a\u67e5\u8be2\u76f8\u5173\u7684\u5185\u5b58\u6570\u636e\uff0cQ-LLM\u80fd\u591f\u5728\u56fa\u5b9a\u7a97\u53e3\u5927\u5c0f\u5185\u51c6\u786e\u6355\u6349\u76f8\u5173\u4fe1\u606f\uff0c\u5e76\u4e3a\u67e5\u8be2\u63d0\u4f9b\u7cbe\u786e\u7684\u7b54\u6848\uff0c\u65e0\u9700\u989d\u5916\u8bad\u7ec3\uff0c\u53ef\u65e0\u7f1d\u96c6\u6210\u5230\u4efb\u4f55LLMs\u4e2d\u3002\u4f7f\u7528LLaMA3\uff08QuickLLaMA\uff09\u7684Q-LLM\u80fd\u572830\u79d2\u5185\u9605\u8bfb\u300a\u54c8\u5229\u00b7\u6ce2\u7279\u300b\uff0c\u5e76\u80fd\u51c6\u786e\u56de\u7b54\u95ee\u9898\u3002\u76f8\u8f83\u4e8e\u5f53\u524d\u6700\u5148\u8fdb\u7684LLaMA3\uff0cQ-LLM\u7684\u6027\u80fd\u63d0\u5347\u4e867.17%\uff0c\u800c\u5728Mistral\u4e0a\uff0c\u5b83\u5728$\\infty$-bench\u4e0a\u7684\u8868\u73b0\u63d0\u5347\u4e863.26%\u3002\u5728\u201c\u9488\u950b\u76f8\u5bf9\u201d\u4efb\u52a1\u4e2d\uff0cQ-LLM\u5728\u5e7f\u6cdb\u8ba4\u53ef\u7684\u57fa\u51c6\u4e0a\uff0c\u76f8\u5bf9\u4e8e\u5f53\u524d\u6700\u4f73\u6210\u7ee9\uff0cMistral\u4e0a\u7684\u63d0\u5347\u8fbe\u5230\u4e867.0%\uff0c\u5728LLaMA3\u4e0a\u5b9e\u73b0\u4e86100%\u7684\u51c6\u786e\u7387\u3002\u6211\u4eec\u7684\u4ee3\u7801\u5df2\u5728https://github.com/dvlab-research/Q-LLM\u4e0a\u5f
00\u6e90\u3002**|\n", "2406.07515": "|**2024-06-11**|**Beyond Model Collapse: Scaling Up with Synthesized Data Requires Reinforcement**|Yunzhen Feng et.al.|[2406.07515](http://arxiv.org/abs/2406.07515)|null|\u968f\u7740\u751f\u6210\u6a21\u578b\u5408\u6210\u6570\u636e\u7684\u5174\u8d77\uff0c\u8d8a\u6765\u8d8a\u591a\u5730\u88ab\u7528\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u5fae\u8c03\uff0c\u8fd9\u5f15\u53d1\u4e86\u5bf9\u6a21\u578b\u5d29\u6e83\uff08\u5373\u5fae\u8c03\u6027\u80fd\u4e0b\u964d\uff09\u7684\u5173\u6ce8\u3002\u7531\u4e8e\u4eba\u7c7b\u548c\u673a\u5668\u90fd\u8f83\u5bb9\u6613\u5206\u8fa8\u597d\u6837\u672c\u548c\u574f\u6837\u672c\uff0c\u800c\u975e\u751f\u6210\u9ad8\u8d28\u91cf\u6837\u672c\uff0c\u6211\u4eec\u63a2\u8ba8\u4e86\u5982\u4f55\u5229\u7528\u53cd\u9988\u6765\u9632\u6b62\u6a21\u578b\u5728\u5408\u6210\u6570\u636e\u4e0a\u51fa\u73b0\u5d29\u6e83\u3002\u6211\u4eec\u7406\u8bba\u5206\u6790\u4e86\u4e00\u4e2a\u9ad8\u65af\u6df7\u5408\u5206\u7c7b\u6a21\u578b\u5728\u57fa\u4e8e\u53cd\u9988\u589e\u5f3a\u7684\u5408\u6210\u6570\u636e\u8bad\u7ec3\u4e0b\u7684\u6700\u4f18\u6027\u80fd\uff0c\u5e76\u63d0\u4f9b\u4e86\u6709\u9650\u6837\u672c\u60c5\u51b5\u4e0b\u7684\u5b9e\u9a8c\u8bc1\u636e\u3002\u6211\u4eec\u5728\u4e24\u4e2a\u5b9e\u9645\u95ee\u9898\u4e0a\u5c55\u793a\u4e86\u8fd9\u4e9b\u7406\u8bba\u9884\u6d4b\uff1a\u4f7f\u7528\u53d8\u538b\u5668\u8ba1\u7b97\u77e9\u9635\u7279\u5f81\u503c\u548c\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u65b0\u95fb\u6458\u8981\uff0c\u8fd9\u4e24\u79cd\u60c5\u51b5\u4e0b\u6a21\u578b\u5728\u751f\u6210\u6570\u636e\u4e0a\u90fd\u4f1a\u7ecf\u5386\u5d29\u6e83\u3002\u6211\u4eec\u53d1\u73b0\uff0c\u901a\u8fc7\u4ece\u53cd\u9988\u589e\u5f3a\u7684\u5408\u6210\u6570\u636e\u4e2d\u8bad\u7ec3\uff0c\u65e0\u8bba\u662f\u4fee\u526a\u9519\u8bef\u9884\u6d4b\u8fd8\u662f\u9009\u62e9\u6700\u4f73\u731c\u6d4b\uff0c\u90fd\u80fd\u9632\u6b62\u6a21\u578b\u5d29\u6e83\uff0c\u8bc1\u5b9e\u4e86\u50cfRLHF\uff08Reinforcement Learning with Human 
Feedback\uff09\u8fd9\u6837\u7684\u6d41\u884c\u65b9\u6cd5\u7684\u6709\u6548\u6027\u3002|\n", "2406.07505": "|**2024-06-11**|**THaLLE: Text Hyperlocally Augmented Large Language Extension -- Technical Report**|KBTG Labs et.al.|[2406.07505](http://arxiv.org/abs/2406.07505)|null|\u8fd1\u671f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u8fdb\u6b65\u5728\u79d1\u6280\u9886\u57df\u5c55\u73b0\u4e86\u65b0\u529f\u80fd\u548c\u673a\u9047\u3002\u7136\u800c\uff0c\u975e\u5e38\u5927\u7684LLMs\u7684\u5b9e\u9645\u5e94\u7528\u53d7\u5230\u5176\u9ad8\u8ba1\u7b97\u6210\u672c\u7684\u5236\u7ea6\uff0c\u8fd9\u4e0e\u5176\u76f8\u5bf9\u6709\u9650\u7684\u4eba\u7c7b\u80fd\u529b\u76f8\u6bd4\uff0c\u6536\u76ca\u5e76\u4e0d\u660e\u663e\u3002\u5c3d\u7ba1\u5c0f\u578b\u3001\u66f4\u5b9e\u7528\u7684LLMs\u5728\u91d1\u878d\u5206\u6790\u65b9\u9762\u5c55\u73b0\u51fa\u6f5c\u529b\uff0c\u4f46\u5b83\u4eec\u5c1a\u672a\u5b8c\u5168\u638c\u63e1\uff0c\u5982\u5b83\u4eec\u5728\u6a21\u62df\u7279\u8bb8\u91d1\u878d\u5206\u6790\u5e08\uff08CFA\uff09\u8003\u8bd5\u4e2d\u7684\u63a5\u8fd1\u901a\u8fc7\u8868\u73b0\u6240\u793a\u3002\u672c\u6587\u4e2d\uff0c\u6211\u4eec\u5c55\u793a\u4e86Financial Analyst Extension\uff08FAE\uff09\u5bf9\u6211\u4eec\u7684Text Hyperlocally Augmented Large Language Extension\uff08THaLLE\uff09\u7cfb\u5217\u7684\u6269\u5c55\uff0c\u8fd9\u4e00\u7cfb\u521780\u4ebf\u53c2\u6570\u7684LLMs\u5728\u6a21\u62dfCFA\u8003\u8bd5\u4e2d\u59cb\u7ec8\u8868\u73b0\u51fa\u6700\u9ad8\u6027\u80fd\uff0c\u4e0e\u540c\u7c7b\u89c4\u6a21\u7684\u6a21\u578b\u76f8\u6bd4\u3002\u6211\u4eec\u8be6\u7ec6\u8bb0\u5f55\u4e86\u7528\u4e8e\u4f18\u5316\u7684\u5fae\u8c03\u6280\u672f\uff0c\u4ee5\u4f9b\u540e\u7eed\u7814\u7a76\u53c2\u8003\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f15\u5165Flare CFA\uff0c\u8fd9\u662f\u4e00\u4e2a\u516c\u5f00\u53ef\u7528\u7684\u91d1\u878d\u987e\u95ee\u8bc4\u4f30\u6570\u636e\u96c6\uff0c\u7528\u4e8e\u68c0\u9a8cLLMs\u5728\u8d22\u52a1\u987e\u95ee\u89d2\u8272\u4e2d\u7684\u80fd\u529b\u3002|\n", 
"2406.07502": "|**2024-06-11**|**Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions**|Renjie Pi et.al.|[2406.07502](http://arxiv.org/abs/2406.07502)|**[link](https://github.com/sterzhang/image-textualization)**|**\u56fe\u50cf\u63cf\u8ff0\u6570\u636e\u96c6\u5bf9\u4e8e\u63a8\u52a8\u56fe\u50cf\u7406\u89e3\u3001\u6587\u672c\u5230\u56fe\u50cf\u751f\u6210\u548c\u6587\u672c\u56fe\u50cf\u68c0\u7d22\u7b49\u5e94\u7528\u81f3\u5173\u91cd\u8981\u3002\u5f53\u524d\uff0c\u8fd9\u4e9b\u6570\u636e\u96c6\u4e3b\u8981\u6765\u81ea\u4e24\u4e2a\u9014\u5f84\uff1a\u4e00\u662f\u4ece\u7f51\u7edc\u4e0a\u6293\u53d6\u56fe\u50cf\u4e0e\u6587\u5b57\u5bf9\uff0c\u4f46\u8fd9\u7c7b\u63cf\u8ff0\u5f80\u5f80\u8d28\u91cf\u8f83\u4f4e\u4e14\u5b58\u5728\u566a\u58f0\uff1b\u4e8c\u662f\u4eba\u5de5\u6807\u6ce8\uff0c\u5982COCO\u7b49\uff0c\u901a\u5e38\u63cf\u8ff0\u7b80\u6d01\uff0c\u7f3a\u4e4f\u8be6\u7ec6\u4fe1\u606f\u3002\u5c3d\u7ba1\u8be6\u7ec6\u7684\u56fe\u50cf\u63cf\u8ff0\u53ef\u4ee5\u901a\u8fc7\u4eba\u7c7b\u6807\u6ce8\u83b7\u5f97\uff0c\u4f46\u9ad8\u6602\u7684\u6807\u6ce8\u6210\u672c\u9650\u5236\u4e86\u5176\u53ef\u884c\u6027\u3002\u8fd9\u4e9b\u5c40\u9650\u6027\u4fc3\u4f7f\u6211\u4eec\u5bfb\u6c42\u66f4\u6709\u6548\u548c\u53ef\u6269\u5c55\u7684\u65b9\u6cd5\u6765\u751f\u6210\u51c6\u786e\u800c\u8be6\u5c3d\u7684\u56fe\u50cf\u63cf\u8ff0\u3002 \u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u6846\u67b6\uff0c\u79f0\u4e3a\u201c\u56fe\u50cf\u6587\u672c\u5316\u201d\uff08Image Textualization\uff0c\u7b80\u79f0IT\uff09\uff0c\u5b83\u901a\u8fc7\u534f\u540c\u5229\u7528\u73b0\u6709\u7684\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08Multimodal Large Language 
Models\uff0cMLLMs\uff09\u548c\u89c6\u89c9\u4e13\u5bb6\u6a21\u578b\uff0c\u6709\u6548\u5730\u5c06\u89c6\u89c9\u4fe1\u606f\u8f6c\u5316\u4e3a\u6587\u672c\uff0c\u4ece\u800c\u81ea\u52a8\u751f\u6210\u9ad8\u8d28\u91cf\u7684\u56fe\u50cf\u63cf\u8ff0\u3002\u9488\u5bf9\u5f53\u524d\u7f3a\u4e4f\u8be6\u5c3d\u63cf\u8ff0\u7684\u57fa\u51c6\u95ee\u9898\uff0c\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86\u591a\u4e2a\u8bc4\u4ef7\u57fa\u51c6\uff0c\u4ee5\u5168\u9762\u8bc4\u4f30\u6211\u4eec\u7684\u6846\u67b6\u751f\u6210\u7684\u56fe\u50cf\u63cf\u8ff0\u8d28\u91cf\u3002 \u6b64\u5916\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u5728IT\u7cbe\u5fc3\u7f16\u7e82\u7684\u63cf\u8ff0\u8bad\u7ec3\u4e0b\uff0cLLaVA-7B\u6a21\u578b\u7684\u56fe\u50cf\u63cf\u8ff0\u751f\u6210\u80fd\u529b\u5f97\u5230\u4e86\u63d0\u5347\uff0c\u80fd\u591f\u751f\u6210\u66f4\u4e30\u5bcc\u7684\u63cf\u8ff0\uff0c\u8f93\u51fa\u957f\u5ea6\u548c\u7ec6\u8282\u663e\u8457\u589e\u52a0\uff0c\u540c\u65f6\u51cf\u5c11\u4e86\u5e7b\u89c9\u73b0\u8c61\u3002**|\n", "2406.07496": "|**2024-06-11**|**TextGrad: Automatic \"Differentiation\" via Text**|Mert Yuksekgonul 
et.al.|[2406.07496](http://arxiv.org/abs/2406.07496)|**[link](https://github.com/zou-group/textgrad)**|**\u4eba\u5de5\u667a\u80fd\u6b63\u7ecf\u5386\u4e00\u573a\u8303\u5f0f\u8f6c\u53d8\uff0c\u901a\u8fc7\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u548c\u5176\u4ed6\u590d\u6742\u7ec4\u4ef6\u7684\u534f\u540c\u5de5\u4f5c\u53d6\u5f97\u4e86\u7a81\u7834\u3002\u5f53\u524d\uff0c\u4e3a\u590d\u5408\u4eba\u5de5\u667a\u80fd\u7cfb\u7edf\u8bbe\u8ba1\u539f\u5219\u5316\u7684\u81ea\u52a8\u5316\u4f18\u5316\u65b9\u6cd5\u6210\u4e3a\u4e00\u9879\u5173\u952e\u65b0\u6311\u6218\u3002\u795e\u7ecf\u7f51\u7edc\u5728\u65e9\u671f\u9762\u4e34\u7c7b\u4f3c\u95ee\u9898\u65f6\uff0c\u901a\u8fc7\u53cd\u5411\u4f20\u64ad\u548c\u81ea\u52a8\u5fae\u5206\u5b9e\u73b0\u4e86\u91cd\u5927\u9769\u65b0\u3002\u53d7\u6b64\u542f\u53d1\uff0c\u6211\u4eec\u63d0\u51fa\u4e86TextGrad\uff0c\u8fd9\u662f\u4e00\u4e2a\u5f3a\u5927\u7684\u6846\u67b6\uff0c\u5b83\u901a\u8fc7\u6587\u672c\u5b9e\u73b0\u81ea\u52a8\u201c\u5fae\u5206\u201d\uff0c\u5c06LLMs\u63d0\u4f9b\u7684\u4e30\u5bcc\u3001\u901a\u7528\u7684\u81ea\u7136\u8bed\u8a00\u5efa\u8bae\u56de\u4f20\u5230\u590d\u5408AI\u7cfb\u7edf\u7684\u5404\u4e2a\u7ec4\u4ef6\u4e2d\u3002TextGrad\u9075\u5faaPyTorch\u7684\u8bed\u6cd5\u548c\u62bd\u8c61\uff0c\u6613\u4e8e\u4f7f\u7528\u4e14\u7075\u6d3b\uff0c\u7528\u6237\u4ec5\u9700\u63d0\u4f9b\u76ee\u6807\u51fd\u6570\uff0c\u65e0\u9700\u8c03\u6574\u6846\u67b6\u7ec4\u4ef6\u6216\u63d0\u793a\uff0c\u5373\u53ef\u65e0\u7f1d\u5e94\u7528\u3002 
TextGrad\u9002\u7528\u4e8e\u591a\u79cd\u4efb\u52a1\uff0c\u4ece\u95ee\u7b54\u548c\u5206\u5b50\u4f18\u5316\u5230\u653e\u5c04\u6cbb\u7597\u8ba1\u5212\u8bbe\u8ba1\u3002\u5728\u65e0\u9700\u4fee\u6539\u6846\u67b6\u7684\u60c5\u51b5\u4e0b\uff0c\u5b83\u663e\u8457\u63d0\u5347\u4e86GPT-4o\u5728Google\u8bc1\u660e\u6027\u95ee\u9898\u56de\u7b54\u4e2d\u7684\u96f6-shot\u51c6\u786e\u7387\uff0c\u4ece51%\u63d0\u5347\u81f355%\uff1b\u5728\u4f18\u5316LeetCode\u96be\u9898\u89e3\u6cd5\u4e0a\u5b9e\u73b0\u4e8620%\u7684\u76f8\u5bf9\u6027\u80fd\u63d0\u5347\uff1b\u6539\u8fdb\u4e86\u63a8\u7406\u63d0\u793a\uff0c\u8bbe\u8ba1\u51fa\u5177\u6709\u7406\u60f3\u4f53\u5916\u4eb2\u548c\u529b\u7684\u65b0\u836f\u5019\u9009\u5206\u5b50\uff1b\u4ee5\u53ca\u8bbe\u8ba1\u51fa\u5177\u6709\u9ad8\u7279\u5f02\u6027\u7684\u653e\u5c04\u6cbb\u7597\u65b9\u6848\u3002TextGrad\u4e3a\u4e0b\u4e00\u4ee3AI\u7cfb\u7edf\u7684\u53d1\u5c55\u5960\u5b9a\u4e86\u57fa\u7840\uff0c\u63a8\u52a8\u4e86\u590d\u5408AI\u6280\u672f\u7684\u52a0\u901f\u53d1\u5c55\u3002**|\n", "2406.07494": "|**2024-06-12**|**CADS: A Systematic Literature Review on the Challenges of Abstractive Dialogue Summarization**|Frederic Kirstein 
et.al.|[2406.07494](http://arxiv.org/abs/2406.07494)|null|\u8be5\u6587\u7ae0\u7efc\u8ff0\u4e862019\u5e74\u81f32024\u5e74\u95f4\u53d1\u8868\u76841262\u7bc7\u72ec\u7279\u7684\u7814\u7a76\u8bba\u6587\uff0c\u96c6\u4e2d\u5728Transformer\u67b6\u6784\u5728\u82f1\u6587\u5bf9\u8bdd\u6458\u8981\u751f\u6210\u65b9\u9762\u7684\u7814\u7a76\u3002\u6587\u7ae0\u8be6\u7ec6\u63a2\u8ba8\u4e86\u5bf9\u8bdd\u6458\u8981\u4e2d\u5b58\u5728\u7684\u4e3b\u8981\u6311\u6218\uff0c\u5982\u8bed\u8a00\u7406\u89e3\u3001\u7ed3\u6784\u5904\u7406\u3001\u7406\u89e3\u80fd\u529b\u3001\u8bf4\u8bdd\u8005\u8bc6\u522b\u3001\u91cd\u8981\u6027\u5224\u65ad\u548c\u4e8b\u5b9e\u51c6\u786e\u6027\uff0c\u5e76\u4e0e\u76f8\u5e94\u7684\u6280\u672f\uff0c\u5982\u56fe\u89e3\u65b9\u6cd5\u3001\u989d\u5916\u8bad\u7ec3\u4efb\u52a1\u548c\u89c4\u5212\u7b56\u7565\u8fdb\u884c\u4e86\u5173\u8054\u3002\u5c3d\u7ba1\u5728\u67d0\u4e9b\u65b9\u9762\uff08\u5982\u8bed\u8a00\uff09\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u5c55\uff0c\u4f46\u5982\u7406\u89e3\u529b\u3001\u771f\u5b9e\u6027\u4e0e\u91cd\u8981\u6027\u8bc4\u4f30\u7b49\u6311\u6218\u4ecd\u7136\u5b58\u5728\uff0c\u63d0\u4f9b\u4e86\u4e30\u5bcc\u7684\u7814\u7a76\u7a7a\u95f4\u3002 
\u6587\u7ae0\u8fd8\u5206\u6790\u4e86\u8bc4\u4f30\u8fd9\u4e9b\u65b9\u6cd5\u7684\u65b9\u5f0f\uff0c\u6db5\u76d6\u4e86\u5bf9\u8bdd\u5b50\u9886\u57df\uff08\u5982\u4f1a\u8bae\u3001\u533b\u7597\uff09\u7684\u5e38\u7528\u6570\u636e\u96c6\uff0c\u4ee5\u53ca\u81ea\u52a8\u8bc4\u4ef7\u6307\u6807\uff08\u5982ROUGE\uff09\u548c\u4eba\u7c7b\u8bc4\u4f30\u7684\u666e\u904d\u5b9e\u8df5\u3002\u7136\u800c\uff0c\u53d1\u73b0\u8de8\u9886\u57df\u7684\u6570\u636e\u96c6\u76f8\u5bf9\u6709\u9650\uff0c\u4e14\u62a5\u544a\u7684\u4eba\u7c7b\u8bc4\u4f30\u5f80\u5f80\u7f3a\u4e4f\u8db3\u591f\u7684\u5185\u5ba1\u5458\u4e00\u81f4\u6027\u4fe1\u606f\u548c\u6807\u6ce8\u6307\u5357\u7ec6\u8282\u3002\u6b64\u5916\uff0c\u6587\u7ae0\u8ba8\u8bba\u4e86\u5927\u8bed\u8a00\u6a21\u578b\u7684\u6700\u65b0\u63a2\u7d22\u53ef\u80fd\u5e26\u6765\u7684\u5f71\u54cd\uff0c\u6307\u51fa\u5c3d\u7ba1\u5b83\u4eec\u53ef\u80fd\u4f1a\u6539\u53d8\u76f8\u5173\u6027\u548c\u96be\u5ea6\uff0c\u4f46\u63cf\u8ff0\u7684\u6311\u6218\u5206\u7c7b\u4f53\u7cfb\u4ecd\u7136\u5177\u6709\u4ef7\u503c\u3002|\n", "2406.07485": "|**2024-06-11**|**PITCH: Productivity and Mental Well-being Coaching through Daily Conversational Interaction**|Adnan Abbas 
et.al.|[2406.07485](http://arxiv.org/abs/2406.07485)|null|\u9ad8\u6548\u7684\u8ba1\u5212\u5236\u5b9a\u5bf9\u751f\u4ea7\u529b\u548c\u5fc3\u7406\u5065\u5eb7\u81f3\u5173\u91cd\u8981\uff0c\u4f46\u4eba\u4eec\u5f80\u5f80\u96be\u4ee5\u5236\u5b9a\u5b9e\u9645\u7684\u8ba1\u5212\u5e76\u53cd\u601d\u81ea\u5df1\u7684\u6548\u7387\u3002\u5229\u7528\u4eba\u5de5\u667a\u80fd\u7684\u53d1\u5c55\uff0c\u5bf9\u8bdd\u52a9\u624b\u4f5c\u4e3a\u4e00\u79cd\u6709\u524d\u666f\u7684\u5de5\u5177\uff0c\u65e8\u5728\u901a\u8fc7\u5bf9\u8bdd\u65b9\u5f0f\u5c06\u8ba1\u5212\u5916\u5316\uff0c\u5f3a\u5316\u51b3\u5fc3\uff0c\u4fc3\u8fdb\u4e13\u6ce8\u884c\u52a8\uff0c\u4ece\u800c\u6b63\u9762\u5f71\u54cd\u751f\u4ea7\u529b\u548c\u5fc3\u7406\u5065\u5eb7\u3002\u6211\u4eec\u7684\u7814\u7a76\u76ee\u6807\u662f\u8bbe\u8ba1\u4e00\u4e2a\u5bf9\u8bdd\u52a9\u624b\uff0c\u901a\u8fc7\u81ea\u7136\u5bf9\u8bdd\u7684\u793e\u4ea4\u4e92\u52a8\u6027\uff0c\u63d0\u4f9b\u6df1\u5165\u7684\u95ee\u9898\u548c\u53cd\u601d\u63d0\u793a\uff0c\u4ee5\u63d0\u9ad8\u8ba1\u5212\u6267\u884c\u5ea6\u3002\u5c3d\u7ba1\u5148\u524d\u7684\u7814\u7a76\u663e\u793a\u4e86\u8fd9\u4e9b\u4ee3\u7406\u7684\u6548\u76ca\uff0c\u4f46\u8bb8\u591a\u5e72\u9884\u63aa\u65bd\u4ecd\u4fdd\u6301\u9759\u6001\uff0c\u53ef\u80fd\u5bfc\u81f4\u7528\u6237\u53c2\u4e0e\u5ea6\u968f\u65f6\u95f4\u4e0b\u964d\u3002\u4e3a\u4e86\u5f25\u8865\u8fd9\u4e00\u4e0d\u8db3\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65cb\u8f6c\u548c\u4e0a\u4e0b\u6587\u611f\u77e5\u7684\u63d0\u793a\u7b56\u7565\uff0c\u6bcf\u5929\u4e3a\u7528\u6237\u63d0\u4f9b\u591a\u6837\u7684\u5e72\u9884\u624b\u6bb5\u3002\u6211\u4eec\u7684\u7cfb\u7edfPITCH\u5229\u7528\u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u6765\u4fc3\u8fdb\u65e5\u5e38\u8ba1\u5212\u7684\u5916\u90e8\u5316\u548c\u53cd\u601d\u3002\u672c\u7814\u7a76\u65e8\u5728\u63a2\u7a76\u4e0e\u5bf9\u8bdd\u4ee3\u7406\u4e00\u8d77\u5916\u5316\u4efb\u52a1\u5bf9\u751f\u4ea7\u529b\u548c\u5fc3\u7406\u5065\u5eb7\u7684\u5f71\u54cd\uff0c\u4ee5\u53ca\u65cb\u8f6c\u7b56\u7565\
u5728\u4fdd\u6301\u7528\u6237\u53c2\u4e0e\u5ea6\u65b9\u9762\u7684\u6709\u6548\u6027\u3002|\n", "2406.07483": "|**2024-06-11**|**Advancing Annotation of Stance in Social Media Posts: A Comparative Analysis of Large Language Models and Crowd Sourcing**|Mao Li et.al.|[2406.07483](http://arxiv.org/abs/2406.07483)|null|\u5728\u5feb\u901f\u53d1\u5c55\u7684\u81ea\u7136\u8bed\u8a00\u5904\u7406\u9886\u57df\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u793e\u4ea4\u5a92\u4f53\u5e16\u5b50\u7684\u81ea\u52a8\u6587\u672c\u6807\u6ce8\u65b9\u9762\u5c55\u73b0\u51fa\u6d53\u539a\u5174\u8da3\u3002\u672c\u6587\u7814\u7a76\u4e86\u516b\u79cd\u5f00\u6e90\u548c\u4e13\u6709LLMs\u5728\u7acb\u573a\u6807\u6ce8\u4efb\u52a1\u4e2d\u7684\u6027\u80fd\uff0c\u5c06\u5176\u4e0e\u4eba\u7c7b\uff08\u901a\u8fc7\u4f17\u5305\uff09\u7684\u5224\u65ad\u8fdb\u884c\u57fa\u51c6\u6d4b\u8bd5\u3002\u6211\u4eec\u63a2\u7a76\u4e86\u4f55\u65f6LLMs\u53ef\u80fd\u4e0e\u4eba\u7c7b\u5224\u65ad\u4ea7\u751f\u5206\u6b67\u7684\u60c5\u51b5\u3002\u7814\u7a76\u53d1\u73b0\uff0c\u6587\u672c\u4e2d\u8868\u8fbe\u7acb\u573a\u7684\u660e\u786e\u7a0b\u5ea6\u5bf9LLMs\u5224\u65ad\u4e0e\u4eba\u7c7b\u4e00\u81f4\u6027\u81f3\u5173\u91cd\u8981\u3002\u5f53\u4eba\u7c7b\u6ce8\u91ca\u8005\u8868\u73b0\u826f\u597d\u65f6\uff0cLLMs\u4e5f\u8868\u73b0\u51fa\u8272\uff1b\u53cd\u4e4b\uff0cLLMs\u7684\u5931\u8d25\u5f80\u5f80\u5bf9\u5e94\u4e8e\u4eba\u7c7b\u96be\u4ee5\u8fbe\u6210\u4e00\u81f4\u7684\u60c5\u5883\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u5efa\u8bae\u7ed3\u5408\u4eba\u7c7b\u4e13\u4e1a\u77e5\u8bc6\u7684\u7cbe\u786e\u5ea6\u4e0eLLMs\u9884\u6d4b\u7684\u89c4\u6a21\uff0c\u63d0\u51fa\u4e00\u79cd\u5168\u9762\u7684\u65b9\u6cd5\u3002\u8fd9\u9879\u7814\u7a76\u5f3a\u8c03\u4e86\u63d0\u9ad8\u81ea\u52a8\u5316\u7acb\u573a\u68c0\u6d4b\u51c6\u786e\u6027\u548c\u5168\u9762\u6027\u7684\u5fc5\u8981\u6027\uff0c\u65e8\u5728\u63a8\u52a8\u8fd9\u4e9b\u6280\u672f\u5728\u66f4\u9ad8\u6548\u3001\u65e0\u504f\u89c1\u7684\u793e\u4f1a\u5a92\u4f53\u5206\u6790\u4e2d\u5f97\u5230
\u63d0\u5347\u3002|\n", "2406.07476": "|**2024-06-11**|**VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs**|Zesen Cheng et.al.|[2406.07476](http://arxiv.org/abs/2406.07476)|**[link](https://github.com/damo-nlp-sg/videollama2)**|**\u672c\u6587\u4ecb\u7ecdVideoLLaMA 2\uff0c\u4e00\u5957\u4e13\u4e3a\u63d0\u5347\u89c6\u9891\u548c\u97f3\u9891\u5b9a\u5411\u4efb\u52a1\u4e2d\u7684\u7a7a\u95f4-\u65f6\u95f4\u5efa\u6a21\u53ca\u97f3\u9891\u7406\u89e3\u80fd\u529b\u800c\u8bbe\u8ba1\u7684\u89c6\u9891\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08Video-LLMs\uff09\u3002\u5b83\u5728\u524d\u4e00\u4ee3\u7684\u57fa\u7840\u4e0a\u589e\u6dfb\u4e86\u5b9a\u5236\u7684\u65f6\u7a7a\u5377\u79ef\uff08STC\uff09\u8fde\u63a5\u5668\uff0c\u6709\u6548\u5730\u6355\u6349\u89c6\u9891\u6570\u636e\u7684\u590d\u6742\u7a7a\u95f4\u548c\u65f6\u95f4\u52a8\u6001\u3002\u6b64\u5916\uff0c\u6211\u4eec\u901a\u8fc7\u8054\u5408\u8bad\u7ec3\u878d\u5165\u4e86\u97f3\u9891\u5206\u652f\uff0c\u589e\u5f3a\u4e86\u6a21\u578b\u7684\u591a\u6a21\u6001\u7406\u89e3\u80fd\u529b\uff0c\u4f7f\u5176\u80fd\u65e0\u7f1d\u878d\u5408\u97f3\u9891\u7ebf\u7d22\u3002\u5728\u591a\u9879\u8bc4\u4f30\u4e2d\uff0c\u5982\u591a\u9009\u89c6\u9891\u95ee\u7b54\uff08MC-VQA\uff09\u3001\u5f00\u653e\u6027\u89c6\u9891\u95ee\u7b54\uff08OE-VQA\uff09\u548c\u89c6\u9891captioning\uff08VC\uff09\u4efb\u52a1\u4e0a\uff0cVideoLLaMA 2\u8868\u73b0\u51fa\u4e0e\u5f00\u6e90\u6a21\u578b\u76f8\u5f53\u7684\u7ade\u4e89\u5b9e\u529b\uff0c\u5e76\u5728\u67d0\u4e9b\u57fa\u51c6\u4e0a\u63a5\u8fd1\u4e13\u6709\u6a21\u578b\u3002\u5728\u97f3\u9891\u4ec5\u7528\uff08AQA\uff09\u548c\u97f3\u9891-\u89c6\u9891\u95ee\u7b54\uff08OE-AVQA\uff09\u4efb\u52a1\u4e0a\uff0cVideoLLaMA 2\u4e5f\u663e\u793a\u51fa\u5bf9\u73b0\u6709\u6a21\u578b\u7684\u5408\u7406\u6539\u8fdb\u3002\u8fd9\u4e9b\u8fdb\u6b65\u51f8\u663e\u4e86VideoLLaMA 
2\u5728\u591a\u6a21\u6001\u7406\u89e3\u65b9\u9762\u7684\u5353\u8d8a\u6027\u80fd\uff0c\u4e3a\u667a\u80fd\u89c6\u9891\u5206\u6790\u7cfb\u7edf\u6811\u7acb\u4e86\u65b0\u6807\u51c6\u3002\u6240\u6709\u6a21\u578b\u5747\u516c\u5f00\u4ee5\u4fc3\u8fdb\u8fdb\u4e00\u6b65\u7814\u7a76\u3002**|\n", "2406.08477": "|**2024-06-12**|**Improving LLMs for Recommendation with Out-Of-Vocabulary Tokens**|Ting-Ji Huang et.al.|[2406.08477](http://arxiv.org/abs/2406.08477)|null|\u5728\u63a8\u8350\u7cfb\u7edf\u4e2d\uff0c\u901a\u8fc7\u5411\u91cf\u8868\u793a\u7528\u6237\u548c\u9879\u76ee\u5bf9\u4e8e\u591a\u79cd\u4efb\u52a1\u81f3\u5173\u91cd\u8981\u3002\u6700\u8fd1\u7684\u7814\u7a76\u5c1d\u8bd5\u5c06\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5e94\u7528\u4e8e\u95ee\u7b54\u5f62\u5f0f\u7684\u63a8\u8350\uff0c\u4f7f\u7528\u8bcd\u6c47\u8868\u5185\u7684\u6807\u8bb0\uff08\u5982\u201citem\u201d\u3001\u201c20\u201d\u3001\u201c24\u201d\uff09\u6765\u8868\u793a\u5b9e\u9645\u7684\u7528\u6237\u548c\u9879\u76ee\u3002\u7136\u800c\uff0c\u7531\u4e8eLLMs\u901a\u5e38\u662f\u5728\u81ea\u7136\u8bed\u8a00\u4efb\u52a1\u4e0a\u9884\u8bad\u7ec3\u7684\uff0c\u8fd9\u4e9b\u8bcd\u6c47\u8868\u5185\u7684\u6807\u8bb0\u5728\u8868\u8fbe\u72ec\u7279\u7528\u6237\u548c\u9879\u76ee\u65b9\u9762\u80fd\u529b\u6709\u9650\uff0c\u5373\u4f7f\u7ecf\u8fc7\u63a8\u8350\u4efb\u52a1\u7684\u5fae\u8c03\uff0c\u4e5f\u4f1a\u524a\u5f31\u63a8\u8350\u6027\u80fd\u3002\u672c\u6587\u63a2\u8ba8\u5982\u4f55\u6709\u6548\u5728LLM\u57fa\u7684\u63a8\u8350\u7cfb\u7edf\u4e2d\u5904\u7406\u7528\u6237\u548c\u9879\u76ee\u7684\u6807\u8bb0\u3002 
\u6211\u4eec\u5f3a\u8c03\u4e86\u51fa\u8bcd\u6c47\u8868\uff08OOV\uff09\u6807\u8bb0\u7684\u4f5c\u7528\uff0c\u5b83\u4eec\u9664\u4e86\u8bcd\u6c47\u8868\u5185\u7684\u6807\u8bb0\u5916\uff0c\u8fd8\u80fd\u6355\u6349\u7528\u6237/\u9879\u76ee\u4e4b\u95f4\u7684\u5173\u8054\u6027\u548c\u591a\u6837\u6027\u3002\u901a\u8fc7\u5206\u6790\u5386\u53f2\u7528\u6237-\u9879\u76ee\u4ea4\u4e92\u7684\u8868\u793a\u5b66\u4e60\uff0c\u6211\u4eec\u4f7f\u5177\u6709\u76f8\u4f3c\u7279\u6027\u7684\u7528\u6237/\u9879\u76ee\u7ec4\u5408\u5171\u4eab\u76f8\u540c\u7684OOV\u6807\u8bb0\u3002\u6b64\u5916\uff0c\u5c06\u8fd9\u4e9bOOV\u6807\u8bb0\u6574\u5408\u5230LLM\u7684\u8bcd\u6c47\u8868\u4e2d\uff0c\u6709\u52a9\u4e8e\u66f4\u597d\u5730\u533a\u5206\u7528\u6237\u548c\u9879\u76ee\uff0c\u589e\u5f3a\u5728\u4e0b\u6e38\u4efb\u52a1\u5fae\u8c03\u65f6\u5bf9\u7528\u6237-\u9879\u76ee\u5173\u7cfb\u7684\u6355\u6349\u3002 \u6211\u4eec\u7684\u63d0\u51fa\u7684\u6846\u67b6\u5728\u5404\u79cd\u4e0b\u6e38\u63a8\u8350\u4efb\u52a1\u4e0a\u8d85\u8d8a\u4e86\u73b0\u6709\u6700\u5148\u8fdb\u7684\u65b9\u6cd5\u3002|\n", "2406.08474": "|**2024-06-12**|**Real2Code: Reconstruct Articulated Objects via Code Generation**|Zhao Mandi 
et.al.|[2406.08474](http://arxiv.org/abs/2406.08474)|null|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\u2014\u2014Real2Code\uff0c\u65e8\u5728\u901a\u8fc7\u4ee3\u7801\u751f\u6210\u6765\u91cd\u5efa\u53ef\u52a8\u7269\u4f53\u3002\u7ed9\u5b9a\u7269\u4f53\u7684\u89c6\u89c9\u89c2\u6d4b\uff0c\u6211\u4eec\u9996\u5148\u5229\u7528\u56fe\u50cf\u5206\u5272\u6a21\u578b\u548c\u5f62\u72b6\u8865\u5168\u6a21\u578b\u91cd\u6784\u5176\u90e8\u4ef6\u51e0\u4f55\u7ed3\u6784\u3002\u63a5\u7740\uff0c\u6211\u4eec\u5c06\u7269\u4f53\u90e8\u4ef6\u8868\u793a\u4e3a\u5e26\u6709\u65b9\u5411\u7684\u8fb9\u754c\u6846\uff0c\u7136\u540e\u8f93\u5165\u5230\u4e00\u4e2a\u7ecf\u8fc7\u5fae\u8c03\u7684\u5927\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4e2d\uff0c\u9884\u6d4b\u5173\u8282\u6d3b\u52a8\u7684\u4ee3\u7801\u8868\u793a\u3002\u901a\u8fc7\u5229\u7528\u9884\u8bad\u7ec3\u7684\u89c6\u89c9\u548c\u8bed\u8a00\u6a21\u578b\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u80fd\u591f\u4f18\u96c5\u5730\u6269\u5c55\u5230\u5177\u6709\u66f4\u591a\u53ef\u52a8\u90e8\u4ef6\u7684\u5bf9\u8c61\uff0c\u5e76\u80fd\u4ece\u5408\u6210\u8bad\u7ec3\u6570\u636e\u4e2d\u6cdb\u5316\u5230\u73b0\u5b9e\u4e16\u754c\u4e2d\u7684\u4e0d\u89c4\u5219\u73af\u5883\u7269\u4f53\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cReal2Code\u5728\u91cd\u5efa\u7cbe\u5ea6\u4e0a\u663e\u8457\u4f18\u4e8e\u73b0\u6709\u6700\u5148\u8fdb\u7684\u65b9\u6cd5\uff0c\u5e76\u4e14\u662f\u9996\u4e2a\u80fd\u591f\u8d85\u8d8a\u8bad\u7ec3\u96c6\u4e2d\u5bf9\u8c61\u7ed3\u6784\u590d\u6742\u6027\u7684\u65b9\u6cd5\uff0c\u80fd\u591f\u91cd\u5efa\u591a\u8fbe10\u4e2a\u53ef\u52a8\u90e8\u4ef6\u7684\u7269\u4f53\u3002\u5f53\u4e0e\u7acb\u4f53\u91cd\u5efa\u6a21\u578b\u7ed3\u5408\u65f6\uff0cReal2Code\u8fd8\u80fd\u4ece\u5c11\u91cf\u591a\u89c6\u56feRGB\u56fe\u50cf\u4e2d\u6cdb\u5316\u5230\u73b0\u5b9e\u4e16\u754c\u7684\u7269\u4f53\uff0c\u65e0\u9700\u6df1\u5ea6\u6216\u76f8\u673a\u4fe1\u606f\u3002|\n", "2406.08464": "|**2024-06-12**|**Magpie: Alignment Data Synthesis from Scratch by 
Prompting Aligned LLMs with Nothing**|Zhangchen Xu et.al.|[2406.08464](http://arxiv.org/abs/2406.08464)|**[link](https://github.com/magpie-align/magpie)**|\u9ad8\u8d28\u91cf\u7684\u6307\u4ee4\u6570\u636e\u5bf9\u4e8e\u8c03\u6574\u5927\u578b\u8bed\u8a00\u6a21\u578b\u81f3\u5173\u91cd\u8981\u3002\u5c3d\u7ba1\u50cfLlama-3-Instruct\u8fd9\u6837\u7684\u6a21\u578b\u516c\u5f00\u4e86\u6743\u91cd\uff0c\u4f46\u5b83\u4eec\u7684\u5bf9\u9f50\u6570\u636e\u4ecd\u7136\u4fdd\u5bc6\uff0c\u8fd9\u9650\u5236\u4e86\u4eba\u5de5\u667a\u80fd\u7684\u666e\u53ca\u3002\u73b0\u6709\u7684\u5f00\u6e90\u6570\u636e\u751f\u6210\u65b9\u6cd5\u53d7\u9650\u4e8e\u9ad8\u6602\u7684\u4eba\u529b\u6210\u672c\u548c\u6709\u9650\u7684\u63d0\u793a\u8303\u56f4\uff0c\u96be\u4ee5\u6709\u6548\u6269\u5c55\uff0c\u53ef\u80fd\u5f71\u54cd\u516c\u5171\u5bf9\u9f50\u6570\u636e\u96c6\u7684\u591a\u6837\u6027\u548c\u8d28\u91cf\u3002\u80fd\u5426\u901a\u8fc7\u76f4\u63a5\u4ece\u5df2\u5bf9\u9f50\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4e2d\u63d0\u53d6\uff0c\u5927\u89c4\u6a21\u5408\u6210\u9ad8\u8d28\u6307\u4ee4\u6570\u636e\u5462\uff1f\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u81ea\u6211\u5408\u6210\u65b9\u6cd5\uff0c\u79f0\u4e3aMagpie\u3002\u6211\u4eec\u7684\u5173\u952e\u89c2\u5bdf\u662f\uff0c\u7531\u4e8eLlama-3-Instruct\u7b49\u5df2\u5bf9\u9f50\u7684\u6a21\u578b\u5177\u6709\u81ea\u56de\u5f52\u7279\u6027\uff0c\u5f53\u6211\u4eec\u4ec5\u8f93\u5165\u5de6\u4fa7\u6a21\u677f\u5230\u7528\u6237\u6d88\u606f\u9884\u7559\u4f4d\u7f6e\u65f6\uff0c\u5b83\u4eec\u53ef\u4ee5\u751f\u6210\u7528\u6237\u67e5\u8be2\u3002\u6211\u4eec\u5229\u7528\u8fd9\u79cd\u65b9\u6cd5\u63d0\u793aLlama-3-Instruct\uff0c\u751f\u6210\u4e86400\u4e07\u4e2a\u6307\u4ee4\u53ca\u5176\u5bf9\u5e94\u7684\u54cd\u5e94\u3002\u6211\u4eec\u5bf9\u63d0\u53d6\u7684\u6570\u636e\u8fdb\u884c\u4e86\u5168\u9762\u5206\u6790\uff0c\u5e76\u9009\u62e9\u4e8630\u4e07\u4e2a\u9ad8\u8d28\u91cf\u5b9e\u4f8b\u3002\u4e3a\u4e86\u6bd4\u8f83Magpie\u6570\u636e\u4e0e\u5176\u4ed6\u516c\u5171\u6307\u4ee4\u6570\u636e\u96
c6\uff0c\u6211\u4eec\u5206\u522b\u4f7f\u7528\u6bcf\u4e2a\u6570\u636e\u96c6\u5bf9Llama-3-8B-Base\u8fdb\u884c\u5fae\u8c03\uff0c\u5e76\u8bc4\u4f30\u5fae\u8c03\u540e\u6a21\u578b\u7684\u6027\u80fd\u3002\u7ed3\u679c\u663e\u793a\uff0c\u5728\u67d0\u4e9b\u4efb\u52a1\u4e2d\uff0c\u4ec5\u4f7f\u7528Magpie\u8fdb\u884c\u5fae\u8c03\u7684\u6a21\u578b\u5728\u6027\u80fd\u4e0a\u4e0e\u5b98\u65b9\u7ecf\u8fc71000\u4e07\u4e2a\u6570\u636e\u70b9\u76d1\u7763\u5fae\u8c03\uff08SFT\uff09\u548c\u540e\u7eed\u53cd\u9988\u5b66\u4e60\u589e\u5f3a\u7684Llama-3-8B-Instruct\u76f8\u5f53\u3002\u6211\u4eec\u8fd8\u5c55\u793a\u4e86\u4ec5\u4f7f\u7528Magpie\u8fdb\u884cSFT\u53ef\u4ee5\u8d85\u8d8a\u5148\u524d\u7528\u4e8eSFT\u548c\u504f\u597d\u4f18\u5316\uff08\u5982UltraFeedback\u7684\u76f4\u63a5\u504f\u597d\u4f18\u5316\uff09\u7684\u516c\u5171\u6570\u636e\u96c6\u3002\u8fd9\u79cd\u4f18\u52bf\u5728AlpacaEval\u3001ArenaHard\u548cWildBench\u7b49\u5bf9\u9f50\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u8868\u73b0\u660e\u663e\u3002|\n", "2406.08434": "|**2024-06-12**|**TasTe: Teaching Large Language Models to Translate through Self-Reflection**|Yutong Wang et.al.|[2406.08434](http://arxiv.org/abs/2406.08434)|**[link](https://github.com/yutongwang1216/reflectionllmmt)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\u4e2d\u5c55\u73b0\u51fa\u5353\u8d8a\u6027\u80fd\uff0c\u7279\u522b\u662f\u901a\u8fc7\u6307\u4ee4\u8c03\u4f18\u540e\uff0c\u5728\u673a\u5668\u7ffb\u8bd1\uff08Machine Translation, MT\uff09\u7b49\u4e0b\u6e38\u4efb\u52a1\u4e2d\u7684\u8868\u73b0\u6709\u6240\u63d0\u5347\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u672a\u80fd\u8fbe\u5230\u4e0e\u76d1\u7763\u795e\u7ecf\u673a\u5668\u7ffb\u8bd1\uff08Supervised Neural Machine Translation, 
NMT\uff09\u7cfb\u7edf\u76f8\u5f53\u7684\u7ffb\u8bd1\u8d28\u91cf\u3002\u539f\u56e0\u53ef\u80fd\u662f\u5f53\u524d\u4f7f\u7528\u7684\u7b80\u5355\u63d0\u793a\u65e0\u6cd5\u5145\u5206\u5229\u7528\u6a21\u578b\u7684\u6307\u4ee4\u8ddf\u968f\u80fd\u529b\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86TasTe\u6846\u67b6\uff0c\u5373\u201c\u901a\u8fc7\u81ea\u6211\u53cd\u601d\u8fdb\u884c\u7ffb\u8bd1\u201d\u3002\u8be5\u6846\u67b6\u5305\u62ec\u4e24\u4e2a\u63a8\u7406\u9636\u6bb5\uff1a\u7b2c\u4e00\u9636\u6bb5\uff0c\u6a21\u578b\u88ab\u5f15\u5bfc\u751f\u6210\u521d\u6b65\u7ffb\u8bd1\u5e76\u540c\u65f6\u5bf9\u5176\u81ea\u8eab\u8fdb\u884c\u8bc4\u4f30\uff1b\u7b2c\u4e8c\u9636\u6bb5\uff0c\u6a21\u578b\u6839\u636e\u8bc4\u4f30\u7ed3\u679c\u5bf9\u521d\u6b65\u7ffb\u8bd1\u8fdb\u884c\u7ec6\u5316\u3002\u5728WMT22\u57fa\u51c6\u7684\u56db\u79cd\u8bed\u8a00\u65b9\u5411\u4e0a\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u663e\u793a\u51fa\u4e0e\u73b0\u6709\u6280\u672f\u76f8\u6bd4\u7684\u6709\u6548\u6027\u3002\u8fd9\u9879\u5de5\u4f5c\u5c55\u793a\u4e86\u4e00\u79cd\u6709\u524d\u666f\u7684\u65b9\u6cd5\uff0c\u80fd\u591f\u91ca\u653e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u6f5c\u529b\uff0c\u5e76\u589e\u5f3a\u5176\u5728\u673a\u5668\u7ffb\u8bd1\u9886\u57df\u7684\u6027\u80fd\u3002\u76f8\u5173\u4ee3\u7801\u548c\u6570\u636e\u5df2\u5728https://github.com/YutongWang1216/ReflectionLLMMT\u4e0a\u5f00\u6e90\u3002**|\n", "2406.08426": "|**2024-06-12**|**Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL**|Zijin Hong 
et.al.|[2406.08426](http://arxiv.org/abs/2406.08426)|null|\u6587\u672c\u8f6cSQL\u751f\u6210\u51c6\u786e\u7684SQL\u67e5\u8be2\u4ee5\u54cd\u5e94\u81ea\u7136\u8bed\u8a00\u95ee\u9898\u662f\u4e00\u4e2a\u957f\u671f\u5b58\u5728\u7684\u6311\u6218\uff0c\u5b83\u6d89\u53ca\u7528\u6237\u95ee\u9898\u7406\u89e3\u3001\u6570\u636e\u5e93\u6a21\u5f0f\u7406\u89e3\u4ee5\u53caSQL\u751f\u6210\u7b49\u591a\u4e2a\u590d\u6742\u73af\u8282\u3002\u4f20\u7edf\u7684\u6587\u672c\u8f6cSQL\u7cfb\u7edf\u4f9d\u8d56\u4e8e\u4eba\u5de5\u5de5\u7a0b\u548c\u6df1\u5ea6\u795e\u7ecf\u7f51\u7edc\u3002\u968f\u7740\u9884\u8bad\u7ec3\u8bed\u8a00\u6a21\u578b\uff08PLMs\uff09\u7684\u53d1\u5c55\u548c\u5728\u8be5\u4efb\u52a1\u4e2d\u7684\u5e94\u7528\uff0c\u6027\u80fd\u5f97\u5230\u4e86\u663e\u8457\u63d0\u5347\u3002\u7136\u800c\uff0c\u968f\u7740\u6570\u636e\u5e93\u590d\u6742\u5ea6\u589e\u52a0\u548c\u7528\u6237\u95ee\u9898\u96be\u5ea6\u589e\u5927\uff0cPLMs\u6709\u9650\u7684\u7406\u89e3\u80fd\u529b\u53ef\u80fd\u5bfc\u81f4\u9519\u8bef\u7684SQL\u751f\u6210\uff0c\u8fd9\u4fc3\u4f7f\u7814\u7a76\u4eba\u5458\u5bfb\u6c42\u66f4\u9ad8\u7ea7\u548c\u5b9a\u5236\u5316\u7684\u4f18\u5316\u65b9\u6cd5\uff0c\u9650\u5236\u4e86PLM\u57fa\u7840\u7cfb\u7edf\u7684\u5e7f\u6cdb\u5e94\u7528\u3002\u6700\u8fd1\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u56e0\u5176\u5728\u81ea\u7136\u8bed\u8a00\u7406\u89e3\u4e0a\u7684\u5f3a\u5927\u80fd\u529b\u800c\u5907\u53d7\u77a9\u76ee\u3002\u56e0\u6b64\uff0c\u6574\u5408LLM\u7684\u5b9e\u73b0\u4e3a\u6587\u672c\u8f6cSQL\u7814\u7a76\u5e26\u6765\u4e86\u72ec\u7279\u7684\u673a\u9047\u3001\u6311\u6218\u548c\u89e3\u51b3\u65b9\u6848\u3002\u672c\u7efc\u8ff0\u5168\u9762\u6982\u8ff0\u4e86\u57fa\u4e8eLLM\u7684\u6587\u672c\u8f6cSQL\u3002\u9996\u5148\uff0c\u6211\u4eec\u6982\u8ff0\u5f53\u524d\u9762\u4e34\u7684\u6311\u6218\u548c\u6587\u672c\u8f6cSQL\u7684\u53d1\u5c55\u5386\u7a0b\u3002\u63a5\u7740\uff0c\u8be6\u7ec6\u4ecb\u7ecd\u7528\u4e8e\u8bc4\u4f30\u6587\u672c\u8f6cSQL\u7cfb\u7edf\u7684\u6570\u636e\u96c6\u548c\u8bc4\u4e
f7\u6307\u6807\u3002\u7136\u540e\uff0c\u6211\u4eec\u7cfb\u7edf\u5206\u6790\u4e86\u8fd1\u671f\u5728LLM\u652f\u6301\u4e0b\u7684\u6587\u672c\u8f6cSQL\u8fdb\u5c55\u3002\u6700\u540e\uff0c\u6211\u4eec\u8ba8\u8bba\u4e86\u8be5\u9886\u57df\u5c1a\u5b58\u7684\u6311\u6218\uff0c\u5e76\u5bf9\u672a\u6765\u7814\u7a76\u65b9\u5411\u63d0\u51fa\u671f\u5f85\u3002|\n", "2406.08418": "|**2024-06-12**|**OmniCorpus: An Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text**|Qingyun Li et.al.|[2406.08418](http://arxiv.org/abs/2406.08418)|**[link](https://github.com/opengvlab/omnicorpus)**|**\u8be5\u8bba\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aOmniCorpus\u7684\u5927\u578b\u56fe\u50cf-\u6587\u672c\u4ea4\u9519\u6570\u636e\u96c6\uff0c\u89c4\u6a21\u8fbe\u5230100\u4ebf\u7ea7\u522b\u3002\u8fd9\u4e2a\u6570\u636e\u96c6\u901a\u8fc7\u9ad8\u6548\u7684\u5f15\u64ce\u7b5b\u9009\u548c\u63d0\u53d6\u4e86\u5927\u91cf\u9ad8\u8d28\u91cf\u6587\u6863\uff0c\u5305\u542b86\u4ebf\u5f20\u56fe\u7247\u548c1,696\u4e07\u4ebf\u4e2a\u6587\u672c\u4ee4\u724c\uff0c\u76f8\u8f83\u4e8e\u540c\u7c7b\u6570\u636e\uff08\u5982MMC4\u3001OBELICS\uff09\uff0cOmniCorpus\u5177\u6709\u4ee5\u4e0b\u4f18\u52bf\uff1a1\uff09\u89c4\u6a21\u6269\u592715\u500d\uff0c\u540c\u65f6\u4fdd\u6301\u4e86\u826f\u597d\u7684\u6570\u636e\u8d28\u91cf\uff1b2\uff09\u6765\u6e90\u66f4\u4e3a\u591a\u6837\uff0c\u5305\u62ec\u82f1\u6587\u548c\u975e\u82f1\u6587\u7f51\u7ad9\uff0c\u4ee5\u53ca\u89c6\u9891\u4e3a\u4e3b\u7684\u7f51\u7ad9\uff1b3\uff09\u7075\u6d3b\u6027\u66f4\u5f3a\uff0c\u53ef\u4ee5\u4ece\u56fe\u50cf-\u6587\u672c\u4ea4\u9519\u683c\u5f0f\u8f7b\u677e\u8f6c\u6362\u4e3a\u7eaf\u6587\u672c\u8bed\u6599\u5e93\u6216\u56fe\u50cf-\u6587\u672c\u5bf9\u3002\u901a\u8fc7\u5168\u9762\u5206\u6790\u548c\u5b9e\u9a8c\uff0c\u8bba\u6587\u9a8c\u8bc1\u4e86OmniCorpus\u7684\u6570\u636e\u8d28\u91cf\u3001\u53ef\u7528\u6027\u548c\u6709\u6548\u6027\uff0c\u65e8\u5728\u4e3a\u672a\u6765\u7684\u591a\u6a21\u6001\u6a21\u578b\u7814\u7a76\u63d0\u4f9b\u575a\u5b9e\u7684\u6570\u63
6e\u57fa\u7840\u3002\u76f8\u5173\u7684\u4ee3\u7801\u548c\u6570\u636e\u5df2\u5728https://github.com/OpenGVLab/OmniCorpus\u4e0a\u516c\u5f00\u3002**|\n", "2406.08414": "|**2024-06-12**|**Discovering Preference Optimization Algorithms with and for Large Language Models**|Chris Lu et.al.|[2406.08414](http://arxiv.org/abs/2406.08414)|**[link](https://github.com/luchris429/DiscoPOP)**|**\u79bb\u7ebf\u504f\u597d\u4f18\u5316\u662f\u63d0\u5347\u548c\u63a7\u5236\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u8f93\u51fa\u8d28\u91cf\u7684\u91cd\u8981\u65b9\u6cd5\u3002\u4f20\u7edf\u4e0a\uff0c\u504f\u597d\u4f18\u5316\u88ab\u89c6\u4e3a\u57fa\u4e8e\u4eba\u5de5\u8bbe\u8ba1\u7684\u51f8\u635f\u5931\u51fd\u6570\u7684\u79bb\u7ebf\u76d1\u7763\u5b66\u4e60\u4efb\u52a1\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u53d7\u9650\u4e8e\u4eba\u7c7b\u521b\u9020\u529b\uff0c\u672a\u80fd\u5145\u5206\u63a2\u7d22\u53ef\u80fd\u7684\u635f\u5931\u51fd\u6570\u7684\u5de8\u5927\u641c\u7d22\u7a7a\u95f4\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u5229\u7528LLM\u8fdb\u884c\u76ee\u6807\u53d1\u73b0\u7684\u65b9\u6cd5\uff0c\u4ee5\u81ea\u52a8\u53d1\u73b0\u65b0\u7684\u6700\u5148\u8fdb\u7684\u504f\u597d\u4f18\u5316\u7b97\u6cd5\uff0c\u65e0\u9700\uff08\u4e13\u5bb6\uff09\u4eba\u5de5\u5e72\u9884\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u901a\u8fc7\u8fed\u4ee3\u5730\u63d0\u793aLLM\uff0c\u6839\u636e\u5148\u524d\u7684\u6027\u80fd\u8bc4\u4f30\u63d0\u51fa\u5e76\u5b9e\u73b0\u65b0\u7684\u504f\u597d\u4f18\u5316\u635f\u5931\u51fd\u6570\u3002\u8fd9\u4e2a\u8fc7\u7a0b\u5bfc\u81f4\u4e86\u672a\u77e5\u4e14\u9ad8\u6548\u7684\u4f18\u5316\u7b97\u6cd5\u7684\u53d1\u73b0\u3002\u5176\u4e2d\u6700\u597d\u7684\u4e00\u4e2a\u88ab\u547d\u540d\u4e3a\u201c\u53d1\u73b0\u504f\u597d\u4f18\u5316\u201d\uff08DiscoPOP\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u65b0\u9896\u7684\u7b97\u6cd5\uff0c\u5b83\u5de7\u5999\u5730\u878d\u5408\u4e86\u903b\u8f91\u548c\u6307\u6570\u635f\u5931\u3002\u5b9e\u9a8
c\u7ed3\u679c\u8868\u660e\uff0cDiscoPOP\u5728\u6027\u80fd\u4e0a\u8fbe\u5230\u4e86\u6700\u65b0\u6c34\u5e73\uff0c\u5e76\u6210\u529f\u5730\u5e94\u7528\u4e8e\u672a\u89c1\u8fc7\u7684\u4efb\u52a1\u4e0a\u3002**|\n", "2406.08413": "|**2024-06-12**|**Memory Is All You Need: An Overview of Compute-in-Memory Architectures for Accelerating Large Language Model Inference**|Christopher Wolters et.al.|[2406.08413](http://arxiv.org/abs/2406.08413)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u8fd1\u671f\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u9886\u57df\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\uff0c\u4f7f\u5f97\u673a\u5668\u80fd\u591f\u751f\u6210\u903c\u771f\u7684\u6587\u672c\u5e76\u8fdb\u884c\u6709\u610f\u4e49\u7684\u5bf9\u8bdd\u3002\u7136\u800c\uff0c\u968f\u7740\u8ba1\u7b97\u548c\u5185\u5b58\u9700\u6c42\u7684\u6025\u5267\u589e\u957f\uff0c\u5c24\u5176\u662f\u5f53LLMs\u8d85\u8d8a\u5355\u4e2aGPU\u7684\u5904\u7406\u80fd\u529b\u65f6\uff0c\u5bf9\u901f\u5ea6\u3001\u6548\u7387\u548c\u53ef\u8bbf\u95ee\u6027\u7684\u9700\u6c42\u4e5f\u968f\u4e4b\u589e\u52a0\u3002\u540c\u65f6\uff0c\u8ba1\u7b97\u673a\u6027\u80fd\u548c\u5185\u5b58\u80fd\u529b\u7684\u53d1\u5c55\u5e76\u672a\u8ddf\u4e0a\u6b65\u4f10\uff0c\u5c24\u5176\u662f\u5728\u6469\u5c14\u5b9a\u5f8b\u653e\u7f13\u7684\u80cc\u666f\u4e0b\u3002\u5185\u5b58\u8bbf\u95ee\u6210\u672c\u8fdc\u9ad8\u4e8e\u8ba1\u7b97\uff0c\u8fd9\u7ed9\u5927\u89c4\u6a21\u6269\u5c55\u5e26\u6765\u4e86\u6311\u6218\uff0c\u5373\u6240\u8c13\u7684\u201c\u5185\u5b58\u5899\u201d\u3002\u5728\u8fd9\u4e2a\u65f6\u5019\uff0c\u8ba1\u7b97\u5728\u5185\u5b58\uff08Compute-in-Memory, 
CIM\uff09\u6280\u672f\u4e3aAI\u63a8\u7406\u63d0\u4f9b\u4e86\u52a0\u901f\u53ef\u80fd\uff0c\u901a\u8fc7\u5728\u5185\u5b58\u4e2d\u76f4\u63a5\u6267\u884c\u6a21\u62df\u8ba1\u7b97\uff0c\u6709\u671b\u964d\u4f4e\u5ef6\u8fdf\u548c\u529f\u8017\u3002\u901a\u8fc7\u7d27\u5bc6\u96c6\u6210\u5185\u5b58\u548c\u8ba1\u7b97\u5143\u4ef6\uff0cCIM\u6d88\u9664\u4e86\u51af\u8bfa\u4f9d\u66fc\u74f6\u9888\uff0c\u51cf\u5c11\u4e86\u6570\u636e\u4f20\u8f93\uff0c\u63d0\u9ad8\u4e86\u80fd\u6e90\u6548\u7387\u3002 \u672c\u7efc\u8ff0\u8bba\u6587\u6982\u8ff0\u4e86\u57fa\u4e8e\u53d8\u538b\u5668\u7684\u6a21\u578b\uff0c\u63a2\u8ba8\u4e86\u5404\u79cdCIM\u67b6\u6784\uff0c\u5e76\u7814\u7a76\u4e86\u5b83\u4eec\u5982\u4f55\u5e94\u5bf9\u73b0\u4ee3\u4eba\u5de5\u667a\u80fd\u8ba1\u7b97\u7cfb\u7edf\u9762\u4e34\u7684\u7d27\u8feb\u6311\u6218\u3002\u6211\u4eec\u8be6\u7ec6\u8ba8\u8bba\u4e86\u4e0e\u53d8\u538b\u5668\u76f8\u5173\u7684\u8fd0\u7b97\u53ca\u5176\u786c\u4ef6\u52a0\u901f\u7b56\u7565\uff0c\u540c\u65f6\u6307\u51fa\u76f8\u5173CIM\u8bbe\u8ba1\u4e2d\u7684\u6311\u6218\u3001\u8d8b\u52bf\u548c\u6d1e\u5bdf\u3002|\n", "2406.08402": "|**2024-06-12**|**Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models**|Chun-Yi Kuan et.al.|[2406.08402](http://arxiv.org/abs/2406.08402)|**[link](https://github.com/kuan2jiu99/audio-hallucination)**|**## \u80cc\u666f 
\u5927\u578b\u97f3\u9891\u8bed\u8a00\u6a21\u578b\uff08LALMs\uff09\u901a\u8fc7\u6574\u5408\u97f3\u9891\u611f\u77e5\u80fd\u529b\uff0c\u589e\u5f3a\u4e86\u4f20\u7edf\u7684\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff0c\u4f7f\u5176\u80fd\u591f\u5904\u7406\u97f3\u9891\u76f8\u5173\u4efb\u52a1\u3002\u5148\u524d\u7684\u7814\u7a76\u4e3b\u8981\u96c6\u4e2d\u5728\u8bc4\u4f30LALMs\u5728\u5404\u79cd\u4efb\u52a1\u4e0a\u7684\u6027\u80fd\uff0c\u4f46\u5bf9\u5b83\u4eec\u7684\u53ef\u9760\u6027\uff0c\u7279\u522b\u662f\u5173\u4e8e\u5bf9\u8c61\u5e7b\u89c9\u7b49\u95ee\u9898\u7684\u5173\u6ce8\u4e0d\u8db3\u3002\u6211\u4eec\u7684\u7814\u7a76\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u65b9\u6cd5\u6765\u8bc4\u4f30\u516c\u5f00\u53ef\u7528\u7684LALMs\u5728\u5bf9\u8c61\u5e7b\u89c9\u65b9\u9762\u7684\u7a0b\u5ea6\u3002\u7ed3\u679c\u8868\u660e\uff0cLALMs\u5728\u7406\u89e3\u97f3\u9891\u5185\u5bb9\u65b9\u9762\u4e0e\u4e13\u95e8\u7684\u97f3\u9891captioning\u6a21\u578b\u76f8\u5f53\uff0c\u4f46\u5728\u56de\u7b54\u533a\u5206\u6027\u95ee\u9898\u65f6\u8868\u73b0\u4e0d\u4f73\uff0c\u5c24\u5176\u662f\u90a3\u4e9b\u9700\u8981\u8bc6\u522b\u97f3\u9891\u7247\u6bb5\u4e2d\u7279\u5b9a\u7269\u4f53\u58f0\u97f3\u7684\u95ee\u9898\u3002\u8fd9\u63ed\u793a\u4e86\u5f53\u524dLALMs\u7684\u4e00\u4e2a\u5173\u952e\u5f31\u70b9\uff1a\u5b83\u4eec\u5bf9\u533a\u5206\u6027\u67e5\u8be2\u7684\u7406\u89e3\u4e0d\u8db3\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63a2\u8ba8\u4e86\u63d0\u793a\u5de5\u7a0b\u5982\u4f55\u63d0\u5347LALMs\u5728\u533a\u5206\u6027\u95ee\u9898\u4e0a\u7684\u6027\u80fd\u3002**|\n", "2406.08398": "|**2024-06-12**|**cPAPERS: A Dataset of Situated and Multimodal Interactive Conversations in Scientific Papers**|Anirudh Sundar et.al.|[2406.08398](http://arxiv.org/abs/2406.08398)|null|## \u80cc\u666f 
\u5728\u60c5\u5883\u5316\u548c\u591a\u6a21\u6001\u4ea4\u4e92\u5bf9\u8bdd\uff08SIMMC\uff09\u7684\u65b0\u5174\u7814\u7a76\u9886\u57df\u4e2d\uff0c\u79d1\u5b66\u8bba\u6587\u7684\u4e92\u52a8\u662f\u4e00\u4e2a\u91cd\u8981\u65b9\u5411\u3002\u7531\u4e8e\u79d1\u5b66\u8bba\u6587\u4e3b\u8981\u7531\u6587\u672c\u3001\u516c\u5f0f\u3001\u56fe\u8868\u548c\u8868\u683c\u6784\u6210\uff0cSIMMC\u65b9\u6cd5\u9700\u8981\u9488\u5bf9\u8fd9\u4e9b\u7ec4\u6210\u90e8\u5206\u8fdb\u884c\u4e13\u95e8\u8bbe\u8ba1\uff0c\u4ee5\u652f\u6301\u79d1\u7814\u4eba\u5458\u6240\u9700\u7684\u6df1\u5ea6\u63a2\u7a76\u548c\u4e92\u52a8\u3002\u672c\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201c\u5bf9\u8bdd\u5f0f\u8bba\u6587\u201d\uff08cPAPERS\uff09\u7684\u6570\u636e\u96c6\uff0c\u5b83\u5305\u542b\u4e86\u6765\u81eaarXiv\u4e0a\u53ef\u7528\u7684\u79d1\u5b66\u6587\u6863\u7684\u5b66\u672f\u8bba\u6587\u8bc4\u8bba\u4e2d\u7684\u95ee\u7b54\u5bf9\uff0c\u8fd9\u4e9b\u95ee\u7b54\u4e0e\u8bba\u6587\u7ec4\u4ef6\u53ca\u5176\u5f15\u7528\u76f8\u5173\u3002\u6211\u4eec\u4ecb\u7ecd\u4e86\u6570\u636e\u6536\u96c6\u7b56\u7565\uff0c\u901a\u8fc7OpenReview\u6536\u96c6\u8fd9\u4e9b\u95ee\u9898-\u7b54\u6848\u5bf9\uff0c\u5e76\u4e0eLaTeX\u6e90\u6587\u4ef6\u4e2d\u7684\u4e0a\u4e0b\u6587\u4fe1\u606f\u5173\u8054\u8d77\u6765\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u4e00\u7cfb\u5217\u57fa\u7ebf\u65b9\u6cd5\uff0c\u5305\u62ec\u96f6\u6837\u672c\u548c\u5fae\u8c03\u914d\u7f6e\uff0c\u6765\u5904\u7406cPAPERS\u6570\u636e\u96c6\u3002|\n", "2406.09418": "|**2024-06-13**|**VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding**|Muhammad Maaz 
et.al.|[2406.09418](http://arxiv.org/abs/2406.09418)|**[link](https://github.com/mbzuai-oryx/videogpt-plus)**|**\u5728\u57fa\u4e8e\u8bed\u8a00\u6a21\u578b\u7684\u8fdb\u5c55\u57fa\u7840\u4e0a\uff0c\u5927\u578b\u591a\u6a21\u6001\u6a21\u578b\uff08LMMs\uff09\u5728\u89c6\u9891\u7406\u89e3\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u89c6\u9891LMMs\u4f9d\u8d56\u4e8e\u56fe\u50cf\u6216\u89c6\u9891\u7f16\u7801\u5668\u5904\u7406\u89c6\u89c9\u8f93\u5165\uff0c\u8fd9\u4e9b\u7f16\u7801\u5668\u5404\u81ea\u5b58\u5728\u5c40\u9650\u6027\u3002\u56fe\u50cf\u7f16\u7801\u5668\u64c5\u957f\u6355\u6349\u5e27\u5e8f\u5217\u4e2d\u7684\u4e30\u5bcc\u7a7a\u95f4\u7ec6\u8282\uff0c\u4f46\u7f3a\u4e4f\u660e\u786e\u7684\u65f6\u95f4\u4e0a\u4e0b\u6587\uff1b\u800c\u89c6\u9891\u7f16\u7801\u5668\u63d0\u4f9b\u65f6\u95f4\u4e0a\u4e0b\u6587\uff0c\u4f46\u5e38\u5e38\u53d7\u9650\u4e8e\u8ba1\u7b97\u8d44\u6e90\uff0c\u5bfc\u81f4\u53ea\u80fd\u5904\u7406\u4f4e\u5206\u8fa8\u7387\u7684\u7a00\u758f\u5e27\uff0c\u4ece\u800c\u5f71\u54cd\u4e86\u5bf9\u7a7a\u95f4\u548c\u4e0a\u4e0b\u6587\u7684\u7406\u89e3\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u63d0\u51faVideoGPT+\uff0c\u5b83\u7ed3\u5408\u4e86\u56fe\u50cf\u7f16\u7801\u5668\uff08\u7528\u4e8e\u8be6\u7ec6\u7684\u7a7a\u95f4\u7406\u89e3\uff09\u548c\u89c6\u9891\u7f16\u7801\u5668\uff08\u7528\u4e8e\u5168\u5c40\u65f6\u5e8f\u4e0a\u4e0b\u6587\u5efa\u6a21\uff09\u7684\u4f18\u52bf\u3002\u8be5\u6a21\u578b\u901a\u8fc7\u5c06\u89c6\u9891\u5212\u5206\u4e3a\u5c0f\u6bb5\uff0c\u5e76\u5bf9\u6765\u81ea\u4e24\u8005\u7279\u5f81\u7684\u63d0\u53d6\u5e94\u7528\u81ea\u9002\u5e94\u6c60\u5316\u7b56\u7565\uff0c\u4ee5\u63d0\u9ad8\u6027\u80fd\u3002\u6211\u4eec\u7684\u67b6\u6784\u5728\u591a\u4e2a\u89c6\u9891\u57fa\u51c6\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u5305\u62ecVCGBench\u3001MVBench\u548c\u96f6\u6837\u672c\u95ee\u7b54\u4efb\u52a1\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2a112K\u7684\u89c6\u9891\u6307\u4ee4\u96c6\uff0c\u901a\u8fc7\u65b0\u98
96\u7684\u534a\u81ea\u52a8\u6807\u6ce8\u7ba1\u9053\u8fdb\u4e00\u6b65\u63d0\u5347\u6a21\u578b\u6027\u80fd\u3002\u4e3a\u4e86\u5168\u9762\u8bc4\u4f30\u89c6\u9891LMMs\uff0c\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86VCGBench-Diverse\uff0c\u5b83\u6db5\u76d6\u4e8618\u4e2a\u5e7f\u6cdb\u89c6\u9891\u7c7b\u522b\uff0c\u5982\u751f\u6d3b\u65b9\u5f0f\u3001\u4f53\u80b2\u3001\u79d1\u5b66\u3001\u6e38\u620f\u548c\u76d1\u63a7\u89c6\u9891\uff0c\u51714,354\u4e2a\u95ee\u9898-\u7b54\u6848\u5bf9\u3002\u8fd9\u4e2a\u57fa\u51c6\u6d4b\u8bd5\u8bc4\u4f30\u73b0\u6709LMMs\u5728\u5bc6\u96c6\u89c6\u9891\u63cf\u8ff0\u3001\u7a7a\u95f4\u548c\u65f6\u95f4\u7406\u89e3\u4ee5\u53ca\u590d\u6742\u63a8\u7406\u65b9\u9762\u7684\u6cdb\u5316\u80fd\u529b\uff0c\u786e\u4fdd\u5728\u5404\u79cd\u89c6\u9891\u7c7b\u578b\u548c\u52a8\u6001\u4e0b\u7684\u5168\u9762\u8bc4\u4f30\u3002\u4ee3\u7801\u53ef\u5728https://github.com/mbzuai-oryx/VideoGPT-plus\u627e\u5230\u3002**|\n", "2406.09412": "|**2024-06-13**|**Explore the Limits of Omni-modal Pretraining at Scale**|Yiyuan Zhang 
et.al.|[2406.09412](http://arxiv.org/abs/2406.09412)|**[link](https://github.com/invictus717/MiCo)**|**\u6211\u4eec\u63d0\u8bae\u6784\u5efa\u5168\u6a21\u6001\u667a\u80fd\uff0c\u65e8\u5728\u7406\u89e3\u5404\u79cd\u6a21\u6001\u5e76\u5b66\u4e60\u901a\u7528\u8868\u793a\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u53ef\u6269\u5c55\u7684\u9884\u8bad\u7ec3\u8303\u5f0f\uff0c\u79f0\u4e3a\u591a\u6a21\u6001\u4e0a\u4e0b\u6587\uff08MiCo\uff09\u3002\u8fd9\u79cd\u65b9\u6cd5\u80fd\u591f\u5728\u9884\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u540c\u65f6\u589e\u52a0\u6a21\u6001\u6570\u91cf\u3001\u6570\u636e\u91cf\u4ee5\u53ca\u6a21\u578b\u53c2\u6570\u7684\u6570\u91cf\u3002\u901a\u8fc7MiCo\uff0c\u9884\u8bad\u7ec3\u6a21\u578b\u5728\u591a\u9879\u4efb\u52a1\u4e0a\u5c55\u73b0\u51fa\u663e\u8457\u7684\u591a\u6a21\u6001\u5b66\u4e60\u80fd\u529b\uff1a\u4e00\u662f\u9488\u5bf910\u79cd\u4e0d\u540c\u6a21\u6001\u7684\u5355\u6a21\u6001\u611f\u77e5\u57fa\u51c6\uff0c\u4e8c\u662f\u5305\u62ec\u68c0\u7d22\u3001\u95ee\u7b54\u548ccaptioning\u5728\u5185\u768425\u9879\u8de8\u6a21\u6001\u7406\u89e3\u4efb\u52a1\uff0c\u4e09\u662f18\u4e2a\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\u57fa\u51c6\u3002\u6211\u4eec\u7684\u6a21\u578b\u521b\u9020\u4e8637\u9879\u6700\u65b0\u7684\u6700\u9ad8\u6027\u80fd\u8bb0\u5f55\u3002\u6211\u4eec\u671f\u671b\u8fd9\u9879\u7814\u7a76\u80fd\u63a8\u52a8\u5168\u6a21\u6001\u667a\u80fd\u7684\u53d1\u5c55\u3002\u76f8\u5173\u4ee3\u7801\u548c\u6a21\u578b\u5df2\u5728\u5f00\u6e90\u3002**|\n", "2406.09397": "|**2024-06-13**|**Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms**|Miaosen Zhang 
et.al.|[2406.09397](http://arxiv.org/abs/2406.09397)|null|\u73b0\u4ee3\u89c6\u89c9\u6a21\u578b\u5728\u5927\u89c4\u6a21\u5608\u6742\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u8bad\u7ec3\uff0c\u867d\u7136\u5c55\u73b0\u51fa\u5f3a\u5927\u80fd\u529b\uff0c\u4f46\u5728\u9075\u5faa\u7528\u6237\u610f\u56fe\u3001\u5982\u89c6\u89c9\u7f8e\u611f\u3001\u7279\u5b9a\u98ce\u683c\u548c\u8d23\u4efb\u8f93\u51fa\u65b9\u9762\u53ef\u80fd\u5b58\u5728\u95ee\u9898\u3002\u672c\u6587\u5173\u6ce8\u89c6\u89c9\u7f8e\u5b66\u9886\u57df\uff0c\u76ee\u6807\u662f\u4f7f\u89c6\u89c9\u6a21\u578b\u4e0e\u4eba\u7c7b\u5ba1\u7f8e\u6807\u51c6\u5728\u68c0\u7d22\u7cfb\u7edf\u4e2d\u4fdd\u6301\u4e00\u81f4\u3002\u9ad8\u7ea7\u68c0\u7d22\u7cfb\u7edf\u901a\u5e38\u91c7\u7528\u57fa\u4e8e\u4f4e\u7ea7\u7279\u5f81\uff08\u5982\u9971\u548c\u5ea6\uff09\u7684\u5ba1\u7f8e\u6a21\u578b\u4f5c\u4e3a\u91cd\u6392\u5668\u6216\u8fc7\u6ee4\u5668\uff0c\u4f46\u9762\u5bf9\u98ce\u683c\u3001\u6587\u5316\u6216\u77e5\u8bc6\u80cc\u666f\u65f6\u6027\u80fd\u6709\u9650\u3002\u6211\u4eec\u53d1\u73b0\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u63a8\u7406\u80fd\u529b\uff0c\u901a\u8fc7\u6539\u5199\u641c\u7d22\u67e5\u8be2\u5e76\u6269\u5c55\u5ba1\u7f8e\u671f\u671b\uff0c\u53ef\u4ee5\u5f25\u8865\u8fd9\u4e00\u4e0d\u8db3\u3002 
\u56e0\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u504f\u597d\u7684\u5f3a\u5316\u5b66\u4e60\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u9488\u5bf9\u89c6\u89c9\u6a21\u578b\u8fdb\u884c\u5fae\u8c03\uff0c\u4ee5\u63d0\u53d6LLM\u63a8\u7406\u548c\u5ba1\u7f8e\u6a21\u578b\u7684\u77e5\u8bc6\uff0c\u4ece\u800c\u66f4\u597d\u5730\u4f7f\u89c6\u89c9\u6a21\u578b\u7b26\u5408\u4eba\u7c7b\u5ba1\u7f8e\u3002\u7531\u4e8e\u7f3a\u4e4f\u4e13\u95e8\u7528\u4e8e\u8bc4\u4f30\u68c0\u7d22\u7cfb\u7edf\u7684\u57fa\u51c6\uff0c\u6211\u4eec\u5229\u7528\u5f3a\u5927\u7684\u591a\u6a21\u6001\u5927\u6a21\u578b\uff08LMM\uff09\u6765\u8bc4\u4ef7\u7f8e\u611f\u8868\u73b0\u3002\u8003\u8651\u5230\u7f8e\u611f\u8bc4\u4f30\u7684\u4e3b\u89c2\u6027\uff0c\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86\u4e00\u4e2a\u540d\u4e3aHPIR\u7684\u65b0\u6570\u636e\u96c6\uff0c\u7528\u4e8e\u8861\u91cf\u4e0e\u4eba\u7c7b\u5ba1\u7f8e\u7684\u5951\u5408\u5ea6\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u663e\u8457\u63d0\u5347\u4e86\u89c6\u89c9\u6a21\u578b\u7684\u7f8e\u611f\u884c\u4e3a\uff0c\u4ece\u591a\u4e2a\u6307\u6807\u6765\u770b\u3002\u6211\u4eec\u76f8\u4fe1\uff0c\u63d0\u51fa\u7684\u7b97\u6cd5\u53ef\u4ee5\u4f5c\u4e3a\u4e00\u79cd\u901a\u7528\u5b9e\u8df5\uff0c\u7528\u4e8e\u4f7f\u89c6\u89c9\u6a21\u578b\u4e0e\u4eba\u7c7b\u4ef7\u503c\u89c2\u76f8\u4e00\u81f4\u3002|\n", "2406.09396": "|**2024-06-13**|**Too Many Frames, not all Useful:Efficient Strategies for Long-Form Video QA**|Jongwoo Park 
et.al.|[2406.09396](http://arxiv.org/abs/2406.09396)|**[link](https://github.com/jongwoopark7978/LVNet)**|\u957f\u671f\u89c6\u9891\u901a\u5e38\u5305\u542b\u5927\u91cf\u5197\u4f59\u4fe1\u606f\uff0c\u8de8\u8d8a\u8f83\u957f\u7684\u65f6\u95f4\u95f4\u9694\uff0c\u4e14\u5305\u542b\u591a\u4e2a\u677e\u6563\u5173\u8054\u7684\u4e8b\u4ef6\u6216\u5b9e\u4f53\u3002\u56e0\u6b64\uff0c\u5728\u8fdb\u884c\u957f\u89c6\u9891\u95ee\u7b54\uff08LVQA\uff09\u65f6\uff0c\u751f\u6210\u6b63\u786e\u7b54\u6848\u6240\u9700\u7684\u6240\u6709\u4fe1\u606f\u5f80\u5f80\u53ea\u9700\u4e00\u5c0f\u90e8\u5206\u5e27\u5c31\u8db3\u4ee5\u63d0\u4f9b\u3002\u8fd1\u671f\u7684\u7814\u7a76\u8bd5\u56fe\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728LVQA\u57fa\u51c6\u4e0a\u53d6\u5f97\u5353\u8d8a\u6027\u80fd\uff0c\u4f46\u8fd9\u4e9b\u6a21\u578b\u4f9d\u8d56\u4e8e\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08VLMs\uff09\u5c06\u89c6\u9891\u4e2d\u7684\u6240\u6709\u89c6\u89c9\u5185\u5bb9\u8f6c\u6362\u6210\u81ea\u7136\u8bed\u8a00\u3002\u4f20\u7edf\u505a\u6cd5\u901a\u5e38\u662f\u5747\u5300\u91c7\u6837\u5927\u91cf\u5e27\u5e76\u72ec\u7acb\u4e3a\u5176\u751f\u6210\u63cf\u8ff0\uff0c\u8fd9\u65e2\u4e0d\u9ad8\u6548\u4e5f\u4e0d\u514d\u6709\u5197\u4f59\u3002\u9488\u5bf9\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63a2\u7d22\u4e86\u5173\u952e\u5e27\u9009\u62e9\u548c\u987a\u5e8f\u611f\u77e5\u7684\u63cf\u8ff0\u65b9\u6cd5\uff0c\u4ee5\u663e\u8457\u51cf\u5c11\u8fd9\u4e9b\u5197\u4f59\u3002 \u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e24\u4e2a\u521b\u65b0\u65b9\u6cd5\uff1a\u5c42\u6b21\u5173\u952e\u5e27\u9009\u62e9\u5668\u548c\u987a\u5e8f\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\u3002\u6211\u4eec\u7684\u6700\u7ec8\u6846\u67b6\u79f0\u4e3aLVNet\uff0c\u5728\u4e09\u4e2a\u57fa\u51c6LVQA\u6570\u636e\u96c6\u4e0a\u5b9e\u73b0\u4e86\u6700\u5148\u8fdb\u7684\u6027\u80fd\u3002\u6211\u4eec\u5c06\u516c\u5f00\u6211\u4eec\u7684\u4ee3\u7801\u3002|\n", "2406.09367": "|**2024-06-13**|**Needle In A Video Haystack: A Scalable Synthetic Framework for 
Benchmarking Video MLLMs**|Zijia Zhao et.al.|[2406.09367](http://arxiv.org/abs/2406.09367)|**[link](https://github.com/joez17/videoniah)**|**\u89c6\u9891\u7406\u89e3\u662f\u5927\u89c4\u6a21\u591a\u6a21\u6001\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u7684\u5173\u952e\u4e0b\u4e00\u6b65\u3002\u4e3a\u4e86\u68c0\u9a8c\u89c6\u9891\u7406\u89e3\u7684\u7279\u5b9a\u65b9\u9762\uff0c\u73b0\u6709\u7684\u89c6\u9891\u57fa\u51c6\u901a\u5e38\u9700\u8981\u7cbe\u5fc3\u9009\u62e9\u4e0e\u76ee\u6807\u80fd\u529b\u5339\u914d\u7684\u89c6\u9891\uff0c\u5e76\u5bf9\u67e5\u8be2-\u54cd\u5e94\u5bf9\u8fdb\u884c\u7e41\u7410\u7684\u6807\u6ce8\uff0c\u4ee5\u5339\u914d\u89c6\u9891\u5185\u5bb9\u3002\u8fd9\u4e2a\u8fc7\u7a0b\u65e2\u5177\u6709\u6311\u6218\u6027\u53c8\u8d44\u6e90\u5bc6\u96c6\u3002\u672c\u6587\u63d0\u51faVideoNIAH\uff08\u89c6\u9891\u9488 haystack\uff09\uff0c\u4e00\u4e2a\u901a\u8fc7\u5408\u6210\u89c6\u9891\u751f\u6210\u7684\u57fa\u51c6\u6784\u5efa\u6846\u67b6\u3002VideoNIAH\u901a\u8fc7\u5c06\u4e0d\u76f8\u5173\u7684\u56fe\u50cf/\u6587\u672c\u201c\u9488\u201d\u63d2\u5165\u539f\u59cb\u89c6\u9891\u4e2d\uff0c\u5c06\u6d4b\u8bd5\u89c6\u9891\u5185\u5bb9\u4e0e\u5b83\u4eec\u7684\u67e5\u8be2-\u54cd\u5e94\u5206\u79bb\u3002\u5b83\u4ec5\u57fa\u4e8e\u8fd9\u4e9b\u9488\u751f\u6210\u6ce8\u91ca\uff0c\u786e\u4fdd\u89c6\u9891\u6765\u6e90\u7684\u591a\u6837\u6027\u548c\u67e5\u8be2-\u54cd\u5e94\u7684\u4e30\u5bcc\u6027\u3002\u6b64\u5916\uff0c\u901a\u8fc7\u63d2\u5165\u591a\u4e2a\u9488\uff0cVideoNIAH\u4e25\u683c\u8bc4\u4f30\u6a21\u578b\u7684\u65f6\u5e8f\u7406\u89e3\u80fd\u529b\u3002\u6211\u4eec\u5229\u7528VideoNIAH\u6784\u5efa\u4e86\u89c6\u9891\u57fa\u51c6VNBench\uff0c\u5305\u62ec\u68c0\u7d22\u3001\u6392\u5e8f\u548c\u8ba1\u6570\u7b49\u4efb\u52a1\u3002VNBench\u80fd\u591f\u9ad8\u6548\u5730\u8bc4\u4f30\u89c6\u9891\u6a21\u578b\u7684\u7cbe\u7ec6\u7406\u89e3\u80fd\u529b\u548c\u65f6\u7a7a\u5efa\u6a21\u80fd\u529b\uff0c\u540c\u65f6\u652f\u6301\u957f\u8ddd\u79bb\u4f9d\u8d56\u6027\u7684\u8bc4\u4f30\u3002\u6211\u4eec\u8fd8\u5bf9\u8
fd1\u671f\u7684\u89c6\u9891\u4e3a\u4e2d\u5fc3\u7684\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u5305\u62ec\u5f00\u6e90\u548c\u4e13\u6709\u6a21\u578b\uff0c\u63d0\u4f9b\u4e86\u5168\u9762\u7684\u5206\u6790\u3002\u5c3d\u7ba1\u4e13\u6709\u6a21\u578b\u76f8\u5bf9\u4e8e\u5f00\u6e90\u6a21\u578b\u5177\u6709\u663e\u8457\u4f18\u52bf\uff0c\u4f46\u6240\u6709\u73b0\u6709\u89c6\u9891\u6a21\u578b\u5728\u957f\u8ddd\u79bb\u4f9d\u8d56\u4efb\u52a1\u4e0a\u7684\u6027\u80fd\u4ecd\u7136\u4e0d\u4f73\u3002VideoNIAH\u662f\u4e00\u4e2a\u7b80\u5355\u4e14\u9ad8\u5ea6\u53ef\u6269\u5c55\u7684\u57fa\u51c6\u6784\u5efa\u6846\u67b6\uff0c\u6211\u4eec\u76f8\u4fe1\u5b83\u5c06\u6fc0\u53d1\u672a\u6765\u89c6\u9891\u57fa\u51c6\u5de5\u4f5c\u7684\u521b\u65b0\u3002\u4ee3\u7801\u548c\u6570\u636e\u5df2\u5728https://github.com/joez17/VideoNIAH\u4e0a\u63d0\u4f9b\u3002**|\n", "2406.09363": "|**2024-06-13**|**ElicitationGPT: Text Elicitation Mechanisms via Language Models**|Yifan Wu et.al.|[2406.09363](http://arxiv.org/abs/2406.09363)|null|\u8be5\u8bba\u6587\u63a2\u8ba8\u4e86\u5982\u4f55\u5229\u7528\u65e0\u9700\u9886\u57df\u77e5\u8bc6\u7684\u67e5\u8be2\u6765\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08\u5982ChatGPT\uff09\u5bf9\u83b7\u53d6\u7684\u6587\u672c\u9884\u6d4b\u8fdb\u884c\u8bc4\u5206\uff0c\u4ee5\u8bc4\u4f30\u5176\u4e0e\u5b9e\u9645\u72b6\u6001\u7684\u4e00\u81f4\u6027\u3002\u8fd9\u79cd\u65b9\u6cd5\u662f\u6fc0\u52b1\u4fe1\u606f\u6536\u96c6\u548c\u673a\u5668\u5b66\u4e60\u6a21\u578b\u8bad\u7ec3\u7684\u5173\u952e\u7ec4\u6210\u90e8\u5206\u3002\u7814\u7a76\u901a\u8fc7\u5728\u540c\u884c\u8bc4\u5ba1\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u5b9e\u9a8c\uff0c\u6bd4\u8f83\u81ea\u52a8\u7684\u6a21\u578b\u8bc4\u5206\u4e0e\u4eba\u5de5\u5bfc\u5e08\u7ed9\u51fa\u7684\u8bc4\u5206\uff0c\u65e8\u5728\u5b9e\u8bc1\u8bc4\u4f30\u8fd9\u4e9b\u673a\u5236\u4e0e\u4eba\u7c7b\u504f\u597d\u7684\u4e00\u81f4\u6027\u3002|\n", "2406.09345": "|**2024-06-13**|**DiscreteSLU: A Large Language Model with 
Self-Supervised Discrete Speech Units for Spoken Language Understanding**|Suwon Shon et.al.|[2406.09345](http://arxiv.org/abs/2406.09345)|null|## \u80cc\u666f \u5c06\u9884\u8bad\u7ec3\u7684\u6587\u672c\u578b\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e0e\u8bed\u97f3\u8f93\u5165\u76f8\u7ed3\u5408\uff0c\u5df2\u7ecf\u8d4b\u4e88\u4e86\u8fd9\u4e9b\u6a21\u578b\u6267\u884c\u591a\u6837\u5316\u8bed\u97f3\u4efb\u52a1\u7684\u80fd\u529b\uff0c\u5305\u62ec\u6307\u4ee4\u8ddf\u968f\u3002\u8fd9\u79cd\u6574\u5408\u9700\u8981\u7ed3\u5408\u8bed\u97f3\u7f16\u7801\u5668\u3001\u8bed\u97f3\u9002\u914d\u5668\u548cLLM\uff0c\u5b83\u4eec\u5206\u522b\u9488\u5bf9\u4e0d\u540c\u7684\u4efb\u52a1\u8fdb\u884c\u8bad\u7ec3\u3002\u6211\u4eec\u63d0\u8bae\u4f7f\u7528\u79bb\u6563\u8bed\u97f3\u5355\u5143\uff08DSU\uff09\uff0c\u800c\u975e\u8fde\u7eed\u503c\u7684\u8bed\u97f3\u7f16\u7801\u8f93\u51fa\uff0c\u901a\u8fc7\u8bed\u97f3\u9002\u914d\u5668\u5c06DSU\u8f6c\u6362\u5230LLM\u7684\u5d4c\u5165\u7a7a\u95f4\u3002\u6211\u4eec\u901a\u8fc7\u65e0\u76d1\u7763\u7684\u8bed\u97f3\u7f16\u7801\u5668\u751f\u6210DSU\uff0c\u7136\u540e\u8fd0\u7528k-means\u805a\u7c7b\u65b9\u6cd5\u3002\u63d0\u51fa\u7684\u6a21\u578b\u5728\u5904\u7406\u6765\u81ea\u89c1/\u672a\u89c1\u8fc7\u9886\u57df\u4ee5\u53ca\u53e3\u8bed\u95ee\u7b54\u4e2d\u7684\u6307\u4ee4\u8ddf\u968f\u4efb\u52a1\u65f6\u8868\u73b0\u51fa\u7a33\u5065\u6027\u80fd\u3002\u6211\u4eec\u8fd8\u7814\u7a76\u4e86\u6765\u81ea\u4e0d\u540c\u81ea\u76d1\u7763\u8bed\u97f3\u7f16\u7801\u5668\u5c42\u7684DSU\u7c7b\u578b\uff0c\u4ee5\u53ca\u6885\u5c14\u9891\u7387\u5012\u8c31\u7cfb\u6570\uff08MFCC\uff09\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u5728\u53e3\u8bed\u95ee\u7b54\u7684\u6307\u4ee4\u8c03\u4f18\u4efb\u52a1\u4e2d\uff0cASR\u4efb\u52a1\u548c\u6570\u636e\u96c6\u7684\u91cd\u8981\u6027\u53ef\u80fd\u8f83\u4f4e\u3002|\n", "2406.09325": "|**2024-06-13**|**REVS: Unlearning Sensitive Information in Language Models via Rank Editing in the Vocabulary Space**|Tomer Ashuach 
et.al.|[2406.09325](http://arxiv.org/abs/2406.09325)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u53ef\u80fd\u65e0\u610f\u4e2d\u8bb0\u4f4f\u5e76\u6cc4\u9732\u8bad\u7ec3\u6570\u636e\u4e2d\u7684\u654f\u611f\u6216\u4e2a\u4eba\u8bc6\u522b\u4fe1\u606f\uff08PII\uff09\uff0c\u5f15\u53d1\u9690\u79c1\u95ee\u9898\u3002\u5f53\u524d\u7684\u89e3\u51b3\u65b9\u6848\u5305\u62ec\u6602\u8d35\u7684\u6570\u636e\u6e05\u6d17\uff0c\u6216\u8005\u901a\u8fc7\u9057\u5fd8\u548c\u6a21\u578b\u7f16\u8f91\u6765\u8fc7\u6ee4\u6a21\u578b\uff0c\u4f46\u8fd9\u4e9b\u65b9\u6cd5\u53ef\u80fd\u88ab\u63d0\u53d6\u653b\u51fb\u7ed5\u8fc7\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6a21\u578b\u7f16\u8f91\u65b9\u6cd5\uff0c\u540d\u4e3aREVS\uff0c\u7528\u4e8e\u4eceLLMs\u4e2d\u6d88\u9664\u654f\u611f\u4fe1\u606f\u3002REVS\u8bc6\u522b\u5e76\u4fee\u6539\u4e0e\u6bcf\u6761\u654f\u611f\u4fe1\u606f\u76f8\u5173\u7684\u5c11\u91cf\u795e\u7ecf\u5143\u3002\u901a\u8fc7\u5c06\u8fd9\u4e9b\u795e\u7ecf\u5143\u6295\u5f71\u5230\u8bcd\u6c47\u7a7a\u95f4\uff08\u53bb\u5d4c\u5165\uff09\uff0c\u6211\u4eec\u5b9a\u4f4d\u9a71\u52a8\u5176\u751f\u6210\u7684\u5173\u952e\u90e8\u5206\u3002\u7136\u540e\uff0c\u6211\u4eec\u6839\u636e\u53bb\u5d4c\u5165\u77e9\u9635\u7684\u4f2a\u9006\u8ba1\u7b97\u6a21\u578b\u7f16\u8f91\uff0c\u5e76\u5e94\u7528\u5b83\u6765\u964d\u4f4e\u76ee\u6807\u654f\u611f\u6570\u636e\u7684\u751f\u6210\u6982\u7387\u3002\u4e3a\u4e86\u5145\u5206\u8bc4\u4f30\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u771f\u6b63\u654f\u611f\u4fe1\u606f\u4e0a\u7684\u6548\u679c\uff0c\u6211\u4eec\u521b\u5efa\u4e86\u4e24\u4e2a\u6570\u636e\u96c6\uff1a\u4e00\u4e2a\u662fGPT-J\u56fa\u6709\u7684\u7535\u5b50\u90ae\u4ef6\u6570\u636e\u96c6\uff0c\u53e6\u4e00\u4e2a\u662f\u6211\u4eec\u8c03\u6574\u6a21\u578b\u4f7f\u5176\u8bb0\u5fc6\u7684\u5408\u6210\u793e\u4f1a\u4fdd\u969c\u53f7\u7801\u6570\u636e\u96c6\u3002\u4e0e\u6700\u5148\u8fdb\u7684\u6a21\u578b\u7f16\u8f91\u65b9\u6cd5\u76f8\u6bd4\uff0cREVS\u5728\u6d88\u9664\u654f\u611f\u4fe1\u606f\u548c\u62b5
\u6297\u63d0\u53d6\u653b\u51fb\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u540c\u65f6\u4fdd\u6301\u6a21\u578b\u7684\u5b8c\u6574\u6027\u3002\u4ee3\u7801\u548c\u6f14\u793a\u7b14\u8bb0\u672c\u53ef\u5728\u83b7\u53d6\u3002|\n", "2406.09324": "|**2024-06-13**|**Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs**|Zhao Xu et.al.|[2406.09324](http://arxiv.org/abs/2406.09324)|**[link](https://github.com/usail-hkust/bag_of_tricks_for_llm_jailbreaking)**|**\u5c3d\u7ba1\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u96f6\u6837\u672c\u4efb\u52a1\u6267\u884c\u65b9\u9762\u5c55\u73b0\u51fa\u663e\u8457\u80fd\u529b\uff0c\u4f46\u5b83\u4eec\u6613\u53d7\u7834\u89e3\u653b\u51fb\uff0c\u53ef\u80fd\u88ab\u64cd\u7eb5\u4ea7\u751f\u6709\u5bb3\u8f93\u51fa\u3002\u8fd1\u671f\u7684\u7814\u7a76\u5f00\u59cb\u5c06\u7834\u89e3\u653b\u51fb\u5206\u4e3a\u4ee4\u724c\u7ea7\u548c\u63d0\u793a\u7ea7\u3002\u7136\u800c\uff0c\u5148\u524d\u7684\u5de5\u4f5c\u4e3b\u8981\u5ffd\u89c6\u4e86\u7834\u89e3\u653b\u51fb\u7684\u591a\u6837\u5173\u952e\u56e0\u7d20\uff0c\u5927\u90e8\u5206\u7814\u7a76\u805a\u7126\u4e8eLLM\u7684\u6f0f\u6d1e\uff0c\u800c\u5bf9\u9632\u5fa1\u589e\u5f3a\u7684LLMs\u63a2\u7d22\u4e0d\u8db3\u3002\u4e3a\u4e86\u6539\u8fdb\u8fd9\u4e00\u72b6\u51b5\uff0c\u6211\u4eec\u8bc4\u4f30\u4e86\u4e0d\u540c\u653b\u51fb\u8bbe\u7f6e\u5bf9LLM\u6027\u80fd\u7684\u5f71\u54cd\uff0c\u5e76\u63d0\u8bae\u5efa\u7acb\u4e00\u4e2a\u57fa\u51c6\u6d4b\u8bd5\u6846\u67b6\uff0c\u4ee5\u4fc3\u8fdb\u6807\u51c6\u5316\u8bc4\u4f30\u3002\u6211\u4eec\u4ece\u76ee\u6807\u7ea7\u548c\u653b\u51fb\u7ea7\u4e24\u4e2a\u89d2\u5ea6\uff0c\u8be6\u7ec6\u8003\u5bdf\u4e86\u5b9e\u65bd\u9488\u5bf9LLMs\u7684\u7834\u89e3\u653b\u51fb\u7684\u516b\u4e2a\u5173\u952e\u56e0\u7d20\u3002\u6211\u4eec\u5728\u4e24\u4e2a\u5e38\u7528\u6570\u636e\u96c6\u4e0a\u5bf9\u516d\u79cd\u9632\u5fa1\u65b9\u6cd5\u8fdb\u884c\u4e86\u4e03\u79cd\u4ee3\u8868\u6027\u7684\u7834\u89e3\u653b\u51fb\uff0c\u603b\u8ba1\u7ea6320\u4e2a\u5b9e\u9a8c\uff0c\u4f7f\u7528A800-80G 
GPU\u8017\u65f6\u5927\u7ea65\u4e07\u5c0f\u65f6\u3002\u5b9e\u9a8c\u7ed3\u679c\u5f3a\u8c03\u4e86\u5bf9\u9632\u5fa1\u589e\u5f3a\u7684LLMs\u8fdb\u884c\u6807\u51c6\u5316\u8bc4\u4f30\u7684\u5fc5\u8981\u6027\u3002\u6211\u4eec\u7684\u4ee3\u7801\u5df2\u5f00\u6e90\uff1ahttps://github.com/usail-hkust/Bag_of_Tricks_for_LLM_Jailbreaking\u3002**|\n", "2406.09321": "|**2024-06-13**|**JailbreakEval: An Integrated Toolkit for Evaluating Jailbreak Attempts Against Large Language Models**|Delong Ran et.al.|[2406.09321](http://arxiv.org/abs/2406.09321)|**[link](https://github.com/thuccslab/jailbreakeval)**|**\u672c\u6587\u63a2\u8ba8\u4e86\u9488\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u8d8a\u72f1\u653b\u51fb\u7814\u7a76\u4e2d\u7684\u8bc4\u4f30\u96be\u9898\u3002\u76ee\u524d\uff0c\u5bf9\u4e8e\u653b\u51fb\u662f\u5426\u6210\u529f\u7f3a\u4e4f\u7edf\u4e00\u6807\u51c6\uff0c\u4e0d\u540c\u7684\u8bc4\u4f30\u65b9\u6cd5\u5982\u4eba\u5de5\u6807\u6ce8\u6216\u7279\u5b9a\u65b9\u5f0f\u63d0\u793aGPT-4\u5b58\u5728\uff0c\u5404\u6709\u4f18\u7f3a\u70b9\uff0c\u5bf9\u4eba\u7c7b\u4ef7\u503c\u89c2\u7684\u4f53\u73b0\u548c\u7814\u7a76\u6210\u672c\u4ea7\u751f\u5f71\u54cd\u3002\u6211\u4eec\u7684\u7814\u7a76\u5206\u6790\u4e86\u8fd1\u4e5d\u5341\u98792023\u5e745\u6708\u81f32024\u5e744\u6708\u671f\u95f4\u53d1\u5e03\u7684\u8d8a\u72f1\u653b\u51fb\u76f8\u5173\u7814\u7a76\uff0c\u63d0\u51fa\u4e86\u4e00\u79cd\u8be6\u7ec6\u7684\u8bc4\u4f30\u65b9\u6cd5\u5206\u7c7b\u4f53\u7cfb\uff0c\u6df1\u5165\u5256\u6790\u4e86\u5404\u79cd\u8bc4\u4f30\u5668\u7684\u4f18\u7f3a\u70b9\u53ca\u5176\u5e94\u7528\u73b0\u72b6\u3002\u4e3a\u4e86\u63a8\u52a8\u540e\u7eed\u7814\u7a76\uff0c\u6211\u4eec\u5f00\u53d1\u5e76\u63a8\u51fa\u4e86JailbreakEval\u5de5\u5177\u5305\uff0c\u5b83\u662f\u4e00\u4e2a\u7528\u6237\u53cb\u597d\u7684\u5e73\u53f0\uff0c\u96c6\u6210\u4e86\u591a\u79cd\u77e5\u540d\u7684\u8bc4\u4f30\u5668\uff0c\u7528\u6237\u53ea\u9700\u4e00\u4e2a\u547d\u4ee4\u5373\u53ef\u83b7\u53d6\u7ed3\u679c\u3002\u6b64\u5916\uff0cJailbreakEval\u6
52f\u6301\u7528\u6237\u5728\u7edf\u4e00\u6846\u67b6\u5185\u5b9a\u5236\u81ea\u5b9a\u4e49\u8bc4\u4f30\u6d41\u7a0b\uff0c\u7b80\u5316\u4e86\u5f00\u53d1\u548c\u6bd4\u8f83\u8fc7\u7a0b\u3002\u603b\u4e4b\uff0c\u6211\u4eec\u671f\u671bJailbreakEval\u80fd\u4fc3\u8fdb\u8d8a\u72f1\u653b\u51fb\u8bc4\u4ef7\u7684\u6807\u51c6\u5316\uff0c\u6210\u4e3a\u793e\u533a\u5185\u8d8a\u72f1\u7814\u7a76\u8bc4\u4f30\u7684\u50ac\u5316\u5242\u3002**|\n", "2406.10229": "|**2024-06-14**|**Quantifying Variance in Evaluation Benchmarks**|Lovish Madaan et.al.|[2406.10229](http://arxiv.org/abs/2406.10229)|null|\u8bc4\u4ef7\u57fa\u51c6\u662f\u8861\u91cf\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u80fd\u529b\u7684\u5173\u952e\uff0c\u4e5f\u662f\u63a8\u52a8\u8fd9\u4e9b\u80fd\u529b\u8fdb\u6b65\u7684\u9a71\u52a8\u529b\u3002\u6700\u521d\u8bbe\u8ba1\u7528\u4e8e\u8bc4\u4f30\u9884\u8bad\u7ec3\u6a21\u578b\u7684\u6027\u80fd\uff08\u6216\u7f3a\u4e4f\uff09\uff0c\u73b0\u5728\u5b83\u4eec\u4e5f\u88ab\u5e7f\u6cdb\u7528\u4e8e\u51b3\u5b9a\u4e0d\u540c\u7684\u8bad\u7ec3\u9009\u62e9\u4e4b\u95f4\u3002\u7136\u800c\uff0c\u5c3d\u7ba1\u88ab\u5e7f\u6cdb\u5e94\u7528\uff0c\u6211\u4eec\u5f88\u5c11\u91cf\u5316\u8bc4\u4ef7\u57fa\u51c6\u7684\u65b9\u5dee\uff0c\u8fd9\u51b3\u5b9a\u4e86\u6027\u80fd\u5dee\u5f02\u7684\u542b\u4e49\u3002\u672c\u6587\u5b9a\u4e49\u5e76\u6d4b\u91cf\u4e86\u4e00\u7cfb\u5217\u65e8\u5728\u8861\u91cf\u8bc4\u4ef7\u57fa\u51c6\u65b9\u5dee\u7684\u6307\u6807\uff0c\u5305\u62ec\u521d\u59cb\u5316\u65f6\u7684\u968f\u673a\u79cd\u5b50\u65b9\u5dee\u548c\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u7684\u5355\u8c03\u6027\u3002\u901a\u8fc7\u5bf9\u5927\u91cf\u6a21\u578b\uff08\u5305\u62ec\u516c\u5f00\u53ef\u7528\u7684\u548c\u4ece\u5934\u8bad\u7ec3\u7684\u6a21\u578b\uff09\u8fdb\u884c\u7814\u7a76\uff0c\u6211\u4eec\u63d0\u4f9b\u4e86\u5404\u79cd\u65b9\u5dee\u5ea6\u91cf\u7684\u5b9e\u8bc1\u4f30\u8ba1\uff0c\u5e76\u4e3a\u5b9e\u8df5\u8005\u63d0\u4f9b\u4e86\u8003\u8651\u548c\u5efa\u8bae\u3002\u6211\u4eec\u8fd8\u8bc4\u4f30\u4e86\u8fde\u7eed\u548c\u79bb\
u6563\u6027\u80fd\u5ea6\u91cf\u7684\u5b9e\u7528\u6027\u548c\u6743\u8861\uff0c\u5e76\u63a2\u7d22\u4e86\u66f4\u597d\u5730\u7406\u89e3\u548c\u51cf\u5c11\u65b9\u5dee\u7684\u65b9\u6cd5\u3002\u6211\u4eec\u53d1\u73b0\uff0c\u5bf9\u4e8e\u8f83\u5c0f\u89c4\u6a21\uff08\u7ea670\u4ebf\u53c2\u6570\uff09\u7684\u6a21\u578b\uff0c\u5982\u5c06\u591a\u6a21\u6001\u591a\u4efb\u52a1\u5b66\u4e60\uff08MMLU\uff09\u4efb\u52a1\u6846\u67b6\u4e3a\u5b8c\u6210\u4efb\u52a1\uff0c\u53ef\u4ee5\u5e38\u5e38\u964d\u4f4e\u65b9\u5dee\uff1b\u800c\u53d7\u5230\u4eba\u7c7b\u6d4b\u8bd5\u6587\u732e\u542f\u53d1\u7684\u66f4\u590d\u6742\u65b9\u6cd5\uff08\u5982\u9879\u76ee\u5206\u6790\u548c\u9879\u76ee\u53cd\u5e94\u7406\u8bba\uff09\u5728\u663e\u8457\u51cf\u5c11\u65b9\u5dee\u65b9\u9762\u6548\u679c\u6709\u9650\u3002\u603b\u7684\u6765\u8bf4\uff0c\u6211\u4eec\u7684\u5de5\u4f5c\u63ed\u793a\u4e86\u8bc4\u4ef7\u57fa\u51c6\u7684\u65b9\u5dee\u7279\u6027\uff0c\u63d0\u51fa\u4e86\u9488\u5bf9LLMs\u7684\u7279\u5b9a\u6280\u672f\u6765\u51cf\u5c11\u65b9\u5dee\uff0c\u5e76\u666e\u904d\u9f13\u52b1\u5b9e\u8df5\u8005\u5728\u6bd4\u8f83\u6a21\u578b\u65f6\u4ed4\u7ec6\u8003\u8651\u65b9\u5dee\u56e0\u7d20\u3002|\n", "2406.10218": "|**2024-06-14**|**Semantic Membership Inference Attack against Large Language Models**|Hamid Mozaffari et.al.|[2406.10218](http://arxiv.org/abs/2406.10218)|null|## \u80cc\u666f \u6210\u5458\u8eab\u4efd\u6cc4\u9732\u653b\u51fb\uff08Membership Inference Attacks\uff0cMIA\uff09\u7684\u76ee\u6807\u662f\u8bc6\u522b\u7279\u5b9a\u6570\u636e\u70b9\u662f\u5426\u88ab\u7eb3\u5165\u4e86\u76ee\u6807\u6a21\u578b\u7684\u8bad\u7ec3\u96c6\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\u2014\u2014\u8bed\u4e49\u6210\u5458\u8eab\u4efd\u6cc4\u9732\u653b\u51fb\uff08Semantic Membership Inference 
Attack\uff0cSMIA\uff09\uff0c\u901a\u8fc7\u5229\u7528\u8f93\u5165\u7684\u8bed\u4e49\u5185\u5bb9\u53ca\u5176\u6270\u52a8\uff0c\u63d0\u5347MIA\u7684\u6027\u80fd\u3002SMIA\u8bad\u7ec3\u4e00\u4e2a\u795e\u7ecf\u7f51\u7edc\u6765\u5206\u6790\u76ee\u6807\u6a21\u578b\u5bf9\u6270\u52a8\u8f93\u5165\u7684\u884c\u4e3a\uff0c\u4ece\u800c\u6355\u6349\u6210\u5458\u6837\u672c\u4e0e\u975e\u6210\u5458\u6837\u672c\u4e4b\u95f4\u8f93\u51fa\u6982\u7387\u5206\u5e03\u7684\u5dee\u5f02\u3002\u6211\u4eec\u5728Pythia\u548cGPT-Neo\u6a21\u578b\u5bb6\u65cf\uff0c\u4ee5\u53caWikipedia\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u4e86\u5168\u9762\u7684\u8bc4\u4f30\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cSMIA\u660e\u663e\u4f18\u4e8e\u73b0\u6709\u653b\u51fb\u624b\u6bb5\uff0c\u4f8b\u5982\u5728Pythia-12B\u4e0a\u7684AUC-ROC\u503c\u8fbe\u5230\u4e8667.39%\uff0c\u800c\u7b2c\u4e8c\u597d\u7684\u653b\u51fb\u65b9\u6cd5\u4ec5\u4e3a58.90%\u3002|\n", "2406.10216": "|**2024-06-14**|**Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs**|Rui Yang 
et.al.|[2406.10216](http://arxiv.org/abs/2406.10216)|null|\u5728\u5f3a\u5316\u5b66\u4e60\u4ece\u4eba\u7c7b\u53cd\u9988\uff08RLHF\uff09\u6846\u67b6\u4e2d\uff0c\u5229\u7528\u57fa\u4e8e\u4eba\u7c7b\u504f\u597d\u6570\u636e\u7684\u5956\u52b1\u6a21\u578b\u5df2\u8bc1\u5b9e\u80fd\u6709\u6548\u8c03\u6574\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4ee5\u7b26\u5408\u4eba\u7c7b\u610f\u56fe\u3002\u7136\u800c\uff0c\u5f53\u524d\u5956\u52b1\u6a21\u578b\u5bf9\u672a\u89c1\u8fc7\u7684\u63d0\u793a\u548c\u54cd\u5e94\u7684\u6cdb\u5316\u80fd\u529b\u6709\u9650\uff0c\u53ef\u80fd\u5bfc\u81f4\u6240\u8c13\u7684\u8fc7\u5ea6\u4f18\u5316\u95ee\u9898\uff0c\u5373\u5956\u52b1\u4f18\u5316\u8fc7\u5ea6\u5bfc\u81f4\u5b9e\u9645\u6027\u80fd\u4e0b\u964d\u3002\u5c3d\u7ba1\u5148\u524d\u7684\u7814\u7a76\u503e\u5411\u4e8e\u7ea6\u675f\u7b56\u7565\u4f18\u5316\uff0c\u6211\u4eec\u7684\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u65b9\u6cd5\uff0c\u901a\u8fc7\u6b63\u5219\u5316\u9690\u85cf\u72b6\u6001\u6765\u589e\u5f3a\u5956\u52b1\u6a21\u578b\u5e94\u5bf9\u5206\u5e03\u53d8\u5316\u7684\u6cdb\u5316\u80fd\u529b\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u4fdd\u7559\u57fa\u7840\u6a21\u578b\u7684\u8bed\u8a00\u6a21\u578b\u5934\uff0c\u5e76\u7ed3\u5408\u4e00\u7cfb\u5217\u6587\u672c\u751f\u6210\u635f\u5931\uff0c\u65e8\u5728\u4fdd\u6301\u9690\u85cf\u72b6\u6001\u7684\u6587\u672c\u751f\u6210\u80fd\u529b\uff0c\u540c\u65f6\u5728\u76f8\u540c\u7684\u9690\u85cf\u72b6\u6001\u540e\u5b66\u4e60\u4e00\u4e2a\u5956\u52b1\u5934\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u5f15\u5165\u7684\u6b63\u5219\u5316\u6280\u672f\u663e\u8457\u63d0\u9ad8\u4e86\u5728\u5404\u79cd\u6cdb\u5316\u4efb\u52a1\u4e2d\u7684\u5956\u52b1\u6a21\u578b\u51c6\u786e\u6027\uff0c\u5e76\u6709\u6548\u7f13\u89e3\u4e86RLHF\u4e2d\u7684\u8fc7\u5ea6\u4f18\u5316\u95ee\u9898\uff0c\u63d0\u4f9b\u4e86\u4e00\u4e2a\u66f4\u53ef\u9760\u3001\u66f4\u7a33\u5065\u7684\u504f\u597d\u5b66\u4e60\u8303\u5f0f\u3002|\n", "2406.10209": "|**2024-06-14**|**Be like a Goldfish, 
Don't Memorize! Mitigating Memorization in Generative LLMs**|Abhimanyu Hans et.al.|[2406.10209](http://arxiv.org/abs/2406.10209)|**[link](https://github.com/ahans30/goldfish-loss)**|**## \u80cc\u666f \u5927\u578b\u8bed\u8a00\u6a21\u578b\u80fd\u591f\u8bb0\u4f4f\u5e76\u91cd\u590d\u5176\u8bad\u7ec3\u6570\u636e\uff0c\u8fd9\u5e26\u6765\u4e86\u9690\u79c1\u548c\u7248\u6743\u95ee\u9898\u3002\u4e3a\u4e86\u51cf\u8f7b\u8fd9\u79cd\u8bb0\u5fc6\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u5bf9\u4e0b\u4e00\u6b65 token \u8bad\u7ec3\u76ee\u6807\u7684\u5fae\u5999\u4fee\u6539\uff0c\u79f0\u4e3a\u201c\u91d1\u9c7c\u635f\u5931\u201d\u3002\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\uff0c\u968f\u673a\u9009\u62e9\u4e00\u90e8\u5206\u4ee4\u724c\u4e0d\u53c2\u4e0e\u635f\u5931\u8ba1\u7b97\u3002\u6a21\u578b\u4e0d\u4f1a\u8bb0\u4f4f\u8fd9\u4e9b\u88ab\u4e22\u5f03\u7684\u4ee4\u724c\uff0c\u4ece\u800c\u9632\u6b62\u4e86\u5b8c\u6574\u8bad\u7ec3\u5e8f\u5217\u7684\u9010\u5b57\u590d\u5236\u3002\u6211\u4eec\u5728\u6570\u5341\u4ebf\u89c4\u6a21\u7684 Llama-2 \u6a21\u578b\u4e0a\u8fdb\u884c\u4e86\u5927\u91cf\u5b9e\u9a8c\uff0c\u5305\u62ec\u9884\u8bad\u7ec3\u548c\u4ece\u5934\u5f00\u59cb\u8bad\u7ec3\uff0c\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u663e\u8457\u51cf\u5c11\u4e86\u53ef\u63d0\u53d6\u7684\u8bb0\u5fc6\uff0c\u800c\u5bf9\u4e0b\u6e38\u57fa\u51c6\u7684\u5f71\u54cd\u5fae\u4e4e\u5176\u5fae\u3002**|\n", "2406.10196": "|**2024-06-14**|**TRIP-PAL: Travel Planning with Guarantees by Combining Large Language Models and Automated Planners**|Tomas de la Rosa et.al.|[2406.10196](http://arxiv.org/abs/2406.10196)|null|**\u6458\u8981\uff1a** 
\u65c5\u884c\u89c4\u5212\u662f\u4e00\u4e2a\u590d\u6742\u7684\u4efb\u52a1\uff0c\u5b83\u6d89\u53ca\u6839\u636e\u7ea6\u675f\u6761\u4ef6\u751f\u6210\u4e00\u7cfb\u5217\u4e0e\u8bbf\u95ee\u5730\u70b9\u76f8\u5173\u7684\u884c\u52a8\uff0c\u540c\u65f6\u6700\u5927\u5316\u7528\u6237\u7684\u6ee1\u610f\u5ea6\u3002\u4f20\u7edf\u65b9\u6cd5\u901a\u5e38\u4f1a\u5c06\u95ee\u9898\u8f6c\u5316\u4e3a\u7279\u5b9a\u5f62\u5f0f\u7684\u8bed\u8a00\u8868\u8fbe\uff0c\u4ece\u7f51\u7edc\u8d44\u6e90\u4e2d\u63d0\u53d6\u76f8\u5173\u4fe1\u606f\uff0c\u5e76\u4f7f\u7528\u5408\u9002\u7684\u6c42\u89e3\u5668\u6765\u751f\u6210\u6709\u6548\u89e3\u51b3\u65b9\u6848\u3002\u7136\u800c\uff0c\u8fd1\u671f\u7684\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u65b9\u6cd5\u76f4\u63a5\u4ece\u7528\u6237\u8bf7\u6c42\u4e2d\u8f93\u51fa\u8ba1\u5212\uff0c\u5229\u7528\u4e30\u5bcc\u7684\u65c5\u884c\u9886\u57df\u77e5\u8bc6\u63d0\u4f9b\u666f\u70b9\u548c\u53ef\u80fd\u8def\u7ebf\u7b49\u9ad8\u5c42\u6b21\u4fe1\u606f\u3002\u5c3d\u7ba1\u5982\u6b64\uff0c\u5f53\u524d\u6700\u5148\u8fdb\u7684\u6a21\u578b\u5f80\u5f80\u4ea7\u751f\u4e0d\u8fde\u8d2f\u3001\u672a\u80fd\u5b8c\u5168\u6ee1\u8db3\u7ea6\u675f\u7684\u8ba1\u5212\uff0c\u4e14\u65e0\u6cd5\u4fdd\u8bc1\u751f\u6210\u9ad8\u8d28\u91cf\u65b9\u6848\u3002\u6211\u4eec\u63d0\u51faTRIP-PAL\uff0c\u4e00\u79cd\u878d\u5408LLMs\u548c\u81ea\u52a8\u5316\u89c4\u5212\u5668\u7684\u6df7\u5408\u65b9\u6cd5\uff1a\uff081\uff09LLMs\u83b7\u53d6\u5e76\u8f6c\u6362\u65c5\u884c\u4fe1\u606f\u548c\u7528\u6237\u9700\u6c42\uff0c\u5c06\u5176\u8f6c\u5316\u4e3a\u53ef\u8f93\u5165\u89c4\u5212\u5668\u7684\u6570\u636e\u7ed3\u6784\uff1b\uff082\uff09\u81ea\u52a8\u5316\u89c4\u5212\u5668\u8d1f\u8d23\u751f\u6210\u6ee1\u8db3\u7ea6\u675f\u5e76\u4f18\u5316\u7528\u6237\u6548\u7528\u7684\u65c5\u884c\u8ba1\u5212\u3002\u6211\u4eec\u5728\u4e0d\u540c\u65c5\u884c\u573a\u666f\u4e2d\u7684\u5b9e\u9a8c\u8868\u660e\uff0cTRIP-PAL\u5728\u751f\u6210\u65c5\u884c\u8ba1\u5212\u65b9\u9762\u4f18\u4e8e\u7eafLLM\u65b9\u6cd5\u3002|\n", 
"2406.10185": "|**2024-06-14**|**Detecting and Evaluating Medical Hallucinations in Large Vision Language Models**|Jiawei Chen et.al.|[2406.10185](http://arxiv.org/abs/2406.10185)|null|\u968f\u7740\u5927\u578b\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08LVLM\uff09\u5728\u533b\u7597\u9886\u57df\u7684\u5e94\u7528\u65e5\u76ca\u589e\u957f\uff0c\u5982\u533b\u5b66\u56fe\u50cf\u95ee\u7b54\u548c\u62a5\u544a\u751f\u6210\uff0c\u5b83\u4eec\u4ece\u57fa\u7840\u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u90a3\u91cc\u7ee7\u627f\u4e86\u5f3a\u5927\u7684\u529f\u80fd\uff0c\u4f46\u540c\u65f6\u4e5f\u5e26\u6765\u4e86\u4ee4\u4eba\u62c5\u5fe7\u7684\u5e7b\u89c9\u95ee\u9898\uff0c\u8fd9\u5728\u533b\u7597\u8fd9\u6837\u5bf9\u9519\u8bef\u5bb9\u9650\u6781\u4f4e\u7684\u73af\u5883\u4e2d\u5c24\u4e3a\u91cd\u8981\u3002\u7136\u800c\uff0c\u76ee\u524d\u5c1a\u65e0\u4e13\u95e8\u9488\u5bf9\u533b\u7597\u9886\u57df\u7684\u5e7b\u89c9\u68c0\u6d4b\u548c\u8bc4\u4f30\u65b9\u6cd5\u6216\u57fa\u51c6\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u6211\u4eec\u63a8\u51fa\u4e86Med-HallMark\uff0c\u8fd9\u662f\u9996\u4e2a\u4e13\u4e3a\u533b\u7597\u591a\u6a21\u6001\u9886\u57df\u8bbe\u8ba1\u7684\u5e7b\u89c9\u68c0\u6d4b\u548c\u8bc4\u4f30\u57fa\u51c6\u3002Med-HallMark\u652f\u6301\u591a\u4efb\u52a1\u5e7b\u89c9\u68c0\u6d4b\uff0c\u63d0\u4f9b\u591a\u5143\u5316\u7684\u5e7b\u89c9\u6570\u636e\uff0c\u5e76\u91c7\u7528\u5206\u7ea7\u5e7b\u89c9\u5206\u7c7b\u3002\u6b64\u5916\uff0c\u6211\u4eec\u63d0\u51fa\u4e86MediHall 
Score\uff0c\u8fd9\u662f\u4e00\u79cd\u65b0\u7684\u533b\u7597\u8bc4\u4f30\u6307\u6807\uff0c\u901a\u8fc7\u5206\u5c42\u8bc4\u5206\u7cfb\u7edf\u8bc4\u4f30LVLM\u7684\u5e7b\u89c9\uff0c\u8003\u8651\u5176\u4e25\u91cd\u7a0b\u5ea6\u548c\u7c7b\u578b\uff0c\u4ece\u800c\u5b9e\u73b0\u5bf9\u6f5c\u5728\u4e34\u5e8a\u5f71\u54cd\u7684\u7ec6\u81f4\u8bc4\u4f30\u3002\u6211\u4eec\u8fd8\u5c55\u793a\u4e86MediHallDetector\uff0c\u4e00\u79cd\u4e13\u4e3a\u7cbe\u786e\u5e7b\u89c9\u68c0\u6d4b\u8bbe\u8ba1\u7684\u533b\u7597LVLM\uff0c\u5b83\u91c7\u7528\u4e86\u591a\u4efb\u52a1\u8bad\u7ec3\u65b9\u6cd5\u3002\u901a\u8fc7\u5e7f\u6cdb\u7684\u5b9e\u9a8c\uff0c\u6211\u4eec\u5728\u6211\u4eec\u7684\u57fa\u51c6\u4e0a\u4e3a\u6d41\u884c\u7684LVLM\u8bbe\u7acb\u4e86\u57fa\u7ebf\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cMediHall Score\u63d0\u4f9b\u4e86\u6bd4\u4f20\u7edf\u6307\u6807\u66f4\u6df1\u5165\u7406\u89e3\u5e7b\u89c9\u5f71\u54cd\u7684\u80fd\u529b\uff0c\u5e76\u663e\u793a\u4e86MediHallDetector\u7684\u63d0\u5347\u6027\u80fd\u3002\u6211\u4eec\u671f\u671b\u8fd9\u9879\u5de5\u4f5c\u80fd\u663e\u8457\u63d0\u9ad8LVLM\u5728\u533b\u7597\u5e94\u7528\u4e2d\u7684\u53ef\u9760\u6027\u3002\u6240\u6709\u76f8\u5173\u8d44\u6e90\u5c06\u5728\u4e0d\u4e45\u540e\u53d1\u5e03\u3002|\n", "2406.10181": "|**2024-06-14**|**Practical offloading for fine-tuning LLM on commodity GPU via learned subspace projectors**|Siyuan Chen 
et.al.|[2406.10181](http://arxiv.org/abs/2406.10181)|null|\u5728\u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5fae\u8c03\u8fc7\u7a0b\u4e2d\uff0c\u7531\u4e8e\u5185\u5b58\u9700\u6c42\u901a\u5e38\u8d85\u8fc7\u5355\u4e2aGPU\u7684\u5bb9\u91cf\uff0c\u89e3\u51b3\u8fd9\u4e00\u5185\u5b58\u6311\u6218\u7684\u4e00\u4e2a\u5e38\u89c1\u65b9\u6cd5\u662f\u5c06\u8ba1\u7b97\u548c\u6570\u636e\u4eceGPU\u8fc1\u79fb\u5230CPU\u3002\u7136\u800c\uff0c\u8fd9\u53d7\u5230\u666e\u901a\u786c\u4ef6\u5e26\u5bbd\u9650\u5236\u7684\u5236\u7ea6\uff0c\u5f71\u54cd\u4e86CPU\u4e0eGPU\u4e4b\u95f4\u7684\u901a\u4fe1\u6548\u7387\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aLSP_Offload\u7684\u6846\u67b6\uff0c\u901a\u8fc7\u5b66\u4e60\u5f0f\u7684\u5b50\u7a7a\u95f4\u6295\u5f71\u5668\uff0c\u5b9e\u73b0\u5728 commodity \u786c\u4ef6\u4e0a\u63a5\u8fd1\u539f\u751f\u901f\u5ea6\u7684\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\u5fae\u8c03\u3002\u6211\u4eec\u7684\u6570\u636e\u9a71\u52a8\u65b9\u6cd5\u6d89\u53ca\u5b66\u4e60\u4e00\u4e2a\u9ad8\u6548\u7684\u7a00\u758f\u538b\u7f29\u5668\uff0c\u4ee5\u6700\u5c0f\u5316\u901a\u4fe1\u5e76\u4fdd\u6301\u6700\u5c0f\u7cbe\u5ea6\u635f\u5931\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u5c42\u7ea7\u901a\u4fe1\u8c03\u5ea6\u7b56\u7565\uff0c\u4ee5\u6700\u5927\u5316\u901a\u4fe1\u4e0e\u8ba1\u7b97\u4e4b\u95f4\u7684\u5e76\u884c\u6027\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u7684\u6846\u67b6\u80fd\u591f\u57284GB\u7b14\u8bb0\u672cGPU\u4e0a\u5fae\u8c0313\u4ebf\u53c2\u6570\u7684\u6a21\u578b\uff0c\u5728\u914d\u590724GB\u5185\u5b58\u7684NVIDIA RTX 4090 
GPU\u4e0a\u5fae\u8c0370\u4ebf\u53c2\u6570\u7684\u6a21\u578b\uff0c\u4ec5\u6bd4\u65e0\u5185\u5b58\u9650\u5236\u7684\u5fae\u8c03\u616231%\u3002\u4e0e\u6700\u5148\u8fdb\u7684\u79bb\u7ebf\u6846\u67b6\u76f8\u6bd4\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u63d0\u9ad8\u4e86\u5fae\u8c03\u541e\u5410\u91cf\uff0c\u6700\u9ad8\u53ef\u8fbe3.33\u500d\uff0c\u5f53\u8fbe\u5230\u76f8\u540c\u51c6\u786e\u5ea6\u65f6\uff0c\u51cf\u5c11\u4e86\u7aef\u5230\u7aef\u5fae\u8c03\u65f6\u95f4\u768433.1%\u81f362.5%\u3002|\n", "2406.10172": "|**2024-06-14**|**Datasets for Multilingual Answer Sentence Selection**|Matteo Gabburo et.al.|[2406.10172](http://arxiv.org/abs/2406.10172)|null|**\u6458\u8981\uff1a** \u5728\u8bbe\u8ba1\u9ad8\u6548\u7684\u68c0\u7d22\u5f0f\u95ee\u7b54\uff08Question Answering\uff0cQA\uff09\u7cfb\u7edf\u4e2d\uff0c\u7b54\u6848\u53e5\u5b50\u9009\u62e9\uff08Answer Sentence Selection\uff0cAS2\uff09\u662f\u4e00\u4e2a\u5173\u952e\u4efb\u52a1\u3002\u7136\u800c\uff0c\u7531\u4e8e\u7f3a\u4e4f\u6807\u6ce8\u6570\u636e\uff0c\u5927\u591a\u6570AS2\u9886\u57df\u7684\u8fdb\u5c55\u4e3b\u8981\u96c6\u4e2d\u5728\u82f1\u8bed\u4e0a\u3002\u8fd9\u5bfc\u81f4\u4e86\u975e\u82f1\u8bed\u73af\u5883\u4e0bQA\u7cfb\u7edf\u7684\u6027\u80fd\u4e0e\u82f1\u8bed\u7cfb\u7edf\u4e4b\u95f4\u7684\u5dee\u8ddd\u3002\u672c\u8bba\u6587\u9488\u5bf9\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u65b0\u7684\u9ad8\u8d28\u91cf\u591a\u8bed\u8a00\uff08\u6cd5\u8bed\u3001\u5fb7\u8bed\u3001\u610f\u5927\u5229\u8bed\u3001\u8461\u8404\u7259\u8bed\u548c\u897f\u73ed\u7259\u8bed\uff09AS2\u6570\u636e\u96c6\uff0c\u901a\u8fc7\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08Large Language Model\uff0cLLM\uff09\u5bf9\u73b0\u6709\u7684\u82f1\u6587AS2\u6570\u636e\u96c6\uff08\u5982ASNQ\u3001WikiQA\u548cTREC-QA\uff09\u8fdb\u884c\u76d1\u7763\u81ea\u52a8\u673a\u5668\u7ffb\u8bd1\uff08Automatic Machine 
Translation\uff0cAMT\uff09\u3002\u6211\u4eec\u901a\u8fc7\u591a\u79cd\u5b9e\u9a8c\u548c\u4e0d\u540cTransformer\u67b6\u6784\u7684\u8bc4\u4f30\uff0c\u9a8c\u8bc1\u4e86\u6211\u4eec\u7684\u65b9\u6cd5\u4ee5\u53ca\u7ffb\u8bd1\u6570\u636e\u96c6\u7684\u8d28\u91cf\u3002\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u6570\u636e\u96c6\u5bf9\u4e8e\u6784\u5efa\u5065\u58ee\u7684\u591a\u8bed\u8a00AS2\u6a21\u578b\u81f3\u5173\u91cd\u8981\uff0c\u663e\u8457\u7f29\u5c0f\u4e86\u975e\u82f1\u8bed\u4e0e\u82f1\u8bed\u73af\u5883\u4e0b\u7684\u6027\u80fd\u5dee\u8ddd\u3002|\n", "2406.10162": "|**2024-06-14**|**Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models**|Carson Denison et.al.|[2406.10162](http://arxiv.org/abs/2406.10162)|**[link](https://github.com/anthropics/sycophancy-to-subterfuge-paper)**|**\u5728\u5f3a\u5316\u5b66\u4e60\u4e2d\uff0c\u5f53\u4eba\u5de5\u667a\u80fd\u7cfb\u7edf\u5b66\u4f1a\u56e0\u8bad\u7ec3\u76ee\u6807\u4e0d\u660e\u786e\u800c\u83b7\u5f97\u4e0d\u671f\u671b\u7684\u884c\u4e3a\u65f6\uff0c\u5c31\u4f1a\u51fa\u73b0\u89c4\u683c\u6e38\u620f\u73b0\u8c61\u3002\u8fd9\u79cd\u884c\u4e3a\u53ef\u80fd\u4ece\u7b80\u5355\u7684\u5949\u627f\u884c\u4e3a\u53d1\u5c55\u5230\u66f4\u590d\u6742\u4e14\u5371\u9669\u7684\u5956\u52b1\u7be1\u6539\uff0c\u5373\u6a21\u578b\u76f4\u63a5\u4fee\u6539\u5176\u81ea\u8eab\u7684\u5956\u52b1\u673a\u5236\u3002\u7136\u800c\uff0c\u53d1\u73b0\u8fd9\u4e9b\u590d\u6742\u884c\u4e3a\u53ef\u80fd\u8d85\u51fa\u63a2\u7d22\u7684\u8303\u7574\u3002\u672c\u8bba\u6587\u63a2\u8ba8\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u662f\u5426\u4f1a\u5728\u5b66\u4e60\u5e38\u89c1\u89c4\u683c\u6e38\u620f\u7b56\u7565\u540e\uff0c\u6cdb\u5316\u5230\u6267\u884c\u66f4\u4e3a\u7f55\u89c1\u548c\u660e\u663e\u7684\u884c\u4e3a\uff0c\u5305\u62ec\u5956\u52b1\u7be1\u6539\u3002\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u9010\u6b65\u5347\u7ea7\u7684\u53ef\u6e38\u620f\u73af\u5883\u7cfb\u5217\uff0c\u5e76\u53d1\u73b0\u9488\u5bf9\u65e9\u671f\u9636\u6bb5\u73af\u5883\u7684\u8b
ad\u7ec3\u4f1a\u5bfc\u81f4\u5728\u540e\u7eed\u73af\u5883\u4e2d\u51fa\u73b0\u66f4\u591a\u7684\u89c4\u683c\u6e38\u620f\u3002\u4ee4\u4eba\u60ca\u8bb6\u7684\u662f\uff0c\u4e00\u5c0f\u90e8\u5206\u4f46\u975e\u96f6\u7684LLMs\uff0c\u5728\u7ecf\u5386\u4e86\u5b8c\u6574\u8bad\u7ec3\u8bfe\u7a0b\u540e\uff0c\u80fd\u591f\u96f6\u6837\u672c\u5730\u76f4\u63a5\u4fee\u6539\u5176\u5956\u52b1\u51fd\u6570\u3002\u91cd\u65b0\u8bad\u7ec3LLMs\u4ee5\u907f\u514d\u65e9\u671f\u9636\u6bb5\u7684\u6e38\u620f\u884c\u4e3a\u53ef\u4ee5\u51cf\u8f7b\u4f46\u4e0d\u80fd\u5b8c\u5168\u6d88\u9664\u540e\u671f\u73af\u5883\u4e2d\u7684\u5956\u52b1\u7be1\u6539\u3002\u6b64\u5916\uff0c\u5bf9\u53ef\u6e38\u620f\u73af\u5883\u8fdb\u884c\u65e0\u5bb3\u6027\u8bad\u7ec3\u5e76\u4e0d\u80fd\u963b\u6b62\u5956\u52b1\u7be1\u6539\u3002\u8fd9\u4e9b\u7ed3\u679c\u8868\u660e\uff0cLLMs\u80fd\u591f\u4ece\u5e38\u89c1\u7684\u89c4\u683c\u6e38\u620f\u7b56\u7565\u4e2d\u6cdb\u5316\u5230\u66f4\u6076\u52a3\u7684\u5956\u52b1\u7be1\u6539\u884c\u4e3a\uff0c\u5e76\u4e14\u8981\u6d88\u9664\u8fd9\u79cd\u884c\u4e3a\u53ef\u80fd\u5e76\u975e\u6613\u4e8b\u3002**|\n", "2406.10149": "|**2024-06-14**|**BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack**|Yuri Kuratov 
et.al.|[2406.10149](http://arxiv.org/abs/2406.10149)|**[link](https://github.com/booydar/babilong)**|\u8fd1\u5e74\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u8f93\u5165\u4e0a\u4e0b\u6587\u957f\u5ea6\u663e\u8457\u589e\u52a0\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u8bc4\u4f30\u65b9\u6cd5\u672a\u80fd\u5145\u5206\u8861\u91cf\u6a21\u578b\u5904\u7406\u957f\u7bc7\u6587\u672c\u4e2d\u7684\u4e8b\u5b9e\u63a8\u7406\u80fd\u529b\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86BABILong\u57fa\u51c6\u6d4b\u8bd5\uff0c\u65e8\u5728\u6d4b\u8bd5\u6a21\u578b\u5728\u5206\u5e03\u5f0f\u957f\u6587\u6863\u4e2d\u8de8\u4e8b\u5b9e\u63a8\u7406\u7684\u80fd\u529b\u3002BABILong\u5305\u62ec20\u4e2a\u591a\u6837\u5316\u7684\u63a8\u7406\u4efb\u52a1\uff0c\u5982\u4e8b\u5b9e\u94fe\u3001\u7b80\u5355\u5f52\u7eb3\u3001\u6f14\u7ece\u3001\u8ba1\u6570\u4ee5\u53ca\u5904\u7406\u5217\u8868/\u96c6\u5408\u7b49\u3002\u8fd9\u4e9b\u4efb\u52a1\u672c\u8eab\u5c31\u5177\u6709\u6311\u6218\u6027\uff0c\u800c\u5f53\u6240\u9700\u4e8b\u5b9e\u5206\u6563\u5728\u957f\u7bc7\u81ea\u7136\u6587\u672c\u4e2d\u65f6\uff0c\u96be\u5ea6\u8fdb\u4e00\u6b65\u63d0\u5347\u3002\u6211\u4eec\u7684\u8bc4\u4f30\u663e\u793a\uff0c\u6d41\u884c\u7684LLMs\u5b9e\u9645\u4e0a\u53ea\u5229\u7528\u4e8610%-20%\u7684\u4e0a\u4e0b\u6587\u4fe1\u606f\uff0c\u4e14\u968f\u7740\u63a8\u7406\u590d\u6742\u6027\u7684\u63d0\u9ad8\uff0c\u6027\u80fd\u6025\u5267\u4e0b\u964d\u3002\u5bf9\u4e8e\u66ff\u4ee3\u7684\u4e0a\u4e0b\u6587\u63a8\u7406\u65b9\u6cd5\uff0c\u68c0\u7d22\u589e\u5f3a\u751f\u6210\u7b56\u7565\u5728\u5355\u4e8b\u5b9e\u95ee\u9898\u56de\u7b54\u4e0a\u7684\u51c6\u786e\u7387\u4ec5\u4e3a60%\uff0c\u4e0e\u4e0a\u4e0b\u6587\u957f\u5ea6\u65e0\u5173\u3002\u5728\u4e0a\u4e0b\u6587\u6269\u5c55\u65b9\u6cd5\u4e2d\uff0c\u5faa\u73af\u8bb0\u5fc6Transformer\u5c55\u73b0\u51fa\u6700\u9ad8\u6027\u80fd\uff0c\u53ef\u5904\u7406\u957f\u8fbe1100\u4e07\u4e2a\u4ee4\u724c\u7684\u957f\u5ea6\u3002BABILong\u57fa\u51c6\u6d4b\u8bd5\u53ef\u4ee5\u6269\u5c55\u5230\u4efb\u610f\u9
57f\u5ea6\uff0c\u4ee5\u652f\u6301\u8bc4\u4f30\u5177\u6709\u66f4\u5f3a\u80fd\u529b\u7684\u65b0\u6a21\u578b\uff0c\u5e76\u63d0\u4f9b\u4e86\u957f\u8fbe100\u4e07\u4ee4\u724c\u7684\u5206\u9694\u3002|\n", "2406.11840": "|**2024-06-17**|**LLaNA: Large Language and NeRF Assistant**|Andrea Amaduzzi et.al.|[2406.11840](http://arxiv.org/abs/2406.11840)|null|\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\u5728\u7406\u89e3\u548c\u5904\u7406\u56fe\u50cf\u548c3D\u6570\u636e\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5b83\u4eec\u5728\u5168\u9762\u6355\u6349\u7269\u4f53\u7684\u5916\u89c2\u548c\u51e0\u4f55\u7279\u6027\u4e0a\u5b58\u5728\u5c40\u9650\u3002\u8fd1\u671f\uff0c\u795e\u7ecf\u8f90\u5c04\u573a\uff08Neural Radiance Fields\uff0c\u7b80\u79f0NeRF\uff09\u4f5c\u4e3a\u4e00\u79cd\u65b0\u5174\u7684\u8868\u793a\u65b9\u5f0f\uff0c\u901a\u8fc7\u4e00\u4e2a\u7b80\u5355\u7684\u591a\u5c42\u611f\u77e5\u5668\uff08Multi-Layer Perceptron\uff0cMLP\uff09\u7684\u6743\u91cd\u7f16\u7801\u4e86\u7269\u4f53\u7684\u51e0\u4f55\u7ed3\u6784\u548c\u9ad8\u5ea6\u903c\u771f\u7684\u5916\u89c2\uff0c\u5f15\u8d77\u4e86\u5e7f\u6cdb\u5173\u6ce8\u3002\u672c\u6587\u63a2\u8ba8\u4e86\u5c06NeRF\u6574\u5408\u5230MLLM\u4e2d\u7684\u53ef\u884c\u6027\u548c\u6548\u679c\u3002\u6211\u4eec\u5f00\u53d1\u4e86LLaNA\uff0c\u8fd9\u662f\u9996\u4e2a\u901a\u7528\u7684NeRF-\u8bed\u8a00\u52a9\u624b\uff0c\u80fd\u591f\u6267\u884c\u65b0\u4efb\u52a1\uff0c\u5982NeRF\u63cf\u8ff0\u548c\u95ee\u7b54\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u76f4\u63a5\u5904\u7406NeRF 
MLP\u7684\u6743\u91cd\uff0c\u65e0\u9700\u6e32\u67d3\u56fe\u50cf\u6216\u6784\u5efa3D\u6570\u636e\u7ed3\u6784\uff0c\u5c31\u80fd\u63d0\u53d6\u6709\u5173\u4ee3\u8868\u5bf9\u8c61\u7684\u4fe1\u606f\u3002\u6b64\u5916\uff0c\u6211\u4eec\u521b\u5efa\u4e86\u4e00\u4e2a\u65e0\u987b\u4eba\u5de5\u5e72\u9884\u7684NeRF\u6587\u672c\u6807\u6ce8\u6570\u636e\u96c6\uff0c\u7528\u4e8e\u5404\u79cdNeRF-\u8bed\u8a00\u4efb\u52a1\uff0c\u5e76\u636e\u6b64\u5efa\u7acb\u4e86\u4e00\u4e2a\u8bc4\u4f30\u65b9\u6cd5\u6765\u8861\u91cf\u6211\u4eec\u7684\u6a21\u578b\u5bf9NeRF\u7406\u89e3\u80fd\u529b\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u5904\u7406NeRF\u6743\u91cd\u7684\u65b9\u6cd5\u5728\u4e0e\u4eceNeRF\u4e2d\u63d0\u53d62D\u62163D\u8868\u793a\u8fdb\u884c\u6bd4\u8f83\u65f6\u8868\u73b0\u66f4\u4f18\u3002|\n", "2406.11839": "|**2024-06-17**|**mDPO: Conditional Preference Optimization for Multimodal Large Language Models**|Fei Wang et.al.|[2406.11839](http://arxiv.org/abs/2406.11839)|null|### \u80cc\u666f \u76f4\u63a5\u504f\u597d\u4f18\u5316\uff08DPO\uff09\u5df2\u88ab\u8bc1\u660e\u662f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u6821\u51c6\u7684\u6709\u6548\u624b\u6bb5\u3002\u6700\u8fd1\u7684\u7814\u7a76\u5c1d\u8bd5\u5c06DPO\u5e94\u7528\u4e8e\u591a\u6a21\u6001\u573a\u666f\uff0c\u4f46\u53d1\u73b0\u5b9e\u73b0\u6301\u7eed\u6539\u8fdb\u9887\u5177\u6311\u6218\u3002\u901a\u8fc7\u5bf9\u6bd4\u5b9e\u9a8c\uff0c\u6211\u4eec\u53d1\u73b0\u4e86\u591a\u6a21\u6001\u504f\u597d\u4f18\u5316\u4e2d\u7684\u65e0\u6761\u4ef6\u504f\u597d\u95ee\u9898\uff0c\u5373\u6a21\u578b\u5ffd\u89c6\u4e86\u56fe\u50cf\u6761\u4ef6\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86mDPO\uff0c\u4e00\u4e2a\u65e8\u5728\u9632\u6b62\u8bed\u8a00\u504f\u597d\u8fc7\u5ea6\u4f18\u5148\u7684\u591a\u6a21\u6001DPO\u76ee\u6807\uff0c\u540c\u65f6\u4f18\u5316\u56fe\u50cf\u504f\u597d\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u5956\u52b1\u951a\u70b9\uff0c\u786e\u4fdd\u9009\u62e9\u7684\u54cd\u5e94\u5956\u52b1\u4f
dd\u6301\u6b63\u5411\uff0c\u4ece\u800c\u907f\u514d\u76f8\u5bf9\u504f\u597d\u4f18\u5316\u56fa\u6709\u7684\u53ef\u80fd\u6027\u964d\u4f4e\u95ee\u9898\u3002 ### \u4efb\u52a1 \u6211\u4eec\u5728\u4e24\u4e2a\u4e0d\u540c\u89c4\u6a21\u7684\u591a\u6a21\u6001LLM\u4ee5\u53ca\u4e09\u4e2a\u5e38\u7528\u57fa\u51c6\u4e0a\u8fdb\u884c\u4e86\u5b9e\u9a8c\uff0c\u7ed3\u679c\u663e\u793a\uff0cmDPO\u6709\u6548\u89e3\u51b3\u4e86\u591a\u6a21\u6001\u504f\u597d\u4f18\u5316\u4e2d\u7684\u65e0\u6761\u4ef6\u504f\u597d\u95ee\u9898\uff0c\u5e76\u663e\u8457\u63d0\u9ad8\u4e86\u6a21\u578b\u6027\u80fd\uff0c\u7279\u522b\u662f\u5728\u51cf\u5c11\u5e7b\u89c9\u65b9\u9762\u3002|\n", "2406.11832": "|**2024-06-17**|**Unveiling Encoder-Free Vision-Language Models**|Haiwen Diao et.al.|[2406.11832](http://arxiv.org/abs/2406.11832)|**[link](https://github.com/baaivision/eve)**|**\u5f53\u524d\u7684\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08VLM\uff09\u4e3b\u8981\u4f9d\u8d56\u4e8e\u89c6\u89c9\u7f16\u7801\u5668\u6765\u63d0\u53d6\u89c6\u89c9\u7279\u5f81\uff0c\u7136\u540e\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5904\u7406\u89c6\u89c9\u8bed\u8a00\u4efb\u52a1\u3002\u7136\u800c\uff0c\u89c6\u89c9\u7f16\u7801\u5668\u5728\u62bd\u8c61\u89c6\u89c9\u8868\u793a\u65b9\u9762\u8bbe\u5b9a\u4e86\u5f3a\u70c8\u7684\u5148\u9a8c\uff0c\u5982\u5206\u8fa8\u7387\u3001\u6bd4\u4f8b\u548c\u8bed\u4e49\u503e\u5411\uff0c\u8fd9\u53ef\u80fd\u9650\u5236\u4e86VLM\u7684\u7075\u6d3b\u6027\u548c\u6548\u7387\u3002\u76f4\u63a5\u8bad\u7ec3\u65e0\u7f16\u7801\u5668\u7684\u7eafVLM\u4ecd\u7136\u5177\u6709\u6311\u6218\u6027\uff0c\u4e14\u9c9c\u6709\u63a2\u7d22\u3002\u5b9e\u8bc1\u7814\u7a76\u663e\u793a\uff0c\u8fd9\u79cd\u76f4\u63a5\u8bad\u7ec3\u65b9\u6cd5\u4f1a\u5bfc\u81f4\u6536\u655b\u7f13\u6162\u548c\u6027\u80fd\u5dee\u8ddd\u8f83\u5927\u3002\u672c\u6587\u65e8\u5728\u5f25\u5408\u7f16\u7801\u5668\u4f9d\u8d56\u578b\u548c\u65e0\u7f16\u7801\u5668\u6a21\u578b\u4e4b\u95f4\u7684\u5dee\u8ddd\uff0c\u63d0\u51fa\u4e86\u4e00\u79cd\u7b80\u5355\u800c\u6709\u
6548\u7684\u7eafVLM\u8bad\u7ec3\u7b56\u7565\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u901a\u8fc7\u6df1\u5165\u5b9e\u9a8c\u63ed\u793a\u4e86\u9ad8\u6548\u8bad\u7ec3\u65e0\u7f16\u7801\u5668VLM\u7684\u5173\u952e\u8981\u7d20\uff1a\uff081\uff09\u5728\u7edf\u4e00\u7684\u89e3\u7801\u5668\u5185\u878d\u5408\u89c6\u89c9\u4e0e\u8bed\u8a00\u8868\u793a\uff1b\uff082\uff09\u901a\u8fc7\u989d\u5916\u76d1\u7763\u63d0\u5347\u89c6\u89c9\u8bc6\u522b\u80fd\u529b\u3002\u57fa\u4e8e\u8fd9\u4e9b\u7b56\u7565\uff0c\u6211\u4eec\u5f00\u53d1\u4e86EVE\uff0c\u4e00\u4e2a\u65e0\u7f16\u7801\u5668\u7684\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff0c\u65e2\u80fd\u9ad8\u6548\u8bad\u7ec3\u4e5f\u80fd\u5feb\u901f\u63a8\u7406\u3002\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u4ec5\u4f7f\u75283500\u4e07\u516c\u5f00\u53ef\u7528\u7684\u6570\u636e\uff0cEVE\u5c31\u80fd\u5728\u591a\u4e2a\u89c6\u89c9\u8bed\u8a00\u57fa\u51c6\u4e0a\u4e0e\u7c7b\u4f3c\u5bb9\u91cf\u7684\u7f16\u7801\u5668\u4f9d\u8d56\u578bVLM\u5339\u654c\uff0c\u751a\u81f3\u8d85\u8d8a\u4e86\u8bad\u7ec3\u8fc7\u7a0b\u795e\u79d8\u3001\u6570\u636e\u672a\u516c\u5f00\u7684Fuyu-8B\u6a21\u578b\u3002\u6211\u4eec\u76f8\u4fe1\uff0cEVE\u4e3a\u8de8\u6a21\u6001\u5f00\u53d1\u7eaf\u7cb9\u7684\u89e3\u7801\u5668\u67b6\u6784\u63d0\u4f9b\u4e86\u4e00\u4e2a\u900f\u660e\u4e14\u9ad8\u6548\u7684\u8def\u5f84\u3002\u6211\u4eec\u7684\u4ee3\u7801\u548c\u6a21\u578b\u5df2\u516c\u5f00\u5728\uff1ahttps://github.com/baaivision/EVE\u3002**|\n", "2406.11831": "|**2024-06-17**|**Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models**|Bingqi Ma 
et.al.|[2406.11831](http://arxiv.org/abs/2406.11831)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u57fa\u4e8e\u89e3\u7801\u5668-only\u53d8\u538b\u5668\u5728\u6587\u672c\u7406\u89e3\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5982\u4f55\u5c06\u8fd9\u4e9b\u5148\u8fdb\u7684LLMs\u5e94\u7528\u4e8e\u6587\u672c\u5230\u56fe\u50cf\u7684\u6269\u6563\u6a21\u578b\u4ecd\u662f\u4e00\u4e2a\u5f85\u63a2\u7d22\u7684\u95ee\u9898\u3002\u6211\u4eec\u53d1\u73b0\u76f4\u63a5\u4f7f\u7528LLM\u4f5c\u4e3a\u63d0\u793a\u7f16\u7801\u5668\u4f1a\u663e\u8457\u964d\u4f4e\u751f\u6210\u56fe\u50cf\u65f6\u7684\u63d0\u793a\u8ddf\u968f\u80fd\u529b\u3002\u4e3b\u8981\u5b58\u5728\u4e24\u4e2a\u95ee\u9898\uff1a\u4e00\u662fLLM\u7684\u4e0b\u4e00\u4e2a\u8bcd\u9884\u6d4b\u8bad\u7ec3\u4e0e\u6269\u6563\u6a21\u578b\u5bf9\u533a\u5206\u6027\u63d0\u793a\u7279\u5f81\u7684\u9700\u6c42\u4e0d\u5339\u914d\uff1b\u4e8c\u662f\u89e3\u7801\u5668\u67b6\u6784\u56fa\u6709\u7684\u4f4d\u7f6e\u504f\u89c1\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u6846\u67b6\uff0c\u901a\u8fc7\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u4f7f\u7528\u6307\u5357\uff0c\u589e\u5f3aLLM\u7684\u6587\u672c\u8868\u793a\u80fd\u529b\uff0c\u6d88\u9664\u5176\u5185\u5728\u7684\u5b9a\u4f4d\u504f\u89c1\uff0c\u4ece\u800c\u7075\u6d3b\u5730\u5c06\u6700\u5148\u8fdb\u7684LLMs\u878d\u5165\u6587\u672c\u5230\u56fe\u50cf\u751f\u6210\u6a21\u578b\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63d0\u4f9b\u4e86\u4e00\u79cd\u878d\u5408\u591a\u4e2aLLMs\u7684\u65b9\u6cd5\u3002\u9274\u4e8eTransformer\u67b6\u6784\u7684\u5353\u8d8a\u6027\u80fd\u548c\u6269\u5c55\u80fd\u529b\uff0c\u6211\u4eec\u8fdb\u4e00\u6b65\u8bbe\u8ba1\u4e86\u57fa\u4e8e\u8be5\u6846\u67b6\u7684LLM-Infused Diffusion 
Transformer\uff08LI-DiT\uff09\u3002\u6211\u4eec\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u5b9e\u9a8c\uff0c\u9a8c\u8bc1\u4e86LI-DiT\u5728\u4e0d\u540c\u6a21\u578b\u89c4\u6a21\u548c\u6570\u636e\u91cf\u4e0b\u7684\u6027\u80fd\u3002\u5f97\u76ca\u4e8eLLMs\u7684\u5185\u5728\u80fd\u529b\u53ca\u6211\u4eec\u7684\u521b\u65b0\u8bbe\u8ba1\uff0cLI-DiT\u7684\u63d0\u793a\u7406\u89e3\u6027\u80fd\u8f7b\u677e\u8d85\u8d8a\u5f00\u6e90\u7684\u6700\u65b0\u6a21\u578b\uff0c\u4ee5\u53ca\u5305\u62ecStable Diffusion 3\u3001DALL-E 3\u548cMidjourney V6\u5728\u5185\u7684\u4e3b\u6d41\u95ed\u6e90\u5546\u4e1a\u6a21\u578b\u3002\u5f3a\u5927\u7684LI-DiT-10B\u5c06\u5728\u8fdb\u4e00\u6b65\u4f18\u5316\u548c\u5b89\u5168\u68c0\u67e5\u540e\u63d0\u4f9b\u3002|\n", "2406.11827": "|**2024-06-17**|**WPO: Enhancing RLHF with Weighted Preference Optimization**|Wenxuan Zhou et.al.|[2406.11827](http://arxiv.org/abs/2406.11827)|**[link](https://github.com/wzhouad/wpo)**|**\u5f3a\u5316\u5b66\u4e60\u4ece\u4eba\u7c7b\u53cd\u9988\uff08RLHF\uff09\u662f\u8c03\u6574\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4ee5\u66f4\u597d\u5730\u7b26\u5408\u4eba\u7c7b\u4ef7\u503c\u89c2\u7684\u6709\u524d\u666f\u65b9\u6cd5\u3002\u7531\u4e8e\u6210\u672c\u6548\u76ca\u548c\u53ef\u6269\u5c55\u6027\uff0c\u79bb\u7ebf\u504f\u597d\u4f18\u5316\u2014\u2014\u901a\u8fc7\u5176\u4ed6\u6a21\u578b\u83b7\u53d6\u504f\u597d\u6570\u636e\u2014\u2014\u88ab\u5e7f\u6cdb\u91c7\u7528\u3002\u7136\u800c\uff0c\u79bb\u7ebf\u504f\u597d\u4f18\u5316\u5e38\u53d7\u91c7\u6837\u7b56\u7565\u4e0e\u76ee\u6807\u7b56\u7565\u4e4b\u95f4\u5206\u5e03\u5dee\u5f02\u7684\u5f71\u54cd\uff0c\u5bfc\u81f4\u4f18\u5316\u6548\u679c\u4e0d\u7406\u60f3\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u7b56\u7565\u2014\u2014\u52a0\u6743\u504f\u597d\u4f18\u5316\uff08WPO\uff09\uff0c\u65e8\u5728\u901a\u8fc7\u8c03\u6574\u504f\u597d\u8bc4\u5206\u5bf9\uff0c\u4f7f\u79bb\u7ebf\u6570\u636e\u66f4\u63a5\u8fd1\u4e8e\u5f53\u524d\u7b56\u7565\uff0c\u4ece\u800c\u7f13\u89e3\u8fd9\u4e0
0\u95ee\u9898\u3002\u8fd9\u79cd\u65b9\u6cd5\u4e0d\u4ec5\u89e3\u51b3\u4e86\u5206\u5e03\u5dee\u8ddd\u96be\u9898\uff0c\u8fd8\u63d0\u5347\u4e86\u4f18\u5316\u8fc7\u7a0b\uff0c\u65e0\u9700\u989d\u5916\u6210\u672c\u3002 \u6211\u4eec\u5728Alpaca Eval 2\u548cMT-bench\u7b49\u6307\u4ee4\u8ddf\u968f\u57fa\u51c6\u4e0a\u9a8c\u8bc1\u4e86\u6211\u4eec\u7684\u65b9\u6cd5\u3002WPO\u5728Alpaca Eval 2\u4e0a\u7684\u6027\u80fd\u6bd4\u76f4\u63a5\u504f\u597d\u4f18\u5316\uff08DPO\uff09\u63d0\u9ad8\u4e865.6%\u3002\u57fa\u4e8eLlama-3-8B-Instruct\uff0cWPO\u751a\u81f3\u5efa\u7acb\u4e86\u663e\u8457\u7684\u957f\u5ea6\u63a7\u5236\u80dc\u7387\uff0c\u8fbe\u523048.6%\uff0c\u572880\u4ebf\u53c2\u6570\u6a21\u578b\u6392\u884c\u699c\u4e0a\u6210\u4e3a\u6700\u5f3a\u52b2\u7684\u6a21\u578b\u3002\u6211\u4eec\u5c06\u5728\u4e0a\u5f00\u6e90\u4ee3\u7801\u548c\u6a21\u578b\u3002**|\n", "2406.11818": "|**2024-06-17**|**Embodied Instruction Following in Unknown Environments**|Zhenyu Wu et.al.|[2406.11818](http://arxiv.org/abs/2406.11818)|null|\u5728\u81ea\u4e3b\u5bb6\u5ead\u670d\u52a1\u7cfb\u7edf\u4e2d\uff0c\u4f7f\u5b9e\u4f53\u4ee3\u7406\u80fd\u6839\u636e\u81ea\u7136\u8bed\u8a00\u5b8c\u6210\u590d\u6742\u7684\u4eba\u7c7b\u6307\u4ee4\u81f3\u5173\u91cd\u8981\u3002\u4f20\u7edf\u65b9\u6cd5\u4ec5\u80fd\u5728\u6240\u6709\u4e92\u52a8\u5bf9\u8c61\u90fd\u63d0\u4f9b\u7ed9\u4ee3\u7406\u7684\u5df2\u77e5\u73af\u5883\u4e2d\u6267\u884c\u6307\u4ee4\uff0c\u76f4\u63a5\u5c06\u73b0\u6709\u65b9\u6cd5\u5e94\u7528\u4e8e\u672a\u77e5\u73af\u5883\u901a\u5e38\u4f1a\u4ea7\u751f\u64cd\u4f5c\u4e0d\u5b58\u5728\u7269\u4f53\u7684\u4e0d\u53ef\u884c\u8ba1\u5212\u3002\u76f8\u53cd\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u9488\u5bf9\u672a\u77e5\u73af\u5883\u7684\u590d\u6742\u4efb\u52a1\u5b9e\u4f53\u6307\u4ee4\u8ddf\u968f\uff08Embodied Instruction 
Following\uff0cEIF\uff09\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u4f7f\u4ee3\u7406\u80fd\u591f\u6709\u6548\u5730\u63a2\u7d22\u73af\u5883\uff0c\u5229\u7528\u73b0\u6709\u7269\u4f53\u751f\u6210\u53ef\u6267\u884c\u8ba1\u5212\uff0c\u4ee5\u8fbe\u6210\u62bd\u8c61\u6307\u4ee4\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u5305\u62ec\u9ad8\u5c42\u4efb\u52a1\u89c4\u5212\u5668\u548c\u4f4e\u5c42\u63a2\u7d22\u63a7\u5236\u5668\u7684\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\u7684\u5c42\u6b21\u5316\u5b9e\u4f53\u6307\u4ee4\u8ddf\u968f\u6846\u67b6\u3002\u7136\u540e\uff0c\u6211\u4eec\u901a\u8fc7\u52a8\u6001\u533a\u57df\u6ce8\u610f\u529b\u6784\u5efa\u573a\u666f\u7684\u8bed\u4e49\u8868\u793a\u5730\u56fe\uff0c\u4ee5\u5c55\u793a\u5df2\u77e5\u7684\u89c6\u89c9\u7ebf\u7d22\uff0c\u4f7f\u4efb\u52a1\u89c4\u5212\u548c\u573a\u666f\u63a2\u7d22\u4e0e\u4eba\u7c7b\u6307\u4ee4\u76ee\u6807\u4fdd\u6301\u4e00\u81f4\u3002\u5bf9\u4e8e\u4efb\u52a1\u89c4\u5212\u5668\uff0c\u6839\u636e\u4efb\u52a1\u5b8c\u6210\u8fc7\u7a0b\u548c\u5df2\u77e5\u89c6\u89c9\u7ebf\u7d22\uff0c\u6211\u4eec\u751f\u6210\u6b65\u9aa4\u5f0f\u7684\u53ef\u884c\u8ba1\u5212\u3002\u5bf9\u4e8e\u63a2\u7d22\u63a7\u5236\u5668\uff0c\u6839\u636e\u751f\u6210\u7684\u6b65\u9aa4\u8ba1\u5212\u548c\u5df2\u77e5\u89c6\u89c9\u7ebf\u7d22\u9884\u6d4b\u6700\u4f18\u7684\u5bfc\u822a\u6216\u7269\u4f53\u4ea4\u4e92\u7b56\u7565\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u5927\u578b\u623f\u5c4b\u7ea7\u573a\u666f\u4e2d\u7684204\u4e2a\u590d\u6742\u4eba\u7c7b\u6307\u4ee4\uff08\u5982\u505a\u65e9\u9910\u548c\u6574\u7406\u623f\u95f4\uff09\u4e0a\u5b9e\u73b0\u4e8645.09%\u7684\u6210\u529f\u7387\u3002|\n", "2406.11816": "|**2024-06-17**|**VideoLLM-online: Online Video Large Language Model for Streaming Video**|Joya Chen et.al.|[2406.11816](http://arxiv.org/abs/2406.11816)|null|## \u7ffb\u8bd1 
\u8fd1\u671f\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5df2\u7ecf\u589e\u5f3a\u4e86\u89c6\u89c9\u529f\u80fd\uff0c\u80fd\u591f\u7406\u89e3\u56fe\u50cf\u3001\u89c6\u9891\u548c\u878d\u5408\u4e86\u89c6\u89c9\u4e0e\u8bed\u8a00\u7684\u5185\u5bb9\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u5927\u6a21odels\u7684\u8bad\u7ec3\u65b9\u6cd5\u901a\u5e38\u5c06\u89c6\u9891\u89c6\u4e3a\u9884\u5148\u526a\u8f91\u597d\u7684\u7247\u6bb5\uff0c\u8fd9\u4f7f\u5f97\u5b83\u4eec\u5728\u5904\u7406\u8fde\u7eed\u89c6\u9891\u6d41\u65f6\u6548\u679c\u4e0d\u4f73\u4e14\u6548\u7387\u4f4e\u4e0b\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5728\u672c\u6587\u4e2d\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u201cLearning-In-Video-Stream\u201d\uff08LIVE\uff09\u6846\u67b6\uff0c\u65e8\u5728\u5b9e\u73b0\u5b9e\u65f6\u3001\u957f\u5e8f\u5217\u3001\u4e0e\u89c6\u9891\u6d41\u540c\u6b65\u7684\u5bf9\u8bdd\uff0c\u9002\u7528\u4e8e\u8fde\u7eed\u89c6\u9891\u8f93\u5165\u3002LIVE\u6846\u67b6\u5305\u62ec\u4ee5\u4e0b\u4e09\u4e2a\u65b9\u9762\uff1a\uff081\uff09\u4e00\u4e2a\u8bbe\u8ba1\u7528\u4e8e\u5904\u7406\u8fde\u7eed\u6d41\u5f0f\u8f93\u5165\u7684\u8bed\u8a00\u5efa\u6a21\u76ee\u6807\uff1b\uff082\uff09\u4e00\u79cd\u6570\u636e\u751f\u6210\u7b56\u7565\uff0c\u5c06\u79bb\u7ebf\u65f6\u95f4\u6807\u6ce8\u8f6c\u6362\u4e3a\u9002\u5408\u6d41\u5f0f\u5bf9\u8bdd\u7684\u683c\u5f0f\uff1b\uff083\uff09\u4e00\u4e2a\u4f18\u5316\u7684\u63a8\u7406\u7ba1\u9053\uff0c\u4ee5\u63d0\u9ad8\u5728\u5b9e\u9645\u89c6\u9891\u6d41\u4e2d\u7684\u54cd\u5e94\u901f\u5ea6\u3002\u57fa\u4e8eLlama-2/Llama-3\uff0c\u6211\u4eec\u6784\u5efa\u4e86VideoLLM-online\u6a21\u578b\uff0c\u5e76\u901a\u8fc7\u5b83\u5c55\u793a\u4e86\u5728\u5904\u7406\u89c6\u9891\u6d41\u5bf9\u8bdd\u65b9\u9762\u7684\u663e\u8457\u4f18\u52bf\uff0c\u4f8b\u5982\uff0c\u5728A100 
GPU\u4e0a\uff0c\u8be5\u6a21\u578b\u80fd\u57285\u5206\u949f\u89c6\u9891\u7247\u6bb5\u4e2d\u5b9e\u73b0\u8d85\u8fc710\u5e27\u6bcf\u79d2\u7684\u6d41\u5f0f\u5bf9\u8bdd\u3002\u6b64\u5916\uff0cVideoLLM-online\u8fd8\u5728\u516c\u5f00\u7684\u79bb\u7ebf\u89c6\u9891\u57fa\u51c6\u6d4b\u8bd5\uff08\u5982\u8bc6\u522b\u3001captioning\u548c\u9884\u6d4b\uff09\u4e0a\u5c55\u73b0\u51fa\u6700\u5148\u8fdb\u7684\u6027\u80fd\u3002\u6211\u4eec\u5df2\u5c06\u4ee3\u7801\u3001\u6a21\u578b\u3001\u6570\u636e\u548c\u6f14\u793a\u53d1\u5e03\u5728https://showlab.github.io/videollm-online\u4f9b\u4eba\u4f7f\u7528\u3002|\n", "2406.11813": "|**2024-06-17**|**How Do Large Language Models Acquire Factual Knowledge During Pretraining?**|Hoyeon Chang et.al.|[2406.11813](http://arxiv.org/abs/2406.11813)|null|Although recent studies show that large language models (LLMs) can store substantial factual knowledge, the mechanism by which they acquire it during pretraining remains unclear. This work addresses that gap by studying how LLMs acquire and retain factual knowledge during pretraining. The findings yield several key insights. First, counterintuitively, training on more data does not significantly improve a model's ability to acquire and retain factual knowledge. Second, there is a power-law relationship between training steps and the forgetting of both memorization and generalization of factual knowledge, and models trained on duplicated data forget faster. Third, larger batch sizes make models more robust to forgetting. Overall, our observations suggest that factual knowledge acquisition during LLM pretraining proceeds by progressively increasing, at each step, the probability of the factual knowledge present in the pretraining data; this increase is then diluted by subsequent forgetting. Based on this interpretation, we can explain some recently observed LLM behaviors, such as poor performance on long-tail knowledge and the benefits of deduplicating the pretraining corpus.|\n", 
"2406.11811": "|**2024-06-17**|**RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content**|Joao Monteiro et.al.|[2406.11811](http://arxiv.org/abs/2406.11811)|null|Large language models (LLMs) are trained on vast amounts of data, most of it scraped automatically from the internet. This data includes encyclopedic documents that hold a great deal of general knowledge (such as Wikipedia), but it may also overlap with the benchmark datasets used to evaluate LLMs. Consequently, evaluating models on test sets that may have leaked into the training set can lead to misleading conclusions. To foster sound evaluation of language models, we introduce a new test dataset, RepLiQA, suited to question-answering and topic-retrieval tasks. RepLiQA is a collection of five test-set splits, four of which had not been released publicly or exposed to LLM APIs before this paper's publication. Each RepLiQA sample comprises four parts: (1) a reference document written by a human annotator that depicts an imaginary scenario (for example, a news article) and therefore does not appear on the internet; (2) a question about the document's topic; (3) a ground-truth answer derived directly from the information in the document; and (4) the paragraph of the document that contains the answer. As a result, a model can produce an accurate answer only if it finds the relevant content in the provided document. We run a large-scale benchmark covering several state-of-the-art LLMs to reveal performance differences across model types and sizes in a context-conditional language modeling setting. The released splits of RepLiQA are available at https://huggingface.co/datasets/ServiceNow/repliqa.|\n", "2406.11801": "|**2024-06-17**|**Safety Arithmetic: A Framework for Test-time Safety Alignment of Language Models by Steering Parameters and Activations**|Rima Hazra 
et.al.|[2406.11801](http://arxiv.org/abs/2406.11801)|**[link](https://github.com/declare-lab/safety-arithmetic)**|**As large language models (LLMs) become increasingly important in applications such as translation and question answering, ensuring that they are aligned with human values is critical. Current alignment methods, however, struggle with dynamic user intentions and complex objectives, leaving models prone to generating harmful content. We propose Safety Arithmetic, a training-free framework that improves LLM safety across different scenarios: base models, supervised fine-tuned (SFT) models, and edited models. Safety Arithmetic has two parts: Harm Direction Removal, which avoids harmful output, and Safety Alignment, which promotes safe responses. We also release NoIntentEdit, a dataset that highlights edit instances which could unintentionally compromise model safety. Experiments show that Safety Arithmetic significantly improves safety measures, reduces over-safety, and preserves model utility, outperforming existing methods at ensuring safe content generation.**|\n", 
"2406.12846": "|**2024-06-18**|**DrVideo: Document Retrieval Based Long Video Understanding**|Ziyu Ma et.al.|[2406.12846](http://arxiv.org/abs/2406.12846)|null|Existing methods for long video understanding focus mainly on videos lasting only tens of seconds, and techniques for handling longer videos remain little explored. The large number of frames in a long video raises two main challenges: locating key information and performing long-range reasoning. We therefore propose DrVideo, a document-retrieval-based system designed for long video understanding. The core idea is to convert the long-video understanding problem into a long-document understanding task so as to make full use of the capabilities of large language models. Specifically, DrVideo transforms a long video into a text-based long document, first retrieving key frames and augmenting the information of those frames to form the system's starting point. It then runs an agent-based iterative loop that keeps searching for missing information and augmenting relevant data, and, once enough question-related information has been gathered, produces the final prediction in a chain-of-thought manner. Experiments on several long-video benchmarks confirm the effectiveness of the method. DrVideo outperforms the existing state of the art by 3.8 accuracy points on EgoSchema (3 minutes), by 17.9 and 38.0 points in the break and global modes of MovieChat-1K (10 minutes), and by 30.2 points on the LLama-Vid QA dataset (over 60 minutes).|\n", "2406.12845": "|**2024-06-18**|**Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts**|Haoxiang Wang et.al.|[2406.12845](http://arxiv.org/abs/2406.12845)|**[link](https://github.com/RLHFlow/RLHF-Reward-Modeling)**|**Reinforcement learning from human feedback (RLHF) has become the primary method for aligning large language models (LLMs) with human preferences. The RLHF process typically starts by training a reward model (RM) on human preference data. Conventional RMs are trained on pairs of responses to the same user request, with relative ratings indicating which response humans prefer. Because RMs are black boxes, however, their outputs lack interpretability, and it is hard to understand why an RM considers a given response good. Since the RM acts as a proxy for human preferences, we propose a two-stage approach to building an interpretable RM. First, we train an Absolute-Rating Multi-Objective Reward Model (ArmoRM) on multi-dimensional absolute-rating data, where each dimension corresponds to a human-interpretable objective (such as honesty, verbosity, or safety). Second, we apply a Mixture-of-Experts (MoE) strategy with a gating network that automatically selects the most suitable reward objectives based on the context. We trained ArmoRM with Llama-3 8B and added a shallow MLP on top as the gating network, yielding ArmoRM-Llama3-8B. The model achieves state-of-the-art results on RewardBench, a benchmark that evaluates RMs for language modeling. Notably, it outperforms the LLM-as-a-judge approach with GPT-4 judges and approaches the performance of the much larger Nemotron-4 340B reward model.**|\n", "2406.12844": "|**2024-06-18**|**Synergizing Foundation Models and Federated Learning: A Survey**|Shenghui Li 
et.al.|[2406.12844](http://arxiv.org/abs/2406.12844)|null|The recent development of foundation models (FMs), such as large language models, vision transformers, and multimodal models, has had a significant impact on both academia and industry. Compared with small-scale models, FMs need far more data during pretraining. General-purpose FMs can be pretrained on public data from the internet, but domain-specific FMs require proprietary data, which in practice runs into data-availability problems because of privacy concerns. Federated learning (FL) is a collaborative learning paradigm that removes the barrier of data sharing, offering a promising way to customize and adapt FMs to a wide range of domain-specific tasks using distributed data while preserving data privacy. This survey discusses the potential and challenges of combining FL and FMs, and summarizes the core techniques, future directions, and applications. A regularly updated collection of papers on FM-FL is also available.|\n", "2406.12832": "|**2024-06-18**|**LaMDA: Large Model Fine-Tuning via Spectrally Decomposed Low-Dimensional Adaptation**|Seyedarmin Azizi 
et.al.|[2406.12832](http://arxiv.org/abs/2406.12832)|**[link](https://github.com/arminazizi98/lamda)**|**Low-rank adaptation (LoRA) has become the standard approach to fine-tuning large language models because it sharply reduces the number of trainable parameters. However, as the embedding dimension of the model grows, the number of trainable parameters LoRA needs also grows, which makes it computationally expensive. In addition, its backward updates require storing high-dimensional intermediate activations and optimizer states, which places heavy demands on GPU memory. This paper proposes a new fine-tuning method for large language models, Large Model Fine-Tuning via Spectrally Decomposed Low-Dimensional Adaptation (LaMDA). LaMDA freezes the first projection matrix (PMA) and introduces a low-dimensional trainable square matrix, which substantially reduces both the trainable parameters and the peak GPU memory usage. During the early fine-tuning stages, LaMDA gradually freezes the second projection matrix (PMB), further lowering the computational cost of weight updates and improving parameter efficiency. 
We also introduce an enhanced variant, LaMDA++, which uses a normalized spectrum analysis of the pretrained model weights to perform lightweight adaptive rank allocation for the LoRA path. We evaluate both methods on a range of tasks, including the GLUE natural language understanding benchmark, text summarization, natural language generation, and complex reasoning, across different large language models. The results show that LaMDA matches or exceeds the performance of existing methods while requiring up to 17.7x fewer parameter updates and up to 1.32x less peak GPU memory during fine-tuning. The code will be made publicly available.**|\n", "2406.12822": "|**2024-06-18**|**Is It Good Data for Multilingual Instruction Tuning or Just Bad Multilingual Evaluation for Large Language Models?**|Pinzhen Chen et.al.|[2406.12822](http://arxiv.org/abs/2406.12822)|null|
Multilingual large language models are meant to serve native speakers of many different languages. We hypothesize that current practices for fine-tuning and evaluating these models may not match that goal, because they rely heavily on translation, which can introduce translation artifacts and defects. It is unclear whether the nature of the instruction data affects model output, and likewise whether translated test sets can capture such nuances. Because translated data is often used in both the training and evaluation stages, these problems may have gone unnoticed. This work investigates them by using controlled native or translated data during instruction tuning and evaluation and observing the model results. Experiments on eight base models and eight different benchmarks show that native or generative benchmarks reveal a notable difference between native and translated instruction data, especially when model performance is high, whereas other types of test sets do not. Finally, we find that regularization helps on structured tasks but not on generative tasks.|\n", "2406.12809": "|**2024-06-18**|**Can Large Language Models Always Solve Easy Problems if They Can Solve Harder Ones?**|Zhe Yang 
et.al.|[2406.12809](http://arxiv.org/abs/2406.12809)|null|Large language models (LLMs) have shown impressive capabilities, but they still suffer from inconsistency, for example responding differently to rephrasings or to minor changes in order. Beyond these instabilities, we observe that LLMs can solve hard problems yet sometimes fail on easier ones. To evaluate this hard-to-easy inconsistency, we build the ConsisEval benchmark, in which each entry consists of a pair of questions with a strict order of difficulty. We also introduce a consistency score to quantify this inconsistency, and use a relative consistency score to analyze the potential for improving consistency. Extensive experiments on existing models lead to the following findings: (1) GPT-4 achieves the highest consistency score, 92.2%, but is still inconsistent on specific questions because of distraction by redundant information, misinterpretation of the question, and similar issues; (2) models with stronger capabilities generally show higher consistency, though exceptions exist; (3) harder data improves consistency for both fine-tuning and in-context learning. Our data and code will be made publicly available on GitHub.|\n", "2406.12806": 
"|**2024-06-18**|**Identifying Performance-Sensitive Configurations in Software Systems through Code Analysis with LLM Agents**|Zehao Wang et.al.|[2406.12806](http://arxiv.org/abs/2406.12806)|null|**Background**: Configuration settings are essential for tailoring software behavior to specific performance requirements, yet misconfigurations are widespread. Because configuration options are numerous and complex, identifying the ones that affect system performance is challenging. This study presents PerfSense, a lightweight framework that uses large language models (LLMs) to identify performance-sensitive configurations efficiently and with low overhead. PerfSense uses LLM agents to simulate the interaction between developers and performance engineers, applying advanced prompting techniques such as prompt chaining and retrieval-augmented generation (RAG). **Methods and results**: Our evaluation on seven open-source Java systems shows that PerfSense classifies performance-sensitive configurations with an average accuracy of 64.77%, outperforming both an LLM-based baseline (50.36%) and the previous state-of-the-art method (61.75%). In particular, the prompt chaining technique improves recall by 10% to 30% while maintaining similar precision. A further manual analysis of 362 misclassified cases finds that a common issue is the LLMs' misunderstanding of requirements (26.8% of cases). 
**Conclusion**: PerfSense significantly reduces the manual effort of classifying performance-sensitive configurations and offers useful insights for future research on LLM-based code analysis.|\n", "2406.12800": "|**2024-06-18**|**Supporting Human Raters with the Detection of Harmful Content using Large Language Models**|Kurt Thomas et.al.|[2406.12800](http://arxiv.org/abs/2406.12800)|null|This paper explores the feasibility of using large language models (LLMs) to automate, or otherwise assist human raters with, the detection of harmful content such as hate speech, harassment, violent extremism, and election misinformation. Using a dataset of 50,000 comments, we show that LLMs can reach 90% accuracy when compared with human verdicts. We propose five design patterns for integrating LLMs with human rating, such as pre-filtering non-violative content, detecting likely errors in human ratings, or surfacing critical context to support human rating. We show how a single optimized prompt can support all of these design patterns. In a pilot in a real-world review setting, our approach yielded a 41.5% improvement in the use of available human rater capacity, along with a 9% to 11% increase in both precision and recall for detecting violative content.|\n", "2406.12793": "|**2024-06-18**|**ChatGLM: A Family of Large Language Models from GLM-130B 
to GLM-4 All Tools**|Team GLM et.al.|[2406.12793](http://arxiv.org/abs/2406.12793)|**[link](https://github.com/thudm/chatglm-6b)**|We introduce ChatGLM, an evolving family of large language models developed over time. This report focuses mainly on the GLM-4 language series, which includes GLM-4, GLM-4-Air, and GLM-4-9B. These are our most capable models to date and incorporate all the insights and lessons from the previous three generations of ChatGLM. The models are pretrained on ten trillion tokens, mostly in Chinese and English, plus a small corpus drawn from 24 other languages, and are aligned primarily for Chinese and English use. High-quality alignment is achieved through a multi-stage post-training process that includes supervised fine-tuning and learning from human feedback. Evaluations show that GLM-4 closely rivals or outperforms GPT-4 on general metrics such as MMLU, GSM8K, MATH, BBH, GPQA, and HumanEval; comes close to GPT-4 Turbo in instruction following as measured by IFEval; matches GPT-4 Turbo (128K) and Claude 3 on long-context tasks; and outperforms GPT-4 in Chinese alignment as measured by AlignBench. The GLM-4 All 
Tools model is further aligned to understand user intent and to decide autonomously when to use which tool, including a web browser, a Python interpreter, a text-to-image model, and user-defined functions, in order to complete complex tasks effectively. In practical applications, it matches or even surpasses GPT-4 All Tools on tasks such as retrieving online information through web browsing and solving math problems with a Python interpreter. To date we have open-sourced a series of models, including ChatGLM-6B (three generations), GLM-4-9B (128K, 1M), GLM-4V-9B, WebGLM, and CodeGeeX, which drew more than 10 million downloads on Hugging Face in 2023 alone. These open models are publicly accessible.|\n", "2406.12784": "|**2024-06-18**|**UBENCH: Benchmarking Uncertainty in Large Language Models with Multiple Choice Questions**|Xunzhi Wang 
et.al.|[2406.12784](http://arxiv.org/abs/2406.12784)|**[link](https://github.com/Cyno2232/UBENCH)**|With the rapid development of large language models (LLMs), they have shown remarkable effectiveness in practical applications. Because of their low interpretability, however, these models often produce errors in unforeseen circumstances, which limits their usefulness. Although much research has gone into building comprehensive evaluation systems, earlier benchmarks focus mainly on problem-solving ability and underrate the uncertainty of responses, which can lead to unreliability. Current methods for measuring LLM reliability are resource-intensive and cannot easily test black-box models. 
To address these issues, we propose UBENCH, a comprehensive benchmark for evaluating LLM reliability. It contains 3,978 multiple-choice questions covering knowledge, language understanding, and reasoning abilities. Experimental results show that UBENCH achieves state-of-the-art performance, and that its single-sampling method saves substantial computational resources compared with baseline methods that require multiple samplings. We also use UBENCH to evaluate the reliability of 15 popular LLMs and find that GLM4 performs best, closely followed by GPT-4. Finally, we study how Chain-of-Thought prompts, role-playing prompts, option order, and temperature affect LLM reliability, and analyze their differing effects across models.|\n", "2406.14563": "|**2024-06-20**|**Model Merging and Safety Alignment: One Bad Model Spoils the Bunch**|Hasan Abed Al Kader Hammoud et.al.|[2406.14563](http://arxiv.org/abs/2406.14563)|null|
Merging large language models (LLMs) is a cost-effective way to combine several expert LLMs into a single versatile model that retains the expertise of the originals. Current approaches, however, often overlook the importance of safety alignment during merging, which leads to highly misaligned models. This work investigates the effect of model merging on alignment. We evaluate several popular model merging techniques and find that existing methods transfer not only domain expertise but also misalignment. To address this, we propose a simple two-step approach: (1) generate synthetic safety data and domain-specific data, and (2) incorporate the generated data into the optimization process of existing data-aware model merging techniques. This lets us treat alignment as a skill that can be maximized in the merged LLM. Our experiments demonstrate the effectiveness of integrating alignment-related data during merging, producing models that retain domain expertise while remaining well aligned.|\n", "2406.14562": "|**2024-06-20**|**Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities**|Sachit Menon 
et.al.|[2406.14562](http://arxiv.org/abs/2406.14562)|null|When faced with questions that involve visual thinking, humans naturally switch reasoning modalities, often forming mental images or drawing visual aids. Large language models have shown promising results in arithmetic and symbolic reasoning by expressing intermediate reasoning steps in text as a chain of thought, yet they struggle to extend this ability to text queries that are easily solved by visual reasoning, even after extensive multimodal pretraining. We propose a simple method, whiteboard-of-thought prompting, to unlock the visual reasoning capabilities of multimodal large language models across modalities. Whiteboard-of-thought prompting gives the model a metaphorical 'whiteboard' on which to draw out its reasoning steps as images, and then returns these images to the model for further processing. We find that this requires no demonstrations or specialized modules; it relies instead on the model's existing ability to write code with libraries such as Matplotlib and Turtle. This simple approach achieves state-of-the-art results on four difficult natural language tasks that involve visual and spatial reasoning. We identify several settings in which GPT-4o with chain-of-thought fails dramatically, including some where it reaches 0% accuracy, while whiteboard-of-thought prompting enables up to 92% accuracy in those same settings. We examine in detail where the technique succeeds and where its errors come from.|\n", 
"2406.14556": "|**2024-06-21**|**Asynchronous Large Language Model Enhanced Planner for Autonomous Driving**|Yuan Chen et.al.|[2406.14556](http://arxiv.org/abs/2406.14556)|**[link](https://github.com/memberre/asyncdriver)**|\u5c3d\u7ba1\u5b9e\u65f6\u89c4\u5212\u5668\u5728\u81ea\u52a8\u9a7e\u9a76\u4e2d\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5174\u8d77\u4e3a\u63d0\u9ad8\u8fd0\u52a8\u89c4\u5212\u7684\u53ef\u89e3\u91ca\u6027\u548c\u53ef\u63a7\u6027\u5f00\u8f9f\u4e86\u65b0\u9014\u5f84\u3002\u7136\u800c\uff0cLLM\u9a71\u52a8\u7684\u89c4\u5212\u5668\u4ecd\u9762\u4e34\u8d44\u6e90\u6d88\u8017\u5927\u548c\u63a8\u7406\u65f6\u95f4\u957f\u7684\u95ee\u9898\uff0c\u8fd9\u963b\u788d\u4e86\u5176\u5b9e\u7528\u90e8\u7f72\u3002\u9274\u4e8e\u8fd9\u4e9b\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86AsyncDriver\uff0c\u4e00\u4e2a\u5168\u65b0\u7684\u5f02\u6b65LLM\u589e\u5f3a\u7684\u95ed\u73af\u6846\u67b6\u3002\u8be5\u6846\u67b6\u5229\u7528LLM\u751f\u6210\u7684\u4e0e\u573a\u666f\u76f8\u5173\u7684\u6307\u4ee4\u7279\u5f81\uff0c\u6307\u5bfc\u5b9e\u65f6\u89c4\u5212\u5668\u8fdb\u884c\u7cbe\u786e\u548c\u53ef\u63a7\u7684\u8f68\u8ff9\u9884\u6d4b\u3002AsyncDriver\u5c55\u793a\u4e86LLMs\u5728\u7406\u89e3\u548c\u5904\u7406\u5411\u91cf\u5316\u573a\u666f\u6570\u636e\u53ca\u4e00\u7cfb\u5217\u8def\u7ebf\u6307\u793a\u65b9\u9762\u7684\u5f3a\u5927\u80fd\u529b\uff0c\u540c\u65f6\u901a\u8fc7\u5f02\u6b65\u8bbe\u8ba1\uff0c\u6709\u6548\u964d\u4f4e\u4e86LLM\u5e26\u6765\u7684\u8ba1\u7b97\u6210\u672c\uff0c\u4fdd\u6301\
u4e86\u4e0e\u4e4b\u76f8\u8fd1\u7684\u6027\u80fd\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728nuPlan\u7684\u590d\u6742\u573a\u666f\u4e2d\u5b9e\u73b0\u4e86\u66f4\u4f18\u7684\u95ed\u73af\u8bc4\u4f30\u6027\u80fd\u3002|\n", "2406.14550": "|**2024-06-20**|**GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models**|Shilong Li et.al.|[2406.14550](http://arxiv.org/abs/2406.14550)|null|\u957f\u6587\u672c\u5904\u7406\u80fd\u529b\u5bf9\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5e94\u5bf9\u590d\u6742\u4efb\u52a1\u81f3\u5173\u91cd\u8981\u3002\u5c3d\u7ba1\u5df2\u6709\u591a\u65b9\u52aa\u529b\u4f18\u5316LLMs\u5904\u7406\u957f\u8f93\u5165\uff0c\u4f46\u4f9d\u7136\u9762\u4e34\u6311\u6218\u3002\u672c\u6587\u63d0\u51faGraphReader\uff0c\u8fd9\u662f\u4e00\u79cd\u57fa\u4e8e\u56fe\u7684\u4ee3\u7406\u7cfb\u7edf\uff0c\u65e8\u5728\u901a\u8fc7\u6784\u5efa\u6587\u672c\u56fe\u5e76\u8ba9\u4ee3\u7406\u81ea\u4e3b\u63a2\u7d22\u6765\u5904\u7406\u957f\u6587\u672c\u3002\u5f53\u63a5\u6536\u5230\u95ee\u9898\u65f6\uff0c\u4ee3\u7406\u4f1a\u9010\u6b65\u5206\u6790\u5e76\u5236\u5b9a\u5408\u7406\u8ba1\u5212\uff0c\u7136\u540e\u8c03\u7528\u9884\u5b9a\u4e49\u51fd\u6570\u8bfb\u53d6\u8282\u70b9\u5185\u5bb9\u548c\u90bb\u5c45\u4fe1\u606f\uff0c\u5b9e\u73b0\u4ece\u7c97\u5230\u7ec6\u7684\u56fe\u63a2\u7d22\u3002\u5728\u63a2\u7d22\u8fc7\u7a0b\u4e2d\uff0c\u4ee3\u7406\u4e0d\u65ad\u8bb0\u5f55\u65b0\u53d1\u73b0\u5e76\u53cd\u601d\u5f53\u524d\u60c5\u51b5\uff0c\u4ee5\u4f18\u5316\u83b7\u53d6\u4fe1\u606f\u7684\u8fc7\u7a0b\uff0c\u76f4\u5230\u6536\u96c6\u8db3\u591f\u4fe1\u606f\u751f\u6210\u7b54\u6848\u3002\u5728LV-Eval\u6570\u636e\u96c6\u4e0a\u7684\u5b9e\u9a8c\u663e\u793a\uff0c\u4f7f\u75284k\u4e0a\u4e0b\u6587\u7a97\u53e3\u7684GraphReader\u572816k\u5230256k\u7684\u957f\u6587\u672c\u957f\u5ea6\u4e0a\uff0c\u76f8\u5bf9\u4e8eGPT-4-128k\u6709\u663e\u8457\u4f18\u52bf\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u56db\u4e2a\u5355\u8df3\u548c\u
591a\u8df3\u7684\u6311\u6218\u6027\u57fa\u51c6\u4e0a\u8868\u73b0\u51fa\u8272\u3002|\n", "2406.14549": "|**2024-06-20**|**Uncovering Latent Memories: Assessing Data Leakage and Memorization Patterns in Large Language Models**|Sunny Duan et.al.|[2406.14549](http://arxiv.org/abs/2406.14549)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u5174\u8d77\uff0c\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\u53d1\u751f\u4e86\u9769\u547d\u6027\u53d8\u5316\uff0c\u4f46\u8fd9\u4e5f\u5f15\u53d1\u4e86\u6570\u636e\u9690\u79c1\u548c\u5b89\u5168\u7684\u91cd\u5927\u5fe7\u8651\u3002\u8fd9\u4e9b\u6a21\u578b\u5728\u5305\u542b\u6f5c\u5728\u654f\u611f\u6216\u4e13\u6709\u4fe1\u606f\u7684\u5927\u91cf\u8bed\u6599\u5e93\u4e0a\u8fdb\u884c\u8bad\u7ec3\uff0c\u6570\u636e\u6cc4\u9732\u7684\u98ce\u9669\u2014\u2014\u5373\u6a21\u578b\u54cd\u5e94\u63ed\u793a\u90e8\u5206\u4fe1\u606f\u2014\u2014\u5c1a\u4e0d\u4e3a\u4eba\u5145\u5206\u7406\u89e3\u3002\u672c\u7814\u7a76\u65e8\u5728\u63a2\u8ba8\u673a\u5668\u5b66\u4e60\u6a21\u578b\u4e2d\u7684\u8bb0\u5fc6\u73b0\u8c61\uff0c\u7279\u522b\u662f\u5173\u6ce8\u5176\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u7684\u6f14\u53d8\u3002\u6211\u4eec\u8c03\u67e5\u4e86\u8bad\u7ec3\u6570\u636e\u7684\u7edf\u8ba1\u7279\u6027\u5982\u4f55\u5f71\u54cd\u6a21\u578b\u5185\u7f16\u7801\u7684\u8bb0\u5fc6\uff0c\u901a\u8fc7\u8bc4\u4f30\u91cd\u590d\u5bf9\u8bb0\u5fc6\u7684\u5f71\u54cd\u3002\u7814\u7a76\u53d1\u73b0\uff0c\u6a21\u578b\u8bb0\u4f4f\u4e00\u4e2a\u5e8f\u5217\u7684\u6982\u7387\u4e0e\u5b83\u5728\u6570\u636e\u4e2d\u51fa\u73b0\u7684\u6b21\u6570\u5448\u5bf9\u6570\u5173\u7cfb\u3002\u6b64\u5916\uff0c\u6211\u4eec\u53d1\u73b0\u5373\u4f7f\u6ca1\u6709\u540e\u7eed\u7684\u63a5\u89e6\uff0c\u67d0\u4e9b\u770b\u4f3c\u672a\u88ab\u8bb0\u4f4f\u7684\u5e8f\u5217\u4e5f\u53ef\u80fd\u5728\u6574\u4e2a\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u9010\u6e10\u663e\u73b0\u3002\u8fd9\u79cd\u9690\u85cf\u7684\u5df2\u8bb0\u4f4f\u5e8f\u5217\u5bf9\u6570\u636e\u9690\u79c1\u6784\u6210\u6311\u6218\uff0c\u56e0\u4e3a\u5b83\u4eec\u
53ef\u80fd\u9690\u85cf\u5728\u6a21\u578b\u7684\u6700\u7ec8\u68c0\u67e5\u70b9\u4e2d\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u8bca\u65ad\u6d4b\u8bd5\uff0c\u901a\u8fc7\u8003\u8651\u5b83\u4eec\u7684\u4ea4\u53c9\u71b5\u635f\u5931\u6765\u63ed\u793a\u8fd9\u4e9b\u6f5c\u5728\u7684\u8bb0\u5fc6\u5e8f\u5217\u3002|\n", "2406.14546": "|**2024-06-20**|**Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data**|Johannes Treutlein et.al.|[2406.14546](http://arxiv.org/abs/2406.14546)|**[link](https://github.com/choidami/inductive-oocr)**|**\u9488\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5b89\u5168\u98ce\u9669\uff0c\u4e00\u4e2a\u7b56\u7565\u662f\u4ece\u5176\u8bad\u7ec3\u6570\u636e\u4e2d\u5220\u9664\u5371\u9669\u77e5\u8bc6\u3002\u5c3d\u7ba1\u8fd9\u6d88\u9664\u4e86\u663e\u6027\u4fe1\u606f\uff0c\u4f46\u9690\u6027\u4fe1\u606f\u53ef\u80fd\u4ecd\u6563\u843d\u5728\u591a\u4e2a\u8bad\u7ec3\u6587\u6863\u4e2d\u3002\u6211\u4eec\u7814\u7a76\u7684\u95ee\u9898\u662f\uff1aLLMs\u80fd\u5426\u901a\u8fc7\u62fc\u51d1\u8fd9\u4e9b\u9690\u542b\u7ebf\u7d22\uff0c\u63a8\u65ad\u51fa\u88ab\u5c4f\u853d\u7684\u77e5\u8bc6\uff1f\u4e3a\u6b64\uff0c\u6211\u4eec\u4e13\u6ce8\u4e8e\u65e0\u4e0a\u4e0b\u6587\u5f52\u7eb3\u63a8\u7406\uff08Inductive Out-of-Context 
Reasoning\uff0cOOCR\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u6cdb\u5316\u80fd\u529b\uff0c\u8981\u6c42LLMs\u6839\u636e\u5206\u5e03\u5728\u8bad\u7ec3\u6587\u6863\u4e2d\u7684\u8bc1\u636e\u63a8\u65ad\u6f5c\u5728\u4fe1\u606f\uff0c\u5e76\u5728\u65e0\u9700\u4e0a\u4e0b\u6587\u5b66\u4e60\u7684\u60c5\u51b5\u4e0b\u5e94\u7528\u4e8e\u4e0b\u6e38\u4efb\u52a1\u3002\u901a\u8fc7\u4e94\u4e2a\u4efb\u52a1\u7684\u5b9e\u9a8c\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u524d\u6cbfLLMs\u786e\u5b9e\u5177\u5907\u8fd9\u79cd\u80fd\u529b\u3002\u4f8b\u5982\uff0c\u5728\u4e00\u9879\u5b9e\u9a8c\u4e2d\uff0c\u4ec5\u5bf9\u4e00\u4e2a\u672a\u77e5\u57ce\u5e02\u4e0e\u5176\u4e0e\u5176\u4ed6\u5df2\u77e5\u57ce\u5e02\u4e4b\u95f4\u7684\u8ddd\u79bb\u8fdb\u884c\u5fae\u8c03\uff0c\u4ee4\u4eba\u60ca\u8bb6\u7684\u662f\uff0c\u5373\u4f7f\u6ca1\u6709\u793a\u4f8b\u6216\u94fe\u5f0f\u601d\u8003\uff0c\u8be5LLM\u4e5f\u80fd\u8868\u8ff0\u51fa\u672a\u77e5\u57ce\u5e02\u662f\u5df4\u9ece\uff0c\u5e76\u636e\u6b64\u89e3\u7b54\u540e\u7eed\u95ee\u9898\u3002\u8fdb\u4e00\u6b65\u7684\u5b9e\u9a8c\u8868\u660e\uff0c\u4ec5\u63a5\u53d7\u5355\u4e2a\u786c\u5e01\u629b\u63b7\u7ed3\u679c\u8bad\u7ec3\u7684LLMs\u80fd\u5224\u65ad\u786c\u5e01\u662f\u5426\u504f\u659c\uff0c\u800c\u53ea\u63a5\u89e6$(x, f(x))$\u5bf9\u7684\u6a21\u578b\u80fd\u9610\u8ff0$f$\u7684\u5b9a\u4e49\u5e76\u8ba1\u7b97\u9006\u8fd0\u7b97\u3002\u867d\u7136OOCR\u5728\u67d0\u4e9b\u60c5\u51b5\u4e0b\u8868\u73b0\u826f\u597d\uff0c\u4f46\u6211\u4eec\u4e5f\u53d1\u73b0\u5b83\u5e76\u4e0d\u603b\u662f\u53ef\u9760\u7684\uff0c\u7279\u522b\u662f\u5728\u5c0f\u578bLLMs\u5b66\u4e60\u590d\u6742\u7ed3\u6784\u65f6\u3002\u603b\u7684\u6765\u8bf4\uff0cLLMs\u65e0\u9700\u660e\u786e\u7684\u4e0a\u4e0b\u6587\u5b66\u4e60\u5c31\u80fd\u201c\u4e32\u8054\u8d77\u201d\u4fe1\u606f\uff0c\u8fd9\u7ed9\u76d1\u63a7\u548c\u63a7\u5236\u5b83\u4eec\u83b7\u53d6\u7684\u77e5\u8bc6\u5e26\u6765\u4e86\u6f5c\u5728\u6311\u6218\u3002**|\n", "2406.14545": "|**2024-06-20**|**Unmasking Database Vulnerabilities: Zero-Knowledge Schema Inference Attacks in 
Text-to-SQL Systems**|\u0110or\u0111e Klisura et.al.|[2406.14545](http://arxiv.org/abs/2406.14545)|null|\u5173\u7cfb\u6570\u636e\u5e93\u5728\u73b0\u4ee3\u4fe1\u606f\u7cfb\u7edf\u4e2d\u81f3\u5173\u91cd\u8981\uff0c\u662f\u5b58\u50a8\u3001\u67e5\u8be2\u548c\u7ba1\u7406\u6570\u636e\u7684\u6838\u5fc3\u3002\u968f\u7740\u5927\u8bed\u8a00\u6a21\u578b\u7684\u8fdb\u6b65\uff0c\u6587\u672c\u5230SQL\u6280\u672f\u5d2d\u9732\u5934\u89d2\uff0c\u6781\u5927\u5730\u63d0\u5347\u4e86\u4ece\u6570\u636e\u5e93\u4e2d\u83b7\u53d6\u4fe1\u606f\u7684\u80fd\u529b\uff0c\u4f46\u540c\u65f6\u4e5f\u5f15\u53d1\u4e86\u5173\u4e8e\u9690\u79c1\u548c\u5b89\u5168\u7684\u62c5\u5fe7\u3002\u6211\u4eec\u7684\u7814\u7a76\u4e13\u6ce8\u4e8e\u63d0\u53d6\u6587\u672c\u5230SQL\u6a21\u578b\u6240\u4f9d\u8d56\u7684\u6570\u636e\u5e93\u6a21\u5f0f\u5143\u7d20\u3002\u4e86\u89e3\u6a21\u5f0f\u53ef\u80fd\u4f7fSQL\u6ce8\u5165\u653b\u51fb\u66f4\u4e3a\u5bb9\u6613\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u79cd\u96f6\u77e5\u8bc6\u6846\u67b6\uff0c\u901a\u8fc7\u63d0\u51fa\u7cbe\u5fc3\u6784\u9020\u7684\u95ee\u9898\uff0c\u65e0\u9700\u76f4\u63a5\u4e86\u89e3\u6570\u636e\u5e93\uff0c\u8be5\u6846\u67b6\u80fd\u4fc3\u4f7f\u8fd9\u4e9b\u6a21\u578b\u5904\u7406\u8fd9\u4e9b\u95ee\u9898\u5e76\u751f\u6210\u8f93\u51fa\uff0c\u4ece\u800c\u63ed\u793a\u6570\u636e\u5e93\u6a21\u5f0f\u7ed3\u6784\u3002\u6211\u4eec\u5c06\u6b64\u65b9\u6cd5\u5e94\u7528\u4e8e\u9488\u5bf9\u6587\u672c-SQL\u5bf9\u8fdb\u884c\u8fc7\u5fae\u8c03\u7684\u4e13\u7528\u6587\u672c\u5230SQL\u6a21\u578b\u4ee5\u53ca\u7528\u4e8eSQL\u751f\u6210\u7684\u751f\u6210\u5f0f\u8bed\u8a00\u6a21\u578b\u3002\u7ed3\u679c\u663e\u793a\uff0c\u5bf9\u4e8e\u5fae\u8c03\u6a21\u578b\uff0c\u6211\u4eec\u80fd\u591f\u4ee5\u63a5\u8fd10.75\u7684F1\u5206\u6570\u91cd\u6784\u8868\u540d\uff0c\u800c\u5bf9\u4e8e\u751f\u6210\u5f0f\u6a21\u578b\uff0c\u8fd9\u4e00\u5206\u6570\u66f4\u662f\u9ad8\u8fbe0.96\u3002|\n", "2406.14544": "|**2024-06-20**|**Prism: A Framework for Decoupling and Assessing the Capabilities of 
VLMs**|Yuxuan Qiao et.al.|[2406.14544](http://arxiv.org/abs/2406.14544)|**[link](https://github.com/sparksjoe/prism)**|**\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08VLMs\uff09\u5728\u5904\u7406\u5404\u79cd\u89c6\u89c9\u95ee\u9898\u65f6\u5c55\u73b0\u51fa\u5353\u8d8a\u7684\u80fd\u529b\uff0c\u8fd9\u8981\u6c42\u6a21\u578b\u5177\u5907\u5f3a\u5927\u7684\u611f\u77e5\u548c\u63a8\u7406\u80fd\u529b\u3002\u7136\u800c\uff0c\u7531\u4e8e\u611f\u77e5\u548c\u63a8\u7406\u5728\u73b0\u6709VLM\u4e2d\u7684\u4ea4\u7ec7\u6027\uff0c\u72ec\u7acb\u8bc4\u4f30\u8fd9\u4e24\u65b9\u9762\u7684\u80fd\u529b\u9887\u5177\u6311\u6218\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u6846\u67b6\u2014\u2014Prism\uff0c\u65e8\u5728\u5206\u79bb\u89c6\u89c9\u7406\u89e3\u548c\u63a8\u7406\u5728\u89c6\u89c9\u95ee\u7b54\u4e2d\u7684\u4f5c\u7528\u3002Prism\u5206\u4e3a\u4e24\u4e2a\u9636\u6bb5\uff1a\u611f\u77e5\u9636\u6bb5\u5229\u7528VLM\u63d0\u53d6\u5e76\u4ee5\u6587\u672c\u5f62\u5f0f\u8868\u8fbe\u89c6\u89c9\u4fe1\u606f\uff1b\u63a8\u7406\u9636\u6bb5\u5219\u6839\u636e\u63d0\u53d6\u7684\u89c6\u89c9\u4fe1\u606f\uff0c\u901a\u8fc7\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u751f\u6210\u54cd\u5e94\u3002\u8fd9\u79cd\u6a21\u5757\u5316\u8bbe\u8ba1\u4f7f\u5f97\u6211\u4eec\u53ef\u4ee5\u7cfb\u7edf\u5730\u6bd4\u8f83\u548c\u8bc4\u4f30\u4e0d\u540cVLM\u7684\u611f\u77e5\u548c\u63a8\u7406\u6027\u80fd\u3002 
\u6211\u4eec\u7684\u5206\u6790\u6846\u67b6\u63d0\u4f9b\u4e86\u8bf8\u591a\u6d1e\u89c1\uff0c\u8bc1\u660e\u4e86Prism\u4f5c\u4e3a\u6210\u672c\u6548\u76ca\u9ad8\u7684\u89c6\u89c9\u8bed\u8a00\u4efb\u52a1\u89e3\u51b3\u65b9\u6848\u7684\u6f5c\u529b\u3002\u901a\u8fc7\u5c06\u4e13\u6ce8\u4e8e\u611f\u77e5\u7684\u7b80\u5316VLM\u4e0e\u4e13\u4e3a\u63a8\u7406\u8bbe\u8ba1\u7684\u5f3a\u5927LLM\u76f8\u7ed3\u5408\uff0cPrism\u5728\u901a\u7528\u89c6\u89c9\u8bed\u8a00\u4efb\u52a1\u4e0a\u53d6\u5f97\u4e86\u4f18\u5f02\u6210\u7ee9\uff0c\u540c\u65f6\u663e\u8457\u964d\u4f4e\u4e86\u8bad\u7ec3\u548c\u8fd0\u8425\u6210\u672c\u3002\u5b9a\u91cf\u8bc4\u4f30\u663e\u793a\uff0c\u5f53Prism\u914d\u5907\u57fa\u7840\u76842B LLaVA VLM\u548c\u53ef\u514d\u8d39\u4f7f\u7528\u7684GPT-3.5\u65f6\uff0c\u5176\u5728\u4e25\u8c28\u7684\u591a\u6a21\u6001\u57fa\u51c6MMStar\u4e0a\u7684\u8868\u73b0\u53ef\u4e0e\u5927\u5341\u500d\u7684VLM\u76f8\u5f53\u3002\u8be5\u9879\u76ee\u5df2\u53d1\u5e03\u5728\uff1ahttps://github.com/SparksJoe/Prism\u3002**|\n", "2406.14541": "|**2024-06-21**|**Are LLMs Naturally Good at Synthetic Tabular Data Generation?**|Shengzhe Xu 
et.al.|[2406.14541](http://arxiv.org/abs/2406.14541)|**[link](https://github.com/anonymou9167/anonymouscode)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u751f\u6210\u6587\u672c\u548c\u56fe\u50cf\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5176\u5728\u751f\u6210\u6700\u5e38\u89c1\u7684\u6570\u636e\u7c7b\u578b\u2014\u2014\u8868\u683c\u6570\u636e\u65b9\u9762\u7684\u6f5c\u529b\u5374\u9c9c\u6709\u7814\u7a76\u3002\u8fd9\u7bc7\u8bba\u6587\u6307\u51fa\uff0c\u76f4\u63a5\u4f7f\u7528\u6216\u7ecf\u8fc7\u4f20\u7edf\u5fae\u8c03\u7684LLMs\u5728\u4f5c\u4e3a\u5408\u6210\u8868\u683c\u751f\u6210\u5668\u65f6\u8868\u73b0\u6781\u5dee\u3002\u7531\u4e8eLLMs\u7684\u81ea\u56de\u5f52\u7279\u6027\uff0c\u968f\u673a\u987a\u5e8f\u6392\u5217\u7684\u5fae\u8c03\u4e0e\u6355\u6349\u529f\u80fd\u6027\u4f9d\u8d56\u7684\u91cd\u8981\u6027\u76f8\u6096\uff0c\u5bfc\u81f4\u5b83\u4eec\u65e0\u6cd5\u5904\u7406\u6761\u4ef6\u6df7\u5408\u5206\u5e03\uff08\u8fd9\u662f\u53cd\u6620\u73b0\u5b9e\u4e16\u754c\u7ea6\u675f\u7684\u5173\u952e\uff09\u3002\u6211\u4eec\u5c55\u793a\u4e86\u5982\u4f55\u901a\u8fc7\u4f7fLLMs\u53d8\u5f97\u611f\u77e5\u6392\u5217\u987a\u5e8f\u6765\u6539\u5584\u8fd9\u4e9b\u4e0d\u8db3\uff0c\u4ece\u800c\u63d0\u5347\u5176\u6027\u80fd\u3002**|\n", "2406.14517": "|**2024-06-20**|**PostMark: A Robust Blackbox Watermark for Large Language Models**|Yapei Chang 
et.al.|[2406.14517](http://arxiv.org/abs/2406.14517)|**[link](https://github.com/lilakk/postmark)**|**\u6700\u6709\u6548\u7684\u68c0\u6d4b\u751f\u6210\u5f0f\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u6587\u672c\u7684\u65b9\u6cd5\u662f\u901a\u8fc7\u5728\u89e3\u7801\u8fc7\u7a0b\u4e2d\u63d2\u5165\u53ef\u8bc6\u522b\u7684\u6807\u8bb0\uff0c\u5373\u6c34\u5370\u3002\u7136\u800c\uff0c\u5927\u591a\u6570\u73b0\u6709\u65b9\u6cd5\u4f9d\u8d56\u4e8e\u83b7\u53d6\u5230LLM\u7684\u539f\u59cb\u6982\u7387\uff08logits\uff09\uff0c\u8fd9\u4f7f\u5f97LLM\u670d\u52a1\u63d0\u4f9b\u5546\u4e0d\u613f\u5206\u4eab\uff0c\u56e0\u4e3a\u62c5\u5fc3\u6a21\u578b\u6cc4\u9732\u95ee\u9898\u3002\u56e0\u6b64\uff0c\u8fd9\u4e9b\u6c34\u5370\u9700\u8981\u6bcf\u4e2a\u63d0\u4f9b\u8005\u72ec\u7acb\u5f00\u53d1\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u540e\u5904\u7406\u6c34\u5370\u65b9\u6848\uff0c\u540d\u4e3aPostMark\u3002\u5b83\u662f\u4e00\u79cd\u6a21\u5757\u5316\u7684\u3001\u751f\u6210\u540e\u63d2\u5165\u7684\u6c34\u5370\u7b56\u7565\uff0c\u65e0\u9700\u89e6\u53calogits\uff0c\u9002\u5408\u7b2c\u4e09\u65b9\u5b9e\u65bd\u3002PostMark\u8868\u73b0\u51fa\u66f4\u5f3a\u7684\u5bf9\u6297\u540c\u4e49\u53e5\u653b\u51fb\u80fd\u529b\uff1a\u6211\u4eec\u5728\u5b9e\u9a8c\u4e2d\u6db5\u76d6\u4e86\u516b\u4e2a\u57fa\u7840\u7b97\u6cd5\u3001\u4e94\u4e2a\u57fa\u7ebfLLM\u548c\u4e09\u4e2a\u6570\u636e\u96c6\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u8bc4\u4f30\u4e86PostMark\u5bf9\u6587\u672c\u8d28\u91cf\u7684\u5f71\u54cd\uff0c\u5305\u62ec\u81ea\u52a8\u5316\u548c\u4eba\u5de5\u8bc4\u4f30\uff0c\u63a2\u8ba8\u4e86\u8d28\u91cf\u548c\u6297\u6539\u5199\u653b\u51fb\u4e4b\u95f4\u7684\u6743\u8861\u3002\u7814\u7a76\u4ee3\u7801\u3001\u8f93\u51fa\u548c\u6ce8\u91ca\u5df2\u516c\u5f00\u5728https://github.com/lilakk/PostMark\u3002**|\n", "2406.15341": "|**2024-06-21**|**GenoTEX: A Benchmark for Evaluating LLM-Based Exploration of Gene Expression Data in Alignment with Bioinformaticians**|Haoyang Liu 
et.al.|[2406.15341](http://arxiv.org/abs/2406.15341)|**[link](https://github.com/liu-hy/genotex)**|**\u8fd1\u5e74\u6765\uff0c\u673a\u5668\u5b66\u4e60\u7684\u8fdb\u6b65\u663e\u8457\u63d0\u5347\u4e86\u4ece\u57fa\u56e0\u8868\u8fbe\u6570\u636e\u4e2d\u8bc6\u522b\u75be\u75c5\u76f8\u5173\u57fa\u56e0\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u8fc7\u7a0b\u5f80\u5f80\u9700\u8981\u6df1\u539a\u7684\u4e13\u957f\u548c\u5927\u91cf\u7684\u4eba\u5de5\u52aa\u529b\uff0c\u9650\u5236\u4e86\u5176\u53ef\u6269\u5c55\u6027\u3002\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u9a71\u52a8\u7684\u4ee3\u7406\u663e\u793a\u51fa\u5728\u81ea\u52a8\u5316\u6b64\u7c7b\u4efb\u52a1\u65b9\u9762\u7684\u6f5c\u529b\uff0c\u56e0\u4e3a\u5b83\u4eec\u7684\u95ee\u9898\u89e3\u51b3\u80fd\u529b\u65e5\u76ca\u589e\u5f3a\u3002\u4e3a\u4e86\u652f\u6301\u8fd9\u7c7b\u65b9\u6cd5\u7684\u8bc4\u4f30\u548c\u53d1\u5c55\uff0c\u6211\u4eec\u521b\u5efa\u4e86GenoTEX\uff0c\u8fd9\u662f\u4e00\u4e2a\u57fa\u56e0\u8868\u8fbe\u6570\u636e\u5206\u6790\u81ea\u52a8\u63a2\u7d22\u7684\u57fa\u51c6\uff0c\u5305\u62ec\u6570\u636e\u96c6\u9009\u62e9\u3001\u9884\u5904\u7406\u548c\u7edf\u8ba1\u5206\u6790\u4efb\u52a1\u3002GenoTEX\u63d0\u4f9b\u4e86\u5168\u9762\u7684\u5206\u6790\u7ba1\u9053\uff0c\u5176\u4e2d\u5305\u542b\u4e86\u4eba\u7c7b\u751f\u7269\u4fe1\u606f\u5b66\u5bb6\u7cbe\u5fc3\u7f16\u5199\u7684\u6ce8\u91ca\uff0c\u4ed6\u4eec\u5bf9\u6570\u636e\u96c6\u8fdb\u884c\u6df1\u5165\u5206\u6790\u4ee5\u786e\u4fdd\u51c6\u786e\u6027\u548c\u53ef\u9760\u6027\u3002 
\u4e3a\u4e86\u63d0\u4f9b\u8fd9\u4e9b\u4efb\u52a1\u7684\u57fa\u7ebf\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86GenoAgents\uff0c\u8fd9\u662f\u4e00\u4e2a\u57fa\u4e8eLLMs\u7684\u4ee3\u7406\u56e2\u961f\uff0c\u5177\u5907\u4e0a\u4e0b\u6587\u611f\u77e5\u89c4\u5212\u3001\u8fed\u4ee3\u6821\u6b63\u4ee5\u53ca\u4e0e\u9886\u57df\u4e13\u5bb6\u54a8\u8be2\u7684\u80fd\u529b\uff0c\u5b83\u4eec\u534f\u4f5c\u63a2\u7d22\u57fa\u56e0\u6570\u636e\u96c6\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u663e\u793a\u4e86LLM\u9a71\u52a8\u65b9\u6cd5\u5728\u57fa\u56e0\u7ec4\u6570\u636e\u5206\u6790\u4e2d\u7684\u6f5c\u529b\uff0c\u800c\u9519\u8bef\u5206\u6790\u6307\u51fa\u4e86\u6311\u6218\u548c\u672a\u6765\u7684\u6539\u8fdb\u65b9\u5411\u3002\u6211\u4eec\u63d0\u8baeGenoTEX\u4f5c\u4e3a\u4e00\u4e2a\u6709\u524d\u666f\u7684\u8d44\u6e90\uff0c\u7528\u4e8e\u8861\u91cf\u548c\u63d0\u5347\u4eba\u5de5\u667a\u80fd\u9a71\u52a8\u7684\u57fa\u56e0\u7ec4\u6570\u636e\u5206\u6790\u65b9\u6cd5\u3002\u6211\u4eec\u7684\u57fa\u51c6\u5df2\u516c\u5f00\u53d1\u5e03\u5728\uff1ahttps://github.com/Liu-Hy/GenoTex\u3002**|\n", "2406.15330": "|**2024-06-21**|**Gradient-Mask Tuning Elevates the Upper Limits of LLM Performance**|Haoling Li 
et.al.|[2406.15330](http://arxiv.org/abs/2406.15330)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5df2\u7ecf\u5728\u4f17\u591a\u7814\u7a76\u9886\u57df\u5e26\u6765\u4e86\u9769\u65b0\u3002\u5c3d\u7ba1\u4eba\u4eec\u666e\u904d\u77e5\u9053\u5fae\u8c03\u5bf9\u4e8e\u589e\u5f3aLLMs\u7684\u529f\u80fd\u81f3\u5173\u91cd\u8981\uff0c\u4f46\u73b0\u6709\u7814\u7a76\u8868\u660e\uff0c\u5fae\u8c03\u8fc7\u7a0b\u4e2d\u53ef\u80fd\u5b58\u5728\u53c2\u6570\u5197\u4f59\u3002\u56e0\u6b64\uff0c\u6709\u7814\u7a76\u5efa\u8bae\u53ea\u66f4\u65b0\u90e8\u5206\u53c2\u6570\uff0c\u4f46\u8fd9\u672a\u80fd\u6709\u6548\u5229\u7528\u4efb\u52a1\u7279\u5b9a\u4fe1\u606f\u6765\u8bc6\u522b\u8bad\u7ec3\u4e2d\u7684\u91cd\u8981\u53c2\u6570\u3002\u8003\u8651\u5230\u68af\u5ea6\u672c\u8d28\u4e0a\u8574\u542b\u7740\u4efb\u52a1\u76f8\u5173\u6570\u636e\u7684\u4fe1\u606f\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u68af\u5ea6\u63a9\u7801\u8c03\u4f18\uff08Gradient-Mask Tuning\uff0cGMT\uff09\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u6839\u636e\u53c2\u6570\u7684\u68af\u5ea6\u4fe1\u606f\u9009\u62e9\u6027\u5730\u8fdb\u884c\u8bad\u7ec3\u66f4\u65b0\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u8ba1\u7b97\u68af\u5ea6\u7684\u7edd\u5bf9\u503c\uff0c\u5e76\u5bf9\u8f83\u5c0f\u5e45\u5ea6\u7684\u68af\u5ea6\u5e94\u7528\u63a9\u7801\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cGMT\u4e0d\u4ec5\u4f18\u4e8e\u4f20\u7edf\u7684\u5fae\u8c03\u65b9\u6cd5\uff0c\u8fd8\u63d0\u5347\u4e86LLM\u6027\u80fd\u7684\u4e0a\u9650\u3002\u8fdb\u4e00\u6b65\u5206\u6790\u663e\u793a\uff0cGMT\u5bf9\u63a9\u7801\u6bd4\u4f8b\u5177\u6709\u4e00\u5b9a\u7684\u9c81\u68d2\u6027\uff0c\u5e76\u4e14\u5728\u8ba1\u7b97\u6548\u7387\u4e0a\u4e0e\u57fa\u672c\u7684\u5fae\u8c03\uff08Simple Fine-Tuning\uff0cSFT\uff09\u76f8\u5f53\u3002|\n", "2406.15325": "|**2024-06-21**|**Bug In the Code Stack: Can LLMs Find Bugs in Large Python Code Stacks**|Hokyung Lee 
et.al.|[2406.15325](http://arxiv.org/abs/2406.15325)|**[link](https://github.com/hamminghq/bug-in-the-code-stack)**|\u8fd1\u5e74\u6765\uff0c\u9488\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u6d77\u91cf\u6587\u672c\u6587\u6863\u4e2d\u68c0\u7d22\u4e0a\u4e0b\u6587\u4fe1\u606f\u7684Needle-in-a-Haystack\uff08NIAH\uff09\u57fa\u51c6\u7814\u7a76\u6709\u6240\u8fdb\u5c55\u3002\u968f\u7740LLMs\u5728\u8f6f\u4ef6\u5f00\u53d1\u6d41\u7a0b\u4e2d\u7684\u65e5\u76ca\u878d\u5408\uff0c\u8bc4\u4f30\u5b83\u4eec\u5728\u4ee3\u7801\u73af\u5883\u4e2d\u7684\u8868\u73b0\u53d8\u5f97\u81f3\u5173\u91cd\u8981\u3002\u968f\u7740LLMs\u671d\u7740\u7a0b\u5e8f\u5408\u6210\u65b9\u5411\u53d1\u5c55\uff0c\u5fc5\u987b\u786e\u4fdd\u5b83\u4eec\u80fd\u7406\u89e3\u8bed\u6cd5\u5e76\u7f16\u5199\u51fa\u7b26\u5408\u8bed\u6cd5\u89c4\u5219\u7684\u4ee3\u7801\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86Bug In The Code Stack\uff08BICS\uff09\u57fa\u51c6\u6d4b\u8bd5\uff0c\u65e8\u5728\u68c0\u9a8cLLMs\u5728\u5927\u578b\u6e90\u4ee3\u7801\u4e2d\u8bc6\u522b\u7b80\u5355\u8bed\u6cd5\u9519\u8bef\u7684\u80fd\u529b\u3002\u6211\u4eec\u7684\u7814\u7a76\u53d1\u73b0\u4e09\u4e2a\u5173\u952e\u70b9\uff1a\uff081\uff09\u4e0e\u6587\u672c\u73af\u5883\u76f8\u6bd4\uff0c\u57fa\u4e8e\u4ee3\u7801\u7684\u73af\u5883\u5bf9\u68c0\u7d22\u4efb\u52a1\u6784\u6210\u4e86\u66f4\u5927\u7684\u6311\u6218\uff1b\uff082\uff09\u4e0d\u540c\u6a21\u578b\u4e4b\u95f4\u7684\u6027\u80fd\u5b58\u5728\u663e\u8457\u5dee\u5f02\uff1b\uff083\uff09\u5c3d\u7ba1\u5982\u6b64\uff0c\u8f83\u957f\u7684\u4e0a\u4e0b\u6587\u957f\u5ea6\u4e0e\u6027\u80fd\u4e0b\u964d\u4e4b\u95f4\u5b58\u5728\u5173\u8054\uff0c\u4f46\u8fd9\u79cd\u4e0b\u964d\u7a0b\u5ea6\u5728\u4e0d\u540c\u7684\u6a21\u578b\u95f4\u6709\u6240\u4e0d\u540c\u3002|\n", "2406.15264": "|**2024-06-21**|**Towards Fine-Grained Citation Evaluation in Generated Text: A Comparative Analysis of Faithfulness Metrics**|Weijia Zhang 
et.al.|[2406.15264](http://arxiv.org/abs/2406.15264)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5e38\u5e38\u4ea7\u751f\u4e0d\u53ef\u9760\u6216\u96be\u4ee5\u9a8c\u8bc1\u7684\u4fe1\u606f\uff0c\u5373\u201c\u5e7b\u89c9\u201d\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u68c0\u7d22\u589e\u5f3a\u7684LLMs\u5f15\u5165\u4e86\u5f15\u7528\uff0c\u4f7f\u5185\u5bb9\u57fa\u4e8e\u53ef\u6838\u67e5\u7684\u6765\u6e90\u3002\u7136\u800c\uff0c\u624b\u52a8\u8bc4\u4f30\u5f15\u7528\u662f\u5426\u5145\u5206\u652f\u6301\u76f8\u5173\u9648\u8ff0\u4ecd\u7136\u662f\u4e00\u4e2a\u91cd\u5927\u6311\u6218\u3002\u5148\u524d\u7684\u7814\u7a76\u8bd5\u56fe\u901a\u8fc7\u5fe0\u5b9e\u5ea6\u6307\u6807\u81ea\u52a8\u4f30\u8ba1\u5f15\u7528\u7684\u652f\u6301\u7a0b\u5ea6\uff0c\u4f46\u8fd9\u4e9b\u65b9\u6cd5\u4ec5\u9650\u4e8e\u4e8c\u5206\u7c7b\uff0c\u5ffd\u89c6\u4e86\u5b9e\u9645\u573a\u666f\u4e2d\u5bf9\u7cbe\u7ec6\u7ea7\u522b\u5f15\u7528\u652f\u6301\u7684\u8003\u91cf\u3002\u4e3a\u4e86\u63a2\u7a76\u5fe0\u5b9e\u5ea6\u6307\u6807\u5728\u7cbe\u7ec6\u7ea7\u522b\u8bc4\u4f30\u4e2d\u7684\u6709\u6548\u6027\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u6bd4\u8f83\u8bc4\u4f30\u6846\u67b6\uff0c\u7528\u4e8e\u68c0\u9a8c\u8fd9\u4e9b\u6307\u6807\u533a\u5206\u4e09\u79cd\u652f\u6301\u7b49\u7ea7\uff08\u5168\u9762\u652f\u6301\u3001\u90e8\u5206\u652f\u6301\u548c\u4e0d\u652f\u6301\uff09\u7684\u80fd\u529b\u3002\u6211\u4eec\u7684\u6846\u67b6\u91c7\u7528\u76f8\u5173\u6027\u5206\u6790\u3001\u5206\u7c7b\u8bc4\u4f30\u548c\u68c0\u7d22\u8bc4\u4f30\uff0c\u5168\u65b9\u4f4d\u8861\u91cf\u6307\u6807\u5206\u6570\u4e0e\u4eba\u7c7b\u5224\u65ad\u7684\u4e00\u81f4\u6027\u3002\u7814\u7a76\u7ed3\u679c\u663e\u793a\uff0c\u6ca1\u6709\u5355\u4e00\u6307\u6807\u5728\u6240\u6709\u8bc4\u4f30\u4e2d\u8868\u73b0\u51fa\u8272\uff0c\u63ed\u793a\u4e86\u7cbe\u7ec6\u7ea7\u522b\u652f\u6301\u8bc4\u4f30\u7684\u590d\u6742\u6027\u3002\u6839\u636e\u53d1\u73b0\u7684\u7ed3\u679c\u
ff0c\u6211\u4eec\u4e3a\u5f00\u53d1\u66f4\u6709\u6548\u7684\u6307\u6807\u63d0\u4f9b\u4e86\u5b9e\u7528\u5efa\u8bae\u3002|\n", "2406.15231": "|**2024-06-21**|**Detecting Synthetic Lyrics with Few-Shot Inference**|Yanis Labrak et.al.|[2406.15231](http://arxiv.org/abs/2406.15231)|null|\u8fd1\u5e74\u6765\uff0c\u751f\u6210\u7684\u97f3\u4e50\u5185\u5bb9\u9010\u6e10\u53d7\u5230\u5173\u6ce8\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\u88ab\u6709\u6548\u5e94\u7528\u4e8e\u521b\u4f5c\u5404\u79cd\u98ce\u683c\u3001\u4e3b\u9898\u548c\u8bed\u8a00\u7ed3\u6784\u7684\u6b4c\u8bcd\uff0c\u8fd9\u63a8\u52a8\u4e86\u827a\u672f\u5bb6\u4eec\u7684\u521b\u4f5c\uff0c\u4f46\u4e5f\u5e26\u6765\u4e86\u7248\u6743\u4fb5\u72af\u3001\u6d88\u8d39\u8005\u6ee1\u610f\u5ea6\u548c\u5185\u5bb9\u6ee5\u53d1\u7b49\u95ee\u9898\u3002\u4e3a\u6b64\uff0c\u68c0\u6d4b\u751f\u6210\u6b4c\u8bcd\u7684\u65b9\u6cd5\u53d8\u5f97\u81f3\u5173\u91cd\u8981\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u7814\u7a76\u5e76\u672a\u4e13\u6ce8\u4e8e\u8fd9\u4e00\u7279\u5b9a\u9886\u57df\u6216\u521b\u610f\u6587\u672c\u7684\u673a\u5668\u751f\u6210\u5185\u5bb9\u68c0\u6d4b\u3002\u9488\u5bf9\u8fd9\u4e00\u7a7a\u767d\uff0c\u6211\u4eec\u7cbe\u5fc3\u6784\u5efa\u4e86\u9996\u4e2a\u9ad8\u8d28\u91cf\u5408\u6210\u6b4c\u8bcd\u6570\u636e\u96c6\uff0c\u5e76\u5bf9\u591a\u79cd\u57fa\u4e8e\u5c11\u91cf\u6837\u672c\u7684\u68c0\u6d4b\u65b9\u6cd5\u8fdb\u884c\u4e86\u8be6\u5c3d\u7684\u5b9a\u91cf\u8bc4\u4f30\uff0c\u6d4b\u8bd5\u5b83\u4eec\u7684\u6cdb\u5316\u80fd\u529b\uff0c\u5e76\u8f85\u4ee5\u4eba\u7c7b\u8bc4\u4ef7\u3002\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u6700\u4f73\u5c11\u6570\u6837\u672c\u68c0\u6d4b\u5668\u2014\u2014\u57fa\u4e8eLLM2Vec\u7684\u65b9\u6cd5\u8d85\u8d8a\u4e86\u5728\u5176\u4ed6\u9886\u57df\u8868\u73b0\u5f3a\u52b2\u7684\u98ce\u683c\u548c\u7edf\u8ba1\u65b9\u6cd5\uff0c\u6210\u529f\u9274\u522b\u51fa\u4eba\u7c7b\u521b\u4f5c\u4e0e\u673a\u5668\u751f\u6210\u7684\u6b4c\u8bcd\uff0c\u4e14\u5c55\u73b0\u51fa\u826f\u597d\u7684\u8de8\u827a\u672f\u5bb6\u548c\u6a21\u
578b\u6cdb\u5316\u80fd\u529b\uff0c\u8fd8\u80fd\u6709\u6548\u8bc6\u522b\u751f\u6210\u540e\u7684\u4eba\u5de5\u6da6\u8272\u3002\u8fd9\u9879\u7814\u7a76\u5f3a\u8c03\u4e86\u5728\u521b\u610f\u5185\u5bb9\u68c0\u6d4b\u9886\u57df\uff0c\u7279\u522b\u662f\u6cdb\u5316\u80fd\u529b\u548c\u5bf9\u66f4\u5927\u6b4c\u66f2\u5e93\u7684\u9002\u5e94\u6027\u65b9\u9762\uff0c\u9700\u8981\u8fdb\u4e00\u6b65\u7814\u7a76\u3002\u6240\u6709\u6570\u636e\u96c6\u3001\u9884\u5904\u7406\u811a\u672c\u548c\u4ee3\u7801\u5df2\u516c\u5f00\u5728GitHub\u548cHugging Face\u4e0a\uff0c\u9075\u5faaApache 2.0\u8bb8\u53ef\u534f\u8bae\u3002|\n", "2406.15227": "|**2024-06-21**|**A LLM-Based Ranking Method for the Evaluation of Automatic Counter-Narrative Generation**|Irune Zubiaga et.al.|[2406.15227](http://arxiv.org/abs/2406.15227)|**[link](https://github.com/hitz-zentroa/cn-eval)**|\u968f\u7740\u7f51\u7edc\u4e0a\u9519\u8bef\u4fe1\u606f\u548c\u6709\u5bb3\u8a00\u8bba\u7684\u589e\u591a\uff0c\u8feb\u5207\u9700\u8981\u6709\u6548\u7684\u53cd\u53d9\u4e8b\uff08Counter Narrative\uff0cCN\uff09\u751f\u6210\u6280\u672f\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u81ea\u52a8\u8bc4\u4f30\u65b9\u6cd5\u5f80\u5f80\u7f3a\u4e4f\u53ef\u89e3\u91ca\u6027\uff0c\u65e0\u6cd5\u51c6\u786e\u53cd\u6620\u751f\u6210\u7684CN\u4e0e\u4eba\u7c7b\u611f\u77e5\u4e4b\u95f4\u7684\u590d\u6742\u5173\u7cfb\u3002\u4e3a\u6b64\uff0c\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\u6765\u8bc4\u4f30\u751f\u6210\u7684CN\uff0c\u5373\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08Large Language 
Model\uff0cLLM\uff09\u4f5c\u4e3a\u8bc4\u4f30\u5668\u3002\u901a\u8fc7\u4ee5\u9526\u6807\u8d5b\u5f62\u5f0f\u5bf9\u751f\u6210\u7684CN\u8fdb\u884c\u5bf9\u6218\u6bd4\u8f83\uff0c\u6211\u4eec\u5efa\u7acb\u4e86\u4e00\u4e2a\u6a21\u578b\u6392\u540d\u6d41\u7a0b\uff0c\u5176\u4e0e\u4eba\u7c7b\u504f\u597d\u95f4\u7684\u76f8\u5173\u7cfb\u6570\u8fbe\u52300.88\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63a2\u8ba8\u4e86\u4f7f\u7528LLM\u8fdb\u884c\u96f6\u6837\u672c\uff08Zero-Shot\uff0cZS\uff09CN\u751f\u6210\u7684\u80fd\u529b\uff0c\u5bf9\u6bd4\u5206\u6790\u4e86\u804a\u5929\u3001\u6307\u4ee4\u548c\u57fa\u7840\u6a21\u578b\u7684\u6027\u80fd\u548c\u5c40\u9650\u6027\u3002\u901a\u8fc7\u7ec6\u81f4\u7684\u8bc4\u4f30\uff0c\u5305\u62ec\u5fae\u8c03\u5b9e\u9a8c\uff0c\u6211\u4eec\u63ed\u793a\u4e86\u5728\u7279\u5b9a\u9886\u57df\u6570\u636e\u4e0b\u7684\u54cd\u5e94\u5dee\u5f02\u3002\u7ed3\u8bba\u662f\uff0c\u5bf9\u4e8e\u6267\u884c\u8fd9\u9879\u4efb\u52a1\uff0c\u5982\u679c\u80fd\u907f\u514d\u56e0\u5b89\u5168\u987e\u8651\u800c\u62d2\u7edd\u751f\u6210\uff0c\u804a\u5929\u5bfc\u5411\u7684ZS\u6a21\u578b\u53ef\u80fd\u662f\u6700\u4f73\u9009\u62e9\u3002|\n", "2406.15214": "|**2024-06-21**|**Unsupervised Extraction of Dialogue Policies from Conversations**|Makesh Narsimhan Sreedhar et.al.|[2406.15214](http://arxiv.org/abs/2406.15214)|null|
\u5bf9\u8bdd\u7b56\u7565\u5728\u6784\u5efa\u4efb\u52a1\u5bfc\u5411\u7684\u5bf9\u8bdd\u7cfb\u7edf\u4e2d\u81f3\u5173\u91cd\u8981\uff0c\u4f46\u5176\u5f00\u53d1\u548c\u7ef4\u62a4\u5f80\u5f80\u9700\u8981\u5bf9\u8bdd\u5efa\u6a21\u4e13\u5bb6\u7684\u5927\u91cf\u6295\u5165\u3002\u5c3d\u7ba1\u5728\u8bb8\u591a\u60c5\u51b5\u4e0b\uff0c\u624b\u5934\u6709\u5927\u91cf\u7684\u5bf9\u8bdd\u6570\u636e\uff0c\u4f46\u4eba\u4eec\u7f3a\u4e4f\u6709\u6548\u7684\u65b9\u6cd5\u4ece\u8fd9\u4e9b\u6570\u636e\u4e2d\u63d0\u53d6\u5bf9\u8bdd\u7b56\u7565\u3002\u4e3a\u6b64\uff0c\u672c\u6587\u901a\u8fc7\u5c55\u793a\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5982\u4f55\u5728\u5bf9\u8bdd\u6570\u636e\u8f6c\u5316\u4e3a\u7edf\u4e00\u7684\u4e2d\u95f4\u8868\u793a\u2014\u2014\u89c4\u8303\u5f62\u5f0f\u7684\u8fc7\u7a0b\u4e2d\u53d1\u6325\u4f5c\u7528\uff0c\u586b\u8865\u4e86\u8fd9\u4e00\u7a7a\u767d\u3002\u63a5\u7740\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u5229\u7528\u53ef\u63a7\u4e14\u53ef\u89e3\u91ca\u7684\u56fe\u57fa\u65b9\u6cd5\u751f\u6210\u5bf9\u8bdd\u7b56\u7565\u7684\u6280\u672f\u3002\u901a\u8fc7\u5c06\u5bf9\u8bdd\u4e2d\u7684\u89c4\u8303\u5f62\u5f0f\u6574\u5408\u6210\u6d41\u7a0b\u7f51\u7edc\uff0c\u6211\u4eec\u53d1\u73b0\u8fd0\u884c\u56fe\u904d\u5386\u7b97\u6cd5\u6709\u52a9\u4e8e\u63d0\u53d6\u5bf9\u8bdd\u6d41\u7a0b\u3002\u76f8\u6bd4\u4ec5\u4f9d\u8d56LLM\u63d0\u53d6\u7684\u6d41\u7a0b\uff0c\u8fd9\u4e9b\u6d41\u7a0b\u66f4\u597d\u5730\u53cd\u6620\u4e86\u5e95\u5c42\u4ea4\u4e92\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u65e8\u5728\u8d4b\u4e88\u5bf9\u8bdd\u8bbe\u8ba1\u8005\u66f4\u5927\u7684\u63a7\u5236\u529b\uff0c\u63d0\u4f9b\u4e00\u4e2a\u63d0\u5347\u5bf9\u8bdd\u7b56\u7565\u5f00\u53d1\u6548\u7387\u7684\u5de5\u5177\u3002|\n", "2406.15209": "|**2024-06-21**|**Prompting Whisper for QA-driven Zero-shot End-to-end Spoken Language Understanding**|Mohan Li et.al.|[2406.15209](http://arxiv.org/abs/2406.15209)|null|
Zero-shot spoken language understanding (SLU) enables systems to understand user utterances in new domains without prior training data. Current research often relies on large language models (LLMs), which leads to large storage footprints and added complexity. This paper proposes using Whisper, a standalone speech processing model, for zero-shot end-to-end (E2E) SLU. To handle unseen semantic labels, we cast SLU tasks into a question-answering (QA) framework and prompt the Whisper decoder for semantic inference. We train the system efficiently with prefix-tuning, optimizing only a small number of parameters rather than the entire Whisper model. Experiments show that the proposed system improves slot filling (SLU-F1) on SLURP by 40.7% over a recently introduced zero-shot baseline. In both in-domain and cross-domain evaluation settings, it also performs on par with a modular system based on Whisper-GPT-2 while using 34.8% fewer model parameters.|\n", "2406.15198": "|**2024-06-21**|**Exploring the Efficacy of Robotic Assistants with ChatGPT and Claude in Enhancing ADHD Therapy: Innovating Treatment Paradigms**|Santiago Berrezueta-Guzman 
et.al.|[2406.15198](http://arxiv.org/abs/2406.15198)|null|Attention deficit hyperactivity disorder (ADHD) is a neurodevelopmental disorder characterized by inattention, hyperactivity, and impulsivity, which can seriously affect an individual's daily life and quality of life. Occupational therapy plays a key role in managing ADHD by fostering the skills needed for daily living and improving an individual's ability to participate fully in school, home, and social settings. Recent studies highlight the potential of large language models (LLMs) such as ChatGPT, and of socially assistive robots, in psychological treatment to overcome the limitations of existing therapies by providing tailored support that adapts to each individual's needs. There is still a significant gap, however, in research on the combined use of these technologies in ADHD therapy. We therefore integrate two advanced language models, ChatGPT-4 Turbo and Claude-3 Opus, into a robotic assistant to examine their performance in robot-assisted interactions, and we compare them with a clinically validated customized model in a simulated therapy scenario. The results show that ChatGPT-4 
Turbo excels in performance and responsiveness, making it suitable for time-sensitive applications. Claude-3 Opus, by contrast, shows strengths in understanding, coherence, and ethical considerations, prioritizing safe and engaging interactions. Both models demonstrate innovation and adaptability, but ChatGPT-4 Turbo has the advantage in ease of integration and language support. The choice between them depends on the specific needs of the ADHD therapy.|\n", "2406.15187": "|**2024-06-21**|**UDA: A Benchmark Suite for Retrieval Augmented Generation in Real-world Document Analysis**|Yulong Hui et.al.|[2406.15187](http://arxiv.org/abs/2406.15187)|**[link](https://github.com/qinchuanhui/uda-benchmark)**|**Although retrieval-augmented generation (RAG) has improved how large language models (LLMs) work with external data, significant challenges remain in real-world scenarios. In fields such as academic literature and financial question answering, data often takes the form of lengthy, structurally complex text and tables in HTML or PDF. We therefore introduce a new benchmark named \u201cUnstructured Document 
Analysis\u201d (UDA), comprising 2,965 real-world documents and 29,590 expert-annotated question-answer pairs. We revisit the design decisions of LLM- and RAG-based approaches to document analysis, and evaluate answer quality and strategies across multiple document domains and diverse query types. Our evaluation yields interesting findings and highlights the importance of data parsing and retrieval. We hope this benchmark will inform and support real-world document analysis applications. The benchmark suite and code are publicly available.**|\n", "2406.16858": "|**2024-06-24**|**EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees**|Yuhui Li 
et.al.|[2406.16858](http://arxiv.org/abs/2406.16858)|**[link](https://github.com/safeailab/eagle)**|Inference with modern large language models (LLMs) is expensive and time-consuming. Speculative sampling methods such as EAGLE have proven effective. Conventional methods assume that the acceptance rate in the draft tree depends only on token position, but we find that it also depends on context. Building on EAGLE, we propose EAGLE-2, which introduces a new context-aware dynamic draft tree technique into drafting. This improvement exploits the fact that EAGLE's draft model is well calibrated: the draft model's confidence scores approximate acceptance rates with small error. We ran extensive evaluations on three series of LLMs and six tasks; EAGLE-2 achieves speedup ratios of 3.05x to 4.26x, which is 20% to 40% faster than EAGLE-1. EAGLE-2 also leaves the distribution of the generated text unchanged, making it a lossless acceleration algorithm.|\n", "2406.16838": "|**2024-06-24**|**From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models**|Sean Welleck 
et.al.|[2406.16838](http://arxiv.org/abs/2406.16838)|null|One of the most striking findings in modern research is that scaling up compute during the training of large language models (LLMs) leads to better performance. Less attention, however, has been paid to methods applied at inference time. This survey focuses on these inference-time approaches. Under a unified mathematical framework, we examine three areas: token-level generation algorithms, meta-generation algorithms, and efficient generation. Token-level generation algorithms, often called decoding algorithms, operate by sampling one token at a time or by constructing a token-level search space and then selecting an output. These methods typically assume access to a language model's logits, next-token distributions, or probability scores. Meta-generation algorithms operate on partial or full sequences, incorporating domain knowledge, enabling backtracking, and integrating external information. Efficient generation methods aim to reduce token costs and improve generation speed. Our survey unifies perspectives from three research communities: traditional natural language processing, modern LLMs, and machine learning systems.|\n", "2406.16833": "|**2024-06-24**|**USDC: A Dataset of $\\underline{U}$ser $\\underline{S}$tance and $\\underline{D}$ogmatism in Long $\\underline{C}$onversations**|Mounika 
Marreddy et.al.|[2406.16833](http://arxiv.org/abs/2406.16833)|null|Identifying users' opinions and stances in long conversation threads on various topics is extremely important for personalization, market research, political campaigns, customer service, conflict resolution, targeted advertising, and content moderation. Manually annotating data to train such models is challenging, however: it is time-consuming and costly, long conversations can introduce noise, and subtle shifts in a user's opinion can be hard to interpret. Given the strong performance of large language models (LLMs) on complex natural language processing tasks, this paper leverages Mistral 
Large and GPT-4 to automate the annotation process for two key tasks while also providing reasoning: user stance classification, which labels the stance of a user's post in a conversation on a five-point scale; and user dogmatism classification, which captures a user's overall opinion across the conversation on a four-point scale. We build the USDC dataset by applying majority voting over zero-shot, one-shot, and few-shot annotations on 764 multi-user Reddit conversations. We then use USDC to fine-tune and instruction-tune several small deployable language models for the five-class stance and four-class dogmatism classification tasks. Our code and dataset are publicly available: [https://anonymous.4open.science/r/USDC-0F7F].|\n", "2406.16828": "|**2024-06-24**|**Ragnar\u00f6k: A Reusable RAG Framework and Baselines for TREC 2024 Retrieval-Augmented Generation Track**|Ronak Pradeep et.al.|[2406.16828](http://arxiv.org/abs/2406.16828)|**[link](https://github.com/castorini/ragnarok)**|## Background Have you tried the new Bing Search or Google 
AI Overviews? They reflect how search engines are evolving into systems based on retrieval-augmented generation (RAG). Such systems integrate real-time data into large language models (LLMs) to provide well-informed, attributed, and concise summaries, in contrast to the traditional paradigm of displaying ranked documents. To drive innovation in the evaluation of RAG systems, we propose the TREC 2024 RAG Track. This paper describes the steps we took to make the track a reality: we describe the design of our reusable framework Ragnar\u00f6k, explain the choice of the MS MARCO V2.1 corpus, release the development topics for the track, and standardize the I/O definitions to assist end users. Next, using Ragnar\u00f6k, we present key industrial baselines such as OpenAI's GPT-4o and Cohere's Command R+. We also introduce a web interface for interactively comparing the performance of different RAG systems, with evaluation through crowdsourcing. We open-source the Ragnar\u00f6k framework and baselines to establish a unified standard for future RAG systems.|\n", "2406.16801": "|**2024-06-24**|**RES-Q: Evaluating Code-Editing Large Language Model Systems at the Repository Scale**|Beck LaBash et.al.|[2406.16801](http://arxiv.org/abs/2406.16801)|**[link](https://github.com/qurrent-ai/res-q)**|**
The instruction-following ability of large language models (LLMs) has given rise to a class of systems that can handle complex tasks such as editing large code repositories. Because LLM behavior is highly sensitive and unpredictable under changes to prompting, robust evaluation tools are needed to drive future progress in these systems. We propose RES-Q, a natural-language-instruction benchmark for $\\textbf{R}$epository $\\textbf{E}$diting $\\textbf{S}$ystems, consisting of 100 repository-editing tasks derived from 100 real GitHub commits. Given an edit instruction and a code repository, RES-Q evaluates an LLM system's ability to gather information and construct an edit that satisfies the instruction. We argue that this form of evaluation improves on traditional approaches and assesses a model's abilities more comprehensively. We build a repository-editing system with the Qurrent OS language-agent development software and evaluate various state-of-the-art LLMs in it, such as Claude Sonnet 3.5 and GPT-4o. Despite only a 1% difference in pass@1 on HumanEval, on RES-Q Claude Sonnet 
3.5 outperforms GPT-4o by 12% pass@1, which suggests that RES-Q can differentiate model capability and offer deeper insight as traditional benchmarks approach saturation. We also study token efficiency, performance relationships with existing benchmarks, and interesting differences between closed-source and open-source LLMs. The code and dataset are available at https://github.com/Qurrent-AI/RES-Q.**|\n", "2406.16797": "|**2024-06-24**|**Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs**|Ashwinee Panda et.al.|[2406.16797](http://arxiv.org/abs/2406.16797)|**[link](https://github.com/kiddyboots216/lottery-ticket-adaptation)**|**## Background Existing methods for adapting large language models (LLMs) to new tasks are not suited to multi-task adaptation because they modify all the model weights, causing destructive interference between tasks. This can lead to forgetting of earlier tasks and makes it difficult to obtain good performance on multiple tasks at the same time. To address this, we propose Lottery Ticket Adaptation (LoTA), a sparse adaptation method that identifies and optimizes only a sparse subnetwork of the model. We evaluate LoTA on challenging tasks including instruction following, reasoning, math, and summarization. ## Method 
LoTA works by discovering and optimizing \u201clottery tickets\u201d (or sparse task vectors), and it outperforms both full fine-tuning and low-rank adaptation (LoRA). Beyond better performance, LoTA maintains good performance even after training on other tasks, thereby avoiding catastrophic forgetting. By extracting lottery tickets and fine-tuning them for specific tasks, LoTA also enables model merging across highly dissimilar tasks. ## Conclusion Overall, LoTA is an effective sparse adaptation strategy that offers a new solution for multi-task adaptation of large language models, maintaining stable and efficient performance across multiple tasks.**|\n", "2406.16783": "|**2024-06-24**|**M2Lingual: Enhancing Multilingual, Multi-Turn Instruction Alignment in Large Language Models**|Rishabh Maheshwary et.al.|[2406.16783](http://arxiv.org/abs/2406.16783)|null|## Background In aligning large language models (LLMs) to follow instructions, instruction fine-tuning (finetuning, 
IFT) is critical. Several effective IFT datasets have been proposed recently, but most focus on high-resource languages such as English. In this work we propose M2Lingual, a fully synthetic, multilingual, multi-turn instruction fine-tuning dataset guided by an Evol taxonomy, with the goal of improving LLMs across diverse languages and tasks. M2Lingual contains 182,000 IFT pairs built on diverse seeds, covering 70 languages, 17 NLP tasks, and general instruction-response pairs. ## Purpose and Contributions LLMs fine-tuned with M2Lingual substantially outperform those trained on most existing multilingual IFT datasets. More importantly, models fine-tuned with M2Lingual show robust cross-lingual ability across a wide range of evaluation benchmarks, both on our translated multilingual, multi-turn evaluation benchmark and on a variety of multilingual tasks. We thus contribute the two-step Evol taxonomy method and release the M2Lingual dataset: https://huggingface.co/datasets/ServiceNow-AI/M2Lingual.|\n", "2406.16779": "|**2024-06-24**|**It Is Not About What You Say, It Is About How You Say It: A Surprisingly Simple Approach for Improving Reading Comprehension**|Sagi Shaier 
et.al.|[2406.16779](http://arxiv.org/abs/2406.16779)|null|Natural language processing has made significant progress over the past decade. Some practices, however, became established without proper evaluation. Focusing on reading comprehension, we first ask: 1) How does the order of inputs (i.e., question and context) affect model performance? Given recent advances in input emphasis, we then ask: 2) Does emphasizing the question, the context, or both improve performance? Testing 9 large language models on 3 datasets, we find that presenting the context before the question improves model performance, with accuracy gains of up to 31%. Furthermore, emphasizing the context works better than emphasizing the question, and emphasizing targeted parts of the input is especially effective for questions that the model lacks the parametric knowledge to answer. After experimenting with prompt-based and attention-based emphasis methods, we find that the most effective strategy is surprisingly simple: appending a few tokens to the input yields accuracy gains of up to 36%, allowing smaller models to outperform their much larger counterparts.|\n", "2406.16777": "|**2024-06-24**|**Blending LLMs into 
Cascaded Speech Translation: KIT's Offline Speech Translation System for IWSLT 2024**|Sai Koneru et.al.|[2406.16777](http://arxiv.org/abs/2406.16777)|null|## Background Large language models (LLMs) are being widely explored for tasks such as automatic speech recognition (ASR), machine translation (MT), and even end-to-end speech translation (ST). This paper presents KIT's offline submission in the constrained + LLM track, in which we improve a cascaded speech translation system by incorporating recently proposed techniques. Specifically, we integrate Mistral-7B (mistralai/Mistral-7B-Instruct-v0.1) into our system to enhance it in two ways: first, we refine the ASR output using the N-best lists generated by our system, fine-tuning the LLM to improve transcription accuracy; second, we refine the MT output at the document level, using both ASR and MT predictions to improve translation quality. The results show that integrating the LLM yields an absolute improvement of 0.3% in word error rate for ASR and 0.65% in COMET for MT. On challenging test sets with overlapping speakers and background noise, however, integrating the LLM brings little benefit because ASR performance is poor. To make better use of context that might otherwise be unavailable in such cases, we use ASR with chunked long-form decoding.|\n", "2406.16768": 
"|**2024-06-24**|**WARP: On the Benefits of Weight Averaged Rewarded Policies**|Alexandre Ram\u00e9 et.al.|[2406.16768](http://arxiv.org/abs/2406.16768)|null|Reinforcement learning from human feedback (RLHF) aligns large language models (LLMs) using a trained reward model so that their generations match human preferences. To preserve pre-trained knowledge, RLHF usually applies KL regularization, but this constrains reward optimization. To address this, we propose a novel alignment strategy named Weight Averaged Rewarded Policies (WARP). WARP merges policies in weight space at three stages. First, it uses the exponential moving average of the policy as a dynamic anchor for the KL regularization. Second, it applies spherical interpolation to merge independently fine-tuned policies into an enhanced model. Third, it linearly interpolates between the merged model and the initial model to recover features from pre-training. The procedure is applied iteratively, with each iteration's final model serving as an advanced initialization for the next, progressively refining the KL-reward trade-off and achieving higher reward at a fixed KL. Experiments with GEMMA policies validate the benefits of WARP, with quality and alignment exceeding those of open-source LLMs.|\n", "2406.17770": "|**2024-06-25**|**MG-LLaVA: Towards Multi-Granularity 
Visual Instruction Tuning**|Xiangyu Zhao et.al.|[2406.17770](http://arxiv.org/abs/2406.17770)|**[link](https://github.com/phoenixz810/mg-llava)**|**## Background Multimodal large language models (MLLMs) have made significant progress on visual understanding tasks. Most of them, however, are limited to processing low-resolution images, which restricts their effectiveness on perception tasks that require detailed visual information. We propose MG-LLaVA, an innovative MLLM that strengthens visual processing by incorporating a multi-granularity vision flow comprising low-resolution, high-resolution, and object-level features. We design an additional high-resolution visual encoder to capture fine-grained details, which are fused with the base visual features through a Conv-Gate fusion network. To further improve the model's object recognition, we incorporate object-level features derived from bounding boxes identified by offline detectors. Trained only on publicly available multimodal data through instruction tuning, MG-LLaVA demonstrates exceptional perception skills. We instantiate MG-LLaVA with language encoders of various sizes, from 3.8B to 34B parameters, to evaluate its performance comprehensively. 
Results on multiple benchmarks show that MG-LLaVA outperforms existing MLLMs of comparable parameter size, demonstrating its efficacy. The code will be open-sourced at https://github.com/PhoenixZ810/MG-LLaVA.**|\n", "2406.17764": "|**2024-06-25**|**BMIKE-53: Investigating Cross-Lingual Knowledge Editing with In-Context Learning**|Ercong Nie et.al.|[2406.17764](http://arxiv.org/abs/2406.17764)|null|## Background Large language models (LLMs) hold extensive parametric knowledge, but this knowledge is difficult to update because retraining is very expensive and infeasible for closed-source models. Knowledge editing (KE) has emerged as a viable solution for updating an LLM's knowledge without compromising its overall performance. On-the-fly KE methods based on in-context learning (ICL) have shown great promise and allow LLMs to be treated as black boxes. In the past, KE focused mainly on English, while the potential for cross-lingual KE in current English-centric LLMs has not been fully explored. To foster more research in this direction, we introduce the BMIKE-53 benchmark, which evaluates cross-lingual KE on 53 diverse languages across three KE task types. We also propose a gradient-free KE method, Multilingual In-context Knowledge Editing (MIKE), and evaluate it on BMIKE-53. 
Our evaluation focuses on cross-lingual knowledge transfer in terms of reliability, generality, locality, and portability, offering valuable insights and a framework for future research in cross-lingual KE. Our code and data are publicly accessible via the anonymous repository at https://anonymous.4open.science/r/MIKE.|\n", "2406.17761": "|**2024-06-25**|**CaLMQA: Exploring culturally specific long-form question answering across 23 languages**|Shane Arora et.al.|[2406.17761](http://arxiv.org/abs/2406.17761)|**[link](https://github.com/2015aroras/calmqa)**|**## Background Large language models (LLMs) are widely used for long-form question answering, which requires them to generate paragraph-length answers to complex questions. While long-form QA has been well studied in English through many datasets and evaluation metrics, research in other languages remains comparatively scarce. To bridge this gap, we introduce CaLMQA, a collection of 2,600 complex questions spanning 23 languages, including under-resourced, rarely studied languages such as Fijian and Kirundi. Our dataset includes both naturally occurring questions collected from community web forums and questions written by native speakers whom we hired for this purpose. 
This process yields diverse, complex questions that reflect cultural topics (e.g., traditions, laws, news) and the language usage of native speakers. We conduct automatic evaluation across a suite of open- and closed-source models using CaLMScore, our novel metric that detects incorrect language and token repetitions in answers. The results show that the quality of LLM-generated answers degrades significantly for some low-resource languages. Human evaluation on a subset of models shows that performance is significantly worse on culturally specific questions than on culturally agnostic ones. These findings highlight the need for further research into LLM multilingual capabilities and the evaluation of non-English long-form QA.**|\n", "2406.17755": "|**2024-06-25**|**Accelerating Clinical Evidence Synthesis with Large Language Models**|Zifeng Wang 
et.al.|[2406.17755](http://arxiv.org/abs/2406.17755)|null|\u4eba\u5de5\u667a\u80fd\u81ea\u52a8\u533b\u5b66\u53d1\u73b0\u662f\u8bb8\u591a\u4eba\u7684\u68a6\u60f3\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u540d\u4e3aTrialMind\u7684\u751f\u6210\u5f0fAI\u7ba1\u9053\uff0c\u65e8\u5728\u8fdb\u884c\u533b\u5b66\u7cfb\u7edf\u6027\u56de\u987e\uff0c\u6db5\u76d6\u7814\u7a76\u641c\u7d22\u3001\u7b5b\u9009\u548c\u6570\u636e\u63d0\u53d6\u9636\u6bb5\u3002\u8be5\u7cfb\u7edf\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u9a71\u52a8\u6bcf\u4e2a\u73af\u8282\uff0c\u5e76\u5f15\u5165\u4e13\u5bb6\u76d1\u7763\u4ee5\u51cf\u5c11\u9519\u8bef\u3002\u4e3a\u4e86\u8bc4\u4f30\u6027\u80fd\uff0c\u6211\u4eec\u521b\u5efa\u4e86TrialReviewBench\u57fa\u51c6\u6570\u636e\u96c6\uff0c\u5b83\u662f\u4e00\u4e2a\u5b9a\u5236\u7684\u5305\u542b870\u4efd\u6765\u81ea25\u7bc7\u5143\u5206\u6790\u8bba\u6587\u7684\u4e34\u5e8a\u7814\u7a76\u6807\u6ce8\u6570\u636e\uff0c\u6db5\u76d6\u4e0d\u540c\u533b\u7597\u6cbb\u7597\u9886\u57df\u3002\u7ed3\u679c\u663e\u793a\uff0cTrialMind\u663e\u8457\u63d0\u5347\u4e86\u6587\u732e\u5ba1\u67e5\u6548\u7387\uff0c\u5728\u4ece\u8d85\u8fc72000\u4e07\u7bc7PubMed\u7814\u7a76\u4e2d\u68c0\u7d22\u76f8\u5173\u7814\u7a76\u65f6\uff0c\u53ec\u56de\u7387\u9ad8\u8fbe0.897\u81f31.000\u3002\u5728\u7b5b\u9009\u9636\u6bb5\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u4f18\u4e8e\u57fa\u4e8e\u4f20\u7edf\u8bed\u8a00\u6a21\u578b\u5d4c\u5165\u7684\u65b9\u6cd5\uff08\u53ec\u56de\u7387\u5206\u522b\u4e3a0.227-0.246 vs. 
0.000-0.102\uff09\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u7ed3\u679c\u63d0\u53d6\u65b9\u9762\u8d85\u8d8a\u4e86\u76f4\u63a5\u4f7f\u7528GPT-4\u7684\u8868\u73b0\uff0c\u51c6\u786e\u7387\u8303\u56f4\u4e3a0.65\u52300.84\u3002\u6211\u4eec\u8fd8\u652f\u6301\u68ee\u6797\u56fe\u4e2d\u7684\u4e34\u5e8a\u8bc1\u636e\u7efc\u5408\uff0c\u7ecf\u516b\u540d\u4eba\u7c7b\u6807\u6ce8\u5458\u9a8c\u8bc1\uff0c\u4ed6\u4eec\u666e\u904d\u66f4\u504f\u597dTrialMind\uff0c\u5176\u5728\u6d89\u53ca\u7684\u5ba1\u67e5\u4e2d\u80dc\u51fa\u7387\u4e3a62.5%\u81f3100%\u3002\u8fd9\u4e9b\u53d1\u73b0\u8868\u660e\uff0c\u57fa\u4e8eLLM\u7684\u4e34\u5e8a\u8bc1\u636e\u5408\u6210\u65b9\u6cd5\uff0c\u5982TrialMind\uff0c\u80fd\u591f\u4fc3\u8fdb\u53ef\u9760\u4e14\u9ad8\u8d28\u91cf\u7684\u4e34\u5e8a\u8bc1\u636e\u5408\u6210\uff0c\u4ece\u800c\u63d0\u5347\u4e34\u5e8a\u7814\u7a76\u7684\u6548\u7387\u3002|\n", "2406.17753": "|**2024-06-25**|**Measuring and Benchmarking Large Language Models' Capabilities to Generate Persuasive Language**|Amalie Brogaard Pauli 
et.al.|[2406.17753](http://arxiv.org/abs/2406.17753)|null|\u672c\u6587\u63a2\u8ba8\u4e86\u5728\u9762\u5bf9\u5927\u91cf\u8bd5\u56fe\u5f71\u54cd\u6211\u4eec\u7684\u4fe1\u606f\uff0c\u5982\u9884\u544a\u6d88\u606f\u3001\u8fa9\u8bba\u3001\u5e26\u6709\u653f\u6cbb\u8272\u5f69\u7684\u65b0\u95fb\u548c\u5ba3\u4f20\u65f6\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u751f\u6210\u5177\u6709\u8bf4\u670d\u529b\u6587\u672c\u7684\u80fd\u529b\u3002\u4e0d\u540c\u4e8e\u4ee5\u5f80\u4e13\u6ce8\u4e8e\u7279\u5b9a\u9886\u57df\u6216\u7c7b\u578b\u529d\u8bf4\u7684\u7814\u7a76\uff0c\u6211\u4eec\u8fdb\u884c\u4e86\u4e00\u9879\u5168\u9762\u7684\u5206\u6790\uff0c\u65e8\u5728\u6d4b\u91cf\u548c\u57fa\u51c6LLMs\u5728\u88ab\u660e\u786e\u8981\u6c42\u589e\u5f3a\u6216\u51cf\u5c11\u8bf4\u670d\u529b\u65f6\uff0c\u4ee5\u53ca\u4ec5\u8981\u6c42\u8fdb\u884c\u91ca\u4e49\u65f6\u4ea7\u751f\u8bf4\u670d\u6027\u6587\u672c\u7684\u7a0b\u5ea6\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u521b\u5efa\u4e86\u4e00\u4e2a\u65b0\u7684\u6570\u636e\u96c6\u2014\u2014\u201cPersuasive-Pairs\u201d\uff0c\u5305\u542b\u4e00\u7ec4\u7531\u7b80\u77ed\u6587\u672c\u548cLLM\u91cd\u5199\u4ee5\u653e\u5927\u6216\u524a\u5f31\u8bf4\u670d\u529b\u7684\u6587\u672c\u5bf9\u3002\u6211\u4eec\u5bf9\u8fd9\u4e9b\u914d\u5bf9\u8fdb\u884c\u4e86\u591a\u6807\u6ce8\uff0c\u6309\u76f8\u5bf9\u5c3a\u5ea6\u8bc4\u4f30\u5176\u8bf4\u670d\u529b\u3002\u8fd9\u4e2a\u6570\u636e\u96c6\u4e0d\u4ec5\u672c\u8eab\u5177\u6709\u4ef7\u503c\uff0c\u8fd8\u5c55\u793a\u4e86\u5982\u4f55\u4f7f\u7528\u5b83\u8bad\u7ec3\u4e00\u4e2a\u56de\u5f52\u6a21\u578b\uff0c\u9884\u6d4b\u6587\u672c\u5bf9\u4e4b\u95f4\u8bf4\u670d\u529b\u7684\u5f97\u5206\uff0c\u4ece\u800c\u80fd\u591f\u5bf9\u4e0d\u540c\u9886\u57df\u7684LLMs\u8fdb\u884c\u8bc4\u5206\u548c\u6bd4\u8f83\u3002\u6700\u540e\uff0c\u6211\u4eec\u8ba8\u8bba\u4e86\u4e0d\u540c\u7cfb\u7edf\u63d0\u793a\u5bf9LLaMA3\u4ea7\u751f\u7684\u5f71\u54cd\uff0c\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u5373\u4f7f\u5728\u4ec5\u8981\u6c42\u91ca\u4e49\u7684\u60c5\u51b5\u4
e0b\uff0c\u4e0d\u540c\u7684\u201c\u89d2\u8272\u201d\u63d0\u793a\u4e5f\u4f1a\u663e\u8457\u6539\u53d8\u6587\u672c\u4e2d\u7684\u8bf4\u670d\u529b\u3002\u8fd9\u4e9b\u53d1\u73b0\u5f3a\u8c03\u4e86\u7814\u7a76LLM\u751f\u6210\u6587\u672c\u4e2d\u7684\u8bf4\u670d\u8bed\u8a00\u7684\u91cd\u8981\u6027\u3002|\n", "2406.17737": "|**2024-06-25**|**LLM Targeted Underperformance Disproportionately Impacts Vulnerable Users**|Elinor Poole-Dayan et.al.|[2406.17737](http://arxiv.org/abs/2406.17737)|null|\u5728\u6700\u65b0\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5c55\u73b0\u51fa\u5353\u8d8a\u6027\u80fd\u7684\u540c\u65f6\uff0c\u5173\u4e8e\u5b83\u4eec\u7684\u4e0d\u53ef\u9760\u884c\u4e3a\uff0c\u5982\u865a\u6784\u548c\u504f\u89c1\u7684\u7814\u7a76\u5c42\u51fa\u4e0d\u7a77\u3002\u672c\u7814\u7a76\u63a2\u8ba8\u4e86LLMs\u7684\u56de\u7b54\u8d28\u91cf\u5728\u4fe1\u606f\u51c6\u786e\u6027\u3001\u771f\u5b9e\u6027\u4ee5\u53ca\u62d2\u7edd\u56de\u7b54\u65b9\u9762\uff0c\u5982\u4f55\u968f\u7740\u4e09\u79cd\u7528\u6237\u7279\u5f81\u7684\u53d8\u5316\u800c\u53d8\u5316\uff1a\u82f1\u8bed\u6c34\u5e73\u3001\u6559\u80b2\u7a0b\u5ea6\u548c\u56fd\u7c4d\u3002\u6211\u4eec\u5728\u4e09\u4e2a\u6700\u5148\u8fdb\u7684LLMs\u548c\u4e24\u4e2a\u4e8b\u5b9e\u6838\u67e5\u76f8\u5173\u7684\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u4e86\u8be6\u5c3d\u5b9e\u9a8c\uff0c\u91cd\u70b9\u5173\u6ce8\u5176\u771f\u5b9e\u6027\u3002\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0c\u5f53\u524d\u6700\u5148\u8fdb\u7684LLMs\u5bf9\u82f1\u8bed\u80fd\u529b\u8f83\u4f4e\u3001\u6559\u80b2\u6c34\u5e73\u8f83\u4f4e\u4ee5\u53ca\u975e\u7f8e\u56fd\u7c4d\u7528\u6237\u7684\u56de\u7b54\u8d28\u91cf\u5b58\u5728\u66f4\u660e\u663e\u7684\u8d1f\u9762\u503e\u5411\uff0c\u8fd9\u4f7f\u5f97\u8fd9\u4e9b\u6a21\u578b\u5bf9\u4e8e\u5176\u6700\u5f31\u52bf\u7528\u6237\u6765\u8bf4\uff0c\u5e76\u975e\u53ef\u9760\u7684\u4fe1\u606f\u6765\u6e90\u3002|\n", "2406.17706": "|**2024-06-25**|**FedBiOT: LLM Local Fine-tuning in Federated Learning without Full Model**|Feijie Wu 
et.al.|[2406.17706](http://arxiv.org/abs/2406.17706)|**[link](https://github.com/HarliWu/FedBiOT)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u7ecf\u8fc7\u9002\u5f53\u9886\u57df\u7279\u5b9a\u6570\u636e\u7684\u5fae\u8c03\u540e\uff0c\u5728\u8bb8\u591a\u4efb\u52a1\u4e0a\u5c55\u73b0\u51fa\u51fa\u8272\u6027\u80fd\u3002\u7136\u800c\uff0c\u8fd9\u7c7b\u4e13\u7528\u6570\u636e\u901a\u5e38\u5206\u5e03\u5728\u591a\u4e2a\u6240\u6709\u8005\u4e4b\u95f4\uff0c\u8fd9\u5c31\u63d0\u51fa\u4e86\u5982\u4f55\u5728\u8054\u90a6\u5b66\u4e60\uff08FL\uff09\u4e2d\u8fdb\u884cLLM\u5fae\u8c03\u7684\u95ee\u9898\u3002\u9762\u5bf9\u6709\u9650\u7684\u8ba1\u7b97\u548c\u901a\u4fe1\u80fd\u529b\uff0cFL\u5ba2\u6237\u7aef\u5728\u6709\u6548\u5fae\u8c03\u5927\u578b\u8bed\u8a00\u6a21\u578b\u65f6\u9762\u4e34\u6311\u6218\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u4ecb\u7ecd\u4e86FedBiOT\uff0c\u4e00\u79cd\u65e8\u5728\u63d0\u9ad8\u8d44\u6e90\u6548\u7387\u7684LLM\u5fae\u8c03FL\u65b9\u6cd5\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5305\u62ec\u670d\u52a1\u5668\u751f\u6210\u4e00\u4e2a\u538b\u7f29\u7684LLM\uff0c\u5e76\u786e\u4fdd\u5176\u6027\u80fd\u4e0e\u5b8c\u6574\u6a21\u578b\u76f8\u5f53\u3002\u7136\u540e\uff0c\u5ba2\u6237\u7aef\u9488\u5bf9\u8fd9\u4e2a\u538b\u7f29\u6a21\u578b\u7684\u4e00\u4e2a\u8f7b\u91cf\u4f46\u91cd\u8981\u7684\u90e8\u5206\u2014\u2014\u9002\u914d\u5668\u8fdb\u884c\u5fae\u8c03\u3002\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u7531\u4e8e\u670d\u52a1\u5668\u65e0\u6cd5\u8bbf\u95ee\u5ba2\u6237\u7aef\u62e5\u6709\u7684\u79c1\u4eba\u6570\u636e\uff0c\u670d\u52a1\u5668\u7528\u4e8e\u6821\u51c6\u7684\u6570\u636e\u5206\u5e03\u4e0e\u5ba2\u6237\u7aef\u7528\u4e8e\u5fae\u8c03\u7684\u6570\u636e\u4e0d\u540c\u3002\u6211\u4eec\u5c06\u95ee\u9898\u5efa\u6a21\u4e3a\u4e00\u4e2a\u5e26\u6709\u6570\u636e\u4e0d\u4e00\u81f4\u6027\u5f71\u54cd\u7684 bilevel 
\u4f18\u5316\u95ee\u9898\uff0c\u5e76\u5bfc\u51fa\u4e86\u670d\u52a1\u5668\u548c\u5ba2\u6237\u7aef\u7684\u66f4\u65b0\u89c4\u5219\u3002\u6211\u4eec\u5728 LLaMA-2 \u4e0a\u8fdb\u884c\u4e86\u5e7f\u6cdb\u5b9e\u9a8c\uff0c\u7ed3\u679c\u663e\u793a\uff0c\u9002\u914d\u5668\u5728\u91cd\u65b0\u6574\u5408\u5230\u5168\u5c40\u8bed\u8a00\u6a21\u578b\u65f6\u8868\u73b0\u51fa\u8272\u3002\u5b9e\u9a8c\u7ed3\u679c\u8fd8\u8868\u660e\uff0cFedBiOT \u76f8\u6bd4\u73b0\u6709\u57fa\u51c6\u663e\u8457\u51cf\u5c11\u4e86\u8d44\u6e90\u6d88\u8017\uff0c\u540c\u65f6\u4fdd\u6301\u4e86\u76f8\u8fd1\u7684\u6027\u80fd\u6c34\u5e73\u3002|\n", "2406.17692": "|**2024-06-25**|**From Distributional to Overton Pluralism: Investigating Large Language Model Alignment**|Thom Lake et.al.|[2406.17692](http://arxiv.org/abs/2406.17692)|**[link](https://github.com/thomlake/investigating-alignment)**|**\u8be5\u7814\u7a76\u5206\u6790\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7ecf\u8fc7\u6821\u51c6\u540e\u8f93\u51fa\u5206\u5e03\u7684\u53d8\u5316\u7279\u6027\u3002\u9996\u5148\uff0c\u91cd\u65b0\u8bc4\u4f30\u4e86\u4e4b\u524d\u5173\u4e8e\u6821\u51c6\u540e\u54cd\u5e94\u591a\u6837\u6027\u964d\u4f4e\u7684\u62a5\u544a\uff0c\u53d1\u73b0\u8fd9\u79cd\u4e0b\u964d\u4e3b\u8981\u5f52\u56e0\u4e8e\u8d28\u91cf\u63a7\u5236\u548c\u4fe1\u606f\u6574\u5408\u3002\u6821\u51c6\u80fd\u591f\u6291\u5236\u4e0d\u76f8\u5173\u548c\u65e0\u5e2e\u52a9\u7684\u5185\u5bb9\uff0c\u540c\u65f6\u4f7f\u8f93\u51fa\u5206\u5e03\u503e\u5411\u4e8e\u66f4\u957f\u7684\u3001\u6db5\u76d6\u591a\u4e2a\u57fa\u7840LLM\u54cd\u5e94\u4fe1\u606f\u7684\u7b54\u6848\uff0c\u5b9e\u8d28\u4e0a\u662f\u5c06\u591a\u6837\u5316\u4fe1\u606f\u6c47\u603b\u5728\u5355\u4e2a\u54cd\u5e94\u4e2d\u3002\u7814\u7a76\u5e76\u672a\u53d1\u73b0\u6821\u51c6\u663e\u8457\u51cf\u5c11\u6709\u7528\u4fe1\u606f\uff0c\u8fdb\u800c\u5f15\u51fa\u95ee\u9898\uff1a\u6821\u51c6\u6a21\u578b\u662f\u5426\u4f1a\u4ea7\u751f\u57fa\u7840\u6a21\u578b\u65e0\u6cd5\u518d\u73b0\u7684\u4fe1\u606f\uff1f\u7b2c\u4e8c\u90e8\u5206\u7
684\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0c\u60c5\u51b5\u5e76\u975e\u5982\u6b64\uff0c\u6821\u51c6\u6a21\u578b\u7684\u884c\u4e3a\u53ef\u4ee5\u901a\u8fc7\u57fa\u7840\u6a21\u578b\u5728\u65e0\u9700\u5fae\u8c03\u7684\u60c5\u51b5\u4e0b\u8fdb\u884c\u590d\u73b0\u3002\u901a\u8fc7\u4e0a\u4e0b\u6587\u793a\u4f8b\u548c\u8f83\u4f4e\u5206\u8fa8\u7387\u7684\u8bed\u4e49\u63d0\u793a\uff0c\u53ef\u4ee5\u4ece\u57fa\u7840LLMs\u5f15\u5bfc\u51fa\u4e0e\u6821\u51c6\u540e\u7684\u76f8\u4f3c\u54cd\u5e94\uff0c\u751a\u81f3\u4e0e\u6821\u51c6\u540e\u7684\u54cd\u5e94\u4e4b\u95f4\u7684\u76f8\u4f3c\u5ea6\u63a5\u8fd1\u3002\u8fd9\u4e9b\u53d1\u73b0\u652f\u6301\u201c\u8868\u9762\u6821\u51c6\u5047\u8bbe\u201d\uff0c\u5373\u5f53\u524d\u7684\u6821\u51c6\u6280\u672f\u4ec5\u6355\u6349\u4e86\u52a9\u624b\u578b\u57fa\u7840LLM\u884c\u4e3a\u4e2d\u6709\u7528\u7684\u90e8\u5206\uff0c\u5e76\u672a\u6269\u5c55\u5176\u80fd\u529b\u3002\u6b64\u5916\uff0c\u5b83\u4eec\u8fd8\u663e\u793a\uff0c\u57fa\u4e8e\u4e0a\u4e0b\u6587\u7684\u6821\u51c6\u4f5c\u4e3a\u4e00\u79cd\u6a21\u4eff\u6821\u51c6LLMs\u7684\u7b56\u7565\uff0c\u6548\u679c\u51fa\u4eba\u610f\u6599\u5730\u597d\uff0c\u4e14\u65e0\u9700\u5fae\u8c03\u3002\u7814\u7a76\u4ee3\u7801\u548c\u6570\u636e\u53ef\u5728\u83b7\u53d6\u3002**|\n", "2406.17681": "|**2024-06-25**|**VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation**|Kun Qian 
et.al.|[2406.17681](http://arxiv.org/abs/2406.17681)|**[link](https://github.com/qbetterk/VarBench)**|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u4f20\u7edf\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u7684\u8868\u73b0\u65e5\u76ca\u51fa\u8272\uff0c\u8d8a\u6765\u8d8a\u591a\u7684\u7814\u7a76\u4eba\u5458\u5f00\u59cb\u5173\u6ce8\u9884\u8bad\u7ec3\u671f\u95f4\u7684\u57fa\u51c6\u6570\u636e\u6cc4\u9732\u95ee\u9898\uff0c\u901a\u5e38\u79f0\u4e3a\u6570\u636e\u6c61\u67d3\u95ee\u9898\u3002\u4e3a\u4e86\u786e\u4fdd\u516c\u6b63\u7684\u8bc4\u4f30\uff0c\u6700\u8fd1\u7684\u57fa\u51c6\u6d4b\u8bd5\u4ec5\u516c\u5f00\u8bad\u7ec3\u548c\u9a8c\u8bc1\u96c6\uff0c\u5bf9\u6d4b\u8bd5\u96c6\u6807\u7b7e\u4fdd\u5bc6\u3002\u4ed6\u4eec\u8981\u6c42\u4efb\u4f55\u5e0c\u671b\u8bc4\u4f30\u81ea\u5df1\u8bed\u8a00\u6a21\u578b\u7684\u4eba\u90fd\u9700\u8981\u63d0\u4ea4\u6a21\u578b\u7684\u9884\u6d4b\u7ed3\u679c\uff0c\u8fdb\u884c\u96c6\u4e2d\u5904\u7406\uff0c\u7136\u540e\u5728\u6392\u884c\u699c\u4e0a\u516c\u5e03\u6a21\u578b\u7684\u5f97\u5206\u3002\u7136\u800c\uff0c\u8fd9\u4e2a\u63d0\u4ea4\u8fc7\u7a0b\u65e2\u4f4e\u6548\u53c8\u59a8\u788d\u4e86\u6709\u6548\u7684\u9519\u8bef\u5206\u6790\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u52a8\u6001\u5316\u57fa\u51c6\u6d4b\u8bd5\u5e76\u5b9e\u65f6\u8bc4\u4f30\u8bed\u8a00\u6a21\u578b\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u4ece\u6bcf\u4e2a\u6d4b\u8bd5\u6848\u4f8b\u4e2d\u63d0\u53d6\u53d8\u91cf\uff0c\u5e76\u4e3a\u6bcf\u4e2a\u53d8\u91cf\u5b9a\u4e49\u4e00\u4e2a\u503c\u8303\u56f4\u3002\u6bcf\u6b21\u8bc4\u4f30\u65f6\uff0c\u6211\u4eec\u4f1a\u4ece\u8fd9\u4e9b\u503c\u57df\u4e2d\u62bd\u53d6\u65b0\u7684\u503c\u6765\u521b\u5efa\u72ec\u7279\u7684\u6d4b\u8bd5\u6848\u4f8b\uff0c\u4ece\u800c\u4fdd\u8bc1\u6bcf\u6b21\u90fd\u662f\u5168\u65b0\u7684\u8bc4\u4f30\u3002 
\u6211\u4eec\u9488\u5bf9\u6570\u5b66\u751f\u6210\u4efb\u52a1\u7684GSM8K\u3001\u591a\u9879\u9009\u62e9\u4efb\u52a1\u7684ARC\u3001commonsense\u95ee\u7b54\u7684CommonsenseQA\u4ee5\u53caTruthfulQA\u7684\u771f\u5b9e\u6027\u95ee\u7b54\u4efb\u52a1\uff0c\u5e94\u7528\u4e86\u8fd9\u79cd\u53d8\u91cf\u6270\u52a8\u65b9\u6cd5\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u8fd9\u79cd\u65b9\u6cd5\u80fd\u66f4\u51c6\u786e\u5730\u8861\u91cf\u8bed\u8a00\u6a21\u578b\u7684\u771f\u5b9e\u80fd\u529b\uff0c\u6709\u6548\u7f13\u89e3\u4e86\u6570\u636e\u6c61\u67d3\u95ee\u9898\u3002|\n", "2406.17675": "|**2024-06-25**|**Quantifying AI Psychology: A Psychometrics Benchmark for Large Language Models**|Yuan Li et.al.|[2406.17675](http://arxiv.org/abs/2406.17675)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5c55\u73b0\u51fa\u5353\u8d8a\u7684\u4efb\u52a1\u89e3\u51b3\u80fd\u529b\uff0c\u65e5\u76ca\u626e\u6f14\u7c7b\u4f3c\u4eba\u7c7b\u52a9\u624b\u7684\u89d2\u8272\u3002\u793e\u4f1a\u5bf9\u5c06LLMs\u66f4\u5e7f\u6cdb\u5730\u878d\u5165\u5176\u4e2d\u4ea7\u751f\u4e86\u5174\u8da3\uff0c\u63a2\u8ba8\u5b83\u4eec\u662f\u5426\u5177\u5907\u5fc3\u7406\u7279\u8d28\uff0c\u4ee5\u53ca\u8fd9\u4e9b\u7279\u8d28\u662f\u5426\u7a33\u5b9a\u4e14\u6709\u52a9\u4e8e\u7406\u89e3\u5176\u884c\u4e3a\u3002\u672c\u6587\u501f\u9274\u5fc3\u7406\u5b66\u6d4b\u91cf\u5b66\u7684\u65b9\u6cd5\uff0c\u63d0\u51fa\u4e86\u4e00\u79cd\u6846\u67b6\uff0c\u7528\u4e8e\u7814\u7a76LLMs\u4e2d\u7684\u5fc3\u7406\u5b66\uff0c\u5305\u62ec\u5fc3\u7406\u7ef4\u5ea6\u8bc6\u522b\u3001\u8bc4\u4f30\u6570\u636e\u96c6\u521b\u5efa\u548c\u7ed3\u679c\u9a8c\u8bc1\u3002\u5728\u6b64\u6846\u67b6\u4e0b\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2a\u5168\u9762\u7684LLM\u5fc3\u7406\u6d4b\u91cf\u57fa\u51c6\uff0c\u6db5\u76d6\u4e86\u516d\u79cd\u5fc3\u7406\u7ef4\u5ea6\uff1a\u4e2a\u6027\u3001\u4ef7\u503c\u89c2\u3001\u60c5\u7eea\u3001\u5fc3\u667a\u7406\u8bba\u3001\u52a8\u673a\u548c\u667a\u529b\u3002\u8fd9\u4e2a\u57fa\u51c6\u5305\u542b\u4e86\u5341\u4e09\u4e2a\u5305\u542b\u591
a\u6837\u573a\u666f\u548c\u9898\u578b\u7684\u6570\u636e\u96c6\u3002\u7814\u7a76\u53d1\u73b0\uff0cLLMs\u5c55\u73b0\u51fa\u5e7f\u6cdb\u7684\u5fc3\u7406\u7279\u6027\u3002\u540c\u65f6\uff0c\u6211\u4eec\u89c2\u5bdf\u5230LLMs\u5728\u81ea\u6211\u62a5\u544a\u7684\u7279\u8d28\u4e0e\u5176\u5b9e\u9645\u884c\u4e3a\u4e4b\u95f4\u7684\u4e0d\u4e00\u81f4\u3002\u8be5\u8bba\u6587\u8be6\u7ec6\u5c55\u793a\u4e86LLMs\u7684\u5fc3\u7406\u6d4b\u91cf\u8bc4\u4f30\uff0c\u4e3aAI\u548c\u793e\u4f1a\u79d1\u5b66\u9886\u57df\u7684\u53ef\u9760\u8bc4\u4f30\u63d0\u4f9b\u4e86\u6d1e\u89c1\uff0c\u4ee5\u53ca\u53ef\u80fd\u7684\u5e94\u7528\u65b9\u5411\u3002|\n", "2406.18532": "|**2024-06-26**|**Symbolic Learning Enables Self-Evolving Agents**|Wangchunshu Zhou et.al.|[2406.18532](http://arxiv.org/abs/2406.18532)|**[link](https://github.com/aiwaves-cn/agents)**|**\u4eba\u5de5\u667a\u80fd\u754c\u901a\u8fc7\u6784\u5efa\"\u8bed\u8a00\u4ee3\u7406\"\uff08\u5373\u590d\u6742\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7ba1\u9053\uff09\u6765\u63a2\u5bfb\u901a\u7528\u4eba\u5de5\u667a\u80fd\uff08AGI\uff09\u7684\u9053\u8def\uff0c\u8fd9\u4e9b\u6a21\u578b\u7ed3\u5408\u4e86\u63d0\u793a\u6280\u672f\u548c\u5de5\u5177\u4f7f\u7528\u65b9\u6cd5\u3002\u5c3d\u7ba1\u5b83\u4eec\u5728\u4f17\u591a\u5b9e\u9645\u4efb\u52a1\u4e2d\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5f53\u524d\u8bed\u8a00\u4ee3\u7406\u7814\u7a76\u7684\u4e00\u4e2a\u5173\u952e\u5c40\u9650\u662f\u5176\u6a21\u578b\u4e2d\u5fc3\u6216\u5de5\u7a0b\u5bfc\u5411\uff1a\u63d0\u793a\u3001\u5de5\u5177\u548c\u7ba1\u9053\u7684\u6539\u8fdb\u4f9d\u8d56\u4e8e\u5927\u91cf\u7684\u4eba\u5de5\u4e13\u5bb6\u8bbe\u8ba1\uff0c\u800c\u975e\u81ea\u52a8\u4ece\u6570\u636e\u5b66\u4e60\u3002\u6211\u4eec\u8ba4\u4e3a\uff0c\u4ece\u6a21\u578b\u4e2d\u5fc3\u5411\u6570\u636e\u4e2d\u5fc3\u8f6c\u53d8\u2014\u2014\u8ba9\u8bed\u8a00\u4ee3\u7406\u80fd\u591f\u81ea\u4e3b\u5b66\u4e60\u548c\u9002\u5e94\u73af\u5883\uff0c\u662f\u5b83\u4eec\u8fc8\u5411AGI\u7684\u5173\u952e\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\"
\u4ee3\u7406\u7b26\u53f7\u5b66\u4e60\"\u6846\u67b6\uff0c\u8fd9\u662f\u4e00\u4e2a\u7cfb\u7edf\u6027\u7684\u65b9\u6cd5\uff0c\u5b83\u4f7f\u8bed\u8a00\u4ee3\u7406\u80fd\u591f\u5728\u6570\u636e\u9a71\u52a8\u7684\u65b9\u5f0f\u4e0b\u81ea\u6211\u4f18\u5316\uff0c\u5229\u7528\u7b26\u53f7\u4f18\u5316\u5668\u3002\u6211\u4eec\u5c06\u4ee3\u7406\u89c6\u4e3a\u5177\u6709\u53ef\u5b66\u4e60\u6743\u91cd\u7684\u7b26\u53f7\u7f51\u7edc\uff0c\u8fd9\u4e9b\u6743\u91cd\u7531\u63d0\u793a\u3001\u5de5\u5177\u53ca\u5176\u7ec4\u5408\u65b9\u5f0f\u5b9a\u4e49\u3002\u4ee3\u7406\u7b26\u53f7\u5b66\u4e60\u65e8\u5728\u6a21\u4eff\u8fde\u63a5\u4e3b\u4e49\u5b66\u4e60\u4e2d\u7684\u4e24\u4e2a\u57fa\u672c\u7b97\u6cd5\uff1a\u53cd\u5411\u4f20\u64ad\u548c\u68af\u5ea6\u4e0b\u964d\uff0c\u4f46\u5b83\u5904\u7406\u7684\u662f\u81ea\u7136\u8bed\u8a00\u5f62\u5f0f\u7684\u6743\u91cd\u3001\u635f\u5931\u548c\u68af\u5ea6\u3002\u6211\u4eec\u5728\u6807\u51c6\u57fa\u51c6\u548c\u590d\u6742\u73b0\u5b9e\u4efb\u52a1\u4e0a\u8fdb\u884c\u4e86\u6982\u5ff5\u9a8c\u8bc1\u5b9e\u9a8c\uff0c\u7ed3\u679c\u8868\u660e\uff0c\u4ee3\u7406\u7b26\u53f7\u5b66\u4e60\u4f7f\u5f97\u8bed\u8a00\u4ee3\u7406\u5728\u521b\u5efa\u548c\u90e8\u7f72\u540e\u80fd\u591f\u81ea\u6211\u66f4\u65b0\uff0c\u5b9e\u73b0\u4e86\"\u81ea\u6211\u8fdb\u5316\u7684\u4ee3\u7406\"\u3002**|\n", "2406.18528": "|**2024-06-26**|**PrExMe! 
Large Scale Prompt Exploration of Open Source LLMs for Machine Translation and Summarization Evaluation**|Christoph Leiter et.al.|[2406.18528](http://arxiv.org/abs/2406.18528)|**[link](https://github.com/gringham/prexme)**|## \u7ffb\u8bd1 \u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u9886\u57df\u5e26\u6765\u4e86\u9769\u547d\u6027\u53d8\u5316\uff0c\u5b83\u4eec\u7684\u4e0a\u4e0b\u6587\u5b66\u4e60\u80fd\u529b\u4f7f\u5176\u6210\u4e3a\u81ea\u7136\u8bed\u8a00\u751f\u6210\u8bc4\u4ef7\u7684\u6709\u529b\u5de5\u5177\uff0c\u7279\u522b\u9002\u7528\u4e8e\u8d44\u6e90\u532e\u4e4f\u548c\u65f6\u95f4\u9650\u5236\u7684\u573a\u666f\u3002\u672c\u6587\u63d0\u51faPrExMe\uff0c\u4e00\u9879\u5927\u89c4\u6a21\u7684\u63d0\u793a\u63a2\u7d22\u5ea6\u91cf\u6cd5\uff0c\u6211\u4eec\u5728\u673a\u5668\u7ffb\u8bd1\uff08MT\uff09\u548c\u6458\u8981\u4efb\u52a1\u4e0a\u8bc4\u4f30\u4e86\u8d85\u8fc7720\u79cd\u5f00\u6e90LLM\u4f5c\u4e3a\u5ea6\u91cf\u6807\u51c6\u7684\u6a21\u677f\uff0c\u603b\u8ba1\u7ea6660\u4e07\u6b21\u8bc4\u4f30\u3002\u8fd9\u9879\u8be6\u5c3d\u7684\u6bd4\u8f83\uff081\uff09\u4e3a\u8fd1\u671f\u5f00\u6e90LLMs\u4f5c\u4e3a\u8bc4\u4ef7\u6307\u6807\u7684\u8868\u73b0\u8bbe\u5b9a\u4e86\u57fa\u51c6\uff1b\uff082\uff09\u63a2\u8ba8\u4e86\u4e0d\u540c\u63d0\u793a\u7b56\u7565\u7684\u7a33\u5b9a\u6027\u548c\u53d8\u5f02\u6027\u3002\u6211\u4eec\u53d1\u73b0\uff0c\u4e00\u65b9\u9762\uff0c\u5b58\u5728\u4e00\u4e9b\u60c5\u51b5\u4e0b\u63d0\u793a\u8868\u73b0\u7a33\u5b9a\uff1a\u6709\u4e9bLLMs\u8868\u73b0\u51fa\u7279\u6709\u7684\u504f\u597d\uff0c\u503e\u5411\u4e8e\u4f7f\u7528\u6587\u672c\u6807\u7b7e\u6765\u8bc4\u5206\uff0c\u800c\u53e6\u4e00\u4e9b\u5219\u503e\u5411\u4e8e\u8fd4\u56de\u6570\u503c\u5206\u6570\u3002\u53e6\u4e00\u65b9\u9762\uff0c\u63d0\u793a\u7684\u7a33\u5b9a\u6027\u548c\u6a21\u578b\u6392\u540d\u53ef\u80fd\u53d7\u5230\u770b\u4f3c\u5fae\u4e0d\u8db3\u9053\u7684\u66f4\u6539\u7684\u5f71\u54cd\u3002\u4f8b\u5982\uff0c\u5c06\u8f93\u51fa\u683c\u5f0f\u4ece\u201c0\u5230100\u20
1d\u6539\u4e3a\u201c-1\u5230+1\u201d\u53ef\u80fd\u4f1a\u663e\u8457\u6539\u53d8\u6211\u4eec\u7684\u8bc4\u4f30\u7ed3\u679c\u3002\u6211\u4eec\u7684\u7814\u7a76\u6709\u52a9\u4e8e\u7406\u89e3\u4e0d\u540c\u63d0\u793a\u65b9\u6cd5\u5bf9MT\u548c\u6458\u8981\u8bc4\u4ef7\u4e2dLLM-based\u5ea6\u91cf\u7684\u5f71\u54cd\uff0c\u63ed\u793a\u4e86\u6700\u7a33\u5b9a\u7684\u63d0\u793a\u6a21\u5f0f\uff0c\u5e76\u6307\u51fa\u4e86\u6f5c\u5728\u5c40\u9650\u6027\u3002|\n", "2406.18521": "|**2024-06-26**|**CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs**|Zirui Wang et.al.|[2406.18521](http://arxiv.org/abs/2406.18521)|**[link](https://github.com/princeton-nlp/CharXiv)**|\u5728\u5b9e\u9645\u5e94\u7528\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08Multimodal Large Language Models\uff0cMLLMs\uff09\u5904\u7406\u79d1\u5b66\u8bba\u6587\u6216\u8d22\u52a1\u62a5\u544a\u7b49\u4efb\u52a1\u65f6\uff0c\u56fe\u8868\u7406\u89e3\u81f3\u5173\u91cd\u8981\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u6570\u636e\u96c6\u5f80\u5f80\u96c6\u4e2d\u5728\u7b80\u5316\u548c\u540c\u8d28\u5316\u7684\u56fe\u8868\u4e0a\uff0c\u4ee5\u53ca\u57fa\u4e8e\u6a21\u677f\u7684\u95ee\u9898\uff0c\u8fd9\u53ef\u80fd\u5bfc\u81f4\u6027\u80fd\u8bc4\u4f30\u8fc7\u4e8e\u4e50\u89c2\u3002\u6211\u4eec\u53d1\u73b0\uff0c\u5c3d\u7ba1\u5f00\u6e90\u6a21\u578b\u5728\u73b0\u6709\u57fa\u51c6\u4e0a\u53ef\u80fd\u8868\u73b0\u4f18\u4e8e\u5f3a\u5927\u7684\u4e13\u6709\u6a21\u578b\uff0c\u4f46\u901a\u8fc7\u7b80\u5355\u7684\u538b\u529b\u6d4b\u8bd5\uff0c\u5982\u6539\u53d8\u56fe\u8868\u6216\u95ee\u9898\uff0c\u6027\u80fd\u4f1a\u4e0b\u964d\u9ad8\u8fbe34.5%\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51faCharXiv\uff0c\u8fd9\u662f\u4e00\u4e2a\u5305\u542b2,323\u4e2a\u6765\u81eaarXiv\u8bba\u6587\u7684\u81ea\u7136\u3001\u590d\u6742\u4e14\u591a\u6837\u5316\u7684\u56fe\u8868\u7684\u5168\u9762\u8bc4\u4f30\u5957\u4ef6\u3002CharXiv\u5305\u62ec\u4e24\u7c7b\u95ee\u9898\uff1a1\uff09\u63cf\u8ff0\u6027\u95ee\u9898\uff0c\u7528\u4e8e\u68c0\u67e5\u57fa\u6
72c\u56fe\u8868\u5143\u7d20\uff1b2\uff09\u63a8\u7406\u95ee\u9898\uff0c\u9700\u8981\u7efc\u5408\u5206\u6790\u56fe\u8868\u4e2d\u7684\u590d\u6742\u89c6\u89c9\u5143\u7d20\u3002\u6240\u6709\u56fe\u8868\u548c\u95ee\u9898\u90fd\u7531\u4e13\u5bb6\u7cbe\u5fc3\u6311\u9009\u3001\u6574\u7406\u548c\u9a8c\u8bc1\u4ee5\u4fdd\u8bc1\u8d28\u91cf\u3002\u7ed3\u679c\u663e\u793a\uff0c\u6700\u5f3a\u4e13\u6709\u6a21\u578b\uff08\u4f8b\u5982GPT-4o\uff0c\u51c6\u786e\u7387\u4e3a47.1%\uff09\u4e0e\u6700\u5f3a\u5f00\u6e90\u6a21\u578b\uff08\u5982InternVL Chat V1.5\uff0c\u51c6\u786e\u7387\u4e3a29.2%\uff09\u4e4b\u95f4\u5b58\u5728\u663e\u8457\u5dee\u8ddd\uff0c\u800c\u6240\u6709\u6a21\u578b\u7684\u8868\u73b0\u5747\u8fdc\u4f4e\u4e8e\u4eba\u7c7b\u768480.5%\u6c34\u5e73\uff0c\u8fd9\u63ed\u793a\u4e86\u73b0\u6709MLLM\u5728\u56fe\u8868\u7406\u89e3\u80fd\u529b\u4e0a\u7684\u4e0d\u8db3\u3002\u6211\u4eec\u5e0c\u671bCharXiv\u80fd\u63a8\u52a8\u672a\u6765\u7684\u7814\u7a76\uff0c\u901a\u8fc7\u63d0\u4f9b\u66f4\u771f\u5b9e\u3001\u66f4\u5177\u4ee3\u8868\u6027\u7684\u8fdb\u6b65\u8861\u91cf\u6807\u51c6\uff0c\u4fc3\u8fdb\u56fe\u8868\u7406\u89e3\u9886\u57df\u7684\u7814\u7a76\u3002\u9879\u76ee\u9875\u9762\u548c\u6392\u884c\u699c\u53ef\u8bbf\u95ee\uff1ahttps://charxiv.github.io/\u3002|\n", "2406.18512": "|**2024-06-26**|**\"Is ChatGPT a Better Explainer than My Professor?\": Evaluating the Explanation Capabilities of LLMs in Conversation Compared to a Human Baseline**|Grace Li et.al.|[2406.18512](http://arxiv.org/abs/2406.18512)|null|### \u6982\u8ff0 
\u89e3\u91ca\u662f\u77e5\u8bc6\u5171\u4eab\u7684\u6838\u5fc3\uff0c\u5b83\u5efa\u7acb\u5728\u6c9f\u901a\u539f\u7406\u3001\u793e\u4f1a\u52a8\u6001\u548c\u5b66\u4e60\u7406\u8bba\u4e4b\u4e0a\u3002\u6211\u4eec\u4e13\u6ce8\u4e8e\u5bf9\u8bdd\u5f0f\u7684\u89e3\u91ca\u65b9\u6cd5\uff0c\u56e0\u4e3a\u5176\u73af\u5883\u9ad8\u5ea6\u9002\u5e94\u6027\u548c\u4ea4\u4e92\u6027\u3002\u6211\u4eec\u7684\u7814\u7a76\u5229\u7528\u4e86\u89e3\u91ca\u884c\u4e3a\u6846\u67b6\uff0c\u8fd9\u662f\u4e00\u4e2a\u7406\u89e3\u89e3\u91ca\u8005\u548c\u88ab\u89e3\u91ca\u8005\u5728\u5bf9\u8bdd\u4e2d\u5982\u4f55\u8fd0\u7528\u7b56\u7565\u8fdb\u884c\u89e3\u91ca\u3001\u7406\u89e3\u548c\u4e92\u52a8\u7684\u5de5\u5177\u3002\u6211\u4eec\u5229\u7528Wachsmuth\u7b49\u4eba\u6784\u5efa\u7684WIRED YouTube\u7cfb\u5217\u6570\u636e\u96c6\uff0c\u5e76\u7531Booshehri\u7b49\u4eba\u8fdb\u884c\u4e86\u5e26\u6709\u89e3\u91ca\u884c\u4e3a\u7684\u6807\u6ce8\uff0c\u8fd9\u4e9b\u6ce8\u91ca\u4e3a\u6211\u4eec\u7406\u89e3\u5bf9\u8bdd\u4e2d\u89e3\u91ca\u8005\u5982\u4f55\u6784\u5efa\u56de\u5e94\u63d0\u4f9b\u4e86\u4f9d\u636e\u3002 
\u968f\u7740\u53bb\u5e74\u751f\u6210\u5f0f\u4eba\u5de5\u667a\u80fd\u7684\u53d1\u5c55\uff0c\u6211\u4eec\u671f\u671b\u66f4\u597d\u5730\u7406\u89e3\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u80fd\u529b\uff0c\u4ee5\u53ca\u5b83\u4eec\u5982\u4f55\u589e\u5f3a\u4e13\u5bb6\u89e3\u91ca\u8005\u7684\u5bf9\u8bdd\u4ea4\u6d41\u80fd\u529b\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u4f7f\u7528\u4e86Booshehri\u7b49\u4eba2023\u5e74\u6807\u6ce8\u76845-Levels\u6570\u636e\u96c6\u6765\u8bc4\u4f30LLMs\u5728\u89e3\u91ca\u6027\u5bf9\u8bdd\u4e2d\u7684\u8868\u73b0\u3002\u4e3a\u4e86\u8bc4\u4ef7LLMs\u751f\u6210\u89e3\u91ca\u8005\u56de\u5e94\u7684\u6709\u6548\u6027\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e09\u79cd\u7b56\u7565\uff1a\u4eba\u7c7b\u89e3\u91ca\u8005\u7684\u539f\u59cb\u56de\u5e94\u3001GPT4\u7684\u6807\u51c6\u56de\u5e94\u4ee5\u53ca\u52a0\u5165\u4e86\u89e3\u91ca\u6b65\u9aa4\u7684GPT4\u56de\u5e94\u3002\u6211\u4eec\u9080\u8bf7\u4eba\u7c7b\u6807\u6ce8\u8005\u5bf9\u8fd9\u4e09\u79cd\u7b56\u7565\u8fdb\u884c\u8bc4\u4f30\u3002|\n", "2406.18505": "|**2024-06-26**|**Mental Modeling of Reinforcement Learning Agents by Language Models**|Wenhao Lu et.al.|[2406.18505](http://arxiv.org/abs/2406.18505)|null|## \u80cc\u666f 
\u5c3d\u7ba1\u73b0\u4ee3\u8bed\u8a00\u6a21\u578b\u5df2\u7ecf\u5c55\u73b0\u51fa\u4e00\u5b9a\u7684\u63a8\u7406\u80fd\u529b\uff0c\u7406\u8bba\u4e0a\u80fd\u591f\u8868\u8fbe\u4efb\u610f\u53ef\u80fd\u7684\u4ee4\u724c\u5206\u5e03\uff0c\u4f46\u5b83\u4eec\u5982\u4f55\u5229\u7528\u9884\u8bad\u7ec3\u65f6\u79ef\u7d2f\u7684\u4e16\u754c\u77e5\u8bc6\u6765\u7406\u89e3\u7269\u7406\u4e16\u754c\u4e2d\u7684\u4ee3\u7406\u884c\u4e3a\uff0c\u8fd9\u4e00\u65b9\u9762\u4ecd\u672a\u5f97\u5230\u5145\u5206\u63a2\u7d22\u3002\u672c\u7814\u7a76\u9996\u6b21\u5b9e\u8bc1\u8003\u5bdf\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u901a\u8fc7\u63a8\u7406\u5206\u6790\u4ee3\u7406\u7684\u884c\u4e3a\u53ca\u5176\u5bf9\u72b6\u6001\u7684\u5f71\u54cd\uff0c\u4ece\u800c\u6784\u5efa\u4ee3\u7406\u5fc3\u7406\u6a21\u578b\uff08agent mental modeling\uff09\u7684\u80fd\u529b\u3002\u8fd9\u53ef\u80fd\u63ed\u793a\u51fa\u5229\u7528LLMs\u89e3\u6790\u5f3a\u5316\u5b66\u4e60\uff08RL\uff09\u4ee3\u7406\u884c\u4e3a\u7684\u6f5c\u529b\uff0c\u8fd9\u5bf9\u4e8e\u53ef\u89e3\u91ca\u5f3a\u5316\u5b66\u4e60\uff08XRL\uff09\u7684\u5173\u952e\u6311\u6218\u5177\u6709\u91cd\u8981\u610f\u4e49\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u7279\u5b9a\u7684\u8bc4\u4f30\u6307\u6807\uff0c\u5e76\u5728\u4e0d\u540c\u590d\u6742\u5ea6\u7684RL\u4efb\u52a1\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u6d4b\u8bd5\uff0c\u62a5\u544a\u5173\u4e8e\u4ee3\u7406\u5fc3\u7406\u6a21\u578b\u5efa\u7acb\u7684\u7814\u7a76\u7ed3\u679c\u3002\u7ed3\u679c\u663e\u793a\uff0c\u5f53\u524d\u7684LLMs\u8fd8\u65e0\u6cd5\u4ec5\u901a\u8fc7\u63a8\u7406\u5b8c\u5168\u5b9e\u73b0\u4ee3\u7406\u7684\u5fc3\u7406\u5efa\u6a21\uff0c\u8fd9\u9700\u8981\u8fdb\u4e00\u6b65\u521b\u65b0\u3002\u56e0\u6b64\uff0c\u8fd9\u9879\u5de5\u4f5c\u63d0\u4f9b\u4e86\u5bf9\u73b0\u4ee3LLMs\u80fd\u529b\u548c\u5c40\u9650\u6027\u7684\u65b0\u89c1\u89e3\u3002|\n", "2406.18501": "|**2024-06-26**|**Is In-Context Learning a Type of Gradient-Based Learning? 
Evidence from the Inverse Frequency Effect in Structural Priming**|Zhenghao Zhou et.al.|[2406.18501](http://arxiv.org/abs/2406.18501)|null|\u8fd9\u7bc7\u8bba\u6587\u63a2\u8ba8\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5185\u63d2\u5b66\u4e60\uff08in-context learning\uff0cICL\uff09\u80fd\u529b\uff0c\u5e76\u5c06\u5176\u4e0e\u57fa\u4e8e\u68af\u5ea6\u7684\u5b66\u4e60\u8fdb\u884c\u529f\u80fd\u7b49\u6548\u6027\u8bca\u65ad\u3002\u7814\u7a76\u8005\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u65b9\u6cd5\uff0c\u5229\u7528\u9006\u9891\u7387\u6548\u5e94\uff08inverse frequency effect\uff0cIFE\uff09\u6765\u5206\u6790\u3002IFE\u73b0\u8c61\u6307\u7684\u662f\u5728\u9519\u8bef\u9a71\u52a8\u7684\u5b66\u4e60\u8fc7\u7a0b\u4e2d\uff0c\u6a21\u578b\u5e94\u5bf9\u7f55\u89c1\u6837\u4f8b\u4ea7\u751f\u7684\u66f4\u65b0\u5e45\u5ea6\u5927\u4e8e\u5e38\u89c1\u6837\u4f8b\u3002\u5728\u5fc3\u7406\u5b66\u4e2d\uff0c\u4eba\u7c7b\u5728\u7ed3\u6784\u5316\u63d0\u793a\uff08\u5982\u503e\u5411\u4e8e\u91cd\u590d\u6700\u8fd1\u63a5\u89e6\u7684\u53e5\u5b50\u7ed3\u6784\uff09\u60c5\u5883\u4e2d\u8868\u73b0\u51faIFE\uff0c\u8fd9\u8868\u660e\u5176\u53ef\u80fd\u6d89\u53ca\u9519\u8bef\u9a71\u52a8\u7684\u5b66\u4e60\u673a\u5236\u3002\u5b9e\u9a8c\u901a\u8fc7\u6a21\u62df\u7ed3\u6784\u5316\u63d0\u793a\u5728ICL\u4e2d\u7684\u5f71\u54cd\u53d1\u73b0\uff0cLLMs\u540c\u6837\u663e\u793a\u51faIFE\uff0c\u4e14\u8fd9\u4e00\u6548\u5e94\u5728\u66f4\u5927\u7684\u6a21\u578b\u4e2d\u66f4\u4e3a\u660e\u663e\u3002\u56e0\u6b64\uff0c\u7814\u7a76\u7ed3\u679c\u652f\u6301\u4e86ICL\u672c\u8d28\u4e0a\u662f\u57fa\u4e8e\u68af\u5ea6\u7684\u5b66\u4e60\u7684\u5047\u8bbe\uff0c\u5373\u5728ICL\u7684\u524d\u5411\u4f20\u64ad\u8fc7\u7a0b\u4e2d\u9690\u542b\u5730\u8ba1\u7b97\u4e86\u68af\u5ea6\u3002\u8bba\u6587\u7ed3\u8bba\u6307\u51fa\uff0c\u4eba\u7c7b\u548cLLMs\u90fd\u4f7f\u7528\u4e86\u57fa\u4e8e\u68af\u5ea6\u7684\u3001\u9519\u8bef\u9a71\u52a8\u7684\u5904\u7406\u673a\u5236\u3002|\n", "2406.18460": "|**2024-06-26**|**Role-Play Zero-Shot Prompting with 
Large Language Models for Open-Domain Human-Machine Conversation**|Ahmed Njifenjou et.al.|[2406.18460](http://arxiv.org/abs/2406.18460)|null|\u8fd1\u5e74\u6765\uff0c\u4eba\u4eec\u63d0\u51fa\u4e86\u4e00\u7cfb\u5217\u65b9\u6cd5\u6765\u521b\u5efa\u80fd\u591f\u8fdb\u884c\u5f00\u653e\u9886\u57df\u5bf9\u8bdd\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u3002\u8fd9\u4e9b\u6a21\u578b\u80fd\u56de\u7b54\u7528\u6237\u95ee\u9898\uff0c\u4f46\u5c40\u9650\u4e8e\u5355\u5411\u95ee\u7b54\u5f62\u5f0f\uff0c\u800c\u975e\u771f\u6b63\u7684\u5bf9\u8bdd\u3002\u901a\u5e38\uff0c\u901a\u8fc7\u9488\u5bf9\u7279\u5b9a\u6570\u636e\u96c6\u8fdb\u884c\u5fae\u8c03\u6765\u8c03\u6574\u5b83\u4eec\u7684\u4ea4\u6d41\u98ce\u683c\uff0c\u4f46\u8fd9\u65e2\u6602\u8d35\u53c8\u9650\u4e8e\u5c11\u6570\u8bed\u8a00\u3002\u672c\u7814\u7a76\u63a2\u7d22\u4e86\u89d2\u8272\u626e\u6f14\u7684\u96f6\u6837\u672c\u63d0\u793a\u4f5c\u4e3a\u63d0\u9ad8\u5f00\u653e\u9886\u57df\u5bf9\u8bdd\u6548\u7387\u548c\u6210\u672c\u6548\u76ca\u7684\u89e3\u51b3\u65b9\u6848\uff0c\u5229\u7528\u591a\u8bed\u8a00\u80fd\u529b\u5f3a\u7684\u8bad\u7ec3\u6709\u7d20\u6a21\u578b\uff08Beeching\u7b49\u4eba\uff0c2023\u5e74\uff09\uff0c\u8fd9\u4e9b\u6a21\u578b\u80fd\u9075\u5faa\u6307\u4ee4\u3002\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u63d0\u793a\u7cfb\u7edf\uff0c\u5f53\u4e0e\u9075\u5faa\u6307\u4ee4\u7684\u6a21\u578b\u2014\u2014\u8fd9\u91cc\u4f7f\u7528Vicuna\uff08Chiang\u7b49\u4eba\uff0c2023\u5e74\uff09\u7ed3\u5408\u65f6\uff0c\u80fd\u591f\u751f\u6210\u5728\u6cd5\u8bed\u4e2d\u7684\u5bf9\u8bdd\u4ee3\u7406\uff0c\u5728\u4e24\u9879\u4efb\u52a1\u4e2d\u751a\u81f3\u8d85\u8d8a\u4e86\u7ecf\u8fc7\u5fae\u8c03\u7684\u6a21\u578b\uff0c\u5e76\u5728\u4eba\u7c7b\u8bc4\u4f30\u4e2d\u8868\u73b0\u51fa\u8272\u3002|\n", "2406.18449": "|**2024-06-26**|**Cascading Large Language Models for Salient Event Graph Generation**|Xingwei Tan 
et.al.|[2406.18449](http://arxiv.org/abs/2406.18449)|**[link](https://github.com/xingwei-warwick/callmsae)**|\u7531\u4e8e\u957f\u6587\u6863\u4e2d\u4e8b\u4ef6\u68c0\u6d4b\u3001\u5173\u7cfb\u8bc6\u522b\u4ee5\u53ca\u975e\u7ed3\u6784\u5316\u8f93\u5165\u4e0e\u7ed3\u6784\u5316\u56fe\u8c31\u7684\u6574\u5408\u7b49\u4efb\u52a1\u7684\u590d\u6742\u6027\uff0c\u4ece\u6587\u672c\u751f\u6210\u4e8b\u4ef6\u56fe\u8c31\u662f\u4e00\u9879\u6311\u6218\u3002\u5f53\u524d\u7684\u7814\u7a76\u5f80\u5f80\u540c\u7b49\u91cd\u89c6\u6240\u6709\u4e8b\u4ef6\uff0c\u672a\u80fd\u533a\u5206\u5bf9\u7406\u89e3\u53d9\u4e8b\u81f3\u5173\u91cd\u8981\u7684\u5173\u952e\u4e8b\u4ef6\u3002\u672c\u6587\u63d0\u51faCALLMSAE\uff0c\u4e00\u4e2a\u57fa\u4e8eCAscading\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684SAlient Event\u56fe\u8c31\u751f\u6210\u6846\u67b6\uff0c\u5b83\u5229\u7528LLMs\u7684\u80fd\u529b\uff0c\u5e76\u907f\u514d\u4e86\u6602\u8d35\u7684\u4eba\u5de5\u6807\u6ce8\u9700\u6c42\u3002\u9996\u5148\uff0c\u901a\u8fc7\u63d0\u793aLLMs\u751f\u6210\u6458\u8981\uff0c\u6211\u4eec\u8bc6\u522b\u51fa\u91cd\u8981\u4e8b\u4ef6\u3002\u7136\u540e\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u8fed\u4ee3\u7684\u4ee3\u7801\u7cbe\u70bc\u63d0\u793a\u7b56\u7565\uff0c\u7528\u4e8e\u751f\u6210\u4e8b\u4ef6\u5173\u7cfb\u56fe\uff0c\u6d88\u9664\u9519\u8bef\u7684\u5173\u7cfb\u5e76\u6062\u590d\u7f3a\u5931\u7684\u8fb9\u3002\u5bf9\u57fa\u4e8e\u4e0a\u4e0b\u6587\u7684\u56fe\u8c31\u751f\u6210\u6a21\u578b\u8fdb\u884c fine-tuning\uff0c\u5728\u4f7f\u7528 LLM \u751f\u6210\u7684\u56fe\u8c31\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u4f18\u4e8e\u4f7f\u7528 CAEVO \u751f\u6210\u6570\u636e\u8bad\u7ec3\u7684\u6a21\u578b\u3002\u5728\u4eba\u7c7b\u6807\u6ce8\u7684\u6d4b\u8bd5\u96c6\u4e0a\u7684\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u80fd\u751f\u6210\u66f4\u7a81\u51fa\u4e14\u51c6\u786e\u7684\u56fe\u8c31\uff0c\u8d85\u8d8a\u4e86\u7ade\u4e89\u6027\u7684\u57fa\u7ebf\u3002|\n", "2406.18440": "|**2024-06-26**|**New intelligent 
empowerment for digital transformation**|Peng Yifeng et.al.|[2406.18440](http://arxiv.org/abs/2406.18440)|null|\u8fd9\u9879\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u521b\u65b0\u8bc4\u4f30\u65b9\u6cd5\uff0c\u7528\u4e8e\u8861\u91cf\u4f01\u4e1a\u7684\u6570\u5b57\u5316\u8f6c\u578b\uff08DT\uff09\u8fc7\u7a0b\u3002\u901a\u8fc7\u5bf92005\u5e74\u81f32022\u5e74\u95f4\u5728\u7ebd\u7ea6\u8bc1\u5238\u4ea4\u6613\u6240\u548c\u7eb3\u65af\u8fbe\u514b\u4e0a\u5e02\u76844407\u5bb6\u516c\u53f8\u7684\u5e74\u5ea6\u62a5\u544a\u8fdb\u884c\u5206\u6790\uff0c\u6784\u5efa\u4e86\u4e00\u5957\u5168\u9762\u7684DT\u6307\u6807\u3002\u7814\u7a76\u7ed3\u679c\u663e\u793a\uff0cDT\u663e\u8457\u63d0\u9ad8\u4e86\u4f01\u4e1a\u7684\u8d22\u52a1\u8868\u73b0\u3002\u7136\u800c\uff0c\u4e0d\u540c\u7684\u6570\u5b57\u6280\u672f\u5bf9\u8d22\u52a1\u6027\u80fd\u7684\u5f71\u54cd\u5404\u4e0d\u76f8\u540c\uff0c\u533a\u5757\u94fe\u6280\u672f\u7684\u79ef\u6781\u5f71\u54cd\u76f8\u5bf9\u8f83\u5c0f\u3002\u6b64\u5916\uff0c\u7814\u7a76\u8fd8\u53d1\u73b0DT\u901a\u8fc7\u63d0\u5347\u8fd0\u8425\u6548\u7387\u548c\u964d\u4f4e\u6210\u672c\u4fc3\u8fdb\u8d22\u52a1\u7ee9\u6548\u589e\u957f\u3002\u672c\u7814\u7a76\u4e3a\u5b66\u672f\u754c\u63d0\u4f9b\u4e86\u65b0\u7684DT\u8bc4\u4f30\u5de5\u5177\uff0c\u540c\u65f6\u62d3\u5bbd\u4e86\u751f\u6210\u4eba\u5de5\u667a\u80fd\u6280\u672f\u5728\u7ecf\u6d4e\u7814\u7a76\u4e2d\u7684\u5e94\u7528\u8303\u56f4\u3002|\n", "2406.18406": "|**2024-06-26**|**IRCAN: Mitigating Knowledge Conflicts in LLM Generation via Identifying and Reweighting Context-Aware Neurons**|Dan Shi 
et.al.|[2406.18406](http://arxiv.org/abs/2406.18406)|null|\u4eba\u4eec\u666e\u904d\u8ba4\u4e3a\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5927\u89c4\u6a21\u6570\u636e\u8bad\u7ec3\u540e\u8574\u542b\u7740\u4e30\u5bcc\u7684\u77e5\u8bc6\u3002\u7136\u800c\uff0c\u8fd1\u671f\u7814\u7a76\u63ed\u793a\u4e86LLMs\u751f\u6210\u6587\u672c\u65f6\u7684\u77e5\u8bc6\u51b2\u7a81\u95ee\u9898\uff0c\u5373\u6a21\u578b\u5185\u7f16\u7801\u7684\u53c2\u6570\u77e5\u8bc6\uff08\u5373\u77e5\u8bc6\u5e93\uff09\u4e0e\u4e0a\u4e0b\u6587\u63d0\u4f9b\u7684\u65b0\u77e5\u8bc6\u5b58\u5728\u77db\u76fe\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u6846\u67b6\u2014\u2014IRCAN\uff08\u8bc6\u522b\u548c\u91cd\u6743\u4e0a\u4e0b\u6587\u611f\u77e5\u795e\u7ecf\u5143\uff09\u3002IRCAN\u9996\u5148\u5229\u7528\u6574\u5408\u68af\u5ea6\u8ba1\u7b97\u5f97\u5230\u7684\u4e0a\u4e0b\u6587\u611f\u77e5\u5f52\u56e0\u5206\u6570\uff0c\u6765\u8bc6\u522b\u90a3\u4e9b\u5bf9\u5904\u7406\u8bed\u5883\u81f3\u5173\u91cd\u8981 \u7684\u795e\u7ecf\u5143\u3002\u63a5\u7740\uff0c\u901a\u8fc7\u91cd\u65b0\u8d4b\u6743\uff0c\u6211\u4eec\u5f3a\u5316\u8fd9\u4e9b\u8bc6\u522b\u51fa\u7684\u4e0a\u4e0b\u6587\u76f8\u5173\u795e\u7ecf\u5143\uff0c\u4ece\u800c\u5f15\u5bfcLLMs\u751f\u6210\u66f4\u7b26\u5408\u4e0a\u4e0b\u6587\u65b0\u77e5\u8bc6\u7684\u54cd\u5e94\u3002\u6211\u4eec\u5728\u591a\u79cd\u6a21\u578b\u548c\u4efb\u52a1\u4e0a\u7684\u5e7f\u6cdb\u5b9e\u9a8c\u8868\u660e\uff0cIRCAN\u4e0d\u4ec5\u663e\u8457\u63d0\u5347\u4e86\u5904\u7406\u77e5\u8bc6\u51b2\u7a81\u7684\u80fd\u529b\uff0c\u8fd8\u63d0\u4f9b\u4e86\u4e00\u4e2a\u53ef\u6269\u5c55\u7684\u3001\u5373\u63d2\u5373\u7528\u7684\u89e3\u51b3\u65b9\u6848\uff0c\u80fd\u591f\u65e0\u7f1d\u878d\u5165\u73b0\u6709\u6a21\u578b\u4e2d\u3002|\n", "2406.19392": "|**2024-06-27**|**ReXTime: A Benchmark Suite for Reasoning-Across-Time in Videos**|Jr-Jen Chen 
et.al.|[2406.19392](http://arxiv.org/abs/2406.19392)|**[link](https://github.com/rextime/rextime)**|**\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u9879\u540d\u4e3aReXTime\u7684\u57fa\u51c6\u6d4b\u8bd5\uff0c\u4e13\u95e8\u9488\u5bf9\u4eba\u5de5\u667a\u80fd\u6a21\u578b\u5728\u89c6\u9891\u4e8b\u4ef6\u4e2d\u7684\u65f6\u95f4\u63a8\u7406\u80fd\u529b\u8fdb\u884c\u4e25\u8c28\u8bc4\u4f30\u3002ReXTime\u5173\u6ce8\u7684\u662f\u8de8\u65f6\u95f4\u63a8\u7406\uff0c\u5373\u7406\u89e3\u5f53\u95ee\u9898\u53ca\u5176\u76f8\u5e94\u7684\u7b54\u6848\u51fa\u73b0\u5728\u4e0d\u540c\u7684\u89c6\u9891\u7247\u6bb5\u65f6\u7684\u4eba\u7c7b\u5f0f\u7406\u89e3\u3002\u8fd9\u79cd\u9700\u8981\u6df1\u5165\u7406\u89e3\u89c6\u9891\u7247\u6bb5\u4e4b\u95f4\u56e0\u679c\u5173\u7cfb\u7684\u65f6\u95f4\u63a8\u7406\u80fd\u529b\u5bf9\u524d\u6cbf\u7684\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\u6784\u6210\u4e86\u91cd\u5927\u6311\u6218\u3002\u4e3a\u4e86\u652f\u6301\u8fd9\u79cd\u8bc4\u4ef7\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2a\u81ea\u52a8\u5316\u7ba1\u9053\uff0c\u7528\u4e8e\u751f\u6210\u65f6\u95f4\u63a8\u7406\u7684\u95ee\u7b54\u5bf9\uff0c\u5927\u5927\u51cf\u5c11\u4e86\u7e41\u7410\u7684\u624b\u52a8\u6807\u6ce8\u9700\u6c42\u3002\u6211\u4eec\u7684\u57fa\u51c6\u5305\u62ec921\u4e2a\u7cbe\u5fc3\u7b5b\u9009\u7684\u9a8c\u8bc1\u6837\u672c\u548c2,143\u4e2a\u6d4b\u8bd5\u6837\u672c\uff0c\u6bcf\u4e2a\u6837\u672c\u90fd\u7ecf\u8fc7\u4eba\u5de5\u7cbe\u5fc3\u6311\u9009\u4ee5\u786e\u4fdd\u51c6\u786e\u6027\u548c\u76f8\u5173\u6027\u3002\u8bc4\u4f30\u7ed3\u679c\u663e\u793a\uff0c\u5c3d\u7ba1\u524d\u6cbf\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8868\u73b0\u4f18\u4e8e\u5b66\u672f\u6a21\u578b\uff0c\u4f46\u5b83\u4eec\u4e0e\u4eba\u7c7b\u7684\u8868\u73b0\u4ecd\u5b58\u5728\u663e\u8457\u768414.3%\u7684\u7cbe\u5ea6\u5dee\u8ddd\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u7ba1\u9053\u65e0\u9700\u4eba\u5de5\u521b\u5efa\u4e86\u4e00\u4e2a\u5305\u542b9,695\u4e2a\u673a\u5668\u751f\u6210\u6837\u672c\u7684\u8bad\u7ec3\u6570\u636e\u96c6
\uff0c\u5b9e\u8bc1\u7814\u7a76\u8868\u660e\uff0c\u8fd9\u53ef\u4ee5\u901a\u8fc7\u5fae\u8c03\u6765\u63d0\u5347\u8de8\u65f6\u95f4\u63a8\u7406\u80fd\u529b\u3002**|\n", "2406.19384": "|**2024-06-27**|**The Remarkable Robustness of LLMs: Stages of Inference?**|Vedang Lad et.al.|[2406.19384](http://arxiv.org/abs/2406.19384)|**[link](https://github.com/vdlad/remarkable-robustness-of-llms)**|**\u6211\u4eec\u901a\u8fc7\u5220\u9664\u548c\u4ea4\u6362\u76f8\u90bb\u5c42\u6765\u5c55\u793a\u5e76\u7814\u7a76\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u60ca\u4eba\u9c81\u68d2\u6027\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u5728\u4e0d\u8fdb\u884c\u5fae\u8c03\u7684\u60c5\u51b5\u4e0b\uff0c\u8fd9\u4e9b\u5e72\u9884\u63aa\u65bd\u4ecd\u80fd\u4fdd\u7559\u539f\u59cb\u6a21\u578b72%\u81f395%\u7684\u9884\u6d4b\u7cbe\u5ea6\uff0c\u800c\u4e14\u6a21\u578b\u5c42\u6570\u8d8a\u591a\uff0c\u8868\u73b0\u51fa\u66f4\u9ad8\u7684\u9c81\u68d2\u6027\u3002\u6839\u636e\u9010\u5c42\u5e72\u9884\u5b9e\u9a8c\u548c\u5176\u4ed6\u5b9e\u9a8c\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u4e2a\u5047\u8bbe\uff1a\u5b58\u5728\u56db\u79cd\u901a\u7528\u7684\u63a8\u7406\u9636\u6bb5\uff0c\u8de8\u8d8a\u516b\u79cd\u4e0d\u540c\u7684\u6a21\u578b\uff1a\u89e3\u7801\u5668\u9636\u6bb5\uff0c\u5c06\u539f\u59cb\u4ee4\u724c\u8868\u793a\u63d0\u5347\u4e3a\u66f4\u9ad8\u7ea7\u7684\u4e0a\u4e0b\u6587\u8868\u793a\uff1b\u7279\u5f81\u5de5\u7a0b\u9636\u6bb5\uff0c\u8fed\u4ee3\u4f18\u5316\u4efb\u52a1\u548c\u5b9e\u4f53\u7279\u5b9a\u7279\u5f81\uff1b\u7136\u540e\u662f\u6a21\u578b\u7684\u534a\u90e8\u5206\uff0c\u968f\u7740\u4e13\u95e8\u7ec4\u4ef6\u7684\u4f5c\u7528\uff0c\u9690\u85cf\u8868\u793a\u4e0e\u8bcd\u6c47\u7a7a\u95f4\u7684\u5bf9\u9f50\u8fdb\u5165\u4e00\u4e2a\u76f8\u53d8\u9636\u6bb5\uff1b\u6700\u540e\uff0c\u6700\u540e\u4e00\u5c42\u901a\u8fc7\u6d88\u9664\u5bf9\u9884\u6d4b\u9020\u6210\u5e72\u6270\u7684\u8fc7\u65f6\u7279\u5f81\uff0c\u7cbe\u7ec6\u5316\u540e\u7eed\u7684\u4ee4\u724c\u5206\u5e03\u3002**|\n", "2406.19358": "|**2024-06-27**|**The Model Arena 
for Cross-lingual Sentiment Analysis: A Comparative Study in the Era of Large Language Models**|Xiliang Zhu et.al.|[2406.19358](http://arxiv.org/abs/2406.19358)|null|\u60c5\u611f\u5206\u6790\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u4e2d\u626e\u6f14\u7740\u6838\u5fc3\u89d2\u8272\u3002XLM-R\u548cmT5\u7b49\u591a\u8bed\u8a00\u9884\u8bad\u7ec3\u6a21\u578b\u7684\u5174\u8d77\u63a8\u52a8\u4e86\u8de8\u8bed\u8a00\u60c5\u611f\u5206\u6790\u7684\u5173\u6ce8\u5ea6\u63d0\u5347\u3002\u8fd1\u671f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u51fa\u73b0\u6781\u5927\u5730\u63a8\u52a8\u4e86\u901a\u7528NLP\u4efb\u52a1\u7684\u53d1\u5c55\uff0c\u4f46\u8fd9\u4e9b\u6a21\u578b\u5728\u8de8\u8bed\u8a00\u60c5\u611f\u5206\u6790\u65b9\u9762\u7684\u6027\u80fd\u5c1a\u672a\u5145\u5206\u63a2\u8ba8\u3002\u672c\u7814\u7a76\u901a\u8fc7\u5b9e\u8bc1\u5206\u6790\uff0c\u6bd4\u8f83\u4e86\u516c\u5171\u5c0f\u578b\u591a\u8bed\u8a00\u6a21\u578b\uff08SMLM\uff09\u5982XLM-R\u4e0e\u4ee5\u82f1\u8bed\u4e3a\u4e2d\u5fc3\u7684LLM\uff08\u5982Llama-3\uff09\u5728\u82f1\u8bed\u3001\u897f\u73ed\u7259\u8bed\u3001\u6cd5\u8bed\u548c\u4e2d\u6587\u7684\u60c5\u611f\u5206\u6790\u4e2d\u7684\u96f6\u6837\u672c\u548c\u5c11\u91cf\u6837\u672c\u8fc1\u79fb\u80fd\u529b\u3002\u7ed3\u679c\u663e\u793a\uff0c\u5c31\u516c\u5f00\u6a21\u578b\u800c\u8a00\uff0cSMLM\u5728\u96f6\u6837\u672c\u8de8\u8bed\u8a00\u8bbe\u7f6e\u4e2d\u8868\u73b0\u51fa\u66f4\u597d\u7684\u6027\u80fd\u3002\u7136\u800c\uff0c\u5728\u5c11\u91cf\u6837\u672c\u60c5\u51b5\u4e0b\uff0c\u516c\u5f00LLM\u663e\u793a\u51fa\u66f4\u5f3a\u7684\u9002\u5e94\u6027\u3002\u6b64\u5916\uff0c\u6211\u4eec\u53d1\u73b0\u4e13\u6709\u7684GPT-3.5\u548cGPT-4\u5728\u96f6\u6837\u672c\u8de8\u8bed\u8a00\u80fd\u529b\u4e0a\u9886\u5148\uff0c\u4f46\u5728\u5c11\u91cf\u6837\u672c\u573a\u666f\u4e0b\uff0c\u5b83\u4eec\u88ab\u516c\u5f00\u6a21\u578b\u8d85\u8d8a\u3002|\n", "2406.19356": "|**2024-06-27**|**DiVERT: Distractor Generation with Variational Errors Represented as Text for 
Math Multiple-choice Questions**|Nigel Fernandez et.al.|[2406.19356](http://arxiv.org/abs/2406.19356)|**[link](https://github.com/umass-ml4ed/divert)**|\u9ad8\u8d28\u91cf\u7684\u5e72\u6270\u9879\u5bf9\u4e8e\u9009\u62e9\u9898\uff08\u5c24\u5176\u662f\u6570\u5b66\u9009\u62e9\u9898\uff09\u7684\u8bc4\u4f30\u548c\u6559\u5b66\u4ef7\u503c\u81f3\u5173\u91cd\u8981\u3002\u7136\u800c\uff0c\u624b\u5de5\u8bbe\u8ba1\u80fd\u591f\u53cd\u6620\u5b66\u751f\u5b9e\u9645\u77e5\u8bc6\u7f3a\u9677\u6216\u8bef\u89e3\u7684\u5e72\u6270\u9879\u662f\u4e00\u9879\u8270\u5de8\u7684\u4efb\u52a1\u3002\u5c3d\u7ba1\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5982GPT-4\u5728\u751f\u6210\u5e72\u6270\u9879\u65b9\u9762\u6709\u6240\u52a9\u76ca\uff0c\u4f46\u6570\u5b66\u8fd9\u7c7b\u5b66\u79d1\u7684\u5904\u7406\u4ecd\u7136\u5177\u6709\u6311\u6218\u6027\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u65b9\u6cd5\uff0c\u65e8\u5728\u7406\u89e3\u548c\u751f\u6210\u89e3\u91ca\u6027\u7684\u9519\u8bef\u8868\u793a\uff0c\u4ee5\u751f\u6210\u6570\u5b66\u9009\u62e9\u9898\u7684\u5e72\u6270\u9879\u3002\u672c\u6587\u4ecb\u7ecdDiVERT\uff08\u57fa\u4e8e\u6587\u672c\u7684\u53d8\u5f02\u8bef\u5dee\u751f\u6210\u5668\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u5229\u752870\u4ebf\u53c2\u6570\u5f00\u6e90LLM\u7684\u53d8\u5206\u65b9\u6cd5\uff0c\u5b83\u5728\u771f\u5b9e\u4e16\u754c\u6570\u5b66\u9009\u62e9\u9898\u6570\u636e\u96c6\uff08\u5305\u542b1,434\u4e2a\u95ee\u9898\uff0c\u88ab\u6570\u5341\u4e07\u5b66\u751f\u4f7f\u7528\uff09\u4e0a\u7684\u5b9e\u9a8c\u8868\u660e\uff0c\u76f8\u8f83\u4e8e\u6700\u5148\u8fdb\u7684GPT-4\u65b9\u6cd5\uff0cDiVERT\u5728\u5e72\u6270\u9879\u751f\u6210\u65b9\u9762\u8868\u73b0\u51fa\u8272\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u8fdb\u884c\u4e86\u4e0e\u6570\u5b66\u6559\u80b2\u8005\u7684\u540c\u884c\u8bc4\u5ba1\uff0c\u7ed3\u679c\u8868\u660eDiVERT\u751f\u6210\u7684\u9519\u8bef\u6807\u7b7e\u8d28\u91cf\u63a5\u8fd1\u4eba\u7c7b\u7f16\u5199\u7684\u3002|\n", "2406.19349": "|**2024-06-27**|**IndoToxic2024: 
A Demographically-Enriched Dataset of Hate Speech and Toxicity Types for Indonesian Language**|Lucky Susanto et.al.|[2406.19349](http://arxiv.org/abs/2406.19349)|null|\u9488\u5bf9\u7f51\u7edc\u4ec7\u6068\u8a00\u8bba\u5bf9\u793e\u4f1a\u548c\u8c10\u7684\u4e25\u5cfb\u5a01\u80c1\uff0c\u7279\u522b\u662f\u5728\u5370\u5c3c\u8fd9\u7c7b\u56fd\u5bb6\uff0c\u8fd1\u5e74\u6765\u4ec7\u6068\u8a00\u8bba\u5728\u7ebf\u6bd4\u7387\u589e\u957f\u4e86\u5341\u500d\uff0c\u8feb\u5207\u9700\u8981\u6709\u6548\u7684\u68c0\u6d4b\u673a\u5236\u3002\u7136\u800c\uff0c\u7531\u4e8e\u7f3a\u4e4f\u5145\u8db3\u7684\u6807\u8bb0\u6570\u636e\uff0c\u5c24\u5176\u662f\u9488\u5bf9\u5370\u5c3c\u6587\u672c\u7684\uff0c\u8fd9\u4e00\u8fdb\u5c55\u53d7\u5230\u4e86\u963b\u788d\u3002\u8fb9\u7f18\u5316\u7fa4\u4f53\uff0c\u5982\u4ec0\u53f6\u6d3e\u3001LGBTQ\u7b49\u5c11\u6570\u7fa4\u4f53\uff0c\u9762\u4e34\u7684\u6311\u6218\u66f4\u5927\uff0c\u56e0\u4e3a\u4ec7\u6068\u8a00\u8bba\u62a5\u544a\u4e0d\u8db3\uff0c\u73b0\u6709\u7684\u68c0\u6d4b\u5de5\u5177\u5bf9\u5176\u7406\u89e3\u6709\u9650\u3002\u6b64\u5916\uff0c\u5f53\u524d\u6570\u636e\u96c6\u5bf9\u4e3b\u89c2\u6027\u7684\u5904\u7406\u4e0d\u8db3\uff0c\u52a0\u5267\u4e86\u95ee\u9898\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51faIndoToxic2024\uff0c\u8fd9\u662f\u4e00\u4e2a\u5168\u9762\u7684\u5370\u5c3c\u4ec7\u6068\u8a00\u8bba\u548c\u6bd2\u6027\u5206\u7c7b\u6570\u636e\u96c6\uff0c\u5305\u542b43,692\u6761\u8bb0\u5f55\uff0c\u753119\u540d\u591a\u5143\u5316\u7684\u4e2a\u4f53\u8fdb\u884c\u6807\u6ce8\uff0c\u7279\u522b\u5173\u6ce8\u9009\u4e3e\u671f\u95f4\u9488\u5bf9\u56fd\u5185\u5f31\u52bf\u7fa4\u4f53\uff08\u5982\u603b\u7e
df\u9009\u4e3e\u4e2d\u7684\u7279\u5b9a\u7fa4\u4f53\uff09\u7684\u6587\u672c\u3002\u6211\u4eec\u4f7f\u7528BERT\u6a21\u578b\uff08IndoBERTweet\uff09\u8fdb\u884c\u4e86\u5fae\u8c03\uff0c\u4e3a\u4e03\u79cd\u4e8c\u5143\u5206\u7c7b\u4efb\u52a1\u8bbe\u5b9a\u4e86\u57fa\u51c6\uff0c\u53d6\u5f97\u4e860.78\u7684\u5b8fF1\u5206\u6570\u3002\u540c\u65f6\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u5982\u4f55\u5c06\u4eba\u53e3\u7edf\u8ba1\u4fe1\u606f\u878d\u5165\u5176\u4e2d\uff0c\u63d0\u5347\u5927\u578b\u8bed\u8a00\u6a21\u578bgpt-3.5-turbo\u5728\u96f6\u6837\u672c\u60c5\u51b5\u4e0b\u7684\u6027\u80fd\u3002\u7136\u800c\uff0c\u6211\u4eec\u4e5f\u8b66\u544a\uff0c\u8fc7\u5ea6\u4f9d\u8d56\u4eba\u53e3\u7edf\u8ba1\u4fe1\u606f\u53ef\u80fd\u5bfc\u81f4\u7ec6\u5316\u6a21\u578b\u6027\u80fd\u4e0b\u964d\uff0c\u56e0\u4e3a\u8fd9\u4f1a\u5bfc\u81f4\u6570\u636e\u788e\u7247\u5316\u3002|\n", "2406.19317": "|**2024-06-27**|**Jump Starting Bandits with LLM-Generated Prior Knowledge**|Parand A. Alamdari et.al.|[2406.19317](http://arxiv.org/abs/2406.19317)|null|\u6211\u4eec\u63d0\u4f9b\u4e86\u6709\u529b\u7684\u8bc1\u636e\uff0c\u5c55\u793a\u4e86\u5c06\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e0e\u4e0a\u4e0b\u6587\u5316\u591a\u81c2\u8001\u864e\u673a\u6846\u67b6\u76f8\u7ed3\u5408\u7684\u4f18\u52bf\u3002\u4e0a\u4e0b\u6587\u5316\u8001\u864e\u673a\u5728\u63a8\u8350\u7cfb\u7edf\u4e2d\u5e7f\u6cdb\u5e94\u7528\uff0c\u7528\u4e8e\u6839\u636e\u7528\u6237\u7279\u5b9a\u7684\u4e0a\u4e0b\u6587\u751f\u6210\u4e2a\u6027\u5316\u5efa\u8bae\u3002\u6211\u4eec\u8868\u660e\uff0c\u7ecf\u8fc7\u5927\u89c4\u6a21\u8bed\u6599\u5e93\u8bad\u7ec3\uff0c\u5bcc\u542b\u4eba\u7c7b\u77e5\u8bc6\u548c\u504f\u597d\u7684LLMs\u80fd\u591f\u5f88\u597d\u5730\u6a21\u62df\u4eba\u7c7b\u884c\u4e3a\uff0c\u4ece\u800c\u901a\u8fc7\u542f\u52a8\u4e0a\u4e0b\u6587\u5316\u591a\u81c2\u8001\u864e\u673a\u6765\u51cf\u5c11\u5728\u7ebf\u5b66\u4e60\u7684\u9057\u61be\uff08regret\uff09\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u521d\u59cb\u5316\u7b97\u6cd5\uff0c\u901a\u8f
c7\u63d0\u793aLLMs\u751f\u6210\u63a5\u8fd1\u4eba\u7c7b\u504f\u597d\u7684\u9884\u8bad\u7ec3\u6570\u636e\u96c6\uff0c\u4f9b\u8001\u864e\u673a\u5b66\u4e60\u4f7f\u7528\u3002\u8fd9\u663e\u8457\u964d\u4f4e\u4e86\u5728\u7ebf\u5b66\u4e60\u7684\u9057\u61be\u548c\u6570\u636e\u6536\u96c6\u6210\u672c\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u901a\u8fc7\u4e24\u7ec4\u5b9e\u9a8c\u9a8c\u8bc1\uff0c\u5305\u62ec\u4f7f\u7528LLMs\u4f5c\u4e3a\u5360\u535c\u8005\uff08oracle\uff09\u7684\u5b9e\u9a8c\u548c\u57fa\u4e8e\u8054\u5408\u8c03\u67e5\u5b9e\u9a8c\u6570\u636e\u7684\u771f\u5b9e\u4e16\u754c\u5b9e\u9a8c\u3002|\n", "2406.19292": "|**2024-06-27**|**From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data**|Zheyang Xiong et.al.|[2406.19292](http://arxiv.org/abs/2406.19292)|null|\u8fd1\u671f\u7684\u7814\u7a76\u6307\u51fa\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5904\u7406\u957f\u6587\u672c\u8f93\u5165\u65f6\u5728\u4fe1\u606f\u68c0\u7d22\u548c\u63a8\u7406\u80fd\u529b\u4e0a\u5b58\u5728\u56f0\u96be\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u5229\u7528\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u5408\u6210\u6570\u636e\u96c6\u8fdb\u884c\u5fae\u8c03\u7684\u65b9\u6cd5\uff0c\u8be5\u6570\u636e\u96c6\u5305\u542b\u6570\u503c\u578b\u952e\u503c\u5bf9\u68c0\u7d22\u4efb\u52a1\u3002\u6211\u4eec\u5728GPT-3.5 Turbo\u548cMistral 
7B\u7b49\u6a21\u578b\u4e0a\u7684\u5b9e\u9a8c\u663e\u793a\uff0c\u5bf9\u8fd9\u4e9b\u6a21\u578b\u8fdb\u884c\u8fd9\u79cd\u6570\u636e\u96c6\u7684\u5fae\u8c03\u663e\u8457\u63d0\u9ad8\u4e86\u5b83\u4eec\u5728\u957f\u6587\u672c\u73af\u5883\u4e2d\u7684\u4fe1\u606f\u68c0\u7d22\u548c\u63a8\u7406\u80fd\u529b\u3002\u6211\u4eec\u5206\u6790\u4e86\u5fae\u8c03\u540e\u7684\u6a21\u578b\uff0c\u53d1\u73b0\u5b83\u4eec\u5728\u4ece\u5408\u6210\u4efb\u52a1\u8fc1\u79fb\u5230\u5b9e\u9645\u8bc4\u4f30\uff08\u5982\u572820\u6587\u6863MDQA\u4e2d\u7684\u4f4d\u7f6e10\u5904\u63d0\u534710.5%\uff09\u65b9\u9762\u7684\u8868\u73b0\u6709\u6240\u63d0\u5347\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u53d1\u73b0\uff0c\u7ecf\u8fc7\u6211\u4eec\u5408\u6210\u6570\u636e\u96c6\u5fae\u8c03\u7684LLMs\u5728\u901a\u7528\u57fa\u51c6\u4e0a\u7684\u6027\u80fd\u4fdd\u6301\u7a33\u5b9a\uff0c\u800c\u4f7f\u7528\u5176\u4ed6\u57fa\u4e8e\u957f\u6587\u672c\u589e\u5f3a\u6570\u636e\u96c6\u5fae\u8c03\u7684LLMs\u53ef\u80fd\u4f1a\u5bfc\u81f4\u9519\u8bef\u589e\u52a0\uff08\u4f8b\u5982\uff0c\u5728TriviaQA\u4e0a\uff0cMistral 7B\u5728\u6211\u4eec\u7684\u5408\u6210\u6570\u636e\u4e0a\u5fae\u8c03\u65e0\u660e\u663e\u6027\u80fd\u4e0b\u964d\uff0c\u800c\u5176\u4ed6\u57fa\u7ebf\u6570\u636e\u53ef\u80fd\u5bfc\u81f4\u6027\u80fd\u4e0b\u964d\uff0c\u8303\u56f4\u57282.33%\u52306.19%\u4e4b\u95f4\uff09\u3002\u672c\u7814\u7a76\u7a81\u663e\u4e86\u901a\u8fc7\u5408\u6210\u6570\u636e\u5fae\u8c03\u6765\u63d0\u5347LLMs\u5728\u957f\u6587\u672c\u4efb\u52a1\u6027\u80fd\u7684\u6f5c\u529b\u3002|\n", "2406.19283": "|**2024-06-27**|**PhysioLLM: Supporting Personalized Health Insights with Wearables and Large Language Models**|Cathy Mengying Fang 
et.al.|[2406.19283](http://arxiv.org/abs/2406.19283)|null|\u6211\u4eec\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aPhysioLLM\u7684\u4e92\u52a8\u7cfb\u7edf\uff0c\u5b83\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7ed3\u5408\u53ef\u7a7f\u6234\u8bbe\u5907\u7684\u751f\u7406\u6570\u636e\u548c\u4e0a\u4e0b\u6587\u4fe1\u606f\uff0c\u63d0\u4f9b\u4e2a\u6027\u5316\u7684\u5065\u5eb7\u7406\u89e3\u548c\u63a2\u7d22\u3002\u4e0e\u5546\u4e1a\u5065\u5eb7\u5e94\u7528\u4e0d\u540c\uff0cPhysioLLM\u5177\u5907\u5168\u9762\u7684\u7edf\u8ba1\u5206\u6790\u529f\u80fd\uff0c\u80fd\u53d1\u73b0\u7528\u6237\u6570\u636e\u4e2d\u7684\u5173\u8054\u548c\u8d8b\u52bf\u3002\u7528\u6237\u53ef\u4ee5\u7528\u81ea\u7136\u8bed\u8a00\u63d0\u95ee\uff0c\u83b7\u53d6\u751f\u6210\u7684\u4e2a\u6027\u5316\u6d1e\u5bdf\uff0c\u5e76\u6839\u636e\u8fd9\u4e9b\u4fe1\u606f\u5236\u5b9a\u884c\u52a8\u76ee\u6807\u3002\u4ee5\u6539\u5584\u7761\u7720\u8d28\u91cf\u4e3a\u4f8b\uff0c\u56e0\u4e3a\u5176\u53ef\u901a\u8fc7\u751f\u7406\u6570\u636e\u91cf\u5316\u4e14\u5bf9\u6574\u4f53\u5065\u5eb7\u81f3\u5173\u91cd\u8981\u3002\u901a\u8fc7\u4e00\u9879\u6d89\u53ca24\u540dFitbit\u667a\u80fd\u624b\u8868\u7528\u6237\u7684\u7528\u6237\u7814\u7a76\uff0c\u6211\u4eec\u8bc1\u660e\u4e86PhysioLLM\u5728\u4fc3\u8fdb\u5bf9\u5065\u5eb7\u6570\u636e\u7684\u6df1\u5165\u4e2a\u6027\u5316\u7406\u89e3\uff0c\u4ee5\u53ca\u652f\u6301\u5b9e\u73b0\u4e2a\u4eba\u5065\u5eb7\u76ee\u6807\u65b9\u9762\uff0c\u4f18\u4e8eFitbit\u5e94\u7528\u548c\u901a\u7528LLM\u804a\u5929\u673a\u5668\u4eba\u3002|\n", "2406.19280": "|**2024-06-27**|**HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale**|Junying Chen 
et.al.|[2406.19280](http://arxiv.org/abs/2406.19280)|**[link](https://github.com/freedomintelligence/huatuogpt-vision)**|**\u968f\u7740\u5927\u578b\u591a\u6a21\u6001\u8bed\u8a00\u6a21\u578b\uff08\u5982GPT-4V\uff09\u7684\u8fc5\u901f\u53d1\u5c55\uff0c\u5b83\u4eec\u5728\u533b\u5b66\u591a\u6a21\u6001\u80fd\u529b\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\u3002\u7136\u800c\uff0c\u7531\u4e8e\u533b\u5b66\u5f71\u50cf-\u6587\u672c\u6570\u636e\u7684\u6570\u91cf\u548c\u8d28\u91cf\u53d7\u9650\u4e8e\u6570\u636e\u9690\u79c1\u95ee\u9898\u548c\u9ad8\u6602\u7684\u6807\u6ce8\u6210\u672c\uff0c\u8fd9\u4e9b\u6a21\u578b\u4ecd\u9762\u4e34\u6311\u6218\u3002\u65e9\u671f\u7684\u7814\u7a76\u5c1d\u8bd5\u5229\u7528PubMed\u7684\u5927\u578b\u53bb\u6807\u8bc6\u5316\u533b\u7597\u56fe\u50cf-\u6587\u672c\u5bf9\u6765\u7f13\u89e3\u8fd9\u4e9b\u95ee\u9898\uff0c\u4f46\u5b83\u4eec\u4ecd\u53d7\u5230\u6570\u636e\u566a\u97f3\u7684\u5f71\u54cd\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u4f18\u5316\u4e86PubMed\u4e2d\u7684\u533b\u7597\u56fe\u50cf-\u6587\u672c\u5bf9\uff0c\u5e76\u5229\u7528GPT-4V\u5728\u201c\u975e\u76f2\u201d\u6a21\u5f0f\u4e0b\u8fdb\u884c\u6570\u636e\u6e05\u6d17\u548c\u683c\u5f0f\u8f6c\u6362\uff0c\u521b\u5efa\u4e86PubMedVision\u6570\u636e\u96c6\uff0c\u5305\u542b130\u4e07\u4efd\u533b\u5b66\u89c6\u89c9\u95ee\u7b54\u6837\u672c\u3002\u6211\u4eec\u7684\u9a8c\u8bc1\u8868\u660e\uff1a\uff081\uff09PubMedVision\u663e\u8457\u63d0\u5347\u4e86\u5f53\u524d\u591a\u6a21\u6001\u8bed\u8a00\u6a21\u578b\u5728\u533b\u5b66\u9886\u57df\u7684\u6027\u80fd\uff0c\u5728\u8bf8\u5982MMMU Health & Medicine 
track\u7b49\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u8868\u73b0\u51fa\u663e\u8457\u6539\u5584\uff1b\uff082\uff09\u533b\u5b66\u4e13\u5bb6\u7684\u624b\u52a8\u68c0\u67e5\u548c\u5b9e\u8bc1\u7ed3\u679c\u8bc1\u5b9e\u4e86\u6211\u4eec\u7684\u6570\u636e\u96c6\u5728\u6570\u636e\u8d28\u91cf\u4e0a\u4f18\u4e8e\u5176\u4ed6\u6784\u5efa\u65b9\u6cd5\u3002\u5229\u7528PubMedVision\uff0c\u6211\u4eec\u8bad\u7ec3\u4e86\u4e00\u4e2a\u540d\u4e3aHuatuoGPT-Vision\u7684340\u4ebf\u53c2\u6570\u7684\u533b\u5b66\u591a\u6a21\u6001\u8bed\u8a00\u6a21\u578b\uff0c\u5b83\u5728\u5f00\u6e90\u591a\u6a21\u6001\u8bed\u8a00\u6a21\u578b\u4e2d\u8868\u73b0\u51fa\u8272\uff0c\u5728\u533b\u5b66\u591a\u6a21\u6001\u573a\u666f\u4e2d\u663e\u793a\u51fa\u4f18\u8d8a\u6027\u80fd\u3002**|\n", "2406.19271": "|**2024-06-27**|**AutoPureData: Automated Filtering of Web Data for LLM Fine-tuning**|Praneeth Vadlapati et.al.|[2406.19271](http://arxiv.org/abs/2406.19271)|**[link](https://github.com/Pro-GenAI/AutoPureData)**|**\u4eba\u4eec\u5bf9\u6700\u65b0\u7684\u548c\u53ef\u9760\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u9700\u6c42\u6301\u7eed\u589e\u957f\u3002\u901a\u5e38\uff0cLLMs\u662f\u57fa\u4e8e\u56fa\u5b9a\u7684\u6570\u636e\u96c6\u8bad\u7ec3\u7136\u540e\u90e8\u7f72\u7684\u3002\u7136\u800c\uff0c\u8bad\u7ec3\u6570\u636e\u4f1a\u968f\u7740\u65f6\u95f4\u9010\u6e10\u8fc7\u65f6\u3002\u7814\u7a76\u5173\u6ce8\u5982\u4f55\u5229\u7528\u7f51\u7edc\u6570\u636e\u81ea\u52a8\u66f4\u65b0AI\u6a21\u578b\uff0c\u4f46\u8fd9\u4e00\u8fc7\u7a0b\u6d89\u53ca\u6570\u636e\u8d28\u91cf\u4e0e\u5b89\u5168\u7684\u987e\u8651\uff0c\u5982\u504f\u89c1\u3001\u5783\u573e\u4fe1\u606f\u7b49\u3002\u786e\u4fdd\u6570\u636e\u7eaf\u51c0\u5bf9\u4e8e\u751f\u6210\u53ef\u9760\u7684\u6a21\u578b\u81f3\u5173\u91cd\u8981\u3002\u5728\u4e0d\u7eaf\u6570\u636e\u4e0a\u8bad\u7ec3\u53ef\u80fd\u5bfc\u81f4\u4e0d\u826f\u7ed3\u679c\u3002\u8be5\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u79cd\u7cfb\u7edf\uff0c\u5b83\u6536\u96c6\u7f51\u7edc\u6570\u636e\uff0c\u5e76\u501f\u52a9\u73b0
\u6709\u53ef\u4fe1\u7684AI\u6a21\u578b\u81ea\u52a8\u7b5b\u9009\u51fa\u4e0d\u9700\u8981\u7684\u5185\u5bb9\u3002\u5b9e\u9a8c\u4e2d\uff0c\u6211\u4eec\u6536\u96c6\u5e76\u5904\u7406\u4e86\u4e00\u5c0f\u90e8\u5206\u7f51\u7edc\u6570\u636e\uff0c\u9a8c\u8bc1\u4e86\u8be5\u7cfb\u7edf\u7684\u6570\u636e\u51c0\u5316\u6548\u679c\u3002**|\n", "2406.20098": "|**2024-06-28**|**Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs**|Sukmin Yun et.al.|[2406.20098](http://arxiv.org/abs/2406.20098)|**[link](https://github.com/mbzuai-llm/web2code)**|**\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u5728\u56fe\u50cf\u3001\u89c6\u9891\u548c\u97f3\u9891\u7b49\u591a\u79cd\u6a21\u6001\u7684\u5904\u7406\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\u3002\u7136\u800c\uff0c\u5b83\u4eec\u5728\u7406\u89e3\u548c\u751f\u6210\u7f51\u9875\u622a\u56fe\u4ee5\u53ca\u76f8\u5e94\u7684HTML\u4ee3\u7801\u65b9\u9762\u7684\u80fd\u529b\u76f8\u5bf9\u8f83\u5f31\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51faWeb2Code\uff0c\u8fd9\u662f\u4e00\u4e2a\u5305\u62ec\u5927\u89c4\u6a21\u7f51\u9875\u5230\u4ee3\u7801\u7684\u65b0\u57fa\u51c6\uff0c\u7528\u4e8e\u6307\u4ee4\u8c03\u4f18\uff0c\u5e76\u8bc4\u4f30MLLM\u5728\u7f51\u9875\u7406\u89e3\u53caHTML\u4ee3\u7801\u8f6c\u6362\u80fd\u529b\u4e0a\u7684\u8868\u73b0\u3002\u6211\u4eec\u6784\u5efa\u6570\u636e\u96c6\u65f6\uff0c\u5229\u7528\u9884\u8bad\u7ec3\u7684LLMs\u589e\u5f3a\u73b0\u6709\u7684\u7f51\u9875\u5230\u4ee3\u7801\u6570\u636e\u96c6\uff0c\u5e76\u751f\u6210\u591a\u6837\u5316\u7684\u7f51\u9875\u56fe\u7247\uff0c\u4ee5\u4f9b\u6e32\u67d3\u3002\u8f93\u5165\u662f\u7f51\u9875\u56fe\u7247\u548c\u8bf4\u660e\uff0c\u8f93\u51fa\u662f\u7f51\u9875\u7684HTML\u4ee3\u7801\uff0c\u540c\u65f6\u52a0\u5165\u5173\u4e8e\u7f51\u9875\u5185\u5bb9\u7684\u4e30\u5bcc\u81ea\u7136\u8bed\u8a00\u95ee\u7b54\u5bf9\uff0c\u4ee5\u4fc3\u8fdb\u5bf9\u7f51\u9875\u5185\u5bb9\u7684\u5168\u9762\u7406\u89e3\u3002\u4e3a\u4e86\u8bc4\u4f30\u
6a21\u578b\u5728\u8fd9\u7c7b\u4efb\u52a1\u4e2d\u7684\u6027\u80fd\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2a\u6d4b\u8bd5\u6846\u67b6\uff0c\u7528\u4e8e\u6d4b\u8bd5MLLM\u5728\u7f51\u9875\u7406\u89e3\u4e0e\u7f51\u9875\u5230\u4ee3\u7801\u751f\u6210\u65b9\u9762\u7684\u6280\u80fd\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u6570\u636e\u96c6\u4e0d\u4ec5\u6709\u76ca\u4e8e\u6211\u4eec\u63d0\u51fa\u7684\u4efb\u52a1\uff0c\u8fd8\u5728\u89c6\u89c9\u9886\u57df\u7684\u4e00\u822c\u6027\u80fd\u4e0a\u6709\u6240\u63d0\u5347\uff0c\u800c\u5148\u524d\u7684\u6570\u636e\u96c6\u4f1a\u5bfc\u81f4\u6027\u80fd\u4e0b\u964d\u3002\u6211\u4eec\u671f\u671b\u8fd9\u9879\u5de5\u4f5c\u80fd\u63a8\u52a8\u901a\u7528MLLM\u7684\u53d1\u5c55\uff0c\u4f7f\u5176\u9002\u7528\u4e8e\u7f51\u7edc\u5185\u5bb9\u751f\u6210\u548c\u81ea\u52a8\u5316\u4efb\u52a1\u3002\u6211\u4eec\u7684\u6570\u636e\u548c\u4ee3\u7801\u5c06\u516c\u5f00\u3002**|\n", "2406.20095": "|**2024-06-28**|**LLaRA: Supercharging Robot Learning Data for Vision-Language Policy**|Xiang Li 
et.al.|[2406.20095](http://arxiv.org/abs/2406.20095)|**[link](https://github.com/lostxine/llara)**|**\u8be5\u8bba\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aLLaRA\uff08\u5927\u578b\u8bed\u8a00\u548c\u673a\u5668\u4eba\u52a9\u624b\uff09\u7684\u6846\u67b6\uff0c\u5b83\u5c06\u673a\u5668\u4eba\u884c\u52a8\u7b56\u7565\u8f6c\u5316\u4e3a\u5bf9\u8bdd\u5f62\u5f0f\uff0c\u901a\u8fc7\u7ed3\u5408\u989d\u5916\u7684\u6570\u636e\u8f85\u52a9\u5b66\u4e60\uff0c\u63d0\u5347\u54cd\u5e94\u8d28\u91cf\u3002\u5229\u7528\u5177\u5907\u89c6\u89c9\u8f93\u5165\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08VLMs\uff09\uff0c\u5373\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff0c\u8fd9\u4e9b\u6a21\u578b\u80fd\u591f\u5904\u7406\u72b6\u6001\u4fe1\u606f\uff0c\u4f5c\u4e3a\u89c6\u89c9-\u6587\u672c\u63d0\u793a\uff0c\u5e76\u751f\u6210\u6700\u4f18\u7684\u673a\u5668\u4eba\u51b3\u7b56\u7b56\u7565\u3002\u9996\u5148\uff0c\u8bba\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u81ea\u52a8\u5316\u65b9\u6cd5\uff0c\u4ece\u73b0\u6709\u7684\u884c\u4e3a\u514b\u9686\u6570\u636e\u4e2d\u751f\u6210\u591a\u6837\u4e14\u9ad8\u8d28\u91cf\u7684\u673a\u5668\u4eba\u6307\u4ee4\u6570\u636e\u96c6\u3002\u7136\u540e\uff0c\u4f7f\u7528\u8fd9\u79cd\u5b9a\u5236\u7684\u5bf9\u8bdd\u5f0f\u683c\u5f0f\u5bf9VLM\u8fdb\u884c\u8bad\u7ec3\uff0c\u4f7f\u5176\u80fd\u591f\u751f\u6210\u6709\u610f\u4e49\u7684\u673a\u5668\u4eba\u884c\u52a8\u7b56\u7565\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cLLaRA\u6846\u67b6\u5728\u591a\u4e2a\u6a21\u62df\u548c\u771f\u5b9e\u4e16\u754c\u73af\u5883\u4e2d\u5c55\u73b0\u51fa\u6700\u5148\u8fdb\u7684\u6027\u80fd\u3002\u76f8\u5173\u4ee3\u7801\u3001\u6570\u636e\u96c6\u548c\u9884\u8bad\u7ec3\u6a21\u578b\u5df2\u516c\u5f00\u63d0\u4f9b\u3002**|\n", "2406.20094": "|**2024-06-28**|**Scaling Synthetic Data Creation with 1,000,000,000 Personas**|Xin Chan 
et.al.|[2406.20094](http://arxiv.org/abs/2406.20094)|**[link](https://github.com/tencent-ailab/persona-hub)**|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u57fa\u4e8e\u4eba\u683c\u7684\u6570\u636e\u5408\u6210\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5185\u7684\u591a\u79cd\u89c6\u89d2\u6765\u751f\u6210\u591a\u6837\u5316\u7684\u4eba\u5de5\u5408\u6210\u6570\u636e\u3002\u4e3a\u4e86\u5728\u5927\u89c4\u6a21\u4e0a\u5145\u5206\u5229\u7528\u8fd9\u79cd\u65b9\u6cd5\uff0c\u6211\u4eec\u5f15\u5165\u4e86Persona Hub\uff0c\u8fd9\u662f\u4e00\u4e2a\u4ece\u7f51\u7edc\u6570\u636e\u81ea\u52a8\u6574\u7406\u51fa\u7684\u5341\u4ebf\u4e2a\u591a\u5143\u5316\u4eba\u683c\u7684\u96c6\u5408\uff0c\u76f8\u5f53\u4e8e\u5168\u7403\u4eba\u53e3\u7684\u7ea613%\u3002\u8fd9\u4e9b\u4eba\u683c\u4f5c\u4e3a\u5206\u5e03\u5f0f\u4e16\u754c\u77e5\u8bc6\u8f7d\u4f53\uff0c\u51e0\u4e4e\u53ef\u4ee5\u8c03\u7528LLM\u5185\u5305\u542b\u7684\u5404\u7c7b\u89c2\u70b9\uff0c\u4ece\u800c\u63a8\u52a8\u5927\u89c4\u6a21\u3001\u591a\u6837\u5316\u7684\u5408\u6210\u6570\u636e\u521b\u5efa\uff0c\u9002\u7528\u4e8e\u5404\u79cd\u573a\u666f\u3002\u901a\u8fc7\u5c55\u793aPersona Hub\u5982\u4f55\u5728\u5927\u89c4\u6a21\u751f\u6210\u9ad8\u8d28\u91cf\u7684\u6570\u5b66\u548c\u903b\u8f91\u63a8\u7406\u95ee\u9898\u3001\u6307\u4ee4\uff08\u7528\u6237\u63d0\u793a\uff09\u3001\u5bcc\u542b\u77e5\u8bc6\u7684\u6587\u672c\u3001\u6e38\u620fNPC\u548c\u5de5\u5177\uff08\u51fd\u6570\uff09\u7b49\u65b9\u9762\u7684\u5e94\u7528\uff0c\u6211\u4eec\u8bc1\u660e\u4e86\u57fa\u4e8e\u4eba\u683c\u7684\u6570\u636e\u5408\u6210\u5177\u6709\u591a\u6837\u6027\u3001\u53ef\u6269\u5c55\u6027\u3001\u7075\u6d3b\u6027\u548c\u6613\u7528\u6027\uff0c\u53ef\u80fd\u5f15\u9886\u5408\u6210\u6570\u636e\u521b\u9020\u548c\u5b9e\u9645\u5e94\u7528\u7684\u65b0\u8303\u5f0f\uff0c\u5bf9LLM\u7684\u7814\u7a76\u548c\u53d1\u5c55\u4ea7\u751f\u6df1\u8fdc\u5f71\u54cd\u3002|\n", "2406.20092": "|**2024-06-28**|**LLaVolta: Efficient 
Multi-modal Models via Stage-wise Visual Context Compression**|Jieneng Chen et.al.|[2406.20092](http://arxiv.org/abs/2406.20092)|**[link](https://github.com/beckschen/llavolta)**|**\u5c3d\u7ba1\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u6587\u672c\u5d4c\u5165\u538b\u7f29\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\uff0c\u4f46\u5927\u578b\u591a\u6a21\u6001\u6a21\u578b\uff08LMMs\uff09\u4e2d\u7684\u89c6\u89c9\u4ee4\u724c\u538b\u7f29\u4ecd\u7136\u88ab\u5ffd\u89c6\u3002\u672c\u6587\u7814\u7a76\u4e86\u89c6\u89c9\u4ee4\u724c\u7684\u5197\u4f59\u6027\u4ee5\u53ca\u5728\u8fd9\u4e9b\u6a21\u578b\u4e2d\u7684\u6709\u6548\u8bad\u7ec3\u3002\u521d\u6b65\u5b9e\u9a8c\u8868\u660e\uff0c\u5728\u6d4b\u8bd5\u9636\u6bb5\u901a\u8fc7\u7b80\u5355\u5e73\u5747\u6c60\u5316\u6d88\u9664\u9ad8\u8fbe70%\u7684\u89c6\u89c9\u4ee4\u724c\uff0cGQA\u57fa\u51c6\u7684\u89c6\u89c9\u95ee\u7b54\u51c6\u786e\u7387\u4ec5\u4e0b\u964d3%\uff0c\u8fd9\u663e\u793a\u51fa\u89c6\u89c9\u4e0a\u4e0b\u6587\u4e2d\u5b58\u5728\u5927\u91cf\u5197\u4f59\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86Visual Context 
Compressor\uff0c\u5b83\u5728\u8bad\u7ec3\u9636\u6bb5\u51cf\u5c11\u89c6\u89c9\u4ee4\u724c\u6570\u91cf\uff0c\u4ee5\u63d0\u9ad8\u6548\u7387\u800c\u4e0d\u4f1a\u5f71\u54cd\u6027\u80fd\u3002\u4e3a\u4e86\u5728\u538b\u7f29\u89c6\u89c9\u4ee4\u724c\u65f6\u5c3d\u91cf\u51cf\u5c11\u4fe1\u606f\u635f\u5931\u5e76\u4fdd\u6301\u8bad\u7ec3\u6548\u7387\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u8f7b\u91cf\u7ea7\u8bad\u7ec3\u65b9\u6848LLaVolta\u3002LLaVolta\u91c7\u7528\u5206\u9636\u6bb5\u7684\u89c6\u89c9\u4e0a\u4e0b\u6587\u538b\u7f29\u7b56\u7565\uff0c\u4ece\u91cd\u5ea6\u5230\u8f7b\u5ea6\u9010\u6e10\u538b\u7f29\uff0c\u6700\u7ec8\u5728\u8bad\u7ec3\u7ed3\u675f\u65f6\u5b8c\u5168\u4e0d\u8fdb\u884c\u538b\u7f29\uff0c\u4ece\u800c\u5728\u6d4b\u8bd5\u65f6\u4e0d\u4f1a\u4e22\u5931\u4efb\u4f55\u4fe1\u606f\u3002\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u63d0\u5347\u4e86\u591a\u6a21\u6001\u6a21\u578b\u5728\u56fe\u50cf-\u8bed\u8a00\u548c\u89c6\u9891-\u8bed\u8a00\u7406\u89e3\u4efb\u52a1\u4e0a\u7684\u6027\u80fd\uff0c\u5e76\u663e\u8457\u964d\u4f4e\u4e86\u8bad\u7ec3\u6210\u672c\u3002\u4ee3\u7801\u5df2\u5728https://github.com/Beckschen/LLaVolta\u4e0a\u5f00\u6e90\u3002**|\n", "2406.20087": "|**2024-06-28**|**ProgressGym: Alignment with a Millennium of Moral Progress**|Tianyi Qiu 
et.al.|[2406.20087](http://arxiv.org/abs/2406.20087)|null|\u968f\u7740\u524d\u6cbf\u4eba\u5de5\u667a\u80fd\u7cfb\u7edf\uff0c\u7279\u522b\u662f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u77e5\u8bc6\u8bba\u4e2d\u7684\u5f71\u54cd\u529b\u65e5\u76ca\u589e\u5f3a\uff0c\u5b83\u4eec\u53ef\u80fd\u5f3a\u5316\u793e\u4f1a\u666e\u904d\u7684\u4ef7\u503c\u89c2\uff0c\u8fdb\u800c\u52a0\u5267\u9519\u8bef\u9053\u5fb7\u89c2\u5ff5\u7684\u56fa\u5316\uff0c\u5bfc\u81f4\u5e7f\u6cdb\u7684\u793e\u4f1a\u95ee\u9898\u6301\u7eed\u5b58\u5728\u3002\u4e3a\u5e94\u5bf9\u8fd9\u4e00\u6f5c\u5728\u98ce\u9669\uff0c\u6211\u4eec\u63d0\u51fa\u8fdb\u6b65\u5bf9\u9f50\u4f5c\u4e3a\u4e00\u79cd\u6280\u672f\u89e3\u51b3\u65b9\u6848\u3002\u8fdb\u6b65\u5bf9\u9f50\u7b97\u6cd5\u65e8\u5728\u5b66\u4e60\u4eba\u7c7b\u9053\u5fb7\u8fdb\u6b65\u7684\u673a\u5236\uff0c\u4ece\u800c\u5f25\u8865\u73b0\u6709\u5bf9\u9f50\u65b9\u6cd5\u5bf9\u5f53\u4ee3\u9053\u5fb7\u76f2\u70b9\u7684\u654f\u611f\u6027\u3002\u4e3a\u4e86\u63a8\u52a8\u8fdb\u6b65\u5bf9\u9f50\u7684\u7814\u7a76\uff0c\u6211\u4eec\u5f00\u53d1\u4e86ProgressGym\uff0c\u4e00\u4e2a\u5b9e\u9a8c\u6027\u6846\u67b6\uff0c\u5b83\u4ece\u5386\u53f2\u4e2d\u5b66\u4e60\u9053\u5fb7\u8fdb\u6b65\u7684\u89c4\u5f8b\uff0c\u4ee5\u4fc3\u8fdb\u73b0\u5b9e\u4e16\u754c\u9053\u5fb7\u51b3\u7b56\u7684\u672a\u6765\u53d1\u5c55\u3002\u501f\u52a99\u4e2a\u4e16\u7eaa\u7684\u5386\u53f2\u6587\u672c\u548c18\u4e2a\u5386\u53f2LLMs\uff0cProgressGym\u5c06\u73b0\u5b9e\u751f\u6d3b\u4e2d\u7684\u8fdb\u6b65\u5bf9\u9f50\u6311\u6218\u8f6c\u5316\u4e3a\u5177\u4f53\u7684\u57fa\u51c6\u3002\u6211\u4eec\u5b9a\u4e49\u4e86\u4e09\u4e2a\u6838\u5fc3\u6311\u6218\uff1a\u8ffd\u8e2a\u6f14\u53d8\u7684\u4ef7\u503c\uff08PG-Follow\uff09\u3001\u9884\u6d4b\u9053\u5fb7\u8fdb\u6b65\uff08PG-Predict\uff09\u4ee5\u53ca\u8c03\u8282\u4eba\u4e0eAI\u4ef7\u503c\u53d8\u8fc1\u4e4b\u95f4\u7684\u53cd\u9988\u5faa\u73af\uff08PG-Coevolve\uff09\u3002\u8fd9\u4e9b\u4efb\u52a1\u9700\u8981\u65f6\u95f4\u7ef4\u5ea6\u7684\u65b9\u6cd5\uff0c\u800c\u4f20\u7edf\u768
4\u5bf9\u9f50\u7b56\u7565\u65e0\u6cd5\u80dc\u4efb\u3002 \u4e3a\u6b64\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u7ec8\u8eab\u5b66\u4e60\u548c\u5916\u63a8\u7b97\u6cd5\u4f5c\u4e3a\u8fdb\u6b65\u5bf9\u9f50\u7684\u57fa\u672c\u65b9\u6cd5\uff0c\u5e76\u5efa\u7acb\u4e86\u4e00\u4e2a\u5f00\u653e\u7684\u6392\u884c\u699c\uff0c\u9080\u8bf7\u521b\u65b0\u7b97\u6cd5\u548c\u65b0\u6311\u6218\u3002\u8be5\u6846\u67b6\u548c\u6392\u884c\u699c\u5206\u522b\u53ef\u5728https://github.com/PKU-Alignment/ProgressGym \u548c https://huggingface.co/spaces/PKU-Alignment/ProgressGym-LeaderBoard \u83b7\u53d6\u3002|\n", "2406.20085": "|**2024-06-28**|**Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language**|Yicheng Chen et.al.|[2406.20085](http://arxiv.org/abs/2406.20085)|null|\u57fa\u4e8e\u6269\u6563\u6a21\u578b\u7684\u751f\u6210\u65b9\u6cd5\u5df2\u7ecf\u5728\u751f\u6210\u5404\u79cd\u5e03\u5c40\u7684\u9ad8\u8d28\u91cf\u56fe\u50cf\u65b9\u9762\u5c55\u73b0\u51fa\u5de8\u5927\u6f5c\u529b\uff0c\u8fd9\u5bf9\u4e8e\u4e0b\u6e38\u611f\u77e5\u4efb\u52a1\u5177\u6709\u663e\u8457\u76ca\u5904\u3002\u7136\u800c\uff0c\u4ec5\u4f9d\u8d56\u8bed\u8a00\u63cf\u8ff0\u548c\u4e00\u4e2a\u5408\u9002\u7684\u591a\u5b9e\u4f8b\u8bc4\u4f30\u6307\u6807\u6765\u5b9e\u73b0\u5168\u81ea\u52a8\u5e03\u5c40\u751f\u6210\u5e76\u672a\u5f97\u5230\u5145\u5206\u63a2\u7d22\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6846\u67b6\u2014\u2014Auto 
Cherry-Picker\uff08ACP\uff09\uff0c\u65e8\u5728\u81ea\u52a8\u751f\u6210\u9ad8\u8d28\u91cf\u7684\u591a\u6a21\u6001\u8bad\u7ec3\u6837\u672c\uff0c\u4ee5\u589e\u5f3a\u611f\u77e5\u548c\u591a\u6a21\u6001\u8bad\u7ec3\u6548\u679c\u3002\u901a\u8fc7\u8f93\u5165\u81ea\u7136\u8bed\u8a00\u6982\u5ff5\u5217\u8868\uff0c\u6211\u4eec\u5f15\u5bfc\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u751f\u6210\u8be6\u7ec6\u7684\u63cf\u8ff0\u5e76\u8bbe\u8ba1\u5408\u7406\u7684\u5e03\u5c40\u3002\u7136\u540e\uff0c\u4f7f\u7528\u6587\u672c\u5230\u56fe\u50cf\u6a21\u578b\u751f\u6210\u591a\u4e2a\u56fe\u7247\u3002\u63a5\u7740\uff0c\u6211\u4eec\u91c7\u7528\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u8bc4\u4f30\u6307\u6807\u5bf9\u751f\u6210\u7684\u6570\u636e\u8fdb\u884c\u7cbe\u70bc\uff0c\u786e\u4fdd\u8d28\u91cf\u3002\u7279\u522b\u662f\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u590d\u5408\u5e03\u5c40\u4e0e\u56fe\u50cf\u8bc4\u5206\uff08Composite Layout and Image Score\uff0cCLIS\uff09\u8fd9\u4e00\u65b0\u6307\u6807\uff0c\u7528\u4e8e\u516c\u6b63\u5730\u8bc4\u4f30\u751f\u6210\u7684\u56fe\u50cf\u3002\u6211\u4eec\u7684\u5408\u6210\u9ad8\u8d28\u793a\u4f8b\u5728\u5b9a\u5236\u521d\u59cb\u6982\u5ff5\u5217\u8868\u65f6\uff0c\u80fd\u591f\u6709\u6548\u63d0\u5347\u5404\u79cd\u573a\u666f\u4e0b\u7684\u6027\u80fd\uff0c\u5c24\u5176\u662f\u5728\u5904\u7406\u957f\u5c3e\u5206\u5e03\u548c\u4e0d\u5e73\u8861\u6570\u636e\u96c6\u7684\u95ee\u9898\u4e0a\u3002\u4e0b\u6e38\u4efb\u52a1\u7684\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cACP\u663e\u8457\u63d0\u9ad8\u4e86\u73b0\u6709\u6a21\u578b\u7684\u8868\u73b0\u3002\u6b64\u5916\uff0c\u6211\u4eec\u6df1\u5165\u7814\u7a76\u4e86CLIS\u4e0e\u4e0b\u6e38\u4efb\u52a1\u6027\u80fd\u63d0\u5347\u4e4b\u95f4\u7684\u5173\u8054\uff0c\u53d1\u73b0CLIS\u5206\u6570\u8d8a\u9ad8\uff0c\u6027\u80fd\u8d8a\u597d\u3002\u8fd9\u8868\u660e\u8bc4\u4f30\u6307\u6807\u5728\u89c6\u89c9\u611f\u77e5\u548c\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4efb\u52a1\u4e2d\u53ef\u80fd\u53d1\u6325\u5173\u952e\u4f5c\u7528\u3002\u6211\u4eec\
u5c06\u63d0\u4f9b\u4ee3\u7801\u3002|\n", "2406.20079": "|**2024-06-28**|**Molecular Facts: Desiderata for Decontextualization in LLM Fact Verification**|Anisha Gunjal et.al.|[2406.20079](http://arxiv.org/abs/2406.20079)|**[link](https://github.com/anisha2102/molecular_facts)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u751f\u6210\u5185\u5bb9\u7684\u81ea\u52a8\u4e8b\u5b9e\u6838\u67e5\u53d8\u5f97\u8d8a\u6765\u8d8a\u666e\u904d\uff0c\u4ee5\u5e94\u5bf9\u9519\u8bef\u53d9\u8ff0\u7684\u95ee\u9898\uff0c\u7814\u7a76\u7684\u4e00\u4e2a\u5173\u952e\u7126\u70b9\u5728\u4e8e\u6838\u67e5\u7684\u7c92\u5ea6\uff1a\u8f83\u5927\u7684\u6587\u672c\u6bb5\u843d\u96be\u4ee5\u6838\u67e5\uff0c\u800c\u66f4\u539f\u5b50\u5316\u7684\u4e8b\u5b9e\uff08\u5982\u547d\u9898\uff09\u53ef\u80fd\u7f3a\u4e4f\u6b63\u786e\u7684\u4e0a\u4e0b\u6587\u89e3\u8bfb\u3002\u672c\u6587\u63a2\u8ba8\u4e86\u5728\u8fd9\u4e9b\u539f\u5b50\u4e8b\u5b9e\u4e2d\u4e0a\u4e0b\u6587\u7684\u4f5c\u7528\u3002\u6211\u4eec\u8ba4\u4e3a\u5b8c\u5168\u539f\u5b50\u7684\u4e8b\u5b9e\u5e76\u975e\u6700\u4f73\u8868\u793a\u5f62\u5f0f\uff0c\u4e3a\u6b64\u6211\u4eec\u63d0\u51fa\u4e86\u5206\u5b50\u4e8b\u5b9e\u7684\u4e24\u4e2a\u6807\u51c6\uff1a\u53bb\u60c5\u5883\u5316\uff08decontextuality\uff09\uff0c\u5373\u5b83\u4eec\u80fd\u5426\u72ec\u7acb\u5b58\u5728\uff0c\u4ee5\u53ca\u6700\u5c0f\u5316\uff08minimality\uff09\uff0c\u5373\u6dfb\u52a0\u591a\u5c11\u989d\u5916\u4fe1\u606f\u624d\u80fd\u5b9e\u73b0\u53bb\u60c5\u5883\u5316\u3002\u6211\u4eec\u91cf\u5316\u4e86\u53bb\u60c5\u5883\u5316\u5bf9\u6700\u5c0f\u5316\u7684\u5f71\u54cd\uff0c\u5e76\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u7840\u65b9\u6cd5\u6765\u81ea\u52a8\u751f\u6210\u5206\u5b50\u4e8b\u5b9e\uff0c\u76ee\u6807\u662f\u5728\u4fdd\u6301\u51c6\u786e\u6027\u7684\u540c\u65f6\u63d0\u4f9b\u9002\u91cf\u7684\u4fe1\u606f\u3002\u6211\u4eec\u5c06\u8fd9\u79cd\u65b9\u6cd5\u4e0e\u4e0d\u540c\u7684\u53bb\u60c5\u5883\u5316\u7b56\u7565\u8fdb\u884c\u4e86\u6bd4\u8f83\uff0c\u53d1\u73b0\u5206\u5b50\u4e8b\u5b9e\u80fd
\u591f\u5728\u6a21\u7cca\u573a\u666f\u4e2d\u5e73\u8861\u6700\u5c0f\u5316\u548c\u4e8b\u5b9e\u6838\u67e5\u7684\u51c6\u786e\u6027\u3002**|\n", "2406.20041": "|**2024-07-01**|**BMW Agents -- A Framework For Task Automation Through Multi-Agent Collaboration**|Noel Crawford et.al.|[2406.20041](http://arxiv.org/abs/2406.20041)|null|\u81ea\u4e3b\u4ee3\u7406\u9a71\u52a8\u7684\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5c55\u793a\u4e86\u5de8\u5927\u7684\u81ea\u52a8\u5316\u6f5c\u529b\u3002\u65e9\u671f\u7684\u5c55\u793a\u8868\u660e\uff0c\u8fd9\u4e9b\u4ee3\u7406\u80fd\u591f\u89e3\u51b3\u590d\u6742\u4efb\u52a1\uff0c\u4e0e\u5916\u90e8\u7cfb\u7edf\u4ea4\u4e92\u4ee5\u589e\u5f3a\u77e5\u8bc6\uff0c\u5e76\u89e6\u53d1\u884c\u52a8\u3002\u7279\u522b\u662f\uff0c\u591a\u4e2a\u4ee3\u7406\u534f\u4f5c\u89e3\u51b3\u590d\u6742\u4efb\u52a1\u7684\u5de5\u4f5c\u6d41\u8bc1\u660e\u4e86\u5b83\u4eec\u5728\u4e0d\u90a3\u4e48\u4e25\u683c\u548c\u5b9a\u4e49\u4e0d\u660e\u786e\u7684\u73af\u5883\u4e2d\u64cd\u4f5c\u7684\u80fd\u529b\u3002\u56e0\u6b64\uff0c\u591a\u4ee3\u7406\u65b9\u6cd5\u6709\u5de8\u5927\u7684\u6f5c\u529b\u6210\u4e3a\u4f17\u591a\u5de5\u4e1a\u5e94\u7528\u7684\u6838\u5fc3\uff0c\u4ece\u590d\u6742\u7684\u77e5\u8bc6\u68c0\u7d22\u7cfb\u7edf\u5230\u4e0b\u4e00\u4ee3\u673a\u5668\u4eba\u8fc7\u7a0b\u81ea\u52a8\u5316\u3002\u9274\u4e8e\u5f53\u524dLLMs\u7684\u63a8\u7406\u80fd\u529b\uff0c\u5904\u7406\u590d\u6742\u6d41\u7a0b\u9700\u8981\u5206\u6b65\u9aa4\u7684\u65b9\u6cd5\uff0c\u5305\u62ec\u8bbe\u8ba1\u660e\u786e\u4e14\u6a21\u5757\u5316\u7684\u4efb\u52a1\u8ba1\u5212\u3002\u6839\u636e\u590d\u6742\u7a0b\u5ea6\uff0c\u8fd9\u4e9b\u4efb\u52a1\u53ef\u4ee5\u7531\u5355\u4e2a\u4ee3\u7406\u6216\u4e00\u7ec4\u4ee3\u7406\u6267\u884c\u3002\u672c\u7814\u7a76\u4e13\u6ce8\u4e8e\u6784\u5efa\u4e00\u4e2a\u7075\u6d3b\u7684\u4ee3\u7406\u5de5\u7a0b\u6846\u67b6\uff0c\u91cd\u70b9\u5173\u6ce8\u89c4\u5212\u548c\u6267\u884c\uff0c\u65e8\u5728\u5e94\u5bf9\u4e0d\u540c\u9886\u57df\u7684\u590d\u6742\u5e94\u7528\u573a\u666f\u3002\u8
be5\u6846\u67b6\u4e3a\u5de5\u4e1a\u5e94\u7528\u63d0\u4f9b\u53ef\u9760\u6027\uff0c\u5e76\u63d0\u51fa\u786e\u4fdd\u53ef\u6269\u5c55\u3001\u7075\u6d3b\u4e14\u534f\u4f5c\u7684\u5de5\u4f5c\u6d41\u7a0b\u6280\u672f\uff0c\u8ba9\u591a\u4e2a\u81ea\u4e3b\u4ee3\u7406\u534f\u540c\u89e3\u51b3\u95ee\u9898\u3002|\n", "2406.20030": "|**2024-06-28**|**LEMoE: Advanced Mixture of Experts Adaptor for Lifelong Model Editing of Large Language Models**|Renzhi Wang et.al.|[2406.20030](http://arxiv.org/abs/2406.20030)|null|## \u80cc\u666f \u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e3a\u4e86\u8ddf\u4e0a\u4e0d\u65ad\u53d8\u5316\u7684\u4e16\u754c\u77e5\u8bc6\uff0c\u9700\u8981\u6301\u7eed\u8fdb\u884c\u6a21\u578b\u66f4\u65b0\uff0c\u8fd9\u50ac\u751f\u4e86\u7ec8\u751f\u6a21\u578b\u7f16\u8f91\u4efb\u52a1\u3002\u8fd1\u5e74\u6765\uff0c\u5c3d\u7ba1\u5df2\u7ecf\u5f00\u53d1\u51fa\u591a\u79cd\u5355\u6b21\u548c\u6279\u91cf\u7f16\u8f91\u7684\u6280\u672f\uff0c\u4f46\u5b83\u4eec\u5728\u9762\u5bf9\u7ec8\u751f\u7f16\u8f91\u65f6\u8981\u4e48\u65e0\u6cd5\u5e94\u7528\uff0c\u8981\u4e48\u6548\u679c\u4e0d\u4f73\u3002\u672c\u6587\u4e2d\uff0c\u6211\u4eec\u63d0\u51faLEMoE\uff0c\u4e00\u4e2a\u4e13\u4e3a\u7ec8\u751f\u6a21\u578b\u7f16\u8f91\u8bbe\u8ba1\u7684\u6df7\u5408\u4e13\u5bb6\uff08MoE\uff09\u9002\u914d\u5668\u3002\u9996\u5148\uff0c\u6211\u4eec\u5206\u6790\u4e86\u5f71\u54cd\u4f20\u7edfMoE\u9002\u914d\u5668\u5728\u7ec8\u751f\u7f16\u8f91\u4e2d\u6709\u6548\u6027\u7684\u56e0\u7d20\uff0c\u5305\u62ec\u707e\u96be\u6027\u9057\u5fd8\u3001\u8def\u7531\u4e0d\u4e00\u81f4\u6027\u548c\u987a\u5e8f\u654f\u611f\u6027\u3002\u57fa\u4e8e\u8fd9\u4e9b\u6d1e\u5bdf\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u5b9a\u5236\u7684\u6a21\u5757\u63d2\u5165\u65b9\u6cd5\uff0c\u5f15\u5165\u4e86\u65b0\u9896\u7684\u952e\u503c\u5bf9\u951a\u5b9a\u8def\u7531\u4ee5\u589e\u5f3a\u8bad\u7ec3\u548c\u63a8\u7406\u9636\u6bb5\u7684\u8def\u7531\u4e00\u81f4\u6027\uff0c\u540c\u65f6\u91c7\u7528\u4e86\u4e00\u4e2a\u7b80\u6d01\u800c\u6709\u6548\u7684\u805a\u
7c7b\u57fa\u7f16\u8f91\u987a\u5e8f\u89c4\u5212\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u7ec8\u751f\u7f16\u8f91\u4efb\u52a1\u4e2d\u8868\u73b0\u51fa\u8272\uff0c\u8d85\u8d8a\u4e86\u5148\u524d\u7684\u6a21\u578b\u7f16\u8f91\u6280\u672f\uff0c\u540c\u65f6\u4fdd\u6301\u4e86\u6279\u91cf\u7f16\u8f91\u4efb\u52a1\u4e2d\u7684\u4f18\u79c0\u6027\u80fd\u3002\u6211\u4eec\u7684\u4ee3\u7801\u5c06\u5f00\u6e90\u3002|\n", "2406.20015": "|**2024-06-28**|**ToolBeHonest: A Multi-level Hallucination Diagnostic Benchmark for Tool-Augmented Large Language Models**|Yuxiang Zhang et.al.|[2406.20015](http://arxiv.org/abs/2406.20015)|**[link](https://github.com/toolbehonest/toolbehonest)**|**\u968f\u7740\u5de5\u5177\u589e\u5f3a\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u8fc5\u901f\u878d\u5165\u5b9e\u9645\u5e94\u7528\uff0c\u793e\u533a\u4e9f\u9700\u5168\u9762\u4e86\u89e3\u8fd9\u4e9b\u6a21\u578b\u4e2d\u7684\u5e7b\u89c9\u95ee\u9898\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u9879\u5168\u9762\u7684\u8bca\u65ad\u57fa\u51c6\u2014\u2014ToolBH\u3002\u6211\u4eec\u4ece\u6df1\u5ea6\u548c\u5e7f\u5ea6\u4e24\u4e2a\u7ef4\u5ea6\u8fdb\u884c\u8bc4\u4f30\uff1a\u5728\u6df1\u5ea6\u4e0a\uff0c\u8bbe\u8ba1\u4e86\u591a\u7ea7\u8bca\u65ad\u6d41\u7a0b\uff0c\u5305\u62ec\uff081\uff09\u53ef\u89e3\u6027\u68c0\u6d4b\u3001\uff082\uff09\u89e3\u51b3\u65b9\u6848\u89c4\u5212\u548c\uff083\uff09\u7f3a\u5931\u5de5\u5177\u5206\u6790\uff1b\u5728\u5e7f\u5ea6\u4e0a\uff0c\u8003\u8651\u4e86\u5de5\u5177\u96c6\u7279\u5f81\u4e0b\u7684\u4e09\u79cd\u573a\u666f\uff1a\u7f3a\u5c11\u5fc5\u8981\u5de5\u5177\u3001\u6f5c\u5728\u5de5\u5177\u548c\u529f\u80fd\u6709\u9650\u7684\u5de5\u5177\u3002\u6211\u4eec\u6784\u5efa\u4e86\u4e03\u4e2a\u4efb\u52a1\uff0c\u5e76\u901a\u8fc7\u591a\u6b21\u4eba\u5de5\u6807\u6ce8\u6536\u96c6\u4e86700\u4efd\u8bc4\u4f30\u6837\u672c\u3002\u7ed3\u679c\u663e\u793a\uff0c\u5f53\u524d\u5148\u8fdb\u7684\u6a21\u578bGemini-1.5-Pro\u548cGPT-4o\u5728\u8fd9\u9879\u57fa\u5
1c6\u4e0a\u7684\u603b\u5f97\u5206\u4e3a45.3\u548c37.0\uff0c\u6ee1\u5206100\u5206\u3002\u5728\u5de5\u5177\u589e\u5f3a\u7684LLM\u573a\u666f\u4e2d\uff0c\u66f4\u5927\u7684\u6a21\u578b\u53c2\u6570\u5e76\u4e0d\u4e00\u5b9a\u610f\u5473\u7740\u66f4\u597d\u7684\u6027\u80fd\uff0c\u8bad\u7ec3\u6570\u636e\u548c\u56de\u590d\u7b56\u7565\u540c\u6837\u5173\u952e\u3002\u6211\u4eec\u7684\u8bca\u65ad\u5206\u6790\u6307\u51fa\uff0c\u6a21\u578b\u9519\u8bef\u7684\u4e3b\u8981\u539f\u56e0\u5728\u4e8e\u4efb\u52a1\u53ef\u89e3\u6027\u7684\u5224\u65ad\u3002\u5f00\u653e\u6e90\u7801\u6a21\u578b\u5728\u5197\u957f\u56de\u590d\u65f6\u6027\u80fd\u4e0b\u964d\uff0c\u800c\u4e13\u6709\u6a21\u578b\u5728\u957f\u94fe\u63a8\u7406\u65b9\u9762\u8868\u73b0\u66f4\u4f18\u3002**|\n", "2407.02490": "|**2024-07-02**|**MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention**|Huiqiang Jiang et.al.|[2407.02490](http://arxiv.org/abs/2407.02490)|**[link](https://github.com/microsoft/MInference)**|**\u7531\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u8ba1\u7b97\u6311\u6218\uff0c\u5c24\u5176\u662f\u968f\u7740\u63d0\u793a\u957f\u5ea6\u7684\u589e\u957f\uff0c\u5176\u5e7f\u6cdb\u5e94\u7528\u9762\u4e34\u969c\u788d\u3002\u7531\u4e8e\u6ce8\u610f\u529b\u8ba1\u7b97\u7684\u4e8c\u6b21\u590d\u6742\u6027\uff0c80\u4ebf\u53c2\u6570\u7684LLM\u5728\u5355\u4e2aA100 
GPU\u4e0a\u5904\u7406100\u4e07\u4e2a\u4ee4\u724c\uff08\u5373\u9884\u586b\u5145\u9636\u6bb5\uff09\u9700\u898130\u5206\u949f\u3002\u73b0\u6709\u7684\u52a0\u901f\u9884\u586b\u5145\u65b9\u6cd5\u5f80\u5f80\u5728\u9762\u5bf9\u957f\u5e8f\u5217LLMs\u65f6\u96be\u4ee5\u4fdd\u6301\u65e2\u9ad8\u6548\u53c8\u51c6\u786e\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86MInference\uff08\u767e\u4e07\u4ee4\u724c\u63a8\u7406\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u65e8\u5728\u63d0\u5347\u957f\u5e8f\u5217\u5904\u7406\u9884\u586b\u5145\u9636\u6bb5\u901f\u5ea6\u7684\u7a00\u758f\u8ba1\u7b97\u65b9\u6cd5\u3002\u6211\u4eec\u53d1\u73b0\u4e86\u6ce8\u610f\u529b\u77e9\u9635\u4e2d\u7684\u4e09\u79cd\u72ec\u7279\u6a21\u5f0f\uff1aA\u5f62\u3001\u5782\u76f4\u659c\u7ebf\u548c\u5757\u7a00\u758f\uff0c\u8fd9\u4e9b\u6a21\u5f0f\u53ef\u5229\u7528GPU\u8fdb\u884c\u9ad8\u6548\u7684\u7a00\u758f\u8ba1\u7b97\u3002\u6211\u4eec\u5728\u79bb\u7ebf\u9636\u6bb5\u786e\u5b9a\u6bcf\u4e2a\u6ce8\u610f\u529b\u5934\u7684\u6700\u4f73\u6a21\u5f0f\uff0c\u5e76\u5728\u63a8\u7406\u8fc7\u7a0b\u4e2d\u52a8\u6001\u6784\u5efa\u7a00\u758f\u7d22\u5f15\u3002\u901a\u8fc7\u4f18\u5316\u7684GPU\u5185\u6838\uff0c\u6211\u4eec\u5b9e\u73b0\u4e86\u57fa\u4e8e\u6307\u5b9a\u6a21\u5f0f\u7684\u7a00\u758f\u6ce8\u610f\u529b\u8ba1\u7b97\uff0c\u663e\u8457\u51cf\u5c11\u4e86\u957f\u5e8f\u5217LLMs\u9884\u586b\u5145\u9636\u6bb5\u7684\u5ef6\u8fdf\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u65e0\u9700\u4fee\u6539\u9884\u8bad\u7ec3\u8bbe\u7f6e\u6216\u989d\u5916\u5fae\u8c03\u5373\u53ef\u76f4\u63a5\u5e94\u7528\u4e8e\u73b0\u6709LLMs\u3002\u6211\u4eec\u5728\u5305\u62ecInfiniteBench\u3001RULER\u3001PG-19\u548cNeedle In A 
Haystack\u5728\u5185\u7684\u5404\u79cd\u4e0b\u6e38\u4efb\u52a1\u4ee5\u53caLLaMA-3-1M\u3001GLM4-1M\u3001Yi-200K\u3001Phi-3-128K\u548cQwen2-128K\u7b49\u6a21\u578b\u4e0a\u7684\u5b9e\u9a8c\u8868\u660e\uff0cMInference\u5728A100\u4e0a\u6709\u6548\u964d\u4f4e\u4e86\u9884\u586b\u5145\u7684\u63a8\u7406\u5ef6\u8fdf\u9ad8\u8fbe10\u500d\uff0c\u540c\u65f6\u4fdd\u6301\u4e86\u51c6\u786e\u6027\u3002\u6211\u4eec\u7684\u4ee3\u7801\u5df2\u5f00\u6e90\uff0c\u5730\u5740\u4e3a\uff1ahttps://aka.ms/MInference\u3002**|\n", "2407.02486": "|**2024-07-02**|**Neurocache: Efficient Vector Retrieval for Long-range Language Modeling**|Ali Safaya et.al.|[2407.02486](http://arxiv.org/abs/2407.02486)|**[link](https://github.com/alisafaya/neurocache)**|**\u8fd9\u7bc7\u8bba\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aNeurocache\u7684\u65b9\u6cd5\uff0c\u7528\u4e8e\u6269\u5c55\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u6709\u6548\u4e0a\u4e0b\u6587\u8303\u56f4\uff0c\u901a\u8fc7\u5916\u90e8\u5411\u91cf\u7f13\u5b58\u5b58\u50a8\u5176\u8fc7\u53bb\u7684\u6a21\u578b\u72b6\u6001\u3002\u4e0e\u8fd1\u671f\u7684\u5411\u91cf\u68c0\u7d22\u65b9\u6cd5\u7c7b\u4f3c\uff0cNeurocache\u5229\u7528\u9ad8\u6548\u7684k\u8fd1\u90bb(kNN)\u7b97\u6cd5\u68c0\u7d22\u76f8\u5173\u7684\u5386\u53f2\u72b6\u6001\uff0c\u5e76\u5c06\u5176\u878d\u5165\u6ce8\u610f\u529b\u8fc7\u7a0b\u3002Neurocache\u5728\u6539\u8fdb\u73b0\u6709\u65b9\u6cd5\u65b9\u9762\u6709\u4ee5\u4e0b\u51e0\u70b9\uff1a(1) \u5b58\u50a8\u538b\u7f29\u7684\u72b6\u6001\uff0c\u51cf\u5c0f\u4e86\u7f13\u5b58\u5927\u5c0f\uff1b(2) \u6bcf\u4e2a\u4ee4\u724c\u6267\u884c\u4e00\u6b21\u68c0\u7d22\u64cd\u4f5c\uff0c\u63d0\u9ad8\u4e86\u63a8\u7406\u901f\u5ea6\uff1b(3) \u5c06\u68c0\u7d22\u7a97\u53e3\u6269\u5c55\u5230\u90bb\u8fd1\u72b6\u6001\uff0c\u63d0\u5347\u4e86\u8bed\u8a00\u5efa\u6a21\u548c\u4e0b\u6e38\u4efb\u52a1\u7684\u51c6\u786e\u6027\u3002 
\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u65e0\u8bba\u4ece\u5934\u5f00\u59cb\u8bad\u7ec3\u8fd8\u662f\u5bf9\u9884\u8bad\u7ec3\u6a21\u578b\uff08\u5982Llama2-7B\u548cMistral-7B\uff09\u8fdb\u884c\u589e\u5f3a\uff0cNeurocache\u90fd\u80fd\u6709\u6548\u3002\u6211\u4eec\u8fd8\u5bf9\u6bd4\u4e86Neurocache\u4e0e\u5176\u4ed6\u6587\u672c\u68c0\u7d22\u65b9\u6cd5\uff0c\u5728\u5355\u6587\u6863\u95ee\u7b54\u548c\u5c11\u91cf\u6837\u672c\u5b66\u4e60\u4efb\u52a1\u4e2d\u5c55\u793a\u4e86\u5176\u4f18\u52bf\u3002\u6e90\u4ee3\u7801\u5df2\u5728\u4ee5\u4e0b\u94fe\u63a5\u516c\u5f00\uff1ahttps://github.com/alisafaya/neurocache\u3002**|\n", "2407.02485": "|**2024-07-02**|**RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs**|Yue Yu et.al.|[2407.02485](http://arxiv.org/abs/2407.02485)|null|\u8be5\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6307\u4ee4\u8c03\u4f18\u6846\u67b6RankRAG\uff0c\u65e8\u5728\u9488\u5bf9\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u4e2d\u7684\u4e0a\u4e0b\u6587\u6392\u540d\u548c\u7b54\u6848\u751f\u6210\u53cc\u91cd\u4efb\u52a1\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u8c03\u4f18\u3002\u901a\u8fc7\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u52a0\u5165\u5c11\u91cf\u6392\u540d\u6570\u636e\uff0c\u6307\u4ee4\u8c03\u4f18\u7684\u5355\u4e2a\u8bed\u8a00\u6a21\u578b\u8868\u73b0\u51fa\u4ee4\u4eba\u60ca\u8bb6\u7684\u6548\u679c\uff0c\u8d85\u8d8a\u4e86\u4e13\u95e8\u4f7f\u7528\u5927\u91cf\u6392\u540d\u6570\u636e\u8fdb\u884c\u5355\u72ec\u8c03\u4f18\u7684\u73b0\u6709\u4e13\u5bb6\u6392\u540d\u6a21\u578b\u3002\u5b9e\u9a8c\u4e2d\uff0c\u6211\u4eec\u4e0e\u5305\u62ecGPT-4-0613\u3001GPT-4-turbo-2024-0409\u548c\u5f00\u653e\u6e90\u4ee3\u7801\u7684\u6700\u5148\u8fdb\u7684RAG\u6027\u80fd\u6a21\u578bChatQA-1.5\u5728\u5185\u7684\u591a\u4e2a\u5f3abaseline\u8fdb\u884c\u4e86\u6bd4\u8f83\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u7684Llama3-RankRAG\u5728\u4e5d\u4e2a\u77e5\u8bc6\u5bc6\u96c6\u578b\u57fa\u51c6\u4e0a\u663e\u8457\u4f18\u4e8eLlama3-Cha
tQA-1.5\u548cGPT-4\u7cfb\u5217\u6a21\u578b\u3002\u6b64\u5916\uff0c\u5b83\u8fd8\u5728\u65e0\u9700\u9488\u5bf9\u751f\u7269\u533b\u5b66\u9886\u57df\u6570\u636e\u8fdb\u884c\u6307\u4ee4\u8c03\u4f18\u7684\u60c5\u51b5\u4e0b\uff0c\u5728\u4e94\u4e2a\u751f\u7269\u533b\u5b66\u9886\u57df\u7684RAG\u57fa\u51c6\u4e0a\u4e0eGPT-4\u6a21\u578b\u8868\u73b0\u76f8\u5f53\uff0c\u8fd9\u663e\u793a\u4e86\u5176\u5728\u65b0\u9886\u57df\u4e2d\u7684\u51fa\u8272\u6cdb\u5316\u80fd\u529b\u3002|\n", "2407.02483": "|**2024-07-02**|**MMedAgent: Learning to Use Medical Tools with Multi-modal Agent**|Binxu Li et.al.|[2407.02483](http://arxiv.org/abs/2407.02483)|**[link](https://github.com/Wangyixinxin/MMedAgent)**|\u5c3d\u7ba1\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u5df2\u7ecf\u53d6\u5f97\u4e86\u6210\u529f\uff0c\u4f46\u5b83\u4eec\u7684\u6cdb\u5316\u80fd\u529b\u4ecd\u7136\u6709\u9650\uff0c\u5728\u67d0\u4e9b\u60c5\u51b5\u4e0b\u4e0d\u5982\u4e13\u4e1a\u6a21\u578b\u3002\u8fd1\u671f\uff0c\u7814\u7a76\u4eba\u5458\u5f00\u53d1\u4e86\u57fa\u4e8eLLMs\u7684\u4ee3\u7406\uff0c\u901a\u8fc7\u7528\u6237\u8f93\u5165\u9009\u62e9\u5408\u9002\u7684\u4e13\u7528\u6a21\u578b\u6765\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\u3002\u7136\u800c\uff0c\u5728\u533b\u7597\u9886\u57df\uff0c\u8fd9\u7c7b\u8fdb\u5c55\u7684\u5e94\u7528\u8fd8\u4e0d\u5e7f\u6cdb\u3002\u4e3a\u4e86\u5f25\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u672c\u6587\u9996\u6b21\u63d0\u51fa\u4e86\u4e00\u79cd\u4e13\u4e3a\u533b\u7597\u8bbe\u8ba1\u7684\u4ee3\u7406\uff0c\u540d\u4e3a\\textbf{M}ulti-modal \\textbf{Med}ical 
\\textbf{Agent}\uff08MMedAgent\uff09\u3002\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u6307\u4ee4\u8c03\u4f18\u6570\u636e\u96c6\uff0c\u5305\u542b\u4e86\u516d\u4e2a\u533b\u7597\u5de5\u5177\uff0c\u7528\u4e8e\u89e3\u51b3\u4e03\u9879\u4efb\u52a1\uff0c\u4f7f\u4ee3\u7406\u80fd\u9488\u5bf9\u7279\u5b9a\u4efb\u52a1\u9009\u62e9\u6700\u9002\u5b9c\u7684\u5de5\u5177\u3002\u5b9e\u9a8c\u5168\u9762\u5c55\u793a\u4e86MMedAgent\u5728\u5404\u79cd\u533b\u7597\u4efb\u52a1\u4e0a\u8d85\u8d8a\u4e86\u5f00\u6e90\u65b9\u6cd5\uff0c\u751a\u81f3\u5305\u62ec\u5c01\u95ed\u6e90\u6a21\u578bGPT-4o\uff0c\u4e14\u5728\u5f15\u5165\u548c\u6574\u5408\u65b0\u533b\u7597\u5de5\u5177\u65b9\u9762\u8868\u73b0\u51fa\u9ad8\u6548\u6027\u3002|\n", "2407.02477": "|**2024-07-02**|**Understanding Alignment in Multimodal LLMs: A Comprehensive Study**|Elmira Amirloo et.al.|[2407.02477](http://arxiv.org/abs/2407.02477)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u6027\u80fd\u7684\u63d0\u5347\uff0c\u504f\u597d\u4e00\u81f4\u6027\u5df2\u6210\u4e3a\u4e00\u4e2a\u91cd\u8981\u56e0\u7d20\uff0c\u4f46\u5728\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u4e2d\u7684\u5e94\u7528\u76f8\u5bf9\u8f83\u5c11\u3002\u8fd9\u4e9b\u6a21\u578b\u5728\u56fe\u50cf\u7406\u89e3\u4efb\u52a1\u4e2d\u4e5f\u4f1a\u9047\u5230\u8bf8\u5982\u9519\u8bef\u9648\u8ff0\u548c\u5185\u5bb9\u4e0d\u4e00\u81f4\uff08\u5373\u5e7b\u89c9\uff09\u7684\u95ee\u9898\u3002MLLMs\u7684\u504f\u597d\u5bf9\u9f50\u76ee\u6807\u662f\u4f7f\u6a21\u578b\u7684\u56de\u7b54\u66f4\u8d34\u8fd1\u56fe\u50cf\u4fe1\u606f\u3002\u8fd1\u671f\u7684\u7814\u7a76\u5df2\u7ecf\u5f15\u5165\u4e86\u9488\u5bf9MLLM\u7684\u504f\u597d\u6570\u636e\u96c6\uff0c\u5e76\u5c1d\u8bd5\u4e86\u76f4\u63a5\u504f\u597d\u4f18\u5316\uff08DPO\uff09\u548cproximal policy 
optimization\uff08PPO\uff09\u7b49\u4e0d\u540c\u7684\u5bf9\u9f50\u65b9\u6cd5\u3002\u7136\u800c\uff0c\u7531\u4e8e\u6570\u636e\u96c6\u3001\u57fa\u7840\u6a21\u578b\u7c7b\u578b\u548c\u5bf9\u9f50\u7b56\u7565\u7684\u5dee\u5f02\uff0c\u54ea\u79cd\u65b9\u6cd5\u5bf9\u6027\u80fd\u63d0\u5347\u7684\u8d21\u732e\u6700\u5927\u5c1a\u4e0d\u6e05\u695a\u3002 \u672c\u6587\u72ec\u7acb\u5206\u6790\u4e86MLLM\u504f\u597d\u5bf9\u9f50\u7684\u5404\u4e2a\u65b9\u9762\u3002\u6211\u4eec\u5c06\u5bf9\u9f50\u7b97\u6cd5\u5206\u4e3a\u79bb\u7ebf\uff08\u5982DPO\uff09\u548c\u5728\u7ebf\uff08\u5982\u5728\u7ebf-DPO\uff09\u4e24\u7c7b\uff0c\u5e76\u8868\u660e\u5728\u67d0\u4e9b\u60c5\u51b5\u4e0b\u7ed3\u5408\u8fd9\u4e24\u79cd\u65b9\u6cd5\u53ef\u4ee5\u63d0\u9ad8\u6a21\u578b\u6027\u80fd\u3002\u6211\u4eec\u8fd8\u56de\u987e\u4e86\u5404\u79cd\u5df2\u53d1\u8868\u7684\u591a\u6a21\u6001\u504f\u597d\u6570\u636e\u96c6\uff0c\u63a2\u8ba8\u4e86\u5b83\u4eec\u6784\u5efa\u7ec6\u8282\u5bf9\u6a21\u578b\u6027\u80fd\u7684\u5f71\u54cd\u3002\u57fa\u4e8e\u8fd9\u4e9b\u53d1\u73b0\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u591a\u6a21\u6001\u504f\u597d\u6570\u636e\u751f\u6210\u65b9\u6cd5\u2014\u2014\u504f\u89c1\u9a71\u52a8\u7684\u5e7b\u89c9\u91c7\u6837\uff08Bias-Driven Hallucination Sampling\uff0cBDHS\uff09\uff0c\u8fd9\u79cd\u65b9\u6cd5\u65e0\u9700\u989d\u5916\u6807\u6ce8\u6216\u5916\u90e8\u6a21\u578b\uff0c\u4e14\u5728\u591a\u4e2a\u57fa\u51c6\u4e0a\u5c55\u73b0\u51fa\u4e0e\u4e4b\u524d\u53d1\u8868\u7684\u5bf9\u9f50\u5de5\u4f5c\u76f8\u5f53\u7684\u7ade\u4e89\u6027\u80fd\u3002|\n", "2407.02473": "|**2024-07-02**|**Open Scene Graphs for Open World Object-Goal Navigation**|Joel Loo 
et.al.|[2407.02473](http://arxiv.org/abs/2407.02473)|null|\u5982\u4f55\u6784\u5efa\u80fd\u591f\u5728\u5f00\u653e\u4e16\u754c\u4e2d\u6267\u884c\u8bed\u4e49\u5bfc\u822a\u4efb\u52a1\u7684\u673a\u5668\u4eba\uff0c\u6bd4\u5982\u5728\u65b0\u573a\u666f\u4e2d\u5bfb\u627e\u76ee\u6807\u7269\u4f53\uff1f\u5c3d\u7ba1\u57fa\u7840\u6a21\u578b\u5177\u5907\u5904\u7406\u8fd9\u7c7b\u4efb\u52a1\u6240\u9700\u7684\u4e30\u5bcc\u77e5\u8bc6\u548c\u6cdb\u5316\u80fd\u529b\uff0c\u4f46\u9700\u8981\u4e00\u79cd\u5408\u9002\u7684\u573a\u666f\u8868\u793a\u6765\u5c06\u5b83\u4eec\u6574\u5408\u5230\u5b8c\u6574\u7684\u673a\u5668\u4eba\u7cfb\u7edf\u4e2d\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u5f00\u653e\u573a\u666f\u56fe\uff08Open Scene Graphs\uff0cOSG\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u62d3\u6251\u8bed\u4e49\u8868\u793a\uff0c\u7528\u4e8e\u4fdd\u7559\u548c\u7ec4\u7ec7\u5f00\u653e\u96c6\u4e2d\u573a\u666f\u4fe1\u606f\uff0c\u4e14\u7ed3\u6784\u53ef\u9002\u5e94\u4e0d\u540c\u73af\u5883\u7c7b\u578b\u3002\u6211\u4eec\u5c06\u57fa\u7840\u6a21\u578b\u548cOSG\u6574\u5408\u5230OpenSearch\u7cfb\u7edf\u4e2d\uff0c\u8be5\u7cfb\u7edf\u4e13\u4e3a\u5f00\u653e\u4e16\u754c\u7684\u5bf9\u8c61\u76ee\u6807\u5bfc\u822a\u8bbe\u8ba1\uff0c\u80fd\u591f\u7406\u89e3\u81ea\u7136\u8bed\u8a00\u6307\u4ee4\u5e76\u5728\u591a\u53d8\u73af\u5883\u4e2d\u96f6\u6837\u672c\u6cdb\u5316\uff0c\u5bfb\u627e\u672a\u89c1\u8fc7\u7684\u7269\u4f53\u3002\u6211\u4eec\u7684OSG\u589e\u5f3a\u4e86\u4e0e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u63a8\u7406\u80fd\u529b\uff0c\u4f7f\u5f97OpenSearch\u5728\u7269\u4f53\u76ee\u6807\u5bfc\u822a\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u8d85\u8d8a\u4e86\u73b0\u6709\u7684LLM\u65b9\u6cd5\u3002\u901a\u8fc7\u6a21\u62df\u5b9e\u9a8c\u548c\u771f\u5b9e\u4e16\u754c\u6d4b\u8bd5\uff0c\u6211\u4eec\u9a8c\u8bc1\u4e86OpenSearch\u5728\u5404\u79cd\u73af\u5883\u3001\u673a\u5668\u4eba\u548c\u65b0\u9896\u6307\u4ee4\u4e0b\u7684\u6cdb\u5316\u80fd\u529b\u3002|\n", "2407.02464": "|**2024-07-02**|**Reliable 
Confidence Intervals for Information Retrieval Evaluation Using Generative A.I**|Harrie Oosterhuis et.al.|[2407.02464](http://arxiv.org/abs/2407.02464)|null|Evaluating information retrieval (IR) systems is traditionally expensive because it requires relevance annotations from human experts. Recent generative AI, in particular large language models (LLMs), can generate relevance annotations at scale at relatively low computational cost, which may reduce the traditional cost of IR evaluation and make it applicable to many low-resource applications. However, generated annotations are not error-free, and using them directly for evaluation can produce unreliable results. To address this, this work proposes two methods, based on prediction-powered inference and conformal risk control, that use computer-generated relevance annotations to place reliable confidence intervals (CIs) around IR evaluation metrics. Both methods require a small number of reliable annotations, from which they statistically analyze the errors in the generated annotations and set CIs on the evaluation metrics with a solid theoretical foundation. Unlike existing approaches, our conformal risk control method is designed specifically for ranking evaluation and can adapt its CIs per query and document. Experimental results show that our CIs accurately capture both the variance and the bias in evaluation based on LLM annotations, outperforming traditional bootstrap estimates. We hope these contributions bring reliable evaluation to the many IR applications where it has traditionally been hard to achieve.|\n", 
"2407.02411": "|**2024-07-03**|**Video Watermarking: Safeguarding Your Video from (Unauthorized) Annotations by Video-based LLMs**|Jinmin Li et.al.|[2407.02411](http://arxiv.org/abs/2407.02411)|null|The rise of video-based large language models (LLMs) has significantly advanced video understanding, but it has also raised data-protection concerns, because videos can now be annotated more easily without authorization. To address this, the paper introduces Video Watermarking, a technique that protects videos from unauthorized processing by video-based LLMs, particularly of their content and descriptions. By embedding imperceptible watermarks in key frames with multi-modal flow-based losses, the method preserves the viewing experience while preventing misuse of the video. Extensive experiments show that Video Watermarking significantly reduces how well various video-based LLMs can comprehend a video, demonstrating both stealth and robustness. Overall, the method offers a way to keep video content secure, intact, and confidential in the face of evolving video-based LLM technology.|\n", 
"2407.02408": "|**2024-07-02**|**CEB: Compositional Evaluation Benchmark for Fairness in Large Language Models**|Song Wang 
et.al.|[2407.02408](http://arxiv.org/abs/2407.02408)|null|As large language models (LLMs) are applied to more and more natural language processing tasks, concern about the potential negative social impact of the content they generate has grown with them. Researchers have proposed a range of datasets to evaluate bias in LLMs. However, existing bias evaluation efforts tend to focus on a single type of bias and use inconsistent evaluation metrics, which makes comparison across datasets and LLMs difficult. To address this, we collect a variety of datasets designed for LLM bias evaluation and further propose CEB, a Compositional Evaluation Benchmark that covers different types of bias across different social groups and tasks. CEB is built on our newly proposed compositional taxonomy, which characterizes each dataset along three dimensions: bias type, social group, and task. By combining the three dimensions, we develop a comprehensive evaluation strategy for bias in LLMs. Our experiments show that the level of bias varies across these dimensions, which provides guidance for developing mitigation methods aimed at specific biases.|\n", 
"2407.02402": "|**2024-07-02**|**Assessing the Code Clone Detection Capability of Large Language Models**|Zixian Zhang et.al.|[2407.02402](http://arxiv.org/abs/2407.02402)|null|This study evaluates two advanced large language models (LLMs), GPT-3.5 and GPT-4, on the task of code clone detection. The models are tested on two datasets: BigCloneBench (human-written) and GPTCloneBench (LLM-generated). GPT-4 clearly outperforms GPT-3.5 on every clone type. The accuracy of the GPT models in identifying code clones correlates with code similarity, and both models are least effective on the most complex Type-4 clones. In addition, the GPT models detect clones in LLM-generated code better than in human-written code, although overall accuracy remains unimpressive. These findings underline the need to further improve the ability of LLMs to recognize code clones, in particular self-generated clones, which may become a problem as software engineers increasingly rely on LLM-based code generation and refactoring tools.|\n", "2407.03310": "|**2024-07-03**|**Universal Length Generalization with Turing Programs**|Kaiying Hou 
et.al.|[2407.03310](http://arxiv.org/abs/2407.03310)|null|Length generalization is the ability to extrapolate from short training sequences to long test sequences, and it remains a challenge for current large language models. Prior work has proposed architecture or data-format changes to achieve length generalization, but these proposals are usually limited to specific tasks. Building on scratchpad and Chain-of-Thought (CoT) techniques, we propose Turing Programs, a novel CoT strategy that decomposes an algorithmic task into steps that mimic the computation of a Turing machine. The framework is both universal and simple, requiring only that text be copied from the context with small modifications. We show that Turing Programs yield robust length generalization on algorithmic tasks including addition, multiplication, and in-context SGD. We then show that transformers also achieve length generalization on random Turing Programs, which suggests that length generalization is possible for any algorithmic task. Finally, we prove theoretically that transformers can implement Turing Programs by constructing a simple RASP (Weiss et al.) program that simulates an arbitrary Turing machine.|\n", 
"2407.03286": "|**2024-07-03**|**Large Language Models for JSON Schema Discovery**|Michael J. Mior et.al.|[2407.03286](http://arxiv.org/abs/2407.03286)|null|Semi-structured data formats such as JSON are widely used because of their flexibility for storing data. However, JSON data often lacks a schema of the kind that accompanies tables in relational databases. Many tools have therefore been developed to discover schemas from a collection of data. Useful as these tools are, existing approaches focus mainly on the syntax of documents and ignore semantic information. In this work we explore how to automatically add meaningful semantic information to discovered schemas, similar to the information found in schemas written by human authors. We use large language models together with a corpus of manually authored JSON Schema documents to generate natural-language descriptions of schema elements and meaningful names for reusable definitions, and to identify which discovered properties are most useful and which can be considered \u201cnoise\u201d. Our approach performs well on text-generation metrics that have previously been shown to correlate strongly with human judgment.|\n", "2407.03282": "|**2024-07-03**|**LLM Internal States Reveal Hallucination Risk Faced With a Query**|Ziwei Ji 
et.al.|[2407.03282](http://arxiv.org/abs/2407.03282)|**[link](https://github.com/ziweiji/Internal_States_Reveal_Hallucination)**|Hallucination in large language models (LLMs) severely limits their reliability and trustworthiness. Humans have a self-awareness process that lets them recognize what they do not know when faced with a query. Motivated by this, the paper investigates whether LLMs can estimate their own hallucination risk before generating a response. We broadly analyze the internal mechanisms of LLMs in terms of training data sources and across 15 diverse natural language generation (NLG) tasks spanning more than 700 datasets. The empirical analysis yields two key findings: (1) LLM internal states indicate whether the model has seen the query in its training data; (2) LLM internal states indicate whether the model is likely to hallucinate on the query. The study examines the specific neurons, activation layers, and tokens that play a key role in how an LLM perceives uncertainty and hallucination risk. Using a probing estimator, we leverage the self-assessment ability of LLMs and achieve an average hallucination estimation accuracy of 84.32% at run time.|\n", "2407.03227": "|**2024-07-03**|**Improving Retrieval-augmented Text-to-SQL with 
AST-based Ranking and Schema Pruning**|Zhili Shen et.al.|[2407.03227](http://arxiv.org/abs/2407.03227)|null|We study Text-to-SQL semantic parsing from the perspective of large language models. Motivated by the challenges posed by the size of commercial database schemata and by the deployability of business intelligence solutions, we propose an approach that dynamically retrieves input database information and uses abstract syntax trees to select few-shot examples for in-context learning. We further investigate how a parallel semantic parser can generate approximate versions of the SQL queries to support our retrieval. We take this approach to the extreme, adopting a model with fewer than 500M parameters as an efficient approximator and giving it the ability to process schemata in parallel. Applied to monolingual and cross-lingual semantic parsing benchmarks, our approach improves over the best existing baselines. Comprehensive experiments show the contribution of each module in this retrieval-augmented generation setting and point to interesting directions for future work.|\n", "2407.03211": "|**2024-07-03**|**How Does Quantization Affect Multilingual LLMs?**|Kelly Marchisio et.al.|[2407.03211](http://arxiv.org/abs/2407.03211)|null|
Quantization is widely used to improve the inference speed and deployability of large language models (LLMs). Although many studies examine the effect of quantization on English tasks, none have examined multilingual settings. We conduct a thorough analysis of quantized multilingual LLMs, focusing on their performance across languages and at different scales. Using automatic benchmarks, LLM-as-a-judge methods, and human evaluation, we find that: (1) quantization harms human-evaluated quality, and automatic metrics severely underestimate the harm: an average drop of 1.7% on automatic tasks corresponds to a 16.0% drop on Japanese tasks in human evaluation; (2) languages are affected unevenly by quantization, with non-Latin-script languages affected most; (3) challenging tasks such as mathematical reasoning degrade the most. Because serving low-compute models is critical for wide global adoption of NLP technology, our results argue that multilingual performance should be a key criterion when evaluating efficient models.|\n", "2407.03203": "|**2024-07-03**|**TheoremLlama: Transforming General-Purpose LLMs into Lean4 
Experts**|Ruida Wang et.al.|[2407.03203](http://arxiv.org/abs/2407.03203)|**[link](https://github.com/RickySkywalker/TheoremLlama)**|For verifying mathematical proofs in computer-verifiable formal languages such as Lean, approaches in which large language models (LLMs) work from natural language (NL) proofs are highly influential. However, because aligned NL and formal language (FL) proof data is scarce, modern LLMs perform poorly at generating complete proofs. To address this, the paper proposes TheoremLlama, an end-to-end framework for training a general-purpose LLM to become a Lean4 expert. The framework comprises methods for generating NL-FL aligned datasets, training strategies for the LLM formal theorem prover, and techniques for LLM Lean4 proof writing. A key innovation is NL-FL bootstrapping, which integrates NL proofs into Lean4 code so that the natural-language reasoning ability of LLMs can be applied to formal reasoning. Using this dataset generation method, we provide Open Bootstrapped Theorems (OBT), an NL-FL aligned and bootstrapped dataset. TheoremLlama achieves cumulative accuracies of 36.48% and 33.61% on the MiniF2F-Valid and MiniF2F-Test datasets respectively, surpassing the GPT-4 baselines of 22.95% and 25.41%. We have released the model checkpoints and the generated dataset, and will soon release all the code.|\n", 
"2407.03181": "|**2024-07-03**|**Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models**|Haritz Puerto et.al.|[2407.03181](http://arxiv.org/abs/2407.03181)|**[link](https://github.com/ukplab/arxiv2024-divergent-cot)**|This study presents Divergent CoT (DCoT), a novel method that further improves performance by requiring the model to compare multiple reasoning chains in a single inference step. The authors find that instruction tuning on DCoT data improves performance even for smaller, and therefore more accessible, large language models. Through extensive experiments across a range of reasoning task types, they show that fine-tuning on DCoT datasets consistently outperforms the plain CoT baseline across model scales (1.3B to 70B parameters). Experiments and manual evaluation indicate that the gains come from the model generating multiple divergent reasoning paths in a single inference step, which suggests that language models are capable of self-correction. Code and data are publicly available at https://github.com/UKPLab/arxiv2024-divergent-cot.|\n", 
"2407.03169": "|**2024-07-03**|**Investigating Decoder-only Large Language Models for Speech-to-text Translation**|Chao-Wei Huang et.al.|[2407.03169](http://arxiv.org/abs/2407.03169)|null|Large language models (LLMs), with their strong reasoning ability, generalizability, and fluency across domains, show great promise for improving speech-related tasks. This paper focuses on integrating decoder-only LLMs into speech-to-text translation (S2TT). We propose an architecture in which the LLM directly consumes the encoded speech representation and generates the text translation. We also study the effects of different parameter-efficient fine-tuning techniques and task formulations. Without using proprietary data, our model achieves state-of-the-art performance on the CoVoST 2 and FLEURS benchmarks. We also conduct in-depth analyses that validate our design choices and offer insights into integrating LLMs with S2TT.|\n", "2407.03160": "|**2024-07-03**|**SOS! 
Soft Prompt Attack Against Open-Source Large Language Models**|Ziqing Yang et.al.|[2407.03160](http://arxiv.org/abs/2407.03160)|null|Open-source large language models (LLMs) are increasingly popular with both the general public and industry because they can be customized, fine-tuned, and used freely. However, some open-source LLMs require approval before use, which has led third parties to publish more easily accessible versions, and in some cases fine-tuned or quantized versions that lower the compute requirements. These convenient versions are attractive to users but increase the risk of training-time attacks, which threaten the integrity and security of LLMs. This paper presents SOS, a new training-time attack designed to have low computational demand; it requires neither clean data nor modification of the model weights, and so preserves the utility of the model. SOS addresses security issues in several scenarios, including backdoor attacks, jailbreak attacks, and prompt stealing attacks. Experimental results show that the attack is effective across all evaluated targets. We also present the other side of the SOS technique, the copyright token: a novel technique that lets users mark their copyrighted content and prevent models from using it.|\n", "2407.03157": 
"|**2024-07-03**|**Let the Code LLM Edit Itself When You Edit the Code**|Zhenyu He et.al.|[2407.03157](http://arxiv.org/abs/2407.03157)|null|This work investigates a common scenario in code generation: a developer edits existing code in real time and asks a large language model (LLM) to re-predict the next token or line on the fly. The naive approach is to have the LLM re-encode the entire KV cache to give an accurate prediction, but this is computationally expensive, especially for long sequences. Encoding only the edited subsequence and integrating it into the original KV cache runs into the temporal confusion problem and degrades performance significantly. To address this, we propose Positional Integrity Encoding (PIE). Building on rotary positional encoding, PIE first removes the rotary matrices that introduce temporal confusion and then reapplies the correct rotary matrices. This keeps the positional relationships between tokens correct and requires only a single round of matrix multiplication. We run extensive experiments on the RepoBench-C-8k dataset with DeepSeek-Coder models of 1.3B, 6.7B, and 33B parameters, covering three real-world coding tasks: code insertion, code deletion, and multi-place code editing. The results show that, compared with the standard full-recomputation approach, PIE reduces computational overhead by more than 85% across all model sizes and tasks while closely approximating its performance.|\n", 
"2407.04694": "|**2024-07-05**|**Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs**|Rudolf Laine et.al.|[2407.04694](http://arxiv.org/abs/2407.04694)|**[link](https://github.com/lrudl/sad)**|
AI assistants such as ChatGPT are trained to respond to users by saying \u201cI am a large language model\u201d. This raises questions: do such models actually know that they are LLMs, and can they act reliably on that knowledge? Are they aware of their current circumstances, such as being deployed to the public? We call this the situational awareness of a model. To quantify situational awareness in large language models (LLMs), we introduce a range of behavioral tests based on question answering and instruction following, which together form the Situational Awareness Dataset (SAD). The benchmark comprises 7 task categories and over 13,000 questions, testing abilities such as recognizing text the model itself generated, predicting its own behavior, determining whether a prompt comes from internal evaluation or real-world deployment, and following instructions that depend on self-knowledge. We evaluate 16 LLMs on SAD, including both base (pretrained) and chat models. All models perform better than chance, but even the highest-scoring model (Claude 3 Opus) is far from human level on certain tasks. We also find that performance on SAD is only partially predicted by general-knowledge metrics such as MMLU. Chat models, which are fine-tuned to serve as AI assistants, outperform their corresponding base models on SAD but not on general-knowledge tasks. The purpose of SAD is to advance scientific understanding of situational awareness in LLMs by breaking it down into quantifiable abilities. Situational awareness matters because it strengthens the capacity of a model for autonomous planning and action, which benefits automation but also introduces novel risks related to AI safety and control. The code and latest results are publicly available.|\n", "2407.04693": "|**2024-07-05**|**ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models**|Yuzhe Gu et.al.|[2407.04693](http://arxiv.org/abs/2407.04693)|**[link](https://github.com/open-compass/anah)**|
Large language models (LLMs) hallucinate in long-form question-answering tasks across many domains and applications. Current hallucination detection and mitigation datasets are limited in domain coverage and size, and they are hard to scale because of high labor costs and the insufficient reliability of existing hallucination annotators. To enable scalable oversight of LLM hallucinations, this paper introduces an iterative self-training framework. Based on the Expectation Maximization (EM) algorithm, each iteration of the framework first applies a hallucination annotation pipeline to annotate an enlarged dataset and then trains a more accurate hallucination annotator on that dataset. The new annotator is then used in the annotation pipeline for the next iteration. Extensive experiments show that the final hallucination annotator, with only 7B parameters, surpasses GPT-4 and achieves new state-of-the-art hallucination detection results on HaluEval and HalluQA with zero-shot inference. The annotator can not only evaluate the hallucination levels of different LLMs on a large-scale dataset but also help mitigate hallucination in generated text, with the Natural Language Inference (NLI) metric rising from 25% to 37%.|\n", "2407.04681": 
"|**2024-07-05**|**Rethinking Visual Prompting for Multimodal Large Language Models with External Knowledge**|Yuanze Lin et.al.|[2407.04681](http://arxiv.org/abs/2407.04681)|null|In recent years, multimodal large language models (MLLMs) trained on large, high-quality image-text datasets have made significant progress in understanding images at a general level. However, the inherent difficulty of conveying fine-grained or spatially dense information in text, such as masks, limits their ability to answer questions that require it and weakens their understanding of detailed visual elements. Drawing inspiration from Retrieval-Augmented Generation (RAG), this paper proposes a new visual prompt approach that integrates fine-grained external knowledge from specialized vision models (for example, instance segmentation and OCR models) into MLLMs. This is a promising but underexplored direction for improving MLLM performance. Our approach differs from concurrent works, which transform external knowledge into additional text prompts and so force the model to learn the correspondence between visual content and text coordinates indirectly. Instead, we propose embedding fine-grained knowledge directly into a spatial embedding map as a visual prompt. The design can be easily incorporated into various MLLMs, such as LLaVA and Mipha, and considerably improves their visual understanding performance. Through rigorous experiments on nine benchmarks, we show that our method improves the overall performance of MLLMs and strengthens their fine-grained, context-aware capabilities.|\n", 
"2407.04675": "|**2024-07-05**|**Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition**|Ye Bai et.al.|[2407.04675](http://arxiv.org/abs/2407.04675)|null|\u73b0\u4ee3\u81ea\u52a8\u8bed\u97f3\u8bc6\u522b\uff08ASR\uff09\u6a21\u578b\u9700\u8981\u51c6\u786e\u8f6c\u5f55\u6765\u81ea\u4e0d\u540c\u9886\u57df\u3001\u8bed\u8a00\u548c\u53e3\u97f3\u7684\u591a\u6837\u8bed\u97f3\u4fe1\u53f7\uff0c\u540c\u65f6\u8003\u8651\u5230\u7279\u5b9a\u4e0a\u4e0b\u6587\u4fe1\u606f\uff0c\u4ee5\u9002\u5e94\u5404\u79cd\u5e94\u7528\u573a\u666f\u7684\u9700\u6c42\u3002\u4f20\u7edf\u7684\u7aef\u5230\u7aef\u6a21\u578b\u7ed3\u5408\u989d\u5916\u7684\u8bed\u8a00\u6a21\u578b\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5728\u6570\u636e\u5339\u914d\u573a\u666f\u4e2d\u6548\u679c\u826f\u597d\uff0c\u4f46\u9010\u6e10\u9762\u4e34\u74f6\u9888\u3002\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u65b0\u578b\u8bed\u97f3\u8bc6\u522b\u6a21\u578b\u2014\u2014Seed-ASR\u3002\u5b83\u5efa\u7acb\u5728\u97f3\u9891\u6761\u4ef6\u5316LLM\uff08AcLLM\uff09\u67b6\u6784\u4e4b\u4e0a\uff0c\u901a\u8fc7\u5c06\u8fde\u7eed\u8bed\u97f3\u8868\u793a\u548c\u4e0a\u4e0b\u6587\u4fe1\u606f\u8f93\u5165\u5230LLM\u4e2d\uff0c\u5229\u7528\u4e86LLM\u7684\u5f3a\u5927\u529f\u80fd\u3002\u901a\u8fc7\u5206\u9636\u6bb5\u7684\u5927\u89c4\u6a21\u8bad\u7ec3\u4ee5\u53ca\u5728LLM\u4e2d\u6fc0\u53d1\u4e0a\u4e0b\u6587\u611f\u77e5\u80fd\u529b\uff0cSeed-ASR\u5728\u5305\u62ec\u591a\u4e2a\u9886\u57df\u
3001\u65b9\u8a00\u548c\u8bed\u8a00\u7684\u7efc\u5408\u8bc4\u4f30\u96c6\u4e0a\u663e\u8457\u4f18\u4e8e\u7aef\u5230\u7aef\u6a21\u578b\u3002\u6b64\u5916\uff0cSeed-ASR\u80fd\u591f\u90e8\u7f72\u5230\u5404\u79cd\u573a\u666f\u4e2d\u652f\u6301\u7279\u5b9a\u9700\u6c42\uff0c\u65e0\u9700\u989d\u5916\u7684\u8bed\u8a00\u6a21\u578b\u3002\u4e0e\u6700\u8fd1\u53d1\u5e03\u7684\u5927\u578bASR\u6a21\u578b\u76f8\u6bd4\uff0cSeed-ASR\u5728\u4e2d\u6587\u548c\u82f1\u6587\u516c\u5f00\u6d4b\u8bd5\u96c6\u4e0a\u7684\u8bcd\uff08\u6216\u5b57\u7b26\uff0c\u9488\u5bf9\u4e2d\u6587\uff09\u9519\u8bef\u7387\u964d\u4f4e\u4e8610%-40%\uff0c\u8fdb\u4e00\u6b65\u8bc1\u660e\u4e86\u5176\u5f3a\u5927\u7684\u6027\u80fd\u3002|\n", "2407.04656": "|**2024-07-05**|**Lazarus: Resilient and Elastic Training of Mixture-of-Experts Models with Adaptive Expert Placement**|Yongji Wu et.al.|[2407.04656](http://arxiv.org/abs/2407.04656)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u89c4\u6a21\u4e0d\u65ad\u6269\u5927\uff0c\u7a00\u758f\u6fc0\u6d3b\u7684\u6df7\u5408\u4e13\u5bb6\uff08MoE\uff09\u67b6\u6784\u56e0\u5176\u8ba1\u7b97\u6210\u672c\u7684\u4e9a\u7ebf\u6027\u6269\u5c55\u800c\u88ab\u8d8a\u6765\u8d8a\u591a\u5730\u91c7\u7528\u3002\u7136\u800c\uff0c\u9891\u7e41\u7684\u8bad\u7ec3\u5931\u8d25\u4ecd\u7136\u662f\u4e00\u4e2a\u91cd\u5927\u6311\u6218\uff0c\u56e0\u4e3a\u5355\u6b21\u5931\u8d25\u53ef\u80fd\u5bfc\u81f4\u6240\u6709GPU\u9677\u5165\u95f2\u7f6e\uff0c\u76f4\u81f3\u95ee\u9898\u89e3\u51b3\uff0c\u4ece\u800c\u53ef\u80fd\u4e22\u5931\u5927\u91cf\u8bad\u7ec3\u8fdb\u5ea6\uff0c\u9700\u8981\u4ece\u68c0\u67e5\u70b9\u91cd\u65b0\u5f00\u59cb\u3002\u73b0\u6709\u7684\u9ad8\u6548\u5bb9\u9519\u8bad\u7ec3\u89e3\u51b3\u65b9\u6848\u8981\u4e48\u7f3a\u4e4f\u5f39\u6027\uff0c\u8981\u4e48\u4f9d\u8d56\u4e8e\u5c06\u6062\u590d\u80fd\u529b\u6784\u5efa\u5230\u7ba1\u9053\u5e76\u884c\u6027\u4e2d\uff0c\u4f46\u8fd9\u4e0d\u9002\u7528\u4e8eMoE\u6a21\u578b\uff0c\u56e0\u4e3aMoE\u67b6\u6784\u91c7\u7528\u4e86\u4e13\u5bb6\u5e76\u884c\u7b5
6\u7565\u3002 \u6211\u4eec\u63d0\u51fa\u4e86Lazarus\uff0c\u4e00\u4e2a\u9488\u5bf9MoE\u6a21\u578b\u8fdb\u884c\u5bb9\u9519\u548c\u5f39\u6027\u7684\u8bad\u7ec3\u7cfb\u7edf\u3002Lazarus\u901a\u8fc7\u52a8\u6001\u5206\u914d\u4e13\u5bb6\u526f\u672c\u6765\u5e94\u5bf9\u4e13\u5bb6\u5de5\u4f5c\u8d1f\u8f7d\u7684\u56fa\u6709\u4e0d\u5e73\u8861\uff0c\u4ece\u800c\u52a0\u901f\u8bad\u7ec3\uff0c\u5e76\u5f00\u53d1\u4e86\u4e00\u79cd\u7406\u8bba\u4e0a\u6700\u4f18\u7684\u4e13\u5bb6\u653e\u7f6e\u7b97\u6cd5\uff0c\u4ee5\u6700\u5927\u9650\u5ea6\u5730\u63d0\u9ad8\u5728\u5931\u8d25\u540e\u7684\u6062\u590d\u6982\u7387\u3002\u901a\u8fc7\u81ea\u9002\u5e94\u7684\u4e13\u5bb6\u653e\u7f6e\u548c\u7075\u6d3b\u7684\u4ee4\u724c\u5206\u53d1\u5668\uff0cLazarus\u80fd\u591f\u5728\u6545\u969c\u540e\u5145\u5206\u5229\u7528\u6240\u6709\u53ef\u7528\u8282\u70b9\uff0c\u907f\u514dGPU\u7a7a\u95f2\u3002 \u6211\u4eec\u7684\u8bc4\u4f30\u8868\u660e\uff0c\u4e0e\u73b0\u6709MoE\u8bad\u7ec3\u7cfb\u7edf\u76f8\u6bd4\uff0cLazarus\u5728\u9891\u7e41\u7684\u8282\u70b9\u6545\u969c\u4e0b\u6027\u80fd\u63d0\u5347\u9ad8\u8fbe5.7\u500d\uff0c\u4e14\u5728\u771f\u5b9espot\u5b9e\u4f8b\u8ddf\u8e2a\u4e0a\u63d0\u5347\u4e863.4\u500d\u3002|\n", "2407.04629": "|**2024-07-05**|**Entity Decomposition with Filtering: A Zero-Shot Clinical Named Entity Recognition Framework**|Reza Averly et.al.|[2407.04629](http://arxiv.org/abs/2407.04629)|null|\u8be5\u8bba\u6587\u5173\u6ce8\u7684\u662f\u4e34\u5e8a\u547d\u540d\u5b9e\u4f53\u8bc6\u522b\uff08Clinical 
NER\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u4ece\u4e34\u5e8a\u75c5\u5386\u4e2d\u63d0\u53d6\u91cd\u8981\u5b9e\u4f53\u7684\u4efb\u52a1\u3002\u8fd1\u5e74\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u8fd9\u4e00\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\u3002\u7814\u7a76\u4e3b\u8981\u96c6\u4e2d\u5728\u4e13\u6709\u7684LLMs\uff0c\u4f46\u8bba\u6587\u63a2\u8ba8\u4e86\u5f00\u653e\u7684\u3001\u4e13\u95e8\u4e3a\u547d\u540d\u5b9e\u4f53\u8bc6\u522b\u8bad\u7ec3\u7684LLMs\u5728\u4e34\u5e8aNER\u4e2d\u7684\u6027\u80fd\u3002\u4f5c\u8005\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6846\u67b6\uff0c\u79f0\u4e3a\u201c\u5b9e\u4f53\u5206\u89e3\u4e0e\u8fc7\u6ee4\u201d\uff08Entity Decomposition with Filtering\uff0cEDF\uff09\uff0c\u76ee\u7684\u662f\u901a\u8fc7\u5c06\u5b9e\u4f53\u8bc6\u522b\u4efb\u52a1\u5206\u89e3\u4e3a\u5b50\u5b9e\u4f53\u7c7b\u578b\u7684\u68c0\u7d22\uff0c\u5e76\u5f15\u5165\u4e00\u4e2a\u8fc7\u6ee4\u673a\u5236\u6765\u6d88\u9664\u9519\u8bef\u5b9e\u4f53\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u8be5\u6846\u67b6\u5728\u6240\u6709\u5ea6\u91cf\u6807\u51c6\u3001\u6a21\u578b\u3001\u6570\u636e\u96c6\u548c\u5b9e\u4f53\u7c7b\u578b\u4e0a\u90fd\u8868\u73b0\u51fa\u6709\u6548\u6027\u3002\u5206\u6790\u663e\u793a\uff0c\u5b9e\u4f53\u5206\u89e3\u80fd\u591f\u663e\u8457\u63d0\u9ad8\u5bf9\u5148\u524d\u672a\u88ab\u6355\u6349\u5230\u7684\u5b9e\u4f53\u7684\u8bc6\u522b\u3002\u6b64\u5916\uff0c\u8bba\u6587\u8fd8\u63d0\u4f9b\u4e86\u5bf9\u6846\u67b6\u7684\u5168\u9762\u8bc4\u4f30\u548c\u6df1\u5165\u7684\u9519\u8bef\u5206\u6790\uff0c\u4ee5\u671f\u4e3a\u672a\u6765\u7684\u7814\u7a76\u63d0\u4f9b\u65b9\u5411\u3002|\n", "2407.04622": "|**2024-07-05**|**On scalable oversight with weak LLMs judging strong LLMs**|Zachary Kenton 
et.al.|[2407.04622](http://arxiv.org/abs/2407.04622)|null|\u8be5\u8bba\u6587\u63a2\u8ba8\u4e86\u53ef\u6269\u5c55\u7684\u76d1\u7763\u534f\u8bae\uff0c\u76ee\u6807\u662f\u8ba9\u4eba\u7c7b\u80fd\u591f\u6709\u6548\u76d1\u7763\u8d85\u8d8a\u4eba\u7c7b\u7ea7\u522b\u7684AI\u3002\u7814\u7a76\u4e3b\u8981\u805a\u7126\u5728\u8fa9\u8bba\u3001\u54a8\u8be2\u548c\u76f4\u63a5\u95ee\u7b54\u4e09\u79cd\u5f62\u5f0f\u4e0a\uff0c\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4f5c\u4e3aAI\u4ee3\u7406\u548c\u6cd5\u5b98\u89d2\u8272\uff0c\u5047\u8bbe\u6cd5\u5b98\u6a21\u578b\u8f83\u5f31\u3002\u5b9e\u9a8c\u6db5\u76d6\u4e86\u5e7f\u6cdb\u7684\u4efb\u52a1\u5f02\u8d28\u6027\uff0c\u6269\u5c55\u4e86\u5148\u524d\u4ec5\u5173\u6ce8\u4fe1\u606f\u4e0d\u5bf9\u79f0\u7684\u5355\u4e00\u63d0\u53d6\u5f0f\u95ee\u7b54\u4efb\u52a1\uff0c\u589e\u52a0\u4e86\u6570\u5b66\u3001\u7f16\u7a0b\u3001\u903b\u8f91\u548c\u591a\u6a21\u6001\u63a8\u7406\u7b49\u9886\u57df\u7684\u6311\u6218\u3002\u7ed3\u679c\u8868\u660e\uff0c\u5728\u6240\u6709\u4efb\u52a1\u4e2d\uff0c\u5f53\u54a8\u8be2\u5e08\u968f\u673a\u88ab\u5206\u914d\u6b63\u786e\u6216\u9519\u8bef\u7b54\u6848\u65f6\uff0c\u8fa9\u8bba\u4f18\u4e8e\u54a8\u8be2\u3002\u5728\u5b58\u5728\u4fe1\u606f\u4e0d\u5bf9\u79f0\u7684\u63d0\u53d6\u5f0f\u95ee\u7b54\u4efb\u52a1\u4e2d\uff0c\u8fa9\u8bba\u4f18\u4e8e\u76f4\u63a5\u95ee\u7b54\uff0c\u4f46\u5728\u5176\u4ed6\u6ca1\u6709\u4fe1\u606f\u4e0d\u5bf9\u79f0\u7684\u4efb\u52a1\u4e2d\uff0c\u7ed3\u679c\u5219\u4e0d\u4e00\u3002\u5f53AI\u88ab\u5141\u8bb8\u9009\u62e9\u8981\u8bba\u8bc1\u7684\u7b54\u6848\u800c\u975e\u9884\u5148\u6307\u5b9a\u65f6\uff0c\u53d1\u73b0\u6cd5\u5b98\u88ab\u9519\u8bef\u7b54\u6848\u8bf4\u670d\u7684\u60c5\u51b5\u5728\u8fa9\u8bba\u4e2d\u51cf\u5c11\u3002\u6b64\u5916\uff0c\u66f4\u5f3a\u7684\u8fa9\u8bba\u8005\u6a21\u578b\u80fd\u63d0\u9ad8\u6cd5\u5b98\u7684\u51c6\u786e\u6027\uff0c\u5c3d\u7ba1\u63d0\u5347\u7a0b\u5ea6\u7565\u4f4e\u4e8e\u4e4b\u524d\u7684\u7814\u7a76\u3002|\n", "2407.04581": "|**2024-07-05**|**Leveraging Large 
Language Models for Integrated Satellite-Aerial-Terrestrial Networks: Recent Advances and Future Directions**|Shumaila Javaid et.al.|[2407.04581](http://arxiv.org/abs/2407.04581)|null|\u672c\u6587\u63a2\u8ba8\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5982\u4f55\u878d\u5165\u96c6\u6210\u536b\u661f\u3001\u822a\u7a7a\u548c\u5730\u9762\u7f51\u7edc\uff08ISATN\uff09\u7684\u53d8\u9769\u6f5c\u529b\uff0c\u5229\u7528\u5148\u8fdb\u7684\u4eba\u5de5\u667a\u80fd\uff08AI\uff09\u548c\u673a\u5668\u5b66\u4e60\uff08ML\uff09\u6280\u672f\u4f18\u5316\u8fd9\u4e9b\u7f51\u7edc\u7684\u8fde\u901a\u6027\u3002\u9996\u5148\u6982\u8ff0\u4e86ISATN\u7684\u5f53\u524d\u67b6\u6784\uff0c\u5f3a\u8c03\u4e86LLMs\u5728\u63d0\u5347\u6570\u636e\u6d41\u3001\u4fe1\u53f7\u5904\u7406\u548c\u7f51\u7edc\u7ba1\u7406\u65b9\u9762\u7684\u4f5c\u7528\uff0c\u4ee5\u63a8\u52a85G/6G\u901a\u4fe1\u6280\u672f\u7684\u53d1\u5c55\uff0c\u901a\u8fc7\u9ad8\u7ea7\u9884\u6d4b\u7b97\u6cd5\u548c\u5b9e\u65f6\u51b3\u7b56\u6765\u589e\u5f3a\u6027\u80fd\u3002\u63a5\u7740\uff0c\u6df1\u5165\u5206\u6790\u4e86ISATN\u7ec4\u4ef6\uff0c\u63a2\u8ba8\u4e86\u5982\u4f55\u6709\u6548\u5730\u5229\u7528LLMs\u89e3\u51b3\u4f20\u7edf\u6570\u636e\u4f20\u8f93\u548c\u5904\u7406\u4e2d\u7684\u74f6\u9888\u95ee\u9898\u3002 
\u6587\u7ae0\u7740\u91cd\u4e8eISATN\u7684\u7f51\u7edc\u7ba1\u7406\u6311\u6218\uff0c\u5305\u62ec\u8d44\u6e90\u5206\u914d\u7b56\u7565\u3001\u6d41\u91cf\u8def\u7531\u4ee5\u53ca\u5728\u4e0d\u65ad\u53d8\u5316\u6761\u4ef6\u4e0b\u786e\u4fdd\u65e0\u7f1d\u8fde\u63a5\u548c\u6700\u4f18\u6027\u80fd\u7684\u7f51\u7edc\u5b89\u5168\u3002\u540c\u65f6\uff0c\u6211\u4eec\u8ba8\u8bba\u4e86\u5c06LLMs\u6574\u5408\u5230ISATN\u4e2d\u6240\u9762\u4e34\u7684\u6280\u672f\u6311\u6218\uff0c\u5982\u6570\u636e\u96c6\u6210\u3001\u6269\u5c55\u6027\u95ee\u9898\u3001\u51b3\u7b56\u8fc7\u7a0b\u4e2d\u7684\u5ef6\u8fdf\uff0c\u4ee5\u53ca\u6784\u5efa\u5065\u58ee\u4e14\u5bb9\u9519\u7684\u7cfb\u7edf\u8bbe\u8ba1\u3002\u6700\u540e\uff0c\u7814\u7a76\u6307\u51fa\u4e86\u672a\u6765\u7814\u7a76\u7684\u5173\u952e\u65b9\u5411\uff0c\u5373\u5982\u4f55\u5145\u5206\u5229\u7528LLM\u7684\u4f18\u52bf\uff0c\u4ee5\u63d0\u5347\u7f51\u7edc\u53ef\u9760\u6027\u3001\u4f18\u5316\u6027\u80fd\uff0c\u5b9e\u73b0\u4e00\u4e2a\u771f\u6b63\u5168\u7403\u4e92\u8054\u4e14\u667a\u80fd\u7684\u7f51\u7edc\u4f53\u7cfb\u3002|\n", "2407.04573": "|**2024-07-05**|**VRSD: Rethinking Similarity and Diversity for Retrieval in Large Language Models**|Hang Gao et.al.|[2407.04573](http://arxiv.org/abs/2407.04573)|null|\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5feb\u901f\u53d1\u5c55\u7684\u80cc\u666f\u4e0b\uff0c\u5411\u91cf\u68c0\u7d22\u7b97\u6cd5\u5bf9\u4e8e\u6ee1\u8db3\u76f8\u4f3c\u5ea6\u548c\u591a\u6837\u6027\u8981\u6c42\u7684\u8bed\u4e49\u67e5\u8be2\u81f3\u5173\u91cd\u8981\u3002\u5c3d\u7ba1Maximal Marginal 
Relevance\uff08MMR\uff09\u5728\u6d89\u53ca\u8fd9\u4e24\u4e2a\u9700\u6c42\u7684\u68c0\u7d22\u573a\u666f\u4e2d\u88ab\u5e7f\u6cdb\u5e94\u7528\uff0c\u4f46\u5176\u53c2\u6570\u03bb\u7684\u53d8\u5316\u4f1a\u5bfc\u81f4\u7ed3\u679c\u6ce2\u52a8\uff0c\u4f7f\u5f97\u5411\u91cf\u7a7a\u95f4\u4e2d\u7684\u4f18\u5316\u8def\u5f84\u53d8\u5f97\u6a21\u7cca\u3002\u6b64\u5916\uff0c\u5f53\u524d\u7f3a\u4e4f\u5bf9\u76f8\u4f3c\u6027\u548c\u591a\u6837\u6027\u5728\u68c0\u7d22\u8fc7\u7a0b\u4e2d\u7ea6\u675f\u7684\u575a\u5b9e\u7406\u8bba\u5206\u6790\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u65b9\u6cd5\uff0c\u901a\u8fc7\u67e5\u8be2\u5411\u91cf\u4e0e\u6c42\u548c\u5411\u91cf\u4e4b\u95f4\u7684\u5173\u7cfb\u6765\u523b\u753b\u8fd9\u4e24\u79cd\u7ea6\u675f\u3002\u8fd9\u79cd\u5173\u7cfb\u786e\u4fdd\u4e86\u76f8\u4f3c\u6027\uff0c\u540c\u65f6\u8981\u6c42\u6c42\u548c\u5411\u91cf\u4e2d\u7684\u5404\u4e2a\u5411\u91cf\u4ee5\u5206\u6563\u7684\u65b9\u5f0f\u4e0e\u67e5\u8be2\u5411\u91cf\u5bf9\u9f50\uff0c\u4ee5\u6ee1\u8db3\u591a\u6837\u6027\u9700\u6c42\u3002 \u6211\u4eec\u8fd8\u63d0\u51fa\u4e86\u4e00\u4e2a\u65b0\u7684\u7ec4\u5408\u4f18\u5316\u95ee\u9898\uff1a\u4ece\u4e00\u7ec4\u5019\u9009\u5411\u91cf\u4e2d\u9009\u62e9$k$\u4e2a\uff0c\u4f7f\u5f97\u5b83\u4eec\u7684\u6c42\u548c\u5411\u91cf\u6700\u5927\u7a0b\u5ea6\u5730\u4e0e\u67e5\u8be2\u5411\u91cf\u5339\u914d\u3002\u6211\u4eec\u8bc1\u660e\u4e86\u8fd9\u4e2a\u95ee\u9898\u662fNP\u5b8c\u5168\u7684\uff0c\u63ed\u793a\u4e86\u5728\u5411\u91cf\u68c0\u7d22\u4e2d\u540c\u65f6\u8ffd\u6c42\u76f8\u4f3c\u6027\u548c\u591a\u6837\u6027\u7684\u6df1\u523b\u56f0\u96be\uff0c\u5e76\u4e3a\u540e\u7eed\u7814\u7a76\u5960\u5b9a\u4e86\u7406\u8bba\u57fa\u7840\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u540d\u4e3aVectors Retrieval with Similarity and 
Diversity\uff08VRSD\uff09\u7684\u542f\u53d1\u5f0f\u7b97\u6cd5\uff0c\u5b83\u4e0d\u4ec5\u5177\u6709\u660e\u786e\u7684\u4f18\u5316\u76ee\u6807\uff0c\u65e0\u9700\u9884\u8bbe\u53c2\u6570\uff0c\u800c\u4e14\u5728\u65f6\u95f4\u590d\u6742\u6027\u4e0a\u76f8\u5bf9\u4e8eMMR\u6709\u6240\u964d\u4f4e\u3002\u5b9e\u8bc1\u9a8c\u8bc1\u8868\u660e\uff0cVRSD\u5728\u5404\u79cd\u6570\u636e\u96c6\u4e0a\u663e\u8457\u4f18\u4e8eMMR\u3002|\n", "2407.04541": "|**2024-07-05**|**PoPreRo: A New Dataset for Popularity Prediction of Romanian Reddit Posts**|Ana-Cristina Rogoz et.al.|[2407.04541](http://arxiv.org/abs/2407.04541)|**[link](https://github.com/ana-rogoz/poprero)**|**\u6211\u4eec\u63a8\u51fa\u4e86PoPreRo\uff0c\u8fd9\u662f\u9996\u4e2a\u4e13\u4e3a\u7f57\u9a6c\u5c3c\u4e9aReddit\u5e16\u5b50\u7684\u6d41\u884c\u5ea6\u9884\u6d4b\u6536\u96c6\u7684dataset\u3002PoPreRo\u6c47\u96c6\u4e86\u4e94\u4e2a\u4e0d\u540c\u7f57\u9a6c\u5c3c\u4e9a\u5b50\u8bba\u575b\u7684\u591a\u6837\u5316\u5e16\u5b50\u6837\u672c\uff0c\u603b\u8ba1\u5305\u542b28,107\u6761\u6570\u636e\u3002\u968f\u6570\u636e\u96c6\u4e00\u540c\u53d1\u5e03\u7684\uff0c\u6211\u4eec\u8fd8\u63d0\u4f9b\u4e86\u4e00\u7cfb\u5217\u7ade\u4e89\u6027\u6a21\u578b\u4f5c\u4e3a\u672a\u6765\u7814\u7a76\u7684\u57fa\u7840\u3002\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u6d4b\u8bd5\u96c6\u4e0a\u5f97\u5206\u6700\u9ad8\u7684\u6a21\u578b\u8fbe\u5230\u4e8661.35%\u7684\u51c6\u786e\u7387\u548c60.60%\u7684\u5b8fF1\u5206\u6570\uff0c\u8fd9\u8868\u660e\u5728PoPreRo\u4e0a\u7684\u6d41\u884c\u5ea6\u9884\u6d4b\u4efb\u52a1\u6781\u5177\u6311\u6218\u6027\u3002\u901a\u8fc7\u5c11\u91cf\u63d0\u793a\u5bf9Falcon-7B\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u8fdb\u4e00\u6b65\u63a2\u7a76\u4e5f\u6307\u5411\u4e86\u540c\u6837\u7684\u7ed3\u8bba\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u76f8\u4fe1PoPreRo\u662f\u4e00\u4e2a\u6709\u4ef7\u503c\u7684\u8d44\u6e90\uff0c\u53ef\u4ee5\u7528\u6765\u8bc4\u4f30\u7f57\u9a6c\u5c3c\u4e9a\u793e\u4ea4\u5a92\u4f53\u5e16\u5b50\u7684\u6d41\u884c\u5ea6\u9884\u6d4b\u6a21\u578b\
u3002\u6211\u4eec\u7684\u6570\u636e\u96c6\u5df2\u516c\u5f00\u53d1\u5e03\u5728https://github.com/ana-rogoz/PoPreRo\u3002**|\n", "2407.06189": "|**2024-07-08**|**Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision**|Orr Zohar et.al.|[2407.06189](http://arxiv.org/abs/2407.06189)|**[link](https://github.com/orrzohar/Video-STaR)**|**\u5927\u578b\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08LVLM\uff09\u7684\u6027\u80fd\u4e0e\u5176\u8bad\u7ec3\u6570\u636e\u7684\u89c4\u6a21\u548c\u8d28\u91cf\u5bc6\u5207\u76f8\u5173\u3002\u5f53\u524d\u7684\u89c6\u9891\u6307\u4ee4\u8c03\u4f18\u6570\u636e\u96c6\u7f3a\u4e4f\u591a\u6837\u6027\uff0c\u56e0\u4e3a\u5b83\u4eec\u4e3b\u8981\u7531\u63d0\u793a\u5927\u578b\u8bed\u8a00\u6a21\u578b\u751f\u6210\u89c6\u9891\u5b57\u5e55\u4ee5\u5f62\u6210\u95ee\u9898-\u7b54\u6848\u5bf9\uff0c\u5185\u5bb9\u591a\u4e3a\u63cf\u8ff0\u6027\u3002\u7136\u800c\uff0c\u8bb8\u591a\u5e26\u6709\u4e30\u5bcc\u6807\u7b7e\u548c\u76d1\u7763\u7684\u89c6\u9891\u6570\u636e\u96c6\u5df2\u7ecf\u5b58\u5728\uff0c\u4f46\u5982\u4f55\u5c06\u5b83\u4eec\u878d\u5165LVLM\u5e76\u975e\u6613\u4e8b\u3002 \u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u89c6\u9891\u81ea\u6211\u8bad\u7ec3\u4e0e\u589e\u5f3a\u63a8\u7406\uff08Video Self-Training with augmented Reasoning\uff0c\u7b80\u79f0Video-STaR\uff09\uff0c\u8fd9\u662f\u9996\u4e2a\u89c6\u9891\u81ea\u6211\u8bad\u7ec3\u65b9\u6cd5\u3002Video-STaR\u4f7f\u5f97\u4efb\u4f55\u6807\u6ce8\u7684\u89c6\u9891\u6570\u636e\u96c6\u90fd\u80fd\u7528\u4e8e\u89c6\u9891\u6307\u4ee4\u8c03\u4f18\u3002\u5728\u8fd9\u4e2a\u8fc7\u7a0b\u4e2d\uff0cLVLM\u5728\u751f\u6210\u6307\u4ee4\u548c\u5fae\u8c03\u4e4b\u95f4\u5faa\u73af\u3002\u6211\u4eec\u53d1\u73b0\uff0c\u8fd9\u4e0d\u4ec5\u80fd\u63d0\u5347\u89c6\u9891\u6574\u4f53\u7406\u89e3\u80fd\u529b\uff08I\uff09\uff0c\u8fd8\u80fd\u8ba9LVLM\u9002\u5e94\u65b0\u7684\u4e0b\u6e38\u4efb\u52a1\uff0c\u5229\u7528\u73b0\u6709\u76d1\u7763\u8fdb\u884c\u5b66\u4e60\u3002 
\u5177\u4f53\u6765\u8bf4\uff0cLVLM\u88ab\u63d0\u793a\u63d0\u51fa\u4e00\u4e2a\u7b54\u6848\uff0c\u7136\u540e\u4ec5\u4fdd\u7559\u90a3\u4e9b\u5305\u542b\u539f\u59cb\u89c6\u9891\u6807\u7b7e\u7684\u7b54\u6848\u3002LVLM\u968f\u540e\u5728\u751f\u6210\u7684\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u518d\u8bad\u7ec3\u3002\u901a\u8fc7\u53ea\u5728\u5305\u542b\u6b63\u786e\u89c6\u9891\u6807\u7b7e\u7684\u751f\u6210\u7b54\u6848\u4e0a\u8bad\u7ec3\uff0cVideo-STaR\u5229\u7528\u73b0\u6709\u7684\u89c6\u9891\u6807\u7b7e\u4f5c\u4e3a\u5f31\u76d1\u7763\u6765\u6307\u5bfc\u89c6\u9891\u6307\u4ee4\u8c03\u4f18\u3002 \u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u7ecf\u8fc7Video-STaR\u589e\u5f3a\u7684LVLM\u5728\uff08I\uff09\u4e00\u822c\u89c6\u9891\u95ee\u7b54\u4efb\u52a1\u4e2d\u7684\u8868\u73b0\u63d0\u5347\u4e8610%\uff0c\u5728\uff08II\uff09\u4e0b\u6e38\u4efb\u52a1\u4e2d\uff0cVideo-STaR\u63d0\u9ad8\u4e86Kinetics700-QA\u7684\u51c6\u786e\u602720%\uff0c\u4ee5\u53caFineDiving\u52a8\u4f5c\u8d28\u91cf\u8bc4\u4f30\u7684\u6027\u80fd15%\u3002\u603b\u7684\u6765\u8bf4\uff0cVideo-STaR\u4e3aLVLM\u7684\u6027\u80fd\u63d0\u5347\u63d0\u4f9b\u4e86\u4e00\u79cd\u6709\u6548\u4e14\u5b9e\u7528\u7684\u65b9\u6cd5\u3002**|\n", "2407.06188": "|**2024-07-08**|**CrowdMoGen: Zero-Shot Text-Driven Collective Motion Generation**|Xinying Guo 
et.al.|[2407.06188](http://arxiv.org/abs/2407.06188)|null|\u5728\u5a31\u4e50\u884c\u4e1a\uff08\u5982\u52a8\u753b\u548c\u6e38\u620f\uff09\u4ee5\u53ca\u6218\u7565\u9886\u57df\uff08\u5982\u57ce\u5e02\u6a21\u62df\u548c\u89c4\u5212\uff09\u4e2d\uff0c\u4eba\u7fa4\u8fd0\u52a8\u751f\u6210\u81f3\u5173\u91cd\u8981\u3002\u7136\u800c\uff0c\u8fd9\u4e00\u4efb\u52a1\u9700\u8981\u7cbe\u7ec6\u5730\u878d\u5408\u63a7\u5236\u4e0e\u751f\u6210\uff0c\u4ee5\u5728\u7279\u5b9a\u7684\u7a7a\u95f4\u548c\u8bed\u4e49\u7ea6\u675f\u4e0b\u5b9e\u73b0\u903c\u771f\u7684\u7fa4\u4f53\u52a8\u6001\u5408\u6210\uff0c\u5176\u6311\u6218\u5c1a\u672a\u5f97\u5230\u5145\u5206\u63a2\u7d22\u3002\u5f53\u524d\u7684\u4eba\u4f53\u52a8\u4f5c\u751f\u6210\u6a21\u578b\u5f80\u5f80\u5173\u6ce8\u4e2a\u4f53\u884c\u4e3a\uff0c\u5ffd\u89c6\u4e86\u96c6\u4f53\u884c\u4e3a\u7684\u590d\u6742\u6027\uff1b\u800c\u591a\u4e2a\u4eba\u4f53\u52a8\u4f5c\u751f\u6210\u7684\u6700\u65b0\u65b9\u6cd5\u4e25\u91cd\u4f9d\u8d56\u9884\u8bbe\u573a\u666f\uff0c\u4e14\u9650\u4e8e\u56fa\u5b9a\u3001\u5c11\u91cf\u7684\u4eba\u9645\u4e92\u52a8\uff0c\u9650\u5236\u4e86\u5176\u5b9e\u7528\u6027\u3002 
\u4e3a\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51faCrowdMoGen\uff0c\u4e00\u4e2a\u96f6\u6837\u672c\u6587\u672c\u9a71\u52a8\u7684\u6846\u67b6\uff0c\u5b83\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u529b\u91cf\uff0c\u5c06\u96c6\u4f53\u667a\u6167\u878d\u5165\u8fd0\u52a8\u751f\u6210\u6846\u67b6\uff0c\u4ece\u800c\u80fd\u591f\u5728\u6ca1\u6709\u914d\u5bf9\u8bad\u7ec3\u6570\u636e\u7684\u60c5\u51b5\u4e0b\u5b9e\u73b0\u901a\u7528\u7684\u89c4\u5212\u548c\u7fa4\u4f53\u8fd0\u52a8\u751f\u6210\u3002\u6211\u4eec\u7684\u6846\u67b6\u4e3b\u8981\u7531\u4e24\u4e2a\u5173\u952e\u7ec4\u4ef6\u6784\u6210\uff1a1\uff09\u4eba\u7fa4\u573a\u666f\u89c4\u5212\u5668\uff0c\u5b66\u4e60\u6839\u636e\u7279\u5b9a\u573a\u666f\u4e0a\u4e0b\u6587\u6216\u5f15\u5165\u7684\u6270\u52a8\u534f\u8c03\u8fd0\u52a8\u548c\u52a8\u6001\uff1b2\uff09\u96c6\u4f53\u8fd0\u52a8\u751f\u6210\u5668\uff0c\u6839\u636e\u6574\u4f53\u8ba1\u5212\u9ad8\u6548\u5408\u6210\u6240\u9700\u7684\u96c6\u4f53\u8fd0\u52a8\u3002\u5927\u91cf\u7684\u5b9a\u91cf\u548c\u5b9a\u6027\u5b9e\u9a8c\u9a8c\u8bc1\u4e86\u6211\u4eec\u6846\u67b6\u7684\u6709\u6548\u6027\uff0c\u5b83\u4e0d\u4ec5\u586b\u8865\u4e86\u5927\u89c4\u6a21\u548c\u901a\u7528\u4eba\u7fa4\u8fd0\u52a8\u751f\u6210\u4efb\u52a1\u7684\u91cd\u8981\u7a7a\u767d\uff0c\u800c\u4e14\u5728\u771f\u5b9e\u611f\u548c\u7075\u6d3b\u6027\u65b9\u9762\u8868\u73b0\u51fa\u9ad8\u6c34\u51c6\u3002|\n", "2407.06172": "|**2024-07-08**|**On Speeding Up Language Model Evaluation**|Jin Peng Zhou 
et.al.|[2407.06172](http://arxiv.org/abs/2407.06172)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u9886\u57df\u5360\u636e\u4e3b\u5bfc\u5730\u4f4d\uff0c\u5b83\u4eec\u5728\u5404\u79cd\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u6700\u5148\u8fdb\u7684\u80fd\u529b\u3002\u4ece\u8bad\u7ec3\u5230\u63a8\u7406\uff0c\u6784\u5efa\u8fd9\u6837\u7684\u6a21\u578b\u6d89\u53ca\u4f17\u591a\u51b3\u7b56\uff0c\u5f62\u6210\u4e00\u4e2a\u590d\u6742\u7684\u641c\u7d22\u95ee\u9898\u3002\u4f8b\u5982\uff0c\u4e3a\u4e86\u4e3a\u7279\u5b9a\u4efb\u52a1\u627e\u5230\u6700\u4f73\u7684\u9884\u8bad\u7ec3LLM\u3001\u63d0\u793a\u6216\u8d85\u53c2\u6570\uff0c\u901a\u5e38\u9700\u8981\u5bf9\u6574\u4e2a\u6d4b\u8bd5\u96c6\u4e2d\u7684\u591a\u4e2a\u5019\u9009\u65b9\u6848\u8fdb\u884c\u5168\u9762\u8bc4\u4f30\u3002\u8fd9\u79cd\u8be6\u5c3d\u7684\u8bc4\u4f30\u8017\u65f6\u4e14\u6602\u8d35\uff0c\u56e0\u4e3aLLMs\u7684\u63a8\u7406\u548c\u5ea6\u91cf\u8ba1\u7b97\u9700\u6c42\u9ad8\u3002 \u672c\u6587\u9488\u5bf9\u5728\u6709\u9650\u9884\u7b97\u5185\u6709\u6548\u8bc4\u4f30\u65b9\u6cd5\u5728\u6d4b\u8bd5\u6837\u672c\u4e0a\u7684\u6027\u80fd\u8fd9\u4e00\u6311\u6218\u3002\u6211\u4eec\u5229\u7528\u4e86\u5e7f\u6cdb\u7814\u7a76\u7684\u591a\u81c2\u8001\u864e\u673a\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u901a\u8fc7\u987a\u5e8f\u9009\u62e9\u4e0b\u4e00\u4e2a\u8981\u8bc4\u4f30\u7684\u65b9\u6cd5-\u793a\u4f8b\u5bf9\uff0c\u5c06\u6211\u4eec\u7684\u65b9\u6cd5\u2014\u2014\u7ed3\u5408\u591a\u81c2\u8001\u864e\u673a\u7b97\u6cd5\u4e0e\u4f4e\u79e9\u5206\u89e3\u2014\u2014\u663e\u8457\u51cf\u5c11\u4e86\u6240\u9700\u7684\u8d44\u6e90\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u6211\u4eec\u7684\u7b97\u6cd5\u4ec5\u4f7f\u7528\u901a\u5e38\u9700\u6c42\u76845%-15%\u8d44\u6e90\uff0c\u5c31\u80fd\u8bc6\u522b\u51fa\u8868\u73b0\u6700\u597d\u7684\u65b9\u6cd5\uff0c\u4ece\u800c\u5b9e\u73b0\u4e86\u9ad8\u8fbe85%-95%\u7684\u6210\u672c\u8282\u7701\u3002|\n", "2407.06153": "|**2024-07-08**|**What's Wrong with Your Code 
Generated by Large Language Models? An Extensive Study**|Shihan Dou et.al.|[2407.06153](http://arxiv.org/abs/2407.06153)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u4ee3\u7801\u751f\u6210\u9886\u57df\u7684\u5feb\u901f\u53d1\u5c55\uff0c\u7814\u7a76\u4eba\u5458\u5bf9\u6b64\u7684\u5173\u6ce8\u5ea6\u65e5\u76ca\u63d0\u9ad8\u3002\u76ee\u524d\u7684\u7814\u7a76\u4e3b\u8981\u96c6\u4e2d\u5728\u6784\u5efa\u9ad8\u8d28\u91cf\u6570\u636e\u96c6\u548c\u91c7\u7528\u591a\u6837\u5316\u7684\u8bad\u7ec3\u6280\u672f\u6765\u63d0\u5347LLM\u7684\u4ee3\u7801\u751f\u6210\u80fd\u529b\u3002\u7136\u800c\uff0c\u5bf9\u4e8e\u8fd9\u4e9b\u73b0\u6709\u65b9\u6cd5\u7684\u5c40\u9650\u6027\u548c\u8fb9\u754c\uff0c\u7f3a\u4e4f\u5168\u9762\u7684\u7814\u7a76\u63a2\u8ba8\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u8fdb\u884c\u4e86\u4e00\u9879\u8be6\u5c3d\u7684\u5b9e\u8bc1\u7814\u7a76\uff0c\u8bc4\u4f30\u4e86\u4e09\u4e2a\u9886\u5148\u95ed\u6e90LLM\u548c\u56db\u4e2a\u5f00\u6e90LLM\u5728\u4e09\u4e2a\u5e38\u7528\u57fa\u51c6\u4e0a\u7684\u6027\u80fd\u3002\u7814\u7a76\u8003\u5bdf\u4e86\u751f\u6210\u4ee3\u7801\u7684\u957f\u5ea6\u3001\u5faa\u73af\u590d\u6742\u5ea6\u548cAPI\u6570\u91cf\uff0c\u7ed3\u679c\u663e\u793a\u8fd9\u4e9b\u6a21\u578b\u5728\u5904\u7406\u66f4\u590d\u6742\u7684\u7f16\u7a0b\u95ee\u9898\u65f6\u9762\u4e34\u6311\u6218\uff0c\u751f\u6210\u7684\u4ee3\u7801\u5f80\u5f80\u8f83\u77ed\u4f46\u7ed3\u6784\u66f4\u590d\u6742\uff0c\u4e0e\u6807\u51c6\u89e3\u51b3\u65b9\u6848\u76f8\u6bd4\u3002 
\u6211\u4eec\u8fd8\u521b\u5efa\u4e86\u4e00\u4e2a\u9519\u8bef\u4ee3\u7801\u7684\u5206\u7c7b\u4f53\u7cfb\uff0c\u5206\u4e3a\u4e09\u4e2a\u7c7b\u522b\u548c12\u4e2a\u5b50\u7c7b\u522b\uff0c\u5206\u6790\u5e38\u89c1\u9519\u8bef\u7c7b\u578b\u7684\u6839\u6e90\u3002\u4e3a\u4e86\u68c0\u9a8cLLMs\u5728\u5b9e\u9645\u9879\u76ee\u4e2d\u7684\u8868\u73b0\uff0c\u6211\u4eec\u4eb2\u624b\u6784\u5efa\u4e86\u4e00\u4e2a\u5305\u542b140\u4e2a\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u7684\u73b0\u5b9e\u4e16\u754c\u57fa\u51c6\u3002\u5bf9\u6bd4\u5206\u6790\u663e\u793a\uff0c\u5b9e\u9645\u573a\u666f\u4e2d\u7684bug\u5206\u5e03\u4e0e\u73b0\u6709\u57fa\u51c6\u5b58\u5728\u663e\u8457\u5dee\u5f02\u3002\u6700\u540e\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65e0\u9700\u989d\u5916\u8bad\u7ec3\u7684\u8fed\u4ee3\u65b9\u6cd5\uff0c\u5f15\u5165\u81ea\u6211\u6279\u5224\u673a\u5236\uff0c\u4f7fLLMs\u80fd\u591f\u6839\u636ebug\u7c7b\u578b\u548c\u7f16\u8bd1\u5668\u53cd\u9988\u4fee\u6b63\u5176\u751f\u6210\u7684\u4ee3\u7801\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u7ecf\u8fc7\u4e24\u6b21\u8fed\u4ee3\u540e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u80fd\u663e\u8457\u51cf\u5c11\u9519\u8bef\uff0c\u4f7f\u901a\u8fc7\u7387\u63d0\u9ad829.2%\uff0c\u8fd9\u8868\u660eLLMs\u5728\u5904\u7406\u590d\u6742\u95ee\u9898\u65b9\u9762\u5177\u6709\u5de8\u5927\u6f5c\u529b\u3002|\n", "2407.06146": "|**2024-07-09**|**Using Grammar Masking to Ensure Syntactic Validity in LLM-based Modeling Tasks**|Lukas Netz 
et.al.|[2407.06146](http://arxiv.org/abs/2407.06146)|null|\u6211\u4eec\u4ecb\u7ecd\u5e76\u8bc4\u4f30\u4e86\u4e00\u79cd\u540d\u4e3a\u201c\u8bed\u6cd5\u906e\u76d6\u201d\u7684\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u7528\u4e8e\u5f15\u5bfc\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u7ed9\u5b9a\u4e0a\u4e0b\u6587\u65e0\u5173\u6587\u6cd5\u7684\u7ea6\u675f\u4e0b\u751f\u6210\u8bed\u6cd5\u6b63\u786e\u7684\u6a21\u578b\u3002\u5c3d\u7ba1\u5c11\u91cf\u793a\u4f8b\u5b66\u4e60\u6216\u63d0\u793a\u5f15\u5bfc\u7b49prompt\u5de5\u7a0b\u65b9\u6cd5\u53ef\u4ee5\u63d0\u9ad8LLMs\u751f\u6210\u6b63\u786e\u8bed\u6cd5\u7684\u6982\u7387\uff0c\u4f46\u5904\u7406\u590d\u6742\u6587\u6cd5\u65f6\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u5f80\u5f80\u8017\u65f6\u4e14\u6548\u679c\u4e0d\u7406\u60f3\u3002\u5f53\u524d\u7684\u7814\u7a76\u4e3b\u8981\u96c6\u4e2d\u5728\u8bed\u8a00\u6a21\u578b\u8bad\u7ec3\u6216prompt\u5de5\u7a0b\u4e0a\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u65b9\u6cd5\uff0c\u901a\u8fc7\u7ea6\u675f\u89e3\u7801\u9650\u5236\u8f93\u51fa\uff0c\u786e\u4fdd\u751f\u6210\u7684\u5185\u5bb9\u7b26\u5408\u6709\u6548\u8bed\u6cd5\u3002\u6211\u4eec\u5229\u7528MontiCore\u6784\u5efa\u7684\u591a\u79cd\u9886\u57df\u7279\u5b9a\u8bed\u8a00\uff08DSL\uff09\u548c\u591a\u6b3eLLMs\u8fdb\u884c\u5b9e\u9a8c\uff0c\u6bd4\u8f83\u4e86\u4f7f\u7528\u548c\u672a\u4f7f\u7528\u7ea6\u675f\u89e3\u7801\u7684\u6548\u679c\u3002\u540c\u65f6\uff0c\u6211\u4eec\u91c7\u7528\u76f8\u5e94\u7684\u89e3\u6790\u5668\u9a8c\u8bc1\u6bcf\u79cd\u6a21\u578b\u7684\u53e5\u6cd5\u51c6\u786e\u6027\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u8bed\u6cd5\u906e\u76d6\u663e\u8457\u63d0\u5347\u4e86\u591a\u4e2aLLMs\u7684\u5efa\u6a21\u80fd\u529b\uff0c\u51cf\u5c11\u4e86\u5bf9\u7cbe\u5fc3\u8bbe\u8ba1\u63d0\u793a\u7684\u9700\u6c42\uff0c\u63d0\u9ad8\u4e86\u751f\u6210\u6b63\u786e\u6a21\u578b\u7684\u53ef\u80fd\u6027\u3002|\n", "2407.06135": "|**2024-07-08**|**ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text 
Generation**|Ethan Chern et.al.|[2407.06135](http://arxiv.org/abs/2407.06135)|**[link](https://github.com/gair-nlp/anole)**|**## \u80cc\u666f \u5148\u524d\u7684\u5f00\u6e90\u5927\u578b\u591a\u6a21\u6001\u6a21\u578b\uff08LMMs\uff09\u5b58\u5728\u4e00\u4e9b\u5c40\u9650\u6027\uff1a\uff081\uff09\u5b83\u4eec\u5f80\u5f80\u7f3a\u4e4f\u539f\u751f\u96c6\u6210\uff0c\u9700\u8981\u9002\u914d\u5668\u6765\u8854\u63a5\u89c6\u89c9\u8868\u793a\u4e0e\u9884\u8bad\u7ec3\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\uff1b\uff082\uff09\u8bb8\u591a\u6a21\u578b\u4ec5\u9650\u4e8e\u5355\u6a21\u6001\u751f\u6210\uff1b\uff083\uff09\u5c3d\u7ba1\u6709\u4e9b\u652f\u6301\u591a\u6a21\u6001\u751f\u6210\uff0c\u4f46\u5b83\u4eec\u4f9d\u8d56\u4e8e\u5355\u72ec\u7684\u6269\u6563\u6a21\u578b\u5904\u7406\u89c6\u89c9\u90e8\u5206\u3002\u4e3a\u4e86\u514b\u670d\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u4ecb\u7ecd\u4e86Anole\uff0c\u4e00\u4e2a\u5f00\u6e90\u7684\u3001\u81ea\u56de\u5f52\u7684\u3001\u539f\u751f\u7684\u5927\u578b\u591a\u6a21\u6001\u6a21\u578b\uff0c\u4e13\u4e3a\u4ea4\u9519\u7684\u56fe\u50cf-\u6587\u672c\u751f\u6210\u8bbe\u8ba1\u3002\u6211\u4eec\u57fa\u4e8eMeta AI\u7684Chameleon\u6784\u5efaAnole\uff0c\u91c7\u7528\u4e86\u4e00\u79cd\u65e2\u6570\u636e\u9ad8\u6548\u53c8\u53c2\u6570\u9ad8\u6548\u7684\u521b\u65b0\u5fae\u8c03\u7b56\u7565\u3002Anole\u5c55\u793a\u4e86\u9ad8\u8d28\u91cf\u3001\u8fde\u8d2f\u7684\u591a\u6a21\u6001\u751f\u6210\u80fd\u529b\u3002\u6211\u4eec\u5df2\u7ecf\u516c\u5f00\u4e86\u6211\u4eec\u7684\u6a21\u578b\u3001\u8bad\u7ec3\u6846\u67b6\u4ee5\u53ca\u6307\u4ee4\u8c03\u4f18\u6570\u636e\u3002**|\n", "2407.06129": "|**2024-07-08**|**Evaluating the Semantic Profiling Abilities of LLMs for Natural Language Utterances in Data Visualization**|Hannah K. 
Bako et.al.|[2407.06129](http://arxiv.org/abs/2407.06129)|**[link](https://github.com/hdi-umd/semantic_profiling_llm_evaluation)**|**Automatically generating data visualizations from people's natural-language utterances about a dataset requires a deep understanding of the semantic information in the language, including implicit and explicit references to data attributes, visualization tasks, and data preprocessing steps. Natural language interfaces (NLIs) for data visualization have explored ways to capture this information, but the uncertainty inherent in human speech poses challenges. Recent large language models (LLMs) offer a possible way to address these problems, but their ability to extract the relevant semantic information remains unexplored. This study evaluates four publicly available LLMs (GPT-4, Gemini-Pro, Llama3, and Mixtral), analyzing their ability to understand utterances in the presence of uncertainty and to identify the data context and visual tasks. The results show that LLMs are sensitive to uncertainty in utterances and can extract key data context. However, they perform poorly at inferring visualization tasks. Based on these findings, we propose future research directions for using LLMs for visualization generation.**|\n",
"2407.06125": "|**2024-07-08**|**Depression Detection and Analysis using Large Language Models on Textual and Audio-Visual Modalities**|Avinash Anand et.al.|[2407.06125](http://arxiv.org/abs/2407.06125)|null|Depression is widely recognized as a major public health problem that seriously affects an individual's mental well-being. Undiagnosed depression can lead to severe health problems, including physical symptoms and even suicide. Diagnosis usually relies on structured interviews conducted by clinicians and mental health professionals, together with questionnaires such as the Patient Health Questionnaire (PHQ). However, this depends heavily on the clinician's experience and judgment and may be affected by personal bias. Because the causes of depression are still being studied, clinicians face challenges in identifying and treating depression in its early stages. Recently, neural computing in artificial intelligence has made significant progress in text, image, and speech processing. Our study applies these state-of-the-art models in experiments on the E-DAIC (Extended Distress Analysis Interview Corpus Wizard of Oz) dataset and the 2019 Audio/Visual Emotion Challenge (AVEC), aiming to improve multimodal results. The experiments show that our proposed solution, which uses proprietary and open-source large language models (LLMs), achieves a Root Mean Square Error (RMSE) of 3.98 on the text modality, outperforming the AVEC 2019 challenge baseline and the current best regression architectures. In addition, our method achieves 71.43% accuracy on the classification task. The paper also introduces a novel audio-visual multimodal network that predicts PHQ-8 scores with an RMSE of 6.51.|\n", "2407.06093": "|**2024-07-08**|**Artificial Intuition: Efficient Classification of Scientific Abstracts**|Harsh Sakhrani et.al.|[2407.06093](http://arxiv.org/abs/2407.06093)|null|
Coarse-grained classification of short scientific texts, such as grant applications or publication abstracts, is essential for gaining strategic insight and for research project management. These texts convey dense information to experts with deep domain knowledge, but automating the task is extremely difficult because the texts are short and lack context. To address this, we develop a new approach to generate and accurately assign domain-specific coarse labels. We show that a large language model (LLM) can supply the metadata the task requires, akin to the supplemental knowledge that augments human intuition, and we propose a workflow. As a pilot study, we use a database of award abstracts from the National Aeronautics and Space Administration (NASA). We develop new assessment tools in conjunction with established performance metrics.|\n", "2407.06089": "|**2024-07-08**|**Merge, Ensemble, and Cooperate!
A Survey on Collaborative Strategies in the Era of Large Language Models**|Jinliang Lu et.al.|[2407.06089](http://arxiv.org/abs/2407.06089)|null|The remarkable success of large language models (LLMs) has ushered natural language processing (NLP) research into a new era. Despite their diverse capabilities, LLMs trained on different corpora exhibit different strengths and weaknesses, which makes it challenging to improve their overall efficiency and versatility. To address these challenges, recent studies have explored collaborative strategies for LLMs. This paper gives a comprehensive overview of this emerging research area and highlights the motivation behind such collaboration. We divide collaborative strategies into three main approaches: merging, ensemble, and cooperation. Merging integrates multiple LLMs in the parameter space. Ensemble combines the outputs of multiple models. Cooperation leverages the strengths of different LLMs so that each contributes its own expertise to specific tasks. We describe these methods in detail from different perspectives and discuss their potential applications. We also outline future research directions, hoping this work will inspire further research on LLM collaboration and advance sophisticated NLP applications.|\n", "2407.07094": "|**2024-07-09**|**AnyTaskTune: Advanced Domain-Specific Solutions through Task-Fine-Tuning**|Jiaxi
Cui et.al.|[2407.07094](http://arxiv.org/abs/2407.07094)|**[link](https://github.com/pandavt/datatager)**|**The widespread adoption of large language models (LLMs) across industries often overlooks the needs of individuals and small organizations for models tailored to their specific business scenarios. To address this, we propose AnyTaskTune, a novel fine-tuning methodology that we call Task-Fine-Tune, designed to improve model performance on a diverse range of domain-specific tasks. The method involves carefully identifying and defining the sub-tasks within a domain, then creating specialized enhancement datasets for fine-tuning, thereby optimizing task-specific model performance. We conducted extensive fine-tuning experiments on more than twenty sub-tasks, not only in the legal domain (for tasks such as keyword extraction and sentence prediction) but also across domains including finance, healthcare, law, psychology, customer service, and human resources. To support community engagement and resource sharing, we will open-source these bilingual task datasets. The results show that models fine-tuned with the Task-Fine-Tune method not only perform well on the specific tasks but also significantly outperform models with stronger general capabilities in their respective domains. Our work is publicly available at https://github.com/PandaVT/DataTager.**|\n",
"2407.07093": "|**2024-07-09**|**FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation**|Liqun Ma et.al.|[2407.07093](http://arxiv.org/abs/2407.07093)|**[link](https://github.com/liqunma/fbi-llm)**|**This work presents a fully binarized large language model (FBI-LLM), demonstrating for the first time how to train a large-scale fully binarized language model from scratch (as opposed to partially binarized or ternary LLMs such as BitNet b1.58) so that it matches the performance of conventional 16-bit LLMs (FP16 or BF16). By using an autoregressive distillation (AD) loss while keeping the model sizes (130M, 1.3B, 7B) and the amount of pre-training data comparable to regular LLMs, FBI-LLM achieves competitive results in perplexity and task-specific effectiveness. Interestingly, we find that pretrained weights are not necessary for training fully binarized language models from scratch. This research encourages a new computational framework and may facilitate the design of specialized hardware tailored for fully 1-bit LLMs. We release all models, code, and training data to support further research (code: https://github.com/LiqunMa/FBI-LLM, models: https://huggingface.co/LiqunMa/).**|\n",
"2407.07086": "|**2024-07-09**|**Hypothetical Minds: Scaffolding Theory of Mind for Multi-Agent Tasks with Large Language Models**|Logan Cross et.al.|[2407.07086](http://arxiv.org/abs/2407.07086)|**[link](https://github.com/locross93/hypothetical-minds)**|**For multi-agent reinforcement learning (MARL) methods, handling the non-stationarity of multi-agent systems and learning adaptively online remains a challenge. To address this, we use large language models to build an autonomous agent. Our new agent, Hypothetical Minds, has a cognitively inspired architecture with modules for perception, memory, and hierarchical planning over two levels of abstraction. The key addition is a Theory of Mind module that generates hypotheses about other agents' strategies in natural language and iteratively refines them by checking how accurately they predict the other agents' behavior. Across a range of competitive, mixed-motive, and collaborative environments in the Melting Pot benchmark, in both dyadic and population-based settings, Hypothetical Minds significantly outperforms previous language-model agents and reinforcement learning baselines. Comparative analysis shows that hypothesis evaluation and iterative refinement are essential for handling complex scenarios.**|\n",
"2407.07080": "|**2024-07-09**|**Adapting LLMs to Hebrew: Unveiling DictaLM 2.0 with Enhanced Vocabulary and Instruction Capabilities**|Shaltiel Shmidman et.al.|[2407.07080](http://arxiv.org/abs/2407.07080)|null|This paper examines the challenges of training large language models (LLMs) in low-resource languages such as Hebrew. We introduce DictaLM2.0 and DictaLM2.0-Instruct, two models derived from the Mistral model and trained on approximately 200 billion tokens of Hebrew and English. Adapting a pre-trained model to a new language requires specialized techniques that differ significantly from training a model from scratch or further training an existing model on a well-resourced language such as English. The paper describes these novel training methods, which enable effective learning of Hebrew and adaptation to its linguistic properties. We also apply comprehensive instruction tuning to DictaLM2.0-Instruct to improve its performance on task-oriented instructions. To evaluate our models rigorously, we develop a new benchmark for Hebrew LLM evaluation covering question answering, sentiment analysis, the Winograd Schema Challenge, translation, and summarization. Our work not only addresses the intricacies of training LLMs in low-resource languages but also proposes a framework that can be used to adapt other LLMs to non-English languages, contributing to the field of multilingual natural language processing.|\n",
"2407.07071": "|**2024-07-09**|**Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps**|Yung-Sung Chuang et.al.|[2407.07071](http://arxiv.org/abs/2407.07071)|**[link](https://github.com/voidism/lookback-lens)**|**This paper examines contextual hallucinations that can arise when large language models (LLMs) summarize articles or answer questions about a given passage: the models may fabricate details and give inaccurate answers that are inconsistent with the input context. The authors hypothesize that such hallucinations are related to the extent to which a model attends to information in the provided context versus its own generated content. Based on this, they design a simple detection model, Lookback Lens, whose input features are the ratio of attention weights on the context versus on newly generated tokens, computed for each attention head. Experiments show that a linear classifier using only these lookback-ratio features is as effective as more complex detectors that use the entire hidden states of an LLM or a text-entailment model. Lookback Lens transfers across tasks and even across models: a detector trained on a 7B-parameter model can be applied, without retraining, to a larger 13B-parameter model. The study further finds that a simple classifier-guided decoding approach reduces hallucination, for example by 9.6% on the XSum summarization task.**|\n", "2407.07064": "|**2024-07-09**|**Prompting Techniques for Secure Code Generation: A Systematic Investigation**|Catherine Tony et.al.|[2407.07064](http://arxiv.org/abs/2407.07064)|null|
Large language models (LLMs) are gaining momentum in software development, with prompt-driven programming enabling developers to create code from natural language (NL) instructions. However, studies have questioned their ability to produce secure code, which in turn affects the quality of prompt-generated software. Although various carefully designed prompting strategies have emerged to elicit better responses from LLMs, the interplay between these strategies and secure code generation needs further study. Objective: this study investigates the impact of different prompting techniques on the security of code that LLMs generate from NL instructions. Method: we first perform a systematic literature review to identify existing prompting techniques that can be used for code generation tasks. We then evaluate a subset of these techniques on GPT-3, GPT-3.5, and GPT-4 using a dataset of 150 security-relevant NL code-generation prompts. Results: our work (1) presents a classification of potential prompting techniques for code generation, (2) adapts and evaluates these techniques for secure code generation tasks, and (3) observes a reduction in security weaknesses across the tested LLMs, especially after using an existing technique called Recursive Criticism and Improvement (RCI), contributing valuable insights to the discussion on the security of LLM-generated code.|\n",
"2407.07061": "|**2024-07-09**|**Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence**|Weize Chen et.al.|[2407.07061](http://arxiv.org/abs/2407.07061)|**[link](https://github.com/openbmb/ioa)**|**The rapid advancement of large language models has given rise to highly capable autonomous agents. However, existing multi-agent frameworks struggle to integrate capable third-party agents from different ecosystems and are usually confined to their own closed environments. They are also limited to single-device setups when simulating distributed environments, and they often rely on hard-coded communication pipelines that cannot adapt to changing task requirements. Inspired by the concept of the Internet, we propose the Internet of Agents (IoA), a new framework that addresses these limitations by providing a flexible and scalable platform for language-model-based multi-agent collaboration. IoA introduces an agent integration protocol, an instant-messaging architecture, and dynamic mechanisms for agent teaming and conversation flow control. Through extensive experiments on general assistant tasks, embodied AI tasks, and retrieval-augmented generation benchmarks, we show that IoA consistently outperforms state-of-the-art baselines, demonstrating its ability to facilitate effective collaboration among heterogeneous agents. IoA represents a step toward linking diverse agents in an Internet-like environment where they can collaborate seamlessly to achieve greater intelligence and capabilities. Our codebase is available at https://github.com/OpenBMB/IoA.**|\n", "2407.07053": "|**2024-07-09**|**Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model**|Wenqi Zhang
et.al.|[2407.07053](http://arxiv.org/abs/2407.07053)|**[link](https://github.com/zwq2018/multi-modal-self-instruct)**|**Although current large multimodal models (LMMs) can already understand photos of natural scenes and portraits, their understanding of abstract images (such as charts, maps, or layouts) and their visual reasoning capabilities remain quite rudimentary. They often struggle with everyday tasks such as reading the time from a clock, understanding a flowchart, or planning a route using a road map. In light of this, we design a multimodal self-instruct pipeline that uses large language models and their coding capabilities to synthesize large numbers of abstract images and visual reasoning instructions for everyday scenarios. With this approach we readily construct a multimodal benchmark of 11,193 instructions covering eight visual scenarios: charts, tables, simulated maps, dashboards, flowcharts, relation graphs, floor plans, and visual puzzles.
This benchmark, built from simple lines and geometric elements, exposes the limitations of the most advanced LMMs, such as Claude-3.5-Sonnet and GPT-4o, in abstract image understanding, spatial relation reasoning, and visual element recognition. In addition, to verify the quality of the synthetic data, we fine-tune an LMM on 62,476 synthetic chart, table, and road map instructions. The results show improved chart understanding and map navigation performance and suggest potential benefits for other visual reasoning tasks. Our code is available at https://github.com/zwq2018/Multi-modal-Self-instruct.**|\n", "2407.07019": "|**2024-07-09**|**Using Large Language Models for Generating Smart Contracts for Health Insurance from Textual Policies**|Inwon Kang
et.al.|[2407.07019](http://arxiv.org/abs/2407.07019)|null|We explore using large language models (LLMs) to generate code that automates text-based health insurance policies, targeting blockchain smart contracts. Smart contracts are chosen because they offer immutability, verifiability, scalability, and a trustless setting. Our methodology generates outputs at increasing levels of technical complexity: (1) textual summaries, (2) declarative decision logic, and (3) smart contract code with unit tests. We confirm that LLMs are good at task (1), and that the structured output is useful for validating tasks (2) and (3). Declarative languages are often used to formalize healthcare policies, but executing them on a blockchain is non-trivial, so task (3) attempts to automate the process directly with smart contracts. We propose completeness, correctness, clarity, syntax, and functioning code as evaluation metrics. The evaluation uses three health insurance policy scenarios of varying difficulty from Medicare's official booklet, with GPT-3.5 Turbo, GPT-3.5 Turbo 16K, GPT-4, GPT-4
Turbo, and CodeLLaMA. The results show that LLMs perform well at generating textual summaries. Although the outputs of tasks (2) and (3) are useful starting points, they still require human review: in some cases even runnable code produces incorrect results; the popularity of the target language affects output quality; and more complex scenarios remain a major challenge. Nevertheless, our experiments demonstrate the promise of LLMs for translating textual process descriptions into smart contracts.|\n", "2407.07018": "|**2024-07-09**|**End-To-End Causal Effect Estimation from Unstructured Natural Language Data**|Nikita Dhawan
et.al.|[2407.07018](http://arxiv.org/abs/2407.07018)|null|Knowing the effect of an intervention is critical for human decision-making, but current approaches to causal effect estimation rely on manual data collection and structuring, which increases both the cost and the time-to-completion of studies. We show how large, diverse observational text data can be mined with large language models (LLMs) to produce inexpensive causal effect estimates under appropriate causal assumptions. We introduce NATURAL, a novel family of causal effect estimators built with LLMs that operate over unstructured text data. Our estimators use LLM conditional distributions (over variables of interest, given the text data) to assist in computing classical estimators of causal effect. We overcome a number of technical challenges, such as automating data curation and using LLMs to impute missing information.
We prepare six observational datasets (two synthetic and four real), paired with corresponding ground truth in the form of randomized controlled trials, and use them to systematically evaluate each step of our pipeline. NATURAL estimators perform remarkably well, yielding causal effect estimates within 3 percentage points of their ground-truth counterparts, including on real-world Phase 3 and Phase 4 clinical trials. These results suggest that unstructured text data is a rich source of causal effect information, and that NATURAL is a first step toward an automated pipeline for tapping this resource.|\n", "2407.07890": "|**2024-07-10**|**Training on the Test Task Confounds Evaluation and Emergence**|Ricardo Dominguez-Olmedo
et.al.|[2407.07890](http://arxiv.org/abs/2407.07890)|**[link](https://github.com/socialfoundations/training-on-the-test-task)**|**We study a fundamental problem in the evaluation of large language models that we call training on the test task. Unlike wrongful practices such as data leakage or contamination, training on the test task is not malpractice; rather, the term describes a growing set of techniques for including task-relevant data in the pretraining stage. We show that training on the test task confounds both relative model evaluations and claims about emergent capabilities. We argue that the seeming superiority of one model family over another may be explained by a different degree of training on the test task. To this end, we propose an effective method to adjust for it: fine-tuning each model under comparison on the same task-relevant data before evaluation. We then show that instances of emergent behavior largely vanish once we adjust for training on the test task. This also applies to reported instances of emergent behavior that cannot be explained by the choice of evaluation metric. Our work promotes a new perspective on the evaluation of large language models, with broad implications for benchmarking and the study of emergent capabilities.**|\n", "2407.07880": "|**2024-07-10**|**Towards Robust Alignment of Language Models:
Distributionally Robustifying Direct Preference Optimization**|Junkang Wu et.al.|[2407.07880](http://arxiv.org/abs/2407.07880)|**[link](https://github.com/junkangwu/dr_dpo)**|**\u672c\u7814\u7a76\u5173\u6ce8\u5728\u8bad\u7ec3\u6570\u636e\u4e2d\u566a\u58f0\u5bf9Direct Preference Optimization (DPO)\u65b9\u6cd5\u7684\u6311\u6218\uff0c\u8be5\u65b9\u6cd5\u7528\u4e8e\u8c03\u6574\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4ee5\u7b26\u5408\u4eba\u7c7b\u504f\u597d\u3002\u6211\u4eec\u533a\u5206\u4e86\u4e24\u7c7b\u566a\u58f0\uff1a\u70b9\u566a\u58f0\uff0c\u6d89\u53ca\u4f4e\u8d28\u91cf\u7684\u6570\u636e\u70b9\uff1b\u548c\u6210\u5bf9\u566a\u58f0\uff0c\u5f71\u54cd\u504f\u597d\u7684\u6b63\u786e\u6392\u5e8f\u3002\u901a\u8fc7\u5206\u5e03\u5f0f\u9c81\u68d2\u4f18\u5316\uff08DRO\uff09\uff0c\u6211\u4eec\u589e\u5f3a\u4e86DPO\u62b5\u6297\u8fd9\u4e9b\u566a\u58f0\u7684\u80fd\u529b\u3002\u7406\u8bba\u5206\u6790\u63ed\u793a\uff0cDPO\u672c\u8d28\u4e0a\u8574\u542b\u4e86DRO\u539f\u7406\uff0c\u5bf9\u70b9\u566a\u58f0\u5177\u6709\u5929\u7136\u7684\u9c81\u68d2\u6027\uff0c\u5176\u4e2d\u6b63\u5219\u5316\u7cfb\u6570$\\beta$\u5728\u6297\u566a\u58f0\u65b9\u9762\u8d77\u5173\u952e\u4f5c\u7528\u3002\u5728\u6b64\u57fa\u7840\u4e0a\uff0c\u6211\u4eec\u63d0\u51fa\u5206\u5e03\u5f0f\u9c81\u68d2\u589e\u5f3a\u7684DPO\uff08Dr. DPO\uff09\uff0c\u5b83\u901a\u8fc7\u4f18\u5316\u6700\u574f\u60c5\u51b5\u7684\u6210\u5bf9\u573a\u666f\u6765\u96c6\u6210\u6210\u5bf9\u9c81\u68d2\u6027\u3002Dr. DPO\u4e2d\u7684\u65b0\u8d85\u53c2\u6570$\\beta'$\u5141\u8bb8\u5bf9\u6570\u636e\u5bf9\u53ef\u9760\u6027\u8fdb\u884c\u7cbe\u7ec6\u63a7\u5236\uff0c\u5e73\u8861\u4e86\u5728\u5608\u6742\u8bad\u7ec3\u73af\u5883\u4e2d\u7684\u63a2\u7d22\u4e0e\u5229\u7528\u3002\u5b9e\u8bc1\u8bc4\u4f30\u663e\u793a\uff0cDr. 
DPO\u663e\u8457\u63d0\u9ad8\u4e86\u751f\u6210\u6587\u672c\u7684\u8d28\u91cf\u548c\u54cd\u5e94\u51c6\u786e\u6027\uff0c\u65e0\u8bba\u5728\u6709\u566a\u58f0\u8fd8\u662f\u65e0\u566a\u58f0\u7684\u8bbe\u7f6e\u4e0b\u90fd\u8868\u73b0\u51fa\u8272\u3002\u4ee3\u7801\u5df2\u5728https://github.com/junkangwu/Dr_DPO\u4e0a\u63d0\u4f9b\u3002**|\n", "2407.07858": "|**2024-07-10**|**FACTS About Building Retrieval Augmented Generation-based Chatbots**|Rama Akkiraju et.al.|[2407.07858](http://arxiv.org/abs/2407.07858)|null|\u968f\u7740\u751f\u6210\u5f0f\u4eba\u5de5\u667a\u80fd\u9a71\u52a8\u7684\u4f01\u4e1a\u804a\u5929\u673a\u5668\u4eba\u65e5\u76ca\u6210\u4e3a\u63d0\u5347\u5458\u5de5\u751f\u4ea7\u529b\u7684\u5173\u952e\u5de5\u5177\uff0c\u57fa\u4e8e\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u7684\u3001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4ee5\u53ca\u5982Langchain\u548cLlamaindex\u4e4b\u7c7b\u7684orchestration\u6846\u67b6\u5728\u6784\u5efa\u8fd9\u4e9b\u804a\u5929\u673a\u5668\u4eba\u4e2d\u626e\u6f14\u4e86\u91cd\u8981\u89d2\u8272\u3002\u7136\u800c\uff0c\u521b\u5efa\u6709\u6548\u7684\u4f01\u4e1a\u804a\u5929\u673a\u5668\u4eba\u662f\u4e00\u9879\u6311\u6218\uff0c\u9700\u8981\u7cbe\u5fc3\u8bbe\u8ba1\u7684RAG\u7ba1\u9053\u5de5\u7a0b\u3002\u8fd9\u5305\u62ec\u5fae\u8c03\u5d4c\u5165\u548cLLMs\u3001\u4ece\u5411\u91cf\u6570\u636e\u5e93\u63d0\u53d6\u6587\u6863\u3001\u91cd\u8ff0\u67e5\u8be2\u3001\u91cd\u65b0\u6392\u540d\u7ed3\u679c\u3001\u8bbe\u8ba1\u63d0\u793a\u3001\u9075\u5b88\u6587\u6863\u8bbf\u95ee\u63a7\u5236\u3001\u63d0\u4f9b\u7b80\u6d01\u7684\u56de\u7b54\u3001\u5305\u542b\u5f15\u7528\u3001\u4fdd\u62a4\u4e2a\u4eba\u4fe1\u606f\u4ee5\u53ca\u6784\u5efaorchestration\u4ee3\u7406\u3002\u6211\u4eec\u57fa\u4e8e\u4e09\u4e2aNVIDIA\u804a\u5929\u673a\u5668\u4eba\uff08\u5206\u522b\u7528\u4e8eIT/HR\u798f\u5229\u3001\u8d22\u52a1\u6536\u76ca\u548c\u901a\u7528\u5185\u5bb9\uff09\u7684\u7ecf\u9a8c\uff0c\u63d0\u51fa\u4e86\u4e00\u79cd\u6784\u5efaRAG\u804a\u5929\u673a\u5668\u4eba\u7684\u6846\u6
7b6\u2014\u2014FACTS\uff08Freshness\u3001Architectures\u3001Cost\u3001Testing\u3001Security\uff09\u3002\u6211\u4eec\u7684\u8d21\u732e\u6709\u4e09\u65b9\u9762\uff1a\u9996\u5148\u4ecb\u7ecdFACTS\u6846\u67b6\uff0c\u5176\u6b21\u5217\u51fa\u5341\u4e94\u4e2aRAG\u7ba1\u9053\u63a7\u5236\u70b9\uff0c\u6700\u540e\u63d0\u4f9b\u4e86\u5173\u4e8e\u5927\u6a21\u578b\u548c\u5c0f\u6a21\u578b\u5728\u51c6\u786e\u6027\u548c\u5ef6\u8fdf\u4e4b\u95f4\u6743\u8861\u7684\u5b9e\u8bc1\u7ed3\u679c\u3002\u636e\u6211\u4eec\u6240\u77e5\uff0c\u8fd9\u662f\u9996\u7bc7\u5168\u9762\u63a2\u8ba8\u6784\u5efa\u5b89\u5168\u4f01\u4e1a\u7ea7\u804a\u5929\u673a\u5668\u4eba\u7684\u65b9\u6cd5\u548c\u89e3\u51b3\u65b9\u6848\u7684\u8bba\u6587\u3002|\n", "2407.07852": "|**2024-07-10**|**OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training**|Sami Jaghouar et.al.|[2407.07852](http://arxiv.org/abs/2407.07852)|**[link](https://github.com/PrimeIntellect-ai/OpenDiLoCo)**|**OpenDiLoCo\u662f\u4e00\u4e2a\u5f00\u6e90\u7684\u5206\u5e03\u5f0f\u4f4e\u901a\u4fe1\uff08DiLoCo\uff09\u8bad\u7ec3\u65b9\u6cd5\u7684\u5b9e\u73b0\u548c\u590d\u5236\uff0c\u9488\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\u3002\u6211\u4eec\u63d0\u4f9b\u4e86\u53ef\u590d\u73b0\u7684DiLoCo\u5b9e\u9a8c\uff0c\u901a\u8fc7Hivemind\u5e93\u6784\u5efa\u4e86\u4e00\u4e2a\u53ef\u6269\u5c55\u7684\u53bb\u4e2d\u5fc3\u5316\u8bad\u7ec3\u6846\u67b6\u3002\u6211\u4eec\u5728\u4e24\u4e2a\u5927\u6d32\u548c\u4e09\u4e2a\u56fd\u5bb6\u4e4b\u95f4\u8bad\u7ec3\u6a21\u578b\uff0c\u540c\u65f6\u4fdd\u630190-95%\u7684\u8ba1\u7b97\u8d44\u6e90\u5229\u7528\u7387\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fdb\u884c\u4e86\u5173\u4e8e\u7b97\u6cd5\u8ba1\u7b97\u6548\u7387\u3001\u5de5\u4f5c\u5668\u6570\u91cf\u53ef\u6269\u5c55\u6027\u7684\u7814\u7a76\uff0c\u5e76\u8868\u660e\u5176\u68af\u5ea6\u53ef\u4ee5\u4f7f\u7528FP16\u8fdb\u884c\u5168\u5f52\u4e00\u5316\u800c\u4e0d\u4f1a\u5f71\u54cd\u6027\u80fd\u3002\u6700\u540e\uff0c\u6211\u4eec\u5c06OpenDiLoCo\u6269\u5c55\u5230\u539f\u59cb\u
5de5\u4f5c\u7684\u4e09\u500d\u89c4\u6a21\uff0c\u8bc1\u660e\u4e86\u5b83\u5728\u767e\u4ebf\u53c2\u6570\u6a21\u578b\u4e0a\u7684\u6709\u6548\u6027\u3002**|\n", "2407.07845": "|**2024-07-10**|**Natural Language Mechanisms via Self-Resolution with Foundation Models**|Nicolas Della Penna et.al.|[2407.07845](http://arxiv.org/abs/2407.07845)|null|\u5728\u5b9e\u9645\u64cd\u4f5c\u4e2d\uff0c\u4ee3\u7406\u4eba\u901a\u5e38\u53d7\u9650\u4e8e\u8bf8\u5982\u4ea4\u6613\u6216\u8ba2\u5355\u4e4b\u7c7b\u7684\u6709\u9650\u62a5\u544a\u683c\u5f0f\uff0c\u8fd9\u53ef\u80fd\u9650\u5236\u4e86\u4ed6\u4eec\u8868\u8fbe\u4fe1\u606f\u7684\u80fd\u529b\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u578b\u673a\u5236\uff0c\u5b83\u4fc3\u4f7f\u4ee3\u7406\u4eba\u4ee5\u81ea\u7136\u8bed\u8a00\u63d0\u4ea4\u62a5\u544a\uff0c\u5e76\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u5f3a\u5927\u529f\u80fd\u6765\u9009\u62e9\u7ed3\u679c\u548c\u5206\u914d\u62a5\u916c\u3002\u6211\u4eec\u786e\u5b9a\u4e86\u8fd9\u4e9b\u673a\u5236\u5728LLM\u4f5c\u4e3a\u826f\u597d\u7684\u4e16\u754c\u6a21\u578b\u4ee5\u53ca\u5f3a\u70c8\u7684\u8de8\u4ee3\u7406\u4fe1\u606f\u8fc7\u5ea6\u786e\u5b9a\u6761\u4ef6\u4e0b\u7684\u6fc0\u52b1\u517c\u5bb9\u6027\u548c\u6548\u7387\u7684\u5fc5\u8981\u6761\u4ef6\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u5f53\u4f20\u7edf\u9884\u6d4b\u5e02\u573a\u5728\u4fe1\u53f7\u7ed3\u6784\u4e0a\u5b58\u5728\u95ee\u9898\u65f6\uff0c\u8fd9\u4e9b\u57fa\u4e8eLLM\u7684\u673a\u5236\u80fd\u591f\u6210\u529f\u5730\u6574\u5408\u4fe1\u606f\u3002|\n", "2407.07810": "|**2024-07-10**|**Transformer Alignment in Large Language Models**|Murdock Aubry 
et.al.|[2407.07810](http://arxiv.org/abs/2407.07810)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\uff0c\u6df1\u5165\u7406\u89e3\u5176\u5185\u90e8\u673a\u5236\u81f3\u5173\u91cd\u8981\u3002\u6211\u4eec\u89c6LLMs\u4e3a\u9ad8\u7ef4\u7a7a\u95f4\u4e2d\u7684\u79bb\u6563\u3001\u8026\u5408\u7684\u975e\u7ebf\u6027\u52a8\u529b\u7cfb\u7edf\uff0c\u901a\u8fc7\u7814\u7a76tokens\u5728Transformer\u5757\u4e2d\u7684\u8f68\u8ff9\uff0c\u5e76\u6cbf\u7740\u8fd9\u4e9b\u8f68\u8ff9\u7ebf\u6027\u5316\u7cfb\u7edf\uff0c\u5229\u7528\u96c5\u53ef\u6bd4\u77e9\u9635\u8fdb\u884c\u5206\u6790\u3002\u5728\u5bf938\u4e2a\u516c\u5f00\u53ef\u7528\u7684LLMs\u8fdb\u884c\u7814\u7a76\u540e\uff0c\u6211\u4eec\u89c2\u5bdf\u5230\u6b8b\u5dee\u96c5\u53ef\u6bd4\u77e9\u9635\u7684\u4e0a\u5de6\u548c\u53f3\u5947\u5f02\u5411\u91cf\u4e4b\u95f4\u7684\u5bf9\u9f50\uff0c\u4ee5\u53ca\u7ebf\u6027\u6027\u548c\u5c42\u5185\u6307\u6570\u589e\u957f\u7684\u51fa\u73b0\u3002\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u6211\u4eec\u53d1\u73b0\u5bf9\u9f50\u5ea6\u7684\u63d0\u9ad8\u4e0e\u6a21\u578b\u6027\u80fd\u5448\u6b63\u76f8\u5173\u3002\u8bad\u7ec3\u540e\u7684\u8bc4\u4f30\u663e\u793a\uff0c\u76f8\u6bd4\u4e8e\u968f\u673a\u521d\u59cb\u5316\u6743\u91cd\u65f6\u7684\u6307\u6807\uff0c\u6709\u663e\u8457\u6539\u5584\uff0c\u8fd9\u5f3a\u8c03\u4e86\u8bad\u7ec3\u5728Transformer\u67b6\u6784\u4e2d\u7684\u91cd\u8981\u5f71\u54cd\u3002\u8fd9\u4e9b\u53d1\u73b0\u63ed\u793a\u4e86\u4e00\u79cd\u4ee5\u524d\u672a\u88ab\u5145\u5206\u8ba4\u8bc6\u7684\u89c4\u5f8b\u6027\uff0c\u5f3a\u5316\u4e86\u52a8\u529b\u5b66\u89e3\u91ca\uff0c\u5e76\u4e3a\u8fdb\u4e00\u6b65\u7406\u89e3\u548c\u4f18\u5316LLM\u67b6\u6784\u94fa\u5e73\u4e86\u9053\u8def\u3002|\n", "2407.07799": "|**2024-07-10**|**Attribute or Abstain: Large Language Models as Long Document Assistants**|Jan Buchmann 
et.al.|[2407.07799](http://arxiv.org/abs/2407.07799)|**[link](https://github.com/ukplab/arxiv2024-attribute-or-abstain)**|**\u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u80fd\u591f\u8f85\u52a9\u5904\u7406\u957f\u7bc7\u6587\u6863\uff0c\u4f46\u5b83\u4eec\u4e5f\u5b58\u5728\u80e1\u8a00\u4e71\u8bed\u7684\u95ee\u9898\u3002\u589e\u52a0\u53ef\u4fe1\u5ea6\u7684\u65b9\u6cd5\u662f\u901a\u8fc7\u63d0\u4f9b\u8bc1\u636e\u652f\u6301\u54cd\u5e94\uff0c\u63d0\u9ad8\u53ef\u9a8c\u8bc1\u6027\u3002\u5f53\u524d\u7684\u5f52\u56e0\u65b9\u6cd5\u4ec5\u5728\u57fa\u4e8e\u68c0\u7d22\u7684\u751f\u6210\uff08RAG\uff09\u73af\u5883\u4e2d\u8bc4\u4f30\u8fc7\uff0c\u8fd9\u4e0e\u65e0\u9700\u68c0\u7d22\u7684\u957f\u6587\u6863\u573a\u666f\u4e0d\u540c\uff0c\u53ef\u80fd\u4ecd\u6709\u5e94\u7528\u4ef7\u503c\u3002\u56e0\u6b64\uff0c\u7f3a\u4e4f\u9488\u5bf9\u957f\u6587\u6863\u7684\u5f52\u56e0\u4e13\u95e8\u8bc4\u4f30\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51faLAB\uff0c\u4e00\u4e2a\u5305\u542b6\u4e2a\u591a\u6837\u5316\u7684\u957f\u6587\u6863\u4efb\u52a1\u7684\u57fa\u51c6\uff0c\u5e76\u5728\u56db\u79cd\u4e0d\u540c\u5927\u5c0f\u7684LLM\uff08\u5373\u63d0\u793a\u548c\u5fae\u8c03\uff09\u4e0a\u8bd5\u9a8c\u4e86\u4e0d\u540c\u7684\u5f52\u56e0\u65b9\u6cd5\u3002\u7814\u7a76\u7ed3\u679c\u663e\u793a\uff0c\u4e00\u6b65\u751f\u6210\u5f15\u7528\uff08citation\uff0c\u5373\u540c\u65f6\u8fdb\u884c\u54cd\u5e94\u751f\u6210\u548c\u8bc1\u636e\u63d0\u53d6\uff09\u7684\u8868\u73b0\u6700\u4f73\u3002\u6211\u4eec\u8fd8\u63a2\u7a76\u4e86\u201c\u8ff7\u5931\u5728\u4e2d\u95f4\u201d\u73b0\u8c61\u662f\u5426\u9002\u7528\u4e8e\u5f52\u56e0\uff0c\u4f46\u672a\u53d1\u73b0\u8fd9\u79cd\u60c5\u51b5\u3002\u6b64\u5916\uff0c\u6211\u4eec\u53d1\u73b0\u8bc1\u636e\u8d28\u91cf\u5728\u7b80\u5355\u54cd\u5e94\u7684\u573a\u666f\u4e0b\u53ef\u4ee5\u9884\u6d4b\u54cd\u5e94\u8d28\u91cf\uff0c\u4f46\u5bf9\u4e8e\u590d\u6742\u54cd\u5e94\u5219\u4e0d\u7136\uff0c\u56e0\u4e3a\u6a21\u578b\u5728\u4e3a\u590d\u6742\u4e3b\u5f20\u63d0\u4f9b\u8bc1\u636e\u65f6\u9762\u4e34\u
6311\u6218\u3002\u6211\u4eec\u516c\u5f00\u4e86\u4ee3\u7801\u548c\u6570\u636e\uff0c\u4ee5\u4f9b\u8fdb\u4e00\u6b65\u7814\u7a76\u3002**|\n", "2407.07796": "|**2024-07-11**|**Evaluating Large Language Models with Grid-Based Game Competitions: An Extensible LLM Benchmark and Leaderboard**|Oguzhan Topsakal et.al.|[2407.07796](http://arxiv.org/abs/2407.07796)|**[link](https://github.com/research-outcome/llm-game-benchmark)**|**\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u4e14\u53ef\u6269\u5c55\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u57fa\u51c6\u6d4b\u8bd5\uff0c\u901a\u8fc7\u7f51\u683c\u578b\u6e38\u620f\u5982\u4e95\u5b57\u68cb\u3001\u8fde\u63a5\u56db\u548c\u56f4\u68cb\u8fdb\u884c\u3002\u5f00\u6e90\u7684\u6e38\u620f\u6a21\u62df\u4ee3\u7801\u5728GitHub\u4e0a\u63d0\u4f9b\uff0c\u5141\u8bb8LLMs\u7ade\u6280\uff0c\u5e76\u751f\u6210JSON\u3001CSV\u3001TXT\u548cPNG\u683c\u5f0f\u7684\u8be6\u7ec6\u6570\u636e\u6587\u4ef6\uff0c\u7528\u4e8e\u6392\u884c\u699c\u6392\u540d\u548c\u8fdb\u4e00\u6b65\u5206\u6790\u3002\u6211\u4eec\u5c55\u793a\u4e86\u5305\u62ecAnthropic\u7684Claude 3.5 Sonnet\u548cClaude 3 Sonnet\uff0cGoogle\u7684Gemini 1.5 Pro\u548cGemini 1.5 Flash\uff0cOpenAI\u7684GPT-4 
Turbo\u548cGPT-4o\uff0c\u4ee5\u53caMeta\u7684Llama3-70B\u5728\u5185\u7684\u9886\u5148LLM\u4e4b\u95f4\u7684\u6bd4\u8d5b\u7ed3\u679c\u3002\u6211\u4eec\u9f13\u52b1\u5176\u4ed6LLM\u63d0\u4ea4\u7ed3\u679c\u3002\u603b\u5171\u8fdb\u884c\u4e862,310\u573a\u6a21\u62df\u6bd4\u8d5b\uff08\u6bcf\u5bf9\u6a21\u578b\u8fdb\u884c5\u8f6e\uff0c\u51717\u4e2a\u6a21\u578b\u95f4\u7684\u5bf9\u5c40\uff0c\u4ee5\u53ca\u4e0e\u968f\u673a\u73a9\u5bb6\u7684\u6bd4\u8d5b\uff09\uff0c\u6db5\u76d6\u4e09\u79cd\u7c7b\u578b\u7684\u6e38\u620f\uff0c\u4f7f\u7528\u4e86\u5217\u8868\u3001\u63d2\u56fe\u548c\u56fe\u50cf\u4e09\u79cd\u63d0\u793a\u65b9\u5f0f\u3002\u7ed3\u679c\u663e\u793a\uff0cLLM\u5728\u4e0d\u540c\u6e38\u620f\u548c\u63d0\u793a\u7c7b\u578b\u4e0b\u7684\u6027\u80fd\u5b58\u5728\u663e\u8457\u5dee\u5f02\uff0c\u5206\u6790\u5185\u5bb9\u5305\u62ec\u80dc\u7387\u3001\u9519\u5931\u673a\u4f1a\u548c\u65e0\u6548\u52a8\u4f5c\u3002\u6392\u884c\u699c\u548c\u7ed3\u679c\u77e9\u9635\u7684\u8be6\u7ec6\u6570\u636e\u4f5c\u4e3a\u5f00\u653e\u8bbf\u95ee\u6570\u636e\u5728GitHub\u4e0a\u63d0\u4f9b\u3002\u8fd9\u9879\u7814\u7a76\u52a0\u6df1\u4e86\u6211\u4eec\u5bf9LLM\u5728\u672a\u4e13\u95e8\u8bad\u7ec3\u7684\u6e38\u620f\u4e2d\u7684\u80fd\u529b\u7684\u7406\u89e3\uff0c\u6709\u52a9\u4e8e\u8bc4\u4f30\u5b83\u4eec\u7684\u89c4\u5219\u7406\u89e3\u80fd\u529b\u548c\u6218\u7565\u601d\u7ef4\u3002\u5728\u901a\u5411\u4eba\u5de5\u667a\u80fd\u901a\u7528\u6027\u7684\u9053\u8def\u4e0a\uff0c\u8fd9\u9879\u7814\u7a76\u4e3a\u672a\u6765\u63a2\u7d22\u5b83\u4eec\u5728\u590d\u6742\u51b3\u7b56\u573a\u666f\u4e2d\u7684\u5b9e\u7528\u6027\u5960\u5b9a\u4e86\u57fa\u7840\uff0c\u63ed\u793a\u4e86\u5b83\u4eec\u7684\u6218\u7565\u601d\u8003\u80fd\u529b\uff0c\u5e76\u4e3a\u6df1\u5165\u63a2\u7a76LLM\u5728\u57fa\u4e8e\u6e38\u620f\u6846\u67b6\u5185\u7684\u5c40\u9650\u6027\u63d0\u4f9b\u4e86\u65b9\u5411\u3002**|\n", "2407.07791": "|**2024-07-10**|**Flooding Spread of Manipulated Knowledge in LLM-Based Multi-Agent Communities**|Tianjie Ju 
et.al.|[2407.07791](http://arxiv.org/abs/2407.07791)|**[link](https://github.com/Jometeorie/KnowledgeSpread)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u591a\u4ee3\u7406\u7cfb\u7edf\u4e2d\u7684\u8fc5\u901f\u5e94\u7528\uff0c\u5b83\u4eec\u5728\u534f\u4f5c\u95ee\u9898\u89e3\u51b3\u548c\u81ea\u4e3b\u8c08\u5224\u7b49\u9886\u57df\u7684\u51fa\u8272\u6027\u80fd\u5f15\u8d77\u4e86\u5173\u6ce8\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u57fa\u4e8eLLM\u7684\u591a\u4ee3\u7406\u7cfb\u7edf\u7684\u5b89\u5168\u95ee\u9898\u5c1a\u672a\u5f97\u5230\u5145\u5206\u7814\u7a76\uff0c\u5c24\u5176\u662f\u5728\u77e5\u8bc6\u64cd\u7eb5\u4f20\u64ad\u65b9\u9762\u3002\u672c\u6587\u901a\u8fc7\u6784\u5efa\u8be6\u7ec6\u7684\u5a01\u80c1\u6a21\u578b\u548c\u6a21\u62df\u73af\u5883\uff0c\u6a21\u62df\u73b0\u5b9e\u4e16\u754c\u4e2d\u7684\u591a\u4ee3\u7406\u90e8\u7f72\u5728\u53ef\u4fe1\u5e73\u53f0\u4e0a\uff0c\u63a2\u8ba8\u8fd9\u4e00\u5173\u952e\u95ee\u9898\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u4e24\u9636\u6bb5\u653b\u51fb\u65b9\u6cd5\uff0c\u5305\u62ec\u8bf4\u670d\u6027\u6ce8\u5165\u548c\u64cd\u7eb5\u77e5\u8bc6\u6ce8\u5165\uff0c\u6765\u7cfb\u7edf\u5730\u63a2\u7a76\u5728\u65e0\u660e\u786e\u63d0\u793a\u64cd\u7eb5\u7684\u60c5\u51b5\u4e0b\uff0c\u5982\u4f55\u6f5c\u5728\u5730\u4f20\u64ad\u64cd\u7eb5\u77e5\u8bc6\uff08\u5982\u865a\u6784\u548c\u6709\u5bb3\u77e5\u8bc6\uff09\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5229\u7528\u4e86LLMs\u5904\u7406\u4e16\u754c\u77e5\u8bc6\u56fa\u6709\u7684\u6f0f\u6d1e\uff0c\u653b\u51fb\u8005\u53ef\u4ee5\u501f\u6b64\u65e0\u610f\u8bc6\u5730\u4f20\u64ad\u7f16\u9020\u7684\u4fe1\u606f\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u653b\u51fb\u65b9\u6cd5\u80fd\u591f\u6210\u529f\u8bf1\u5bfc\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u5728\u4ea4\u6d41\u4e2d\u4f20\u64ad\u8fd9\u4e24\u79cd\u64cd\u7eb5\u7684\u77e5\u8bc6\uff0c\u540c\u65f6\u4e0d\u4f1a\u663e\u8457\u964d\u4f4e\u5b83\u4eec\u7684\u57fa\u7840\u529f\u80fd\u3002\u6b64\u5916\uff0c\u6211\
u4eec\u53d1\u73b0\u8fd9\u4e9b\u64cd\u7eb5\u4f1a\u6301\u7eed\u5b58\u5728\u4e8e\u6d41\u884c\u7684\u68c0\u7d22\u589e\u5f3a\u751f\u6210\u6846\u67b6\u4e2d\uff0c\u5373\u4f7f\u4ea4\u4e92\u7ed3\u675f\uff0c\u82e5\u5e72\u826f\u6027\u4ee3\u7406\u4e5f\u53ef\u80fd\u7ee7\u7eed\u53d7\u5230\u64cd\u7eb5\u804a\u5929\u8bb0\u5f55\u7684\u5f71\u54cd\u3002\u6211\u4eec\u7684\u53d1\u73b0\u63ed\u793a\u4e86LLM\u591a\u4ee3\u7406\u7cfb\u7edf\u4e2d\u7684\u91cd\u5927\u5b89\u5168\u98ce\u9669\uff0c\u5f3a\u8c03\u4e86\u5bf9\u64cd\u7eb5\u77e5\u8bc6\u4f20\u64ad\u8fdb\u884c\u5f3a\u5927\u9632\u5fa1\u7684\u8feb\u5207\u9700\u6c42\uff0c\u6bd4\u5982\u5f15\u5165\u201c\u5b88\u62a4\u201d\u4ee3\u7406\u548c\u5148\u8fdb\u7684\u4e8b\u5b9e\u6838\u67e5\u5de5\u5177\u3002**|\n", "2407.07778": "|**2024-07-10**|**WorldAPIs: The World Is Worth How Many APIs? A Thought Experiment**|Jiefu Ou et.al.|[2407.07778](http://arxiv.org/abs/2407.07778)|null|\u672c\u6587\u63a2\u8ba8\u4e86\u5728\u7269\u7406\u73af\u5883\u4e2d\u90e8\u7f72\u4eba\u5de5\u667a\u80fd\uff08AI\uff09\u4ee3\u7406\u65f6\u6240\u9700\u7684\u57fa\u672c\u64cd\u4f5c\uff08API\uff09\u6570\u91cf\u548c\u8bbe\u8ba1\u95ee\u9898\u3002\u7814\u7a76\u8005\u8bbe\u60f3\uff0c\u5982\u679cwikiHow\u6559\u7a0b\u6db5\u76d6\u4e86\u5e7f\u6cdb\u7684\u7528\u6237\u81ea\u7f16\u4efb\u52a1\uff0c\u90a3\u4e48\u8fd9\u4e9b\u4efb\u52a1\u6240\u9700\u7684API\u8303\u56f4\u662f\u4ec0\u4e48\u3002\u4ed6\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b9\u6cd5\uff0c\u901a\u8fc7\u5c06wikiHow\u6307\u4ee4\u4e0e\u7f6e\u8eab\u4e8e\u73af\u5883\u4e2d\u7684\u4ee3\u7406\u7b56\u7565\u5173\u8054\uff0c\u8fed\u4ee3\u5730\u751f\u6210\u65b0\u7684API\u3002\u501f\u52a9\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u4f53\u611f\u89c4\u5212\u65b9\u9762\u7684\u6700\u65b0\u6210\u5c31\uff0c\u7814\u7a76\u8005\u63d0\u8bae\u4f7f\u7528\u5c11\u91cf\u6837\u4f8b\u63d0\u793aGPT-4\u751f\u6210Python\u4ee3\u7801\u4f5c\u4e3a\u4ee3\u7406\u7b56\u7565\uff0c\u5e76\u901a\u8fc7\u4ee5\u4e0b\u6b65\u9aa4\u6269\u5c55API\u5e93\uff1a1\uff09\u91cd\u
7528\u521d\u59cbAPI\u96c6\uff1b2\uff09\u5728\u5fc5\u8981\u65f6\u521b\u5efa\u65b0\u7684API\u8c03\u7528\u3002\u5b9e\u9a8c\u5173\u6ce8\u7684\u662f\u5b9a\u4e49API\uff0c\u800c\u975e\u5176\u5b9e\u73b0\u6027\u3002\u5728\u4e00\u5c0f\u90e8\u5206wikiHow\u6559\u7a0b\u4e0a\u5e94\u7528\u8be5\u65b9\u6cd5\u540e\uff0c\u53d1\u73b0\u9700\u8981300\u591a\u4e2aAPI\u6765\u6355\u6349\u73b0\u5b9e\u4e16\u754c\u4e2d\u7684\u591a\u6837\u4efb\u52a1\u3002\u81ea\u52a8\u548c\u4eba\u5de5\u5206\u6790\u663e\u793a\uff0c\u63d0\u51fa\u7684\u7ba1\u9053\u80fd\u6709\u6548\u590d\u7528\u548c\u521b\u9020API\u3002\u8fdb\u4e00\u6b65\u7684\u4eba\u5de5\u5ba1\u67e5\u53d1\u73b0\uff0c\u73b0\u6709\u7684\u6a21\u62df\u5668\u4ec5\u652f\u6301\u8bf1\u5bfc\u51fa\u7684API\u7684\u4e00\u5c0f\u90e8\u5206\uff08\u524d50\u4e2a\u5e38\u7528API\u4e2d\u76849\u4e2a\uff09\uff0c\u8fd9\u4fc3\u4f7f\u5f00\u53d1\u66f4\u4e30\u5bcc\u7684\u4f53\u611f\u73af\u5883\u3002|\n", "2407.08739": "|**2024-07-11**|**MAVIS: Mathematical Visual Instruction Tuning**|Renrui Zhang et.al.|[2407.08739](http://arxiv.org/abs/2407.08739)|**[link](https://github.com/zrrskywalker/mavis)**|**
\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u8fd1\u5e74\u6765\u5728\u5b66\u672f\u754c\u548c\u5de5\u4e1a\u754c\u5f15\u8d77\u4e86\u5e7f\u6cdb\u5173\u6ce8\u3002\u5c3d\u7ba1\u5b83\u4eec\u5728\u591a\u6a21\u6001\u573a\u666f\u4e2d\u7684\u8868\u73b0\u7a81\u51fa\uff0c\u4f46\u5bf9\u6570\u5b66\u56fe\u89e3\u7684\u6570\u5b66\u95ee\u9898\u6c42\u89e3\u80fd\u529b\u7814\u7a76\u5c1a\u663e\u4e0d\u8db3\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u6307\u51fa\u4e86MLLM\u5728\u6570\u5b66\u89c6\u89c9\u9886\u57df\u7684\u4e09\u4e2a\u5173\u952e\u6539\u8fdb\u9886\u57df\uff1a\u6570\u5b66\u56fe\u89e3\u7684\u89c6\u89c9\u7f16\u7801\u3001\u56fe\u89e3\u4e0e\u8bed\u8a00\u7684\u5bf9\u9f50\u4ee5\u53ca\u6570\u5b66\u63a8\u7406\u6280\u80fd\u3002\u8fd9\u4fc3\u4f7f\u6211\u4eec\u9700\u8981\u5927\u89c4\u6a21\u3001\u9ad8\u8d28\u91cf\u7684\u89c6\u89c9\u6570\u5b66\u6570\u636e\u548c\u8bad\u7ec3\u6d41\u7a0b\u3002\u672c\u6587\u63d0\u51faMAVIS\uff08Mathematical VISual instruction tuning for MLLMs\uff09\uff0c\u4e00\u4e2a\u9488\u5bf9MLLM\u7684\u6570\u5b66\u89c6\u89c9\u6307\u5bfc\u8c03\u53c2\u8303\u5f0f\uff0c\u5305\u62ec\u4e00\u7cfb\u5217\u6570\u5b66\u89c6\u89c9\u6570\u636e\u96c6\u548c\u4e13\u95e8\u7684MLLM\u3002
MAVIS\u5206\u4e3a\u4e09\u4e2a\u9636\u6bb5\u8fdb\u884c\u4ece\u5934\u5f00\u59cb\u7684\u8bad\u7ec3\u3002\u9996\u5148\uff0c\u6211\u4eec\u521b\u5efa\u4e86MAVIS-Caption\uff0c\u5305\u542b558,000\u4e2a\u56fe\u89e3-\u63cf\u8ff0\u5bf9\uff0c\u901a\u8fc7\u5bf9\u6bd4\u5b66\u4e60\u6765\u5fae\u8c03\u4e13\u4e3a\u6570\u5b66\u8bbe\u8ba1\u7684\u89c6\u89c9\u7f16\u7801\u5668\uff08CLIP-Math\uff09\uff0c\u4ee5\u63d0\u5347\u56fe\u89e3\u7684\u89c6\u89c9\u7406\u89e3\u80fd\u529b\u3002\u5176\u6b21\uff0c\u5229\u7528MAVIS-Caption\uff0c\u6211\u4eec\u901a\u8fc7\u6295\u5f71\u5c42\u5c06CLIP-Math\u4e0e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u8fdb\u884c\u5173\u8054\uff0c\u589e\u5f3a\u6570\u5b66\u9886\u57df\u7684\u89c6\u89c9\u8bed\u8a00\u5bf9\u9f50\u3002\u6700\u540e\uff0c\u6211\u4eec\u5f15\u5165MAVIS-Instruct\uff0c\u5305\u542b900,000\u4e2a\u7cbe\u5fc3\u6536\u96c6\u548c\u6807\u6ce8\u7684\u89c6\u89c9\u6570\u5b66\u95ee\u9898\uff0c\u7528\u4e8e\u6700\u7ec8\u6307\u5bfc\u8c03\u53c2\uff0c\u4ee5\u589e\u5f3aMLLM\u7684\u7a33\u5065\u6570\u5b66\u63a8\u7406\u80fd\u529b\u3002\u5728MAVIS-Instruct\u4e2d\uff0c\u6211\u4eec\u63d0\u4f9b\u4e86\u6bcf\u4e2a\u95ee\u9898\u7684\u5b8c\u6574\u94fe\u5f0f\u601d\u8003\uff08Chain-of-Thought, CoT\uff09\u7406\u7531\uff0c\u5e76\u51cf\u5c11\u6587\u672c\u5197\u4f59\uff0c\u4f7f\u6a21\u578b\u66f4\u4e13\u6ce8\u4e8e\u89c6\u89c9\u5143\u7d20\u3002\u6570\u636e\u548c\u6a21\u578b\u5df2\u53d1\u5e03\u5728https://github.com/ZrrSkywalker/MAVIS\u3002\u901a\u8fc7MAVIS\uff0c\u6211\u4eec\u65e8\u5728\u586b\u8865\u6570\u5b66\u89c6\u89c9\u7406\u89e3\u7684\u7a7a\u767d\uff0c\u63d0\u5347MLLM\u5728\u89e3\u51b3\u5b9e\u9645\u6570\u5b66\u95ee\u9898\u65f6\u7684\u8868\u73b0\u3002**|\n", "2407.08735": "|**2024-07-11**|**Real-Time Anomaly Detection and Reactive Planning with Large Language Models**|Rohan Sinha 
et.al.|[2407.08735](http://arxiv.org/abs/2407.08735)|null|\u8fd9\u7bc7\u8bba\u6587\u63a2\u8ba8\u4e86\u5982\u4f55\u5229\u7528\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08\u5982\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff09\u5728\u673a\u5668\u4eba\u7cfb\u7edf\u4e2d\u68c0\u6d4b\u548c\u5e94\u5bf9\u5f02\u5e38\u60c5\u51b5\uff0c\u4ee5\u63d0\u9ad8\u5176\u9c81\u68d2\u6027\u548c\u5b89\u5168\u6027\u3002\u4e3b\u8981\u6311\u6218\u5305\u62ec\u51cf\u5c11\u6a21\u578b\u7684\u8ba1\u7b97\u5f00\u9500\u4ee5\u4fbf\u5b9e\u73b0\u5b9e\u65f6\u5e94\u7528\uff0c\u4ee5\u53ca\u5c06\u6a21\u578b\u7684\u5224\u65ad\u878d\u5165\u5230\u5b89\u5168\u63a7\u5236\u6846\u67b6\u4e2d\u3002\u7814\u7a76\u8005\u63d0\u51fa\u4e86\u4e00\u79cd\u4e24\u9636\u6bb5\u63a8\u7406\u6846\u67b6\uff1a\u9996\u5148\u662f\u4e00\u4e2a\u5feb\u901f\u7684\u4e8c\u5143\u5f02\u5e38\u5206\u7c7b\u5668\uff0c\u5b83\u5728\u8bed\u8a00\u6a21\u578b\u5d4c\u5165\u7a7a\u95f4\u4e2d\u5206\u6790\u89c2\u6d4b\u6570\u636e\uff0c\u5982\u679c\u53d1\u73b0\u5f02\u5e38\uff0c\u4f1a\u89e6\u53d1\u540e\u7eed\u7684\u6162\u901f\u63a8\u7406\u9636\u6bb5\uff0c\u5229\u7528\u751f\u6210\u5f0f\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u6df1\u5165\u7684\u903b\u8f91\u63a8\u7406\u3002\u8fd9\u79cd\u8bbe\u8ba1\u7c7b\u4f3c\u4e8e\u6a21\u578b\u9884\u6d4b\u63a7\u5236\u4e2d\u7684\u51b3\u7b56\u5206\u652f\uff0c\u8003\u8651\u5230\u6162\u901f\u63a8\u7406\u5668\u7684\u5ef6\u8fdf\uff0c\u53ef\u4ee5\u7acb\u5373\u91c7\u53d6\u5907\u4efd\u8ba1\u5212\uff0c\u786e\u4fdd\u7cfb\u7edf\u7684\u5b89\u5168\u6027\u3002 
\u901a\u8fc7\u4e0e\u6700\u5148\u8fdb\u7684GPT\u6a21\u578b\u7684\u81ea\u56de\u5f52\u63a8\u7406\u65b9\u6cd5\u8fdb\u884c\u6bd4\u8f83\uff0c\u7814\u7a76\u53d1\u73b0\uff0c\u5373\u4f7f\u4f7f\u7528\u5c0f\u578b\u8bed\u8a00\u6a21\u578b\uff0c\u4ed6\u4eec\u7684\u5feb\u901f\u5f02\u5e38\u5206\u7c7b\u5668\u4e5f\u8868\u73b0\u51fa\u8272\u3002\u8fd9\u4f7f\u5f97\u4ed6\u4eec\u5f00\u53d1\u7684\u8fd0\u884c\u65f6\u76d1\u63a7\u5668\u80fd\u591f\u5728\u8d44\u6e90\u548c\u65f6\u95f4\u9650\u5236\u4e0b\uff0c\u63d0\u5347\u52a8\u6001\u673a\u5668\u4eba\u7cfb\u7edf\uff0c\u5982\u56db\u65cb\u7ffc\u65e0\u4eba\u673a\u6216\u81ea\u52a8\u9a7e\u9a76\u8f66\u8f86\u7684\u4fe1\u4efb\u5ea6\u3002\u8bba\u6587\u7684\u89c6\u9891\u793a\u4f8b\u53ef\u4ee5\u5728\u9879\u76ee\u9875\u9762\u4e0a\u67e5\u770b\uff1ahttps://sites.google.com/view/aesop-llm\u3002|\n", "2407.08733": "|**2024-07-11**|**Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist**|Zihao Zhou et.al.|[2407.08733](http://arxiv.org/abs/2407.08733)|null|
\u5f3a\u5927\u7684\u6570\u5b66\u63a8\u7406\u80fd\u529b\u662f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5353\u8d8a\u6027\u80fd\u7684\u5173\u952e\u4f53\u73b0\u3002\u5982\u4f55\u5b9a\u4e49\u548c\u5168\u9762\u8bc4\u4f30LLMs\u7684\u6570\u5b66\u80fd\u529b\uff0c\u4ee5\u53ca\u5728\u5b9e\u9645\u5e94\u7528\u4e2d\u53cd\u6620\u7528\u6237\u4f53\u9a8c\uff0c\u5df2\u6210\u4e3a\u5173\u952e\u95ee\u9898\u3002\u76ee\u524d\u7684\u57fa\u51c6\u6d4b\u8bd5\u4e3b\u8981\u4fa7\u91cd\u4e8e\u95ee\u9898\u89e3\u51b3\u80fd\u529b\uff0c\u8fd9\u53ef\u80fd\u5bfc\u81f4\u6a21\u578b\u8fc7\u62df\u5408\uff0c\u5e76\u65e0\u6cd5\u51c6\u786e\u53cd\u6620\u771f\u6b63\u7684\u6570\u5b66\u63a8\u7406\u80fd\u529b\u3002\u6211\u4eec\u8ba4\u4e3a\uff0c\u5982\u679c\u6a21\u578b\u771f\u6b63\u7406\u89e3\u4e86\u95ee\u9898\uff0c\u5b83\u5e94\u8be5\u80fd\u5728\u5404\u79cd\u4efb\u52a1\u4e2d\u7a33\u5065\u4e14\u7075\u6d3b\u5730\u5e94\u7528\u3002\u5728\u6b64\u542f\u53d1\u4e0b\uff0c\u6211\u4eec\u63d0\u51faMATHCHECK\uff0c\u4e00\u4e2a\u65e8\u5728\u6d4b\u8bd5\u4efb\u52a1\u6cdb\u5316\u548c\u63a8\u7406\u9c81\u68d2\u6027\u7684\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u6e05\u5355\uff0c\u4ee5\u53ca\u4e00\u4e2a\u81ea\u52a8\u751f\u6210\u6e05\u5355\u7684\u5de5\u5177\u3002MATHCHECK\u5305\u542b\u591a\u4e2a\u6570\u5b66\u63a8\u7406\u4efb\u52a1\u548c\u6d4b\u8bd5\u7c7b\u578b\uff0c\u4ee5\u4fc3\u8fdb\u5bf9\u6570\u5b66\u63a8\u7406\u80fd\u529b\u548c\u884c\u4e3a\u6d4b\u8bd5\u7684\u5168\u9762\u8bc4\u4f30\u3002\u6211\u4eec\u5229\u7528MATHCHECK\u521b\u5efa\u4e86MATHCHECK-GSM\u548cMATHCHECK-GEO\uff0c\u5206\u522b\u9488\u5bf9\u6570\u5b66\u6587\u672c\u63a8\u7406\u548c\u591a\u6a21\u6001\u63a8\u7406\u80fd\u529b\u8fdb\u884c\u8bc4\u4f30\uff0c\u5b83\u4eec\u662fGSM8k\u3001GeoQA\u3001UniGeo\u548cGeometry3K\u7b49\u57fa\u51c6\u7684\u5347\u7ea7\u7248\u3002\u6211\u4eec\u4f7f\u7528MATHCHECK-GSM\u548cMATHCHECK-GEO\u5bf9\u8d85\u8fc720\u79cdLLM\u548c11\u79cd\u591a\u6a21\u6001LLMs\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u4ee5\u68c0\u9a8c\u5b83\u4eec\u7684\u7efc\u5408\u6570\u5b66\u63
a8\u7406\u80fd\u529b\u3002\u7ed3\u679c\u663e\u793a\uff0c\u5c3d\u7ba1\u524d\u6cbf\u6a21\u578b\u5982GPT-4\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5176\u4ed6\u6a21\u578b\u5bb6\u65cf\u5728\u6e05\u5355\u4e0a\u7684\u8868\u73b0\u663e\u8457\u4e0b\u964d\u3002\u8fdb\u4e00\u6b65\u5b9e\u9a8c\u8868\u660e\uff0c\u4e0e\u4f20\u7edf\u6570\u5b66\u57fa\u51c6\u76f8\u6bd4\uff0cMATHCHECK\u66f4\u597d\u5730\u53cd\u6620\u4e86\u771f\u6b63\u7684\u6570\u5b66\u80fd\u529b\uff0c\u7ebf\u6027\u5ea6\u66f4\u9ad8\uff0c\u4ece\u800c\u652f\u6301\u6211\u4eec\u7684\u8bbe\u8ba1\u3002\u901a\u8fc7MATHCHECK\uff0c\u6211\u4eec\u53ef\u4ee5\u8f7b\u677e\u8fdb\u884c\u8be6\u7ec6\u7684\u884c\u4e3a\u5206\u6790\uff0c\u6df1\u5165\u63a2\u7a76\u6a21\u578b\u3002|\n", "2407.08716": "|**2024-07-11**|**A Taxonomy for Data Contamination in Large Language Models**|Medha Palavalli et.al.|[2407.08716](http://arxiv.org/abs/2407.08716)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u57fa\u4e8e\u5e7f\u6cdb\u7f51\u7edc\u8bed\u6599\u5e93\u7684\u9884\u8bad\u7ec3\u540e\uff0c\u5728\u4f17\u591a\u4e0b\u6e38\u4efb\u52a1\u4e0a\u5c55\u73b0\u51fa\u5353\u8d8a\u6027\u80fd\u3002\u7136\u800c\uff0c\u6570\u636e\u6c61\u67d3\u95ee\u9898\u65e5\u76ca\u5f15\u8d77\u5173\u6ce8\uff0c\u5373\u8bc4\u4f30\u6570\u636e\u53ef\u80fd\u5b58\u5728\u4e8e\u9884\u8bad\u7ec3\u6570\u636e\u4e2d\uff0c\u5bfc\u81f4\u6a21\u578b\u8868\u73b0\u865a\u9ad8\u3002\u53bb\u6c61\u67d3\uff08decontamination\uff09\u4f5c\u4e3a\u4e00\u79cd\u53ef\u80fd\u7684\u89e3\u51b3\u65b9\u6848\uff0c\u8bd5\u56fe\u68c0\u6d4b\u5e76\u79fb\u9664\u8fd9\u4e9b\u6c61\u67d3\u6570\u636e\u3002\u7136\u800c\uff0c\u6c61\u67d3\u6570\u636e\u53ef\u80fd\u6e90\u4e8e\u6d4b\u8bd5\u96c6\u7684\u4fee\u6539\u7248\u672c\uff0c\u8fd9\u4f7f\u5f97\u68c0\u6d4b\u53d8\u5f97\u56f0\u96be\u3002\u76ee\u524d\u5c1a\u4e0d\u6e05\u695a\u4e0d\u540c\u7c7b\u578b\u7684\u6c61\u67d3\u5982\u4f55\u5f71\u54cd\u8bed\u8a00\u6a21\u578b\u5728\u4e0b\u6e38\u4efb\u52a1\u4e2d\u7684\u6027\u80fd\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u5206\u7c7b\u4f53\u7cfb\uff
0c\u5bf9\u8bed\u8a00\u6a21\u578b\u5728\u9884\u8bad\u7ec3\u9636\u6bb5\u9047\u5230\u7684\u5404\u79cd\u6c61\u67d3\u7c7b\u578b\u8fdb\u884c\u5212\u5206\uff0c\u5e76\u786e\u5b9a\u4e86\u54ea\u4e9b\u7c7b\u578b\u7684\u98ce\u9669\u6700\u9ad8\u3002\u6211\u4eec\u901a\u8fc7\u5206\u6790\u603b\u7ed3\u548c\u95ee\u7b54\u4e24\u4e2a\u5173\u952e\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\uff0c\u63ed\u793a\u4e86\u4e0d\u540c\u7c7b\u578b\u6c61\u67d3\u5982\u4f55\u5f71\u54cd\u6a21\u578b\u5728\u5b9e\u9645\u8bc4\u4f30\u4e2d\u7684\u8868\u73b0\u3002|\n", "2407.08713": "|**2024-07-11**|**GTA: A Benchmark for General Tool Agents**|Jize Wang et.al.|[2407.08713](http://arxiv.org/abs/2407.08713)|**[link](https://github.com/open-compass/GTA)**|**\u4eba\u4eec\u666e\u904d\u5173\u6ce8\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e0e\u5404\u79cd\u5de5\u5177\u7684\u6574\u5408\uff0c\u4ee5\u5f00\u53d1\u901a\u7528\u4ee3\u7406\uff0c\u4f46\u8fd9\u5bf9LLMs\u7684\u5de5\u5177\u4f7f\u7528\u80fd\u529b\u63d0\u51fa\u4e86\u6311\u6218\u3002\u5f53\u524d\u7684\u8bc4\u4f30\u65b9\u6cd5\u5b58\u5728\u660e\u663e\u7f3a\u9677\uff0c\u5982\u4f7f\u7528AI\u751f\u6210\u7684\u67e5\u8be2\u3001\u5355\u6b65\u9aa4\u4efb\u52a1\u3001\u6a21\u62df\u5de5\u5177\u4ee5\u53ca\u4ec5\u9650\u6587\u672c\u7684\u4ea4\u4e92\uff0c\u672a\u80fd\u5145\u5206\u5c55\u793a\u8fd9\u4e9b\u6a21\u578b\u5728\u5b9e\u9645\u95ee\u9898\u89e3\u51b3\u4e2d\u7684\u80fd\u529b\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u63d0\u51faGTA\uff08\u901a\u7528\u5de5\u5177\u4ee3\u7406\u57fa\u51c6\uff09\uff0c\u5b83\u5305\u542b\u4e09\u4e2a\u5173\u952e\u7279\u6027\uff1a\uff081\uff09\u771f\u5b9e\u7684\u7528\u6237\u67e5\u8be2\uff1a\u7531\u4eba\u7c7b\u7f16\u5199\uff0c\u5177\u6709\u7b80\u5355\u7684\u73b0\u5b9e\u4e16\u754c\u76ee\u6807\uff0c\u4f46\u9690\u542b\u4e86\u5de5\u5177\u4f7f\u7528\u9700\u6c42\uff0c\u8981\u6c42LLMs\u80fd\u63a8\u7406\u51fa\u5408\u9002\u7684\u5de5\u5177\u5e76\u89c4\u5212\u89e3\u51b3\u65b9\u6848\u6b65\u9aa4\u3002\uff082\uff09\u771f\u5b9e\u90e8\u7f72\u7684\u5de5\u5177\u
ff1a\u4e00\u4e2a\u914d\u5907\u6709\u611f\u77e5\u3001\u64cd\u4f5c\u3001\u903b\u8f91\u548c\u521b\u65b0\u7c7b\u5de5\u5177\u7684\u8bc4\u4f30\u5e73\u53f0\uff0c\u7528\u4e8e\u8bc4\u4f30\u6a21\u578b\u7684\u5b9e\u9645\u4efb\u52a1\u6267\u884c\u6027\u80fd\u3002\uff083\uff09\u771f\u5b9e\u7684\u591a\u6a21\u6001\u8f93\u5165\uff1a\u5305\u62ec\u7a7a\u95f4\u573a\u666f\u56fe\u7247\u3001\u7f51\u9875\u622a\u56fe\u3001\u8868\u683c\u3001\u4ee3\u7801\u7247\u6bb5\u548c\u6253\u5370/\u624b\u5199\u6750\u6599\u7b49\uff0c\u4ee5\u8d34\u8fd1\u771f\u5b9e\u4e16\u754c\u7684\u573a\u666f\u3002 \u6211\u4eec\u8bbe\u8ba1\u4e86229\u4e2a\u73b0\u5b9e\u751f\u6d3b\u4efb\u52a1\u548c\u53ef\u6267\u884c\u7684\u5de5\u5177\u94fe\uff0c\u6765\u8bc4\u4f30\u4e3b\u6d41LLMs\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u5bf9\u4e8e\u771f\u5b9e\u7684\u7528\u6237\u67e5\u8be2\uff0c\u73b0\u6709\u7684LLMs\u9762\u4e34\u4e25\u5cfb\u6311\u6218\uff0cGPT-4\u5b8c\u6210\u7684\u4efb\u52a1\u4e0d\u8db3\u4e00\u534a\uff0c\u5927\u591a\u6570\u6a21\u578b\u7684\u6210\u7ee9\u4f4e\u4e8e25%\u3002\u8fd9\u4e2a\u8bc4\u4f30\u63ed\u793a\u4e86\u5f53\u524dLLMs\u5728\u5b9e\u9645\u5de5\u5177\u4f7f\u7528\u80fd\u529b\u4e0a\u7684\u74f6\u9888\uff0c\u4e3a\u63d0\u5347\u901a\u7528\u5de5\u5177\u4ee3\u7406\u7684\u7814\u7a76\u63d0\u4f9b\u4e86\u65b9\u5411\u3002GTA\u7684\u76f8\u5173\u4ee3\u7801\u548c\u6570\u636e\u96c6\u5df2\u53ef\u5728\u83b7\u53d6\u3002**|\n", "2407.08701": "|**2024-07-11**|**Live2Diff: Live Stream Translation via Uni-directional Attention in Video Diffusion Models**|Zhening Xing 
et.al.|[2407.08701](http://arxiv.org/abs/2407.08701)|null|Large language models are remarkably effective at generating streaming data such as text and audio, thanks to their temporally uni-directional attention mechanism. Streaming video processing, however, remains comparatively under-explored despite growing demand for live video processing. Existing video diffusion models rely on bi-directional temporal attention, which limits their ability to process live video. To address this, we propose Live2Diff, the first video diffusion model with uni-directional temporal attention designed specifically for live video translation. Unlike prior work, our approach preserves temporal consistency and smoothness by correlating the current frame with its predecessor and a few warmup frames, without considering any future frames. We also adopt an efficient denoising scheme, including a KV-cache mechanism and pipelining, to support streaming video translation at interactive frame rates. Extensive experiments show that our attention mechanism and pipeline design significantly outperform previous methods in temporal smoothness and/or efficiency.|\n", "2407.08699": "|**2024-07-11**|**Mitigating Catastrophic Forgetting in Language Transfer via Model Merging**|Anton Alexandrov 
et.al.|[2407.08699](http://arxiv.org/abs/2407.08699)|null|As open large language models (LLMs) achieve ever stronger performance on English tasks, researchers are working to extend them to other languages. However, such language adaptation often causes catastrophic forgetting of the base model's capabilities, which limits the usefulness of the adapted model. To address this, we propose Branch-and-Merge (BaM), a new adaptation method based on iteratively merging multiple models, each fine-tuned on a subset of the training data. The core idea of BaM is that this yields weight changes of smaller magnitude but higher quality, reducing forgetting of the source domain while maintaining learning on the target domain. In an extensive empirical study on Bulgarian and German, we show that BaM significantly reduces forgetting while matching or even improving target-domain performance compared with standard continued pretraining and instruction fine-tuning, across different model architectures.|\n", "2407.08694": "|**2024-07-11**|**Cloud Atlas: Efficient Fault Localization for Cloud Systems using Language Models and Causal Insight**|Zhiqiang Xie 
et.al.|[2407.08694](http://arxiv.org/abs/2407.08694)|null|Runtime failures and performance degradation are commonplace in modern cloud systems. For cloud providers, automatically determining the root cause of an incident is essential to high reliability and availability, because fast fault localization speeds up diagnosis and triage and so enables timely resolution. Recent work identifies causal inference as a promising approach, using a causal graph to capture the relationships between the performance metrics of a cloud system. However, system developers must define the causal graph precisely, which is time-consuming, brittle, and challenging, especially for large and dynamic systems, and requires deep domain expertise. Data-driven approaches have limited effectiveness in cloud systems because fault events are relatively rare. In this work, we propose Atlas, a novel approach that automatically synthesizes causal graphs for cloud systems. Atlas uses large language models (LLMs) together with system documentation, logs, and deployment feedback to generate causal graphs. Atlas is complementary to data-driven causal discovery techniques and is further enhanced with a data-driven validation step. We evaluate Atlas across a range of fault localization scenarios and show that it generates causal graphs in a scalable and generalizable way, with performance far exceeding data-driven algorithms and on par with the baseline.|\n", "2407.08683": "|**2024-07-11**|**SEED-Story: Multimodal Long Story Generation with Large Language Model**|Shuai Yang 
et.al.|[2407.08683](http://arxiv.org/abs/2407.08683)|**[link](https://github.com/tencentarc/seed-story)**|**With remarkable advances in image generation and open-form text generation, the creation of interleaved image-text content has become an increasingly attractive field. Multimodal story generation, which produces interleaved sequences of narrative text and vivid images, has drawn attention as a valuable and practical task with broad applications. However, the task is challenging: it requires understanding the complex interplay between text and images and generating coherent, relevant text and visual content. In this work, we propose SEED-Story, a novel method that leverages a powerful multimodal large language model (MLLM) to generate extended multimodal stories. Building on the strong comprehension ability of the MLLM, our model predicts both text tokens and visual tokens; the visual tokens are then processed by an adapted visual de-tokenizer to produce images with consistent characters and styles. We further introduce a multimodal attention sink mechanism that enables autoregressive generation of stories with up to 25 sequences, although only 10 are used for training. In addition, we present StoryStream, a large-scale, high-resolution dataset for training our model and for quantitatively evaluating multimodal story generation along multiple aspects.**|\n", "2407.08662": "|**2024-07-11**|**Uncertainty Estimation of Large Language Models in Medical Question Answering**|Jiaxin Wu et.al.|[2407.08662](http://arxiv.org/abs/2407.08662)|null|Large language models (LLMs) show promise for natural language generation in healthcare but risk producing factually incorrect information. Deploying them for medical question answering requires reliable uncertainty estimation (UE) methods to detect hallucinations. In this work, we benchmark popular UE methods across different model sizes on medical question-answering datasets. The results show that current methods generally perform poorly in this domain, highlighting the challenge of UE for medical applications. We also observe that larger models tend to yield better results, suggesting a possible link between model size and UE reliability. To address these challenges, we propose Two-phase Verification, a probability-free uncertainty estimation method. First, the LLM generates a step-by-step explanation together with an initial answer, and then formulates verification questions to check the factual claims in the explanation. The model answers these questions twice: once independently and once with reference to the explanation. The inconsistency between the two sets of answers measures the uncertainty of the original response. We evaluate our approach on three biomedical question-answering datasets using Llama 2 Chat models and compare it against baseline methods. The results show that Two-phase Verification achieves the best overall accuracy and stability across datasets and model sizes, and that its performance improves as model size increases.|\n", "2407.09467": "|**2024-07-12**|**FairyLandAI: Personalized Fairy Tales utilizing ChatGPT and DALLE-3**|Georgios Makridis 
et.al.|[2407.09467](http://arxiv.org/abs/2407.09467)|null|In a world of diverse AI-driven storytelling, there is a unique opportunity to engage young audiences through customized, personalized narratives. This paper introduces FairyLandAI, an innovative large language model (LLM) developed for children and built on OpenAI's API. Its distinguishing feature is that FairyLandAI not only generates engaging, age-appropriate stories that reflect various traditions, but also automatically produces creative prompts suited to advanced image generation tools such as GenAI and Dalle-3, enriching the storytelling experience. FairyLandAI is tailored to the imaginative world of children, offering stories that are both educational and entertaining and that align with the values appropriate to different ages. Its unique strength lies in customizing stories to an individual child's preferences and cultural background, marking a new era of personalized storytelling. In addition, its integration with image generation technology provides a comprehensive narrative experience that stimulates both verbal and visual creativity. Empirical evaluations show that FairyLandAI is effective at crafting stories that captivate children, stories that not only entertain but also embody the moral teachings of diverse traditions. The model is a valuable tool for parents and educators, helping them convey meaningful life lessons through engaging narratives. FairyLandAI represents a pioneering step in using LLMs, particularly through the OpenAI API, for educational and cultural enrichment, making complex moral stories accessible and enjoyable for young, imaginative minds.|\n", "2407.09450": "|**2024-07-12**|**Human-like Episodic Memory for Infinite Context LLMs**|Zafeirios Fountas et.al.|[2407.09450](http://arxiv.org/abs/2407.09450)|null|Large language models (LLMs) have shown remarkable capabilities, but they still struggle to maintain coherence and accuracy over long sequences. The human brain, by contrast, excels at organizing and retrieving episodic experiences across vast time scales, spanning a lifetime. This paper introduces EM-LLM, a novel approach that integrates key aspects of human episodic memory and event cognition into LLMs, enabling them to handle practically infinite context lengths while remaining computationally efficient. EM-LLM organizes token sequences into coherent events online, using a combination of Bayesian surprise and graph-theoretic boundary refinement. When needed, these events are retrieved through a two-stage memory process that combines similarity-based and temporally contiguous retrieval, giving efficient, human-like access to information. Experiments on the LongBench dataset show that EM-LLM outperforms the state-of-the-art InfLLM model, with an overall relative improvement of 4.3% across tasks, including a 33% improvement on the PassageRetrieval task. Our analysis further reveals strong correlations between EM-LLM's event segmentation and human-perceived events, suggesting a bridge between this artificial system and its biological counterpart. This work not only advances the ability of LLMs to process long sequences but also provides a computational framework for exploring human memory mechanisms, opening new avenues for interdisciplinary research in AI and cognitive science.|\n", "2407.09447": "|**2024-07-12**|**ASTPrompter: Weakly Supervised Automated Language Model Red-Teaming to Identify Likely Toxic Prompts**|Amelia F. 
Hardy et.al.|[2407.09447](http://arxiv.org/abs/2407.09447)|**[link](https://github.com/sisl/astprompter)**|Typical schemes for automated red-teaming of large language models (LLMs) focus on discovering prompts that trigger a frozen language model (the defender) to generate toxic text. This can lead the prompting model (the adversary) to produce unintelligible, unnatural output. Here, we propose a reinforcement learning formulation of the LLM red-teaming task that discovers prompts that both (1) trigger toxic outputs from the defender and (2) have low perplexity as scored by the defender. We argue that these cases are the most pertinent in a red-teaming setting because they are likely to arise during normal use of the defender model. We solve this formulation with a novel online and weakly supervised variant of Identity Preference Optimization (IPO), applied to GPT-2 and GPT-2 XL defenders. Experiments show that our policy generates prompts that are both likely and toxicity-triggering. Finally, we analyze the learned strategies and the trade-off between likelihood and toxicity, and discuss the implications. Source code for the project is available at https://github.com/sisl/ASTPrompter/.|\n", "2407.09435": "|**2024-07-12**|**MUSCLE: A Model Update Strategy for Compatible LLM Evolution**|Jessica Echterhoff et.al.|[2407.09435](http://arxiv.org/abs/2407.09435)|null|Large language models (LLMs) are updated frequently, with changes to data or architecture intended to improve performance. When updating models, developers usually focus on raising overall performance metrics and pay less attention to compatibility with previous versions. Users, however, often build a mental model of the functionality and capabilities of the machine learning model they interact with, and they must adapt that mental model with every update. Frequent model changes can therefore reduce user satisfaction. In practice, fine-tuned downstream task adapters rely on pretrained LLM base models. When the base model is updated, these user-facing downstream models can suffer instance regression, or negative flips, in which previously correct instances are now predicted incorrectly. This happens even when the downstream training procedure remains unchanged. Our work aims to give users a seamless model-update experience in two ways. First, we propose evaluation metrics for compatibility with prior model versions, designed for generative tasks but also applicable to classification tasks. We observe regression and inconsistencies across model versions and updates on a diverse set of tasks. Second, we propose a training strategy that reduces inconsistencies across model updates by training a compatibility model, lowering the rate of negative flips by up to 40% when updating from Llama 1 to Llama 2. Users can then adapt to a new version more easily, without repeatedly readjusting their expectations and usage.|\n", "2407.09429": "|**2024-07-12**|**Open (Clinical) LLMs are Sensitive to Instruction Phrasings**|Alberto Mario Ceballos Arroyo 
et.al.|[2407.09429](http://arxiv.org/abs/2407.09429)|**[link](https://github.com/alceballosa/clin-robust)**|Instruction-tuned large language models (LLMs) can perform a wide range of tasks given natural language instructions, but they are sensitive to how those instructions are phrased. This is especially concerning in healthcare, because clinicians are unlikely to be experienced prompt engineers and the potential consequences of inaccurate outputs are more serious. This raises a practical question: how robust are instruction-tuned LLMs to natural (i.e., non-adversarial) variations in instruction phrasing for clinical NLP tasks? We collect prompts from medical doctors across a range of tasks and measure the sensitivity of seven LLMs, some general and some specialized, to subtle differences in phrasing. We find that performance varies substantially across all models and that, perhaps surprisingly, models trained specifically on clinical data are less stable than their general-domain counterparts. Moreover, arbitrary phrasing differences can affect fairness: valid but distinct instructions for mortality prediction yield variation both in overall performance and in differences between demographic groups.|\n", "2407.09424": "|**2024-07-12**|**TelecomGPT: A Framework to Build Telecom-Specfic Large Language Models**|Hang Zou et.al.|[2407.09424](http://arxiv.org/abs/2407.09424)|null|This paper first proposes a pipeline for adapting general-purpose large language models (LLMs) into telecom-specific models. To this end, we collect and build telecom-specific pre-training, instruction, and preference datasets, used for continual pre-training, instruction tuning, and alignment tuning, respectively. Because the telecom domain lacks widely accepted evaluation benchmarks, we extend existing ones and propose three new benchmarks: Telecom Math Modeling, Telecom Open QnA (TeleQnA), and Telecom Code Tasks. These new benchmarks provide a comprehensive evaluation of LLM capabilities in the telecom domain, including math modeling, open-ended question answering, code generation, infilling, summarization, and analysis. Our fine-tuned model, TelecomGPT, significantly outperforms state-of-the-art LLMs including GPT-4, Llama-3, and Mistral on the Telecom Math Modeling benchmark, and achieves comparable performance on TeleQnA, 3GPP technical document classification, telecom code summarization and generation, and infilling.|\n", "2407.09417": "|**2024-07-12**|**Mitigating Entity-Level Hallucination in Large Language 
Models**|Weihang Su et.al.|[2407.09417](http://arxiv.org/abs/2407.09417)|**[link](https://github.com/oneal2000/entityhallucination)**|**The rise of large language models (LLMs) has changed how users access information, shifting from traditional search engines to direct question-answering interactions with LLMs. However, the widespread adoption of LLMs has exposed a significant challenge known as hallucination, in which a model generates coherent but factually incorrect responses, leading users to distrust LLM-based information retrieval systems. To tackle this problem, this paper proposes Dynamic Retrieval Augmentation based on hallucination Detection (DRAD), a novel method that improves on traditional retrieval augmentation by dynamically adapting the retrieval process based on real-time hallucination detection. DRAD has two main components: Real-time Hallucination Detection (RHD), which identifies potential hallucinations without relying on external models, and Self-correction based on External Knowledge (SEK), which corrects these errors using external knowledge. Experimental results show that DRAD performs well at both detecting and mitigating hallucinations in LLMs. All code and data are open-sourced at https://github.com/oneal2000/EntityHallucination.**|\n", "2407.09413": "|**2024-07-12**|**SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers**|Shraman Pramanick et.al.|[2407.09413](http://arxiv.org/abs/2407.09413)|**[link](https://github.com/google/spiqa)**|**Finding information quickly is essential when reading scientific papers in depth. However, existing paper-based question answering (QA) datasets are limited in scale and content and focus mainly on text. To address this gap, we introduce SPIQA (Scientific Paper Image Question Answering), a large-scale QA dataset designed specifically for interpreting complex figures, tables, and result visualizations across areas of computer science. Leveraging the strong comprehension ability of multimodal large language models (MLLMs), we build the dataset through a combination of automatic and manual curation. SPIQA contains 270K questions divided into training, validation, and three different evaluation splits. Through extensive experiments with 12 foundation models, we assess how well current multimodal systems comprehend the nuanced aspects of research articles. We also propose a Chain-of-Thought (CoT) evaluation strategy with in-context retrieval that allows fine-grained, step-by-step assessment and helps improve model performance. Finally, we explore the upper bound of the performance gains obtainable from additional textual information, highlighting its potential for future research and the dataset's potential to change how we interact with scientific literature.**|\n", "2407.09394": "|**2024-07-12**|**PersonaRAG: Enhancing Retrieval-Augmented Generation Systems with User-Centric Agents**|Saber Zerhoudi et.al.|[2407.09394](http://arxiv.org/abs/2407.09394)|**[link](https://github.com/padas-lab-de/PersonaRAG)**|Large language models (LLMs) struggle to produce reliable output because of outdated knowledge and hallucinations. Retrieval-Augmented Generation (RAG) models address this by augmenting LLMs with external knowledge, but they often fail to personalize the retrieval process. This paper introduces PersonaRAG, a novel framework that incorporates user-centric agents to adapt retrieval and generation based on real-time user data and interactions. Evaluated on several question-answering datasets, PersonaRAG shows clear advantages over baseline models and better meets users' personalized needs. The results suggest promising directions for user-adapted information retrieval systems.|\n", "2407.09388": "|**2024-07-12**|**GAVEL: Generating Games Via Evolution and Language Models**|Graham Todd 
et.al.|[2407.09388](http://arxiv.org/abs/2407.09388)|null|\u81ea\u52a8\u521b\u5efa\u65b0\u9896\u6709\u8da3\u7684\u6e38\u620f\u662f\u4e00\u4e2a\u590d\u6742\u4efb\u52a1\uff0c\u5b83\u6d89\u53ca\u5982\u4f55\u4ee5\u8ba1\u7b97\u673a\u53ef\u5904\u7406\u7684\u5f62\u5f0f\u8868\u8fbe\u6e38\u620f\u89c4\u5219\u3001\u641c\u7d22\u5e9e\u5927\u7684\u6f5c\u5728\u6e38\u620f\u7a7a\u95f4\uff0c\u4ee5\u53ca\u51c6\u786e\u8bc4\u4f30\u672a\u89c1\u8fc7\u6e38\u620f\u7684\u539f\u521b\u6027\u548c\u8d28\u91cf\u3002\u5148\u524d\u7684\u7814\u7a76\u4e3b\u8981\u5173\u6ce8\u4e8e\u6709\u9650\u7684\u89c4\u5219\u8868\u793a\uff0c\u5e76\u4f9d\u8d56\u4e8e\u7279\u5b9a\u9886\u57df\u7684\u542f\u53d1\u5f0f\u65b9\u6cd5\u3002\u5728\u8fd9\u4e2a\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u4e13\u6ce8\u4e8e\u5728Ludii\u6e38\u620f\u63cf\u8ff0\u8bed\u8a00\u4e2d\u751f\u6210\u65b0\u5947\u7684\u6e38\u620f\uff0c\u8be5\u8bed\u8a00\u7f16\u7801\u4e86\u5404\u79cd\u98ce\u683c\u548c\u73a9\u6cd5\u76841000\u591a\u6b3e\u68cb\u76d8\u6e38\u620f\u89c4\u5219\u3002\u6211\u4eec\u501f\u9274\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\u548c\u8fdb\u5316\u8ba1\u7b97\u7684\u6700\u65b0\u8fdb\u5c55\uff0c\u8bad\u7ec3\u4e86\u4e00\u4e2a\u80fd\u591f\u667a\u80fd\u5730\u53d8\u5f02\u548c\u91cd\u7ec4\u4ee5\u4ee3\u7801\u5f62\u5f0f\u8868\u8fbe\u7684\u6e38\u620f\u673a\u5236\u7684\u6a21\u578b\u3002\u6211\u4eec\u901a\u8fc7\u5b9a\u91cf\u548c\u5b9a\u6027\u5206\u6790\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u80fd\u591f\u521b\u9020\u51fa\u65b0\u7684\u3001\u6709\u5438\u5f15\u529b\u7684\u6e38\u620f\uff0c\u5305\u62ec\u90a3\u4e9b\u73b0\u6709Ludii\u6570\u636e\u96c6\u4e2d\u672a\u8986\u76d6\u7684\u6e38\u620f\u533a\u57df\u3002\u751f\u6210\u7684\u4e00\u4e9b\u6e38\u620f\u793a\u4f8b\u53ef\u901a\u8fc7Ludii\u95e8\u6237\u5728\u7ebf\u4f53\u9a8c\u3002|\n", "2407.10972": "|**2024-07-15**|**VGBench: Evaluating Large Language Models on Vector Graphics Understanding and Generation**|Bocheng Zou 
et.al.|[2407.10972](http://arxiv.org/abs/2407.10972)|**[link](https://github.com/vgbench/VGBench)**|**\u5728\u89c6\u89c9\u6a21\u578b\u9886\u57df\uff0c\u4e3b\u8981\u7684\u8868\u793a\u65b9\u5f0f\u662f\u4f7f\u7528\u50cf\u7d20\u6765\u7ed8\u5236\u89c6\u89c9\u4e16\u754c\u3002\u7136\u800c\uff0c\u8fd9\u5e76\u975e\u603b\u662f\u6700\u4f73\u6216\u552f\u4e00\u7684\u8868\u793a\u89c6\u89c9\u5185\u5bb9\u7684\u65b9\u6cd5\uff0c\u7279\u522b\u662f\u5bf9\u4e8e\u8bbe\u8ba1\u5e08\u548c\u827a\u672f\u5bb6\uff0c\u4ed6\u4eec\u5e38\u7528\u591a\u8fb9\u5f62\u7b49\u51e0\u4f55\u5f62\u72b6\u6765\u6784\u5efa\u56fe\u5f62\u3002\u77e2\u91cf\u56fe\u5f62\uff08VG\uff09\u63d0\u4f9b\u4e86\u4e00\u79cd\u6587\u672c\u5f62\u5f0f\u7684\u89c6\u89c9\u5185\u5bb9\u8868\u793a\uff0c\u5bf9\u4e8e\u5361\u901a\u6216\u7d20\u63cf\u7b49\u7c7b\u578b\u7684\u5185\u5bb9\u53ef\u80fd\u66f4\u4e3a\u7cbe\u70bc\u548c\u5f3a\u5927\u3002\u8fd1\u671f\u7684\u7814\u7a76\u8868\u660e\uff0c\u5f3a\u5927\u7684\u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5904\u7406\u77e2\u91cf\u56fe\u5f62\u65b9\u9762\u5c55\u73b0\u51fa\u4ee4\u4eba\u9f13\u821e\u7684\u7ed3\u679c\u3002\u4f46\u8fd9\u4e9b\u5de5\u4f5c\u4e3b\u8981\u4fa7\u91cd\u4e8e\u5b9a\u6027\u5206\u6790\u3001\u7406\u89e3\u6216\u7279\u5b9a\u7c7b\u578b\u7684\u77e2\u91cf\u56fe\u5f62\u3002\u6211\u4eec\u63d0\u51faVGBench\uff0c\u8fd9\u662f\u4e00\u4e2a\u5168\u9762\u7684\u57fa\u51c6\uff0c\u7528\u4e8e\u8bc4\u4f30LLMs\u5728\u5904\u7406\u77e2\u91cf\u56fe\u5f62\u65b9\u9762\u7684\u6027\u80fd\uff0c\u5305\u62ec\uff1a(a) \u5bf9\u89c6\u89c9\u7406\u89e3\u548c\u751f\u6210\u7684\u53cc\u91cd\u5173\u6ce8\uff0c(b) \u591a\u79cd\u77e2\u91cf\u56fe\u5f62\u683c\u5f0f\u7684\u8bc4\u4f30\uff0c(c) \u4e0d\u540c\u7c7b\u578b\u7684\u63d0\u95ee\uff0c(d) \u5e7f\u6cdb\u7684\u63d0\u793a\u6280\u5de7\uff0c\u4ee5\u53ca(e) 
\u5728\u591a\u79cdLLMs\u4e0b\u7684\u8868\u73b0\u3002\u901a\u8fc7\u5bf9\u6536\u96c6\u76844279\u4e2a\u7406\u89e3\u6837\u672c\u548c5845\u4e2a\u751f\u6210\u6837\u672c\u8fdb\u884c\u8bc4\u4f30\uff0c\u6211\u4eec\u53d1\u73b0LLMs\u5728\u8fd9\u4e24\u4e2a\u65b9\u9762\u90fd\u8868\u73b0\u51fa\u5f3a\u5927\u80fd\u529b\uff0c\u4f46\u5728\u4f4e\u7ea7\u683c\u5f0f\uff08\u5982SVG\uff09\u4e0a\u8868\u73b0\u7a0d\u900a\u3002\u6211\u4eec\u7684\u6570\u636e\u548c\u8bc4\u4f30\u6d41\u7a0b\u5c06\u5728\u4e0a\u5f00\u6e90\u3002**|\n", "2407.10969": "|**2024-07-15**|**Q-Sparse: All Large Language Models can be Fully Sparsely-Activated**|Hongyu Wang et.al.|[2407.10969](http://arxiv.org/abs/2407.10969)|null|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u7b80\u5355\u4f46\u6709\u6548\u7684\u8bad\u7ec3\u65b9\u6cd5\uff0c\u79f0\u4e3aQ-Sparse\uff0c\u4e13\u4e3a\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u8bbe\u8ba1\u3002Q-Sparse\u4f7f\u5f97LLMs\u7684\u6fc0\u6d3b\u5168\u4e3a\u7a00\u758f\uff0c\u4ece\u800c\u5728\u63a8\u7406\u9636\u6bb5\u5e26\u6765\u663e\u8457\u7684\u6548\u7387\u63d0\u5347\u3002\u8fd9\u4e00\u65b9\u6cd5\u901a\u8fc7\u5e94\u7528\u9876\u90e8K\u7a00\u758f\u5316\u6280\u672f\u5bf9\u6fc0\u6d3b\u8fdb\u884c\u5904\u7406\uff0c\u5e76\u7ed3\u5408\u76f4\u901a\u4f30\u8ba1\u8fdb\u884c\u8bad\u7ec3\u3002\u4e3b\u8981\u6210\u679c\u5305\u62ec\uff1a(1) Q-Sparse\u5728\u4fdd\u6301\u4e0e\u57fa\u7ebfLLM\u7ed3\u679c\u76f8\u5f53\u7684\u540c\u65f6\uff0c\u5177\u6709\u66f4\u9ad8\u7684\u63a8\u7406\u65f6\u7684\u6548\u7387\uff1b(2) \u6211\u4eec\u7ed9\u51fa\u4e86\u7a00\u758f\u6fc0\u6d3bLLMs\u7684\u6700\u4f18\u63a8\u7406\u7f29\u653e\u5b9a\u5f8b\uff1b(3) Q-Sparse\u5728\u5404\u79cd\u573a\u666f\u4e0b\u8868\u73b0\u4f18\u79c0\uff0c\u5305\u62ec\u4ece\u5934\u5f00\u59cb\u8bad\u7ec3\u3001\u9884\u8bad\u7ec3\u6a21\u578b\u7684\u7ee7\u7eed\u8bad\u7ec3\u548c\u5fae\u8c03\uff1b(4) Q-Sparse\u9002\u7528\u4e8e\u5168\u7cbe\u5ea6\u548c1\u4f4d\u7cbe\u5ea6\u7684LLMs\uff0c\u5982BitNet b1.58\u3002\u7279\u522b\u662f\uff0cBitNet 
b1.58\u4e0eQ-Sparse\uff08\u53ef\u914d\u5907MoE\uff09\u7684\u7ed3\u5408\uff0c\u4e3a\u672a\u6765LLMs\u7684\u6548\u7387\u63d0\u5347\uff0c\u5305\u62ec\u6210\u672c\u548c\u80fd\u8017\uff0c\u63d0\u4f9b\u4e86\u57fa\u77f3\u548c\u6e05\u6670\u8def\u5f84\u3002|\n", "2407.10960": "|**2024-07-15**|**Fast Matrix Multiplications for Lookup Table-Quantized LLMs**|Han Guo et.al.|[2407.10960](http://arxiv.org/abs/2407.10960)|**[link](https://github.com/hanguo97/flute)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u90e8\u7f72\u901a\u5e38\u53d7\u5230\u5185\u5b58\u5e26\u5bbd\u7684\u9650\u5236\uff0c\u5176\u4e2d\u4e3b\u8981\u74f6\u9888\u662f\u5c06\u6a21\u578b\u53c2\u6570\u4eceGPU\u5168\u5c40\u5185\u5b58\u4f20\u8f93\u5230\u5bc4\u5b58\u5668\u7684\u6210\u672c\u3002\u901a\u8fc7\u7ed3\u5408\u6743\u91cd\u53ea\u91cf\u5316\uff0c\u53ef\u4ee5\u51cf\u5c11\u5185\u5b58\u79fb\u52a8\uff0c\u4ece\u800c\u52a0\u901f\u63a8\u7406\u901f\u5ea6\u3002\u7136\u800c\uff0c\u4e3a\u91cf\u5316\u540e\u7684LLMs\u8bbe\u8ba1\u9ad8\u6027\u80fd\u5185\u6838\u662f\u4e00\u9879\u91cd\u5927\u6311\u6218\uff0c\u5c24\u5176\u662f\u5f53\u6743\u91cd\u88ab\u538b\u7f29\u5230\u975e\u5747\u5300\u5206\u9694\u7684\u4f4d\u5bbd\uff08\u59823\u4f4d\uff09\uff0c\u5e76\u91c7\u7528\u975e\u5747\u5300\u67e5\u627e\u8868\uff08LUT\uff09\u91cf\u5316\u65f6\u3002\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u7075\u6d3b\u7684\u67e5\u627e\u8868\u5f15\u64ceFLUTE\uff0c\u5b83\u901a\u8fc7\u5bf9\u91cf\u5316\u6743\u91cd\u77e9\u9635\u8fdb\u884c\u79bb\u7ebf\u91cd\u6784\uff0c\u4ee5\u6700\u5c0f\u5316\u89e3\u538b\u76f8\u5173\u7684\u4f4d\u64cd\u4f5c\uff0c\u5e76\u901a\u8fc7\u5411\u91cf\u5316\u548c\u590d\u5236\u67e5\u627e\u8868\u6765\u7f13\u89e3\u5171\u4eab\u5185\u5b58\u5e26\u5bbd\u9650\u5236\u3002\u5728\u5c0f\u6279\u91cf\uff08\u5c0f\u4e8e32\uff09\u548c\u91cf\u5316\u7ec4\u5927\u5c0f\u4e3a128\uff08LLM\u63a8\u7406\u4e2d\u7684\u5178\u578b\u503c\uff09\u7684\u60c5\u51b5\u4e0b\uff0cFLUTE\u5185\u6838\u7684\u901f\u5ea6\u53ef\u4ee5\u6bd4\u73b0\u6709GEMM\u5185\u6838\u5feb2
-4\u500d\u3002\u4f5c\u4e3aFLUTE\u7684\u5e94\u7528\uff0c\u6211\u4eec\u63a2\u8ba8\u4e86\u67e5\u627e\u8868\u57fa\u7684NormalFloat\u91cf\u5316\u7684\u4e00\u79cd\u7b80\u5355\u6269\u5c55\uff0c\u5e76\u5c06\u5176\u5e94\u7528\u4e8e\u91cf\u5316LLaMA3\uff0c\u83b7\u5f97\u4e86\u4e0e\u5f3a\u5927\u57fa\u51c6\u76f8\u5f53\u7684\u91cf\u5316\u6027\u80fd\uff0c\u540c\u65f6\u5b9e\u73b0\u4e86\u7aef\u5230\u7aef\u541e\u5410\u91cf\u76841.5\u52302\u500d\u63d0\u5347\u3002|\n", "2407.10953": "|**2024-07-15**|**MMM: Multilingual Mutual Reinforcement Effect Mix Datasets & Test with Open-domain Information Extraction Large Language Models**|Chengguang Gan et.al.|[2407.10953](http://arxiv.org/abs/2407.10953)|null|## \u4efb\u52a1 **\u80cc\u666f\uff1a** \u4e92\u60e0\u589e\u5f3a\u6548\u5e94\uff08MRE\uff09\u5728\u4fe1\u606f\u62bd\u53d6\u548c\u591a\u4efb\u52a1\u7814\u7a76\u4e2d\u5c55\u73b0\u51fa\u5de8\u5927\u6f5c\u529b\u3002\u7136\u800c\uff0c\u7531\u4e8e\u4ec5\u6709\u7684MRE\u6df7\u5408\u6570\u636e\u96c6\u5c40\u9650\u4e8e\u65e5\u8bed\uff0c\u8fd9\u9650\u5236\u4e86\u5168\u7403\u7814\u7a76\u754c\u7684\u5e7f\u6cdb\u63a2\u7d22\u3002\u4e3a\u4e86\u514b\u670d\u8fd9\u4e00\u5c40\u9650\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u591a\u8bed\u8a00MRE\u6df7\u5408\u6570\u636e\u96c6\uff08MMM\uff09\uff0c\u5305\u542b\u82f1\u8bed\u3001\u65e5\u8bed\u548c\u6c49\u8bed\u768421\u4e2a\u5b50\u96c6\u3002\u672c\u6587\u8fd8\u63d0\u51fa\u4e86\u4e00\u79cd\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u8f85\u52a9\u7684\u6570\u636e\u96c6\u7ffb\u8bd1\u65b9\u6cd5\uff0c\u901a\u8fc7\u5229\u7528LLMs\u5c06\u539f\u59cb\u65e5\u8bed\u6587\u672c\u8fdb\u884c\u7ffb\u8bd1\uff0c\u5927\u5927\u51cf\u5c11\u4e86\u6570\u636e\u96c6\u6784\u5efa\u65f6\u7684\u4eba\u5de5\u6807\u6ce8\u65f6\u95f4\u3002 **\u8d21\u732e\uff1a** 
\u6211\u4eec\u6269\u5c55\u4e86\u6570\u636e\u96c6\uff0c\u52a0\u5165\u4e86\u5f00\u653e\u9886\u57df\u547d\u540d\u5b9e\u4f53\u8bc6\u522b\uff08NER\uff09\u548c\u53e5\u5b50\u5206\u7c7b\u4efb\u52a1\u3002\u57fa\u4e8e\u8fd9\u4e2a\u6269\u5145\u540e\u7684\u6570\u636e\u96c6\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2a\u7edf\u4e00\u7684\u8f93\u5165-\u8f93\u51fa\u6846\u67b6\uff0c\u8bad\u7ec3\u4e86\u4e00\u4e2a\u5f00\u653e\u57df\u4fe1\u606f\u62bd\u53d6\u5927\u8bed\u8a00\u6a21\u578b\uff08OIELLM\uff09\u3002\u5b9e\u9a8c\u8868\u660e\uff0cOIELLM\u6a21\u578b\u80fd\u591f\u6709\u6548\u5904\u7406\u65b0\u7684MMM\u6570\u636e\u96c6\uff0c\u5e76\u8868\u73b0\u51fa\u663e\u8457\u7684\u6027\u80fd\u63d0\u5347\u3002 \u603b\u4e4b\uff0c\u6211\u4eec\u7684\u5de5\u4f5c\u65e8\u5728\u901a\u8fc7\u63d0\u4f9b\u591a\u8bed\u8a00\u8d44\u6e90\u548c\u9ad8\u6548\u7684\u7ffb\u8bd1\u7b56\u7565\uff0c\u63a8\u52a8\u4e92\u60e0\u589e\u5f3a\u6548\u5e94\u5728\u591a\u8bed\u8a00\u4fe1\u606f\u62bd\u53d6\u9886\u57df\u7684\u5e94\u7528\u7814\u7a76\u3002|\n", "2407.10947": "|**2024-07-15**|**Can Textual Semantics Mitigate Sounding Object Segmentation Preference?**|Yaoting Wang et.al.|[2407.10947](http://arxiv.org/abs/2407.10947)|**[link](https://github.com/gewu-lab/sounding-object-segmentation-preference)**|**## \u4efb\u52a1 \u97f3\u9891-\u89c6\u89c9\u5206\u5272\uff08Audio-Visual 
Segmentation\uff0cAVS\uff09\u4efb\u52a1\u7684\u76ee\u6807\u662f\u5229\u7528\u97f3\u9891\u7ebf\u7d22\u5728\u89c6\u89c9\u7a7a\u95f4\u4e2d\u5206\u5272\u51fa\u53d1\u58f0\u7269\u4f53\u3002\u7136\u800c\uff0c\u7814\u7a76\u6307\u51fa\uff0c\u73b0\u6709\u7684AVS\u65b9\u6cd5\u8fc7\u4e8e\u4f9d\u8d56\u5bf9\u53ef\u542c\u89c1\u5bf9\u8c61\u7684\u5206\u5272\u504f\u597d\uff0c\u800c\u975e\u7cbe\u786e\u7684\u97f3\u9891\u6307\u5bfc\u3002\u95ee\u9898\u5728\u4e8e\uff0c\u76f8\u6bd4\u4e8e\u89c6\u89c9\uff0c\u97f3\u9891\u5728\u591a\u58f0\u6e90\u97f3\u573a\u4e2d\u7684\u8bed\u4e49\u8868\u73b0\u8f83\u5f31\uff0c\u5bfc\u81f4\u5176\u5728\u6307\u5bfc\u89c6\u89c9\u7a7a\u95f4\u65f6\u4f5c\u7528\u6709\u9650\u3002\u9274\u4e8e\u6587\u672c\u6a21\u6001\u7ecf\u8fc7\u6df1\u5165\u63a2\u7d22\uff0c\u5305\u542b\u4e30\u5bcc\u7684\u62bd\u8c61\u8bed\u4e49\uff0c\u6211\u4eec\u63d0\u51fa\u5229\u7528\u89c6\u89c9\u573a\u666f\u4e2d\u7684\u6587\u672c\u63d0\u793a\u6765\u589e\u5f3a\u97f3\u9891\u6307\u5bfc\u7684\u7cbe\u786e\u6027\u3002 \u6211\u4eec\u7684\u65b9\u6cd5\u9996\u5148\u901a\u8fc7\u73b0\u6210\u7684\u56fe\u50cf\u63cf\u8ff0\u5668\u83b7\u53d6\u573a\u666f\u63cf\u8ff0\uff0c\u7136\u540e\u5229\u7528\u9884\u8bad\u7ec3\u7684\u5927\u8bed\u8a00\u6a21\u578b\u63a8\u65ad\u6f5c\u5728\u7684\u53d1\u58f0\u7269\u4f53\u4f5c\u4e3a\u6587\u672c\u7ebf\u7d22\u3002\u63a5\u7740\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u65b0\u9896\u7684\u57fa\u4e8e\u8bed\u4e49\u7684\u97f3\u9891\u5efa\u6a21\u6a21\u5757\uff0c\u5f15\u5165\u52a8\u6001\u63a9\u7801\uff0c\u5c06\u97f3\u9891\u7279\u5f81\u4e0e\u6587\u672c\u7ebf\u7d22\u878d\u5408\uff0c\u751f\u6210\u5177\u6709\u4ee3\u8868\u6027\u7684\u53d1\u58f0\u7269\u4f53\u7279\u5f81\u3002\u8fd9\u4e9b\u7279\u5f81\u4e0d\u4ec5\u5305\u542b\u97f3\u9891\u4fe1\u606f\uff0c\u8fd8\u8574\u542b\u4e86\u751f\u52a8\u7684\u8bed\u4e49\uff0c\u4ece\u800c\u4e3a\u89c6\u89c9\u7a7a\u95f4\u63d0\u4f9b\u66f4\u4e3a\u6e05\u6670\u7684\u6307\u5f15\u3002\u6211\u4eec\u5728AVS\u57fa\u51c6\u6570\u636e\u96c6\u4e0a\u7684\u5b9e\u9a8c\u7ed3\u679c\u88
68\u660e\uff0c\u501f\u52a9\u6587\u672c\u63d0\u793a\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5bf9\u97f3\u9891\u7684\u654f\u611f\u5ea6\u5f97\u5230\u63d0\u5347\uff0c\u5728\u6240\u6709\u4e09\u4e2a\u5b50\u96c6\u4e0a\u8868\u73b0\u51fa\u9ad8\u5ea6\u7ade\u4e89\u529b\u3002\u9879\u76ee\u9875\u9762\uff1a[https://github.com/GeWu-Lab/Sounding-Object-Segmentation-Preference](https://github.com/GeWu-Lab/Sounding-Object-Segmentation-Preference)\u3002**|\n", "2407.10943": "|**2024-07-15**|**GRUtopia: Dream General Robots in a City at Scale**|Hanqing Wang et.al.|[2407.10943](http://arxiv.org/abs/2407.10943)|**[link](https://github.com/openrobotlab/grutopia)**|**\u8fd1\u671f\u7684\u7814\u7a76\u6b63\u5728\u63a2\u7d22Embodied AI\u9886\u57df\u7684\u89c4\u6a21\u6cd5\u5219\u3002\u9274\u4e8e\u6536\u96c6\u73b0\u5b9e\u4e16\u754c\u6570\u636e\u7684\u9ad8\u6602\u6210\u672c\uff0c\u6211\u4eec\u8ba4\u4e3a\u6a21\u62df\u5230\u73b0\u5b9e\uff08Sim2Real\uff09\u65b9\u6cd5\u5bf9\u4e8e\u6269\u5c55embodied\u6a21\u578b\u7684\u5b66\u4e60\u81f3\u5173\u91cd\u8981\u3002\u672c\u6587\u4ecb\u7ecd\u9879\u76eeGRUtopia\uff0c\u8fd9\u662f\u4e00\u4e2a\u4e13\u4e3a\u5404\u79cd\u673a\u5668\u4eba\u8bbe\u8ba1\u7684\u9996\u4e2a\u4e92\u52a8\u4e09\u7ef4\u793e\u4f1a\u3002\u5b83\u5177\u6709\u591a\u9879\u521b\u65b0\uff1a(a) \u573a\u666f\u6570\u636e\u96c6GRScenes\u5305\u542b\u4e8610\u4e07\u5f20\u4ea4\u4e92\u5f0f\u3001\u7cbe\u7ec6\u6ce8\u91ca\u7684\u573a\u666f\uff0c\u8fd9\u4e9b\u573a\u666f\u53ef\u4ee5\u81ea\u7531\u7ec4\u5408\u6210\u57ce\u5e02\u89c4\u6a21\u7684\u73af\u5883\u3002\u4e0e\u4ee5\u5f80\u4e3b\u8981\u5173\u6ce8\u5bb6\u5ead\u73af\u5883\u7684\u4f5c\u54c1\u4e0d\u540c\uff0cGRScenes\u6db5\u76d6\u4e8689\u4e2a\u591a\u6837\u5316\u7684\u573a\u666f\u7c7b\u522b\uff0c\u5f25\u5408\u4e86\u670d\u52a1\u5bfc\u5411\u73af\u5883\u4e2d\u673a\u5668\u4eba\u521d\u59cb\u90e8\u7f72\u7684\u5dee\u8ddd\u3002(b) 
GRResidents\u662f\u4e00\u4e2a\u7531\u5927\u578b\u8bed\u8a00\u6a21\u578b\u9a71\u52a8\u7684\u975e\u73a9\u5bb6\u89d2\u8272\uff08NPC\uff09\u7cfb\u7edf\uff0c\u8d1f\u8d23\u793e\u4ea4\u4e92\u52a8\u3001\u4efb\u52a1\u751f\u6210\u548c\u4efb\u52a1\u5206\u914d\uff0c\u4ece\u800c\u6a21\u62dfembodied AI\u5e94\u7528\u4e2d\u7684\u793e\u4f1a\u573a\u666f\u3002(c) \u6807\u51c6\u5316\u57fa\u51c6GRBench\u652f\u6301\u5404\u79cd\u673a\u5668\u4eba\uff0c\u4f46\u4ee5\u817f\u8db3\u673a\u5668\u4eba\u4e3a\u4e3b\uff0c\u63d0\u4f9b\u6d89\u53ca\u7269\u4f53\u5bfc\u822a\u3001\u793e\u4ea4\u5bfc\u822a\u548c\u79fb\u52a8\u64cd\u4f5c\u7684\u4efb\u52a1\uff0c\u8fd9\u4e9b\u4efb\u52a1\u5177\u6709\u9002\u5ea6\u7684\u6311\u6218\u6027\u3002\u6211\u4eec\u671f\u671b\u8fd9\u9879\u5de5\u4f5c\u80fd\u591f\u7f13\u89e3\u8be5\u9886\u57df\u9ad8\u8d28\u91cf\u6570\u636e\u7684\u532e\u4e4f\uff0c\u5e76\u4e3aEmbodied AI\u7814\u7a76\u63d0\u4f9b\u66f4\u5168\u9762\u7684\u8bc4\u4f30\u3002\u9879\u76ee\u4ee3\u7801\u53ef\u4ecehttps://github.com/OpenRobotLab/GRUtopia\u83b7\u53d6\u3002**|\n", "2407.10909": "|**2024-07-15**|**FinDKG: Dynamic Knowledge Graphs with Large Language Models for Detecting Global Trends in Financial Markets**|Xiaohui Victor Li 
et.al.|[2407.10909](http://arxiv.org/abs/2407.10909)|**[link](https://github.com/xiaohui-victor-li/FinDKG)**|\u52a8\u6001\u77e5\u8bc6\u56fe\u8c31\uff08DKGs\uff09\u662f\u4e00\u79cd\u6d41\u884c\u7684\u6570\u636e\u7ed3\u6784\uff0c\u7528\u4e8e\u8868\u793a\u968f\u65f6\u95f4\u53d8\u5316\u7684\u5bf9\u8c61\u4e4b\u95f4\u7684\u5404\u79cd\u8fde\u63a5\u3002\u5b83\u4eec\u5728\u5904\u7406\u590d\u6742\u65e0\u7ed3\u6784\u6570\u636e\u6e90\uff08\u5982\u6587\u672c\u548c\u56fe\u50cf\uff09\u63d0\u53d6\u7684\u4fe1\u606f\u65f6\u5c55\u73b0\u51fa\u9ad8\u6548\u6027\u3002\u5728\u91d1\u878d\u5e94\u7528\u4e2d\uff0cDKGs\u53ef\u7528\u4e8e\u57fa\u4e8e\u8d22\u7ecf\u65b0\u95fb\u6587\u7ae0\u63a2\u6d4b\u6295\u8d44\u7b56\u7565\u7684\u8d8b\u52bf\u3002\u672c\u7814\u7a76\u63a2\u7d22\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4f5c\u4e3a\u52a8\u6001\u77e5\u8bc6\u56fe\u8c31\u751f\u6210\u5668\u7684\u7279\u6027\uff0c\u4e3a\u6b64\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u5f00\u6e90\u7684Fine-tuned LLM\uff0c\u79f0\u4e3a\u96c6\u6210\u4e0a\u4e0b\u6587\u77e5\u8bc6\u56fe\u8c31\u751f\u6210\u5668\uff08ICKG\uff09\u3002\u5229\u7528ICKG\uff0c\u6211\u4eec\u4ece\u8d22\u7ecf\u65b0\u95fb\u6587\u7ae0\u4e2d\u521b\u5efa\u4e86\u4e00\u4e2a\u65b0\u7684\u5f00\u6e90\u52a8\u6001\u77e5\u8bc6\u56fe\u8c31\uff0c\u79f0\u4e3aFinDKG\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u6ce8\u610f\u529b\u673a\u5236\u7684\u56fe\u795e\u7ecf\u7f51\u7edc\u67b6\u6784\uff08KGTransformer\uff09\uff0c\u7528\u4e8e\u5206\u6790\u8fd9\u4e2a\u56fe\u8c31\u3002\u6211\u4eec\u5728\u57fa\u51c6\u6570\u636e\u96c6\u548cFinDKG\u4e0a\u6d4b\u8bd5\u4e86\u6a21\u578b\u6027\u80fd\uff0c\u7ed3\u679c\u663e\u793a\u5728\u94fe\u63a5\u9884\u6d4b\u4efb\u52a1\u4e2d\uff0cKGTransformer\u8868\u73b0\u4f18\u5f02\u3002\u6700\u540e\uff0c\u6211\u4eec\u8bc4\u4f30\u4e86KGTransformer\u5728FinDKG\u4e0a\u7684\u4e3b\u9898\u6295\u8d44\u6027\u80fd\uff0c\u8bc1\u660e\u5b83\u80fd\u8d85\u8d8a\u73b0\u6709\u7684\u4e3b\u9898\u4ea4\u6613\u6240\u4ea4\u6613\u57fa\u91d1\uff08ETF\uff09\u3002|\n
", "2407.10887": "|**2024-07-15**|**Hey, That's My Model! Introducing Chain & Hash, An LLM Fingerprinting Technique**|Mark Russinovich et.al.|[2407.10887](http://arxiv.org/abs/2407.10887)|null|\u968f\u7740\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u88ab\u76d7\u548c\u8bef\u7528\u7684\u62c5\u5fe7\u52a0\u5267\uff0c\u6a21\u578b\u6307\u7eb9\u5316\u7684\u5fc5\u8981\u6027\u63d0\u5347\u3002\u5728\u8fd9\u79cd\u80cc\u666f\u4e0b\uff0c\u6210\u529f\u7684\u6307\u7eb9\u5e94\u5177\u5907\u4e94\u4e2a\u7279\u6027\uff1a\u900f\u660e\u6027\u3001\u6548\u7387\u3001\u6301\u4e45\u6027\u3001\u9c81\u68d2\u6027\u548c\u4e0d\u53ef\u4f2a\u9020\u6027\u3002\u672c\u6587\u9996\u5148\u5b9a\u4e49\u4e86\u8fd9\u4e9b\u8981\u6c42\u3002\u63a5\u7740\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u7b80\u5355\u6307\u7eb9\u65b9\u6cd5\u2014\u2014Chain & Hash\uff0c\u5b83\u878d\u5408\u4e86\u52a0\u5bc6\u7406\u5ff5\uff0c\u5b9e\u73b0\u4e86\u6240\u6709\u8fd9\u4e9b\u7279\u6027\u3002Chain & Hash\u6d89\u53ca\u751f\u6210\u4e00\u7ec4\u95ee\u9898\uff08\u6307\u7eb9\uff09\u53ca\u5176\u53ef\u80fd\u7684\u7b54\u6848\uff0c\u7136\u540e\u4f7f\u7528\u5b89\u5168\u54c8\u5e0c\u6280\u672f\u5c06\u5b83\u4eec\u5408\u5e76\uff0c\u4ee5\u786e\u5b9a\u6bcf\u4e2a\u95ee\u9898\u7684\u503c\uff0c\u4ece\u800c\u4fdd\u8bc1\u4e0d\u53ef\u4f2a\u9020\u6027\uff0c\u9632\u6b62\u5bf9\u624b\u58f0\u79f0\u865a\u5047\u6240\u6709\u6743\u3002\u6211\u4eec\u5728\u591a\u4e2a\u6a21\u578b\u4e0a\u8bc4\u4f30\u4e86Chain & Hash\u6280\u672f\uff0c\u5e76\u5c55\u793a\u4e86\u5b83\u5bf9\u826f\u6027\u64cd\u4f5c\uff08\u5982\u5728\u4e0d\u540c\u6570\u636e\u96c6\u4e0a\u5fae\u8c03\uff09\u548c\u654c\u610f\u5220\u9664\u6307\u7eb9\u7684\u9c81\u68d2\u6027\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u5e26\u6307\u7eb9\u7684\u6a21\u578b\u5728\u5404\u79cd\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u7684\u6027\u80fd\u51e0\u4e4e\u4e0e\u975e\u6307\u7eb9\u5316\u6a21\u578b\u76f8\u5f53\uff0c\u540c\u65f6\u4fdd\u6301\u4e86\u9ad8\u6548\u6027\u53ca\u5176\u5b9e\u7528\u4ef7\u503c\u3002|\n", 
"2407.10886": "|**2024-07-15**|**SLIP: Securing LLMs IP Using Weights Decomposition**|Yehonathan Refael et.al.|[2407.10886](http://arxiv.org/abs/2407.10886)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5b66\u672f\u754c\u548c\u5de5\u4e1a\u754c\u7684\u5e7f\u6cdb\u5e94\u7528\uff0c\u8fd9\u4e9b\u6a21\u578b\u7684\u4ef7\u503c\u4f5c\u4e3a\u77e5\u8bc6\u4ea7\u6743\uff08IP\uff09\u65e5\u76ca\u51f8\u663e\uff0c\u53cd\u6620\u51fa\u5176\u80cc\u540e\u5de8\u5927\u7684\u6295\u8d44\u3002\u7136\u800c\uff0c\u7531\u4e8e\u4e91\u90e8\u7f72\u6210\u672c\u9ad8\uff0c\u8fb9\u7f18\u8bbe\u5907\u90e8\u7f72\u7684\u9700\u6c42\u589e\u52a0\uff0c\u8fd9\u53ef\u80fd\u5bfc\u81f4\u6a21\u578b\u53c2\u6570\u88ab\u76d7\u7528\u548c\u672a\u7ecf\u6388\u6743\u4f7f\u7528\u3002\u5f53\u524d\u7684\u4fdd\u62a4\u65b9\u6cd5\u5728\u5b9e\u7528\u6027\u3001\u51c6\u786e\u6027\u635f\u5931\u6216\u9002\u5e94\u6027\u65b9\u9762\u5b58\u5728\u5c40\u9650\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6df7\u5408\u63a8\u7406\u7b97\u6cd5\uff0c\u79f0\u4e3aSLIP\uff08Secure Lightweight Inference Protocol\uff09\uff0c\u65e8\u5728\u4fdd\u62a4\u90e8\u7f72\u5728\u8fb9\u7f18\u7684\u6a21\u578b\u514d\u53d7\u76d7\u7a83\u3002SLIP\u662f\u9996\u4e2a\u517c\u987e\u5b9e\u9645\u5e94\u7528\u7684\u5b9e\u7528\u6027\u548c\u4e25\u683c\u5b89\u5168\u6027\u7684\u6df7\u5408\u534f\u8bae\uff0c\u540c\u65f6\u4fdd\u6301\u96f6\u7cbe\u5ea6\u4e0b\u964d\u548c\u4f4e\u5ef6\u8fdf\u5f71\u54cd\u3002 
SLIP\u901a\u8fc7\u77e9\u9635\u5206\u89e3\u5b9e\u73b0\u4e86\u6a21\u578b\u5728\u4e24\u4e2a\u8ba1\u7b97\u8d44\u6e90\u4e4b\u95f4\u7684\u5212\u5206\uff1a\u4e00\u4e2a\u5b89\u5168\u4f46\u6602\u8d35\uff0c\u53e6\u4e00\u4e2a\u6210\u672c\u6548\u76ca\u9ad8\u4f46\u6613\u53d7\u653b\u51fb\u3002\u5173\u952e\u5728\u4e8e\uff0c\u5b89\u5168\u8d44\u6e90\u4fdd\u7559\u4e86\u6a21\u578bIP\u4e2d\u6700\u654f\u611f\u7684\u90e8\u5206\uff0c\u540c\u65f6\u6267\u884c\u6700\u5c11\u7684\u8ba1\u7b97\uff0c\u800c\u8106\u5f31\u8d44\u6e90\u5219\u76f8\u53cd\u3002\u6b64\u5916\uff0c\u8be5\u534f\u8bae\u63d0\u4f9b\u4e86\u9632\u6b62\u653b\u51fb\u8005\u5229\u7528\u5206\u5272\u83b7\u53d6\u4fdd\u5bc6\u4fe1\u606f\u7684\u5b89\u5168\u4fdd\u969c\u3002\u6700\u540e\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u5b9e\u9a8c\u7ed3\u679c\uff0c\u8bc1\u660e\u4e86SLIP\u7684\u7a33\u5065\u6027\u548c\u6709\u6548\u6027\uff0c\u4f7f\u5176\u6210\u4e3a\u4fdd\u62a4LLMs\u7684\u7406\u60f3\u89e3\u51b3\u65b9\u6848\u3002|\n", "2407.10873": "|**2024-07-15**|**Understanding the Importance of Evolutionary Search in Automated Heuristic Design with Large Language Models**|Rui Zhang 
et.al.|[2407.10873](http://arxiv.org/abs/2407.10873)|null|\u81ea\u52a8\u5316\u542f\u53d1\u5f0f\u8bbe\u8ba1\uff08AHD\uff09\u56e0\u5176\u5728\u81ea\u52a8\u5f00\u53d1\u9ad8\u6548\u542f\u53d1\u5f0f\u65b9\u6cd5\u65b9\u9762\u7684\u6f5c\u529b\u800c\u53d7\u5230\u5e7f\u6cdb\u5173\u6ce8\u3002\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5174\u8d77\uff0c\u4eba\u4eec\u5f00\u59cb\u63a2\u7d22\u5c06AHD\u89c6\u4e3a\u8fdb\u5316\u7a0b\u5e8f\u641c\u7d22\uff08EPS\uff09\u95ee\u9898\u7684\u65b0\u9014\u5f84\u3002\u7136\u800c\uff0c\u5f53\u524d\u7684\u57fa\u51c6\u8bbe\u7f6e\u4e0d\u4e00\u81f4\uff0c\u57fa\u7840\u6bd4\u8f83\u4e0d\u8db3\uff0c\u4e14\u7f3a\u4e4f\u5bf9LLM\u4e0e\u641c\u7d22\u7b56\u7565\u7ed3\u5408\u5fc5\u8981\u6027\u7684\u6df1\u5165\u5206\u6790\uff0c\u8fd9\u4f7f\u5f97\u73b0\u6709\u57fa\u4e8eLLM\u7684EPS\u65b9\u6cd5\u7684\u5b9e\u9645\u8fdb\u5c55\u96be\u4ee5\u5f97\u5230\u5145\u5206\u8bc1\u660e\u3002\u672c\u7814\u7a76\u901a\u8fc7\u4e00\u9879\u5927\u89c4\u6a21\u57fa\u51c6\u6d4b\u8bd5\uff0c\u6db5\u76d6\u4e86\u56db\u9879\u57fa\u4e8eLLM\u7684EPS\u65b9\u6cd5\u548c\u56db\u9879AHD\u95ee\u9898\uff0c\u8de8\u8d8a\u4e5d\u79cdLLM\uff0c\u5e76\u8fdb\u884c\u4e86\u4e94\u6b21\u72ec\u7acb\u8fd0\u884c\u3002\u6211\u4eec\u7684\u5e7f\u6cdb\u5b9e\u9a8c\u63d0\u4f9b\u4e86\u6709\u4ef7\u503c\u7684\u89c1\u89e3\uff0c\u5b9e\u8bc1\u4e86\u5728LLM\u9a71\u52a8\u7684AHD\u65b9\u6cd5\u4e2d\u7684\u8fdb\u5316\u641c\u7d22\u7684\u91cd\u8981\u6027\uff0c\u540c\u65f6\u4e5f\u63a8\u52a8\u4e86\u672a\u6765EPS\u7b97\u6cd5\u5f00\u53d1\u7684\u8fdb\u6b65\u3002\u4e3a\u4e86\u4fc3\u8fdb\u53ef\u8bbf\u95ee\u6027\u548c\u53ef\u91cd\u590d\u6027\uff0c\u6211\u4eec\u5df2\u7ecf\u5168\u9762\u5f00\u6e90\u4e86\u6211\u4eec\u7684\u57fa\u51c6\u548c\u76f8\u5173\u7ed3\u679c\u3002|\n", "2407.11965": "|**2024-07-16**|**UrbanWorld: An Urban World Model for 3D City Generation**|Yu Shang 
et.al.|[2407.11965](http://arxiv.org/abs/2407.11965)|null|\u57ce\u5e02\u4f5c\u4e3a\u4eba\u7c7b\u751f\u6d3b\u7684\u57fa\u672c\u73af\u5883\uff0c\u5305\u542b\u4e86\u5efa\u7b51\u3001\u9053\u8def\u548c\u690d\u88ab\u7b49\u591a\u5143\u7269\u7406\u5143\u7d20\uff0c\u8fd9\u4e9b\u5143\u7d20\u4e4b\u95f4\u5b58\u5728\u7740\u590d\u6742\u7684\u76f8\u4e92\u5173\u8054\u3002\u6784\u5efa\u903c\u771f\u4e14\u4e92\u52a8\u76843D\u57ce\u5e02\u73af\u5883\u5bf9\u4e8e\u7814\u53d1\u80fd\u5728\u73b0\u5b9e\u4e16\u754c\u73af\u5883\u4e2d\u611f\u77e5\u3001\u51b3\u7b56\u548c\u884c\u52a8\u7684AI\u81f3\u5173\u91cd\u8981\u3002\u7136\u800c\uff0c\u4f20\u7edf\u7684\u624b\u5de5\u5236\u4f5c\u8fc7\u7a0b\u8017\u65f6\u4e14\u7cbe\u7ec6\uff0c\u9700\u8981\u8bbe\u8ba1\u5e08\u6295\u5165\u5927\u91cf\u7cbe\u529b\u6765\u7cbe\u786e\u5448\u73b0\u590d\u6742\u7684\u57ce\u5e02\u7279\u5f81\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51faUrbanWorld\uff0c\u8fd9\u662f\u4e00\u4e2a\u9996\u4e2a\u81ea\u52a8\u751f\u6210\u5b9a\u5236\u5316\u3001\u771f\u5b9e\u4e14\u4e92\u52a8\u76843D\u57ce\u5e02\u4e16\u754c\u7684\u6a21\u578b\uff0c\u652f\u6301\u7075\u6d3b\u7684\u63a7\u5236\u6761\u4ef6\u3002UrbanWorld\u7684\u751f\u6210\u6d41\u7a0b\u5305\u62ec\u56db\u4e2a\u5173\u952e\u6b65\u9aa4\uff1a\u5229\u7528\u516c\u5f00\u7684OSM\u6570\u636e\u8fdb\u884c3D\u5e03\u5c40\u751f\u6210\u3001\u501f\u52a9\u5f3a\u5927\u7684\u57ce\u5e02\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\uff08Urban 
MLLM\uff09\u8fdb\u884c\u57ce\u5e02\u573a\u666f\u89c4\u5212\u4e0e\u8bbe\u8ba1\u3001\u901a\u8fc7\u5148\u8fdb\u76843D\u6269\u6563\u6280\u672f\u5b9e\u73b0\u53ef\u63a7\u8d44\u4ea7\u6e32\u67d3\uff0c\u4ee5\u53caMLLM\u8f85\u52a9\u7684\u573a\u666f\u7ec6\u5316\u3002UrbanWorld\u751f\u6210\u7684\u9ad8\u4fdd\u771f3D\u57ce\u5e02\u73af\u5883\u4e3a\u901a\u7528AI\u548c\u673a\u5668\u611f\u77e5\u7cfb\u7edf\u5728\u6a21\u62df\u4e2d\u7684\u771f\u5b9e\u53cd\u9988\u548c\u4ea4\u4e92\u63d0\u4f9b\u4e86\u53ef\u80fd\u3002\u6211\u4eec\u81f4\u529b\u4e8e\u5c06UrbanWorld\u4f5c\u4e3a\u5f00\u6e90\u4e14\u591a\u529f\u80fd\u7684\u5e73\u53f0\uff0c\u7528\u4e8e\u8bc4\u4f30\u548c\u63d0\u5347AI\u5728\u771f\u5b9e\u57ce\u5e02\u73af\u5883\u4e2d\u7684\u611f\u77e5\u3001\u51b3\u7b56\u548c\u4e92\u52a8\u80fd\u529b\u3002|\n", "2407.11963": "|**2024-07-16**|**NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?**|Mo Li et.al.|[2407.11963](http://arxiv.org/abs/2407.11963)|**[link](https://github.com/open-compass/opencompass)**|**\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u4e2a\u540d\u4e3aNeedleBench\u7684\u6846\u67b6\uff0c\u5b83\u662f\u4e00\u7cfb\u5217\u8bc4\u4f30\u5927\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u957f\u6587\u672c\u7406\u89e3\u80fd\u529b\u7684\u9010\u6b65\u5347\u7ea7\u4efb\u52a1\u3002\u8be5\u6846\u67b6\u6d89\u53ca\u4e0d\u540c\u957f\u5ea6\u533a\u95f4\uff084k\u30018k\u300132k\u3001128k\u3001200k\u30011M\u4e43\u81f3\u66f4\u957f\uff09\u548c\u6df1\u5ea6\u8303\u56f4\uff0c\u901a\u8fc7\u5728\u4e0d\u540c\u6587\u672c\u6df1\u5ea6\u533a\u57df\u63d2\u5165\u5173\u952e\u6570\u636e\u70b9\uff0c\u7cfb\u7edf\u6027\u5730\u6d4b\u8bd5\u6a21\u578b\u5728\u5404\u79cd\u60c5\u5883\u4e0b\u7684\u68c0\u7d22\u548c\u63a8\u7406\u80fd\u529b\u3002\u9488\u5bf9\u4e8e\u53cc\u8bed\u957f\u6587\u672c\uff0c\u6211\u4eec\u5229\u7528\u8fd9\u4e2a\u6846\u67b6\u6765\u8003\u5bdf\u4e3b\u6d41\u5f00\u6e90\u6a21\u578b\u8bc6\u522b\u4e0e\u95ee\u9898\u76f8\u5173\u7684\u5173\u952e\u4fe1\u606f\uff0c\u5e76\u8fd0\u7528\u8fd9\u4e9b\u4fe1\u606f\u8f
db\u884c\u63a8\u7406\u7684\u80fd\u529b\u3002 \u6b64\u5916\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u7956\u5148\u8ffd\u8e2a\u6311\u6218\uff08Ancestral Trace Challenge\uff0cATC\uff09\uff0c\u65e8\u5728\u6a21\u62df\u73b0\u5b9e\u4e16\u754c\u4e2d\u957f\u6587\u672c\u903b\u8f91\u63a8\u7406\u4efb\u52a1\u7684\u590d\u6742\u6027\uff0c\u63d0\u4f9b\u4e00\u4e2a\u7b80\u5355\u7684\u65b9\u6cd5\u6765\u8bc4\u4f30LLMs\u5904\u7406\u590d\u6742\u957f\u6587\u672c\u4e0a\u4e0b\u6587\u7684\u80fd\u529b\u3002\u7814\u7a76\u7ed3\u679c\u663e\u793a\uff0c\u5f53\u524d\u7684LLMs\u5728\u5b9e\u9645\u7684\u957f\u6587\u672c\u5e94\u7528\u4e2d\u4ecd\u6709\u5f88\u5927\u7684\u63d0\u5347\u7a7a\u95f4\uff0c\u56e0\u4e3a\u5b83\u4eec\u5728\u5904\u7406\u903b\u8f91\u63a8\u7406\u96be\u9898\u65f6\u9762\u4e34\u6311\u6218\u3002\u6240\u6709\u4ee3\u7801\u548c\u8d44\u6e90\u53ef\u5728OpenCompass\u9879\u76ee\uff08https://github.com/open-compass/opencompass\uff09\u83b7\u53d6\u3002**|\n", "2407.11934": "|**2024-07-16**|**Code Documentation and Analysis to Secure Software Development**|Paul Attie et.al.|[2407.11934](http://arxiv.org/abs/2407.11934)|null|\u6211\u4eec\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aCode Documentation and Analysis 
Tool\uff08CoDAT\uff09\u7684\u5de5\u5177\u3002CoDAT\u65e8\u5728\u4fdd\u6301\u4ee3\u7801\u6587\u6863\u4e4b\u95f4\u7684\u8fde\u8d2f\u6027\uff0c\u4f8b\u5982\uff0c\u5982\u679c\u4ee3\u7801\u7247\u6bb5\u4e2d\u7684\u67d0\u884c\u88ab\u4fee\u6539\uff0c\u76f8\u5e94\u7684\u6ce8\u91ca\u4e5f\u4f1a\u81ea\u52a8\u66f4\u65b0\uff0c\u786e\u4fdd\u5185\u90e8\u4e00\u81f4\u6027\u4ee5\u53ca\u4e0e\u4ee3\u7801\u7684\u4e00\u81f4\u6027\u3002\u901a\u8fc7\u6807\u8bb0\u8fc7\u65f6\u7684\u6ce8\u91ca\uff0cCoDAT\u63d0\u9192\u5f00\u53d1\u8005\u7ef4\u62a4\u6700\u65b0\u7684\u6587\u6863\u3002\u6211\u4eec\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u68c0\u67e5\u4ee3\u7801\u7247\u6bb5\u4e0e\u5176\u63cf\u8ff0\u7684\u8bed\u4e49\u4e00\u81f4\u6027\uff0c\u4ece\u800c\u4e5f\u80fd\u8bc6\u522b\u51fa\u8bed\u4e49\u4e0d\u4e00\u81f4\u548c\u8fc7\u65f6\u7684\u6ce8\u91ca\u3002\u8fd9\u6709\u52a9\u4e8e\u7a0b\u5e8f\u5458\u7f16\u5199\u6b63\u786e\u5b9e\u73b0\u4ee3\u7801\u8349\u56fe\u7684\u4ee3\u7801\uff0c\u652f\u6301\u9010\u6b65\u7ec6\u5316\u65b9\u6cd5\uff0c\u4ece\u4ee3\u7801\u8349\u56fe\u9010\u6b65\u6f14\u53d8\u4e3a\u7ecf\u8fc7\u4e00\u4e24\u6b21\u6216\u66f4\u591a\u6b21\u7ec6\u5316\u8fed\u4ee3\u7684\u4ee3\u7801\u3002 CoDAT\u5728IntelliJ IDEA IDE\u4e2d\u5b9e\u73b0\uff0c\u5229\u7528Code Insight\u5b88\u62a4\u7a0b\u5e8f\u5305\u7ed3\u5408\u81ea\u5b9a\u4e49\u6b63\u5219\u8868\u8fbe\u5f0f\u7b97\u6cd5\uff0c\u6807\u8bb0\u5bf9\u5e94\u4ee3\u7801\u5757\u5df2\u66f4\u6539\u7684\u6807\u8bb0\u6ce8\u91ca\u3002CoDAT\u7684\u540e\u7aef\u7ed3\u6784\u4e0a\u662f\u53bb\u4e2d\u5fc3\u5316\u7684\uff0c\u652f\u6301\u5206\u5e03\u5f0f\u8d26\u672c\u6846\u67b6\uff0c\u4ee5\u5b9e\u73b0\u4ee3\u7801\u4e00\u81f4\u6027\u8ddf\u8e2a\u548c\u67b6\u6784\u7f16\u8bd1\u7ba1\u7406\u3002|\n", "2407.11919": "|**2024-07-16**|**What's Wrong? 
Refining Meeting Summaries with LLM Feedback**|Frederic Kirstein et.al.|[2407.11919](http://arxiv.org/abs/2407.11919)|null|As digital meetings become commonplace, meeting summarization has become a key task. Large language models (LLMs) show great potential here, surpassing traditional methods in coherence and context understanding. However, they still need improvement to stay relevant and avoid errors. We propose a multi-LLM correction approach for meeting summaries that mimics human review through a two-phase process: mistake identification and summary refinement. We release QMSum Mistake, a dataset of 200 automatically generated meeting summaries annotated by humans for nine error types, including structural, omission, and irrelevance errors. Experiments show that LLMs can identify these errors accurately. We turn the identified mistakes into actionable feedback to improve summary quality in terms of relevance, informativeness, conciseness, and coherence. This post-hoc refinement effectively improves summary quality by using multiple LLMs to validate the output. The multi-LLM approach has potential applications in complex text generation tasks that require robustness, action planning, and goal orientation.|\n", 
"2407.11888": "|**2024-07-16**|**Ascend-CC: Confidential Computing on Heterogeneous NPU for Emerging Generative AI Workloads**|Aritra Dhar et.al.|[2407.11888](http://arxiv.org/abs/2407.11888)|null|Generative AI based on large language models (LLMs) dominates cloud workloads. Specialized hardware accelerators such as GPUs, NPUs, and TPUs outperform general-purpose CPUs in AI applications. AI models and data are often highly sensitive and come from mutually distrusting parties. Existing CPU-based trusted execution environments (TEEs), such as Intel SGX or AMD SEV, do not provide sufficient protection. Device-centric TEEs such as Nvidia-CC only address tightly coupled CPU-GPU systems, rely on a proprietary solution, and require a TEE on the host CPU. Existing academic proposals, meanwhile, are mostly tailored to specific CPU-TEE platforms. To fill this gap, we propose Ascend-CC, a confidential computing architecture based on discrete NPUs that requires no trust in the host system. Ascend-CC provides strong security by keeping data and models encrypted, protecting the data, the model parameters, and the operator binaries. It uses delegation-based memory semantics to ensure isolation from the host software stack, and task attestation provides strong guarantees of model integrity. Our implementation of Ascend-CC and its evaluation with recent LLMs such as Llama2 and Llama3 show that Ascend-CC introduces minimal overhead and requires no changes to the AI software stack.|\n", 
"2407.11852": "|**2024-07-16**|**Schema Matching with Large Language Models: an Experimental Study**|Marcel Parciak
et.al.|[2407.11852](http://arxiv.org/abs/2407.11852)|**[link](https://github.com/uhasselt-dsi-data-systems-lab/code-schema-matching-llms-artefacs)**|**This paper studies the use of large language models (LLMs) for schema matching in relational databases. The goal is to find semantic correspondences between two relational schemas using only element names and descriptions. The authors build a benchmark from the health domain and propose several task scopes, which prompt the model for schema matching with different amounts of context information. They compare LLM-based matching against a string-similarity baseline, examining match quality, verification effort, decisiveness, and complementarity. They find that too little context lowers match quality and that too much context also hurts. Newer LLM versions generally improve decisiveness. Under some task scopes the verification effort is moderate while a large number of true semantic matches are still identified. The results suggest that LLMs can serve as an initial tool for schema matching, letting data engineers match schemas quickly from element names and descriptions alone, without relying on actual data instances.**|\n", 
"2407.11833": "|**2024-07-16**|**LoFTI: Localization and Factuality Transfer to Indian Locales**|Sona Elza Simon et.al.|[2407.11833](http://arxiv.org/abs/2407.11833)|**[link](https://github.com/csalt-research/lofti)**|**Large language models (LLMs) accumulate a large amount of world knowledge by training on large web datasets crawled from the internet. However, these datasets typically skew toward English and Western European countries, so LLMs produce biased or hallucinated responses to localized queries from other regions, India in particular. We therefore propose LoFTI (Localization and Factuality Transfer to Indian Locales), a new benchmark for evaluating the localization and factual text transfer capabilities of LLMs. LoFTI contains factual statements about entities in source locations around the world and target locations in India (at the country, state, and city levels), covering a wide range of topics. We use LoFTI to evaluate Mixtral, GPT-4, and two Mixtral-based approaches suited to the localized factual transfer task. Experiments show that LoFTI is a high-quality evaluation benchmark and that all models, including GPT-4, produce skewed results across the different levels of localization.**|\n", 
"2407.11827": "|**2024-07-16**|**GPT Assisted Annotation of Rhetorical and Linguistic Features for Interpretable Propaganda Technique Detection in News
Text**|Kyle Hamilton et.al.|[2407.11827](http://arxiv.org/abs/2407.11827)|null|Although machine learning for detecting propaganda techniques in text has drawn wide attention, most approaches are 'black-box' solutions whose inner workings are opaque. Interpretable approaches offer a solution, but they depend on careful feature engineering and costly expert-annotated data. Moreover, the language features specific to persuasive text are usually studied by rhetoricians or linguists, and no dataset labeled with such features is suitable for machine learning. This study codifies 22 rhetorical and linguistic features identified in the literature, with the aim of annotating an existing dataset already labeled with propaganda techniques. To help human experts annotate these features on natural-language sentences, we designed RhetAnn, a web application that reduces an otherwise considerable cognitive load. Then, using a small set of annotated data, we fine-tune GPT-3.5, a generative large language model (LLM), to annotate the remaining data while balancing cost and classification accuracy. The study shows that combining a few human-annotated examples with GPT can scale the annotation process at roughly one tenth of the cost of relying on human experts alone. The results are on par with the best-performing model at the time of writing (GPT-4) at 10x lower cost. Our contributions are the set of features, their properties, definitions, and examples in a machine-readable format, together with the RhetAnn code and the GPT prompts and fine-tuning procedure, which advance the state of the art in interpretable propaganda technique detection.|\n", 
"2407.11798": "|**2024-07-16**|**PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation**|Branden Butler et.al.|[2407.11798](http://arxiv.org/abs/2407.11798)|null|Inference of large language models (LLMs) across distributed computer clusters has become a focus of research in recent years, with many acceleration techniques taking inspiration from CPU speculative execution. These techniques relieve memory bandwidth bottlenecks but increase the end-to-end latency of each inference run, so they need high speculation acceptance rates to improve performance. Because acceptance rates vary across tasks, speculative inference can end up reducing performance. In addition, pipeline-parallel designs need many user requests to maintain high utilization. To address these issues, we propose PipeInfer, a pipelined speculative acceleration technique that reduces inter-token latency and improves system utilization in single-request scenarios, while also improving tolerance to low speculation acceptance rates and low-bandwidth interconnects. PipeInfer achieves its gains through Continuous Asynchronous Speculation and Early Inference Cancellation. The former runs single-token inference simultaneously with several speculative runs, improving both latency and generation speed. The latter skips the computation of invalidated runs, even in the middle of inference, further improving speed and latency. PipeInfer improves generation speed by up to 2.15x over standard speculative inference.|\n", 
"2407.11789": "|**2024-07-16**|**Large Language Models as Misleading Assistants in Conversation**|Betty Li Hou
et.al.|[2407.11789](http://arxiv.org/abs/2407.11789)|null|Large language models (LLMs) can assist with a wide range of information-seeking tasks. However, model outputs may mislead users, whether unintentionally or deliberately. We investigate the ability of LLMs to act as deceptive assistants in a reading comprehension task, using LLMs as proxies for human users. We compare three settings: (1) the model is prompted to provide truthful assistance, (2) it is prompted to be subtly misleading, and (3) it is prompted to argue for an incorrect answer. GPT-4 can effectively mislead both GPT-3.5-Turbo and GPT-4 itself, with deceptive assistants reducing task accuracy by up to 23% compared with a truthful assistant. We also find that giving the user model more context partially offsets the influence of the deceptive model. This work highlights the ability of LLMs to produce misleading information and its potential impact in real-world settings.|\n", 
"2407.12735": "|**2024-07-17**|**EchoSight: Advancing Visual-Language Models with Wiki Knowledge**|Yibin Yan et.al.|[2407.12735](http://arxiv.org/abs/2407.12735)|null|Knowledge-based visual question answering (KVQA) requires answering questions about images using extensive background knowledge, which remains challenging for generative models. We propose EchoSight, a novel multimodal retrieval-augmented generation (RAG) framework that helps large language models (LLMs) answer visual questions requiring detailed encyclopedic knowledge. EchoSight first searches Wikipedia articles using image information only, then reranks the candidate articles by their relevance to the combined text-image query. This substantially improves the integration of multimodal knowledge and, in turn, retrieval quality and answer accuracy. Experiments on the Encyclopedic VQA and InfoSeek datasets show that EchoSight sets a new state of the art in knowledge-based VQA, reaching 41.8% accuracy on Encyclopedic VQA and 31.3% on InfoSeek.|\n", 
"2407.12727": "|**2024-07-17**|**NL2Contact: Natural Language Guided 3D Hand-Object Contact Modeling with Diffusion Model**|Zhongqun Zhang et.al.|[2407.12727](http://arxiv.org/abs/2407.12727)|null|In 3D hand-object reconstruction, modeling accurate physical contact between hand and object is standard practice for improving hand pose estimates and generating novel human grasps. However, existing methods rely on geometric constraints that are hard to specify or control. This paper introduces a new task: controllable 3D hand-object contact modeling from natural language descriptions. The challenges are (i) the complexity of cross-modal modeling from language to contact and (ii) the lack of text data describing contact patterns. To address them, we propose NL2Contact, a model that generates controllable contacts using staged diffusion models. Given a natural language description of the hand and the contact, NL2Contact generates realistic and faithful 3D hand-object contacts. To train the model, we build ContactDescribe, the first dataset of its kind, containing rich and varied hand-centered contact descriptions generated from carefully designed prompts (for example grasp action, grasp type, contact location, and free finger status). We demonstrate applications of the model to grasp pose optimization and to generating novel human grasps from text descriptions.|\n", 
"2407.12725": "|**2024-07-17**|**Is Sarcasm Detection A Step-by-Step Reasoning Process in Large Language Models?**|Ben Yao
et.al.|[2407.12725](http://arxiv.org/abs/2407.12725)|null|Elaborating a series of step-by-step reasoning steps significantly improves the ability of large language models (LLMs) to solve complex problems, because such steps push the model to think sequentially. However, human understanding of sarcasm is often regarded as an intuitive and holistic cognitive process that integrates linguistic, contextual, and emotional cues into a comprehensive understanding of the speaker's true intention, and it is argued not to be limited to step-by-step reasoning. To test this view, we introduce SarcasmCue, a new prompting framework with four prompting strategies: chain of contradiction (CoC), graph of cues (GoC), bagging of cues (BoC), and tensor of cues (ToC). These strategies guide LLMs to detect human sarcasm by considering sequential and non-sequential prompting. A comprehensive empirical comparison on four benchmark datasets shows that the four methods clearly outperform standard input-output prompting, CoT, and ToT, and that non-sequential prompting generally outperforms sequential prompting.|\n", 
"2407.12723": "|**2024-07-17**|**The Future of Learning: Large Language Models through the Lens of Students**|He Zhang
et.al.|[2407.12723](http://arxiv.org/abs/2407.12723)|null|As large language models (LLMs) continue to evolve, their gains in performance and expanding functionality are having a marked impact on education. In this study we interviewed 14 students to explore their everyday interactions with ChatGPT. Preliminary findings show that students benefit from the efficiency ChatGPT brings to learning and information seeking while also experiencing a crisis of trust and ethical concerns. They perceive ChatGPT as more 'human-like' than traditional AI. This dilemma, marked by mixed emotions, inconsistent behavior, and an overall positive attitude among students, underscores the potential value of ChatGPT in education. We argue, however, that such advanced intelligence may also lead to adverse consequences. We therefore stress the need to apply it cautiously and to work on reducing potential harms in future development.|\n", 
"2407.12709": "|**2024-07-17**|**MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models**|Leyang Shen
et.al.|[2407.12709](http://arxiv.org/abs/2407.12709)|**[link](https://github.com/jiutian-vl/mome)**|**Multimodal large language models (MLLMs) have shown impressive capabilities across a range of vision-language (VL) tasks. However, a generalist MLLM typically underperforms a specialist MLLM on most VL tasks, which can be attributed to task interference. In this paper we propose a mixture of multimodal experts (MoME) architecture to mitigate task interference and obtain a generalist MLLM. MoME has two key components: a mixture of vision experts (MoVE) and a mixture of language experts (MoLE). MoVE adaptively modulates the features from different vision encoders and is broadly compatible with transformation architectures. MoLE incorporates sparsely gated experts into the language model, improving performance at almost no extra cost. To counter task interference, MoME specializes in both the vision and language modalities so that it can adapt to differences between tasks. Extensive experiments show that MoME significantly improves the performance of generalist MLLMs across VL tasks. The source code is available: https://github.com/JiuTian-VL/MoME**|\n", 
"2407.12665": "|**2024-07-17**|**Patch-Level Training for Large
Language Models**|Chenze Shao et.al.|[2407.12665](http://arxiv.org/abs/2407.12665)|**[link](https://github.com/shaochenze/patchtrain)**|**As large language models (LLMs) make remarkable progress in language understanding and generation, their training efficiency has become a critical concern. Traditionally, LLMs are trained to predict the next token in a sequence. Despite its success, token-level training is computationally expensive because of the large number of tokens to process. This paper introduces patch-level training, which shortens the sequence by compressing multiple tokens into a single patch. During patch-level training, the model is fed shorter sequences of patches and learns to predict the next patch, so most of the training data is processed at a much lower cost. The model then continues with token-level training on the remaining data to align with inference mode. Experiments on models of different sizes (370M to 2.7B parameters) show that patch-level training reduces overall computational cost to 0.5x without compromising model performance. Source code: https://github.com/shaochenze/PatchTrain**|\n", 
"2407.12642": "|**2024-07-17**|**Zero-shot Text-guided Infinite
Image Synthesis with LLM guidance**|Soyeong Kwon et.al.|[2407.12642](http://arxiv.org/abs/2407.12642)|null|Text-guided image editing and generation methods have diverse real-world applications. Text-guided infinite image synthesis, however, faces several challenges. First, there is a lack of text-image paired datasets with high resolution and rich contextual diversity. Second, expanding an image from text requires both global coherence and rich local context understanding. Previous work has mainly focused on limited categories, such as natural landscapes, and has required training on high-resolution images with paired text. To address these challenges, we propose a novel approach that uses large language models (LLMs) for both global coherence and local context understanding, without any high-resolution text-image paired training data. We train a diffusion model to expand an image conditioned on global and local captions generated by the LLM, together with visual features. At inference, given an image and a global caption, we use the LLM to generate the next local caption for expanding the input image. We then expand the image using the global caption, the generated local caption, and the visual features, which keeps the result globally consistent while accounting for spatial local context. Experiments show that our model outperforms the baselines both quantitatively and qualitatively. The model also demonstrates zero-shot text-guided image synthesis at arbitrary sizes with LLM guidance.|\n", 
"2407.12620": "|**2024-07-17**|**Harnessing the Power of Artificial Intelligence to Vitalize Endangered Indigenous Languages: Technologies and Experiences**|Claudio Pinhanez
et.al.|[2407.12620](http://arxiv.org/abs/2407.12620)|null|Since 2022 we have been exploring how artificial intelligence (AI) and modern natural language processing (NLP), in particular large language models (LLMs), can be applied to support and promote the use and documentation of indigenous languages that are in danger of disappearing. We begin with the decline in the world's linguistic diversity and discuss the distinct ethical challenges of working with indigenous languages. To address these challenges, we propose a new AI development cycle based on community engagement and usage. We then report encouraging results in building high-quality machine translators for indigenous languages by fine-tuning state-of-the-art translators with small amounts of data, and discuss how to avoid some common pitfalls in the process. We also present prototypes built in projects with indigenous communities in Brazil in 2023 and 2024, aimed at making writing easier, and discuss the development of Indigenous Language Models (ILMs) as a replicable and scalable way to create spell-checkers, next-word predictors, and similar tools. Finally, we envision a future in which endangered languages are preserved as interactive language models.|\n", 
"2407.12613": "|**2024-07-17**|**AudienceView: AI-Assisted Interpretation of Audience Feedback in Journalism**|William Brannon et.al.|[2407.12613](http://arxiv.org/abs/2407.12613)|**[link](https://github.com/mit-ccc/AudienceView-demo)**|**Understanding and making use of audience feedback is important for journalists, but it has become a daunting task given the volume of audience comments online. We introduce AudienceView, an online tool that uses large language models (LLMs) to help journalists categorize and interpret this feedback. AudienceView identifies themes and topics, connects them back to specific comments, shows the sentiment and distribution of the comments, and helps users develop ideas for subsequent reporting projects. We consider how such tools fit into a journalist's workflow and emphasize the importance of contextual awareness and human judgment.**|\n", 
"2407.12580": "|**2024-07-17**|**E5-V: Universal Embeddings with Multimodal Large Language Models**|Ting Jiang et.al.|[2407.12580](http://arxiv.org/abs/2407.12580)|**[link](https://github.com/kongds/e5-v)**|**Multimodal large language models (MLLMs) have shown promising advances in general visual and language understanding. However, how to represent multimodal information with MLLMs remains largely unexplored. This work introduces E5-V, a new framework that adapts MLLMs to produce universal multimodal embeddings. The findings highlight the significant potential of MLLMs for representing multimodal inputs compared with previous approaches. By using prompts, E5-V effectively bridges the modality gap between different types of inputs and shows strong multimodal embedding performance even without fine-tuning. E5-V adopts a single-modality training approach in which the model is trained only on text pairs. Compared with conventional multimodal training on image-text pairs, this significantly improves performance while cutting training cost by about 95% and removing the need to collect costly multimodal training data. Extensive experiments on four types of tasks demonstrate the effectiveness of E5-V. As a universal multimodal model, E5-V not only achieves top performance on each task but in some cases surpasses the state of the art, despite being trained on a single modality.**|\n", 
"2407.13761": "|**2024-07-18**|**SegPoint: Segment Any Point Cloud via Large Language Model**|Shuting He et.al.|[2407.13761](http://arxiv.org/abs/2407.13761)|null|\u5c3d\u7ba1\u4e09\u7ef4\u70b9\u4e91\u5206\u5272\u9886\u57df\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u5c55\uff0c\u4f46\u73b0\u6709\u7684\u65b9\u6cd5\u4e3b\u8981\u9488\u5bf9\u7279\u5b9a\u4efb\u52a1\uff0c\u4f9d\u8d56\u4e8e\u660e\u786e\u7684\u6307\u4ee4\u6765\u8bc6\u522b\u76ee\u6807\uff0c\u7f3a\u4e4f\u5728\u7edf\u4e00\u6846\u67b6\u4e2d\u7406\u89e3\u548c\u63a8\u65ad\u7528\u6237\u9690\u542b\u610f\u56fe\u7684\u80fd\u529b\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aSegPoint\u7684\u6a21\u578b\uff0c\u5b83\u5229\u7528\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u63a8\u7406\u80fd\u529b\uff0c\u5728\u591a\u79cd\u4efb\u52a1\u4e0a\u8fdb\u884c\u70b9\u7ea7\u5206\u5272\uff1a1\uff09\u4e09\u7ef4\u6307\u4ee4\u5206\u5272\uff0c2\uff09\u4e09\u7ef4\u6307\u79f0\u5206\u5272\uff0c3\uff09\u4e09\u7ef4\u8bed\u4e49\u5206\u5272\uff0c\u4ee5\u53ca4\uff09\u4e09\u7ef4\u5f00\u653e\u8bcd\u6c47\u8bed\u4e49\u5206\u5272\u3002\u4e3a\u4e86\u63a8\u52a8\u4e09\u7ef4\u6307\u4ee4\u7814\u7a76\uff0c\u6211\u4eec\u521b\u5efa\u4e86\u4e00\u4e2a\u65b0\u7684\u57fa\u51c6Instruct3D\uff0c\u7528\u4e8e\u8bc4\u4f30\u4ece\u590d\u6742\u548c\u9690\u542b\u6307\u4ee4\u6587\u672c\u8fdb\u884c\u5206\u5272\u6027\u80fd\uff0c\u5305\u542b2,565\u4e2a\u70b9\u4e91-\u6307\u4ee4\u5bf9\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cSegPoint\u5728ScanRefer\u6307\u79f0\u5206\u5272\u548cScanNet\u8bed\u4e49\u5206\u5272\u7b49\u65e2\u6709\u57fa\u51c6\u4e0a\u8868\u73b0\u51fa
\u7ade\u4e89\u529b\uff0c\u540c\u65f6\u5728Instruct3D\u6570\u636e\u96c6\u4e0a\u7684\u8868\u73b0\u4f18\u5f02\u3002\u636e\u6211\u4eec\u6240\u77e5\uff0cSegPoint\u662f\u9996\u4e2a\u5728\u4e00\u4e2a\u6846\u67b6\u5185\u5904\u7406\u8fd9\u4e9b\u591a\u6837\u5316\u7684\u5206\u5272\u4efb\u52a1\u5e76\u8fbe\u5230\u6ee1\u610f\u6027\u80fd\u7684\u6a21\u578b\u3002|\n", "2407.13757": "|**2024-07-18**|**Black-Box Opinion Manipulation Attacks to Retrieval-Augmented Generation of Large Language Models**|Zhuo Chen et.al.|[2407.13757](http://arxiv.org/abs/2407.13757)|null|## \u4efb\u52a1 \u672c\u7814\u7a76\u5173\u6ce8\u4e8eRetrieval-Augmented Generation\uff08RAG\uff09\u6a21\u578b\u5728\u9762\u5bf9\u9ed1\u76d2\u653b\u51fb\u65f6\u7684\u8106\u5f31\u6027\uff0c\u5c24\u5176\u662f\u5728\u610f\u89c1\u64cd\u7eb5\u65b9\u9762\u7684\u5e94\u7528\u3002RAG\u65e8\u5728\u89e3\u51b3\u5927\u8bed\u8a00\u6a21\u578b\u7684\u5e7b\u89c9\u95ee\u9898\u548c\u5b9e\u65f6\u7ea6\u675f\uff0c\u4f46\u540c\u65f6\u4e5f\u66b4\u9732\u51fa\u5bf9\u6297\u68c0\u7d22\u7be1\u6539\u653b\u51fb\u7684\u5f31\u70b9\u3002\u5f53\u524d\u7684\u7814\u7a76\u4e3b\u8981\u96c6\u4e2d\u5728\u767d\u76d2\u548c\u5c01\u95ed\u9886\u57df\u95ee\u7b54\u4efb\u52a1\u4e2d\u7684RAG\u4e0d\u7a33\u5b9a\u6027\u3002\u672c\u6587\u7684\u76ee\u6807\u662f\u63ed\u793a\u5f53RAG\u6a21\u578b\u906d\u9047\u9ed1\u76d2\u653b\u51fb\u65f6\uff0c\u5bf9\u7528\u6237\u8ba4\u77e5\u548c\u51b3\u7b56\u7684\u5f71\u54cd\uff0c\u4ece\u800c\u4e3a\u63d0\u9ad8\u6a21\u578b\u7684\u53ef\u9760\u6027\u548c\u5b89\u5168\u6027\u63d0\u4f9b\u65b0\u89c1\u89e3\u3002 
\u6211\u4eec\u901a\u8fc7\u64cd\u63a7RAG\u4e2d\u68c0\u7d22\u6a21\u578b\u7684\u6392\u540d\u7ed3\u679c\uff0c\u5229\u7528\u8fd9\u4e9b\u64cd\u7eb5\u540e\u7684\u6570\u636e\u8bad\u7ec3\u4e00\u4e2a\u4ee3\u7406\u6a21\u578b\u3002\u63a5\u7740\uff0c\u91c7\u7528\u5bf9\u6297\u6027\u68c0\u7d22\u653b\u51fb\u65b9\u6cd5\u9488\u5bf9\u4ee3\u7406\u6a21\u578b\u5b9e\u65bd\u9ed1\u76d2\u8fc1\u79fb\u653b\u51fb\uff0c\u8fdb\u4e00\u6b65\u5f71\u54cdRAG\u7684\u751f\u6210\u8fc7\u7a0b\u3002\u5728\u6d89\u53ca\u591a\u4e2a\u4e3b\u9898\u7684\u610f\u89c1\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u5b9e\u9a8c\uff0c\u7ed3\u679c\u663e\u793a\uff0c\u63d0\u51fa\u7684\u653b\u51fb\u7b56\u7565\u80fd\u663e\u8457\u6539\u53d8RAG\u751f\u6210\u5185\u5bb9\u7684\u89c2\u70b9\u6781\u6027\uff0c\u8fd9\u63ed\u793a\u4e86\u6a21\u578b\u7684\u6613\u53d7\u653b\u51fb\u6027\uff0c\u5e76\u4e14\u6f5c\u5728\u5730\u6307\u51fa\u5bf9\u7528\u6237\u8ba4\u77e5\u548c\u51b3\u7b56\u7684\u8d1f\u9762\u5f71\u54cd\uff0c\u4f7f\u5f97\u8bef\u5bfc\u7528\u6237\u63a5\u53d7\u9519\u8bef\u6216\u6709\u504f\u89c1\u7684\u4fe1\u606f\u53d8\u5f97\u66f4\u52a0\u5bb9\u6613\u3002|\n", "2407.13742": "|**2024-07-18**|**CellularLint: A Systematic Approach to Identify Inconsistent Behavior in Cellular Network Specifications**|Mirza Masfiqur Rahman et.al.|[2407.13742](http://arxiv.org/abs/2407.13742)|null|\u8fd1\u5e74\u6765\uff0c\u4eba\u4eec\u8d8a\u6765\u8d8a\u5173\u6ce8\u8702\u7a9d\u7f51\u7edc\u7684\u5b89\u5168\u6027\uff0c\u5e38\u5e38\u5c06\u5b89\u5168\u6f0f\u6d1e\u5f52\u548e\u4e8e\u5e95\u5c42\u534f\u8bae\u8bbe\u8ba1\u63cf\u8ff0\u7684\u95ee\u9898\u3002\u8fd9\u4e9b\u901a\u5e38\u957f\u8fbe\u6570\u5343\u9875\u7684\u8be6\u7ec6\u89c4\u683c\u6587\u6863\u53ef\u80fd\u5305\u542b\u9519\u8bef\u3001\u4e0d\u5b8c\u6574\u63cf\u8ff0\u3001\u9690\u542b\u5047\u8bbe\u548c\u5185\u90e8\u77db\u76fe\u3002\u9274\u4e8e\u6b64\uff0c\u6211\u4eec\u63d0\u51faCellularLint\u2014\u2014\u4e00\u4e2a\u9488\u5bf94G\u548c5G\u975e\u63a5\u5165\u5c42\uff08Non-Access 
Stratum\uff0cNAS\uff09\u548c\u5b89\u5168\u89c4\u8303\u7684\u534a\u81ea\u52a8\u6846\u67b6\uff0c\u5229\u7528\u4e00\u5957\u81ea\u7136\u8bed\u8a00\u5904\u7406\u6280\u672f\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u57fa\u4e8e\u9886\u57df\u9002\u5e94\u7684\u5927\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u6539\u826f\u7684\u5c11\u91cf\u6837\u4f8b\u5b66\u4e60\u3002\u8be5\u6a21\u578b\u9884\u8bad\u7ec3\u5728\u5927\u91cf\u7684\u8702\u7a9d\u7f51\u7edc\u534f\u8bae\u6570\u636e\u4e0a\uff0c\u80fd\u591f\u540c\u65f6\u68c0\u6d4b\u4e0d\u540c\u8bed\u4e49\u5c42\u6b21\u548c\u5b9e\u9645\u4f7f\u7528\u6848\u4f8b\u4e2d\u7684\u4e0d\u4e00\u81f4\u6027\uff0c\u4ee5\u4e00\u79cd\u53ef\u6269\u5c55\u7684\u65b9\u5f0f\u63d0\u5347\u534f\u8bae\u89c4\u683c\u7684\u81ea\u52a8\u5316\u5206\u6790\u3002\u901a\u8fc7\u7814\u7a76\uff0c\u6211\u4eec\u57284G\u548c5G\u7f51\u7edc\u4e2d\u53d1\u73b0\u4e86157\u4e2a\u4e0d\u4e00\u81f4\u70b9\uff0c\u51c6\u786e\u7387\u4e3a82.67%\u3002\u7ecf\u8fc7\u5bf9\u5f00\u6e90\u5b9e\u73b0\u548c17\u6b3e\u5546\u7528\u8bbe\u5907\u7684\u9a8c\u8bc1\uff0c\u6211\u4eec\u786e\u8ba4\u8fd9\u4e9b\u4e0d\u4e00\u81f4\u786e\u5b9e\u5bf9\u8bbe\u8ba1\u51b3\u7b56\u6709\u91cd\u5927\u5f71\u54cd\uff0c\u53ef\u80fd\u5bfc\u81f4\u9690\u79c1\u3001\u5b8c\u6574\u6027\u3001\u53ef\u7528\u6027\u548c\u4e92\u64cd\u4f5c\u6027\u65b9\u9762\u7684\u62c5\u5fe7\u3002|\n", "2407.13729": "|**2024-07-18**|**Baba Is AI: Break the Rules to Beat the Benchmark**|Nathan Cloos et.al.|[2407.13729](http://arxiv.org/abs/2407.13729)|null|\u4eba\u7c7b\u89e3\u51b3\u95ee\u9898\u65e2\u4f9d\u8d56\u4e8e\u9075\u5faa\u73b0\u6709\u89c4\u5219\u548c\u7a0b\u5e8f\uff0c\u4e5f\u4f9d\u8d56\u4e8e\u521b\u65b0\u601d\u7ef4\u6765\u91cd\u65b0\u5b9a\u4e49\u89c4\u5219\u548c\u76ee\u6807\u3002\u4e3a\u4e86\u68c0\u9a8c\u8fd9\u4e9b\u80fd\u529b\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u65b0\u7684\u57fa\u51c6\uff0c\u5b83\u57fa\u4e8e\u6e38\u620f\u300aBaba Is 
You\u300b\u3002\u5728\u8fd9\u4e2a\u6e38\u620f\u4e2d\uff0c\u4ee3\u7406\u9700\u8981\u64cd\u63a7\u73af\u5883\u4e2d\u7684\u7269\u4f53\u548c\u53ef\u79fb\u52a8\u7684\u6587\u5b57\u89c4\u5219\u74f7\u7816\uff0c\u4ee5\u5b9e\u73b0\u7279\u5b9a\u76ee\u6807\u5e76\u8d62\u5f97\u6bd4\u8d5b\u3002\u6211\u4eec\u6d4b\u8bd5\u4e86\u4e09\u79cd\u6700\u5148\u8fdb\u7684\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08OpenAI GPT-4\u3001Google Gemini-1.5-Pro\u548cGemini-1.5-Flash\uff09\uff0c\u53d1\u73b0\u5f53\u9700\u8981\u5bf9\u6e38\u620f\u89c4\u5219\u8fdb\u884c\u64cd\u7eb5\u548c\u7ec4\u5408\u65f6\uff0c\u5b83\u4eec\u7684\u8868\u73b0\u5927\u5e45\u4e0b\u6ed1\u3002|\n", "2407.13717": "|**2024-07-18**|**CoDefeater: Using LLMs To Find Defeaters in Assurance Cases**|Usman Gohar et.al.|[2407.13717](http://arxiv.org/abs/2407.13717)|**[link](https://gitlab.com/anonymousdot/codefeater)**|\u6784\u5efa\u4fdd\u8bc1\u6848\u4f8b\u662f\u4e00\u79cd\u5e38\u7528\u4e14\u6709\u65f6\u5fc5\u8981\u7684\u65b9\u6cd5\uff0c\u7528\u4e8e\u8bc1\u660e\u5b89\u5168\u5173\u952e\u7cfb\u7edf\u5728\u5176\u89c4\u5212\u73af\u5883\u4e2d\u5c06\u5b89\u5168\u8fd0\u884c\u3002\u4e3a\u4e86\u964d\u4f4e\u9519\u8bef\u548c\u8fb9\u7f18\u60c5\u51b5\u9057\u6f0f\u7684\u98ce\u9669\uff0c\u5f15\u5165\u4e86\u201c\u53cd\u9a73\u201d\u6982\u5ff5\uff0c\u5373\u6311\u6218\u4fdd\u8bc1\u6848\u4f8b\u4e2d\u8bba\u70b9\u6216\u8bc1\u636e\u7684\u8bba\u636e\u3002\u53cd\u9a73\u6709\u52a9\u4e8e\u53ca\u65f6\u53d1\u73b0\u8bba\u70b9\u4e2d\u7684\u5f31\u70b9\uff0c\u4fc3\u4f7f\u8fdb\u4e00\u6b65\u8c03\u67e5\u548c\u53ca\u65f6\u8865\u6551\u3002\u7136\u800c\uff0c\u6355\u6349\u53cd\u9a73\u4f9d\u8d56\u4e8e\u4e13\u5bb6\u5224\u65ad\u3001\u7ecf\u9a8c\u548c\u521b\u65b0\u601d\u7ef4\uff0c\u5e76\u4e14\u5fc5\u987b\u968f\u7740\u9700\u6c42\u548c\u6cd5\u89c4\u7684\u53d8\u5316\u8fdb\u884c\u8fed\u4ee3\u3002\u8fd9\u7bc7\u8bba\u6587\u63d0\u51faCoDefeater\uff0c\u4e00\u79cd\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u6765\u81ea\u52a8\u5bfb\u627e\u53cd\u9a73\u7684\u8
1ea\u52a8\u5316\u8fc7\u7a0b\u3002\u521d\u6b65\u7ed3\u679c\u8868\u660e\uff0cLLMs\u80fd\u591f\u6709\u6548\u5730\u627e\u5230\u5df2\u77e5\u548c\u672a\u77e5\u7684\u5408\u7406\u53cd\u9a73\uff0c\u4ece\u800c\u5e2e\u52a9\u5b89\u5168\u5206\u6790\u5e08\u589e\u5f3a\u4fdd\u8bc1\u6848\u4f8b\u7684\u5b8c\u6574\u6027\u548c\u4fe1\u5fc3\u3002|\n", "2407.13709": "|**2024-07-18**|**Understanding Reference Policies in Direct Preference Optimization**|Yixin Liu et.al.|[2407.13709](http://arxiv.org/abs/2407.13709)|**[link](https://github.com/yale-nlp/refdpo)**|## \u80cc\u666f \u76f4\u63a5\u504f\u597d\u4f18\u5316\uff08Direct Preference Optimization\uff0c\u7b80\u79f0 DPO\uff09\u5df2\u6210\u4e3a\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08Large Language Models\uff0cLLMs\uff09\u6307\u4ee4\u5fae\u8c03\u7684\u5e38\u7528\u8bad\u7ec3\u65b9\u6cd5\u3002\u672c\u7814\u7a76\u5173\u6ce8DPO\u7684\u4e00\u4e2a\u672a\u5145\u5206\u63a2\u8ba8\u7684\u65b9\u9762\uff1a\u5176\u5bf9\u53c2\u8003\u6a21\u578b\u6216\u7b56\u7565\u7684\u4f9d\u8d56\u6027\u3002\u8fd9\u4e9b\u53c2\u8003\u7b56\u7565\u901a\u5e38\u8868\u73b0\u4e3a\u5f85\u8fdb\u4e00\u6b65\u5fae\u8c03\u7684\u6a21\u578b\uff0c\u5b83\u4eec\u5bf9\u4e8eDPO\u7684\u6548\u679c\u81f3\u5173\u91cd\u8981\u3002\u56e0\u6b64\uff0c\u672c\u5de5\u4f5c\u9488\u5bf9\u4ee5\u4e0b\u4e09\u4e2a\u76f8\u5173\u95ee\u9898\u8fdb\u884c\u4e86\u63a2\u7a76\uff1a 1. \u9996\u5148\uff0c\u6211\u4eec\u7814\u7a76\u4e86DPO\u4e2d\u7684KL\u6563\u5ea6\u7ea6\u675f\u5f3a\u5ea6\u7684\u6700\u4f73\u9009\u62e9\uff0c\u8be5\u7ea6\u675f\u60e9\u7f5a\u4e0e\u53c2\u8003\u7b56\u7565\u7684\u504f\u79bb\uff0c\u53d1\u73b0DPO\u5bf9\u6b64\u654f\u611f\u3002 2. \u5176\u6b21\uff0c\u6211\u4eec\u4ece\u7406\u8bba\u548c\u5b9e\u8bc1\u4e0a\u6bd4\u8f83\u4e86DPO\u4e0e\u5176\u4ed6\u5b66\u4e60\u76ee\u6807\uff0c\u4ee5\u63a2\u8ba8\u53c2\u8003\u7b56\u7565\u5728\u6307\u4ee4\u5fae\u8c03\u4e2d\u7684\u5fc5\u8981\u6027\uff0c\u5e76\u663e\u793a\u4e86DPO\u7684\u4f18\u52bf\u3002 3. 
\u6700\u540e\uff0c\u6211\u4eec\u63a2\u8ba8\u4e86\u66f4\u5f3a\u7684\u53c2\u8003\u7b56\u7565\u662f\u5426\u6709\u5229\u4e8eDPO\uff0c\u7ed3\u679c\u8868\u660e\uff0c\u5f53\u53c2\u8003\u7b56\u7565\u4e0e\u88ab\u5fae\u8c03\u6a21\u578b\u76f8\u4f3c\u65f6\uff0c\u66f4\u5f3a\u7684\u53c2\u8003\u7b56\u7565\u53ef\u80fd\u4f1a\u63d0\u9ad8\u6027\u80fd\u3002 \u6211\u4eec\u7684\u53d1\u73b0\u63ed\u793a\u4e86\u53c2\u8003\u7b56\u7565\u5728DPO\u4e2d\u7684\u6df7\u6dc6\u4f5c\u7528\uff0c\u63d0\u4f9b\u4e86\u6700\u4f73\u5b9e\u8df5\u7684\u89c1\u89e3\uff0c\u540c\u65f6\u4e5f\u4e3a\u672a\u6765\u7814\u7a76\u63d0\u51fa\u4e86\u5f00\u653e\u6027\u95ee\u9898\u3002|\n", "2407.13699": "|**2024-07-18**|**A Comprehensive Review of Recommender Systems: Transitioning from Theory to Practice**|Shaina Raza et.al.|[2407.13699](http://arxiv.org/abs/2407.13699)|null|## \u80cc\u666f \u63a8\u8350\u7cfb\u7edf\uff08RS\uff09\u901a\u8fc7\u63d0\u4f9b\u4e2a\u6027\u5316\u9879\u76ee\u5efa\u8bae\uff0c\u5bf9\u63d0\u5347\u7528\u6237\u4f53\u9a8c\u81f3\u5173\u91cd\u8981\u3002\u672c\u7efc\u8ff0\u56de\u987e\u4e86\u4ece2017\u5e74\u81f32024\u5e74\u95f4RS\u9886\u57df\u7684\u8fdb\u5c55\uff0c\u5c06\u7406\u8bba\u521b\u65b0\u4e0e\u5b9e\u9645\u5e94\u7528\u7d27\u5bc6\u7ed3\u5408\u3002\u6211\u4eec\u63a2\u8ba8\u4e86\u4ece\u4f20\u7edf\u65b9\u6cd5\u5982\u57fa\u4e8e\u5185\u5bb9\u548c\u534f\u540c\u8fc7\u6ee4\u7684\u63a8\u8350\uff0c\u5230\u9ad8\u7ea7\u6280\u672f\u5982\u6df1\u5ea6\u5b66\u4e60\u3001\u56fe\u6a21\u578b\u3001\u5f3a\u5316\u5b66\u4e60\u4ee5\u53ca\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u53d1\u5c55\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5173\u6ce8\u4e86\u4e13\u95e8\u5316\u7684\u7cfb\u7edf\uff0c\u5982\u4e0a\u4e0b\u6587\u611f\u77e5\u3001\u8bc4\u8bba\u9a71\u52a8\u548c\u516c\u5e73\u6027\u8003\u91cf\u7684RS\u3002\u672c\u8c03\u67e5\u7684\u76ee\u6807\u662f\u8fde\u63a5\u7406\u8bba\u4e0e\u5b9e\u8df5\uff0c\u5173\u6ce8\u7535\u5b50\u5546\u52a1\u3001\u533b\u7597\u4fdd\u5065\u548c\u91d1\u878d\u7b49\u9886\u57df\u7684\u6311\u6218\uff0c\u5f3a\u8c03\u5
bf9\u53ef\u6269\u5c55\u3001\u5b9e\u65f6\u4e14\u503c\u5f97\u4fe1\u8d56\u89e3\u51b3\u65b9\u6848\u7684\u9700\u6c42\u3002\u901a\u8fc7\u6b64\u7efc\u8ff0\uff0c\u6211\u4eec\u9f13\u52b1\u5b66\u672f\u7814\u7a76\u4e0e\u884c\u4e1a\u5b9e\u8df5\u7684\u7d27\u5bc6\u5408\u4f5c\u3002\u672c\u7814\u7a76\u63d0\u4f9b\u7684\u6d1e\u89c1\u65e8\u5728\u5e2e\u52a9\u4e1a\u754c\u4e13\u4e1a\u4eba\u5458\u4f18\u5316RS\u90e8\u7f72\uff0c\u5e76\u6fc0\u53d1\u672a\u6765\u7814\u7a76\u7684\u65b0\u65b9\u5411\uff0c\u7279\u522b\u662f\u5728\u5e94\u5bf9\u65b0\u5174\u6280\u672f\u548c\u793e\u4f1a\u8d8b\u52bf\u65f6\u3002|\n", "2407.13692": "|**2024-07-18**|**Prover-Verifier Games improve legibility of LLM outputs**|Jan Hendrik Kirchner et.al.|[2407.13692](http://arxiv.org/abs/2407.13692)|null|\u4e3a\u4e86\u63d0\u9ad8\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u8f93\u51fa\u7ed3\u679c\u7684\u53ef\u4fe1\u5ea6\uff0c\u4e00\u4e2a\u65b9\u6cd5\u662f\u652f\u6301\u6e05\u6670\u6613\u9a8c\u8bc1\u7684\u63a8\u7406\uff0c\u6211\u4eec\u79f0\u4e4b\u4e3a\u53ef\u8bfb\u6027\u3002\u672c\u6587\u4ee5\u89e3\u51b3\u5c0f\u5b66\u6570\u5b66\u95ee\u9898\u4e3a\u80cc\u666f\uff0c\u7814\u7a76\u4e86\u53ef\u8bfb\u6027\uff0c\u5e76\u53d1\u73b0\u4ec5\u4f18\u5316\u8fde\u8d2f\u601d\u7ef4\u89e3\u9898\u7684\u51c6\u786e\u6027\u53ef\u80fd\u4f1a\u964d\u4f4e\u5176\u53ef\u8bfb\u6027\u3002\u4e3a\u7f13\u89e3\u8fd9\u4e00\u635f\u5931\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u53d7Anil\u7b49\u4eba\uff082021\uff09\u7684\u8bc1\u660e\u5668-\u9a8c\u8bc1\u5668\u6e38\u620f\u542f\u53d1\u7684\u8bad\u7ec3\u7b97\u6cd5\u3002\u8be5\u7b97\u6cd5\u8fed\u4ee3\u5730\u8bad\u7ec3\u5c0f\u578b\u9a8c\u8bc1\u5668\u9884\u6d4b\u89e3\u9898\u6b63\u786e\u6027\uff0c\"\u6709\u5e2e\u52a9\"\u7684\u8bc1\u660e\u5668\u751f\u6210\u9a8c\u8bc1\u5668\u63a5\u53d7\u7684\u6b63\u786e\u89e3\u7b54\uff0c\u4ee5\u53ca\"\u72e1\u733e\"\u7684\u8bc1\u660e\u5668\u751f\u6210\u6b3a\u9a97\u9a8c\u8bc1\u5668\u7684\u9519\u8bef\u89e3\u7b54\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u6709\u5e2e\u52a9\u8bc1\u660e\u56
68\u7684\u51c6\u786e\u6027\u548c\u9a8c\u8bc1\u5668\u5bf9\u6297\u653b\u51fb\u7684\u9c81\u68d2\u6027\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u63d0\u9ad8\u3002\u6b64\u5916\uff0c\u6211\u4eec\u53d1\u73b0\uff0c\u9488\u5bf9\u5c0f\u578b\u9a8c\u8bc1\u5668\u7684\u53ef\u8bfb\u6027\u8bad\u7ec3\u80fd\u591f\u8f6c\u79fb\u7ed9\u65f6\u95f4\u6709\u9650\u7684\u4eba\u7c7b\uff0c\u4ed6\u4eec\u5728\u9a8c\u8bc1\u89e3\u51b3\u65b9\u6848\u6b63\u786e\u6027\u65f6\u7684\u51c6\u786e\u6027\u4f1a\u968f\u7740\u8bad\u7ec3\u63d0\u9ad8\uff0c\u800c\u5728\u9a8c\u8bc1\u72e1\u733e\u8bc1\u660e\u5668\u7684\u89e3\u51b3\u65b9\u6848\u65f6\u4f1a\u4e0b\u964d\u3002\u56e0\u6b64\uff0c\u901a\u8fc7\u5c0f\u578b\u9a8c\u8bc1\u5668\u8fdb\u884c\u53ef\u8bfb\u6027\u8bad\u7ec3\u53ef\u80fd\u662f\u4e00\u79cd\u5b9e\u9645\u53ef\u884c\u7684\u65b9\u6cd5\uff0c\u7528\u4e8e\u63d0\u5347\u5927\u578bLLMs\u5bf9\u4eba\u7c7b\u7684\u53ef\u8bfb\u6027\uff0c\u4ece\u800c\u6709\u52a9\u4e8e\u8d85\u7ea7\u4eba\u7c7b\u6a21\u578b\u7684\u5bf9\u9f50\u3002\u6211\u4eec\u7684\u7ed3\u679c\u8868\u660e\uff0c\u5bf9\u5c0f\u578b\u9a8c\u8bc1\u5668\u7684\u53ef\u8bfb\u6027\u8bad\u7ec3\u662f\u4e00\u4e2a\u5b9e\u7528\u7684\u9014\u5f84\uff0c\u53ef\u4ee5\u589e\u5f3a\u5927\u578bLLMs\u7684\u53ef\u8bfb\u6027\uff0c\u5bf9\u4eba\u7c7b\u6765\u8bf4\u66f4\u6613\u4e8e\u7406\u89e3\u548c\u4fe1\u4efb\u3002|\n", "2407.13648": "|**2024-07-18**|**COMCAT: Leveraging Human Judgment to Improve Automatic Documentation and Summarization**|Skyler Grandel 
et.al.|[2407.13648](http://arxiv.org/abs/2407.13648)|null|\u8fd9\u7bc7\u8bba\u6587\u4e3b\u8981\u63a2\u8ba8\u4e86\u8f6f\u4ef6\u7ef4\u62a4\u4e2d\u4ee3\u7801\u7406\u89e3\u7684\u91cd\u8981\u6027\uff0c\u4ee5\u53ca\u5982\u4f55\u901a\u8fc7\u81ea\u52a8\u5316\u751f\u6210\u6ce8\u91ca\u6765\u63d0\u5347\u8fd9\u4e00\u8fc7\u7a0b\u3002\u4f5c\u8005\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aCOMCAT\u7684\u65b9\u6cd5\uff0c\u5b83\u7ed3\u5408\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e0e\u9886\u57df\u4e13\u5bb6\u6307\u5bfc\uff0c\u65e8\u5728\u4e3a\u6e90\u4ee3\u7801\u63d0\u4f9b\u6709\u52a9\u4e8e\u7406\u89e3\u7684\u6ce8\u91ca\u3002COMCAT\u6d41\u7a0b\u5305\u62ec\u81ea\u52a8\u8bc6\u522b\u4ee3\u7801\u4e2d\u9002\u5408\u6dfb\u52a0\u6ce8\u91ca\u7684\u4f4d\u7f6e\u3001\u9884\u6d4b\u6bcf\u4e2a\u4f4d\u7f6e\u6700\u9002\u5408\u7684\u6ce8\u91ca\u7c7b\u578b\uff0c\u5e76\u6839\u636e\u9009\u5b9a\u4f4d\u7f6e\u548c\u7c7b\u578b\u751f\u6210\u6ce8\u91ca\u3002\u5728\u4eba\u7c7b\u53d7\u8bd5\u8005\u7684\u7814\u7a76\u4e2d\uff0c\u7ed3\u679c\u663e\u793aCOMCAT\u751f\u6210\u7684\u6ce8\u91ca\u663e\u8457\u63d0\u9ad8\u4e86\u5f00\u53d1\u4eba\u5458\u5728\u4e09\u4e2a\u5178\u578b\u8f6f\u4ef6\u5de5\u7a0b\u4efb\u52a1\u4e2d\u7684\u4ee3\u7801\u7406\u89e3\u80fd\u529b\uff0c\u5bf9\u4e8e87%\u7684\u53c2\u4e0e\u8005\uff0c\u63d0\u5347\u5e45\u5ea6\u8fbe\u523012%\u3002\u6b64\u5916\uff0c\u7814\u7a76\u8fd8\u8868\u660eCOMCAT\u751f\u6210\u7684\u6ce8\u91ca\u5728\u51c6\u786e\u6027\u3001\u53ef\u8bfb\u6027\u4e0a\u81f3\u5c11\u4e0e\u4eba\u5de5\u6ce8\u91ca\u76f8\u5f53\uff0c\u5e76\u4e14\u572892%\u7684\u4ee3\u7801\u7247\u6bb5\u4e2d\uff0c\u5f00\u53d1\u8005\u66f4\u504f\u597dCOMCAT\u751f\u6210\u7684\u6ce8\u91ca\uff0c\u800c\u975e\u6807\u51c6\u7684ChatGPT\u751f\u6210\u7684\u6ce8\u91ca\u3002\u8bba\u6587\u8fd8\u4ecb\u7ecd\u4e86\u5f00\u53d1\u5e76\u516c\u5f00\u4e86\u4e00\u4e2a\u5305\u542b\u6e90\u4ee3\u7801\u7247\u6bb5\u3001\u4eba\u5de5\u7f16\u5199\u6ce8\u91ca\u548c\u6807\u6ce8\u7684\u7c7b\u522b\u6570\u636e\u96c6\u3002\u603b\u7684\u6765\u8bf4\uff0cCO
MCAT\u5229\u7528LLMs\u5728\u591a\u79cd\u8f6f\u4ef6\u5de5\u7a0b\u4efb\u52a1\u4e2d\u663e\u8457\u63d0\u5347\u4e86\u4ee3\u7801\u7406\u89e3\u6c34\u5e73\u3002|\n", "2407.13647": "|**2024-07-18**|**Weak-to-Strong Reasoning**|Yuqing Yang et.al.|[2407.13647](http://arxiv.org/abs/2407.13647)|**[link](https://github.com/gair-nlp/weak-to-strong-reasoning)**|\u5f53\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u6027\u80fd\u8d85\u8d8a\u4eba\u7c7b\u65f6\uff0c\u4e3a\u5176\u63d0\u4f9b\u5168\u9762\u800c\u7cbe\u786e\u7684\u76d1\u7763\u53d8\u5f97\u56f0\u96be\u3002\u5728\u8fd9\u79cd\u60c5\u51b5\u4e0b\uff0c\u5f31\u5230\u5f3a\u5b66\u4e60\u65b9\u6cd5\uff0c\u5373\u5229\u7528\u80fd\u529b\u8f83\u5f31\u7684\u6a21\u578b\u6fc0\u53d1\u8f83\u5f3a\u6a21\u578b\u7684\u6f5c\u5728\u80fd\u529b\uff0c\u663e\u793a\u51fa\u4ef7\u503c\u3002\u7136\u800c\uff0c\u8fd9\u79cd\u7b56\u7565\u5728\u5904\u7406\u590d\u6742\u63a8\u7406\u4efb\u52a1\u65f6\u7684\u6548\u679c\u5c1a\u672a\u5f97\u5230\u5145\u5206\u68c0\u9a8c\uff0c\u4e14\u5f53\u524d\u7f3a\u4e4f\u6709\u6548\u7684\u65b9\u6cd5\u6765\u907f\u514d\u6a21\u578b\u76f2\u76ee\u6a21\u4eff\u5f31\u5bfc\u5e08\uff0c\u5305\u62ec\u5176\u9519\u8bef\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u6e10\u8fdb\u5b66\u4e60\u6846\u67b6\uff0c\u4f7f\u5f3a\u6a21\u578b\u80fd\u591f\u81ea\u4e3b\u4f18\u5316\u5176\u8bad\u7ec3\u6570\u636e\uff0c\u65e0\u9700\u4f9d\u8d56\u9ad8\u7ea7\u6a21\u578b\u6216\u4eba\u5de5\u6807\u6ce8\u7684\u6570\u636e\u3002\u8be5\u6846\u67b6\u9996\u5148\u5bf9\u9009\u5b9a\u7684\u5c0f\u800c\u9ad8\u8d28\u91cf\u6570\u636e\u8fdb\u884c\u76d1\u7763\u5fae\u8c03\uff0c\u7136\u540e\u5728\u5f3a\u6a21\u578b\u81ea\u884c\u8bc6\u522b\u7684\u5bf9\u6bd4\u6837\u672c\u4e0a\u8fdb\u884c\u504f\u597d\u4f18\u5316\u3002\u6211\u4eec\u5728GSM8K\u548cMATH\u6570\u636e\u96c6\u4e0a\u7684\u5927\u91cf\u5b9e\u9a8c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u663e\u8457\u63d0\u5347\u4e86Llama2-70b\u7684\u63a8\u7406\u80fd\u529b\uff0c\u901a\u8fc7\u4e09\u79cd\u4e0d\u540c\u7684\u5f31\u6a21\u578b\u8
fdb\u884c\u9a8c\u8bc1\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5728\u524d\u77bb\u6027\u7684\u5b9e\u9a8c\u8bbe\u7f6e\u4e2d\u9a8c\u8bc1\u4e86\u8fd9\u79cd\u65b9\u6cd5\uff0cLlama3-8b-instruct\u6210\u529f\u6307\u5bfcLlama3-70b\u5728\u6781\u5177\u6311\u6218\u6027\u7684OlympicArena\u6570\u636e\u96c6\u4e0a\u3002\u8fd9\u9879\u5de5\u4f5c\u4e3a\u63d0\u5347\u4eba\u5de5\u667a\u80fd\u7684\u63a8\u7406\u80fd\u529b\u63d0\u4f9b\u4e86\u4e00\u79cd\u66f4\u53ef\u6269\u5c55\u548c\u9ad8\u7ea7\u7684\u7b56\u7565\u3002\u6240\u6709\u76f8\u5173\u4ee3\u7801\u548c\u8d44\u6e90\u53ef\u5728\u83b7\u53d6\u3002|\n", "2407.14507": "|**2024-07-19**|**Internal Consistency and Self-Feedback in Large Language Models: A Survey**|Xun Liang et.al.|[2407.14507](http://arxiv.org/abs/2407.14507)|**[link](https://github.com/iaar-shanghai/icsfsurvey)**|**\u672c\u6587\u603b\u7ed3\u4e86\u4e00\u4e2a\u7406\u8bba\u6846\u67b6\uff0c\u79f0\u4e3a\u5185\u90e8\u4e00\u81f4\u6027\uff08Internal Consistency\uff09\uff0c\u5b83\u4e3a\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u63a8\u7406\u4e0d\u8db3\u548c\u751f\u6210\u5e7b\u89c9\u5185\u5bb9\u7b49\u95ee\u9898\u4e0a\u7684\u8868\u73b0\u63d0\u4f9b\u4e86\u4e00\u81f4\u7684\u89e3\u91ca\u3002\u5185\u90e8\u4e00\u81f4\u6027\u8bc4\u4f30\u4e86LLM\u7684\u6f5c\u5728\u5c42\u3001\u89e3\u7801\u5c42\u548c\u54cd\u5e94\u5c42\u4e4b\u95f4\u7684\u5185\u5728\u4e00\u81f4\u6027\uff0c\u57fa\u4e8e\u91c7\u6837\u65b9\u6cd5\u3002 \u5728\u6b64\u57fa\u7840\u4e0a\uff0c\u6211\u4eec\u5f15\u5165\u4e86Self-Feedback\u6846\u67b6\uff0c\u8fd9\u662f\u4e00\u4e2a\u7b80\u6d01\u800c\u6709\u6548\u7684\u7406\u8bba\u6846\u67b6\uff0c\u7528\u4e8e\u6316\u6398\u5185\u90e8\u4e00\u81f4\u6027\u7684\u4fe1\u606f\u3002Self-Feedback\u6846\u67b6\u5305\u62ec\u4e24\u4e2a\u6a21\u5757\uff1a\u81ea\u6211\u8bc4\u4f30\uff08Self-Evaluation\uff09\u548c\u81ea\u6211\u66f4\u65b0\uff08Self-Update\uff09\u3002 
\u6211\u4eec\u7cfb\u7edf\u5730\u6309\u4efb\u52a1\u548c\u7814\u7a76\u65b9\u5411\u5bf9\u8fd9\u4e9b\u7814\u7a76\u8fdb\u884c\u4e86\u5206\u7c7b\uff1b\u603b\u7ed3\u4e86\u76f8\u5173\u7684\u8bc4\u4f30\u65b9\u6cd5\u548c\u57fa\u51c6\uff1b\u6df1\u5165\u63a2\u8ba8\u4e86\u201cSelf-Feedback\u771f\u7684\u6709\u6548\u5417\uff1f\u201d\u8fd9\u4e00\u95ee\u9898\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u51e0\u4e2a\u5173\u952e\u89c2\u70b9\uff0c\u5305\u62ec\u201c\u5185\u90e8\u4e00\u81f4\u6027\u7684\u53d1\u5c55\u949f\u697c\u201d\u3001\u201c\u4e00\u81f4\u6027\u51e0\u4e4e\u662f\u6b63\u786e\u6027\u201d\u7684\u5047\u8bbe\u4ee5\u53ca\u201c\u6f5c\u610f\u8bc6\u4e0e\u663e\u5f0f\u63a8\u7406\u6096\u8bba\u201d\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u6982\u8ff0\u4e86\u672a\u6765\u7814\u7a76\u7684\u6709\u524d\u666f\u7684\u65b9\u5411\u3002 \u6211\u4eec\u5df2\u7ecf\u5f00\u6e90\u4e86\u5b9e\u9a8c\u4ee3\u7801\u3001\u53c2\u8003\u5217\u8868\u548c\u7edf\u8ba1\u6570\u636e\uff0c\u4f9b\u516c\u4f17\u8bbf\u95ee\uff0c\u94fe\u63a5\u4e3a\uff1a[https://github.com/IAAR-Shanghai/ICSFSurvey](https://github.com/IAAR-Shanghai/ICSFSurvey)**|\n", "2407.14506": "|**2024-07-19**|**On Pre-training of Multimodal Language Models Customized for Chart Understanding**|Wan-Cyuan Fan 
et.al.|[2407.14506](http://arxiv.org/abs/2407.14506)|null|\u8fd1\u671f\u7684\u7814\u7a76\u5728\u9488\u5bf9\u7279\u5b9a\u9886\u57df\u4efb\u52a1\u5b9a\u5236\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u65b9\u9762\u53d6\u5f97\u4e86\u4ee4\u4eba\u9f13\u821e\u7684\u6210\u679c\uff0c\u7279\u522b\u662f\u5728\u79d1\u5b66\u56fe\u8868\u7406\u89e3\u9886\u57df\u3002\u8fd9\u4e9b\u7814\u7a76\u901a\u5e38\u901a\u8fc7\u4f7f\u7528\u4e13\u95e8\u7684\u6570\u636e\u96c6\u8fdb\u884c\u89c6\u89c9\u6307\u4ee4\u8c03\u4f18\u6765\u589e\u5f3a\u95ee\u7b54\uff08QA\uff09\u51c6\u786e\u6027\u3002\u7136\u800c\uff0c\u5b83\u4eec\u5f80\u5f80\u5ffd\u89c6\u4e86\u81ea\u7136\u56fe\u50cf-\u63cf\u8ff0\u9884\u8bad\u7ec3\u6570\u636e\u4e0e\u6570\u5b57\u56fe\u8868\u56fe\u50cf-QA\u6570\u636e\u4e4b\u95f4\u7684\u57fa\u672c\u5dee\u5f02\uff0c\u7279\u522b\u662f\u5bf9\u4e8e\u6a21\u578b\u4ece\u56fe\u8868\u4e2d\u63d0\u53d6\u6f5c\u5728\u6570\u503c\u7684\u80fd\u529b\u3002\u672c\u6587\u65e8\u5728\u89e3\u51b3\u8fd9\u4e00\u758f\u6f0f\uff0c\u63a2\u7d22\u6539\u8fdbMLLMs\u5bf9\u56fe\u8868\u7406\u89e3\u6240\u9700\u7684\u5173\u952e\u8bad\u7ec3\u8fc7\u7a0b\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e09\u4e2a\u5173\u952e\u53d1\u73b0\uff1a\uff081\uff09\u5728\u5bf9\u9f50\u9884\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u878d\u5165\u539f\u59cb\u6570\u636e\u503c\u663e\u8457\u63d0\u9ad8\u4e86\u5bf9\u56fe\u8868\u6570\u636e\u7684\u7406\u89e3\u80fd\u529b\u3002\uff082\uff09\u5728\u7aef\u5230\u7aef\u5fae\u8c03\u8fc7\u7a0b\u4e2d\u968f\u673a\u66ff\u6362\u56fe\u50cf\u4e3a\u6587\u672c\u8868\u793a\uff0c\u80fd\u591f\u5c06\u8bed\u8a00\u63a8\u7406\u80fd\u529b\u8f6c\u79fb\u5230\u56fe\u8868\u89e3\u91ca\u6280\u80fd\u4e0a\u3002\uff083\uff09\u8981\u6c42\u6a21\u578b\u9996\u5148\u63d0\u53d6\u5e95\u5c42\u56fe\u8868\u6570\u636e\uff0c\u7136\u540e\u5728\u5fae\u8c03\u8fc7\u7a0b\u4e2d\u56de\u7b54\u95ee\u9898\uff0c\u53ef\u4ee5\u8fdb\u4e00\u6b65\u63d0\u9ad8\u51c6\u786e\u6027\u3002 
\u56e0\u6b64\uff0c\u6211\u4eec\u5f15\u5165\u4e86CHOPINLLM\uff0c\u4e00\u79cd\u4e13\u4e3a\u6df1\u5165\u56fe\u8868\u7406\u89e3\u5b9a\u5236\u7684MLLM\u3002CHOPINLLM\u6709\u6548\u5730\u89e3\u6790\u5404\u79cd\u7c7b\u578b\u7684\u56fe\u8868\uff0c\u5305\u62ec\u672a\u6807\u6ce8\u7684\u56fe\u8868\uff0c\u540c\u65f6\u4fdd\u6301\u4e86\u5f3a\u5927\u7684\u63a8\u7406\u80fd\u529b\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5efa\u7acb\u4e86\u4e00\u4e2a\u65b0\u7684\u57fa\u51c6\uff0c\u7528\u4e8e\u8bc4\u4f30MLLMs\u5728\u4e0d\u540c\u56fe\u8868\u7c7b\u578b\u548c\u7406\u89e3\u6c34\u5e73\u4e0a\u7684\u7406\u89e3\u80fd\u529b\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cCHOPINLLM\u5728\u7406\u89e3\u5404\u79cd\u7c7b\u578b\u3001\u5e26\u6709\u6807\u6ce8\u548c\u672a\u6807\u6ce8\u7684\u56fe\u8868\u65b9\u9762\u8868\u73b0\u51fa\u5f3a\u5927\u7684\u6027\u80fd\u3002|\n", "2407.14487": "|**2024-07-19**|**Evaluating the Reliability of Self-Explanations in Large Language Models**|Korbinian Randl et.al.|[2407.14487](http://arxiv.org/abs/2407.14487)|**[link](https://github.com/k-randl/self-explaining_llms)**|**\u672c\u6587\u63a2\u8ba8\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u88ab\u63d0\u793a\u89e3\u91ca\u5176\u5148\u524d\u8f93\u51fa\u65f6\u751f\u6210\u7684\u89e3\u91ca\u53ef\u9760\u6027\u3002\u6211\u4eec\u5229\u7528\u4e09\u79cd\u5148\u8fdb\u7684\u5927\u8bed\u8a00\u6a21\u578b\uff08\u53c2\u6570\u4ece2B\u52308B\uff09\u5728\u4e24\u79cd\u4e0d\u540c\u7684\u5206\u7c7b\u4efb\u52a1\uff08\u5ba2\u89c2\u548c\u4e3b\u89c2\uff09\u4e0a\u8bc4\u4f30\u4e86\u4e24\u79cd\u7c7b\u578b\u7684\u81ea\u6211\u89e3\u91ca\u2014\u2014\u62bd\u53d6\u5f0f\u548c\u53cd\u4e8b\u5b9e\u5f0f\u3002\u6211\u4eec\u7684\u53d1\u73b0\u8868\u660e\uff0c\u5c3d\u7ba1\u8fd9\u4e9b\u81ea\u6211\u89e3\u91ca\u4e0e\u4eba\u7c7b\u5224\u65ad\u76f8\u5173\u8054\uff0c\u4f46\u5b83\u4eec\u5e76\u4e0d\u5b8c\u5168\u4e14\u51c6\u786e\u5730\u9075\u5faa\u6a21\u578b\u7684\u51b3\u7b56\u8fc7\u7a0b\uff0c\u6307\u51fa\u4e86\u4e00\u79cd\u611f\u77e5\u4e0e\u5b9e\u9645\u6a21\u578b\u63a8\u740
6\u4e4b\u95f4\u7684\u5dee\u8ddd\u3002\u6211\u4eec\u663e\u793a\uff0c\u901a\u8fc7\u63d0\u793a\u5927\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u53cd\u4e8b\u5b9e\u89e3\u91ca\uff0c\u53ef\u4ee5\u4ea7\u751f\u5fe0\u5b9e\u3001\u4fe1\u606f\u4e30\u5bcc\u4e14\u6613\u4e8e\u9a8c\u8bc1\u7684\u7ed3\u679c\u3002\u8fd9\u4e9b\u53cd\u4e8b\u5b9e\u4e3a\u4f20\u7edf\u53ef\u89e3\u91ca\u6027\u65b9\u6cd5\uff08\u4f8b\u5982SHAP\u3001LIME\uff09\u63d0\u4f9b\u4e86\u6709\u524d\u666f\u7684\u66ff\u4ee3\u65b9\u6848\uff0c\u524d\u63d0\u662f\u5bf9\u7279\u5b9a\u4efb\u52a1\u5b9a\u5236\u63d0\u793a\u5e76\u68c0\u67e5\u5176\u6709\u6548\u6027\u3002**|\n", "2407.14474": "|**2024-07-19**|**Contrastive Learning with Counterfactual Explanations for Radiology Report Generation**|Mingjie Li et.al.|[2407.14474](http://arxiv.org/abs/2407.14474)|null|\u7531\u4e8e\u89e3\u5256\u5b66\u7684\u5e38\u89c1\u5185\u5bb9\u548c\u4e0e\u4e4b\u5bf9\u5e94\u7684\u5f71\u50cf\u5b66\u56fe\u50cf\u4e4b\u95f4\u7684\u9ad8\u5ea6\u76f8\u4f3c\u6027\uff0c\u8fd9\u79cd\u56fa\u6709\u7684\u6570\u636e\u504f\u89c1\u53ef\u80fd\u5bfc\u81f4\u81ea\u52a8\u62a5\u544a\u751f\u6210\u6a21\u578b\u5b66\u4e60\u7ea0\u7f20\u548c\u76f8\u5173\u6027\u589e\u5f3a\u7684\u8868\u793a\uff0c\u4ece\u800c\u4ea7\u751f\u8bef\u8bca\u62a5\u544a\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u201cCo\u201dunter\u201cF\u201dactual 
\u201cE\u201dxplanations\uff08CoFE\uff09\u6846\u67b6\u7528\u4e8e\u653e\u5c04\u5b66\u62a5\u544a\u751f\u6210\u3002\u53cd\u4e8b\u5b9e\u89e3\u91ca\u662f\u4e00\u79cd\u5f3a\u5927\u7684\u5de5\u5177\uff0c\u7528\u4e8e\u7406\u89e3\u7b97\u6cd5\u51b3\u7b56\u5982\u4f55\u901a\u8fc7\u63d0\u51fa\u201c\u5982\u679c\u201d\u573a\u666f\u800c\u88ab\u6539\u53d8\u3002\u901a\u8fc7\u5229\u7528\u8fd9\u4e00\u6982\u5ff5\uff0cCoFE\u53ef\u4ee5\u901a\u8fc7\u5bf9\u6bd4\u6b63\u4f8b\u548c\u8d1f\u4f8b\u4e4b\u95f4\u7684\u8868\u793a\u6765\u5b66\u4e60\u975e\u76f8\u5173\u6027\u89c6\u89c9\u8868\u793a\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u901a\u8fc7\u5728\u6b63\u4f8b\u548c\u8d1f\u4f8b\u4e4b\u95f4\u4ea4\u6362\u8865\u4e01\u76f4\u5230\u9884\u6d4b\u8bca\u65ad\u53d1\u751f\u53d8\u5316\uff0c\u6211\u4eec\u63a8\u5bfc\u51fa\u53cd\u4e8b\u5b9e\u56fe\u50cf\u3002\u5728\u8fd9\u91cc\uff0c\u6b63\u4f8b\u548c\u8d1f\u4f8b\u662f\u8bed\u4e49\u4e0a\u6700\u76f8\u4f3c\u7684\uff0c\u4f46\u5177\u6709\u4e0d\u540c\u7684\u8bca\u65ad\u6807\u7b7e\u3002\u6b64\u5916\uff0cCoFE\u91c7\u7528\u53ef\u5b66\u4e60\u63d0\u793a\u9ad8\u6548\u5730\u5bf9\u9884\u8bad\u7ec3\u7684\u5927\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u5fae\u8c03\uff0c\u5c01\u88c5\u4e86\u6b63\u4e8b\u5b9e\u4f8b\u548c\u53cd\u4e8b\u5b9e\u5b9e\u4f8b\u7684\u5185\u5bb9\uff0c\u63d0\u4f9b\u66f4\u901a\u7528\u7684\u63d0\u793a\u8868\u793a\u3002\u5728\u4e24\u4e2a\u57fa\u51c6\u4e0a\u7684\u5e7f\u6cdb\u5b9e\u9a8c\u8868\u660e\uff0c\u5229\u7528\u53cd\u4e8b\u5b9e\u89e3\u91ca\u4f7fCoFE\u80fd\u591f\u751f\u6210\u8bed\u4e49\u4e0a\u8fde\u8d2f\u4e14\u4e8b\u5b9e\u5b8c\u6574\u7684\u62a5\u544a\uff0c\u5e76\u5728\u8bed\u8a00\u751f\u6210\u548c\u4e34\u5e8a\u6709\u6548\u6027\u6307\u6807\u65b9\u9762\u8868\u73b0\u51fa\u8272\u3002|\n", "2407.14467": "|**2024-07-19**|**Check-Eval: A Checklist-based Approach for Evaluating Text Quality**|Jayr Pereira 
et.al.|[2407.14467](http://arxiv.org/abs/2407.14467)|null|\u8bc4\u4f30\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u751f\u6210\u6587\u672c\u7684\u8d28\u91cf\u4ecd\u7136\u662f\u4e00\u4e2a\u91cd\u5927\u6311\u6218\u3002\u4f20\u7edf\u7684\u8bc4\u4f30\u6807\u51c6\u5f80\u5f80\u4e0e\u4eba\u7c7b\u7684\u5224\u65ad\u4e0d\u5339\u914d\uff0c\u5c24\u5176\u662f\u5728\u9700\u8981\u521b\u9020\u6027\u548c\u7ec6\u5fae\u5dee\u522b\u7684\u4efb\u52a1\u4e2d\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aCheck-Eval\u7684\u65b0\u8bc4\u4f30\u6846\u67b6\uff0c\u901a\u8fc7\u5229\u7528LLM\u4ee5\u68c0\u67e5\u8868\u4e3a\u57fa\u7840\u7684\u65b9\u6cd5\u6765\u8bc4\u4f30\u751f\u6210\u6587\u672c\u7684\u8d28\u91cf\u3002Check-Eval\u53ef\u4ee5\u4f5c\u4e3a\u65e0\u53c2\u8003\u548c\u6709\u53c2\u8003\u7684\u8bc4\u4f30\u65b9\u6cd5\u4f7f\u7528\uff0c\u63d0\u4f9b\u4e86\u4e00\u4e2a\u7ed3\u6784\u5316\u4e14\u53ef\u89e3\u91ca\u7684\u6587\u672c\u8d28\u91cf\u8bc4\u4f30\u4f53\u7cfb\u3002\u8be5\u6846\u67b6\u4e3b\u8981\u7531\u4e24\u4e2a\u9636\u6bb5\u7ec4\u6210\uff1a\u68c0\u67e5\u8868\u751f\u6210\u548c\u68c0\u67e5\u8868\u8bc4\u4f30\u3002\u6211\u4eec\u5728\u4e24\u4e2a\u57fa\u51c6\u6570\u636e\u96c6\u4e0a\u9a8c\u8bc1\u4e86Check-Eval\uff1a\u8461\u8404\u7259\u8bed\u6cd5\u5f8b\u8bed\u4e49\u6587\u672c\u76f8\u4f3c\u6027\u4ee5\u53caSummEval\u3002\u6211\u4eec\u7684\u7ed3\u679c\u663e\u793a\uff0cCheck-Eval\u4e0e\u73b0\u6709\u6307\u6807\uff08\u5982G-Eval\u548cGPTScore\uff09\u76f8\u6bd4\uff0c\u5728\u4e0e\u4eba\u7c7b\u5224\u65ad\u7684\u76f8\u5173\u6027\u65b9\u9762\u53d6\u5f97\u4e86\u66f4\u9ad8\u7684\u5206\u6570\uff0c\u8fd9\u8868\u660e\u5176\u4f5c\u4e3a\u81ea\u7136\u8bed\u8a00\u751f\u6210\u4efb\u52a1\u66f4\u53ef\u9760\u548c\u6709\u6548\u7684\u8bc4\u4f30\u6846\u67b6\u7684\u6f5c\u529b\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u4ee3\u7801\u53ef\u5728https://anonymous.4open.science/r/check-eval-0DB4\u83b7\u53d6\u3002|\n", "2407.14452": "|**2024-07-19**|**Undermining Mental Proof: How AI Can Make Cooperation Harder by Making 
Thinking Easier**|Zachary Wojtowicz et.al.|[2407.14452](http://arxiv.org/abs/2407.14452)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\u548c\u5176\u4ed6\u9ad8\u5ea6\u5148\u8fdb\u7684AI\u7cfb\u7edf\u5728\u51b3\u5b9a\u8bf4\u4ec0\u4e48\u6216\u505a\u4ec0\u4e48\u65f6\u63d0\u4f9b\u4e86\u4fbf\u5229\uff0c\u4f46\u8fd9\u4fbf\u5229\u6027\u5b9e\u9645\u4e0a\u524a\u5f31\u4e86\u5728\u793e\u4f1a\u60c5\u5883\u4e0b\u91c7\u53d6\u6709\u6548\u884c\u52a8\u7684\u80fd\u529b\u3002\u6211\u4eec\u901a\u8fc7\u5f15\u5165\u201c\u5fc3\u7406\u8bc1\u660e\u201d\u8fd9\u4e00\u6574\u5408\u6027\u7406\u8bba\u6982\u5ff5\u6765\u89e3\u91ca\u8fd9\u79cd\u770b\u4f3c\u77db\u76fe\u7684\u73b0\u8c61\u3002\u201c\u5fc3\u7406\u8bc1\u660e\u201d\u53d1\u751f\u5728\u4f7f\u7528\u53ef\u89c2\u5bdf\u7684\u884c\u4e3a\u6765\u8bc1\u5b9e\u4e0d\u53ef\u89c2\u5bdf\u7684\u5fc3\u7406\u4e8b\u5b9e\u7684\u60c5\u51b5\u4e2d\u3002\u4ece\u62db\u8058\u5230\u7ea6\u4f1a\uff0c\u201c\u5fc3\u7406\u8bc1\u660e\u201d\u4f7f\u4eba\u4eec\u80fd\u591f\u5728\u4f4e\u4fe1\u4efb\u73af\u5883\u4e2d\u76f8\u4e92\u4f20\u8fbe\u4ef7\u503c\u89c2\u3001\u610f\u56fe\u3001\u77e5\u8bc6\u72b6\u6001\u7b49\u5fc3\u7406\u7279\u5f81\uff0c\u8fd9\u4e9b\u73af\u5883\u4e2d\u7684\u8bda\u5b9e\u96be\u4ee5\u5f97\u5230\u5f3a\u5236\u6267\u884c\u3002 \u57fa\u4e8e\u7ecf\u6d4e\u5b66\u3001\u7406\u8bba\u751f\u7269\u5b66\u548c\u8ba1\u7b97\u673a\u79d1\u5b66\u7684\u7814\u7a76\u6210\u679c\uff0c\u6211\u4eec\u63cf\u8ff0\u4e86\u4f7f\u4eba\u7c7b\u80fd\u591f\u5b9e\u65bd\u5fc3\u7406\u8bc1\u660e\u7684\u6838\u5fc3\u7406\u8bba\u673a\u5236\u3002\u5bf9\u8fd9\u4e9b\u673a\u5236\u7684\u5206\u6790\u63ed\u793a\u4e86\u4eba\u5de5\u667a\u80fd\u5982\u4f55\u5728\u4f7f\u601d\u8003\u53d8\u5f97\u5bb9\u6613\u7684\u540c\u65f6\uff0c\u5374\u53ef\u80fd\u4f7f\u4f4e\u4fe1\u4efb\u5408\u4f5c\u53d8\u5f97\u66f4\u96be\u3002 
\u901a\u8fc7\u7406\u89e3\u5fc3\u7406\u8bc1\u660e\u7684\u5de5\u4f5c\u539f\u7406\u53ca\u5176\u5728\u4e0d\u540c\u60c5\u5883\u4e0b\u7684\u5e94\u7528\uff0c\u6211\u4eec\u53ef\u4ee5\u8bbe\u8ba1\u51fa\u65e2\u80fd\u4fc3\u8fdb\u9ad8\u6548\u6c9f\u901a\u53c8\u80fd\u7ef4\u62a4\u793e\u4f1a\u534f\u4f5c\u7684AI\u7cfb\u7edf\u3002\u4f8b\u5982\uff0c\u5728\u62db\u8058\u8fc7\u7a0b\u4e2d\uff0cAI\u53ef\u4ee5\u901a\u8fc7\u5206\u6790\u5019\u9009\u4eba\u7684\u884c\u4e3a\u6a21\u5f0f\u548c\u5386\u53f2\u6570\u636e\u6765\u95f4\u63a5\u8bc4\u4f30\u5176\u6280\u80fd\u3001\u56e2\u961f\u5408\u4f5c\u80fd\u529b\u4ee5\u53ca\u5bf9\u516c\u53f8\u6587\u5316\u7684\u9002\u5e94\u6027\uff0c\u4ece\u800c\u5e2e\u52a9\u96c7\u4e3b\u505a\u51fa\u66f4\u53ef\u9760\u7684\u4eba\u624d\u9009\u62e9\u51b3\u7b56\u3002\u5728\u7ea6\u4f1a\u573a\u666f\u4e2d\uff0cAI\u53ef\u4ee5\u5229\u7528\u793e\u4ea4\u5a92\u4f53\u6d3b\u52a8\u3001\u5174\u8da3\u7231\u597d\u7b49\u4fe1\u606f\u6765\u6784\u5efa\u7528\u6237\u7684\u5fc3\u7406\u753b\u50cf\uff0c\u4ee5\u6b64\u5e2e\u52a9\u7528\u6237\u627e\u5230\u4e0e\u81ea\u5df1\u4ef7\u503c\u89c2\u548c\u751f\u6d3b\u65b9\u5f0f\u76f8\u5339\u914d\u7684\u4f34\u4fa3\u3002 \u603b\u4e4b\uff0c\u901a\u8fc7\u5408\u7406\u5730\u8bbe\u8ba1\u548c\u5e94\u7528AI\u6280\u672f\uff0c\u6211\u4eec\u4e0d\u4ec5\u53ef\u4ee5\u5728\u4f4e\u4fe1\u4efb\u73af\u5883\u4e0b\u589e\u5f3a\u4eba\u7c7b\u7684\u4ea4\u6d41\u548c\u5408\u4f5c\u80fd\u529b\uff0c\u800c\u4e14\u8fd8\u80fd\u4fc3\u8fdb\u66f4\u52a0\u516c\u6b63\u3001\u900f\u660e\u548c\u9ad8\u6548\u7684\u51b3\u7b56\u8fc7\u7a0b\u3002|\n", "2407.14439": "|**2024-07-19**|**Token-level Correlation-guided Compression for Efficient Multimodal Document Understanding**|Renshan Zhang et.al.|[2407.14439](http://arxiv.org/abs/2407.14439)|**[link](https://github.com/JiuTian-VL/TokenCorrCompressor)**|**\u5f53\u524d\u4e3b\u6d41\u7684\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08Multimodal Large Language Models, 
MLLMs\uff09\u5728\u8fdb\u884c\u6587\u6863\u7406\u89e3\u65f6\uff0c\u666e\u904d\u91c7\u7528\u5bf9\u9ad8\u5206\u8fa8\u7387\u6587\u6863\u56fe\u50cf\u8fdb\u884c\u88c1\u526a\uff0c\u4ece\u800c\u751f\u6210\u591a\u4e2a\u5b50\u56fe\u50cf\u7684\u65b9\u6cd5\u3002\u5927\u591a\u6570\u73b0\u6709\u7684\u6587\u6863\u7406\u89e3\u65b9\u6cd5\u4f1a\u4fdd\u7559\u6240\u6709\u5b50\u56fe\u50cf\u5185\u7684\u6807\u8bb0\uff0c\u5e76\u540c\u7b49\u5bf9\u5f85\u5b83\u4eec\uff0c\u8fd9\u5ffd\u89c6\u4e86\u8fd9\u4e9b\u6807\u8bb0\u7684\u4e0d\u540c\u4fe1\u606f\u4ef7\u503c\u6027\uff0c\u5bfc\u81f4\u4e86\u5927\u91cf\u4e0d\u5fc5\u8981\u7684\u56fe\u50cf\u6807\u8bb0\u589e\u52a0\u3002\u4e3a\u4e86\u5b9e\u73b0\u66f4\u52a0\u9002\u5e94\u6027\u548c\u9ad8\u6548\u7684\u6587\u6863\u7406\u89e3\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201cToken\u7ea7\u76f8\u5173\u6027\u5f15\u5bfc\u538b\u7f29\u201d\u7684\u65e0\u53c2\u6570\u4e14\u53ef\u63d2\u62d4\u65b9\u6cd5\uff0c\u65e8\u5728\u4f18\u5316\u6807\u8bb0\u5904\u7406\u8fc7\u7a0b\u3002\u8be5\u65b9\u6cd5\u9996\u5148\u5f15\u5165\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u8bc4\u4f30\u6a21\u5f0f\u91cd\u590d\u6027\u7684\u65b9\u6cd5\uff0c\u57fa\u4e8e\u6bcf\u4e2a\u7247\u6bb5\u6807\u8bb0\u4e4b\u95f4\u7684\u76f8\u5173\u6027\u8fdb\u884c\u3002\u8fd9\u79cd\u65b9\u6cd5\u80fd\u591f\u8bc6\u522b\u5197\u4f59\u6807\u8bb0\uff0c\u4ece\u800c\u786e\u5b9a\u5b50\u56fe\u50cf\u7684\u4fe1\u606f\u5bc6\u5ea6\u3002\u5176\u6b21\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u4e2a\u9488\u5bf9Token\u7ea7\u522b\u7684\u91c7\u6837\u65b9\u6cd5\uff0c\u901a\u8fc7\u6df1\u5165\u5206\u6790[CLS]\u6807\u8bb0\u4e0e\u7247\u6bb5\u6807\u8bb0\u4e4b\u95f4\u7684\u76f8\u5173\u6027\uff0c\u9ad8\u6548\u6355\u6349\u6700\u5177\u4fe1\u606f\u4ef7\u503c\u7684\u6807\u8bb0\u3002\u901a\u8fc7\u7ed3\u5408\u8fd9\u4e24\u79cd\u7b56\u7565\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2a\u53ef\u65e0\u7f1d\u96c6\u6210\u5230\u4f7f\u7528\u88c1\u526a\u6280\u672f\u7684MLLMs\u4e2d\u7684\u81ea\u9002\u5e94\u538b\u7f29\u6a21\u5757\u3002\u8fd9\u4e00\
u6a21\u5757\u4e0d\u4ec5\u5728\u8bad\u7ec3\u548c\u63a8\u7406\u8fc7\u7a0b\u4e2d\u663e\u8457\u63d0\u5347\u4e86\u5904\u7406\u901f\u5ea6\uff0c\u540c\u65f6\u4fdd\u6301\u4e86\u4e0e\u73b0\u6709\u538b\u7f29\u65b9\u6cd5\u76f8\u5f53\u7684\u6027\u80fd\u6c34\u5e73\u3002\u6211\u4eec\u4f7f\u7528\u5f53\u524d\u6700\u4f73\u7684\u6587\u6863\u7406\u89e3\u6a21\u578bmPLUG-DocOwl1.5\u8fdb\u884c\u4e86\u5b9e\u9a8c\uff0c\u5e76\u901a\u8fc7\u4e0e\u5176\u4ed6\u538b\u7f29\u65b9\u6cd5\u7684\u5e7f\u6cdb\u5bf9\u6bd4\uff0c\u9a8c\u8bc1\u4e86\u5176\u6709\u6548\u6027\u3002**|\n", "2407.14402": "|**2024-07-19**|**The Vision of Autonomic Computing: Can LLMs Make It a Reality?**|Zhiyang Zhang et.al.|[2407.14402](http://arxiv.org/abs/2407.14402)|null|\u300a\u81ea\u6cbb\u8ba1\u7b97\u613f\u666f\u4e0e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u5fae\u670d\u52a1\u7ba1\u7406\u4e2d\u7684\u5e94\u7528\u300b\u4e00\u6587\u56de\u987e\u4e86\u8d85\u8fc7\u4e8c\u5341\u5e74\u524d\u63d0\u51fa\u7684\u81ea\u6cbb\u8ba1\u7b97\uff08ACV\uff09\u613f\u666f\uff0c\u65e8\u5728\u6784\u5efa\u80fd\u591f\u81ea\u6211\u7ba1\u7406\u548c\u9002\u5e94\u73af\u5883\u53d8\u5316\u7684\u8ba1\u7b97\u7cfb\u7edf\uff0c\u8fd9\u4e00\u76ee\u6807\u81f3\u4eca\u4ecd\u9762\u4e34\u6311\u6218\u3002\u8fd1\u5e74\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u53d1\u5c55\u4e3a\u89e3\u51b3\u8fd9\u4e9b\u6311\u6218\u63d0\u4f9b\u4e86\u53ef\u80fd\uff0c\u5b83\u4eec\u901a\u8fc7\u5229\u7528\u5e7f\u6cdb\u7684\u77e5\u8bc6\u3001\u8bed\u8a00\u7406\u89e3\u80fd\u529b\u4ee5\u53ca\u4efb\u52a1\u81ea\u52a8\u5316\u80fd\u529b\u6765\u5b9e\u73b0\u8fd9\u4e00\u613f\u666f\u3002 
\u672c\u6587\u63a2\u8ba8\u4e86\u901a\u8fc7\u57fa\u4e8eLLM\u7684\u591a\u4ee3\u7406\u6846\u67b6\u5b9e\u73b0\u5fae\u670d\u52a1\u7ba1\u7406\u81ea\u4e3b\u6027\u7684\u53ef\u884c\u6027\uff0c\u5e76\u63d0\u51fa\u4e86\u4e00\u4e2a\u4e94\u7ea7\u5206\u7c7b\u4f53\u7cfb\uff0c\u7528\u4e8e\u63cf\u8ff0\u81ea\u4e3b\u670d\u52a1\u7ef4\u62a4\u7684\u4e0d\u540c\u5c42\u6b21\u3002\u6587\u4e2d\u8fd8\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u57fa\u4e8e\u201cSock Shop\u201d\u5fae\u670d\u52a1\u6f14\u793a\u9879\u76ee\u7684\u5728\u7ebf\u8bc4\u4f30\u57fa\u51c6\uff0c\u4ee5\u8bc4\u4f30\u8be5\u6846\u67b6\u7684\u6027\u80fd\u3002\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0c\u901a\u8fc7LLMs\u53ef\u4ee5\u663e\u8457\u63d0\u5347\u5fae\u670d\u52a1\u4f53\u7cfb\u7ed3\u6784\u4e2d\u95ee\u9898\u68c0\u6d4b\u548c\u89e3\u51b3\u7684\u80fd\u529b\uff0c\u5b9e\u73b0\u4e86\u7b2c\u4e09\u7ea7\u81ea\u4e3b\u6027\u6c34\u5e73\u7684\u7a81\u7834\uff0c\u8fd9\u6807\u5fd7\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u5fae\u670d\u52a1\u7ba1\u7406\u6846\u67b6\u96c6\u6210\u65b9\u9762\u7684\u5e94\u7528\u53d6\u5f97\u4e86\u91cd\u8981\u8fdb\u5c55\uff0c\u4e3a\u6784\u5efa\u66f4\u9002\u5e94\u6027\u548c\u81ea\u6211\u7ba1\u7406\u7684\u8ba1\u7b97\u7cfb\u7edf\u94fa\u5e73\u4e86\u9053\u8def\u3002 \u4e3a\u4e86\u4fc3\u8fdb\u8fd9\u4e00\u9886\u57df\u7684\u7814\u7a76\u548c\u53d1\u5c55\uff0c\u76f8\u5173\u7684\u4ee3\u7801\u5c06\u901a\u8fc7\u516c\u5f00\u63d0\u4f9b\u3002|\n", "2407.14371": "|**2024-07-19**|**Open Artificial Knowledge**|Vadim Borisov et.al.|[2407.14371](http://arxiv.org/abs/2407.14371)|null|\u300a\u5f00\u653e\u4eba\u5de5\u77e5\u8bc6\uff08OAK\uff09\u6570\u636e\u96c6\uff1a\u4fc3\u8fdb\u5927\u578b\u8bed\u8a00\u6a21\u578b\u53d1\u5c55\u4e0e\u89e3\u51b3\u6570\u636e\u7a00\u7f3a\u4e0e\u9690\u79c1\u95ee\u9898\u300b 
\u5f53\u524d\uff0c\u57fa\u4e8e\u5bf9\u8bdd\u7684AI\u7cfb\u7edf\u5982ChatGPT\u3001Claude\u548cGemini\u7684\u6210\u529f\uff0c\u4e3b\u8981\u5f97\u76ca\u4e8e\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5bf9\u6d77\u91cf\u6570\u636e\u96c6\u7684\u8bad\u7ec3\u3002\u7136\u800c\uff0c\u83b7\u53d6\u9ad8\u8d28\u91cf\u3001\u591a\u6837\u6027\u548c\u4f26\u7406\u6765\u6e90\u7684\u6570\u636e\u4ecd\u7136\u9762\u4e34\u91cd\u5927\u6311\u6218\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201c\u5f00\u653e\u4eba\u5de5\u77e5\u8bc6\u201d\uff08OAK\uff09\u6570\u636e\u96c6\uff0c\u8fd9\u662f\u4e00\u4e2a\u5305\u542b\u8d85\u8fc75\u4ebf\u4e2a\u4ee4\u724c\uff08\u64b0\u5199\u65f6\uff09\u7684\u5927\u578b\u8d44\u6e90\u5e93\u3002OAK\u901a\u8fc7\u96c6\u5408\u5305\u62ecGPT4o\u3001LLaMa3-70B\u3001LLaMa3-8B\u3001Mixtral-8x7B\u3001Gemma-7B\u548cGemma-2-9B\u5728\u5185\u7684\u6700\u5148\u8fdb\u7684LLMs\uff0c\u5229\u7528\u7ef4\u57fa\u767e\u79d1\u7684\u4e3b\u8981\u7c7b\u522b\u6765\u5f15\u5bfc\u6587\u672c\u751f\u6210\uff0c\u786e\u4fdd\u5e7f\u6cdb\u7684\u9886\u57df\u8986\u76d6\uff0c\u540c\u65f6\u4fdd\u6301\u8fde\u8d2f\u6027\u548c\u4e8b\u5b9e\u51c6\u786e\u6027\u3002OAK\u6570\u636e\u96c6\u65e8\u5728\u4fc3\u8fdb\u66f4\u5f3a\u5927\u3001\u66f4\u5bf9\u9f50\u7684\u8bed\u8a00\u6a21\u578b\u7684\u53d1\u5c55\uff0c\u5e76\u89e3\u51b3\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\u8bad\u7ec3\u4e2d\u7684\u5173\u952e\u95ee\u9898\uff0c\u5982\u6570\u636e\u7a00\u7f3a\u6027\u548c\u9690\u79c1\u95ee\u9898\u3002\u76ee\u524d\uff0c\u8be5\u6570\u636e\u96c6\u662f\u514d\u8d39\u63d0\u4f9b\u5728www.oakdataset.org\u3002|\n", "2407.14355": "|**2024-07-19**|**Enhancing Zero-shot Audio Classification using Sound Attribute Knowledge from Large Language Models**|Xuenan Xu 
et.al.|[2407.14355](http://arxiv.org/abs/2407.14355)|**[link](https://github.com/wsntxxn/attrenhzsac)**|\u8fd9\u7bc7\u8bba\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u65b9\u6cd5\u6765\u8fdb\u884c\u96f6\u6837\u672c\u97f3\u9891\u5206\u7c7b\uff0c\u5373\u8bc6\u522b\u548c\u5206\u7c7b\u6a21\u578b\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u4ece\u672a\u89c1\u8fc7\u7684\u97f3\u9891\u7c7b\u522b\u3002\u6211\u4eec\u63d0\u8bae\u5217\u51fa\u4e00\u7cfb\u5217\u97f3\u9891\u5c5e\u6027\uff0c\u5e76\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u9886\u57df\u77e5\u8bc6\u4e3a\u6bcf\u4e2a\u7c7b\u522b\u751f\u6210\u8be6\u7ec6\u7684\u5c5e\u6027\u63cf\u8ff0\u3002\u4e0e\u4ee5\u5f80\u4e3b\u8981\u4f9d\u8d56\u7c7b\u522b\u6807\u7b7e\u6216\u7b80\u5355\u63cf\u8ff0\u7684\u65b9\u6cd5\u4e0d\u540c\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u4e13\u6ce8\u4e8e\u591a\u7ef4\u5ea6\u7684\u5185\u5728\u542c\u89c9\u5c5e\u6027\uff0c\u6355\u6349\u97f3\u9891\u7c7b\u522b\u7684\u4e0d\u540c\u7279\u6027\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u91c7\u7528\u4e86\u5bf9\u6bd4\u5b66\u4e60\u65b9\u6cd5\u6765\u589e\u5f3a\u57fa\u4e8e\u6587\u672c\u6807\u7b7e\u7684\u96f6\u6837\u672c\u5b66\u4e60\u3002\u6211\u4eec\u5728VGGSound\u548cAudioSet\u4e0a\u9a8c\u8bc1\u4e86\u6211\u4eec\u65b9\u6cd5\u7684\u6709\u6548\u6027\uff08\u4ee3\u7801\u53ef\u8bbf\u95ee\uff1ahttps://www.github.com/wsntxxn/AttrEnhZsAc\uff09\u3002\u7ed3\u679c\u8868\u660e\uff0c\u5728\u96f6\u6837\u672c\u5206\u7c7b\u51c6\u786e\u6027\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u63d0\u9ad8\u3002\u6d88\u878d\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u65e0\u8bba\u6a21\u578b\u67b6\u6784\u5982\u4f55\uff0c\u6027\u80fd\u589e\u5f3a\u90fd\u975e\u5e38\u7a33\u5065\u3002|\n", "2407.15850": "|**2024-07-22**|**AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description**|Junyu Xie 
et.al.|[2407.15850](http://arxiv.org/abs/2407.15850)|**[link](https://github.com/Jyxarthur/AutoAD-Zero)**|**\u6211\u4eec\u7684\u76ee\u6807\u662f\u65e0\u9700\u8bad\u7ec3\u5730\u751f\u6210\u7535\u5f71\u548c\u7535\u89c6\u8fde\u7eed\u5267\u7684\u97f3\u9891\u63cf\u8ff0\uff08AD\uff09\u3002\u6211\u4eec\u5229\u7528\u73b0\u6210\u7684\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08VLM\uff09\u548c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\uff0c\u5e76\u5f00\u53d1\u4e86\u89c6\u89c9\u548c\u6587\u672c\u63d0\u793a\u7b56\u7565\u6765\u5b8c\u6210\u8fd9\u9879\u4efb\u52a1\u3002\u6211\u4eec\u7684\u8d21\u732e\u6709\u4e09\u70b9\uff1a(i) \u6211\u4eec\u8bc1\u660e\uff0c\u5982\u679c\u901a\u8fc7\u89c6\u89c9\u6307\u793a\u76f4\u63a5\u63d0\u793aVLM\u63d0\u4f9b\u89d2\u8272\u4fe1\u606f\uff0cVLM\u53ef\u4ee5\u6210\u529f\u547d\u540d\u548c\u5f15\u7528\u89d2\u8272\uff0c\u65e0\u9700\u4efb\u4f55\u5fae\u8c03\uff1b(ii) \u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u4e24\u9636\u6bb5\u8fc7\u7a0b\u6765\u751f\u6210AD\uff0c\u7b2c\u4e00\u9636\u6bb5\u8ba9VLM\u5168\u9762\u63cf\u8ff0\u89c6\u9891\uff0c\u7b2c\u4e8c\u9636\u6bb5\u4f7f\u7528LLM\u5c06\u5bc6\u96c6\u7684\u6587\u672c\u4fe1\u606f\u603b\u7ed3\u4e3a\u4e00\u4e2a\u7b80\u6d01\u7684AD\u53e5\u5b50\uff1b(iii) \u6211\u4eec\u5236\u5b9a\u4e86\u4e00\u4e2a\u65b0\u7684\u7535\u89c6\u97f3\u9891\u63cf\u8ff0\u6570\u636e\u96c6\u3002\u6211\u4eec\u7684\u65b9\u6cd5AutoAD-Zero\u5728AD\u751f\u6210\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff08\u751a\u81f3\u4e0e\u4e00\u4e9b\u5728\u771f\u5b9eAD\u4e0a\u5fae\u8c03\u7684\u6a21\u578b\u76f8\u5339\u654c\uff09\uff0c\u5b9e\u73b0\u4e86\u7535\u5f71\u548c\u7535\u89c6\u8fde\u7eed\u5267\u7684\u6700\u9ad8CRITIC\u8bc4\u5206\u3002**|\n", "2407.15847": "|**2024-07-22**|**LLMmap: Fingerprinting For Large Language Models**|Dario Pasquini 
et.al.|[2407.15847](http://arxiv.org/abs/2407.15847)|**[link](https://github.com/pasquini-dario/LLMmap)**|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u9488\u5bf9LLM\u96c6\u6210\u5e94\u7528\u7684\u9996\u4ee3\u6307\u7eb9\u8bc6\u522b\u653b\u51fb\u5de5\u5177\u2014\u2014LLMmap\u3002\u8be5\u5de5\u5177\u91c7\u7528\u79ef\u6781\u7684\u6307\u7eb9\u8bc6\u522b\u7b56\u7565\uff0c\u901a\u8fc7\u5411\u5e94\u7528\u53d1\u9001\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u67e5\u8be2\u5e76\u5206\u6790\u54cd\u5e94\u4fe1\u606f\uff0c\u4ee5\u8bc6\u522b\u6240\u4f7f\u7528\u7684\u5177\u4f53LLM\u6a21\u578b\u3002\u4ec5\u97008\u6b21\u4ea4\u4e92\uff0cLLMmap\u5373\u53ef\u572895%\u4ee5\u4e0a\u7684\u51c6\u786e\u7387\u4e0b\u7cbe\u786e\u8bc6\u522b\u51faLLM\u6a21\u578b\u3002\u66f4\u91cd\u8981\u7684\u662f\uff0cLLMmap\u88ab\u8bbe\u8ba1\u5f97\u5177\u6709\u8de8\u4e0d\u540c\u5e94\u7528\u5c42\u7684\u9c81\u68d2\u6027\uff0c\u4f7f\u5176\u80fd\u591f\u8bc6\u522b\u5728\u5404\u79cd\u7cfb\u7edf\u63d0\u793a\u3001\u968f\u673a\u62bd\u6837\u8d85\u53c2\u6570\u4ee5\u53ca\u590d\u6742\u7684\u751f\u6210\u6846\u67b6\u5982RAG\u6216Chain-of-Thought\u7b49\u73af\u5883\u4e0b\u7684LLM\u6a21\u578b\u3002|\n", "2407.15841": "|**2024-07-22**|**SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models**|Mingze Xu 
et.al.|[2407.15841](http://arxiv.org/abs/2407.15841)|**[link](https://github.com/apple/ml-slowfast-llava)**|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201c\u6162\u901f-LLaVA\u201d\uff08\u6216\u7b80\u79f0\u4e3aSF-LLaVA\uff09\u7684\u65e0\u9700\u8bad\u7ec3\u7684\u89c6\u9891\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\uff0c\u5b83\u80fd\u591f\u540c\u65f6\u6355\u6349\u8be6\u7ec6\u7684\u7a7a\u95f4\u8bed\u4e49\u548c\u957f\u65f6\u5e8f\u4e0a\u4e0b\u6587\uff0c\u800c\u4e0d\u4f1a\u8d85\u51fa\u901a\u5e38\u4f7f\u7528\u7684LLM\u7684\u4ee4\u724c\u9884\u7b97\u3002\u8fd9\u4e00\u76ee\u6807\u901a\u8fc7\u4f7f\u7528\u89c6\u9891LLM\u8f93\u5165\u7684\u53cc\u6d41\u8bbe\u8ba1\u5b9e\u73b0\uff0c\u6709\u6548\u5730\u805a\u5408\u4e86\u4ece\u91c7\u6837\u89c6\u9891\u5e27\u4e2d\u63d0\u53d6\u7684\u7279\u5f81\u3002\u5177\u4f53\u800c\u8a00\uff0c\u6162\u901f\u8def\u5f84\u4ee5\u8f83\u4f4e\u7684\u5e27\u7387\u63d0\u53d6\u5c3d\u53ef\u80fd\u591a\u7684\u7a7a\u95f4\u7ec6\u8282\u7684\u7279\u5f81\uff08\u4f8b\u5982\uff0c\u4ee524x24\u7684\u4ee4\u724c\uff09\uff0c\u800c\u5feb\u901f\u8def\u5f84\u5219\u4ee5\u8f83\u9ad8\u7684\u5e27\u7387\u64cd\u4f5c\uff0c\u4f46\u4f7f\u7528\u8f83\u5927\u7684\u7a7a\u95f4\u6c60\u5316\u6b65\u957f\uff08\u4f8b\u5982\uff0c\u4e0b\u91c7\u68376x\uff09\u6765\u5173\u6ce8\u8fd0\u52a8\u7ebf\u7d22\u3002\u56e0\u6b64\uff0c\u8fd9\u79cd\u8bbe\u8ba1\u5141\u8bb8\u6211\u4eec\u9002\u5f53\u5730\u6355\u83b7\u5bf9\u4e8e\u7406\u89e3\u89c6\u9891\u4e2d\u7684\u8be6\u7ec6\u4fe1\u606f\u6709\u76ca\u7684\u65f6\u7a7a\u7279\u6027\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cSF-LLaVA\u5728\u5404\u79cd\u89c6\u9891\u4efb\u52a1\u4e0a\u90fd\u8d85\u8d8a\u4e86\u73b0\u6709\u7684\u65e0\u9700\u8bad\u7ec3\u7684\u65b9\u6cd5\u3002\u5728\u67d0\u4e9b\u57fa\u51c6\u6d4b\u8bd5\u4e2d\uff0c\u5b83\u751a\u81f3\u4e0e\u5728\u89c6\u9891\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u5fae\u8c03\u7684\u6700\u5148\u8fdb\u7684\u89c6\u9891LLM\u5b9e\u73b0\u4e86\u76f8\u5f53\u6216\u66f4\u597d\u7684\u6027\u80fd\u3002|\n", "2407.15838": 
"|**2024-07-22**|**MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity**|Yangzhou Liu et.al.|[2407.15838](http://arxiv.org/abs/2407.15838)|**[link](https://github.com/yuecao0119/mminstruct)**|\u5c3d\u7ba1\u89c6\u89c9\u8bed\u8a00\u9884\u8bad\u7ec3\u6a21\u578b\u5728\u89c6\u89c9\u4efb\u52a1\u4e0a\u7684\u5fae\u8c03\u8868\u73b0\u51fa\u663e\u8457\u7684\u6027\u80fd\u63d0\u5347\uff0c\u4f46\u73b0\u6709\u7684\u89c6\u89c9\u6307\u4ee4\u8c03\u4f18\u6570\u636e\u96c6\u5b58\u5728\u4ee5\u4e0b\u5c40\u9650\u6027\uff1a 1. \u6307\u4ee4\u6ce8\u91ca\u8d28\u91cf\uff1a\u867d\u7136\u73b0\u6709\u7684\u89c6\u89c9\u8bed\u8a00\u9884\u8bad\u7ec3\u6a21\u578b\u5728\u6027\u80fd\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5b83\u4eec\u751f\u6210\u7684\u6307\u4ee4\u53ef\u80fd\u4ecd\u4f1a\u5305\u542b\u4e0d\u51c6\u786e\u6027\uff0c\u5982\u5e7b\u89c9\u73b0\u8c61\u3002 2. \u6307\u4ee4\u548c\u56fe\u50cf\u591a\u6837\u6027\uff1a\u6307\u4ee4\u7c7b\u578b\u8303\u56f4\u6709\u9650\u4ee5\u53ca\u56fe\u50cf\u6570\u636e\u7f3a\u4e4f\u591a\u6837\u6027\u53ef\u80fd\u4f1a\u5f71\u54cd\u6a21\u578b\u751f\u6210\u591a\u6837\u6027\u548c\u63a5\u8fd1\u771f\u5b9e\u4e16\u754c\u573a\u666f\u8f93\u51fa\u7684\u80fd\u529b\u3002 \u4e3a\u89e3\u51b3\u8fd9\u4e9b\u6311\u6218\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u9ad8\u8d28\u91cf\u3001\u591a\u6837\u6027\u7684\u89c6\u89c9\u6307\u4ee4\u8c03\u4f18\u6570\u636e\u96c6MMInstruct\uff0c\u5305\u542b\u6765\u81ea24\u4e2a\u9886\u57df\u5171\u8ba1973K\u6761\u6307\u4ee4\u3002\u8be5\u6570\u636e\u96c6\u5305\u62ec\u56db\u79cd\u6307\u4ee4\u7c7b\u578b\uff1a\u5224\u65ad\u3001\u591a\u9879\u9009\u62e9\u3001\u957f\u89c6\u89c9\u95ee\u9898\u56de\u7b54\u548c\u77ed\u89c6\u89c9\u95ee\u9898\u56de\u7b54\u3002 
\u4e3a\u4e86\u6784\u5efaMMInstruct\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u6307\u4ee4\u751f\u6210\u6570\u636e\u5f15\u64ce\uff0c\u5229\u7528GPT-4V\u3001GPT-3.5\u548c\u4eba\u5de5\u6821\u6b63\u3002\u6211\u4eec\u7684\u6307\u4ee4\u751f\u6210\u5f15\u64ce\u5141\u8bb8\u534a\u81ea\u52a8\u3001\u4f4e\u6210\u672c\u3001\u591a\u9886\u57df\u7684\u6307\u4ee4\u751f\u6210\uff0c\u6210\u672c\u4ec5\u4e3a\u624b\u52a8\u6784\u5efa\u7684\u516d\u5206\u4e4b\u4e00\u3002 \u901a\u8fc7\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u9a8c\u8bc1\u548c\u6d88\u878d\u5b9e\u9a8c\uff0c\u6211\u4eec\u8bc1\u660e\u4e86MMInstruct\u80fd\u591f\u663e\u8457\u63d0\u9ad8\u89c6\u89c9\u8bed\u8a00\u9884\u8bad\u7ec3\u6a21\u578b\u7684\u6027\u80fd\uff0c\u4f8b\u5982\uff0c\u57fa\u4e8eMMInstruct\u7684\u6a21\u578b\u5fae\u8c03\u572812\u4e2a\u57fa\u51c6\u4e2d\u768410\u4e2a\u4e0a\u8fbe\u5230\u4e86\u65b0\u7684\u72b6\u6001\u6700\u4f18\u8868\u73b0\u3002\u4ee3\u7801\u548c\u6570\u636e\u5c06\u5728https://github.com/yuecao0119/MMInstruct\u63d0\u4f9b\u3002|\n", "2407.15835": "|**2024-07-22**|**dMel: Speech Tokenization made Simple**|He Bai 
et.al.|[2407.15835](http://arxiv.org/abs/2407.15835)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\u901a\u8fc7\u5229\u7528\u5927\u89c4\u6a21\u6587\u672c\u6570\u636e\u7684\u81ea\u6211\u76d1\u7763\u9884\u8bad\u7ec3\uff0c\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u9886\u57df\u5b9e\u73b0\u4e86\u9769\u547d\u6027\u7684\u8fdb\u6b65\u3002\u53d7\u6b64\u6210\u529f\u542f\u53d1\uff0c\u7814\u7a76\u4eba\u5458\u63a2\u7d22\u4e86\u590d\u6742\u8bed\u97f3\u5206\u8bcd\u65b9\u6cd5\uff0c\u4ee5\u5c06\u8fde\u7eed\u7684\u8bed\u97f3\u4fe1\u53f7\u79bb\u6563\u5316\uff0c\u4ece\u800c\u4f7f\u8bed\u8a00\u5efa\u6a21\u6280\u672f\u53ef\u4ee5\u5e94\u7528\u4e8e\u8bed\u97f3\u6570\u636e\u3002\u7136\u800c\uff0c\u73b0\u6709\u65b9\u6cd5\u8981\u4e48\u5efa\u6a21\u8bed\u4e49\u4ee4\u724c\uff0c\u53ef\u80fd\u4f1a\u4e22\u5931\u58f0\u5b66\u4fe1\u606f\uff0c\u8981\u4e48\u5efa\u6a21\u58f0\u5b66\u4ee4\u724c\uff0c\u53c8\u53ef\u80fd\u9762\u4e34\u4e22\u5931\u8bed\u4e49\u4fe1\u606f\u7684\u98ce\u9669\u3002\u5177\u6709\u591a\u79cd\u4ee4\u724c\u7c7b\u578b\u4e5f\u4f7f\u67b6\u6784\u53d8\u5f97\u590d\u6742\uff0c\u5e76\u9700\u8981\u989d\u5916\u7684\u9884\u8bad\u7ec3\u3002\u6211\u4eec\u5c55\u793a\u4e86\u5c06\u6885\u5c14\u6ee4\u6ce2\u5668\u901a\u9053\u79bb\u6563\u5316\u4e3a\u79bb\u6563\u5f3a\u5ea6\u5355\u5143\uff08dMel\uff09\u4ea7\u751f\u4e86\u4e00\u4e2a\u7b80\u5355\u8868\u793a\uff0c\u5176\u6027\u80fd\u4f18\u4e8e\u5176\u4ed6\u73b0\u6709\u8bed\u97f3\u5206\u8bcd\u65b9\u6cd5\u3002\u4f7f\u7528\u4ec5\u89e3\u7801\u5668\u7684\u53d8\u6362\u5668\u67b6\u6784\u8fdb\u884c\u8bed\u97f3-\u6587\u672c\u5efa\u6a21\uff0c\u6211\u4eec\u5168\u9762\u8bc4\u4f30\u4e86\u4e0d\u540c\u7684\u8bed\u97f3\u5206\u8bcd\u65b9\u6cd5\u5728\u8bed\u97f3\u8bc6\u522b\uff08ASR\uff09\u548c\u8bed\u97f3\u5408\u6210\uff08TTS\uff09\u4efb\u52a1\u4e0a\u7684\u6027\u80fd\u3002\u6211\u4eec\u7684\u7ed3\u679c\u8868\u660e\uff0cdMel\u5728\u8054\u5408\u5efa\u6a21\u8bed\u97f3\u548c\u6587\u672c\u7684\u7edf\u4e00\u6846\u67b6\u4e2d\u5b9e\u73b0\u9ad8\u6027\u80fd\u7684\u6709\u6548\u6027\uff0c\u4e3a\
u9ad8\u6548\u4e14\u6709\u6548\u7684\u8bed\u97f3\u4e0e\u6587\u672c\u8054\u5408\u5efa\u6a21\u94fa\u5e73\u4e86\u9053\u8def\u3002|\n", "2407.15819": "|**2024-07-22**|**Accelerating Pre-training of Multimodal LLMs via Chain-of-Sight**|Ziyuan Huang et.al.|[2407.15819](http://arxiv.org/abs/2407.15819)|null|\u8fd9\u7bc7\u8bba\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201c\u94fe\u89c6\u56fe\u201d\u7684\u89c6\u89c9-\u8bed\u8a00\u6865\u6881\u6a21\u5757\uff0c\u65e8\u5728\u52a0\u901f\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u7684\u9884\u8bad\u7ec3\u8fc7\u7a0b\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u91c7\u7528\u4e86\u5e8f\u5217\u5316\u7684\u89c6\u89c9\u91cd\u91c7\u6837\u5668\uff0c\u80fd\u591f\u6709\u6548\u5730\u6355\u6349\u4e0d\u540c\u7a7a\u95f4\u5c3a\u5ea6\u7684\u89c6\u89c9\u7ec6\u8282\u3002\u8fd9\u79cd\u67b6\u6784\u4e0d\u4ec5\u80fd\u591f\u6709\u6548\u5229\u7528\u5168\u5c40\u548c\u5c40\u90e8\u89c6\u89c9\u4e0a\u4e0b\u6587\uff0c\u8fd8\u901a\u8fc7\u590d\u5408\u4ee4\u724c\u7f29\u653e\u7b56\u7565\u7075\u6d3b\u6269\u5c55\u89c6\u89c9\u4ee4\u724c\u7684\u6570\u91cf\uff0c\u6700\u591a\u53ef\u4ee5\u589e\u52a016\u500d\u7684\u4ee4\u724c\u6570\u91cf\uff0c\u800c\u65e0\u9700\u5728\u9884\u8bad\u7ec3\u540e\u8fdb\u884c\u5fae\u8c03\u3002\u56e0\u6b64\uff0c\u201c\u94fe\u89c6\u56fe\u201d\u5728\u9884\u8bad\u7ec3\u9636\u6bb5\u6240\u9700\u7684\u89c6\u89c9\u4ee4\u724c\u6570\u91cf\u8fdc\u5c11\u4e8e\u5fae\u8c03\u9636\u6bb5\uff0c\u8fd9\u6709\u610f\u5730\u51cf\u5c11\u4e86\u89c6\u89c9\u4ee4\u724c\u7684\u6570\u91cf\uff0c\u663e\u8457\u52a0\u901f\u4e86\u9884\u8bad\u7ec3\u8fc7\u7a0b\uff0c\u8282\u7701\u4e86\u5927\u7ea673%\u7684\u5b9e\u9645\u8bad\u7ec3\u65f6\u95f4\u3002 
\u5728\u4e00\u7cfb\u5217\u89c6\u89c9-\u8bed\u8a00\u57fa\u51c6\u6d4b\u8bd5\u4e0a\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u901a\u8fc7\u201c\u94fe\u89c6\u56fe\u201d\u52a0\u901f\u9884\u8bad\u7ec3\u8fc7\u7a0b\u5e76\u4e0d\u4f1a\u727a\u7272\u6027\u80fd\uff0c\u5176\u8868\u73b0\u4e0e\u5728\u6574\u4e2a\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u4f7f\u7528\u6240\u6709\u89c6\u89c9\u4ee4\u724c\u7684\u6807\u51c6\u6d41\u7a0b\u76f8\u5f53\u6216\u66f4\u597d\u3002\u8fdb\u4e00\u6b65\u589e\u52a0\u9884\u8bad\u7ec3\u9636\u6bb5\u7684\u89c6\u89c9\u4ee4\u724c\u6570\u91cf\u4f1a\u5bfc\u81f4\u66f4\u5f3a\u7684\u8868\u73b0\uff0c\u5728\u4e00\u7cfb\u5217\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u4e0e\u73b0\u6709\u65b9\u6cd5\u7ade\u4e89\u3002|\n", "2407.15788": "|**2024-07-22**|**Extracting Structured Insights from Financial News: An Augmented LLM Driven Approach**|Rian Dolphin
et.al.|[2407.15788](http://arxiv.org/abs/2407.15788)|null|\u91d1\u878d\u65b0\u95fb\u5728\u91d1\u878d\u5e02\u573a\u51b3\u7b56\u8fc7\u7a0b\u4e2d\u626e\u6f14\u7740\u5173\u952e\u89d2\u8272\uff0c\u4f46\u5c06\u5176\u8f6c\u5316\u4e3a\u7ed3\u6784\u5316\u6570\u636e\u7684\u8fc7\u7a0b\u4e00\u76f4\u5145\u6ee1\u6311\u6218\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u91d1\u878d\u65b0\u95fb\u5904\u7406\u65b9\u6cd5\uff0c\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u514b\u670d\u4e86\u4ee5\u5f80\u63d0\u53d6\u7ed3\u6784\u5316\u4fe1\u606f\u65f6\u9047\u5230\u7684\u9650\u5236\u3002\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u5957\u7cfb\u7edf\uff0c\u8be5\u7cfb\u7edf\u80fd\u591f\u4ece\u539f\u59cb\u65b0\u95fb\u6587\u7ae0\u5185\u5bb9\u4e2d\u63d0\u53d6\u76f8\u5173\u516c\u53f8\u4ee3\u7801\uff0c\u5e76\u5728\u4e0d\u4f9d\u8d56\u4e8e\u9884\u7ed3\u6784\u5316\u6570\u636e\u6d41\u7684\u60c5\u51b5\u4e0b\u8fdb\u884c\u516c\u53f8\u5c42\u9762\u7684\u60c5\u7eea\u5206\u6790\u548c\u751f\u6210\u6458\u8981\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u7ed3\u5408\u4e86LLMs\u7684\u751f\u6210\u80fd\u529b\u3001\u4ee5\u53ca\u6700\u65b0\u7684\u63d0\u793a\u6280\u672f\uff0c\u914d\u4ee5\u4e00\u4e2a\u5b9a\u5236\u7684\u5b57\u7b26\u4e32\u76f8\u4f3c\u5ea6\u9a8c\u8bc1\u6846\u67b6\u3002 
\u901a\u8fc7\u4f7f\u7528\u5305\u542b5530\u7bc7\u91d1\u878d\u65b0\u95fb\u6587\u7ae0\u7684\u6570\u636e\u96c6\u8fdb\u884c\u8bc4\u4f30\uff0c\u8bc1\u660e\u4e86\u6211\u4eec\u7684\u65b9\u6cd5\u7684\u6709\u6548\u6027\u3002\u76f8\u6bd4\u73b0\u6709\u6570\u636e\u63d0\u4f9b\u5546\uff0c\u6211\u4eec\u670990%\u7684\u6587\u7ae0\u4e0d\u4f1a\u9057\u6f0f\u4efb\u4f55\u516c\u53f8\u4ee3\u7801\uff0c\u800c\u670922%\u7684\u6587\u7ae0\u4f1a\u989d\u5916\u63d0\u4f9b\u76f8\u5173\u7684\u516c\u53f8\u4ee3\u7801\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5b9e\u73b0\u4e86\u8fd9\u4e00\u65b9\u6cd5\u7684\u5927\u89c4\u6a21\u90e8\u7f72\uff0c\u5e76\u901a\u8fc7\u5b9e\u65f6API\u7aef\u70b9\u63d0\u4f9b\u4e86\u7ecf\u8fc7\u5904\u7406\u7684\u6570\u636e\uff0c\u66f4\u65b0\u4e86\u6700\u65b0\u65b0\u95fb\u4fe1\u606f\u3002\u636e\u6211\u4eec\u6240\u77e5\uff0c\u8fd9\u662f\u6211\u4eec\u9996\u6b21\u4f5c\u4e3a\u6570\u636e\u63d0\u4f9b\u5546\u63d0\u4f9b\u4ece\u65b0\u95fb\u6587\u7ae0\u4e2d\u5bf9\u6bcf\u4e2a\u516c\u53f8\u7684\u7ec6\u81f4\u60c5\u7eea\u5206\u6790\u670d\u52a1\uff0c\u589e\u5f3a\u4e86\u5e02\u573a\u53c2\u4e0e\u8005\u53ef\u83b7\u53d6\u7684\u4fe1\u606f\u6df1\u5ea6\u3002 \u4e3a\u4e86\u4fc3\u8fdb\u8fdb\u4e00\u6b65\u7684\u7814\u7a76\u5229\u7528\u91d1\u878d\u65b0\u95fb\uff0c\u6211\u4eec\u8fd8\u53d1\u5e03\u4e86\u5305\u542b5530\u7bc7\u5904\u7406\u540e\u6587\u7ae0\u7684\u8bc4\u4f30\u6570\u636e\u96c6\u3002|\n", "2407.15748": "|**2024-07-22**|**MoRSE: Bridging the Gap in Cybersecurity Expertise with Retrieval Augmented Generation**|Marco Simoni 
et.al.|[2407.15748](http://arxiv.org/abs/2407.15748)|null|\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u5f15\u5165\u4e86MoRSE\uff08\u6df7\u5408RAG\u5b89\u5168\u4e13\u5bb6\uff09\uff0c\u8fd9\u662f\u9996\u4e2a\u4e13\u95e8\u7684AI\u804a\u5929\u673a\u5668\u4eba\u7528\u4e8e\u7f51\u7edc\u5b89\u5168\u3002MoRSE\u65e8\u5728\u63d0\u4f9b\u5168\u9762\u4e14\u5b8c\u6574\u7684\u7f51\u7edc\u5b89\u5168\u77e5\u8bc6\u3002MoRSE\u4f7f\u7528\u4e86\u4e24\u4e2aRAG\uff08\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff09\u7cfb\u7edf\uff0c\u8bbe\u8ba1\u7528\u4e8e\u4ece\u591a\u7ef4\u5ea6\u7684\u7f51\u7edc\u5b89\u5168\u4e0a\u4e0b\u6587\u4e2d\u68c0\u7d22\u548c\u7ec4\u7ec7\u4fe1\u606f\u3002\u4e0e\u4f20\u7edf\u7684RAG\u4e0d\u540c\uff0cMoRSE\u91c7\u7528\u4e86\u5e76\u884c\u68c0\u7d22\u5668\u534f\u540c\u5de5\u4f5c\uff0c\u4ee5\u5728\u4e0d\u540c\u683c\u5f0f\u548c\u7ed3\u6784\u4e2d\u68c0\u7d22\u8bed\u4e49\u76f8\u5173\u7684\u4fe1\u606f\u3002 \u4e0d\u540c\u4e8e\u4f9d\u8d56\u53c2\u6570\u77e5\u8bc6\u5e93\u7684\u4f20\u7edf\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\uff0cMoRSE\u54cd\u5e94\u7528\u6237\u67e5\u8be2\u65f6\u4ece\u975e\u53c2\u6570\u77e5\u8bc6\u5e93\u4e2d\u68c0\u7d22\u76f8\u5173\u6587\u6863\u3002\u968f\u540e\uff0cMoRSE\u5229\u7528\u8fd9\u4e9b\u4fe1\u606f\u751f\u6210\u51c6\u786e\u7684\u7b54\u6848\u3002\u6b64\u5916\uff0cMoRSE\u53d7\u76ca\u4e8e\u5176\u77e5\u8bc6\u5e93\u7684\u5b9e\u65f6\u66f4\u65b0\uff0c\u8fd9\u4f7f\u5f97\u7cfb\u7edf\u80fd\u591f\u5728\u4e0d\u91cd\u65b0\u8bad\u7ec3\u7684\u60c5\u51b5\u4e0b\u6301\u7eed\u7684\u77e5\u8bc6\u4e30\u5bcc\u3002 \u6211\u4eec\u5bf9MoRSE\u7684\u6709\u6548\u6027\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u9488\u5bf9600\u4e2a\u7279\u5b9a\u7684\u7f51\u7edc\u5b89\u5168\u95ee\u9898\u8fdb\u884c\u4e86\u5b9e\u9a8c\u6027\u8bc4\u4f30\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u4e0eGPT-4\u3001Mixtral 
7x8\u7b49\u5df2\u77e5\u89e3\u51b3\u65b9\u6848\u76f8\u6bd4\uff0c\u5728\u7b54\u6848\u7684\u76f8\u5173\u6027\u548c\u6b63\u786e\u6027\u7684\u6539\u8fdb\u4e0a\u8d85\u8fc7\u4e8610%\u3002|\n", "2407.15736": "|**2024-07-22**|**OMoS-QA: A Dataset for Cross-Lingual Extractive Question Answering in a German Migration Context**|Steffen Kleinle et.al.|[2407.15736](http://arxiv.org/abs/2407.15736)|null|\u5f53\u79fb\u6c11\u5230\u4e00\u4e2a\u65b0\u7684\u56fd\u5bb6\u65f6\uff0c\u4eba\u4eec\u5f88\u5bb9\u6613\u56e0\u9700\u8981\u83b7\u53d6\u6709\u5173\u8d22\u653f\u652f\u6301\u3001\u4f4f\u623f\u3001\u6559\u80b2\u3001\u8bed\u8a00\u8bfe\u7a0b\u4ee5\u53ca\u5176\u4ed6\u95ee\u9898\u7684\u4fe1\u606f\u800c\u611f\u5230\u4e0d\u77e5\u6240\u63aa\u3002\u5982\u679c\u642c\u8fc1\u8fc7\u7a0b\u5306\u5fd9\u6216\u751a\u81f3\u88ab\u8feb\u8fdb\u884c\uff0c\u5bf9\u8fd9\u4e9b\u95ee\u9898\u7684\u9ad8\u8d28\u91cf\u89e3\u7b54\u53d8\u5f97\u5c24\u4e3a\u8feb\u5207\u3002\u5b98\u65b9\u79fb\u6c11\u987e\u95ee\u901a\u5e38\u8fc7\u4e8e\u7e41\u5fd9\uff0c\u800c\u5728\u7ebf\u7cfb\u7edf\u53ef\u4ee5\u5f15\u5bfc\u65b0\u79fb\u6c11\u627e\u5230\u6240\u9700\u4fe1\u606f\u6216\u5408\u9002\u7684\u54a8\u8be2\u670d\u52a1\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86OMoS-QA\u6570\u636e\u96c6\uff0c\u5b83\u5305\u542b\u5fb7\u8bed\u548c\u82f1\u8bed\u95ee\u9898\u4e0e\u76f8\u5173\u53ef\u4fe1\u6587\u6863\u4ee5\u53ca\u624b\u52a8\u6807\u6ce8\u7684\u7b54\u6848\uff0c\u4e13\u95e8\u9488\u5bf9\u8fd9\u4e00\u573a\u666f\u3002\u95ee\u9898\u662f\u7531\u5f00\u6e90\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u81ea\u52a8\u751f\u6210\u7684\uff0c\u7b54\u6848\u53e5\u5b50\u7531\u5177\u6709\u9ad8\u5ea6\u4e00\u81f4\u6027\u7684\u4f17\u5305\u5de5\u4f5c\u8005\u9009\u62e9\u3002\u901a\u8fc7\u6211\u4eec\u7684\u6570\u636e\uff0c\u6211\u4eec\u5728\u5fb7\u8bed\u548c\u82f1\u8bed\u4e0a\u5bf95\u4e2a\u9884\u8bad\u7ec3\u7684LLM\u8fdb\u884c\u4e86\u63d0\u53d6\u5f0f\u95ee\u7b54\u4efb\u52a1\u7684\u6bd4\u8f83\u3002\u5728\u6240\u6709\u6a21\u578b\u548c\u4e24\u79cd\u8bed\u8a
00\u4e2d\uff0c\u9009\u62e9\u7b54\u6848\u53e5\u5b50\u7684\u7cbe\u786e\u5ea6\u9ad8\uff0c\u53ec\u56de\u7387\u4f4e\u81f3\u4e2d\u7b49\uff0c\u8fd9\u662f\u4e00\u4e2a\u6709\u5229\u7684\u6743\u8861\uff0c\u4ee5\u907f\u514d\u8bef\u5bfc\u7528\u6237\u3002\u8fd9\u79cd\u6027\u80fd\u5373\u4f7f\u5728\u95ee\u9898\u8bed\u8a00\u4e0e\u6587\u6863\u8bed\u8a00\u4e0d\u5339\u914d\u65f6\u4e5f\u80fd\u4fdd\u6301\u4e0d\u53d8\u3002\u5728\u6839\u636e\u4e0a\u4e0b\u6587\u8bc6\u522b\u4e0d\u53ef\u56de\u7b54\u7684\u95ee\u9898\u65b9\u9762\uff0c\u4e24\u79cd\u8bed\u8a00\u4e4b\u95f4\u5b58\u5728\u66f4\u5927\u7684\u5dee\u5f02\u3002|\n", "2407.15734": "|**2024-07-22**|**TaskGen: A Task-Based, Memory-Infused Agentic Framework using StrictJSON**|John Chong Min Tan et.al.|[2407.15734](http://arxiv.org/abs/2407.15734)|**[link](https://github.com/simbianai/taskgen)**|TaskGen\u662f\u4e00\u4e2a\u5f00\u6e90\u7684\u4ee3\u7406\u6846\u67b6\uff0c\u901a\u8fc7\u4f7f\u7528\u4ee3\u7406\u6765\u89e3\u51b3\u4efb\u610f\u4efb\u52a1\u5e76\u5c06\u5176\u5206\u89e3\u4e3a\u5b50\u4efb\u52a1\u6765\u5b9e\u73b0\u3002\u6bcf\u4e2a\u5b50\u4efb\u52a1\u88ab\u6620\u5c04\u5230\u4e00\u4e2a\u88c5\u5907\u51fd\u6570\u6216\u53e6\u4e00\u4e2a\u4ee3\u7406\u6267\u884c\u3002\u4e3a\u4e86\u51cf\u5c11\u5197\u4f59\uff08\u4ece\u800c\u51cf\u5c11\u4ee4\u724c\u4f7f\u7528\uff09\uff0cTaskGen\u4f7f\u7528\u4e86StrictJSON\uff0c\u786e\u4fdd\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u8f93\u51fa\u7684JSON\u683c\u5f0f\uff0c\u5e76\u5177\u5907\u7c7b\u578b\u68c0\u67e5\u548c\u8fed\u4ee3\u9519\u8bef\u4fee\u6b63\u7b49\u989d\u5916\u529f\u80fd\u3002TaskGen\u7684\u6838\u5fc3\u7406\u5ff5\u5728\u4e8e\u57fa\u4e8e\u9700\u6c42\u7ba1\u7406\u4fe1\u606f/\u8bb0\u5fc6\u3002 
\u6211\u4eec\u5bf9TaskGen\u5728\u5404\u79cd\u73af\u5883\u4e2d\u8fdb\u884c\u4e86\u5b9e\u8bc1\u8bc4\u4f30\uff0c\u5305\u62ec40x40\u52a8\u6001\u8ff7\u5bab\u5bfc\u822a\uff0c\u5176\u4e2d\u969c\u788d\u7269\u4f4d\u7f6e\u4f1a\u53d8\u5316\uff08100%\u7684\u6210\u529f\u7387\uff09\uff0c\u6587\u672c\u4e16\u754c\u9003\u8131\u623f\u95f4\u89e3\u8c1c\uff0c\u5177\u6709\u5bc6\u96c6\u5956\u52b1\u548c\u8be6\u7ec6\u76ee\u6807\uff0896%\u7684\u6210\u529f\u7387\uff09\uff0c\u7f51\u7edc\u6d4f\u89c8\uff0869%\u7684\u52a8\u4f5c\u6210\u529f\uff09\uff0c\u89e3\u51b3MATH\u6570\u636e\u96c6\uff08\u5728100\u4e2aLevel-5\u95ee\u9898\u4e0a\uff0c\u6210\u529f\u738771%\uff09\uff0c\u4ee5\u53ca\u81ea\u7136\u95ee\u9898\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08F1\u5206\u6570\u4e3a47.03%\uff09\u3002|\n", "2407.16686": "|**2024-07-23**|**Can Large Language Models Automatically Jailbreak GPT-4V?**|Yuanwei Wu et.al.|[2407.16686](http://arxiv.org/abs/2407.16686)|null|GPT-4V\u56e0\u5176\u5728\u6574\u5408\u548c\u5904\u7406\u591a\u6a21\u6001\u4fe1\u606f\u65b9\u9762\u7684\u5353\u8d8a\u80fd\u529b\u800c\u5f15\u8d77\u5e7f\u6cdb\u5173\u6ce8\u3002\u540c\u65f6\uff0c\u5176\u9762\u90e8\u8bc6\u522b\u529f\u80fd\u4e5f\u5f15\u53d1\u4e86\u9690\u79c1\u6cc4\u9732\u7684\u5b89\u5168\u62c5\u5fe7\u3002\u5c3d\u7ba1\u7814\u7a76\u8005\u901a\u8fc7\u5f3a\u5316\u5b66\u4e60\u4e0e\u4eba\u7c7b\u53cd\u9988\uff08RLHF\uff09\u6216\u9884\u5904\u7406\u8fc7\u6ee4\u5668\u7b49\u624b\u6bb5\u52aa\u529b\u5b9e\u73b0\u5b89\u5168\u5bf9\u9f50\uff0c\u4f46\u4ecd\u7136\u53ef\u80fd\u5b58\u5728\u88ab\u5229\u7528\u7684\u6f0f\u6d1e\u3002\u5728\u6211\u4eec\u7684\u7814\u7a76\u4e2d\uff0c\u6211\u4eec\u5f15\u5165\u4e86AutoJailbreak\uff0c\u8fd9\u662f\u4e00\u79cd\u521b\u65b0\u7684\u81ea\u52a8\u8d8a\u72f1\u6280\u672f\uff0c\u7075\u611f\u6765\u6e90\u4e8e\u63d0\u793a\u4f18\u5316\u3002\u6211\u4eec\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u8fdb\u884c\u7ea2\u961f\u8bad\u7ec3\uff0c\u4ee5\u7cbe\u70bc\u8d8a\u72f1\u63d0\u793a\uff0c\u5e76\u91c7\u7528\u5f31\u5230\u5f3a
\u7684\u4e0a\u4e0b\u6587\u5185\u5b66\u4e60\u63d0\u793a\u6765\u63d0\u9ad8\u6548\u7387\u3002\u6b64\u5916\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u6709\u6548\u7684\u65b9\u6cd5\uff0c\u7ed3\u5408\u65e9\u671f\u505c\u6b62\u7b56\u7565\uff0c\u4ee5\u6700\u5c0f\u5316\u4f18\u5316\u65f6\u95f4\u548c\u4ee4\u724c\u6d88\u8017\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cAutoJailbreak\u663e\u8457\u8d85\u8d8a\u4f20\u7edf\u65b9\u6cd5\uff0c\u5b9e\u73b0\u4e86\u8d85\u8fc795.3%\u7684\u6210\u529f\u653b\u51fb\u7387\uff08ASR\uff09\u3002\u8fd9\u9879\u7814\u7a76\u63ed\u793a\u4e86\u52a0\u5f3aGPT-4V\u5b89\u5168\u6027\u7684\u6f5c\u529b\uff0c\u7a81\u663e\u4e86LLMs\u53ef\u80fd\u88ab\u7528\u4e8e\u7834\u574fGPT-4V\u5b8c\u6574\u6027\u7684\u98ce\u9669\u3002|\n", "2407.16667": "|**2024-07-23**|**RedAgent: Red Teaming Large Language Models with Context-aware Autonomous Language Agent**|Huiyu Xu et.al.|[2407.16667](http://arxiv.org/abs/2407.16667)|null|\u8fd1\u671f\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5982GPT-4\u5df2\u88ab\u96c6\u6210\u81f3\u8bf8\u591a\u5b9e\u9645\u5e94\u7528\uff0c\u4f8b\u5982\u4ee3\u7801\u52a9\u624bCopilot\u3002\u8fd9\u4e9b\u96c6\u6210\u663e\u8457\u6269\u5c55\u4e86LLM\u7684\u653b\u51fb\u9762\uff0c\u4f7f\u5176\u9762\u4e34\u591a\u79cd\u5a01\u80c1\u3002\u5176\u4e2d\uff0c\u901a\u8fc7\u201c\u8d8a\u72f1\u201d\u653b\u51fb\u8bf1\u5bfc\u51fa\u6bd2\u6027\u54cd\u5e94\u7684\u201c\u8d8a\u72f1\u201d\u63d0\u793a\u5f15\u8d77\u4e86\u5b89\u5168\u9886\u57df\u7684\u5e7f\u6cdb\u5173\u6ce8\u3002\u4e3a\u4e86\u8bc6\u522b\u8fd9\u4e9b\u5a01\u80c1\uff0c\u8d8a\u6765\u8d8a\u591a\u7684\u7ea2\u961f\u7b56\u7565\u901a\u8fc7\u6784\u5efa\u201c\u8d8a\u72f1\u201d\u63d0\u793a\u6765\u6a21\u62df\u6f5c\u5728\u7684\u5bf9\u6297\u573a\u666f\uff0c\u4ee5\u6b64\u6d4b\u8bd5\u76ee\u6807LLM\u3002\u7136\u800c\uff0c\u73b0\u6709\u7ea2\u961f\u7b56\u7565\u5e76\u672a\u8003\u8651LLM\u5728\u4e0d\u540c\u60c5\u5883\u4e0b\u7684\u72ec\u7279\u8106\u5f31\u6027\uff0c\u4f7f\u5f97\u6784\u5efa\u9488\u5bf9\u7279\u5b9a\u60c5\
u5883\u7684\u201c\u8d8a\u72f1\u201d\u63d0\u793a\u53d8\u5f97\u56f0\u96be\u3002\u540c\u65f6\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u4ec5\u4f9d\u8d56\u4e8e\u5c11\u6570\u53d8\u5f02\u64cd\u4f5c\u5bf9\u201c\u8d8a\u72f1\u201d\u6a21\u677f\u8fdb\u884c\u7ec6\u5316\uff0c\u7f3a\u4e4f\u9002\u5e94\u4e0d\u540c\u60c5\u5883\u7684\u81ea\u52a8\u5316\u548c\u89c4\u6a21\u5316\u80fd\u529b\u3002 \u4e3a\u4e86\u5b9e\u73b0\u60c5\u5883\u611f\u77e5\u548c\u9ad8\u6548\u7ea2\u961f\u7b56\u7565\uff0c\u6211\u4eec\u62bd\u8c61\u5e76\u5efa\u6a21\u73b0\u6709\u653b\u51fb\u884c\u4e3a\u4e3a\u4e00\u4e2a\u7edf\u4e00\u6982\u5ff5\u2014\u2014\u201c\u8d8a\u72f1\u7b56\u7565\u201d\uff0c\u5e76\u63d0\u51fa\u4e86\u4e00\u79cd\u591a\u667a\u80fd\u4f53LLM\u7cfb\u7edfRedAgent\u3002\u8be5\u7cfb\u7edf\u5229\u7528\u8fd9\u4e9b\u7b56\u7565\u751f\u6210\u60c5\u5883\u611f\u77e5\u7684\u201c\u8d8a\u72f1\u201d\u63d0\u793a\uff0c\u5e76\u901a\u8fc7\u989d\u5916\u7684\u8bb0\u5fc6\u7f13\u51b2\u533a\u81ea\u6211\u53cd\u601d\u60c5\u5883\u53cd\u9988\uff0c\u6301\u7eed\u5b66\u4e60\u5982\u4f55\u5229\u7528\u8fd9\u4e9b\u7b56\u7565\u5728\u7279\u5b9a\u60c5\u5883\u4e0b\u5b9e\u73b0\u6709\u6548\u201c\u8d8a\u72f1\u201d\u3002\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u8868\u660e\uff0c\u6211\u4eec\u7684\u7cfb\u7edf\u53ef\u4ee5\u5728\u4e94\u4e2a\u67e5\u8be2\u5185\u6210\u529f\u201c\u8d8a\u72f1\u201d\u5927\u591a\u6570\u9ed1\u76d2LLM\uff0c\u76f8\u8f83\u4e8e\u73b0\u6709\u7ea2\u961f\u65b9\u6cd5\u6548\u7387\u63d0\u5347\u4e24\u500d\u3002\u6b64\u5916\uff0cRedAgent\u80fd\u591f\u66f4\u9ad8\u6548\u5730\u9488\u5bf9\u5b9a\u5236\u5316\u7684LLM\u5e94\u7528\u8fdb\u884c\u201c\u8d8a\u72f1\u201d\u3002 
\u901a\u8fc7\u751f\u6210\u9488\u5bf9\u7279\u5b9a\u5e94\u7528\u7684\u201c\u8d8a\u72f1\u201d\u63d0\u793a\uff0c\u6211\u4eec\u53d1\u73b0\u4e8660\u4e2a\u4e25\u91cd\u6f0f\u6d1e\u5b58\u5728\u4e8e\u5b9e\u9645\u5e94\u7528\u4e2d\u7684GPTs\u4e0a\uff0c\u4ec5\u9700\u6bcf\u6f0f\u6d1e\u4e24\u6b21\u67e5\u8be2\u3002\u6211\u4eec\u5df2\u62a5\u544a\u6240\u6709\u53d1\u73b0\u7684\u95ee\u9898\uff0c\u5e76\u4e0eOpenAI\u548cMeta\u8fdb\u884c\u4e86\u6c9f\u901a\u4ee5\u4fee\u590d\u6f0f\u6d1e\u3002|\n", "2407.16637": "|**2024-07-23**|**Course-Correction: Safety Alignment Using Synthetic Preferences**|Rongwu Xu et.al.|[2407.16637](http://arxiv.org/abs/2407.16637)|**[link](https://github.com/pillowsofwind/course-correction)**|\u672c\u6587\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u6267\u884c\u201c\u8bfe\u7a0b\u7ea0\u6b63\u201d\u4efb\u52a1\u7684\u80fd\u529b\u8fdb\u884c\u4e86\u4e00\u9879\u7cfb\u7edf\u6027\u7814\u7a76\uff0c\u5373\u6a21\u578b\u80fd\u591f\u81ea\u4e3b\u5730\u907f\u514d\u751f\u6210\u6709\u5bb3\u5185\u5bb9\u3002\u9996\u5148\uff0c\u6211\u4eec\u5f15\u5165\u4e86\\textsc{C$^2$-Eval}\u57fa\u51c6\u7528\u4e8e\u5b9a\u91cf\u8bc4\u4f30\uff0c\u5e76\u5206\u6790\u4e8610\u4e2a\u6d41\u884cLLM\u7684\u6027\u80fd\uff0c\u63ed\u793a\u4e86\u5f53\u524d\u5b89\u5168\u8c03\u4f18\u7684LLM\u5728\u8bfe\u7a0b\u7ea0\u6b63\u65b9\u9762\u5b58\u5728\u663e\u8457\u5dee\u5f02\u3002\u4e3a\u4e86\u6539\u8fdb\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4f7f\u7528\u504f\u597d\u5b66\u4e60\u5bf9LLM\u8fdb\u884c\u5fae\u8c03\u7684\u65b9\u6cd5\uff0c\u5f3a\u8c03\u53ca\u65f6\u8bfe\u7a0b\u7ea0\u6b63\u7684\u91cd\u8981\u6027\u3002\u901a\u8fc7\u81ea\u52a8\u5316\u6d41\u7a0b\uff0c\u6211\u4eec\u521b\u5efa\u4e86\\textsc{C$^2$-Syn}\u5408\u6210\u6570\u636e\u96c6\uff0c\u5305\u542b75\u4e07\u5bf9\u504f\u597d\uff0c\u4ee5\u6b64\u901a\u8fc7\u6570\u636e\u9a71\u52a8\u7684\u504f\u597d\u5b66\u4e60\u6559\u6388\u6a21\u578b\u53ca\u65f6\u8bfe\u7a0b\u7ea0\u6b63\u7684\u6982\u5ff5\u3002\u5728\\textsc{Llama2-Cha
t 7B}\u548c\\textsc{Qwen2 7B}\u4e24\u4e2aLLM\u4e0a\u7684\u5b9e\u9a8c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u6709\u6548\u63d0\u9ad8\u4e86\u8bfe\u7a0b\u7ea0\u6b63\u80fd\u529b\uff0c\u540c\u65f6\u4e0d\u5f71\u54cd\u603b\u4f53\u6027\u80fd\uff0c\u5e76\u4e14\u7279\u522b\u6709\u6548\u5730\u63d0\u5347\u4e86LLM\u7684\u5b89\u5168\u6027\uff0c\u5c24\u5176\u662f\u62b5\u6297\u9003\u8131\u653b\u51fb\u7684\u80fd\u529b\u3002|\n", "2407.16615": "|**2024-07-23**|**Lawma: The Power of Specialization for Legal Tasks**|Ricardo Dominguez-Olmedo et.al.|[2407.16615](http://arxiv.org/abs/2407.16615)|null|\u6cd5\u5f8b\u6587\u672c\u7684\u6ce8\u91ca\u4e0e\u5206\u7c7b\u662f\u5b9e\u8bc1\u6cd5\u5b66\u7814\u7a76\u7684\u6838\u5fc3\u90e8\u5206\u3002\u4f20\u7edf\u4e0a\uff0c\u8fd9\u4e9b\u4efb\u52a1\u5f80\u5f80\u7531\u53d7\u8fc7\u8bad\u7ec3\u7684\u7814\u7a76\u52a9\u7406\u627f\u62c5\u3002\u5728\u8bed\u8a00\u6a21\u578b\u53d6\u5f97\u8fdb\u5c55\u7684\u80cc\u666f\u4e0b\uff0c\u5b9e\u8bc1\u6cd5\u5f8b\u5b66\u8005\u8d8a\u6765\u8d8a\u591a\u5730\u8f6c\u5411\u4f7f\u7528\u5546\u4e1a\u6a21\u578b\uff0c\u5e0c\u671b\u4ee5\u6b64\u51cf\u8f7b\u4eba\u5de5\u6807\u6ce8\u7684\u5de8\u5927\u6210\u672c\u3002\u5c3d\u7ba1\u8fd9\u7c7b\u65b9\u6cd5\u7684\u5e94\u7528\u65e5\u76ca\u5e7f\u6cdb\uff0c\u4f46\u5173\u4e8e\u5982\u4f55\u6700\u6709\u6548\u5730\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u6cd5\u5f8b\u4efb\u52a1\u7684\u76f8\u5173\u7814\u7a76\u4ecd\u7136\u6709\u9650\u3002 
\u6211\u4eec\u8fdb\u884c\u4e86\u4e00\u9879\u5168\u9762\u7684\u7814\u7a76\uff0c\u6db5\u76d6\u4e86\u51e0\u4e4e\u5168\u90e8\u9488\u5bf9\u673a\u5668\u5b66\u4e60\u793e\u533a\u7684\u65b0\u6cd5\u5f8b\u6587\u672c\u5206\u7c7b\u4efb\u52a1\u3002\u4eceGPT-4\u4f5c\u4e3a\u57fa\u51c6\u5f00\u59cb\uff0c\u6211\u4eec\u53d1\u73b0\u5b83\u5728\u96f6\u6837\u672c\u51c6\u786e\u5ea6\u4e0a\u7684\u8868\u73b0\u5177\u6709\u975e\u540c\u5bfb\u5e38\u4f46\u9ad8\u5ea6\u591a\u53d8\u6027\uff0c\u7ecf\u5e38\u8868\u73b0\u51fa\u53ef\u80fd\u4e0d\u8db3\u4ee5\u6ee1\u8db3\u6cd5\u5f8b\u5de5\u4f5c\u9700\u6c42\u7684\u6027\u80fd\u3002\u63a5\u7740\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u8f7b\u5ea6\u5fae\u8c03\u540e\u7684Llama 3\u6a21\u578b\u5728\u51e0\u4e4e\u6240\u6709\u4efb\u52a1\u4e0a\u7684\u8868\u73b0\u5747\u8fdc\u8d85GPT-4\uff0c\u901a\u5e38\u63d0\u9ad8\u4e86\u4e24\u4f4d\u6570\u767e\u5206\u70b9\u7684\u51c6\u786e\u6027\u3002\u6211\u4eec\u53d1\u73b0\uff0c\u66f4\u5927\u7684\u6a21\u578b\u5728\u5fae\u8c03\u65f6\u54cd\u5e94\u6548\u679c\u66f4\u597d\u3002\u51e0\u5341\u5230\u51e0\u767e\u4e2a\u793a\u4f8b\u8db3\u4ee5\u5b9e\u73b0\u9ad8\u5206\u7c7b\u51c6\u786e\u6027\u3002\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u6211\u4eec\u53ef\u4ee5\u5728\u6240\u6709260\u4e2a\u4efb\u52a1\u4e0a\u540c\u65f6\u5fae\u8c03\u4e00\u4e2a\u6a21\u578b\uff0c\u76f8\u5bf9\u4e8e\u4e3a\u6bcf\u4e2a\u4efb\u52a1\u5355\u72ec\u521b\u5efa\u6a21\u578b\uff0c\u4ec5\u5728\u51c6\u786e\u6027\u65b9\u9762\u7565\u6709\u635f\u5931\u3002 \u6211\u4eec\u7684\u5de5\u4f5c\u6307\u51fa\u4e86\u66ff\u4ee3\u73b0\u6709\u505a\u6cd5\u7684\u4e00\u79cd\u53ef\u884c\u9009\u62e9\u3002\u5bf9\u4e8e\u5177\u5907\u4e00\u5b9a\u6807\u6ce8\u6570\u636e\u7684\u7279\u5b9a\u6cd5\u5f8b\u4efb\u52a1\uff0c\u7814\u7a76\u4eba\u5458\u66f4\u5e94\u8003\u8651\u4f7f\u7528\u5f00\u6e90\u6a21\u578b\u8fdb\u884c\u5fae\u8c03\u3002|\n", "2407.16604": "|**2024-07-23**|**Shared Imagination: LLMs Hallucinate Alike**|Yilun Zhou 
et.al.|[2407.16604](http://arxiv.org/abs/2407.16604)|null|\u5c3d\u7ba1\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u6700\u8fd1\u53d1\u5c55\u5448\u73b0\u4e86\u663e\u8457\u7684\u589e\u957f\uff0c\u4f46\u5b83\u4eec\u7684\u8bad\u7ec3\u65b9\u6cd5\u2014\u2014\u5305\u62ec\u6a21\u578b\u67b6\u6784\u3001\u9884\u8bad\u7ec3\u6570\u636e\u548c\u4f18\u5316\u7b97\u6cd5\u2014\u2014\u5f80\u5f80\u6781\u4e3a\u76f8\u4f3c\u3002\u8fd9\u81ea\u7136\u5f15\u53d1\u4e86\u4e00\u4e2a\u95ee\u9898\uff1a\u8fd9\u4e9b\u6a21\u578b\u4e4b\u95f4\u7684\u76f8\u4f3c\u6027\u5982\u4f55\uff1f\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u8bbe\u7f6e\uff0c\u5373\u60f3\u8c61\u95ee\u9898\u56de\u7b54\uff08IQA\uff09\uff0c\u4ee5\u66f4\u6df1\u5165\u5730\u7406\u89e3\u6a21\u578b\u4e4b\u95f4\u7684\u76f8\u4f3c\u6027\u3002\u5728IQA\u4e2d\uff0c\u6211\u4eec\u8ba9\u4e00\u4e2a\u6a21\u578b\u751f\u6210\u5b8c\u5168\u865a\u6784\u7684\u95ee\u9898\uff08\u4f8b\u5982\uff0c\u5173\u4e8e\u7269\u7406\u4e2d\u5b8c\u5168\u4e0d\u5b58\u5728\u7684\u6982\u5ff5\uff09\uff0c\u7136\u540e\u8ba9\u53e6\u4e00\u4e2a\u6a21\u578b\u8fdb\u884c\u56de\u7b54\u3002\u4ee4\u4eba\u60ca\u8bb6\u7684\u662f\uff0c\u5c3d\u7ba1\u8fd9\u4e9b\u95ee\u9898\u5b8c\u5168\u865a\u6784\uff0c\u4f46\u6240\u6709\u6a21\u578b\u90fd\u80fd\u6210\u529f\u56de\u7b54\u5bf9\u65b9\u7684\u95ee\u9898\uff0c\u8fd9\u8868\u660e\u5728\u8fd9\u6837\u7684\u5e7b\u89c9\u8fc7\u7a0b\u4e2d\uff0c\u8fd9\u4e9b\u6a21\u578b\u5171\u4eab\u7740\u4e00\u4e2a\u201c\u5171\u540c\u7684\u60f3\u8c61\u7a7a\u95f4\u201d\u3002 \u6211\u4eec\u5bf9\u8fd9\u4e00\u73b0\u8c61\u8fdb\u884c\u4e86\u7cfb\u5217\u8c03\u67e5\uff0c\u5e76\u8ba8\u8bba\u4e86\u5b83\u5bf9\u6a21\u578b\u540c\u8d28\u6027\u3001\u5e7b\u89c9\u4ee5\u53ca\u8ba1\u7b97\u521b\u9020\u529b\u7684\u542f\u793a\u3002|\n", "2407.16576": "|**2024-07-23**|**Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs**|Yifan Xia 
et.al.|[2407.16576](http://arxiv.org/abs/2407.16576)|null|\u672c\u6587\u63a2\u8ba8\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u68c0\u6d4b\u52a0\u5bc6API\u8bef\u7528\u65b9\u9762\u6240\u9762\u4e34\u7684\u6311\u6218\u4e0e\u673a\u9047\u3002\u5728\u5f53\u524d\u81ea\u52a8\u5316\u68c0\u6d4b\u6280\u672f\u8fdb\u6b65\u7684\u57fa\u7840\u4e0a\uff0c\u5bf9\u4e8e\u590d\u6742\u76ee\u6807\u7684\u7cbe\u786e\u5ea6\u4e0b\u964d\u4e3b\u8981\u5f52\u56e0\u4e8e\u624b\u52a8\u5b9a\u4e49\u6a21\u5f0f\u7684\u4f9d\u8d56\u3002LLM\u4ee5\u5176\u4e0a\u4e0b\u6587\u7406\u89e3\u80fd\u529b\uff0c\u5728\u6b64\u5173\u952e\u5b89\u5168\u9886\u57df\u5c55\u73b0\u51fa\u5de8\u5927\u7684\u6f5c\u529b\u3002\u7136\u800c\uff0c\u5c06LLM\u5e94\u7528\u4e8e\u8fd9\u4e00\u9886\u57df\u5b58\u5728\u6311\u6218\uff0c\u5c24\u5176\u662f\u7531\u4e8e\u5b83\u4eec\u56fa\u6709\u7684\u968f\u673a\u6027\u548c\u4f17\u6240\u5468\u77e5\u7684\u5e7b\u89c9\u95ee\u9898\u5bfc\u81f4\u7684\u4e0d\u7a33\u5b9a\u6027\u3002 \u4e3a\u4e86\u7cfb\u7edf\u5730\u8bc4\u4f30LLM\u5728\u68c0\u6d4b\u52a0\u5bc6\u8bef\u7528\u65b9\u9762\u7684\u53ef\u9760\u6027\uff0c\u5e76\u63a2\u7d22\u6f5c\u5728\u89e3\u51b3\u65b9\u6848\uff0c\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u5168\u9762\u7684\u8bc4\u4f30\u6846\u67b6\uff0c\u5229\u7528\u6db5\u76d6\u4eba\u5de5\u6784\u5efa\u6837\u672c\u548c\u5b9e\u9645\u9879\u76ee\u7684\u5927\u89c4\u6a21\u6570\u636e\u96c6\u8fdb\u884c\u5206\u6790\u3002\u901a\u8fc7\u6df1\u5165\u5206\u679011,940\u4efdLLM\u751f\u6210\u7684\u62a5\u544a\uff0c\u6211\u4eec\u63ed\u793a\u4e86LLM\u56fa\u6709\u4e0d\u7a33\u5b9a\u6027\u7684\u666e\u904d\u5b58\u5728\uff0c\u5bfc\u81f4\u8d85\u8fc7\u4e00\u534a\u7684\u62a5\u544a\u88ab\u8bef\u62a5\u4e3a\u8bef\u7528\u3002\u7136\u800c\uff0c\u6211\u4eec\u8bc1\u660e\u4e86\u901a\u8fc7\u9650\u5236\u95ee\u9898\u8303\u56f4\u5e76\u4e0eLLM\u7684\u81ea\u6211\u4fee\u6b63\u80fd\u529b\u76f8\u7ed3\u5408\uff0c\u53ef\u4ee5\u663e\u8457\u63d0\u9ad8\u68c0\u6d4b\u7684\u53ef\u9760\u6027\u3002\u4f18\u5316\u7684\u65b9\u6cd5\u5b9e\u73b0\u4
e86\u63a5\u8fd190%\u7684\u68c0\u6d4b\u7387\uff0c\u8d85\u8d8a\u4f20\u7edf\u65b9\u6cd5\uff0c\u5e76\u5728\u73b0\u6709\u57fa\u51c6\u4e2d\u53d1\u73b0\u4e86\u672a\u88ab\u53d1\u73b0\u7684\u8bef\u7528\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8bc6\u522b\u4e86\u6301\u7eed\u963b\u788dLLM\u53ef\u9760\u6027\u7684\u5931\u8d25\u6a21\u5f0f\uff0c\u5305\u62ec\u52a0\u5bc6\u77e5\u8bc6\u4e0d\u8db3\u548c\u4ee3\u7801\u8bed\u4e49\u8bef\u89e3\u3002 \u57fa\u4e8e\u8fd9\u4e9b\u6d1e\u5bdf\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u4ee5LLM\u4e3a\u57fa\u7840\u7684\u5de5\u4f5c\u6d41\u7a0b\u6765\u68c0\u67e5\u5f00\u6e90\u4ed3\u5e93\uff0c\u6700\u7ec8\u53d1\u73b0\u4e8663\u4e2a\u771f\u5b9e\u7684\u52a0\u5bc6\u8bef\u7528\u6848\u4f8b\u3002\u5176\u4e2d46\u4e2a\u5df2\u88ab\u5f00\u53d1\u793e\u533a\u8ba4\u53ef\uff0c23\u4e2a\u6b63\u5728\u5904\u7406\u4e2d\uff0c6\u4e2a\u5df2\u5f97\u5230\u89e3\u51b3\u3002\u8003\u8651\u5230\u5f00\u53d1\u8005\u53cd\u9988\uff0c\u6211\u4eec\u63d0\u4f9b\u4e86\u672a\u6765\u7814\u7a76\u548cLLM\u5b89\u5168\u5de5\u5177\u53d1\u5c55\u7684\u5efa\u8bae\u3002|\n", "2407.16565": "|**2024-07-23**|**Retrieve, Generate, Evaluate: A Case Study for Medical Paraphrases Generation with Small Language Models**|Ioana Buhnila 
et.al.|[2407.16565](http://arxiv.org/abs/2407.16565)|**[link](https://github.com/ATILF-UMR7118/pRAGe)**|\u8fd1\u671f\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5e7f\u6cdb\u5e94\u7528\u5bf9\u516c\u4f17\u800c\u8a00\u53d8\u5f97\u6108\u53d1\u4fbf\u6377\u3002\u8fd9\u53ef\u80fd\u5bfc\u81f4\u4eba\u4eec\u5728\u533b\u7597\u5efa\u8bae\u65b9\u9762\u4f7f\u7528\u6b64\u7c7b\u6a21\u578b\u7684\u60c5\u51b5\u96be\u4ee5\u8ffd\u8e2a\u3002\u5927\u578b\u8bed\u8a00\u751f\u6210\u6a21\u578b\u5b58\u5728\u4e24\u4e2a\u5173\u952e\u95ee\u9898\uff1a\u9996\u5148\uff0c\u5b83\u4eec\u5bb9\u6613\u51fa\u73b0\u9519\u8bef\u63a8\u7406\uff0c\u56e0\u6b64\u7528\u4e8e\u533b\u7597\u76ee\u7684\u65f6\u9700\u8981\u5177\u5907\u79d1\u5b66\u6027\u548c\u4e8b\u5b9e\u6027\uff1b\u5176\u6b21\uff0c\u7531\u4e8e\u6a21\u578b\u89c4\u6a21\u5de8\u5927\uff0c\u5bf9\u8ba1\u7b97\u8d44\u6e90\u6784\u6210\u91cd\u5927\u6311\u6218\u3002 \u672c\u7814\u7a76\u5f15\u5165\u4e86\u4e00\u79cd\u540d\u4e3apRAGe\u7684\u7ba1\u9053\uff0c\u65e8\u5728\u901a\u8fc7\u5c0f\u578b\u8bed\u8a00\u6a21\u578b\uff08SLM\uff09\u8fdb\u884c\u68c0\u7d22\u589e\u5f3a\u751f\u6210\u4e0e\u8bc4\u4f30\uff0c\u4ee5\u5b9e\u73b0\u6cd5\u8bed\u533b\u5b66\u77ed\u8bed\u751f\u6210\u3002\u6211\u4eec\u63a2\u8ba8\u4e86\u5c0f\u578b\u8bed\u8a00\u6a21\u578b\u7684\u6709\u6548\u6027\u4ee5\u53ca\u5916\u90e8\u77e5\u8bc6\u5e93\u5728\u533b\u5b66\u77ed\u8bed\u751f\u6210\u4e2d\u7684\u5f71\u54cd\u3002|\n", "2407.16557": "|**2024-07-23**|**Patched RTC: evaluating LLMs for diverse software development tasks**|Asankhaya Sharma et.al.|[2407.16557](http://arxiv.org/abs/2407.16557)|**[link](https://github.com/codelion/optillm/blob/main/rto.py)**|\u8fd9\u7bc7\u8bba\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201c\u8865\u4e01\u5f80\u8fd4\u6b63\u786e\u6027\uff08Patched 
RTC\uff09\u201d\u7684\u65b0\u578b\u8bc4\u4f30\u65b9\u6cd5\uff0c\u5e94\u7528\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u591a\u79cd\u8f6f\u4ef6\u5f00\u53d1\u4efb\u52a1\u4e2d\u7684\u5e94\u7528\uff0c\u7279\u522b\u662f\u201c\u5916\u5faa\u73af\u201d\u6d3b\u52a8\uff0c\u5982\u9519\u8bef\u4fee\u590d\u3001\u4ee3\u7801\u5ba1\u67e5\u548c\u6587\u6863\u66f4\u65b0\u3002Patched RTC\u662f\u5bf9\u539f\u5f80\u8fd4\u6b63\u786e\u6027\u65b9\u6cd5\u7684\u6269\u5c55\uff0c\u9002\u7528\u4e8e\u4efb\u4f55LLM\u548c\u4e0b\u6e38\u4efb\u52a1\uff0c\u63d0\u4f9b\u4e86\u4e00\u4e2a\u81ea\u6211\u8bc4\u4f30\u6846\u67b6\uff0c\u65e0\u9700\u4eba\u5de5\u5e72\u9884\u5373\u53ef\u6d4b\u91cf\u6a21\u578b\u54cd\u5e94\u7684\u4e00\u81f4\u6027\u548c\u7a33\u5065\u6027\u3002\u7814\u7a76\u663e\u793a\u4e86Patched RTC\u5206\u6570\u4e0e\u7279\u5b9a\u4efb\u52a1\u51c6\u786e\u6027\u6307\u6807\u4e4b\u95f4\u7684\u76f8\u5173\u6027\uff0c\u5c06\u5176\u4f5c\u4e3a\u66ff\u4ee3LLM-as-Judge\u8303\u5f0f\u7684\u65b9\u6cd5\uff0c\u7528\u4e8e\u5f00\u653e\u57df\u4efb\u52a1\u8bc4\u4f30\u3002\u6211\u4eec\u901a\u8fc7\u4e00\u4e2a\u540d\u4e3apatchwork\u7684\u5f00\u6e90\u6846\u67b6\u5b9e\u73b0Patched RTC\uff0c\u5728\u5404\u79cd\u8865\u4e01\u6d41\u4e2d\u5b9e\u73b0\u4e86\u5bf9\u4e0d\u540c\u8f6f\u4ef6\u5f00\u53d1\u4efb\u52a1\u7684\u900f\u660e\u8bc4\u4f30\u3002 \u6bd4\u8f83GPT-3.5\u548cGPT-4\u6a21\u578b\u5728\u4e0d\u540c\u8f6f\u4ef6\u5f00\u53d1\u4efb\u52a1\u4e0a\u7684\u5b9e\u9a8c\u7ed3\u679c\u63ed\u793a\u4e86Patched RTC\u80fd\u591f\u6709\u6548\u5730\u533a\u5206\u6a21\u578b\u6027\u80fd\u548c\u4efb\u52a1\u96be\u5ea6\u3002\u8bba\u6587\u8fd8\u63a2\u8ba8\u4e86\u4e00\u81f4\u6027\u63d0\u793a\u5bf9\u63d0\u9ad8\u6a21\u578b\u51c6\u786e\u6027\u7684\u5f71\u54cd\uff0c\u8868\u660ePatched RTC\u53ef\u4ee5\u6307\u5bfc\u63d0\u793a\u4f18\u5316\u548c\u6a21\u578b\u9009\u62e9\uff0c\u4ee5\u9002\u5e94\u590d\u6742\u7684\u8f6f\u4ef6\u5f00\u53d1\u6d41\u7a0b\u3002|\n", "2407.16552": "|**2024-07-24**|**MicroEmo: Time-Sensitive Multimodal Emotion Recognition 
with Micro-Expression Dynamics in Video Dialogues**|Liyun Zhang et.al.|[2407.16552](http://arxiv.org/abs/2407.16552)|null|\u5728\u89c6\u89c9\u3001\u542c\u89c9\u548c\u8bed\u8a00\u7b49\u591a\u6a21\u6001\u7ebf\u7d22\u7684\u89c6\u9891\u4e2d\uff0c\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u5c55\u793a\u4e86\u5353\u8d8a\u7684\u591a\u6a21\u6001\u60c5\u7eea\u8bc6\u522b\u80fd\u529b\uff0c\u80fd\u591f\u7efc\u5408\u8fd9\u4e9b\u7ebf\u7d22\u6765\u8bc6\u522b\u4eba\u7c7b\u7684\u60c5\u7eea\u72b6\u6001\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u65b9\u6cd5\u5ffd\u89c6\u4e86\u6355\u6349\u9762\u90e8\u5fae\u8868\u60c5\u7684\u65f6\u95f4\u52a8\u6001\u5c40\u90e8\u7279\u5f81\u4ee5\u53ca\u89c6\u9891\u4e2d\u8bdd\u8bed\u610f\u8bc6\u7247\u6bb5\u7684\u4e0a\u4e0b\u6587\u4f9d\u8d56\u6027\uff0c\u4ece\u800c\u5728\u4e00\u5b9a\u7a0b\u5ea6\u4e0a\u9650\u5236\u4e86\u5b83\u4eec\u7684\u6709\u6548\u6027\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65f6\u95f4\u654f\u611f\u7684\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578bMicroEmo\uff0c\u65e8\u5728\u5c06\u6ce8\u610f\u529b\u96c6\u4e2d\u4e8e\u9762\u90e8\u5fae\u8868\u60c5\u7684\u65f6\u95f4\u52a8\u6001\u7ec6\u8282\u548c\u89c6\u9891\u4e2d\u7684\u8bdd\u8bed\u610f\u8bc6\u7247\u6bb5\u7684\u4e0a\u4e0b\u6587\u4f9d\u8d56\u6027\u3002 \u6211\u4eec\u7684\u6a21\u578b\u5305\u542b\u4e86\u4e24\u4e2a\u5173\u952e\u7684\u67b6\u6784\u8d21\u732e\uff1a 1. \u5168\u5c40-\u5c40\u90e8\u6ce8\u610f\u529b\u89c6\u89c9\u7f16\u7801\u5668\uff0c\u5b83\u7ed3\u5408\u4e86\u5168\u5c40\u5e27\u7ea7\u65f6\u95f4\u7ed1\u5b9a\u56fe\u50cf\u7279\u5f81\u4e0e\u9762\u90e8\u5fae\u8868\u60c5\u7684\u65f6\u95f4\u52a8\u6001\u5c40\u90e8\u7279\u5f81\uff0c\u5b9e\u73b0\u4e86\u5bf9\u6574\u4f53\u548c\u5c40\u90e8\u4fe1\u606f\u7684\u6709\u6548\u878d\u5408\uff1b 2. 
\u4e00\u4e2a\u8bdd\u8bed\u610f\u8bc6\u7684\u89c6\u9891Q-Former\uff0c\u5b83\u901a\u8fc7\u4e3a\u6bcf\u4e2a\u8bdd\u8bed\u6bb5\u843d\u548c\u6574\u4e2a\u89c6\u9891\u751f\u6210\u89c6\u89c9\u4ee4\u724c\u5e8f\u5217\u6765\u6355\u83b7\u591a\u5c42\u6b21\u548c\u4e0a\u4e0b\u6587\u4f9d\u8d56\u6027\uff0c\u7136\u540e\u5c06\u5b83\u4eec\u7ec4\u5408\u5728\u4e00\u8d77\uff0c\u4ee5\u6355\u6349\u591a\u5c3a\u5ea6\u7684\u4e0a\u4e0b\u6587\u4f9d\u8d56\u5173\u7cfb\u3002 \u521d\u6b65\u7684\u5b9a\u6027\u5b9e\u9a8c\u8868\u660e\uff0c\u5728\u4e00\u4e2a\u5229\u7528\u591a\u6a21\u6001\u548c\u591a\u65b9\u9762\u7ebf\u7d22\u4ee5\u5f00\u653e\u8bcd\u6c47\uff08OV\uff09\u65b9\u5f0f\u9884\u6d4b\u60c5\u7eea\u7684\u65b0\u89e3\u91ca\u6027\u591a\u6a21\u6001\u60c5\u7eea\u8bc6\u522b\uff08EMER\uff09\u4efb\u52a1\u4e2d\uff0cMicroEmo\u76f8\u8f83\u4e8e\u6700\u65b0\u7684\u65b9\u6cd5\u663e\u793a\u51fa\u4e86\u5176\u6709\u6548\u6027\u3002|\n", "2407.16521": "|**2024-07-23**|**AMONGAGENTS: Evaluating Large Language Models in the Interactive Text-Based Social Deduction Game**|Yizhou Chi et.al.|[2407.16521](http://arxiv.org/abs/2407.16521)|null|\u6218\u7565\u6027\u7684\u793e\u4ea4\u63a8\u65ad\u6e38\u620f\u662f\u8bc4\u4f30\u8bed\u8a00\u6a21\u578b\u7406\u89e3\u548c\u63a8\u7406\u80fd\u529b\u7684\u5b9d\u8d35\u5b9e\u9a8c\u5e73\u53f0\uff0c\u5bf9\u4e8e\u793e\u4f1a\u79d1\u5b66\u7814\u7a76\u3001\u4eba\u5de5\u667a\u80fd\u9886\u57df\u4ee5\u53ca\u7b56\u7565\u6027\u6e38\u620f\u90fd\u6709\u91cd\u8981\u4ef7\u503c\u3002\u672c\u6587\u96c6\u4e2d\u4e8e\u5728\u6a21\u62df\u73af\u5883\u4e2d\u6784\u5efa\u4eba\u7c7b\u884c\u4e3a\u7684\u4ee3\u7406\uff0c\u4f7f\u7528\u300aAmong Us\u300b\u4f5c\u4e3a\u7814\u7a76\u6a21\u62df\u4eba\u7c7b\u884c\u4e3a\u7684\u5de5\u5177\u3002\u901a\u8fc7\u521b\u5efa\u4e00\u4e2a\u57fa\u4e8e\u6587\u672c\u7684\u6e38\u620f\u73af\u5883\uff0c\u79f0\u4e3aAmongAgent\uff0c\u8be5\u73af\u5883\u590d\u5236\u4e86\u300aAmong 
Us\u300b\u7684\u6e38\u620f\u52a8\u6001\u3002\u73a9\u5bb6\u626e\u6f14\u592a\u7a7a\u8239\u4e0a\u7684\u8239\u5458\uff0c\u4efb\u52a1\u662f\u8bc6\u522b\u7834\u574f\u592a\u7a7a\u8239\u7684\u5192\u540d\u9876\u66ff\u8005\u5e76\u6d88\u9664\u8239\u5458\u3002\u5728\u8fd9\u4e2a\u73af\u5883\u4e2d\uff0c\u6a21\u62df\u8bed\u8a00\u4ee3\u7406\u7684\u884c\u4e3a\u88ab\u5206\u6790\u3002\u5b9e\u9a8c\u6d89\u53ca\u4e0d\u540c\u8239\u5458\u548c\u5192\u540d\u9876\u66ff\u8005\u4eba\u683c\u539f\u578b\u914d\u7f6e\u7684\u591a\u6837\u5316\u7684\u6e38\u620f\u5e8f\u5217\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u8868\u660e\uff0c\u6700\u5148\u8fdb\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u80fd\u591f\u6709\u6548\u5730\u638c\u63e1\u6e38\u620f\u89c4\u5219\uff0c\u5e76\u6839\u636e\u5f53\u524d\u4e0a\u4e0b\u6587\u505a\u51fa\u51b3\u7b56\u3002\u8fd9\u9879\u5de5\u4f5c\u65e8\u5728\u4fc3\u8fdb\u5bf9\u5728\u4fe1\u606f\u4e0d\u5b8c\u6574\u548c\u590d\u6742\u52a8\u4f5c\u7a7a\u95f4\u4e2d\u7684\u76ee\u6807\u5bfc\u5411\u6e38\u620f\u4e2d\u7684\u8bed\u8a00\u6a21\u578b\u6027\u80fd\u8fdb\u884c\u8fdb\u4e00\u6b65\u63a2\u7d22\uff0c\u8fd9\u4e9b\u8bbe\u7f6e\u63d0\u4f9b\u4e86\u8bc4\u4f30\u8bed\u8a00\u6a21\u578b\u5728\u793e\u4f1a\u9a71\u52a8\u573a\u666f\u4e2d\u8868\u73b0\u7684\u5b9d\u8d35\u673a\u4f1a\u3002|\n", "2407.17469": "|**2024-07-24**|**I Could've Asked That: Reformulating Unanswerable Questions**|Wenting Zhao 
et.al.|[2407.17469](http://arxiv.org/abs/2407.17469)|**[link](https://github.com/wenting-zhao/couldask)**|\u5728\u4ece\u4e0d\u719f\u6089\u6587\u6863\u4e2d\u83b7\u53d6\u4fe1\u606f\u65f6\uff0c\u7528\u6237\u7ecf\u5e38\u63d0\u51fa\u65e0\u6cd5\u7531\u6587\u6863\u56de\u7b54\u7684\u95ee\u9898\u3002\u73b0\u6709\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u80fd\u591f\u8bc6\u522b\u8fd9\u4e9b\u65e0\u6cd5\u56de\u7b54\u7684\u95ee\u9898\uff0c\u4f46\u5b83\u4eec\u5e76\u672a\u5e2e\u52a9\u7528\u6237\u91cd\u65b0\u6784\u5efa\u95ee\u9898\uff0c\u4ece\u800c\u964d\u4f4e\u4e86\u5b83\u4eec\u7684\u6574\u4f53\u5b9e\u7528\u6027\u3002\u6211\u4eec\u7cbe\u5fc3\u7f16\u6392\u4e86CouldAsk\uff0c\u4e00\u4e2a\u7528\u4e8e\u6587\u6863\u652f\u6301\u7684\u95ee\u7b54\u4efb\u52a1\u7684\u8bc4\u4f30\u57fa\u51c6\uff0c\u65e8\u5728\u7814\u7a76\u91cd\u65b0\u6784\u5efa\u65e0\u6cd5\u56de\u7b54\u95ee\u9898\u7684\u80fd\u529b\u3002\u8fd9\u4e2a\u57fa\u51c6\u5305\u62ec\u4e86\u73b0\u6709\u7684\u548c\u65b0\u7684\u6570\u636e\u96c6\u3002\u6211\u4eec\u5bf9\u6700\u5148\u8fdb\u7684\u5f00\u6e90\u548c\u4e13\u6709LLMs\u5728CouldAsk\u4e0a\u7684\u8868\u73b0\u8fdb\u884c\u4e86\u8bc4\u4f30\u3002\u7ed3\u679c\u8868\u660e\uff0c\u8fd9\u4e9b\u6a21\u578b\u5728\u91cd\u65b0\u6784\u5efa\u95ee\u9898\u65b9\u9762\u80fd\u529b\u6709\u9650\u3002\u5177\u4f53\u800c\u8a00\uff0cGPT-4\u548cLlama2-7B\u4ec5\u6210\u529f\u5730\u91cd\u65b0\u6784\u5efa\u4e86\u95ee\u9898\u768426%\u548c12%\u3002\u9519\u8bef\u5206\u6790\u663e\u793a\uff0c\u5931\u8d25\u7684\u91cd\u65b0\u6784\u5efa\u4e2d\u670962%\u7684\u539f\u56e0\u662f\u6a21\u578b\u53ea\u662f\u91cd\u8ff0\u4e86\u95ee\u9898\uff0c\u751a\u81f3\u751f\u6210\u4e86\u5b8c\u5168\u76f8\u540c\u7684\u95ee\u9898\u3002\u6211\u4eec\u516c\u5f00\u53d1\u5e03\u4e86\u8fd9\u4e2a\u57fa\u51c6\u4ee5\u53ca\u91cd\u73b0\u5b9e\u9a8c\u6240\u9700\u7684\u4ee3\u7801\u3002|\n", "2407.17468": "|**2024-07-24**|**WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries**|Wenting Zhao 
et.al.|[2407.17468](http://arxiv.org/abs/2407.17468)|null|\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u5e7b\u89c9\u95ee\u9898\u666e\u904d\u5b58\u5728\u7684\u60c5\u51b5\u4e0b\uff0c\u73b0\u6709\u7684\u4e8b\u5b9e\u6027\u8bc4\u4f30\u57fa\u51c6\u672a\u80fd\u8986\u76d6\u73b0\u5b9e\u4e16\u754c\u7528\u6237\u5bfb\u6c42\u4fe1\u606f\u7684\u591a\u6837\u5316\u77e5\u8bc6\u9886\u57df\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7f3a\u53e3\uff0c\u6211\u4eec\u5f15\u5165\u4e86WildHallucinations\u57fa\u51c6\uff0c\u65e8\u5728\u8bc4\u4f30\u4e8b\u5b9e\u6027\u3002\u8be5\u57fa\u51c6\u901a\u8fc7\u4fc3\u4f7fLLM\u751f\u6210\u6765\u81ea\u91ce\u5916\u7528\u6237-\u804a\u5929\u673a\u5668\u4eba\u5bf9\u8bdd\u4e2d\u7684\u5b9e\u4f53\u7684\u4fe1\u606f\u6765\u5b9e\u73b0\u8fd9\u4e00\u76ee\u6807\u3002\u8fd9\u4e9b\u751f\u6210\u5185\u5bb9\u968f\u540e\u81ea\u52a8\u4e0e\u4ece\u7f51\u7edc\u641c\u7d22\u7cfb\u7edf\u6536\u96c6\u7684\u6709\u7ec4\u7ec7\u7684\u77e5\u8bc6\u5e93\u8fdb\u884c\u4e8b\u5b9e\u68c0\u67e5\u3002\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u4e00\u534a\u4ee5\u4e0a\u7684\u5b9e\u9645\u4e16\u754c\u5b9e\u4f53\u5e76\u6ca1\u6709\u76f8\u5173\u7684\u7ef4\u57fa\u767e\u79d1\u9875\u9762\u3002\u6211\u4eec\u572815\u4e2aLLM\u4e0a\u5bf97919\u4e2a\u5b9e\u4f53\u8fdb\u884c\u4e86118785\u6b21\u751f\u6210\u7684\u8bc4\u4f30\u3002\u6211\u4eec\u53d1\u73b0\uff0cLLM\u5728\u6ca1\u6709\u7ef4\u57fa\u767e\u79d1\u9875\u9762\u7684\u5b9e\u4f53\u4e0a\u4ea7\u751f\u66f4\u591a\u7684\u5e7b\u89c9\uff0c\u5e76\u4e14\u4e0d\u540c\u9886\u57df\u7684\u5e7b\u89c9\u7387\u5b58\u5728\u5dee\u5f02\u3002\u6700\u540e\uff0c\u5728\u4f7f\u7528\u76f8\u540c\u7684\u5e95\u5c42\u6a21\u578b\u65f6\uff0c\u4ec5\u589e\u52a0\u68c0\u7d22\u7ec4\u4ef6\u53ef\u4ee5\u7565\u5fae\u51cf\u5c11\u5e7b\u89c9\uff0c\u4f46\u65e0\u6cd5\u5b8c\u5168\u6d88\u9664\u5e7b\u89c9\u3002|\n", "2407.17467": "|**2024-07-24**|**CMR Scaling Law: Predicting Critical Mixture Ratios for Continual Pre-training of Language Models**|Jiawei Gu 
et.al.|[2407.17467](http://arxiv.org/abs/2407.17467)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u5404\u79cd\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5f80\u5f80\u5728\u7279\u5b9a\u9886\u57df\u5185\u8868\u73b0\u4e0d\u4f73\uff0c\u56e0\u4e3a\u7f3a\u4e4f\u7279\u5b9a\u9886\u57df\u7684\u6216\u4e13\u6709\u8bed\u6599\u5e93\u3002\u8fde\u7eed\u9884\u8bad\u7ec3\uff08CPT\uff09\u901a\u8fc7\u56de\u653e\u901a\u7528\u8bed\u6599\u5e76\u6ce8\u5165\u65b0\u9886\u57df\u7684\u7279\u5b9a\u77e5\u8bc6\u6765\u589e\u5f3aLLM\u7684\u80fd\u529b\uff0c\u4ee5\u6b64\u9632\u6b62\u707e\u96be\u6027\u9057\u5fd8\u3002\u7136\u800c\uff0c\u5728\u901a\u7528\u8bed\u6599\u548c\u9886\u57df\u7279\u5b9a\u8bed\u6599\u7684\u6df7\u5408\u6bd4\u4f8b\u4e0a\uff0c\u4eba\u4eec\u901a\u5e38\u91c7\u53d6\u7684\u662f\u542f\u53d1\u5f0f\u65b9\u6cd5\uff0c\u8fd9\u5bfc\u81f4\u4e86\u5b9e\u9645\u8bad\u7ec3\u6548\u7387\u7684\u4f4e\u4e0b\u3002\u5728\u6b64\u80cc\u666f\u4e0b\uff0c\u6211\u4eec\u5c1d\u8bd5\u4eceCPT\u7684\u6838\u5fc3\u51fa\u53d1\u91cd\u65b0\u5ba1\u89c6LLM\u7684\u7f29\u653e\u884c\u4e3a\uff0c\u5e76\u53d1\u73b0\u635f\u5931\u3001\u6df7\u5408\u6bd4\u7387\u4e0e\u8bad\u7ec3\u4ee4\u724c\u89c4\u6a21\u4e4b\u95f4\u7684\u5e42\u5f8b\u5173\u7cfb\u3002\u6211\u4eec\u6b63\u5f0f\u5b9a\u4e49\u4e86\u901a\u7528\u80fd\u529b\u548c\u9886\u57df\u7279\u5b9a\u80fd\u529b\u4e4b\u95f4\u7684\u6743\u8861\uff0c\u4ece\u800c\u786e\u5b9a\u4e86\u901a\u7528\u6570\u636e\u548c\u9886\u57df\u6570\u636e\u7684\u4e34\u754c\u6df7\u5408\u6bd4\u7387\uff08CMR\uff09\u3002\u901a\u8fc7\u627e\u5230\u5e73\u8861\u70b9\uff0cCMR\u4fdd\u6301\u4e86\u6a21\u578b\u7684\u901a\u7528\u80fd\u529b\uff0c\u5e76\u5b9e\u73b0\u4e86\u671f\u671b\u7684\u9886\u57df\u8fc1\u79fb\uff0c\u786e\u4fdd\u4e86\u53ef\u7528\u8d44\u6e90\u7684\u6700\u5927\u5316\u5229\u7528\u3002\u56e0\u6b64\uff0c\u5982\u679c\u91cd\u89c6\u6548\u7387\u4e0e\u6548\u679c\u4e4b\u95f4\u7684\u5e73\u8861\uff0cCMR\u53ef\u4ee5\u88ab\u8ba4\u4e3a\u662f\u6700\u4f73\u6df7\u5408\u6bd4\u7387\u3002 
\u901a\u8fc7\u5927\u91cf\u5b9e\u9a8c\uff0c\u6211\u4eec\u8bc1\u5b9e\u4e86CMR\u7684\u53ef\u9884\u6d4b\u6027\uff0c\u5e76\u63d0\u51fa\u4e86CMR\u7f29\u653e\u5b9a\u5f8b\uff0c\u5e76\u5bf9\u5176\u4e00\u822c\u6027\u8fdb\u884c\u4e86\u9a8c\u8bc1\u3002\u8fd9\u4e9b\u53d1\u73b0\u63d0\u4f9b\u4e86\u4f18\u5316LLM\u5728\u7279\u5b9a\u9886\u57df\u5185\u7684\u8bad\u7ec3\u7684\u5b9e\u7528\u6307\u5357\uff0c\u786e\u4fdd\u5728\u6709\u6548\u7ba1\u7406\u8bad\u7ec3\u8d44\u6e90\u7684\u540c\u65f6\uff0c\u65e2\u4fdd\u6301\u901a\u7528\u6027\u80fd\u53c8\u5b9e\u73b0\u9886\u57df\u7279\u5b9a\u6027\u80fd\u3002|\n", "2407.17453": "|**2024-07-24**|**$VILA^2$: VILA Augmented VILA**|Yunhao Fang et.al.|[2407.17453](http://arxiv.org/abs/2407.17453)|null|\u89c6\u89c9\u8bed\u8a00\u6a21\u578b(VLMs)\u7684\u53d1\u5c55\u8fc5\u901f\uff0c\u5f97\u76ca\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b(LLMs)\u7684\u6210\u529f\u3002\u5c3d\u7ba1\u6a21\u578b\u67b6\u6784\u548c\u8bad\u7ec3\u57fa\u7840\u8bbe\u65bd\u5728\u5feb\u901f\u8fdb\u6b65\uff0c\u4f46\u6570\u636e\u6536\u96c6\u4e0e\u6574\u7406\u7684\u5de5\u4f5c\u4ecd\u88ab\u5ffd\u89c6\u3002\u5f53\u6570\u636e\u7684\u6570\u91cf\u4e0e\u8d28\u91cf\u6210\u4e3a\u74f6\u9888\u65f6\uff0c\u73b0\u6709\u65b9\u6cd5\u8981\u4e48\u76f4\u63a5\u4ece\u4e92\u8054\u7f51\u4e0a\u722c\u53d6\u66f4\u591a\u539f\u59cb\u6570\u636e\uff0c\u8fd9\u4e9b\u6570\u636e\u7684\u8d28\u91cf\u65e0\u6cd5\u4fdd\u8bc1\uff0c\u8981\u4e48\u4ece\u9ed1\u76d2\u5546\u4e1a\u6a21\u578b\uff08\u4f8b\u5982GPT-4V/Gemini\uff09\u4e2d\u63d0\u53d6\u6570\u636e\uff0c\u5bfc\u81f4\u6027\u80fd\u53d7\u5230\u8be5\u6a21\u578b\u7684\u9650\u5236\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u5305\u62ec\u81ea\u6211\u589e\u5f3a\u6b65\u9aa4\u548c\u4e13\u5bb6\u589e\u5f3a\u6b65\u9aa4\uff0c\u4ee5\u8fed\u4ee3\u5730\u63d0\u9ad8\u6570\u636e\u8d28\u91cf\u548c\u6a21\u578b\u6027\u80fd\u3002 
\u5728\u81ea\u6211\u589e\u5f3a\u6b65\u9aa4\u4e2d\uff0cVLM\u91cd\u65b0\u751f\u6210\u5176\u81ea\u8eab\u7684\u9884\u8bad\u7ec3\u6570\u636e\uff0c\u4ee5\u63d0\u5347\u6570\u636e\u8d28\u91cf\uff0c\u5e76\u4ece\u8fd9\u4e2a\u7cbe\u70bc\u7684\u6570\u636e\u96c6\u91cd\u65b0\u8bad\u7ec3\uff0c\u4ee5\u6539\u5584\u6a21\u578b\u6027\u80fd\u3002\u8fd9\u4e00\u8fc7\u7a0b\u53ef\u4ee5\u91cd\u590d\u8fdb\u884c\u591a\u6b21\u3002\u4e00\u65e6\u81ea\u6211\u589e\u5f3a\u8fbe\u5230\u9971\u548c\uff0c\u6211\u4eec\u5c06\u91c7\u7528\u51e0\u4e2a\u4e13\u95e8\u9886\u57dfVLM\uff0c\u8fd9\u4e9bVLM\u662f\u4ece\u81ea\u6211\u589e\u5f3a\u7684VLM\u4e2d\u5fae\u8c03\u800c\u6765\u7684\uff0c\u5177\u6709\u7279\u5b9a\u9886\u57df\u7684\u4e13\u4e1a\u77e5\u8bc6\u3002\u901a\u8fc7\u4efb\u52a1\u5bfc\u5411\u7684\u91cd\u65b0\u751f\u6210\u548c\u91cd\u65b0\u8bad\u7ec3\uff0c\u8fdb\u4e00\u6b65\u5c06\u4e13\u5bb6\u77e5\u8bc6\u6ce8\u5165\u901a\u7528\u6a21\u578b\u4e2d\u3002 \u901a\u8fc7\u7ed3\u5408\u81ea\u6211\u589e\u5f3a\u548c\u4e13\u5bb6\u589e\u5f3a\u7684\u8bad\u7ec3\uff0c\u6211\u4eec\u5f15\u5165\u4e86VILA\u00b2\uff08VILA\u589e\u5f3a-VILA\uff09\u6a21\u578b\u5bb6\u65cf\uff0c\u8be5\u5bb6\u65cf\u5728\u5e7f\u6cdb\u7684\u4efb\u52a1\u4e0a\u6301\u7eed\u63d0\u9ad8\u4e86\u51c6\u786e\u6027\uff0c\u8d85\u8d8a\u4e86\u4ee5\u5f80\u7684\u6210\u679c\uff0c\u5e76\u5728\u5f00\u653e\u6e90\u4ee3\u7801\u6a21\u578b\u4e2dMMMU\u6392\u884c\u699c\u4e0a\u8fbe\u5230\u4e86\u65b0\u7684\u6700\u5148\u8fdb\u7ed3\u679c\u3002|\n", "2407.17417": "|**2024-07-24**|**Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data?**|Michael-Andrei Panaitescu-Liess 
et.al.|[2407.17417](http://arxiv.org/abs/2407.17417)|null|\u672c\u6587\u9996\u5148\u63a2\u8ba8\u4e86\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4e2d\u5d4c\u5165\u6c34\u5370\u4f5c\u4e3a\u9632\u6b62\u751f\u6210\u7248\u6743\u4fb5\u6743\u6587\u672c\u7684\u6709\u6548\u624b\u6bb5\u3002\u901a\u8fc7\u7406\u8bba\u5206\u6790\u548c\u5b9e\u8bc1\u8bc4\u4f30\uff0c\u6211\u4eec\u8bc1\u660e\u4e86\u5728LLM\u4e2d\u878d\u5165\u6c34\u5370\u80fd\u591f\u663e\u8457\u964d\u4f4e\u751f\u6210\u7248\u6743\u5185\u5bb9\u7684\u53ef\u80fd\u6027\uff0c\u4ece\u800c\u89e3\u51b3LLM\u90e8\u7f72\u8fc7\u7a0b\u4e2d\u7684\u4e00\u9879\u5173\u952e\u95ee\u9898\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u7814\u7a76\u4e86\u6c34\u5370\u5bf9\u6210\u5458\u5f52\u5c5e\u63a8\u65ad\u653b\u51fb\uff08Membership Inference Attacks\uff0cMIAs\uff09\u7684\u5f71\u54cd\uff0cMIAs\u65e8\u5728\u8bc6\u522b\u6837\u672c\u662f\u5426\u5c5e\u4e8e\u9884\u8bad\u7ec3\u6570\u636e\u96c6\uff0c\u8fd9\u53ef\u80fd\u7528\u4e8e\u68c0\u6d4b\u7248\u6743\u8fdd\u89c4\u884c\u4e3a\u3002\u4ee4\u4eba\u60ca\u8bb6\u7684\u662f\uff0c\u6211\u4eec\u53d1\u73b0\u6c34\u5370\u964d\u4f4e\u4e86MIAs\u7684\u6210\u529f\u7387\uff0c\u4f7f\u68c0\u6d4b\u9884\u8bad\u7ec3\u6570\u636e\u96c6\u4e2d\u7248\u6743\u6587\u672c\u53d8\u5f97\u590d\u6742\u3002\u6700\u540e\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u9002\u5e94\u6027\u6280\u672f\u6765\u63d0\u9ad8\u5728\u6c34\u5370\u73af\u5883\u4e0b\u6700\u8fd1MIAs\u7684\u6210\u529f\u7387\u3002\u6211\u4eec\u7684\u53d1\u73b0\u5f3a\u8c03\u4e86\u5f00\u53d1\u9002\u5e94\u6027\u65b9\u6cd5\u4ee5\u7814\u7a76\u5177\u6709\u6f5c\u5728\u6cd5\u5f8b\u5f71\u54cd\u7684LLM\u5173\u952e\u95ee\u9898\u7684\u91cd\u8981\u6027\u3002|\n", "2407.17412": "|**2024-07-24**|**(PASS) Visual Prompt Locates Good Structure Sparsity through a Recurrent HyperNetwork**|Tianjin Huang 
et.al.|[2407.17412](http://arxiv.org/abs/2407.17412)|null|\u5927\u578b\u795e\u7ecf\u7f51\u7edc\u5728\u4e0d\u540c\u9886\u57df\u5982\u89c6\u89c9\u548c\u8bed\u8a00\u5904\u7406\u65b9\u9762\u5c55\u73b0\u4e86\u5353\u8d8a\u7684\u6027\u80fd\uff0c\u5c3d\u7ba1\u8fd9\u4f34\u968f\u7740\u5de8\u5927\u7684\u8ba1\u7b97\u8d44\u6e90\u6210\u672c\u3002\u538b\u7f29\u6587\u732e\u4e2d\u63d0\u51fa\u7684\u7ed3\u6784\u6a21\u578b\u526a\u679d\u7b97\u6cd5\u662f\u4fc3\u8fdb\u6a21\u578b\u6548\u7387\u7684\u5173\u952e\u65b9\u6cd5\uff0c\u5f97\u76ca\u4e8e\u5176\u52a0\u901f\u53cb\u597d\u7684\u7a00\u758f\u6027\u6a21\u5f0f\u3002\u7ed3\u6784\u526a\u679d\u7684\u6838\u5fc3\u95ee\u9898\u662f\u5982\u4f55\u4f30\u8ba1\u901a\u9053\u7684\u91cd\u8981\u6027\u3002\u4e0e\u6b64\u5e76\u884c\uff0c\u6570\u636e\u4e3a\u4e2d\u5fc3\u7684\u4eba\u5de5\u667a\u80fd\u5de5\u4f5c\u8868\u660e\uff0c\u57fa\u4e8e\u63d0\u793a\u7684\u6280\u672f\u80fd\u591f\u4f7f\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u5404\u79cd\u4e0b\u6e38\u4efb\u52a1\u4e2d\u8868\u73b0\u51fa\u60ca\u4eba\u7684\u6cdb\u5316\u80fd\u529b\u3002\u672c\u6587\u63a2\u8ba8\u4e86\u4e00\u4e2a\u8ff7\u4eba\u7684\u53ef\u80fd\u6027\u2014\u2014\u5229\u7528\u89c6\u89c9\u63d0\u793a\u6765\u6355\u6349\u901a\u9053\u91cd\u8981\u6027\uff0c\u5e76\u63a8\u5bfc\u51fa\u9ad8\u8d28\u91cf\u7684\u7ed3\u6784\u7a00\u758f\u6027\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u7b97\u6cd5\u6846\u67b6\uff0c\u5373PASS\u3002\u5b83\u662f\u4e00\u79cd\u5b9a\u5236\u7684\u8d85\u7f51\u7edc\uff0c\u63a5\u53d7\u89c6\u89c9\u63d0\u793a\u548c\u7f51\u7edc\u6743\u91cd\u7edf\u8ba1\u4f5c\u4e3a\u8f93\u5165\uff0c\u5e76\u4ee5\u9012\u5f52\u65b9\u5f0f\u8f93\u51fa\u9010\u5c42\u901a\u9053\u7a00\u758f\u6027\u3002\u8fd9\u79cd\u8bbe\u8ba1\u8003\u8651\u4e86\u5c42\u4e4b\u95f4\u901a\u9053\u7684\u5185\u5728\u4f9d\u8d56\u6027\u3002\u8de8\u591a\u4e2a\u7f51\u7edc\u67b6\u6784\u548c\u516d\u4e2a\u6570\u636e\u96c6\u7684\u5168\u9762\u5b9e\u9a8c\u663e\u793a\u4e86PASS\u5728\u5b9a\u4f4d\u826f
\u597d\u7ed3\u6784\u7a00\u758f\u6027\u7684\u4f18\u52bf\u3002\u4f8b\u5982\uff0c\u5728\u76f8\u540c\u7684FLOPs\u6c34\u5e73\u4e0b\uff0cPASS\u5b50\u7f51\u7edc\u5728Food101\u6570\u636e\u96c6\u4e0a\u5b9e\u73b0\u4e861%-3%\u66f4\u9ad8\u7684\u51c6\u786e\u6027\uff1b\u6216\u8005\u5728\u83b7\u5f97\u4e0e\u57fa\u7ebf\u76f8\u540c\u768480%\u51c6\u786e\u5ea6\u65f6\uff0cPASS\u5b50\u7f51\u7edc\u80fd\u591f\u5b9e\u73b00.35\u500d\u66f4\u591a\u7684\u901f\u5ea6\u63d0\u5347\u3002|\n", "2407.17404": "|**2024-07-24**|**Grammar-based Game Description Generation using Large Language Models**|Tsunehiko Tanaka et.al.|[2407.17404](http://arxiv.org/abs/2407.17404)|null|\u4e3a\u4e86\u964d\u4f4e\u6e38\u620f\u8bbe\u8ba1\u5f00\u53d1\u7684\u95e8\u69db\uff0c\u81ea\u52a8\u5316\u6e38\u620f\u8bbe\u8ba1\u9886\u57df\u901a\u8fc7\u8ba1\u7b97\u8fc7\u7a0b\u751f\u6210\u6e38\u620f\u8bbe\u8ba1\uff0c\u5df2\u7ecf\u8fdb\u884c\u4e86\u63a2\u7d22\u3002\u5728\u81ea\u52a8\u5316\u6e38\u620f\u8bbe\u8ba1\u4e2d\uff0c\u57fa\u4e8e\u673a\u5668\u5b66\u4e60\u7684\u6280\u672f\uff0c\u5982\u8fdb\u5316\u7b97\u6cd5\u5df2\u53d6\u5f97\u6210\u529f\u3002\u5f97\u76ca\u4e8e\u6df1\u5ea6\u5b66\u4e60\u9886\u57df\u5728\u8ba1\u7b97\u673a\u89c6\u89c9\u548c\u81ea\u7136\u8bed\u8a00\u5904\u7406\u5e94\u7528\u65b9\u9762\u7684\u663e\u8457\u8fdb\u5c55\uff0c\u6e38\u620f\u751f\u6210\u65b9\u9762\u4e5f\u6709\u4e86\u8fdb\u6b65\u3002\u7136\u800c\uff0c\u7531\u4e8e\u6e38\u620f\u8bbe\u8ba1\u9886\u57df\u7684\u6570\u636e\u91cf\u6709\u9650\uff0c\u6df1\u5ea6\u5b66\u4e60\u5728\u4efb\u52a1\u5982\u6e38\u620f\u63cf\u8ff0\u751f\u6210\u4e0a\u5e94\u7528\u4e0d\u8db3\u3002\u4e3a\u4e86\u5f00\u62d3\u5904\u7406\u6709\u9650\u6570\u636e\u5728\u81ea\u52a8\u5316\u6e38\u620f\u8bbe\u8ba1\u4e2d\u7684\u65b0\u9014\u5f84\uff0c\u6211\u4eec\u805a\u7126\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u4e0a\u4e0b\u6587\u5185\u5b66\u4e60\u3002LLMs\u53ef\u4ee5\u4ece\u5c11\u91cf\u793a\u8303\u793a\u4f8b\u4e2d\u6355\u83b7\u4efb\u52a1\u7279\u5f81\uff0c\u5e76\u5229\u
7528\u9884\u8bad\u7ec3\u671f\u95f4\u83b7\u5f97\u7684\u80fd\u529b\u8fdb\u884c\u5e94\u7528\u3002\u6211\u4eec\u5f15\u5165\u4e86\u6e38\u620f\u63cf\u8ff0\u7684\u8bed\u6cd5\uff0c\u6709\u6548\u5730\u5bf9\u6e38\u620f\u8bbe\u8ba1\u7a7a\u95f4\u8fdb\u884c\u4e86\u7ed3\u6784\u5316\uff0c\u4f7fLLMs\u80fd\u591f\u6355\u6349\u6e38\u620f\u63cf\u8ff0\u751f\u6210\u8fd9\u4e00\u590d\u6742\u4efb\u52a1\u7684\u7279\u6027\u3002\u6b64\u5916\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u89e3\u7801\u65b9\u6cd5\uff0c\u901a\u8fc7\u5229\u7528\u8bed\u6cd5\u8fed\u4ee3\u6539\u8fdb\u751f\u6210\u7684\u8f93\u51fa\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u8fd9\u79cd\u65b9\u6cd5\u5728\u751f\u6210\u6e38\u620f\u63cf\u8ff0\u65b9\u9762\u8868\u73b0\u826f\u597d\u3002|\n", "2407.17398": "|**2024-07-24**|**3D Question Answering for City Scene Understanding**|Penglei Sun et.al.|[2407.17398](http://arxiv.org/abs/2407.17398)|null|\u5728\u4e09\u7ef4\u591a\u6a21\u6001\u95ee\u7b54\uff08MQA\uff09\u9886\u57df\uff0c\u901a\u8fc7\u4f7f\u667a\u80fd\u4f53\u7406\u89e3\u5176\u6240\u5728\u73af\u5883\u4e2d\u7684\u4e09\u7ef4\u7a7a\u95f4\uff0c\u5bf9\u4e8e\u573a\u666f\u7406\u89e3\u5177\u6709\u81f3\u5173\u91cd\u8981\u7684\u4f5c\u7528\u3002\u5f53\u524d\u7684\u7814\u7a76\u4e3b\u8981\u96c6\u4e2d\u5728\u5ba4\u5185\u5bb6\u5ead\u4efb\u52a1\u548c\u5ba4\u5916\u9053\u8def\u81ea\u52a8\u9a7e\u9a76\u4efb\u52a1\u4e0a\uff0c\u800c\u5bf9\u4e8e\u57ce\u5e02\u7ea7\u522b\u7684\u573a\u666f\u7406\u89e3\u4efb\u52a1\u63a2\u7d22\u6709\u9650\u3002\u73b0\u6709\u7814\u7a76\u5728\u7406\u89e3\u57ce\u5e02\u573a\u666f\u65f6\u9762\u4e34\u6311\u6218\uff0c\u4e3b\u8981\u662f\u7531\u4e8e\u7f3a\u4e4f\u57ce\u5e02\u5c42\u9762\u7684\u7a7a\u95f4\u8bed\u4e49\u4fe1\u606f\u4ee5\u53ca\u4eba\u7c7b\u4e0e\u73af\u5883\u7684\u4e92\u52a8\u4fe1\u606f\u3002 
\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u6311\u6218\uff0c\u6211\u4eec\u4ece\u6570\u636e\u96c6\u548c\u65b9\u6cd5\u4e24\u4e2a\u89d2\u5ea6\u5bf9\u4e09\u7ef4MQA\u8fdb\u884c\u4e86\u6df1\u5165\u7814\u7a76\u3002\u4ece\u6570\u636e\u96c6\u89d2\u5ea6\u6765\u770b\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u4e2a\u540d\u4e3aCity-3DQA\u7684\u65b0\u9896\u4e09\u7ef4MQA\u6570\u636e\u96c6\uff0c\u5b83\u662f\u9996\u4e2a\u878d\u5408\u57ce\u5e02\u573a\u666f\u8bed\u4e49\u548c\u4eba\u4e0e\u73af\u5883\u4ea4\u4e92\u4efb\u52a1\u7684\u6570\u636e\u96c6\u3002\u4ece\u65b9\u6cd5\u89d2\u5ea6\u6765\u770b\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u4e2a\u57fa\u4e8e\u573a\u666f\u56fe\u7684\u57ce\u5e02\u7ea7\u522b\u7406\u89e3\u65b9\u6cd5\uff08Sg-CityU\uff09\uff0c\u5229\u7528\u573a\u666f\u56fe\u5f15\u5165\u7a7a\u95f4\u8bed\u4e49\u4fe1\u606f\u3002\u5728City-3DQA\u7684\u4e0d\u540c\u8bbe\u7f6e\u4e0b\uff0c\u6211\u4eec\u7684Sg-CityU\u65b9\u6cd5\u53d6\u5f97\u4e8663.94%\u548c63.76%\u7684\u51c6\u786e\u7387\uff0c\u76f8\u6bd4\u5ba4\u5185\u4e09\u7ef4MQA\u65b9\u6cd5\u548c\u4f7f\u7528\u5148\u8fdb\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u96f6\u6837\u672c\u65b9\u6cd5\uff0c\u5728\u9c81\u68d2\u6027\u548c\u6cdb\u5316\u80fd\u529b\u65b9\u9762\u5747\u8fbe\u5230\u4e86\u6700\u5148\u8fdb\u7684\u6027\u80fd\u6c34\u5e73\u3002|\n", "2407.17365": "|**2024-07-24**|**ViPer: Visual Personalization of Generative Models via Individual Preference Learning**|Sogand Salehi 
et.al.|[2407.17365](http://arxiv.org/abs/2407.17365)|null|\u4e0d\u540c\u7684\u7528\u6237\u5bf9\u4e8e\u540c\u4e00\u63d0\u793a\u751f\u6210\u7684\u4e0d\u540c\u56fe\u50cf\u6709\u4e0d\u540c\u7684\u504f\u597d\u3002\u8fd9\u50ac\u751f\u4e86\u4e2a\u6027\u5316\u56fe\u50cf\u751f\u6210\u7684\u6982\u5ff5\uff0c\u5373\u521b\u5efa\u4e0e\u4e2a\u4eba\u89c6\u89c9\u504f\u597d\u76f8\u5339\u914d\u7684\u56fe\u50cf\u3002\u76ee\u524d\u7684\u751f\u6210\u6a21\u578b\u662f\u65e0\u4e2a\u6027\u5316\u7684\uff0c\u5b83\u4eec\u88ab\u8c03\u6574\u4e3a\u5438\u5f15\u5e7f\u6cdb\u53d7\u4f17\u3002\u7528\u6237\u4f7f\u7528\u8fd9\u4e9b\u6a21\u578b\u751f\u6210\u7b26\u5408\u4e2a\u4eba\u504f\u597d\u7684\u56fe\u50cf\u4f9d\u8d56\u4e8e\u901a\u8fc7\u591a\u6b21\u8fed\u4ee3\u624b\u52a8\u8c03\u6574\u63d0\u793a\u7684\u8fc7\u7a0b\uff0c\u8fd9\u4e00\u8fc7\u7a0b\u65e2\u4f4e\u6548\u53c8\u4e0d\u7406\u60f3\u3002 \u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b9\u6cd5\u6765\u4e2a\u6027\u5316\u56fe\u50cf\u751f\u6210\u8fc7\u7a0b\uff1a\u9996\u5148\u901a\u8fc7\u9080\u8bf7\u7528\u6237\u5bf9\u4e00\u5c0f\u90e8\u5206\u56fe\u50cf\u8fdb\u884c\u8bc4\u8bba\uff0c\u89e3\u91ca\u4ed6\u4eec\u559c\u6b22\u6216\u4e0d\u559c\u6b22\u7684\u539f\u56e0\uff0c\u4ece\u800c\u6355\u6349\u7528\u6237\u7684\u901a\u7528\u504f\u597d\u3002\u57fa\u4e8e\u8fd9\u4e9b\u8bc4\u8bba\uff0c\u6211\u4eec\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u63a8\u65ad\u51fa\u7528\u6237\u7684\u7ed3\u6784\u5316\u559c\u597d\u7684\u548c\u4e0d\u559c\u597d\u7684\u89c6\u89c9\u5c5e\u6027\uff0c\u5373\u4ed6\u4eec\u7684\u89c6\u89c9\u504f\u597d\u3002\u8fd9\u4e9b\u5c5e\u6027\u7528\u4e8e\u6307\u5bfc\u6587\u672c\u5230\u56fe\u50cf\u6a21\u578b\u751f\u6210\u66f4\u8d34\u8fd1\u4e2a\u4eba\u7528\u6237\u89c6\u89c9\u504f\u597d\u7684\u56fe\u50cf\u3002 
\u901a\u8fc7\u4e00\u7cfb\u5217\u7528\u6237\u7814\u7a76\u548c\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5f15\u5bfc\u7684\u8bc4\u4f30\uff0c\u6211\u4eec\u8bc1\u660e\u4e86\u6240\u63d0\u51fa\u7684\u65b9\u6cd5\u80fd\u591f\u4ea7\u751f\u4e0e\u4e2a\u4eba\u7528\u6237\u89c6\u89c9\u504f\u597d\u9ad8\u5ea6\u4e00\u81f4\u7684\u751f\u6210\u7ed3\u679c\u3002|\n", "2407.17353": "|**2024-07-24**|**Scalify: scale propagation for efficient low-precision LLM training**|Paul Balan\u00e7a et.al.|[2407.17353](http://arxiv.org/abs/2407.17353)|**[link](https://github.com/graphcore-research/jax-scalify)**|**\u4f4e\u7cbe\u5ea6\u683c\u5f0f\uff0c\u5982float8\uff0c\u5df2\u88ab\u5f15\u5165\u673a\u5668\u5b66\u4e60\u52a0\u901f\u786c\u4ef6\u4e2d\uff0c\u4ee5\u63d0\u9ad8\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8bad\u7ec3\u548c\u63a8\u7406\u7684\u8ba1\u7b97\u6548\u7387\u3002\u7136\u800c\uff0c\u7531\u4e8e\u9700\u8981\u590d\u6742\u7684\u3001\u6709\u65f6\u662f\u8106\u5f31\u7684\u6280\u672f\u6765\u5339\u914d\u66f4\u9ad8\u7cbe\u5ea6\u7684\u8bad\u7ec3\u51c6\u786e\u5ea6\uff0cML\u793e\u533a\u5bf9\u4f4e\u7cbe\u5ea6\u683c\u5f0f\u7684\u91c7\u7eb3\u901f\u5ea6\u8f83\u6162\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aScalify\u7684\u7aef\u5230\u7aef\u7684\u7f29\u653e\u4f20\u64ad\u8303\u5f0f\uff0c\u7528\u4e8e\u8ba1\u7b97\u56fe\uff0c\u5b83\u6cdb\u5316\u5e76\u5f62\u5f0f\u5316\u4e86\u73b0\u6709\u7684\u5f20\u91cf\u7f29\u653e\u65b9\u6cd5\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cScalify\u652f\u6301\u76f4\u63a5\u4f7f\u7528float8\u8fdb\u884c\u77e9\u9635\u4e58\u6cd5\u548c\u68af\u5ea6\u8868\u793a\uff0c\u4ee5\u53cafloat16\u4f18\u5316\u5668\u72b6\u6001\u5b58\u50a8\u3002\u6211\u4eec\u5bf9Scalify\u7684JAX\u5b9e\u73b0\u5df2\u7ecf\u5f00\u6e90\u5728https://github.com/graphcore-research/jax-scalify\u3002**|\n", "2407.18219": "|**2024-07-26**|**Recursive Introspection: Teaching Language Model Agents How to Self-Improve**|Yuxiao Qu 
et.al.|[2407.18219](http://arxiv.org/abs/2407.18219)|null|\u4f7f\u57fa\u7840\u6a21\u578b\u5177\u5907\u81ea\u6211\u53cd\u7701\u80fd\u529b\u4ee5\u4fc3\u8fdb\u667a\u80fd\u4ee3\u7406\u884c\u4e3a\u7684\u5173\u952e\u65b9\u9762\u5728\u4e8e\u4f7f\u5176\u80fd\u591f\u5bf9\u5176\u884c\u4e3a\u3001\u63a8\u7406\u4ee5\u53ca\u5728\u53ef\u7528\u8ba1\u7b97\u6216\u4ea4\u4e92\u589e\u52a0\u65f6\u7ea0\u6b63\u9519\u8bef\u7684\u80fd\u529b\u8fdb\u884c\u81ea\u6211\u53cd\u601d\u3002\u5373\u4f7f\u662f\u6700\u5f3a\u7684\u4e13\u6709\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e5f\u672a\u80fd\u5c55\u73b0\u51fa\u5728\u660e\u786e\u544a\u77e5\u5176\u72af\u9519\u7684\u60c5\u51b5\u4e0b\uff0c\u80fd\u591f\u8fde\u7eed\u6539\u8fdb\u5176\u54cd\u5e94\u5e8f\u5217\u7684\u80fd\u529b\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aRISE\uff08\u9012\u5f52\u5185\u7701\uff09\u7684\u65b9\u6cd5\uff0c\u7528\u4e8e\u5fae\u8c03LLMs\u4ee5\u5f15\u5165\u8fd9\u4e00\u80fd\u529b\uff0c\u5c3d\u7ba1\u4e4b\u524d\u7684\u7814\u7a76\u66fe\u5047\u8bbe\u8fd9\u79cd\u80fd\u529b\u53ef\u80fd\u65e0\u6cd5\u5b9e\u73b0\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u89c4\u5b9a\u4e86\u4e00\u4e2a\u8fed\u4ee3\u5fae\u8c03\u8fc7\u7a0b\uff0c\u8be5\u8fc7\u7a0b\u5c1d\u8bd5\u6559\u6388\u6a21\u578b\u5982\u4f55\u5728\u5176\u89e3\u51b3\u56f0\u96be\u6d4b\u8bd5\u65f6\u95ee\u9898\u7684\u4e0d\u6210\u529f\u5c1d\u8bd5\u540e\u4fee\u6539\u5176\u54cd\u5e94\uff0c\u5e76\u53ef\u9009\u5730\u83b7\u5f97\u989d\u5916\u7684\u73af\u5883\u53cd\u9988\u3002RISE\u5c06\u5355\u8f6e\u63d0\u793a\u7684\u5fae\u8c03\u89c6\u4e3a\u89e3\u51b3\u591a\u8f6e\u9a6c\u5c14\u79d1\u592b\u51b3\u7b56\u8fc7\u7a0b\uff08MDP\uff09\uff0c\u5176\u4e2d\u521d\u59cb\u72b6\u6001\u4e3a\u63d0\u793a\u3002\u53d7\u5728\u7ebf\u6a21\u4eff\u5b66\u4e60\u548c\u5f3a\u5316\u5b66\u4e60\u539f\u7406\u7684\u542f\u53d1\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u591a\u8f6e\u6570\u636e\u6536\u96c6\u548c\u8bad\u7ec3\u7b56\u7565\uff0c\u65e8\u5728\u8d4b\u4e88LLM\u9012\u5f52\u68c0\u6d4b\u5e76\u4fee\u6b63\u5176\u5148\u524d
\u9519\u8bef\u5e76\u5728\u540e\u7eed\u8fed\u4ee3\u4e2d\u8fdb\u884c\u7ea0\u6b63\u7684\u80fd\u529b\u3002 \u6211\u4eec\u7684\u5b9e\u9a8c\u8868\u660e\uff0cRISE\u4f7fLlama2\u3001Llama3\u548cMistral\u6a21\u578b\u5728\u6570\u5b66\u63a8\u7406\u4efb\u52a1\u4e0a\u901a\u8fc7\u66f4\u591a\u8f6e\u6b21\u6539\u5584\u81ea\u5df1\uff0c\u4e0e\u7ed9\u5b9a\u7b49\u91cf\u63a8\u7406\u65f6\u95f4\u8ba1\u7b97\u76f8\u6bd4\uff0c\u8d85\u8fc7\u4e86\u51e0\u79cd\u5355\u8f6e\u7b56\u7565\u3002\u6211\u4eec\u8fd8\u53d1\u73b0\uff0cRISE\u5177\u6709\u826f\u597d\u7684\u53ef\u6269\u5c55\u6027\uff0c\u901a\u5e38\u968f\u7740\u66f4\u5f3a\u5927\u7684\u6a21\u578b\u800c\u83b7\u5f97\u66f4\u5927\u7684\u6536\u76ca\u3002\u6211\u4eec\u7684\u5206\u6790\u663e\u793a\uff0cRISE\u5bf9\u56f0\u96be\u63d0\u793a\u7684\u54cd\u5e94\u8fdb\u884c\u4e86\u6709\u610f\u4e49\u7684\u6539\u8fdb\uff0c\u4ee5\u8fbe\u5230\u6b63\u786e\u7684\u89e3\u51b3\u65b9\u6848\uff0c\u540c\u65f6\u6ca1\u6709\u56e0\u4e3a\u8868\u8fbe\u66f4\u590d\u6742\u7684\u5206\u5e03\u800c\u5bfc\u81f4\u5355\u8f6e\u80fd\u529b\u53d7\u5230\u5f71\u54cd\u3002|\n", "2407.18213": "|**2024-07-26**|**Exploring Scaling Trends in LLM Robustness**|Nikolaus Howe 
et.al.|[2407.18213](http://arxiv.org/abs/2407.18213)|null|\u8bed\u8a00\u6a21\u578b\u7684\u80fd\u529b\u53ef\u9884\u6d4b\u5730\u901a\u8fc7\u589e\u52a0\u6a21\u578b\u7684\u5927\u5c0f\u548c\u8bad\u7ec3\u6570\u636e\u800c\u5f97\u5230\u6539\u5584\u3002\u53d7\u6b64\u542f\u53d1\uff0c\u5df2\u8bad\u7ec3\u4e86\u4e00\u7cfb\u5217\u8d8a\u6765\u8d8a\u5927\u7684\u8bed\u8a00\u6a21\u578b\uff0c\u8fd9\u4e9b\u6a21\u578b\u5c55\u73b0\u51fa\u4e86\u4ee4\u4eba\u5370\u8c61\u6df1\u523b\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u6a21\u578b\u5bf9\u5bf9\u6297\u6027\u63d0\u793a\uff08\u5982\u201c\u8d8a\u72f1\u201d\u653b\u51fb\uff09\u975e\u5e38\u8106\u5f31\uff0c\u8fd9\u7c7b\u653b\u51fb\u4f1a\u64cd\u63a7\u6a21\u578b\u6267\u884c\u4e0d\u5e0c\u671b\u7684\u884c\u4e3a\uff0c\u4ece\u800c\u6784\u6210\u4e86\u91cd\u5927\u7684\u8bef\u7528\u98ce\u9669\u3002\u5148\u524d\u7684\u7814\u7a76\u8868\u660e\uff0c\u968f\u7740\u6a21\u578b\u548c\u6570\u636e\u89c4\u6a21\u7684\u589e\u52a0\uff0c\u8ba1\u7b97\u673a\u89c6\u89c9\u6a21\u578b\u7684\u9c81\u68d2\u6027\u4e5f\u4f1a\u63d0\u9ad8\uff0c\u56e0\u6b64\u63d0\u51fa\u4e86\u8fd9\u6837\u4e00\u4e2a\u95ee\u9898\uff1a\u8bed\u8a00\u6a21\u578b\u7684\u9c81\u68d2\u6027\u662f\u5426\u4e5f\u4f1a\u968f\u89c4\u6a21\u7684\u6269\u5927\u800c\u63d0\u5347\uff1f\u6211\u4eec\u901a\u8fc7\u5b9e\u8bc1\u7814\u7a76\u56de\u7b54\u4e86\u8fd9\u4e2a\u95ee\u9898\uff0c\u53d1\u73b0\u66f4\u5927\u7684\u6a21\u578b\u5728\u5bf9\u6297\u6027\u8bad\u7ec3\u4e0b\u6709\u663e\u8457\u66f4\u597d\u7684\u8868\u73b0\uff0c\u4f46\u5728\u6ca1\u6709\u660e\u786e\u9632\u5fa1\u63aa\u65bd\u7684\u60c5\u51b5\u4e0b\uff0c\u6a21\u578b\u89c4\u6a21\u7684\u589e\u52a0\u5e76\u6ca1\u6709\u5e26\u6765\u4efb\u4f55\u76ca\u5904\u3002|\n", "2407.18158": "|**2024-07-25**|**Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models**|Sanae Lotfi 
et.al.|[2407.18158](http://arxiv.org/abs/2407.18158)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u9884\u6d4b\u5e8f\u5217\u4e2d\u7684\u4e0b\u4e00\u4e2a\u4ee4\u724c\u65b9\u9762\u8868\u73b0\u51fa\u8272\u3002\u8fd1\u671f\u7684\u7814\u7a76\u901a\u8fc7\u538b\u7f29\u6280\u672f\u8ba1\u7b97\u4e86LLM\u7684\u975e\u7a7a\u6cdb\u5316\u8fb9\u754c\uff0c\u4f46\u5bf9\u4e8e\u5341\u4ebf\u53c2\u6570\u7ea7\u522b\u7684\u5927\u578b\u6a21\u578b\uff0c\u8fd9\u4e9b\u8fb9\u754c\u663e\u5f97\u65e0\u610f\u4e49\u3002\u6b64\u5916\uff0c\u8fd9\u4e9b\u8fb9\u754c\u662f\u5728\u975e\u5e38\u6709\u9650\u7684\u538b\u7f29\u6280\u672f\u4e0b\u83b7\u5f97\u7684\uff0c\u9650\u5236\u4e86\u751f\u6210\u8d28\u91cf\u8f83\u4f4e\u6587\u672c\u7684\u538b\u7f29\u6a21\u578b\u3002\u66f4\u5173\u952e\u7684\u662f\uff0c\u73b0\u6709\u8fb9\u754c\u4f9d\u8d56\u4e8e\u8bad\u7ec3\u96c6\u4e2d\u72ec\u7acb\u540c\u5206\u5e03\uff08IID\uff09\u6587\u6863\u7684\u6570\u91cf\uff0c\u800c\u5ffd\u7565\u4e86\u8bad\u7ec3\u96c6\u5185\u6570\u91cf\u5e9e\u5927\u7684\u975eIID\u6784\u6210\u4ee4\u724c\uff0c\u8fd9\u4f7f\u5f97\u8fdb\u4e00\u6b65\u63d0\u9ad8\u8fb9\u754c\u7d27\u81f4\u6027\u6f5c\u529b\u672a\u88ab\u5145\u5206\u5229\u7528\u3002 
\u672c\u7814\u7a76\u91c7\u7528\u9785\u7684\u6027\u8d28\u6765\u63a8\u5bfc\u6cdb\u5316\u8fb9\u754c\uff0c\u8fd9\u4e9b\u8fb9\u754c\u80fd\u591f\u4ece\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8bad\u7ec3\u96c6\u4e2d\u5305\u542b\u7684\u5927\u91cf\u4ee4\u724c\u4e2d\u83b7\u76ca\u3002\u4e0e\u8bad\u7ec3\u96c6\u76f8\u6bd4\uff0c\u6570\u636e\u96c6\u5305\u542b\u7684\u4ee4\u724c\u6570\u91cf\u8fdc\u591a\u4e8e\u6587\u6863\uff0c\u56e0\u6b64\u6211\u4eec\u7684\u6cdb\u5316\u8fb9\u754c\u4e0d\u4ec5\u5bb9\u5fcd\u4e86\u66f4\u4e3a\u5bbd\u677e\u7684\u538b\u7f29\u65b9\u6848\uff0c\u5b9e\u9645\u4e0a\u8fd8\u80fd\u4ece\u8fd9\u4e9b\u65b9\u6848\u4e2d\u83b7\u76ca\u3002\u6211\u4eec\u901a\u8fc7Monarch\u77e9\u9635\u3001Kronecker\u56e0\u5b50\u5206\u89e3\u548c\u540e\u8bad\u7ec3\u91cf\u5316\u7b49\u65b9\u6cd5\uff0c\u4e3aLLM\uff08\u5982LLaMA2-70B\uff09\u5b9e\u73b0\u4e86\u975e\u7a7a\u6cdb\u5316\u8fb9\u754c\u3002\u4e0e\u4ee5\u5f80\u7684\u65b9\u6cd5\u4e0d\u540c\uff0c\u6211\u4eec\u7684\u5de5\u4f5c\u9996\u6b21\u4e3a\u5728\u5b9e\u8df5\u4e2d\u90e8\u7f72\u5e76\u751f\u6210\u9ad8\u8d28\u91cf\u6587\u672c\u7684\u6a21\u578b\u5b9e\u73b0\u4e86\u975e\u7a7a\u6cdb\u5316\u8fb9\u754c\u3002|\n", "2407.18129": "|**2024-07-26**|**Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic**|Fakhraddin Alwajih 
et.al.|[2407.18129](http://arxiv.org/abs/2407.18129)|null|\u8fd1\u671f\u7684\u8fdb\u5c55\u663e\u8457\u63d0\u9ad8\u4e86\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\u5728\u751f\u6210\u548c\u7406\u89e3\u56fe\u50cf\u5230\u6587\u672c\u5185\u5bb9\u65b9\u9762\u7684\u529f\u80fd\u3002\u5c3d\u7ba1\u53d6\u5f97\u4e86\u8fd9\u4e9b\u6210\u529f\uff0c\u4f46\u8fdb\u6b65\u4e3b\u8981\u5c40\u9650\u4e8e\u82f1\u8bed\uff0c\u7531\u4e8e\u5176\u4ed6\u8bed\u8a00\u5982\u963f\u62c9\u4f2f\u8bed\u9ad8\u8d28\u91cf\u591a\u6a21\u6001\u8d44\u6e90\u7684\u7a00\u7f3a\u6027\uff0c\u8fd9\u9650\u5236\u4e86\u963f\u62c9\u4f2f\u8bed\u7b49\u8bed\u8a00\u4e2d\u7ade\u4e89\u6027\u6a21\u578b\u7684\u53d1\u5c55\u3002\u4e3a\u4e86\u7f13\u89e3\u8fd9\u4e00\u72b6\u51b5\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u4e2a\u9ad8\u6548\u7684\u963f\u62c9\u4f2f\u8bed\u591a\u6a21\u6001\u52a9\u624b\u2014\u2014Dallah\uff0c\u5b83\u57fa\u4e8eLLaMA-2\u5148\u8fdb\u8bed\u8a00\u6a21\u578b\u6765\u4fc3\u8fdb\u591a\u6a21\u6001\u4ea4\u4e92\u3002Dallah\u5728\u963f\u62c9\u4f2f\u8bedMLLM\u4e2d\u8868\u73b0\u51fa\u8272\uff0c\u5b9e\u73b0\u4e86\u6700\u5148\u8fdb\u7684\u6027\u80fd\u3002\u901a\u8fc7\u7ec6\u8c03\u516d\u4e2a\u963f\u62c9\u4f2f\u65b9\u8a00\uff0cDallah\u5c55\u793a\u4e86\u5176\u5904\u7406\u5305\u542b\u6587\u672c\u548c\u89c6\u89c9\u5143\u7d20\u7684\u590d\u6742\u65b9\u8a00\u4e92\u52a8\u7684\u80fd\u529b\u3002\u8be5\u6a21\u578b\u5728\u4e24\u4e2a\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u8868\u73b0\u51fa\u8272\uff1a\u4e00\u4e2a\u8bc4\u4f30\u5176\u73b0\u4ee3\u6807\u51c6\u963f\u62c9\u4f2f\u8bed\uff08MSA\uff09\u6027\u80fd\uff0c\u53e6\u4e00\u4e2a\u4e13\u95e8\u7528\u4e8e\u8bc4\u4f30\u65b9\u8a00\u54cd\u5e94\u3002 \u9664\u4e86\u5728\u591a\u6a21\u6001\u4ea4\u4e92\u4efb\u52a1\u4e2d\u7684\u7a33\u5065\u6027\u80fd\u5916\uff0cDallah\u6709\u671b\u5f15\u9886\u8fdb\u4e00\u6b65\u5f00\u53d1\u65b9\u8a00\u610f\u8bc6\u7684\u963f\u62c9\u4f2f\u8bedMLLM\u7684\u53d1\u5c55\u3002|\n", "2407.18103": "|**2024-07-25**|**Fine-Tuning Large Language Models for Stock 
Return Prediction Using Newsflow**|Tian Guo et.al.|[2407.18103](http://arxiv.org/abs/2407.18103)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u53ca\u5176\u5fae\u8c03\u6280\u672f\u5728\u5404\u79cd\u8bed\u8a00\u7406\u89e3\u548c\u751f\u6210\u4efb\u52a1\u4e2d\u8868\u73b0\u51fa\u4f18\u8d8a\u7684\u6027\u80fd\u3002\u672c\u6587\u63a2\u8ba8\u4e86\u5c06LLM\u7528\u4e8e\u57fa\u4e8e\u91d1\u878d\u65b0\u95fb\u6d41\u7684\u80a1\u7968\u56de\u62a5\u9884\u6d4b\u7684\u5fae\u8c03\u65b9\u6cd5\u3002\u5728\u91cf\u5316\u6295\u8d44\u9886\u57df\uff0c\u56de\u62a5\u9884\u6d4b\u662f\u540e\u7eed\u4efb\u52a1\u5982\u80a1\u7968\u6311\u9009\u548c\u7ec4\u5408\u4f18\u5316\u7b49\u7684\u57fa\u7840\u3002\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u5305\u62ec\u6587\u672c\u8868\u793a\u548c\u9884\u6d4b\u6a21\u5757\u7684\u6a21\u578b\u3002\u63d0\u51fa\u4e86\u6bd4\u8f83\u4ec5\u7f16\u7801\u5668\u548c\u4ec5\u89e3\u7801\u5668LLM\u7684\u4e24\u79cd\u65b9\u6cd5\uff0c\u56e0\u4e3a\u5b83\u4eec\u4ee5\u4e0d\u540c\u7684\u65b9\u5f0f\u751f\u6210\u6587\u672c\u8868\u793a\u3002\u8fd9\u4e9b\u4e0d\u540c\u8868\u793a\u5bf9\u9884\u6d4b\u6027\u80fd\u7684\u5f71\u54cd\u4ecd\u662f\u4e00\u4e2a\u5f00\u653e\u7684\u95ee\u9898\u3002\u540c\u65f6\uff0c\u6211\u4eec\u6bd4\u8f83\u4e86\u5c06LLM\u7684token\u7ea7\u8868\u793a\u96c6\u6210\u5230\u9884\u6d4b\u6a21\u5757\u4e2d\u7684\u4e24\u79cd\u7b80\u5355\u65b9\u6cd5\u3002\u5728\u771f\u5b9e\u65b0\u95fb\u548c\u6295\u8d44\u8303\u56f4\u5185\u8fdb\u884c\u7684\u5b9e\u9a8c\u63ed\u793a\u4ee5\u4e0b\u7ed3\u679c\uff1a\uff081\uff09\u4eceLLM\u7684token\u7ea7\u5d4c\u5165\u805a\u5408\u7684\u8868\u793a\u901a\u5e38\u80fd\u4ea7\u751f\u589e\u5f3along-only\u548clong-short\u6295\u8d44\u7ec4\u5408\u6027\u80fd\u7684\u56de\u62a5\u9884\u6d4b\uff1b\uff082\uff09\u5728\u76f8\u5bf9\u8f83\u5927\u7684\u6295\u8d44\u8303\u56f4\u5185\uff0c\u57fa\u4e8e\u89e3\u7801\u5668\u7684LLM\u9884\u6d4b\u6a21\u578b\u5bfc\u81f4\u66f4\u5f3a\u7684\u6295\u8d44\u7ec4\u5408\uff0c\u800c\u5728\u8f83\u5c0f\u7684\u8303\u56f4\u5185\u
ff0c\u6ca1\u6709\u4e00\u81f4\u7684\u8d62\u5bb6\uff1b\uff083\uff09\u4eceLLM\u6587\u672c\u8868\u793a\u4e2d\u5bfc\u51fa\u7684\u56de\u62a5\u9884\u6d4b\u5bf9\u4e8e\u6295\u8d44\u7ec4\u5408\u6784\u9020\u662f\u4e00\u4e2a\u5f3a\u5927\u7684\u4fe1\u53f7\uff0c\u4f18\u4e8e\u4f20\u7edf\u7684\u60c5\u7eea\u5f97\u5206\u3002|\n", "2407.18078": "|**2024-07-25**|**PEFT-U: Parameter-Efficient Fine-Tuning for User Personalization**|Christopher Clarke et.al.|[2407.18078](http://arxiv.org/abs/2407.18078)|**[link](https://github.com/ChrisIsKing/Parameter-Efficient-Personalization)**|**\u8fd1\u671f\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08Large Language Models\uff0cLLMs\uff09\u7684\u5174\u8d77\u4e3a\u4eba\u7c7b\u4e0eAI\u7684\u4ea4\u4e92\u5f00\u8f9f\u4e86\u65b0\u7684\u7bc7\u7ae0\u3002\u8fd9\u4e9b\u5148\u8fdb\u6a21\u578b\uff0c\u4ee5Chat-GPT\u4e3a\u4ee3\u8868\uff0c\u5c55\u73b0\u4e86\u5728\u8bed\u8a00\u7406\u89e3\u65b9\u9762\u7684\u60ca\u4eba\u80fd\u529b\u3002\u7136\u800c\uff0c\u968f\u7740LLM\u89c4\u6a21\u7684\u6307\u6570\u7ea7\u589e\u957f\uff0c\u4e00\u4e2a\u5173\u952e\u7ef4\u5ea6\u2014\u2014\u6a21\u578b\u4e2a\u6027\u5316\u2014\u2014\u7684\u7814\u7a76\u5374\u76f8\u5bf9\u532e\u4e4f\u3002\u5927\u578b\u57fa\u7840\u6a21\u578b\u5982GPT-3\u7b49\u4fa7\u91cd\u4e8e\u6784\u5efa\u901a\u7528\u6a21\u578b\uff0c\u9002\u7528\u4e8e\u5e7f\u6cdb\u7684\u4efb\u52a1\u548c\u7528\u6237\u7fa4\u4f53\u3002\u8fd9\u79cd\u7b56\u7565\u5f3a\u8c03\u4e86\u6a21\u578b\u7684\u6cdb\u5316\u80fd\u529b\uff0c\u5c06\u7528\u6237\u89c6\u4e3a\u6574\u4f53\u800c\u975e\u4e2a\u4f53\u3002\u867d\u7136\u5728\u8bb8\u591a\u5e38\u89c1\u5e94\u7528\u4e2d\u5b9e\u7528\uff0c\u4f46\u8fd9\u79cd\u4e00\u5200\u5207\u7684\u65b9\u6cd5\u5f80\u5f80\u65e0\u6cd5\u6ee1\u8db3\u4eba\u7c7b\u591a\u6837\u6027\u548c\u4e2a\u6027\u5316\u9700\u6c42\u7684\u4e30\u5bcc\u6027\u3002\u4e3a\u4e86\u63a2\u8ba8\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u5f15\u5165\u4e86PEFT-U\u57fa\u51c6\uff1a\u4e00\u4e2a\u7528\u4e8e\u6784\u5efa\u548c\u8bc4\u4f30\u9762\u5411\u7528\u6237\u7684\u81
ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u6a21\u578b\u7684\u65b0\u6570\u636e\u96c6\u3002PEFT-U\u5305\u542b\u4e86\u591a\u5143\u4e14\u4e2a\u6027\u5316\u7684\u8868\u8fbe\u4efb\u52a1\uff0c\u5176\u4e2d\u540c\u4e00\u8f93\u5165\u5bf9\u4e8e\u4e0d\u540c\u7528\u6237\u53ef\u80fd\u6709\u4e0d\u540c\u7684\u504f\u597d\u3002\u901a\u8fc7PEFT-U\uff0c\u6211\u4eec\u63a2\u7d22\u4e86\u5982\u4f55\u9ad8\u6548\u5730\u4e2a\u6027\u5316LLM\u4ee5\u9002\u5e94\u7528\u6237\u7279\u5b9a\u504f\u597d\uff0c\u7279\u522b\u662f\u5728\u591a\u6837\u5316\u7684\u7528\u6237\u4e2d\u5fc3\u4efb\u52a1\u80cc\u666f\u4e0b\u3002**|\n", "2407.18069": "|**2024-07-25**|**C2P: Featuring Large Language Models with Causal Reasoning**|Abdolmahdi Bagheri et.al.|[2407.18069](http://arxiv.org/abs/2407.18069)|null|\u56e0\u679c\u63a8\u7406\u662f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u8fbe\u5230\u4eba\u7c7b\u7ea7\u667a\u80fd\u7684\u4e3b\u8981\u969c\u788d\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u56e0\u679c\u94fe\u63d0\u793a\uff08C2P\uff09\uff0c\u8fd9\u662f\u7b2c\u4e00\u4e2a\u4e3a\u5f53\u524dLLM\u63d0\u4f9b\u56e0\u679c\u63a8\u7406\u80fd\u529b\u7684\u63a8\u7406\u6846\u67b6\u3002C2P\u81ea\u4e3b\u8fd0\u884c\uff0c\u5728\u56e0\u679c\u5b66\u4e60\u548c\u63a8\u7406\u9636\u6bb5\u5747\u65e0\u9700\u4f9d\u8d56\u5916\u90e8\u5de5\u5177\u6216\u6a21\u5757\uff0c\u5e76\u4e14\u53ef\u4ee5\u65e0\u7f1d\u96c6\u6210\u5230LLM\u7684\u8bad\u7ec3\u6216\u5fae\u8c03\u8fc7\u7a0b\u4e2d\u3002\u5728\u5404\u79cd\u57fa\u51c6\u6570\u636e\u96c6\u4e0a\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cC2P\u663e\u8457\u63d0\u9ad8\u4e86LLM\u7684\u56e0\u679c\u5b66\u4e60\u548c\u540e\u7eed\u63a8\u7406\u51c6\u786e\u6027\u3002 
\u6211\u4eec\u5c55\u793a\u4e86\u5982\u4f55\u901a\u8fc7C2P\u589e\u5f3aLLM\u5728\u73b0\u5b9e\u4e16\u754c\u573a\u666f\u4e2d\u7684\u56e0\u679c\u63a8\u7406\u80fd\u529b\uff0c\u89e3\u51b3\u533b\u7597\u3001\u533b\u5b66\u3001\u7ecf\u6d4e\u5b66\u3001\u6559\u80b2\u3001\u793e\u4f1a\u79d1\u5b66\u3001\u73af\u5883\u79d1\u5b66\u548c\u5e02\u573a\u8425\u9500\u7b49\u9886\u57df\u4e2d\u7684\u590d\u6742\u95ee\u9898\u3002\u5229\u7528\u5c11\u793a\u4f8b\u5b66\u4e60\uff0cGPT-4 Turbo \u4f7f\u7528C2P\uff0c\u4ec5\u4f7f\u7528\u516d\u4e2a\u793a\u4f8b\u5c31\u5b9e\u73b0\u4e86\u663e\u8457\u7684\u6027\u80fd\u63d0\u5347\uff0c\u63a8\u7406\u51c6\u786e\u6027\u6bd4\u5728\u7c7b\u4f3c\u60c5\u51b5\u4e0b\u8fd1\u4e4e\u968f\u673a\u8fd0\u884c\u7684\u6700\u5148\u8fdbLLM\u9ad8\u51fa33%\u4ee5\u4e0a\u3002\u8fd9\u8bc1\u660e\u4e86\u5c06C2P\u96c6\u6210\u5230LLM\u8bad\u7ec3\u6216\u5fae\u8c03\u8fc7\u7a0b\u4e2d\u7684\u6f5c\u529b\uff0c\u4ece\u800c\u8d4b\u4e88\u8fd9\u4e9b\u6a21\u578b\u9ad8\u7ea7\u56e0\u679c\u63a8\u7406\u80fd\u529b\uff0c\u5177\u6709\u53d8\u9769\u6027\u610f\u4e49\u3002|\n", "2407.18064": "|**2024-07-25**|**ComPeer: A Generative Conversational Agent for Proactive Peer Support**|Tianjian Liu 
et.al.|[2407.18064](http://arxiv.org/abs/2407.18064)|**[link](https://github.com/liutj9/compeer)**|\u672c\u6587\u63a2\u8ba8\u4e86\u4ea4\u4e92\u5f0f\u4ee3\u7406\uff08CA\uff09\u4f5c\u4e3a\u540c\u4f34\u652f\u6301\u8005\u5728\u5fc3\u7406\u5065\u5eb7\u9886\u57df\u7684\u5e7f\u6cdb\u5e94\u7528\u53ca\u76ca\u5904\u3002\u7136\u800c\uff0c\u5f53\u524d\u7684\u540c\u4f34\u652f\u6301\u578bCA\u8981\u4e48\u7531\u7528\u6237\u4e3b\u52a8\u89e6\u53d1\uff0c\u8981\u4e48\u9075\u5faa\u9884\u8bbe\u89c4\u5219\u4ee5\u542f\u52a8\u5bf9\u8bdd\uff0c\u8fd9\u53ef\u80fd\u963b\u788d\u7528\u6237\u4e0eCA\u5efa\u7acb\u957f\u671f\u5173\u7cfb\uff0c\u4ece\u800c\u5f71\u54cd\u957f\u671f\u76ca\u5904\u3002\u4e3a\u4e86\u514b\u670d\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u5f00\u53d1\u4e86ComPeer\u2014\u2014\u4e00\u79cd\u751f\u6210\u5f0fCA\uff0c\u5b83\u80fd\u591f\u4e3b\u52a8\u63d0\u4f9b\u9002\u5e94\u6027\u7684\u540c\u4f34\u652f\u6301\u3002 ComPeer\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u68c0\u6d4b\u5e76\u53cd\u6620\u5bf9\u8bdd\u4e2d\u7684\u5173\u952e\u4e8b\u4ef6\uff0c\u4ee5\u6b64\u6765\u7b56\u7565\u6027\u5730\u89c4\u5212\u4e3b\u52a8\u5173\u6000\u7684\u65f6\u95f4\u548c\u5185\u5bb9\u3002\u6b64\u5916\uff0cComPeer\u8fd8\u6574\u5408\u4e86\u540c\u4f34\u652f\u6301\u7b56\u7565\u3001\u5bf9\u8bdd\u5386\u53f2\u4ee5\u53ca\u5176\u4e2a\u6027\u5316\u7684\u5143\u7d20\u5230\u751f\u6210\u7684\u6d88\u606f\u4e2d\u3002\u901a\u8fc7\u4e00\u9879\u4e3a\u671f\u4e00\u5468\u7684\u8de8\u7ec4\u5b9e\u9a8c\uff08\u53c2\u4e0e\u4eba\u6570\uff1a24\uff09\uff0c\u6211\u4eec\u5c55\u793a\u4e86ComPeer\u5728\u957f\u65f6\u95f4\u5185\u63d0\u4f9b\u540c\u4f34\u652f\u6301\u7684\u80fd\u529b\uff0c\u5e76\u4e14\u4e0e\u57fa\u4e8e\u7528\u6237\u7684\u4e3b\u52a8\u89e6\u53d1\u7684CA\u76f8\u6bd4\uff0c\u663e\u8457\u63d0\u5347\u4e86\u7528\u6237\u7684\u53c2\u4e0e\u5ea6\u3002 
\u8fd9\u9879\u7814\u7a76\u5f3a\u8c03\u4e86\u751f\u6210\u5f0fCA\u5728\u540c\u4f34\u652f\u6301\u9886\u57df\u7684\u6f5c\u529b\uff0c\u7279\u522b\u662f\u5b83\u4eec\u5982\u4f55\u901a\u8fc7\u4e3b\u52a8\u5173\u6000\u7b56\u7565\u4fc3\u8fdb\u66f4\u6df1\u5165\u3001\u66f4\u6301\u7eed\u7684\u4eba\u9645\u4e92\u52a8\uff0c\u4ece\u800c\u4e3a\u7528\u6237\u63d0\u4f9b\u957f\u671f\u7684\u5fc3\u7406\u5065\u5eb7\u76ca\u5904\u3002|\n", "2407.18062": "|**2024-07-25**|**Audio Entailment: Assessing Deductive Reasoning for Audio Understanding**|Soham Deshmukh et.al.|[2407.18062](http://arxiv.org/abs/2407.18062)|**[link](https://github.com/microsoft/audioentailment)**|**\u8fd1\u671f\u6587\u732e\u5728\u6784\u5efa\u97f3\u9891\u57fa\u7840\u6a21\u578b\u65f6\u4f7f\u7528\u4e86\u8bed\u8a00\u3002\u8fd9\u4e9b\u97f3\u9891-\u8bed\u8a00\u6a21\u578b\uff08ALMs\uff09\u901a\u8fc7\u5927\u91cf\u97f3\u9891\u6587\u672c\u5bf9\u8fdb\u884c\u8bad\u7ec3\uff0c\u5e76\u5728\u6587\u672c\u5230\u97f3\u9891\u68c0\u7d22\u3001\u5b57\u5e55\u548c\u95ee\u7b54\u7b49\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u5353\u8d8a\u7684\u6027\u80fd\u3002\u7136\u800c\uff0c\u5b83\u4eec\u5728\u6267\u884c\u66f4\u590d\u6742\u7684\u5f00\u653e\u6027\u4efb\u52a1\uff0c\u5982\u4ea4\u4e92\u5f0f\u95ee\u7b54\u65f6\u7684\u80fd\u529b\uff0c\u9700\u8981\u903b\u8f91\u63a8\u7406\u6280\u80fd\uff0c\u800c\u8fd9\u4e00\u9886\u57df\u5c1a\u672a\u5f97\u5230\u5145\u5206\u8bc4\u4f30\u3002 
\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u4e2a\u540d\u4e3a\u97f3\u9891\u8574\u542b\u7684\u65b0\u4efb\u52a1\uff0c\u7528\u4e8e\u8bc4\u4f30ALM\u7684\u6f14\u7ece\u63a8\u7406\u80fd\u529b\u3002\u8fd9\u4e2a\u4efb\u52a1\u8bc4\u4f30\u97f3\u9891\u5185\u5bb9\u7684\u6587\u672c\u63cf\u8ff0\uff08\u5047\u8bbe\uff09\u662f\u5426\u53ef\u4ee5\u4ece\u97f3\u9891\u8bb0\u5f55\uff08\u524d\u63d0\uff09\u4e2d\u63a8\u65ad\u51fa\u6765\uff0c\u7ed3\u8bba\u53ef\u80fd\u662f\u8574\u542b\u3001\u4e2d\u7acb\u6216\u77db\u76fe\uff0c\u53d6\u51b3\u4e8e\u8bc1\u636e\u7684\u5145\u5206\u6027\u3002\u6211\u4eec\u521b\u5efa\u4e86\u4e24\u4e2a\u6570\u636e\u96c6\u6765\u5b8c\u6210\u8fd9\u9879\u4efb\u52a1\uff0c\u97f3\u9891\u8bb0\u5f55\u6765\u81ea\u4e24\u4e2a\u97f3\u9891\u5b57\u5e55\u6570\u636e\u96c6\u2014\u2014AudioCaps\u548cClotho\uff0c\u800c\u5047\u8bbe\u5219\u7531\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u751f\u6210\u3002 \u6211\u4eec\u5bf9\u6700\u5148\u8fdb\u7684ALMs\u8fdb\u884c\u4e86\u57fa\u51c6\u6d4b\u8bd5\uff0c\u5e76\u53d1\u73b0\u5b83\u4eec\u5728\u96f6\u6b21\u5b66\u4e60\u548c\u7ebf\u6027\u63a2\u9488\u8bc4\u4f30\u4e2d\u7684\u903b\u8f91\u63a8\u7406\u80fd\u529b\u5b58\u5728\u4e0d\u8db3\u3002\u6700\u540e\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u201c\u5148\u5b57\u5e55\u540e\u63a8\u7406\u201d\u8fd9\u4e00\u4e2d\u95f4\u6b65\u9aa4\uff0c\u901a\u8fc7\u8fd9\u79cd\u65b9\u5f0f\u53ef\u4ee5\u5206\u522b\u63d0\u9ad8ALMs\u5728\u96f6\u6b21\u5b66\u4e60\u548c\u7ebf\u6027\u63a2\u9488\u8bc4\u4f30\u4e2d\u7684\u8868\u73b0\u7edd\u5bf9\u503c6%\u548c3%\u3002**|\n", "2407.18061": "|**2024-07-25**|**Difficulty Estimation and Simplification of French Text Using LLMs**|Henri Jamet 
et.al.|[2407.18061](http://arxiv.org/abs/2407.18061)|null|\u6211\u4eec\u5229\u7528\u751f\u6210\u5f0f\u5927\u578b\u8bed\u8a00\u6a21\u578b\u6765\u5f00\u53d1\u5916\u8bed\u5b66\u4e60\u5e94\u7528\uff0c\u4e13\u6ce8\u4e8e\u8bc4\u4f30\u5916\u8bed\u6587\u672c\u7684\u96be\u5ea6\u5e76\u5c06\u5176\u7b80\u5316\u81f3\u8f83\u4f4e\u96be\u5ea6\u7ea7\u522b\u3002\u6211\u4eec\u5c06\u8fd9\u4e24\u4e2a\u4efb\u52a1\u90fd\u89c6\u4e3a\u9884\u6d4b\u95ee\u9898\uff0c\u5e76\u901a\u8fc7\u4f7f\u7528\u6709\u6807\u7b7e\u793a\u4f8b\u3001\u8fc1\u79fb\u5b66\u4e60\u548c\u5927\u578b\u8bed\u8a00\u6a21\u578b\u6784\u5efa\u4e86\u4e00\u4e2a\u96be\u5ea6\u5206\u7c7b\u6a21\u578b\uff0c\u76f8\u8f83\u4e8e\u4ee5\u5f80\u65b9\u6cd5\uff0c\u8be5\u6a21\u578b\u5728\u51c6\u786e\u6027\u4e0a\u8868\u73b0\u51fa\u8272\u3002\u5bf9\u4e8e\u7b80\u5316\u8fc7\u7a0b\uff0c\u6211\u4eec\u8bc4\u4f30\u4e86\u7b80\u5316\u8d28\u91cf\u4e0e\u610f\u4e49\u4fdd\u7559\u4e4b\u95f4\u7684\u6743\u8861\uff0c\u6bd4\u8f83\u4e86\u96f6\u521d\u59cb\u5316\u548c\u5fae\u8c03\u5927\u8bed\u8a00\u6a21\u578b\u7684\u8868\u73b0\u3002\u7ed3\u679c\u663e\u793a\uff0c\u901a\u8fc7\u6709\u9650\u7684\u5fae\u8c03\uff0c\u53ef\u4ee5\u83b7\u5f97\u5177\u6709\u610f\u4e49\u7684\u6587\u672c\u7b80\u5316\u7ed3\u679c\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u5728\u6cd5\u8bed\u6587\u672c\u4e0a\u8fdb\u884c\uff0c\u4f46\u6211\u4eec\u7684\u65b9\u6cd5\u5177\u6709\u8bed\u8a00\u65e0\u5173\u6027\uff0c\u5e76\u76f4\u63a5\u9002\u7528\u4e8e\u5176\u4ed6\u5916\u8bed\u3002|\n", "2407.18897": "|**2024-07-26**|**Small Molecule Optimization with Large Language Models**|Philipp Guevorguian 
et.al.|[2407.18897](http://arxiv.org/abs/2407.18897)|**[link](https://github.com/yerevann/chemlactica)**|**\u8fd1\u671f\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u53d1\u5c55\u4e3a\u751f\u6210\u5206\u5b50\u836f\u7269\u8bbe\u8ba1\u5e26\u6765\u4e86\u65b0\u7684\u53ef\u80fd\u6027\u3002\u6211\u4eec\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3a\u201cChemlactica\u201d\u548c\u201cChemma\u201d\u7684\u8bed\u8a00\u6a21\u578b\uff0c\u5b83\u4eec\u5747\u57fa\u4e8e\u4e00\u4e2a\u542b\u67091.1\u4ebf\u4e2a\u5206\u5b50\u53ca\u8ba1\u7b97\u5f97\u51fa\u5c5e\u6027\u7684\u5168\u65b0\u6570\u636e\u96c6\uff0c\u5171\u8ba1400\u4ebf\u4e2a\u4ee4\u724c\u8fdb\u884c\u5fae\u8c03\u3002\u8fd9\u4e9b\u6a21\u578b\u5728\u751f\u6210\u5177\u6709\u6307\u5b9a\u5c5e\u6027\u7684\u5206\u5b50\u4ee5\u53ca\u4ece\u6709\u9650\u6837\u672c\u9884\u6d4b\u65b0\u5206\u5b50\u7279\u6027\u65b9\u9762\u8868\u73b0\u51fa\u8272\u3002 \u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u578b\u4f18\u5316\u7b97\u6cd5\uff0c\u8be5\u7b97\u6cd5\u5229\u7528\u6211\u4eec\u7684\u8bed\u8a00\u6a21\u578b\u5bf9\u4efb\u610f\u5c5e\u6027\u8fdb\u884c\u4f18\u5316\uff0c\u540c\u65f6\u4ec5\u901a\u8fc7\u9ed1\u76d2\u5f0f\u63a5\u53e3\u8bbf\u95ee\u6709\u9650\u4fe1\u606f\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u7ed3\u5408\u4e86\u9057\u4f20\u7b97\u6cd5\u3001\u62d2\u7edd\u91c7\u6837\u548c\u63d0\u793a\u4f18\u5316\u7684\u6982\u5ff5\u3002\u8be5\u7b97\u6cd5\u5728\u591a\u4e2a\u5206\u5b50\u4f18\u5316\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u5747\u53d6\u5f97\u4e86\u6700\u5148\u8fdb\u7684\u6027\u80fd\uff0c\u5305\u62ec\u5728\u4e0e\u5148\u524d\u65b9\u6cd5\u76f8\u6bd4\u63d0\u9ad8\u4e868%\u7684\u201cPractical Molecular Optimization\u201d\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\u3002 \u6211\u4eec\u516c\u5f00\u53d1\u5e03\u4e86\u8bad\u7ec3\u6570\u636e\u96c6\u3001\u8bed\u8a00\u6a21\u578b\u548c\u4f18\u5316\u7b97\u6cd5\u7684\u4ee3\u7801\u3002**|\n", "2407.18827": "|**2024-07-26**|**Human-artificial intelligence teaming for scientific information extraction from data-driven additive manufacturing 
research using large language models**|Mutahar Safdar et.al.|[2407.18827](http://arxiv.org/abs/2407.18827)|null|\u6570\u636e\u9a71\u52a8\u7684\u589e\u6750\u5236\u9020(AM)\u7814\u7a76\u5728\u8fd1\u5e74\u6765\u53d6\u5f97\u4e86\u663e\u8457\u7684\u6210\u529f\uff0c\u8fd9\u5bfc\u81f4\u4e86\u5927\u91cf\u7684\u79d1\u5b66\u6587\u732e\u6d8c\u73b0\u3002\u8fd9\u4e9b\u6587\u732e\u4e2d\u7684\u77e5\u8bc6\u6d89\u53caAM\u548c\u4eba\u5de5\u667a\u80fd(AI)\u7684\u4e0a\u4e0b\u6587\uff0c\u4f46\u5c1a\u672a\u4ee5\u96c6\u6210\u7684\u65b9\u5f0f\u8fdb\u884c\u6316\u6398\u548c\u5f62\u5f0f\u5316\u3002\u4ece\u8fd9\u4e9b\u4f5c\u54c1\u4e2d\u63d0\u53d6\u79d1\u5b66\u4fe1\u606f\u9700\u8981\u5927\u91cf\u7684\u52aa\u529b\u548c\u65f6\u95f4\u3002\u5728AM\u9886\u57df\u7684\u4e13\u5bb6\u5df2\u7ecf\u8d21\u732e\u4e86\u8d85\u8fc7\u4e8c\u5341\u591a\u7bc7\u7efc\u8ff0\u8bba\u6587\u6765\u603b\u7ed3\u8fd9\u4e9b\u5de5\u4f5c\u3002\u7136\u800c\uff0c\u4e0eAM\u548cAI\u76f8\u5173\u7684\u7279\u5b9a\u4fe1\u606f\u4ecd\u7136\u9700\u8981\u624b\u52a8\u52aa\u529b\u6765\u63d0\u53d6\u3002\u6700\u8fd1\uff0c\u57fa\u7840\u6a21\u578b\u5982BERT\uff08\u53cc\u5411\u7f16\u7801\u8868\u793a\u53d8\u6362\u5668\uff09\u6216GPT\uff08\u9884\u8bad\u7ec3\u751f\u6210\u578b\u53d8\u6362\u5668\uff09\u5728\u6587\u672c\u6570\u636e\u4e0a\u7684\u6210\u529f\uff0c\u4e3a\u52a0\u901f\u79d1\u5b66\u4fe1\u606f\u63d0\u53d6\u63d0\u4f9b\u4e86\u53ef\u80fd\u6027\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u6846\u67b6\uff0c\u65e8\u5728\u4fc3\u8fdbAM\u548cAI\u4e13\u5bb6\u4e4b\u95f4\u7684\u5408\u4f5c\uff0c\u4ee5\u8fde\u7eed\u4ece\u6570\u636e\u9a71\u52a8\u7684AM\u6587\u732e\u4e2d\u63d0\u53d6\u79d1\u5b66\u4fe1\u606f\u3002\u57fa\u4e8e\u63d0\u51fa\u7684\u6846\u67b6\u5b9e\u73b0\u4e86\u4e00\u4e2a\u6f14\u793a\u5de5\u5177\uff0c\u5e76\u5f00\u5c55\u4e86\u4e00\u4e2a\u6848\u4f8b\u7814\u7a76\uff0c\u4ee5\u63d0\u53d6\u4e0e\u6570\u636e\u96c6\u3001\u5efa\u6a21\u3001\u4f20\u611f\u548cAM\u7cfb\u7edf\u7c7b\u522b\u76f8\u5173\u7684\u4fe1\u606f\u3002\u6211\u4eec\u5c55\u793a\u4e86\u5927\u5
78b\u8bed\u8a00\u6a21\u578b(LLMs)\u52a0\u5feb\u4ece\u6570\u636e\u9a71\u52a8\u7684AM\u6587\u732e\u4e2d\u63d0\u53d6\u76f8\u5173\u4fe1\u606f\u7684\u80fd\u529b\u3002\u5728\u672a\u6765\uff0c\u8be5\u6846\u67b6\u53ef\u4ee5\u7528\u4e8e\u4ece\u5de5\u7a0b\u5b66\u79d1\u7684\u8bbe\u8ba1\u548c\u5236\u9020\u6587\u732e\u4e2d\u63d0\u53d6\u4fe1\u606f\u3002|\n", "2407.18787": "|**2024-07-26**|**Automatic Detection of Moral Values in Music Lyrics**|Vjosa Preniqi et.al.|[2407.18787](http://arxiv.org/abs/2407.18787)|**[link](https://github.com/vjosapreniqi/ismir-mft-values)**|\u9053\u5fb7\u4ef7\u503c\u89c2\u5728\u8bc4\u4f30\u4fe1\u606f\u3001\u505a\u51fa\u51b3\u7b56\u548c\u5bf9\u91cd\u8981\u793e\u4f1a\u95ee\u9898\u5f62\u6210\u5224\u65ad\u65b9\u9762\u53d1\u6325\u7740\u57fa\u7840\u6027\u4f5c\u7528\u3002\u4ece\u6b4c\u8bcd\u4e2d\u5feb\u901f\u63d0\u53d6\u9053\u5fb7\u4ef7\u503c\u7684\u53ef\u80fd\u6027\u4f7f\u6211\u4eec\u5bf9\u97f3\u4e50\u8046\u542c\u884c\u4e3a\u6709\u66f4\u6df1\u7684\u7406\u89e3\u3002\u57fa\u4e8e\u9053\u5fb7\u57fa\u7840\u7406\u8bba\uff08MFT\uff09\uff0c\u6211\u4eec\u5bf9\u4e00\u7ec4\u7ecf\u8fc7\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08GPT-4\uff09\u751f\u6210\u76842,721\u4e2a\u5408\u6210\u6b4c\u8bcd\u5fae\u8c03\u7684\u53d8\u538b\u5668\u57fa\u8bed\u8a00\u6a21\u578b\uff08BERT\uff09\u8fdb\u884c\u4e86\u4efb\u52a1\uff0c\u4ee5\u68c0\u6d4b200\u9996\u7531\u4e24\u4f4d\u4e13\u5bb6\u6ce8\u91ca\u7684\u771f\u5b9e\u97f3\u4e50\u6b4c\u8bcd\u4e2d\u7684\u9053\u5fb7\u4ef7\u503c\u89c2\u3002\u6211\u4eec\u901a\u8fc7\u4e00\u7cfb\u5217\u57fa\u51c6\u6d4b\u8bd5\uff08\u5305\u62ec\u79bb\u57df\uff08BERT\u5728MFT\u6ce8\u91ca\u7684\u793e\u4ea4\u5a92\u4f53\u6587\u672c\u4e0a\u5fae\u8c03\uff09\u548c\u96f6\u5c04\u51fb\uff08GPT-4\uff09\u5206\u7c7b\uff09\u6765\u8bc4\u4f30\u5b83\u4eec\u7684\u9884\u6d4b\u80fd\u529b\u3002\u6240\u63d0\u51fa\u7684\u65b9\u6cd5\u5728\u6240\u6709\u5b9e\u9a8c\u4e2d\u5747\u8868\u73b0\u51fa\u6700\u4f73\u51c6\u786e\u6027\uff0c\u5e73\u5747F1\u52a0\u6743\u5f97\u5206\u4e3a0.8\u3002\u4e0e\u57fa\u5
1c6\u6a21\u578b\u76f8\u6bd4\uff0c\u8be5\u6027\u80fd\u5e73\u5747\u9ad8\u51fa5%\u3002\u5728\u4e8c\u5143\u5206\u7c7b\u7684\u7cbe\u786e\u5ea6\u4e0a\uff0c\u6240\u63d0\u51fa\u7684\u65b9\u6cd5\u5e73\u5747\u9ad8\u51fa\u57fa\u51c6\u6a21\u578b12%\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u8d21\u732e\u4e86\u65e0\u6ce8\u91ca\u7684\u6b4c\u8bcd\u9053\u5fb7\u5b66\u4e60\u4ee5\u53ca\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u97f3\u4e50\u4e2d\u9053\u5fb7\u8868\u8fbe\u7684\u77e5\u8bc6\u63d0\u70bc\uff0c\u5e76\u63d0\u4f9b\u4e86\u8fd9\u4e9b\u6280\u672f\u5bf9\u521b\u610f\u4ea7\u4e1a\u548c\u97f3\u4e50\u6587\u5316\u6f5c\u5728\u5f71\u54cd\u7684\u6709\u7528\u89c1\u89e3\u3002|\n", "2407.18786": "|**2024-07-26**|**The power of Prompts: Evaluating and Mitigating Gender Bias in MT with LLMs**|Aleix Sant et.al.|[2407.18786](http://arxiv.org/abs/2407.18786)|null|\u672c\u6587\u901a\u8fc7\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u89c6\u89d2\u63a2\u8ba8\u4e86\u673a\u5668\u7ffb\u8bd1\u4e2d\u7684\u6027\u522b\u504f\u89c1\u95ee\u9898\u3002\u7814\u7a76\u4f7f\u7528\u4e86\u56db\u4e2a\u5e7f\u6cdb\u4f7f\u7528\u7684\u6d4b\u8bd5\u96c6\uff0c\u5bf9\u82f1\u8bed\u5230\u52a0\u6cf0\u7f57\u5c3c\u4e9a\u8bed\uff08En$\\rightarrow$Ca\uff09\u548c\u82f1\u8bed\u5230\u897f\u73ed\u7259\u8bed\uff08En$\\rightarrow$Es\uff09\u7684\u7ffb\u8bd1\u65b9\u5411\u8fdb\u884c\u57fa\u51c6\u6d4b\u8bd5\uff0c\u4e0e\u6700\u5148\u8fdb\u7684\u795e\u7ecf\u673a\u5668\u7ffb\u8bd1\uff08NMT\uff09\u6a21\u578b\u8fdb\u884c\u5bf9\u6bd4\uff0c\u8bc4\u4f30\u5404\u79cd\u57fa\u7840LLM\u7684\u7ffb\u8bd1\u8d28\u91cf\u548c\u6027\u522b\u504f\u89c1\u60c5\u51b5\u3002\u7814\u7a76\u53d1\u73b0\uff0c\u6240\u6709\u6a21\u578b\u666e\u904d\u5b58\u5728\u6027\u522b\u504f\u89c1\u73b0\u8c61\uff0c\u5176\u4e2d\u57fa\u7840LLM\u7684\u504f\u89c1\u7a0b\u5ea6\u6bd4NMT\u6a21\u578b\u66f4\u9ad8\u3002\u4e3a\u4e86\u5bf9\u6297\u8fd9\u79cd\u504f\u89c1\uff0c\u7814\u7a76\u63a2\u7d22\u4e86\u5bf9\u6307\u4ee4\u8c03\u4f18LLM\u5e94\u7528\u7684\u63d0\u793a\u5de5\u7a0b\u6280\u5de7\u3002
\u7814\u7a76\u8bc6\u522b\u51fa\u4e00\u79cd\u63d0\u793a\u7ed3\u6784\uff0c\u80fd\u591f\u663e\u8457\u964d\u4f4e\u6027\u522b\u504f\u89c1\uff0c\u76f8\u6bd4\u66f4\u76f4\u63a5\u7684\u63d0\u793a\uff0c\u5728WinoMT\u8bc4\u4f30\u6570\u636e\u96c6\u4e0a\u51cf\u5c11\u4e86\u9ad8\u8fbe12%\u7684\u6027\u522b\u504f\u89c1\u3002\u8fd9\u4e9b\u7ed3\u679c\u663e\u8457\u7f29\u5c0f\u4e86LLM\u4e0e\u4f20\u7edfNMT\u7cfb\u7edf\u5728\u6027\u522b\u504f\u89c1\u51c6\u786e\u6027\u65b9\u9762\u7684\u5dee\u8ddd\u3002|\n", "2407.18764": "|**2024-07-26**|**TAGIFY: LLM-powered Tagging Interface for Improved Data Findability on OGD portals**|Kevin Kliimask et.al.|[2407.18764](http://arxiv.org/abs/2407.18764)|null|\u81ea2000\u5e74\u4ee3\u4e2d\u671f\u4ee5\u6765\uff0c\u63a8\u52a8\u5f00\u653e\u653f\u5e9c\u6570\u636e\uff08OGD\uff09\u7684\u52aa\u529b\u5728\u5404\u7ea7\u653f\u5e9c\u4e2d\u83b7\u5f97\u4e86\u663e\u8457\u7684\u52bf\u5934\u3002\u968f\u7740\u8d8a\u6765\u8d8a\u591a\u7684\u6570\u636e\u96c6\u88ab\u53d1\u5e03\u5230OGD\u95e8\u6237\u4e0a\uff0c\u67e5\u627e\u7279\u5b9a\u6570\u636e\u53d8\u5f97\u8d8a\u6765\u8d8a\u56f0\u96be\uff0c\u5bfc\u81f4\u4fe1\u606f\u8fc7\u8f7d\u3002\u5b8c\u6574\u4e14\u51c6\u786e\u7684\u6570\u636e\u96c6\u6587\u6863\uff0c\u5305\u62ec\u4e0e\u6570\u636e\u96c6\u5173\u8054\u7684\u9002\u5f53\u6807\u7b7e\uff0c\u5bf9\u4e8e\u63d0\u9ad8\u6570\u636e\u96c6\u53ef\u53d1\u73b0\u6027\u548c\u53ef\u8bbf\u95ee\u6027\u81f3\u5173\u91cd\u8981\u3002\u5bf9\u7231\u6c99\u5c3c\u4e9a\u5f00\u653e\u6570\u636e\u95e8\u6237\u7684\u5206\u6790\u63ed\u793a\uff0c11%\u7684\u6570\u636e\u96c6\u6ca1\u6709\u5173\u8054\u6807\u7b7e\uff0c\u800c26%\u7684\u6570\u636e\u96c6\u4ec5\u6709\u4e00\u4e2a\u6807\u7b7e\u88ab\u5206\u914d\uff0c\u8fd9\u8868\u660e\u4e86\u95e8\u6237\u5185\u6570\u636e\u53ef\u53d1\u73b0\u6027\u548c\u53ef\u8bbf\u95ee\u6027\u9762\u4e34\u7684\u6311\u6218\u3002\u6839\u636e\u6700\u8fd1\u7684\u5f00\u653e\u6570\u636e\u6210\u719f\u5ea6\u62a5\u544a\uff0c\u8be5\u95e8\u6237\u88ab\u8ba4\u4e3a\u662f\u9886\u5148\u8005\u3002\u672c\u7814\u
7a76\u7684\u76ee\u6807\u662f\u63d0\u51fa\u4e00\u79cd\u81ea\u52a8\u5316\u89e3\u51b3\u65b9\u6848\uff0c\u4ee5\u6539\u5584OGD\u95e8\u6237\u4e0a\u7684\u6570\u636e\u96c6\u6807\u7b7e\uff0c\u4ece\u800c\u63d0\u9ad8\u6570\u636e\u96c6\u7684\u53ef\u53d1\u73b0\u6027\u3002\u672c\u6587\u4ecb\u7ecd\u4e86Tagify\u2014\u2014\u4e00\u4e2a\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\uff0c\u5982GPT-3.5-turbo\u548cGPT-4\u81ea\u52a8\u4e3a\u6570\u636e\u96c6\u751f\u6210\u6807\u7b7e\u7684\u539f\u578b\uff0c\u4ee5\u82f1\u8bed\u548c\u7231\u6c99\u5c3c\u4e9a\u8bed\u4e3a\u6570\u636e\u96c6\u751f\u6210\u6807\u7b7e\uff0c\u4ece\u800c\u589e\u5f3a\u6570\u636e\u53d1\u5e03\u8005\u51c6\u5907\u7684\u5143\u6570\u636e\uff0c\u5e76\u901a\u8fc7\u6539\u5584\u6570\u636e\u7528\u6237\u5728OGD\u95e8\u6237\u4e0a\u7684\u6570\u636e\u53d1\u73b0\u6027\u6765\u63d0\u9ad8\u6570\u636e\u7684\u53ef\u8bbf\u95ee\u6027\u3002\u5f00\u53d1\u7684\u89e3\u51b3\u65b9\u6848\u7ecf\u8fc7\u7528\u6237\u8bc4\u4f30\uff0c\u5e76\u6536\u96c6\u4e86\u4ed6\u4eec\u7684\u53cd\u9988\uff0c\u4ee5\u5b9a\u4e49\u672a\u6765\u539f\u578b\u6539\u8fdb\u7684\u8bae\u7a0b\u3002|\n", "2407.18752": "|**2024-07-26**|**Knowledge Graph Structure as Prompt: Improving Small Language Models Capabilities for Knowledge-based Causal Discovery**|Yuni Susanti 
et.al.|[2407.18752](http://arxiv.org/abs/2407.18752)|**[link](https://github.com/littleflow3r/kg-structure-as-prompt)**|**\u672c\u6587\u63a2\u8ba8\u4e86\u57fa\u4e8e\u5143\u6570\u636e\u800c\u975e\u5b9e\u9645\u6570\u636e\u503c\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u56e0\u679c\u53d1\u73b0\u95ee\u9898\u4e0a\u7684\u65b0\u89c6\u89d2\uff0c\u5373\u77e5\u8bc6\u5bfc\u5411\u7684\u56e0\u679c\u53d1\u73b0\u3002\u6211\u4eec\u5173\u6ce8\u5c0f\u578b\u8bed\u8a00\u6a21\u578b\uff08SLMs\uff0c\u53c2\u6570\u5c11\u4e8e10\u4ebf\uff09\u5982\u4f55\u901a\u8fc7\u63d0\u793a\u5f0f\u5b66\u4e60\u8fdb\u884c\u77e5\u8bc6\u5bfc\u5411\u7684\u56e0\u679c\u53d1\u73b0\u3002\u5177\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201c\u57fa\u4e8e\u77e5\u8bc6\u56fe\u8c31\u7684\u7ed3\u6784\u63d0\u793a\u201d\uff08KG Structure as Prompt\uff09\u7684\u65b0\u65b9\u6cd5\uff0c\u7528\u4e8e\u5c06\u77e5\u8bc6\u56fe\u8c31\u4e2d\u7684\u7ed3\u6784\u4fe1\u606f\uff0c\u5982\u5171\u90bb\u8282\u70b9\u548c\u5143\u8def\u5f84\uff0c\u6574\u5408\u5230\u63d0\u793a\u5f0f\u5b66\u4e60\u4e2d\uff0c\u4ee5\u589e\u5f3aSLMs\u7684\u80fd\u529b\u3002 \u5728\u4e09\u79cd\u7c7b\u578b\u7684\u751f\u547d\u79d1\u5b66\u548c\u5f00\u653e\u57df\u6570\u636e\u96c6\u4e0b\u7684\u5c11\u91cf\u6837\u672c\u8bbe\u7f6e\u4e0b\uff0c\u6211\u4eec\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u8fd9\u79cd\u65b9\u6cd5\u7684\u6709\u6548\u6027\u8d85\u8d8a\u4e86\u8bb8\u591a\u57fa\u7ebf\uff0c\u5e76\u4e14\u751a\u81f3\u8d85\u8fc7\u4e86\u5728\u5b8c\u6574\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u5e38\u89c4\u5fae\u8c03\u7684\u4f20\u7edf\u65b9\u6cd5\u3002\u6211\u4eec\u7684\u7814\u7a76\u8fdb\u4e00\u6b65\u63ed\u793a\u4e86\u5c0f\u578b\u8bed\u8a00\u6a21\u578b\u7684\u5f3a\u5927\u80fd\u529b\uff1a\u7ed3\u5408\u77e5\u8bc6\u56fe\u8c31\u548c\u63d0\u793a\u5f0f\u5b66\u4e60\uff0c\u5c0f\u578b\u8bed\u8a00\u6a21\u578b\u663e\u793a\u51fa\u8d85\u8d8a\u53c2\u6570\u66f4\u591aLLMs\u7684\u6f5c\u529b\u3002 
\u6211\u4eec\u5df2\u7ecf\u5728GitHub\u4e0a\u63d0\u4f9b\u4e86\u4ee3\u7801\u548c\u6570\u636e\u96c6\u3002**|\n", "2407.18743": "|**2024-07-26**|**Towards Effective and Efficient Continual Pre-training of Large Language Models**|Jie Chen et.al.|[2407.18743](http://arxiv.org/abs/2407.18743)|null|\u8fd9\u7bc7\u6280\u672f\u62a5\u544a\u4ecb\u7ecd\u4e86\u6301\u7eed\u9884\u8bad\u7ec3\uff08CPT\uff09\u65b9\u6cd5\u7684\u6700\u65b0\u8fdb\u5c55\uff0c\u7279\u522b\u5173\u6ce8\u4e86\u589e\u5f3a\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u7279\u5b9a\u9886\u57df\u6216\u4efb\u52a1\u4e0a\u7684\u80fd\u529b\u3002\u62a5\u544a\u4ee5Llama-3\uff088B\uff09\u4e3a\u4f8b\uff0c\u8fd9\u662f\u4e00\u4e2a\u663e\u8457\u63d0\u5347\u4e86\u5176\u5728\u4e2d\u6587\u7406\u89e3\u548c\u79d1\u5b66\u63a8\u7406\u80fd\u529b\u7684\u57fa\u7ebf\u6a21\u578b\u3002\u4e3a\u4e86\u5728\u589e\u5f3a\u65b0\u80fd\u529b\u7684\u540c\u65f6\u4fdd\u6301\u539f\u6709\u80fd\u529b\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u6570\u636e\u6df7\u5408\u548c\u8bfe\u7a0b\u7b56\u7565\uff0c\u5229\u7528\u73b0\u6709\u6570\u636e\u96c6\u5e76\u5408\u6210\u9ad8\u8d28\u91cf\u6570\u636e\u96c6\u3002\u5177\u4f53\u5730\uff0c\u6211\u4eec\u57fa\u4e8e\u76f8\u5173\u7f51\u9875\u751f\u6210\u591a\u5b66\u79d1\u7684\u79d1\u5b66\u95ee\u9898\u4e0e\u7b54\u6848\uff08QA\uff09\u5bf9\uff0c\u5e76\u5c06\u8fd9\u4e9b\u5408\u6210\u6570\u636e\u878d\u5165\u6a21\u578b\u8bad\u7ec3\uff0c\u4ee5\u63d0\u5347Llama-3\u7684\u79d1\u5b66\u63a8\u7406\u80fd\u529b\u3002\u7ecf\u8fc7\u8fd9\u4e00\u7cfb\u5217\u6539\u8fdb\u540e\u7684\u6a21\u578b\u88ab\u79f0\u4e3aLlama-3-SynE\uff08\u5408\u6210\u6570\u636e\u589e\u5f3a\u7684Llama-3\uff09\u3002\u62a5\u544a\u8fd8\u901a\u8fc7\u8f83\u5c0f\u89c4\u6a21\u7684TinyLlama\u6a21\u578b\u8fdb\u884c\u8c03\u53c2\u5b9e\u9a8c\uff0c\u5e76\u5229\u7528\u4ece\u8fd9\u4e9b\u5b9e\u9a8c\u4e2d\u5f97\u5230\u7684\u53d1\u73b0\u6765\u8bad\u7ec3\u57fa\u7ebf\u6a21\u578b\u3002 
\u591a\u4e2a\u8bc4\u4f30\u57fa\u51c6\u7684\u5e7f\u6cdb\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u80fd\u663e\u8457\u63d0\u9ad8\u57fa\u7ebf\u6a21\u578b\u7684\u6027\u80fd\uff0c\u5305\u62ec\u901a\u7528\u80fd\u529b\uff08C-Eval\u4e0a+8.81\u5206\uff0cCMMLU\u4e0a+6.31\u5206\uff09\u548c\u79d1\u5b66\u63a8\u7406\u80fd\u529b\uff08MATH\u4e0a+12.00\u5206\uff0cSciEval\u4e0a+4.13\u5206\uff09\uff0c\u800c\u4e0d\u4f1a\u635f\u5bb3\u539f\u6709\u7684\u80fd\u529b\u3002\u8be5\u6a21\u578b\u3001\u6570\u636e\u548c\u4ee3\u7801\u5df2\u5f00\u6e90\u53d1\u5e03\u4e8ehttps://github.com/RUC-GSAI/Llama-3-SynE\u3002|\n", "2407.18738": "|**2024-07-26**|**Towards Generalized Offensive Language Identification**|Alphaeus Dmonte et.al.|[2407.18738](http://arxiv.org/abs/2407.18738)|null|\u4e92\u8054\u7f51\u4e0a\u5177\u6709\u653b\u51fb\u6027\u7684\u5185\u5bb9\uff0c\u5305\u62ec\u4ec7\u6068\u8a00\u8bba\u548c\u7f51\u7edc\u6b3a\u51cc\uff0c\u662f\u4e00\u4e2a\u5168\u7403\u6027\u95ee\u9898\u3002\u56e0\u6b64\uff0c\u673a\u5668\u5b66\u4e60\uff08ML\uff09\u548c\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u793e\u533a\u5bf9\u6b64\u7ed9\u4e88\u4e86\u5e7f\u6cdb\u5173\u6ce8\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e00\u6311\u6218\uff0c\u5df2\u7ecf\u5f00\u53d1\u51fa\u4e86\u591a\u79cd\u81ea\u52a8\u8bc6\u522b\u53ef\u80fd\u6709\u5bb3\u5185\u5bb9\u5e76\u51cf\u8f7b\u5176\u5f71\u54cd\u7684\u7cfb\u7edf\u3002\u8fd9\u4e9b\u7cfb\u7edf\u4e3b\u8981\u91c7\u7528\u4e24\u79cd\u7b56\u7565\uff1a\uff081\uff09\u4f7f\u7528\u516c\u5f00\u53ef\u7528\u7684\u6a21\u578b\u548c\u5e94\u7528\u7aef\u70b9\uff0c\u5305\u62ec\u6fc0\u53d1\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\uff1b\uff082\uff09\u6ce8\u91ca\u6570\u636e\u96c6\uff0c\u5e76\u5728\u8fd9\u4e9b\u6570\u636e\u96c6\u4e0a\u8bad\u7ec3\u673a\u5668\u5b66\u4e60\u6a21\u578b\u3002\u7136\u800c\uff0c\u8fd9\u4e24\u79cd\u65b9\u6cd5\u7684\u901a\u7528\u6027\u5c1a\u4e0d\u6e05\u695a\uff0c\u800c\u4e14\u5b83\u4eec\u5728\u5b9e\u9645\u73af\u5883\u548c\u975e\u9886\u57df\u5185\u7
684\u5e94\u7528\u4e5f\u5e38\u53d7\u5230\u8d28\u7591\u3002\u672c\u6587\u901a\u8fc7\u4e00\u4e2a\u65b0\u9896\u7684\u901a\u7528\u57fa\u51c6\u5bf9\u653b\u51fb\u6027\u8bed\u8a00\u68c0\u6d4b\u6a21\u578b\u548c\u6570\u636e\u96c6\u7684\u901a\u7528\u6027\u8fdb\u884c\u4e86\u5b9e\u8bc1\u8bc4\u4f30\u3002\u6211\u4eec\u9488\u5bf9\u901a\u7528\u6027\u63d0\u51fa\u4e86\u4e09\u4e2a\u7814\u7a76\u95ee\u9898\uff0c\u5e76\u5f97\u51fa\u4e86\u7ed3\u8bba\u3002\u8fd9\u4e9b\u53d1\u73b0\u5c06\u6709\u52a9\u4e8e\u6784\u5efa\u66f4\u5f3a\u5927\u7684\u73b0\u5b9e\u4e16\u754c\u653b\u51fb\u6027\u8bed\u8a00\u68c0\u6d4b\u7cfb\u7edf\u3002|\n", "2407.18723": "|**2024-07-26**|**LLASP: Fine-tuning Large Language Models for Answer Set Programming**|Erica Coppolillo et.al.|[2407.18723](http://arxiv.org/abs/2407.18723)|null|\u8fd1\u671f\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\u4e2d\u5c55\u73b0\u51fa\u4e86\u5de8\u5927\u7684\u6f5c\u529b\uff0c\u5c24\u5176\u662f\u5728\u4ee3\u7801\u751f\u6210\u65b9\u9762\u3002\u5c3d\u7ba1\u5728\u9002\u5e94LLMs\u4ee5\u751f\u6210\u591a\u79cd\u6307\u4ee4\u6027\u7f16\u7a0b\u8bed\u8a00\u548c\u4efb\u52a1\u7684\u4ee3\u7801\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u5c55\uff0c\u4f46\u5b83\u4eec\u5728\u5904\u7406\u58f0\u660e\u5f0f\u5f62\u5f0f\u5316\u8bed\u8a00\uff0c\u5982\u7b54\u6848\u96c6\u7f16\u7a0b\uff08ASP\uff09\u65f6\u7684\u80fd\u529b\u4ecd\u6709\u5f85\u63a2\u7d22\u3002\u672c\u6587\u65e8\u5728\u63a2\u8ba8LLMs\u5728ASP\u4ee3\u7801\u751f\u6210\u65b9\u9762\u7684\u5e94\u7528\u53ef\u80fd\u6027\u3002\u9996\u5148\uff0c\u6211\u4eec\u5bf9\u5f53\u524d\u6700\u5148\u8fdb\u7684LLMs\u8fdb\u884c\u4e86\u7cfb\u7edf\u8bc4\u4f30\u3002\u5c3d\u7ba1\u8fd9\u4e9b\u6a21\u578b\u5728\u53c2\u6570\u6570\u91cf\u3001\u8bad\u7ec3\u6570\u636e\u548c\u8ba1\u7b97\u8d44\u6e90\u7b49\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5b9e\u8bc1\u7ed3\u679c\u8868\u660e\uff0c\u5b83\u4eec\u5728\u751f\u6210\u6b63\u786eASP\u7a0b\u5e8f\u65b9\u9762\u7684\u8868
\u73b0\u5e76\u4e0d\u7406\u60f3\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aLLASP\u7684\u8f7b\u91cf\u7ea7\u6a21\u578b\uff0c\u4e13\u95e8\u7528\u4e8e\u7f16\u7801ASP\u7a0b\u5e8f\u7684\u57fa\u672c\u6a21\u5f0f\u3002\u4e3a\u4e86\u5b9e\u73b0\u8fd9\u4e00\u76ee\u6807\uff0c\u6211\u4eec\u521b\u5efa\u4e86\u4e00\u4e2a\u5305\u542b\u5e7f\u6cdb\u57fa\u672c\u95ee\u9898\u89c4\u8303\u7684\u81ea\u5b9a\u4e49\u6570\u636e\u96c6\uff0c\u8fd9\u4e9b\u89c4\u8303\u53ef\u4ee5\u88ab\u7f16\u7801\u4e3aASP\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cLLASP\u751f\u6210\u7684ASP\u7a0b\u5e8f\u7684\u8d28\u91cf\u4ee4\u4eba\u5370\u8c61\u6df1\u523b\u3002\u4e0e\u672a\u7ecf\u8fc7\u5fae\u8c03\u7684\u7248\u672c\u76f8\u6bd4\uff0c\u4ee5\u53ca\u4e0e\u5927\u591a\u6570\u6e34\u671b\u578bLLM\u5019\u9009\u8005\uff0c\u5c24\u5176\u662f\u4ece\u8bed\u4e49\u89d2\u5ea6\u6765\u770b\uff0c\u5176\u8868\u73b0\u5747\u4f18\u4e8e\u591a\u6570\u3002\u6240\u6709\u7528\u4e8e\u6267\u884c\u5b9e\u9a8c\u7684\u4ee3\u7801\u548c\u6570\u636e\u90fd\u5df2\u516c\u5f00\u53d1\u5e03\u4e8ehttps://anonymous.4open.science/r/LLASP-D86C/\u3002|\n", "2407.18722": "|**2024-07-26**|**Neurosymbolic AI for Enhancing Instructability in Generative AI**|Amit Sheth 
et.al.|[2407.18722](http://arxiv.org/abs/2407.18722)|null|\u751f\u6210\u5f0f\u4eba\u5de5\u667a\u80fd\uff0c\u7279\u522b\u662f\u901a\u8fc7\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\uff0c\u5728\u6587\u672c\u3001\u56fe\u50cf\u548c\u97f3\u4e50\u7b49\u5185\u5bb9\u521b\u4f5c\u9886\u57df\u5b9e\u73b0\u4e86\u9769\u547d\u6027\u53d8\u9769\uff0c\u5c55\u793a\u4e86\u9075\u5faa\u6307\u4ee4\u7684\u63d0\u793a\u80fd\u529b\uff0c\u5f88\u5927\u7a0b\u5ea6\u4e0a\u5f97\u76ca\u4e8e\u6307\u4ee4\u8c03\u4f18\u3002\u6307\u4ee4\u8c03\u4f18\u662f\u4e00\u79cd\u76d1\u7763\u5f0f\u5fae\u8c03\u65b9\u6cd5\uff0c\u901a\u8fc7\u8bad\u7ec3\u6570\u636e\u96c6\u6765\u5b9e\u73b0\u7279\u5b9a\u4efb\u52a1\u53ca\u5176\u5bf9\u5e94\u6307\u4ee4\u683c\u5f0f\u5316\uff0c\u8fd9\u79cd\u65b9\u6cd5\u7cfb\u7edf\u6027\u5730\u589e\u5f3a\u4e86\u6a21\u578b\u6267\u884c\u63d0\u4f9b\u6307\u793a\u7684\u80fd\u529b\u3002\u5c3d\u7ba1\u5982\u6b64\uff0cLLMs \u5728\u4e00\u81f4\u7406\u89e3\u548c\u6267\u884c\u590d\u6742\u3001\u591a\u6b65\u9aa4\u6307\u4ee4\u4ee5\u53ca\u5c06\u8fd9\u4e9b\u6307\u4ee4\u63a8\u5e7f\u5230\u65b0\u4efb\u52a1\u65b9\u9762\u4ecd\u9762\u4e34\u6311\u6218\uff0c\u8fd9\u5bf9\u4e8e\u66f4\u5e7f\u6cdb\u5730\u5e94\u7528\u4e8e\u5b9e\u9645\u573a\u666f\u81f3\u5173\u91cd\u8981\u3002\u672c\u6587\u63a2\u8ba8\u4e86\u4e3a\u4f55\u795e\u7ecf\u7b26\u53f7AI\u80fd\u63d0\u4f9b\u589e\u5f3aLLMs\u6307\u4ee4\u53ef\u7406\u89e3\u6027\u7684\u66f4\u597d\u9014\u5f84\u3002\u6211\u4eec\u63a2\u7d22\u4f7f\u7528\u7b26\u53f7\u4efb\u52a1\u89c4\u5212\u5668\u5206\u89e3\u9ad8\u7ea7\u6307\u4ee4\u4e3a\u7ed3\u6784\u5316\u4efb\u52a1\uff0c\u4f7f\u7528\u795e\u7ecf\u8bed\u4e49\u89e3\u6790\u5668\u5c06\u8fd9\u4e9b\u4efb\u52a1\u843d\u5730\u4e3a\u53ef\u6267\u884c\u64cd\u4f5c\uff0c\u4ee5\u53ca\u4f7f\u7528\u795e\u7ecf\u7b26\u53f7\u6267\u884c\u5668\u5b9e\u65bd\u8fd9\u4e9b\u64cd\u4f5c\u7684\u540c\u65f6\u52a8\u6001\u7ef4\u62a4\u660e\u786e\u7684\u72b6\u6001\u8868\u793a\u3002\u6211\u4eec\u4e5f\u5bfb\u6c42\u5c55\u793a\uff0c\u795e\u7ecf\u7b26\u53f7\u65b9\u6cd5\u80fd\u591f\
u589e\u5f3a\u4efb\u52a1\u6267\u884c\u7684\u53ef\u9760\u6027\u548c\u4e0a\u4e0b\u6587\u610f\u8bc6\uff0c\u4f7fLLMs\u80fd\u591f\u4ee5\u66f4\u9ad8\u7684\u7cbe\u5ea6\u548c\u7075\u6d3b\u6027\u52a8\u6001\u89e3\u91ca\u548c\u54cd\u5e94\u66f4\u5e7f\u6cdb\u7684\u6307\u4ee4\u4e0a\u4e0b\u6587\u3002|\n", "2407.20232": "|**2024-07-29**|**Specify and Edit: Overcoming Ambiguity in Text-Based Image Editing**|Ekaterina Iakovleva et.al.|[2407.20232](http://arxiv.org/abs/2407.20232)|null|\u6587\u672c\u7f16\u8f91\u7684\u6269\u6563\u6a21\u578b\u5728\u7528\u6237\u8f93\u5165\u6307\u4ee4\u5b58\u5728\u6b67\u4e49\u65f6\u8868\u73b0\u51fa\u6709\u9650\u7684\u6027\u80fd\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86Specify ANd Edit\uff08SANE\uff09\uff0c\u4e00\u4e2a\u7528\u4e8e\u57fa\u4e8e\u6269\u6563\u7684\u7f16\u8f91\u7cfb\u7edf\u7684\u96f6\u6837\u672c\u63a8\u7406\u7ba1\u9053\u3002\u6211\u4eec\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5c06\u8f93\u5165\u6307\u4ee4\u5206\u89e3\u4e3a\u5177\u4f53\u7684\u6307\u4ee4\uff0c\u5373\u5e94\u7528\u5230\u8f93\u5165\u56fe\u50cf\u4ee5\u6ee1\u8db3\u7528\u6237\u8bf7\u6c42\u7684\u5177\u4f53\u5e72\u9884\u63aa\u65bd\u3002\u901a\u8fc7\u4e00\u79cd\u4e13\u95e8\u4e3a\u4efb\u52a1\u8bbe\u8ba1\u7684\u65b0\u9896\u53bb\u566a\u6307\u5bfc\u7b56\u7565\uff0c\u6211\u4eec\u53ef\u4ee5\u4eceLLM\u751f\u6210\u7684\u6307\u4ee4\u4ee5\u53ca\u539f\u59cb\u6307\u4ee4\u4e2d\u53d7\u76ca\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u5728\u4e09\u4e2a\u57fa\u7ebf\u548c\u4e24\u4e2a\u6570\u636e\u96c6\u4e0a\u5c55\u793a\u4e86SANE\u5728\u6240\u6709\u8bbe\u7f6e\u4e2d\u7684\u4f18\u52bf\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u7ba1\u9053\u63d0\u9ad8\u4e86\u7f16\u8f91\u6a21\u578b\u7684\u53ef\u89e3\u91ca\u6027\uff0c\u5e76\u589e\u5f3a\u4e86\u8f93\u51fa\u591a\u6837\u6027\u3002\u6211\u4eec\u8fd8\u8bc1\u660e\u4e86\u6211\u4eec\u7684\u65b9\u6cd5\u53ef\u4ee5\u5e94\u7528\u4e8e\u4efb\u4f55\u7f16\u8f91\uff0c\u65e0\u8bba\u662f\u5426\u5b58\u5728\u6b67\u4e49\u300
2\u6211\u4eec\u7684\u4ee3\u7801\u5df2\u516c\u5f00\u5728https://github.com/fabvio/SANE\u3002|\n", "2407.20224": "|**2024-07-29**|**Can Editing LLMs Inject Harm?**|Canyu Chen et.al.|[2407.20224](http://arxiv.org/abs/2407.20224)|null|\u77e5\u8bc6\u7f16\u8f91\u6280\u672f\u6b63\u9010\u6e10\u88ab\u91c7\u7528\u4ee5\u9ad8\u6548\u5730\u7ea0\u6b63\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e2d\u7684\u9519\u8bef\u6216\u8fc7\u65f6\u77e5\u8bc6\uff0c\u8fd9\u4e3b\u8981\u662f\u56e0\u4e3a\u4ece\u5934\u5f00\u59cb\u91cd\u65b0\u8bad\u7ec3\u7684\u9ad8\u6210\u672c\u3002\u540c\u65f6\uff0c\u4e00\u4e2a\u4e9f\u5f85\u63a2\u7d22\u4f46\u672a\u5145\u5206\u7814\u7a76\u7684\u95ee\u9898\u662f\uff1a\u77e5\u8bc6\u7f16\u8f91\u662f\u5426\u53ef\u4ee5\u7528\u4e8e\u5411LLMs\u6ce8\u5165\u5371\u5bb3\uff1f\u5728\u672c\u6587\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u5c06\u77e5\u8bc6\u7f16\u8f91\u91cd\u65b0\u5b9a\u4e49\u4e3aLLMs\u9762\u4e34\u7684\u4e00\u79cd\u65b0\u7c7b\u578b\u5b89\u5168\u6027\u5a01\u80c1\uff0c\u5373\u7f16\u8f91\u653b\u51fb\uff0c\u5e76\u901a\u8fc7\u6784\u5efa\u4e00\u4e2a\u65b0\u7684\u6570\u636e\u96c6EditAttack\u8fdb\u884c\u4e86\u7cfb\u7edf\u6027\u7684\u8c03\u67e5\u3002 \u5177\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u805a\u7126\u4e8e\u7f16\u8f91\u653b\u51fb\u7684\u4e24\u4e2a\u5178\u578b\u5b89\u5168\u6027\u98ce\u9669\uff1a\u8bef\u5bfc\u6027\u4fe1\u606f\u6ce8\u5165\u548c\u504f\u89c1\u6ce8\u5165\u3002\u5bf9\u4e8e\u8bef\u5bfc\u6027\u4fe1\u606f\u6ce8\u5165\u7684\u98ce\u9669\uff0c\u6211\u4eec\u9996\u5148\u5c06\u5176\u7ec6\u5206\u4e3a\u5e38\u8bc6\u8bef\u5bfc\u6027\u4fe1\u606f\u6ce8\u5165\u548c\u957f\u5c3e\u8bef\u5bfc\u6027\u4fe1\u606f\u6ce8\u5165\u3002\u7136\u540e\uff0c\u6211\u4eec\u53d1\u73b0\u7f16\u8f91\u653b\u51fb\u80fd\u591f\u6709\u6548\u5730\u5411LLMs\u6ce8\u5165\u8fd9\u4e24\u79cd\u7c7b\u578b\u7684\u8bef\u5bfc\u6027\u4fe1\u606f\uff0c\u5c24\u5176\u662f\u5bf9\u5e38\u8bc6\u8bef\u5bfc\u6027\u4fe1\u606f\u6ce8\u5165\u7684\u6709\u6548\u6027\u7279\u522b\u9ad8\u3002 
For the risk of bias injection, we find that not only can biased sentences be injected into LLMs with high effectiveness, but a single biased sentence injection can also cause a marked increase in bias in the general outputs of LLMs, even outputs highly irrelevant to the injected sentence, indicating a catastrophic impact of editing attacks on the overall fairness of LLMs. We further show the high stealthiness of editing attacks, measured by their impact on the general knowledge and reasoning capacities of LLMs, and illustrate with empirical evidence how hard it is to defend against them. Our findings reveal the emerging misuse risks of knowledge editing techniques for compromising the safety alignment of LLMs.|\n", "2407.20207": "|**2024-07-29**|**QAEA-DR: A Unified Text Augmentation Framework for Dense Retrieval**|Hongming Tan 
et.al.|[2407.20207](http://arxiv.org/abs/2407.20207)|null|In dense retrieval, embedding long texts into dense vectors can cause information loss, leading to inaccurate query-text matching. In addition, low-quality texts with excessive noise or sparse key information are unlikely to align well with relevant queries. Current research mainly focuses on improving the sentence embedding model or the retrieval process. This work introduces a novel text augmentation framework for dense retrieval. The framework transforms raw documents into information-dense text formats that supplement the original texts, effectively addressing these issues without modifying the embedding or retrieval methods. Two text representations are generated via zero-shot prompting of a large language model (LLM): question-answer pairs and element-driven events. We term this approach QAEA-DR: unifying question-answer generation and event extraction in a text augmentation framework for dense retrieval. To further improve the quality of the generated text, a scoring-based evaluation and regeneration mechanism is introduced into the LLM prompting process. Our QAEA-DR model has a positive impact on dense retrieval, supported by both theoretical analysis and empirical experiments.|\n", "2407.20183": "|**2024-07-29**|**MindSearch: Mimicking Human Minds Elicits Deep AI Searcher**|Zehui Chen et.al.|[2407.20183](http://arxiv.org/abs/2407.20183)|**[link](https://github.com/internlm/mindsearch)**|**Information seeking and integration is a complex cognitive task that consumes enormous time and effort. Inspired by the remarkable recent progress of large language models (LLMs), recent works attempt to solve this task by combining LLMs with search engines. However, these methods still obtain unsatisfying performance due to three challenges: (1) complex requests often cannot be retrieved accurately and completely by the search engine; (2) the information to be integrated is spread over multiple web pages along with massive noise; and (3) a large number of web pages with long content may quickly exceed the maximum context length of LLMs. 
Inspired by the cognitive process humans use to solve these problems, we introduce MindSearch, which mimics the human mind in web information seeking and integration and can be instantiated by a simple yet effective LLM-based multi-agent framework. The WebPlanner models the human mind of multi-step information seeking as a dynamic graph construction process: it decomposes the user query into atomic sub-questions as nodes in the graph and progressively extends the graph based on the search results from WebSearcher. Tasked with each sub-question, WebSearcher performs hierarchical information retrieval with search engines and collects valuable information for WebPlanner. The multi-agent design of MindSearch enables the whole framework to seek and integrate information in parallel from more than 300 web pages in 3 minutes, which is worth 3 hours of human effort. MindSearch significantly improves response quality in terms of depth and breadth, on both closed-set and open-set QA problems. Besides, responses from MindSearch based on InternLM2.5-7B are preferred by humans over the ChatGPT-Web and Perplexity.ai applications, which implies that MindSearch can already deliver a competitive solution to proprietary AI search engines.**|\n", "2407.20174": "|**2024-07-29**|**Advancing 
Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruction Tuning**|Xingchen Zeng et.al.|[2407.20174](http://arxiv.org/abs/2407.20174)|**[link](https://github.com/zengxingchen/chartqa-mllm)**|**Emerging multimodal large language models (MLLMs) show great potential for chart question answering (CQA). Recent efforts mainly focus on scaling up training datasets (charts, data tables, and question-answer pairs) through data collection and synthesis. However, our empirical study of existing MLLMs and CQA datasets reveals notable gaps. First, current data collection and synthesis focus on data volume and neglect fine-grained visual encodings and QA tasks, resulting in an unbalanced data distribution that diverges from practical CQA scenarios. Second, existing work follows the training recipe of base MLLMs originally designed for natural images, under-exploring adaptation to unique chart characteristics such as rich text elements. 
To fill this gap, we propose a visualization-referenced instruction tuning approach to guide training dataset enhancement and model development. Specifically, we propose a novel data engine that effectively filters diverse, high-quality data from existing datasets and then refines and augments the data using LLM-based generation techniques so that it better aligns with practical QA tasks and visual encodings. Then, to facilitate adaptation to chart characteristics, we use the enriched data to train an MLLM by unfreezing the vision encoder and incorporating a mixture-of-resolution adaptation strategy for enhanced fine-grained recognition. Experimental results validate the effectiveness of our approach. Even with fewer training examples, our model consistently outperforms existing CQA models on established benchmarks. We also contribute a dataset split as a benchmark for future research. The source code and datasets of this paper are available at https://github.com/zengxingchen/ChartQA-MLLM.**|\n", "2407.20171": "|**2024-07-29**|**Diffusion Feedback Helps CLIP See Better**|Wenxuan Wang 
et.al.|[2407.20171](http://arxiv.org/abs/2407.20171)|**[link](https://github.com/baaivision/diva)**|Contrastive Language-Image Pre-training (CLIP), which excels at abstracting open-world representations across domains and modalities, has become a foundation for a variety of vision and multimodal tasks. However, recent studies reveal that CLIP has severe visual shortcomings, such as difficulty distinguishing orientation, quantity, color, and structure. These visual shortcomings also limit the perception capabilities of multimodal large language models (MLLMs) built on CLIP. The main reason is that the image-text pairs used to train CLIP are inherently biased, owing to the lack of distinctiveness in the text and of diversity in the images. 
To this end, we present a simple post-training approach for CLIP models that largely overcomes these visual shortcomings via a self-supervised diffusion process. We introduce DIVA, which uses a diffusion model as a visual assistant for CLIP. Specifically, DIVA leverages generative feedback from text-to-image diffusion models to optimize CLIP representations, using only images (without the corresponding text). We show that DIVA markedly improves CLIP's performance (e.g., by 3-7%) on the MMVP-VLM benchmark, which broadly assesses fine-grained visual abilities. In addition, our framework enhances the performance of MLLMs and vision models on multimodal understanding and segmentation tasks. Extensive evaluation on 29 image classification and retrieval benchmarks confirms that our framework preserves CLIP's strong zero-shot capabilities. The code will be released at https://github.com/baaivision/DIVA.|\n", "2407.20164": "|**2024-07-29**|**Language-Conditioned Offline RL for Multi-Robot Navigation**|Steven Morad 
et.al.|[2407.20164](http://arxiv.org/abs/2407.20164)|null|We present a method for developing navigation policies for multi-robot teams that understand and follow natural language instructions. We condition these policies on embeddings from pretrained large language models (LLMs) and train them via offline reinforcement learning with only 20 minutes of randomly collected data. In experiments with five real robots, these policies generalize well to unseen commands, indicating an understanding of the LLM latent space. Our method requires no simulators or environment models and produces low-latency control policies that can be deployed directly to real robots without further tuning. More information and videos of our experiments are available at https://sites.google.com/view/llm-marl.|\n", "2407.20157": "|**2024-07-29**|**rLLM: Relational Table Learning with LLMs**|Weichen Li 
et.al.|[2407.20157](http://arxiv.org/abs/2407.20157)|**[link](https://github.com/rllm-project/rllm)**|**We introduce rLLM (relationLLM), a PyTorch library designed for Relational Table Learning (RTL) with large language models (LLMs). The core idea is to decompose state-of-the-art graph neural networks, LLMs, and table neural networks into standardized modules, enabling the fast construction of novel RTL-type models in a simple \u201ccombine, align, and co-train\u201d manner. To illustrate the usage of rLLM, we introduce a simple RTL method named BRIDGE. In addition, we present three novel relational tabular datasets (TML1M, TLF2K, and TACM12K) by enhancing classical datasets. We hope rLLM can serve as a useful and easy-to-use development framework for RTL-related tasks. Our code is available at https://github.com/rllm-project/rllm.**|\n", "2407.20143": "|**2024-07-29**|**ByteCheckpoint: A Unified Checkpointing System for LLM Development**|Borui Wan 
et.al.|[2407.20143](http://arxiv.org/abs/2407.20143)|null|Developing real-world large language models (LLMs) requires checkpointing training states to persistent storage to guard against potential software and hardware failures and to support checkpoint transfer within the training pipeline and across tasks. Because of the immense size of LLMs, saving and loading checkpoints often incur intolerable minute-level stalls that significantly reduce training efficiency. Moreover, transferring checkpoints across tasks usually requires checkpoint resharding, i.e. loading checkpoints into a different parallel configuration according to the characteristics and resource quota of the specific task. Previous checkpointing systems assume consistent parallel configurations and fail to address the complexity of transforming checkpoints during resharding. Furthermore, on industry platforms, developers create checkpoints from different training frameworks, each with its own storage and I/O logic, which makes unified checkpoint management and optimization harder. To address these challenges, we introduce ByteCheckpoint, a PyTorch-native multi-framework LLM checkpointing system that supports automatic online checkpoint resharding. 
ByteCheckpoint employs a data/metadata disaggregated storage architecture that decouples checkpoint storage from the adopted parallelism strategies and training frameworks. We design an efficient asynchronous tensor merging technique to handle irregular tensor sharding and propose several I/O performance optimizations that significantly improve the efficiency of checkpoint saving and loading. Experimental results show that, compared with baseline methods, ByteCheckpoint substantially reduces the cost of checkpoint saving (by up to 529.22x) and loading (by up to 3.51x).|\n", "2407.20053": "|**2024-07-29**|**Orca: Ocean Significant Wave Height Estimation with Spatio-temporally Aware Large Language Models**|Zhe Li 
et.al.|[2407.20053](http://arxiv.org/abs/2407.20053)|null|Significant wave height (SWH) is a vital metric in marine science, and accurate SWH estimation is crucial for applications such as marine energy development, fisheries, and early warning systems for potential risks. Traditional SWH estimation methods based on numerical models and physical theories are hindered by computational inefficiency. Recently, machine learning has emerged as an appealing alternative for improving accuracy and reducing computation time. However, because observational technology is limited and costly, the scarcity of real-world data restricts the potential of machine learning models. To overcome these limitations, we propose an ocean SWH estimation framework named Orca. Specifically, Orca enhances the limited spatio-temporal reasoning abilities of classic LLMs with a novel spatio-temporally aware encoding module. By segmenting the limited buoy observational data temporally, encoding the buoys' geographic locations, and designing prompt templates, Orca exploits the strong generalization ability of LLMs to estimate significant wave height effectively from limited data. 
Experimental results on the Gulf of Mexico show that Orca achieves state-of-the-art performance in SWH estimation.|\n", "2407.21018": "|**2024-07-30**|**ThinK: Thinner Key Cache by Query-Driven Pruning**|Yuhui Xu et.al.|[2407.21018](http://arxiv.org/abs/2407.21018)|null|Large language models (LLMs) have revolutionized natural language processing, achieving unprecedented performance by leveraging increased model sizes and sequence lengths. However, the associated rise in computational and memory costs poses significant challenges, particularly in managing long sequences because of the quadratic complexity of the attention mechanism. This paper focuses on the long-context scenario and addresses the inefficiency of KV cache memory consumption during inference. Unlike existing approaches that optimize memory based on sequence length, we find that the channel dimension of the KV cache exhibits significant redundancy, characterized by an unbalanced weight distribution and low-rank structure. Based on these observations, we propose ThinK, a novel query-dependent KV cache pruning method designed to minimize attention weight loss while selectively pruning the least significant channels. 
Our approach not only maintains or improves model accuracy but also reduces memory costs by over 20% compared with conventional KV cache eviction methods. Extensive evaluations on the LLaMA3 and Mistral models across various long-sequence datasets confirm the effectiveness of ThinK, setting a new standard for efficient LLM deployment without compromising performance. We also outline the potential of extending our method to value cache pruning, demonstrating ThinK's broad applicability in reducing both memory and computational overheads.|\n", "2407.21011": "|**2024-07-30**|**CLEFT: Language-Image Contrastive Learning with Efficient Large Language Model and Prompt Fine-Tuning**|Yuexi Du 
et.al.|[2407.21011](http://arxiv.org/abs/2407.21011)|**[link](https://github.com/xypb/cleft)**|**Recent advances in Contrastive Language-Image Pre-training (CLIP) have achieved notable success in self-supervised representation learning across various tasks. However, existing CLIP-like approaches often demand extensive GPU resources and prolonged training times because of the considerable size of the model and dataset, and large-scale datasets are not always common in medical applications. Meanwhile, the language model prompts are mainly derived manually from labels tied to images, potentially overlooking the richness of information within training samples. We introduce a language-image contrastive learning method with an Efficient Large Language Model and prompt Fine-Tuning (CLEFT) that harnesses the strengths of extensively pre-trained language and visual models. Furthermore, we present an efficient strategy for learning context-based prompts that narrows the gap between clinical diagnostic data and simple class labels. Our method outperforms various baselines and achieves state-of-the-art performance on multiple chest X-ray and mammography datasets. 
The proposed parameter-efficient framework reduces the total trainable model size by 39% and reduces the trainable language model to only 4% compared with the current BERT encoder.**|\n", "2407.20999": "|**2024-07-30**|**MoFO: Momentum-Filtered Optimizer for Mitigating Forgetting in LLM Fine-Tuning**|Yupeng Chen et.al.|[2407.20999](http://arxiv.org/abs/2407.20999)|null|Recently, large language models (LLMs) have demonstrated remarkable abilities across a wide range of tasks. Typically, an LLM is pre-trained on large corpora and subsequently fine-tuned on task-specific datasets. However, during fine-tuning, LLMs may forget the knowledge acquired in the pre-training stage, leading to a decline in general capabilities. To address this issue, we propose a new fine-tuning algorithm termed Momentum-Filtered Optimizer (MoFO). The key idea of MoFO is to iteratively select and update the model parameters with the largest momentum magnitudes. Compared with full-parameter training, MoFO achieves similar fine-tuning performance while keeping parameters closer to the pre-trained model, thereby mitigating knowledge forgetting. Unlike most existing methods for forgetting mitigation, MoFO has two advantages. First, MoFO does not require access to pre-training data. 
This makes MoFO particularly suitable for fine-tuning scenarios where pre-training data is unavailable, such as fine-tuning from the checkpoints of open-source LLMs. Second, MoFO does not alter the original loss function. This avoids impairing model performance on the fine-tuning tasks. We validate MoFO through rigorous convergence analysis and extensive experiments, demonstrating its advantages in mitigating forgetting and improving fine-tuning performance.|\n", "2407.20990": "|**2024-07-30**|**From Feature Importance to Natural Language Explanations Using LLMs with RAG**|Sule Tekkesinoglu et.al.|[2407.20990](http://arxiv.org/abs/2407.20990)|**[link](https://github.com/suletekkesinoglu/xai_llm_rag)**|As machine learning becomes increasingly integral to autonomous decision-making processes involving human interaction, understanding model outputs grows ever more important. Recently, foundation models have been explored as post-hoc explainers, offering a way to elucidate the decision-making mechanisms of predictive models. This work introduces traceable question-answering, which uses an external knowledge repository to inform the responses of a large language model (LLM) to user queries within a scene understanding task. The knowledge repository comprises contextual details regarding the model's output, including high-level features, feature importance, and alternative probabilities. 
We employ subtractive counterfactual reasoning to compute feature importance, a method that analyzes the output variation resulting from decomposing semantic features. To keep the conversation fluent, we distill four key characteristics from social science research (social, causal, selective, and contrastive) and integrate them into a single-shot prompt that guides the response generation process. Our evaluation shows that the generated explanations contain these elements, indicating the approach's potential to bridge the gap between complex model outputs and natural language expressions.|\n", "2407.20970": "|**2024-07-30**|**Large Language Models (LLMs) for Semantic Communication in Edge-based IoT Networks**|Alakesh Kalita 
et.al.|[2407.20970](http://arxiv.org/abs/2407.20970)|null|With the emergence of Fifth Generation (5G) and Sixth Generation (6G) communication technologies and the Internet of Things (IoT), semantic communication is attracting researchers' attention as current communication technologies approach the Shannon limit. Meanwhile, large language models (LLMs), with billions of parameters and trained on extensive datasets, can understand and generate human-like text. Considering recent near-source computing technologies such as edge computing, this article gives an overview of a framework, along with its modules, in which LLMs can be used as part of semantic communication at the network edge of IoT networks to improve communication efficiency. Finally, we discuss a few applications and analyze the challenges and opportunities in developing such systems.|\n", "2407.20906": "|**2024-07-30**|**Automated Review Generation Method Based on Large Language Models**|Shican Wu 
et.al.|[2407.20906](http://arxiv.org/abs/2407.20906)|**[link](https://github.com/tju-ecat-ai/automaticreviewgeneration)**|**\u6587\u732e\u7814\u7a76\u5bf9\u4e8e\u79d1\u5b66\u8fdb\u6b65\u81f3\u5173\u91cd\u8981\uff0c\u4f46\u9762\u5bf9\u6d77\u91cf\u4fe1\u606f\u7684\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u81ea\u52a8\u5316\u7efc\u8ff0\u751f\u6210\u65b9\u6cd5\uff0c\u65e8\u5728\u7b80\u5316\u6587\u732e\u5904\u7406\u6d41\u7a0b\u5e76\u51cf\u8f7b\u8ba4\u77e5\u8d1f\u62c5\u3002\u4ee5\u4e19\u70f7\u8131\u6c22\uff08PDH\uff09\u50ac\u5316\u5242\u4e3a\u4f8b\uff0c\u8be5\u65b9\u6cd5\u4ece343\u7bc7\u6587\u7ae0\u4e2d\u8fc5\u901f\u751f\u6210\u4e86\u5168\u9762\u7684\u7efc\u8ff0\uff0c\u5e73\u5747\u6bcf\u7bc7\u6587\u7ae0\u6bcfLLM\u8d26\u6237\u8017\u65f6\u4ec5\u6570\u79d2\u3002\u5bf91041\u7bc7\u6587\u7ae0\u7684\u8fdb\u4e00\u6b65\u5206\u6790\u63ed\u793a\u4e86\u50ac\u5316\u5242\u7ec4\u6210\u3001\u7ed3\u6784\u548c\u6027\u80fd\u7684\u6df1\u5165\u89c1\u89e3\u3002 
\u8ba4\u8bc6\u5230LLM\u53ef\u80fd\u51fa\u73b0\u5e7b\u89c9\u7684\u95ee\u9898\uff0c\u6211\u4eec\u5b9e\u65bd\u4e86\u591a\u5c42\u6b21\u7684\u8d28\u91cf\u63a7\u5236\u7b56\u7565\uff0c\u786e\u4fdd\u4e86\u65b9\u6cd5\u7684\u53ef\u9760\u6027\u548c\u6709\u6548\u7f13\u89e3\u5e7b\u89c9\u7684\u80fd\u529b\u3002\u4e13\u5bb6\u9a8c\u8bc1\u8bc1\u5b9e\uff0c\u901a\u8fc7\u8fd9\u79cd\u65b9\u6cd5\u751f\u6210\u7684\u7efc\u8ff0\u4e0d\u4ec5\u51c6\u786e\u4e14\u5f15\u6587\u5b8c\u6574\uff0cLLM\u5e7b\u89c9\u7684\u98ce\u9669\u5df2\u964d\u81f3\u4f4e\u4e8e0.5%\uff0c\u7f6e\u4fe1\u5ea6\u8d85\u8fc795%\u3002\u53d1\u5e03\u7684Windows\u5e94\u7528\u7a0b\u5e8f\u652f\u6301\u4e00\u952e\u751f\u6210\u7efc\u8ff0\uff0c\u5e2e\u52a9\u7814\u7a76\u4eba\u5458\u8ddf\u8e2a\u6700\u65b0\u8fdb\u5c55\u5e76\u63a8\u8350\u76f8\u5173\u6587\u732e\u3002\u8fd9\u4e00\u65b9\u6cd5\u5c55\u793a\u4e86LLM\u5728\u63d0\u5347\u79d1\u5b66\u7814\u7a76\u751f\u4ea7\u529b\u65b9\u9762\u7684\u6f5c\u529b\uff0c\u5e76\u4e3a\u8fdb\u4e00\u6b65\u63a2\u7d22\u5960\u5b9a\u4e86\u57fa\u7840\u3002**|\n", "2407.20898": "|**2024-07-30**|**ThinkRepair: Self-Directed Automated Program Repair**|Xin Yin 
et.al.|[2407.20898](http://arxiv.org/abs/2407.20898)|**[link](https://github.com/vinci-grape/ThinkRepair)**|**\u5c3d\u7ba1\u5df2\u7ecf\u63d0\u51fa\u4e86\u8bb8\u591a\u81ea\u52a8\u7a0b\u5e8f\u4fee\u590d\uff08APR\uff09\u65b9\u6cd5\uff0c\u5e76\u4e14\u5728\u4fee\u590d\u4e00\u4e9b\u7279\u5b9a\u7c7b\u578b\u7684\u9519\u8bef\u65f6\u53d6\u5f97\u4e86\u663e\u8457\u7684\u6027\u80fd\uff0c\u4f46\u8fd9\u4e9b\u65b9\u6cd5\u5728\u5904\u7406\u9700\u8981\u5bf9\u9519\u8bef\u7a0b\u5e8f\u7684\u903b\u8f91\u8fdb\u884c\u5206\u6790\u548c\u63a8\u7406\u7684\u590d\u6742\u9519\u8bef\u65f6\u4ecd\u5b58\u5728\u5c40\u9650\u6027\u3002\u6700\u8fd1\uff0c\u901a\u8fc7\u63d0\u793a\u5de5\u7a0b\u8bad\u7ec3\u7684\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u56e0\u5176\u5728\u89e3\u51b3\u5305\u62ec\u9519\u8bef\u4fee\u590d\u5728\u5185\u7684\u591a\u79cd\u4efb\u52a1\u7684\u5f3a\u5927\u80fd\u529b\u800c\u5f15\u8d77\u4e86\u5e7f\u6cdb\u5173\u6ce8\u3002\u7136\u800c\uff0c\u63d0\u793a\u7684\u8d28\u91cf\u4f1a\u6781\u5927\u5730\u5f71\u54cdLLMs\u7684\u80fd\u529b\uff0c\u800c\u624b\u52a8\u6784\u5efa\u9ad8\u8d28\u91cf\u7684\u63d0\u793a\u662f\u4e00\u4e2a\u8017\u65f6\u7684\u8fc7\u7a0b\u3002 
\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e00\u9650\u5236\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u81ea\u6211\u5bfc\u5411\u7684LLM\u57fa\u4e8e\u81ea\u52a8\u7a0b\u5e8f\u4fee\u590d\u65b9\u6cd5ThinkRepair\uff0c\u5b83\u5206\u4e3a\u4e24\u4e2a\u4e3b\u8981\u9636\u6bb5\uff1a\u6536\u96c6\u9636\u6bb5\u548c\u4fee\u590d\u9636\u6bb5\u3002\u5728\u6536\u96c6\u9636\u6bb5\uff0c\u901a\u8fc7\u4f7f\u7528\u94fe\u5f0f\u601d\u8003\uff08CoT\uff09\u63d0\u793a\u6307\u5bfcLLMs\uff0c\u81ea\u52a8\u6536\u96c6\u6784\u6210\u9884\u4fee\u590d\u77e5\u8bc6\u7684\u5404\u79cd\u601d\u8003\u94fe\u3002\u5728\u4fee\u590d\u9636\u6bb5\uff0c\u76ee\u6807\u662f\u901a\u8fc7\u9996\u5148\u9009\u62e9\u7528\u4e8e\u5c11\u91cf\u5b66\u4e60\u7684\u793a\u4f8b\u5e76\u5176\u6b21\u4e0eLLMs\u81ea\u52a8\u4ea4\u4e92\u6765\u4fee\u590d\u9519\u8bef\uff0c\u6839\u636e\u6d4b\u8bd5\u4fe1\u606f\u63d0\u4f9b\u53cd\u9988\uff08\u5982\u679c\u9700\u8981\u7684\u8bdd\uff09\u3002 \u5728\u5bf9\u4e24\u4e2a\u5e7f\u6cdb\u7814\u7a76\u7684\u6570\u636e\u96c6\uff08Defects4J\u548cQuixBugs\uff09\u7684\u8bc4\u4f30\u4e2d\uff0c\u4e0e12\u4e2a\u6700\u5148\u8fdb\u7684APR\u65b9\u6cd5\u8fdb\u884c\u6bd4\u8f83\uff0c\u8868\u660eThinkRepair\u5728\u4fee\u590d\u9519\u8bef\u65b9\u9762\u7684\u4f18\u5148\u7ea7\u3002\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u5728Defects4J V1.2\u4e0a\uff0cThinkRepair\u6210\u529f\u4fee\u590d\u4e8698\u4e2a\u9519\u8bef\uff0c\u76f8\u8f83\u4e8e\u57fa\u7ebf\u63d0\u5347\u4e8627%-344.4%\u3002\u5728Defects4J V2.0\u4e0a\uff0cThinkRepair\u6bd4\u6700\u5148\u8fdb\u7684APR\u65b9\u6cd5\u591a\u4fee\u590d\u4e8612-65\u4e2a\u9519\u8bef\u3002\u6b64\u5916\uff0c\u5728Java\u548cPython\u4e0a\uff0cThinkRepair\u5728QuixBugs\u4e0a\u7684\u8868\u73b0\u4e5f\u6709\u4e86\u663e\u8457\u63d0\u5347\uff08\u6700\u591a\u5206\u522b\u8fbe\u523031\u548c21\uff09\u3002**|\n", "2407.20884": "|**2024-07-30**|**Effective Black Box Testing of Sentiment Analysis Classification Networks**|Parsa Karbasizadeh 
et.al.|[2407.20884](http://arxiv.org/abs/2407.20884)|null|\u57fa\u4e8e\u8f6c\u6362\u5668\u7684\u795e\u7ecf\u7f51\u7edc\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\u5982\u60c5\u611f\u5206\u6790\u4e2d\u5c55\u73b0\u4e86\u5353\u8d8a\u6027\u80fd\u3002\u7136\u800c\uff0c\u786e\u4fdd\u8fd9\u4e9b\u590d\u6742\u67b6\u6784\u901a\u8fc7\u5168\u9762\u6d4b\u8bd5\u4fdd\u6301\u53ef\u9760\u6027\u7684\u6311\u6218\u4f9d\u7136\u5b58\u5728\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u7ec4\u4e13\u95e8\u8bbe\u8ba1\u7528\u4e8e\u8bc4\u4f30\u4e3a\u57fa\u4e8e\u8f6c\u6362\u5668\u7684\u60c5\u611f\u5206\u6790\u7f51\u7edc\u6784\u5efa\u7684\u6d4b\u8bd5\u5957\u4ef6\u7684\u8986\u76d6\u6807\u51c6\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u91c7\u7528\u8f93\u5165\u7a7a\u95f4\u5212\u5206\u7684\u9ed1\u76d2\u7b56\u7565\uff0c\u8003\u8651\u4e86\u4e0e\u60c5\u611f\u76f8\u5173\u7684\u5173\u952e\u8bed\u8a00\u7279\u5f81\uff0c\u5305\u62ec\u52a8\u8bcd\u3001\u5f62\u5bb9\u8bcd\u3001\u526f\u8bcd\u548c\u540d\u8bcd\u3002\u4e3a\u4e86\u6709\u6548\u5730\u751f\u6210\u6db5\u76d6\u5e7f\u6cdb\u60c5\u611f\u5143\u7d20\u7684\u6d4b\u8bd5\u7528\u4f8b\uff0c\u6211\u4eec\u91c7\u7528\u4e86k\u6295\u5f71\u8986\u76d6\u5ea6\u91cf\u3002\u8be5\u5ea6\u91cf\u901a\u8fc7\u4e00\u6b21\u68c0\u67e5k\u4e2a\u7279\u5f81\u7684\u5b50\u96c6\u6765\u51cf\u5c11\u95ee\u9898\u7684\u590d\u6742\u6027\uff0c\u4ece\u800c\u964d\u4f4e\u7ef4\u5ea6\u3002\u901a\u8fc7\u5927\u578b\u8bed\u8a00\u6a21\u578b\u751f\u6210\u5c55\u793a\u7279\u5b9a\u60c5\u611f\u7279\u5f81\u7ec4\u5408\u7684\u53e5\u5b50\u3002\u4ece\u60c5\u611f\u5206\u6790\u6570\u636e\u96c6\u5b9e\u9a8c\u4e2d\u83b7\u5f97\u7684\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u6807\u51c6\u548c\u751f\u6210\u7684\u6d4b\u8bd5\u5e73\u5747\u63d0\u9ad8\u4e8616%\u7684\u6d4b\u8bd5\u8986\u76d6\u7387\u3002\u540c\u65f6\uff0c\u6a21\u578b\u51c6\u786e\u5ea6\u5e73\u5747\u4e0b\u964d\u4e866.5%\uff0c\u663e\u793a\u4e86\u8bc6\u522b\u8106\u5f31\u6027\u7684\u80fd\u529b\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u4e3a\u901a\u8fc7\u5168\u9762\u6d4b\
u8bd5\u8bc4\u4f30\u6539\u8fdb\u57fa\u4e8e\u8f6c\u6362\u5668\u7684\u60c5\u611f\u5206\u6790\u7cfb\u7edf\u63d0\u4f9b\u4e86\u57fa\u7840\u3002|\n", "2407.20859": "|**2024-07-30**|**Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification**|Boyang Zhang et.al.|[2407.20859](http://arxiv.org/abs/2407.20859)|null|\u8fd1\u671f\uff0c\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u81ea\u4e3b\u4ee3\u7406\u5728\u7406\u8bba\u7814\u7a76\u548c\u5b9e\u9645\u5e94\u7528\u4e0a\u5747\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u5c55\u3002\u8fd9\u4e9b\u4ee3\u7406\u80fd\u591f\u901a\u8fc7\u5916\u90e8\u7ec4\u4ef6\u6269\u5c55\u57fa\u7840LLM\u7684\u80fd\u529b\uff0c\u5728\u591a\u79cd\u65b9\u5f0f\u4e0b\u589e\u5f3a\u6027\u80fd\u3002\u4f8b\u5982\uff0c\u5229\u7528GPT-3.5-Turbo\u6838\u5fc3\u6784\u5efa\u7684\u4ee3\u7406\u53ef\u80fd\u5728\u67d0\u4e9b\u4efb\u52a1\u4e0a\u8d85\u8d8a\u66f4\u5148\u8fdb\u7684GPT-4\u6a21\u578b\u3002\u66f4\u91cd\u8981\u7684\u662f\uff0c\u5de5\u5177\u7684\u5e94\u7528\u4f7f\u7cfb\u7edf\u80fd\u591f\u4e0e\u73b0\u5b9e\u4e16\u754c\u4e92\u52a8\uff0c\u4f7f\u5176\u4ece\u4ec5\u4ec5\u751f\u6210\u6587\u672c\u8f6c\u53d8\u4e3a\u6267\u884c\u5b9e\u9645\u64cd\u4f5c\u3002\u9274\u4e8e\u4ee3\u7406\u7684\u5b9e\u9645\u5e94\u7528\u8303\u56f4\u4ee5\u53ca\u5176\u5bf9\u73af\u5883\u8fdb\u884c\u64cd\u4f5c\u7684\u80fd\u529b\uff0c\u8bc4\u4f30\u6f5c\u5728\u6f0f\u6d1e\u53d8\u5f97\u81f3\u5173\u91cd\u8981\u3002\u5982\u679c\u88ab\u9ed1\u5ba2\u5165\u4fb5\uff0c\u8fd9\u4e9b\u81ea\u4e3b\u7cfb\u7edf\u9020\u6210\u7684\u635f\u5bb3\u53ef\u80fd\u4f1a\u8d85\u8fc7\u5355\u4e00\u8bed\u8a00\u6a21\u578b\u3002\u5c3d\u7ba1\u5df2\u6709\u7814\u7a76\u63a2\u8ba8\u4e86LLM\u4ee3\u7406\u7684\u6709\u5bb3\u884c\u4e3a\uff0c\u4f46\u6211\u4eec\u7684\u7814\u7a76\u4ece\u4e0d\u540c\u89d2\u5ea6\u5ba1\u89c6\u8fd9\u4e00\u95ee\u9898\u3002\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u65b0\u578b\u653b\u51fb\u65b9\u6cd5\uff0c\u65e8\u5728\u8bef\u5bfc\u4ee3\u7406\u6267\u884c\u91cd\u590d\u6216\u65e0\u5173\u7684
\u64cd\u4f5c\uff0c\u4ece\u800c\u5f15\u53d1\u6545\u969c\u3002\u6211\u4eec\u4f7f\u7528\u5404\u79cd\u653b\u51fb\u624b\u6bb5\u3001\u573a\u666f\u548c\u5c5e\u6027\u8fdb\u884c\u5168\u9762\u8bc4\u4f30\uff0c\u4ee5\u786e\u5b9a\u5176\u6613\u611f\u6027\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u5728\u591a\u4e2a\u573a\u666f\u4e2d\uff0c\u8fd9\u4e9b\u653b\u51fb\u53ef\u5bfc\u81f4\u8d85\u8fc780%\u7684\u5931\u8d25\u7387\u3002\u901a\u8fc7\u5728\u591a\u4ee3\u7406\u73af\u5883\u4e2d\u9488\u5bf9\u5b9e\u73b0\u5e76\u90e8\u7f72\u7684\u4ee3\u7406\u8fdb\u884c\u653b\u51fb\uff0c\u6211\u4eec\u5f3a\u8c03\u4e86\u6b64\u7c7b\u6f0f\u6d1e\u6240\u4f34\u968f\u7684\u73b0\u5b9e\u98ce\u9669\u3002\u4e3a\u4e86\u51cf\u8f7b\u6b64\u7c7b\u653b\u51fb\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u81ea\u6211\u68c0\u67e5\u68c0\u6d4b\u65b9\u6cd5\u3002\u7136\u800c\uff0c\u6211\u4eec\u7684\u53d1\u73b0\u663e\u793a\uff0c\u4ec5\u4f7f\u7528LLM\u5f88\u96be\u6709\u6548\u68c0\u6d4b\u5230\u8fd9\u4e9b\u653b\u51fb\uff0c\u8fd9\u51f8\u663e\u4e86\u8fd9\u79cd\u6f0f\u6d1e\u6240\u5e26\u6765\u7684\u91cd\u5927\u98ce\u9669\u3002|\n", "2407.20856": "|**2024-07-30**|**Learn by Selling: Equipping Large Language Models with Product Knowledge for Context-Driven Recommendations**|Sarthak Anand et.al.|[2407.20856](http://arxiv.org/abs/2407.20856)|null|\u5feb\u901f\u53d1\u5c55\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b(Large Language Models, 
LLMs)\u4e3a\u57fa\u4e8e\u4e0a\u4e0b\u6587\u7684\u4ea7\u54c1\u63a8\u8350\u5e94\u7528\u63d0\u4f9b\u4e86\u65b0\u7684\u53ef\u80fd\u6027\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u6a21\u578b\u5728\u8fd9\u4e00\u9886\u57df\u7684\u6709\u6548\u6027\u9ad8\u5ea6\u4f9d\u8d56\u4e8e\u5b83\u4eec\u5bf9\u4ea7\u54c1\u5e93\u5b58\u7684\u5168\u9762\u7406\u89e3\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\u6765\u589e\u5f3aLLMs\u7684\u4ea7\u54c1\u77e5\u8bc6\u80fd\u529b\uff0c\u901a\u8fc7\u8bad\u7ec3\u5b83\u4eec\u54cd\u5e94\u5305\u542b\u4ea7\u54c1ID\u7684\u5408\u6210\u641c\u7d22\u67e5\u8be2\uff0c\u4ee5\u8fdb\u884c\u4e0a\u4e0b\u6587\u76f8\u5173\u56de\u590d\u3002\u6211\u4eec\u6df1\u5165\u5206\u6790\u4e86\u8fd9\u79cd\u65b9\u6cd5\uff0c\u8bc4\u4f30\u5176\u6548\u679c\uff0c\u6982\u8ff0\u5176\u4f18\u70b9\uff0c\u5e76\u6307\u51fa\u4e86\u9650\u5236\u56e0\u7d20\u3002\u6587\u7ae0\u8fd8\u8ba8\u8bba\u4e86\u6b64\u65b9\u6cd5\u7684\u6539\u8fdb\u6f5c\u529b\u548c\u672a\u6765\u65b9\u5411\uff0c\u63d0\u4f9b\u4e86\u5bf9LLMs\u5728\u4ea7\u54c1\u63a8\u8350\u4e2d\u89d2\u8272\u7684\u5168\u9762\u7406\u89e3\u3002|\n", "2407.21771": "|**2024-07-31**|**Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs**|Shi Liu 
et.al.|[2407.21771](http://arxiv.org/abs/2407.21771)|null|\u73b0\u6709\u5927\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08LVLM\uff09\u4e3b\u8981\u901a\u8fc7\u5c06\u89c6\u89c9\u7f16\u7801\u5668\u7684\u56fe\u50cf\u7279\u5f81\u4e0e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5bf9\u9f50\uff0c\u5229\u7528\u5176\u5f3a\u5927\u7684\u6587\u672c\u751f\u6210\u80fd\u529b\u3002\u7136\u800c\uff0c\u89c6\u89c9\u7f16\u7801\u5668\u4e0e\u8bed\u8a00\u6a21\u578b\u4e4b\u95f4\u7684\u89c4\u6a21\u5dee\u5f02\u53ef\u80fd\u5bfc\u81f4LLM\u5728\u591a\u6a21\u6001\u7406\u89e3\u4e2d\u5360\u636e\u4e3b\u5bfc\u5730\u4f4d\u3002\u8fd9\u79cdLVLM\u4e2d\u7684\u4e0d\u5e73\u8861\u53ef\u80fd\u5f15\u53d1\u5e7b\u89c9\u73b0\u8c61\u3002\u5177\u4f53\u6765\u8bf4\uff0cLVLM\u53ef\u80fd\u751f\u6210\u4e00\u81f4\u7684\u63cf\u8ff0\uff0c\u65e0\u8bba\u662f\u5426\u6709\u89c6\u89c9\u8f93\u5165\uff0c\u8fd9\u8868\u660e\u67d0\u4e9b\u8f93\u51fa\u4ec5\u53d7\u4e0a\u4e0b\u6587\u6587\u672c\u7684\u5f71\u54cd\u3002\u6211\u4eec\u5c06\u8fd9\u79cd\u73b0\u8c61\u79f0\u4e3a\u201c\u6587\u672c\u60ef\u6027\u201d\u3002\u4e3a\u4e86\u5bf9\u6297\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65e0\u9700\u8bad\u7ec3\u7684\u7b97\u6cd5\u6765\u5bfb\u627e\u56fe\u50cf\u7406\u89e3\u548c\u8bed\u8a00\u63a8\u65ad\u4e4b\u95f4\u7684\u5e73\u8861\u70b9\u3002\u5177\u4f53\u5730\uff0c\u6211\u4eec\u52a8\u6001\u8c03\u6574\u5e76\u653e\u5927\u5206\u914d\u7ed9\u56fe\u50cf\u4ee4\u724c\u7684\u6ce8\u610f\u529b\u6743\u91cd\uff0c\u4ece\u800c\u8d4b\u4e88\u89c6\u89c9\u5143\u7d20\u66f4\u5927\u7684\u91cd\u8981\u6027\u3002\u540c\u65f6\uff0c\u6211\u4eec\u4ece\u591a\u6a21\u6001\u8f93\u5165\u7684logits\u4e2d\u51cf\u53bb\u7eaf\u6587\u672c\u8f93\u5165\u7684logits\uff0c\u6709\u52a9\u4e8eLVLM\u907f\u514d\u8fc7\u5206\u4f9d\u8d56LLM\u3002\u901a\u8fc7\u589e\u5f3a\u56fe\u50cf\u4ee4\u724c\u5e76\u51cf\u5c11LLM\u7684\u987d\u56fa\u8f93\u51fa\uff0c\u6211\u4eec\u53ef\u4ee5\u8ba9LVLM\u66f4\u591a\u5730\u5173\u6ce8\u56fe\u50cf\uff0c\u4ece\u800c\u7f13\u89e3\u6587\u672c\u
60ef\u6027\u548c\u51cf\u5c11LVLM\u4e2d\u7684\u5e7b\u89c9\u3002\u6211\u4eec\u7684\u5e7f\u6cdb\u5b9e\u9a8c\u663e\u793a\uff0c\u5728\u4e0d\u540c\u6307\u6807\u4e0b\uff0c\u8fd9\u79cd\u65b9\u6cd5\u663e\u8457\u51cf\u5c11\u4e86\u5404\u79cdLVLM\u4e2d\u7684\u5e7b\u89c9\u8f93\u51fa\u9891\u7387\u3002\u9879\u76ee\u9875\u9762\u53ef\u8bbf\u95ee\uff1ahttps://lalbj.github.io/projects/PAI/\u3002|\n", "2407.21762": "|**2024-07-31**|**ReplanVLM: Replanning Robotic Tasks with Visual Language Models**|Aoran Mei et.al.|[2407.21762](http://arxiv.org/abs/2407.21762)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u673a\u5668\u4eba\u4efb\u52a1\u89c4\u5212\u9886\u57df\u83b7\u5f97\u4e86\u8d8a\u6765\u8d8a\u591a\u7684\u5173\u6ce8\uff0c\u8fd9\u4e3b\u8981\u5f97\u76ca\u4e8e\u5b83\u4eec\u5728\u6587\u672c\u5206\u6790\u4e0e\u751f\u6210\u3001\u4ee5\u53ca\u5bf9\u4e16\u754c\u5e7f\u6cdb\u77e5\u8bc6\u65b9\u9762\u7684\u51fa\u8272\u80fd\u529b\u3002\u7136\u800c\uff0c\u5b83\u4eec\u5728\u89e3\u6790\u89c6\u89c9\u7ebf\u7d22\u65b9\u9762\u7684\u80fd\u529b\u6709\u9650\uff0c\u65e0\u6cd5\u76f4\u63a5\u611f\u77e5\u4e16\u754c\u72b6\u6001\uff0c\u8fd9\u5bfc\u81f4\u4e86\u5728\u63cf\u8ff0\u5f53\u524d\u4e16\u754c\u72b6\u6001\u4e0a\u7684\u4e0d\u8db3\u3002\u76f8\u6bd4\u4e4b\u4e0b\uff0c\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08VLM\uff09\u901a\u8fc7\u96c6\u6210\u89c6\u89c9\u611f\u77e5\u6a21\u5757\uff0c\u586b\u8865\u4e86\u8fd9\u4e00\u7a7a\u767d\uff0c\u589e\u5f3a\u4e86\u673a\u5668\u4eba\u7684\u81ea\u4e3b\u6027\u3002\u5c3d\u7ba1\u5982\u6b64\uff0cVLM\u4ecd\u9762\u4e34\u6311\u6218\uff0c\u4f8b\u5982\uff0c\u5728\u63d0\u4f9b\u51c6\u786e\u6307\u4ee4\u7684\u60c5\u51b5\u4e0b\uff0c\u4efb\u52a1\u6267\u884c\u9519\u8bef\u7684\u98ce\u9669\u4f9d\u7136\u5b58\u5728\u3002 
\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u95ee\u9898\uff0c\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u7528\u4e8e\u673a\u5668\u4eba\u4efb\u52a1\u89c4\u5212\u7684ReplanVLM\u6846\u67b6\u3002\u8be5\u7814\u7a76\u91cd\u70b9\u5728\u4e8e\u9519\u8bef\u4fee\u6b63\u5e72\u9884\u63aa\u65bd\u3002\u63d0\u51fa\u4e86\u5185\u90e8\u9519\u8bef\u4fee\u6b63\u673a\u5236\u548c\u5916\u90e8\u9519\u8bef\u4fee\u6b63\u673a\u5236\uff0c\u5728\u76f8\u5e94\u7684\u9636\u6bb5\u8fdb\u884c\u9519\u8bef\u7ea0\u6b63\u3002\u53d1\u5c55\u4e86\u4e00\u79cd\u91cd\u89c4\u5212\u7b56\u7565\uff0c\u5f53\u4efb\u52a1\u6267\u884c\u5931\u8d25\u65f6\uff0c\u7528\u4e8e\u91cd\u65b0\u89c4\u5212\u4efb\u52a1\u6216\u4fee\u6b63\u9519\u8bef\u4ee3\u7801\u3002\u5728\u771f\u5b9e\u673a\u5668\u4eba\u548c\u4eff\u771f\u73af\u5883\u4e2d\u8fdb\u884c\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6240\u63d0\u51fa\u7684\u6846\u67b6\u5177\u6709\u66f4\u9ad8\u7684\u6210\u529f\u7387\u548c\u66f4\u5f3a\u7684\u5f00\u653e\u4e16\u754c\u4efb\u52a1\u4e2d\u7684\u9519\u8bef\u4fee\u6b63\u80fd\u529b\u3002\u6709\u5173\u5b9e\u9a8c\u7684\u89c6\u9891\u53ef\u4ee5\u5728https://youtu.be/NPk2pWKazJc\u627e\u5230\u3002|\n", "2407.21712": "|**2024-07-31**|**Adaptive Retrieval-Augmented Generation for Conversational Systems**|Xi Wang 
et.al.|[2407.21712](http://arxiv.org/abs/2407.21712)|null|\u5c3d\u7ba1\u5728\u5bf9\u8bdd\u7cfb\u7edf\u5f00\u53d1\u4e2d\u878d\u5165\u5927\u578b\u8bed\u8a00\u6a21\u578b\u53d6\u5f97\u4e86\u6210\u529f\uff0c\u4f46\u8bb8\u591a\u7814\u7a76\u663e\u793a\u4e86\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u5bf9\u4e8e\u63d0\u4f9b\u4fe1\u606f\u6027\u54cd\u5e94\u7684\u6709\u6548\u6027\u3002\u56e0\u6b64\uff0c\u73b0\u6709\u7814\u7a76\u901a\u5e38\u5047\u8bbe\u5bf9\u8bdd\u7cfb\u7edf\u4e2d\u7684\u6bcf\u6b21\u56de\u590d\u90fd\u9700\u8981\u68c0\u7d22\u589e\u5f3a\uff0c\u800c\u65e0\u9700\u660e\u786e\u63a7\u5236\u3002\u8fd9\u5f15\u53d1\u4e86\u4e00\u4e2a\u5173\u4e8e\u8fd9\u79cd\u5fc5\u8981\u6027\u7684\u7814\u7a76\u95ee\u9898\u3002\u672c\u7814\u7a76\u65e8\u5728\u63a2\u7d22\u7cfb\u7edf\u56de\u5e94\u662f\u5426\u9700\u8981\u4f7f\u7528\u5916\u90e8\u77e5\u8bc6\u8fdb\u884c\u589e\u5f3a\u7684\u5fc5\u8981\u6027\u3002\u901a\u8fc7\u5229\u7528\u4eba\u7c7b\u5bf9\u662f\u5426\u9700\u8981\u9002\u5e94\u6027\u589e\u5f3a\u7684\u4e8c\u5143\u9009\u62e9\u8fdb\u884c\u5224\u65ad\uff0c\u6211\u4eec\u5f00\u53d1\u4e86RAGate\u2014\u2014\u4e00\u4e2a\u95f8\u95e8\u6a21\u578b\uff0c\u8be5\u6a21\u578b\u901a\u8fc7\u5206\u6790\u5bf9\u8bdd\u4e0a\u4e0b\u6587\u548c\u76f8\u5173\u8f93\u5165\u6765\u9884\u6d4b\u5bf9\u8bdd\u7cfb\u7edf\u662f\u5426\u9700\u8981RAG\u4ee5\u83b7\u5f97\u6539\u8fdb\u7684\u56de\u590d\u3002\u6211\u4eec\u5728\u6784\u5efa\u548c\u5e94\u7528RAGate\u5230\u5bf9\u8bdd\u6a21\u578b\u4ee5\u53ca\u5bf9\u4e0d\u540c\u5bf9\u8bdd\u573a\u666f\u8fdb\u884c\u8be6\u5c3d\u5206\u6790\u65b9\u9762\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u7ed3\u679c\u548c\u5206\u6790\u8868\u660e\uff0cRAGate\u5728\u8bc6\u522b\u9700\u8981RAG\u4ee5\u751f\u6210\u9ad8\u8d28\u91cf\u56de\u590d\u5e76\u5177\u6709\u9ad8\u751f\u6210\u7f6e\u4fe1\u5ea6\u7684\u7cfb\u7edf\u54cd\u5e94\u65b9\u9762\u6709\u6709\u6548\u5e94\u7528\u3002\u8fd9\u9879\u7814\u7a76\u8fd8\u53d1\u73b0\u4e86\u751f\u6210\u7f6e\u4fe1\u5ea6\u6c34\u5e73\
u4e0e\u589e\u5f3a\u77e5\u8bc6\u7684\u76f8\u5173\u6027\u3002|\n", "2407.21708": "|**2024-07-31**|**CEAR: Automatic construction of a knowledge graph of chemical entities and roles from scientific literature**|Stefan Langer et.al.|[2407.21708](http://arxiv.org/abs/2407.21708)|null|\u672c\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u79cd\u65b9\u6cd5\uff0c\u65e8\u5728\u901a\u8fc7\u5229\u7528\u5df2\u6807\u6ce8\u6587\u672c\u8bed\u6599\u5e93\u548c\u4eceChebi\u83b7\u53d6\u7684\u77e5\u8bc6\uff0c\u589e\u5f3a\u73b0\u6709\u77e5\u8bc6\uff0c\u5e76\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u8fdb\u884c\u5fae\u8c03\uff0c\u4ee5\u8bc6\u522b\u79d1\u5b66\u6587\u732e\u4e2d\u7684\u5316\u5b66\u5b9e\u4f53\u53ca\u5176\u4f5c\u7528\u3002\u5b9e\u9a8c\u7ed3\u679c\u8bc1\u660e\u4e86\u8fd9\u79cd\u65b9\u6cd5\u7684\u6709\u6548\u6027\u3002\u901a\u8fc7\u7ed3\u5408\u672c\u4f53\u8bba\u77e5\u8bc6\u4e0eLLM\u7684\u8bed\u8a00\u7406\u89e3\u80fd\u529b\uff0c\u6211\u4eec\u5b9e\u73b0\u4e86\u5728\u79d1\u5b66\u6587\u732e\u4e2d\u8bc6\u522b\u5316\u5b66\u5b9e\u4f53\u53ca\u5176\u4f5c\u7528\u7684\u9ad8\u7cbe\u786e\u5ea6\u548c\u53ec\u56de\u7387\u3002\u8fdb\u4e00\u6b65\u5730\uff0c\u6211\u4eec\u4ece8000\u7bc7ChemRxiv\u6587\u7ae0\u4e2d\u63d0\u53d6\u8fd9\u4e9b\u5b9e\u4f53\u548c\u89d2\u8272\uff0c\u7136\u540e\u4f7f\u7528\u7b2c\u4e8c\u4e2aLLM\u6784\u5efa\u4e86\u4e00\u4e2a\u5316\u5b66\u5b9e\u4f53\u548c\u89d2\u8272\u7684\u77e5\u8bc6\u56fe\u8c31\uff08CEAR\uff09\uff0c\u8be5\u56fe\u8c31\u4e0d\u4ec5\u4e3aChEBI\u63d0\u4f9b\u4e86\u8865\u5145\u4fe1\u606f\uff0c\u8fd8\u80fd\u5e2e\u52a9\u6269\u5c55\u5176\u5185\u5bb9\u3002|\n", "2407.21693": "|**2024-07-31**|**TransferTOD: A Generalizable Chinese Multi-Domain Task-Oriented Dialogue System with Transfer Capabilities**|Ming Zhang 
et.al.|[2407.21693](http://arxiv.org/abs/2407.21693)|**[link](https://github.com/konglonggefdu/transfertod)**|\u4efb\u52a1\u5bfc\u5411\u5bf9\u8bdd\uff08TOD\uff09\u7cfb\u7edf\u65e8\u5728\u6709\u6548\u5904\u7406\u4efb\u52a1\u5bfc\u5411\u7684\u5bf9\u8bdd\uff0c\u5305\u62ec\u4fe1\u606f\u6536\u96c6\u3002\u5982\u4f55\u51c6\u786e\u3001\u9ad8\u6548\u4e14\u6709\u6548\u5730\u5229\u7528TOD\u8fdb\u884c\u4fe1\u606f\u6536\u96c6\u4e00\u76f4\u4ee5\u6765\u90fd\u662f\u4e00\u4e2a\u5173\u952e\u4e14\u5177\u6709\u6311\u6218\u6027\u7684\u4efb\u52a1\u3002\u8fd1\u671f\u7684\u7814\u7a76\u8868\u660e\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5bf9\u8bdd\u3001\u6307\u4ee4\u751f\u6210\u548c\u63a8\u7406\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u5e76\u80fd\u591f\u901a\u8fc7\u5fae\u8c03\u663e\u8457\u63d0\u9ad8TOD\u6027\u80fd\u3002\u7136\u800c\uff0c\u5f53\u524d\u7684\u6570\u636e\u96c6\u4e3b\u8981\u9488\u5bf9\u7528\u6237\u9a71\u52a8\u7684\u7cfb\u7edf\uff0c\u5e76\u5c40\u9650\u4e8e\u9884\u5b9a\u4e49\u7684\u7279\u5b9a\u573a\u666f\u548c\u69fd\u4f4d\uff0c\u56e0\u6b64\u9700\u8981\u5728TOD\u7684\u4e3b\u52a8\u6027\u3001\u591a\u6837\u6027\u548c\u80fd\u529b\u65b9\u9762\u8fdb\u884c\u6539\u8fdb\u3002\u672c\u7814\u7a76\u4ecb\u7ecd\u4e86\u4e00\u4e2a\u591a\u9886\u57df\u4efb\u52a1\u5bfc\u5411\u5bf9\u8bdd\u6570\u636e\u6784\u5efa\u8fc7\u7a0b\u4ee5\u53ca\u57fa\u4e8e\u6b64\u8fc7\u7a0b\u751f\u6210\u7684\u4e2d\u6587\u5bf9\u8bdd\u6570\u636e\u96c6\u2014\u2014\\textbf{TransferTOD}\uff0c\u8be5\u6570\u636e\u96c6\u771f\u5b9e\u6a21\u62df\u4e86\u572830\u4e2a\u6d41\u884c\u751f\u6d3b\u670d\u52a1\u573a\u666f\u4e2d\u7684\u4eba\u673a\u5bf9\u8bdd\u3002\u5229\u7528\u8fd9\u4e2a\u6570\u636e\u96c6\uff0c\u6211\u4eec\u8bad\u7ec3\u4e86\u4e00\u4e2a\u4f7f\u7528\u5168\u53c2\u6570\u5fae\u8c03\u7684\\textbf{TransferTOD-7B}\u6a21\u578b\uff0c\u5c55\u793a\u4e86\u5728\u5404\u79cd\u4e0b\u6e38\u573a\u666f\u4e2d\u7684\u663e\u8457\u7684\u586b\u69fd\u80fd\u529b\u548c\u63d0\u95ee\u80fd\u529b\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u8bc1\u6
60e\u4e86\u5176\u5728\u4e0d\u540c\u6570\u636e\u5e94\u7528\u573a\u666f\u4e0b\u7684\u5f3a\u5927\u6cdb\u5316\u80fd\u529b\uff0c\u663e\u8457\u63d0\u9ad8\u4e86\u6570\u636e\u4f7f\u7528\u6548\u7387\u548c\u7cfb\u7edf\u6027\u80fd\u3002\u6570\u636e\u5df2\u53d1\u5e03\u4e8ehttps://github.com/KongLongGeFDU/TransferTOD\u3002|\n", "2407.21669": "|**2024-07-31**|**Synth-Empathy: Towards High-Quality Synthetic Empathy Data**|Hao Liang et.al.|[2407.21669](http://arxiv.org/abs/2407.21669)|**[link](https://github.com/aurora-slz/synth-empathy)**|\u8fd1\u5e74\u6765\uff0c\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u8fc5\u901f\u53d1\u5c55\uff0c\u5b9e\u73b0\u51fa\u8272\u540c\u7406\u5fc3\u54cd\u5e94\u80fd\u529b\u5df2\u6210\u4e3a\u4e00\u4e2a\u81f3\u5173\u91cd\u8981\u7684\u524d\u63d0\u3002\u56e0\u6b64\uff0c\u7ba1\u7406\u548c\u7406\u89e3\u540c\u7406\u5fc3\u6570\u636e\u96c6\u7684\u91cd\u8981\u6027\u65e5\u76ca\u51f8\u663e\u3002\u7136\u800c\uff0c\u540c\u7406\u5fc3\u6570\u636e\u901a\u5e38\u7531\u4eba\u7c7b\u6807\u6ce8\uff0c\u5bfc\u81f4\u6570\u636e\u91cf\u4e0d\u8db3\u548c\u5927\u91cf\u7684\u4eba\u529b\u6d6a\u8d39\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aSynth-Empathy\u7684LLM\u57fa\u4e8e\u7684\u6570\u636e\u751f\u6210\u4e0e\u8d28\u91cf\u3001\u591a\u6837\u6027\u9009\u62e9\u7ba1\u9053\uff0c\u8be5\u7ba1\u9053\u80fd\u591f\u81ea\u52a8\u751f\u6210\u9ad8\u8d28\u91cf\u7684\u540c\u7406\u5fc3\u6570\u636e\u5e76\u7b5b\u9009\u6389\u4f4e\u8d28\u91cf\u6570\u636e\u3002\u901a\u8fc7\u5229\u7528\u4f4e\u540c\u7406\u5fc3\u6a21\u578b\u751f\u6210\u7684\u6570\u636e\uff0c\u6211\u4eec\u8fdb\u4e00\u6b65\u63d0\u9ad8\u4e86\u540c\u7406\u5fc3\u54cd\u5e94\u6027\u80fd\uff0c\u5e76\u5728\u591a\u4e2a\u57fa\u51c6\u4e0a\u8fbe\u5230\u4e86\u6700\u5148\u8fdb\u7684\uff08SoTA\uff09\u7ed3\u679c\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u6a21\u578b\u5728\u5404\u79cd\u4eba\u7c7b\u8bc4\u4f30\u57fa\u51c6\u4e0a\u5747\u8868\u73b0\u51fa\u8272\uff0c\u8bc1\u660e\u4e86\u5176\u5728\u5b9e\u9645\
u5e94\u7528\u4e2d\u7684\u6709\u6548\u6027\u548c\u9c81\u68d2\u6027\u3002\u8fdb\u4e00\u6b65\u5730\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u6570\u636e\u91cf\u4e0e\u8d28\u91cf\u4e4b\u95f4\u7684\u6743\u8861\uff0c\u63d0\u4f9b\u4e86\u540c\u7406\u5fc3\u6570\u636e\u751f\u6210\u4e0e\u9009\u62e9\u65b9\u9762\u7684\u89c1\u89e3\u3002|\n", "2407.21593": "|**2024-07-31**|**LLM-for-X: Application-agnostic Integration of Large Language Models to Support Personal Writing Workflows**|Lukas Teufelberger et.al.|[2407.21593](http://arxiv.org/abs/2407.21593)|null|\u4e3a\u4e86\u63d0\u9ad8\u751f\u4ea7\u529b\u5e76\u4f18\u5316\u5de5\u4f5c\u6d41\u7a0b\uff0c\u5c06\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u529f\u80fd\u5d4c\u5165\u5e94\u7528\u7a0b\u5e8f\u7684\u8d8b\u52bf\u6b63\u5728\u589e\u957f\uff0c\u4ece\u57fa\u4e8e\u6d4f\u89c8\u5668\u7684\u7f51\u7edc\u5e94\u7528\u5230\u5728\u4e2a\u4eba\u8ba1\u7b97\u673a\u4e0a\u8fd0\u884c\u7684\u539f\u751f\u5e94\u7528\u3002\u6211\u4eec\u4ecb\u7ecd\u4e86\u4e00\u79cd\u7cfb\u7edf\u7ea7\u5feb\u6377\u65b9\u5f0f\u5c42\u2014\u2014LLM-for-X\uff0c\u5b83\u901a\u8fc7\u8f7b\u91cf\u7ea7\u5f39\u51fa\u5f0f\u5bf9\u8bdd\u6846\u65e0\u7f1d\u5730\u5411\u4efb\u4f55\u5e94\u7528\u7a0b\u5e8f\u6dfb\u52a0LLM\u670d\u52a1\u3002\u6211\u4eec\u7684\u539f\u751f\u5c42\u901a\u8fc7\u7edf\u4e00\u7684\u804a\u5929\u524d\u7aef\u4f5c\u4e3a\u7f16\u7a0b\u63a5\u53e3\u6216\u81ea\u5b9a\u4e49API\u8c03\u7528\uff0c\u5c06\u524d\u7aef\u5e94\u7528\u7a0b\u5e8f\u4e0e\u6d41\u884c\u7684LLM\u540e\u7aef\uff08\u5982ChatGPT\u548cGemini\uff09\u65e0\u7f1d\u8fde\u63a5\u3002\u6211\u4eec\u5c55\u793a\u4e86LLM-for-X\u5728Microsoft Office\u3001VSCode\u3001Adobe 
Acrobat\u4ee5\u53caOverleaf\u7b49\u6d41\u884c\u7f51\u7edc\u5e94\u7528\u4e2d\u7684\u4f18\u52bf\u3002\u5728\u8bc4\u4f30\u4e2d\uff0c\u6211\u4eec\u5c06LLM-for-X\u4e0eChatGPT\u7684\u7f51\u9875\u754c\u9762\u8fdb\u884c\u4e86\u4efb\u52a1\u6bd4\u8f83\uff0c\u8bc1\u660e\u4e86\u6211\u4eec\u7684\u65b9\u6cd5\u80fd\u591f\u63d0\u4f9b\u5feb\u901f\u3001\u9ad8\u6548\u4e14\u6613\u4e8e\u4f7f\u7528\u7684LLM\u8f85\u52a9\uff0c\u65e0\u9700\u5207\u6362\u4e0a\u4e0b\u6587\u652f\u6301\u5199\u4f5c\u548c\u9605\u8bfb\u4efb\u52a1\uff0c\u540c\u65f6\u5bf9\u7279\u5b9a\u5e94\u7528\u65e0\u7279\u5b9a\u4f9d\u8d56\u3002|\n", "2407.21579": "|**2024-07-31**|**A Performance Study of LLM-Generated Code on Leetcode**|Tristan Coignion et.al.|[2407.21579](http://arxiv.org/abs/2407.21579)|null|\u672c\u6587\u7814\u7a76\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u4ee3\u7801\u751f\u6210\u65b9\u9762\u7684\u6548\u7387\uff0c\u5e76\u4f7f\u7528\u6765\u81eaLeetCode\u7684\u6570\u636e\u96c6\u8bc4\u4f30\u4e86\u5b83\u4eec\u4e0e\u4eba\u7c7b\u7f16\u5199\u7684\u89e3\u51b3\u65b9\u6848\u7684\u6027\u80fd\u3002\u6211\u4eec\u5bf9\u6bd4\u4e8618\u4e2aLLM\uff0c\u8003\u8651\u4e86\u6a21\u578b\u6e29\u5ea6\u548c\u6210\u529f\u7387\u7b49\u56e0\u7d20\u5bf9\u4ee3\u7801\u6027\u80fd\u7684\u5f71\u54cd\u3002\u7814\u7a76\u5f15\u5165\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\u6765\u5ea6\u91cf\u548c\u6bd4\u8f83LLM\u751f\u6210\u4ee3\u7801\u7684\u901f\u5ea6\uff0c\u7ed3\u679c\u8868\u660e\uff0c\u91c7\u7528\u4e0d\u540cLLM\u65f6\uff0c\u751f\u6210\u7684\u4ee3\u7801\u6027\u80fd\u76f8\u5f53\u3002\u6211\u4eec\u8fd8\u53d1\u73b0\uff0cLLM\u751f\u6210\u7684\u4ee3\u7801\u5e73\u5747\u800c\u8a00\u6bd4\u4eba\u7c7b\u7f16\u5199\u7684\u4ee3\u7801\u66f4\u9ad8\u6548\u3002\u8bba\u6587\u8fdb\u4e00\u6b65\u8ba8\u8bba\u4e86\u4f7f\u7528LeetCode\u4f5c\u4e3a\u57fa\u51c6\u6570\u636e\u96c6\u3001\u6f5c\u5728\u6570\u636e\u6c61\u67d3\u5e26\u6765\u7684\u9650\u5236\u4ee5\u53ca\u5e73\u53f0\u6d4b\u91cf\u53ef\u9760\u6027\u7684\u95ee\u9898\u3002\u6211\u4eec\u8ba4\u4e3
a\uff0c\u6211\u4eec\u7684\u53d1\u73b0\u6709\u52a9\u4e8e\u66f4\u597d\u5730\u7406\u89e3LLM\u5728\u4ee3\u7801\u751f\u6210\u9886\u57df\u7684\u80fd\u529b\uff0c\u5e76\u4e3a\u8be5\u9886\u57df\u672a\u6765\u7684\u4f18\u5316\u5960\u5b9a\u4e86\u57fa\u7840\u3002|\n", "2407.21571": "|**2024-07-31**|**PMoE: Progressive Mixture of Experts with Asymmetric Transformer for Continual Learning**|Min Jae Jung et.al.|[2407.21571](http://arxiv.org/abs/2407.21571)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u6301\u7eed\u5b66\u4e60\u8fc7\u7a0b\u4e2d\u9047\u5230\u91cd\u5927\u6311\u6218\uff0c\u4e3b\u8981\u5728\u4e8e\u707e\u96be\u6027\u9057\u5fd8\u73b0\u8c61\uff0c\u5373\u65b0\u4fe1\u606f\u4f1a\u8986\u76d6\u4e4b\u524d\u83b7\u5f97\u7684\u77e5\u8bc6\u3002\u8fd9\u4e00\u5c40\u9650\u6027\u5bfc\u81f4\u4e86\u5927\u91cf\u73af\u5883\u548c\u7ecf\u6d4e\u8d44\u6e90\u7684\u6d6a\u8d39\u3002\u672c\u7814\u7a76\u5f15\u5165\u4e86\u4e00\u79cd\u540d\u4e3aPMoE\uff08Progressive Mixture of Experts with Asymmetric Transformer\uff09\u7684\u89e3\u51b3\u65b9\u6848\uff0c\u65e8\u5728\u901a\u8fc7\u91c7\u7528\u5177\u6709\u6d45\u5c42\u7528\u4e8e\u4e00\u822c\u77e5\u8bc6\u548c\u6df1\u5c42\u7528\u4e8e\u65b0\u77e5\u8bc6\u7684\u4e0d\u5bf9\u79f0\u8bbe\u8ba1\u6765\u6700\u5c0f\u5316\u9057\u5fd8\u3002PMoE\u5728\u6df1\u5c42\u5f15\u5165\u4e86\u9010\u6b65\u589e\u52a0\u7684\u4e13\u5bb6\uff0c\u5e76\u914d\u5907\u4e86\u4e00\u4e2a\u8def\u7531\u5668\uff0c\u8be5\u8def\u7531\u5668\u80fd\u591f\u9ad8\u6548\u5730\u5c06\u65b0\u77e5\u8bc6\u5206\u914d\u7ed9\u5408\u9002\u7684\u4e13\u5bb6\u3002 
\u8def\u7531\u5668\u4f4d\u4e8e\u6df1\u5c42\u9644\u8fd1\uff0c\u5229\u7528\u6df1\u5ea6\u7279\u5f81\u805a\u5408\u5df2\u6574\u5408\u7684\u4fe1\u606f\u3002\u8fd9\u4f7f\u5f97\u8def\u7531\u5668\u80fd\u591f\u6709\u6548\u5730\u6267\u884c\u4efb\u52a1\uff0c\u5c06\u65b0\u77e5\u8bc6\u5206\u914d\u7ed9\u9010\u6b65\u589e\u52a0\u7684\u6df1\u5c42\u4e13\u5bb6\u3002\u901a\u8fc7\u5728TRACE\u6570\u636e\u96c6\u548c\u901a\u7528\u8bed\u8a00\u7406\u89e3\u6570\u636e\u96c6\u4e0a\u7684\u5e7f\u6cdb\u5b9e\u9a8c\uff0c\u8bc1\u660e\u4e86\u6240\u63d0\u51fa\u7684PMoE\u65b9\u6cd5\u4f18\u4e8e\u5148\u524d\u7684\u6700\u5148\u8fdb\u7684\u65b9\u6cd5\u3002|\n", "2407.21553": "|**2024-07-31**|**CXSimulator: A User Behavior Simulation using LLM Embeddings for Web-Marketing Campaign Assessment**|Akira Kasuga et.al.|[2407.21553](http://arxiv.org/abs/2407.21553)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u5ba2\u6237\u4f53\u9a8c\uff08CX\uff09\u6a21\u62df\u5668\u7684\u65b0\u578b\u6846\u67b6\uff0c\u65e8\u5728\u901a\u8fc7\u7528\u6237\u884c\u4e3a\u6a21\u62df\u6765\u8bc4\u4f30\u672a\u6d4b\u8bd5\u7684\u7f51\u7edc\u8425\u9500\u6d3b\u52a8\u7684\u5f71\u54cd\u3002\u8be5\u63d0\u51fa\u7684\u6846\u67b6\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5c06\u7528\u6237\u884c\u4e3a\u5386\u53f2\u4e2d\u7684\u5404\u79cd\u4e8b\u4ef6\uff0c\u5982\u67e5\u770b\u5546\u54c1\u3001\u4f7f\u7528\u4f18\u60e0\u5238\u6216\u8d2d\u4e70\u5546\u54c1\u7b49\uff0c\u8868\u793a\u4e3a\u8bed\u4e49\u5d4c\u5165\u5411\u91cf\u3002\u6211\u4eec\u8bad\u7ec3\u4e86\u4e00\u4e2a\u6a21\u578b\uff0c\u7528\u4e8e\u4ece\u5176LLM\u5d4c\u5165\u4e2d\u9884\u6d4b\u4e8b\u4ef6\u4e4b\u95f4\u7684\u8fc7\u6e21\uff0c\u751a\u81f3\u53ef\u4ee5\u4ece\u591a\u6837\u5316\u7684\u8bad\u7ec3\u6570\u636e\u4e2d\u5b66\u4e60\uff0c\u4ece\u800c\u5bf9\u672a\u77e5\u4e8b\u4ef6\u8fdb\u884c\u6cdb\u5316\u3002\u5728web\u8425\u9500\u5e94\u7528\u4e2d\uff0c\u6211\u4eec\u5229\u7528\u8fd9\u4e2a\u8fc7\u6e21\u9884\u6d4b\u6a21\u578b\u6765\u6a21\u62df\u5f53\u65b0\u7684\u8425\u9500\u6d3
b\u52a8\u6216\u4ea7\u54c1\u5c55\u793a\u7ed9\u7528\u6237\u65f6\uff0c\u7528\u6237\u53ef\u80fd\u5982\u4f55\u53cd\u5e94\u4e0d\u540c\u3002\u8fd9\u4f7f\u5f97\u6211\u4eec\u80fd\u591f\u6d88\u9664\u5728\u7ebf\u6d4b\u8bd5\u7684\u9ad8\u6602\u6210\u672c\uff0c\u5e76\u589e\u5f3a\u8425\u9500\u4eba\u5458\u63ed\u793a\u6d1e\u5bdf\u529b\u7684\u80fd\u529b\u3002\u6211\u4eec\u7684\u6570\u503c\u8bc4\u4f30\u548c\u4f7f\u7528Google\u5546\u54c1\u5546\u5e97\u7684\u5927\u89c4\u6a21\u516c\u5171\u6570\u636e\u96c6\u8fdb\u884c\u7684\u7528\u6237\u7814\u7a76\u8bc1\u660e\u4e86\u6211\u4eec\u6846\u67b6\u7684\u6709\u6548\u6027\u3002|\n", "2408.00764": "|**2024-08-01**|**AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation**|Mengkang Hu et.al.|[2408.00764](http://arxiv.org/abs/2408.00764)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u57fa\u4e8e\u7684\u4ee3\u7406\u5df2\u5f15\u8d77\u5e7f\u6cdb\u5173\u6ce8\u5e76\u53d8\u5f97\u8d8a\u6765\u8d8a\u6d41\u884c\u3002\u6b64\u5916\uff0c\u89c4\u5212\u80fd\u529b\u662fLLM\u57fa\u4e8e\u4ee3\u7406\u7684\u5173\u952e\u7ec4\u6210\u90e8\u5206\uff0c\u6d89\u53ca\u4e0e\u73af\u5883\u7684\u4ea4\u4e92\u548c\u6267\u884c\u52a8\u4f5c\u4ee5\u5b8c\u6210\u89c4\u5212\u4efb\u52a1\uff0c\u901a\u5e38\u5305\u62ec\u4ece\u521d\u59cb\u72b6\u6001\u8fbe\u5230\u9884\u671f\u76ee\u6807\u7684\u8fc7\u7a0b\u3002\u672c\u6587\u7814\u7a76\u4e86\u901a\u8fc7\u6307\u4ee4\u8c03\u6574\u589e\u5f3aLLM\u89c4\u5212\u80fd\u529b\u7684\u65b9\u6cd5\uff0c\u79f0\u4e3a\u4ee3\u7406\u8bad\u7ec3\u3002\u8fd1\u671f\u7684\u7814\u7a76\u8868\u660e\uff0c\u5229\u7528\u4e13\u5bb6\u7ea7\u8f68\u8ff9\u5bf9\u6307\u4ee4\u8c03\u6574LLM\u80fd\u6709\u6548\u63d0\u5347\u5176\u89c4\u5212\u80fd\u529b\u3002\u7136\u800c\uff0c\u73b0\u6709\u5de5\u4f5c\u4e3b\u8981\u96c6\u4e2d\u5728\u4ece\u624b\u52a8\u8bbe\u8ba1\u7684\u4efb\u52a1\u548c\u73af\u5883\u4e2d\u5408\u6210\u8f68\u8ff9\u3002\u521b\u5efa\u8fd9\u4e9b\u73af\u5883\u548c\u4efb\u52a1\u7684\u52b3\u52a8\u5bc6\u96c6\u578b\u8fc7\u7a0b
\u9650\u5236\u4e86\u751f\u6210\u8db3\u591f\u591a\u6837\u6027\u548c\u5e7f\u6cdb\u6027\u7684\u8f68\u8ff9\u7684\u80fd\u529b\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u5c40\u9650\u6027\uff0c\u672c\u6587\u63a2\u7d22\u4e86\u81ea\u52a8\u5408\u6210\u591a\u6837\u5316\u73af\u5883\u4ee5\u53ca\u89c4\u5212\u4efb\u52a1\u7684\u6e10\u8fdb\u8303\u56f4\uff0c\u4ece\u7b80\u5355\u5230\u590d\u6742\u3002\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u4e2a\u6846\u67b6AgentGen\uff0c\u5229\u7528LLM\u9996\u5148\u751f\u6210\u73af\u5883\uff0c\u968f\u540e\u6839\u636e\u8fd9\u4e9b\u73af\u5883\u751f\u6210\u89c4\u5212\u4efb\u52a1\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u4e3a\u4e86\u63d0\u9ad8\u73af\u5883\u591a\u6837\u6027\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4f7f\u7528\u5305\u542b\u5404\u79cd\u9886\u57df\u7279\u5b9a\u6587\u672c\u6bb5\u843d\u7684\u7075\u611f\u8bed\u6599\u5e93\u4f5c\u4e3a\u5408\u6210\u73af\u5883\u7684\u4e0a\u4e0b\u6587\u3002\u6b64\u5916\uff0c\u4e3a\u4e86\u589e\u52a0\u751f\u6210\u89c4\u5212\u4efb\u52a1\u96be\u5ea6\u591a\u6837\u6027\u7684\u7a0b\u5ea6\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u53cc\u5411\u6f14\u5316\u65b9\u6cd5Bi-Evol\uff0c\u8be5\u65b9\u6cd5\u4ece\u5bb9\u6613\u548c\u56f0\u96be\u7684\u4e24\u4e2a\u65b9\u5411\u8fdb\u5316\u89c4\u5212\u4efb\u52a1\uff0c\u4ee5\u5408\u6210\u5177\u6709\u5e73\u6ed1\u96be\u5ea6\u66f2\u7ebf\u7684\u4efb\u52a1\u96c6\u3002\u6765\u81eaAgentBoard\u7684\u8bc4\u4f30\u7ed3\u679c\u663e\u793a\uff0cAgentGen\u663e\u8457\u63d0\u9ad8\u4e86LLM\u7684\u89c4\u5212\u80fd\u529b\uff0c\u4f8b\u5982\uff0c\u4f7f\u7528AgentGen\u6307\u4ee4\u8c03\u6574\u7684Llama-3 8B\u5728\u603b\u4f53\u6027\u80fd\u4e0a\u8d85\u8d8a\u4e86GPT-3.5\u3002\u6b64\u5916\uff0c\u5728\u67d0\u4e9b\u4efb\u52a1\u4e2d\uff0c\u5b83\u751a\u81f3\u8d85\u8d8a\u4e86GPT-4\u3002|\n", "2408.00761": "|**2024-08-01**|**Tamper-Resistant Safeguards for Open-Weight LLMs**|Rishub Tamirisa 
et.al.|[2408.00761](http://arxiv.org/abs/2408.00761)|**[link](https://github.com/rishub-tamirisa/tamper-resistance)**|\u5feb\u901f\u53d1\u5c55\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u80fd\u529b\u5f15\u53d1\u4e86\u5bf9\u6f5c\u5728\u6076\u610f\u7528\u9014\u7684\u5e7f\u6cdb\u62c5\u5fe7\u3002\u9488\u5bf9\u5f00\u653e\u6743\u91cd\u7684LLM\uff0c\u73b0\u6709\u4fdd\u62a4\u63aa\u65bd\u5728\u62b5\u6297\u7be1\u6539\u653b\u51fb\u65b9\u9762\u7f3a\u4e4f\u8db3\u591f\u7684\u7a33\u5b9a\u6027\uff0c\u8fd9\u4e9b\u653b\u51fb\u53ef\u4ee5\u901a\u8fc7\u5fae\u8c03\u6b65\u9aa4\u8f7b\u6613\u5730\u79fb\u9664\u62d2\u7edd\u548c\u9057\u5fd8\u4fdd\u62a4\u63aa\u65bd\u3002\u8fd9\u7c7b\u6f0f\u6d1e\u8981\u6c42\u91c7\u53d6\u65b0\u7684\u65b9\u6cd5\u6765\u786e\u4fdd\u5b89\u5168\u91ca\u653e\u5f00\u653e\u6743\u91cd\u7684LLM\u3002 \u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u540d\u4e3aTAR\u7684\u65b9\u6cd5\uff0c\u65e8\u5728\u5c06\u4e0d\u53ef\u7be1\u6539\u7684\u5b89\u5168\u9632\u62a4\u878d\u5165\u5230\u5f00\u653e\u6743\u91cd\u7684LLM\u4e2d\uff0c\u4f7f\u5f97\u5373\u4f7f\u7ecf\u8fc7\u6570\u5343\u6b65\u7684\u5fae\u8c03\uff0c\u653b\u51fb\u8005\u4e5f\u65e0\u6cd5\u79fb\u9664\u8fd9\u4e9b\u9632\u62a4\u63aa\u65bd\u3002\u5728\u5168\u9762\u7684\u8bc4\u4f30\u548c\u7ea2\u961f\u6d4b\u8bd5\u5206\u6790\u4e2d\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u663e\u8457\u63d0\u9ad8\u4e86\u9632\u62a4\u7684\u4e0d\u53ef\u7be1\u6539\u6027\uff0c\u540c\u65f6\u4fdd\u6301\u4e86\u826f\u6027\u529f\u80fd\u3002\u6211\u4eec\u7684\u7ed3\u679c\u8868\u660e\uff0c\u4e0d\u53ef\u7be1\u6539\u6027\u662f\u4e00\u4e2a\u53ef\u884c\u7684\u95ee\u9898\uff0c\u4e3a\u6539\u8fdb\u5f00\u653e\u6743\u91cdLLM\u7684\u5b89\u5168\u6027\u548c\u5b89\u5168\u6027\u5f00\u8f9f\u4e86\u6709\u524d\u666f\u7684\u65b0\u9014\u5f84\u3002|\n", "2408.00741": "|**2024-08-01**|**DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency**|Jovan Stojkovic 
et.al.|[2408.00741](http://arxiv.org/abs/2408.00741)|null|\u5feb\u901f\u53d1\u5c55\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u751f\u6210\u80fd\u529b\u4f7f\u5176\u5728\u5404\u79cd\u5e94\u7528\u4e2d\u6210\u4e3a\u5173\u952e\u7684\u5de5\u4f5c\u8d1f\u8f7d\u3002\u5982\u4eca\uff0cLLM\u63a8\u7406\u96c6\u7fa4\u5904\u7406\u5927\u91cf\u67e5\u8be2\uff0c\u5e76\u5bf9\u670d\u52a1\u8d28\u91cf\u6307\u6807\uff08SLOs\uff09\u6709\u4e25\u683c\u8981\u6c42\u3002\u4e3a\u4e86\u8fbe\u5230\u9884\u671f\u6027\u80fd\uff0c\u8fd9\u4e9b\u6a21\u578b\u5728\u80fd\u8017\u9ad8\u7684GPU\u4e0a\u6267\u884c\uff0c\u5bfc\u81f4\u63a8\u7406\u96c6\u7fa4\u6d88\u8017\u5927\u91cf\u80fd\u6e90\uff0c\u5e76\u4ea7\u751f\u8fc7\u91cf\u7684\u78b3\u6392\u653e\u3002\u5e78\u8fd0\u7684\u662f\uff0c\u6211\u4eec\u53d1\u73b0\u53ef\u4ee5\u901a\u8fc7\u5229\u7528\u63a8\u7406\u8ba1\u7b97\u7279\u6027\u7684\u5f02\u8d28\u6027\u4ee5\u53ca\u5de5\u4f5c\u8d1f\u8f7d\u7684\u6ce2\u52a8\uff0c\u663e\u8457\u63d0\u9ad8\u80fd\u6548\u3002\u7136\u800c\uff0c\u8fd9\u79cd\u591a\u6837\u6027\u548c\u52a8\u6001\u73af\u5883\u521b\u9020\u4e86\u4e00\u4e2a\u5de8\u5927\u7684\u641c\u7d22\u7a7a\u95f4\uff0c\u4e0d\u540c\u7684\u7cfb\u7edf\u914d\u7f6e\uff08\u5982\u5b9e\u4f8b\u6570\u91cf\u3001\u6a21\u578b\u5e76\u884c\u6027\u548cGPU\u9891\u7387\uff09\u5bfc\u81f4\u4e0d\u540c\u7684\u80fd\u6e90\u548c\u6027\u80fd\u6298\u8877\u3002 
\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86DynamoLLM\uff0c\u8fd9\u662f\u9996\u4e2a\u9488\u5bf9LLM\u63a8\u7406\u73af\u5883\u7684\u80fd\u6548\u7ba1\u7406\u6846\u67b6\u3002DynamoLLM\u81ea\u52a8\u4e14\u52a8\u6001\u5730\u91cd\u65b0\u914d\u7f6e\u63a8\u7406\u96c6\u7fa4\uff0c\u4ee5\u4f18\u5316\u80fd\u6e90\u548c\u6210\u672c\uff0c\u540c\u65f6\u6ee1\u8db3\u670d\u52a1\u7684\u6027\u80fdSLOs\u3002\u7814\u7a76\u8868\u660e\uff0c\u5728\u670d\u52a1\u5c42\u9762\uff0cDynamoLLM\u80fd\u591f\u8282\u770153%\u7684\u80fd\u6e90\u548c38%\u7684\u64cd\u4f5c\u78b3\u6392\u653e\uff0c\u5e76\u4e3a\u5ba2\u6237\u51cf\u5c1161%\u7684\u6210\u672c\uff0c\u540c\u65f6\u4ecd\u80fd\u6ee1\u8db3\u5ef6\u8fdfSLOs\u3002|\n", "2408.00727": "|**2024-08-01**|**Improving Retrieval-Augmented Generation in Medicine with Iterative Follow-up Questions**|Guangzhi Xiong et.al.|[2408.00727](http://arxiv.org/abs/2408.00727)|**[link](https://github.com/teddy-xionggz/medrag)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5c55\u73b0\u51fa\u4e86\u89e3\u51b3\u533b\u7597\u95ee\u9898\u7684\u5de8\u5927\u6f5c\u529b\uff0c\u5b83\u4eec\u80fd\u591f\u638c\u63e1\u5927\u91cf\u533b\u5b66\u77e5\u8bc6\uff0c\u4f46\u4ecd\u7136\u53ef\u80fd\u51fa\u73b0\u5e7b\u89c9\uff0c\u5e76\u4e14\u5728\u77e5\u8bc6\u66f4\u65b0\u65b9\u9762\u5177\u6709\u5c40\u9650\u6027\u3002\u4e3a\u4e86\u589e\u5f3aLLM\u5728\u533b\u5b66\u95ee\u7b54\u65b9\u9762\u7684\u80fd\u529b\uff0c\u63d0\u51fa\u4e86\u57fa\u4e8e\u68c0\u7d22\u7684\u751f\u6210\uff08RAG\uff09\u65b9\u6cd5\uff0c\u901a\u8fc7\u5916\u90e8\u77e5\u8bc6\u5e93\u6765\u63d0\u5347\u6027\u80fd\u3002\u7136\u800c\uff0c\u5728\u9700\u8981\u591a\u6b21\u4fe1\u606f\u67e5\u8be2\u7684\u590d\u6742\u60c5\u51b5\u4e0b\uff0cRAG\u53ef\u80fd\u4ecd\u7136\u4f1a\u5931\u8d25\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u8fed\u4ee3RAG\u65b9\u6cd5\uff08i-MedRAG\uff09\uff0c\u5141\u8bb8LLM\u5728\u6bcf\u6b21\u5c1d\u8bd5\u540e\u8fed\u4ee3\u5730\u63d0\u51fa\u54
0e\u7eed\u95ee\u9898\u3002\u5728\u6bcf\u6b21i-MedRAG\u8fed\u4ee3\u4e2d\uff0c\u540e\u7eed\u95ee\u9898\u7531\u57fa\u672c\u7684RAG\u7cfb\u7edf\u56de\u7b54\uff0c\u5e76\u7528\u4e8e\u6307\u5bfc\u4e0b\u4e00\u4e2a\u8fed\u4ee3\u4e2d\u7684\u67e5\u8be2\u751f\u6210\u3002 \u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u4e0e\u4ec5\u4f7f\u7528RAG\u7684\u4f20\u7edf\u65b9\u6cd5\u76f8\u6bd4\uff0ci-MedRAG\u663e\u8457\u63d0\u9ad8\u4e86\u5404\u79cdLLM\u5728\u590d\u6742\u95ee\u9898\u4e0a\u7684\u6027\u80fd\uff0c\u8fd9\u4e9b\u95ee\u9898\u662f\u7f8e\u56fd\u533b\u5b66\u751f\u6267\u7167\u8003\u8bd5\uff08USMLE\uff09\u4e34\u5e8a\u6848\u4f8b\u548c\u5927\u89c4\u6a21\u591a\u4efb\u52a1\u8bed\u8a00\u7406\u89e3\uff08MMLU\uff09\u6570\u636e\u96c6\u4e2d\u7684\u77e5\u8bc6\u6d4b\u8bd5\u6240\u6db5\u76d6\u7684\u3002\u7279\u522b\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u6211\u4eec\u7684\u96f6\u6837\u672ci-MedRAG\u5728GPT-3.5\u4e0a\u53d6\u5f97\u4e8669.68%\u7684\u51c6\u786e\u6027\uff0c\u8d85\u8d8a\u4e86\u6240\u6709\u73b0\u6709\u7684\u63d0\u793a\u5de5\u7a0b\u548c\u5fae\u8c03\u65b9\u6cd5\u5728MedQA\u6570\u636e\u96c6\u4e0a\u7684\u8868\u73b0\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u7814\u7a76\u4e86i-MedRAG\u5728\u4e0d\u540c\u8fed\u4ee3\u6b21\u6570\u548c\u6bcf\u8fed\u4ee3\u67e5\u8be2\u6570\u91cf\u4e0b\u7684\u6269\u5c55\u7279\u6027\u3002 \u6211\u4eec\u7684\u6848\u4f8b\u7814\u7a76\u663e\u793a\uff0ci-MedRAG\u80fd\u591f\u7075\u6d3b\u5730\u63d0\u51fa\u540e\u7eed\u95ee\u9898\u5f62\u6210\u63a8\u7406\u94fe\uff0c\u6df1\u5165\u5206\u6790\u533b\u7597\u95ee\u9898\u3002\u636e\u6211\u4eec\u6240\u77e5\uff0c\u8fd9\u662f\u9996\u6b21\u5c06\u540e\u7eed\u95ee\u9898\u878d\u5165\u533b\u5b66RAG\u7684\u7814\u7a76\u3002|\n", "2408.00724": "|**2024-08-01**|**An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models**|Yangzhen Wu 
et.al.|[2408.00724](http://arxiv.org/abs/2408.00724)|null|\u5728\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u6700\u4f18\u8bad\u7ec3\u914d\u7f6e\u7814\u7a76\u4e2d\uff0c\u7279\u522b\u662f\u5728\u6a21\u578b\u89c4\u6a21\u548c\u8ba1\u7b97\u9884\u7b97\u65b9\u9762\u7684\u914d\u7f6e\uff0c\u5df2\u7ecf\u8fdb\u884c\u4e86\u5927\u91cf\u7684\u63a2\u8ba8\u3002\u7136\u800c\uff0c\u5bf9\u4e8e\u63a8\u7406\u9636\u6bb5\u5982\u4f55\u6700\u4f18\u5316\u914d\u7f6eLLM\u4ee5\u5e73\u8861\u989d\u5916\u7684\u63a8\u7406\u8ba1\u7b97\u65f6\u95f4\u548c\u6027\u80fd\u63d0\u5347\u7684\u7814\u7a76\u8fd8\u4e0d\u591f\u6df1\u5165\u3002\u672c\u6587\u65e8\u5728\u63a2\u7d22\u8ba1\u7b97\u4f18\u5316\u7684\u63a8\u7406\u65b9\u6cd5\uff0c\u5373\u8bbe\u8ba1\u80fd\u591f\u901a\u8fc7\u8c03\u6574\u63a8\u7406\u65f6\u95f4\u7684\u8ba1\u7b97\u91cf\u6765\u4f18\u5316\u6027\u80fd\u7684\u6a21\u578b\u548c\u63a8\u7406\u7b56\u7565\u3002 \u4e3a\u4e86\u7406\u89e3\u5e76\u8bbe\u8ba1\u8ba1\u7b97\u4f18\u5316\u7684\u63a8\u7406\u65b9\u6cd5\u7684\u7b2c\u4e00\u6b65\uff0c\u6211\u4eec\u5bf9\u591a\u79cd\u63a8\u7406\u7b56\u7565\uff0c\u5982\u8d2a\u5fc3\u641c\u7d22\u3001\u591a\u6570\u6295\u7968\u3001\u6700\u4f73N\u79cd\u7ec4\u5408\u3001\u52a0\u6743\u6295\u7968\u53ca\u5176\u53d8\u4f53\uff0c\u5728\u4e24\u79cd\u4e0d\u540c\u7684\u6811\u641c\u7d22\u7b97\u6cd5\u4e2d\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u6d89\u53ca\u4e0d\u540c\u6a21\u578b\u89c4\u6a21\u548c\u8ba1\u7b97\u9884\u7b97\u3002\u6211\u4eec\u7684\u7814\u7a76\u53d1\u73b0\uff0c\u8f83\u5c0f\u7684\u8bed\u8a00\u6a21\u578b\u914d\u5408\u66f4\u5148\u8fdb\u7684\u89e3\u7801\u7b97\u6cd5\u901a\u5e38\u80fd\u5b9e\u73b0\u5e15\u7d2f\u6258\u6700\u4f18\u7684\u6743\u8861\uff0c\u5373\u5728\u989d\u5916\u7684\u8ba1\u7b97\u6210\u672c\u4e0e\u6027\u80fd\u63d0\u5347\u4e4b\u95f4\u627e\u5230\u6700\u4f73\u5e73\u8861\u70b9\u3002\u8fd9\u4e9b\u7ed3\u679c\u8868\u660e\uff0c\u5728\u9884\u7b97\u6709\u9650\u7684\u573a\u666f\u4e0b\uff0c\u5982\u7ec8\u7aef\u8bbe\u5907\u4e0a\u90e8\u7f72\u5c0f\u578b\u6a21\u578b\uff0c\u
53ef\u80fd\u5177\u6709\u663e\u8457\u7684\u4f18\u52bf\uff0c\u4ee5\u63d0\u9ad8\u95ee\u9898\u89e3\u51b3\u7684\u51c6\u786e\u7387\u3002 \u4f8b\u5982\uff0c\u6211\u4eec\u5c55\u793a\u4e86Llemma-7B\u6a21\u578b\u5728\u4f7f\u7528\u7ea6\u4e24\u500d\u4e8eLlemma-34B\u6a21\u578b\u7684\u6d6e\u70b9\u8fd0\u7b97\uff08FLOPs\uff09\u7684\u60c5\u51b5\u4e0b\uff0c\u4ecd\u80fd\u5b9e\u73b0\u4e0e\u540e\u8005\u76f8\u5f53\u7684MATH500\u4efb\u52a1\u51c6\u786e\u6027\u3002\u6211\u4eec\u7684\u53d1\u73b0\u53ef\u80fd\u9002\u7528\u4e8e\u4efb\u4f55\u6709\u660e\u786e\u6210\u529f\u5ea6\u91cf\u6807\u51c6\u7684\u751f\u6210\u4efb\u52a1\u3002|\n", "2408.00722": "|**2024-08-01**|**Pathway to Secure and Trustworthy 6G for LLMs: Attacks, Defense, and Opportunities**|Sunder Ali Khowaja et.al.|[2408.00722](http://arxiv.org/abs/2408.00722)|null|\u8fd1\u671f\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u56e0\u5176\u5728\u65b0\u5174\u5e94\u7528\u4e2d\u7684\u9002\u5e94\u6027\u548c\u53ef\u6269\u5c55\u6027\u800c\u5907\u53d7\u5173\u6ce8\uff0c\u8fd9\u4e9b\u5e94\u7528\u5305\u62ec\u901a\u4fe1\u7f51\u7edc\u3002\u9884\u8ba16G\u79fb\u52a8\u8fb9\u7f18\u8ba1\u7b97\u7f51\u7edc\u5c06\u80fd\u591f\u4f5c\u4e3a\u670d\u52a1\u652f\u6301LLMs\uff0c\u56e0\u4e3a\u5b83\u4eec\u63d0\u4f9b\u8d85\u53ef\u9760\u7684\u4f4e\u5ef6\u8fdf\u901a\u4fe1\u548c\u95ed\u73af\u5927\u89c4\u6a21\u8fde\u63a5\u3002\u7136\u800c\uff0cLLMs\u5728\u6570\u636e\u548c\u6a21\u578b\u9690\u79c1\u65b9\u9762\u5b58\u5728\u6f0f\u6d1e\uff0c\u8fd9\u5f71\u54cd\u4e86\u5728\u7528\u6237\u670d\u52a1\u4e2d\u90e8\u7f72LLMs\u7684\u4fe1\u4efb\u5ea6\u3002\u672c\u6587\u63a2\u8ba8\u4e86\u57286G\u7f51\u7edc\u4e2d\u5bf9LLMs\u8fdb\u884c\u5fae\u8c03\u65f6\u7684\u5b89\u5168\u6f0f\u6d1e\uff0c\u7279\u522b\u662f\u6210\u5458\u5f52\u5c5e\u653b\u51fb\u3002\u6211\u4eec\u5b9a\u4e49\u4e86\u653b\u51fb\u7f51\u7edc\u7684\u7279\u5f81\uff0c\u8be5\u7f51\u7edc\u53ef\u4ee5\u5728\u8bbf\u95ee\u4e0b\u6e38\u4efb\u52a1\u7ec6\u8c03\u6a21\u578b\u65f6\u6267\u884c\u6210\u5458\u5f52\u5c5e\u653b\u51fb\uff0c\u524
d\u63d0\u662f\u653b\u51fb\u8005\u53ef\u4ee5\u8bbf\u95ee\u8be5\u6a21\u578b\u3002\u6211\u4eec\u8868\u660e\uff0c\u5bf9\u4e8e\u4efb\u4f55\u4e0b\u6e38\u4efb\u52a1\uff0c\u6210\u5458\u5f52\u5c5e\u653b\u51fb\u90fd\u662f\u6709\u6548\u7684\uff0c\u8fd9\u53ef\u80fd\u5bfc\u81f4\u5728\u4f7f\u7528LLMs\u4f5c\u4e3a\u670d\u52a1\u65f6\u53d1\u751f\u4e2a\u4eba\u6570\u636e\u6cc4\u9732\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u5728\u547d\u540d\u5b9e\u4f53\u8bc6\u522b\u4efb\u52a1\u4e0a\uff0c\u653b\u51fb\u6210\u529f\u7387\u53ef\u8fbe92%\u3002\u57fa\u4e8e\u5b9e\u9a8c\u5206\u6790\uff0c\u6211\u4eec\u8ba8\u8bba\u4e86\u53ef\u80fd\u7684\u9632\u5fa1\u673a\u5236\uff0c\u5e76\u63d0\u51fa\u4e86\u53ef\u80fd\u7684\u7814\u7a76\u65b9\u5411\uff0c\u4ee5\u4f7f\u57286G\u7f51\u7edc\u80cc\u666f\u4e0bLLMs\u66f4\u52a0\u53ef\u9760\u3002|\n", "2408.00690": "|**2024-08-02**|**Improving Text Embeddings for Smaller Language Models Using Contrastive Fine-tuning**|Trapoom Ukarapol et.al.|[2408.00690](http://arxiv.org/abs/2408.00690)|**[link](https://github.com/trapoom555/language-model-sts-cft)**|\u5728\u81ea\u7136\u8bed\u8a00\u7406\u89e3\u4efb\u52a1\u4e0a\u5c55\u73b0\u51fa\u5353\u8d8a\u6027\u80fd\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u56e0\u8d44\u6e90\u5bc6\u96c6\u578b\u7684\u7279\u70b9\u800c\u964d\u4f4e\u4e86\u5176\u53ef\u83b7\u53d6\u6027\u3002\u76f8\u6bd4\u4e4b\u4e0b\uff0c\u5c0f\u578b\u8bed\u8a00\u6a21\u578b\u5982MiniCPM\u63d0\u4f9b\u4e86\u66f4\u53ef\u6301\u7eed\u7684\u6269\u5c55\u6027\uff0c\u4f46\u5f80\u5f80\u5728\u6ca1\u6709\u4e13\u95e8\u4f18\u5316\u7684\u60c5\u51b5\u4e0b\u8868\u73b0\u4e0d\u4f73\u3002\u672c\u6587\u65e8\u5728\u901a\u8fc7\u63d0\u5347\u5c0f\u578b\u8bed\u8a00\u6a21\u578b\u7684\u6587\u672c\u5d4c\u5165\u8d28\u91cf\u6765\u589e\u5f3a\u5b83\u4eec\u7684\u8868\u73b0\u3002\u6211\u4eec\u9009\u62e9\u4e86\u4e09\u4e2a\u8bed\u8a00\u6a21\u578b\uff1aMiniCPM\u3001Phi-2\u548cGemma\uff0c\u5728NLI\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u5bf9\u6bd4\u5f0f\u5fae\u8c03\u3002\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0c\
u8fd9\u79cd\u65b9\u6cd5\u80fd\u663e\u8457\u63d0\u5347\u6240\u6709\u4e09\u79cd\u6a21\u578b\u5728\u5404\u79cd\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u7684\u6587\u672c\u5d4c\u5165\u8d28\u91cf\uff0c\u5176\u4e2dMiniCPM\u8868\u73b0\u51fa\u6700\u663e\u8457\u7684\u5e73\u574756.33%\u6027\u80fd\u63d0\u5347\u3002\u5bf9\u6bd4\u5f0f\u5fae\u8c03\u7684\u4ee3\u7801\u5df2\u516c\u5f00\u5728https://github.com/trapoom555/Language-Model-STS-CFT\u3002|\n", "2408.00686": "|**2024-08-01**|**Can Developers Prompt? A Controlled Experiment for Code Documentation Generation**|Hans-Alexander Kruse et.al.|[2408.00686](http://arxiv.org/abs/2408.00686)|null|\u6211\u4eec\u5bf920\u540d\u4e13\u4e1a\u4eba\u58eb\u548c30\u540d\u8ba1\u7b97\u673a\u79d1\u5b66\u5b66\u751f\u8fdb\u884c\u4e86\u4e00\u4e2a\u53d7\u63a7\u5b9e\u9a8c\uff0c\u8981\u6c42\u4ed6\u4eec\u4f7f\u7528ChatGPT\u98ce\u683c\u7684Visual Studio Code\u6269\u5c55\u6765\u4e3a\u4e24\u4e2aPython\u51fd\u6570\u7f16\u5199\u4ee3\u7801\u6587\u6863\u3002\u5b9e\u9a8c\u7ec4\u81ea\u7531\u8f93\u5165\u81ea\u5b9a\u4e49\u63d0\u793a\uff0c\u800c\u5bf9\u7167\u7ec4\u5219\u6267\u884c\u9884\u8bbe\u7684\u5c11\u91cf\u63d0\u793a\u3002\u6211\u4eec\u7684\u7ed3\u679c\u8868\u660e\uff0c\u65e0\u8bba\u662f\u4e13\u4e1a\u4eba\u58eb\u8fd8\u662f\u5b66\u751f\uff0c\u90fd\u5bf9\u6216\u65e0\u6cd5\u5e94\u7528\u63d0\u793a\u5de5\u7a0b\u6280\u5de7\u611f\u5230\u4e0d\u77e5\u6240\u63aa\u3002\u5c24\u5176\u662f\u5b66\u751f\uff0c\u4ed6\u4eec\u8ba4\u4e3a\u4ece\u81ea\u5b9a\u4e49\u63d0\u793a\u751f\u6210\u7684\u6587\u6863\u6bd4\u4ece\u51c6\u5907\u597d\u7684\u63d0\u793a\u751f\u6210\u7684\u6587\u6863\u5728\u53ef\u8bfb\u6027\u3001\u7b80\u6d01\u6027\u548c\u6709\u7528\u6027\u65b9\u9762\u663e\u8457\u8f83\u5dee\u3002\u4e00\u4e9b\u4e13\u4e1a\u4eba\u58eb\u4ec5\u901a\u8fc7\u5728\u81ea\u5b9a\u4e49\u63d0\u793a\u4e2d\u52a0\u5165\u201cDocstring\u201d\u5173\u952e\u8bcd\u5c31\u80fd\u751f\u6210\u66f4\u9ad8\u8d28\u91cf\u7684\u6587\u6863\u3002\u5b66\u751f\u5e0c\u671b\u83b7\u5f97\u66f4\u591a\u7684\u6307\u5bfc\u6765\u5236\u5b9a\u63d
0\u793a\uff0c\u800c\u4e13\u4e1a\u4eba\u58eb\u5219\u66f4\u6b23\u8d4f\u81ea\u5b9a\u4e49\u63d0\u793a\u7684\u7075\u6d3b\u6027\u3002\u53c2\u4e0e\u8005\u666e\u904d\u8ba4\u4e3a\u8f93\u51fa\u5e76\u975e\u5b8c\u7f8e\uff0c\u800c\u662f\u5c06\u5176\u89c6\u4e3a\u9010\u6b65\u5b8c\u5584\u6587\u6863\u7684\u5de5\u5177\u3002\u9700\u8981\u8fdb\u4e00\u6b65\u7684\u7814\u7a76\u6765\u7406\u89e3\u5f00\u53d1\u4eba\u5458\u5177\u6709\u7684\u63d0\u793a\u6280\u5de7\u548c\u504f\u597d\uff0c\u4ee5\u53ca\u4ed6\u4eec\u5b8c\u6210\u7279\u5b9a\u4efb\u52a1\u6240\u9700\u7684\u652f\u63f4\u3002|\n", "2408.00665": "|**2024-08-01**|**AutoM3L: An Automated Multimodal Machine Learning Framework with Large Language Models**|Daqin Luo et.al.|[2408.00665](http://arxiv.org/abs/2408.00665)|**[link](https://github.com/tim120526/AutoM3L)**|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u591a\u6a21\u6001\u673a\u5668\u5b66\u4e60\u81ea\u52a8\u5316\u6846\u67b6\u2014\u2014AutoM3L\uff0c\u8be5\u6846\u67b6\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4f5c\u4e3a\u63a7\u5236\u5668\uff0c\u81ea\u52a8\u6784\u5efa\u591a\u6a21\u6001\u8bad\u7ec3\u7ba1\u9053\u3002AutoM3L\u80fd\u591f\u7406\u89e3\u6570\u636e\u6a21\u6001\u5e76\u6839\u636e\u7528\u6237\u9700\u6c42\u9009\u62e9\u5408\u9002\u7684\u6a21\u578b\uff0c\u63d0\u4f9b\u81ea\u52a8\u5316\u548c\u4e92\u52a8\u6027\u3002\u901a\u8fc7\u6d88\u9664\u624b\u52a8\u7279\u5f81\u5de5\u7a0b\u548c\u8d85\u53c2\u6570\u4f18\u5316\u7684\u9700\u6c42\uff0c\u6211\u4eec\u7684\u6846\u67b6\u7b80\u5316\u4e86\u7528\u6237\u53c2\u4e0e\u8fc7\u7a0b\uff0c\u5e76\u901a\u8fc7\u6307\u4ee4\u63d0\u4f9b\u4e86\u5b9a\u5236\u5316\u9009\u9879\uff0c\u4ece\u800c\u89e3\u51b3\u4e86\u4ee5\u5f80\u57fa\u4e8e\u89c4\u5219\u7684\u81ea\u52a8\u673a\u5668\u5b66\u4e60\u65b9\u6cd5\u7684\u5c40\u9650\u6027\u3002 
\u6211\u4eec\u5bf9AutoM3L\u5728\u516d\u4e2a\u4e0d\u540c\u7c7b\u578b\u7684\u591a\u6a21\u6001\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u6db5\u76d6\u4e86\u5206\u7c7b\u3001\u56de\u5f52\u548c\u68c0\u7d22\u4efb\u52a1\uff0c\u4ee5\u53ca\u4e00\u7cfb\u5217\u5e7f\u6cdb\u7684\u5355\u6a21\u6001\u6570\u636e\u96c6\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cAutoM3L\u5728\u6027\u80fd\u4e0a\u4e0e\u4f20\u7edf\u7684\u57fa\u4e8e\u89c4\u5219\u7684\u81ea\u52a8\u673a\u5668\u5b66\u4e60\u65b9\u6cd5\u76f8\u6bd4\u5177\u6709\u7ade\u4e89\u529b\u6216\u8d85\u8d8a\u6027\u3002\u6b64\u5916\uff0c\u7528\u6237\u7814\u7a76\u8fdb\u4e00\u6b65\u9a8c\u8bc1\u4e86AutoM3L\u5728\u7528\u6237\u53cb\u597d\u6027\u548c\u6613\u7528\u6027\u65b9\u9762\u7684\u4f18\u52bf\uff0c\u76f8\u8f83\u4e8e\u57fa\u4e8e\u89c4\u5219\u7684\u81ea\u52a8\u673a\u5668\u5b66\u4e60\u65b9\u6cd5\u3002|\n", "2408.00657": "|**2024-08-01**|**Disentangling Dense Embeddings with Sparse Autoencoders**|Charles O'Neill et.al.|[2408.00657](http://arxiv.org/abs/2408.00657)|null|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u5e94\u7528\u7a00\u758f\u81ea\u52a8\u7f16\u7801\u5668\uff08SAEs\uff09\u5230\u5927\u578b\u8bed\u8a00\u6a21\u578b\u751f\u6210\u7684\u5bc6\u96c6\u6587\u672c\u5d4c\u5165\u7684\u9996\u6b21\u5c1d\u8bd5\uff0c\u5c55\u793a\u5176\u5728\u89e3\u7f20\u8bed\u4e49\u6982\u5ff5\u65b9\u9762\u7684\u6f5c\u529b\u3002\u901a\u8fc7\u5728\u8d85\u8fc742\u4e07\u7bc7\u8ba1\u7b97\u673a\u79d1\u5b66\u548c\u5929\u6587\u5b66\u9886\u57df\u79d1\u5b66\u8bba\u6587\u6458\u8981\u7684\u5d4c\u5165\u4e0a\u8bad\u7ec3SAEs\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u6240\u5f97\u5230\u7684\u7a00\u758f\u8868\u793a\u4fdd\u6301\u4e86\u8bed\u4e49\u4e00\u81f4\u6027\u7684\u540c\u65f6\u63d0\u4f9b\u4e86\u53ef\u89e3\u91ca\u6027\u3002\u6211\u4eec\u5206\u6790\u8fd9\u4e9b\u5b66\u4e60\u7279\u5f81\uff0c\u63a2\u7d22\u4e0d\u540c\u6a21\u578b\u5bb9\u91cf\u4e0b\u5b83\u4eec\u7684\u884c\u4e3a\uff0c\u5e76\u5f15\u5165\u4e86\u4e00\u79cd\u65b0\u7684\u65b9\u6cd5\u6765\u8bc6\u522b\u201c\u7279\u5f
81\u5bb6\u65cf\u201d\uff0c\u8fd9\u4e9b\u7279\u5f81\u4ee3\u8868\u4e86\u4e0d\u540c\u62bd\u8c61\u7ea7\u522b\u7684\u76f8\u5173\u6982\u5ff5\u3002\u4e3a\u4e86\u5c55\u793a\u6211\u4eec\u7684\u65b9\u6cd5\u7684\u5b9e\u9645\u5e94\u7528\u4ef7\u503c\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u5982\u4f55\u4f7f\u7528\u8fd9\u4e9b\u53ef\u89e3\u91ca\u7279\u5f81\u7cbe\u786e\u63a7\u5236\u8bed\u4e49\u641c\u7d22\uff0c\u4ece\u800c\u5b9e\u73b0\u5bf9\u67e5\u8be2\u8bed\u4e49\u7684\u7cbe\u7ec6\u63a7\u5236\u3002\u8fd9\u9879\u5de5\u4f5c\u586b\u8865\u4e86\u5bc6\u96c6\u5d4c\u5165\u7684\u8bed\u4e49\u4e30\u5bcc\u6027\u548c\u7a00\u758f\u8868\u793a\u7684\u53ef\u89e3\u91ca\u6027\u4e4b\u95f4\u7684\u5dee\u8ddd\u3002\u6211\u4eec\u5f00\u6e90\u4e86\u8bad\u7ec3\u540e\u7684\u5d4c\u5165\u3001\u7a00\u758f\u81ea\u52a8\u7f16\u7801\u5668\u4ee5\u53ca\u53ef\u89e3\u91ca\u7279\u5f81\uff0c\u540c\u65f6\u63d0\u4f9b\u4e86\u4e00\u4e2a\u7528\u4e8e\u63a2\u7d22\u5b83\u4eec\u7684\u7f51\u9875\u5e94\u7528\u7a0b\u5e8f\u3002|\n", "2408.01423": "|**2024-08-02**|**Prompt Recursive Search: A Living Framework with Adaptive Growth in LLM Auto-Prompting**|Xiangyu Zhao 
et.al.|[2408.01423](http://arxiv.org/abs/2408.01423)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u9886\u57df\u5c55\u73b0\u51fa\u4e86\u60ca\u4eba\u7684\u80fd\u529b\uff0c\u5728\u6267\u884c\u5404\u79cd\u4efb\u52a1\u65f6\u8868\u73b0\u51fa\u8272\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u6a21\u578b\u7684\u6027\u80fd\u53d7\u5230\u7279\u5b9a\u63d0\u793a\u8bbe\u8ba1\u7b56\u7565\u7684\u5f71\u54cd\u3002\u4e3b\u8981\u6709\u4e24\u79cd\u63d0\u793a\u8bbe\u8ba1\u65b9\u6cd5\uff1a\u4e00\u79cd\u662f\u901a\u8fc7\u624b\u52a8\u4e3a\u7279\u5b9a\u6570\u636e\u96c6\u521b\u5efa\u4e13\u95e8\u7684\u63d0\u793a\uff0c\u88ab\u79f0\u4e3a\u4e13\u5bb6\u8bbe\u8ba1\u63d0\u793a\uff08EDP\uff09\uff0c\u4e00\u65e6\u521b\u5efa\uff0c\u5b83\u4eec\u5c31\u65e0\u6cd5\u66f4\u6539\uff0c\u5176\u6709\u6548\u6027\u53d7\u9650\u4e8e\u4eba\u7c7b\u8bbe\u8ba1\u8005\u7684\u4e13\u4e1a\u77e5\u8bc6\u3002\u5f53\u5e94\u7528\u4e8eLLM\u65f6\uff0c\u8fd9\u79cd\u56fa\u5b9a\u7684\u65b9\u6cd5\u5bfc\u81f4\u5bf9\u7b80\u5355\u95ee\u9898\u548c\u590d\u6742\u95ee\u9898\u91c7\u7528\u7edf\u4e00\u7684\u89e3\u51b3\u7b56\u7565\uff0c\u5bfc\u81f4\u5bf9\u4e8e\u7b80\u5355\u95ee\u9898\u8fc7\u5ea6\u4f7f\u7528\u4ee4\u724c\u3002\u53e6\u4e00\u79cd\u65b9\u6cd5\u662f\u8ba9LLM\u81ea\u52a8\u751f\u6210\u63d0\u793a\uff0c\u79f0\u4e3aLLM\u884d\u751f\u63d0\u793a\uff08LDP\uff09\uff0c\u80fd\u591f\u9488\u5bf9\u5177\u4f53\u95ee\u9898\u63d0\u4f9b\u5b9a\u5236\u89e3\u51b3\u65b9\u6848\uff0c\u4ece\u800c\u51cf\u8f7b\u4e86EDP\u7684\u5c40\u9650\u6027\u3002\u7136\u800c\uff0cLDP\u5728\u5904\u7406\u590d\u6742\u95ee\u9898\u65f6\u53ef\u80fd\u4f1a\u9047\u5230\u6027\u80fd\u4e0b\u964d\u7684\u95ee\u9898\uff0c\u8fd9\u662f\u56e0\u4e3a\u5728\u89e3\u51b3\u95ee\u9898\u89c4\u5212\u8fc7\u7a0b\u4e2d\u53ef\u80fd\u7d2f\u79ef\u9519\u8bef\u3002 
\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u4e2a\u65b0\u9896\u7684\u63d0\u793a\u9012\u5f52\u641c\u7d22\uff08PRS\uff09\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u5229\u7528LLM\u751f\u6210\u9488\u5bf9\u7279\u5b9a\u95ee\u9898\u7684\u89e3\u51b3\u65b9\u6848\uff0c\u540c\u65f6\u51cf\u5c11\u4ee4\u724c\u7684\u4f7f\u7528\u3002\u8fd9\u4e2a\u6846\u67b6\u5305\u542b\u4e86\u5bf9\u95ee\u9898\u590d\u6742\u6027\u7684\u8bc4\u4f30\u4ee5\u53ca\u53ef\u8c03\u6574\u7684\u7ed3\u6784\uff0c\u4ee5\u964d\u4f4e\u51fa\u9519\u7684\u53ef\u80fd\u6027\u3002\u6211\u4eec\u901a\u8fc7\u4f7f\u7528\u4e0d\u540c\u53c2\u6570\u6570\u91cf\u7684LLM\u6a21\u578b\u5728\u591a\u4e2a\u9886\u57df\u5185\u7684\u591a\u79cd\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u5b9e\u9a8c\uff0c\u9a8c\u8bc1\u4e86PRS\u6846\u67b6\u7684\u6709\u6548\u6027\u3002\u4e0e\u94fe\u5f0f\u601d\u8003\uff08CoT\uff09\u65b9\u6cd5\u76f8\u6bd4\uff0cPRS\u65b9\u6cd5\u5728\u4f7f\u7528Llama3-7B\u6a21\u578b\u65f6\uff0cBBH\u6570\u636e\u96c6\u4e0a\u7684\u51c6\u786e\u7387\u63d0\u9ad8\u4e868%\uff0c\u5b9e\u73b0\u4e8622%\u7684\u6539\u8fdb\u3002|\n", "2408.01420": "|**2024-08-02**|**Mission Impossible: A Statistical Perspective on Jailbreaking LLMs**|Jingtong Su 
et.al.|[2408.01420](http://arxiv.org/abs/2408.01420)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u6709\u9650\u7684\u8d28\u91cf\u63a7\u5236\u4e0b\u8bad\u7ec3\u4e8e\u6d77\u91cf\u6587\u672c\u6570\u636e\u4e2d\u3002\u8fd9\u5bfc\u81f4LLM\u53ef\u80fd\u51fa\u73b0\u610f\u5916\u751a\u81f3\u6709\u5bb3\u7684\u884c\u4e3a\uff0c\u5982\u6cc4\u9732\u4fe1\u606f\u3001\u5047\u65b0\u95fb\u6216\u4ec7\u6068\u8a00\u8bba\u3002\u5e94\u5bf9\u7b56\u7565\uff0c\u901a\u5e38\u79f0\u4e3a\u504f\u597d\u5bf9\u9f50\uff0c\u5305\u62ec\u901a\u8fc7\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u6587\u672c\u793a\u4f8b\u7cbe\u7ec6\u8c03\u6574\u9884\u8bad\u7ec3\u7684LLM\uff0c\u4ee5\u4f53\u73b0\u671f\u671b\u7684\u884c\u4e3a\u6a21\u5f0f\u3002\u7136\u800c\uff0c\u5b9e\u8bc1\u7814\u7a76\u8868\u660e\uff0c\u5373\u4f7f\u8fdb\u884c\u4e86\u504f\u597d\u5bf9\u9f50\uff0cLLM\u4e5f\u4ecd\u53ef\u80fd\u8bf1\u9a97\u81f3\u6709\u5bb3\u884c\u4e3a\u3002\u8fd9\u79cd\u88ab\u79f0\u4e3aLLM\u201c\u8d8a\u72f1\u201d\u7684\u73b0\u8c61\u901a\u5e38\u901a\u8fc7\u4fee\u6539\u8f93\u5165\u63d0\u793a\u6765\u5b9e\u73b0\uff0c\u4ee5\u8bef\u5bfcLLM\u3002\u672c\u6587\u4ece\u7edf\u8ba1\u5b66\u7684\u89d2\u5ea6\u63d0\u4f9b\u5bf9\u504f\u597d\u5bf9\u9f50\u548c\u8d8a\u72f1\u73b0\u8c61\u7684\u7406\u8bba\u6d1e\u5bdf\u3002 
\u5728\u6211\u4eec\u7684\u6846\u67b6\u4e0b\uff0c\u9996\u5148\u8bc1\u660e\u4e86\u5982\u679c\u8bad\u7ec3\u8bed\u6599\u5e93\u4e2d\u5b58\u5728\u6709\u5bb3\u884c\u4e3a\uff0c\u9884\u8bad\u7ec3\u7684LLM\u4f1a\u6a21\u4eff\u8fd9\u79cd\u884c\u4e3a\u3002\u540c\u6837\u57fa\u4e8e\u8fd9\u4e2a\u6846\u67b6\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u7edf\u8ba1\u610f\u4e49\u4e0a\u7684\u5bf9\u9f50\u6982\u5ff5\uff0c\u5e76\u7ed9\u51fa\u4e86\u8d8a\u72f1\u6982\u7387\u7684\u4e0b\u754c\uff0c\u8868\u660e\u5728\u5408\u7406\u5047\u8bbe\u4e0b\uff0c\u8fd9\u79cd\u73b0\u8c61\u662f\u65e0\u6cd5\u907f\u514d\u7684\u3002\u57fa\u4e8e\u6211\u4eec\u7684\u89c1\u89e3\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u5bf9\u5f53\u524d\u666e\u904d\u91c7\u7528\u7684\u5bf9\u9f50\u7b56\u7565\u2014\u2014\u57fa\u4e8e\u4eba\u7c7b\u53cd\u9988\u7684\u5f3a\u5316\u5b66\u4e60\uff08RLHF\uff09\u7684\u6539\u8fdb\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u4e2a\u540d\u4e3aE-RLHF\u7684\u7b80\u5355\u4fee\u6539\u7248RLHF\u76ee\u6807\uff0c\u65e8\u5728\u63d0\u9ad8\u5b89\u5168\u54cd\u5e94\u7684\u53ef\u80fd\u6027\u3002E-RLHF\u4e0d\u4f1a\u589e\u52a0\u989d\u5916\u7684\u8bad\u7ec3\u6210\u672c\uff0c\u4e14\u4e0e\u5176\u5b83\u65b9\u6cd5\u517c\u5bb9\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u5728\u4e0d\u727a\u7272MT-Bench\u9879\u76ee\u8861\u91cf\u7684\u6a21\u578b\u6027\u80fd\u7684\u60c5\u51b5\u4e0b\uff0cE-RLHF\u5728AdvBench\u548cHarmBench\u9879\u76ee\u63d0\u51fa\u7684\u6240\u6709\u5bf9\u9f50\u95ee\u9898\u4e0a\u5747\u4f18\u4e8eRLHF\u3002|\n", "2408.01419": "|**2024-08-02**|**DebateQA: Evaluating Question Answering on Debatable Knowledge**|Rongwu Xu 
et.al.|[2408.01419](http://arxiv.org/abs/2408.01419)|**[link](https://github.com/pillowsofwind/debateqa)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u5174\u8d77\u4f7f\u5f97\u6211\u4eec\u80fd\u591f\u63a2\u8ba8\u5173\u4e8eLLM\u804a\u5929\u673a\u5668\u4eba\u4e0a\u56fa\u6709\u4e89\u8bae\u6027\u95ee\u9898\u7684\u7b54\u6848\uff0c\u8fd9\u9700\u8981\u4e00\u79cd\u53ef\u9760\u7684\u65b9\u5f0f\u6765\u8bc4\u4f30\u5b83\u4eec\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u4f20\u7edf\u95ee\u7b54\u57fa\u51c6\u5047\u8bbe\u56fa\u5b9a\u7684\u7b54\u6848\u5bf9\u6b64\u76ee\u7684\u800c\u8a00\u662f\u4e0d\u8db3\u7684\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u5f15\u5165\u4e86DebateQA\uff0c\u8fd9\u662f\u4e00\u4e2a\u5305\u542b2,941\u4e2a\u4e89\u8bae\u6027\u95ee\u9898\u7684\u6570\u636e\u96c6\uff0c\u6bcf\u4e2a\u95ee\u9898\u90fd\u9644\u5e26\u4e86\u591a\u4e2a\u7531\u4eba\u7c7b\u6ce8\u91ca\u7684\u7247\u6bb5\u7b54\u6848\uff0c\u8fd9\u4e9b\u7247\u6bb5\u7b54\u6848\u6355\u6349\u4e86\u5404\u79cd\u89c6\u89d2\u3002\u6211\u4eec\u5f00\u53d1\u4e86\u4e24\u4e2a\u5ea6\u91cf\u6807\u51c6\uff1a\u89c2\u70b9\u591a\u6837\u6027\uff0c\u7528\u4e8e\u8bc4\u4f30\u89c6\u89d2\u7684\u5168\u9762\u6027\uff1b\u4ee5\u53ca\u4e89\u8bae\u610f\u8bc6\uff0c\u7528\u4e8e\u8bc4\u4f30LLM\u662f\u5426\u8ba4\u8bc6\u5230\u95ee\u9898\u7684\u4e89\u8bae\u6027\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u8fd9\u4e24\u4e2a\u5ea6\u91cf\u6807\u51c6\u4e0e\u4eba\u7c7b\u504f\u597d\u4e00\u81f4\uff0c\u5e76\u4e14\u5728\u4e0d\u540c\u57fa\u7840\u6a21\u578b\u4e4b\u95f4\u5177\u6709\u7a33\u5b9a\u6027\u3002\u901a\u8fc7\u4f7f\u7528DebateQA\u548c\u8fd9\u4e24\u4e2a\u5ea6\u91cf\u6807\u51c6\uff0c\u6211\u4eec\u8bc4\u4f30\u4e8612\u79cd\u6d41\u884c\u7684LLM\u548c\u68c0\u7d22\u589e\u5f3a\u751f\u6210\u65b9\u6cd5\u3002\u6211\u4eec\u7684\u53d1\u73b0\u63ed\u793a\u4e86\u867d\u7136LLM\u901a\u5e38\u64c5\u957f\u8bc6\u522b\u4e89\u8bae\u6027\u95ee\u9898\uff0c\u4f46\u5b83\u4eec\u63d0\u4f9b\u5168\u9762\u7b54\u6848\u3001\u6db5\u76d6\u591a\
u6837\u89c6\u89d2\u7684\u80fd\u529b\u5b58\u5728\u663e\u8457\u5dee\u5f02\u3002|\n", "2408.01417": "|**2024-08-02**|**Talk Less, Interact Better: Evaluating In-context Conversational Adaptation in Multimodal LLMs**|Yilun Hua et.al.|[2408.01417](http://arxiv.org/abs/2408.01417)|null|\u4eba\u7c7b\u5728\u5bf9\u8bdd\u8fc7\u7a0b\u4e2d\u4f1a\u81ea\u53d1\u5730\u4f7f\u7528\u8d8a\u6765\u8d8a\u9ad8\u6548\u7684\u8bed\u8a00\uff0c\u901a\u8fc7\u9002\u5e94\u5e76\u5f62\u6210\u81ea\u5b9a\u4e49\u7684\u7ea6\u5b9a\u3002\u8fd9\u4e00\u73b0\u8c61\u5df2\u7ecf\u901a\u8fc7\u53c2\u8003\u6e38\u620f\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u7814\u7a76\uff0c\u5c55\u793a\u4e86\u4eba\u7c7b\u8bed\u8a00\u8d85\u8d8a\u4f20\u8fbe\u610f\u56fe\u7684\u7279\u6027\u3002\u76ee\u524d\uff0c\u6211\u4eec\u5c1a\u672a\u63a2\u7d22\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\u662f\u5426\u5728\u4ea4\u4e92\u4e2d\u540c\u6837\u63d0\u9ad8\u4e86\u6c9f\u901a\u6548\u7387\uff0c\u5e76\u4e14\u5b83\u4eec\u53ef\u80fd\u91c7\u7528\u4f55\u79cd\u673a\u5236\u5b9e\u73b0\u8fd9\u4e00\u76ee\u7684\u3002 
\u6211\u4eec\u5f15\u5165\u4e86ICCA\u6846\u67b6\uff0c\u8fd9\u662f\u4e00\u4e2a\u81ea\u52a8\u5316\u7684\u8bc4\u4f30\u65b9\u6cd5\uff0c\u7528\u4e8e\u5728MLLM\u4e2d\u8bc4\u4f30\u6b64\u7c7b\u5bf9\u8bdd\u9002\u5e94\u4f5c\u4e3a\u4e0a\u4e0b\u6587\u884c\u4e3a\u7684\u80fd\u529b\u3002\u6211\u4eec\u5bf9\u51e0\u79cd\u6700\u5148\u8fdb\u7684MLLM\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u89c2\u5bdf\u5230\u867d\u7136\u5b83\u4eec\u53ef\u80fd\u7406\u89e3\u5176\u5bf9\u8bdd\u4f19\u4f34\u7684\u8bed\u8a00\u8d8a\u6765\u8d8a\u9ad8\u6548\uff0c\u4f46\u5b83\u4eec\u672c\u8eab\u5e76\u4e0d\u81ea\u53d1\u5730\u5728\u65f6\u95f4\u4e0a\u4f7f\u81ea\u5df1\u7684\u8bed\u8a00\u53d8\u5f97\u66f4\u9ad8\u6548\u3002\u8fd9\u79cd\u80fd\u529b\u4ec5\u5728\u67d0\u4e9b\u6a21\u578b\uff08\u5982GPT-4\uff09\u4e2d\u53ef\u4ee5\u901a\u8fc7\u5f3a\u70c8\u7684\u63d0\u793a\u6765\u6fc0\u53d1\u3002\u8fd9\u8868\u660e\uff0c\u5373\u4f7f\u8fd9\u662f\u4eba\u7c7b\u8bed\u8a00\u7684\u5e38\u89c1\u7279\u5f81\uff0c\u5f53\u524d\u7684\u8bad\u7ec3\u5236\u5ea6\u5e76\u4e0d\u80fd\u4ea7\u751f\u8fd9\u4e00\u4e92\u52a8\u5c5e\u6027\u3002 ICCA\u6846\u67b6\u5df2\u5f00\u6e90\u53d1\u5e03\u4e8ehttps://github.com/lil-lab/ICCA\u3002|\n", "2408.01380": "|**2024-08-02**|**Coalitions of Large Language Models Increase the Robustness of AI Agents**|Prattyush Mangal 
et.al.|[2408.01380](http://arxiv.org/abs/2408.01380)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u5174\u8d77\u4ece\u6839\u672c\u4e0a\u6539\u53d8\u4e86\u6211\u4eec\u4e0e\u6570\u5b57\u7cfb\u7edf\u4e92\u52a8\u7684\u65b9\u5f0f\uff0c\u5e76\u63a8\u52a8\u4e86\u5bf9\u501f\u52a9\u4e8e\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684AI\u4ee3\u7406\u4ee5\u8f85\u52a9\u65e5\u5e38\u6d41\u7a0b\u7684\u7814\u7a76\u3002\u5c3d\u7ba1LLM\u5177\u6709\u5f3a\u5927\u7684\u80fd\u529b\u5e76\u80fd\u591f\u8868\u73b0\u51fa\u4e00\u4e9b\u6d8c\u73b0\u7279\u6027\uff0c\u4f46\u5b83\u4eec\u5e76\u975e\u903b\u8f91\u63a8\u7406\u8005\uff0c\u5f80\u5f80\u5728AI\u4ee3\u7406\u6267\u884c\u5de5\u4f5c\u6d41\u7a0b\u65f6\u6240\u6d89\u53ca\u7684\u6240\u6709\u5b50\u4efb\u52a1\u4e0a\u8868\u73b0\u4e0d\u4f73\u3002\u73b0\u6709\u7814\u7a76\u901a\u8fc7\u5927\u89c4\u6a21\u7684\u4e00\u822c\u6027\u9884\u8bad\u7ec3\u6216\u9488\u5bf9\u5de5\u5177\u4f7f\u7528\u8fdb\u884c\u4e13\u95e8\u7684\u5fae\u8c03\u6765\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u800c\u6211\u4eec\u8bc4\u4f30\u4e86\u4e00\u4e2a\u7531\u4e13\u6ce8\u4e8e\u7279\u5b9a\u5b50\u4efb\u52a1\u7684\u9884\u8bad\u7ec3\u6a21\u578b\u7ec4\u6210\u7684\u8054\u76df\u662f\u5426\u80fd\u4e0e\u5355\u4e00\u6a21\u578b\u4ee3\u7406\u7684\u8868\u73b0\u76f8\u5339\u654c\u3002\u8054\u76df\u6a21\u578b\u7684\u65b9\u6cd5\u5c55\u793a\u4e86\u5176\u5728\u6784\u5efa\u9c81\u68d2\u6027\u548c\u964d\u4f4e\u8fd9\u4e9bAI\u4ee3\u7406\u8fd0\u884c\u6210\u672c\u65b9\u9762\u7684\u6f5c\u529b\uff0c\u901a\u8fc7\u5229\u7528\u7279\u5b9a\u6a21\u578b\u5c55\u73b0\u7684\u7279\u6027\u3002\u6211\u4eec\u7684\u53d1\u73b0\u8868\u660e\uff0c\u901a\u8fc7\u8003\u8651\u4e00\u7ec4\u9884\u8bad\u7ec3\u6a21\u578b\uff0c\u53ef\u4ee5\u51cf\u8f7b\u5fae\u8c03\u7684\u9700\u6c42\uff0c\u5e76\u76f8\u4fe1\u8fd9\u79cd\u65b9\u6cd5\u53ef\u4ee5\u5e94\u7528\u4e8e\u5176\u4ed6\u5229\u7528LLM\u7684\u975e\u4ee3\u7406\u7cfb\u7edf\u3002|\n", "2408.01363": "|**2024-08-02**|**Toward Automatic Relevance Judgment using 
Vision--Language Models for Image--Text Retrieval Evaluation**|Jheng-Hong Yang et.al.|[2408.01363](http://arxiv.org/abs/2408.01363)|null|\u672c\u6587\u5bf9\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08VLMs\uff09\u5728\u8fdb\u884c\u76f8\u5173\u6027\u8bc4\u4f30\u65b9\u9762\u7684\u6f5c\u529b\u8fdb\u884c\u4e86\u63a2\u7d22\u3002\u901a\u8fc7\u8bbe\u8ba1\u4e00\u4e2a\u9488\u5bf9\u591a\u5a92\u4f53\u5185\u5bb9\u521b\u4f5c\u7684\u5927\u578b\u96f6\u6837\u672c\u68c0\u7d22\u4efb\u52a1\uff0c\u8bc4\u4f30\u4e86CLIP\u3001LLaVA\u548cGPT-4V\u7b49VLM\u7684\u6027\u80fd\u3002\u521d\u6b65\u5b9e\u9a8c\u7ed3\u679c\u5982\u4e0b\uff1a 1. **\u6027\u80fd\u6bd4\u8f83**\uff1a\u5728\u4e0e\u4eba\u7c7b\u5224\u65ad\u7684\u76f8\u5173\u6027\u4e0a\uff0cLLaVA\u548cGPT-4V\uff08\u5305\u62ec\u5f00\u6e90\u548c\u4e13\u6709\u89c6\u89c9\u6307\u4ee4\u8c03\u4f18\u7684\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff09\u53d6\u5f97\u4e86\u663e\u8457\u7684Kendall\u2019s \u03c4\u22480.4\u7684\u6210\u7ee9\uff0c\u8d85\u8fc7\u4e86CLIPScore\u6307\u6807\u3002 2. **\u504f\u597d\u4e0e\u504f\u89c1**\uff1a\u5c3d\u7ba1CLIPScore\u8868\u73b0\u7a81\u51fa\uff0c\u4f46LLMs\u5728\u504f\u89c1\u65b9\u9762\u76f8\u5bf9\u8f83\u5c11\u503e\u5411\u4e8e\u57fa\u4e8eCLIP\u7684\u68c0\u7d22\u7cfb\u7edf\u3002 3. 
**\u4e00\u81f4\u6027\u5206\u6790**\uff1aGPT-4V\u7684\u8bc4\u5206\u5206\u5e03\u4e0e\u4eba\u7c7b\u5224\u65ad\u66f4\u4e3a\u4e00\u81f4\uff0c\u5176Cohen\u2019s \u03ba\u503c\u7ea6\u4e3a0.08\uff0c\u8fdc\u9ad8\u4e8eCLIPScore\u7684\u7ea6-0.096\u3002\u8fd9\u4e00\u53d1\u73b0\u8868\u660e\uff0c\u57fa\u4e8eLLM\u7684VLM\u5728\u589e\u5f3a\u76f8\u5173\u6027\u8bc4\u4f30\u65b9\u9762\u5177\u6709\u6f5c\u529b\u3002 \u672c\u7814\u7a76\u63ed\u793a\u4e86\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\u5728\u76f8\u5173\u6027\u8bc4\u4f30\u4efb\u52a1\u4e2d\u7684\u5e94\u7528\u4ef7\u503c\uff0c\u7279\u522b\u662f\u5f53\u5b83\u4eec\u88ab\u7528\u4e8e\u96f6\u6837\u672c\u68c0\u7d22\u4efb\u52a1\u65f6\u3002\u901a\u8fc7\u6bd4\u8f83\u4e0d\u540c\u6a21\u578b\u7684\u6027\u80fd\uff0c\u7814\u7a76\u5f3a\u8c03\u4e86LLMs\u5728\u591a\u5a92\u4f53\u5185\u5bb9\u521b\u5efa\u9886\u57df\u5185\u7684\u6f5c\u5728\u4f18\u52bf\uff0c\u5e76\u6307\u51fa\u4e86\u5b83\u4eec\u5728\u63d0\u5347\u5185\u5bb9\u76f8\u5173\u6027\u5224\u65ad\u65b9\u9762\u7684\u53ef\u80fd\u6027\u3002|\n", "2408.01355": "|**2024-08-02**|**Hallu-PI: Evaluating Hallucination in Multi-modal Large Language Models within Perturbed Inputs**|Peng Ding 
et.al.|[2408.01355](http://arxiv.org/abs/2408.01355)|**[link](https://github.com/njunlp/hallu-pi)**|\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\u5728\u89c6\u89c9\u8bed\u8a00\u7406\u89e3\u4e0e\u751f\u6210\u4efb\u52a1\u4e0a\u5c55\u73b0\u51fa\u5353\u8d8a\u6027\u80fd\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u6a21\u578b\u5076\u5c14\u4f1a\u4ea7\u751f\u4e0e\u7ed9\u5b9a\u56fe\u50cf\u4e0d\u4e00\u81f4\u7684\u5185\u5bb9\uff0c\u5373\u6240\u8c13\u7684\u201c\u5e7b\u89c9\u201d\u3002\u5148\u524d\u7684\u7814\u7a76\u4e3b\u8981\u96c6\u4e2d\u5728\u4f7f\u7528\u6807\u51c6\u3001\u672a\u6270\u52a8\u57fa\u51c6\u8bc4\u4f30\u5e7b\u89c9\u4e0a\uff0c\u8fd9\u5ffd\u89c6\u4e86\u73b0\u5b9e\u4e16\u754c\u573a\u666f\u4e2d\u666e\u904d\u5b58\u5728\u7684\u6270\u52a8\u8f93\u5165\uff08\u5982\u56fe\u50cf\u88c1\u526a\u6216\u6a21\u7cca\uff09\uff0c\u8fd9\u662f\u5bf9\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\u5e7b\u89c9\u5168\u9762\u8bc4\u4f30\u7684\u5173\u952e\u3002 \u672c\u7bc7\u8bba\u6587\u65e8\u5728\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u63d0\u51fa\u4e86Hallu-PI\uff0c\u9996\u4e2a\u4e13\u95e8\u7528\u4e8e\u8bc4\u4f30\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\u5728\u6270\u52a8\u8f93\u5165\u4e0b\u7684\u5e7b\u89c9\u7684\u57fa\u51c6\u3002Hallu-PI\u5305\u542b\u4e867\u79cd\u6270\u52a8\u60c5\u666f\uff0c\u6d89\u53ca1,260\u5f20\u6765\u81ea11\u79cd\u7269\u4f53\u7c7b\u578b\u7684\u6270\u52a8\u56fe\u50cf\u3002\u6bcf\u5f20\u56fe\u50cf\u90fd\u9644\u6709\u8be6\u7ec6\u7684\u6ce8\u91ca\uff0c\u5305\u62ec\u7cbe\u7ec6\u7c92\u5ea6\u7684\u5e7b\u89c9\u7c7b\u578b\uff0c\u5982\u5b58\u5728\u6027\u3001\u5c5e\u6027\u548c\u5173\u7cfb\u7b49\u3002\u8fd9\u4e9b\u6ce8\u91ca\u914d\u5907\u4e86\u4e00\u4e2a\u4e30\u5bcc\u7684\u95ee\u7b54\u96c6\uff0c\u4f7fHallu-PI\u9002\u7528\u4e8e\u8fa8\u522b\u6027\u548c\u751f\u6210\u6027\u4efb\u52a1\u3002 \u5728\u5bf9\u4e3b\u6d41\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\uff08\u5982GPT-4V\u548cGemini-Pro 
Vision\uff09\u8fdb\u884c\u7684\u5e7f\u6cdb\u5b9e\u9a8c\u4e2d\uff0c\u6211\u4eec\u53d1\u73b0\u8fd9\u4e9b\u6a21\u578b\u5728Hallu-PI\u4e0a\u7684\u8868\u73b0\u663e\u793a\u51fa\u663e\u8457\u7684\u5e7b\u89c9\uff0c\u800c\u5728\u672a\u6270\u52a8\u573a\u666f\u4e2d\u672a\u89c2\u5bdf\u5230\u6b64\u7c7b\u73b0\u8c61\u3002\u6211\u4eec\u7684\u7814\u7a76\u8fd8\u63ed\u793a\u4e86\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\u5904\u7406\u4e0d\u540c\u7c7b\u578b\u5e7b\u89c9\u65f6\u5b58\u5728\u7684\u4e25\u91cd\u504f\u5dee\u3002 \u4e3a\u6b64\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e24\u4e2a\u4e13\u95e8\u9488\u5bf9\u6270\u52a8\u60c5\u666f\u7684\u57fa\u7ebf\uff0c\u5206\u522b\u4e3aPerturbed-Reminder\u548cPerturbed-ICL\u3002\u6211\u4eec\u5e0c\u671b\u8fd9\u9879\u7814\u7a76\u80fd\u5f15\u8d77\u7814\u7a76\u4eba\u5458\u5bf9\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\u5728\u5904\u7406\u6270\u52a8\u8f93\u5165\u65f6\u5c40\u9650\u6027\u7684\u5173\u6ce8\uff0c\u5e76\u6fc0\u53d1\u8fdb\u4e00\u6b65\u7684\u8c03\u67e5\u4ee5\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\u3002\u6211\u4eec\u7684\u4ee3\u7801\u548c\u6570\u636e\u96c6\u5df2\u5728GitHub\uff08https://github.com/NJUNLP/Hallu-PI\uff09\u4e0a\u516c\u5f00\u63d0\u4f9b\u3002|\n", "2408.01354": "|**2024-08-02**|**MCGMark: An Encodable and Robust Online Watermark for LLM-Generated Malicious Code**|Kaiwen Ning 
et.al.|[2408.01354](http://arxiv.org/abs/2408.01354)|**[link](https://github.com/KevinHeiwa/MCGTM)**|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u5174\u8d77\uff0c\u4f17\u591a\u8f6f\u4ef6\u670d\u52a1\u63d0\u4f9b\u5546\uff08SSP\uff09\u81f4\u529b\u4e8e\u5f00\u53d1\u9488\u5bf9\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u7684\u5b9a\u5236\u5316LLM\uff0c\u5982CodeLlama\u548cCopilot\u3002\u7136\u800c\uff0c\u8fd9\u4e9bLLM\u6709\u53ef\u80fd\u88ab\u653b\u51fb\u8005\u5229\u7528\u6765\u751f\u6210\u6076\u610f\u8f6f\u4ef6\uff0c\u5bf9\u8f6f\u4ef6\u751f\u6001\u7cfb\u7edf\u6784\u6210\u6f5c\u5728\u5a01\u80c1\uff0c\u4f8b\u5982\u81ea\u52a8\u5316\u9ad8\u7ea7\u7f51\u7edc\u9493\u9c7c\u6076\u610f\u8f6f\u4ef6\u7684\u521b\u5efa\u3002\u4e3a\u5e94\u5bf9\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u9996\u5148\u8fdb\u884c\u4e86\u4e00\u9879\u5b9e\u8bc1\u7814\u7a76\uff0c\u5e76\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u5305\u542b\u7ea6400\u5c0f\u65f6\u5de5\u4f5c\u91cf\u3001\u5171\u8ba1406\u4e2a\u6076\u610f\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u7684\u63d0\u793a\u6570\u636e\u96c6MCGTest\u3002\u5229\u7528\u8fd9\u4e2a\u6570\u636e\u96c6\uff0c\u6211\u4eec\u63d0\u51fa\u4e86MCGMark\uff0c\u8fd9\u662f\u9996\u4e2a\u80fd\u591f\u5b9e\u73b0\u7a33\u5065\u3001\u7ed3\u6784\u611f\u77e5\u4e14\u53ef\u7f16\u7801\u7684\u6c34\u5370\u65b9\u6cd5\uff0c\u7528\u4e8e\u8ffd\u8e2a\u7531LLM\u751f\u6210\u7684\u6076\u610f\u4ee3\u7801\u3002\u6211\u4eec\u901a\u8fc7\u63a7\u5236\u4ee4\u724c\u9009\u62e9\u548c\u57fa\u4e8e\u6982\u7387\u5f02\u5e38\u503c\u786e\u4fdd\u8f93\u51fa\u8d28\u91cf\u6765\u5d4c\u5165\u53ef\u7f16\u7801\u4fe1\u606f\u3002\u6b64\u5916\uff0c\u6211\u4eec\u901a\u8fc7\u8003\u8651\u6076\u610f\u4ee3\u7801\u7684\u7ed3\u6784\u7279\u5f81\u589e\u5f3a\u4e86\u6c34\u5370\u7684\u9c81\u68d2\u6027\uff0c\u907f\u514d\u5728\u6613\u4e8e\u4fee\u6539\u7684\u4f4d\u7f6e\uff08\u5982\u6ce8\u91ca\uff09\u5d4c\u5165\u6c34\u5370\u3002\u6211\u4eec\u4f7f\u7528DeepSeek-Coder\u9a8c\u8bc1\u4e86MCGMark\u7684\u6709\u6548\u6027\u548c\u9c81\u68d2\u6027\uf
f0c\u5176\u6700\u5927\u8f93\u51fa\u9650\u5236\u4e3a400\u4e2a\u4ee4\u724c\u65f6\uff0c\u5d4c\u5165\u6210\u529f\u7387\u8fbe\u5230\u4e8688.9%\u3002\u540c\u65f6\uff0c\u8be5\u65b9\u6cd5\u4e5f\u5c55\u793a\u4e86\u5f3a\u5927\u7684\u9c81\u68d2\u6027\uff0c\u5e76\u5bf9\u8f93\u51fa\u4ee3\u7801\u7684\u8d28\u91cf\u5f71\u54cd\u6781\u5c0f\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5e2e\u52a9SSP\u8ffd\u8e2a\u5e76\u8ffd\u7a76\u7531LLM\u751f\u6210\u7684\u6076\u610f\u4ee3\u7801\u7684\u6e90\u5934\u53ca\u8d23\u4efb\u3002|\n", "2408.01346": "|**2024-08-02**|**Prompt Refinement or Fine-tuning? Best Practices for using LLMs in Computational Social Science Tasks**|Anders Giovanni M\u00f8ller et.al.|[2408.01346](http://arxiv.org/abs/2408.01346)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\u662f\u4fc3\u8fdb\u793e\u4f1a\u8ba1\u7b97\u9886\u57df\u590d\u6742\u6587\u672c\u7406\u89e3\u4efb\u52a1\u7684\u6709\u529b\u5de5\u5177\u3002\u5b83\u4eec\u7684\u591a\u529f\u80fd\u6027\u867d\u7136\u6709\u76ca\uff0c\u4f46\u4e5f\u5e26\u6765\u4e86\u5728\u8be5\u9886\u57df\u5efa\u7acb\u6807\u51c6\u5316\u6700\u4f73\u5b9e\u8df5\u7684\u969c\u788d\u3002\u4e3a\u4e86\u63d0\u4f9b\u4e0d\u540c\u7b56\u7565\u4ef7\u503c\u7684\u6e05\u6670\u5ea6\uff0c\u6211\u4eec\u6982\u8ff0\u4e86\u73b0\u4ee3\u57fa\u4e8eLLM\u7684\u5206\u7c7b\u65b9\u6cd5\u572823\u4e2a\u793e\u4f1a\u77e5\u8bc6\u4efb\u52a1\u57fa\u51c6\u4e0a\u7684\u6027\u80fd\u3002\u6211\u4eec\u7684\u7ed3\u679c\u6307\u51fa\u4e86\u4e09\u4e2a\u6700\u4f73\u5b9e\u8df5\uff1a\u9009\u62e9\u5177\u6709\u66f4\u5927\u8bcd\u6c47\u91cf\u548c\u9884\u8bad\u7ec3\u8bed\u6599\u5e93\u7684\u6a21\u578b\uff1b\u907f\u514d\u7b80\u5355\u7684\u96f6\u6b21\u5c1d\u8bd5\uff0c\u800c\u503e\u5411\u4e8e\u589e\u5f3a\u63d0\u793a\u7684\u4eba\u5de5\u667a\u80fd\u65b9\u6cd5\uff1b\u5728\u7279\u5b9a\u4efb\u52a1\u6570\u636e\u4e0a\u8fdb\u884c\u5fae\u8c03\uff0c\u5e76\u8003\u8651\u5728\u591a\u4e2a\u6570\u636e\u96c6\u4e0a\u4f7f\u7528\u66f4\u590d\u6742\u7684\u6307\u4ee4\u8c03\u6574\uff0c\u4ec5\u5f53\u8bad\u7ec3\u6570\u636e\u66f4\u4e3a\u4e30\u5
bcc\u65f6\u624d\u8fd9\u6837\u505a\u3002|\n", "2408.01334": "|**2024-08-02**|**A Backbone for Long-Horizon Robot Task Understanding**|Xiaoshuai Chen et.al.|[2408.01334](http://arxiv.org/abs/2408.01334)|null|\u4e3a\u4e86\u5e94\u5bf9\u957f\u65f6\u7a0b\u4efb\u52a1\u4e2d\u7aef\u5230\u7aef\u673a\u5668\u4eba\u5b66\u4e60\u7684\u4e0d\u53ef\u9884\u6d4b\u6027\u4e0e\u6cdb\u5316\u80fd\u529b\u5dee\u7684\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8eTherblig\u7684\u9aa8\u67b6\u6846\u67b6\uff08TBBF\uff09\uff0c\u65e8\u5728\u589e\u5f3a\u673a\u5668\u4eba\u4efb\u52a1\u7406\u89e3\u4e0e\u8f6c\u79fb\u80fd\u529b\u3002\u6b64\u6846\u67b6\u5229\u7528Therblig\uff08\u57fa\u672c\u52a8\u4f5c\u5143\u7d20\uff09\u4f5c\u4e3a\u9aa8\u67b6\uff0c\u5c06\u9ad8\u7ea7\u673a\u5668\u4eba\u4efb\u52a1\u5206\u89e3\u4e3a\u57fa\u672c\u673a\u5668\u4eba\u914d\u7f6e\uff0c\u7136\u540e\u7ed3\u5408\u5f53\u524d\u7684\u57fa\u7840\u6a21\u578b\u6765\u63d0\u5347\u4efb\u52a1\u7406\u89e3\u3002 \u8be5\u65b9\u6cd5\u5305\u542b\u4e24\u4e2a\u9636\u6bb5\uff1a\u79bb\u7ebf\u8bad\u7ec3\u4e0e\u5728\u7ebf\u6d4b\u8bd5\u3002\u5728\u79bb\u7ebf\u8bad\u7ec3\u9636\u6bb5\uff0c\u6211\u4eec\u5f00\u53d1\u4e86Meta-RGate SynerFusion\uff08MGSF\uff09\u7f51\u7edc\uff0c\u7528\u4e8e\u8de8\u4efb\u52a1\u7cbe\u786e\u7684Therblig\u5206\u5272\u3002\u5728\u7ebf\u6d4b\u8bd5\u9636\u6bb5\uff0c\u901a\u8fc7\u6536\u96c6\u65b0\u4efb\u52a1\u7684\u4e00\u6b21\u6f14\u793a\uff0cMGSF\u7f51\u7edc\u63d0\u53d6\u9ad8\u9636\u77e5\u8bc6\uff0c\u5e76\u901a\u8fc7Action Registration\uff08ActionREG\uff09\u7f16\u7801\u5165\u56fe\u50cf\u3002\u6b64\u5916\uff0c\u6211\u4eec\u91c7\u7528Large Language Model\uff08LLM\uff09-Alignment Policy for Visual 
Correction\uff08LAP-VC\uff09\u6765\u786e\u4fdd\u7cbe\u786e\u7684\u52a8\u4f5c\u6267\u884c\uff0c\u4ece\u800c\u5728\u65b0\u578b\u673a\u5668\u4eba\u573a\u666f\u4e2d\u5b9e\u73b0\u8f68\u8ff9\u8f6c\u79fb\u3002 \u5b9e\u9a8c\u7ed3\u679c\u8bc1\u5b9e\u4e86\u8fd9\u4e9b\u65b9\u6cd5\u7684\u6709\u6548\u6027\uff0cTherblig\u5206\u5272\u8fbe\u5230\u4e8694.37%\u7684\u53ec\u56de\u7387\uff0c\u5728\u771f\u5b9e\u4e16\u754c\u4e2d\u7684\u5728\u7ebf\u673a\u5668\u4eba\u6d4b\u8bd5\u4e2d\uff0c\u5bf9\u4e8e\u7b80\u5355\u548c\u590d\u6742\u573a\u666f\u7684\u6210\u529f\u7387\u5206\u522b\u8fbe\u5230\u4e8694.4%\u548c80%\u3002\u8865\u5145\u6750\u6599\u53ef\u5728\u4ee5\u4e0b\u7f51\u7ad9\u83b7\u53d6\uff1ahttps://sites.google.com/view/therbligsbasedbackbone/home|\n", "2408.02651": "|**2024-08-05**|**Can Reinforcement Learning Unlock the Hidden Dangers in Aligned Large Language Models?**|Mohammad Bahrami Karkevandi et.al.|[2408.02651](http://arxiv.org/abs/2408.02651)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u81ea\u7136\u8bed\u8a00\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u4ee4\u4eba\u5370\u8c61\u6df1\u523b\u7684\u80fd\u529b\uff0c\u4f46\u5b83\u4eec\u7684\u5b89\u5168\u6027\u548c\u9053\u5fb7\u6027\u4ecd\u7136\u5b58\u5728\u4e89\u8bae\uff0c\u56e0\u4e3a\u5b83\u4eec\u7684\u8bad\u7ec3\u57fa\u4e8e\u4e92\u8054\u7f51\u6587\u672c\u8bed\u6599\u5e93\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u62c5\u5fe7\uff0c\u5df2\u7ecf\u5f00\u53d1\u4e86\u5bf9\u9f50\u6280\u672f\u6765\u63d0\u9ad8\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u516c\u5171\u53ef\u7528\u6027\u548c\u5b89\u5168\u6027\u3002\u7136\u800c\uff0c\u901a\u8fc7\u8fd9\u4e9b\u6a21\u578b\u751f\u6210\u6709\u5bb3\u5185\u5bb9\u7684\u53ef\u80fd\u6027\u4f3c\u4e4e\u4ecd\u7136\u5b58\u5728\u3002\u672c\u6587\u63a2\u8ba8\u4e86\u201c\u53cd\u5411\u5bf9\u9f50\u201dLLM\u7684\u6982\u5ff5\u2014\u2014\u5229\u7528\u5bf9\u6297\u89e6\u53d1\u5668\u9006\u8f6c\u5176\u5bf9\u9f50\u8fc7\u7a0b\u3002\u5148\u524d\u7684\u65b9\u6cd5\uff0c\u5982\u8f6f\u5d4c\u5165\u63d0\u793a\u3001\u624b\u52
a8\u6784\u5efa\u7684\u63d0\u793a\u548c\u57fa\u4e8e\u68af\u5ea6\u7684\u81ea\u52a8\u63d0\u793a\uff0c\u5728\u9ed1\u76d2\u6a21\u578b\u4e0a\u7531\u4e8e\u9700\u8981\u8bbf\u95ee\u6a21\u578b\u548c\u4ea7\u751f\u6709\u9650\u7684\u624b\u52a8\u6784\u5efa\u63d0\u793a\u7684\u9700\u6c42\u800c\u53d6\u5f97\u4e86\u6709\u9650\u7684\u6210\u529f\uff0c\u8fd9\u4f7f\u5f97\u5b83\u4eec\u5bb9\u6613\u88ab\u963b\u65ad\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u65b9\u6cd5\uff0c\u4f7f\u7528\u5f3a\u5316\u5b66\u4e60\u4f18\u5316\u5bf9\u6297\u89e6\u53d1\u5668\uff0c\u4ec5\u9700\u5bf9\u76ee\u6807\u6a21\u578b\u8fdb\u884c\u63a8\u7406API\u8bbf\u95ee\u4ee5\u53ca\u4e00\u4e2a\u5c0f\u578b\u4ee3\u7406\u6a21\u578b\u5373\u53ef\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5229\u7528BERTScore\u4e3a\u57fa\u7840\u7684\u5956\u52b1\u51fd\u6570\uff0c\u589e\u5f3a\u4e86\u5bf9\u6297\u89e6\u53d1\u5668\u5728\u65b0\u9ed1\u76d2\u6a21\u578b\u4e0a\u7684\u53ef\u79fb\u690d\u6027\u548c\u6709\u6548\u6027\u3002\u6211\u4eec\u5c55\u793a\u4e86\u8fd9\u79cd\u65b9\u6cd5\u5982\u4f55\u5728\u672a\u6d4b\u8bd5\u7684\u8bed\u8a00\u6a21\u578b\u4e0a\u63d0\u9ad8\u4e86\u5bf9\u6297\u89e6\u53d1\u5668\u7684\u8868\u73b0\u3002|\n", "2408.02632": "|**2024-08-05**|**SEAS: Self-Evolving Adversarial Safety Optimization for Large Language Models**|Muxi Diao 
et.al.|[2408.02632](http://arxiv.org/abs/2408.02632)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u80fd\u529b\u4e0e\u5f71\u54cd\u529b\u7684\u6301\u7eed\u589e\u5f3a\uff0c\u786e\u4fdd\u5176\u5b89\u5168\u6027\u548c\u9884\u9632\u6709\u5bb3\u8f93\u51fa\u53d8\u5f97\u81f3\u5173\u91cd\u8981\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e9b\u5173\u5207\uff0c\u4e00\u79cd\u6709\u524d\u666f\u7684\u65b9\u6cd5\u662f\u8bad\u7ec3\u6a21\u578b\u81ea\u52a8\u751f\u6210\u5bf9\u6297\u6027\u63d0\u793a\u8fdb\u884c\u7ea2\u961f\u6d4b\u8bd5\u3002\u7136\u800c\uff0cLLM\u4e2d\u6f0f\u6d1e\u7684\u4e0d\u65ad\u6f14\u53d8\u4f7f\u5f97\u5f53\u524d\u7684\u5bf9\u6297\u65b9\u6cd5\u5728\u5177\u4f53\u9488\u5bf9\u548c\u63a2\u7d22\u8fd9\u4e9b\u6a21\u578b\u5f31\u70b9\u65b9\u9762\u663e\u5f97\u529b\u4e0d\u4ece\u5fc3\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u6311\u6218\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u201c\u81ea\u6211\u6f14\u5316\u5b89\u5168\u4f18\u5316\u201d\uff08SEAS\uff09\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u901a\u8fc7\u5229\u7528\u6a21\u578b\u81ea\u8eab\u751f\u6210\u7684\u6570\u636e\u6765\u589e\u5f3a\u5b89\u5168\u6027\u3002SEAS\u8fd0\u4f5c\u4e8e\u4e09\u4e2a\u8fed\u4ee3\u9636\u6bb5\uff1a\u521d\u59cb\u5316\u3001\u653b\u51fb\u548c\u5bf9\u6297\u4f18\u5316\uff0c\u65e8\u5728\u540c\u65f6\u63d0\u5347\u7ea2\u961f\u548c\u76ee\u6807\u6a21\u578b\u7684\u7a33\u5065\u6027\u548c\u5b89\u5168\u6027\u3002 
\u8be5\u6846\u67b6\u51cf\u5c11\u4e86\u5bf9\u4eba\u5de5\u6d4b\u8bd5\u7684\u4f9d\u8d56\uff0c\u5e76\u663e\u8457\u589e\u5f3a\u4e86LLM\u7684\u5b89\u5168\u6027\u80fd\u529b\u3002\u6211\u4eec\u7684\u8d21\u732e\u5305\u62ec\u4e00\u4e2a\u65b0\u9896\u7684\u5bf9\u6297\u6027\u6846\u67b6\u3001\u4e00\u4e2a\u5168\u9762\u7684\u5b89\u5168\u6570\u636e\u96c6\u4ee5\u53ca\u7ecf\u8fc7\u4e09\u6b21\u8fed\u4ee3\u540e\uff0c\u76ee\u6807\u6a21\u578b\u7684\u5b89\u5168\u6c34\u5e73\u8fbe\u5230\u4e86\u4e0eGPT-4\u76f8\u5f53\u7684\u6c34\u5e73\uff0c\u800c\u7ea2\u961f\u6a21\u578b\u5728\u5bf9\u6297\u9ad8\u7ea7\u6a21\u578b\u65f6\u7684\u6210\u529f\u7387\uff08ASR\uff09\u6709\u4e86\u663e\u8457\u63d0\u9ad8\u3002|\n", "2408.02599": "|**2024-08-05**|**Progressively Selective Label Enhancement for Language Model Alignment**|Biao Liu et.al.|[2408.02599](http://arxiv.org/abs/2408.02599)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u5404\u79cd\u8bed\u8a00\u4efb\u52a1\u4e0a\u5c55\u73b0\u51fa\u4ee4\u4eba\u5370\u8c61\u6df1\u523b\u7684\u80fd\u529b\uff0c\u4f46\u53ef\u80fd\u4f1a\u751f\u6210\u4e0e\u4eba\u7c7b\u9884\u671f\u4e0d\u7b26\u7684\u5185\u5bb9\uff0c\u4ece\u800c\u5f15\u53d1\u4f26\u7406\u548c\u6cd5\u5f8b\u95ee\u9898\u3002\u56e0\u6b64\uff0c\u63a2\u7d22\u8fd9\u4e9b\u6a21\u578b\u7684\u5c40\u9650\u6027\u5e76\u5b9e\u65bd\u9650\u5236\u4ee5\u786e\u4fdd\u5b89\u5168\u6027\u548c\u5408\u89c4\u6027\u53d8\u5f97\u81f3\u5173\u91cd\u8981\uff0c\u5176\u4e2d\u5f3a\u5316\u5b66\u4e60\u4ece\u4eba\u7c7b\u53cd\u9988\uff08RLHF\uff09\u662f\u4e3b\u8981\u65b9\u6cd5\u3002\u7136\u800c\uff0c\u7531\u4e8eRLHF\u9636\u6bb5\u5728\u7a33\u5b9a\u6027\u548c\u53ef\u6269\u5c55\u6027\u65b9\u9762\u9762\u4e34\u7684\u6311\u6218\uff0c\u7814\u7a76\u4eba\u5458\u6b63\u5728\u63a2\u7d22\u5176\u4ed6\u65b9\u6cd5\u6765\u5b9e\u73b0\u4e0eRLHF\u7c7b\u4f3c\u7684\u6548\u679c\u3002\u8fd9\u4e9b\u65b9\u6cd5\u5f80\u5f80\u4f9d\u8d56\u4e8e\u5927\u91cf\u9ad8\u8d28\u91cf\u7684\u6570\u636e\u96c6\uff0c\u5e76\u4e14\u4f4e\u6548\u5730\u5229\u7528\u751f\u6210\u7684\u6570\u636e\u3002\u4e
3a\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aPSLE\uff08Progressively Selective Label Enhancement for Language Model Alignment\uff09\u7684\u6846\u67b6\uff0c\u5b83\u5145\u5206\u5229\u7528\u6240\u6709\u751f\u6210\u6570\u636e\uff0c\u901a\u8fc7\u6307\u5bfc\u6a21\u578b\u9075\u5faa\u539f\u5219\u6765\u4f7f\u8f93\u51fa\u4e0e\u4eba\u7c7b\u671f\u671b\u4fdd\u6301\u4e00\u81f4\u3002\u901a\u8fc7\u52a8\u6001\u66f4\u65b0\u9608\u503c\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u786e\u4fdd\u4e86\u9ad8\u6548\u7684\u6570\u636e\u5229\u7528\uff0c\u901a\u8fc7\u6574\u5408\u6240\u6709\u751f\u6210\u54cd\u5e94\u5e76\u6839\u636e\u5176\u76f8\u5e94\u7684\u5956\u52b1\u5206\u6570\u5bf9\u5b83\u4eec\u8fdb\u884c\u52a0\u6743\u3002\u5728\u591a\u4e2a\u6570\u636e\u96c6\u4e0a\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cPSLE\u5728\u73b0\u6709\u8bed\u8a00\u6a21\u578b\u5bf9\u9f50\u65b9\u6cd5\u4e2d\u8868\u73b0\u51fa\u6709\u6548\u6027\u3002|\n", "2408.02584": "|**2024-08-05**|**Leveraging the Power of LLMs: A Fine-Tuning Approach for High-Quality Aspect-Based Summarization**|Ankan Mullick 
et.al.|[2408.02584](http://arxiv.org/abs/2408.02584)|null|\u968f\u7740\u6570\u5b57\u4fe1\u606f\u91cf\u7684\u6301\u7eed\u589e\u957f\uff0c\u7528\u6237\u9700\u8981\u6709\u6548\u65b9\u6cd5\u4ece\u957f\u7bc7\u6587\u6863\u4e2d\u63d0\u53d6\u5173\u952e\u89c1\u89e3\u3002\u9762\u5411\u65b9\u9762\u7684\u603b\u7ed3\u63d0\u4f9b\u4e86\u4e00\u79cd\u6709\u9488\u5bf9\u6027\u7684\u65b9\u6cd5\uff0c\u751f\u6210\u4e13\u6ce8\u4e8e\u6587\u6863\u5185\u7279\u5b9a\u65b9\u9762\u7684\u5c0f\u7ed3\u3002\u5c3d\u7ba1\u5728\u9762\u5411\u65b9\u9762\u7684\u603b\u7ed3\u7814\u7a76\u9886\u57df\u53d6\u5f97\u4e86\u8fdb\u5c55\uff0c\u4f46\u63d0\u9ad8\u6a21\u578b\u6027\u80fd\u7684\u6301\u7eed\u8ffd\u6c42\u662f\u5fc5\u8981\u7684\u3002\u9274\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\u4e2d\u7684\u6f5c\u529b\uff0c\u7279\u522b\u662f\u5728\u603b\u7ed3\u95ee\u9898\u4e0a\uff0c\u672c\u6587\u63a2\u8ba8\u4e86\u5bf9LLMs\u8fdb\u884c\u5fae\u8c03\u4ee5\u6267\u884c\u9762\u5411\u65b9\u9762\u7684\u603b\u7ed3\u4efb\u52a1\u7684\u53ef\u80fd\u6027\u3002\u6211\u4eec\u8bc4\u4f30\u4e86\u5f00\u6e90\u57fa\u7840LLMs\uff0c\u5305\u62ecLlama2\u3001Mistral\u3001Gemma\u548cAya\uff0c\u5bf9\u4e8e\u516c\u5f00\u53ef\u7528\u7684\u7279\u5b9a\u9886\u57df\u9762\u5411\u65b9\u9762\u7684\u603b\u7ed3\u6570\u636e\u96c6\u7684\u5f71\u54cd\u3002\u6211\u4eec\u7684\u5047\u8bbe\u662f\uff0c\u8fd9\u79cd\u65b9\u6cd5\u80fd\u591f\u8ba9\u8fd9\u4e9b\u6a21\u578b\u6709\u6548\u5730\u8bc6\u522b\u5e76\u63d0\u53d6\u4e0e\u65b9\u9762\u76f8\u5173\u7684\u4fe1\u606f\uff0c\u4ece\u800c\u4ea7\u751f\u4e0e\u6700\u5148\u8fdb\u7684\u65b9\u6cd5\u76f8\u6bd4\u66f4\u9ad8\u8d28\u91cf\u7684\u9762\u5411\u65b9\u9762\u7684\u603b\u7ed3\u3002\u6211\u4eec\u5efa\u7acb\u4e86\u4e00\u4e2a\u5168\u9762\u7684\u8bc4\u4f30\u6846\u67b6\uff0c\u5c06\u5fae\u8c03\u540e\u7684LLMs\u7684\u6027\u80fd\u4e0e\u7ade\u4e89\u6027\u7684\u9762\u5411\u65b9\u9762\u7684\u603b\u7ed3\u65b9\u6cd5\u4ee5\u53ca\u5fae\u8c03\u524dLLMs\u7684\u539f\u59cb\u7248\u672
c\u8fdb\u884c\u6bd4\u8f83\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u901a\u8fc7\u8bc1\u660e\u5bf9LLMs\u8fdb\u884c\u5fae\u8c03\u53ef\u4ee5\u751f\u6210\u9ad8\u8d28\u91cf\u7684\u9762\u5411\u65b9\u9762\u7684\u603b\u7ed3\uff0c\u4e3a\u9762\u5411\u65b9\u9762\u7684\u603b\u7ed3\u9886\u57df\u505a\u51fa\u4e86\u8d21\u732e\u3002\u6b64\u5916\uff0c\u5b83\u4e3a\u5728\u4e0d\u540cNLP\u9886\u57df\u8fdb\u4e00\u6b65\u63a2\u7d22\u4f7f\u7528LLMs\u8fdb\u884c\u76ee\u6807\u4fe1\u606f\u62bd\u53d6\u4efb\u52a1\u6253\u5f00\u4e86\u5927\u95e8\u3002|\n", "2408.02559": "|**2024-08-05**|**Evaluating and Enhancing LLMs Agent based on Theory of Mind in Guandan: A Multi-Player Cooperative Game under Imperfect Information**|Yauwai Yim et.al.|[2408.02559](http://arxiv.org/abs/2408.02559)|null|\u672c\u6587\u63a2\u8ba8\u4e86\u5f00\u6e90\u4e0eAPI\u9a71\u52a8\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u590d\u6742\u3001\u4e0d\u5b8c\u5168\u4fe1\u606f\u73af\u5883\u4e0b\u7684\u6587\u672c\u6e38\u620f\u534f\u4f5c\u80fd\u529b\uff0c\u7279\u522b\u662f\u5728\u975e\u82f1\u8bed\u73af\u5883\u4e2d\u7684\u5e94\u7528\u6f5c\u529b\u3002\u7814\u7a76\u5bf9\u6bd4\u4e86\u8fd9\u4e9b\u6a21\u578b\u4e0e\u5176\u4ed6\u7c7b\u578b\u4ee3\u7406\u7684\u6027\u80fd\uff0c\u5e76\u4f7f\u7528\u5fc3\u667a\u7406\u8bba\uff08Theory of Mind, 
ToM\uff09\u89c4\u5212\u6280\u672f\u6765\u8bc4\u4f30\u5b83\u4eec\u5728\u9700\u8981\u591a\u667a\u80fd\u4f53\u534f\u4f5c\u7684\u4e0d\u5b8c\u5168\u4fe1\u606f\u6e38\u620f\u4e2d\u8868\u73b0\u7684\u80fd\u529b\u3002\u901a\u8fc7\u5f15\u5165\u5916\u90e8\u5de5\u5177\u6765\u89e3\u51b3\u6b64\u5361\u724c\u6e38\u620f\u4e2d\u52a8\u6001\u4e14\u5e9e\u5927\u7684\u884c\u52a8\u7a7a\u95f4\u95ee\u9898\uff0c\u6211\u4eec\u7684\u7ed3\u679c\u63ed\u793a\u4e86\u5f53\u524d\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u9762\u5bf9\u9ad8\u7ea7\u522b\u4efb\u52a1\u65f6\u4e0e\u5f3a\u5316\u5b66\u4e60\u6a21\u578b\u4e4b\u95f4\u7684\u6027\u80fd\u5dee\u8ddd\u3002\u5c3d\u7ba1\u5b58\u5728\u8fd9\u4e00\u5dee\u8ddd\uff0c\u4f46\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5c55\u73b0\u4e86\u5728\u6e38\u620f\u573a\u666f\u4e0b\u7684\u5fc3\u667a\u7406\u8bba\u80fd\u529b\uff0c\u80fd\u591f\u7406\u89e3\u76df\u53cb\u548c\u5bf9\u624b\u7684\u884c\u4e3a\uff0c\u5e76\u4e0e\u76df\u53cb\u5efa\u7acb\u534f\u4f5c\u5173\u7cfb\uff0c\u4ece\u800c\u6301\u7eed\u63d0\u5347\u5176\u6027\u80fd\u3002\u4e3a\u4e86\u4fc3\u8fdb\u8fdb\u4e00\u6b65\u7684\u7814\u7a76\u4e0e\u7406\u89e3\uff0c\u6211\u4eec\u5df2\u516c\u5f00\u4e86\u4ee3\u7801\u5e93\u3002|\n", "2408.02549": "|**2024-08-05**|**Generative AI as a Service in 6G Edge-Cloud: Generation Task Offloading by In-context Learning**|Hao Zhou 
et.al.|[2408.02549](http://arxiv.org/abs/2408.02549)|null|\u672c\u6587\u63a2\u8ba8\u4e86\u57286G\u7f51\u7edc\u4e2d\u90e8\u7f72\u57fa\u7840\u6a21\u578b\u7684\u521b\u65b0\u8fb9\u7f18-\u4e91\u67b6\u6784\u3002\u5177\u4f53\u76ee\u6807\u662f\u901a\u8fc7\u65e0\u7ebf\u7535\u8d44\u6e90\u5206\u914d\u548c\u4efb\u52a1\u5378\u8f7d\u6765\u6700\u5c0f\u5316\u57fa\u7840\u6a21\u578b\u7684\u670d\u52a1\u5ef6\u8fdf\u3002\u4e3b\u8981\u5206\u4e3a\u4e09\u90e8\u5206\uff1a\u9996\u5148\uff0c\u4ecb\u7ecd\u901a\u4fe1\u7cfb\u7edf\u6a21\u578b\uff0c\u5373\u5206\u914d\u65e0\u7ebf\u7535\u8d44\u6e90\u5e76\u8ba1\u7b97\u652f\u6301\u751f\u6210\u5185\u5bb9\u4f20\u8f93\u7684\u94fe\u8def\u5bb9\u91cf\uff1b\u5176\u6b21\uff0c\u5c55\u793a\u57fa\u7840\u6a21\u578b\u63a8\u7406\u6a21\u578b\uff0c\u7528\u4e8e\u8ba1\u7b97\u5185\u5bb9\u751f\u6210\u7684\u5ef6\u8fdf\uff1b\u6700\u540e\uff0c\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u4e0a\u4e0b\u6587\u5b66\u4e60\u65b9\u6cd5\u6765\u4f18\u5316\u4efb\u52a1\u5378\u8f7d\u51b3\u7b56\u3002\u8be5\u65b9\u6cd5\u5229\u7528\u57fa\u7840\u6a21\u578b\u7684\u63a8\u7406\u80fd\u529b\uff0c\u907f\u514d\u4e86\u4f20\u7edf\u673a\u5668\u5b66\u4e60\u7b97\u6cd5\u4e2d\u9700\u8981\u4e13\u95e8\u6a21\u578b\u8bad\u7ec3\u6216\u5fae\u8c03\u7684\u56f0\u96be\u3002\u4eff\u771f\u7ed3\u679c\u8868\u660e\uff0c\u63d0\u51fa\u7684\u8fb9\u7f18-\u4e91\u90e8\u7f72\u4e0e\u4e0a\u4e0b\u6587\u5b66\u4e60\u4efb\u52a1\u5378\u8f7d\u65b9\u6cd5\u53ef\u4ee5\u5728\u65e0\u9700\u4e13\u95e8\u6a21\u578b\u8bad\u7ec3\u6216\u5fae\u8c03\u7684\u60c5\u51b5\u4e0b\uff0c\u5b9e\u73b0\u6ee1\u610f\u7684\u751f\u6210\u670d\u52a1\u8d28\u91cf\u3002|\n", "2408.02545": "|**2024-08-05**|**RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation**|Daniel Fleischer 
et.al.|[2408.02545](http://arxiv.org/abs/2408.02545)|**[link](https://github.com/intellabs/ragfoundry)**|\u5b9e\u65bd\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u7cfb\u7edf\u56fa\u6709\u5730\u590d\u6742\uff0c\u9700\u8981\u6df1\u5165\u4e86\u89e3\u6570\u636e\u3001\u5e94\u7528\u573a\u666f\u4ee5\u53ca\u7ec6\u81f4\u7684\u8bbe\u8ba1\u51b3\u7b56\u3002\u6b64\u5916\uff0c\u8bc4\u4f30\u8fd9\u4e9b\u7cfb\u7edf\u5e26\u6765\u4e86\u91cd\u5927\u6311\u6218\uff0c\u9700\u8981\u901a\u8fc7\u591a\u7ef4\u5ea6\u7684\u65b9\u6cd5\u8bc4\u4f30\u68c0\u7d22\u51c6\u786e\u6027\u548c\u751f\u6210\u8d28\u91cf\u3002\u6211\u4eec\u5f15\u5165\u4e86RAG Foundry\uff0c\u8fd9\u662f\u4e00\u4e2a\u5f00\u6e90\u6846\u67b6\uff0c\u7528\u4e8e\u5728RAG\u573a\u666f\u4e2d\u589e\u5f3a\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u6570\u636e\u3002RAG Foundry\u5c06\u6570\u636e\u521b\u5efa\u3001\u8bad\u7ec3\u3001\u63a8\u7406\u548c\u8bc4\u4f30\u6574\u5408\u5230\u4e00\u4e2a\u5de5\u4f5c\u6d41\u7a0b\u4e2d\uff0c\u4ece\u800c\u4e3a\u5728RAG\u8bbe\u7f6e\u4e2d\u8bad\u7ec3\u548c\u8bc4\u4f30\u5927\u578b\u8bed\u8a00\u6a21\u578b\u521b\u5efa\u6570\u636e\u589e\u5f3a\u96c6\u63d0\u4f9b\u4e86\u4fbf\u5229\u3002\u8fd9\u79cd\u6574\u5408\u4f7f\u5f97\u5feb\u901f\u539f\u578b\u8bbe\u8ba1\u548cRAG\u6280\u672f\u7684\u5b9e\u9a8c\u53d8\u5f97\u5bb9\u6613\uff0c\u5141\u8bb8\u7528\u6237\u8f7b\u677e\u751f\u6210\u6570\u636e\u96c6\u5e76\u4f7f\u7528\u5185\u90e8\u6216\u4e13\u95e8\u7684\u77e5\u8bc6\u6e90\u8bad\u7ec3RAG\u6a21\u578b\u3002\u6211\u4eec\u901a\u8fc7\u4f7f\u7528\u591a\u79cdRAG\u914d\u7f6e\u5bf9Llama-3\u548cPhi-3\u6a21\u578b\u8fdb\u884c\u589e\u5f3a\u548c\u5fae\u8c03\uff0c\u5728\u4e09\u4e2a\u77e5\u8bc6\u5bc6\u96c6\u578b\u6570\u636e\u96c6\u4e0a\u5c55\u793a\u4e86\u6301\u7eed\u6539\u8fdb\u7684\u6709\u6548\u6027\u3002\u4ee3\u7801\u4f5c\u4e3a\u5f00\u6e90\u53d1\u5e03\u5728https://github.com/IntelLabs/RAGFoundry\u3002|\n", "2408.02544": "|**2024-08-05**|**Caution for the Environment: Multimodal Agents are Susceptible to Environmental Distractions**|Xinbei 
Ma et.al.|[2408.02544](http://arxiv.org/abs/2408.02544)|**[link](https://github.com/xbmxb/EnvDistraction)**|\u672c\u6587\u63a2\u8ba8\u4e86\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\u4ee3\u7406\u5728\u56fe\u5f62\u7528\u6237\u754c\u9762\uff08GUI\uff09\u73af\u5883\u4e2d\u7684\u5fe0\u8bda\u5ea6\u95ee\u9898\uff0c\u65e8\u5728\u89e3\u51b3\u4ee5\u4e0b\u7814\u7a76\u95ee\u9898\uff1a\u591a\u6a21\u6001GUI\u4ee3\u7406\u662f\u5426\u53ef\u80fd\u88ab\u73af\u5883\u80cc\u666f\u5206\u6563\u6ce8\u610f\u529b\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u901a\u7528\u8bbe\u7f6e\uff0c\u5176\u4e2d\u7528\u6237\u548c\u4ee3\u7406\u5747\u4e3a\u5584\u610f\u89d2\u8272\uff0c\u800c\u73af\u5883\u867d\u975e\u6076\u610f\uff0c\u4f46\u5305\u542b\u4e0e\u4efb\u52a1\u65e0\u5173\u7684\u5185\u5bb9\u3002\u901a\u8fc7\u6211\u4eec\u7684\u6a21\u62df\u6570\u636e\u96c6\uff0c\u5bf9\u591a\u79cdMLLM\u4f5c\u4e3aGUI\u4ee3\u7406\u8fdb\u884c\u8bc4\u4f30\uff0c\u6309\u7167\u4e09\u79cd\u4e0d\u540c\u7684\u5de5\u4f5c\u6a21\u5f0f\uff0c\u5373\u5177\u6709\u4e0d\u540c\u7a0b\u5ea6\u611f\u77e5\u80fd\u529b\u7684\u6a21\u5f0f\u8fdb\u884c\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u5373\u4fbf\u662f\u6700\u5f3a\u5927\u7684\u6a21\u578b\uff0c\u65e0\u8bba\u662f\u901a\u7528\u578b\u4ee3\u7406\u8fd8\u662f\u4e13\u95e8\u7528\u4e8eGUI\u7684\u4ee3\u7406\uff0c\u90fd\u5bb9\u6613\u53d7\u5230\u5e72\u6270\u3002\u867d\u7136\u8fd1\u671f\u7684\u7814\u7a76\u4e3b\u8981\u5173\u6ce8\u591a\u6a21\u6001\u4ee3\u7406\u7684\u52a8\u4f5c\u51c6\u786e\u6027\uff08\u5373\u5e2e\u52a9\u6027\uff09\uff0c\u4f46\u6211\u4eec\u7684\u53d1\u73b0\u63ed\u793a\u4e86\u8fd9\u4e9b\u4ee3\u7406\u5728\u9762\u5bf9\u73af\u5883\u5e72\u6270\u65f6\u8868\u73b0\u51fa\u4e0d\u5fe0\u884c\u4e3a\u7684\u53ef\u80fd\u6027\u3002\u6b64\u5916\uff0c\u6211\u4eec\u4ece\u5bf9\u6297\u6027\u89c6\u89d2\u51fa\u53d1\uff0c\u5b9e\u65bd\u73af\u5883\u6ce8\u5165\u7b56\u7565\uff0c\u5c55\u793a\u51fa\u5229\u7528\u8fd9\u79cd\u4e0d\u5fe0\u884c\u4e3a\u53ef\u80fd\u5bfc\u81f4\u7684\u610f\u5916
\u98ce\u9669\u3002|\n", "2408.02535": "|**2024-08-05**|**Towards Coarse-grained Visual Language Navigation Task Planning Enhanced by Event Knowledge Graph**|Zhao Kaichen et.al.|[2408.02535](http://arxiv.org/abs/2408.02535)|null|\u89c6\u89c9\u8bed\u8a00\u5bfc\u822a\uff08VLN\uff09\u662f\u667a\u80fd\u4f53\u9886\u57df\u7684\u91cd\u8981\u7814\u7a76\u4e4b\u4e00\uff0c\u65e8\u5728\u4f7f\u667a\u80fd\u4f53\u7406\u89e3\u5468\u56f4\u73af\u5883\u5e76\u5b8c\u6210\u5bfc\u822a\u4efb\u52a1\u3002\u5728VLN\u4efb\u52a1\u4e2d\uff0c\u6307\u4ee4\u53ef\u4ee5\u5206\u4e3a\u7c97\u7c92\u5ea6\u548c\u7ec6\u7c92\u5ea6\u4e24\u79cd\u7c7b\u578b\u3002\u7ec6\u7c92\u5ea6\u6307\u4ee4\u8be6\u7ec6\u63cf\u8ff0\u4e86\u6574\u4e2a\u4efb\u52a1\u7684\u6b65\u9aa4\uff0c\u800c\u7c97\u7c92\u5ea6\u6307\u4ee4\u5219\u63d0\u4f9b\u4e86\u4e00\u4e2a\u62bd\u8c61\u7684\u4efb\u52a1\u63cf\u8ff0\uff0c\u66f4\u9002\u5408\u4eba\u7c7b\u7684\u4e60\u60ef\u3002\u73b0\u6709\u7684\u5927\u90e8\u5206\u5de5\u4f5c\u90fd\u96c6\u4e2d\u5728\u5bf9\u7ec6\u7c92\u5ea6\u6307\u4ee4\u7684\u7814\u7a76\u4e0a\uff0c\u5ffd\u89c6\u4e86\u65e5\u5e38\u751f\u6d3b\u4e2d\u5b58\u5728\u7684\u62bd\u8c61\u6307\u4ee4\u3002\u4e3a\u4e86\u514b\u670d\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u5c1d\u8bd5\u901a\u8fc7\u4e8b\u4ef6\u77e5\u8bc6\u589e\u5f3a\u7684\u65b9\u5f0f\u8003\u8651VLN\u4e2d\u7684\u7c97\u7c92\u5ea6\u6307\u4ee4\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u9996\u5148\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u63d0\u793a\u7684\u65b9\u6cd5\u6765\u6574\u5408\u591a\u4e2a\u4e3b\u6d41\u57fa\u51c6\u6570\u636e\u96c6\uff0c\u5f62\u6210\u4e00\u4e2a\u5168\u9762\u7684\u4e8b\u4ef6\u77e5\u8bc6\u56fe\u8c31\uff08\u547d\u540d\u4e3aVLN-EventKG\uff09\u3002\u901a\u8fc7\u5c0f\u89c4\u6a21\u548c\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\u7684\u5408\u4f5c\uff0c\u6211\u4eec\u5b9e\u73b0\u4e86\u80fd\u591f\u5904\u7406\u7c97\u7c92\u5ea6\u6307\u4ee4\u8f93\u5165\u7684\u4e8b\u4ef6\u5bfc\u822a\uff08EventNav\uff09\u65b9\u6cd5\uff0c\u7528\u4e8eVLN\u4efb\u52a1\u4e2d\u7684\u5bfc\u822a\u89c4\
u5212\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u65b0\u9896\u7684\u52a8\u6001\u5386\u53f2\u56de\u6eaf\u6a21\u5757\uff0c\u80fd\u591f\u5728\u5b9e\u65f6\u4e2d\u7ea0\u6b63\u6f5c\u5728\u7684\u9519\u8bef\u52a8\u4f5c\u89c4\u5212\u3002\u5728\u5404\u79cd\u516c\u5171\u57fa\u51c6\u4e0a\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u4f7f\u7528\u6211\u4eec\u63d0\u51fa\u7684VLN-EventKG\u7684\u77e5\u8bc6\u589e\u5f3a\u65b9\u6cd5\uff0c\u5728\u4f7f\u7528\u7c97\u7c92\u5ea6\u6307\u4ee4\u7684VLN\u4efb\u52a1\u4e2d\u5177\u6709\u8d85\u8fc75%\u7684\u6210\u529f\u7387\u4f18\u52bf\u3002\u6211\u4eec\u7684\u9879\u76ee\u53ef\u4ee5\u5728 \u4e0a\u8bbf\u95ee\u3002|\n", "2408.02509": "|**2024-08-05**|**Practical Attacks against Black-box Code Completion Engines**|Slobodan Jenko et.al.|[2408.02509](http://arxiv.org/abs/2408.02509)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aINSEC\u7684\u65b0\u578b\u653b\u51fb\u65b9\u6cd5\uff0c\u65e8\u5728\u5f15\u5bfc\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u4ee3\u7801\u8865\u5168\u5f15\u64ce\u751f\u6210\u5b58\u5728\u5b89\u5168\u6f0f\u6d1e\u7684\u4ee3\u7801\u3002\u8fd9\u79cd\u653b\u51fb\u65b9\u5f0f\u4e0e\u5e02\u9762\u4e0a\u5927\u591a\u6570\u5546\u4e1a\u8865\u5168\u5f15\u64ce\uff08\u5982GitHub Copilot\uff09\u76f8\u4f3c\uff0c\u4ec5\u9700\u8981\u9ed1\u76d2\u67e5\u8be2\u8bbf\u95ee\u76ee\u6807\u5f15\u64ce\uff0c\u65e0\u9700\u4e86\u89e3\u5176\u5185\u90e8\u673a\u5236\u3002\u653b\u51fb\u7b56\u7565\u901a\u8fc7\u5728\u8865\u5168\u8f93\u5165\u4e2d\u63d2\u5165\u6076\u610f\u653b\u51fb\u5b57\u7b26\u4e32\u4f5c\u4e3a\u7b80\u77ed\u6ce8\u91ca\u6765\u5b9e\u65bd\u3002\u4e3a\u4e86\u8bbe\u8ba1\u51fa\u6709\u6548\u7684\u653b\u51fb\u5b57\u7b26\u4e32\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u7cfb\u5217\u4e13\u95e8\u7684\u521d\u59cb\u5316\u65b9\u6848\uff0c\u5e76\u901a\u8fc7\u4f18\u5316\u8fc7\u7a0b\u8fdb\u4e00\u6b65\u7cbe\u70bc\u3002\u6211\u4eec\u5728\u5f00\u6e90\u6a21\u578b\u3001\u9ed1\u76d2\u5546\u4e1a\u670d\u52a1\uff08\u5982OpenAI 
API\u548cGitHub Copilot\uff09\u4ee5\u53ca\u4e94\u79cd\u7f16\u7a0b\u8bed\u8a00\u4e0b\u768416\u4e2a\u5173\u952e\u9519\u8bef\u7c7b\u522b\u4e0a\u9a8c\u8bc1\u4e86INSEC\u7684\u6709\u6548\u6027\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u4e0e\u73b0\u6709\u6280\u672f\u76f8\u6bd4\uff0cINSEC\u663e\u8457\u63d0\u9ad8\u4e86\u8003\u8651\u4e2d\u7684\u8865\u5168\u5f15\u64ce\u751f\u6210\u4e0d\u5b89\u5168\u4ee3\u7801\u7684\u53ef\u80fd\u6027\u8d85\u8fc750%\uff0c\u540c\u65f6\u4ecd\u5177\u5907\u751f\u6210\u529f\u80fd\u6b63\u786e\u4ee3\u7801\u7684\u80fd\u529b\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u653b\u51fb\u65b9\u6cd5\u8d44\u6e90\u9700\u6c42\u8f83\u4f4e\uff0c\u5f00\u53d1\u6210\u672c\u4f4e\u4e8e\u5341\u7f8e\u5143\uff0c\u53ef\u5728\u666e\u901a\u786c\u4ef6\u4e0a\u8fd0\u884c\u3002|\n", "2408.03302": "|**2024-08-06**|**TextIM: Part-aware Interactive Motion Synthesis from Text**|Siyuan Fan et.al.|[2408.03302](http://arxiv.org/abs/2408.03302)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aTextIM\u7684\u65b0\u578b\u6846\u67b6\uff0c\u65e8\u5728\u5408\u6210\u57fa\u4e8e\u6587\u672c\u9a71\u52a8\u7684\u4eba\u7c7b\u4ea4\u4e92\u52a8\u4f5c\uff0c\u5e76\u7279\u522b\u5173\u6ce8\u4e8e\u90e8\u5206\u7ea7\u8bed\u4e49\u7684\u7cbe\u786e\u5bf9\u9f50\u3002\u73b0\u6709\u65b9\u6cd5\u5f80\u5f80\u5ffd\u89c6\u4e86\u4ea4\u4e92\u8eab\u4f53\u90e8\u4f4d\u7684\u5173\u952e\u4f5c\u7528\uff0c\u5e76\u672a\u80fd\u5145\u5206\u6355\u6349\u548c\u5bf9\u9f50\u90e8\u5206\u7ea7\u8bed\u4e49\uff0c\u5bfc\u81f4\u4e86\u4e0d\u51c6\u786e\u751a\u81f3\u9519\u8bef\u7684\u52a8\u4f5c\u7ed3\u679c\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0cTextIM\u91c7\u7528\u4e86\u4e00\u4e2a\u89e3\u8026\u6761\u4ef6\u6269\u6563\u6846\u67b6\uff0c\u4ee5\u589e\u5f3a\u4ea4\u4e92\u52a8\u4f5c\u4e0e\u5bf9\u5e94\u6587\u672c\u63cf\u8ff0\u4e2d\u7684\u8bed\u4e49\u610f\u56fe\u4e4b\u95f4\u8be6\u7ec6\u7684\u5bf9\u9f50\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff0c\u4f5c\u4e3a\u4eba\u7c7b\u5927\u8111
\u7684\u89d2\u8272\uff0c\u6765\u8bc6\u522b\u4ea4\u4e92\u7684\u8eab\u4f53\u90e8\u4f4d\u5e76\u7406\u89e3\u4ea4\u4e92\u8bed\u4e49\uff0c\u4ece\u800c\u751f\u6210\u590d\u6742\u7684\u5fae\u5999\u4ea4\u4e92\u52a8\u4f5c\u3002\u5728\u7cbe\u7ec6\u52a8\u4f5c\u5f15\u5bfc\u4e0b\uff0cTextIM\u8fdb\u4e00\u6b65\u5c06\u8fd9\u4e9b\u90e8\u5206\u52a8\u4f5c\u6269\u5c55\u4e3a\u6574\u4e2a\u8eab\u4f53\u7684\u8fde\u8d2f\u52a8\u4f5c\u3002\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u7a7a\u95f4\u4e00\u81f4\u6027\u6a21\u5757\uff0c\u901a\u8fc7\u90e8\u5206\u56fe\u5377\u79ef\u7f51\u7edc\u5728\u6574\u4e2a\u8eab\u4f53\u52a8\u4f5c\u4e2d\u8865\u5145\u548c\u7ef4\u6301\u5404\u90e8\u5206\u4e4b\u95f4\u7684\u8fde\u8d2f\u6027\u548c\u548c\u8c10\u6027\u3002\u5bf9\u4e8e\u8bad\u7ec3\u548c\u8bc4\u4f30\uff0c\u6211\u4eec\u7cbe\u5fc3\u9009\u62e9\u4e86\u5e76\u91cd\u65b0\u6807\u8bb0\u4e86HUMANML3D\u4e2d\u7684\u4ea4\u4e92\u52a8\u4f5c\u6570\u636e\u96c6\uff0c\u521b\u5efa\u4e86\u4e00\u4e2a\u4e13\u95e8\u7684\u6570\u636e\u96c6\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0cTextIM\u80fd\u591f\u4ea7\u751f\u8bed\u4e49\u4e0a\u51c6\u786e\u7684\u4eba\u7c7b\u4ea4\u4e92\u52a8\u4f5c\uff0c\u663e\u8457\u63d0\u9ad8\u4e86\u5728\u5404\u79cd\u573a\u666f\u4e0b\u5408\u6210\u4ea4\u4e92\u52a8\u4f5c\u7684\u771f\u5b9e\u611f\u548c\u5e94\u7528\u6027\uff0c\u5305\u62ec\u4e0e\u53ef\u53d8\u5f62\u548c\u52a8\u6001\u53d8\u5316\u7269\u4f53\u7684\u4ea4\u4e92\u3002|\n", "2408.03297": "|**2024-08-06**|**KaPO: Knowledge-aware Preference Optimization for Controllable Knowledge Selection in Retrieval-Augmented Language Models**|Ruizhe Zhang 
et.al.|[2408.03297](http://arxiv.org/abs/2408.03297)|null|\u901a\u8fc7\u6574\u5408\u5916\u90e8\u77e5\u8bc6\uff0c\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u7b56\u7565\u5df2\u6210\u4e3a\u7f13\u89e3\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u5904\u7406\u77e5\u8bc6\u5bc6\u96c6\u578b\u4efb\u52a1\u65f6\u9047\u5230\u7684\u5e7b\u89c9\u95ee\u9898\u7684\u6709\u6548\u65b9\u6cd5\u3002\u7136\u800c\uff0c\u5728\u6574\u5408\u975e\u53c2\u6570\u5316\u7684\u5916\u90e8\u652f\u6301\u8bc1\u636e\u4e0e\u5185\u90e8\u53c2\u6570\u5316\u77e5\u8bc6\u7684\u8fc7\u7a0b\u4e2d\uff0c\u4e0d\u53ef\u907f\u514d\u7684\u77e5\u8bc6\u51b2\u7a81\u53ef\u80fd\u4f1a\u4ea7\u751f\uff0c\u5bfc\u81f4\u6a21\u578b\u54cd\u5e94\u4e2d\u7684\u6df7\u6dc6\u3002\u4e3a\u4e86\u5728\u4e0d\u540c\u60c5\u5883\u4e0b\u63d0\u5347\u8bed\u8a00\u6a21\u578b\u7684\u77e5\u8bc6\u9009\u62e9\u80fd\u529b\uff0c\u4e00\u4e9b\u7814\u7a76\u5df2\u7ecf\u5173\u6ce8\u4e8e\u901a\u8fc7\u6307\u4ee4\u8c03\u6574\u6765\u7ec6\u5316\u5176\u884c\u4e3a\u6a21\u5f0f\u3002\u7136\u800c\uff0c\u7531\u4e8e\u7f3a\u4e4f\u660e\u786e\u7684\u8d1f\u5411\u4fe1\u53f7\u548c\u6bd4\u8f83\u76ee\u6807\uff0c\u901a\u8fc7\u8fd9\u79cd\u65b9\u5f0f\u8fdb\u884c\u5fae\u8c03\u7684\u8bed\u8a00\u6a21\u578b\u5728\u590d\u6742\u7684\u3001\u73b0\u5b9e\u7684\u68c0\u7d22\u573a\u666f\u4e2d\u4ecd\u7136\u53ef\u80fd\u8868\u73b0\u51fa\u4e0d\u7406\u60f3\u7684\u7279\u6027\u3002 
\u9488\u5bf9\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u77e5\u8bc6\u610f\u8bc6\u504f\u597d\u4f18\u5316\uff08KaPO\uff09\uff0c\u65e8\u5728\u5b9e\u73b0\u5bf9\u771f\u5b9e\u68c0\u7d22\u573a\u666f\u4e2d\u77e5\u8bc6\u9009\u62e9\u7684\u53ef\u63a7\u6027\u3002\u5177\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u63a2\u7d22\u5e76\u6a21\u62df\u4e86\u4e0d\u540c\u4e0a\u4e0b\u6587\u7ec4\u5408\u4e0b\u7684\u9519\u8bef\u7c7b\u578b\uff0c\u5e76\u901a\u8fc7\u504f\u597d\u4f18\u5316\u65b9\u6cd5\u5b66\u4e60\u5982\u4f55\u907f\u514d\u8fd9\u4e9b\u8d1f\u5411\u4fe1\u53f7\u3002\u540c\u65f6\uff0c\u901a\u8fc7\u8c03\u6574\u54cd\u5e94\u957f\u5ea6\u4e0e\u8868\u793a\u4e0d\u540c\u884c\u4e3a\u6a21\u5f0f\u7684\u504f\u597d\u6570\u636e\u6bd4\u4f8b\u4e4b\u95f4\u7684\u5e73\u8861\uff0c\u6211\u4eec\u589e\u5f3a\u4e86\u8bed\u8a00\u6a21\u578b\u7684\u9002\u5e94\u80fd\u529b\u548c\u566a\u58f0\u9c81\u68d2\u6027\u3002 \u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u4e0e\u5148\u524d\u7684\u65b9\u6cd5\u76f8\u6bd4\uff0cKaPO\u5728\u5904\u7406\u77e5\u8bc6\u51b2\u7a81\u65b9\u9762\u53d6\u5f97\u4e86\u8d85\u8fc737%\u7684\u6027\u80fd\u63d0\u5347\uff0c\u5e76\u4e14\u5728\u5404\u79cd\u79bb\u7fa4\u6570\u636e\u96c6\u4e0a\u8868\u73b0\u51fa\u4e86\u7a33\u5065\u7684\u6cdb\u5316\u80fd\u529b\u3002|\n", "2408.03281": "|**2024-08-07**|**StructEval: Deepen and Broaden Large Language Model Assessment via Structured Evaluation**|Boxi Cao 
et.al.|[2408.03281](http://arxiv.org/abs/2408.03281)|**[link](https://github.com/c-box/structeval)**|\u8bc4\u4ef7\u662f\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5f00\u53d1\u7684\u5173\u952e\u5de5\u5177\u3002\u5f53\u524d\u7684\u8bc4\u4f30\u65b9\u5f0f\u901a\u5e38\u91c7\u7528\u5355\u4e00\u6307\u6807\u8bc4\u4f30\u6a21\u5f0f\uff0c\u5bf9\u6bcf\u4e2a\u57fa\u672c\u6d4b\u8bd5\u76ee\u6807\u8fdb\u884c\u8bc4\u4f30\uff0c\u8fd9\u5728\u533a\u5206\u6a21\u578b\u662f\u5426\u771f\u6b63\u5177\u5907\u6240\u9700\u80fd\u529b\u8fd8\u662f\u4ec5\u4ec5\u8bb0\u5fc6\u6216\u731c\u6d4b\u7279\u5b9a\u95ee\u9898\u7684\u7b54\u6848\u65b9\u9762\u5b58\u5728\u56f0\u96be\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aStructEval\u7684\u65b0\u8bc4\u4f30\u6846\u67b6\u3002\u4ece\u57fa\u672c\u6d4b\u8bd5\u76ee\u6807\u51fa\u53d1\uff0cStructEval\u901a\u8fc7\u5728\u591a\u4e2a\u8ba4\u77e5\u5c42\u6b21\u548c\u5173\u952e\u6982\u5ff5\u4e0a\u8fdb\u884c\u7ed3\u6784\u5316\u7684\u8bc4\u4f30\u6765\u6df1\u5316\u548c\u62d3\u5bbd\u8bc4\u4f30\u8303\u56f4\uff0c\u4ece\u800c\u4e3a\u5927\u578b\u8bed\u8a00\u6a21\u578b\u63d0\u4f9b\u5168\u9762\u3001\u7a33\u5065\u4e14\u4e00\u81f4\u7684\u8bc4\u4f30\u3002\u5728\u4e09\u4e2a\u5e7f\u6cdb\u4f7f\u7528\u7684\u57fa\u51c6\u4e0a\u8fdb\u884c\u7684\u5b9e\u9a8c\u8868\u660e\uff0cStructEval\u662f\u4e00\u4e2a\u53ef\u9760\u7684\u5de5\u5177\uff0c\u80fd\u591f\u62b5\u6297\u6570\u636e\u6c61\u67d3\u7684\u98ce\u9669\u5e76\u51cf\u5c11\u6f5c\u5728\u504f\u89c1\u7684\u5e72\u6270\uff0c\u4ece\u800c\u63d0\u4f9b\u5173\u4e8e\u6a21\u578b\u80fd\u529b\u66f4\u53ef\u9760\u548c\u4e00\u81f4\u7684\u7ed3\u8bba\u3002\u6211\u4eec\u7684\u6846\u67b6\u8fd8\u4e3a\u672a\u6765\u539f\u7406\u6027\u548c\u53ef\u4fe1\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8bc4\u4f30\u534f\u8bae\u7684\u8bbe\u8ba1\u63d0\u4f9b\u4e86\u542f\u793a\u3002|\n", "2408.03256": "|**2024-08-06**|**Synthesizing Text-to-SQL Data from Weak and Strong LLMs**|Jiaxi Yang 
et.al.|[2408.03256](http://arxiv.org/abs/2408.03256)|null|\u672c\u6587\u63a2\u8ba8\u4e86\u5f00\u6e90\u4e0e\u5c01\u95ed\u5f0f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u6587\u672c\u5230SQL\u4efb\u52a1\u4e2d\u7684\u80fd\u529b\u5dee\u8ddd\u95ee\u9898\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u5408\u6210\u6570\u636e\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u7ed3\u5408\u4e86\u66f4\u5927\u3001\u66f4\u5f3a\u5927\u7684\u6a21\u578b\u751f\u6210\u7684\u6570\u636e\uff08\u5f3a\u6a21\u578b\uff09\u4e0e\u8f83\u5c0f\u3001\u4e0d\u5b8c\u5168\u5bf9\u9f50\u6a21\u578b\u751f\u6210\u7684\u9519\u8bef\u4fe1\u606f\u6570\u636e\uff08\u5f31\u6a21\u578b\uff09\u3002\u8fd9\u79cd\u65b9\u6cd5\u4e0d\u4ec5\u63d0\u9ad8\u4e86\u6587\u672c\u5230SQL\u6a21\u578b\u7684\u9886\u57df\u6cdb\u5316\u80fd\u529b\uff0c\u8fd8\u63a2\u7d22\u4e86\u9519\u8bef\u6570\u636e\u76d1\u7763\u901a\u8fc7\u504f\u597d\u5b66\u4e60\u7684\u6f5c\u529b\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5229\u7528\u5408\u6210\u6570\u636e\u65b9\u6cd5\u5bf9\u5f00\u6e90LLM\u8fdb\u884c\u6307\u4ee4\u8c03\u6574\uff0c\u7531\u6b64\u4ea7\u751f\u4e86\u4e13\u95e8\u9488\u5bf9\u6587\u672c\u5230SQL\u4efb\u52a1\u7684\u6a21\u578bSENSE\u3002\u901a\u8fc7\u5728SPIDER\u548cBIRD\u57fa\u51c6\u4e0a\u7684\u8868\u73b0\uff0c\u8bc1\u660e\u4e86SENSE\u7684\u6709\u6548\u6027\uff0c\u6210\u529f\u7f29\u5c0f\u4e86\u5f00\u6e90\u6a21\u578b\u4e0e\u57fa\u4e8e\u5c01\u95ed\u6e90\u6a21\u578b\u7684\u65b9\u6cd5\u4e4b\u95f4\u7684\u6027\u80fd\u5dee\u8ddd\u3002|\n", "2408.03247": "|**2024-08-06**|**Unveiling Factual Recall Behaviors of Large Language Models through Knowledge Neurons**|Yifei Wang 
et.al.|[2408.03247](http://arxiv.org/abs/2408.03247)|**[link](https://github.com/wangyifei0047/tfrkn)**|\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u6df1\u5165\u7814\u7a76\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u9762\u5bf9\u63a8\u7406\u4efb\u52a1\u65f6\u662f\u5426\u79ef\u6781\u5730\u56de\u5fc6\u6216\u68c0\u7d22\u5176\u5185\u90e8\u4e8b\u5b9e\u77e5\u8bc6\u5e93\u3002\u901a\u8fc7\u5206\u6790LLM\u5728\u6bcf\u4e2a\u63a8\u7406\u6b65\u9aa4\u4e2d\u7684\u5185\u90e8\u4e8b\u5b9e\u53ec\u56de\u60c5\u51b5\uff0c\u5373\u6240\u8c13\u7684\u77e5\u8bc6\u795e\u7ecf\u5143\uff0c\u6211\u4eec\u63ed\u793a\u4e86\u5728\u67d0\u4e9b\u60c5\u51b5\u4e0b\uff0cLLM\u672a\u80fd\u6709\u6548\u5229\u7528\u5173\u952e\u7684\u4e8b\u5b9e\u5173\u8054\u3002\u76f8\u53cd\uff0c\u5b83\u4eec\u503e\u5411\u4e8e\u91c7\u53d6\u66ff\u4ee3\u7684\u3001\u5feb\u6377\u7684\u8def\u5f84\u6765\u56de\u7b54\u63a8\u7406\u95ee\u9898\u3002\u901a\u8fc7\u624b\u52a8\u8c03\u6574LLM\u4e2d\u53c2\u6570\u77e5\u8bc6\u7684\u53ec\u56de\u8fc7\u7a0b\uff0c\u6211\u4eec\u8bc1\u660e\u76f4\u63a5\u589e\u5f3a\u8fd9\u4e00\u8fc7\u7a0b\u53ef\u4ee5\u663e\u8457\u63d0\u9ad8\u63a8\u7406\u6027\u80fd\uff0c\u800c\u6291\u5236\u5b83\u5219\u4f1a\u5bfc\u81f4\u660e\u663e\u7684\u6027\u80fd\u4e0b\u964d\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8bc4\u4f30\u4e86\u94fe\u5f0f\u601d\u8003\uff08CoT\uff09\u63d0\u793a\u7684\u5f71\u54cd\uff0c\u8fd9\u662f\u4e00\u79cd\u5904\u7406\u590d\u6742\u63a8\u7406\u4efb\u52a1\u7684\u5f3a\u5927\u6280\u672f\u3002\u6211\u4eec\u7684\u53d1\u73b0\u8868\u660e\uff0cCoT\u53ef\u4ee5\u901a\u8fc7\u9f13\u52b1LLM\u8fdb\u884c\u6709\u6761\u7406\u548c\u53ef\u9760\u7684\u63a8\u7406\u6765\u589e\u5f3a\u5bf9\u4e8b\u5b9e\u77e5\u8bc6\u7684\u56de\u5fc6\u3002\u8fdb\u4e00\u6b65\u5730\uff0c\u6211\u4eec\u63a2\u8ba8\u4e86\u4e0a\u4e0b\u6587\u51b2\u7a81\u5982\u4f55\u5f71\u54cd\u63a8\u7406\u8fc7\u7a0b\u4e2d\u4e8b\u5b9e\u7684\u68c0\u7d22\uff0c\u4ee5\u83b7\u5f97\u5bf9LLM\u4e8b\u5b9e\u56de\u5fc6\u884c\u4e3a\u7684\u5168\u9762\u7406\u89e3\u3002\u76f8
\u5173\u4ee3\u7801\u548c\u6570\u636e\u5c06\u5728\u4e0d\u4e45\u540e\u63d0\u4f9b\u3002|\n", "2408.03172": "|**2024-08-06**|**Leveraging Parameter Efficient Training Methods for Low Resource Text Classification: A Case Study in Marathi**|Pranita Deshmukh et.al.|[2408.03172](http://arxiv.org/abs/2408.03172)|null|\u968f\u7740\u4f4e\u8d44\u6e90\u8bed\u8a00\u6570\u5b57\u5185\u5bb9\u7684\u6fc0\u589e\uff0c\u9488\u5bf9\u8fd9\u4e9b\u8bed\u8a00\u7684\u9ad8\u7ea7\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u6280\u672f\u9700\u6c42\u6b63\u5728\u589e\u52a0\u3002BERT\uff08\u53cc\u5411\u7f16\u7801\u8868\u793a\u7684Transformer\uff09\u4f5c\u4e3a\u4f17\u591aNLP\u67b6\u6784\u548c\u8bed\u8a00\u6a21\u578b\u7684\u57fa\u7840\u6846\u67b6\uff0c\u6b63\u8d8a\u6765\u8d8a\u591a\u5730\u7528\u4e8e\u5f00\u53d1\u4f4e\u8d44\u6e90NLP\u6a21\u578b\u3002\u53c2\u6570\u9ad8\u6548\u5fae\u8c03\uff08PEFT\uff09\u662f\u4e00\u79cd\u65b9\u6cd5\uff0c\u7528\u4e8e\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u8fdb\u884c\u5fae\u8c03\uff0c\u5e76\u5728\u4e00\u5b9a\u7a0b\u5ea6\u4e0a\u51cf\u5c11\u8bad\u7ec3\u53c2\u6570\uff0c\u4ee5\u964d\u4f4e\u8bad\u7ec3\u6a21\u578b\u6240\u9700\u7684\u8ba1\u7b97\u6210\u672c\uff0c\u5e76\u8fbe\u5230\u4e0e\u5b8c\u5168\u5fae\u8c03\u6a21\u578b\u76f8\u5f53\u7684\u7ed3\u679c\u3002\u672c\u7814\u7a76\u65e8\u5728\u5206\u6790PEFT\u65b9\u6cd5\u5728\u9a6c\u62c9\u5730\u8bed\u4f4e\u8d44\u6e90\u8bed\u8a00\u4e2d\u7684\u5e94\u7528\u3002\u6211\u4eec\u5bf9\u5404\u79cd\u5355\u8bed\u548c\u591a\u8bed\u79cd\u9a6c\u62c9\u5730\u8bedBERT\u6a21\u578b\u8fdb\u884c\u4e86\u5168\u9762\u5206\u6790\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u5728MahaSent\u3001MahaHate\u548cMahaNews\u7b49\u91cd\u8981\u6587\u672c\u5206\u7c7b\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u4e86\u8bc4\u4f30\u3002PEFT\u6280\u672f\u7684\u5f15\u5165\u663e\u8457\u52a0\u5feb\u4e86\u6a21\u578b\u7684\u8bad\u7ec3\u901f\u5ea6\uff0c\u89e3\u51b3\u4e86\u6a21\u578b\u5f00\u53d1\u548c\u90e8\u7f72\u7684\u5173\u952e\u65b9\u9762\u3002\u5728\u672c\u7814\u7a76\u4e2d\u
ff0c\u6211\u4eec\u63a2\u7d22\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u4f4e\u79e9\u9002\u5e94\uff08LoRA\uff09\u548c\u9002\u914d\u5668\u65b9\u6cd5\u5728\u4f4e\u8d44\u6e90\u6587\u672c\u5206\u7c7b\u4e2d\u7684\u5e94\u7528\u3002\u7ed3\u679c\u663e\u793a\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u5728\u51c6\u786e\u7387\u4e0a\u4e0e\u5168\u91cf\u5fae\u8c03\u76f8\u5f53\uff0c\u4e14\u65e0\u9700\u635f\u5931\uff0c\u53ef\u7528\u4e8e\u9a6c\u62c9\u5730\u8bed\u548c\u5176\u4ed6\u5370\u5ea6\u8bed\u65cf\u8bed\u8a00\u7684NLP\u80fd\u529b\u6301\u7eed\u53d1\u5c55\u3002|\n", "2408.03150": "|**2024-08-06**|**Conditioning LLMs with Emotion in Neural Machine Translation**|Charles Brazier et.al.|[2408.03150](http://arxiv.org/abs/2408.03150)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\u4e2d\u5c55\u73b0\u4e86\u5353\u8d8a\u7684\u6027\u80fd\uff0c\u7279\u522b\u662f\u5728\u673a\u5668\u7ffb\u8bd1\u9886\u57df\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u673a\u5668\u7ffb\u8bd1\u7ba1\u9053\uff0c\u8be5\u7ba1\u9053\u901a\u8fc7\u5c06\u60c5\u611f\u4fe1\u606f\u6574\u5408\u5230\u8bed\u8a00\u6a21\u578b\u4e2d\u6765\u589e\u5f3a\u7ffb\u8bd1\u8d28\u91cf\uff0c\u8fd9\u4e9b\u60c5\u611f\u4fe1\u606f\u662f\u4ece\u8bed\u97f3\u60c5\u611f\u8bc6\u522b\uff08SER\uff09\u6a21\u578b\u4e2d\u63d0\u53d6\u7684\u3002\u9996\u5148\uff0c\u6211\u4eec\u5bf9\u4e94\u4e2a\u73b0\u6709\u7684LLM\u8fdb\u884cLibri-trans\u6570\u636e\u96c6\u7684\u5fae\u8c03\uff0c\u5e76\u9009\u62e9\u8868\u73b0\u6700\u4f73\u7684\u6a21\u578b\u3002\u968f\u540e\uff0c\u6211\u4eec\u4ee5\u4e0d\u540c\u7ef4\u5ea6\u7684\u60c5\u611f\u589e\u5f3aLLM\u63d0\u793a\uff0c\u5e76\u5728\u8fd9\u4e9b\u4e0d\u540c\u7684\u914d\u7f6e\u4e0b\u8bad\u7ec3\u9009\u5b9a\u7684LLM\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u5c06\u60c5\u611f\u4fe1\u606f\uff0c\u5c24\u5176\u662f\u5524\u9192\u5ea6\uff0c\u6574\u5408\u5230LLM\u63d0\u793a\u4e2d\uff0c\u80fd\u591f\u663e\u8457\u63d0\u9ad8\u7ffb\u8bd1\u8d28\u
91cf\u3002|\n", "2408.03130": "|**2024-08-06**|**Inference Optimizations for Large Language Models: Effects, Challenges, and Practical Considerations**|Leo Donisch et.al.|[2408.03130](http://arxiv.org/abs/2408.03130)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u9886\u57df\u65e0\u5904\u4e0d\u5728\uff0c\u56e0\u4e3a\u5b83\u4eec\u80fd\u591f\u5728\u65e0\u9700\u91cd\u65b0\u8bad\u7ec3\u7684\u60c5\u51b5\u4e0b\u9002\u5e94\u65b0\u4efb\u52a1\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u6a21\u578b\u7684\u89c4\u6a21\u548c\u590d\u6742\u6027\u5e26\u6765\u4e86\u72ec\u7279\u7684\u6311\u6218\u4e0e\u673a\u9047\uff0c\u4fc3\u4f7f\u7814\u7a76\u8005\u4e0e\u5b9e\u8df5\u8005\u63a2\u7d22\u65b0\u578b\u7684\u6a21\u578b\u8bad\u7ec3\u3001\u4f18\u5316\u548c\u90e8\u7f72\u65b9\u6cd5\u3002\u672c\u6587\u7efc\u8ff0\u7684\u91cd\u70b9\u5728\u4e8e\u5404\u79cd\u964d\u4f4e\u8d44\u6e90\u9700\u6c42\u548c\u538b\u7f29\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u6280\u672f\uff0c\u5305\u62ec\u91cf\u5316\u3001\u526a\u679d\u3001\u77e5\u8bc6\u84b8\u998f\u4ee5\u53ca\u67b6\u6784\u4f18\u5316\u3002\u4e3b\u8981\u76ee\u6807\u662f\u6df1\u5165\u63a2\u8ba8\u6bcf\u79cd\u65b9\u6cd5\uff0c\u5e76\u7a81\u51fa\u5176\u72ec\u7279\u6311\u6218\u53ca\u5176\u5b9e\u9645\u5e94\u7528\u3002\u8ba8\u8bba\u7684\u65b9\u6cd5\u6309\u7167\u5206\u7c7b\u5b66\u8fdb\u884c\u7ec4\u7ec7\uff0c\u63d0\u4f9b\u4e86\u4e00\u4e2a\u4f18\u5316\u666f\u89c2\u7684\u6982\u89c8\uff0c\u6709\u52a9\u4e8e\u66f4\u597d\u5730\u7406\u89e3\u7814\u7a76\u8f68\u8ff9\u3002|\n", "2408.03127": "|**2024-08-06**|**Lisbon Computational Linguists at SemEval-2024 Task 2: Using A Mistral 7B Model and Data Augmentation**|Artur Guimar\u00e3es 
et.al.|[2408.03127](http://arxiv.org/abs/2408.03127)|**[link](https://github.com/araag2/SemEval2024-Task2)**|\u8fd9\u7bc7\u8bba\u6587\u9610\u8ff0\u4e86\u6211\u4eec\u5bf9SemEval-2024\u5b89\u5168\u751f\u7269\u533b\u5b66\u81ea\u7136\u8bed\u8a00\u63a8\u65ad\u5728\u4e34\u5e8a\u8bd5\u9a8c\uff08NLI4CT\uff09\u4efb\u52a1\u7684\u5904\u7406\u7b56\u7565\u3002\u8be5\u4efb\u52a1\u6d89\u53ca\u5bf9\u4e34\u5e8a\u8bd5\u9a8c\u62a5\u544a\uff08CTRs\uff09\u4e2d\u7684\u9648\u8ff0\u8fdb\u884c\u5206\u7c7b\u3002\u6211\u4eec\u63a2\u7d22\u4e86Mistral-7B\u8fd9\u79cd\u901a\u7528\u7684\u5f00\u6e90\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u80fd\u529b\u3002\u6211\u4eec\u4e3aNLI4CT\u4efb\u52a1\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u63d0\u793a\uff0c\u5e76\u4f7f\u7528\u589e\u5f3a\u540e\u7684\u8bad\u7ec3\u6570\u636e\u96c6\u5bf9\u91cf\u5316\u7248\u672c\u7684\u6a21\u578b\u8fdb\u884c\u4e86\u5fae\u8c03\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u8fd9\u79cd\u65b9\u6cd5\u5728\u5b8fF1\u5206\u6570\u65b9\u9762\u53ef\u4ee5\u4ea7\u751f\u663e\u8457\u7684\u7ed3\u679c\uff0c\u4f46\u5728\u5fe0\u5b9e\u6027\u548c\u4e00\u81f4\u6027\u65b9\u9762\u5b58\u5728\u5c40\u9650\u6027\u3002\u6240\u6709\u5f00\u53d1\u7684\u4ee3\u7801\u90fd\u5728GitHub\u4ed3\u5e93\u4e2d\u516c\u5f00\u63d0\u4f9b\u3002|\n", "2408.03119": "|**2024-08-06**|**Evaluating the Translation Performance of Large Language Models Based on Euas-20**|Yan Huang 
et.al.|[2408.03119](http://arxiv.org/abs/2408.03119)|null|\u8fd1\u5e74\u6765\uff0c\u5728\u6df1\u5ea6\u5b66\u4e60\u6280\u672f\u7684\u5feb\u901f\u53d1\u5c55\u7684\u63a8\u52a8\u4e0b\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5982BERT\u548cGPT\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\u4e0a\u53d6\u5f97\u4e86\u7a81\u7834\u6027\u6210\u679c\u3002\u673a\u5668\u7ffb\u8bd1\u4f5c\u4e3a\u81ea\u7136\u8bed\u8a00\u5904\u7406\u7684\u6838\u5fc3\u4efb\u52a1\u4e4b\u4e00\uff0c\u4e5f\u4ece\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u53d1\u5c55\u4e2d\u53d7\u76ca\u532a\u6d45\uff0c\u5b9e\u73b0\u4e86\u8d28\u7684\u98de\u8dc3\u3002\u5c3d\u7ba1\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u7ffb\u8bd1\u6027\u80fd\u4e0a\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u5c55\uff0c\u4f46\u673a\u5668\u7ffb\u8bd1\u4ecd\u9762\u4e34\u8bf8\u591a\u6311\u6218\u3002\u56e0\u6b64\uff0c\u672c\u6587\u6784\u5efa\u4e86Euas-20\u6570\u636e\u96c6\uff0c\u7528\u4e8e\u8bc4\u4f30\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u7ffb\u8bd1\u4efb\u52a1\u4e0a\u7684\u6027\u80fd\u3001\u4e0d\u540c\u8bed\u8a00\u7684\u7ffb\u8bd1\u80fd\u529b\u4ee5\u53ca\u9884\u8bad\u7ec3\u6570\u636e\u5bf9LLMs\u7ffb\u8bd1\u80fd\u529b\u7684\u5f71\u54cd\uff0c\u65e8\u5728\u4e3a\u7814\u7a76\u4eba\u5458\u548c\u5f00\u53d1\u8005\u63d0\u4f9b\u53c2\u8003\u3002|\n", "2408.03940": "|**2024-08-07**|**How Well Can Vision Language Models See Image Details?**|Chenhui Gou 
et.al.|[2408.03940](http://arxiv.org/abs/2408.03940)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\u9a71\u52a8\u7684\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08LLM-\u9a71\u52a8\u7684VLM\uff09\u5728\u5404\u79cd\u89c6\u89c9\u8bed\u8a00\u7406\u89e3\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\u3002\u7136\u800c\uff0c\u8fd9\u4e9bVLM\u662f\u5426\u80fd\u8d85\u8d8a\u8bed\u4e49\u5c42\u9762\uff0c\u6df1\u5165\u89c2\u5bdf\u56fe\u50cf\u7ec6\u8282\u4ecd\u7136\u4e0d\u660e\u6717\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u4e2a\u50cf\u7d20\u503c\u9884\u6d4b\u4efb\u52a1\uff08PVP\uff09\uff0c\u4ee5\u63a2\u7d22\u201c\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\u80fd\u591f\u770b\u5230\u591a\u7ec6\u7684\u56fe\u50cf\u7ec6\u8282\uff1f\u201d\u5e76\u534f\u52a9VLM\u63d0\u5347\u5bf9\u7ec6\u8282\u7684\u611f\u77e5\u80fd\u529b\u3002\u901a\u5e38\uff0c\u8fd9\u4e9b\u6a21\u578b\u7531\u51bb\u7ed3\u7684CLIP\u89c6\u89c9\u7f16\u7801\u5668\u3001\u5927\u578b\u8bed\u8a00\u6a21\u578b\u548c\u8fde\u63a5\u6a21\u5757\u7ec4\u6210\u3002\u5728\u5bf9PVP\u4efb\u52a1\u8fdb\u884c\u5fae\u8c03\u540e\uff0c\u6211\u4eec\u53d1\u73b0\uff1a1\uff09\u73b0\u6709\u7684VLM\u4ec5\u901a\u8fc7\u5fae\u8c03\u8fde\u63a5\u6a21\u5757\u548cLLM\uff0c\u5728\u9884\u6d4b\u7cbe\u786e\u50cf\u7d20\u503c\u65b9\u9762\u8868\u73b0\u4e0d\u4f73\uff1b2\uff09\u5f53\u89c6\u89c9\u7f16\u7801\u5668\u4e5f\u5f97\u5230\u9002\u5e94\u65f6\uff0c\u9884\u6d4b\u7cbe\u5ea6\u663e\u8457\u63d0\u9ad8\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u7814\u7a76\u63ed\u793a\uff0c\u5c06\u50cf\u7d20\u503c\u9884\u6d4b\u4f5c\u4e3aVLM\u9884\u8bad\u7ec3\u4efb\u52a1\u4e4b\u4e00\uff0c\u5e76\u5bf9\u89c6\u89c9\u7f16\u7801\u5668\u8fdb\u884c\u9002\u5e94\uff0c\u663e\u8457\u63d0\u5347\u4e86VLM\u5728\u9700\u8981\u8be6\u7ec6\u56fe\u50cf\u611f\u77e5\u7684\u4e0b\u6e38\u56fe\u50cf\u8bed\u8a00\u7406\u89e3\u4efb\u52a1\u4e0a\u7684\u6027\u80fd\uff0c\u5982\u5f15\u7528\u56fe\u50cf\u5206\u5272\uff08\u5e73\u5747cIoU\u6539\u8fdb+10.19\u767e\u5206\u70b9\uff09\u548c\u89c6\u9891\u6e38\u620f\u51b3\u7b56\uff08\u572
8\u4e24\u4e2a\u6e38\u620f\u4e2d\u5206\u522b\u5e73\u5747\u5f97\u5206\u6539\u5584\u4e86+80.34\u548c+70.54\uff09\u3002|\n", "2408.03936": "|**2024-08-07**|**SLIM-RAFT: A Novel Fine-Tuning Approach to Improve Cross-Linguistic Performance for Mercosur Common Nomenclature**|Vin\u00edcius Di Oliveira et.al.|[2408.03936](http://arxiv.org/abs/2408.03936)|null|\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5174\u8d77\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\u3002\u7136\u800c\uff0c\u5bf9\u4e8e\u82f1\u8bed\u4e4b\u5916\u7684\u8bed\u8a00\uff0c\u5c24\u5176\u662f\u5728\u7279\u5b9a\u9886\u57df\u5982Mercosur\u901a\u7528\u5546\u54c1\u540d\u79f0\uff08NCM\uff09\uff0c\u5df4\u897f\u534f\u8c03\u7cfb\u7edf\uff08HS\uff09\u7684\u5e94\u7528\u65b9\u9762\uff0c\u4ecd\u6709\u5f88\u5927\u7684\u6539\u8fdb\u7a7a\u95f4\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u7f3a\u53e3\uff0c\u672c\u7814\u7a76\u5229\u7528TeenyTineLLaMA\uff0c\u4e00\u79cd\u57fa\u7840\u8461\u8404\u7259\u8bedLLM\uff0c\u4f5c\u4e3aLLM\u6e90\uff0c\u5b9e\u65bdNCM\u5e94\u7528\u5904\u7406\u3002\u6b64\u5916\uff0c\u63d0\u51fa\u4e86\u4e00\u79cd\u9488\u5bf9\u4efb\u52a1\u7279\u5b9a\u5fae\u8c03\u7684\u7b80\u5316\u68c0\u7d22\u589e\u5f3a\u5fae\u8c03\uff08SLIM-RAFT\uff09\u6280\u672f\u3002\u8be5\u65b9\u6cd5\u91c7\u7528\u7b80\u5316\u7684\u94fe\u5f0f\u601d\u7ef4\uff08CoT\uff09\u7b56\u7565\u8fdb\u884c\u63d0\u793a\u5f00\u53d1\uff0c\u4f7f\u7528\u7b80\u77ed\u800c\u96c6\u4e2d\u7684\u6587\u6863\u8fdb\u884c\u8bad\u7ec3\uff0c\u4ee5\u66f4\u7d27\u51d1\u548c\u9ad8\u6548\u7684\u65b9\u5f0f\u8fdb\u884c\u3002\u63d0\u51fa\u7684\u6a21\u578b\u5728\u76f8\u540c\u4efb\u52a1\u4e0a\u663e\u8457\u4f18\u4e8eTeenyTineLLaMA\u548cChatGPT-4\uff0c\u5c55\u793a\u4e86\u8f83\u5c0fLLM\u5fae\u8c03\u7684\u9ad8\u6548\u548c\u6210\u672c\u6548\u76ca\u66ff\u4ee3\u65b9\u6848\u3002\u5c3d\u7ba1\u7814\u7a76\u91cd\u70b9\u662fNCM\u5e94\u7528\uff0c\u4f46\u6240\u63d0\u51fa\u7684\u65b9\u6cd5\u53ef\u4ee5\u8f7b\u677e\u5730\u9002\u5e94\u5
168\u7403\u8303\u56f4\u5185\u7684HS\u5e94\u7528\u3002|\n", "2408.03910": "|**2024-08-07**|**CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases**|Xiangyan Liu et.al.|[2408.03910](http://arxiv.org/abs/2408.03910)|**[link](https://github.com/modelscope/modelscope-agent)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u8bf8\u5982HumanEval\u548cMBPP\u7684\u72ec\u7acb\u4ee3\u7801\u4efb\u52a1\u4e2d\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5b83\u4eec\u5728\u5904\u7406\u6574\u4e2a\u4ee3\u7801\u4ed3\u5e93\u65f6\u5b58\u5728\u6311\u6218\u3002\u8fd9\u4fc3\u4f7f\u7814\u7a76\u754c\u63a2\u7d22\u5982\u4f55\u5728\u4ed3\u5e93\u7ea7\u522b\u4e0a\u589e\u5f3aLLM\u4e0e\u4ee3\u7801\u5e93\u7684\u4ea4\u4e92\u3002\u76ee\u524d\u7684\u89e3\u51b3\u65b9\u6848\u4f9d\u8d56\u4e8e\u57fa\u4e8e\u76f8\u4f3c\u6027\u7684\u68c0\u7d22\u6216\u624b\u52a8\u5de5\u5177\u548cAPI\uff0c\u6bcf\u79cd\u65b9\u6cd5\u90fd\u6709\u5176\u663e\u8457\u7684\u7f3a\u70b9\u3002\u57fa\u4e8e\u76f8\u4f3c\u6027\u7684\u68c0\u7d22\u5728\u590d\u6742\u4efb\u52a1\u4e2d\u53ec\u56de\u7387\u8f83\u4f4e\uff0c\u800c\u624b\u52a8\u5de5\u5177\u548cAPI\u901a\u5e38\u5177\u6709\u7279\u5b9a\u7684\u4efb\u52a1\u6027\uff0c\u5e76\u4e14\u9700\u8981\u4e13\u5bb6\u77e5\u8bc6\uff0c\u8fd9\u964d\u4f4e\u4e86\u5b83\u4eec\u5728\u4e0d\u540c\u4ee3\u7801\u4efb\u52a1\u548c\u5b9e\u9645\u5e94\u7528\u4e2d\u7684\u901a\u7528\u6027\u3002\u4e3a\u4e86\u51cf\u8f7b\u8fd9\u4e9b\u9650\u5236\uff0c\u6211\u4eec\u5f15\u5165\u4e86CodexGraph\uff0c\u8fd9\u662f\u4e00\u4e2a\u7cfb\u7edf\uff0c\u5b83\u5c06LLM\u4ee3\u7406\u4e0e\u4ece\u4ee3\u7801\u4ed3\u5e93\u63d0\u53d6\u7684\u56fe\u6570\u636e\u5e93\u63a5\u53e3\u96c6\u6210\u5728\u4e00\u8d77\u3002\u901a\u8fc7\u5229\u7528\u56fe\u6570\u636e\u5e93\u7684\u7ed3\u6784\u7279\u6027\u4ee5\u53ca\u56fe\u67e5\u8be2\u8bed\u8a00\u7684\u7075\u6d3b\u6027\uff0cCodexGraph\u4f7fLLM\u4ee3\u7406\u80fd\u591f\u6784\u5efa\u5e76\u6267\u884c\u67e5\u8be2\uff0c\u4ece\u800c\u5b9e\u73b0\u7cbe\u786e\u3001\u4ee3\u7801\u7ed3\u67
84\u610f\u8bc6\u7684\u4e0a\u4e0b\u6587\u68c0\u7d22\u548c\u4ee3\u7801\u5bfc\u822a\u3002\u6211\u4eec\u4f7f\u7528\u4e09\u4e2a\u57fa\u51c6\u6d4b\u8bd5\u8bc4\u4f30CodexGraph\uff1aCrossCodeEval\u3001SWE-bench\u548cEvoCodeBench\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5f00\u53d1\u4e86\u4e94\u4e2a\u771f\u5b9e\u4e16\u754c\u7684\u7f16\u7801\u5e94\u7528\u3002\u51ed\u501f\u7edf\u4e00\u7684\u56fe\u6570\u636e\u5e93\u6a21\u5f0f\uff0cCodexGraph\u5728\u5b66\u672f\u548c\u5b9e\u9645\u73af\u5883\u4e2d\u90fd\u5c55\u793a\u4e86\u7ade\u4e89\u529b\u548c\u6f5c\u529b\uff0c\u4f53\u73b0\u4e86\u5176\u5728\u8f6f\u4ef6\u5de5\u7a0b\u9886\u57df\u7684\u591a\u529f\u80fd\u6027\u548c\u6709\u6548\u6027\u3002\u6211\u4eec\u7684\u5e94\u7528\u6f14\u793a\uff1ahttps://github.com/modelscope/modelscope-agent/tree/master/apps/codexgraph_agent\u3002|\n", "2408.03907": "|**2024-08-07**|**Decoding Biases: Automated Methods and LLM Judges for Gender Bias Detection in Language Models**|Shachi H Kumar et.al.|[2408.03907](http://arxiv.org/abs/2408.03907)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u7406\u89e3\u8bed\u8a00\u548c\u751f\u6210\u4e0e\u4eba\u7c7b\u6c34\u5e73\u76f8\u5f53\u7684\u6587\u672c\u65b9\u9762\u8868\u73b0\u51fa\u8272\u3002\u7136\u800c\uff0c\u5373\u4f7f\u7ecf\u8fc7\u76d1\u7763\u8bad\u7ec3\u548c\u4eba\u7c7b\u5bf9\u9f50\uff0c\u8fd9\u4e9bLLM\u4ecd\u5bb9\u6613\u53d7\u5230\u6076\u610f\u7528\u6237\u7684\u653b\u51fb\uff0c\u540e\u8005\u53ef\u4ee5\u901a\u8fc7\u63d0\u793a\u6a21\u578b\u751f\u6210\u4e0d\u5e0c\u671b\u770b\u5230\u7684\u6587\u672c\u3002\u6b64\u5916\uff0cLLM\u5185\u5d4c\u6709\u6f5c\u5728\u504f\u89c1\uff0c\u8fd9\u53ef\u80fd\u5bfc\u81f4\u4e92\u52a8\u4e2d\u7684\u5404\u79cd\u6709\u5bb3\u5f71\u54cd\u3002\u5f53\u524d\u7684\u504f\u89c1\u8bc4\u4f30\u6307\u6807\u7f3a\u4e4f\u6807\u51c6\u548c\u5171\u8bc6\uff0c\u73b0\u6709\u65b9\u6cd5\u5f80\u5f80\u4f9d\u8d56\u4e8e\u4eba\u5de5\u751f\u6210\u7684\u6a21\u677f\u548c\u6ce8\u91ca\uff0c\u8fd9\u65e2\u6602\u8d35\u53c8\u8d39\u65f6\u3002 
\u6211\u4eec\u7684\u5de5\u4f5c\u65e8\u5728\u901a\u8fc7\u8bad\u7ec3\u6a21\u578b\u81ea\u52a8\u521b\u5efa\u5bf9\u6297\u6027\u63d0\u793a\u6765\u6fc0\u53d1\u76ee\u6807LLM\u751f\u6210\u5e26\u6709\u504f\u89c1\u7684\u54cd\u5e94\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8eLLM\u7684\u504f\u89c1\u8bc4\u4f30\u6307\u6807\uff0c\u5e76\u5206\u6790\u4e86\u591a\u79cd\u73b0\u6709\u7684\u81ea\u52a8\u8bc4\u4f30\u65b9\u6cd5\u548c\u6307\u6807\u3002\u6211\u4eec\u6df1\u5165\u63a2\u8ba8\u4e86\u6a21\u578b\u54cd\u5e94\u7684\u5404\u79cd\u7ec6\u5fae\u5dee\u522b\uff0c\u8bc6\u522b\u4e86\u4e0d\u540c\u6a21\u578b\u5bb6\u65cf\u7684\u4f18\u52bf\u548c\u52a3\u52bf\uff0c\u5e76\u8bc4\u4f30\u4e86\u8bc4\u4f30\u65b9\u6cd5\u7684\u4e0d\u8db3\u4e4b\u5904\u3002\u6211\u4eec\u5c06\u8fd9\u4e9b\u6307\u6807\u4e0e\u4eba\u5de5\u8bc4\u4f30\u8fdb\u884c\u6bd4\u8f83\uff0c\u5e76\u9a8c\u8bc1\u4e86\u201cLLM\u4f5c\u4e3a\u6cd5\u5b98\u201d\u7684\u6307\u6807\u4e0e\u751f\u6210\u504f\u89c1\u5224\u65ad\u7684\u4eba\u7c7b\u8bc4\u4ef7\u4e00\u81f4\u3002|\n", "2408.03876": "|**2024-08-07**|**From Data to Story: Towards Automatic Animated Data Video Creation with LLM-based Multi-Agent Systems**|Leixian Shen 
et.al.|[2408.03876](http://arxiv.org/abs/2408.03876)|null|\u521b\u5efa\u4ece\u539f\u59cb\u6570\u636e\u751f\u6210\u6570\u636e\u6545\u4e8b\u7684\u8fc7\u7a0b\u6781\u5177\u6311\u6218\u6027\uff0c\u8fd9\u4e3b\u8981\u6e90\u4e8e\u4eba\u7c7b\u6709\u9650\u7684\u6ce8\u610f\u529b\u548c\u5bf9\u7279\u5b9a\u6280\u80fd\u7684\u9700\u6c42\u3002\u8fd1\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u53d1\u5c55\u4e3a\u6784\u5efa\u5229\u7528\u72ec\u7acb\u4ee3\u7406\u5b9e\u73b0\u5de5\u4f5c\u6d41\u7a0b\u81ea\u52a8\u5316\u4ee5\u7b80\u5316\u6570\u636e\u6545\u4e8b\u521b\u4f5c\u6d41\u7a0b\u7684\u7cfb\u7edf\u63d0\u4f9b\u4e86\u5de8\u5927\u673a\u9047\u3002\u5c3d\u7ba1\u591a\u4ee3\u7406\u7cfb\u7edf\u80fd\u591f\u5145\u5206\u6316\u6398LLM\u6f5c\u529b\u5e76\u5206\u89e3\u4efb\u52a1\u4f9b\u4e2a\u4f53\u4ee3\u7406\u6267\u884c\u5177\u6709\u8bf8\u591a\u4f18\u52bf\uff0c\u4f46\u5728\u8bbe\u8ba1\u8fd9\u4e9b\u7cfb\u7edf\u65f6\uff0c\u4e5f\u9762\u4e34\u7740\u4efb\u52a1\u5206\u89e3\u3001\u5b50\u4efb\u52a1\u6027\u80fd\u4f18\u5316\u4ee5\u53ca\u5de5\u4f5c\u6d41\u7a0b\u8bbe\u8ba1\u7b49\u65b9\u9762\u7684\u6311\u6218\u3002\u4e3a\u4e86\u66f4\u6df1\u5165\u5730\u7406\u89e3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u5f00\u53d1\u4e86Data Director\u2014\u2014\u4e00\u4e2a\u57fa\u4e8eLLM\u7684\u591a\u4ee3\u7406\u7cfb\u7edf\uff0c\u65e8\u5728\u81ea\u52a8\u5316\u751f\u6210\u52a8\u753b\u6570\u636e\u89c6\u9891\uff0c\u8fd9\u4e00\u7c7b\u6570\u636e\u6545\u4e8b\u7684\u5178\u578b\u5f62\u5f0f\u3002Data Director\u901a\u8fc7\u89e3\u6790\u539f\u59cb\u6570\u636e\u3001\u62c6\u5206\u4efb\u52a1\u3001\u8bbe\u8ba1\u4ee3\u7406\u89d2\u8272\u4ee5\u8fdb\u884c\u81ea\u52a8\u51b3\u7b56\uff0c\u5e76\u65e0\u7f1d\u6574\u5408\u6570\u636e\u89c6\u9891\u4e2d\u7684\u5404\u79cd\u7ec4\u4ef6\u6765\u5b9e\u73b0\u8fd9\u4e00\u76ee\u6807\u3002\u4e00\u4e2a\u6848\u4f8b\u7814\u7a76\u5c55\u793a\u4e86Data 
Director\u5728\u751f\u6210\u6570\u636e\u89c6\u9891\u65b9\u9762\u7684\u6709\u6548\u6027\u3002\u5728\u6574\u4e2a\u5f00\u53d1\u8fc7\u7a0b\u4e2d\uff0c\u6211\u4eec\u4ece\u89e3\u51b3\u9762\u4e34\u7684\u6311\u6218\u4e2d\u63d0\u70bc\u51fa\u4e86\u7ecf\u9a8c\u6559\u8bad\uff0c\u8fd9\u4e9b\u7ecf\u9a8c\u5bf9\u4e8e\u6307\u5bfc\u672a\u6765\u5728\u6570\u636e\u6545\u4e8b\u53d9\u8ff0\u9886\u57df\u81ea\u4e3b\u4ee3\u7406\u7684\u53d1\u5c55\u5177\u6709\u91cd\u8981\u610f\u4e49\u3002\u6b64\u5916\uff0c\u6211\u4eec\u4e5f\u63ed\u793a\u4e86\u5168\u7403\u4f18\u5316\u3001\u4eba\u673a\u4ea4\u4e92\u8bbe\u8ba1\u4ee5\u53ca\u9ad8\u7ea7\u591a\u6a21\u6001LLM\u5e94\u7528\u7684\u672a\u6765\u53d1\u5c55\u65b9\u5411\u3002|\n", "2408.03865": "|**2024-08-07**|**PackMamba: Efficient Processing of Variable-Length Sequences in Mamba training**|Haoran Xu et.al.|[2408.03865](http://arxiv.org/abs/2408.03865)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u53d1\u5c55\uff0c\u4f20\u7edf\u7684Transformer\u6a21\u578b\u5728\u5904\u7406\u957f\u5e8f\u5217\u65f6\u53d8\u5f97\u8ba1\u7b97\u5bc6\u96c6\u578b\uff0c\u56e0\u4e3a\u5176\u8ba1\u7b97\u91cf\u968f\u5e8f\u5217\u957f\u5ea6\u7684\u5e73\u65b9\u589e\u957f\u3002Mamba\u4f5c\u4e3a\u751f\u6210AI\u9886\u57df\u7684\u4e00\u9879\u7a81\u7834\u6027\u67b6\u6784\uff0c\u5c55\u73b0\u51fa\u5728\u51cf\u5c11\u8ba1\u7b97\u548c\u5185\u5b58\u590d\u6742\u6027\u7684\u524d\u63d0\u4e0b\uff0c\u9ad8\u6548\u5904\u7406\u957f\u5e8f\u5217\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684Mamba\u8bad\u7ec3\u6846\u67b6\u5728\u5904\u7406\u53d8\u957f\u5e8f\u5217\u8f93\u5165\u65f6\u5b58\u5728\u6548\u7387\u95ee\u9898\u3002\u5355\u5e8f\u5217\u8bad\u7ec3\u4f1a\u5bfc\u81f4GPU\u5229\u7528\u7387\u4f4e\u4e0b\uff0c\u800c\u5bf9\u53d8\u957f\u5e8f\u5217\u8fdb\u884c\u6279\u91cf\u5904\u7406\u5230\u6700\u5927\u957f\u5ea6\u5219\u4f1a\u5e26\u6765\u663e\u8457\u7684\u5185\u5b58\u548c\u8ba1\u7b97\u5f00\u9500\u3002 
\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u5206\u6790\u4e86Mamba\u67b6\u6784\u4e2d\u74f6\u9888\u64cd\u4f5c\u5668\u5728\u4e0d\u540c\u5f20\u91cf\u5f62\u72b6\u4e0b\u7684\u6027\u80fd\uff0c\u5e76\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aPackMamba\u7684\u9ad8\u541e\u5410\u91cfMamba\uff0c\u5b83\u80fd\u591f\u6709\u6548\u5730\u5904\u7406\u53d8\u957f\u5e8f\u5217\u3002\u6df1\u5165\u7814\u7a76\u72b6\u6001\u7a7a\u95f4\u6a21\u578b\uff08SSMs\uff09\uff0c\u6211\u4eec\u4fee\u6539\u4e86\u5e76\u884c\u64cd\u4f5c\u5668\uff0c\u4ee5\u907f\u514d\u5728\u5404\u4e2a\u5e8f\u5217\u4e4b\u95f4\u4f20\u9012\u4fe1\u606f\uff0c\u540c\u65f6\u4fdd\u6301\u9ad8\u6027\u80fd\u3002\u5728NVIDIA A100 GPU\u4e0a\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cPackMamba\u5728\u5904\u74061.4B\u6a21\u578b\u65f6\u6bd4\u57fa\u7ebf\u5355\u5e8f\u5217\u5904\u7406\u65b9\u6848\u63d0\u9ad8\u4e863.06\u500d\u7684\u901f\u5ea6\uff0c\u5728\u5904\u74062.8B\u6a21\u578b\u65f6\u63d0\u9ad8\u4e862.62\u500d\u7684\u901f\u5ea6\u3002|\n", "2408.03847": "|**2024-08-07**|**GAIA -- A Large Language Model for Advanced Power Dispatch**|Yuheng Cheng 
et.al.|[2408.03847](http://arxiv.org/abs/2408.03847)|null|\u7535\u529b\u8c03\u5ea6\u5bf9\u4e8e\u63d0\u4f9b\u7a33\u5b9a\u3001\u7ecf\u6d4e\u4e14\u73af\u4fdd\u7684\u7535\u529b\u81f3\u5173\u91cd\u8981\u3002\u7136\u800c\uff0c\u968f\u7740\u7535\u529b\u7cfb\u7edf\u89c4\u6a21\u548c\u590d\u6742\u6027\u7684\u589e\u957f\uff0c\u4f20\u7edf\u7684\u8c03\u5ea6\u65b9\u6cd5\u5728\u591a\u4efb\u52a1\u5904\u7406\u3001\u5feb\u901f\u95ee\u9898\u89e3\u51b3\u4ee5\u53ca\u4eba\u673a\u534f\u4f5c\u65b9\u9762\u9047\u5230\u6311\u6218\u3002\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u4e13\u4e3a\u7535\u529b\u8c03\u5ea6\u4efb\u52a1\u8bbe\u8ba1\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u2014\u2014GAIA\u3002\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6570\u636e\u96c6\u6784\u5efa\u6280\u672f\uff0c\u5229\u7528\u591a\u79cd\u6570\u636e\u6e90\u5bf9GAIA\u8fdb\u884c\u5fae\u8c03\uff0c\u4ee5\u4f18\u5316\u5176\u5728\u8be5\u9886\u57df\u7684\u6027\u80fd\u3002\u8fd9\u79cd\u65b9\u6cd5\u7b80\u5316\u4e86LLM\u7684\u8bad\u7ec3\u8fc7\u7a0b\uff0c\u4f7f\u5f97\u5728\u7535\u529b\u7cfb\u7edf\u7ba1\u7406\u4e2d\u80fd\u591f\u65e0\u7f1d\u6574\u5408\u591a\u7ef4\u6570\u636e\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u8bbe\u8ba1\u4e86\u4e13\u95e8\u7684\u63d0\u793a\u7b56\u7565\u6765\u63d0\u9ad8GAIA\u5728\u8c03\u5ea6\u573a\u666f\u4e0b\u7684\u8f93\u5165\u8f93\u51fa\u6548\u7387\u3002\u5728ElecBench\u57fa\u51c6\u6d4b\u8bd5\u4e2d\uff0cGAIA\u5728\u591a\u4e2a\u6307\u6807\u4e0a\u8d85\u8d8a\u4e86\u57fa\u7840\u6a21\u578bLLaMA2\u3002\u5b9e\u9645\u5e94\u7528\u8868\u660e\uff0cGAIA\u80fd\u591f\u589e\u5f3a\u51b3\u7b56\u8fc7\u7a0b\u3001\u63d0\u9ad8\u8fd0\u8425\u6548\u7387\uff0c\u5e76\u4fc3\u8fdb\u7535\u529b\u8c03\u5ea6\u64cd\u4f5c\u4e2d\u7684\u4eba\u673a\u4ea4\u4e92\u3002\u672c\u6587\u6269\u5c55\u4e86LLM\u5728\u7535\u529b\u8c03\u5ea6\u9886\u57df\u7684\u5e94\u7528\uff0c\u5e76\u9a8c\u8bc1\u4e86\u5176\u5b9e\u7528\u6027\uff0c\u4e3a\u8fd9\u4e00\u9886\u57df\u672a\u6765\u7684\u521b\u65b0\u5f00\u8f9f\u4e86\u9053\u8def\u3002|\
n", "2408.03841": "|**2024-08-07**|**MaxMind: A Memory Loop Network to Enhance Software Productivity based on Large Language Models**|Yuchen Dong et.al.|[2408.03841](http://arxiv.org/abs/2408.03841)|null|\u672c\u6587\u63a2\u8ba8\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u81ea\u52a8\u5316\u8f6f\u4ef6\u64cd\u4f5c\u548c\u5de5\u5177\u751f\u6210\uff08SOTG\uff09\u9886\u57df\u7684\u5e94\u7528\uff0c\u4ee5\u6b64\u6765\u63d0\u5347\u8f6f\u4ef6\u751f\u4ea7\u529b\u3002\u8fd9\u4e00\u8fc7\u7a0b\u7c7b\u4f3c\u4e8e\u4eba\u7c7b\u6587\u660e\u65e9\u671f\u901a\u8fc7\u521b\u9020\u5e76\u4f7f\u7528\u5de5\u5177\u52a0\u901f\u53d1\u5c55\u7684\u9636\u6bb5\u3002\u8fd9\u4e9b\u590d\u6742\u4efb\u52a1\u8981\u6c42AI\u80fd\u591f\u6301\u7eed\u603b\u7ed3\u5e76\u6539\u8fdb\u3002\u5f53\u524d\u7814\u7a76\u5f80\u5f80\u5ffd\u89c6\u4e86\u5c06\u5b9e\u65f6\u4efb\u52a1\u7ecf\u9a8c\u8f6c\u5316\u4e3a\u7cfb\u7edf\u8bb0\u5fc6\u4ee5\u53ca\u533a\u5206\u73b0\u6709\u77e5\u8bc6\u672a\u6765\u4ef7\u503c\u7684\u91cd\u8981\u6027\u3002\u672c\u6587\u901a\u8fc7\u5f15\u5165\u201cMemory-Loop\u7f51\u7edc\u201d\u6765\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u4ee5\u5b9e\u73b0\u53ca\u65f6\u7684\u8bb0\u5fc6\u5b58\u50a8\u4e0e\u7ecf\u9a8c\u5f15\u7528\u3002 
\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5bf9\u57fa\u4e8e\u77e5\u8bc6\u7cbe\u786e\u5206\u6bb5\u7684RAG\u673a\u5236\u8fdb\u884c\u4e86\u589e\u5f3a\uff0c\u4ee5\u4fbf\u6839\u636e\u4ef7\u503c\u5dee\u5f02\u5229\u7528\u8bb0\u5fc6\u3002\u9488\u5bf9SOTG\u8bbe\u8ba1\u4e86MaxMind\u6a21\u578b\u3002\u4e3a\u4e86\u9a8c\u8bc1\u6211\u4eec\u7684\u65b9\u6cd5\uff0c\u6211\u4eec\u5f00\u53d1\u4e86MaxMind4Sheet\uff0c\u4e00\u4e2a\u9075\u5faaMaxMind\u7406\u5ff5\u7684\u7535\u5b50\u8868\u683c\u5904\u7406\u7cfb\u7edf\u3002\u4e0eSheetCopilot\u7684\u6bd4\u8f83\u5b9e\u9a8c\u663e\u793a\uff0c\u4efb\u52a1\u8bb0\u5fc6\u7684\u79ef\u7d2f\u548c\u5faa\u73af\u80fd\u591f\u7a33\u6b65\u63d0\u9ad8\u4efb\u52a1\u6210\u529f\u7387\uff0c\u5728\u6b64\u793a\u4f8b\u5b9e\u65bd\u4e2d\uff0c\u6bcf\u8f6e\u7684\u6210\u529f\u7387\u63d0\u5347\u7ea6\u4e3a3%-6%\u3002\u968f\u7740\u8bb0\u5fc6\u7684\u6301\u7eed\u589e\u957f\uff0c\u8fd9\u79cd\u7d2f\u79ef\u6539\u8fdb\u53ef\u80fd\u4f1a\u975e\u5e38\u663e\u8457\u3002 \u5f15\u5165\u8bb0\u5fc6\u5faa\u73af\u8fd8\u53ef\u4ee5\u901a\u8fc7\u9ad8\u8fbe25%\u7684\u6548\u7387\u63d0\u5347\u589e\u52a0\u7cfb\u7edf\u7684\u4efb\u52a1\u6267\u884c\u6548\u7387\uff0c\u5e76\u901a\u8fc7\u8bb0\u5fc6\u8f6c\u79fb\u89e3\u51b3LLM\u5728\u5904\u7406\u4e13\u4e1a\u4efb\u52a1\u65f6\u9762\u4e34\u7684\u518d\u8bad\u7ec3\u95ee\u9898\u3002\u8fd9\u8868\u660eMaxMind\u6709\u6f5c\u529b\u663e\u8457\u589e\u5f3a\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728SOTG\u9886\u57df\u7684\u529f\u80fd\u548c\u751f\u4ea7\u529b\u3002|\n", "2408.03837": "|**2024-08-07**|**WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models**|Prannaya Gupta 
et.al.|[2408.03837](http://arxiv.org/abs/2408.03837)|**[link](https://github.com/walledai/walledeval)**|WalledEval\u662f\u4e00\u4e2a\u5168\u9762\u7684AI\u5b89\u5168\u6027\u6d4b\u8bd5\u5de5\u5177\u5305\uff0c\u65e8\u5728\u8bc4\u4f30\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u3002\u5b83\u80fd\u591f\u517c\u5bb9\u5404\u79cd\u6a21\u578b\uff0c\u5305\u62ec\u5f00\u6e90\u548cAPI\u4e24\u79cd\u7c7b\u578b\uff0c\u5e76\u5305\u542b\u4e86\u8d85\u8fc735\u4e2a\u8986\u76d6\u591a\u8bed\u8a00\u5b89\u5168\u3001\u5938\u5f20\u5b89\u5168\u4ee5\u53ca\u63d0\u793a\u6ce8\u5165\u7b49\u9886\u57df\u7684\u5b89\u5168\u57fa\u51c6\u3002\u8be5\u6846\u67b6\u652f\u6301\u5bf9LLM\u548c\u88c1\u5224\u8fdb\u884c\u57fa\u51c6\u6d4b\u8bd5\uff0c\u5e76\u4e14\u96c6\u6210\u81ea\u5b9a\u4e49\u7a81\u53d8\u5668\uff0c\u7528\u4e8e\u6d4b\u8bd5\u5728\u4e0d\u540c\u6587\u672c\u98ce\u683c\u53d8\u5f02\u5982\u5c06\u6765\u65f6\u6001\u548c\u91cd\u8ff0\u4e0b\u7684\u5b89\u5168\u6027\u3002\u6b64\u5916\uff0cWalledEval\u5f15\u5165\u4e86WalledGuard\uff0c\u8fd9\u662f\u4e00\u79cd\u65b0\u7684\u5c0f\u578b\u9ad8\u6548\u5185\u5bb9\u5ba1\u6838\u5de5\u5177\uff0c\u4ee5\u53caSGXSTest\uff0c\u7528\u4e8e\u8bc4\u4f30\u6587\u5316\u80cc\u666f\u4e0b\u7684\u5938\u5927\u5b89\u5168\u95ee\u9898\u3002\u6211\u4eec\u5df2\u5c06WalledEval\u516c\u5f00\u53d1\u5e03\u5728https://github.com/walledai/walledeval\u3002|\n", "2408.03834": "|**2024-08-07**|**Target Prompting for Information Extraction with Vision Language Model**|Dipankar Medhi 
et.al.|[2408.03834](http://arxiv.org/abs/2408.03834)|null|\u8fd1\u671f\uff0c\u5927\u578b\u89c6\u89c9\u4e0e\u8bed\u8a00\u6a21\u578b\uff08VLM\uff09\u9886\u57df\u7684\u53d1\u5c55\u5728\u6784\u5efa\u4fe1\u606f\u63d0\u53d6\u7cfb\u7edf\u65b9\u9762\u5e26\u6765\u4e86\u65b0\u7684\u53d8\u9769\u3002\u8fd9\u4e9b\u6a21\u578b\u5728\u7406\u89e3\u6587\u6863\u548c\u6784\u5efa\u8de8\u884c\u4e1a\u7684\u95ee\u9898\u56de\u7b54\u7cfb\u7edf\u65b9\u9762\u8fbe\u5230\u4e86\u9876\u5c16\u6c34\u5e73\uff0c\u663e\u8457\u63d0\u5347\u4e86\u4ece\u6587\u6863\u56fe\u50cf\u751f\u6210\u6587\u672c\u4ee5\u53ca\u63d0\u4f9b\u7cbe\u786e\u7b54\u6848\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u5229\u7528\u8fd9\u4e9b\u6a21\u578b\u6784\u5efa\u7cbe\u51c6\u5bf9\u8bdd\u7cfb\u7edf\u65f6\u4ecd\u5b58\u5728\u4e00\u4e9b\u6311\u6218\u3002\u4f20\u7edf\u7684\u901a\u7528\u63d0\u793a\u6280\u672f\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4e0a\u7684\u5e94\u7528\u5f80\u5f80\u4e0d\u9002\u5408\u8fd9\u4e9b\u4e13\u95e8\u8bbe\u8ba1\u7684\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\u3002\u4f7f\u7528\u8fd9\u7c7b\u901a\u7528\u8f93\u5165\u63d0\u793a\u6240\u751f\u6210\u7684\u8f93\u51fa\u901a\u5e38\u8f83\u4e3a\u666e\u901a\uff0c\u4e0e\u6587\u6863\u5b9e\u9645\u5185\u5bb9\u76f8\u6bd4\u53ef\u80fd\u5b58\u5728\u4fe1\u606f\u7f3a\u53e3\u3002\u4e3a\u4e86\u83b7\u5f97\u66f4\u51c6\u786e\u3001\u66f4\u5177\u4f53\u7684\u7b54\u6848\uff0c\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\u9700\u8981\u9488\u5bf9\u7279\u5b9a\u90e8\u5206\u7684\u6587\u6863\u56fe\u50cf\u8fdb\u884c\u63d0\u793a\uff0c\u5e76\u4ec5\u4ece\u8fd9\u4e9b\u7279\u5b9a\u533a\u57df\u751f\u6210\u76f8\u5173\u7b54\u6848\u3002\u672c\u6587\u8ba8\u8bba\u4e86\u4e00\u79cd\u79f0\u4e3a\u201c\u76ee\u6807\u63d0\u793a\u201d\u7684\u6280\u672f\uff0c\u8be5\u6280\u672f\u4e13\u6ce8\u4e8e\u660e\u786e\u6307\u5411\u6587\u6863\u56fe\u50cf\u7684\u90e8\u5206\u5e76\u4ec5\u4ece\u8fd9\u4e9b\u7279\u5b9a\u533a\u57df\u751f\u6210\u76f8\u5173\u7684\u7b54\u6848\u3002\u6b64\u5916\uff0c\u6587\u7ae0\u8fd8\u901a\u8fc7\u4f7f\u7528\u4e0d\u540c\u7528\
u6237\u67e5\u8be2\u548c\u8f93\u5165\u63d0\u793a\u5bf9\u6bcf\u79cd\u63d0\u793a\u6280\u672f\u7684\u54cd\u5e94\u8fdb\u884c\u4e86\u8bc4\u4f30\u3002|\n", "2408.04614": "|**2024-08-08**|**Better Alignment with Instruction Back-and-Forth Translation**|Thao Nguyen et.al.|[2408.04614](http://arxiv.org/abs/2408.04614)|null|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u65b9\u6cd5\u2014\u2014\u6307\u4ee4\u53cc\u5411\u7ffb\u8bd1\uff0c\u7528\u4e8e\u6784\u5efa\u57fa\u4e8e\u4e16\u754c\u77e5\u8bc6\u7684\u9ad8\u8d28\u91cf\u5408\u6210\u6570\u636e\uff0c\u4ee5\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u8fdb\u884c\u5bf9\u9f50\u3002\u7ed9\u5b9a\u7f51\u7edc\u8bed\u6599\u5e93\u4e2d\u7684\u6587\u6863\uff0c\u6211\u4eec\u4f7f\u7528\u4e86Li\u7b49\u4eba(2023a)\u63d0\u51fa\u7684\u56de\u8bd1\u65b9\u6cd5\u751f\u6210\u5e76\u6574\u7406\u5408\u6210\u6307\u4ee4\uff0c\u5e76\u901a\u8fc7\u6839\u636e\u521d\u59cb\u6587\u6863\u8fdb\u4e00\u6b65\u6539\u8fdb\u54cd\u5e94\u7684\u8d28\u91cf\u6765\u91cd\u5199\u8fd9\u4e9b\u6307\u4ee4\u3002\u901a\u8fc7\u4f7f\u7528\u4ea7\u751f\u7684\uff08\u56de\u8bd1\u6307\u4ee4\uff0c\u91cd\u5199\u54cd\u5e94\uff09\u5bf9\u8fdb\u884c\u5fae\u8c03\uff0c\u6211\u4eec\u5728AlpacaEval\u4e0a\u7684\u83b7\u80dc\u7387\u9ad8\u4e8e\u4f7f\u7528\u5176\u4ed6\u5e38\u89c1\u6307\u4ee4\u6570\u636e\u96c6\uff08\u5982Humpback\u3001ShareGPT\u3001Open 
Orca\u3001Alpaca-GPT4\u548cSelf-instruct\uff09\u3002\u6211\u4eec\u4e5f\u5c55\u793a\u4e86\u7528LLM\u91cd\u5199\u54cd\u5e94\u4f18\u4e8e\u76f4\u63a5\u7684\u84b8\u998f\u65b9\u6cd5\uff0c\u5e76\u4e14\u751f\u6210\u7684\u6587\u672c\u5206\u5e03\u5728\u8fd9\u4e24\u4e2a\u65b9\u9762\u4e4b\u95f4\u5b58\u5728\u663e\u8457\u5dee\u5f02\u3002\u8fdb\u4e00\u6b65\u7684\u5206\u6790\u8868\u660e\uff0c\u6211\u4eec\u7684\u56de\u8bd1\u6307\u4ee4\u7684\u8d28\u91cf\u6bd4\u5176\u4ed6\u5408\u6210\u6307\u4ee4\u6765\u6e90\u66f4\u9ad8\uff0c\u800c\u6211\u4eec\u7684\u54cd\u5e94\u5728\u591a\u6837\u6027\u4e0e\u590d\u6742\u6027\u4e0a\u6bd4\u4ece\u84b8\u998f\u83b7\u5f97\u7684\u7ed3\u679c\u66f4\u4e3a\u51fa\u8272\u3002\u603b\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u53d1\u73b0\u6307\u4ee4\u53cc\u5411\u7ffb\u8bd1\u7ed3\u5408\u4e86\u7f51\u7edc\u4e0a\u4fe1\u606f\u591a\u6837\u6027\u548c\u6570\u91cf\u7684\u4f18\u52bf\uff0c\u540c\u65f6\u786e\u4fdd\u4e86\u54cd\u5e94\u7684\u8d28\u91cf\uff0c\u8fd9\u662f\u6709\u6548\u5bf9\u9f50\u6240\u5fc5\u9700\u7684\u3002|\n", "2408.04594": "|**2024-08-09**|**Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models**|Qirui Jiao 
et.al.|[2408.04594](http://arxiv.org/abs/2408.04594)|**[link](https://github.com/modelscope/data-juicer)**|**\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aImg-Diff\u7684\u65b0\u6570\u636e\u96c6\uff0c\u65e8\u5728\u901a\u8fc7\u5bf9\u6bd4\u5b66\u4e60\u548c\u56fe\u50cf\u5dee\u5f02\u63cf\u8ff0\u7684\u65b9\u6cd5\u6765\u589e\u5f3a\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u7ec6\u5fae\u56fe\u50cf\u8bc6\u522b\u4efb\u52a1\u4e2d\u7684\u6027\u80fd\u3002\u8be5\u65b9\u6cd5\u901a\u8fc7\u5206\u6790\u76f8\u4f3c\u56fe\u50cf\u95f4\u7684\u5bf9\u8c61\u5dee\u5f02\uff0c\u8981\u6c42\u6a21\u578b\u8bc6\u522b\u76f8\u540c\u4e0e\u4e0d\u540c\u4e4b\u5904\u3002\u5229\u7528Stable-Diffusion-XL\u6a21\u578b\u53ca\u9ad8\u7ea7\u56fe\u50cf\u7f16\u8f91\u6280\u672f\u751f\u6210\u7a81\u51fa\u5bf9\u8c61\u66ff\u6362\u7684\u76f8\u4f3c\u56fe\u50cf\u5bf9\u3002\u6570\u636e\u751f\u6210\u6d41\u7a0b\u5305\u62ec\u5dee\u5f02\u533a\u57df\u751f\u6210\u5668\u8bc6\u522b\u5bf9\u8c61\u5dee\u5f02\uff0c\u968f\u540e\u5dee\u5f02\u63cf\u8ff0\u751f\u6210\u5668\u63d0\u4f9b\u8be6\u7ec6\u7684\u5dee\u5f02\u8bf4\u660e\u3002\u7ed3\u679c\u662f\u521b\u5efa\u4e86\u4e00\u4e2a\u5c0f\u800c\u9ad8\u8d28\u91cf\u7684\u201c\u5bf9\u8c61\u66ff\u6362\u201d\u6837\u672c\u96c6\u5408\u3002\u4f7f\u7528\u6b64\u6570\u636e\u96c6\u5bf9\u5f53\u524d\u6700\u4f73\u7684\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\uff08\u5982MGM-7B\uff09\u8fdb\u884c\u5fae\u8c03\uff0c\u663e\u8457\u63d0\u9ad8\u4e86\u8fd9\u4e9b\u6a21\u578b\u5728\u56fe\u50cf\u5dee\u5f02\u548c\u89c6\u89c9\u95ee\u7b54\u4efb\u52a1\u4e0a\u7684\u8868\u73b0\u5206\u6570\uff0c\u8d85\u8d8a\u4e86\u57fa\u4e8e\u5927\u89c4\u6a21\u6570\u636e\u96c6\u8bad\u7ec3\u7684\u5f53\u524d\u6700\u4f73\u6a21\u578b\uff08\u5982GPT-4V\u548cGemini\uff09\u5728MMVP\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u7684\u8868\u73b0\u3002\u6b64\u5916\uff0c\u672c\u6587\u8fd8\u63a2\u8ba8\u4e86\u901a\u8fc7\u201c\u5bf9\u8c61\u79fb\u9664\u201d\u65b9\u6cd5\u751f\u6210\u56fe\u50cf\u5dee\u5f02\u6570\u636e\u7684\u66ff\u4ee3\u65b9\u6cd5\uff0c\u5e76\u8fdb
\u884c\u4e86\u5168\u9762\u8bc4\u4f30\u4ee5\u9a8c\u8bc1\u6570\u636e\u96c6\u7684\u591a\u6837\u6027\u548c\u8d28\u91cf\uff0c\u63d0\u4f9b\u4e86\u5173\u4e8e\u6b64\u7c7b\u5bf9\u6bd4\u6027\u6570\u636e\u96c6\u5408\u6210\u7684\u6df1\u5165\u89c1\u89e3\u3002\u4e3a\u4e86\u4fc3\u8fdb\u8fdb\u4e00\u6b65\u7684\u7814\u7a76\u5e76\u63a8\u52a8\u591a\u6a21\u6001\u6570\u636e\u5408\u6210\u548c\u589e\u5f3a\u5927\u578b\u8bed\u8a00\u6a21\u578b\u57fa\u7840\u80fd\u529b\u7684\u53d1\u5c55\uff0c\u6211\u4eec\u5df2\u5c06\u4ee3\u7801\u548c\u6570\u636e\u96c6\u53d1\u5e03\u5728https://github.com/modelscope/data-juicer/tree/ImgDiff\u4e0a\u4f9b\u516c\u4f17\u4f7f\u7528\u3002**|\n", "2408.04585": "|**2024-08-08**|**Towards Resilient and Efficient LLMs: A Comparative Study of Efficiency, Performance, and Adversarial Robustness**|Xiaojing Fan et.al.|[2408.04585](http://arxiv.org/abs/2408.04585)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5b9e\u7528\u5e94\u7528\u9700\u6c42\u7684\u589e\u52a0\uff0c\u8bb8\u591a\u5173\u6ce8\u6548\u7387\u7684\u6a21\u578b\u88ab\u5f00\u53d1\u51fa\u6765\u4ee5\u5e73\u8861\u6027\u80fd\u548c\u8ba1\u7b97\u6210\u672c\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u6a21\u578b\u7684\u5bf9\u6297\u9c81\u68d2\u6027\u4ecd\u7136\u7f3a\u4e4f\u6df1\u5165\u7814\u7a76\u3002\u672c\u7814\u7a76\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u6846\u67b6\uff0c\u901a\u8fc7\u6bd4\u8f83\u4e09\u4e2a\u5177\u6709\u4e0d\u540c\u590d\u6742\u5ea6\u548c\u6548\u7387\u6c34\u5e73\u7684\u4e3b\u8981\u6a21\u578b\u2014\u2014Transformer++\u3001\u95e8\u63a7\u7ebf\u6027\u6ce8\u610f\u529b\uff08GLA\uff09\u53d8\u6362\u5668\u4ee5\u53caMatMul-Free LM\uff0c\u6765\u63a2\u7d22\u6548\u7387\u3001\u6027\u80fd\u4e0e\u5bf9\u6297\u9c81\u68d2\u6027\u7684\u6743\u8861\u5173\u7cfb\u3002\u5229\u7528GLUE\u548cAdvGLUE\u6570\u636e\u96c6\u8fdb\u884c\u6bd4\u8f83\u3002AdvGLUE\u6570\u636e\u96c6\u901a\u8fc7\u6dfb\u52a0\u65e8\u5728\u6311\u6218\u6a21\u578b\u9c81\u68d2\u6027\u7684\u5bf9\u6297\u6837\u672c\u6269\u5c55\u4e86GLUE\u6570\u636e\u96c6\u3002 
\u6211\u4eec\u7684\u7ed3\u679c\u8868\u660e\uff0c\u5728GLUE\u4efb\u52a1\u4e0a\u7684\u51c6\u786e\u6027\u7a0d\u4f4e\u7684\u60c5\u51b5\u4e0b\uff0cGLA\u53d8\u6362\u5668\u548cMatMul-Free LM\u5728AdvGLUE\u4efb\u52a1\u4e0a\u663e\u793a\u51fa\u66f4\u9ad8\u7684\u6548\u7387\uff0c\u5e76\u4e14\u5728\u4e0d\u540c\u653b\u51fb\u7ea7\u522b\u4e0b\uff0c\u5b83\u4eec\u7684\u9c81\u68d2\u6027\u8981\u4e48\u4f18\u4e8e\uff0c\u8981\u4e48\u4e0eTransformer++\u76f8\u5339\u654c\u3002\u8fd9\u4e9b\u53d1\u73b0\u5f3a\u8c03\u4e86\u7b80\u5316\u67b6\u6784\u5728\u5b9e\u73b0\u9ad8\u6548\u80fd\u3001\u9ad8\u6027\u80fd\u4e0e\u5bf9\u6297\u9c81\u68d2\u6027\u4e4b\u95f4\u53d6\u5f97\u826f\u597d\u5e73\u8861\u7684\u53ef\u80fd\u6027\uff0c\u4e3a\u8d44\u6e90\u53d7\u9650\u73af\u5883\u548c\u5bf9\u5bf9\u6297\u653b\u51fb\u6709\u9ad8\u62b5\u6297\u529b\u9700\u6c42\u7684\u5e94\u7528\u63d0\u4f9b\u4e86\u6709\u4ef7\u503c\u7684\u89c1\u89e3\u3002|\n", "2408.04575": "|**2024-08-08**|**SCENE: Evaluating Explainable AI Techniques Using Soft Counterfactuals**|Haoran Zheng 
et.al.|[2408.04575](http://arxiv.org/abs/2408.04575)|null|\u89e3\u91ca\u6027\u4eba\u5de5\u667a\u80fd\uff08XAI\uff09\u5bf9\u4e8e\u589e\u5f3a\u4eba\u5de5\u667a\u80fd\u6a21\u578b\u7684\u900f\u660e\u5ea6\u548c\u8d23\u4efb\u6027\u81f3\u5173\u91cd\u8981\uff0c\u5c24\u5176\u662f\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u4efb\u52a1\u4e2d\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aSCENE\uff08\u8f6f\u53cd\u4e8b\u5b9e\u8bc4\u4f30\u7528\u4e8e\u81ea\u7136\u8bed\u8a00\u53ef\u89e3\u91ca\u6027\uff09\u7684\u65b0\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u96f6\u6837\u672c\u7684\u60c5\u51b5\u4e0b\u751f\u6210\u8f6f\u53cd\u4e8b\u5b9e\u89e3\u91ca\u3002\u901a\u8fc7\u5173\u6ce8\u57fa\u4e8e\u8bcd\u5143\u7684\u66ff\u6362\uff0cSCENE\u521b\u5efa\u4e86\u4e0a\u4e0b\u6587\u76f8\u5173\u4e14\u8bed\u4e49\u4e0a\u5177\u6709\u610f\u4e49\u7684\u8f6f\u53cd\u4e8b\u5b9e\uff0c\u800c\u65e0\u9700\u8fdb\u884c\u5927\u91cf\u5fae\u8c03\u3002SCENE\u91c7\u7528\u6709\u6548\u6027\u8f6f\u548cC\u8f6f\u6307\u6807\u6765\u8bc4\u4f30\u5404\u79cd\u6a21\u578b\u65e0\u5173\u7684XAI\u65b9\u6cd5\u5728\u6587\u672c\u5206\u7c7b\u4efb\u52a1\u4e2d\u7684\u6548\u679c\u3002\u5e94\u7528\u4e8eCNN\u3001RNN\u548cBERT\u67b6\u6784\uff0cSCENE\u63d0\u4f9b\u4e86\u5bf9\u5404\u79cdXAI\u6280\u672f\u5f3a\u9879\u548c\u5c40\u9650\u6027\u7684\u6709\u4ef7\u503c\u89c1\u89e3\u3002|\n", "2408.04568": "|**2024-08-08**|**Learning Fine-Grained Grounded Citations for Attributed Large Language Models**|Lei Huang 
et.al.|[2408.04568](http://arxiv.org/abs/2408.04568)|**[link](https://github.com/luckyyysta/fine-grained-attribution)**|**\u5c3d\u7ba1\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u4fe1\u606f\u67e5\u8be2\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5b83\u4eec\u4ecd\u7136\u5728\u5e7b\u89c9\u95ee\u9898\u4e0a\u5b58\u5728\u6311\u6218\u3002\u57fa\u4e8e\u5c5e\u6027\u7684LLM\uff0c\u901a\u8fc7\u5728\u751f\u6210\u6587\u672c\u4e2d\u6dfb\u52a0\u5185\u8054\u5f15\u7528\uff0c\u663e\u793a\u51fa\u51cf\u5c11\u5e7b\u89c9\u5e76\u63d0\u9ad8\u53ef\u9a8c\u8bc1\u6027\u7684\u6f5c\u529b\u3002\u7136\u800c\uff0c\u5f53\u524d\u7684\u65b9\u6cd5\u5728\u751f\u6210\u9ad8\u8d28\u91cf\u5f15\u7528\u65b9\u9762\u6548\u679c\u4e0d\u4f73\uff0c\u8fd9\u4e3b\u8981\u662f\u7531\u4e8e\u5b83\u4eec\u4f9d\u8d56\u4e8e\u4e0a\u4e0b\u6587\u5b66\u4e60\u3002\u6b64\u5916\uff0c\u53ea\u5f15\u7528\u7c97\u7c92\u5ea6\u6587\u6863\u6807\u8bc6\u7684\u505a\u6cd5\u4f7f\u5f97\u7528\u6237\u96be\u4ee5\u8fdb\u884c\u7cbe\u7ec6\u9a8c\u8bc1\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86FRONT\u6846\u67b6\uff0c\u65e8\u5728\u6559\u5bfcLLM\u751f\u6210\u7ec6\u7c92\u5ea6\u76f8\u5173\u5f15\u7528\u3002\u8fd9\u4e9b\u5f15\u7528\u901a\u8fc7\u8fde\u63a5\u5230\u751f\u6210\u54cd\u5e94\u7684\u7ec6\u7c92\u5ea6\u652f\u6301\u5f15\u7528\u6765\u63d0\u4f9b\u6307\u5bfc\uff0c\u4e0d\u4ec5\u63d0\u9ad8\u4e86\u5f15\u7528\u8d28\u91cf\uff0c\u8fd8\u4fbf\u4e8e\u8fdb\u884c\u7cbe\u7ec6\u9a8c\u8bc1\u3002\u5728ALCE\u57fa\u51c6\u6d4b\u8bd5\u4e0a\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cFRONT\u5728\u751f\u6210\u4f18\u79c0\u76f8\u5173\u54cd\u5e94\u548c\u9ad8\u5ea6\u652f\u6301\u6027\u5f15\u7528\u65b9\u9762\u975e\u5e38\u6709\u6548\u3002\u4f7f\u7528LLaMA-2-7B\u65f6\uff0c\u8be5\u6846\u67b6\u663e\u8457\u4f18\u4e8e\u6240\u6709\u57fa\u7ebf\uff0c\u5e73\u5747\u63d0\u9ad8\u4e8614.21%\u7684\u5f15\u7528\u8d28\u91cf\uff0c\u5e76\u4e14\u8d85\u8d8a\u4e86ChatGPT\u3002**|\n", "2408.04556": "|**2024-08-08**|**Bias-Aware Low-Rank Adaptation: Mitigating 
Catastrophic Inheritance of Large Language Models**|Yupeng Chang et.al.|[2408.04556](http://arxiv.org/abs/2408.04556)|**[link](https://github.com/cyp-jlu-ai/ba-lora)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u5404\u79cd\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u4ee4\u4eba\u77a9\u76ee\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u5728\u5c06\u8fd9\u4e9b\u6a21\u578b\u5e94\u7528\u4e8e\u4e0b\u6e38\u5e94\u7528\u65f6\uff0c\u901a\u5e38\u9700\u8981\u8fdb\u884c\u8ba1\u7b97\u5bc6\u96c6\u578b\u548c\u5185\u5b58\u6d88\u8017\u5927\u7684\u5fae\u8c03\u8fc7\u7a0b\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u53c2\u6570\u9ad8\u6548\u5fae\u8c03\uff08PEFT\uff09\u6280\u672f\u5df2\u7ecf\u4f5c\u4e3a\u4e00\u79cd\u6709\u524d\u666f\u7684\u65b9\u6cd5\u51fa\u73b0\uff0c\u65e8\u5728\u4ee5\u6700\u5c0f\u7684\u8ba1\u7b97\u6210\u672c\u6765\u5b9a\u5236LLM\u3002\u5c3d\u7ba1PEFT\u65b9\u6cd5\u63d0\u4f9b\u4e86\u663e\u8457\u7684\u4f18\u52bf\uff0c\u4f46\u5b83\u4eec\u5e76\u672a\u5b8c\u5168\u89e3\u51b3\u4ece\u9884\u8bad\u7ec3\u6570\u636e\u7ee7\u627f\u504f\u89c1\u7684\u95ee\u9898\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684PEFT\u65b9\u6cd5\u2014\u2014Bias-Aware Low-Rank Adaptation (BA-LoRA)\uff0c\u65e8\u5728\u5bf9\u6297\u504f\u89c1\u7ee7\u627f\u3002 
BA-LoRA\u6574\u5408\u4e86\u4e09\u4e2a\u4e0d\u540c\u7684\u6b63\u5219\u5316\u9879\uff1a\u4e00\u81f4\u6027\u6b63\u5219\u5316\u5668\u3001\u591a\u6837\u6027\u6b63\u5219\u5316\u5668\u4ee5\u53ca\u5947\u5f02\u503c\u5206\u89e3\u6b63\u5219\u5316\u5668\u3002\u8fd9\u4e09\u4e2a\u6b63\u5219\u5316\u5668\u5171\u540c\u65e8\u5728\u63d0\u9ad8\u751f\u6210\u6a21\u578b\u5728\u5fae\u8c03\u8fc7\u7a0b\u4e2d\u7684\u4e00\u81f4\u6027\u3001\u591a\u6837\u6027\u548c\u6cdb\u5316\u80fd\u529b\u3002\u901a\u8fc7\u5728\u591a\u79cd\u81ea\u7136\u8bed\u8a00\u7406\u89e3\uff08NLU\uff09\u548c\u81ea\u7136\u8bed\u8a00\u751f\u6210\uff08NLG\uff09\u4efb\u52a1\u4e0a\u7684\u5e7f\u6cdb\u5b9e\u9a8c\uff0c\u5e76\u4f7f\u7528\u5982LLaMA\u3001Mistral\u548cGemma\u7b49\u4e3b\u6d41LLM\uff0c\u6211\u4eec\u5c55\u793a\u4e86BA-LoRA\u5728\u6027\u80fd\u4e0a\u8d85\u8d8a\u4e86LoRA\u53ca\u5176\u6700\u5148\u8fdb\u7684\u53d8\u4f53\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u6709\u6548\u5730\u51cf\u8f7b\u4e86\u9884\u8bad\u7ec3\u504f\u89c1\u7684\u8d1f\u9762\u5f71\u54cd\uff0c\u5bfc\u81f4\u66f4\u53ef\u9760\u4e14\u7a33\u5065\u7684\u6a21\u578b\u8f93\u51fa\u3002\u76f8\u5173\u4ee3\u7801\u5df2\u5f00\u6e90\u5728https://github.com/cyp-jlu-ai/BA-LoRA\u3002**|\n", "2408.04522": "|**2024-08-08**|**Compromesso! 
Italian Many-Shot Jailbreaks Undermine the Safety of Large Language Models**|Fabio Pernisi et.al.|[2408.04522](http://arxiv.org/abs/2408.04522)|null|\u968f\u7740\u4e0d\u540c\u8bed\u8a00\u7684\u591a\u5143\u8bed\u8a00\u793e\u533a\u548c\u7528\u6237\u91c7\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\uff0c\u8bc4\u4f30\u8fd9\u4e9b\u6a21\u578b\u5728\u4e0d\u540c\u8bed\u8a00\u73af\u5883\u4e0b\u7684\u5b89\u5168\u6027\u53d8\u5f97\u81f3\u5173\u91cd\u8981\u3002\u5c3d\u7ba1\u5df2\u7ecf\u8fdb\u884c\u4e86\u6301\u7eed\u7684\u52aa\u529b\u4ee5\u786e\u4fddLLM\u7684\u5b89\u5168\u6027\uff0c\u4f46\u5b83\u4eec\u4ecd\u7136\u53ef\u4ee5\u901a\u8fc7\u201c\u8d8a\u72f1\u201d\u6280\u672f\u6765\u8868\u73b0\u5f97\u4e0d\u5b89\u5168\uff0c\u8fd9\u662f\u4e00\u79cd\u4fc3\u4f7f\u6a21\u578b\u5728\u5176\u64cd\u4f5c\u51c6\u5219\u4e4b\u5916\u884c\u52a8\u7684\u6280\u672f\u3002\u5bf9\u4e8eLLM\u5b89\u5168\u6027\u4ee5\u53ca\u201c\u8d8a\u72f1\u201d\u7684\u7814\u7a76\u76ee\u524d\u4e3b\u8981\u96c6\u4e2d\u5728\u82f1\u8bed\u4e0a\uff0c\u8fd9\u9650\u5236\u4e86\u6211\u4eec\u5bf9\u5176\u4ed6\u8bed\u8a00\u4e2dLLM\u5b89\u5168\u6027\u7684\u7406\u89e3\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u6211\u4eec\u901a\u8fc7\u5728\u610f\u5927\u5229\u8bed\u4e2d\u7814\u7a76\u591a\u8f6e\u201c\u8d8a\u72f1\u201d\u7684\u6709\u6548\u6027\uff0c\u5373\u4f7f\u7528\u4e0d\u5b89\u5168\u793a\u4f8b\u6765\u8bf1\u5bfc\u4e0d\u5b89\u5168\u884c\u4e3a\uff0c\u6765\u8d21\u732e\u4e8e\u8fd9\u4e00\u9886\u57df\u3002\u4e3a\u4e86\u652f\u6301\u6211\u4eec\u7684\u5206\u6790\uff0c\u6211\u4eec\u521b\u5efa\u4e86\u4e00\u4e2a\u65b0\u7684\u610f\u5927\u5229\u8bed\u95ee\u9898-\u7b54\u6848\u4e0d\u5b89\u5168\u6570\u636e\u96c6\u3002\u5229\u7528\u8fd9\u4e2a\u6570\u636e\u96c6\uff0c\u6211\u4eec\u5728\u56db\u4e2a\u5f00\u653e\u6743\u91cdLLM\u5bb6\u65cf\u4e2d\u8bc6\u522b\u51fa\u4e86\u660e\u663e\u7684\u5b89\u5168\u6f0f\u6d1e\u3002\u6211\u4eec\u53d1\u73b0\uff0c\u5373\u4f7f\u5728\u4f7f\u7528\u5c11\u91cf\u4e0d\u5b89\u5168\u793a\u4f8b\u7684\u60c5\u51b5\u4e0
b\uff0c\u6a21\u578b\u4e5f\u4f1a\u8868\u73b0\u51fa\u4e0d\u5b89\u5168\u7684\u884c\u4e3a\uff0c\u5e76\u4e14\u66f4\u4ee4\u4eba\u62c5\u5fe7\u7684\u662f\uff0c\u968f\u7740\u66f4\u591a\u793a\u4f8b\u7684\u51fa\u73b0\uff0c\u8fd9\u79cd\u8d8b\u52bf\u8fc5\u901f\u52a0\u5267\u3002|\n", "2408.04477": "|**2024-08-08**|**What You Need is What You Get: Theory of Mind for an LLM-Based Code Understanding Assistant**|Jonan Richards et.al.|[2408.04477](http://arxiv.org/abs/2408.04477)|null|\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7528\u4e8e\u8f85\u52a9\u5f00\u53d1\u8005\u7406\u89e3\u4ee3\u7801\u7684\u5de5\u5177\u6570\u91cf\u4e0d\u65ad\u589e\u52a0\u7684\u540c\u65f6\uff0c\u5f00\u53d1\u8005\u5728\u4f7f\u7528\u8fd9\u4e9b\u5de5\u5177\u65f6\u4ecd\u9762\u4e34\u4e00\u4e9b\u969c\u788d\uff0c\u5305\u62ec\u7528\u81ea\u7136\u8bed\u8a00\u63cf\u8ff0\u5176\u610f\u56fe\u7684\u6311\u6218\u3001\u89e3\u8bfb\u5de5\u5177\u7ed3\u679c\u7684\u56f0\u96be\uff0c\u4ee5\u53ca\u8c03\u6574\u6709\u6548\u63d0\u793a\u4ee5\u83b7\u5f97\u6709\u7528\u4fe1\u606f\u7684\u8fc7\u7a0b\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u57fa\u4e8eLLM\u7684\u5bf9\u8bdd\u52a9\u624b\uff0c\u8be5\u52a9\u624b\u6839\u636e\u63a8\u65ad\u51fa\u7684\u7528\u6237\u5fc3\u7406\u72b6\u6001\uff08\u5982\u80cc\u666f\u77e5\u8bc6\u548c\u7ecf\u9a8c\uff09\u63d0\u4f9b\u4e2a\u6027\u5316\u4e92\u52a8\u3002\u901a\u8fc7\u9488\u5bf9\u5341\u56db\u4f4d\u65b0\u624b\u8fdb\u884c\u7684\u5185\u90e8\u4e3b\u9898\u7814\u7a76\uff0c\u6211\u4eec\u6355\u6349\u4e86\u4ed6\u4eec\u7684\u611f\u77e5\u548c\u504f\u597d\u3002\u7814\u7a76\u7ed3\u679c\u4e3a\u5e0c\u671b\u521b\u5efa\u6216\u6539\u8fdb\u9762\u5411\u65b0\u624b\u7684LLM\u4e3a\u57fa\u7840\u7684\u5bf9\u8bdd\u52a9\u624b\u4ee5\u652f\u6301\u4ee3\u7801\u7406\u89e3\u7684\u7814\u7a76\u4eba\u5458\u548c\u5de5\u5177\u5f00\u53d1\u8005\u63d0\u4f9b\u4e86\u89c1\u89e3\u3002|\n", "2408.04472": "|**2024-08-08**|**Can LLMs Beat Humans in Debating? 
A Dynamic Multi-agent Framework for Competitive Debate**|Yiqun Zhang et.al.|[2408.04472](http://arxiv.org/abs/2408.04472)|**[link](https://github.com/zhangyiqun018/agent-for-debate)**|**\u5728\u7ade\u4e89\u6027\u8fa9\u8bba\u8fd9\u4e00\u5168\u9762\u4e14\u590d\u6742\u7684\u8ba1\u7b97\u8bba\u8fa9\u4efb\u52a1\u4e2d\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u9762\u4e34\u7740\u5e7b\u89c9\u548c\u7ade\u4e89\u529b\u4e0d\u8db3\u7684\u95ee\u9898\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201c\u8fa9\u8bba\u8005\u201d\uff08Agent4Debate\uff09\u7684\u52a8\u6001\u3001\u591a\u4ee3\u7406\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u57fa\u4e8eLLMs\u8bbe\u8ba1\uff0c\u65e8\u5728\u589e\u5f3a\u5176\u5728\u7ade\u4e89\u6027\u8fa9\u8bba\u4e2d\u7684\u80fd\u529b\u3002\u8be5\u6846\u67b6\u53d7\u5230\u4eba\u7c7b\u5728\u8fa9\u8bba\u51c6\u5907\u4e0e\u6267\u884c\u8fc7\u7a0b\u4e2d\u884c\u4e3a\u7684\u542f\u53d1\uff0c\u91c7\u7528\u534f\u4f5c\u67b6\u6784\uff0c\u7531\u56db\u4e2a\u4e13\u95e8\u7684\u4ee3\u7406\uff08\u641c\u7d22\u8005\u3001\u5206\u6790\u8005\u3001\u64b0\u5199\u8005\u548c\u5ba1\u9605\u8005\uff09\u52a8\u6001\u4ea4\u4e92\u5e76\u5408\u4f5c\u3002\u8fd9\u56db\u4e2a\u4ee3\u7406\u5728\u6574\u4e2a\u8fa9\u8bba\u8fc7\u7a0b\u4e2d\u8986\u76d6\u4e86\u4ece\u521d\u59cb\u7814\u7a76\u5230\u8bba\u70b9\u5f62\u6210\u3001\u53cd\u9a73\u548c\u603b\u7ed3\u7684\u591a\u4e2a\u9636\u6bb5\u3002 
\u4e3a\u4e86\u5168\u9762\u8bc4\u4f30\u6846\u67b6\u7684\u6027\u80fd\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u540d\u4e3a\u201c\u4e2d\u56fd\u8fa9\u8bba\u7ade\u6280\u573a\u201d\u7684\u6570\u636e\u5e93\uff0c\u5305\u542b\u4e8666\u4e2a\u7cbe\u5fc3\u6311\u9009\u7684\u4e2d\u6587\u8fa9\u8bba\u8bae\u9898\u3002\u6211\u4eec\u62db\u52df\u4e86\u5341\u4f4d\u7ecf\u9a8c\u4e30\u5bcc\u7684\u4e13\u4e1a\u8fa9\u8bba\u8005\uff0c\u5e76\u6536\u96c6\u4e86\u6d89\u53caAgent4Debate\u3001\u57fa\u7ebf\u6a21\u578b\u548c\u4eba\u7c7b\u7684200\u573a\u8fa9\u8bba\u8bb0\u5f55\u3002\u8bc4\u4ef7\u4f53\u7cfb\u91c7\u7528\u4e86\u81ea\u52a8\u8bc4\u5206\u7cfb\u7edfDebatrix\u4ee5\u53ca\u57fa\u4e8eDebatrix-Elo\u548cHuman-Elo\u6392\u540d\u7684\u4e13\u4e1a\u8bc4\u5ba1\u56e2\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u6700\u5148\u8fdb\u7684Agent4Debate\u5728\u80fd\u529b\u4e0a\u4e0e\u4eba\u7c7b\u76f8\u5f53\u3002\u8fdb\u4e00\u6b65\u7684\u6d88\u878d\u7814\u7a76\u8868\u660e\uff0c\u4ee3\u7406\u7ed3\u6784\u4e2d\u7684\u6bcf\u4e2a\u7ec4\u4ef6\u7684\u6709\u6548\u6027\u3002**|\n", "2408.04449": "|**2024-08-08**|**RiskAwareBench: Towards Evaluating Physical Risk Awareness for High-level Planning of LLM-based Embodied Agents**|Zihao Zhu 
et.al.|[2408.04449](http://arxiv.org/abs/2408.04449)|null|\u6458\u8981\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aRiskAwareBench\u7684\u81ea\u52a8\u5316\u6846\u67b6\uff0c\u65e8\u5728\u8bc4\u4f30\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u5b9e\u4f53\u5316\u4ee3\u7406\u5728\u7269\u7406\u98ce\u9669\u610f\u8bc6\u65b9\u9762\u7684\u80fd\u529b\u3002\u8be5\u6846\u67b6\u7531\u56db\u4e2a\u6a21\u5757\u7ec4\u6210\uff1a\u5b89\u5168\u63d0\u793a\u751f\u6210\u3001\u5371\u9669\u573a\u666f\u751f\u6210\u3001\u8ba1\u5212\u751f\u6210\u548c\u8bc4\u4f30\uff0c\u5b83\u5141\u8bb8\u8fdb\u884c\u5168\u9762\u7684\u98ce\u9669\u8bc4\u4f30\uff0c\u4e14\u6240\u9700\u7684\u4eba\u5de5\u5e72\u9884\u6700\u5c11\u3002\u901a\u8fc7\u4f7f\u7528\u8fd9\u4e2a\u6846\u67b6\uff0c\u6784\u5efa\u4e86\u4e00\u4e2a\u540d\u4e3aPhysicalRisk\u7684\u6570\u636e\u96c6\uff0c\u6db5\u76d6\u4e86\u5404\u79cd\u6d89\u53ca\u76f8\u5173\u5b89\u5168\u63d0\u793a\u3001\u89c2\u5bdf\u548c\u6307\u4ee4\u7684\u573a\u666f\u3002 \u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u5927\u591a\u6570LLM\u5728\u7269\u7406\u98ce\u9669\u610f\u8bc6\u65b9\u9762\u8868\u73b0\u4e0d\u8db3\uff0c\u5e76\u4e14\u57fa\u7840\u7684\u98ce\u9669\u7f13\u89e3\u7b56\u7565\u5e26\u6765\u7684\u63d0\u5347\u6709\u9650\u3002\u8fd9\u5f3a\u8c03\u4e86\u5728\u672a\u6765\u6539\u8fdb\u57fa\u4e8eLLM\u7684\u5b9e\u4f53\u5316\u4ee3\u7406\u7684\u7269\u7406\u98ce\u9669\u610f\u8bc6\u7684\u7d27\u8feb\u6027\u548c\u91cd\u8981\u6027\u3002|\n", "2408.05212": "|**2024-08-10**|**Preserving Privacy in Large Language Models: A Survey on Current Threats and Solutions**|Michele Miranda 
et.al.|[2408.05212](http://arxiv.org/abs/2408.05212)|**[link](https://github.com/michele17284/awesome-privacy-preserving-llms)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u4eba\u5de5\u667a\u80fd\u9886\u57df\u53d6\u5f97\u4e86\u91cd\u5927\u8fdb\u6b65\uff0c\u5e76\u5728\u591a\u4e2a\u9886\u57df\u627e\u5230\u4e86\u5e94\u7528\u3002\u7136\u800c\uff0c\u5b83\u4eec\u4f9d\u8d56\u4e8e\u5e9e\u5927\u7684\u4e92\u8054\u7f51\u6765\u6e90\u6570\u636e\u96c6\u8fdb\u884c\u8bad\u7ec3\uff0c\u8fd9\u5e26\u6765\u4e86\u663e\u8457\u7684\u9690\u79c1\u95ee\u9898\uff0c\u5c24\u5176\u662f\u5728\u5173\u952e\u9886\u57df\uff08\u5982\u533b\u7597\u4fdd\u5065\uff09\u7684\u60c5\u51b5\u4e0b\u4f1a\u52a0\u5267\u8fd9\u4e9b\u95ee\u9898\u3002\u6b64\u5916\uff0c\u5728\u7279\u5b9a\u5e94\u7528\u573a\u666f\u4e0b\uff0c\u53ef\u80fd\u9700\u8981\u5bf9\u8fd9\u4e9b\u6a21\u578b\u8fdb\u884c\u9488\u5bf9\u79c1\u6709\u6570\u636e\u7684\u5fae\u8c03\u3002\u672c\u6587\u5bf9LLM\u7684\u9690\u79c1\u5a01\u80c1\u8fdb\u884c\u4e86\u6279\u5224\u6027\u8bc4\u4f30\uff0c\u5f3a\u8c03\u4e86\u8fd9\u4e9b\u6a21\u578b\u53ef\u80fd\u8bb0\u4f4f\u5e76\u65e0\u610f\u95f4\u6cc4\u9732\u654f\u611f\u4fe1\u606f\u7684\u98ce\u9669\u3002 
\u6211\u4eec\u901a\u8fc7\u56de\u987e\u9488\u5bf9LLM\u7684\u9690\u79c1\u653b\u51fb\u6765\u63a2\u8ba8\u5f53\u524d\u7684\u5a01\u80c1\uff0c\u5e76\u63d0\u51fa\u5168\u9762\u7684\u89e3\u51b3\u65b9\u6848\uff0c\u4ee5\u5728\u6574\u4e2a\u5b66\u4e60\u7ba1\u9053\u4e2d\u6574\u5408\u9690\u79c1\u673a\u5236\u3002\u8fd9\u4e9b\u89e3\u51b3\u65b9\u6848\u6db5\u76d6\u4e86\u4ece\u533f\u540d\u5316\u8bad\u7ec3\u6570\u636e\u5230\u5728\u8bad\u7ec3\u6216\u63a8\u7406\u8fc7\u7a0b\u4e2d\u5b9e\u65bd\u5dee\u5206\u9690\u79c1\uff0c\u4ee5\u53ca\u5728\u8bad\u7ec3\u540e\u6267\u884c\u673a\u5668\u9057\u5fd8\u7684\u8303\u56f4\u3002\u6211\u4eec\u7684\u6587\u732e\u7efc\u8ff0\u6df1\u5165\u7814\u7a76\u4e86\u73b0\u6709\u7814\u7a76\u4e2d\u7684\u6301\u7eed\u6311\u6218\u3001\u53ef\u7528\u5de5\u5177\u548c\u672a\u6765\u65b9\u5411\uff0c\u4ee5\u4fdd\u62a4LLM\u4e2d\u7684\u9690\u79c1\u3002\u8fd9\u9879\u5de5\u4f5c\u65e8\u5728\u901a\u8fc7\u63d0\u4f9b\u5bf9\u9690\u79c1\u4fdd\u5b58\u65b9\u6cd5\u53ca\u5176\u5728\u51cf\u8f7b\u98ce\u9669\u65b9\u9762\u7684\u6709\u6548\u6027\u7684\u5168\u9762\u7406\u89e3\uff0c\u6307\u5bfc\u5f00\u53d1\u66f4\u5b89\u5168\u3001\u66f4\u53ef\u4fe1\u7684AI\u7cfb\u7edf\u3002|\n", "2408.05211": "|**2024-08-09**|**VITA: Towards Open-Source Interactive Omni Multimodal LLM**|Chaoyou Fu et.al.|[2408.05211](http://arxiv.org/abs/2408.05211)|**[link](https://github.com/VITA-MLLM/VITA)**|\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u5f15\u5165\u4e86VITA\uff0c\u8fd9\u662f\u9996\u4e2a\u5f00\u6e90\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff0c\u80fd\u591f\u540c\u65f6\u5904\u7406\u548c\u5206\u6790\u89c6\u9891\u3001\u56fe\u50cf\u3001\u6587\u672c\u548c\u97f3\u9891\u7b49\u591a\u5143\u6a21\u6001\u4fe1\u606f\uff0c\u5e76\u4e14\u5177\u5907\u9ad8\u7ea7\u7684\u591a\u6a21\u6001\u4ea4\u4e92\u4f53\u9a8c\u3002\u4eceMixtral 
8x7B\u4f5c\u4e3a\u8bed\u8a00\u57fa\u7840\u51fa\u53d1\uff0c\u6211\u4eec\u6269\u5c55\u4e86\u5176\u5728\u4e2d\u6587\u65b9\u9762\u7684\u8bcd\u6c47\uff0c\u5e76\u901a\u8fc7\u53cc\u8bed\u6307\u4ee4\u5fae\u8c03\u8fdb\u4e00\u6b65\u63d0\u5347\u4e86\u6a21\u578b\u80fd\u529b\u3002\u6211\u4eec\u8fd8\u901a\u8fc7\u4e24\u9636\u6bb5\u591a\u4efb\u52a1\u5b66\u4e60\u7684\u65b9\u5f0f\uff0c\u4e3a\u8bed\u8a00\u6a21\u578b\u8d4b\u4e88\u4e86\u89c6\u89c9\u548c\u97f3\u9891\u5904\u7406\u7684\u80fd\u529b\u3002 VITA\u5c55\u73b0\u4e86\u5f3a\u5927\u7684\u591a\u8bed\u8a00\u3001\u89c6\u89c9\u548c\u97f3\u9891\u7406\u89e3\u7684\u57fa\u7840\u80fd\u529b\uff0c\u5e76\u5728\u4e00\u7cfb\u5217\u5355\u6a21\u6001\u4e0e\u591a\u6a21\u6001\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u8868\u73b0\u51fa\u8272\u3002\u9664\u4e86\u57fa\u7840\u80fd\u529b\u5916\uff0c\u6211\u4eec\u5728\u63d0\u5347\u81ea\u7136\u591a\u6a21\u6001\u4eba\u673a\u4ea4\u4e92\u4f53\u9a8c\u65b9\u9762\u4e5f\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u5c55\u3002\u636e\u6211\u4eec\u6240\u77e5\uff0c\u8fd9\u662f\u9996\u6b21\u5728\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4e2d\u5229\u7528\u975e\u5524\u9192\u4ea4\u4e92\u548c\u97f3\u9891\u4e2d\u65ad\u529f\u80fd\u3002 VITA\u662f\u5f00\u6e90\u793e\u533a\u63a2\u7d22\u65e0\u7f1d\u878d\u5408\u591a\u6a21\u6001\u7406\u89e3\u548c\u4ea4\u4e92\u7684\u7b2c\u4e00\u6b65\u3002\u5c3d\u7ba1VITA\u4e0e\u4e13\u6709\u6a21\u578b\u8fd8\u6709\u8f83\u5927\u5dee\u8ddd\uff0c\u4f46\u6211\u4eec\u76f8\u4fe1\u5b83\u4f5c\u4e3a\u5148\u950b\u89d2\u8272\u53ef\u4ee5\u6210\u4e3a\u540e\u7eed\u7814\u7a76\u7684\u91cd\u8981\u57fa\u77f3\u3002\u9879\u76ee\u9875\u9762\uff1ahttps://vita-home.github.io|\n", "2408.05204": "|**2024-08-09**|**Evaluating the capability of large language models to personalize science texts for diverse middle-school-age learners**|Michael Vaccaro Jr 
et.al.|[2408.05204](http://arxiv.org/abs/2408.05204)|null|\u8fd1\u671f\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\uff0c\u5c24\u5176\u662fOpenAI\u7684GPT\u7cfb\u5217\uff0c\u5728\u591a\u4e2a\u9886\u57df\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\u3002\u8fd9\u4e9b\u6a21\u578b\u56e0\u5176\u5728\u4e0d\u540c\u5b66\u79d1\u9886\u57df\u7684\u4e13\u4e1a\u77e5\u8bc6\u4ee5\u53ca\u5bf9\u7528\u6237\u63d0\u793a\u7684\u5feb\u901f\u9002\u5e94\u6027\u800c\u53d7\u5230\u5173\u6ce8\uff0c\u5e76\u4e14\u5c55\u73b0\u51fa\u4f5c\u4e3a\u4e2a\u6027\u5316\u5b66\u4e60\uff08PL\uff09\u5de5\u5177\u7684\u72ec\u7279\u6f5c\u529b\u3002\u7136\u800c\uff0c\u5b83\u4eec\u5728K-12\u6559\u80b2\u4e2d\u7684\u5e94\u7528\u4ecd\u5904\u4e8e\u63a2\u7d22\u9636\u6bb5\u3002 \u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u9879\u9996\u6b21\u91c7\u7528\u968f\u673a\u5bf9\u7167\u8bd5\u9a8c\u65b9\u6cd5\uff08\u6837\u672c\u91cf\u4e3a23\uff09\u6765\u8bc4\u4f30GPT-4\u5728\u4e2d\u5b66\u79d1\u5b66\u6587\u672c\u4e2a\u6027\u5316\u65b9\u9762\u7684\u6709\u6548\u6027\u7684\u7814\u7a76\u3002\u5728\u8be5\u7814\u7a76\u4e2d\uff0cGPT-4\u7528\u4e8e\u6839\u636e\u5b66\u751f\u5728\u8bad\u7ec3\u9636\u6bb5\u505a\u51fa\u7684\u9009\u62e9\u6765\u5206\u6790\u548c\u9884\u6d4b\u4ed6\u4eec\u7684\u5b66\u4e60\u504f\u597d\u3002\u5bf9\u4e8e\u5b9e\u9a8c\u7ec4\u7684\u5b66\u751f\uff0cGPT-4\u88ab\u7528\u6765\u4fee\u6539\u79d1\u5b66\u6587\u672c\u4ee5\u4e0e\u5b66\u751f\u7684\u9884\u6d4b\u504f\u597d\u76f8\u5339\u914d\uff1b\u800c\u5bf9\u4e8e\u63a7\u5236\u7ec4\u7684\u5b66\u751f\uff0c\u6587\u672c\u5219\u88ab\u4fee\u6539\u4e3a\u4e0e\u5176\u5b66\u4e60\u504f\u597d\u76f8\u53cd\u3002\u901a\u8fc7\u66fc-\u60e0\u7279\u5c3cU\u68c0\u9a8c\uff0c\u7814\u7a76\u53d1\u73b0\uff0c\u5f53\u6587\u672c\u4e0e\u5b66\u751f\u504f\u597d\u5339\u914d\u65f6\uff0c\u5b66\u751f\u660e\u663e\u66f4\u503e\u5411\u4e8e\u63a5\u53d7\uff08\u57280.10\u6c34\u5e73\u4e0a\u5177\u6709\u7edf\u8ba1\u5b66\u610f\u4e49\uff0cp=0.059\uff09\u3002\u8fd9\u4e9b\u7ed3\u679c\u8868\u660e\uff0cGPT-4\u80fd\u591f\u6709\u6548\u5
730\u7406\u89e3\u548c\u5b9a\u5236\u6559\u80b2\u5185\u5bb9\u4ee5\u6ee1\u8db3\u4e0d\u540c\u5b66\u4e60\u8005\u7684\u504f\u597d\uff0c\u6807\u5fd7\u7740\u4e2a\u6027\u5316\u5b66\u4e60\u6280\u672f\u9886\u57df\u7684\u4e00\u4e2a\u91cd\u8981\u8fdb\u5c55\u3002 \u6b64\u5916\uff0c\u6587\u7ae0\u8fd8\u8ba8\u8bba\u4e86\u8fd9\u9879\u7814\u7a76\u7684\u5c40\u9650\u6027\u548c\u5728\u6559\u80b2\u4e2d\u4f7f\u7528\u4eba\u5de5\u667a\u80fd\u7684\u4f26\u7406\u8003\u8651\u3002|\n", "2408.05200": "|**2024-08-09**|**TaSL: Task Skill Localization and Consolidation for Language Model Continual Learning**|Yujie Feng et.al.|[2408.05200](http://arxiv.org/abs/2408.05200)|**[link](https://github.com/WoodScene/TaSL)**|\u8bed\u8a00\u6a21\u578b\u8fde\u7eed\u5b66\u4e60\uff08CL\uff09\u6700\u8fd1\u5f15\u8d77\u4e86\u5e7f\u6cdb\u5173\u6ce8\uff0c\u56e0\u4e3a\u5b83\u6709\u53ef\u80fd\u5728\u65e0\u9700\u91cd\u65b0\u8bad\u7ec3\u7684\u60c5\u51b5\u4e0b\uff0c\u9002\u5e94\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u52a8\u6001\u73b0\u5b9e\u73af\u5883\u3002\u4e00\u4e2a\u5173\u952e\u6311\u6218\u662f\u707e\u96be\u6027\u9057\u5fd8\uff0c\u5373\u6a21\u578b\u5728\u5b66\u4e60\u65b0\u4efb\u52a1\u65f6\u4f1a\u5931\u53bb\u5148\u524d\u83b7\u5f97\u7684\u77e5\u8bc6\u3002\u73b0\u6709\u65b9\u6cd5\u901a\u5e38\u4f7f\u7528\u591a\u4e2a\u53c2\u6570\u6548\u7387\u5fae\u8c03\uff08PEFT\uff09\u5757\u6765\u4e3a\u6bcf\u4e2a\u4efb\u52a1\u83b7\u53d6\u7279\u5b9a\u4e8e\u4efb\u52a1\u7684\u77e5\u8bc6\uff0c\u4f46\u8fd9\u4e9b\u65b9\u6cd5\u7f3a\u4e4f\u6548\u7387\uff0c\u5e76\u4e14\u5ffd\u89c6\u4e86\u901a\u8fc7\u4efb\u52a1\u4ea4\u4e92\u8fdb\u884c\u77e5\u8bc6\u4f20\u9012\u7684\u53ef\u80fd\u6027\u3002 
\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u4efb\u52a1\u6280\u80fd\u5b9a\u4f4d\u4e0e\u6574\u5408\uff08TaSL\uff09\u7684\u65b0CL\u6846\u67b6\uff0c\u5b83\u901a\u8fc7\u4e0d\u4f9d\u8d56\u4e8e\u8bb0\u5fc6\u91cd\u64ad\u6765\u589e\u5f3a\u77e5\u8bc6\u4f20\u9012\u3002TaSL\u9996\u5148\u6839\u636e\u53c2\u6570\u4f9d\u8d56\u6027\u5c06\u6a21\u578b\u5206\u4e3a\u201c\u6280\u80fd\u5355\u5143\u201d\uff0c\u8fd9\u4f7f\u5f97\u5bf9\u6280\u80fd\u5355\u5143\u7684\u63a7\u5236\u66f4\u52a0\u7cbe\u7ec6\u3002\u7136\u540e\uff0c\u5b83\u91c7\u7528\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u7ec4\u7ea7\u6280\u80fd\u5b9a\u4f4d\u6280\u672f\uff0c\u4ee5\u8bc6\u522b\u65b0\u4efb\u52a1\u4e2d\u6280\u80fd\u5355\u5143\u7684\u91cd\u8981\u6027\u5206\u5e03\u3002\u901a\u8fc7\u6bd4\u8f83\u8fd9\u4e2a\u91cd\u8981\u6027\u5206\u5e03\u4e0e\u5176\u4ed6\u5148\u524d\u4efb\u52a1\u4e2d\u7684\u5206\u5e03\uff0c\u6211\u4eec\u5b9e\u65bd\u4e86\u4e00\u4e2a\u7cbe\u7ec6\u7684\u6280\u80fd\u6574\u5408\u7b56\u7565\uff0c\u4fdd\u7559\u4e86\u7279\u5b9a\u4e8e\u4efb\u52a1\u7684\u77e5\u8bc6\uff0c\u4ece\u800c\u9632\u6b62\u9057\u5fd8\uff0c\u5e76\u66f4\u65b0\u4e86\u5171\u4eab\u4efb\u52a1\u77e5\u8bc6\uff0c\u8fd9\u4fc3\u8fdb\u4e86\u53cc\u5411\u77e5\u8bc6\u4f20\u9012\u3002\u56e0\u6b64\uff0cTaSL\u5b9e\u73b0\u4e86\u4fdd\u6301\u5148\u524d\u77e5\u8bc6\u548c\u5728\u65b0\u4efb\u52a1\u4e0a\u53d6\u5f97\u4f18\u5f02\u8868\u73b0\u4e4b\u95f4\u7684\u6700\u4f73\u5e73\u8861\u3002 
TaSL\u4e5f\u5c55\u793a\u4e86\u5f3a\u5927\u7684\u6cdb\u5316\u80fd\u529b\uff0c\u9002\u7528\u4e8e\u901a\u7528\u6a21\u578b\uff0c\u5e76\u53ef\u4ee5\u6839\u636eLoRA\u7b49PEFT\u65b9\u6cd5\u8fdb\u884c\u5b9a\u5236\u3002\u6b64\u5916\uff0c\u5b83\u8fd8\u8868\u73b0\u51fa\u663e\u8457\u7684\u6269\u5c55\u6027\uff0c\u5141\u8bb8\u4e0e\u8bb0\u5fc6\u91cd\u64ad\u96c6\u6210\u4ee5\u8fdb\u4e00\u6b65\u63d0\u9ad8\u6027\u80fd\u3002\u5728\u4e24\u4e2aCL\u57fa\u51c6\u6d4b\u8bd5\u4e2d\uff0c\u4f7f\u7528\u4e0d\u540c\u89c4\u6a21\u7684\u6a21\u578b\uff08\u4ece2.2\u4ebf\u523070\u4ebf\u53c2\u6570\uff09\uff0c\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u8bc1\u660e\u4e86TaSL\u53ca\u5176\u53d8\u4f53\u5728\u4e0d\u540c\u8bbe\u7f6e\u4e0b\u7684\u6709\u6548\u6027\u3002|\n", "2408.05149": "|**2024-08-09**|**AttackER: Towards Enhancing Cyber-Attack Attribution with a Named Entity Recognition Dataset**|Pritam Deka et.al.|[2408.05149](http://arxiv.org/abs/2408.05149)|null|\u5728\u7f51\u7edc\u5b89\u5168\u9886\u57df\uff0c\u653b\u51fb\u5f52\u56e0\u662f\u81f3\u5173\u91cd\u8981\u7684\u8fc7\u7a0b\uff0c\u5b83\u5141\u8bb8\u4e13\u5bb6\u5236\u5b9a\u9488\u5bf9\u653b\u51fb\u8005\u7684\u9632\u5fa1\u63aa\u65bd\u548c\u6cd5\u5f8b\u884c\u52a8\u3002\u76ee\u524d\uff0c\u5206\u6790\u4eba\u5458\u4e3b\u8981\u901a\u8fc7\u624b\u52a8\u64cd\u4f5c\u6765\u8fdb\u884c\u5f52\u56e0\uff0c\u8fd9\u4e3b\u8981\u662f\u7531\u4e8e\u4efb\u52a1\u7684\u590d\u6742\u6027\u3002\u4eba\u5de5\u667a\u80fd\uff0c\u5c24\u5176\u662f\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u6280\u672f\u53ef\u4ee5\u88ab\u7528\u6765\u8f85\u52a9\u7f51\u7edc\u5b89\u5168\u5206\u6790\u5e08\u5728\u5f52\u56e0\u8fc7\u7a0b\u4e2d\u8fdb\u884c\u5de5\u4f5c\u3002\u5c3d\u7ba1\u8fd9\u4e9b\u6280\u672f\u975e\u5e38\u5f3a\u5927\uff0c\u4f46\u5728\u7f3a\u4e4f\u653b\u51fb\u5f52\u56e0\u9886\u57df\u7684\u6570\u636e\u96c6\u7684\u60c5\u51b5\u4e0b\uff0c\u5b83\u4eec\u9700\u8981\u5e94\u5bf9\u6311\u6218\u3002\u5728\u672c\u6587\u4e2d\uff0c\u6211\u4eec\u5c06\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u5e76\u63d0\u4f9b\u5230
\u76ee\u524d\u4e3a\u6b62\u6211\u4eec\u6240\u77e5\u7684\u7b2c\u4e00\u4e2a\u653b\u51fb\u5f52\u56e0\u6570\u636e\u96c6\u3002\u6211\u4eec\u7684\u6570\u636e\u96c6\u8bbe\u8ba1\u7684\u4e3b\u8981\u76ee\u6807\u662f\u4ece\u7f51\u7edc\u5b89\u5168\u6587\u672c\u4e2d\u63d0\u53d6\u653b\u51fb\u5f52\u56e0\u4fe1\u606f\uff0c\u5229\u7528NLP\u9886\u57df\u7684\u547d\u540d\u5b9e\u4f53\u8bc6\u522b\uff08NER\uff09\u65b9\u6cd5\u3002\u4e0e\u5176\u5b83\u7f51\u7edc\u5b89\u5168NER\u6570\u636e\u96c6\u4e0d\u540c\uff0c\u6211\u4eec\u7684\u6570\u636e\u96c6\u63d0\u4f9b\u4e86\u4e30\u5bcc\u4e14\u5305\u542b\u4e0a\u4e0b\u6587\u7ec6\u8282\u7684\u6ce8\u91ca\uff0c\u5305\u62ec\u4e00\u4e9b\u8de8\u77ed\u8bed\u548c\u53e5\u5b50\u7684\u6ce8\u91ca\u3002\u6211\u4eec\u8fdb\u884c\u4e86\u5927\u91cf\u5b9e\u9a8c\uff0c\u5e76\u5e94\u7528\u4e86NLP\u6280\u672f\u6765\u5c55\u793a\u6570\u636e\u96c6\u5728\u653b\u51fb\u5f52\u56e0\u65b9\u9762\u7684\u6709\u6548\u6027\u3002\u8fd9\u4e9b\u5b9e\u9a8c\u7a81\u663e\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u80fd\u529b\u5728\u6539\u8fdb\u7f51\u7edc\u5b89\u5168\u6570\u636e\u96c6\u4e2d\u7684NER\u4efb\u52a1\u4ee5\u63d0\u5347\u653b\u51fb\u5f52\u56e0\u80fd\u529b\u7684\u6f5c\u529b\u3002|\n", "2408.05141": "|**2024-08-09**|**A Hybrid RAG System with Comprehensive Enhancement on Complex Reasoning**|Ye Yuan 
et.al.|[2408.05141](http://arxiv.org/abs/2408.05141)|null|\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u7efc\u5408\u4f18\u5316\u7684\u589e\u5f3a\u68c0\u7d22\u8f85\u52a9\u751f\u6210\uff08RAG\uff09\u7cfb\u7edf\uff0c\u65e8\u5728\u901a\u8fc7\u96c6\u6210\u5916\u90e8\u77e5\u8bc6\u5e93\u663e\u8457\u63d0\u9ad8\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u51c6\u786e\u6027\u548c\u964d\u4f4e\u5e7b\u89c9\u73b0\u8c61\u3002\u6211\u4eec\u7684\u7cfb\u7edf\u8fdb\u884c\u4e86\u591a\u9879\u6539\u8fdb\uff0c\u5305\u62ec\u5bf9\u7f51\u9875\u4e2d\u7684\u6587\u672c\u6bb5\u843d\u548c\u8868\u683c\u8fdb\u884c\u7ec6\u5316\u5904\u7406\u3001\u5f15\u5165\u5c5e\u6027\u9884\u6d4b\u5668\u4ee5\u51cf\u5c11\u5e7b\u89c9\u3001\u6784\u5efaLLM\u77e5\u8bc6\u62bd\u53d6\u5668\u548c\u77e5\u8bc6\u56fe\u8c31\u62bd\u53d6\u5668\uff0c\u5e76\u6700\u7ec8\u5efa\u7acb\u4e86\u4e00\u4e2a\u6574\u5408\u6240\u6709\u53c2\u8003\u4fe1\u606f\u7684\u63a8\u7406\u7b56\u7565\u3002\u6211\u4eec\u901a\u8fc7Meta CRAG KDD\u676f2024\u7ade\u8d5b\u4e2d\u7684CRAG\u6570\u636e\u96c6\u5bf9\u7cfb\u7edf\u8fdb\u884c\u4e86\u8bc4\u4f30\u3002\u672c\u5730\u4e0e\u5728\u7ebf\u8bc4\u4f30\u5747\u8868\u660e\uff0c\u6211\u4eec\u7684\u7cfb\u7edf\u5728\u590d\u6742\u63a8\u7406\u80fd\u529b\u4e0a\u5b9e\u73b0\u4e86\u663e\u8457\u63d0\u5347\u3002\u5728\u672c\u5730\u8bc4\u4f30\u4e2d\uff0c\u76f8\u8f83\u4e8e\u57fa\u7ebf\u6a21\u578b\uff0c\u6211\u4eec\u7684\u7cfb\u7edf\u5728\u51c6\u786e\u6027\u65b9\u9762\u6709\u663e\u8457\u63d0\u5347\uff0c\u9519\u8bef\u7387\u4e5f\u6709\u6240\u4e0b\u964d\uff0c\u53d6\u5f97\u4e86\u8f83\u9ad8\u7684\u5206\u6570\u3002\u540c\u65f6\uff0c\u5728\u7ebf\u8bc4\u4f30\u7ed3\u679c\u540c\u6837\u8868\u73b0\u4f18\u5f02\uff0c\u8bc1\u660e\u4e86\u6240\u63d0\u51fa\u7cfb\u7edf\u7684\u6027\u80fd\u548c\u6cdb\u5316\u80fd\u529b\u3002\u8be5\u7cfb\u7edf\u7684\u6e90\u4ee3\u7801\u5df2\u53d1\u5e03\u4e8ehttps://gitlab.aicrowd.com/shizueyy/crag-new\u3002|\n", "2408.05128": "|**2024-08-09**|**Is ChatGPT a Good 
Software Librarian? An Exploratory Study on the Use of ChatGPT for Software Library Recommendations**|Jasmine Latendresse et.al.|[2408.05128](http://arxiv.org/abs/2408.05128)|null|\u5728\u8f6f\u4ef6\u7cfb\u7edf\u529f\u80fd\u3001\u6548\u7387\u4e0e\u53ef\u7ef4\u62a4\u6027\u65b9\u9762\uff0c\u8f6f\u4ef6\u5e93\u626e\u6f14\u7740\u81f3\u5173\u91cd\u8981\u7684\u89d2\u8272\u3002\u968f\u7740\u5f00\u53d1\u8005\u8d8a\u6765\u8d8a\u591a\u5730\u4f9d\u8d56\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4ee5\u7b80\u5316\u7f16\u7801\u6d41\u7a0b\uff0c\u8fd9\u4e9b\u6a21\u578b\u63a8\u8350\u5408\u9002\u5e93\u7684\u6709\u6548\u6027\u4ecd\u5904\u4e8e\u63a2\u7d22\u9636\u6bb5\u3002\u672c\u6587\u8bc4\u4f30\u4e86ChatGPT\u4f5c\u4e3a\u8f6f\u4ef6\u56fe\u4e66\u9986\u5458\u7684\u6709\u6548\u6027\uff0c\u5e76\u8bc6\u522b\u4e86\u6539\u8fdb\u7a7a\u95f4\u3002\u6211\u4eec\u901a\u8fc7\u4f7f\u7528GPT-3.5 Turbo\u751f\u6210\u9488\u5bf910000\u4e2aStack Overflow\u95ee\u9898\u7684Python\u4ee3\u7801\uff0c\u8fdb\u884c\u4e86\u4e00\u9879\u5b9e\u8bc1\u7814\u7a76\u3002\u6211\u4eec\u7684\u53d1\u73b0\u8868\u660e\uff0cChatGPT\u6bd4\u4eba\u7c7b\u5f00\u53d1\u8005\u66f4\u9891\u7e41\u5730\u4f7f\u7528\u7b2c\u4e09\u65b9\u5e93\uff0c\u503e\u5411\u4e8e\u5e7f\u6cdb\u91c7\u7528\u4e14\u5386\u53f2\u60a0\u4e45\u7684\u9009\u62e9\u3002\u7136\u800c\uff0c14.2%\u63a8\u8350\u7684\u5e93\u5177\u6709\u9650\u5236\u6027\u7684Copyleft\u8bb8\u53ef\uff0c\u8fd9\u5e76\u672a\u7531ChatGPT\u660e\u786e\u4f20\u8fbe\u3002\u6b64\u5916\uff0c\u67096.5%\u7684\u5e93\u65e0\u6cd5\u76f4\u63a5\u4f7f\u7528\uff0c\u53ef\u80fd\u5bfc\u81f4\u5f00\u53d1\u8005\u56f0\u60d1\u548c\u6d6a\u8d39\u65f6\u95f4\u3002\u5c3d\u7ba1ChatGPT\u53ef\u4ee5\u4f5c\u4e3a\u6709\u6548\u7684\u8f6f\u4ef6\u56fe\u4e66\u9986\u5458\uff0c\u4f46\u5e94\u63d0\u4f9b\u5173\u4e8e\u7ef4\u62a4\u6307\u6807\u548c\u8bb8\u53ef\u7684\u66f4\u591a\u660e\u786e\u4fe1\u606f\u3002\u6211\u4eec\u5efa\u8bae\u5f00\u53d1\u8005\u5b9e\u65bd\u4e25\u683c\u7684\u4f9d\u8d56\u7ba1\u7406\u5b9e\u8df5\uff0c\u5e76\u5728\u5c06LLM\u
751f\u6210\u7684\u4ee3\u7801\u96c6\u6210\u5230\u9879\u76ee\u4e2d\u4e4b\u524d\uff0c\u4ed4\u7ec6\u68c0\u67e5\u5e93\u7684\u8bb8\u53ef\u8bc1\u3002|\n", "2408.05126": "|**2024-08-09**|**Large Language Models and Thematic Analysis: Human-AI Synergy in Researching Hate Speech on Social Media**|Petre Breazu et.al.|[2408.05126](http://arxiv.org/abs/2408.05126)|null|\u5728\u4eba\u5de5\u667a\u80fd\u7684\u5feb\u901f\u6f14\u8fdb\u9886\u57df\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u6587\u672c\u5206\u6790\u4e2d\u7684\u53d1\u5c55\u4e0e\u5e94\u7528\u5f15\u8d77\u4e86\u5b66\u672f\u754c\u7684\u5e7f\u6cdb\u5173\u6ce8\u3002\u5c3d\u7ba1\u5404\u79cdLLMs\u5728\u8fdb\u884c\u5b9a\u6027\u5206\u6790\u65f6\u5c55\u73b0\u51fa\u7684\u6f5c\u529b\u88ab\u5bc4\u4e88\u539a\u671b\uff0c\u4f46\u5b83\u4eec\u5728\u4eba\u6587\u5b66\u79d1\u548c\u793e\u4f1a\u79d1\u5b66\u4e2d\u7684\u5e94\u7528\u5e76\u672a\u5f97\u5230\u5145\u5206\u63a2\u8ba8\u3002\u672c\u6587\u901a\u8fc7\u4e00\u9879\u4ee5GPT-4\u4e3a\u6838\u5fc3\u7684\u7814\u7a76\u5b9e\u9a8c\uff0c\u4e3aLLMs\u5728\u5b9a\u6027\u5206\u6790\u9886\u57df\u7684\u5e94\u7528\u63d0\u4f9b\u4e86\u65b0\u7684\u89c6\u89d2\u3002\u7814\u7a76\u57fa\u4e8e\u4e00\u4e2a\u6765\u81ea\u6b27\u76df\u8d44\u52a9\u9879\u76ee\u7684YouTube\u6570\u636e\u96c6\uff0c\u8be5\u6570\u636e\u96c6\u805a\u7126\u4e8e2016\u5e74\u745e\u5178\u7f57\u9a6c\u5c3c\u4e9a\u79fb\u6c11\u7fa4\u4f53\u7684\u4ee3\u8868\u5f62\u8c61\uff0c\u8fd9\u4e00\u65f6\u671f\u6b63\u503c2015\u5e74\u96be\u6c11\u5371\u673a\u4e4b\u540e\uff0c\u7d27\u90bb2017\u5e74\u7684\u745e\u5178\u5168\u56fd\u9009\u4e3e\u3002\u6211\u4eec\u7684\u7814\u7a76\u65e8\u5728\u63a2\u7d22\u5c06\u4eba\u7c7b\u667a\u6167\u4e0eAI\u7684\u89c4\u6a21\u548c\u6548\u7387\u76f8\u7ed3\u5408\u7684\u53ef\u80fd\u6027\uff0c\u901a\u8fc7\u5206\u6790LLMs\u5728\u4eba\u6587\u5b66\u79d1\u548c\u793e\u4f1a\u79d1\u5b66\u9886\u57df\u7684\u5e94\u7528\u4f18\u52a3\uff0c\u5e76\u8ba8\u8bba\u672a\u6765\u53ef\u80fd\u7684\u53d1\u5c55\u65b9\u5411\u3002|\n", "2408.05123": 
"|**2024-08-09**|**Sportify: Question Answering with Embedded Visualizations and Personified Narratives for Sports Video**|Chunggi Lee et.al.|[2408.05123](http://arxiv.org/abs/2408.05123)|null|\u968f\u7740\u7bee\u7403\u8fd0\u52a8\u7684\u666e\u53ca\uff0c\u7c89\u4e1d\u4eec\u5e38\u5e38\u56e0\u6bd4\u8d5b\u8282\u594f\u5feb\u548c\u590d\u6742\u5ea6\u9ad8\u800c\u611f\u5230\u56f0\u60d1\u3002\u7bee\u7403\u6218\u672f\u6d89\u53ca\u4e00\u7cfb\u5217\u590d\u6742\u7684\u52a8\u4f5c\uff0c\u9700\u8981\u5927\u91cf\u7684\u77e5\u8bc6\u624d\u80fd\u5b8c\u5168\u7406\u89e3\u3002\u8fd9\u79cd\u590d\u6742\u6027\u5bfc\u81f4\u4e86\u5bf9\u989d\u5916\u4fe1\u606f\u548c\u89e3\u91ca\u7684\u9700\u6c42\uff0c\u8fd9\u53ef\u80fd\u4f1a\u5206\u6563\u7c89\u4e1d\u4eec\u5bf9\u6bd4\u8d5b\u7684\u5173\u6ce8\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aSportify\u7684\u89c6\u89c9\u95ee\u7b54\u7cfb\u7edf\uff0c\u5b83\u878d\u5408\u4e86\u53d9\u4e8b\u548c\u5d4c\u5165\u5f0f\u53ef\u89c6\u5316\uff0c\u65e8\u5728\u4e3a\u7403\u8ff7\u63d0\u4f9b\u7bee\u7403\u6218\u672f\u7591\u95ee\u7684\u6e05\u6670\u89e3\u7b54\uff0c\u5e2e\u52a9\u4ed6\u4eec\u7406\u89e3\u6bd4\u8d5b\u7684\u5404\u79cd\u65b9\u9762\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e09\u79cd\u65b0\u578b\u7684\u52a8\u4f5c\u53ef\u89c6\u5316\uff08\u4f20\u7403\u3001\u5207\u5165\u548c\u63a9\u62a4\uff09\uff0c\u4ee5\u5c55\u793a\u5173\u952e\u52a8\u4f5c\u5e8f\u5217\u3002\u4e3a\u4e86\u89e3\u91ca\u7403\u5458\u884c\u52a8\u80cc\u540e\u7684\u539f\u56e0\u548c\u903b\u8f91\uff0c\u6211\u4eec\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u751f\u6210\u53d9\u4e8b\u6587\u672c\u3002\u6211\u4eec\u91c7\u7528\u6545\u4e8b\u8bb2\u8ff0\u7684\u65b9\u6cd5\u6765\u63cf\u8ff0\u590d\u6742\u573a\u666f\uff0c\u4ece\u7b2c\u4e00\u4eba\u79f0\u548c\u7b2c\u4e09\u4eba\u79f0\u7684\u89d2\u5ea6\u8fdb\u884c\u53d9\u8ff0\uff0c\u5e76\u878d\u5165\u52a8\u4f5c\u53ef\u89c6\u5316\u3002\u6211\u4eec\u901a\u8fc7\u4e0e\u7bee\u7403\u7c89\u4e1d\u7684\u8bc4\u4f30\u
ff0c\u63a2\u8ba8\u4e86Sportify\u5728\u6df1\u5316\u6218\u672f\u6d1e\u5bdf\u529b\u548c\u589e\u5f3a\u89c2\u8d5b\u4f53\u9a8c\u65b9\u9762\u7684\u6548\u679c\u3002\u6b64\u5916\uff0c\u7b2c\u4e09\u4eba\u79f0\u53d9\u8ff0\u6709\u52a9\u4e8e\u4eba\u4eec\u83b7\u5f97\u6df1\u5165\u7684\u6bd4\u8d5b\u89e3\u91ca\uff0c\u800c\u7b2c\u4e00\u4eba\u79f0\u53d9\u8ff0\u5219\u589e\u5f3a\u4e86\u7c89\u4e1d\u4eec\u5bf9\u6bd4\u8d5b\u7684\u53c2\u4e0e\u611f\u3002|\n", "2408.05109": "|**2024-08-09**|**A Survey of NL2SQL with Large Language Models: Where are we, and where are we going?**|Xinyu Liu et.al.|[2408.05109](http://arxiv.org/abs/2408.05109)|**[link](https://github.com/hkustdial/nl2sql_handbook)**|\u81ea\u7136\u8bed\u8a00\u67e5\u8be2\u5230SQL\u67e5\u8be2\uff08\u5373NL2SQL\uff09\u7684\u7ffb\u8bd1\u53ef\u4ee5\u663e\u8457\u964d\u4f4e\u8bbf\u95ee\u5173\u7cfb\u6570\u636e\u5e93\u7684\u969c\u788d\uff0c\u5e76\u652f\u6301\u5404\u79cd\u5546\u4e1a\u5e94\u7528\u3002\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u51fa\u73b0\uff0cNL2SQL\u7684\u6027\u80fd\u5f97\u5230\u4e86\u5927\u5e45\u63d0\u5347\u3002\u672c\u6587\u63d0\u4f9b\u4e86\u4e00\u4e2a\u5168\u9762\u7684NL2SQL\u6280\u672f\u7efc\u8ff0\uff0c\u57fa\u4e8eLLMs\u9a71\u52a8\uff0c\u8986\u76d6\u4e86\u4ece\u56db\u4e2a\u65b9\u9762\u5bf9\u6574\u4e2a\u751f\u547d\u5468\u671f\u7684\u5168\u9762\u5ba1\u67e5\uff1a\uff081\uff09\u6a21\u578b\uff1a\u5904\u7406\u81ea\u7136\u8bed\u8a00\u7684\u6a21\u7cca\u6027\u548c\u4e0d\u5145\u5206\u6027\uff0c\u5e76\u6b63\u786e\u6620\u5c04\u81ea\u7136\u8bed\u8a00\u4e0e\u6570\u636e\u5e93\u6a21\u5f0f\u548c\u5b9e\u4f8b\uff1b\uff082\uff09\u6570\u636e\uff1a\u4ece\u6536\u96c6\u8bad\u7ec3\u6570\u636e\u3001\u5e94\u5bf9\u8bad\u7ec3\u6570\u636e\u7a00\u7f3a\u7684\u6570\u636e\u5408\u6210\uff0c\u5230NL2SQL\u57fa\u51c6\uff1b\uff083\uff09\u8bc4\u4f30\uff1a\u4ece\u591a\u4e2a\u89d2\u5ea6\u4f7f\u7528\u4e0d\u540c\u6307\u6807\u5bf9NL2SQL\u65b9\u6cd5\u8fdb\u884c\u8bc4\u4f30\uff1b\uff084\uff09\u9519\u8bef\u52
06\u6790\uff1a\u5206\u6790NL2SQL\u9519\u8bef\u4ee5\u627e\u5230\u6839\u672c\u539f\u56e0\uff0c\u5e76\u6307\u5bfcNL2SQL\u6a21\u578b\u53d1\u5c55\u3002\u6b64\u5916\uff0c\u6211\u4eec\u63d0\u4f9b\u4e86\u5f00\u53d1NL2SQL\u89e3\u51b3\u65b9\u6848\u7684\u4e00\u6761\u7ecf\u9a8c\u6cd5\u5219\u3002\u6700\u540e\uff0c\u8ba8\u8bba\u4e86\u5728LLMs\u65f6\u4ee3NL2SQL\u7684\u7814\u7a76\u6311\u6218\u548c\u5f00\u653e\u95ee\u9898\u3002|\n", "2408.06332": "|**2024-08-12**|**Animate, or Inanimate, That is the Question for Large Language Models**|Leonardo Ranaldi et.al.|[2408.06332](http://arxiv.org/abs/2408.06332)|null|\u4eba\u7c7b\u7684\u8ba4\u77e5\u6838\u5fc3\u4e0e\u201c\u6709\u751f\u547d\u6027\u201d\u8fd9\u4e00\u6982\u5ff5\u7d27\u5bc6\u76f8\u8fde\uff0c\u5b83\u5728\u5851\u9020\u8bb0\u5fc6\u3001\u89c6\u89c9\u4ee5\u53ca\u591a\u5c42\u6b21\u8bed\u8a00\u7406\u89e3\u65b9\u9762\u53d1\u6325\u7740\u5173\u952e\u4f5c\u7528\u3002\u867d\u7136\u201c\u6709\u751f\u547d\u6027\u201d\u5728\u8bed\u8a00\u4e2d\u901a\u8fc7\u52a8\u8bcd\u548c\u5f62\u5bb9\u8bcd\u7684\u7ec6\u5fae\u7ea6\u675f\u4f53\u73b0\u51fa\u6765\uff0c\u4f46\u5176\u5b66\u4e60\u548c\u7cbe\u70bc\u8fc7\u7a0b\u4e5f\u4f9d\u8d56\u4e8e\u975e\u8bed\u8a00\u4fe1\u606f\u3002\u540c\u6837\u5730\uff0c\u6211\u4eec\u5047\u8bbe\u5927\u6a21\u578b\u5728\u5904\u7406\u201c\u6709\u751f\u547d\u6027\u201d\u65f6\u80fd\u529b\u6709\u9650\u7684\u539f\u56e0\u662f\u5b83\u4eec\u4ec5\u4ee5\u6587\u672c\u6570\u636e\u8fdb\u884c\u8bad\u7ec3\u3002\u56e0\u6b64\uff0c\u8fd9\u7bc7\u8bba\u6587\u65e8\u5728\u63a2\u8ba8\u7684\u95ee\u9898\u662f\uff1a\u5927\u6a21\u578b\u662f\u5426\u80fd\u591f\u4ee5\u7c7b\u4f3c\u4e8e\u4eba\u7c7b\u7684\u65b9\u5f0f\u5904\u7406\u201c\u6709\u751f\u547d\u6027\u201d\uff1f\u6211\u4eec\u901a\u8fc7\u63d0\u793a\u65b9\u6cd5\u8fdb\u884c\u4e86\u7cfb\u7edf\u5206\u6790\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u901a\u8fc7\u
63d0\u793a\u5927\u6a21\u578b\u5728\u4e0d\u540c\u7684\u6709\u751f\u547d\u3001\u65e0\u751f\u547d\u3001\u5e38\u89c1\u548c\u5f02\u5e38\u60c5\u5883\u4e0b\u8fdb\u884c\u64cd\u4f5c\u3002\u7ed3\u679c\u663e\u793a\uff0c\u5c3d\u7ba1\u5927\u6a21\u578b\u4e3b\u8981\u57fa\u4e8e\u6587\u672c\u6570\u636e\u8fdb\u884c\u8bad\u7ec3\uff0c\u4f46\u5728\u9762\u5bf9\u5178\u578b\u7684\u6709\u751f\u547d\u4f53\u548c\u65e0\u751f\u547d\u4f53\u65f6\uff0c\u5b83\u4eec\u5c55\u73b0\u51fa\u4e0e\u5148\u524d\u7814\u7a76\u4e00\u81f4\u7684\u4eba\u7c7b\u884c\u4e3a\u6a21\u5f0f\u3002\u56e0\u6b64\uff0c\u5927\u6a21\u578b\u80fd\u591f\u9002\u5e94\u7406\u89e3\u975e\u5178\u578b\u60c5\u51b5\uff0c\u901a\u8fc7\u8bc6\u522b\u5f02\u5e38\u60c5\u51b5\u4e3a\u6709\u751f\u547d\u4f53\uff0c\u800c\u65e0\u9700\u4f9d\u8d56\u4eba\u7c7b\u4f9d\u8d56\u7684\u672a\u8a00\u660e\u7684\u8ba4\u77e5\u89e6\u53d1\u673a\u5236\u6765\u5206\u89e3\u52a8\u753b\u3002|\n", "2408.06318": "|**2024-08-12**|**Can We Rely on LLM Agents to Draft Long-Horizon Plans? Let's Take TravelPlanner as an Example**|Yanan Chen 
et.al.|[2408.06318](http://arxiv.org/abs/2408.06318)|null|\u672c\u6587\u65e8\u5728\u586b\u8865\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u81ea\u4e3b\u4ee3\u7406\u4e0e\u4eba\u5de5\u901a\u7528\u667a\u80fd\uff08AGI\uff09\u63a5\u8fd1\u8fc7\u7a0b\u4e2d\u7814\u7a76\u7684\u7a7a\u767d\u3002\u5c3d\u7ba1LLM\u5c55\u73b0\u51fa\u51fa\u8272\u7684\u6cdb\u5316\u80fd\u529b\u548c\u6d8c\u73b0\u80fd\u529b\uff0c\u4f46\u76ee\u524d\u7f3a\u4e4f\u5bf9LLM\u9a71\u52a8\u7684\u4ee3\u7406\u884c\u4e3a\u3001\u6f5c\u5728\u5931\u8d25\u539f\u56e0\u4ee5\u53ca\u5982\u4f55\u63d0\u5347\u5176\u6027\u80fd\u7684\u7814\u7a76\uff0c\u5c24\u5176\u662f\u5728\u5177\u6709\u6311\u6218\u6027\u7684\u73b0\u5b9e\u4e16\u754c\u89c4\u5212\u4efb\u52a1\u4e2d\u7684\u8868\u73b0\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7f3a\u53e3\uff0c\u6211\u4eec\u5229\u7528\u4e86\u4e00\u4e2a\u540d\u4e3aTravelPlanner\u7684\u771f\u5b9e\u57fa\u51c6\uff0c\u5176\u4e2d\u7684\u4ee3\u7406\u5fc5\u987b\u6ee1\u8db3\u591a\u4e2a\u7ea6\u675f\u4ee5\u751f\u6210\u51c6\u786e\u7684\u8ba1\u5212\u3002\u901a\u8fc7TravelPlanner\u57fa\u51c6\uff0c\u6211\u4eec\u9488\u5bf9\u56db\u4e2a\u5173\u952e\u7814\u7a76\u95ee\u9898\u8fdb\u884c\u4e86\u5168\u9762\u7684\u5b9e\u9a8c\uff1a\uff081\uff09LLM\u4ee3\u7406\u5728\u5904\u7406\u957f\u7bc7\u548c\u5608\u6742\u4e0a\u4e0b\u6587\u65f6\uff0c\u5bf9\u4e8e\u63a8\u7406\u548c\u89c4\u5212\u7684\u9c81\u68d2\u6027\u662f\u5426\u8db3\u591f\uff1f\uff082\uff09\u5c11\u91cf\u63d0\u793a\u80fd\u5426\u5bf9\u5177\u6709\u957f\u4e0a\u4e0b\u6587\u7684\u573a\u666f\u4ea7\u751f\u8d1f\u9762\u5f71\u54cd\uff1f\uff083\uff09\u6211\u4eec\u80fd\u5426\u4f9d\u8d56\u7ec6\u5316\u6765\u6539\u5584\u8ba1\u5212\uff1f\uff084\uff09\u662f\u5426\u53ef\u4ee5\u4f7f\u7528\u6b63\u8d1f\u53cd\u9988\u76f8\u7ed3\u5408\u7684\u65b9\u6cd5\u5bf9LLM\u8fdb\u884c\u5fae\u8c03\uff0c\u4ece\u800c\u8fdb\u4e00\u6b65\u63d0\u9ad8\u6027\u80fd\uff1f 
\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff1a\u9996\u5148\uff0c\u5c3d\u7ba1LLM\u80fd\u591f\u5904\u7406\u5927\u91cf\u7684\u53c2\u8003\u4fe1\u606f\u548c\u5c11\u91cf\u793a\u4f8b\uff0c\u4f46\u5728\u5904\u7406\u957f\u7bc7\u4e0a\u4e0b\u6587\u65f6\uff0c\u5b83\u4eec\u5f80\u5f80\u65e0\u6cd5\u5173\u6ce8\u5173\u952e\u90e8\u5206\uff1b\u5176\u6b21\uff0c\u5b83\u4eec\u4ecd\u7136\u96be\u4ee5\u5206\u6790\u957f\u671f\u89c4\u5212\uff0c\u5e76\u4e0d\u80fd\u63d0\u4f9b\u51c6\u786e\u7684\u53cd\u9988\u4f9b\u7ec6\u5316\u4f7f\u7528\uff1b\u7b2c\u4e09\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u79f0\u4e3a\u53cd\u9988\u611f\u77e5\u5fae\u8c03\uff08FAFT\uff09\u7684\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u5229\u7528\u4e86\u6b63\u8d1f\u53cd\u9988\uff0c\u76f8\u8f83\u4e8e\u76d1\u7763\u5f0f\u5fae\u8c03\uff08SFT\uff09\uff0c\u5b83\u80fd\u5e26\u6765\u663e\u8457\u7684\u6027\u80fd\u63d0\u5347\u3002\u6211\u4eec\u7684\u53d1\u73b0\u4e3a\u793e\u533a\u63d0\u4f9b\u4e86\u6709\u5173\u73b0\u5b9e\u4e16\u754c\u89c4\u5212\u5e94\u7528\u65b9\u9762\u7684\u6df1\u5165\u89c1\u89e3\u3002|\n", "2408.06292": "|**2024-08-12**|**The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery**|Chris Lu 
et.al.|[2408.06292](http://arxiv.org/abs/2408.06292)|**[link](https://github.com/sakanaai/ai-scientist)**|**\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u5168\u9762\u6846\u67b6\uff0c\u65e8\u5728\u5b9e\u73b0\u5b8c\u5168\u81ea\u52a8\u7684\u79d1\u5b66\u53d1\u73b0\uff0c\u4f7f\u524d\u6cbf\u7684\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\u80fd\u591f\u72ec\u7acb\u8fdb\u884c\u7814\u7a76\uff0c\u5e76\u4f20\u8fbe\u5176\u7814\u7a76\u6210\u679c\u3002\u6211\u4eec\u5f15\u5165\u4e86\u201cAI\u79d1\u5b66\u5bb6\u201d\u8fd9\u4e00\u6982\u5ff5\uff0c\u5b83\u80fd\u751f\u6210\u65b0\u9896\u7684\u7814\u7a76\u601d\u8def\uff0c\u7f16\u5199\u4ee3\u7801\uff0c\u6267\u884c\u5b9e\u9a8c\uff0c\u53ef\u89c6\u5316\u7ed3\u679c\uff0c\u64b0\u5199\u5b8c\u6574\u7684\u79d1\u5b66\u8bba\u6587\uff0c\u5e76\u8fdb\u884c\u6a21\u62df\u7684\u540c\u884c\u8bc4\u5ba1\u8fc7\u7a0b\u4ee5\u8fdb\u884c\u8bc4\u4f30\u3002\u7406\u8bba\u4e0a\uff0c\u8fd9\u4e00\u8fc7\u7a0b\u53ef\u4ee5\u8fed\u4ee3\u8fdb\u884c\uff0c\u4ee5\u5f00\u653e\u6027\u65b9\u5f0f\u53d1\u5c55\u60f3\u6cd5\uff0c\u5c31\u50cf\u4eba\u7c7b\u7684\u79d1\u5b66\u793e\u533a\u4e00\u6837\u3002 
\u901a\u8fc7\u5c06\u5176\u5e94\u7528\u4e8e\u673a\u5668\u5b66\u4e60\u7684\u4e09\u4e2a\u4e0d\u540c\u5b50\u9886\u57df\uff1a\u6269\u6563\u5efa\u6a21\u3001\u57fa\u4e8e\u8f6c\u6362\u5668\u7684\u8bed\u8a00\u5efa\u6a21\u548c\u5b66\u4e60\u52a8\u6001\uff0c\u5c55\u793a\u4e86\u5176\u7075\u6d3b\u6027\u3002\u6bcf\u4e00\u7bc7\u8bba\u6587\u7684\u5f00\u53d1\u6210\u672c\u4f4e\u4e8e15\u7f8e\u5143\u3002\u4e3a\u4e86\u8bc4\u4f30\u751f\u6210\u7684\u8bba\u6587\uff0c\u6211\u4eec\u8bbe\u8ba1\u5e76\u9a8c\u8bc1\u4e86\u4e00\u4e2a\u81ea\u52a8\u5ba1\u7a3f\u4eba\uff0c\u7ed3\u679c\u663e\u793a\u5b83\u5728\u8bc4\u4ef7\u8bba\u6587\u5206\u6570\u65b9\u9762\u63a5\u8fd1\u4eba\u7c7b\u6c34\u5e73\u8868\u73b0\u3002AI\u79d1\u5b66\u5bb6\u80fd\u591f\u4ea7\u751f\u8d85\u8fc7\u9876\u7ea7\u673a\u5668\u5b66\u4e60\u4f1a\u8bae\u63a5\u53d7\u9608\u503c\u7684\u8bba\u6587\uff0c\u8fd9\u662f\u7531\u6211\u4eec\u7684\u81ea\u52a8\u5ba1\u7a3f\u4eba\u5224\u65ad\u7684\u3002\u8fd9\u4e00\u65b9\u6cd5\u6807\u5fd7\u7740\u673a\u5668\u5b66\u4e60\u9886\u57df\u79d1\u5b66\u7814\u7a76\u65b0\u7eaa\u5143\u7684\u5f00\u59cb\uff1a\u5c06AI\u4ee3\u7406\u7684\u53d8\u9769\u6027\u4f18\u52bf\u5e26\u5165\u6574\u4e2a\u7814\u7a76\u8fc7\u7a0b\uff0c\u4f7f\u6211\u4eec\u66f4\u63a5\u8fd1\u4e00\u4e2a\u80fd\u591f\u91ca\u653e\u89e3\u51b3\u4e16\u754c\u6700\u8270\u5de8\u95ee\u9898\u7684\u65e0\u9650\u53ef\u8d1f\u62c5\u521b\u65b0\u4e0e\u521b\u9020\u529b\u7684\u4e16\u754c\u3002\u6240\u6709\u4ee3\u7801\u5df2\u5f00\u6e90\u5728https://github.com/SakanaAI/AI-Scientist\u3002**|\n", "2408.06281": "|**2024-08-12**|**MovieSum: An Abstractive Summarization Dataset for Movie Screenplays**|Rohit Saxena 
et.al.|[2408.06281](http://arxiv.org/abs/2408.06281)|**[link](https://github.com/saxenarohit/moviesum)**|**\u7535\u5f71\u5267\u672c\u7684\u6982\u8ff0\u662f\u4e00\u4e2a\u6311\u6218\uff0c\u56e0\u4e3a\u5b83\u8981\u6c42\u7406\u89e3\u957f\u8f93\u5165\u4e0a\u4e0b\u6587\u548c\u7535\u5f71\u7279\u6709\u7684\u5404\u79cd\u5143\u7d20\u3002\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u6587\u6863\u6982\u8ff0\u65b9\u9762\u5df2\u7ecf\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u5c55\uff0c\u4f46\u5b83\u4eec\u5f80\u5f80\u5728\u5904\u7406\u957f\u8f93\u5165\u4e0a\u4e0b\u6587\u65f6\u9047\u5230\u56f0\u96be\u3002\u6b64\u5916\uff0c\u867d\u7136\u6700\u8fd1\u7684\u7814\u7a76\u5173\u6ce8\u7535\u89c6\u811a\u672c\uff0c\u4f46\u7535\u5f71\u5267\u672c\u6982\u8ff0\u4ecd\u7136\u7f3a\u4e4f\u63a2\u7d22\u3002\u4e3a\u4e86\u6fc0\u53d1\u8fd9\u4e00\u9886\u57df\u7684\u7814\u7a76\uff0c\u6211\u4eec\u63d0\u51fa\u4e00\u4e2a\u540d\u4e3aMovieSum\u7684\u65b0\u6570\u636e\u96c6\uff0c\u7528\u4e8e\u7535\u5f71\u5267\u672c\u7684\u62bd\u8c61\u6982\u8ff0\u3002\u8fd9\u4e2a\u6570\u636e\u96c6\u5305\u542b\u4e862200\u4e2a\u7535\u5f71\u5267\u672c\u53ca\u5176\u5bf9\u5e94\u7684\u7ef4\u57fa\u767e\u79d1\u5267\u60c5\u6982\u8ff0\u3002\u6211\u4eec\u4eba\u5de5\u683c\u5f0f\u5316\u4e86\u7535\u5f71\u5267\u672c\u4ee5\u8868\u793a\u5176\u7ed3\u6784\u5143\u7d20\u3002\u4e0e\u73b0\u6709\u7684\u6570\u636e\u96c6\u76f8\u6bd4\uff0cMovieSum\u5177\u6709\u51e0\u4e2a\u72ec\u7279\u7279\u70b9\uff1a\uff081\uff09\u5b83\u5305\u62ec\u7535\u5f71\u5267\u672c\uff0c\u8fd9\u4e9b\u5267\u672c\u6bd4\u7535\u89c6\u5267\u811a\u672c\u66f4\u957f\u3002\uff082\uff09\u5b83\u7684\u89c4\u6a21\u662f\u4e4b\u524d\u7535\u5f71\u5267\u672c\u6570\u636e\u96c6\u7684\u4e24\u500d\u3002\uff083\uff09\u5b83\u63d0\u4f9b\u4e86IMDb 
ID\u7b49\u5143\u6570\u636e\uff0c\u65b9\u4fbf\u83b7\u53d6\u989d\u5916\u7684\u5916\u90e8\u77e5\u8bc6\u3002\u6211\u4eec\u8fd8\u5c55\u793a\u4e86\u6700\u8fd1\u53d1\u5e03\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u6211\u4eec\u7684\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u6982\u8ff0\u7684\u7ed3\u679c\uff0c\u4ee5\u63d0\u4f9b\u8be6\u7ec6\u7684\u57fa\u51c6\u3002**|\n", "2408.06276": "|**2024-08-13**|**Review-driven Personalized Preference Reasoning with Large Language Models for Recommendation**|Jieyong Kim et.al.|[2408.06276](http://arxiv.org/abs/2408.06276)|null|\u8fd1\u671f\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u5404\u7c7b\u4efb\u52a1\u4e2d\u7684\u5353\u8d8a\u8868\u73b0\u5f15\u8d77\u4e86\u5e7f\u6cdb\u5173\u6ce8\uff0c\u5e76\u6fc0\u53d1\u4e86\u5b83\u4eec\u5728\u63a8\u8350\u7cfb\u7edf\u9886\u57df\u7684\u5e94\u7528\u6f5c\u529b\u3002\u7136\u800c\uff0c\u73b0\u6709\u65b9\u6cd5\u5e76\u672a\u5145\u5206\u5229\u7528LLM\u7684\u6f5c\u529b\uff0c\u5f80\u5f80\u53d7\u9650\u4e8e\u8f93\u5165\u4fe1\u606f\u7684\u6709\u9650\u6027\uff0c\u672a\u80fd\u5168\u9762\u53d1\u6325\u5176\u9ad8\u7ea7\u63a8\u7406\u80fd\u529b\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aEXP3RT\u7684\u65b0\u9896LLM\u63a8\u8350\u7cfb\u7edf\uff0c\u65e8\u5728\u5229\u7528\u7528\u6237\u548c\u7269\u54c1\u8bc4\u8bba\u4e2d\u8574\u542b\u7684\u4e30\u5bcc\u504f\u597d\u4fe1\u606f\u3002 
EXP3RT\u901a\u8fc7\u4ece\u6559\u5e08LLM\u4e2d\u8fdb\u884c\u77e5\u8bc6\u84b8\u998f\u8fdb\u884c\u5fae\u8c03\uff0c\u4ee5\u6267\u884c\u5173\u952e\u7684\u4e09\u9879\u4efb\u52a1\uff1a\u9996\u5148\uff0c\u5b83\u4ece\u539f\u59cb\u8bc4\u8bba\u4e2d\u63d0\u53d6\u5e76\u5c01\u88c5\u6838\u5fc3\u7684\u4e3b\u89c2\u504f\u597d\uff1b\u5176\u6b21\uff0c\u6839\u636e\u7279\u5b9a\u6807\u51c6\u805a\u5408\u548c\u603b\u7ed3\u8fd9\u4e9b\u504f\u597d\uff0c\u5f62\u6210\u7528\u6237\u548c\u7269\u54c1\u7684\u6863\u6848\uff1b\u6700\u540e\uff0c\u8003\u8651\u7528\u6237/\u7269\u54c1\u6863\u6848\u4ee5\u53ca\u7269\u54c1\u63cf\u8ff0\u4e2d\u7684\u4e3b\u5ba2\u89c2\u4fe1\u606f\uff0c\u751f\u6210\u8be6\u7ec6\u7684\u63a8\u7406\u6b65\u9aa4\u548c\u9884\u6d4b\u8bc4\u7ea7\uff0c\u5373\u57fa\u4e8e\u63a8\u7406\u7684\u8bc4\u7ea7\u9884\u6d4b\u3002\u8fd9\u79cd\u7531EXP3RT\u63d0\u4f9b\u7684\u4e2a\u6027\u5316\u504f\u597d\u63a8\u7406\u80fd\u591f\u63d0\u9ad8\u8bc4\u7ea7\u9884\u6d4b\u7684\u51c6\u786e\u6027\uff0c\u5e76\u4e3a\u63a8\u8350\u7cfb\u7edf\u63d0\u4f9b\u5fe0\u5b9e\u4e14\u5408\u7406\u7684\u89e3\u91ca\u3002 \u5e7f\u6cdb\u5b9e\u9a8c\u8868\u660e\uff0cEXP3RT\u5728\u8bc4\u7ea7\u9884\u6d4b\u548c\u5019\u9009\u9879\u76ee\u91cd\u6392\u5e8f\uff08\u7528\u4e8etop-k\u63a8\u8350\uff09\u65b9\u9762\u5747\u8d85\u8d8a\u4e86\u73b0\u6709\u65b9\u6cd5\uff0c\u540c\u65f6\u663e\u8457\u63d0\u5347\u4e86\u63a8\u8350\u7cfb\u7edf\u7684\u53ef\u89e3\u91ca\u6027\u3002|\n", "2408.06273": "|**2024-08-12**|**FuxiTranyu: A Multilingual Large Language Model Trained with Balanced Data**|Haoran Sun 
et.al.|[2408.06273](http://arxiv.org/abs/2408.06273)|**[link](https://github.com/tjunlp-lab/fuxitranyu)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u5404\u79cd\u4efb\u52a1\u4e2d\u5c55\u73b0\u51fa\u4e86\u5f3a\u5927\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u8bb8\u591aLLM\u5728\u9ad8\u8d44\u6e90\u548c\u4f4e\u8d44\u6e90\u8bed\u8a00\u4e4b\u95f4\u7684\u6027\u80fd\u5b58\u5728\u663e\u8457\u5dee\u5f02\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u5f00\u6e90\u591a\u8bed\u8a00LLM\u2014\u2014FuxiTranyu\uff0c\u65e8\u5728\u6ee1\u8db3\u7814\u7a76\u793e\u533a\u5bf9\u5e73\u8861\u4e14\u9ad8\u6027\u80fd\u591a\u8bed\u8a00\u80fd\u529b\u7684\u9700\u6c42\u3002FuxiTranyu-8B\uff0c\u5177\u670980\u4ebf\u53c2\u6570\u7684\u57fa\u6a21\uff0c\u4ece\u5934\u5f00\u59cb\u8bad\u7ec3\u5728\u4e00\u4e2a\u7cbe\u5fc3\u5e73\u8861\u7684\u591a\u8bed\u8a00\u6570\u636e\u4ed3\u5e93\u4e0a\uff0c\u8be5\u4ed3\u5e93\u5305\u542b\u8986\u76d643\u79cd\u81ea\u7136\u8bed\u8a00\u548c16\u79cd\u7f16\u7a0b\u8bed\u8a00\u76846000\u4ebf\u4e2a\u4ee4\u724c\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5f00\u53d1\u4e86\u4e24\u4e2a\u6307\u4ee4\u8c03\u4f18\u6a21\u578b\uff1aFuxiTranyu-8B-SFT\uff0c\u5b83\u57fa\u4e8e\u591a\u5143\u6307\u4ee4\u6570\u636e\u96c6\u8fdb\u884c\u5fae\u8c03\uff1b\u4ee5\u53caFuxiTranyu-8B-DPO\uff0c\u5728\u504f\u597d\u6570\u636e\u96c6\u4e0a\u8fdb\u4e00\u6b65\u7cbe\u70bc\u4ee5\u589e\u5f3a\u5bf9\u9f50\u80fd\u529b\u7684DPO\u3002\u5e7f\u6cdb\u5b9e\u9a8c\u5728\u591a\u79cd\u591a\u8bed\u8a00\u57fa\u51c6\u4e0a\u7684\u7ed3\u679c\u663e\u793a\uff0cFuxiTranyu\u5728\u4e0e\u73b0\u6709\u591a\u8bed\u8a00LLM\uff08\u5982BLOOM-7B\u3001PolyLM-13B\u3001Llama-2-Chat-7B\u548cMistral-7B-Instruct\uff09\u7684\u6bd4\u8f83\u4e2d\u8868\u73b0\u51fa\u7ade\u4e89\u6027\u6027\u80fd\u3002\u795e\u7ecf\u5143\u7ea7\u548c\u8868\u793a\u7ea7\u53ef\u89e3\u91ca\u6027\u5206\u6790\u8868\u660e\uff0cFuxiTranyu\u80fd\u591f\u5728\u4e0d\u540c\u8bed\u8a00\u4e4b\u95f4\u5b66\u4e60\u4e00\u81f4\u7684\u591a
\u8bed\u8a00\u8868\u793a\u3002\u4e3a\u4e86\u4fc3\u8fdb\u5bf9\u591a\u8bed\u8a00LLM\u53ca\u5176\u5de5\u4f5c\u673a\u5236\u7684\u7814\u7a76\uff0c\u6211\u4eec\u53d1\u5e03\u4e86\u57fa\u6a21\u548c\u6307\u4ee4\u8c03\u4f18\u7684FuxiTranyu\u6a21\u578b\uff0c\u4ee5\u53ca58\u4e2a\u9884\u8bad\u7ec3\u68c0\u67e5\u70b9\uff0c\u901a\u8fc7HuggingFace\u548cGithub\u516c\u5f00\u5206\u4eab\u3002|\n", "2408.06272": "|**2024-08-12**|**A RAG-Based Question-Answering Solution for Cyber-Attack Investigation and Attribution**|Sampath Rajapaksha et.al.|[2408.06272](http://arxiv.org/abs/2408.06272)|null|\u5728\u4e0d\u65ad\u6f14\u8fdb\u7684\u7f51\u7edc\u5b89\u5168\u9886\u57df\uff0c\u5206\u6790\u5e08\u9700\u8981\u5bc6\u5207\u5173\u6ce8\u6700\u65b0\u7684\u653b\u51fb\u8d8b\u52bf\u548c\u76f8\u5173\u4fe1\u606f\uff0c\u4ee5\u534f\u52a9\u8c03\u67e5\u4e0e\u5f52\u56e0\u7f51\u7edc\u653b\u51fb\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u6280\u672f\u7684\u95ee\u7b54\u6a21\u578b\u53ca\u5176\u5e94\u7528\uff0c\u65e8\u5728\u4e3a\u7f51\u7edc\u5b89\u5168\u4e13\u5bb6\u63d0\u4f9b\u6709\u5173\u7f51\u7edc\u653b\u51fb\u8c03\u67e5\u4e0e\u5f52\u56e0\u7684\u4fe1\u606f\u3002\u6211\u4eec\u7684\u95ee\u7b54\u6a21\u578b\u7ed3\u5408\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u548c\u77e5\u8bc6\u5e93\uff08KB\uff09\uff0c\u80fd\u591f\u6839\u636e\u77e5\u8bc6\u5e93\u6216\u7528\u6237\u63d0\u4f9b\u7684\u5916\u90e8\u8d44\u6e90\u56de\u7b54\u7528\u6237\u7684\u67e5\u8be2\u3002 
\u6211\u4eec\u901a\u8fc7\u5404\u79cd\u7c7b\u578b\u7684\u63d0\u95ee\uff0c\u5305\u62ec\u57fa\u4e8e\u77e5\u8bc6\u5e93\u3001\u5143\u6570\u636e\u3001\u77e5\u8bc6\u5e93\u4e2d\u7684\u7279\u5b9a\u6587\u6863\u4ee5\u53ca\u5916\u90e8\u8d44\u6e90\u7684\u63d0\u95ee\uff0c\u5bf9\u6211\u4eec\u7684\u95ee\u7b54\u6a21\u578b\u8fdb\u884c\u4e86\u6d4b\u8bd5\u4e0e\u8bc4\u4f30\u3002\u6211\u4eec\u5c06\u77e5\u8bc6\u5e93\u4e3a\u57fa\u7840\u7684\u95ee\u9898\u7684\u7b54\u6848\u4e0eOpenAI\u7684GPT-3.5\u53ca\u6700\u65b0GPT-4\u7684LLM\u7b54\u6848\u8fdb\u884c\u4e86\u6bd4\u8f83\u3002\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u95ee\u7b54\u6a21\u578b\u5728\u63d0\u4f9b\u7b54\u6848\u7684\u540c\u65f6\u7ed9\u51fa\u4e86\u6765\u6e90\u4fe1\u606f\uff0c\u5e76\u4e14\u514b\u670d\u4e86GPT\u6a21\u578b\u53ef\u80fd\u4ea7\u751f\u7684\u5e7b\u89c9\u95ee\u9898\uff0c\u8fd9\u5bf9\u4e8e\u7f51\u7edc\u653b\u51fb\u7684\u8c03\u67e5\u4e0e\u5f52\u56e0\u81f3\u5173\u91cd\u8981\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u5206\u6790\u8868\u660e\uff0c\u5f53RAG\u95ee\u7b54\u6a21\u578b\u5728\u67e5\u8be2\u4e4b\u5916\u63d0\u4f9b\u5c11\u91cf\u793a\u4f8b\u65f6\uff0c\u5176\u751f\u6210\u7684\u7b54\u6848\u8d28\u91cf\u901a\u5e38\u4f18\u4e8e\u4ec5\u63d0\u4f9b\u67e5\u8be2\u800c\u6ca1\u6709\u793a\u4f8b\u7684\u60c5\u51b5\u3002|\n", "2408.06266": "|**2024-08-12**|**Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment**|Karel D'Oosterlinck 
et.al.|[2408.06266](http://arxiv.org/abs/2408.06266)|**[link](https://github.com/contextualai/clair_and_apo)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u901a\u5e38\u4f7f\u7528\u5bf9\u6bd4\u6027\u5bf9\u9f50\u76ee\u6807\u548c\u504f\u597d\u5bf9\u6570\u636e\u96c6\u8fdb\u884c\u5bf9\u9f50\u3002\u8fd9\u4e00\u8fc7\u7a0b\u6d89\u53ca\u5230\u6a21\u578b\u3001\u914d\u5bf9\u6570\u636e\u4ee5\u53ca\u76ee\u6807\u4e4b\u95f4\u7684\u4ea4\u4e92\uff0c\u4f7f\u5f97\u5bf9\u9f50\u53d8\u5f97\u590d\u6742\uff0c\u5e76\u6709\u65f6\u5bfc\u81f4\u4e0d\u7406\u60f3\u7684\u6210\u679c\u3002\u6211\u4eec\u5bf9\u6b64\u8fdb\u884c\u4e86\u7814\u7a76\uff0c\u53d1\u73b0\uff08i\uff09\u5f53\u5e95\u5c42\u54cd\u5e94\u5177\u6709\u5bf9\u6bd4\u6027\u65f6\uff0c\u504f\u597d\u6570\u636e\u63d0\u4f9b\u4e86\u66f4\u597d\u7684\u5b66\u4e60\u4fe1\u53f7\uff1b\uff08ii\uff09\u5bf9\u9f50\u76ee\u6807\u5728\u8bad\u7ec3\u671f\u95f4\u4e3a\u6a21\u578b\u63d0\u4f9b\u4e86\u66f4\u591a\u7684\u63a7\u5236\uff0c\u4ece\u800c\u5bfc\u81f4\u4e86\u66f4\u597d\u7684\u6027\u80fd\u3002\u57fa\u4e8e\u8fd9\u4e9b\u6d1e\u5bdf\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u5bf9\u6bd4\u5b66\u4e60\u4eceAI\u4fee\u8ba2\uff08CLAIR\uff09\uff0c\u4e00\u79cd\u6570\u636e\u521b\u5efa\u65b9\u6cd5\uff0c\u53ef\u4ee5\u751f\u6210\u66f4\u5177\u6709\u5bf9\u6bd4\u6027\u7684\u504f\u597d\u5bf9\uff0c\u4ee5\u53ca\u951a\u5b9a\u504f\u597d\u4f18\u5316\uff08APO\uff09\uff0c\u4e00\u4e2a\u66f4\u5177\u53ef\u63a7\u6027\u548c\u7a33\u5b9a\u6027\u7684\u5bf9\u9f50\u76ee\u6807\u3002\u6211\u4eec\u4f7f\u7528\u5404\u79cd\u53ef\u6bd4\u8f83\u7684\u6570\u636e\u96c6\u548c\u5bf9\u9f50\u76ee\u6807\u6765\u5bf9Llama-3-8B-Instruct\u8fdb\u884c\u5bf9\u9f50\uff0c\u5e76\u6d4b\u91cf\u4e86\u4e0e\u4eba\u7c7b\u5224\u65ad\u9ad8\u5ea6\u76f8\u5173\u7684MixEval-Hard\u5206\u6570\u3002CLAIR\u504f\u597d\u5bfc\u81f4\u6240\u6709\u6570\u636e\u96c6\u4e2d\u7684\u6700\u4f73\u6027\u80fd\uff0c\u800cAPO\u59cb\u7ec8\u4f18\u4e8e\u8f83\u5c11\u53ef\u63a7\u7684\u76ee\u6807\u3002\u901a\u8fc7\u572832K 
CLAIR\u504f\u597d\u4e0a\u4f7f\u7528APO\u8fdb\u884c\u8bad\u7ec3\uff0c\u6211\u4eec\u7684\u6700\u4f73\u6a21\u578b\u63d0\u9ad8\u4e86Llama-3-8B-Instruct\u7684\u6027\u80fd\u8fbe7.65%\uff0c\u5c06\u4e0eGPT4-turbo\u7684\u5dee\u8ddd\u7f29\u5c0f\u4e8645%\u3002\u6211\u4eec\u7684\u4ee3\u7801\u5df2\u53d1\u5e03\u4e8ehttps://github.com/ContextualAI/CLAIR_and_APO\u3002|\n", "2408.06223": "|**2024-08-12**|**On Effects of Steering Latent Representation for Large Language Model Unlearning**|Dang Huu-Tien et.al.|[2408.06223](http://arxiv.org/abs/2408.06223)|null|\u672c\u6587\u9996\u5148\u901a\u8fc7\u7406\u8bba\u5206\u6790\u8bc1\u660e\u4e86\u5f15\u5bfc\u6a21\u578b\u4e2d\u95f4\u5c42\u9057\u5fd8\u8868\u793a\u5411\u968f\u673a\u65b9\u5411\u504f\u79fb\uff0c\u80fd\u964d\u4f4e\u6587\u672c\u751f\u6210\u7684\u7f6e\u4fe1\u5ea6\uff0c\u5bfc\u81f4\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4ea7\u751f\u9519\u8bef\u6216\u65e0\u610f\u4e49\u7684\u56de\u7b54\u3002\u5176\u6b21\uff0c\u6211\u4eec\u63a2\u8ba8\u4e86\u7cfb\u6570\u5982\u4f55\u5f71\u54cd\u9057\u5fd8\u6837\u672c\u8868\u793a\u4e0e\u968f\u673a\u65b9\u5411\u7684\u4e00\u81f4\u6027\uff0c\u5e76\u6697\u793a\u4e86\u4e0d\u540c\u7f51\u7edc\u5c42\u4e0b\u6709\u6548\u7684\u6700\u4f18\u7cfb\u6570\u503c\uff0c\u4ee5\u5b9e\u73b0\u9ad8\u6548\u7684\u5b66\u4e60\u64a4\u9500\u3002\u63a5\u7740\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u5229\u7528\u4ee3\u8868\u9519\u4e71\u6cd5\uff08RMU\uff09\u8fdb\u884c\u5b66\u4e60\u64a4\u9500\u540e\u7684\u6a21\u578b\u80fd\u591f\u62b5\u5fa1\u5bf9\u6297\u6027\u9003\u8131\u653b\u51fb\u3002 
\u6700\u540e\uff0c\u6211\u4eec\u7684\u5b9e\u8bc1\u5206\u6790\u8868\u660e\uff0c\u5f53\u5e94\u7528\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u4e2d\u95f4\u548c\u540e\u671f\u5c42\u65f6\uff0cRMU\u7684\u6709\u6548\u6027\u8f83\u4f4e\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u7b80\u5355\u800c\u6709\u6548\u7684\u65b9\u6cd5\u2014\u2014\u81ea\u9002\u5e94RMU\uff0c\u8be5\u65b9\u6cd5\u4f7f\u5927\u591a\u6570\u5c42\u90fd\u80fd\u591f\u5b9e\u73b0\u9ad8\u6548\u7684\u5b66\u4e60\u64a4\u9500\uff0c\u4e14\u4e0d\u589e\u52a0\u989d\u5916\u7684\u8ba1\u7b97\u6210\u672c\u3002\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u4e0e\u5148\u524d\u7684\u7814\u7a76\u76f8\u6bd4\uff0c\u81ea\u9002\u5e94RMU\u663e\u8457\u63d0\u9ad8\u4e86\u5b66\u4e60\u64a4\u9500\u7684\u6027\u80fd\u3002|\n", "2408.06186": "|**2024-08-12**|**Improving Structural Diversity of Blackbox LLMs via Chain-of-Specification Prompting**|Halley Young et.al.|[2408.06186](http://arxiv.org/abs/2408.06186)|null|\u751f\u6210\u591a\u6837\u5316\u7684\u6587\u672c\u662f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u9762\u4e34\u7684\u5173\u952e\u6311\u6218\u3002\u5230\u76ee\u524d\u4e3a\u6b62\uff0c\u591a\u6837\u6027\u7684\u7814\u7a76\u4e3b\u8981\u901a\u8fc7$n$-gram\u591a\u6837\u6027\u6216BERT\u5d4c\u5165\u7684\u591a\u6837\u6027\u7b49\u6307\u6807\u8fdb\u884c\uff0c\u4f46\u8fd9\u4e9b\u65b9\u6cd5\u5728\u8003\u8651\u591a\u6837\u6027\u7684\u7ef4\u5ea6\u4e0a\u7f3a\u4e4f\u7528\u6237\u63a7\u5236\u6743\u3002\u4f8b\u5982\uff0c\u5728\u8bd7\u6b4c\u9886\u57df\uff0c\u7528\u6237\u53ef\u80fd\u5e0c\u671b\u5728\u62bc\u97f5\u548c\u8282\u594f\u65b9\u9762\u5b9e\u73b0\u591a\u6837\u6027\uff0c\u800c\u5728\u4ee3\u7801\u9886\u57df\uff0c\u7528\u6237\u53ef\u80fd\u66f4\u5173\u6ce8\u89e3\u51b3\u95ee\u9898\u65f6\u6240\u4f7f\u7528\u7684\u8868\u8fbe\u65b9\u5f0f\u7684\u591a\u6837\u6027\u3002 
\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u7ed3\u6784\u591a\u6837\u6027\uff08Structural Diversity\uff09\u7684\u65b0\u6307\u6807\u3002\u8be5\u6307\u6807\u5141\u8bb8\u7528\u6237\u63d0\u4f9b\u4e00\u4e2a\u6620\u5c04\uff0c\u5c06\u751f\u6210\u7684\u6587\u672c\u8f6c\u6362\u4e3a\u6355\u83b7\u7528\u6237\u5173\u5fc3\u7684\u591a\u6837\u6027\u7684\u7279\u5f81\u3002\u8fd9\u6837\uff0c\u7528\u6237\u53ef\u4ee5\u66f4\u5177\u4f53\u5730\u63a7\u5236\u4ed6\u4eec\u60f3\u8981\u63a2\u7d22\u7684\u591a\u6837\u6027\u7ef4\u5ea6\uff0c\u5982\u5728\u8bd7\u6b4c\u9886\u57df\u5173\u6ce8\u62bc\u97f5\u548c\u8282\u594f\uff0c\u5728\u4ee3\u7801\u9886\u57df\u5173\u6ce8\u7279\u5b9a\u7684\u8868\u8fbe\u65b9\u5f0f\u7b49\u3002 \u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86\u4e00\u4e2a\u540d\u4e3a\u94fe\u5f0f\u89c4\u8303\uff08Chain-of-Specification\uff0cCoS\uff09\u7684\u65b0\u578b\u7b56\u7565\uff0c\u7528\u4e8e\u901a\u8fc7\u9996\u5148\u8ba9LLM\u751f\u6210\u63cf\u8ff0\u7279\u5b9a\u7ed3\u6784\u7279\u5f81\u5b9e\u4f8b\u7684\u89c4\u8303\uff0c\u7136\u540e\u5f15\u5bfcLLM\u751f\u6210\u6ee1\u8db3\u8fd9\u4e9b\u7279\u5f81\u7684\u6587\u672c\u6765\u63d0\u9ad8\u591a\u6837\u6027\uff1b\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u6211\u4eec\u7684\u7b56\u7565\u9002\u7528\u4e8e\u9ed1\u76d2LLM\u3002\u5728\u6211\u4eec\u7684\u5b9e\u9a8c\u4e2d\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u5728\u8bd7\u6b4c\u548c\u4ee3\u7801\u9886\u57df\u5b9e\u73b0\u7ed3\u6784\u591a\u6837\u6027\u65f6\uff0cCoS\u7b56\u7565\u76f8\u6bd4\u591a\u4e2a\u57fa\u7ebf\u663e\u8457\u63d0\u9ad8\u4e86\u591a\u6837\u6027\u3002|\n", "2408.07060": "|**2024-08-13**|**Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents**|Kexun Zhang 
et.al.|[2408.07060](http://arxiv.org/abs/2408.07060)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4ee3\u7406\u5728\u89e3\u51b3\u5b9e\u9645\u4e16\u754c\u8f6f\u4ef6\u5de5\u7a0b\uff08SWE\uff09\u95ee\u9898\u65b9\u9762\u5c55\u73b0\u51fa\u5de8\u5927\u7684\u6f5c\u529b\u3002\u6700\u5148\u8fdb\u5f00\u6e90\u7684SWE\u4ee3\u7406\u80fd\u591f\u5728SWE-Bench Lite\u4e2d\u89e3\u51b3\u8d85\u8fc727%\u7684\u5b9e\u9645GitHub\u95ee\u9898\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u590d\u6742\u7684\u4ee3\u7406\u6846\u67b6\u5728\u8868\u73b0\u4e0a\u5b58\u5728\u5dee\u5f02\uff0c\u6709\u7684\u5728\u7279\u5b9a\u4efb\u52a1\u4e2d\u8868\u73b0\u51fa\u8272\uff0c\u5728\u5176\u4ed6\u4efb\u52a1\u4e2d\u5219\u8868\u73b0\u4e0d\u4f73\u3002\u4e3a\u4e86\u5145\u5206\u5229\u7528\u8fd9\u4e9b\u4ee3\u7406\u7684\u591a\u6837\u6027\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aDEI\uff08\u591a\u5143\u5316\u667a\u80fd\uff09\u7684\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u5229\u7528\u4e86\u5b83\u4eec\u7684\u72ec\u7279\u4e13\u957f\u3002DEI\u4f5c\u4e3a\u4e00\u4e2a\u4f4d\u4e8e\u73b0\u6709SWE\u4ee3\u7406\u6846\u67b6\u4e4b\u4e0a\u7684\u5143\u6a21\u5757\uff0c\u7ba1\u7406\u4ee3\u7406\u96c6\u4f53\u4ee5\u5b9e\u73b0\u589e\u5f3a\u7684\u95ee\u9898\u89e3\u51b3\u80fd\u529b\u3002 \u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u7531DEI\u6307\u5bfc\u7684\u4ee3\u7406\u59d4\u5458\u4f1a\u80fd\u591f\u663e\u8457\u8d85\u8d8a\u5355\u4e2a\u4ee3\u7406\u7684\u6700\u4f73\u6027\u80fd\u3002\u4f8b\u5982\uff0c\u4e00\u7ec4\u5f00\u6e90\u7684SWE\u4ee3\u7406\uff0c\u5176\u4e2a\u4f53\u89e3\u51b3\u7387\u6700\u9ad8\u4e3a27.3%\u5728SWE-Bench Lite\u4e2d\uff0c\u901a\u8fc7\u91c7\u7528DEI\uff0c\u53ef\u4ee5\u8fbe\u523034.3%\u7684\u89e3\u51b3\u7387\uff0c\u5b9e\u73b0\u4e8625%\u7684\u6539\u8fdb\uff0c\u5e76\u51fb\u8d25\u4e86\u8bb8\u591a\u95ed\u6e90\u89e3\u51b3\u65b9\u6848\u3002\u6211\u4eec\u7684\u6700\u4f73\u6027\u80fd\u7ec4\u8868\u73b0\u51fa\u8272\uff0c\u8fbe\u5230\u4e8655%\u7684\u89e3\u51b3\u7387\uff0c\u5728SWE-Bench 
Lite\u4e2d\u83b7\u5f97\u4e86\u6700\u9ad8\u6392\u540d\u3002\u6211\u4eec\u7684\u7814\u7a76\u7ed3\u679c\u5bf9\u5408\u4f5c\u578b\u4eba\u5de5\u667a\u80fd\u7cfb\u7edf\u7684\u7814\u7a76\u9886\u57df\u505a\u51fa\u4e86\u8d21\u732e\uff0c\u5c55\u793a\u4e86\u5b83\u4eec\u5728\u89e3\u51b3\u590d\u6742\u8f6f\u4ef6\u5de5\u7a0b\u6311\u6218\u65b9\u9762\u7684\u6f5c\u529b\u3002|\n", "2408.07055": "|**2024-08-13**|**LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs**|Yushi Bai et.al.|[2408.07055](http://arxiv.org/abs/2408.07055)|**[link](https://github.com/thudm/longwriter)**|**\u5f53\u524d\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u80fd\u591f\u5904\u7406\u6700\u591a10\u4e07\u5b57\u7684\u8f93\u5165\uff0c\u7136\u800c\u5728\u751f\u6210\u8d85\u8fc72\u5343\u5b57\u7684\u8f93\u51fa\u65f6\u5374\u529b\u4e0d\u4ece\u5fc3\u3002\u901a\u8fc7\u63a7\u5236\u5b9e\u9a8c\uff0c\u6211\u4eec\u53d1\u73b0\u6a21\u578b\u7684\u6709\u6548\u751f\u6210\u957f\u5ea6\u672c\u8d28\u4e0a\u53d7\u5230\u5176\u5728\u76d1\u7763\u5fae\u8c03\uff08SFT\uff09\u671f\u95f4\u6240\u89c1\u6837\u672c\u7684\u9650\u5236\u3002\u6362\u53e5\u8bdd\u8bf4\uff0c\u5b83\u4eec\u7684\u8f93\u51fa\u9650\u5236\u6e90\u4e8e\u73b0\u6709SFT\u6570\u636e\u96c6\u4e2d\u957f\u8f93\u51fa\u793a\u4f8b\u7684\u7a00\u7f3a\u6027\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u5f15\u5165\u4e86AgentWrite\uff0c\u8fd9\u662f\u4e00\u79cd\u57fa\u4e8e\u4ee3\u7406\u7684\u7ba1\u9053\uff0c\u5c06\u8d85\u957f\u751f\u6210\u4efb\u52a1\u5206\u89e3\u4e3a\u5b50\u4efb\u52a1\uff0c\u4ece\u800c\u4f7f\u73b0\u6709\u7684LLMs\u80fd\u591f\u751f\u6210\u8d85\u8fc72\u4e07\u5b57\u7684\u8fde\u8d2f\u8f93\u51fa\u3002 
\u501f\u52a9AgentWrite\uff0c\u6211\u4eec\u6784\u5efa\u4e86LongWriter-6k\u6570\u636e\u96c6\uff0c\u5176\u4e2d\u5305\u542b\u4e866000\u4e2aSFT\u6570\u636e\uff0c\u8f93\u51fa\u957f\u5ea6\u8303\u56f4\u4ece2\u5343\u523032\u5343\u5b57\u3002\u901a\u8fc7\u5c06\u6b64\u6570\u636e\u96c6\u7eb3\u5165\u6a21\u578b\u8bad\u7ec3\uff0c\u6211\u4eec\u6210\u529f\u5730\u5c06\u73b0\u6709\u6a21\u578b\u7684\u8f93\u51fa\u957f\u5ea6\u6269\u5c55\u81f3\u8d85\u8fc71\u4e07\u5b57\uff0c\u540c\u65f6\u4fdd\u6301\u4e86\u8f93\u51fa\u8d28\u91cf\u3002\u6211\u4eec\u4e5f\u5f00\u53d1\u4e86LongBench-Write\uff0c\u8fd9\u662f\u4e00\u4e2a\u5168\u9762\u7684\u57fa\u51c6\uff0c\u7528\u4e8e\u8bc4\u4f30\u8d85\u957f\u751f\u6210\u80fd\u529b\u3002\u6211\u4eec\u76849\u4ebf\u53c2\u6570\u6a21\u578b\uff0c\u5728\u7ecf\u8fc7DPO\u8fdb\u4e00\u6b65\u6539\u8fdb\u540e\uff0c\u5728\u8fd9\u4e00\u57fa\u51c6\u4e0a\u5b9e\u73b0\u4e86\u6700\u5148\u8fdb\u7684\u6027\u80fd\uff0c\u751a\u81f3\u8d85\u8fc7\u4e86\u66f4\u5927\u89c4\u6a21\u7684\u4e13\u6709\u6a21\u578b\u3002 \u603b\u7684\u6765\u8bf4\uff0c\u6211\u4eec\u7684\u5de5\u4f5c\u8868\u660e\uff0c\u73b0\u6709\u7684\u957f\u4e0a\u4e0b\u6587LLMs\u5b9e\u9645\u4e0a\u5df2\u7ecf\u5177\u5907\u4e86\u66f4\u5927\u7684\u8f93\u51fa\u7a97\u53e3\u7684\u80fd\u529b\u2014\u2014\u4f60\u53ea\u9700\u8981\u5728\u6a21\u578b\u5bf9\u9f50\u8fc7\u7a0b\u4e2d\u4f7f\u7528\u5e26\u6709\u5ef6\u957f\u8f93\u51fa\u7684\u6570\u636e\u5373\u53ef\u89e3\u9501\u8fd9\u4e00\u80fd\u529b\u3002\u6211\u4eec\u7684\u4ee3\u7801\u548c\u6a21\u578b\u53ef\u4ee5\u5728\uff1ahttps://github.com/THUDM/LongWriter\u627e\u5230\u3002**|\n", "2408.07004": "|**2024-08-13**|**Casper: Prompt Sanitization for Protecting User Privacy in Web-Based Large Language Models**|Chun Jie Chong 
et.al.|[2408.07004](http://arxiv.org/abs/2408.07004)|null|\u57fa\u4e8e\u7f51\u7edc\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u670d\u52a1\u5df2\u88ab\u5e7f\u6cdb\u91c7\u7528\uff0c\u5e76\u5df2\u6210\u4e3a\u6211\u4eec\u4e92\u8054\u7f51\u4f53\u9a8c\u4e0d\u53ef\u6216\u7f3a\u7684\u4e00\u90e8\u5206\u3002\u7b2c\u4e09\u65b9\u63d2\u4ef6\u901a\u8fc7\u63d0\u4f9b\u5bf9\u73b0\u5b9e\u4e16\u754c\u6570\u636e\u548c\u670d\u52a1\u7684\u8bbf\u95ee\uff0c\u589e\u5f3a\u4e86LLM\u7684\u529f\u80fd\u6027\u3002\u7136\u800c\uff0c\u4e0e\u8fd9\u4e9b\u670d\u52a1\u53ca\u5176\u7b2c\u4e09\u65b9\u63d2\u4ef6\u76f8\u5173\u7684\u9690\u79c1\u540e\u679c\u5e76\u672a\u5f97\u5230\u5145\u5206\u7406\u89e3\u3002\u654f\u611f\u63d0\u793a\u6570\u636e\u5728\u4e91\u57faLLM\u63d0\u4f9b\u5546\u548c\u7b2c\u4e09\u65b9\u63d2\u4ef6\u4e2d\u88ab\u5b58\u50a8\u3001\u5904\u7406\u548c\u5171\u4eab\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aCasper\u7684\u63d0\u793a\u51c0\u5316\u6280\u672f\uff0c\u65e8\u5728\u901a\u8fc7\u68c0\u6d4b\u5e76\u4ece\u7528\u6237\u8f93\u5165\u4e2d\u5220\u9664\u654f\u611f\u4fe1\u606f\u6765\u4fdd\u62a4\u7528\u6237\u9690\u79c1\uff0c\u4ece\u800c\u5728\u53d1\u9001\u7ed9LLM\u670d\u52a1\u4e4b\u524d\u4fdd\u62a4\u7528\u6237\u9690\u79c1\u3002Casper\u5b8c\u5168\u4f5c\u4e3a\u6d4f\u89c8\u5668\u6269\u5c55\u8fd0\u884c\u5728\u7528\u6237\u7684\u8bbe\u5907\u4e0a\uff0c\u65e0\u9700\u5bf9\u5728\u7ebfLLM\u670d\u52a1\u8fdb\u884c\u4efb\u4f55\u66f4\u6539\u3002Casper\u7684\u6838\u5fc3\u662f\u4e00\u4e2a\u4e09\u5c42\u51c0\u5316\u673a\u5236\uff0c\u5305\u62ec\u89c4\u5219\u57fa\u4e8e\u8fc7\u6ee4\u5668\u3001\u673a\u5668\u5b66\u4e60\uff08ML\uff09\u547d\u540d\u5b9e\u4f53\u8bc6\u522b\u5668\u548c\u6d4f\u89c8\u5668\u672c\u5730LLM\u4e3b\u9898\u6807\u8bc6\u5668\u3002\u6211\u4eec\u4f7f\u75284000\u4e2a\u5408\u6210\u63d0\u793a\u96c6\u5bf9Casper\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u7ed3\u679c\u663e\u793a\uff0c\u5b83\u80fd\u591f\u4ee5\u9ad8\u51c6\u786e\u7387\uff0898.5%\uff09\u6709\u6548\u5730\u8fc7\u6ee4\u51fa\u4e2a\u4eb
a\u53ef\u8bc6\u522b\u4fe1\u606f\uff08PII\uff09\u548c\u9690\u79c1\u654f\u611f\u8bdd\u9898\uff0889.9%\uff09\u3002|\n", "2408.06993": "|**2024-08-13**|**LLMs can Schedule**|Henrik Abgaryan et.al.|[2408.06993](http://arxiv.org/abs/2408.06993)|**[link](https://github.com/starjob42/datasetjsp)**|**\u5de5\u4f5c\u8f66\u95f4\u8c03\u5ea6\u95ee\u9898(JSSP)\u5728\u4f18\u5316\u751f\u4ea7\u6d41\u7a0b\u65b9\u9762\u4ecd\u662f\u4e00\u4e2a\u91cd\u5927\u6311\u6218\u3002\u8be5\u95ee\u9898\u6d89\u53ca\u6709\u6548\u5206\u914d\u4efb\u52a1\u5230\u6709\u9650\u6570\u91cf\u7684\u673a\u5668\u4e0a\uff0c\u4ee5\u6700\u5c0f\u5316\u603b\u5904\u7406\u65f6\u95f4\u6216\u4f5c\u4e1a\u5ef6\u8fdf\u7b49\u56e0\u7d20\u3002\u5c3d\u7ba1\u8fd1\u671f\u4eba\u5de5\u667a\u80fd\u9886\u57df\u7684\u8fdb\u6b65\u5df2\u7ecf\u4ea7\u751f\u4e86\u6709\u524d\u666f\u7684\u89e3\u51b3\u65b9\u6848\uff0c\u4f8b\u5982\u5f3a\u5316\u5b66\u4e60\u548c\u56fe\u795e\u7ecf\u7f51\u7edc\uff0c\u4f46\u672c\u6587\u63a2\u8ba8\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b(LLM)\u5728JSSP\u4e2d\u7684\u6f5c\u529b\u3002\u6211\u4eec\u9996\u6b21\u5f15\u5165\u4e86\u4e00\u4e2a\u4e13\u95e8\u4e3a\u8bad\u7ec3LLM\u8bbe\u8ba1\u7684120k\u6570\u636e\u96c6\uff0c\u4e13\u95e8\u9488\u5bf9JSSP\u3002\u4ee4\u4eba\u60ca\u8bb6\u7684\u662f\uff0c\u6211\u4eec\u7684\u53d1\u73b0\u8868\u660e\uff0c\u57fa\u4e8eLLM\u7684\u8c03\u5ea6\u53ef\u4ee5\u5b9e\u73b0\u4e0e\u5176\u5b83\u795e\u7ecf\u65b9\u6cd5\u76f8\u5f53\u7684\u6027\u80fd\u3002\u6b64\u5916\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u91c7\u6837\u65b9\u6cd5\uff0c\u4ee5\u63d0\u9ad8LLM\u5728\u89e3\u51b3JSSP\u65f6\u7684\u6709\u6548\u6027\u3002**|\n", "2408.06941": "|**2024-08-13**|**OpenResearcher: Unleashing AI for Accelerated Scientific Research**|Yuxiang Zheng 
et.al.|[2408.06941](http://arxiv.org/abs/2408.06941)|**[link](https://github.com/gair-nlp/openresearcher)**|**\u5feb\u901f\u53d1\u5c55\u7684\u79d1\u5b66\u6587\u732e\u5bf9\u7814\u7a76\u4eba\u5458\u5728\u5404\u81ea\u9886\u57df\u4fdd\u6301\u6700\u65b0\u8fdb\u5c55\u548c\u63a2\u7d22\u65b0\u9886\u57df\u5e26\u6765\u4e86\u91cd\u5927\u6311\u6218\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u5e73\u53f0\u2014\u2014OpenResearcher\uff0c\u5b83\u5229\u7528\u4eba\u5de5\u667a\u80fd\u6280\u672f\u52a0\u901f\u7814\u7a76\u8fc7\u7a0b\uff0c\u901a\u8fc7\u56de\u7b54\u7814\u7a76\u4eba\u5458\u7684\u591a\u79cd\u95ee\u9898\u6765\u5e2e\u52a9\u4ed6\u4eec\u3002OpenResearcher\u57fa\u4e8e\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u6784\u5efa\uff0c\u7ed3\u5408\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e0e\u7279\u5b9a\u9886\u57df\u7684\u6700\u65b0\u77e5\u8bc6\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u5404\u79cd\u5de5\u5177\uff0c\u4f7fOpenResearcher\u80fd\u591f\u7406\u89e3\u7814\u7a76\u4eba\u5458\u7684\u95ee\u9898\u3001\u4ece\u79d1\u5b66\u6587\u732e\u4e2d\u641c\u7d22\u3001\u7b5b\u9009\u68c0\u7d22\u5230\u7684\u4fe1\u606f\u3001\u63d0\u4f9b\u51c6\u786e\u5168\u9762\u7684\u7b54\u6848\uff0c\u5e76\u81ea\u6211\u4f18\u5316\u8fd9\u4e9b\u7b54\u6848\u3002OpenResearcher\u7075\u6d3b\u5730\u4f7f\u7528\u8fd9\u4e9b\u5de5\u5177\uff0c\u5728\u6548\u7387\u4e0e\u6709\u6548\u6027\u4e4b\u95f4\u627e\u5230\u5e73\u8861\u3002\u7ed3\u679c\uff0cOpenResearcher\u5e2e\u52a9\u7814\u7a76\u4eba\u5458\u8282\u7701\u65f6\u95f4\uff0c\u63d0\u9ad8\u4ed6\u4eec\u53d1\u73b0\u65b0\u89c1\u89e3\u548c\u63a8\u52a8\u79d1\u5b66\u7814\u7a76\u7a81\u7834\u7684\u6f5c\u529b\u3002\u6f14\u793a\u3001\u89c6\u9891\u548c\u4ee3\u7801\u53ef\u5728\u4ee5\u4e0b\u94fe\u63a5\u83b7\u53d6\uff1ahttps://github.com/GAIR-NLP/OpenResearcher\u3002**|\n", "2408.06929": "|**2024-08-13**|**Evaluating Cultural Adaptability of a Large Language Model via Simulation of Synthetic Personas**|Louis Kwok 
et.al.|[2408.06929](http://arxiv.org/abs/2408.06929)|**[link](https://github.com/louiskwoklf/llms-cultural-adaptability)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u591a\u6587\u5316\u73af\u5883\u4e2d\u7684\u6210\u529f\u53d6\u51b3\u4e8e\u5b83\u4eec\u7406\u89e3\u7528\u6237\u4e0d\u540c\u6587\u5316\u80cc\u666f\u7684\u80fd\u529b\u3002\u6211\u4eec\u901a\u8fc7\u8ba9LLM\u6a21\u62df\u4ee3\u8868\u5404\u79cd\u56fd\u7c4d\u7684\u4eba\u7c7b\u89d2\u8272\u8fdb\u884c\u95ee\u5377\u5f0f\u5fc3\u7406\u5b66\u5b9e\u9a8c\u6765\u8861\u91cf\u8fd9\u4e00\u80fd\u529b\u3002\u5177\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u4f7f\u7528GPT-3.5\u5bf9\u6765\u81ea15\u4e2a\u56fd\u5bb6\u76847,286\u540d\u53c2\u4e0e\u8005\u9605\u8bfb\u5e76\u56de\u5e94\u5177\u6709\u8bf4\u670d\u529b\u7684\u65b0\u95fb\u6587\u7ae0\u7684\u53cd\u5e94\u8fdb\u884c\u6a21\u62df\uff1b\u5e76\u5c06\u7ed3\u679c\u4e0e\u62e5\u6709\u76f8\u540c\u4eba\u53e3\u7edf\u8ba1\u7279\u5f81\u7684\u771f\u5b9e\u53c2\u4e0e\u8005\u6570\u636e\u96c6\u8fdb\u884c\u6bd4\u8f83\u3002\u6211\u4eec\u7684\u5206\u6790\u663e\u793a\uff0c\u660e\u786e\u6307\u5b9a\u4e00\u4e2a\u4eba\u7684\u5c45\u4f4f\u56fd\u53ef\u4ee5\u63d0\u9ad8GPT-3.5\u4e0e\u4ed6\u4eec\u7684\u53cd\u5e94\u7684\u4e00\u81f4\u6027\u3002\u76f8\u6bd4\u4e4b\u4e0b\uff0c\u4f7f\u7528\u6bcd\u8bed\u63d0\u793a\u5f15\u5165\u7684\u53d8\u5316\u663e\u8457\u964d\u4f4e\u4e86\u6574\u4f53\u4e00\u81f4\u6027\uff0c\u5e76\u4e14\u67d0\u4e9b\u8bed\u8a00\u7279\u522b\u5f71\u54cd\u4e86\u6027\u80fd\u3002\u8fd9\u4e9b\u53d1\u73b0\u8868\u660e\uff0c\u5c3d\u7ba1\u76f4\u63a5\u63d0\u4f9b\u56fd\u7c4d\u4fe1\u606f\u53ef\u4ee5\u589e\u5f3a\u6a21\u578b\u7684\u6587\u5316\u9002\u5e94\u6027\uff0c\u4f46\u4f7f\u7528\u6bcd\u8bed\u63d0\u793a\u5e76\u4e0d\u4e00\u5b9a\u80fd\u53ef\u9760\u5730\u63d0\u9ad8\u6a21\u62df\u51c6\u786e\u6027\uff0c\u53cd\u800c\u53ef\u80fd\u635f\u5bb3\u6a21\u578b\u7684\u6709\u6548\u6027\u3002|\n", "2408.06904": "|**2024-08-13**|**Re-TASK: Revisiting LLM Tasks from Capability, Skill, and Knowledge Perspectives**|Zhihu Wang 
et.al.|[2408.06904](http://arxiv.org/abs/2408.06904)|null|\u968f\u7740\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u6301\u7eed\u6269\u5c55\uff0c\u5b83\u4eec\u5728\u6027\u80fd\u4e0a\u7684\u589e\u5f3a\u5f80\u5f80\u4e0d\u8db3\u4ee5\u89e3\u51b3\u7279\u5b9a\u9886\u57df\u7684\u4efb\u52a1\u3002\u7cfb\u7edf\u6027\u5730\u5206\u6790\u8fd9\u4e9b\u5931\u8d25\u5e76\u6709\u6548\u63d0\u5347\u5176\u6027\u80fd\u4ecd\u7136\u662f\u4e00\u4e2a\u91cd\u5927\u6311\u6218\u3002\u672c\u6587\u63d0\u51fa\u4e86Re-TASK\u6846\u67b6\uff0c\u8fd9\u662f\u4e00\u79cd\u65b0\u9896\u7684\u7406\u8bba\u6a21\u578b\uff0c\u4ece\u80fd\u529b\u3001\u6280\u80fd\u3001\u77e5\u8bc6\u7684\u89d2\u5ea6\u91cd\u65b0\u5ba1\u89c6LLM\u4efb\u52a1\uff0c\u9075\u5faa\u5e03\u5362\u59c6\u5206\u7c7b\u6cd5\u548c\u77e5\u8bc6\u7a7a\u95f4\u7406\u8bba\u7684\u539f\u5219\u3002Re-TASK\u6846\u67b6\u63d0\u4f9b\u4e86\u4e00\u79cd\u7cfb\u7edf\u7684\u65b9\u6cd5\u6765\u6df1\u5316\u6211\u4eec\u5bf9LLM\u7684\u7406\u89e3\u3001\u8bc4\u4f30\u548c\u63d0\u5347\uff0c\u7279\u522b\u9488\u5bf9\u7279\u5b9a\u9886\u57df\u4efb\u52a1\u3002\u5b83\u63a2\u7d22\u4e86LLM\u7684\u80fd\u529b\u3001\u5904\u7406\u7684\u77e5\u8bc6\u4ee5\u53ca\u5e94\u7528\u7684\u6280\u80fd\u4e4b\u95f4\u7684\u76f8\u4e92\u4f5c\u7528\uff0c\u9610\u660e\u4e86\u8fd9\u4e9b\u5143\u7d20\u5982\u4f55\u76f8\u4e92\u5173\u8054\u5e76\u5f71\u54cd\u4efb\u52a1\u8868\u73b0\u3002 
\u901a\u8fc7\u5e94\u7528Re-TASK\u6846\u67b6\uff0c\u6211\u4eec\u63ed\u793a\u4e86\u8bb8\u591a\u7279\u5b9a\u9886\u57df\u4efb\u52a1\u5931\u8d25\u7684\u539f\u56e0\u4e3b\u8981\u5f52\u548e\u4e8e\u77e5\u8bc6\u4e0d\u8db3\u6216\u6280\u80fd\u9002\u5e94\u5ea6\u4e0d\u591f\u3002\u57fa\u4e8e\u8fd9\u4e00\u6d1e\u5bdf\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u7cfb\u5217\u7ed3\u6784\u5316\u7684\u7b56\u7565\u6765\u589e\u5f3aLLM\uff0c\u901a\u8fc7\u6709\u9488\u5bf9\u6027\u7684\u77e5\u8bc6\u6ce8\u5165\u548c\u6280\u80fd\u9002\u5e94\u3002\u5177\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u8bc6\u522b\u4e0e\u4efb\u52a1\u76f8\u5173\u7684\u5173\u952e\u80fd\u529b\u9879\uff0c\u5e76\u91c7\u7528\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u63d0\u793a\u7b56\u7565\u6765\u63d0\u5347\u4efb\u52a1\u6027\u80fd\uff0c\u4ece\u800c\u51cf\u5c11\u5927\u91cf\u5fae\u8c03\u7684\u9700\u6c42\u3002\u6216\u8005\uff0c\u6211\u4eec\u4f7f\u7528\u80fd\u529b\u7279\u5b9a\u6307\u4ee4\u5bf9LLM\u8fdb\u884c\u5fae\u8c03\uff0c\u8fdb\u4e00\u6b65\u9a8c\u8bc1\u4e86\u6846\u67b6\u7684\u6709\u6548\u6027\u3002\u5b9e\u9a8c\u7ed3\u679c\u8bc1\u5b9e\u4e86\u6846\u67b6\u7684\u6709\u6548\u6027\uff0c\u5c55\u793a\u4e86\u663e\u8457\u63d0\u9ad8LLM\u5728\u6027\u80fd\u548c\u9002\u7528\u6027\u65b9\u9762\u7684\u6548\u679c\u3002|\n", "2408.06874": "|**2024-08-13**|**Leveraging Language Models for Emotion and Behavior Analysis in Education**|Kaito Tanaka 
et.al.|[2408.06874](http://arxiv.org/abs/2408.06874)|null|Analyzing students' emotions and behaviors is crucial for improving learning outcomes and personalizing the educational experience. Traditional methods often rely on intrusive collection of visual and physiological data, which raises privacy concerns and limits scalability. This paper proposes a novel method that uses large language models (LLMs) and prompt engineering to analyze students' text data. Our approach uses tailored prompts to guide LLMs in detecting emotional and engagement states, providing a non-intrusive, scalable solution. We ran experiments with Qwen, ChatGPT, Claude2, and GPT-4, comparing our method against base models and chain-of-thought (CoT) prompting. The results show that our method significantly outperforms the baselines in both accuracy and contextual understanding. This study highlights the potential of LLMs combined with prompt engineering to provide practical and effective tools for analyzing emotion and behavior in education.|\n", "2408.06854": "|**2024-08-13**|**LoRA$^2$ : Multi-Scale Low-Rank Approximations for Fine-Tuning Large Language Models**|Jia-Chen Zhang 
et.al.|[2408.06854](http://arxiv.org/abs/2408.06854)|null|Fine-tuning large language models (LLMs) with high parameter efficiency for downstream tasks has become a new research direction. Low-Rank Adaptation (LoRA) significantly reduces the number of trainable parameters during fine-tuning. Although it performs well, tuning at a single scale may not be the optimal strategy for complex downstream tasks. This paper proposes an extension of LoRA called LoRA$^2$. First, drawing on orthogonal projection theory, we train two sets of LoRAs on mutually orthogonal planes. Then, we improve the importance score algorithm, which reduces parameter sensitivity computations by approximately 98.5%. Pruning singular values with lower importance scores improves adaptability to a variety of downstream tasks. 
We conduct extensive experiments on two widely used pre-trained models to validate the effectiveness of LoRA$^2$. The results show that, compared with full fine-tuning, it reduces the number of trainable parameters to just 0.72% while still delivering impressive performance. Even when the parameters are further reduced to 0.17M, its results remain comparable to a baseline with 8 times more parameters. Our code is available here:|\n", "2408.06849": "|**2024-08-13**|**Causal Agent based on Large Language Model**|Kairong Han et.al.|[2408.06849](http://arxiv.org/abs/2408.06849)|**[link](https://github.com/kairong-han/causal_agent)**|**Large language models (LLMs) have achieved remarkable success across many domains. However, the inherent complexity of causal problems and causal theory makes them difficult to describe accurately in natural language, which hinders LLMs from understanding and using them effectively. Causal methods are not easy to convey in natural language, which limits how accurately LLMs can apply them. In addition, causal datasets are typically tabular, whereas LLMs excel at handling natural language data; this structural mismatch impedes effective reasoning over tabular data. This lack of causal reasoning capability limits the development of LLMs. 
To address these challenges, we equip the LLM with causal tools inside an agent framework, which we call the Causal Agent. The agent comprises tool, memory, and reasoning modules. In the tool module, the Causal Agent applies causal methods by aligning tabular data with natural language. In the reasoning module, it uses the ReAct framework to reason over multiple iterations with these tools. In the memory module, it maintains a dictionary instance whose keys are unique names and whose values are causal graphs. To verify the Causal Agent's causal ability, we build a benchmark of causal problems at four levels: the variable level, the edge level, the causal graph level, and the causal effect level. We use ChatGPT-3.5 to generate a test dataset of 1,300 items covering these four levels and evaluate the Causal Agent on it. Our method proves highly effective across all four levels, with accuracy above 80% at each. For further insights and implementation details, our code is available in the GitHub repository https://github.com/Kairong-Han/Causal_Agent.**|\n", "2408.07702": "|**2024-08-14**|**The 
Death of Schema Linking? Text-to-SQL in the Age of Well-Reasoned Language Models**|Karime Maamari et.al.|[2408.07702](http://arxiv.org/abs/2408.07702)|null|Schema linking is a crucial step in Text-to-SQL pipelines, which translate natural language queries into SQL. The goal of schema linking is to retrieve relevant tables and columns (signal) while disregarding irrelevant ones (noise). However, imperfect schema linking can often exclude essential columns needed for accurate query generation. In this work, we revisit the need for schema linking when using the latest generation of large language models (LLMs). We find empirically that newer models are adept at identifying relevant schema elements during generation, without the need for explicit schema linking. This allows Text-to-SQL pipelines to bypass schema linking entirely and instead pass the full database schema to the LLM, eliminating the risk of excluding necessary information. Furthermore, as alternatives to schema linking, we propose techniques that improve Text-to-SQL accuracy without compromising on essential schema information. Our approach achieves 71.83\\% execution accuracy on the BIRD benchmark, ranking first at the time of submission.|\n", "2408.07666": "|**2024-08-15**|**Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities**|Enneng Yang et.al.|[2408.07666](http://arxiv.org/abs/2408.07666)|**[link](https://github.com/ennengyang/awesome-model-merging-methods-theories-applications)**|**Model merging is an efficient empowerment technique in the machine learning community that does not require the collection of raw training data and does not require expensive computation. As model merging becomes increasingly prevalent across various fields, it is crucial to understand the available model merging techniques comprehensively. However, there is a significant gap in the literature regarding a systematic and thorough review of these techniques. 
This survey provides a comprehensive overview of model merging methods and theories, their applications in various domains and settings, and future research directions. Specifically, we first propose a new taxonomic approach that exhaustively discusses existing model merging methods. Secondly, we discuss the application of model merging techniques in large language models, multimodal large language models, and 10+ machine learning subfields, including continual learning, multi-task learning, few-shot learning, etc. Finally, we highlight the remaining challenges of model merging and discuss future research directions. A comprehensive list of papers about model merging is available at \\url{https://github.com/EnnengYang/Awesome-Model-Merging-Methods-Theories-Applications}.**|\n", "2408.07665": "|**2024-08-14**|**Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models**|Yi-Cheng Lin et.al.|[2408.07665](http://arxiv.org/abs/2408.07665)|**[link](https://github.com/dlion168/spoken_stereoset)**|Warning: This paper may contain texts with uncomfortable content. Large Language Models (LLMs) have achieved remarkable performance in various tasks, including those involving multimodal data like speech. However, these models often exhibit biases due to the nature of their training data. Recently, more Speech Large Language Models (SLLMs) have emerged, underscoring the urgent need to address these biases. This study introduces Spoken Stereoset, a dataset specifically designed to evaluate social biases in SLLMs. By examining how different models respond to speech from diverse demographic groups, we aim to identify these biases. Our experiments reveal significant insights into their performance and bias levels. 
The findings indicate that while most models show minimal bias, some still exhibit slightly stereotypical or anti-stereotypical tendencies.|\n", "2408.07663": "|**2024-08-14**|**Alignment-Enhanced Decoding:Defending via Token-Level Adaptive Refining of Probability Distributions**|Quan Liu et.al.|[2408.07663](http://arxiv.org/abs/2408.07663)|**[link](https://github.com/gigabaozi/aed)**|**Large language models are susceptible to jailbreak attacks, which can result in the generation of harmful content. While prior defenses mitigate these risks by perturbing or inspecting inputs, they ignore competing objectives, the underlying cause of alignment failures. In this paper, we propose Alignment-Enhanced Decoding (AED), a novel defense that employs adaptive decoding to address the root causes of jailbreak issues. We first define the Competitive Index to quantify alignment failures and utilize feedback from self-evaluation to compute post-alignment logits. Then, AED adaptively combines AED and post-alignment logits with the original logits to obtain harmless and helpful distributions. Consequently, our method enhances safety alignment while maintaining helpfulness. We conduct experiments across five models and four common jailbreaks, with the results validating the effectiveness of our approach. Code is available at https://github.com/GIGABaozi/AED.git.**|\n", "2408.07611": "|**2024-08-14**|**WeKnow-RAG: An Adaptive Approach for Retrieval-Augmented Generation Integrating Web Search and Knowledge Graphs**|Weijian Xie et.al.|[2408.07611](http://arxiv.org/abs/2408.07611)|null|Large Language Models (LLMs) have greatly contributed to the development of adaptive intelligent agents and are positioned as an important way to achieve Artificial General Intelligence (AGI). 
However, LLMs are prone to produce factually incorrect information and often produce \"phantom\" content that undermines their reliability, which poses a serious challenge for their deployment in real-world scenarios. Enhancing LLMs by combining external databases and information retrieval mechanisms is an effective path. To address the above challenges, we propose a new approach called WeKnow-RAG, which integrates Web search and Knowledge Graphs into a \"Retrieval-Augmented Generation (RAG)\" system. First, the accuracy and reliability of LLM responses are improved by combining the structured representation of Knowledge Graphs with the flexibility of dense vector retrieval. WeKnow-RAG then utilizes domain-specific knowledge graphs to satisfy a variety of queries and domains, thereby improving performance on factual information and complex reasoning tasks by employing multi-stage web page retrieval techniques using both sparse and dense retrieval methods. Our approach effectively balances the efficiency and accuracy of information retrieval, thus improving the overall retrieval process. Finally, we also integrate a self-assessment mechanism for the LLM to evaluate the trustworthiness of the answers it generates. Our approach proves its outstanding effectiveness in a wide range of offline experiments and online submissions.|\n", "2408.07583": "|**2024-08-14**|**Transformers and Large Language Models for Efficient Intrusion Detection Systems: A Comprehensive Survey**|Hamza Kheddar et.al.|[2408.07583](http://arxiv.org/abs/2408.07583)|null|With significant advancements in Transformers LLMs, NLP has extended its reach into many research fields due to its enhanced capabilities in text generation and user interaction. One field benefiting greatly from these advancements is cybersecurity. 
In cybersecurity, many parameters that need to be protected and exchanged between senders and receivers are in the form of text and tabular data, making NLP a valuable tool in enhancing the security measures of communication protocols. This survey paper provides a comprehensive analysis of the utilization of Transformers and LLMs in cyber-threat detection systems. The methodology of paper selection and bibliometric analysis is outlined to establish a rigorous framework for evaluating existing research. The fundamentals of Transformers are discussed, including background information on various cyber-attacks and datasets commonly used in this field. The survey explores the application of Transformers in IDSs, focusing on different architectures such as Attention-based models, LLMs like BERT and GPT, CNN/LSTM-Transformer hybrids, emerging approaches like ViTs, among others. Furthermore, it explores the diverse environments and applications where Transformers and LLMs-based IDS have been implemented, including computer networks, IoT devices, critical infrastructure protection, cloud computing, SDN, as well as in autonomous vehicles. The paper also addresses research challenges and future directions in this area, identifying key issues such as interpretability, scalability, and adaptability to evolving threats, and more. 
Finally, the conclusion summarizes the findings and highlights the significance of Transformers and LLMs in enhancing cyber-threat detection capabilities, while also outlining potential avenues for further research and development.|\n", "2408.07543": "|**2024-08-15**|**MathScape: Evaluating MLLMs in multimodal Math Scenarios through a Hierarchical Benchmark**|Minxuan Zhou et.al.|[2408.07543](http://arxiv.org/abs/2408.07543)|**[link](https://github.com/PKU-Baichuan-MLSystemLab/MathScape)**|With the development of Multimodal Large Language Models (MLLMs), the evaluation of multimodal models in the context of mathematical problems has become a valuable research field. Multimodal visual-textual mathematical reasoning serves as a critical indicator for evaluating the comprehension and complex multi-step quantitative reasoning abilities of MLLMs. However, previous multimodal math benchmarks have not sufficiently integrated visual and textual information. To address this gap, we proposed MathScape, a new benchmark that emphasizes the understanding and application of combined visual and textual information. MathScape is designed to evaluate photo-based math problem scenarios, assessing the theoretical understanding and application ability of MLLMs through a categorical hierarchical approach. We conduct a multi-dimensional evaluation on 11 advanced MLLMs, revealing that our benchmark is challenging even for the most sophisticated models. By analyzing the evaluation results, we identify the limitations of MLLMs, offering valuable insights for enhancing model performance.|\n", "2408.07537": "|**2024-08-15**|**Usefulness of data flow diagrams and large language models for security threat validation: a registered report**|Winnie Bahati Mbaka et.al.|[2408.07537](http://arxiv.org/abs/2408.07537)|null|The arrival of recent cybersecurity standards has raised the bar for security assessments in organizations, but existing techniques don't always scale well. 
Threat analysis and risk assessment are used to identify security threats for new or refactored systems. Still, there is a lack of definition-of-done, so identified threats have to be validated, which slows down the analysis. Existing literature has focused on the overall performance of threat analysis, but no previous work has investigated how deep the analysts must dig into the material before they can effectively validate the identified security threats. We propose a controlled experiment with practitioners to investigate whether some analysis material (like LLM-generated advice) is better than none and whether more material (the system's data flow diagram and LLM-generated advice) is better than some material. In addition, we present key findings from running a pilot with 41 MSc students, which are used to improve the study design. Finally, we also provide an initial replication package, including experimental material and data analysis scripts, and a plan to extend it to include new materials based on the final data collection campaign with practitioners (e.g., pre-screening questions).|\n", "2408.07531": "|**2024-08-14**|**Development of a Multi-Agent Clinical Decision Support System for Korean Triage and Acuity Scale (KTAS)-Based Triage and Treatment Planning in Emergency Departments**|Seungjun Han et.al.|[2408.07531](http://arxiv.org/abs/2408.07531)|null|Emergency department (ED) overcrowding and the complexity of rapid decision-making in critical care settings pose significant challenges to healthcare systems worldwide. While clinical decision support systems (CDSS) have shown promise, the integration of large language models (LLMs) offers new possibilities for enhancing triage accuracy and clinical decision-making. This study presents an LLM-driven CDSS designed to assist ED physicians and nurses in patient triage, treatment planning, and overall emergency care management. 
We developed a multi-agent CDSS utilizing Llama-3-70b as the base LLM, orchestrated by CrewAI and Langchain. The system comprises four AI agents emulating key ED roles: Triage Nurse, Emergency Physician, Pharmacist, and ED Coordinator. It incorporates the Korean Triage and Acuity Scale (KTAS) for triage assessment and integrates with the RxNorm API for medication management. The model was evaluated using the Asclepius dataset, with performance assessed by a clinical emergency medicine specialist. The CDSS demonstrated high accuracy in triage decision-making compared to the baseline of a single-agent system. Furthermore, the system exhibited strong performance in critical areas, including primary diagnosis, critical findings identification, disposition decision-making, treatment planning, and resource allocation. Our multi-agent CDSS demonstrates significant potential for supporting comprehensive emergency care management. By leveraging state-of-the-art AI technologies, this system offers a scalable and adaptable tool that could enhance emergency medical care delivery, potentially alleviating ED overcrowding and improving patient outcomes. This work contributes to the growing field of AI applications in emergency medicine and offers a promising direction for future research and clinical implementation.|\n", "2408.07505": "|**2024-08-14**|**Large Language Models Know What Makes Exemplary Contexts**|Quanyu Long et.al.|[2408.07505](http://arxiv.org/abs/2408.07505)|null|In-context learning (ICL) has proven to be a significant capability with the advancement of Large Language models (LLMs). By instructing LLMs using few-shot demonstrative examples, ICL enables them to perform a wide range of tasks without needing to update millions of parameters. 
This paper presents a unified framework for LLMs that allows them to self-select influential in-context examples to compose their contexts; self-rank candidates with different demonstration compositions; self-optimize the demonstration selection and ordering through reinforcement learning. Specifically, our method designs a parameter-efficient retrieval head that generates the optimized demonstration after training with rewards from LLM's own preference. Experimental results validate the proposed method's effectiveness in enhancing ICL performance. Additionally, our approach effectively identifies and selects the most representative examples for the current task, and includes more diversity in retrieval.|\n", "2408.08313": "|**2024-08-15**|**Can Large Language Models Understand Symbolic Graphics Programs?**|Zeju Qiu et.al.|[2408.08313](http://arxiv.org/abs/2408.08313)|null|Assessing the capabilities of large language models (LLMs) is often challenging, in part, because it is hard to find tasks to which they have not been exposed during training. We take one step to address this challenge by turning to a new task: focusing on symbolic graphics programs, which are a popular representation for graphics content that procedurally generates visual data. LLMs have shown exciting promise towards program synthesis, but do they understand symbolic graphics programs? Unlike conventional programs, symbolic graphics programs can be translated to graphics content. Here, we characterize an LLM's understanding of symbolic programs in terms of their ability to answer questions related to the graphics content. This task is challenging as the questions are difficult to answer from the symbolic programs alone -- yet, they would be easy to answer from the corresponding graphics content as we verify through a human experiment. 
To understand symbolic programs, LLMs may need to possess the ability to imagine how the corresponding graphics content would look without directly accessing the rendered visual content. We use this task to evaluate LLMs by creating a large benchmark for the semantic understanding of symbolic graphics programs. This benchmark is built via program-graphics correspondence, hence requiring minimal human effort. We evaluate current LLMs on our benchmark to provide a preliminary assessment of their ability to reason about visual scenes from programs. We find that this task distinguishes existing LLMs, and that models considered good at reasoning perform better. Lastly, we introduce Symbolic Instruction Tuning (SIT) to improve this ability. Specifically, we query GPT4-o with questions and images generated by symbolic programs. Such data are then used to finetune an LLM. We also find that SIT data can improve the general instruction-following ability of LLMs.|\n", "2408.08310": "|**2024-08-15**|**ScalingFilter: Assessing Data Quality through Inverse Utilization of Scaling Laws**|Ruihang Li et.al.|[2408.08310](http://arxiv.org/abs/2408.08310)|null|High-quality data is crucial for the pre-training performance of large language models. Unfortunately, existing quality filtering methods rely on a known high-quality dataset as reference, which can introduce potential bias and compromise diversity. In this paper, we propose ScalingFilter, a novel approach that evaluates text quality based on the perplexity difference between two language models trained on the same data, thereby eliminating the influence of the reference dataset in the filtering process. A theoretical analysis shows that ScalingFilter is equivalent to an inverse utilization of scaling laws. Through training models with 1.3B parameters on the same data source processed by various quality filters, we find ScalingFilter can improve zero-shot performance of pre-trained models in downstream tasks. 
To assess the bias introduced by quality filtering, we introduce semantic diversity, a metric of utilizing text embedding models for semantic representations. Extensive experiments reveal that semantic diversity is a reliable indicator of dataset diversity, and ScalingFilter achieves an optimal balance between downstream performance and semantic diversity.|\n", "2408.08302": "|**2024-08-15**|**Benchmarking the Capabilities of Large Language Models in Transportation System Engineering: Accuracy, Consistency, and Reasoning Behaviors**|Usman Syed et.al.|[2408.08302](http://arxiv.org/abs/2408.08302)|null|In this paper, we explore the capabilities of state-of-the-art large language models (LLMs) such as GPT-4, GPT-4o, Claude 3.5 Sonnet, Claude 3 Opus, Gemini 1.5 Pro, Llama 3, and Llama 3.1 in solving some selected undergraduate-level transportation engineering problems. We introduce TransportBench, a benchmark dataset that includes a sample of transportation engineering problems on a wide range of subjects in the context of planning, design, management, and control of transportation systems. This dataset is used by human experts to evaluate the capabilities of various commercial and open-sourced LLMs, especially their accuracy, consistency, and reasoning behaviors, in solving transportation engineering problems. Our comprehensive analysis uncovers the unique strengths and limitations of each LLM, e.g. our analysis shows the impressive accuracy and some unexpected inconsistent behaviors of Claude 3.5 Sonnet in solving TransportBench problems. Our study marks a thrilling first step toward harnessing artificial general intelligence for complex transportation challenges.|\n", "2408.08300": "|**2024-08-15**|**HELP: Hierarchical Embeddings-based Log Parsing**|Andy Xu et.al.|[2408.08300](http://arxiv.org/abs/2408.08300)|null|Logs are a first-hand source of information for software maintenance and failure diagnosis. 
Log parsing, which converts semi-structured log messages into structured templates, is a prerequisite for automated log analysis tasks such as anomaly detection, troubleshooting, and root cause analysis. However, existing log parsers fail in real-world systems for three main reasons. First, traditional heuristics-based parsers require handcrafted features and domain knowledge, which are difficult to generalize at scale. Second, existing large language model-based parsers rely on periodic offline processing, limiting their effectiveness in real-time use cases. Third, existing online parsing algorithms are susceptible to log drift, where slight log changes create false positives that drown out real anomalies. To address these challenges, we propose HELP, a Hierarchical Embeddings-based Log Parser. HELP is the first online semantic-based parser to leverage LLMs for performant and cost-effective log parsing. We achieve this through a novel hierarchical embeddings module, which fine-tunes a text embedding model to cluster logs before parsing, reducing querying costs by multiple orders of magnitude. To combat log drift, we also develop an iterative rebalancing module, which periodically updates existing log groupings. We evaluate HELP extensively on 14 public large-scale datasets, showing that HELP achieves significantly higher F1-weighted grouping and parsing accuracy than current state-of-the-art online log parsers. We also implement HELP into Iudex's production observability platform, confirming HELP's practicality in a production environment. 
Our results show that HELP is effective and efficient for high-throughput real-world log parsing.|\n", "2408.08291": "|**2024-08-15**|**The ShareLM Collection and Plugin: Contributing Human-Model Chats for the Benefit of the Community**|Shachar Don-Yehiya et.al.|[2408.08291](http://arxiv.org/abs/2408.08291)|null|Human-model conversations provide a window into users' real-world scenarios, behavior, and needs, and thus are a valuable resource for model development and research. While for-profit companies collect user data through the APIs of their models, using it internally to improve their own models, the open source and research community lags behind. We introduce the ShareLM collection, a unified set of human conversations with large language models, and its accompanying plugin, a Web extension for voluntarily contributing user-model conversations. Where few platforms share their chats, the ShareLM plugin adds this functionality, thus allowing users to share conversations from most platforms. The plugin allows the user to rate their conversations, both at the conversation and the response levels, and delete conversations they prefer to keep private before they ever leave the user's local storage. We release the plugin conversations as part of the ShareLM collection, and call for more community effort in the field of open human-model data. The code, plugin, and data are available.|\n", "2408.08282": "|**2024-08-15**|**Autonomous Behavior Planning For Humanoid Loco-manipulation Through Grounded Language Model**|Jin Wang et.al.|[2408.08282](http://arxiv.org/abs/2408.08282)|null|Enabling humanoid robots to autonomously perform loco-manipulation in unstructured environments is crucial for achieving embodied intelligence, yet highly challenging. This involves robots being able to plan their actions and behaviors in long-horizon tasks while using multi-modality to perceive deviations between task execution and high-level planning. 
Recently, large language models (LLMs) have demonstrated powerful planning and reasoning capabilities for comprehension and processing of semantic information through robot control tasks, as well as the usability of analytical judgment and decision-making for multi-modal inputs. To leverage the power of LLMs towards humanoid loco-manipulation, we propose a novel language-model based framework that enables robots to autonomously plan behaviors and low-level execution under given textual instructions, while observing and correcting failures that may occur during task execution. To systematically evaluate this framework in grounding LLMs, we created the robot 'action' and 'sensing' behavior library for task planning, and conducted mobile manipulation tasks and experiments in both simulated and real environments using the CENTAURO robot, and verified the effectiveness and application of this approach in robotic tasks with autonomous behavioral planning.|\n", "2408.08274": "|**2024-08-15**|**BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts**|Qizhen Zhang et.al.|[2408.08274](http://arxiv.org/abs/2408.08274)|null|The Mixture of Experts (MoE) framework has become a popular architecture for large language models due to its superior performance over dense models. However, training MoEs from scratch in a large-scale regime is prohibitively expensive. Existing methods mitigate this by pre-training multiple dense expert models independently and using them to initialize an MoE. This is done by using experts' feed-forward network (FFN) to initialize the MoE's experts while merging other parameters. However, this method limits the reuse of dense model parameters to only the FFN layers, thereby constraining the advantages when \"upcycling\" these models into MoEs. We propose BAM (Branch-Attend-Mix), a simple yet effective method that addresses this shortcoming. 
BAM makes full use of specialized dense models by not only using their FFN to initialize the MoE layers but also leveraging experts' attention parameters fully by initializing them into a soft-variant of Mixture of Attention (MoA) layers. We explore two methods for upcycling attention parameters: 1) initializing separate attention experts from dense models including all attention parameters for the best model performance; and 2) sharing key and value parameters across all experts to facilitate for better inference efficiency. To further improve efficiency, we adopt a parallel attention transformer architecture to MoEs, which allows the attention experts and FFN experts to be computed concurrently. Our experiments on seed models ranging from 590 million to 2 billion parameters demonstrate that BAM surpasses baselines in both perplexity and downstream task performance, within the same computational and data constraints.|\n", "2408.08231": "|**2024-08-15**|**DaRec: A Disentangled Alignment Framework for Large Language Model and Recommender System**|Xihong Yang et.al.|[2408.08231](http://arxiv.org/abs/2408.08231)|null|Benefiting from the strong reasoning capabilities, Large language models (LLMs) have demonstrated remarkable performance in recommender systems. Various efforts have been made to distill knowledge from LLMs to enhance collaborative models, employing techniques like contrastive learning for representation alignment. In this work, we prove that directly aligning the representations of LLMs and collaborative models is sub-optimal for enhancing downstream recommendation tasks performance, based on the information theorem. Consequently, the challenge of effectively aligning semantic representations between collaborative models and LLMs remains unresolved. Inspired by this viewpoint, we propose a novel plug-and-play alignment framework for LLMs and collaborative models. 
Specifically, we first disentangle the latent representations of both LLMs and collaborative models into specific and shared components via projection layers and representation regularization. Subsequently, we perform both global and local structure alignment on the shared representations to facilitate knowledge transfer. Additionally, we theoretically prove that the specific and shared representations contain more pertinent and less irrelevant information, which can enhance the effectiveness of downstream recommendation tasks. Extensive experimental results on benchmark datasets demonstrate that our method is superior to existing state-of-the-art algorithms.|\n", "2408.08217": "|**2024-08-15**|**RED-CT: A Systems Design Methodology for Using LLM-labeled Data to Train and Deploy Edge Classifiers for Computational Social Science**|David Farr et.al.|[2408.08217](http://arxiv.org/abs/2408.08217)|null|Large language models (LLMs) have enhanced our ability to rapidly analyze and classify unstructured natural language data. However, concerns regarding cost, network limitations, and security constraints have posed challenges for their integration into work processes. In this study, we adopt a systems design approach to employing LLMs as imperfect data annotators for downstream supervised learning tasks, introducing novel system intervention measures aimed at improving classification performance. Our methodology outperforms LLM-generated labels in seven of eight tests, demonstrating an effective strategy for incorporating LLMs into the design and deployment of specialized, supervised learning models present in many industry use cases.|\n", "2408.08210": "|**2024-08-15**|**Does Reasoning Emerge? 
Examining the Probabilities of Causation in Large Language Models**|Javier Gonz\u00e1lez et.al.|[2408.08210](http://arxiv.org/abs/2408.08210)|null|Recent advances in AI have been significantly driven by the capabilities of large language models (LLMs) to solve complex problems in ways that resemble human thinking. However, there is an ongoing debate about the extent to which LLMs are capable of actual reasoning. Central to this debate are two key probabilistic concepts that are essential for connecting causes to their effects: the probability of necessity (PN) and the probability of sufficiency (PS). This paper introduces a framework that is both theoretical and practical, aimed at assessing how effectively LLMs are able to replicate real-world reasoning mechanisms using these probabilistic measures. By viewing LLMs as abstract machines that process information through a natural language interface, we examine the conditions under which it is possible to compute suitable approximations of PN and PS. 
Our research marks an important step towards gaining a deeper understanding of when LLMs are capable of reasoning, as illustrated by a series of math examples.|\n", "2408.08869": "|**2024-08-16**|**PEDAL: Enhancing Greedy Decoding with Large Language Models using Diverse Exemplars**|Sumanth Prabhu et.al.|[2408.08869](http://arxiv.org/abs/2408.08869)|null|\u81ea\u4e00\u81f4\u6027\u7b49\u4f9d\u8d56\u4e8e\u51c6\u786e\u7b54\u6848\u63d0\u53d6\u8fc7\u7a0b\u7684\u81ea\u6211\u96c6\u4e1b\u6280\u672f\u5df2\u7ecf\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u51c6\u786e\u6027\u4e0a\u53d6\u5f97\u4e86\u663e\u8457\u7684\u63d0\u5347\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u6280\u672f\u5728\u805a\u5408\u591a\u4e2a\u8f93\u51fa\u65f6\u9700\u8981\u8f83\u9ad8\u7684\u63a8\u7406\u6210\u672c\uff0c\u76f8\u8f83\u4e8e\u8d2a\u5fc3\u89e3\u7801\u800c\u8a00\uff0c\u751f\u6210\u76f8\u5bf9\u8f83\u591a\u7684\u8f93\u51fa\u4ee4\u724c\u3002\u7814\u7a76\u663e\u793a\uff0c\u81ea\u4e00\u81f4\u6027\u65b9\u6cd5\u4ea7\u751f\u7684\u81ea\u7531\u6587\u672c\u8f93\u51fa\u53ef\u4ee5\u901a\u8fc7LLM\u53ef\u9760\u5730\u805a\u5408\u4ee5\u4ea7\u751f\u6700\u7ec8\u8f93\u51fa\u3002\u6b64\u5916\uff0c\u6700\u8fd1\u7684LLM\u63a8\u7406\u8fdb\u5c55\u8868\u660e\uff0c\u5728\u63d0\u793a\u4e2d\u4f7f\u7528\u591a\u6837\u5316\u7684\u793a\u4f8b\u80fd\u591f\u8bf1\u5bfcLLM\u8f93\u51fa\u7684\u591a\u6837\u6027\u3002\u8fd9\u4e9b\u5df2\u7ecf\u8bc1\u660e\u7684\u6280\u672f\u53ef\u4ee5\u5f88\u5bb9\u6613\u5730\u6269\u5c55\u5230\u81ea\u6211\u96c6\u4e1b\u65b9\u6cd5\u4e2d\uff0c\u4ee5\u5b9e\u73b0\u6587\u672c\u751f\u6210\u7684\u6574\u4f53\u6027\u80fd\u6539\u8fdb\u3002 
\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aPEDAL\uff08\u57fa\u4e8e\u793a\u4f8b\u591a\u6837\u6027\u7684LLM\u805a\u5408\uff09\u7684\u6df7\u5408\u81ea\u6211\u96c6\u4e1b\u65b9\u6cd5\u3002\u8be5\u65b9\u6cd5\u7ed3\u5408\u4e86\u57fa\u4e8e\u591a\u6837\u793a\u4f8b\u63d0\u793a\u548cLLM\u805a\u5408\u7684\u4f18\u52bf\uff0c\u4ee5\u5b9e\u73b0\u6027\u80fd\u7684\u63d0\u5347\u3002\u5728\u516c\u5f00\u7684SVAMP\u548cARC\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u7684\u5b9e\u9a8c\u63ed\u793a\uff0c\u4e0e\u57fa\u4e8e\u8d2a\u5fc3\u89e3\u7801\u7684\u7b56\u7565\u76f8\u6bd4\uff0cPEDAL\u80fd\u591f\u5728\u8f83\u4f4e\u7684\u63a8\u7406\u6210\u672c\u4e0b\u83b7\u5f97\u66f4\u597d\u7684\u51c6\u786e\u6027\uff0c\u4e0e\u57fa\u4e8e\u81ea\u4e00\u81f4\u6027\u7684\u65b9\u6cd5\u76f8\u6bd4\u5177\u6709\u4f18\u52bf\u3002|\n", "2408.08862": "|**2024-08-16**|**Visual Agents as Fast and Slow Thinkers**|Guangyan Sun et.al.|[2408.08862](http://arxiv.org/abs/2408.08862)|**[link](https://github.com/guangyans/sys2-llava)**|\u5b9e\u73b0\u4e0e\u4eba\u7c7b\u76f8\u5f53\u7684\u667a\u80fd\u9700\u8981\u5bf9\u8ba4\u77e5\u4e0a\u7684\u7b2c\u4e00\u7cfb\u7edf\u548c\u7b2c\u4e8c\u7cfb\u7edf\u601d\u7ef4\u8fdb\u884c\u7ec6\u5316\u3002\u5f53\u524d\u7684\u4eba\u5de5\u667a\u80fd\uff0c\u5c24\u5176\u662f\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684AI\uff0c\u867d\u7136\u8868\u73b0\u51fa\u7c7b\u4f3c\u4eba\u7c7b\u7684\u7279\u70b9\uff0c\u4f46\u5e76\u672a\u8fbe\u5230\u771f\u6b63\u7684\u8ba4\u77e5\u6c34\u5e73\u3002\u5728\u4ece\u7ed3\u6784\u5316\u57fa\u51c6\u5411\u771f\u5b9e\u4e16\u754c\u573a\u666f\u8fc7\u6e21\u7684\u8fc7\u7a0b\u4e2d\uff0c\u89c6\u89c9\u4ee3\u7406\u9762\u4e34\u6311\u6218\uff0c\u5f80\u5f80\u5bfc\u81f4\u56de\u7b54\u65e2\u4e0d\u51c6\u786e\u53c8\u8fc7\u4e8e\u81ea\u4fe1\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u5f15\u5165\u4e86FaST\uff08\u5feb\u901f\u4e0e\u7f13\u6162\u601d\u8003\uff09\uff0c\u5b83\u5c06\u5feb\u901f\u4e0e\u7f13\u6162\u601d\u8003\u673a\u5236\u878d\u5165\u5230\u89c6\u89c9\u4ee3\
u7406\u4e2d\u3002FaST\u91c7\u7528\u5207\u6362\u9002\u914d\u5668\u52a8\u6001\u9009\u62e9\u7cfb\u7edf1/2\u6a21\u5f0f\uff0c\u6839\u636e\u4efb\u52a1\u7684\u590d\u6742\u6027\u8c03\u6574\u89e3\u51b3\u95ee\u9898\u7684\u65b9\u6cd5\u3002\u9762\u5bf9\u4e0d\u786e\u5b9a\u548c\u672a\u89c1\u8fc7\u7684\u5bf9\u8c61\u65f6\uff0c\u901a\u8fc7\u8c03\u6574\u6a21\u578b\u7684\u4fe1\u5fc3\u5e76\u6574\u5408\u65b0\u7684\u4e0a\u4e0b\u6587\u6570\u636e\uff0c\u5b83\u80fd\u591f\u7075\u6d3b\u5e94\u5bf9\u3002 \u6211\u4eec\u63d0\u5021\u4e00\u4e2a\u7075\u6d3b\u7684\u7cfb\u7edf\u3001\u5c42\u6b21\u5316\u7684\u63a8\u7406\u80fd\u529b\u548c\u900f\u660e\u7684\u51b3\u7b56\u6d41\u7a0b\uff0c\u8fd9\u4e9b\u90fd\u4f7f\u5f97FaST\u80fd\u591f\u6a21\u4eff\u4eba\u7c7b\u5728\u89c6\u89c9\u667a\u80fd\u4e2d\u7684\u8ba4\u77e5\u8fc7\u7a0b\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cFaST\u5728\u89c6\u89c9\u95ee\u7b54(VQA^{v2})\u4efb\u52a1\u4e0a\u8fbe\u5230\u4e8680.8%\u7684\u51c6\u786e\u7387\uff0c\u5728\u63a8\u7406\u5206\u5272(ReasonSeg)\u4efb\u52a1\u4e0a\u83b7\u5f97\u4e8648.7%\u7684GIoU\u5206\u6570\uff0c\u8fd9\u5145\u5206\u5c55\u793a\u4e86FaST\u7684\u4f18\u8d8a\u6027\u80fd\u3002\u5e7f\u6cdb\u7684\u6d4b\u8bd5\u9a8c\u8bc1\u4e86FaST\u6838\u5fc3\u7ec4\u4ef6\u7684\u6709\u6548\u6027\u548c\u7a33\u5065\u6027\uff0c\u663e\u793a\u4e86\u5176\u5728\u63a8\u52a8\u4eba\u5de5\u667a\u80fd\u7cfb\u7edf\u4e2d\u8ba4\u77e5\u89c6\u89c9\u4ee3\u7406\u7684\u53d1\u5c55\u65b9\u9762\u7684\u6f5c\u529b\u3002|\n", "2408.08849": "|**2024-08-16**|**ECG-Chat: A Large ECG-Language Model for Cardiac Disease Diagnosis**|Yubao Zhao 
et.al.|[2408.08849](http://arxiv.org/abs/2408.08849)|null|\u5728\u533b\u7597\u8f85\u52a9\u9886\u57df\uff0c\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u7684\u6210\u529f\u5c55\u73b0\u51fa\u5de8\u5927\u6f5c\u529b\uff0c\u4f7f\u5f97\u60a3\u8005\u80fd\u591f\u5229\u7528\u751f\u7406\u4fe1\u53f7\u6570\u636e\u8fdb\u884c\u5bf9\u8bdd\u3002\u7136\u800c\uff0c\u901a\u7528\u7684MLLMs\u5728\u5fc3\u810f\u75c5\u8bca\u65ad\u65b9\u9762\u8868\u73b0\u4e0d\u4f73\uff0c\u5c24\u5176\u662f\u5728ECG\u6570\u636e\u89e3\u6790\u4e0e\u957f\u6587\u672c\u533b\u5b66\u62a5\u544a\u751f\u6210\u7684\u6574\u5408\u4e0a\uff0c\u4e3b\u8981\u539f\u56e0\u662fECG\u6570\u636e\u89e3\u6790\u7684\u590d\u6742\u6027\u4ee5\u53ca\u6587\u672c\u4e0eECG\u4fe1\u53f7\u6a21\u6001\u4e4b\u95f4\u7684\u5dee\u8ddd\u3002\u6b64\u5916\uff0c\u6a21\u578b\u5728\u957f\u6587\u672c\u751f\u6210\u65f6\u5f80\u5f80\u5b58\u5728\u4e25\u91cd\u7684\u7a33\u5b9a\u6027\u95ee\u9898\uff0c\u8fd9\u4e3b\u8981\u662f\u7531\u4e8e\u7f3a\u4e4f\u4e0e\u7528\u6237\u67e5\u8be2\u7d27\u5bc6\u76f8\u5173\u7684\u7cbe\u786e\u77e5\u8bc6\u3002 
\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aECG-Chat\u7684\u591a\u4efb\u52a1MLLM\uff0c\u4e13\u6ce8\u4e8eECG\u533b\u5b66\u62a5\u544a\u751f\u6210\uff0c\u5e76\u63d0\u4f9b\u57fa\u4e8e\u5fc3\u810f\u75c5\u5b66\u77e5\u8bc6\u7684\u8de8\u6a21\u6001\u5bf9\u8bdd\u80fd\u529b\u3002\u6211\u4eec\u91c7\u7528\u4e86\u5bf9\u6bd4\u5b66\u4e60\u65b9\u6cd5\uff0c\u5c06ECG\u6ce2\u5f62\u6570\u636e\u4e0e\u6587\u672c\u62a5\u544a\u7ed3\u5408\uff0c\u4ee5\u7cbe\u7ec6\u7684\u65b9\u5f0f\u5bf9\u9f50ECG\u7279\u5f81\u4e0e\u62a5\u544a\u5185\u5bb9\u3002\u8fd9\u79cd\u65b9\u6cd5\u8fd8\u4ea7\u751f\u4e86\u4e00\u4e2a\u5728\u96f6\u6837\u672c\u62a5\u544a\u68c0\u7d22\u4efb\u52a1\u4e2d\u8868\u73b0\u51fa\u8272\u7684ECG\u7f16\u7801\u5668\u3002\u6b64\u5916\uff0c\u6211\u4eec\u901a\u8fc7\u6269\u5c55\u73b0\u6709\u6570\u636e\u96c6\uff0c\u6784\u5efa\u4e86\u5305\u542b19K\u4e2aECG\u8bca\u65ad\u6570\u636e\u96c6\u548c25K\u4e2a\u591a\u8f6e\u5bf9\u8bdd\u6570\u636e\u96c6\u7528\u4e8e\u8bad\u7ec3\u548c\u5fae\u8c03ECG-Chat\uff0c\u4ece\u800c\u63d0\u4f9b\u4e13\u4e1a\u7684\u8bca\u65ad\u548c\u5bf9\u8bdd\u80fd\u529b\u3002\u6b64\u5916\uff0cECG-Chat\u53ef\u4ee5\u901a\u8fc7\u81ea\u52a8\u5316LaTeX\u751f\u6210\u7ba1\u9053\u6765\u751f\u6210\u5168\u9762\u7684ECG\u5206\u6790\u62a5\u544a\u3002\u6211\u4eec\u4e3aECG\u62a5\u544a\u751f\u6210\u4efb\u52a1\u5efa\u7acb\u4e86\u57fa\u51c6\uff0c\u5e76\u5728\u591a\u4e2a\u57fa\u7ebf\u4e0a\u6d4b\u8bd5\u4e86\u6211\u4eec\u7684\u6a21\u578b\u3002ECG-Chat\u5728\u5206\u7c7b\u3001\u68c0\u7d22\u3001\u591a\u6a21\u6001\u5bf9\u8bdd\u548c\u533b\u5b66\u62a5\u544a\u751f\u6210\u4efb\u52a1\u4e2d\u5747\u53d6\u5f97\u4e86\u6700\u4f73\u6027\u80fd\u3002\u6211\u4eec\u7684\u62a5\u544a\u6a21\u677f\u8bbe\u8ba1\u4e5f\u5f97\u5230\u4e86\u533b\u7597\u4e13\u4e1a\u4eba\u5458\u7684\u4e00\u81f4\u8ba4\u53ef\u3002|\n", "2408.08848": "|**2024-08-16**|**PsychoLex: Unveiling the Psychological Mind of Large Language Models**|Mohammad Amin Abbasi 
et.al.|[2408.08848](http://arxiv.org/abs/2408.08848)|null|\u8fd9\u7bc7\u8bba\u6587\u63a2\u8ba8\u4e86\u5fc3\u7406\u5b66\u4e0e\u4eba\u5de5\u667a\u80fd\u7684\u4ea4\u6c47\u70b9\uff0c\u901a\u8fc7\u5f00\u53d1\u548c\u8bc4\u4f30\u4e13\u7528\u4e8e\u5fc3\u7406\u4efb\u52a1\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u3002\u6211\u4eec\u5f15\u5165\u4e86PsychoLex\u5957\u4ef6\uff0c\u65e8\u5728\u589e\u5f3aLLMs\u5728\u6ce2\u65af\u8bed\u548c\u82f1\u8bed\u4e2d\u7684\u5fc3\u7406\u4efb\u52a1\u5904\u7406\u80fd\u529b\u3002\u4e3b\u8981\u8d21\u732e\u5305\u62ecPsychoLexQA\u6570\u636e\u96c6\uff0c\u7528\u4e8e\u6559\u5b66\u5185\u5bb9\u7684\u521b\u5efa\uff0c\u4ee5\u53caPsychoLexEval\u6570\u636e\u96c6\uff0c\u7528\u4e8e\u5bf9LLMs\u5728\u590d\u6742\u5fc3\u7406\u60c5\u666f\u4e0b\u7684\u4e25\u683c\u8bc4\u4f30\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u4ecb\u7ecd\u4e86PsychoLexLLaMA\u6a21\u578b\uff0c\u8be5\u6a21\u578b\u7279\u522b\u4f18\u5316\u4ee5\u9002\u7528\u4e8e\u5fc3\u7406\u5e94\u7528\uff0c\u5176\u6027\u80fd\u660e\u663e\u4f18\u4e8e\u901a\u7528\u6a21\u578b\u3002\u7814\u7a76\u7ed3\u679c\u5f3a\u8c03\u4e86\u5b9a\u5236LLMs\u5728\u63a8\u8fdb\u5fc3\u7406\u7814\u7a76\u548c\u5e94\u7528\u65b9\u9762\u7684\u6f5c\u529b\uff0c\u540c\u65f6\u4e5f\u6307\u51fa\u4e86\u8fdb\u4e00\u6b65\u6539\u8fdb\u7684\u9886\u57df\u3002\u8fd9\u9879\u7814\u7a76\u4e3a\u5c06LLMs\u878d\u5165\u7279\u5b9a\u7684\u5fc3\u7406\u5b66\u9886\u57df\u5960\u5b9a\u4e86\u57fa\u7840\uff0c\u5bf9\u672a\u6765AI\u9a71\u52a8\u7684\u5fc3\u7406\u5b9e\u8df5\u7684\u53d1\u5c55\u5177\u6709\u91cd\u8981\u610f\u4e49\u3002|\n", "2408.08841": "|**2024-08-16**|**FLEXTAF: Enhancing Table Reasoning with Flexible Tabular Formats**|Xuanliang Zhang et.al.|[2408.08841](http://arxiv.org/abs/2408.08841)|**[link](https://github.com/zhxlia/FLEXTAF)**|**
\u8868\u683c\u63a8\u7406\u4efb\u52a1\u65e8\u5728\u6839\u636e\u7ed9\u5b9a\u7684\u8868\u683c\u56de\u7b54\u95ee\u9898\u3002\u76ee\u524d\uff0c\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u662f\u8868\u683c\u63a8\u7406\u7684\u4e3b\u8981\u65b9\u6cd5\u3002\u73b0\u6709\u7684\u5927\u591a\u6570\u65b9\u6cd5\u90fd\u91c7\u7528\u56fa\u5b9a\u7684\u8868\u683c\u683c\u5f0f\u6765\u8868\u793a\u8868\u683c\uff0c\u8fd9\u53ef\u80fd\u9650\u5236\u4e86\u6027\u80fd\u3002\u9274\u4e8e\u6bcf\u4e2a\u5b9e\u4f8b\u9700\u8981\u4e0d\u540c\u7684\u80fd\u529b\uff0c\u800c\u6a21\u578b\u5177\u6709\u4e0d\u540c\u7684\u80fd\u529b\uff0c\u6211\u4eec\u65ad\u8a00\u4e0d\u540c\u5b9e\u4f8b\u548c\u6a21\u578b\u9002\u7528\u4e8e\u4e0d\u540c\u7684\u8868\u683c\u683c\u5f0f\u3002\u901a\u8fc7\u5b9e\u9a8c\u7ed3\u679c\u7684\u5b9a\u91cf\u5206\u6790\uff0c\u6211\u4eec\u8bc1\u660e\u4e86\u8fd9\u4e00\u70b9\uff1a\u4f7f\u7528\u4e0d\u540c\u7684\u8868\u683c\u683c\u5f0f\uff0c\u4e0d\u540c\u5b9e\u4f8b\u548c\u6a21\u578b\u53ef\u4ee5\u83b7\u5f97\u4e0d\u540c\u7684\u6027\u80fd\u3002\u5728\u6b64\u57fa\u7840\u4e0a\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u589e\u5f3a\u8868\u683c\u63a8\u7406\u6027\u80fd\u7684\u65b9\u6cd5FLEXTAF-Single\u548cFLEXTAF-Vote\uff0c\u901a\u8fc7\u4f7f\u7528\u7075\u6d3b\u7684\u8868\u683c\u683c\u5f0f\u3002\u5177\u4f53\u6765\u8bf4\uff0c(i) FLEXTAF-Single\u8bad\u7ec3\u4e00\u4e2a\u5206\u7c7b\u5668\uff0c\u57fa\u4e8e\u5b9e\u4f8b\u548cLLM\u9884\u6d4b\u6700\u9002\u5408\u7684\u8868\u683c\u683c\u5f0f\u3002(ii) 
FLEXTAF-Vote\u5728\u4e0d\u540c\u683c\u5f0f\u4e4b\u95f4\u96c6\u6210\u7ed3\u679c\u3002\u6211\u4eec\u5728WikiTableQuestions\u548cTabFact\u4e0a\u7684\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\u4e86\u663e\u8457\u7684\u6539\u8fdb\uff0c\u4e0e\u4f7f\u7528\u56fa\u5b9a\u8868\u683c\u683c\u5f0f\u5e76\u7ed3\u5408\u8d2a\u5a6a\u89e3\u7801\u548c\u81ea\u6211\u4e00\u81f4\u6027\u89e3\u7801\u8fbe\u5230\u7684\u6700\u4f73\u6027\u80fd\u76f8\u6bd4\uff0c\u5e73\u5747\u63d0\u9ad8\u4e862.3%\u548c4.8%\uff0c\u4ece\u800c\u9a8c\u8bc1\u4e86\u6211\u4eec\u65b9\u6cd5\u7684\u6709\u6548\u6027\u3002**|\n", "2408.08811": "|**2024-08-16**|**Artificial Intelligence and Strategic Decision-Making: Evidence from Entrepreneurs and Investors**|Felipe A. Csaszar et.al.|[2408.08811](http://arxiv.org/abs/2408.08811)|null|\u672c\u6587\u63a2\u8ba8\u4e86\u4eba\u5de5\u667a\u80fd\uff08AI\uff09\u5982\u4f55\u5f71\u54cd\u4f01\u4e1a\u6218\u7565\u51b3\u7b56\u8fc7\u7a0b\u3002\u6211\u4eec\u901a\u8fc7\u5b9e\u4f8b\u5c55\u793a\u4e86AI\u5982\u4f55\u589e\u5f3a\u73b0\u6709\u6218\u7565\u51b3\u7b56\u5de5\u5177\uff0c\u5e76\u63d0\u4f9b\u4e86\u6765\u81ea\u9886\u5148\u52a0\u901f\u5668\u8ba1\u5212\u548c\u521b\u4e1a\u7ade\u8d5b\u7684\u5b9e\u8bc1\u8bc1\u636e\uff0c\u8bc1\u660e\u5f53\u524d\u7684\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u751f\u6210\u548c\u8bc4\u4f30\u7b56\u7565\u65b9\u9762\u7684\u80fd\u529b\u4e0e\u4f01\u4e1a\u5bb6\u548c\u6295\u8d44\u8005\u76f8\u5f53\u3002\u63a5\u7740\uff0c\u6211\u4eec\u5206\u6790\u4e86\u6218\u7565\u51b3\u7b56\u80cc\u540e\u7684\u5173\u952e\u8ba4\u77e5\u8fc7\u7a0b\u2014\u2014\u641c\u7d22\u3001\u8868\u793a\u548c\u805a\u5408\uff0c\u5e76\u63d0\u51faAI\u6709\u53ef\u80fd\u63d0\u5347\u6218\u7565\u5206\u6790\u7684\u901f\u5ea6\u3001\u8d28\u91cf\u548c\u89c4\u6a21\uff0c\u540c\u65f6\u8fd8\u80fd\u542f\u7528\u5982\u865a\u62df\u6218\u7565\u6a21\u62df\u7b49\u65b0\u65b9\u6cd5\u3002\u7136\u800c\uff0cAI\u5bf9\u4f01\u4e1a\u53d1\u5c55\u7684\u5f71\u54cd\u6700\u7ec8\u53d6\u51b3\u4e8e\u7ade\u4e89\u52a8\u6001\u4ee5\u53c
aAI\u80fd\u529b\u7684\u53d1\u5c55\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u4e2a\u6846\u67b6\uff0c\u5c06AI\u5728\u6218\u7565\u51b3\u7b56\u4e2d\u7684\u5e94\u7528\u4e0e\u4f01\u4e1a\u7ed3\u679c\u8054\u7cfb\u8d77\u6765\uff0c\u5e76\u8ba8\u8bba\u4e86AI\u5982\u4f55\u91cd\u5851\u7ade\u4e89\u4f18\u52bf\u7684\u6765\u6e90\u3002\u6700\u540e\uff0c\u6211\u4eec\u8003\u8651\u4e86AI\u5982\u4f55\u65e2\u652f\u6301\u53c8\u6311\u6218\u57fa\u4e8e\u7406\u8bba\u7684\u6218\u7565\u89c2\u7684\u6838\u5fc3\u539f\u5219\u3002\u6574\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u7684\u5de5\u4f5c\u63cf\u7ed8\u4e86\u4e00\u4e2aAI\u4e0e\u6218\u7565\u9886\u57df\u6b63\u5728\u5f62\u6210\u7684\u7814\u7a76\u524d\u6cbf\u3002|\n", "2408.08808": "|**2024-08-16**|**Constructing Domain-Specific Evaluation Sets for LLM-as-a-judge**|Ravi Raju et.al.|[2408.08808](http://arxiv.org/abs/2408.08808)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u673a\u5668\u5b66\u4e60\u9886\u57df\u5e26\u6765\u4e86\u9769\u547d\u6027\u53d8\u5316\uff0c\u7136\u800c\u73b0\u6709\u7684\u57fa\u51c6\u6d4b\u8bd5\u5f80\u5f80\u96be\u4ee5\u5168\u9762\u6355\u6349\u8fd9\u4e9b\u6a21\u578b\u5728\u5b9e\u9645\u5e94\u7528\u4e2d\u7684\u591a\u6837\u884c\u4e3a\u3002\u4e00\u4e2a\u57fa\u51c6\u6d4b\u8bd5\u7684\u4ef7\u503c\u5728\u4e8e\u5b83\u80fd\u5426\u6e05\u6670\u533a\u5206\u4e0d\u540c\u80fd\u529b\u7ea7\u522b\u7684\u6a21\u578b\uff08\u53ef\u5206\u6027\uff09\u4ee5\u53ca\u4e0e\u4eba\u7c7b\u504f\u597d\u7684\u7d27\u5bc6\u5339\u914d\u5ea6\u3002\u5f53\u524d\u7684\u6846\u67b6\u5982Alpaca-Eval 2.0 LC \\cite{dubois2024lengthcontrolledalpacaevalsimpleway} \u548cArena-Hard v0.1 
\\cite{li2024crowdsourced}\u4e3b\u8981\u5173\u6ce8\u901a\u7528\u67e5\u8be2\uff0c\u5e76\u4e14\u7f3a\u4e4f\u8de8\u6cd5\u5f8b\u3001\u533b\u5b66\u7b49\u9886\u57df\u7684\u591a\u6837\u6027\u3002\u672c\u6587\u901a\u8fc7\u5f15\u5165\u4e00\u79cd\u65b0\u9896\u7684\u6570\u636e\u7ba1\u9053\uff0c\u6765\u5b9a\u5236\u4e00\u7cfb\u5217\u591a\u5143\u5316\u7684\u3001\u9488\u5bf9LLM-as-a-Judge\u6846\u67b6\u7684\u9886\u57df\u7279\u5b9a\u8bc4\u4f30\u96c6\uff0c\u4ee5\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u7ed3\u5408\u4e86\u4eba\u5de5\u7b5b\u9009\u3001\u534a\u76d1\u7763\u5b66\u4e60\u751f\u6210\u805a\u7c7b\u4ee5\u53ca\u5206\u5c42\u62bd\u6837\uff0c\u786e\u4fdd\u5728\u5e7f\u6cdb\u9886\u57df\u548c\u8bed\u8a00\u4e2d\u90fd\u6709\u5747\u8861\u7684\u4ee3\u8868\u6027\u3002\u4ea7\u751f\u7684\u8bc4\u4f30\u96c6\u5305\u62ec1573\u4e2a\u6837\u672c\uff0c\u5206\u5e03\u572814\u4e2a\u7c7b\u522b\u4e2d\uff0c\u663e\u793a\u51fa\u9ad8\u53ef\u5206\u6027\uff0884%\uff09\u548c\u5bf9\u524d\u5341\u5927\u6a21\u578b\u7684\u6027\u80fd\u5dee\u5f02\uff0c\u540c\u65f6\u4e0eChatbot Arena\u7684\u5171\u8bc6\u5ea6\uff0884%\uff09\u548cSpearman\u76f8\u5173\u7cfb\u6570\uff080.915\uff09\u4e5f\u8868\u73b0\u51fa\u826f\u597d\u7684\u4e00\u81f4\u6027\u3002\u4e0eAlpacaEval 2.0 LC\u7684\u5171\u8bc6\u5ea6\u76f8\u6bd4\uff0c\u8fd9\u4e00\u503c\u9ad8\u51fa9%\uff0c\u4e0eArena 
Hard\u76f8\u6bd4\u5219\u9ad8\u51fa20%\uff0c\u800c\u4e0eSpearman\u7cfb\u6570\u76f8\u6bd4\u5219\u662f\u4e0b\u4e00\u4e2a\u6700\u4f73\u57fa\u51c6\u76840.7\u500d\uff0c\u8fd9\u8868\u660e\u6211\u4eec\u5728\u57fa\u51c6\u6d4b\u8bd5\u7684\u6709\u6548\u6027\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\u3002\u6211\u4eec\u8fd8\u63d0\u4f9b\u4e86\u4e00\u4e2a\u5f00\u6e90\u7684\u8bc4\u4f30\u5de5\u5177\uff0c\u5141\u8bb8\u7528\u6237\u81ea\u5b9a\u4e49\u7c7b\u522b\u8fdb\u884c\u7cbe\u7ec6\u5206\u6790\uff0c\u4ece\u800c\u4e3a\u5b9e\u8df5\u8005\u63d0\u4f9b\u6709\u4ef7\u503c\u7684\u6d1e\u5bdf\u3002\u8fd9\u9879\u5de5\u4f5c\u5bf9\u589e\u5f3aLLM\u8bc4\u4f30\u65b9\u6cd5\u7684\u900f\u660e\u5ea6\u3001\u591a\u6837\u6027\u548c\u6709\u6548\u6027\u505a\u51fa\u4e86\u8d21\u732e\u3002|\n", "2408.08782": "|**2024-08-16**|**EmoDynamiX: Emotional Support Dialogue Strategy Prediction by Modelling MiXed Emotions and Discourse Dynamics**|Chenwei Wan et.al.|[2408.08782](http://arxiv.org/abs/2408.08782)|**[link](https://github.com/cw-wan/EmoDynamiX-v2)**|**\u8bbe\u8ba1\u80fd\u591f\u63d0\u4f9b\u6170\u85c9\u548c\u5efa\u8bae\u7684\u5177\u6709\u60c5\u611f\u667a\u80fd\u7684\u5bf9\u8bdd\u7cfb\u7edf\uff0c\u4ee5\u5e2e\u52a9\u90a3\u4e9b\u7ecf\u5386\u538b\u529b\u7684\u4eba\u4eec\uff0c\u662f\u4e00\u4e2a\u6781\u5177\u5438\u5f15\u529b\u7684\u7814\u7a76\u9886\u57df\u3002\u8fc7\u53bb\u7684\u7814\u7a76\u5de5\u4f5c\u7740\u91cd\u4e8e\u6784\u5efa\u6a21\u5757\u5316\u5bf9\u8bdd\u7cfb\u7edf\uff0c\u5e76\u5c06\u5176\u793e\u4f1a\u60c5\u611f\u7b56\u7565\u9884\u6d4b\u89c6\u4e3a\u8f85\u52a9\u4efb\u52a1\uff0c\u901a\u8fc7\u5b9a\u5236\u89e3\u7801\u5668\u751f\u6210\u6761\u4ef6\u5316\u7684\u54cd\u5e94\u3002\u6700\u8fd1\uff0c\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u65b9\u9762\u7684\u53d1\u5c55\u4f7f\u5f97\u65e0\u9700\u660e\u786e\u7684\u793e\u4f1a\u60c5\u611f\u7b56\u7565\u9884\u6d4b\u6b65\u9aa4\u7684\u7aef\u5230\u7aef\u5bf9\u8bdd\u4ee3\u7406\u53d8\u5f97\u6d41\u884c\u8d77\u6765\u3002\u5c3d\u7ba1\u5b83\u4eec\u5728\u8bed\u
8a00\u751f\u6210\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u6700\u8fd1\u7684\u7814\u7a76\u8868\u660e\uff0cLLM\u56fa\u6709\u7684\u504f\u597d\u504f\u89c1\uff0c\u503e\u5411\u4e8e\u67d0\u4e9b\u793e\u4f1a\u60c5\u611f\u7b56\u7565\uff0c\u963b\u788d\u4e86\u63d0\u4f9b\u9ad8\u8d28\u91cf\u60c5\u611f\u652f\u6301\u7684\u80fd\u529b\u3002 \u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u65b9\u6cd5\uff1a\u5c06\u7b56\u7565\u9884\u6d4b\u4e0e\u8bed\u8a00\u751f\u6210\u5206\u79bb\uff0c\u5e76\u5f15\u5165\u4e86\u4e00\u4e2a\u540d\u4e3aEmoDynamiX\u7684\u65b0\u578b\u5bf9\u8bdd\u7b56\u7565\u9884\u6d4b\u5668\u3002\u8be5\u9884\u6d4b\u5668\u5229\u7528\u5f02\u6784\u56fe\u6765\u5efa\u6a21\u7528\u6237\u60c5\u7eea\u4e0e\u7cfb\u7edf\u7b56\u7565\u4e4b\u95f4\u7684\u5bf9\u8bdd\u52a8\u6001\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5229\u7528\u4e86\u5bf9\u8bdd\u4e2d\u60c5\u611f\u8bc6\u522b\uff08ERC\uff09\u4efb\u52a1\uff0c\u5e76\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u7075\u6d3b\u7684\u6df7\u5408\u60c5\u7eea\u6a21\u5757\uff0c\u4ee5\u6355\u6349\u7528\u6237\u7684\u7ec6\u5fae\u60c5\u611f\u72b6\u6001\u3002\u5728\u4e24\u4e2aESC\u6570\u636e\u96c6\u4e0a\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cEmoDynamiX\u663e\u8457\u8d85\u8d8a\u4e86\u5148\u524d\u6700\u5148\u8fdb\u7684\u65b9\u6cd5\u3002**|\n", "2408.08780": "|**2024-08-16**|**Large Language Models Might Not Care What You Are Saying: Prompt Format Beats Descriptions**|Chenming Tang 
et.al.|[2408.08780](http://arxiv.org/abs/2408.08780)|null|\u901a\u8fc7\u5229\u7528\u4e0a\u4e0b\u6587\u5b66\u4e60\uff08ICL\uff09\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u5404\u79cd\u4efb\u52a1\u4e0a\u53d6\u5f97\u4e86\u4ee4\u4eba\u5370\u8c61\u6df1\u523b\u7684\u6027\u80fd\u3002\u7136\u800c\uff0c\u5728ICL\u8fc7\u7a0b\u4e2d\u63cf\u8ff0\u6027\u6307\u4ee4\u7684\u4f5c\u7528\u4ecd\u7136\u6709\u5f85\u63a2\u7d22\u3002\u672c\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u79cd\u96c6\u6210\u63d0\u793a\u6846\u67b6\uff0c\u7528\u4e8e\u63cf\u8ff0\u591a\u4e2a\u4e0a\u4e0b\u6587\u793a\u4f8b\u7684\u9009\u62e9\u6807\u51c6\uff0c\u5e76\u5728\u516d\u4e2a\u7ffb\u8bd1\u65b9\u5411\u7684\u673a\u5668\u7ffb\u8bd1\uff08MT\uff09\u4efb\u52a1\u4e0a\u7684\u521d\u6b65\u5b9e\u9a8c\u8868\u660e\uff0c\u8fd9\u79cd\u6846\u67b6\u80fd\u591f\u63d0\u5347ICL\u6027\u80fd\u3002\u51fa\u4e4e\u610f\u6599\u7684\u662f\uff0cLLM\u53ef\u80fd\u5e76\u4e0d\u5173\u5fc3\u63cf\u8ff0\u7684\u5177\u4f53\u5185\u5bb9\uff0c\u6027\u80fd\u63d0\u5347\u4e3b\u8981\u6e90\u4e8e\u96c6\u6210\u683c\u5f0f\uff0c\u5373\u4f7f\u4f7f\u7528\u968f\u673a\u63cf\u8ff0\u540d\u8bcd\uff0c\u8be5\u6846\u67b6\u4e5f\u80fd\u5e26\u6765\u6539\u8fdb\u3002\u6211\u4eec\u8fdb\u4e00\u6b65\u5728\u5e38\u8bc6\u3001\u6570\u5b66\u3001\u903b\u8f91\u63a8\u7406\u548c\u5e7b\u89c9\u4efb\u52a1\u4e0a\u5e94\u7528\u4e86\u8fd9\u79cd\u65b0\u7684\u96c6\u6210\u63d0\u793a\uff0c\u5e76\u4f7f\u7528\u4e09\u79cdLLM\u53d6\u5f97\u4e86\u6709\u5e0c\u671b\u7684\u7ed3\u679c\uff0c\u8fd9\u518d\u6b21\u8868\u660e\u8bbe\u8ba1\u9002\u5f53\u7684\u63d0\u793a\u683c\u5f0f\u6bd4\u4e13\u6ce8\u4e8e\u7279\u5b9a\u63cf\u8ff0\u66f4\u4e3a\u6709\u6548\u548c\u9ad8\u6548\u3002\u5728\u8bba\u6587\u53d1\u8868\u540e\uff0c\u6211\u4eec\u7684\u4ee3\u7801\u5c06\u516c\u5f00\u63d0\u4f9b\u3002|\n", "2408.08779": "|**2024-08-16**|**DAC: Decomposed Automation Correction for Text-to-SQL**|Dingzirui Wang 
et.al.|[2408.08779](http://arxiv.org/abs/2408.08779)|**[link](https://github.com/zirui-HIT/DAC)**|**\u6587\u672c\u5230SQL\u662f\u4e00\u4e2a\u91cd\u8981\u7684\u4efb\u52a1\uff0c\u5b83\u901a\u8fc7\u81ea\u52a8\u751f\u6210SQL\u67e5\u8be2\u5e2e\u52a9\u4eba\u4eec\u4ece\u6570\u636e\u5e93\u4e2d\u83b7\u53d6\u4fe1\u606f\u3002\u8003\u8651\u5230\u51fa\u8272\u7684\u6027\u80fd\uff0c\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u65b9\u6cd5\u6210\u4e3a\u4e86\u6587\u672c\u5230SQL\u7684\u4e3b\u6d41\u65b9\u5f0f\u3002\u5728\u8fd9\u7c7b\u65b9\u6cd5\u4e2d\uff0c\u81ea\u52a8\u4fee\u6b63\u6210\u4e3a\u4e00\u79cd\u6709\u6548\u624b\u6bb5\uff0c\u80fd\u591f\u901a\u8fc7\u7ea0\u6b63\u751f\u6210\u7ed3\u679c\u4e2d\u7684\u9519\u8bef\u6765\u8fdb\u4e00\u6b65\u63d0\u5347\u6027\u80fd\u3002\u73b0\u6709\u4fee\u6b63\u65b9\u6cd5\u8981\u6c42LLM\u76f4\u63a5\u5bf9\u751f\u6210\u7684SQL\u8fdb\u884c\u4fee\u6b63\uff0c\u800c\u5148\u524d\u7684\u7814\u7a76\u8868\u660e\uff0cLLM\u5e76\u4e0d\u77e5\u9053\u5982\u4f55\u68c0\u6d4b\u9519\u8bef\uff0c\u5bfc\u81f4\u4e86\u8f83\u5dee\u7684\u6027\u80fd\u3002\u56e0\u6b64\uff0c\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u91c7\u7528\u5206\u89e3\u5f0f\u4fee\u6b63\u6765\u589e\u5f3a\u6587\u672c\u5230SQL\u7684\u6027\u80fd\u3002\u9996\u5148\uff0c\u6211\u4eec\u8bc1\u660e\u4e86\u5206\u89e3\u5f0f\u4fee\u6b63\u4f18\u4e8e\u76f4\u63a5\u4fee\u6b63\uff0c\u56e0\u4e3a\u4e0eSQL\u76f8\u6bd4\uff0c\u901a\u8fc7\u7ed3\u679c\u5206\u89e3\u5b50\u4efb\u52a1\u6765\u68c0\u6d4b\u548c\u4fee\u590d\u9519\u8bef\u66f4\u4e3a\u5bb9\u6613\u3002\u57fa\u4e8e\u8fd9\u4e00\u5206\u6790\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u5206\u89e3\u81ea\u52a8\u5316\u4fee\u6b63\uff08DAC\uff09\uff0c\u8be5\u65b9\u6cd5\u901a\u8fc7\u5c06\u6587\u672c\u5230SQL\u5206\u89e3\u4e3a\u5b9e\u4f53\u94fe\u63a5\u548c\u9aa8\u67b6\u89e3\u6790\u4e24\u4e2a\u5b50\u4efb\u52a1\u6765\u4fee\u6b63SQL\u3002DAC\u9996\u5148\u751f\u6210\u4e0e\u95ee\u9898\u5bf9\u5e94\u7684\u5b9e\u4f53\u548c\u9aa8\u67b6\uff0c\u7136\u540e\u6bd4\
u8f83\u521d\u59cbSQL\u4e0e\u751f\u6210\u7684\u5b9e\u4f53\u548c\u9aa8\u67b6\u4e4b\u95f4\u7684\u5dee\u5f02\u4f5c\u4e3a\u4fee\u6b63\u53cd\u9988\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u4e0e\u57fa\u7ebf\u65b9\u6cd5\u76f8\u6bd4\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728Spider\u3001Bird\u548cKaggleDBQA\u4e0a\u7684\u5e73\u5747\u6027\u80fd\u63d0\u9ad8\u4e863.7%\uff0c\u8bc1\u660e\u4e86DAC\u7684\u6709\u6548\u6027\u3002**|\n", "2408.10197": "|**2024-08-19**|**Demystifying the Communication Characteristics for Distributed Transformer Models**|Quentin Anthony et.al.|[2408.10197](http://arxiv.org/abs/2408.10197)|null|\u6df1\u5ea6\u5b66\u4e60\uff08DL\uff09\u6a21\u578b\u57fa\u4e8e\u53d8\u6362\u5668\u67b6\u6784\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u3001\u89c6\u89c9\u53d8\u6362\u5668\u3001\u97f3\u9891\u751f\u6210\u548c\u65f6\u95f4\u5e8f\u5217\u9884\u6d4b\u7b49\u4f17\u591aDL\u5e94\u7528\u9886\u57df\u5b9e\u73b0\u4e86\u9769\u547d\u6027\u8fdb\u5c55\u3002\u8fd9\u4e00\u7cfb\u5217\u8fdb\u6b65\u5f88\u5927\u7a0b\u5ea6\u4e0a\u5f97\u76ca\u4e8e\u5206\u5e03\u5f0f\u8bad\u7ec3\uff0c\u7136\u800c\u5206\u5e03\u5f0f\u901a\u4fe1\u4ecd\u7136\u662f\u5f71\u54cd\u8bad\u7ec3\u8fdb\u5ea6\u7684\u4e00\u4e2a\u91cd\u5927\u74f6\u9888\u3002\u672c\u6587\u65e8\u5728\u63a2\u8ba8\u53d8\u6362\u5668\u6a21\u578b\u7684\u901a\u4fe1\u884c\u4e3a\uff0c\u5373\u5728\u4f7f\u7528\u591a\u8282\u70b9/\u591aGPU 
DL\u8bad\u7ec3\u65f6\uff0c\u4e0d\u540c\u5e76\u884c\u65b9\u6848\u5982\u4f55\u5728\u53d8\u6362\u5668\u80cc\u666f\u4e0b\u8fdb\u884c\u6570\u636e\u901a\u4fe1\u3002\u6211\u4eec\u4ee5GPT\u4e3a\u57fa\u7840\u7684\u8bed\u8a00\u6a21\u578b\u4f5c\u4e3a\u53d8\u6362\u5668\u67b6\u6784\u6848\u4f8b\u7814\u7a76\u7684\u4e3b\u8981\u5bf9\u8c61\uff0c\u7531\u4e8e\u5176\u5e7f\u6cdb\u7684\u5e94\u7528\u800c\u88ab\u9009\u4e2d\u3002\u901a\u8fc7\u6211\u4eec\u7684\u901a\u4fe1\u65e5\u5fd7\u9a8c\u8bc1\u4e86\u6240\u83b7\u5f97\u7684\u5b9e\u9a8c\u7ed3\u679c\uff0c\u5e76\u4f7f\u7528\u5206\u6790\u6a21\u578b\u5bf9\u8fd9\u4e9b\u7ed3\u679c\u8fdb\u884c\u4e86\u786e\u8ba4\u3002 \u603b\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u7684\u5206\u6790\u63ed\u793a\u4e86\u8fdb\u4e00\u6b65\u4f18\u5316\u5c0f\u6d88\u606f\u70b9\u5230\u70b9\u901a\u4fe1\u7684\u5fc5\u8981\u6027\u3001\u5e8f\u5217\u957f\u5ea6\u3001\u6bcfGPU\u541e\u5410\u91cf\u3001\u6a21\u578b\u5927\u5c0f\u4ee5\u53ca\u6240\u7528\u4f18\u5316\u4e4b\u95f4\u7684\u76f8\u5173\u6027\uff0c\u4ee5\u53ca\u5728\u6846\u67b6\u548c\u9ad8\u6027\u80fd\u8ba1\u7b97\u4e2d\u95f4\u4ef6\u8bbe\u8ba1\u4e0e\u4f18\u5316\u65b9\u9762\u53ef\u80fd\u9700\u8981\u5f15\u5bfc\u7684\u8fdb\u4e00\u6b65\u4f18\u5316\u65b9\u5411\u3002|\n", "2408.10174": "|**2024-08-19**|**SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models**|Anke Tang 
et.al.|[2408.10174](http://arxiv.org/abs/2408.10174)|**[link](https://github.com/tanganke/fusion_bench)**|**\u6df1\u5ea6\u6a21\u578b\u5728\u5927\u89c4\u6a21\u6570\u636e\u96c6\u4e0a\u7684\u8bad\u7ec3\u65e5\u76ca\u53d8\u5f97\u6210\u672c\u9ad8\u6602\uff0c\u8fd9\u4fc3\u4f7f\u4eba\u4eec\u5e7f\u6cdb\u91c7\u7528\u6df1\u5ea6\u6a21\u578b\u878d\u5408\u6280\u672f\uff0c\u4ee5\u5229\u7528\u73b0\u6709\u6a21\u578b\u7684\u77e5\u8bc6\u3002\u4ece\u7b80\u5355\u7684\u6743\u91cd\u5e73\u5747\u5230\u66f4\u590d\u6742\u7684AdaMerging\u7b49\u65b9\u6cd5\uff0c\u6a21\u578b\u878d\u5408\u80fd\u591f\u6709\u6548\u63d0\u5347\u6a21\u578b\u6027\u80fd\uff0c\u5e76\u52a0\u901f\u65b0\u6a21\u578b\u7684\u5f00\u53d1\u3002\u7136\u800c\uff0c\u4e2a\u4f53\u6a21\u578b\u53c2\u6570\u95f4\u7684\u76f8\u4e92\u5e72\u6270\u4ee5\u53ca\u878d\u5408\u8fc7\u7a0b\u7684\u53ef\u89e3\u91ca\u6027\u4e0d\u8db3\u4ecd\u7136\u662f\u6311\u6218\u3002\u73b0\u6709\u65b9\u6cd5\u5f80\u5f80\u8bd5\u56fe\u901a\u8fc7\u8bc4\u4f30\u53c2\u6570\u5c5e\u6027\uff08\u5982\u5927\u5c0f\u6216\u7b26\u53f7\uff09\u6216\u8fdb\u884c\u53c2\u6570\u4fee\u526a\u6765\u89e3\u51b3\u53c2\u6570\u5e72\u6270\u95ee\u9898\u3002\u672c\u7814\u7a76\u9996\u5148\u4ece\u7ebf\u6027\u5c42\u5fae\u8c03\u7684\u89d2\u5ea6\u51fa\u53d1\uff0c\u901a\u8fc7\u5b50\u7a7a\u95f4\u5206\u6790\u660e\u786e\u5730\u5b9a\u4e49\u4e86\u53c2\u6570\u5e72\u6270\u4f5c\u4e3a\u4f18\u5316\u95ee\u9898\uff0c\u4ee5\u63ed\u793a\u8fd9\u4e00\u4e3b\u9898\u3002\u968f\u540e\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u540d\u4e3a\u96f6\u6837\u672c\u7a00\u758f\u6df7\u5408\u4f4e\u79e9\u4e13\u5bb6\uff08SMILE\uff09\u6784\u9020\u7684\u521b\u65b0\u6a21\u578b\u878d\u5408\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u5141\u8bb8\u5728\u65e0\u9700\u989d\u5916\u6570\u636e\u6216\u8fdb\u4e00\u6b65\u8bad\u7ec3\u7684\u60c5\u51b5\u4e0b\uff0c\u5c06\u6e90\u6a21\u578b\u5347\u7ea7\u4e3a\u6df7\u5408\u4e13\u5bb6\u6a21\u578b\uff08MoE\uff09\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u57fa\u4e8e\u4ee5\u4e0b\u89c2\u5bdf\uff1a\u5fae\u8c03\u4e3b\u8981\u4fdd\u
7559\u4e86\u9884\u8bad\u7ec3\u7684\u91cd\u8981\u90e8\u5206\uff0c\u4f46\u4f7f\u7528\u8f83\u5c11\u91cd\u8981\u6216\u672a\u4f7f\u7528\u7684\u533a\u57df\u6765\u9002\u5e94\u65b0\u4efb\u52a1\u3002\u6b64\u5916\uff0c\u5728\u539f\u59cb\u53c2\u6570\u7a7a\u95f4\u4e2d\u56fa\u6709\u7684\u53c2\u6570\u5e72\u6270\u95ee\u9898\uff0c\u53ef\u4ee5\u901a\u8fc7\u6269\u5c55\u7ef4\u5ea6\u6765\u7ba1\u7406\u3002\u6211\u4eec\u5728\u591a\u79cd\u573a\u666f\u4e0b\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u5b9e\u9a8c\uff0c\u5305\u62ec\u56fe\u50cf\u5206\u7c7b\u548c\u6587\u672c\u6cdb\u5316\u4efb\u52a1\uff0c\u4f7f\u7528\u5168\u91cf\u5fae\u8c03\u548cLoRA\u5fae\u8c03\uff0c\u5e76\u5c06\u6211\u4eec\u7684\u65b9\u6cd5\u5e94\u7528\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08CLIP\u6a21\u578b\u3001Flan-T5\u6a21\u578b\u548cMistral-7B\u6a21\u578b\uff09\uff0c\u7a81\u51fa\u4e86SMILE\u7684\u9002\u5e94\u6027\u548c\u53ef\u6269\u5c55\u6027\u3002\u4ee3\u7801\u5df2\u5f00\u6e90\u4e8ehttps://github.com/tanganke/fusion_bench**|\n", "2408.10159": "|**2024-08-19**|**Customizing Language Models with Instance-wise LoRA for Sequential Recommendation**|Xiaoyu Kong 
et.al.|[2408.10159](http://arxiv.org/abs/2408.10159)|null|\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u77e5\u8bc6\u7406\u89e3\u548c\u63a8\u7406\u65b9\u9762\u7684\u4f18\u52bf\uff0c\u8fd1\u671f\u7684\u7814\u7a76\u901a\u8fc7\u8bed\u8a00\u751f\u6210\u8303\u5f0f\u5c06LLM\u5e94\u7528\u4e8e\u5e8f\u5217\u63a8\u8350\u7cfb\u7edf\u4e2d\u3002\u8fd9\u4e9b\u65b9\u6cd5\u5c06\u7528\u6237\u884c\u4e3a\u5e8f\u5217\u8f6c\u6362\u4e3aLLM\u5fae\u8c03\u7684\u63d0\u793a\uff0c\u5229\u7528LoRA\u6a21\u5757\u6765\u7ec6\u5316\u63a8\u8350\u3002\u7136\u800c\uff0c\u5728\u4e0d\u540c\u7528\u6237\u884c\u4e3a\u4e4b\u95f4\u8fdb\u884c\u7edf\u4e00\u5e94\u7528\u65f6\uff0cLoRA\u6709\u65f6\u65e0\u6cd5\u6355\u6349\u5230\u4e2a\u4f53\u5dee\u5f02\u6027\uff0c\u5bfc\u81f4\u6027\u80fd\u4e0d\u4f73\u4ee5\u53ca\u5728\u4e0d\u540c\u884c\u4e3a\u5e8f\u5217\u95f4\u7684\u8d1f\u8fc1\u79fb\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u5b9e\u4f8b\u7684LoRA\uff08iLoRA\uff09\uff0c\u5b83\u7ed3\u5408\u4e86LoRA\u4e0e\u6df7\u5408\u4e13\u5bb6\uff08MoE\uff09\u6846\u67b6\u3002iLoRA\u521b\u5efa\u4e86\u4e00\u4e2a\u591a\u6837\u5316\u7684\u4e13\u5bb6\u96c6\u5408\uff0c\u6bcf\u4e2a\u4e13\u5bb6\u90fd\u80fd\u591f\u6355\u83b7\u7279\u5b9a\u7684\u7528\u6237\u504f\u597d\u65b9\u9762\uff0c\u5e76\u5f15\u5165\u4e86\u4e00\u4e2a\u7531\u5386\u53f2\u4ea4\u4e92\u5e8f\u5217\u5f15\u5bfc\u7684\u95e8\u63a7\u51fd\u6570\u3002\u8be5\u95e8\u63a7\u51fd\u6570\u5904\u7406\u5386\u53f2\u4ea4\u4e92\u5e8f\u5217\u4ee5\u751f\u6210\u589e\u5f3a\u8868\u793a\uff0c\u4ece\u800c\u6307\u5bfc\u95e8\u63a7\u7f51\u7edc\u8f93\u51fa\u5b9a\u5236\u7684\u4e13\u5bb6\u53c2\u4e0e\u6743\u91cd\u3002\u8fd9\u79cd\u5b9a\u5236\u5316\u7684\u65b9\u6cd5\u53ef\u4ee5\u51cf\u5c11\u8d1f\u8fc1\u79fb\u5e76\u52a8\u6001\u9002\u5e94\u591a\u6837\u7684\u884c\u4e3a\u6a21\u5f0f\u3002\u5728\u4e09\u4e2a\u57fa\u51c6\u6570\u636e\u96c6\u4e0a\u7684\u5e7f\u6cdb\u5b9e\u9a8c\u663e\u793a\u4e86iLoRA\u7684\u6709\u6548\u6027\uff0c\u8
bc1\u660e\u4e86\u5176\u5728\u6355\u6349\u7528\u6237\u7279\u5b9a\u504f\u597d\u548c\u63d0\u9ad8\u63a8\u8350\u51c6\u786e\u5ea6\u65b9\u9762\u7684\u4f18\u8d8a\u6027\u80fd\u3002|\n", "2408.10151": "|**2024-08-19**|**Multilingual Needle in a Haystack: Investigating Long-Context Behavior of Multilingual Large Language Models**|Amey Hengle et.al.|[2408.10151](http://arxiv.org/abs/2408.10151)|**[link](https://github.com/AmeyHengle/multilingual-needle-in-a-haystack)**|\u5728\u8fd1\u671f\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5c55\u793a\u4e86\u5728\u591a\u79cd\u8bed\u8a00\u4e2d\u54cd\u5e94\u67e5\u8be2\u7684\u80fd\u529b\u4e4b\u540e\uff0c\u5b83\u4eec\u5904\u7406\u957f\u591a\u8bed\u8a00\u4e0a\u4e0b\u6587\u7684\u80fd\u529b\u5c1a\u672a\u5f97\u5230\u63a2\u7d22\u3002\u56e0\u6b64\uff0c\u5728\u591a\u8bed\u8a00\u80cc\u666f\u4e0b\u8bc4\u4f30LLM\u7684\u957f\u671f\u4e0a\u4e0b\u6587\u80fd\u529b\u81f3\u5173\u91cd\u8981\uff0c\u7279\u522b\u662f\u5728\u4fe1\u606f\u68c0\u7d22\u7684\u80cc\u666f\u4e0b\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u591a\u8bed\u8a00\u9488\u5728\u8349\u5806\u4e2d\u7684\u6d4b\u8bd5\uff08MultiLingual 
Needle-in-a-Haystack\uff0c\u7b80\u79f0MLNeedle\uff09\uff0c\u65e8\u5728\u8bc4\u4f30\u6a21\u578b\u4ece\u591a\u8bed\u8a00\u5e72\u6270\u6587\u672c\u96c6\u5408\uff08\u8349\u5806\uff09\u4e2d\u68c0\u7d22\u76f8\u5173\u4fe1\u606f\uff08\u9488\uff09\u7684\u80fd\u529b\u3002\u8fd9\u4e00\u6d4b\u8bd5\u6269\u5c55\u4e86\u591a\u8bed\u8a00\u95ee\u7b54\u4efb\u52a1\uff0c\u6db5\u76d6\u4e86\u5355\u8bed\u8a00\u548c\u8de8\u8bed\u8a00\u68c0\u7d22\u3002\u6211\u4eec\u5bf9\u5f53\u524d\u7684\u56db\u5927\u5148\u8fdbLLM\u8fdb\u884c\u4e86MLNeedle\u6d4b\u8bd5\u3002\u6211\u4eec\u7684\u53d1\u73b0\u663e\u793a\uff0c\u6a21\u578b\u6027\u80fd\u5728\u4e0d\u540c\u8bed\u8a00\u548c\u9488\u7684\u4f4d\u7f6e\u4e0a\u5b58\u5728\u663e\u8457\u5dee\u5f02\u3002\u5177\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u89c2\u5bdf\u5230\u5f53\u9488\u4f4d\u4e8e\u82f1\u8bed\u8bed\u7cfb\u4e4b\u5916\u7684\u8bed\u8a00\u4e2d\u4ee5\u53ca\u8f93\u5165\u4e0a\u4e0b\u6587\u7684\u4e2d\u95f4\u4f4d\u7f6e\u65f6\uff0c\u6a21\u578b\u7684\u6027\u80fd\u6700\u4f4e\u3002\u6b64\u5916\uff0c\u5c3d\u7ba1\u67d0\u4e9b\u6a21\u578b\u58f0\u79f0\u5177\u6709\u9ad8\u8fbe8k\u4e2a\u4ee4\u724c\u7684\u4e0a\u4e0b\u6587\u5927\u5c0f\uff0c\u4f46\u5728\u4e0a\u4e0b\u6587\u957f\u5ea6\u589e\u52a0\u65f6\uff0c\u5b83\u4eec\u90fd\u6ca1\u6709\u8868\u73b0\u51fa\u6ee1\u610f\u7684\u8de8\u8bed\u8a00\u68c0\u7d22\u6027\u80fd\u3002\u6211\u4eec\u7684\u5206\u6790\u63d0\u4f9b\u4e86\u5173\u4e8eLLM\u5728\u591a\u8bed\u8a00\u80cc\u666f\u4e0b\u5904\u7406\u957f\u4e0a\u4e0b\u6587\u7684\u5173\u952e\u89c1\u89e3\uff0c\u4ee5\u6307\u5bfc\u672a\u6765\u7684\u8bc4\u4f30\u65b9\u6cd5\u3002\u636e\u6211\u4eec\u6240\u77e5\uff0c\u8fd9\u662f\u9996\u6b21\u7814\u7a76LLM\u5728\u591a\u8bed\u8a00\u80cc\u666f\u4e0b\u7684\u957f\u4e0a\u4e0b\u6587\u884c\u4e3a\u3002|\n", "2408.10147": "|**2024-08-19**|**In-Context Learning with Representations: Contextual Generalization of Trained Transformers**|Tong Yang 
et.al.|[2408.10147](http://arxiv.org/abs/2408.10147)|null|\u672c\u6587\u901a\u8fc7\u975e\u7ebf\u6027\u56de\u5f52\u4efb\u52a1\u7684\u89c6\u89d2\u6765\u63a2\u8ba8Transformer\u5728\u68af\u5ea6\u4e0b\u964d\u8fc7\u7a0b\u4e2d\u7684\u8bad\u7ec3\u52a8\u6001\u3002\u5728\u6b64\u7c7b\u4efb\u52a1\u4e2d\uff0c\u6211\u4eec\u53ef\u4ee5\u901a\u8fc7\u5b66\u4e60\u6bcf\u4e2a\u4efb\u52a1\u7684\u6a21\u677f\u51fd\u6570\u5b9e\u73b0\u4e0a\u4e0b\u6587\u6cdb\u5316\uff0c\u6240\u6709\u6a21\u677f\u51fd\u6570\u90fd\u4f4d\u4e8e\u5305\u542b$m$\u4e2a\u57fa\u51fd\u6570\u7684\u7ebf\u6027\u7a7a\u95f4\u5185\u3002\u6211\u4eec\u5bf9\u5355\u5c42\u591a\u5934Transformer\u8fdb\u884c\u4e86\u5206\u6790\uff0c\u4ee5\u5728\u90e8\u5206\u6807\u8bb0\u63d0\u793a\u4e0b\u9884\u6d4b\u672a\u6807\u8bb0\u8f93\u5165\u7684\u4e0a\u4e0b\u6587\u5185\u9884\u6d4b\u80fd\u529b\uff0c\u5176\u4e2d\u6807\u7b7e\u5305\u542b\u9ad8\u65af\u566a\u58f0\uff0c\u6bcf\u4e2a\u63d0\u793a\u4e2d\u7684\u793a\u4f8b\u6570\u91cf\u4e0d\u8db3\u4ee5\u786e\u5b9a\u6a21\u677f\u3002 \u5728\u6e29\u548c\u5047\u8bbe\u4e0b\uff0c\u6211\u4eec\u8bc1\u660e\u4e86\u5355\u5c42\u591a\u5934Transformer\u7684\u8bad\u7ec3\u635f\u5931\u4f1a\u7ebf\u6027\u6536\u655b\u81f3\u5168\u5c40\u6700\u5c0f\u503c\u3002\u6b64\u5916\uff0cTransformer\u6709\u6548\u5730\u5b66\u4e60\u4e86\u5728\u57fa\u51fd\u6570\u4e0a\u8fdb\u884c\u5cad\u56de\u5f52\u7684\u65b9\u6cd5\u3002\u636e\u6211\u4eec\u6240\u77e5\uff0c\u8fd9\u662f\u9996\u6b21\u901a\u8fc7\u7406\u8bba\u8bc1\u660e\u5c55\u793a\u4e86\u5f53\u63d0\u793a\u4ec5\u5305\u542b\u5c11\u91cf\u67e5\u8be2-\u7b54\u6848\u5bf9\u65f6\uff0cTransformer\u80fd\u591f\u5b66\u4e60\u4e0a\u4e0b\u6587\u4fe1\u606f\uff08\u5373\u6a21\u677f\uff09\u4ee5\u5bf9\u672a\u89c1\u8fc7\u7684\u793a\u4f8b\u548c\u4efb\u52a1\u8fdb\u884c\u6cdb\u5316\u3002|\n", "2408.10141": "|**2024-08-19**|**Instruction Finetuning for Leaderboard Generation from Empirical AI Research**|Salomon Kabongo 
et.al.|[2408.10141](http://arxiv.org/abs/2408.10141)|null|\u672c\u6587\u5c55\u793a\u4e86\u9884\u8bad\u7ec3\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u6307\u4ee4\u5fae\u8c03\u5728\u81ea\u52a8\u5316\u751f\u6210AI\u7814\u7a76\u6392\u884c\u699c\u4e2d\u7684\u5e94\u7528\uff0c\u4ece\u6587\u7ae0\u4e2d\u63d0\u53d6\uff08\u4efb\u52a1\uff0c\u6570\u636e\u96c6\uff0c\u6307\u6807\uff0c\u5206\u6570\uff09\u56db\u5143\u7ec4\u3002\u8be5\u7814\u7a76\u65e8\u5728\u901a\u8fc7\u4ece\u4f20\u7edf\u7684\u3001\u57fa\u4e8e\u793e\u533a\u7684\u624b\u52a8\u6574\u7406\u8f6c\u53d8\u4e3a\u5229\u7528\u81ea\u52a8\u5316\u3001\u751f\u6210\u5f0fLLM\u65b9\u6cd5\u6765\u7b80\u5316AI\u7814\u7a76\u8fdb\u5c55\u7684\u4f20\u64ad\uff0c\u4ece\u800c\u8d85\u8d8a\u4f9d\u8d56\u4e8e\u7279\u5b9a\u5206\u7c7b\u7684\u81ea\u7136\u8bed\u8a00\u63a8\u7406\uff08NLI\uff09\u6a21\u578b\u7684\u4f20\u7edf\u65b9\u5f0f\u3002\u901a\u8fc7\u5229\u7528FLAN-T5\u6a21\u578b\uff0c\u672c\u7814\u7a76\u589e\u5f3a\u4e86LLMs\u5728\u4fe1\u606f\u62bd\u53d6\u65b9\u9762\u7684\u9002\u5e94\u6027\u548c\u53ef\u9760\u6027\uff0c\u5e76\u63d0\u4f9b\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\u6765\u6784\u5efa\u7ed3\u6784\u5316\u77e5\u8bc6\u8868\u793a\u3002|\n", "2408.10124": "|**2024-08-19**|**Molecular Graph Representation Learning Integrating Large Language Models with Domain-specific Small Models**|Tianyu Zhang 
et.al.|[2408.10124](http://arxiv.org/abs/2408.10124)|**[link](https://github.com/zhangtia16/molgraph-lardo)**|**\u5206\u5b50\u5c5e\u6027\u9884\u6d4b\u662f\u836f\u7269\u53d1\u73b0\u7684\u57fa\u7840\u3002\u8fd1\u5e74\u6765\uff0c\u9884\u8bad\u7ec3\u6df1\u5ea6\u5b66\u4e60\u6a21\u578b\u5728\u8fd9\u4e00\u9886\u57df\u5f97\u5230\u4e86\u5e7f\u6cdb\u5e94\u7528\uff0c\u5e76\u53d6\u5f97\u4e86\u663e\u8457\u6210\u679c\u3002\u4e00\u4e9b\u5c06\u751f\u7269\u5316\u5b66\u9886\u57df\u7684\u5148\u9a8c\u77e5\u8bc6\u878d\u5165\u9884\u8bad\u7ec3\u6846\u67b6\u7684\u65b9\u6cd5\u8868\u73b0\u51fa\u4e86\u4ee4\u4eba\u5370\u8c61\u6df1\u523b\u7684\u6027\u80fd\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u9ad8\u5ea6\u4f9d\u8d56\u4e8e\u751f\u7269\u5316\u5b66\u4e13\u5bb6\uff0c\u83b7\u53d6\u548c\u603b\u7ed3\u5927\u91cf\u7684\u9886\u57df\u77e5\u8bc6\u6587\u732e\u65e2\u8017\u65f6\u53c8\u6602\u8d35\u3002\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u7406\u89e3\u5e76\u9ad8\u6548\u63d0\u4f9b\u901a\u7528\u77e5\u8bc6\u65b9\u9762\u8868\u73b0\u51fa\u5353\u8d8a\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u5b83\u4eec\u5076\u5c14\u4f1a\u51fa\u73b0\u5e7b\u89c9\uff0c\u5e76\u7f3a\u4e4f\u751f\u6210\u7279\u5b9a\u9886\u57df\u77e5\u8bc6\u7684\u7cbe\u786e\u6027\u3002\u4e0e\u6b64\u76f8\u53cd\uff0c\u9886\u57df\u7279\u5b9a\u5c0f\u578b\u6a21\u578b\uff08DSMs\uff09\u62e5\u6709\u4e30\u5bcc\u7684\u9886\u57df\u77e5\u8bc6\uff0c\u80fd\u591f\u51c6\u786e\u8ba1\u7b97\u4e0e\u5206\u5b50\u9886\u57df\u76f8\u5173\u7684\u6307\u6807\u3002\u7136\u800c\uff0c\u7531\u4e8e\u5b83\u4eec\u7684\u6a21\u578b\u5927\u5c0f\u6709\u9650\u4e14\u529f\u80fd\u5355\u4e00\uff0c\u5b83\u4eec\u7f3a\u4e4f\u5168\u9762\u7684\u8868\u793a\u5b66\u4e60\u6240\u9700\u7684\u5e7f\u6cdb\u77e5\u8bc6\u3002\u4e3a\u4e86\u5728\u5206\u5b50\u5c5e\u6027\u9884\u6d4b\u4e2d\u5145\u5206\u5229\u7528\u4e24\u79cd\u65b9\u6cd5\u7684\u4f18\u52bf\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aMolGraph-LarDo\u7684\u65b0\u578b\u5206\u5b50\u56fe\u8868\u793a\u5b66\u4e60\u6846
\u67b6\uff0c\u8be5\u6846\u67b6\u878d\u5408\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\u548c\u9886\u57df\u7279\u5b9a\u5c0f\u578b\u6a21\u578b\u3002\u6280\u672f\u4e0a\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u4e24\u9636\u6bb5\u63d0\u793a\u7b56\u7565\uff0c\u5176\u4e2d\u5f15\u5165DSMs\u6765\u6821\u51c6LLMs\u63d0\u4f9b\u7684\u77e5\u8bc6\uff0c\u4ece\u800c\u589e\u5f3a\u9886\u57df\u7279\u5b9a\u4fe1\u606f\u7684\u51c6\u786e\u6027\uff0c\u4f7fLLMs\u80fd\u591f\u4e3a\u5206\u5b50\u6837\u672c\u751f\u6210\u66f4\u7cbe\u786e\u7684\u6587\u5b57\u63cf\u8ff0\u3002\u968f\u540e\uff0c\u6211\u4eec\u91c7\u7528\u591a\u6a21\u6001\u5bf9\u9f50\u65b9\u6cd5\u534f\u8c03\u5305\u62ec\u5206\u5b50\u56fe\u53ca\u5176\u5bf9\u5e94\u63cf\u8ff0\u6587\u672c\u5728\u5185\u7684\u5404\u79cd\u6a21\u6001\uff0c\u4ee5\u6307\u5bfc\u5206\u5b50\u8868\u793a\u7684\u9884\u8bad\u7ec3\u3002\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u7ed3\u679c\u8bc1\u660e\u4e86\u6240\u63d0\u51fa\u65b9\u6cd5\u7684\u6709\u6548\u6027\u3002**|\n", "2408.10111": "|**2024-08-20**|**PLUTUS: A Well Pre-trained Large Unified Transformer can Unveil Financial Time Series Regularities**|Yuanjian Xu et.al.|[2408.10111](http://arxiv.org/abs/2408.10111)|null|\u91d1\u878d\u65f6\u95f4\u5e8f\u5217\u5efa\u6a21\u5bf9\u4e8e\u7406\u89e3\u4e0e\u9884\u6d4b\u5e02\u573a\u884c\u4e3a\u81f3\u5173\u91cd\u8981\uff0c\u4f46\u9762\u4e34\u7740\u975e\u7ebf\u6027\u3001\u975e\u5e73\u7a33\u6027\u548c\u9ad8\u566a\u58f0\u7b49\u6311\u6218\u3002\u4f20\u7edf\u7684\u6a21\u578b\u5728\u6355\u6349\u590d\u6742\u6a21\u5f0f\u65f6\u53d7\u5230\u8fd9\u4e9b\u56e0\u7d20\u7684\u5f71\u54cd\uff0c\u540c\u65f6\u53d7\u5230\u8ba1\u7b97\u8d44\u6e90\u548c\u6a21\u578b\u5bb9\u91cf\u7684\u9650\u5236\u3002\u53d7\u81ea\u7136\u8bed\u8a00\u5904\u7406\u9886\u57df\u5927\u578b\u8bed\u8a00\u6a21\u578b\u6210\u529f\u542f\u53d1\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a$\\textbf{PLUTUS}$\u7684\u6a21\u578b\uff0c\u5176\u5168\u79f0\u4e3a$\\textbf{P}$re-trained $\\textbf{L}$arge $\\textbf{U}$nified 
$\\textbf{T}$ransformer-based\u6a21\u578b\uff0c\u7528\u4e8e\u63ed\u793a\u91d1\u878d\u65f6\u95f4\u5e8f\u5217\u4e2d\u7684\u89c4\u5f8b\u3002$\\textbf{PLUTUS}$\u901a\u8fc7\u7ed3\u5408\u53ef\u9006\u5d4c\u5165\u6a21\u5757\u3001\u5bf9\u6bd4\u5b66\u4e60\u548c\u81ea\u52a8\u7f16\u7801\u6280\u672f\uff0c\u521b\u5efa\u4e86\u539f\u59cb\u6570\u636e\u4e0e\u5757\u5d4c\u5165\u4e4b\u95f4\u7684\u8fd1\u4f3c\u4e00\u4e00\u6620\u5c04\u3002 TimeFormer\uff0c\u4e00\u4e2a\u57fa\u4e8e\u6ce8\u610f\u529b\u7684\u67b6\u6784\uff0c\u6784\u6210\u4e86$\\textbf{PLUTUS}$\u7684\u6838\u5fc3\uff0c\u6709\u6548\u5730\u5904\u7406\u4e86\u9ad8\u566a\u58f0\u65f6\u95f4\u5e8f\u5217\u6570\u636e\u3002\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6ce8\u610f\u529b\u673a\u5236\uff0c\u4ee5\u8de8\u53d8\u91cf\u548c\u65f6\u95f4\u7ef4\u5ea6\u6355\u83b7\u7279\u5f81\u3002$\\textbf{PLUTUS}$\u5728\u89c4\u6a21\u7a7a\u524d\u76841000\u4ebf\u4e2a\u89c2\u5bdf\u503c\u7684\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u9884\u8bad\u7ec3\uff0c\u65e8\u5728\u9002\u5e94\u5608\u6742\u7684\u91d1\u878d\u5e02\u573a\u73af\u5883\u3002\u636e\u6211\u4eec\u6240\u77e5\uff0c$\\textbf{PLUTUS}$\u662f\u9996\u4e2a\u5f00\u6e90\u7684\u3001\u5927\u89c4\u6a21\u7684\u9884\u8bad\u7ec3\u91d1\u878d\u65f6\u95f4\u5e8f\u5217\u6a21\u578b\uff0c\u53c2\u6570\u8d85\u8fc7\u5341\u4ebf\u4e2a\u3002\u5b83\u5728\u5404\u79cd\u4efb\u52a1\u4e0a\u5b9e\u73b0\u4e86\u6700\u5148\u8fdb\u7684\u6027\u80fd\uff0c\u5c55\u793a\u4e86\u5f3a\u5927\u7684\u8fc1\u79fb\u6027\uff0c\u5e76\u4e3a\u91d1\u878d\u9886\u57df\u5efa\u7acb\u4e86\u4e00\u4e2a\u575a\u5b9e\u7684\u57fa\u7840\u6a21\u578b\u3002\u6211\u4eec\u7684\u7814\u7a76\u63d0\u4f9b\u4e86\u9884\u8bad\u7ec3\u91d1\u878d\u65f6\u95f4\u5e8f\u5217\u6570\u636e\u7684\u6280\u672f\u6307\u5bfc\uff0c\u786e\u7acb\u4e86\u8be5\u9886\u57df\u7684\u5168\u65b0\u6807\u51c6\u3002|\n", "2408.10086": "|**2024-08-19**|**ARMADA: Attribute-Based Multimodal Data Augmentation**|Xiaomeng Jin 
et.al.|[2408.10086](http://arxiv.org/abs/2408.10086)|null|\u5728\u591a\u6a21\u6001\u8bed\u8a00\u6a21\u578b\uff08MLMs\uff09\u4e2d\uff0c\u624b\u52a8\u6807\u6ce8\u9ad8\u8d28\u91cf\u7684\u56fe\u50cf-\u6587\u672c\u914d\u5bf9\u6570\u636e\u4ee5\u8fdb\u884c\u5fae\u8c03\u548c\u5bf9\u9f50\u7684\u6210\u672c\u975e\u5e38\u9ad8\u3002\u5c3d\u7ba1\u73b0\u6709\u7684\u591a\u6a21\u6001\u6570\u636e\u589e\u5f3a\u6846\u67b6\u63d0\u51fa\u4e86\u589e\u5f3a\u56fe\u50cf-\u6587\u672c\u914d\u5bf9\u7684\u65b9\u6cd5\uff0c\u4f46\u5b83\u4eec\u8981\u4e48\u5728\u6587\u672c\u548c\u56fe\u50cf\u4e4b\u95f4\u5b58\u5728\u8bed\u4e49\u4e0d\u4e00\u81f4\uff0c\u8981\u4e48\u751f\u6210\u4e0d\u5207\u5b9e\u9645\u7684\u56fe\u50cf\uff0c\u5bfc\u81f4\u4e0e\u73b0\u5b9e\u4e16\u754c\u793a\u4f8b\u7684\u77e5\u8bc6\u5dee\u8ddd\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aAttribute-based Multimodal Data Augmentation (ARMADA)\u7684\u65b0\u578b\u591a\u6a21\u6001\u6570\u636e\u589e\u5f3a\u65b9\u6cd5\uff0c\u901a\u8fc7\u77e5\u8bc6\u5f15\u5bfc\u7684\u63d0\u53ca\u5b9e\u4f53\u89c6\u89c9\u5c5e\u6027\u7684\u4fee\u6539\u6765\u589e\u5f3a\u6570\u636e\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u4ece\u539f\u59cb\u6587\u672c\u6570\u636e\u4e2d\u63d0\u53d6\u5b9e\u4f53\u53ca\u5176\u89c6\u89c9\u5c5e\u6027\uff0c\u7136\u540e\u5728\u77e5\u8bc6\u5e93\uff08KBs\uff09\u548c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u6307\u5bfc\u4e0b\u641c\u7d22\u89c6\u89c9\u5c5e\u6027\u7684\u66ff\u4ee3\u503c\u3002\u63a5\u7740\uff0c\u6211\u4eec\u5229\u7528\u56fe\u50cf\u7f16\u8f91\u6a21\u578b\u6839\u636e\u63d0\u53d6\u7684\u5c5e\u6027\u7f16\u8f91\u56fe\u50cf\u3002ARMADA\u662f\u4e00\u4e2a\u65b0\u9896\u7684\u591a\u6a21\u6001\u6570\u636e\u751f\u6210\u6846\u67b6\uff1a(i) 
\u4ece\u7b26\u53f7\u77e5\u8bc6\u5e93\u4e2d\u63d0\u53d6\u77e5\u8bc6\u5173\u8054\u7684\u5c5e\u6027\uff0c\u5b9e\u73b0\u8bed\u4e49\u4e00\u81f4\u4e14\u5177\u6709\u533a\u522b\u7684\u56fe\u50cf-\u6587\u672c\u5bf9\u751f\u6210\uff1b(ii) \u5229\u7528\u77e5\u8bc6\u5e93\u5c42\u6b21\u7ed3\u6784\u4e2d\u7684\u540c\u7c7b\u522b\u5b9e\u4f53\u751f\u6210\u89c6\u89c9\u4e0a\u76f8\u4f3c\u4f46\u4e0d\u540c\u7c7b\u522b\u7684\u56fe\u50cf\uff1b(iii) \u4f7f\u7528LLMs\u7684\u5e38\u8bc6\u77e5\u8bc6\u8c03\u8282\u8f85\u52a9\u89c6\u89c9\u5c5e\u6027\uff0c\u5982\u80cc\u666f\uff0c\u4ee5\u66f4\u5168\u9762\u5730\u8868\u793a\u539f\u59cb\u5b9e\u4f53\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u8bc1\u660e\uff0c\u5728\u56db\u4e2a\u4e0b\u6e38\u4efb\u52a1\u4e0a\uff0c\u6211\u4eec\u7684\u6846\u67b6\u80fd\u591f\u4ea7\u751f\u9ad8\u8d28\u91cf\u7684\u6570\u636e\u5e76\u63d0\u9ad8\u6a21\u578b\u6027\u80fd\u3002\u8fd9\u4e5f\u5f3a\u8c03\u4e86\u5229\u7528\u5916\u90e8\u77e5\u8bc6\u4ee3\u7406\u4ee5\u589e\u5f3a\u53ef\u89e3\u91ca\u6027\u548c\u73b0\u5b9e\u4e16\u754c\u76f8\u5173\u6027\u7684\u5fc5\u8981\u6027\u3002|\n", "2408.10072": "|**2024-08-19**|**FFAA: Multimodal Large Language Model based Explainable Open-World Face Forgery Analysis Assistant**|Zhengchao Huang 
et.al.|[2408.10072](http://arxiv.org/abs/2408.10072)|null|\u5feb\u901f\u53d1\u5c55\u7684\u6df1\u5ea6\u4f2a\u9020\u6280\u672f\u5f15\u53d1\u4e86\u516c\u4f17\u7684\u5e7f\u6cdb\u5173\u6ce8\uff0c\u5c24\u5176\u662f\u5728\u5bf9\u516c\u5171\u4fe1\u606f\u5b89\u5168\u6784\u6210\u4e25\u91cd\u5a01\u80c1\u7684\u9762\u90e8\u4f2a\u9020\u65b9\u9762\u3002\u7136\u800c\uff0c\u672a\u77e5\u548c\u591a\u6837\u7684\u4f2a\u9020\u6280\u672f\u3001\u591a\u53d8\u7684\u9762\u90e8\u7279\u5f81\u4ee5\u53ca\u590d\u6742\u7684\u73af\u5883\u56e0\u7d20\u7ed9\u9762\u90e8\u4f2a\u9020\u5206\u6790\u5e26\u6765\u4e86\u5de8\u5927\u6311\u6218\u3002\u73b0\u6709\u6570\u636e\u96c6\u5728\u63cf\u8ff0\u8fd9\u4e9b\u65b9\u9762\u65f6\u5b58\u5728\u4e0d\u8db3\uff0c\u4f7f\u5f97\u4ec5\u901a\u8fc7\u89c6\u89c9\u4fe1\u606f\u96be\u4ee5\u5728\u5404\u79cd\u5e72\u6270\u56e0\u7d20\u4e2d\u533a\u5206\u771f\u5b9e\u4e0e\u4f2a\u9020\u7684\u9762\u90e8\u3002\u6b64\u5916\uff0c\u73b0\u6709\u7684\u65b9\u6cd5\u672a\u80fd\u63d0\u4f9b\u7528\u6237\u53cb\u597d\u4e14\u53ef\u89e3\u91ca\u7684\u7ed3\u679c\uff0c\u590d\u6742\u5316\u4e86\u6a21\u578b\u51b3\u7b56\u8fc7\u7a0b\u7684\u7406\u89e3\u3002 
\u4e3a\u89e3\u51b3\u8fd9\u4e9b\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u9879\u65b0\u9896\u7684\u201c\u5f00\u653e\u4e16\u754c\u9762\u90e8\u4f2a\u9020\u5206\u6790\u95ee\u7b54\u201d\uff08OW-FFA-VQA\uff09\u4efb\u52a1\u53ca\u5176\u76f8\u5e94\u7684\u57fa\u51c6\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e00\u4efb\u52a1\uff0c\u6211\u4eec\u9996\u5148\u5efa\u7acb\u4e86\u4e00\u4e2a\u5305\u542b\u771f\u5b9e\u548c\u4f2a\u9020\u9762\u90e8\u56fe\u50cf\u7684\u591a\u6837\u96c6\u5408\uff0c\u5e76\u914d\u6709\u5173\u952e\u63cf\u8ff0\u548c\u53ef\u9760\u4f2a\u9020\u63a8\u7406\u7684\u6570\u636e\u96c6\u3002\u57fa\u4e8e\u6b64\u6570\u636e\u96c6\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u201c\u9762\u90e8\u4f2a\u9020\u5206\u6790\u52a9\u624b\u201d\uff08FFAA\uff09\uff0c\u5b83\u7531\u4e00\u4e2a\u5fae\u8c03\u7684\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\u548c\u4e00\u4e2a\u591a\u7b54\u6848\u667a\u80fd\u51b3\u7b56\u7cfb\u7edf\uff08MIDS\uff09\u7ec4\u6210\u3002\u901a\u8fc7\u7ed3\u5408\u5047\u8bbe\u6027\u63d0\u793a\u4e0eMIDS\uff0c\u6709\u6548\u6d88\u9664\u4e86\u6a21\u7cca\u5206\u7c7b\u8fb9\u754c\u7684\u5f71\u54cd\u529b\uff0c\u589e\u5f3a\u4e86\u6a21\u578b\u7684\u9c81\u68d2\u6027\u3002\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u4e0d\u4ec5\u63d0\u4f9b\u4e86\u7528\u6237\u53cb\u597d\u7684\u53ef\u89e3\u91ca\u7ed3\u679c\uff0c\u800c\u4e14\u5728\u51c6\u786e\u6027\u4e0e\u9c81\u68d2\u6027\u65b9\u9762\u663e\u8457\u8d85\u8d8a\u4e86\u4ee5\u5f80\u7684\u65b9\u6cd5\u3002|\n", "2408.11053": "|**2024-08-20**|**Revisiting VerilogEval: Newer LLMs, In-Context Learning, and Specification-to-RTL Tasks**|Nathaniel Pinckney 
et.al.|[2408.11053](http://arxiv.org/abs/2408.11053)|**[link](https://github.com/nvlabs/verilog-eval)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u6570\u5b57\u786c\u4ef6\u4ee3\u7801\u751f\u6210\u9886\u57df\u7684\u5e94\u7528\u662f\u4e00\u4e2a\u65b0\u5174\u9886\u57df\u3002\u5927\u591a\u6570LLM\u4e3b\u8981\u662f\u5728\u81ea\u7136\u8bed\u8a00\u548c\u8f6f\u4ef6\u4ee3\u7801\u4e0a\u8fdb\u884c\u8bad\u7ec3\u7684\u3002\u786c\u4ef6\u4ee3\u7801\uff0c\u5982Verilog\uff0c\u53ea\u5360\u8bad\u7ec3\u6570\u636e\u7684\u4e00\u5c0f\u90e8\u5206\uff0c\u800c\u4e14\u5f88\u5c11\u6709\u786c\u4ef6\u57fa\u51c6\u5b58\u5728\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7f3a\u53e3\uff0c2023\u5e74\u53d1\u5e03\u4e86\u4e00\u4e2a\u540d\u4e3aVerilogEval\u7684\u5f00\u6e90\u57fa\u51c6\uff0c\u5b83\u63d0\u4f9b\u4e86\u4e00\u4e2a\u4e00\u81f4\u7684\u8bc4\u4f30\u6846\u67b6\uff0c\u7528\u4e8eLLM\u5728\u4ee3\u7801\u5b8c\u6210\u4efb\u52a1\u4e0a\u7684\u6027\u80fd\u3002\u8be5\u57fa\u51c6\u5728\u5f53\u65f6\u7684\u9886\u5148\u6a21\u578b\uff0c\u5305\u62ecGPT-4\uff0c\u8fdb\u884c\u4e86\u6d4b\u8bd5\u3002\u7136\u800c\uff0cVerilogEval\u548c\u5176\u4ed6Verilog\u751f\u6210\u57fa\u51c6\u7f3a\u4e4f\u5931\u8d25\u5206\u6790\uff0c\u5f53\u524d\u5f62\u5f0f\u4e0b\u4e5f\u4e0d\u5229\u4e8e\u63a2\u7d22\u63d0\u793a\u6280\u672f\u3002\u6b64\u5916\uff0c\u5728VerilogEval\u53d1\u5e03\u540e\uff0c\u5546\u4e1a\u548c\u5f00\u6e90\u6a21\u578b\u90fd\u7ecf\u5386\u4e86\u6301\u7eed\u7684\u53d1\u5c55\u3002 
\u5728\u8fd9\u4e2a\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u8bc4\u4f30\u4e86\u65b0\u53d1\u5e03\u7684\u5546\u4e1a\u548c\u5f00\u6e90\u6a21\u578b\u7684\u4e0d\u540c\u89c4\u6a21\uff0c\u9488\u5bf9\u6539\u8fdb\u540e\u7684VerilogEval\u57fa\u51c6\u5957\u4ef6\u3002\u6211\u4eec\u589e\u5f3a\u4e86VerilogEval\u7684\u57fa\u7840\u67b6\u6784\u548c\u6570\u636e\u96c6\uff0c\u901a\u8fc7\u81ea\u52a8\u5206\u7c7b\u5931\u8d25\uff0c\u5f15\u5165\u4e86\u652f\u6301\u4e0a\u4e0b\u6587\u5b66\u4e60\uff08ICL\uff09\u793a\u4f8b\u7684\u65b0\u63d0\u793a\uff0c\u5e76\u6269\u5c55\u4e86\u652f\u6301\u7684\u4efb\u52a1\u5230\u89c4\u683c\u5230RTL\u8f6c\u6362\u3002\u6211\u4eec\u53d1\u73b0\u5546\u4e1a\u9886\u57df\u7684\u6700\u65b0\u6a21\u578b\u6709\u4e86\u53ef\u6d4b\u91cf\u7684\u6539\u8fdb\uff0c\u5176\u4e2dGPT-4 Turbo\u5728\u89c4\u683c\u5230RTL\u4efb\u52a1\u4e0a\u8fbe\u5230\u4e8659%\u7684\u6210\u529f\u7387\u3002\u6211\u4eec\u4e5f\u7814\u7a76\u4e86\u65b0\u51fa\u73b0\u7684\u5f00\u6e90\u548c\u9886\u57df\u7279\u5b9a\u6a21\u578b\u7684\u6027\u80fd\uff0c\u5e76\u5c55\u793a\u4e86\u6a21\u578b\u4ece\u4e0a\u4e0b\u6587\u5b66\u4e60\u4e2d\u83b7\u5f97\u663e\u8457\u76ca\u5904\u7684\u53ef\u80fd\u6027\u3002\u6211\u4eec\u53d1\u73b0\u6700\u8fd1\u53d1\u5e03\u7684Llama 3.1 405B\u6a21\u578b\u5728\u6027\u80fd\u4e0a\u4e0eGPT-4 Turbo\u76f8\u5f53\uff0c\u5b9e\u73b0\u4e8658%\u7684\u6210\u529f\u7387\uff0c\u800c\u8f83\u5c0f\u7684\u9886\u57df\u7279\u5b9a\u7684RTL-Coder 6.7B\u6a21\u578b\u5219\u53d6\u5f97\u4e86\u4ee4\u4eba\u5370\u8c61\u6df1\u523b\u768437%\u7684\u6210\u529f\u7387\u3002\u7136\u800c\uff0c\u63d0\u793a\u5de5\u7a0b\u5bf9\u4e8e\u5b9e\u73b0\u826f\u597d\u7684\u6210\u529f\u7387\u81f3\u5173\u91cd\u8981\uff0c\u5e76\u4e14\u968f\u7740\u6a21\u578b\u548c\u4efb\u52a1\u7684\u53d8\u5316\u800c\u53d8\u5316\u3002\u4e00\u4e2a\u5141\u8bb8\u8fdb\u884c\u63d0\u793a\u5de5\u7a0b\u548c\u5931\u8d25\u5206\u6790\u7684\u57fa\u51c6\u57fa\u7840\u8bbe\u65bd\u5bf9\u4e8e\u6301\u7eed\u7684\u6a21\u578b\u5f00\u53d1\u548c\u90e8\u7f72\u81f3\u5173\u91cd\u8981\u3002|\n", 
"2408.11051": "|**2024-08-20**|**FLAME: Learning to Navigate with Multimodal LLM in Urban Environments**|Yunzhe Xu et.al.|[2408.11051](http://arxiv.org/abs/2408.11051)|**[link](https://github.com/xyz9911/FLAME)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u89c6\u89c9\u4e0e\u8bed\u8a00\u5bfc\u822a\uff08VLN\uff09\u4efb\u52a1\u4e2d\u5c55\u73b0\u51fa\u4e86\u6f5c\u5728\u80fd\u529b\uff0c\u4f46\u5f53\u524d\u7684\u5e94\u7528\u4ecd\u9762\u4e34\u6311\u6218\u3002\u867d\u7136LLM\u5728\u901a\u7528\u5bf9\u8bdd\u573a\u666f\u4e2d\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5728\u4e13\u95e8\u7684\u5bfc\u822a\u4efb\u52a1\u4e0a\u5374\u8868\u73b0\u4e0d\u4f73\uff0c\u76f8\u8f83\u4e8e\u4e13\u4e3aVLN\u8bbe\u8ba1\u7684\u6a21\u578b\uff0c\u5176\u6027\u80fd\u5f80\u5f80\u8f83\u4f4e\u4e0b\u3002\u6211\u4eec\u5f15\u5165\u4e86FLAME\uff08FLAMingo\u67b6\u6784\u5316\u5b9e\u4f53\u4ee3\u7406\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u57fa\u4e8e\u591a\u6a21\u6001LLM\u7684\u65b0\u578b\u4ee3\u7406\u548c\u67b6\u6784\uff0c\u65e8\u5728\u89e3\u51b3\u57ce\u5e02VLN\u4efb\u52a1\uff0c\u5e76\u80fd\u9ad8\u6548\u5904\u7406\u591a\u4e2a\u89c2\u5bdf\u7ed3\u679c\u3002 
\u6211\u4eec\u7684\u65b9\u6cd5\u91c7\u7528\u4e86\u4e09\u9636\u6bb5\u8c03\u4f18\u6280\u672f\u4ee5\u5b9e\u73b0\u5bf9\u5bfc\u822a\u4efb\u52a1\u7684\u6709\u6548\u9002\u5e94\uff1a\u5355\u611f\u77e5\u8c03\u6574\u7528\u4e8e\u8857\u9053\u89c6\u56fe\u63cf\u8ff0\u3001\u591a\u611f\u77e5\u8c03\u6574\u7528\u4e8e\u8f68\u8ff9\u603b\u7ed3\u4ee5\u53ca\u7aef\u5230\u7aef\u8bad\u7ec3\u5728VLN\u6570\u636e\u96c6\u4e0a\u7684\u7efc\u5408\u80fd\u529b\u3002\u751f\u6210\u7684\u6570\u636e\u96c6\u901a\u8fc7\u81ea\u52a8\u5316\u8fc7\u7a0b\u5408\u6210\u800c\u6210\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cFLAME\u5728Touchdown\u6570\u636e\u96c6\u4e0a\u7684\u4efb\u52a1\u5b8c\u6210\u7387\u8f83\u73b0\u6709\u65b9\u6cd5\u63d0\u9ad8\u4e867.3%\uff0c\u8d85\u8d8a\u4e86\u6700\u5148\u8fdb\u7684\u65b9\u6cd5\u3002\u8fd9\u9879\u5de5\u4f5c\u5c55\u793a\u4e86\u591a\u6a21\u6001LLM\u5728\u590d\u6742\u5bfc\u822a\u4efb\u52a1\u4e2d\u7684\u6f5c\u529b\uff0c\u4ee3\u8868\u4e86\u5411\u5b9e\u9645\u5e94\u7528\u591a\u6a21\u6001LLM\u4e8e\u5b9e\u4f53\u4eba\u5de5\u667a\u80fd\u9886\u57df\u8fc8\u51fa\u7684\u91cd\u8981\u4e00\u6b65\u3002\u9879\u76ee\u9875\u9762\uff1ahttps://flame-sjtu.github.io**|\n", "2408.11049": "|**2024-08-20**|**MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding**|Jian Chen 
et.al.|[2408.11049](http://arxiv.org/abs/2408.11049)|**[link](https://github.com/infini-ai-lab/magicdec)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u8bf8\u5982\u4ea4\u4e92\u5f0f\u804a\u5929\u673a\u5668\u4eba\u3001\u6587\u6863\u5206\u6790\u548c\u4ee3\u7406\u5de5\u4f5c\u6d41\u7a0b\u7b49\u957f\u671f\u4e0a\u4e0b\u6587\u5e94\u7528\u4e2d\u53d8\u5f97\u8d8a\u6765\u8d8a\u666e\u904d\uff0c\u4f46\u63d0\u4f9b\u957f\u4e0a\u4e0b\u6587\u8bf7\u6c42\u65f6\uff0c\u8981\u5b9e\u73b0\u4f4e\u5ef6\u8fdf\u548c\u9ad8\u541e\u5410\u91cf\u662f\u4e00\u4e2a\u6311\u6218\u3002\u63a8\u6d4b\u6027\u89e3\u7801\uff08SD\uff09\u662f\u4e00\u79cd\u5e7f\u6cdb\u4f7f\u7528\u7684\u964d\u4f4e\u5ef6\u8fdf\u7684\u6280\u672f\uff0c\u4f20\u7edf\u89c2\u70b9\u8ba4\u4e3a\u5176\u6548\u80fd\u4ec5\u9650\u4e8e\u8f83\u5c0f\u7684\u6279\u6b21\u5927\u5c0f\u3002\u7136\u800c\uff0c\u5728MagicDec\u4e2d\uff0c\u6211\u4eec\u63ed\u793a\u4e86\u4ee4\u4eba\u60ca\u8bb6\u7684\u4e8b\u5b9e\uff1a\u5373\u4f7f\u5728\u9ad8\u541e\u5410\u91cf\u63a8\u7406\u73af\u5883\u4e2d\uff0c\u5bf9\u4e8e\u4e2d\u7b49\u5230\u8f83\u957f\u5e8f\u5217\uff0cSD\u4ecd\u80fd\u5b9e\u73b0\u52a0\u901f\u3002\u66f4\u6709\u8da3\u7684\u662f\uff0c\u57fa\u4e8e\u6211\u4eec\u7684\u4e25\u8c28\u5206\u6790\uff0c\u4e00\u79cd\u667a\u80fd\u8d77\u8349\u7b56\u7565\u53ef\u4ee5\u5728\u6279\u6b21\u5927\u5c0f\u589e\u52a0\u65f6\u83b7\u5f97\u66f4\u597d\u7684\u52a0\u901f\u6548\u679c\u3002 MagicDec\u9996\u5148\u8bc6\u522b\u51fa\u968f\u7740\u6279\u6b21\u5927\u5c0f\u548c\u5e8f\u5217\u957f\u5ea6\u589e\u52a0\u7684\u74f6\u9888\u8f6c\u79fb\uff0c\u5e76\u5229\u7528\u8fd9\u4e9b\u6d1e\u5bdf\u6765\u66f4\u6709\u6548\u5730\u90e8\u7f72\u63a8\u6d4b\u6027\u89e3\u7801\u4ee5\u652f\u6301\u9ad8\u541e\u5410\u91cf\u63a8\u7406\u3002\u7136\u540e\uff0c\u5b83\u901a\u8fc7\u5229\u7528\u7a00\u758fKV\u7f13\u5b58\u7684\u8349\u6848\u6a21\u578b\u6765\u89e3\u51b3\u968f\u7740\u5e8f\u5217\u957f\u5ea6\u548c\u6279\u6b21\u5927\u5c0f\u589e\u52a0\u800c\u6269\u5c55\u7684KV\u74f6\u9888\u95ee\u9898\u3002|\n", "2408.11043": 
"|**2024-08-20**|**Reconciling Methodological Paradigms: Employing Large Language Models as Novice Qualitative Research Assistants in Talent Management Research**|Sreyoshi Bhaduri et.al.|[2408.11043](http://arxiv.org/abs/2408.11043)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u5229\u7528\u57fa\u4e8e\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u6765\u5206\u6790\u8bbf\u8c08\u8bb0\u5f55\uff0c\u4ee5\u89e3\u51b3\u624b\u52a8\u5206\u6790\u5b9a\u6027\u6570\u636e\u9700\u8981\u5927\u91cf\u65f6\u95f4\u548c\u52aa\u529b\u7684\u95ee\u9898\u3002\u7814\u7a76\u65e8\u5728\u5c06\u7814\u7a76\u95ee\u9898\u8bbe\u5b9a\u4e3a\u7531LLM\u4f5c\u4e3a\u521d\u7ea7\u7814\u7a76\u52a9\u624b\u8fdb\u884c\u8f85\u52a9\u7684\u6a21\u5f0f\u3002\u672c\u7814\u7a76\u63a2\u8ba8\u4e86\u5c06LLM\u89c6\u4e3a\u4eba\u624d\u7ba1\u7406\u9886\u57df\u7814\u7a76\u4eba\u5458\u7684\u521d\u7ea7\u8d28\u6027\u7814\u7a76\u52a9\u624b\u7684\u601d\u7ef4\u6a21\u578b\u3002\u901a\u8fc7\u6269\u5c55\u57fa\u4e8eRAG\u7684LLM\u65b9\u6cd5\uff0c\u672c\u6587\u5c55\u793a\u4e86\u8fd9\u4e9b\u6a21\u578b\u5728\u5bf9\u534a\u7ed3\u6784\u5316\u8bbf\u8c08\u6570\u636e\u8fdb\u884c\u4e3b\u9898\u5efa\u6a21\u65b9\u9762\u7684\u7075\u6d3b\u6027\uff0c\u8d85\u8d8a\u4e86\u5b83\u4eec\u5728\u4fe1\u606f\u68c0\u7d22\u548c\u641c\u7d22\u4e2d\u7684\u4f20\u7edf\u5e94\u7528\u3002 
\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0c\u57fa\u4e8eLLM\u7684RAG\u65b9\u6cd5\u80fd\u591f\u6210\u529f\u63d0\u53d6\u611f\u5174\u8da3\u7684\u8bae\u9898\uff0c\u4e0e\u4ece\u540c\u4e00\u6570\u636e\u96c6\u624b\u52a8\u751f\u6210\u7684\u4e3b\u9898\u76f8\u6bd4\uff0c\u8986\u76d6\u8303\u56f4\u663e\u8457\u66f4\u9ad8\u3002\u8fd9\u8bc1\u660e\u4e86\u4f7f\u7528LLM\u4f5c\u4e3a\u521d\u7ea7\u8d28\u6027\u7814\u7a76\u52a9\u624b\u7684\u53ef\u884c\u6027\u3002\u6b64\u5916\uff0c\u7814\u7a76\u5efa\u8bae\uff0c\u4f7f\u7528\u6b64\u7c7b\u6a21\u578b\u7684\u7814\u7a76\u8005\u5e94\u4e25\u683c\u9075\u5faa\u4f20\u7edf\u8d28\u6027\u7814\u7a76\u4e2d\u4f7f\u7528\u7684\u8d28\u91cf\u6807\u51c6\uff0c\u4ee5\u786e\u4fdd\u5176\u65b9\u6cd5\u7684\u4e25\u8c28\u6027\u548c\u53ef\u9760\u6027\u3002 \u6700\u540e\uff0c\u8bba\u6587\u63d0\u51fa\u4e86\u9488\u5bf9\u5e0c\u671b\u5c06LLM\u4e0e\u73b0\u6709\u8d28\u6027\u7814\u7a76\u8303\u5f0f\u76f8\u878d\u5408\u7684\u884c\u4e1a\u5b9e\u8df5\u8005\u7684\u5173\u952e\u5efa\u8bae\uff0c\u63d0\u4f9b\u4e86\u4e00\u6761\u6709\u6548\u6574\u5408\u8fd9\u4e9b\u5f3a\u5927\u4f46\u521d\u7ea7\u7684\u4eba\u5de5\u667a\u80fd\u5de5\u5177\u5728\u5b9a\u6027\u6570\u636e\u5206\u6790\u4e2d\u7684\u8def\u5f84\uff0c\u7279\u522b\u662f\u5728\u4eba\u624d\u9886\u57df\u3002|\n", "2408.11029": "|**2024-08-20**|**Scaling Law with Learning Rate Annealing**|Howe Tissue et.al.|[2408.11029](http://arxiv.org/abs/2408.11029)|null|\u6211\u4eec\u53d1\u73b0\u795e\u7ecf\u8bed\u8a00\u6a21\u578b\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\uff0c\u4ea4\u53c9\u71b5\u635f\u5931\u66f2\u7ebf\u9075\u5faa\u4e86\u4e00\u4e2a\u4e0e\u5b66\u4e60\u7387\uff08LR\uff09\u8870\u51cf\u76f8\u5173\u7684\u7f29\u653e\u5b9a\u5f8b\uff1a$L(s) = L_0 + A\\cdot S_1^{-\\alpha} - C\\cdot 
S_2$\u3002\u5176\u4e2d\uff0c$S_1$\u4ee3\u8868\u524d\u5411\u533a\u57df\uff0c$S_2$\u4ee3\u8868\u5b66\u4e60\u7387\u8870\u51cf\u533a\u57df\u3002\u8fd9\u4e00\u516c\u5f0f\u8003\u8651\u4e86\u4e24\u4e2a\u56e0\u7d20\uff1a\uff081\uff09\u4f20\u7edf\u7684\u7f29\u653e\u5f8b\u5b9a\u4e49\u7684\u524d\u5411\u7f29\u653e\uff1b\u4ee5\u53ca\uff082\uff09\u5b66\u4e60\u7387\u8870\u51cf\u5e26\u6765\u7684\u989d\u5916\u635f\u5931\u4e0b\u964d\u3002\u56e0\u6b64\uff0c\u8be5\u516c\u5f0f\u80fd\u591f\u63cf\u8ff0\u6bcf\u4e2a\u6b65\u9aa4\u7684\u5b8c\u6574\u635f\u5931\u66f2\u7ebf\uff0c\u800c\u975e\u4ec5\u9650\u4e8e\u8bad\u7ec3\u7ed3\u675f\u65f6\u7684\u5355\u4e00\u635f\u5931\u70b9\u3002\u901a\u8fc7\u5e94\u7528\u5305\u542b\u5b66\u4e60\u7387\u8870\u51cf\u7684\u7f29\u653e\u5f8b\uff0c\u5e76\u4ec5\u901a\u8fc7\u4e00\u5230\u4e24\u6b21\u8bad\u7ec3\u66f2\u7ebf\u62df\u5408\uff0c\u6211\u4eec\u80fd\u591f\u51c6\u786e\u9884\u6d4b\u8bed\u8a00\u6a21\u578b\u8bad\u7ec3\u5728\u4efb\u4f55\u7ed9\u5b9a\u6b65\u9aa4\u548c\u4efb\u4f55\u5b66\u4e60\u7387\u8c03\u5ea6\uff08LRS\uff09\u4e0b\u7684\u635f\u5931\u3002 
\u6b64\u5916\uff0c\u8fd9\u4e00\u65b9\u7a0b\u51c6\u786e\u5730\u63cf\u8ff0\u4e86\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u7684\u52a8\u6001\uff0c\u5e76\u4e3a\u5148\u524d\u7814\u7a76\u4e2d\u5173\u6ce8\u7684\u5b66\u4e60\u7387\u8c03\u5ea6\u548c\u5b66\u4e60\u7387\u8870\u51cf\u7684\u76f8\u5173\u5b9e\u9a8c\u53d1\u73b0\u63d0\u4f9b\u4e86\u7406\u8bba\u9a8c\u8bc1\u548c\u89e3\u91ca\u3002\u7531\u6b64\u4ea7\u751f\u7684\u6d1e\u5bdf\uff0c\u4e5f\u4e3a\u7814\u7a76\u4eba\u5458\u5728\u5f00\u53d1\u5927\u578b\u8bed\u8a00\u6a21\u578b\u65f6\u63d0\u524d\u9009\u62e9\u5173\u952e\u7684\u5b66\u4e60\u7387\u8c03\u5ea6\u7b56\u7565\u63d0\u4f9b\u4e86\u6307\u5bfc\u3002\u6700\u91cd\u8981\u7684\u662f\uff0c\u7531\u4e8e\u6574\u4e2a\u8bad\u7ec3\u66f2\u7ebf\u4e0a\u7684\u6240\u6709\u70b9\u90fd\u9075\u5faa\u8be5\u65b9\u7a0b\uff0c\u6211\u4eec\u53ef\u4ee5\u5728\u4efb\u4f55\u7ed9\u5b9a\u6b65\u9aa4\u548c\u4efb\u4f55\u5b66\u4e60\u7387\u8c03\u5ea6\u4e0b\u5b9e\u73b0\u51c6\u786e\u7684\u635f\u5931\u9884\u6d4b\uff0c\u800c\u6240\u9700\u8ba1\u7b97\u6210\u672c\u4ec5\u4e3a\u4f7f\u7528\u5c0f\u677e\u9f20\u7f29\u653e\u6cd5\u5219\u62df\u5408\u8bed\u8a00\u6a21\u578b\u635f\u5931\u6240\u9700\u76841%\u4ee5\u4e0b\u3002\u8fd9\u4e00\u65b9\u6cd5\u6781\u5927\u5730\u4fc3\u8fdb\u4e86\u7f29\u653e\u5f8b\u62df\u5408\u548c\u9884\u6d4b\u5728\u5f00\u53d1\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8fc7\u7a0b\u4e2d\u7684\u666e\u53ca\u6027\u3002|\n", "2408.11021": "|**2024-08-20**|**Athena: Safe Autonomous Agents with Verbal Contrastive Learning**|Tanmana Sadhu 
et.al.|[2408.11021](http://arxiv.org/abs/2408.11021)|null|\u7531\u4e8e\u65b0\u5174\u80fd\u529b\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u88ab\u7528\u4f5c\u57fa\u4e8e\u8bed\u8a00\u7684\u4ee3\u7406\uff0c\u6267\u884c\u5404\u79cd\u4efb\u52a1\u5e76\u4ee5\u4e0d\u65ad\u589e\u957f\u7684\u7a0b\u5ea6\u81ea\u4e3b\u505a\u51fa\u51b3\u7b56\u3002\u8fd9\u4e9b\u81ea\u4e3b\u4ee3\u7406\u80fd\u591f\u7406\u89e3\u9ad8\u7ea7\u6307\u4ee4\u3001\u4e0e\u73af\u5883\u4e92\u52a8\uff0c\u5e76\u4f7f\u7528\u53ef\u7528\u7ed9\u5b83\u4eec\u7684\u5de5\u5177\u96c6\u6267\u884c\u590d\u6742\u4efb\u52a1\u3002\u968f\u7740\u4ee3\u7406\u80fd\u529b\u7684\u6269\u5c55\uff0c\u786e\u4fdd\u5b83\u4eec\u7684\u5b89\u5168\u6027\u548c\u53ef\u4fe1\u5ea6\u53d8\u5f97\u8d8a\u6765\u8d8a\u91cd\u8981\u3002\u5728\u8fd9\u9879\u7814\u7a76\u4e2d\uff0c\u6211\u4eec\u5f15\u5165\u4e86Athena\u6846\u67b6\uff0c\u5b83\u5229\u7528\u4e86\u53e3\u5934\u5bf9\u6bd4\u5b66\u4e60\u7684\u6982\u5ff5\uff0c\u901a\u8fc7\u5c06\u8fc7\u53bb\u5b89\u5168\u548c\u4e0d\u5b89\u5168\u7684\u8f68\u8ff9\u4f5c\u4e3a\u4e0a\u4e0b\u6587\uff08\u5bf9\u6bd4\uff09\u793a\u4f8b\u6765\u6307\u5bfc\u4ee3\u7406\u5411\u5b89\u5168\u6027\u53d1\u5c55\uff0c\u540c\u65f6\u5b8c\u6210\u7ed9\u5b9a\u7684\u4efb\u52a1\u3002\u8be5\u6846\u67b6\u8fd8\u6574\u5408\u4e86\u4e00\u4e2a\u6279\u5224\u6027\u673a\u5236\uff0c\u5728\u6bcf\u4e2a\u6b65\u9aa4\u4e0a\u5f15\u5bfc\u4ee3\u7406\u907f\u514d\u98ce\u9669\u884c\u4e3a\u3002\u6b64\u5916\uff0c\u7531\u4e8e\u7f3a\u4e4f\u5bf9\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u5b89\u5168\u63a8\u7406\u80fd\u529b\u7684\u73b0\u6709\u57fa\u51c6\uff0c\u6211\u4eec\u6536\u96c6\u4e86\u6db5\u76d68\u4e2a\u7c7b\u522b\u5171\u8ba180\u4e2a\u5de5\u5177\u5305\u548c180\u4e2a\u573a\u666f\u7684\u4e00\u7ec4\u6570\u636e\u96c6\uff0c\u63d0\u4f9b\u4e86\u4e00\u79cd\u5b89\u5168\u8bc4\u4f30\u57fa\u51c6\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u8bc4\u4f30\u8868\u660e\uff0c\u53e3\u5934\u5bf9\u6bd4\u5b66\u4e60\u548c\u4ea4\u4e92\u7ea7\u6279\u5224\u6027\u601d\u8003\u663e\u8457\u63d0\u9ad8\u4e86\u5
b89\u5168\u6027\u7387\u3002|\n", "2408.11006": "|**2024-08-20**|**While GitHub Copilot Excels at Coding, Does It Ensure Responsible Output?**|Wen Cheng et.al.|[2408.11006](http://arxiv.org/abs/2408.11006)|**[link](https://github.com/sensente/security-attacks-on-lccts)**|**\u5feb\u901f\u53d1\u5c55\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u4ee3\u7801\u8865\u5168\u80fd\u529b\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\uff0c\u50ac\u751f\u4e86\u65b0\u4e00\u4ee3\u57fa\u4e8eLLM\u7684\u4ee3\u7801\u8865\u5168\u5de5\u5177\uff08LCCT\uff09\u3002\u4e0e\u901a\u7528LLM\u4e0d\u540c\uff0c\u8fd9\u4e9b\u5de5\u5177\u5177\u6709\u72ec\u7279\u7684\u64cd\u4f5c\u6d41\u7a0b\uff0c\u6574\u5408\u591a\u79cd\u4fe1\u606f\u6e90\u4f5c\u4e3a\u8f93\u5165\uff0c\u5e76\u4f18\u5148\u8003\u8651\u4ee3\u7801\u5efa\u8bae\u800c\u975e\u81ea\u7136\u8bed\u8a00\u4ea4\u4e92\uff0c\u8fd9\u5f15\u5165\u4e86\u7279\u5b9a\u7684\u5b89\u5168\u6311\u6218\u3002\u6b64\u5916\uff0cLCCT\u901a\u5e38\u4f9d\u8d56\u4e8e\u4e13\u6709\u4ee3\u7801\u6570\u636e\u96c6\u8fdb\u884c\u8bad\u7ec3\uff0c\u5f15\u53d1\u4e86\u5173\u4e8e\u654f\u611f\u6570\u636e\u6cc4\u9732\u7684\u62c5\u5fe7\u3002\u672c\u6587\u5229\u7528LCCT\u7684\u72ec\u7279\u7279\u6027\uff0c\u5f00\u53d1\u4e86\u9488\u5bf9\u4e24\u79cd\u5173\u952e\u5b89\u5168\u98ce\u9669\u7684\u9488\u5bf9\u6027\u653b\u51fb\u65b9\u6cd5\uff1a\u8d8a\u72f1\u653b\u51fb\u548c\u8bad\u7ec3\u6570\u636e\u63d0\u53d6\u653b\u51fb\u3002 \u5b9e\u9a8c\u7ed3\u679c\u63ed\u793a\u4e86LCCT\u4e2d\u5b58\u5728\u7684\u91cd\u5927\u6f0f\u6d1e\uff0c\u5305\u62ec\u5728GitHub Copilot\u4e0a\u768499.4%\u6210\u529f\u8d8a\u72f1\u653b\u51fb\u7387\uff0c\u5728Amazon Q\u4e0a\u768446.3%\u6210\u529f\u7387\u3002\u6211\u4eec\u8fd8\u6210\u529f\u4eceGitHub 
Copilot\u4e2d\u63d0\u53d6\u4e86\u654f\u611f\u7528\u6237\u6570\u636e\uff0c\u5305\u62ec54\u4e2a\u771f\u5b9e\u7535\u5b50\u90ae\u4ef6\u5730\u5740\u548c314\u4e2a\u4e0eGitHub\u7528\u6237\u540d\u5173\u8054\u7684\u7269\u7406\u5730\u5740\u3002\u7814\u7a76\u8fd8\u8868\u660e\uff0c\u8fd9\u4e9b\u57fa\u4e8e\u4ee3\u7801\u7684\u653b\u51fb\u65b9\u6cd5\u5bf9\u901a\u7528LLM\uff08\u5982GPT\u7cfb\u5217\uff09\u540c\u6837\u6709\u6548\uff0c\u7a81\u663e\u4e86\u73b0\u4ee3LLM\u5904\u7406\u4ee3\u7801\u65f6\u5b58\u5728\u7684\u66f4\u5e7f\u6cdb\u5b89\u5168\u95ee\u9898\u3002\u8fd9\u4e9b\u53d1\u73b0\u5f3a\u8c03\u4e86LCCT\u9762\u4e34\u7684\u5173\u952e\u5b89\u5168\u6311\u6218\uff0c\u5e76\u63d0\u51fa\u4e86\u52a0\u5f3a\u5176\u5b89\u5168\u6846\u67b6\u7684\u91cd\u8981\u65b9\u5411\u3002 \u4e3a\u4e86\u9a8c\u8bc1\u6211\u4eec\u7684\u7814\u7a76\u6210\u679c\uff0c\u6211\u4eec\u63d0\u4f9b\u4e86\u76f8\u5173\u4ee3\u7801\u793a\u4f8b\u548c\u653b\u51fb\u6837\u672c\uff0c\u5b83\u4eec\u53ef\u4ecehttps://github.com/Sensente/Security-Attacks-on-LCCTs\u83b7\u53d6\u3002**|\n", "2408.10995": "|**2024-08-20**|**CTP-LLM: Clinical Trial Phase Transition Prediction Using Large Language Models**|Michael Reinisch 
et.al.|[2408.10995](http://arxiv.org/abs/2408.10995)|null|\u65b0\u533b\u7597\u6cbb\u7597\u65b9\u6cd5\u7684\u5f00\u53d1\u9700\u8981\u591a\u4e2a\u4e34\u5e8a\u8bd5\u9a8c\u9636\u6bb5\u3002\u5c3d\u7ba1\u5c06\u836f\u7269\u63a8\u5411\u5e02\u573a\u7684\u6210\u672c\u9ad8\u6602\u4e14\u5177\u6709\u6311\u6218\u6027\uff0c\u4f46\u53ea\u6709\u4e0d\u523020%\u7684\u836f\u7269\u80fd\u4ece\u7b2c\u4e00\u9636\u6bb5\u8fc7\u6e21\u5230\u6700\u540e\u7684\u6279\u51c6\u3002\u8fd1\u671f\u7684\u7814\u7a76\u6587\u732e\u8868\u660e\uff0c\u8bd5\u9a8c\u65b9\u6848\u7684\u8bbe\u8ba1\u5bf9\u8bd5\u9a8c\u8868\u73b0\u6709\u7740\u663e\u8457\u5f71\u54cd\u3002\u6211\u4eec\u7814\u7a76\u4e86\u4e34\u5e8a\u8bd5\u9a8c\u7ed3\u679c\u9884\u6d4b\uff08CTOP\uff09\uff0c\u65e8\u5728\u901a\u8fc7\u5229\u7528\u8bd5\u9a8c\u8bbe\u8ba1\u6587\u4ef6\u81ea\u52a8\u9884\u6d4b\u4e0d\u540c\u9636\u6bb5\u7684\u8f6c\u6362\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u9996\u4e2a\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684CTOP\u6a21\u578b\u2014\u2014CTP-LLM\u3002\u6211\u4eec\u8fd8\u5f15\u5165\u4e86\u4e00\u4e2a\u540d\u4e3aPhaseTransition\uff08PT\uff09\u7684\u6570\u636e\u96c6\uff0c\u8be5\u6570\u636e\u96c6\u6839\u636e\u8bd5\u9a8c\u5728\u76d1\u7ba1\u8fc7\u7a0b\u4e2d\u7684\u8fdb\u5c55\u8fdb\u884c\u6807\u8bb0\uff0c\u5e76\u4f5c\u4e3aCTOP\u8bc4\u4f30\u7684\u6807\u51c6\u57fa\u51c6\u3002 
\u6211\u4eec\u7684\u7cbe\u7ec6\u8c03\u53c2GPT-3.5\u4e3a\u57fa\u7840\u7684\u6a21\u578b\uff08CTP-LLM\uff09\u80fd\u591f\u901a\u8fc7\u5206\u6790\u539f\u59cb\u534f\u8bae\u6587\u672c\u6765\u9884\u6d4b\u4e34\u5e8a\u8bd5\u9a8c\u9636\u6bb5\u7684\u8f6c\u6362\uff0c\u65e0\u9700\u4f9d\u8d56\u4eba\u7c7b\u9009\u62e9\u7684\u7279\u5f81\u3002CTP-LLM\u5728\u6240\u6709\u9636\u6bb5\u7684\u9884\u6d4b\u4e2d\u8fbe\u5230\u4e8667%\u7684\u51c6\u786e\u7387\uff0c\u5728\u9884\u6d4b\u4ece\u7b2c\u4e09\u9636\u6bb5\u5230\u6700\u7ec8\u6279\u51c6\u7684\u8f6c\u6362\u65f6\uff0c\u51c6\u786e\u7387\u66f4\u8fbe\u5230\u4e8675%\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u7ed3\u679c\u5f3a\u8c03\u4e86LLM\u9a71\u52a8\u5e94\u7528\u5728\u9884\u6d4b\u4e34\u5e8a\u8bd5\u9a8c\u7ed3\u679c\u548c\u8bc4\u4f30\u8bd5\u9a8c\u8bbe\u8ba1\u65b9\u9762\u7684\u6f5c\u529b\u3002|\n", "2408.10947": "|**2024-08-20**|**Dr.Academy: A Benchmark for Evaluating Questioning Capability in Education for Large Language Models**|Yuyan Chen et.al.|[2408.10947](http://arxiv.org/abs/2408.10947)|null|\u6559\u5e08\u5728\u4f20\u6388\u77e5\u8bc6\u548c\u5f15\u5bfc\u5b66\u4e60\u8005\u65b9\u9762\u53d1\u6325\u7740\u91cd\u8981\u4f5c\u7528\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4f5c\u4e3a\u6f5c\u5728\u6559\u80b2\u8005\u7684\u89d2\u8272\u6b63\u5728\u6210\u4e3a\u4e00\u4e2a\u91cd\u8981\u7814\u7a76\u9886\u57df\u3002\u8ba4\u8bc6\u5230LLMs\u751f\u6210\u6559\u80b2\u5185\u5bb9\u7684\u80fd\u529b\u53ef\u4ee5\u63a8\u52a8\u81ea\u52a8\u5316\u548c\u4e2a\u6027\u5316\u5b66\u4e60\u7684\u8fdb\u5c55\u3002\u867d\u7136LLMs\u5728\u7406\u89e3\u529b\u548c\u89e3\u51b3\u95ee\u9898\u80fd\u529b\u65b9\u9762\u7684\u6d4b\u8bd5\u5df2\u7ecf\u8fdb\u884c\uff0c\u4f46\u5b83\u4eec\u5728\u6559\u5b66\u65b9\u9762\u7684\u6f5c\u529b\u4ecd\u9c9c\u4e3a\u4eba\u77e5\u3002\u5728\u6559\u5b66\u4e2d\uff0c\u63d0\u95ee\u662f\u4e00\u9879\u5173\u952e\u6280\u80fd\uff0c\u80fd\u591f\u6307\u5bfc\u5b66\u751f\u5206\u6790\u3001\u8bc4\u4f30\u5e76\u7efc\u5408\u6838\u5fc3\u6982\u5ff5\u548c\u539f\u7406\u3002\
u56e0\u6b64\uff0c\u6211\u4eec\u7684\u7814\u7a76\u5f15\u5165\u4e86\u4e00\u4e2a\u57fa\u51c6\u6765\u8bc4\u4f30\u6559\u80b2\u4e2dLLMs\u7684\u63d0\u95ee\u80fd\u529b\uff0c\u901a\u8fc7\u8bc4\u4f30\u5b83\u4eec\u751f\u6210\u7684\u6559\u80b2\u95ee\u9898\uff0c\u5229\u7528\u5b89\u5fb7\u68ee\u548c\u514b\u62c9\u592b\u970d\u592b\u7684\u5206\u7c7b\u6cd5\u8986\u76d6\u4e00\u822c\u3001\u5355\u5b66\u79d1\u548c\u8de8\u5b66\u79d1\u9886\u57df\u3002\u6211\u4eec\u4ece\u5c06LLMs\u89c6\u4e3a\u5b66\u4e60\u8005\u8f6c\u5411\u5c06\u5176\u89c6\u4e3a\u6559\u80b2\u8005\uff0c\u901a\u8fc7\u8bc4\u4f30\u5b83\u4eec\u751f\u6210\u95ee\u9898\u7684\u80fd\u529b\u6765\u8bc4\u4f30\u5b83\u4eec\u7684\u6559\u5b66\u80fd\u529b\u3002\u6211\u4eec\u5e94\u7528\u4e86\u56db\u4e2a\u6307\u6807\uff0c\u5305\u62ec\u76f8\u5173\u6027\u3001\u8986\u76d6\u7387\u3001\u4ee3\u8868\u6027\u4ee5\u53ca\u4e00\u81f4\u6027\uff0c\u6765\u8bc4\u4f30LLMs\u8f93\u51fa\u7684\u6559\u80b2\u8d28\u91cf\u3002\u6211\u4eec\u7684\u7ed3\u679c\u8868\u660e\uff0cGPT-4\u5728\u6559\u6388\u4e00\u822c\u3001\u4eba\u6587\u5b66\u79d1\u548c\u79d1\u5b66\u8bfe\u7a0b\u65b9\u9762\u663e\u793a\u51fa\u663e\u8457\u6f5c\u529b\uff1bClaude2\u4f3c\u4e4e\u66f4\u9002\u5408\u62c5\u4efb\u8de8\u5b66\u79d1\u6559\u5e08\u3002\u6b64\u5916\uff0c\u81ea\u52a8\u8bc4\u5206\u4e0e\u4eba\u7c7b\u89c2\u70b9\u4e00\u81f4\u3002|\n", "2408.10946": "|**2024-08-20**|**Large Language Model Driven Recommendation**|Anton Korikov et.al.|[2408.10946](http://arxiv.org/abs/2408.10946)|null|### \u6458\u8981 
\u672c\u6587\u63a2\u8ba8\u4e86\u5229\u7528\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u6784\u5efa\u4e2a\u6027\u5316\u63a8\u8350\u7cfb\u7edf\u7684\u65b0\u673a\u9047\u3002\u5728\u4e4b\u524d\u7684\u7ae0\u8282\u4e2d\uff0c\u6211\u4eec\u5173\u6ce8\u7684\u662f\u57fa\u4e8e\u6807\u51c6\u5316\u3001\u975e\u8a00\u8bed\u7528\u6237\u53cd\u9988\u7684\u63a8\u8350\u7cfb\u7edf\uff0c\u5982\u8d2d\u4e70\u3001\u89c2\u770b\u548c\u70b9\u51fb\u7b49\u884c\u4e3a\u3002\u7136\u800c\uff0c\u968f\u7740LLM\u80fd\u529b\u7684\u589e\u5f3a\uff0c\u5b83\u4eec\u80fd\u591f\u8fdb\u884c\u901a\u7528\u81ea\u7136\u8bed\u8a00\u63a8\u7406\uff0c\u8fd9\u4e3a\u4f7f\u7528\u81ea\u7136\u8bed\u8a00\u4ea4\u4e92\u6765\u6784\u5efa\u9ad8\u5ea6\u4e2a\u6027\u5316\u7684\u63a8\u8350\u7cfb\u7edf\u5f00\u8f9f\u4e86\u65b0\u9014\u5f84\u3002 \u672c\u7ae0\u9996\u5148\u901a\u8fc7\u5206\u7c7b\u7684\u65b9\u5f0f\u4ecb\u7ecd\u5173\u952e\u7684\u6570\u636e\u6e90\uff0c\u6db5\u76d6\u5546\u54c1\u63cf\u8ff0\u3001\u7528\u6237\u4e0e\u7cfb\u7edf\u7684\u4ea4\u4e92\u4ee5\u53ca\u7528\u6237\u6863\u6848\u3002\u63a5\u7740\uff0c\u8be6\u7ec6\u8ba8\u8bba\u4e86\u57fa\u4e8eLLM\u7684\u63a8\u8350\u6280\u672f\uff0c\u5305\u62ec\u8c03\u4f18\u548c\u672a\u8c03\u4f18\u60c5\u51b5\u4e0b\u7684\u7f16\u7801\u5668\u4ec5\u4f7f\u7528\u548c\u81ea\u56de\u5f52\u63a8\u8350\u65b9\u6cd5\u3002\u7136\u540e\uff0c\u8f6c\u5411\u591a\u6a21\u5757\u63a8\u8350\u67b6\u6784\uff0c\u5176\u4e2dLLM\u4e0e\u5176\u4ed6\u7ec4\u4ef6\u5982\u68c0\u7d22\u5668\u548c\u63a8\u8350\u7cfb\u7edf\u5728\u591a\u9636\u6bb5\u7ba1\u9053\u4e2d\u534f\u4f5c\u3002\u6700\u540e\uff0c\u4ecb\u7ecd\u4e86\u5bf9\u8bdd\u5f0f\u63a8\u8350\u7cfb\u7edf\uff08CRS\uff09\uff0c\u5728\u8fd9\u4e9b\u7cfb\u7edf\u4e2d\uff0cLLM\u4fc3\u8fdb\u591a\u8f6e\u5bf9\u8bdd\uff0c\u6bcf\u4e00\u8f6e\u4e0d\u4ec5\u63d0\u4f9b\u63a8\u8350\uff0c\u8fd8\u63d0\u4f9b\u4e86\u4e0e\u7528\u6237\u7684\u4e92\u52a8\uff0c\u7528\u4e8e\u504f\u597d\u63d0\u53d6\u3001\u6279\u8bc4\u548c\u95ee\u7b54\u3002 ### \u7ffb\u8bd1 
\u672c\u6587\u63a2\u8ba8\u4e86\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u6784\u5efa\u4e2a\u6027\u5316\u63a8\u8350\u7cfb\u7edf\u65b9\u9762\u7684\u65b0\u578b\u5e94\u7528\u3002\u6b64\u524d\u7ae0\u8282\u4e3b\u8981\u5173\u6ce8\u57fa\u4e8e\u6807\u51c6\u3001\u975e\u8a00\u8bed\u7528\u6237\u53cd\u9988\u7684\u63a8\u8350\u7cfb\u7edf\uff0c\u4f8b\u5982\u8d2d\u4e70\u3001\u6d4f\u89c8\u548c\u70b9\u51fb\u7b49\u884c\u4e3a\u3002\u7136\u800c\uff0c\u968f\u7740LLM\u80fd\u529b\u7684\u63d0\u5347\uff0c\u5b83\u4eec\u5177\u5907\u4e86\u901a\u7528\u81ea\u7136\u8bed\u8a00\u63a8\u7406\u7684\u80fd\u529b\uff0c\u4ece\u800c\u6253\u5f00\u4e86\u4f7f\u7528\u81ea\u7136\u8bed\u8a00\u4ea4\u4e92\u6784\u5efa\u9ad8\u5ea6\u5b9a\u5236\u5316\u63a8\u8350\u7cfb\u7edf\u7684\u53ef\u80fd\u6027\u3002 \u672c\u7ae0\u9996\u5148\u901a\u8fc7\u5206\u7c7b\u65b9\u5f0f\u6982\u8ff0\u4e86\u5173\u952e\u6570\u636e\u6e90\uff0c\u5305\u62ec\u5546\u54c1\u63cf\u8ff0\u3001\u7528\u6237\u4e0e\u7cfb\u7edf\u4ea4\u4e92\u4ee5\u53ca\u7528\u6237\u6863\u6848\u3002\u968f\u540e\uff0c\u6df1\u5165\u63a2\u8ba8\u4e86\u57fa\u4e8eLLM\u7684\u63a8\u8350\u6280\u672f\uff0c\u6db5\u76d6\u4e86\u7f16\u7801\u5668\u4ec5\u4f7f\u7528\u548c\u81ea\u56de\u5f52\u63a8\u8350\u65b9\u6cd5\uff0c\u65e0\u8bba\u662f\u5728\u8c03\u4f18\u8fd8\u662f\u672a\u8c03\u4f18\u72b6\u6001\u4e0b\u3002\u63a5\u7740\uff0c\u8ba8\u8bba\u4e86\u591a\u6a21\u5757\u63a8\u8350\u67b6\u6784\uff0c\u5176\u4e2dLLM\u4e0e\u5176\u4ed6\u7ec4\u4ef6\u5982\u68c0\u7d22\u5668\u548c\u63a8\u8350\u7cfb\u7edf\u5728\u591a\u9636\u6bb5\u6d41\u7a0b\u4e2d\u534f\u540c\u5de5\u4f5c\u3002\u6700\u540e\uff0c\u4ecb\u7ecd\u4e86\u5bf9\u8bdd\u5f0f\u63a8\u8350\u7cfb\u7edf\uff08CRS\uff09\uff0c\u5728\u8fd9\u4e9b\u7cfb\u7edf\u4e2d\uff0cLLM\u652f\u6301\u591a\u8f6e\u5bf9\u8bdd\uff0c\u6bcf\u4e00\u8f6e\u4e0d\u4ec5\u7528\u4e8e\u751f\u6210\u63a8\u8350\uff0c\u8fd8\u80fd\u4e0e\u7528\u6237\u8fdb\u884c\u4e92\u52a8\uff0c\u8fdb\u884c\u504f\u597d\u6536\u96c6\u3001\u8bc4\u4ef7\u548c\u95ee\u7b54\u3002|\n", "2408.11813": "|**2024-08-21**|**SEA: 
Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs**|Yuanyang Yin et.al.|[2408.11813](http://arxiv.org/abs/2408.11813)|null|\u8fd1\u671f\uff0c\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u5728\u611f\u77e5\u548c\u63a8\u7406\u80fd\u529b\u65b9\u9762\u5c55\u73b0\u51fa\u4e86\u60ca\u4eba\u7684\u8868\u73b0\uff0c\u5b83\u4eec\u901a\u5e38\u7531\u89c6\u89c9\u7f16\u7801\u5668\u3001\u9002\u914d\u5668\u548c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7ec4\u6210\u3002\u9002\u914d\u5668\u4f5c\u4e3a\u89c6\u89c9\u4e0e\u8bed\u8a00\u7ec4\u4ef6\u4e4b\u95f4\u7684\u5173\u952e\u6865\u6881\u3002\u7136\u800c\uff0c\u901a\u8fc7\u56fe\u50cf\u7ea7\u76d1\u7763\u8bad\u7ec3\u9002\u914d\u5668\u5f80\u5f80\u4f1a\u5bfc\u81f4\u663e\u8457\u7684\u5bf9\u9f50\u504f\u5dee\uff0c\u8fd9\u4f1a\u524a\u5f31LLM\u7684\u80fd\u529b\u5e76\u9650\u5236\u591a\u6a21\u6001LLM\u7684\u6f5c\u529b\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u76d1\u7763\u5d4c\u5165\u5bf9\u9f50\uff08SEA\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u57fa\u4e8e\u89c6\u89c9\u8bed\u8a00\u9884\u8bad\u7ec3\u6a21\u578b\uff08\u5982CLIP\uff09\u7684\u5206\u8bcd\u7ea7\u5bf9\u9f50\u65b9\u6cd5\uff0c\u901a\u8fc7\u5bf9\u6bd4\u5b66\u4e60\u6765\u8c03\u6574\u89c6\u89c9\u5206\u8bcd\u4e0eLLM\u5d4c\u5165\u7a7a\u95f4\u7684\u4e00\u81f4\u6027\u3002\u8fd9\u79cd\u65b9\u6cd5\u786e\u4fdd\u4e86\u89c6\u89c9\u548c\u8bed\u8a00\u8868\u793a\u4e4b\u95f4\u66f4\u534f\u8c03\u7684\u6574\u5408\uff0c\u4ece\u800c\u589e\u5f3a\u591a\u6a21\u6001LLM\u7684\u6027\u80fd\u548c\u53ef\u89e3\u91ca\u6027\uff0c\u540c\u65f6\u4fdd\u7559\u5176\u56fa\u6709\u7279\u6027\u3002\u5e7f\u6cdb\u5b9e\u9a8c\u8868\u660e\uff0cSEA\u6709\u6548\u5730\u63d0\u9ad8\u4e86MLLMs\uff0c\u7279\u522b\u662f\u5bf9\u4e8e\u8f83\u5c0f\u7684\u6a21\u578b\uff0c\u65e0\u9700\u989d\u5916\u7684\u6570\u636e\u6216\u63a8\u7406\u8ba1\u7b97\u3002\u6b64\u5916\uff0cSEA\u4e5f\u4e3a\u5f00\u53d1\u66f4\u901a\u7528\u548c\u9002\u5e94\u6027\u5f3a\u7684
\u89e3\u51b3\u65b9\u6848\u4ee5\u589e\u5f3a\u591a\u6a21\u6001\u7cfb\u7edf\u5960\u5b9a\u4e86\u57fa\u7840\u3002|\n", "2408.11801": "|**2024-08-21**|**Story3D-Agent: Exploring 3D Storytelling Visualization with Large Language Models**|Yuzhou Huang et.al.|[2408.11801](http://arxiv.org/abs/2408.11801)|null|\u4f20\u7edf\u89c6\u89c9\u53d9\u4e8b\u590d\u6742\uff0c\u9700\u8981\u4e13\u4e1a\u77e5\u8bc6\u548c\u5927\u91cf\u8d44\u6e90\uff0c\u4f46\u5f80\u5f80\u53d7\u9650\u4e8e\u4eba\u7c7b\u7684\u521b\u9020\u529b\u4e0e\u521b\u4f5c\u7cbe\u5ea6\u3002\u5c3d\u7ba1\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u589e\u5f3a\u4e86\u89c6\u89c9\u53d9\u4e8b\u80fd\u529b\uff0c\u73b0\u6709\u65b9\u6cd5\u5f80\u5f80\u5c40\u9650\u4e8e\u4e8c\u7ef4\u89c6\u89c9\u6548\u679c\u6216\u901a\u8fc7\u52a8\u4f5c\u5408\u6210\u548c\u884c\u4e3a\u6a21\u62df\u7b80\u5316\u6545\u4e8b\uff0c\u672a\u80fd\u751f\u6210\u5168\u9762\u3001\u591a\u7ef4\u7684\u53d9\u4e8b\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51faStory3D-Agent\uff0c\u4e00\u79cd\u521b\u65b0\u7684\u65b9\u6cd5\uff0c\u5229\u7528LLM\u7684\u80fd\u529b\u5c06\u63d0\u4f9b\u7684\u53d9\u4e8b\u8f6c\u5316\u4e3a\u4e09\u7ef4\u6e32\u67d3\u53ef\u89c6\u5316\u3002\u901a\u8fc7\u96c6\u6210\u7a0b\u5e8f\u5efa\u6a21\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u80fd\u591f\u7cbe\u786e\u63a7\u5236\u591a\u89d2\u8272\u7684\u52a8\u4f5c\u548c\u52a8\u6001\uff0c\u4ee5\u53ca\u5404\u79cd\u88c5\u9970\u5143\u7d20\uff0c\u786e\u4fdd\u957f\u671f\u548c\u52a8\u6001\u7684\u4e09\u7ef4\u8868\u73b0\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u652f\u6301\u901a\u8fc7\u903b\u8f91\u63a8\u7406\u8fdb\u884c\u53d9\u4e8b\u6269\u5c55\uff0c\u786e\u4fdd\u751f\u6210\u7684\u5185\u5bb9\u4e0e\u73b0\u6709\u6761\u4ef6\u4fdd\u6301\u4e00\u81f4\u3002\u6211\u4eec\u5bf9Story3D-Agent\u8fdb\u884c\u4e86\u8be6\u5c3d\u7684\u8bc4\u4f30\uff0c\u4ee5\u9a8c\u8bc1\u5176\u6709\u6548\u6027\uff0c\u5e76\u63d0\u4f9b\u4e86\u57fa\u672c\u6846\u67b6\u6765\u63a8\u52a8\u4e09\u7ef4\u6545\u4e8b\u8868\u793a\u7684\u53d1\u5c55\u3002|\n", 
"2408.11800": "|**2024-08-21**|**PermitQA: A Benchmark for Retrieval Augmented Generation in Wind Siting and Permitting domain**|Rounak Meyur et.al.|[2408.11800](http://arxiv.org/abs/2408.11800)|null|\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u548c\u6587\u672c\u751f\u6210\u9886\u57df\u5feb\u901f\u53d1\u5c55\u7684\u80cc\u666f\u4e0b\uff0c\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u7684\u5174\u8d77\u4e3a\u901a\u8fc7\u5229\u7528\u7528\u6237\u6307\u5b9a\u6570\u636e\u5e93\u4e2d\u7684\u4fe1\u606f\u6765\u63d0\u9ad8\u751f\u6210\u6587\u672c\u7684\u8d28\u91cf\u548c\u53ef\u9760\u6027\u63d0\u4f9b\u4e86\u6709\u524d\u666f\u7684\u9014\u5f84\u3002\u57fa\u51c6\u6d4b\u8bd5\u5bf9\u4e8e\u8bc4\u4f30\u548c\u6bd4\u8f83\u4e0d\u540cRAG\u914d\u7f6e\u5728\u68c0\u7d22\u5668\u548c\u751f\u6210\u5668\u65b9\u9762\u7684\u6027\u80fd\u81f3\u5173\u91cd\u8981\uff0c\u63d0\u4f9b\u4e86\u8fd9\u4e9b\u914d\u7f6e\u7684\u6709\u6548\u6027\u3001\u53ef\u6269\u5c55\u6027\u548c\u7279\u5b9a\u9886\u57df\u548c\u5e94\u7528\u7684\u9002\u7528\u6027\u7684\u6d1e\u5bdf\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u5168\u9762\u6846\u67b6\uff0c\u7528\u4e8e\u751f\u6210\u4e0e\u7279\u5b9a\u9886\u57df\u76f8\u5173\u7684RAG\u57fa\u51c6\u3002\u8be5\u6846\u67b6\u57fa\u4e8e\u81ea\u52a8\u95ee\u9898\u7b54\u6848\u751f\u6210\u4e0e\u4eba\u7c7b\uff08\u9886\u57df\u4e13\u5bb6\uff09-\u4eba\u5de5\u667a\u80fd\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u534f\u4f5c\u7684\u81ea\u52a8\u5316\u8fc7\u7a0b\u3002\u4ee5\u6848\u4f8b\u7814\u7a76\u7684\u5f62\u5f0f\uff0c\u6211\u4eec\u901a\u8fc7\u5f15\u5165PermitQA\u4f5c\u4e3a\u98ce\u573a\u9009\u5740\u548c\u8bb8\u53ef\u9886\u57df\u7684\u9996\u4e2a\u57fa\u51c6\u8fdb\u884c\u4e86\u6846\u67b6\u5c55\u793a\uff0c\u8be5\u57fa\u51c6\u5305\u542b\u4e86\u4e0e\u98ce\u80fd\u9879\u76ee\u73af\u5883\u5f71\u54cd\u76f8\u5173\u7684\u591a\u7bc7\u79d1\u5b66\u6587\u6863/\u62a5\u544a\u3002 
\u6211\u4eec\u7684\u6846\u67b6\u7cfb\u7edf\u5730\u4f7f\u7528\u591a\u79cd\u6307\u6807\u548c\u4e0d\u540c\u590d\u6742\u5ea6\u7ea7\u522b\u7684\u95ee\u9898\u7c7b\u578b\u6765\u8bc4\u4f30RAG\u6027\u80fd\u3002\u6211\u4eec\u8fd8\u5c55\u793a\u4e86\u4e0d\u540c\u6a21\u578b\u5728\u6211\u4eec\u7684\u57fa\u51c6\u4e0a\u7684\u8868\u73b0\u3002|\n", "2408.11795": "|**2024-08-21**|**EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model**|Feipeng Ma et.al.|[2408.11795](http://arxiv.org/abs/2408.11795)|null|\u5728\u591a\u6a21\u6001\u7814\u7a76\u9886\u57df\uff0c\u4f17\u591a\u7814\u7a76\u5229\u7528\u5927\u91cf\u7684\u56fe\u50cf-\u6587\u672c\u5bf9\u8fdb\u884c\u6a21\u6001\u5bf9\u9f50\u5b66\u4e60\uff0c\u5c06\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08Large Language Models, LLMs\uff09\u8f6c\u5316\u4e3a\u591a\u6a21\u6001LLMs\uff0c\u5e76\u5728\u5404\u79cd\u89c6\u89c9\u8bed\u8a00\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\u3002\u76ee\u524d\u4e3b\u8981\u7684\u5b9e\u73b0\u65b9\u6cd5\u5206\u4e3a\u4e24\u7c7b\uff1a\u81ea\u6ce8\u610f\u529b\u57fa\u548c\u4ea4\u53c9\u6ce8\u610f\u529b\u57fa\u65b9\u6cd5\u3002\u81ea\u6ce8\u610f\u529b\u57fa\u65b9\u6cd5\u56e0\u5176\u7b80\u5355\u7684\u591a\u5c42\u611f\u77e5\u673a\uff08MLP\uff09\u67b6\u6784\u800c\u5177\u6709\u8f83\u9ad8\u7684\u6570\u636e\u6548\u7387\uff0c\u4f46\u5728\u8ba1\u7b97\u6548\u7387\u65b9\u9762\u5374\u76f8\u5bf9\u8f83\u4f4e\uff0c\u539f\u56e0\u5728\u4e8e\u5176\u9700\u8981\u5c06\u89c6\u89c9\u548c\u6587\u672c\u4ee4\u724c\u4f5c\u4e3a\u8f93\u5165\u8fdb\u884c\u8fde\u63a5\u3002\u800c\u4ea4\u53c9\u6ce8\u610f\u529b\u57fa\u65b9\u6cd5\u867d\u7136\u5728\u989d\u5916\u7684\u5b66\u4e60\u53c2\u6570\u65b9\u9762\u4e0d\u5982\u81ea\u6ce8\u610f\u529b\u57fa\u65b9\u6cd5\u9ad8\u6548\uff0c\u4f46\u7531\u4e8e\u907f\u514d\u4e86\u4e3aLLM\u63d0\u4f9b\u8fc7\u957f\u5e8f\u5217\u8f93\u5165\uff0c\u56e0\u6b64\u5728\u8ba1\u7b97\u6548\u7387\u65b9\u9762\u8868\u73b0\u66f4\u9ad8\u3002\u4e3a\u4e86\u5e73\u8861\u8fd9\u4e9b\u6743\u8861\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u657
0\u636e\u9ad8\u6548\u4e14\u8ba1\u7b97\u9ad8\u6548\u7684\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08EE-MLLM\uff09\u3002EE-MLLM\u5728\u4e0d\u5f15\u5165\u989d\u5916\u6a21\u5757\u6216\u53ef\u5b66\u4e60\u53c2\u6570\u7684\u60c5\u51b5\u4e0b\uff0c\u5b9e\u73b0\u4e86\u6570\u636e\u548c\u8ba1\u7b97\u6548\u7387\u7684\u63d0\u5347\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u5bf9\u591a\u6a21\u6001LLM\u4e2d\u7684\u539f\u59cb\u81ea\u6ce8\u610f\u529b\u673a\u5236\u8fdb\u884c\u4e86\u6539\u8fdb\uff0c\u5f15\u5165\u4e86\u4e00\u79cd\u590d\u5408\u6ce8\u610f\u529b\u673a\u5236\u3002\u8be5\u673a\u5236\u6709\u4e24\u4e2a\u5173\u952e\u7279\u6027\uff1a1\uff09\u6d88\u9664\u89c6\u89c9\u4ee4\u724c\u5185\u90e8\u7684\u81ea\u6ce8\u610f\u529b\u8ba1\u7b97\uff0c\u4ee5\u5b9e\u73b0\u8ba1\u7b97\u6548\u7387\uff1b2\uff09\u91cd\u7528LLM\u6bcf\u4e00\u5c42\u7684\u6743\u91cd\uff0c\u4ee5\u4fc3\u8fdb\u89c6\u89c9\u4e0e\u8bed\u8a00\u4e4b\u95f4\u7684\u6709\u6548\u6a21\u6001\u5bf9\u9f50\uff0c\u4ece\u800c\u5b9e\u73b0\u6570\u636e\u6548\u7387\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cEE-MLLM\u5728\u5305\u62ecMMBench\u3001SeedBench\u7b49\u901a\u7528\u6027\u6570\u636e\u96c6\u4ee5\u53caTextVQA\u3001DocVQA\u7b49\u7cbe\u7ec6\u7c92\u5ea6\u4efb\u52a1\u5728\u5185\u7684\u591a\u79cd\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u90fd\u5c55\u73b0\u51fa\u663e\u8457\u7684\u6709\u6548\u6027\u3002|\n", "2408.11793": "|**2024-08-21**|**Leveraging Chemistry Foundation Models to Facilitate Structure Focused Retrieval Augmented Generation in Multi-Agent Workflows for Catalyst and Materials Design**|Nathaniel H. 
Park et.al.|[2408.11793](http://arxiv.org/abs/2408.11793)|null|\u5206\u5b50\u5c5e\u6027\u9884\u6d4b\u548c\u901a\u8fc7\u6df1\u5ea6\u5b66\u4e60\u6a21\u578b\u8fdb\u884c\u751f\u6210\u8bbe\u8ba1\u662f\u7814\u7a76\u7684\u70ed\u70b9\u9886\u57df\uff0c\u8fd9\u4e3b\u8981\u5f52\u56e0\u4e8e\u5b83\u5728\u52a0\u901f\u65b0\u6750\u6599\u5f00\u53d1\u65b9\u9762\u7684\u6f5c\u529b\u3002\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u548c\u7531LLM\u9a71\u52a8\u7684\u4ee3\u7406\u7cfb\u7edf\u7684\u51fa\u73b0\uff0c\u8fd9\u4e9b\u5de5\u4f5c\u6d41\u7a0b\u5f97\u5230\u4e86\u663e\u8457\u589e\u5f3a\uff0c\u8fd9\u4e9b\u7cfb\u7edf\u5229\u7528\u9884\u8bad\u7ec3\u6a21\u578b\u5728\u66f4\u590d\u6742\u7684\u7814\u7a76\u4efb\u52a1\u80cc\u666f\u4e0b\u8fdb\u884c\u9884\u6d4b\u3002\u5c3d\u7ba1\u6709\u6548\uff0c\u4f46\u5728\u6750\u6599\u8bbe\u8ba1\u4efb\u52a1\u4e2d\u7684\u4fe1\u606f\u68c0\u7d22\u65b9\u9762\uff0c\u4ee3\u7406\u7cfb\u7edf\u4ecd\u6709\u6539\u8fdb\u7a7a\u95f4\u3002\u6b64\u5916\uff0c\u5bf9\u9884\u6d4b\u6df1\u5ea6\u5b66\u4e60\u6a21\u578b\u7684\u66ff\u4ee3\u5e94\u7528\uff0c\u5982\u5229\u7528\u5b83\u4eec\u7684\u6f5c\u5728\u8868\u793a\u6765\u4fc3\u8fdb\u8de8\u6a21\u6001\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff0c\u5728\u7531LLM\u9a71\u52a8\u7684\u4ee3\u7406\u7cfb\u7edf\u4e2d\u5b9e\u73b0\u4efb\u52a1\u7279\u5b9a\u7684\u6750\u6599\u8bbe\u8ba1\uff0c\u8fd9\u4e00\u9886\u57df\u5c1a\u672a\u5f97\u5230\u63a2\u7d22\u3002 
\u5728\u6b64\uff0c\u6211\u4eec\u8bc1\u660e\u4e86\u5927\u89c4\u6a21\u3001\u9884\u8bad\u7ec3\u7684\u5316\u5b66\u57fa\u7840\u6a21\u578b\u53ef\u4ee5\u4f5c\u4e3a\u4f7f\u5316\u5b66\u4fe1\u606f\u68c0\u7d22\u8bed\u4e49\u5316\u7684\u57fa\u7840\uff0c\u9002\u7528\u4e8e\u5c0f\u5206\u5b50\u3001\u590d\u6742\u805a\u5408\u7269\u6750\u6599\u548c\u53cd\u5e94\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u5316\u5b66\u57fa\u7840\u6a21\u578b\u4e0e\u56fe\u50cf\u6a21\u578b\uff08\u5982OpenCLIP\uff09\u76f8\u7ed3\u5408\uff0c\u80fd\u591f\u5b9e\u73b0\u8de8\u591a\u4e2a\u8868\u5f81\u6570\u636e\u57df\u7684\u524d\u6240\u672a\u6709\u7684\u67e5\u8be2\u548c\u4fe1\u606f\u68c0\u7d22\u3002\u6700\u540e\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u8fd9\u4e9b\u7cfb\u7edf\u5728\u591a\u4ee3\u7406\u7cfb\u7edf\u4e2d\u7684\u96c6\u6210\uff0c\u4ee5\u652f\u6301\u7ed3\u6784\u548c\u62d3\u6251\u4e3a\u57fa\u7840\u7684\u81ea\u7136\u8bed\u8a00\u67e5\u8be2\u548c\u4fe1\u606f\u68c0\u7d22\uff0c\u4ece\u800c\u4fc3\u8fdb\u590d\u6742\u7814\u7a76\u4efb\u52a1\u7684\u6267\u884c\u3002|\n", "2408.11791": "|**2024-08-21**|**Critique-out-Loud Reward Models**|Zachary Ankner 
et.al.|[2408.11791](http://arxiv.org/abs/2408.11791)|**[link](https://github.com/zankner/cloud)**|**\u4f20\u7edf\u7684\u5956\u52b1\u6a21\u578b\u5728\u4ece\u4eba\u7c7b\u53cd\u9988\u8fdb\u884c\u5f3a\u5316\u5b66\u4e60\uff08RLHF\uff09\u65f6\uff0c\u4ec5\u7528\u4e8e\u76f4\u63a5\u9884\u6d4b\u504f\u597d\u5206\u6570\uff0c\u800c\u4e0d\u5229\u7528\u5e95\u5c42\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u751f\u6210\u80fd\u529b\u3002\u8fd9\u9650\u5236\u4e86\u5956\u52b1\u6a21\u578b\u7684\u80fd\u529b\uff0c\u56e0\u4e3a\u5b83\u4eec\u5fc5\u987b\u901a\u8fc7\u5355\u4e00\u524d\u5411\u4f20\u9012\u6765\u9690\u5f0f\u5730\u63a8\u7406\u54cd\u5e94\u7684\u8d28\u91cf\uff0c\u5373\uff0c\u5fc5\u987b\u5728\u504f\u597d\u5efa\u6a21\u8fc7\u7a0b\u4e2d\u5b8c\u6210\u63a8\u7406\u3002\u4e3a\u4e86\u4f7f\u5956\u52b1\u6a21\u578b\u80fd\u591f\u663e\u5f0f\u5730\u63a8\u7406\u54cd\u5e94\u7684\u8d28\u91cf\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u201c\u53e3\u5934\u6279\u8bc4\u201d\uff08CLoud\uff09\u5956\u52b1\u6a21\u578b\u3002CLoud\u5956\u52b1\u6a21\u578b\u9996\u5148\u751f\u6210\u5bf9\u52a9\u624b\u54cd\u5e94\u7684\u81ea\u7136\u8bed\u8a00\u6279\u8bc4\uff0c\u7136\u540e\u4f7f\u7528\u8fd9\u4e9b\u6279\u8bc4\u6765\u9884\u6d4b\u54cd\u5e94\u8d28\u91cf\u7684\u6807\u91cf\u5956\u52b1\u3002 
\u6211\u4eec\u8bc1\u660e\u4e86\u5bf9\u4e8eLlama-3-8B\u548c70B\u57fa\u7840\u6a21\u578b\uff0cCLoud\u5956\u52b1\u6a21\u578b\u7684\u6210\u529f\uff1a\u4e0e\u7ecf\u5178\u5956\u52b1\u6a21\u578b\u76f8\u6bd4\uff0cCLoud\u5956\u52b1\u6a21\u578b\u5206\u522b\u5728RewardBench\u4e0a\u63d0\u9ad8\u4e868B\u548c70B\u57fa\u7840\u6a21\u578b\u7684\u4e8c\u5143\u504f\u597d\u5206\u7c7b\u51c6\u786e\u73874.65\u548c5.84\u4e2a\u767e\u5206\u70b9\u3002\u6b64\u5916\uff0c\u5f53\u4f5c\u4e3aBest-of-N\u8bc4\u5206\u6a21\u578b\u4f7f\u7528\u65f6\uff0cCLoud\u5956\u52b1\u6a21\u578b\u5728ArenaHard\u4e0a\u7684\u80dc\u7387\u4e5f\u5b9e\u73b0\u4e86\u5e15\u7d2f\u6258\u6539\u8fdb\u3002\u6700\u540e\uff0c\u6211\u4eec\u63a2\u7d22\u4e86\u5982\u4f55\u5229\u7528CLoud\u5956\u52b1\u6a21\u578b\u7684\u52a8\u6001\u63a8\u7406\u8ba1\u7b97\u80fd\u529b\uff0c\u901a\u8fc7\u81ea\u6211\u4e00\u81f4\u6027\u89e3\u7801\u6765\u8fdb\u884c\u5956\u52b1\u9884\u6d4b\u3002 \u4ee5\u4e0a\u662f\u5173\u4e8e\u201c\u53e3\u5934\u6279\u8bc4\u201d\uff08CLoud\uff09\u5956\u52b1\u6a21\u578b\u7684\u6458\u8981\u7ffb\u8bd1\uff0c\u5b83\u5c55\u793a\u4e86\u8fd9\u79cd\u65b0\u578b\u5956\u52b1\u6a21\u578b\u5728\u63d0\u5347\u5f3a\u5316\u5b66\u4e60\u7cfb\u7edf\u6027\u80fd\u65b9\u9762\u7684\u6f5c\u529b\u3002**|\n", "2408.11788": "|**2024-08-21**|**DreamFactory: Pioneering Multi-Scene Long Video Generation with a Multi-Agent Framework**|Zhifei Xie et.al.|[2408.11788](http://arxiv.org/abs/2408.11788)|null|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201cDreamFactory\u201d\u7684LLM\u57fa\u6846\u67b6\uff0c\u5b83\u80fd\u89e3\u51b3\u5f53\u524d\u89c6\u9891\u751f\u6210\u6a21\u578b\u5728\u521b\u5efa\u957f\u89c6\u9891\u65f6\u9047\u5230\u7684\u6311\u6218\u3002DreamFactory\u901a\u8fc7\u591a\u667a\u80fd\u4f53\u534f\u4f5c\u539f\u5219\u548c\u5173\u952e\u5e27\u8fed\u4ee3\u8bbe\u8ba1\u65b9\u6cd5\uff0c\u786e\u4fdd\u4e86\u957f\u89c6\u9891\u7684\u4e00\u81f4\u6027\u548c\u98ce\u683c\u7edf\u4e00\u3002\u5b83\u5229\u7528\u94fe\u5f0f\u601d\u7ef4\uff08Chain of 
Thought\uff0cCOT\uff09\u6765\u5904\u7406\u5927\u578b\u8bed\u8a00\u6a21\u578b\u56fa\u6709\u7684\u4e0d\u786e\u5b9a\u6027\u3002DreamFactory\u80fd\u591f\u751f\u6210\u957f\u3001\u98ce\u683c\u4e00\u81f4\u4e14\u590d\u6742\u7684\u89c6\u9891\u3002 \u5bf9\u4e8e\u8fd9\u4e9b\u957f\u5f62\u5f0f\u89c6\u9891\u7684\u8bc4\u4f30\u63d0\u51fa\u4e86\u6311\u6218\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u65b0\u7684\u8bc4\u4f30\u6307\u6807\uff0c\u5982\u8de8\u573a\u666f\u9762\u90e8\u8ddd\u79bb\u5206\u6570\u548c\u8de8\u573a\u666f\u98ce\u683c\u4e00\u81f4\u6027\u5206\u6570\u3002\u4e3a\u4e86\u4fc3\u8fdb\u8fd9\u4e00\u9886\u57df\u7684\u8fdb\u4e00\u6b65\u7814\u7a76\uff0c\u6211\u4eec\u8d21\u732e\u4e86\u4e00\u4e2a\u5305\u542b\u8d85\u8fc7150\u4e2a\u7531\u4eba\u7c7b\u8bc4\u5206\u7684\u591a\u573a\u666f\u89c6\u9891\u7684\u591a\u573a\u666f\u89c6\u9891\u6570\u636e\u96c6\u3002|\n", "2408.11779": "|**2024-08-21**|**Personality Alignment of Large Language Models**|Minjun Zhu et.al.|[2408.11779](http://arxiv.org/abs/2408.11779)|**[link](https://github.com/zhu-minjun/palign)**|**\u4e3a\u4e86\u5f25\u8865\u73b0\u6709\u5927\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5bf9\u9f50\u65b9\u6cd5\u5728\u53cd\u6620\u4eba\u7c7b\u666e\u904d\u4ef7\u503c\u89c2\u548c\u884c\u4e3a\u65f6\u7684\u4e0d\u8db3\uff0c\u5ffd\u89c6\u4e86\u4e2a\u4f53\u7528\u6237\u72ec\u7279\u7279\u5f81\u548c\u504f\u597d\u7684\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e2a\u6027\u5bf9\u9f50\u7684\u6982\u5ff5\u3002\u8be5\u65b9\u6cd5\u65e8\u5728\u6839\u636e\u4e2a\u4f53\u7528\u6237\u6216\u7d27\u5bc6\u5173\u8054\u7fa4\u4f53\u7684\u5177\u4f53\u504f\u597d\u8c03\u6574LLM\u7684\u54cd\u5e94\u4e0e\u51b3\u7b56\u3002\u53d7\u5fc3\u7406\u6d4b\u91cf\u5b66\u7684\u542f\u53d1\uff0c\u6211\u4eec\u6784\u5efa\u4e86Personality Alignment with Personality Inventories (PAPI) 
\u6570\u636e\u96c6\uff0c\u5305\u542b\u4e8630\u4e07\u771f\u5b9e\u4e3b\u4f53\u7684\u6570\u636e\uff0c\u6bcf\u4e2a\u4e3b\u4f53\u57fa\u4e8e\u4e94\u5927\u4eba\u683c\u56e0\u7d20\u63d0\u4f9b\u884c\u4e3a\u504f\u597d\u4fe1\u606f\u3002\u8fd9\u4e00\u6570\u636e\u96c6\u4f7f\u6211\u4eec\u80fd\u591f\u5b9a\u91cf\u8bc4\u4f30LLM\u5728\u591a\u5927\u7a0b\u5ea6\u4e0a\u80fd\u591f\u4e0e\u6bcf\u4e2a\u4e3b\u4f53\u7684\u884c\u4e3a\u6a21\u5f0f\u76f8\u5339\u914d\u3002\u9274\u4e8e\u4e2a\u6027\u5bf9\u9f50\u9762\u4e34\u7684\u6311\u6218\uff1a\u5982\u4e2a\u4eba\u6570\u636e\u6709\u9650\u3001\u504f\u597d\u591a\u6837\u4ee5\u53ca\u53ef\u6269\u5c55\u6027\u9700\u6c42\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u6fc0\u6d3b\u5e72\u9884\u4f18\u5316\u65b9\u6cd5\u3002\u8fd9\u79cd\u65b9\u6cd5\u5229\u7528\u6700\u5c11\u7684\u6570\u636e\u548c\u8ba1\u7b97\u8d44\u6e90\u63d0\u9ad8\u4e86LLM\u9ad8\u6548\u5bf9\u9f50\u4e2a\u4f53\u884c\u4e3a\u504f\u597d\u7684\u80fd\u529b\u3002\u6211\u4eec\u7684\u65b9\u6cd5PAS\u4e0d\u4ec5\u5728\u6027\u80fd\u4e0a\u8d85\u8d8a\u4e86DPO\uff0c\u800c\u4e14\u4f18\u5316\u65f6\u95f4\u4ec5\u4e3a\u540e\u8005\u7684\u4e94\u5206\u4e4b\u4e00\uff0c\u5177\u6709\u5b9e\u9645\u4ef7\u503c\uff0c\u63a8\u52a8\u4e86\u4e2a\u6027\u5316\u7684AI\u7cfb\u7edf\u51b3\u7b56\u4e0e\u63a8\u7406\u7684\u53d1\u5c55\uff0c\u589e\u5f3a\u4e86\u4e0e\u6bcf\u4f4d\u7528\u6237\u7684\u4ea4\u4e92\u76f8\u5173\u6027\u548c\u610f\u4e49\uff0c\u4fc3\u8fdb\u4e86\u4ee5\u4eba\u4e3a\u672c\u7684\u4eba\u5de5\u667a\u80fd\u7684\u8fdb\u6b65\u3002\u76f8\u5173\u4ee3\u7801\u5df2\u53d1\u5e03\u5728\u3002**|\n", "2408.11775": "|**2024-08-21**|**Leveraging Fine-Tuned Retrieval-Augmented Generation with Long-Context Support: For 3GPP Standards**|Omar Erak 
et.al.|[2408.11775](http://arxiv.org/abs/2408.11775)|**[link](https://github.com/Nouf-Alabbasi/oKUmura_AI_Telecom_challenge)**|**\u8fd1\u671f\u7684\u7814\u7a76\u63ed\u793a\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u7535\u4fe1\u6807\u51c6\u65b9\u9762\u7684\u6280\u672f\u89c4\u8303\u6311\u6218\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8ePhi-2\u5c0f\u578b\u8bed\u8a00\u6a21\u578b\uff08SLM\uff09\u7684\u5fae\u8c03\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u7cfb\u7edf\uff0c\u65e8\u5728\u4f5c\u4e3a\u901a\u4fe1\u7f51\u7edc\u7684\u6743\u5a01\u7b54\u6848\u6765\u6e90\u3002\u6211\u4eec\u5f00\u53d1\u7684\u7cfb\u7edf\u5229\u7528\u524d\u77bb\u6027\u7684\u8bed\u4e49\u5206\u5757\u6765\u52a8\u6001\u786e\u5b9a\u89e3\u6790\u65ad\u70b9\uff0c\u4f9d\u636e\u5d4c\u5165\u76f8\u4f3c\u5ea6\u8fdb\u884c\u8c03\u6574\uff0c\u4ece\u800c\u6709\u6548\u5904\u7406\u591a\u79cd\u6587\u6863\u683c\u5f0f\u3002\u9488\u5bf9\u6280\u672f\u6807\u51c6\u4e2d\u53ef\u80fd\u51fa\u73b0\u7684\u591a\u4e2a\u76f8\u4f3c\u4e0a\u4e0b\u6587\u95ee\u9898\uff0c\u6211\u4eec\u91c7\u7528\u4e86\u91cd\u65b0\u6392\u540d\u7b97\u6cd5\u4ee5\u4f18\u5148\u8003\u8651\u6700\u76f8\u5173\u7684\u63d0\u53d6\u7247\u6bb5\u3002\u8003\u8651\u5230Phi-2\u7684\u5c0f\u8bed\u5883\u7a97\u53e3\u9650\u5236\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u540d\u4e3aSelfExtend\u7684\u6700\u65b0\u6280\u672f\uff0c\u5728\u63a8\u7406\u8fc7\u7a0b\u4e2d\u6269\u5c55\u8bed\u5883\u7a97\u53e3\uff0c\u4e0d\u4ec5\u63d0\u5347\u4e86\u6027\u80fd\uff0c\u8fd8\u80fd\u9002\u5e94\u5ba2\u6237\u5230\u4e13\u4e1a\u6280\u672f\u4eba\u5458\u7684\u5404\u79cd\u67e5\u8be2\u548c\u8bbe\u8ba1\u9700\u6c42\u3002\u4e3a\u4e86\u5fae\u8c03\uff0c\u6211\u4eec\u4f7f\u7528\u4e86\u4f4e\u79e9\u9002\u914d\uff08LoRA\uff09\u6280\u672f\uff0c\u5728\u8bad\u7ec3\u65f6\u63d0\u9ad8\u8ba1\u7b97\u6548\u7387\uff0c\u5e76\u5728\u5c0f\u6570\u636e\u96c6\u4e0a\u5b9e\u73b0\u6709\u6548\u7684\u5fae\u8c03\u3002\u6211\u4eec\u7684\u5168\u9762\u5b9e\u9a8c\u8868\u660e\uff0c\u5728\u7535\
u4fe1\u9886\u57df\u5bf9\u73b0\u6709\u95ee\u7b54\u65b9\u6cd5\u7684\u663e\u8457\u6539\u8fdb\uff0c\u6027\u80fd\u8d85\u8fc7GPT-4\uff08\u540e\u8005\u7684\u89c4\u6a21\u7ea6\u4e3a\u672c\u6a21\u578b\u7684880\u500d\uff09\u3002\u8fd9\u9879\u5de5\u4f5c\u5c55\u793a\u4e86\u5229\u7528SLM\u5728\u901a\u4fe1\u7f51\u7edc\u4e2d\u7684\u65b0\u65b9\u6cd5\uff0c\u63d0\u4f9b\u4e86\u9ad8\u6548\u6027\u548c\u6027\u80fd\u4e4b\u95f4\u7684\u5e73\u8861\uff0c\u53ef\u4f5c\u4e3a\u6784\u5efa\u667a\u80fd\u8bed\u8a00\u6a21\u578b\u7684\u57fa\u7840\u3002**|\n", "2408.11749": "|**2024-08-21**|**Against All Odds: Overcoming Typology, Script, and Language Confusion in Multilingual Embedding Inversion Attacks**|Yiyi Chen et.al.|[2408.11749](http://arxiv.org/abs/2408.11749)|**[link](https://github.com/siebeniris/vec2text_exp)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u9762\u4e34\u7740\u6765\u81ea\u7f51\u7edc\u653b\u51fb\u8005\u7684\u6076\u610f\u5f71\u54cd\uff0c\u5982\u5bf9\u6297\u6027\u3001\u540e\u95e8\u548c\u5d4c\u5165\u53cd\u8f6c\u653b\u51fb\u3002\u5bf9\u6b64\uff0c\u65b0\u5174\u7684LLM\u5b89\u5168\u9886\u57df\u81f4\u529b\u4e8e\u7814\u7a76\u5e76\u9632\u5fa1\u6b64\u7c7b\u5a01\u80c1\u3002\u8fc4\u4eca\u4e3a\u6b62\uff0c\u8be5\u9886\u57df\u7684\u5927\u591a\u6570\u5de5\u4f5c\u90fd\u96c6\u4e2d\u5728\u82f1\u8bed\u5355\u4e00\u8bed\u8a00\u6a21\u578b\u4e0a\uff0c\u7136\u800c\uff0c\u6700\u65b0\u7814\u7a76\u8868\u660e\uff0c\u591a\u8bed\u8a00LLM\u53ef\u80fd\u6bd4\u5176\u5355\u4e00\u8bed\u8a00\u540c\u50da\u66f4\u6613\u53d7\u5230\u5404\u79cd\u653b\u51fb\u3002\u5c3d\u7ba1\u5148\u524d\u7684\u7814\u7a76\u5df2\u7ecf\u63a2\u8ba8\u4e86\u5728\u90e8\u5206\u6b27\u6d32\u8bed\u8a00\u4e0a\u7684\u5d4c\u5165\u53cd\u8f6c\uff0c\u4f46\u8981\u5c06\u8fd9\u4e9b\u53d1\u73b0\u63a8\u53ca\u5230\u4e0d\u540c\u8bed\u7cfb\u548c\u4e0d\u540c\u4e66\u5199\u7cfb\u7edf\u7684\u8bed\u8a00\uff0c\u5374\u6781\u5177\u6311\u6218\u6027\u3002\u56e0\u6b64\uff0c\u672c\u7814\u7a76\u65e8\u5728\u63a2\u7d22\u591a\u8bed\u8a00LLM\u5728\u5d4c\u5165\u53cd\u8f6c\u653b\u51fb\u4e0b\u7684\u5b8
9\u5168\u6027\uff0c\u5e76\u572820\u79cd\u8bed\u8a00\u4e2d\u8fdb\u884c\u8de8\u8bed\u8a00\u548c\u8de8\u4e66\u5199\u7684\u53cd\u8f6c\u6d4b\u8bd5\uff0c\u8986\u76d68\u4e2a\u8bed\u7cfb\u548c12\u79cd\u4e66\u5199\u7cfb\u7edf\u3002\u6211\u4eec\u7684\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0c\u963f\u62c9\u4f2f\u5b57\u6bcd\u548c\u897f\u91cc\u5c14\u5b57\u6bcd\u4e66\u5199\u7684\u8bed\u8a00\u4ee5\u53ca\u5370\u5ea6-\u96c5\u5229\u5b89\u8bed\u7cfb\u7684\u8bed\u8a00\u7279\u522b\u5bb9\u6613\u53d7\u5230\u5d4c\u5165\u53cd\u8f6c\u7684\u5f71\u54cd\u3002\u6211\u4eec\u8fdb\u4e00\u6b65\u89c2\u5bdf\u5230\u53cd\u8f6c\u6a21\u578b\u503e\u5411\u4e8e\u51fa\u73b0\u8bed\u8a00\u6df7\u6dc6\uff0c\u6709\u65f6\u5927\u5e45\u5ea6\u964d\u4f4e\u4e86\u653b\u51fb\u7684\u6709\u6548\u6027\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u7cfb\u7edf\u5730\u63a2\u7d22\u4e86\u8fd9\u4e00\u74f6\u9888\uff0c\u63ed\u793a\u4e86\u4e00\u4e9b\u53ef\u9884\u6d4b\u6a21\u5f0f\uff0c\u8fd9\u53ef\u80fd\u88ab\u653b\u51fb\u8005\u5229\u7528\u3002\u6700\u7ec8\uff0c\u672c\u7814\u7a76\u65e8\u5728\u6df1\u5316\u5bf9\u591a\u8bed\u8a00LLM\u9762\u4e34\u7684\u4e3b\u8981\u5b89\u5168\u6f0f\u6d1e\u7684\u7406\u89e3\uff0c\u5e76\u63d0\u9ad8\u5bf9\u6700\u6613\u53d7\u8fd9\u4e9b\u653b\u51fb\u5f71\u54cd\u7684\u8bed\u8a00\u7684\u610f\u8bc6\u3002|\n", "2408.12599": "|**2024-08-22**|**Controllable Text Generation for Large Language Models: A Survey**|Xun Liang 
et.al.|[2408.12599](http://arxiv.org/abs/2408.12599)|**[link](https://github.com/iaar-shanghai/ctgsurvey)**|**\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u9886\u57df\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5c55\u73b0\u4e86\u5353\u8d8a\u7684\u6587\u672c\u751f\u6210\u8d28\u91cf\u3002\u7136\u800c\uff0c\u5728\u5b9e\u9645\u5e94\u7528\u4e2d\uff0cLLMs\u9700\u8981\u6ee1\u8db3\u65e5\u76ca\u590d\u6742\u7684\u9700\u6c42\u3002\u9664\u4e86\u907f\u514d\u8bef\u5bfc\u6027\u6216\u4e0d\u9002\u5f53\u7684\u5185\u5bb9\uff0cLLMs\u8fd8\u88ab\u671f\u671b\u6839\u636e\u7279\u5b9a\u7528\u6237\u9700\u6c42\u8fdb\u884c\u8c03\u6574\uff0c\u5982\u6a21\u4eff\u7279\u5b9a\u7684\u5199\u4f5c\u98ce\u683c\u6216\u751f\u6210\u5bcc\u6709\u8bd7\u610f\u7684\u6587\u672c\u3002\u8fd9\u4e9b\u591a\u6837\u7684\u9700\u6c42\u63a8\u52a8\u4e86\u53ef\u63a7\u6587\u672c\u751f\u6210\uff08CTG\uff09\u6280\u672f\u7684\u53d1\u5c55\uff0c\u65e8\u5728\u786e\u4fdd\u8f93\u51fa\u5185\u5bb9\u7b26\u5408\u9884\u8bbe\u7684\u63a7\u5236\u6761\u4ef6\uff0c\u5982\u5b89\u5168\u6027\u3001\u60c5\u611f\u503e\u5411\u3001\u4e3b\u9898\u4e00\u81f4\u6027\u4ee5\u53ca\u8bed\u8a00\u98ce\u683c\uff0c\u540c\u65f6\u4fdd\u6301\u9ad8\u8d28\u91cf\u7684\u6709\u7528\u6027\u3001\u6d41\u7545\u6027\u548c\u591a\u6837\u6027\u3002 
\u672c\u6587\u7cfb\u7edf\u5730\u56de\u987e\u4e86CTG\u5728LLMs\u9886\u57df\u7684\u6700\u65b0\u8fdb\u5c55\uff0c\u8be6\u7ec6\u5b9a\u4e49\u4e86\u5176\u6838\u5fc3\u6982\u5ff5\uff0c\u5e76\u660e\u786e\u4e86\u63a7\u5236\u6761\u4ef6\u548c\u6587\u672c\u8d28\u91cf\u7684\u8981\u6c42\u3002\u6211\u4eec\u5c06CTG\u4efb\u52a1\u5206\u4e3a\u4e24\u5927\u7c7b\uff1a\u5185\u5bb9\u63a7\u5236\u548c\u5c5e\u6027\u63a7\u5236\uff0c\u5e76\u5bf9\u6bcf\u79cd\u7c7b\u578b\u7684\u65b9\u6cd5\u8fdb\u884c\u4e86\u8ba8\u8bba\uff0c\u5305\u62ec\u6a21\u578b\u91cd\u8bad\u7ec3\u3001\u5fae\u8c03\u3001\u5f3a\u5316\u5b66\u4e60\u3001\u63d0\u793a\u5de5\u7a0b\u3001\u6f5c\u5728\u7a7a\u95f4\u64cd\u7eb5\u548c\u89e3\u7801\u65f6\u5e72\u9884\u3002\u6211\u4eec\u5206\u6790\u4e86\u6bcf\u79cd\u65b9\u6cd5\u7684\u7279\u70b9\u3001\u4f18\u52bf\u548c\u5c40\u9650\u6027\uff0c\u63d0\u4f9b\u4e86\u5b9e\u73b0\u751f\u6210\u63a7\u5236\u7684\u6df1\u5165\u89c1\u89e3\u3002\u6b64\u5916\uff0c\u6211\u4eec\u56de\u987e\u4e86CTG\u8bc4\u4f30\u65b9\u6cd5\u3001\u603b\u7ed3\u4e86\u5176\u8de8\u9886\u57df\u7684\u5e94\u7528\uff0c\u5e76\u6307\u51fa\u4e86\u5f53\u524d\u7814\u7a76\u7684\u5173\u952e\u6311\u6218\uff0c\u5982\u6d41\u7545\u5ea6\u548c\u5b9e\u7528\u6027\u7684\u964d\u4f4e\u3002\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86\u82e5\u5e72\u547c\u5401\uff0c\u5f3a\u8c03\u672a\u6765\u7814\u7a76\u5e94\u66f4\u6ce8\u91cd\u5b9e\u9645\u5e94\u7528\u3002\u672c\u6587\u65e8\u5728\u4e3a\u8be5\u9886\u57df\u7684\u7814\u7a76\u4eba\u5458\u548c\u5f00\u53d1\u8005\u63d0\u4f9b\u6709\u4ef7\u503c\u7684\u6307\u5bfc\u3002\u6211\u4eec\u7684\u53c2\u8003\u6587\u732e\u5217\u8868\u548c\u4e2d\u6587\u7248\u672c\u5df2\u5f00\u6e90\u5728https://github.com/IAAR-Shanghai/CTGSurvey\u3002**|\n", "2408.12579": "|**2024-08-22**|**RuleAlign: Making Large Language Models Better Physicians with Diagnostic Rule Alignment**|Xiaohan Wang 
et.al.|[2408.12579](http://arxiv.org/abs/2408.12579)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5982GPT-4\u3001MedPaLM-2\u548cMed-Gemini\u5728\u5404\u7c7b\u533b\u7597\u8bc4\u4f30\u6307\u6807\u4e0a\u8868\u73b0\u51fa\u4e0e\u533b\u5b66\u4e13\u5bb6\u7ade\u4e89\u7684\u6027\u80fd\u3002\u7136\u800c\uff0c\u5b83\u4eec\u5728\u4e0e\u533b\u751f\u76f8\u5ab2\u7f8e\u7684\u4e13\u4e1a\u8bca\u65ad\u65b9\u9762\u4ecd\u9762\u4e34\u6311\u6218\uff0c\u7279\u522b\u662f\u5728\u9ad8\u6548\u6536\u96c6\u60a3\u8005\u4fe1\u606f\u4ee5\u53ca\u63a8\u7406\u6700\u7ec8\u8bca\u65ad\u7684\u8fc7\u7a0b\u4e2d\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aRuleAlign\u7684\u6846\u67b6\uff0c\u65e8\u5728\u4f7fLLM\u4e0e\u7279\u5b9a\u8bca\u65ad\u89c4\u5219\u4fdd\u6301\u4e00\u81f4\u3002\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u5305\u542b\u57fa\u4e8e\u89c4\u5219\u7684\u533b\u60a3\u5bf9\u8bdd\u6570\u636e\u96c6\uff0c\u5e76\u8bbe\u8ba1\u4e86\u4e00\u79cd\u901a\u8fc7\u504f\u597d\u5b66\u4e60\u8fdb\u884c\u5bf9\u9f50\u7684\u5b66\u4e60\u65b9\u6cd5\u3002\u5b9e\u9a8c\u7ed3\u679c\u8bc1\u660e\u4e86\u6240\u63d0\u51fa\u65b9\u6cd5\u7684\u6709\u6548\u6027\u3002\u6211\u4eec\u671f\u671b\u6211\u4eec\u7684\u5de5\u4f5c\u80fd\u591f\u542f\u53d1\u63a2\u7d22LLM\u4f5c\u4e3aAI\u533b\u5e08\u7684\u6f5c\u529b\u3002|\n", "2408.12570": "|**2024-08-22**|**Jamba-1.5: Hybrid Transformer-Mamba Models at Scale**|Jamba Team 
et.al.|[2408.12570](http://arxiv.org/abs/2408.12570)|null|\u6211\u4eec\u63a8\u51fa\u4e86Jamba-1.5\uff0c\u57fa\u4e8e\u6211\u4eecJamba\u67b6\u6784\u7684\u65b0\u578b\u6307\u4ee4\u4f18\u5316\u5927\u578b\u8bed\u8a00\u6a21\u578b\u3002Jamba\u662f\u4e00\u79cd\u6df7\u5408Transformer-Mamba\u4e13\u5bb6\u6df7\u5408\u67b6\u6784\uff0c\u5b83\u5728\u4e0a\u4e0b\u6587\u957f\u5ea6\u8303\u56f4\u5185\u63d0\u4f9b\u4e86\u9ad8\u541e\u5410\u91cf\u548c\u4f4e\u5185\u5b58\u4f7f\u7528\uff0c\u540c\u65f6\u4fdd\u6301\u4e0eTransformer\u6a21\u578b\u76f8\u540c\u6216\u66f4\u597d\u7684\u8d28\u91cf\u3002\u6211\u4eec\u53d1\u5e03\u4e86\u4e24\u79cd\u6a21\u578b\u5927\u5c0f\uff1aJamba-1.5-Large\uff0c\u5177\u670994B\u4e2a\u6d3b\u8dc3\u53c2\u6570\uff1b\u4ee5\u53caJamba-1.5-Mini\uff0c\u5177\u670912B\u4e2a\u6d3b\u8dc3\u53c2\u6570\u3002\u8fd9\u4e24\u79cd\u6a21\u578b\u5747\u9488\u5bf9\u591a\u79cd\u5bf9\u8bdd\u548c\u6307\u4ee4\u9075\u5faa\u80fd\u529b\u8fdb\u884c\u4e86\u5fae\u8c03\uff0c\u5e76\u4e14\u5177\u6709256K\u4ee4\u724c\u7684\u6700\u5927\u6709\u6548\u4e0a\u4e0b\u6587\u957f\u5ea6\uff0c\u5728\u5f00\u653e\u6743\u91cd\u6a21\u578b\u4e2d\u6700\u5927\u3002\u4e3a\u4e86\u652f\u6301\u6210\u672c\u6548\u76ca\u7684\u63a8\u7406\uff0c\u6211\u4eec\u5f15\u5165\u4e86ExpertsInt8\uff0c\u8fd9\u662f\u4e00\u79cd\u65b0\u9896\u7684\u91cf\u5316\u6280\u672f\uff0c\u5141\u8bb8\u5728\u5904\u7406256K\u4ee4\u724c\u4e0a\u4e0b\u6587\u65f6\u5c06Jamba-1.5-Large\u6a21\u578b\u653e\u5165\u5177\u67098\u4e2a80GB 
GPU\u7684\u673a\u5668\u4e0a\u800c\u4e0d\u4f1a\u635f\u5931\u8d28\u91cf\u3002\u5f53\u5728\u4e00\u7cfb\u5217\u5b66\u672f\u548c\u804a\u5929\u673a\u5668\u4eba\u57fa\u51c6\u4e0a\u8fdb\u884c\u8bc4\u4f30\u65f6\uff0cJamba-1.5\u6a21\u578b\u53d6\u5f97\u4e86\u51fa\u8272\u7684\u7ed3\u679c\uff0c\u540c\u65f6\u63d0\u4f9b\u4e86\u9ad8\u541e\u5410\u91cf\u5e76\u4f18\u4e8e\u5176\u4ed6\u5f00\u653e\u6743\u91cd\u6a21\u578b\u5728\u957f\u4e0a\u4e0b\u6587\u57fa\u51c6\u4e0a\u7684\u6027\u80fd\u3002\u4e24\u79cd\u5927\u5c0f\u7684\u6a21\u578b\u7684\u6743\u91cd\u90fd\u6839\u636eJamba\u5f00\u653e\u6a21\u578b\u8bb8\u53ef\u516c\u5f00\u63d0\u4f9b\uff0c\u5e76\u4e14\u6211\u4eec\u53d1\u5e03\u4e86ExpertsInt8\u4f5c\u4e3a\u5f00\u6e90\u8f6f\u4ef6\u3002|\n", "2408.12561": "|**2024-08-22**|**ssProp: Energy-Efficient Training for Convolutional Neural Networks with Scheduled Sparse Back Propagation**|Lujia Zhong et.al.|[2408.12561](http://arxiv.org/abs/2408.12561)|**[link](https://github.com/lujiazho/ssprop)**|**\u8fd1\u671f\uff0c\u6df1\u5ea6\u5b66\u4e60\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u5c55\uff0c\u5c24\u5176\u662f\u5728\u751f\u6210\u6a21\u578b\u9886\u57df\uff0c\u5982\u5927\u578b\u8bed\u8a00\u6a21\u578b\u548c\u6982\u7387\u6027\u6269\u6563\u6a21\u578b\u3002\u7136\u800c\uff0c\u8bad\u7ec3\u8fd9\u4e9b\u6a21\u578b\u5f80\u5f80\u9700\u8981\u5927\u91cf\u7684\u8ba1\u7b97\u8d44\u6e90\uff0c\u6d88\u8017\u6570\u5341\u4ebfpetaFLOPs\u7684\u8ba1\u7b97\u91cf\uff0c\u5bfc\u81f4\u5de8\u5927\u7684\u80fd\u6e90\u6d88\u8017\u548c\u78b3\u8db3\u8ff9\uff0c\u5f15\u53d1\u4e86\u5bf9\u73af\u5883\u7684\u91cd\u5927\u62c5\u5fe7\u3002\u5728\u8bad\u7ec3\u6df1\u5ea6\u5b66\u4e60\u6a21\u578b\u7684\u8fc7\u7a0b\u4e2d\uff0c\u53cd\u5411\u4f20\u64ad\uff08Back-propagation, BP\uff09\u662f\u4e3b\u8981\u7684\u8ba1\u7b97\u8d1f\u62c5\u6765\u6e90\u3002 
\u4e3a\u4e86\u63a8\u52a8\u80fd\u6e90\u6548\u7387\u7684\u63d0\u9ad8\uff0c\u5e76\u5141\u8bb8\u5728\u4efb\u4f55\u673a\u5668\u548c\u8bbe\u5907\u4e0a\u5b9e\u73b0\u7a00\u758f\u5b66\u4e60\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u901a\u7528\u3001\u80fd\u6e90\u9ad8\u6548\u7684\u5377\u79ef\u6a21\u5757\uff0c\u5b83\u80fd\u591f\u65e0\u7f1d\u96c6\u6210\u5230\u4efb\u4f55\u6df1\u5ea6\u5b66\u4e60\u67b6\u6784\u4e2d\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u901a\u9053\u7ea7\u7a00\u758f\u6027\uff0c\u5e76\u57fa\u4e8e\u5047\u8bbeBP\u901a\u5e38\u5bc6\u96c6\u4e14\u4f4e\u6548\uff0c\u8fd9\u53ef\u80fd\u5bfc\u81f4\u8fc7\u62df\u5408\u548c\u9ad8\u8ba1\u7b97\u6d88\u8017\uff0c\u63d0\u51fa\u4e86\u989d\u5916\u7684\u68af\u5ea6\u9009\u62e9\u8c03\u5ea6\u5668\uff0c\u5728\u53cd\u5411\u4f20\u64ad\u9636\u6bb5\u8fdb\u884c\u9009\u62e9\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u53ef\u4ee5\u51cf\u5c1140%\u7684\u8ba1\u7b97\u91cf\uff0c\u540c\u65f6\u6709\u53ef\u80fd\u63d0\u5347\u6a21\u578b\u6027\u80fd\uff0c\u5728\u56fe\u50cf\u5206\u7c7b\u548c\u751f\u6210\u4efb\u52a1\u4e0a\u5f97\u5230\u9a8c\u8bc1\u3002\u8fd9\u79cd\u51cf\u5c11\u53ef\u4ee5\u5e26\u6765\u663e\u8457\u7684\u80fd\u6e90\u8282\u7701\u548c\u8f83\u4f4e\u7684\u78b3\u8db3\u8ff9\uff0c\u5c24\u5176\u662f\u5728\u5927\u578bAI\u7cfb\u7edf\u7684\u7814\u7a76\u4e0e\u5f00\u53d1\u9636\u6bb5\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u4ee5\u4e0d\u540c\u4e8eDropout\u7684\u65b9\u5f0f\u7f13\u89e3\u4e86\u8fc7\u62df\u5408\u95ee\u9898\uff0c\u5141\u8bb8\u5b83\u4e0eDropout\u7ed3\u5408\u4f7f\u7528\uff0c\u8fdb\u4e00\u6b65\u63d0\u9ad8\u6a21\u578b\u6027\u80fd\u5e76\u964d\u4f4e\u8ba1\u7b97\u8d44\u6e90\u6d88\u8017\u3002\u5e7f\u6cdb\u5b9e\u9a8c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u9002\u7528\u4e8e\u5404\u79cd\u6570\u636e\u96c6\u548c\u4efb\u52a1\uff0c\u5e76\u4e0e\u591a\u79cd\u6df1\u5ea6\u5b66\u4e60\u67b6\u6784\u548c\u6a21\u5757\u517c\u5bb9\u3002\u76f8\u5173\u4ee3\u7801\u5df2\u516c\u5f00\u53d1\u5e0
3\u5728https://github.com/lujiazho/ssProp\u3002**|\n", "2408.12547": "|**2024-08-22**|**Towards Evaluating and Building Versatile Large Language Models for Medicine**|Chaoyi Wu et.al.|[2408.12547](http://arxiv.org/abs/2408.12547)|**[link](https://github.com/magic-ai4med/meds-ins)**|**\u5728\u8fd9\u9879\u7814\u7a76\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u5168\u9762\u7684\u57fa\u51c6\u6d4b\u8bd5\u2014\u2014MedS-Bench\uff0c\u65e8\u5728\u8bc4\u4f30\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u4e34\u5e8a\u573a\u666f\u4e2d\u7684\u6027\u80fd\u3002\u4e0e\u73b0\u6709\u4fa7\u91cd\u4e8e\u591a\u9879\u9009\u62e9\u95ee\u9898\u56de\u7b54\u7684\u57fa\u51c6\u4e0d\u540c\uff0cMedS-Bench\u8986\u76d6\u4e8611\u4e2a\u9ad8\u7ea7\u522b\u4e34\u5e8a\u4efb\u52a1\uff0c\u5305\u62ec\u4e34\u5e8a\u62a5\u544a\u6458\u8981\u3001\u6cbb\u7597\u5efa\u8bae\u3001\u8bca\u65ad\u3001\u5b9e\u4f53\u8bc6\u522b\u548c\u533b\u5b66\u6982\u5ff5\u89e3\u91ca\u7b49\u3002\u6211\u4eec\u4f7f\u7528\u5c11\u91cf\u63d0\u793a\u5bf9\u516d\u6b3e\u9886\u5148\u7684LLM\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u5982MEDITRON\u3001Mistral\u3001InternLM 2\u3001Llama 
3\u3001GPT-4\u548cClaude-3.5\uff0c\u53d1\u73b0\u5373\u4f7f\u662f\u6700\u9ad8\u7ea7\u7684\u6a21\u578b\u5728\u8fd9\u4e9b\u590d\u6742\u4efb\u52a1\u4e0a\u4e5f\u5b58\u5728\u6311\u6218\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u5c40\u9650\u6027\uff0c\u6211\u4eec\u5f00\u53d1\u4e86MedS-Ins\uff0c\u4e00\u4e2a\u9762\u5411\u533b\u5b66\u9886\u57df\u7684\u5927\u578b\u6307\u4ee4\u8c03\u4f18\u6570\u636e\u96c6\u3002MedS-Ins\u5305\u542b\u4e8658\u4e2a\u533b\u5b66\u76f8\u5173\u7684\u8bed\u8a00\u8bed\u6599\u5e93\uff0c\u603b\u8ba11350\u4e07\u6837\u672c\uff0c\u6db5\u76d6\u4e86122\u4e2a\u4efb\u52a1\u3002\u901a\u8fc7\u5c55\u793a\u8be5\u6570\u636e\u96c6\u7684\u7528\u9014\uff0c\u6211\u4eec\u5728\u4e00\u4e2a\u8f7b\u91cf\u7ea7\u3001\u5f00\u6e90\u7684\u533b\u7597\u8bed\u8a00\u6a21\u578b\u4e0a\u8fdb\u884c\u4e86\u6307\u4ee4\u8c03\u4f18\u5b9e\u9a8c\uff0c\u7ed3\u679c\u5f97\u5230\u4e86\u540d\u4e3aMMedIns-Llama 3\u7684\u65b0\u6a21\u578b\uff0c\u5b83\u5728\u51e0\u4e4e\u6240\u6709\u4e34\u5e8a\u4efb\u52a1\u4e0a\u7684\u8868\u73b0\u90fd\u8d85\u8fc7\u4e86\u73b0\u6709\u6a21\u578b\u3002\u4e3a\u4e86\u4fc3\u8fdb\u5bf9LLMs\u5e94\u7528\u4e8e\u4e34\u5e8a\u6311\u6218\u7684\u8fdb\u4e00\u6b65\u53d1\u5c55\uff0c\u6211\u4eec\u5df2\u5c06MedS-Ins\u6570\u636e\u96c6\u5b8c\u5168\u516c\u5f00\uff0c\u5e76\u9080\u8bf7\u7814\u7a76\u793e\u533a\u53c2\u4e0e\u5176\u6269\u5c55\u3002\u6b64\u5916\uff0c\u6211\u4eec\u542f\u52a8\u4e86\u4e00\u4e2a\u52a8\u6001\u6392\u884c\u699c\uff0c\u8ba1\u5212\u5b9a\u671f\u66f4\u65b0\u6d4b\u8bd5\u96c6\uff0c\u4ee5\u8ddf\u8e2a\u8fdb\u5c55\u5e76\u589e\u5f3a\u901a\u7528LLM\u5728\u533b\u5b66\u9886\u57df\u4e2d\u7684\u9002\u5e94\u80fd\u529b\u3002\u6392\u884c\u699c\uff1ahttps://henrychur.github.io/MedS-Bench/\u3002Github\uff1ahttps://github.com/MAGIC-AI4Med/MedS-Ins\u3002**|\n", "2408.12496": "|**2024-08-22**|**MEDCO: Medical Education Copilots Based on A Multi-Agent Framework**|Hao Wei 
et.al.|[2408.12496](http://arxiv.org/abs/2408.12496)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u533b\u5b66\u548c\u5065\u5eb7\u9886\u57df\u7b49\u591a\u4e2a\u7814\u7a76\u9886\u57df\u4ea7\u751f\u4e86\u91cd\u5927\u5f71\u54cd\uff0c\u7136\u800cLLMs\u4f5c\u4e3a\u533b\u7597\u6559\u80b2\u4e2d\u7684\u52a9\u624b\u6f5c\u529b\u5c1a\u672a\u5f97\u5230\u5145\u5206\u63a2\u7d22\u3002\u5f53\u524d\u7684AI\u8f85\u52a9\u6559\u80b2\u5de5\u5177\u53d7\u9650\u4e8e\u5355\u4e00\u5b66\u4e60\u65b9\u6cd5\u4ee5\u53ca\u65e0\u6cd5\u6a21\u62df\u5b9e\u9645\u533b\u7597\u57f9\u8bad\u7684\u591a\u5b66\u79d1\u6027\u548c\u4e92\u52a8\u6027\u3002\u4e3a\u4e86\u514b\u670d\u8fd9\u4e9b\u5c40\u9650\u6027\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aMEDCO\uff08Medical EDucation COpilots\uff09\u7684\u65b0\u578b\u591a\u4ee3\u7406\u52a9\u624b\u7cfb\u7edf\uff0c\u4e13\u95e8\u7528\u4e8e\u6a21\u62df\u771f\u5b9e\u4e16\u754c\u533b\u7597\u57f9\u8bad\u73af\u5883\u3002MEDCO\u6574\u5408\u4e86\u4e09\u4e2a\u6838\u5fc3\u4ee3\u7406\uff1a\u4e00\u4e2a\u81ea\u4e3b\u60a3\u8005\u3001\u4e00\u4f4d\u4e13\u5bb6\u533b\u751f\u548c\u4e00\u4f4d\u653e\u5c04\u79d1\u533b\u5e08\uff0c\u4ece\u800c\u6784\u5efa\u4e86\u4e00\u4e2a\u591a\u6a21\u6001\u548c\u4e92\u52a8\u7684\u5b66\u4e60\u73af\u5883\u3002\u6211\u4eec\u7684\u6846\u67b6\u7740\u91cd\u4e8e\u6559\u6388\u9ad8\u6548\u63d0\u95ee\u6280\u5de7\u3001\u8de8\u5b66\u79d1\u534f\u4f5c\u4ee5\u53ca\u5b66\u751f\u4e4b\u95f4\u7684\u540c\u4f34\u8ba8\u8bba\u3002 
\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u7ecf\u8fc7MEDCO\u8bad\u7ec3\u7684\u865a\u62df\u5b66\u751f\u4e0d\u4ec5\u5b9e\u73b0\u4e86\u4e0e\u9ad8\u7ea7\u6a21\u578b\u76f8\u5ab2\u7f8e\u7684\u663e\u8457\u6027\u80fd\u63d0\u5347\uff0c\u8fd8\u5c55\u73b0\u51fa\u7c7b\u4f3c\u4eba\u7c7b\u7684\u5b66\u4e60\u884c\u4e3a\u548c\u8fdb\u6b65\uff0c\u5e76\u4e14\u5b66\u4e60\u6837\u672c\u6570\u91cf\u589e\u52a0\u3002\u8fd9\u9879\u5de5\u4f5c\u5bf9\u533b\u7597\u6559\u80b2\u9886\u57df\u505a\u51fa\u4e86\u8d21\u732e\uff0c\u901a\u8fc7\u5f15\u5165\u4e00\u79cd\u4e92\u52a8\u548c\u534f\u4f5c\u7684\u5b66\u4e60\u65b9\u6cd5\u3002\u6b64\u5916\uff0c\u5b83\u8fd8\u63d0\u4f9b\u4e86\u5173\u4e8e\u96c6\u6210AI\u7684\u8bad\u7ec3\u6a21\u5f0f\u6709\u6548\u6027\u7684\u5b9d\u8d35\u89c1\u89e3\u3002|\n", "2408.12494": "|**2024-08-22**|**GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models**|Kunsheng Tang et.al.|[2408.12494](http://arxiv.org/abs/2408.12494)|**[link](https://github.com/kstanghere/gendercare-ccs24)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u81ea\u7136\u8bed\u8a00\u751f\u6210\u65b9\u9762\u5c55\u73b0\u4e86\u60ca\u4eba\u7684\u80fd\u529b\uff0c\u4f46\u4e5f\u88ab\u89c2\u5bdf\u5230\u653e\u5927\u4e86\u793e\u4f1a\u504f\u89c1\uff0c\u5c24\u5176\u662f\u4e0e\u6027\u522b\u76f8\u5173\u7684\u504f\u89c1\u3002\u9488\u5bf9\u8fd9\u4e00\u95ee\u9898\uff0c\u5df2\u7ecf\u63d0\u51fa\u4e86\u82e5\u5e72\u57fa\u51c6\u6d4b\u8bd5\u6765\u8bc4\u4f30LLM\u4e2d\u7684\u6027\u522b\u504f\u89c1\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u57fa\u51c6\u6d4b\u8bd5\u5f80\u5f80\u7f3a\u4e4f\u5b9e\u9645\u7684\u7075\u6d3b\u6027\u6216\u65e0\u610f\u4e2d\u5f15\u5165\u4e86\u504f\u89c1\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u5f15\u5165\u4e86GenderCARE\u6846\u67b6\uff0c\u8fd9\u662f\u4e00\u4e2a\u5168\u9762\u7684\u6846\u67b6\uff0c\u5305\u62ec\u521b\u65b0\u7684\u51c6\u5219\u3001\u8bc4\u4f30\u3001\u51cf\u5c11\u6280\u672f\u4ee5\u53ca\u8bc4\u4ef7\u6307\u6807\uff0c\u65e8\u
5728\u91cf\u5316\u548c\u51cf\u8f7bLLM\u4e2d\u7684\u6027\u522b\u504f\u89c1\u3002 \u9996\u5148\uff0c\u6211\u4eec\u786e\u7acb\u4e86\u5f00\u521b\u6027\u7684\u6027\u522b\u5e73\u7b49\u57fa\u51c6\u51c6\u5219\uff0c\u8986\u76d6\u4e86\u5305\u5bb9\u6027\u3001\u591a\u6837\u6027\u3001\u53ef\u89e3\u91ca\u6027\u3001\u5ba2\u89c2\u6027\u3001\u7a33\u5065\u6027\u548c\u73b0\u5b9e\u6027\u7b49\u591a\u4e2a\u7ef4\u5ea6\u3002\u6839\u636e\u8fd9\u4e9b\u51c6\u5219\uff0c\u6211\u4eec\u6784\u5efa\u4e86GenderPair\uff0c\u4e00\u4e2a\u65b0\u9896\u7684\u914d\u5bf9\u57fa\u51c6\uff0c\u65e8\u5728\u5168\u9762\u8bc4\u4f30LLM\u4e2d\u7684\u6027\u522b\u504f\u89c1\u3002\u6211\u4eec\u7684\u57fa\u51c6\u63d0\u4f9b\u4e86\u6807\u51c6\u5316\u4e14\u73b0\u5b9e\u7684\u8bc4\u4f30\uff0c\u5305\u62ec\u4ee5\u524d\u88ab\u5ffd\u89c6\u7684\u6027\u522b\u7fa4\u4f53\uff0c\u5982\u8de8\u6027\u522b\u8005\u548c\u975e\u4e8c\u5143\u4e2a\u4f53\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u6709\u6548\u7684\u53bb\u504f\u6280\u672f\uff0c\u5305\u62ec\u53cd\u4e8b\u5b9e\u6570\u636e\u589e\u5f3a\u548c\u4e13\u95e8\u7684\u5fae\u8c03\u7b56\u7565\uff0c\u4ee5\u5728\u4e0d\u635f\u5bb3LLM\u6574\u4f53\u6027\u80fd\u7684\u524d\u63d0\u4e0b\u51cf\u5c11\u6027\u522b\u504f\u89c1\u3002 
\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u8868\u660e\uff0c\u572817\u4e2a\u4e0d\u540c\u7684LLM\u4e0a\uff0c\u5404\u79cd\u6027\u522b\u504f\u89c1\u57fa\u51c6\u7684\u663e\u8457\u51cf\u5c11\uff0c\u6700\u9ad8\u53ef\u8fbe\u8d85\u8fc790%\uff0c\u5e73\u5747\u503c\u8d85\u8fc735%\u3002\u91cd\u8981\u7684\u662f\uff0c\u8fd9\u4e9b\u51cf\u5c11\u5e26\u6765\u7684\u4e3b\u6d41\u8bed\u8a00\u4efb\u52a1\u65b9\u9762\u7684\u53d8\u5f02\u6027\u4fdd\u6301\u57282%\u4ee5\u4e0b\u3002\u901a\u8fc7\u63d0\u4f9b\u771f\u5b9e\u6027\u7684\u8bc4\u4f30\u548c\u9488\u5bf9\u6027\u522b\u504f\u89c1\u7684\u5b9a\u5236\u51cf\u5c11\uff0c\u6211\u4eec\u5e0c\u671bGenderCARE\u80fd\u591f\u4ee3\u8868\u5728LLM\u4e2d\u5b9e\u73b0\u516c\u5e73\u548c\u516c\u6b63\u7684\u4e00\u4e2a\u91cd\u8981\u6b65\u9aa4\u3002\u66f4\u591a\u7ec6\u8282\u8bf7\u53c2\u9605https://github.com/kstanghere/GenderCARE-ccs24\u3002**|\n", "2408.12480": "|**2024-08-23**|**Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese**|Khang T. Doan et.al.|[2408.12480](http://arxiv.org/abs/2408.12480)|null|\u5728\u8fd9\u4efd\u62a5\u544a\u4e2d\uff0c\u6211\u4eec\u5f15\u5165\u4e86Vintern-1B\uff0c\u8fd9\u662f\u4e00\u4e2a\u9488\u5bf9\u8d8a\u5357\u8bed\u4efb\u52a1\u7684\u53ef\u9760\u7684\u5341\u4ebf\u53c2\u6570\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\u3002\u901a\u8fc7\u6574\u5408Qwen2-0.5B-Instruct\u8bed\u8a00\u6a21\u578b\u4e0eInternViT-300M-448px\u89c6\u89c9\u6a21\u578b\uff0cVintern-1B\u4f18\u5316\u4e86\u5728\u5149\u5b66\u5b57\u7b26\u8bc6\u522b\uff08OCR\uff09\u3001\u6587\u6863\u63d0\u53d6\u548c\u8d8a\u5357\u8bed\u4e0a\u4e0b\u6587\u4e2d\u7684\u901a\u7528\u95ee\u9898\u56de\u7b54\u7b49\u5e94\u7528\u3002\u8be5\u6a21\u578b\u5728\u8d85\u8fc7\u4e09\u767e\u4e07\u5f20\u56fe\u50cf-\u95ee\u9898-\u7b54\u6848\u5bf9\u7684\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u4e86\u5fae\u8c03\uff0c\u5b9e\u73b0\u4e86\u5728\u591a\u4e2a\u8d8a\u5357\u8bed\u57fa\u51c6\u6d4b\u8bd5\u5982OpenViVQA\u548cViTextVQA\u4e0a\u7684\u7a33\u5065\u6027\u80fd\u548c\u53ef\u9760\u7ed
3\u679c\u3002Vintern-1B\u8db3\u591f\u5c0f\uff0c\u53ef\u4ee5\u8f7b\u677e\u5730\u96c6\u6210\u5230\u5404\u79cd\u79bb\u7ebf\u5e94\u7528\u4e2d\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5f00\u6e90\u4e86\u51e0\u7ec4\u7528\u4e8e\u6587\u672c\u548c\u56fe\u8868\u7684\u8d8a\u5357\u8bed\u89c6\u89c9\u95ee\u7b54\uff08VQA\uff09\u6570\u636e\u96c6\uff0c\u4f7f\u7528\u7684\u662fGemini 1.5 Flash\u521b\u5efa\u7684\u3002\u6211\u4eec\u7684\u6a21\u578b\u53ef\u4ee5\u5728\u4ee5\u4e0b\u94fe\u63a5\u83b7\u53d6\uff1ahttps://huggingface.co/5CD-AI/Vintern-1B-v2\u3002|\n", "2408.12475": "|**2024-08-22**|**Frame Order Matters: A Temporal Sequence-Aware Model for Few-Shot Action Recognition**|Bozheng Li et.al.|[2408.12475](http://arxiv.org/abs/2408.12475)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65f6\u5e8f\u5e8f\u5217\u611f\u77e5\u6a21\u578b\uff08TSAM\uff09\u4ee5\u8fdb\u884c\u5c11\u91cf\u6837\u672c\u52a8\u4f5c\u8bc6\u522b\uff08FSAR\uff09\uff0c\u8be5\u6a21\u578b\u5728\u9884\u8bad\u7ec3\u6846\u67b6\u4e2d\u5f15\u5165\u4e86\u5e8f\u5217\u611f\u77e5\u5668\u9002\u914d\u5668\uff0c\u65e8\u5728\u6574\u5408\u7a7a\u95f4\u4fe1\u606f\u548c\u5e8f\u5217\u65f6\u95f4\u52a8\u6001\u5230\u7279\u5f81\u5d4c\u5165\u4e2d\u3002\u4e0e\u73b0\u6709\u901a\u8fc7\u63a2\u7d22\u6240\u6709\u5e27\u4e4b\u95f4\u5173\u7cfb\u6765\u6355\u83b7\u65f6\u95f4\u4fe1\u606f\u7684\u7ec6\u8c03\u65b9\u6cd5\u4e0d\u540c\uff0c\u6211\u4eec\u7684\u57fa\u4e8e\u611f\u77e5\u5668\u7684\u9002\u914d\u5668\u80fd\u591f\u6cbf\u65f6\u95f4\u7ebf\u9012\u5f52\u5730\u6355\u6349\u5e8f\u5217\u52a8\u6001\uff0c\u5e76\u611f\u77e5\u987a\u5e8f\u53d8\u5316\u3002\u4e3a\u4e86\u83b7\u53d6\u6bcf\u4e2a\u7c7b\u522b\u7684\u5224\u522b\u6027\u8868\u793a\uff0c\u6211\u4eec\u6269\u5c55\u4e86\u4ece\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5bfc\u51fa\u7684\u6587\u672c\u5e93\uff0c\u5bf9\u89c6\u89c9\u539f\u578b\u8fdb\u884c\u4e86\u4e30\u5bcc\uff0c\u901a\u8fc7\u6574\u5408\u4e0a\u4e0b\u6587\u8bed\u4e49\u4fe1\u606f\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f15\u51
65\u4e86\u4e00\u79cd\u4e0d\u5e73\u8861\u6700\u4f18\u4f20\u8f93\u7b56\u7565\u6765\u8fdb\u884c\u7279\u5f81\u5339\u914d\uff0c\u4ee5\u51cf\u8f7b\u4e0e\u7c7b\u522b\u65e0\u5173\u7279\u5f81\u7684\u5f71\u54cd\uff0c\u4ece\u800c\u4fc3\u8fdb\u66f4\u6709\u6548\u7684\u51b3\u7b56\u3002\u5728\u4e94\u4e2aFSAR\u6570\u636e\u96c6\u4e0a\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u521b\u4e0b\u4e86\u65b0\u7684\u57fa\u51c6\uff0c\u4e0e\u7b2c\u4e8c\u597d\u7684\u7ade\u4e89\u5bf9\u624b\u76f8\u6bd4\u53d6\u5f97\u4e86\u663e\u8457\u7684\u4f18\u52bf\u3002|\n", "2408.12470": "|**2024-08-22**|**DLCRec: A Novel Approach for Managing Diversity in LLM-Based Recommender Systems**|Jiaju Chen et.al.|[2408.12470](http://arxiv.org/abs/2408.12470)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u63a8\u8350\u7cfb\u7edf\u4e2d\u7684\u96c6\u6210\u663e\u8457\u63d0\u5347\u4e86\u6027\u80fd\uff0c\u4f46\u5f80\u5f80\u4f34\u968f\u7740\u63a8\u8350\u591a\u6837\u6027\u4e0b\u964d\u7684\u95ee\u9898\uff0c\u8fd9\u53ef\u80fd\u635f\u5bb3\u7528\u6237\u4f53\u9a8c\u3002\u4e3a\u4e86\u514b\u670d\u8fd9\u4e00\u6311\u6218\uff0c\u53ef\u63a7\u63a8\u8350\u7cfb\u7edf\u5e94\u8fd0\u800c\u751f\uff0c\u5b83\u5141\u8bb8\u7528\u6237\u6307\u5b9a\u504f\u597d\u5e76\u83b7\u5f97\u6ee1\u8db3\u5176\u591a\u6837\u5316\u9700\u6c42\u7684\u63a8\u8350\u3002\u5c3d\u7ba1\u5177\u6709\u6f5c\u529b\uff0c\u73b0\u6709\u7684\u53ef\u63a7\u63a8\u8350\u7cfb\u7edf\u901a\u5e38\u4f9d\u8d56\u4e8e\u7b80\u5355\u673a\u5236\uff0c\u5982\u5355\u4e00\u63d0\u793a\uff0c\u6765\u8c03\u8282\u591a\u6837\u6027\uff0c\u8fd9\u79cd\u505a\u6cd5\u672a\u80fd\u5145\u5206\u6355\u6349\u7528\u6237\u504f\u597d\u7684\u590d\u6742\u6027\u3002\u9488\u5bf9\u8fd9\u4e9b\u5c40\u9650\u6027\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aDLCRec\u7684\u65b0\u6846\u67b6\uff0c\u65e8\u5728\u5b9e\u73b0\u57fa\u4e8eLLM\u7684\u63a8\u8350\u7cfb\u7edf\u7684\u7cbe\u7ec6\u7c92\u5ea6\u591a\u6837\u6027\u63a7\u5236\u3002\u4e0e\u4f20\u7edf\u65b9\u6cd5\u4e0d\u540c\uf
f0cDLCRec\u91c7\u7528\u7cbe\u7ec6\u4efb\u52a1\u5206\u89e3\u7b56\u7565\uff0c\u5c06\u63a8\u8350\u8fc7\u7a0b\u62c6\u5206\u4e3a\u4e09\u4e2a\u4f9d\u6b21\u8fdb\u884c\u7684\u5b50\u4efb\u52a1\uff1a\u4f53\u88c1\u9884\u6d4b\u3001\u4f53\u88c1\u586b\u5145\u548c\u9879\u76ee\u9884\u6d4b\u3002\u8fd9\u4e9b\u5b50\u4efb\u52a1\u72ec\u7acb\u8bad\u7ec3\u5e76\u5728\u7528\u6237\u5b9a\u4e49\u7684\u63a7\u5236\u6570\u6307\u5bfc\u4e0b\u4f9d\u6b21\u63a8\u7406\uff0c\u786e\u4fdd\u4e86\u5bf9\u591a\u6837\u6027\u7684\u66f4\u7cbe\u786e\u63a7\u5236\u3002\u6b64\u5916\uff0c\u7a00\u7f3a\u4e14\u5206\u5e03\u4e0d\u5747\u7684\u591a\u6837\u6027\u76f8\u5173\u7528\u6237\u884c\u4e3a\u6570\u636e\u7684\u7f3a\u4e4f\u6784\u6210\u4e86\u5bf9\u5fae\u8c03\u7684\u4e25\u5cfb\u6311\u6218\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e24\u79cd\u6570\u636e\u589e\u5f3a\u6280\u672f\uff0c\u4ee5\u589e\u5f3a\u6a21\u578b\u5bf9\u566a\u58f0\u548c\u79bb\u7fa4\u6570\u636e\u7684\u9c81\u68d2\u6027\u3002\u8fd9\u4e9b\u6280\u672f\u4f7f\u6a21\u578b\u63a5\u89e6\u5230\u66f4\u5e7f\u6cdb\u7684\u6a21\u5f0f\uff0c\u4ece\u800c\u63d0\u9ad8\u5176\u751f\u6210\u4e0d\u540c\u591a\u6837\u6027\u7684\u63a8\u8350\u7684\u9002\u5e94\u6027\u3002\u6211\u4eec\u7684\u5168\u9762\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cDLCRec\u4e0d\u4ec5\u63d0\u4f9b\u4e86\u5bf9\u591a\u6837\u6027\u7684\u7cbe\u786e\u63a7\u5236\uff0c\u800c\u4e14\u5728\u591a\u4e2a\u63a8\u8350\u573a\u666f\u4e2d\u90fd\u4f18\u4e8e\u6700\u5148\u8fdb\u7684\u57fa\u7ebf\u65b9\u6cd5\u3002|\n", "2408.13257": "|**2024-08-23**|**MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?**|Yi-Fan Zhang 
et.al.|[2408.13257](http://arxiv.org/abs/2408.13257)|null|\u8fd1\u671f\uff0c\u5168\u9762\u8bc4\u4f30\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u5728\u7814\u7a76\u793e\u533a\u4e2d\u5f15\u53d1\u4e86\u5e7f\u6cdb\u5173\u6ce8\u3002\u7136\u800c\uff0c\u6211\u4eec\u6ce8\u610f\u5230\u73b0\u6709\u57fa\u51c6\u6d4b\u8bd5\u5b58\u5728\u4e00\u4e9b\u666e\u904d\u7684\u969c\u788d\uff0c\u4f7f\u5f97\u8861\u91cf\u6a21\u578b\u9762\u4e34\u7684\u5b9e\u9645\u4e16\u754c\u6311\u6218\u53d8\u5f97\u56f0\u96be\uff0c\u5305\u62ec\uff1a1\uff09\u6570\u636e\u89c4\u6a21\u8f83\u5c0f\u5bfc\u81f4\u6027\u80fd\u6ce2\u52a8\u5927\uff1b2\uff09\u4f9d\u8d56\u6a21\u578b\u751f\u6210\u6ce8\u91ca\u9020\u6210\u6570\u636e\u8d28\u91cf\u53d7\u9650\uff1b3\uff09\u4efb\u52a1\u96be\u5ea6\u4e0d\u8db3\uff0c\u5c24\u5176\u662f\u7531\u4e8e\u56fe\u50cf\u5206\u8fa8\u7387\u6709\u9650\u3002\u4e3a\u4e86\u514b\u670d\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u5f15\u5165\u4e86MME-RealWorld\u3002\u5177\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u4ece\u516c\u5171\u6570\u636e\u96c6\u548c\u4e92\u8054\u7f51\u6536\u96c6\u4e86\u8d85\u8fc730\u4e07\u5f20\u56fe\u7247\uff0c\u5e76\u7b5b\u9009\u51fa13,366\u5f20\u9ad8\u8d28\u91cf\u56fe\u7247\u8fdb\u884c\u6807\u6ce8\u3002\u8fd9\u4e00\u8fc7\u7a0b\u4e2d\uff0c\u6211\u4eec\u52a8\u7528\u4e8625\u540d\u4e13\u4e1a\u6ce8\u91ca\u5458\u548c7\u540dMLLM\u9886\u57df\u7684\u4e13\u5bb6\uff0c\u5171\u8d21\u732e\u4e8629,429\u4e2a\u95ee\u9898-\u7b54\u6848\u5bf9\uff0c\u6db5\u76d6\u4e865\u79cd\u771f\u5b9e\u4e16\u754c\u573a\u666f\u4e0b\u768443\u4e2a\u5b50\u4efb\u52a1\uff0c\u8fd9\u4e9b\u4efb\u52a1\u751a\u81f3\u5bf9\u4eba\u7c7b\u6765\u8bf4\u4e5f\u6781\u5177\u6311\u6218\u6027\u3002\u636e\u6211\u4eec\u6240\u77e5\uff0cMME-RealWorld\u662f\u8fc4\u4eca\u4e3a\u6b62\u6700\u5927\u7684\u4eba\u5de5\u6807\u6ce8\u57fa\u51c6\uff0c\u5176\u7279\u5f81\u4e3a\u6700\u9ad8\u5206\u8fa8\u7387\u4ee5\u53ca\u4e13\u6ce8\u4e8e\u771f\u5b9e\u4e16\u754c\u5e94\u7528\u7684\u76ee\u6807\u5bfc\u5411\u3002 
\u6211\u4eec\u8fdb\u4e00\u6b65\u5bf928\u4e2a\u9886\u5148\u7684MLLM\u8fdb\u884c\u4e86\u8be6\u5c3d\u7684\u8bc4\u4f30\uff0c\u5982GPT-4o\u3001Gemini 1.5 Pro\u548cClaude 3.5 Sonnet\u3002\u6211\u4eec\u7684\u7ed3\u679c\u663e\u793a\uff0c\u5373\u4f7f\u662f\u6700\u5148\u8fdb\u7684\u6a21\u578b\u4e5f\u65e0\u6cd5\u5e94\u5bf9\u6211\u4eec\u7684\u57fa\u51c6\u6d4b\u8bd5\uff0c\u5176\u4e2d\u6ca1\u6709\u4e00\u4e2a\u6a21\u578b\u8fbe\u523060%\u7684\u51c6\u786e\u7387\u3002\u611f\u77e5\u9ad8\u5206\u8fa8\u7387\u56fe\u50cf\u548c\u7406\u89e3\u590d\u6742\u7684\u771f\u5b9e\u4e16\u754c\u573a\u666f\u4ecd\u7136\u662f\u4e9f\u5f85\u89e3\u51b3\u7684\u5173\u952e\u95ee\u9898\u3002\u76f8\u5173\u7684\u6570\u636e\u548c\u8bc4\u4f30\u4ee3\u7801\u5df2\u53d1\u5e03\u5728https://mme-realworld.github.io/ \u3002|\n", "2408.13253": "|**2024-08-23**|**Domain-specific long text classification from sparse relevant information**|C\u00e9lia D'Cruz et.al.|[2408.13253](http://arxiv.org/abs/2408.13253)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\u65e0\u7591\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u9886\u57df\u5b9e\u73b0\u4e86\u91cd\u5927\u9769\u65b0\uff0c\u5f53\u524d\u7684\u8d8b\u52bf\u662f\u63a8\u52a8\u5355\u4e00\u6a21\u578b\u89e3\u51b3\u6240\u6709\u4efb\u52a1\uff08\u5982\u60c5\u611f\u5206\u6790\u3001\u7ffb\u8bd1\u7b49\uff09\u3002\u7136\u800c\uff0c\u5728\u5904\u7406\u7a00\u758f\u4fe1\u606f\u6216\u5f31\u4fe1\u53f7\u65f6\uff0c\u8fd9\u4e9b\u6a21\u578b\u7684\u7edf\u8ba1\u673a\u5236\u96be\u4ee5\u6709\u6548\u5229\u7528\u5173\u952e\u4fe1\u606f\u3002\u4f8b\u5982\uff0c\u5728\u957f\u7bc7\u7279\u5b9a\u9886\u57df\u6587\u6863\u7684\u5206\u7c7b\u4e2d\uff0c\u76f8\u5173\u6027\u5f80\u5f80\u4f9d\u8d56\u4e8e\u4e00\u4e2a\u6216\u51e0\u4e2a\u5173\u952e\u672f\u8bed\u3002\u533b\u7597\u9886\u57df\u4e2d\uff0c\u786e\u5b9a\u67d0\u4e2a\u62a5\u544a\u662f\u5426\u5305\u542b\u4e86\u5173\u4e8e\u60a3\u8005\u72b6\u51b5\u7684\u5173\u952e\u4fe1\u606f\u81f3\u5173\u91cd\u8981\u3002\u8fd9\u4e9b\u5173\u952e\u4fe1\u606f\u901a\u5e38\u57fa\u4e8e\u4e00\u4e24\u4e2a\u
7279\u5b9a\u7684\u5b64\u7acb\u672f\u8bed\u3002 \u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u5c42\u6b21\u5316\u6a21\u578b\uff0c\u8be5\u6a21\u578b\u5229\u7528\u4e00\u4e2a\u6f5c\u5728\u76ee\u6807\u672f\u8bed\u5217\u8868\u6765\u68c0\u7d22\u5019\u9009\u53e5\u5b50\uff0c\u5e76\u5c06\u8fd9\u4e9b\u53e5\u5b50\u8868\u793a\u4e3a\u5305\u542b\u5b83\u4eec\u7684\u76ee\u6807\u672f\u8bed\u7684\u4e0a\u4e0b\u6587\u5d4c\u5165\u3002\u5bf9\u76ee\u6807\u672f\u8bed\uff08\u6216\u672f\u8bed\uff09\u7684\u5d4c\u5165\u8fdb\u884c\u805a\u5408\u5bfc\u81f4\u6587\u6863\u8868\u793a\u88ab\u7528\u4e8e\u5206\u7c7b\u3002\u6211\u4eec\u5206\u522b\u5728\u82f1\u8bed\u548c\u6cd5\u8bed\u7684\u516c\u5f00\u533b\u7597\u6587\u6863\u57fa\u51c6\u6570\u636e\u96c6\u4ee5\u53ca\u79c1\u6709\u533b\u7597\u6570\u636e\u96c6\u4e0a\u8bc4\u4f30\u4e86\u6211\u4eec\u7684\u6a21\u578b\u3002\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u7a84\u5c42\u7ea7\u6a21\u578b\u5728\u7279\u5b9a\u9886\u57df\u80cc\u666f\u4e0b\u68c0\u7d22\u76f8\u5173\u957f\u6587\u6863\u65b9\u9762\u4f18\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u3002|\n", "2408.13233": "|**2024-08-23**|**Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time**|Yingyu Liang 
et.al.|[2408.13233](http://arxiv.org/abs/2408.13233)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u578b\u7684\u5feb\u901f\u8ba1\u7b97\u65b9\u6cd5\uff0c\u7528\u4e8e\u591a\u5c42\u53d8\u6362\u5668\u6a21\u578b\u4e2d\u7684\u68af\u5ea6\u8ba1\u7b97\u3002\u8be5\u65b9\u6cd5\u5728\u51e0\u4e4e\u7ebf\u6027\u65f6\u95f4\u5185$n^{1+o(1)}$\u8ba1\u7b97\u6574\u4e2a\u591a\u5c42\u53d8\u6362\u5668\u6a21\u578b\u7684\u68af\u5ea6\uff0c\u5176\u4e2d$n$\u662f\u8f93\u5165\u5e8f\u5217\u957f\u5ea6\u3002\u8fd9\u4e00\u7a81\u7834\u6781\u5927\u5730\u964d\u4f4e\u4e86\u4f20\u7edf\u4e8c\u6b21\u65f6\u95f4\u590d\u6742\u5ea6\u76f8\u5173\u7684\u8ba1\u7b97\u74f6\u9888\u3002\u6211\u4eec\u7684\u7406\u8bba\u9002\u7528\u4e8e\u4efb\u4f55\u635f\u5931\u51fd\u6570\uff0c\u5e76\u5728\u5168\u6a21\u578b\u4e0a\u4fdd\u6301\u53ef\u63a7\u5236\u7684\u8fd1\u4f3c\u8bef\u5dee\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u5206\u6790\u8fd8\u8003\u8651\u4e86\u591a\u5c42\u53d8\u6362\u5668\u6a21\u578b\u5305\u542b\u8bb8\u591a\u5b9e\u7528\u5b50\u6a21\u5757\u7684\u60c5\u51b5\uff0c\u5982\u6b8b\u5dee\u8fde\u63a5\u3001\u56e0\u679c\u63a9\u7801\u548c\u591a\u5934\u6ce8\u610f\u529b\u3002\u901a\u8fc7\u63d0\u9ad8\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4e2d\u68af\u5ea6\u8ba1\u7b97\u7684\u6548\u7387\uff0c\u6211\u4eec\u671f\u671b\u901a\u8fc7\u57fa\u4e8e\u6211\u4eec\u7684\u7406\u8bba\u7ed3\u679c\u6539\u8fdb\u957f\u4e0a\u4e0b\u6587\u8bed\u8a00\u6a21\u578b\u7684\u8bad\u7ec3\u548c\u90e8\u7f72\uff0c\u4f7f\u8fd9\u4e9b\u6a21\u578b\u66f4\u52a0\u6709\u6548\u3002|\n", "2408.13214": "|**2024-08-23**|**EUR-USD Exchange Rate Forecasting Based on Information Fusion with Large Language Models and Deep Learning Methods**|Hongcheng Ding 
et.al.|[2408.13214](http://arxiv.org/abs/2408.13214)|null|\u51c6\u786e\u9884\u6d4bEUR/USD\u6c47\u7387\u5bf9\u6295\u8d44\u8005\u3001\u4f01\u4e1a\u548c\u653f\u7b56\u5236\u5b9a\u8005\u81f3\u5173\u91cd\u8981\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u6846\u67b6IUS\uff0c\u8be5\u6846\u67b6\u7ed3\u5408\u4e86\u65b0\u95fb\u548c\u5206\u6790\u7684\u975e\u7ed3\u6784\u5316\u6587\u672c\u6570\u636e\u4e0e\u6c47\u7387\u548c\u91d1\u878d\u6307\u6807\u7684\u7ed3\u6784\u5316\u6570\u636e\uff0c\u4ee5\u589e\u5f3a\u6c47\u7387\u9884\u6d4b\u80fd\u529b\u3002IUS\u6846\u67b6\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u6587\u672c\u60c5\u611f\u6781\u6027\u8bc4\u5206\u548c\u6c47\u7387\u53d8\u52a8\u5206\u7c7b\u3002\u8fd9\u4e9b\u6587\u672c\u7279\u5f81\u4e0e\u5b9a\u91cf\u7279\u5f81\u76f8\u7ed3\u5408\uff0c\u5e76\u8f93\u5165\u5230\u56e0\u679c\u9a71\u52a8\u7279\u5f81\u751f\u6210\u5668\u4e2d\u3002\u7136\u540e\u4f7f\u7528Optuna\u4f18\u5316\u7684Bi-LSTM\u6a21\u578b\u9884\u6d4bEUR/USD\u6c47\u7387\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6240\u63d0\u51fa\u7684\u6a21\u578b\u5728\u51cf\u5c11\u5e73\u5747\u7edd\u5bf9\u8bef\u5dee\uff08MAE\uff0910.69%\u548c\u6839\u5747\u65b9\u8bef\u5dee\uff08RMSE\uff099.56%\u65b9\u9762\u4f18\u4e8e\u57fa\u51c6\u6a21\u578b\u3002\u7ed3\u679c\u663e\u793a\uff0c\u901a\u8fc7\u878d\u5408\u975e\u7ed3\u6784\u5316\u548c\u7ed3\u6784\u5316\u6570\u636e\uff0c\u51c6\u786e\u6027\u6bd4\u4ec5\u4f7f\u7528\u7ed3\u6784\u5316\u6570\u636e\u66f4\u9ad8\u3002\u6b64\u5916\uff0c\u4f7f\u7528\u9876\u7ea712\u4e2a\u91cd\u8981\u5b9a\u91cf\u7279\u5f81\u548c\u6587\u672c\u7279\u5f81\u76f8\u7ed3\u5408\u8fdb\u884c\u7279\u5f81\u9009\u62e9\u8bc1\u660e\u662f\u6700\u6709\u6548\u7684\u3002\u63d0\u51fa\u7684IUS\u6846\u67b6\u548cOptuna-Bi-LSTM\u6a21\u578b\u63d0\u4f9b\u4e86\u4e00\u79cd\u5f3a\u5927\u7684\u65b0\u65b9\u6cd5\uff0c\u7528\u4e8e\u591a\u6e90\u6570\u636e\u96c6\u6210\u7684\u6c47\u7387\u9884\u6d4b\u3002|\n", "2408.13204": "|**2024-08-23**|**DOMAINEVAL: An Auto-Constructed 
Benchmark for Multi-Domain Code Generation**|Qiming Zhu et.al.|[2408.13204](http://arxiv.org/abs/2408.13204)|null|\u4ee3\u7801\u57fa\u51c6\uff0c\u5982HumanEval\uff0c\u5e7f\u6cdb\u7528\u4e8e\u8bc4\u4f30\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u80fd\u529b\uff0c\u63d0\u4f9b\u4e86\u5b83\u4eec\u4f18\u52bf\u4e0e\u4e0d\u8db3\u7684\u6d1e\u5bdf\u3002\u7136\u800c\uff0c\u5f53\u524d\u7684\u57fa\u51c6\u4e3b\u8981\u96c6\u4e2d\u5728\u901a\u7528\u7f16\u7801\u4efb\u52a1\u4e0a\uff08\u4f8b\u5982\uff1a\u5192\u6ce1\u6392\u5e8f\u3001\u6700\u5927\u516c\u7ea6\u6570\uff09\uff0c\u5bf9\u9886\u57df\u7279\u5b9a\u7f16\u7801\u4efb\u52a1\uff08\u5982\u8ba1\u7b97\u3001\u7cfb\u7edf\u3001\u52a0\u5bc6\uff09\u7684\u63a2\u7d22\u5219\u8f83\u5c11\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u591a\u9886\u57df\u4ee3\u7801\u57fa\u51c6DOMAINEVAL\uff0c\u65e8\u5728\u5168\u9762\u8bc4\u4f30LLMs\u7684\u7f16\u7801\u80fd\u529b\u3002\u6211\u4eec\u7684\u6d41\u7a0b\u4ee5\u5168\u81ea\u52a8\u65b9\u5f0f\u5de5\u4f5c\uff0c\u5141\u8bb8\u4ece\u4ee3\u7801\u4ed3\u5e93\u4e2d\u6784\u5efa\u683c\u5f0f\u5316\u7684\u7814\u7a76\u4e3b\u9898\u8fdb\u884c\u5e95\u90e8\u63a8\u52a8\u5f0f\u6784\u5efa\u3002\u901a\u8fc7\u4f7f\u752812\u4e2a\u4ee3\u8868\u6027LLM\u5728DOMAINEVAL\u4e0a\u7684\u8bc4\u4f30\uff0c\u6211\u4eec\u89c2\u5bdf\u5230\u4e86\u4e00\u4e9b\u6709\u8da3\u7684\u7ed3\u679c\u3002 
\u6211\u4eec\u6ce8\u610f\u5230\uff0cLLMs\u5728\u8ba1\u7b97\u4efb\u52a1\u4e0a\u8868\u73b0\u826f\u597d\uff0c\u4f46\u5728\u52a0\u5bc6\u548c\u7cfb\u7edf\u7f16\u7801\u4efb\u52a1\u4e0a\u5374\u6709\u6240\u6b20\u7f3a\u3002\u67d0\u4e9bLLM\u5728\u8fd9\u4e9b\u9886\u57df\u7684\u6027\u80fd\u5dee\u8ddd\u53ef\u80fd\u9ad8\u8fbe68.94%\uff0880.94%-12.0%\uff09\u3002\u6211\u4eec\u4e5f\u53d1\u73b0\u751f\u6210\u66f4\u591a\u6837\u672c\u53ef\u4ee5\u63d0\u9ad8LLMs\u7684\u6574\u4f53\u6027\u80fd\uff0c\u4f46\u9886\u57df\u504f\u89c1\u751a\u81f3\u53ef\u80fd\u589e\u52a0\u3002\u672c\u7814\u7a76\u7684\u8d21\u732e\u5305\u62ec\u4e00\u4e2a\u4ee3\u7801\u751f\u6210\u57fa\u51c6\u6570\u636e\u96c6DOMAINEVAL\uff0c\u6db5\u76d6\u516d\u4e2a\u6d41\u884c\u9886\u57df\uff0c\u4ee5\u53ca\u4e00\u4e2a\u5b8c\u5168\u81ea\u52a8\u5316\u7684\u7ba1\u9053\u7528\u4e8e\u6784\u5efa\u4ee3\u7801\u57fa\u51c6\uff0c\u5e76\u57fa\u4e8e\u5728DOMAINEVAL\u4e0a\u7684\u6027\u80fd\u8bc6\u522b\u4e86LLMs\u5728\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u4e0a\u7684\u5c40\u9650\u6027\uff0c\u63d0\u4f9b\u4e86\u672a\u6765\u7814\u7a76\u6539\u8fdb\u7684\u65b9\u5411\u3002\u9886\u5bfc\u8005\u677f\u53ef\u5728https://domaineval.github.io/\u67e5\u770b\u3002|\n", "2408.13184": "|**2024-08-23**|**Can LLM be a Good Path Planner based on Prompt Engineering? 
Mitigating the Hallucination for Path Planning**|Hourui Deng et.al.|[2408.13184](http://arxiv.org/abs/2408.13184)|null|\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u9886\u57df\uff0c\u7a7a\u95f4\u63a8\u7406\u662f\u5b9e\u73b0\u611f\u77e5\u667a\u80fd\u7684\u57fa\u7840\u3002\u7136\u800c\uff0c\u5728\u7b80\u5355\u7684\u8ff7\u5bab\u73af\u5883\u4e2d\uff0cLLM\u5728\u957f\u671f\u8def\u5f84\u89c4\u5212\u65b9\u9762\u4ecd\u9762\u4e34\u6311\u6218\uff0c\u4e3b\u8981\u53d7\u5230\u5176\u7a7a\u95f4\u5e7b\u89c9\u548c\u957f\u671f\u63a8\u7406\u5bfc\u81f4\u7684\u4e0a\u4e0b\u6587\u4e0d\u4e00\u81f4\u5e7b\u89c9\u7684\u5f71\u54cd\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e00\u6311\u6218\uff0c\u672c\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u6a21\u578b\u2014\u2014\u7a7a\u95f4\u5230\u5173\u7cfb\u8f6c\u6362\u4e0e\u9012\u8fdbQ\u5b66\u4e60\uff08S2RCQL\uff09\u3002\u4e3a\u89e3\u51b3LLM\u7684\u7a7a\u95f4\u5e7b\u89c9\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u201c\u7a7a\u95f4\u5230\u5173\u7cfb\u201d\u7684\u65b9\u6cd5\uff0c\u5c06\u7a7a\u95f4\u63d0\u793a\u8f6c\u5316\u4e3a\u5b9e\u4f53\u5173\u7cfb\u548c\u8868\u793a\u5b9e\u4f53\u5173\u7cfb\u94fe\u7684\u8def\u5f84\uff0c\u5145\u5206\u6316\u6398\u4e86LLM\u5728\u5e8f\u5217\u601d\u8003\u65b9\u9762\u7684\u6f5c\u529b\u3002\u5728\u6b64\u57fa\u7840\u4e0a\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u79cd\u57fa\u4e8eQ\u5b66\u4e60\u7684\u8def\u5f84\u89c4\u5212\u7b97\u6cd5\uff0c\u4ee5\u7f13\u89e3\u4e0a\u4e0b\u6587\u4e0d\u4e00\u81f4\u5e7b\u89c9\uff0c\u589e\u5f3aLLM\u7684\u63a8\u7406\u80fd\u529b\u3002\u901a\u8fc7\u5c06\u72b6\u6001\u52a8\u4f5c\u7684Q\u503c\u4f5c\u4e3a\u63d0\u793a\u7684\u8f85\u52a9\u4fe1\u606f\uff0c\u6211\u4eec\u7ea0\u6b63\u4e86LLM\u7684\u5e7b\u89c9\uff0c\u5f15\u5bfcLLM\u5b66\u4e60\u6700\u4f18\u8def\u5f84\u3002\u6700\u540e\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8eLLM\u7684\u53cd\u5411\u8bfe\u7a0b\u5b66\u4e60\u6280\u672f\uff0c\u8fdb\u4e00\u6b65\u7f13\u89e3\u4e86\u4e0a\u4e0b\u6587\u4e0d\u4e00\u81f4\u5e7b\u89c9\u3002
\u8be5\u6280\u672f\u901a\u8fc7\u964d\u4f4e\u4efb\u52a1\u96be\u5ea6\u5e76\u5229\u7528\u6210\u529f\u7ecf\u9a8c\uff0c\u5e2e\u52a9LLM\u5feb\u901f\u79ef\u7d2f\uff0c\u5e76\u4ee5\u6b64\u6765\u5e94\u5bf9\u66f4\u590d\u6742\u4efb\u52a1\u3002\u6211\u4eec\u5728\u767e\u5ea6\u81ea\u4e3b\u7814\u53d1\u7684LLM\uff1aERNIE-Bot 4.0\u4e0a\u8fdb\u884c\u4e86\u5168\u9762\u5b9e\u9a8c\u3002\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684S2RCQL\u5728\u6210\u529f\u7387\u548c\u6700\u4f18\u6027\u65b9\u9762\u5206\u522b\u63d0\u9ad8\u4e8623%\u81f340%\uff0c\u76f8\u8f83\u4e8e\u5148\u8fdb\u7684\u63d0\u793a\u5de5\u7a0b\u65b9\u6cd5\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\u3002|\n", "2408.13073": "|**2024-08-23**|**IntelliCare: Improving Healthcare Analysis with Variance-Controlled Patient-Level Knowledge from Large Language Models**|Zhihao Yu et.al.|[2408.13073](http://arxiv.org/abs/2408.13073)|**[link](https://github.com/yzhHoward/IntelliCare)**|\u5728\u7535\u5b50\u5065\u5eb7\u8bb0\u5f55\uff08EHR\uff09\u6570\u636e\u7684\u6df1\u5ea6\u5b66\u4e60\u65b9\u6cd5\u53d6\u5f97\u5de8\u5927\u8fdb\u6b65\u7684\u540c\u65f6\uff0c\u5b83\u4eec\u5728\u5904\u7406\u6709\u9650\u6570\u636e\u4e2d\u7684\u591a\u6837\u5316\u7684\u533b\u5b66\u4ee3\u7801\u65f6\u5f80\u5f80\u96be\u4ee5\u5168\u9762\u6355\u6349\u5176\u8bed\u4e49\u3002\u5f15\u5165\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u77e5\u8bc6\u6574\u5408\u4e3a\u63d0\u5347\u533b\u7597\u4fdd\u5065\u9884\u6d4b\u63d0\u4f9b\u4e86\u6709\u524d\u666f\u7684\u9014\u5f84\u3002\u7136\u800c\uff0cLLM\u5206\u6790\u53ef\u80fd\u4f1a\u56e0\u6b67\u4e49\u95ee\u9898\u548c\u4e0d\u4e00\u81f4\u6027\u5bfc\u81f4\u663e\u8457\u7684\u6ce2\u52a8\uff0c\u8fd9\u963b\u788d\u4e86\u5176\u6709\u6548\u5229\u7528\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e9b\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aIntelliCare\u7684\u65b0\u578b\u6846\u67b6\uff0c\u65e8\u5728\u901a\u8fc7\u5229\u7528LLM\u63d0\u4f9b\u9ad8\u8d28\u91cf\u7684\u60a3\u8005\u7ea7\u5916\u90e8\u77e5\u8bc6\u5e76\u589e\u5f3a\u
73b0\u6709\u7684EHR\u6a21\u578b\u6765\u6539\u5584\u533b\u7597\u4fdd\u5065\u9884\u6d4b\u3002\u5177\u4f53\u6765\u8bf4\uff0cIntelliCare\u901a\u8fc7\u8bc6\u522b\u60a3\u8005\u7fa4\u4f53\uff0c\u5e76\u5229\u7528\u4e0e\u4efb\u52a1\u76f8\u5173\u7684\u7edf\u8ba1\u4fe1\u606f\u6765\u589e\u5f3aLLM\u7684\u7406\u89e3\u548c\u751f\u6210\u80fd\u529b\uff0c\u6709\u6548\u5730\u89e3\u51b3\u4e86\u6b67\u4e49\u95ee\u9898\u3002\u6b64\u5916\uff0c\u5b83\u901a\u8fc7\u7ed3\u5408EHR\u6a21\u578b\u548c\u56f0\u60d1\u5ea6\u91cf\u6765\u7ec6\u5316\u4eceLLM\u83b7\u53d6\u7684\u77e5\u8bc6\uff0c\u91c7\u7528\u6df7\u5408\u65b9\u6cd5\u751f\u6210\u591a\u4e2a\u5206\u6790\u7ed3\u679c\u5e76\u8fdb\u884c\u6821\u51c6\u3002\u5728\u4e09\u4e2a\u4e34\u5e8a\u9884\u6d4b\u4efb\u52a1\u4e0a\u5bf9\u4e24\u4e2a\u5927\u89c4\u6a21EHR\u6570\u636e\u96c6\u7684\u5b9e\u9a8c\u8bc4\u4f30\u8868\u660e\uff0cIntelliCare\u80fd\u591f\u663e\u8457\u63d0\u9ad8\u73b0\u6709\u65b9\u6cd5\u7684\u8868\u73b0\uff0c\u51f8\u663e\u4e86\u5176\u5728\u63a8\u8fdb\u4e2a\u6027\u5316\u533b\u7597\u4fdd\u5065\u9884\u6d4b\u548c\u51b3\u7b56\u652f\u6301\u7cfb\u7edf\u65b9\u9762\u7684\u6f5c\u529b\u3002|\n", "2408.13071": "|**2024-08-23**|**Guiding IoT-Based Healthcare Alert Systems with Large Language Models**|Yulan Gao 
et.al.|[2408.13071](http://arxiv.org/abs/2408.13071)|null|\u5728\u533b\u7597\u5065\u5eb7\u8b66\u62a5\u7cfb\u7edf\uff08HAS\uff09\u9886\u57df\uff0c\u968f\u7740\u4eba\u5de5\u667a\u80fd\uff08AI\uff09\u3001\u7269\u8054\u7f51\uff08IoT\uff09\u6280\u672f\u7684\u5feb\u901f\u53d1\u5c55\u4ee5\u53ca\u516c\u4f17\u5065\u5eb7\u610f\u8bc6\u7684\u63d0\u9ad8\uff0cHAS\u6b63\u7ecf\u5386\u7740\u5feb\u901f\u7684\u53d8\u9769\u3002\u5c3d\u7ba1\u53d6\u5f97\u4e86\u663e\u8457\u7684\u8fdb\u6b65\uff0c\u4f46\u5b58\u5728\u4e00\u4e2a\u6838\u5fc3\u6311\u6218\uff1a\u5982\u4f55\u5728\u8d44\u6e90\u6709\u9650\u7684\u73af\u5883\u4e2d\uff0c\u5728\u4e2a\u6027\u5316\u5065\u5eb7\u8b66\u62a5\u7684\u51c6\u786e\u6027\u4e0e\u4e25\u683c\u9690\u79c1\u4fdd\u62a4\u4e4b\u95f4\u627e\u5230\u5e73\u8861\u70b9\u3002 \u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u7edf\u4e00\u6846\u67b6\u2014\u2014LLM-HAS\uff08\u5927\u578b\u8bed\u8a00\u6a21\u578b\u533b\u7597\u5065\u5eb7\u8b66\u62a5\u7cfb\u7edf\uff09\u3002\u8be5\u6846\u67b6\u5c06\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u878d\u5165\u5230HAS\u4e2d\uff0c\u4ee5\u663e\u8457\u63d0\u5347\u8b66\u62a5\u7684\u51c6\u786e\u6027\u3001\u786e\u4fdd\u7528\u6237\u9690\u79c1\uff0c\u5e76\u589e\u5f3a\u4e2a\u6027\u5316\u533b\u7597\u670d\u52a1\uff0c\u540c\u65f6\u6539\u5584\u7528\u6237\u4f53\u9a8c\u7684\u8d28\u91cf\uff08QoE\uff09\u3002\u6211\u4eec\u7684\u521b\u65b0\u6846\u67b6\u91c7\u7528\u6df7\u5408\u4e13\u5bb6\uff08MoE\uff09\u65b9\u6cd5\uff0c\u7ed3\u5408LLM\uff0c\u901a\u8fc7\u5206\u6790\u7528\u6237\u7684\u4e2a\u6027\u5316\u504f\u597d\u548c\u6f5c\u5728\u5065\u5eb7\u98ce\u9669\u6765\u5904\u7406\u989d\u5916\u7684\u6587\u672c\u5de5\u4f5c\u63cf\u8ff0\u3002\u8fd9\u79cd\u5206\u6790\u6307\u5bfc\u4e86\u4e13\u95e8\u7684\u6df1\u5ea6\u5f3a\u5316\u5b66\u4e60\uff08DDPG\uff09\u4e13\u5bb6\u7684\u9009\u62e9\uff0c\u4ed6\u4eec\u8d1f\u8d23\u63d0\u4f9b\u7cbe\u786e\u7684\u5065\u5eb7\u8b66\u62a5\u3002\u6b64\u5916\uff0cLLM-HAS\u80fd\u591f\u5904\u7406\u5bf9\
u8bdd\u5f0f\u7528\u6237\u53cd\u9988\uff0c\u4e0d\u4ec5\u5141\u8bb8\u5bf9DDPG\u8fdb\u884c\u5fae\u8c03\uff0c\u8fd8\u80fd\u52a0\u6df1\u7528\u6237\u53c2\u4e0e\u5ea6\uff0c\u4ece\u800c\u63d0\u9ad8\u5065\u5eb7\u7ba1\u7406\u7b56\u7565\u7684\u51c6\u786e\u6027\u548c\u4e2a\u6027\u5316\u7a0b\u5ea6\u3002 \u6a21\u62df\u7ed3\u679c\u9a8c\u8bc1\u4e86LLM-HAS\u6846\u67b6\u7684\u6709\u6548\u6027\uff0c\u8868\u660e\u5176\u4f5c\u4e3a\u5229\u7528\u751f\u6210\u578b\u4eba\u5de5\u667a\u80fd\uff08GAI\uff09\u63d0\u4f9b\u9ad8\u5ea6\u51c6\u786e\u53ef\u9760\u8b66\u62a5\u7684\u7a81\u7834\u6027\u65b9\u6cd5\u7684\u6f5c\u529b\u3002|\n", "2408.13031": "|**2024-08-23**|**VFM-Det: Towards High-Performance Vehicle Detection via Large Foundation Models**|Wentao Wu et.al.|[2408.13031](http://arxiv.org/abs/2408.13031)|**[link](https://github.com/event-ahu/vfm-det)**|**\u73b0\u6709\u8f66\u8f86\u68c0\u6d4b\u5668\u901a\u5e38\u901a\u8fc7\u5728\u57fa\u4e8e\u9884\u8bad\u7ec3\u4e3b\u5e72\uff08\u5982ResNet\u3001ViT\uff09\u7684\u9884\u8bad\u7ec3\u5178\u578b\u68c0\u6d4b\u5668\uff08\u4f8b\u5982YOLO\u3001RCNN\u3001DETR\u7cfb\u5217\uff09\u4e0a\u8fdb\u884c\u8f66\u8f86\u56fe\u50cf\u8bad\u7ec3\u83b7\u5f97\u3002\u4e00\u4e9b\u7814\u7a76\u8005\u8fd8\u5229\u7528\u5e76\u589e\u5f3a\u5927\u578b\u57fa\u7840\u6a21\u578b\u6765\u63d0\u5347\u68c0\u6d4b\u6027\u80fd\u3002\u7136\u800c\uff0c\u6211\u4eec\u8ba4\u4e3a\u8fd9\u4e9b\u68c0\u6d4b\u5668\u53ef\u80fd\u4ec5\u83b7\u5f97\u6b21\u4f18\u7ed3\u679c\uff0c\u56e0\u4e3a\u5b83\u4eec\u4f7f\u7528\u7684\u5927\u578b\u6a21\u578b\u5e76\u975e\u4e13\u95e8\u4e3a\u8f66\u8f86\u8bbe\u8ba1\u3002\u6b64\u5916\uff0c\u4ed6\u4eec\u7684\u7ed3\u679c\u9ad8\u5ea6\u4f9d\u8d56\u4e8e\u89c6\u89c9\u7279\u5f81\uff0c\u5e76\u4e14\u5f88\u5c11\u8003\u8651\u8f66\u8f86\u8bed\u4e49\u4fe1\u606f\u4e0e\u89c6\u89c9\u8868\u793a\u4e4b\u95f4\u7684\u5bf9\u9f50\u3002 
\u5728\u6b64\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u9884\u8bad\u7ec3\u7684\u8f66\u8f86\u6a21\u578b\uff08VehicleMAE\uff09\u548c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08T5\uff09\u7684\u65b0\u8f66\u8f86\u68c0\u6d4b\u8303\u5f0f\uff0c\u79f0\u4e3aVFM-Det\u3002\u5b83\u9075\u5faa\u533a\u57df\u5efa\u8bae\u6846\u68c0\u6d4b\u6846\u67b6\uff0c\u6bcf\u4e2a\u63d0\u8bae\u7684\u7279\u5f81\u53ef\u4ee5\u901a\u8fc7VehicleMAE\u589e\u5f3a\u3002\u66f4\u91cd\u8981\u7684\u662f\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684VAtt2Vec\u6a21\u5757\uff0c\u7528\u4e8e\u9884\u6d4b\u8fd9\u4e9b\u63d0\u8bae\u7684\u8f66\u8f86\u8bed\u4e49\u5c5e\u6027\u5e76\u5c06\u5b83\u4eec\u8f6c\u6362\u4e3a\u7279\u5f81\u5411\u91cf\uff0c\u901a\u8fc7\u5bf9\u6bd4\u5b66\u4e60\u589e\u5f3a\u89c6\u89c9\u7279\u5f81\u3002\u5bf9\u4e09\u4e2a\u8f66\u8f86\u68c0\u6d4b\u57fa\u51c6\u6570\u636e\u96c6\u7684\u5e7f\u6cdb\u5b9e\u9a8c\u5145\u5206\u8bc1\u660e\u4e86\u6211\u4eec\u7684\u8f66\u8f86\u68c0\u6d4b\u5668\u7684\u6709\u6548\u6027\u3002\u5177\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u7684\u6a21\u578b\u5206\u522b\u5728Cityscapes\u6570\u636e\u96c6\u4e0a\u7684$AP_{0.5}$\u3001$AP_{0.75}$\u6307\u6807\u4e0a\uff0c\u76f8\u8f83\u4e8e\u57fa\u7ebf\u65b9\u6cd5\u63d0\u9ad8\u4e86$+5.1\\%$\u3001$+6.2\\%$\u3002\u6b64\u5de5\u4f5c\u7684\u6e90\u4ee3\u7801\u5c06\u5728https://github.com/Event-AHU/VFM-Det\u53d1\u5e03\u3002**|\n", "2408.13028": "|**2024-08-23**|**In-Context Learning with Reinforcement Learning for Incomplete Utterance Rewriting**|Haowei Du et.al.|[2408.13028](http://arxiv.org/abs/2408.13028)|null|\u5728\u5f53\u524d\u7684\u5b66\u672f\u754c\uff0c\u5bf9\u57fa\u4e8e\u6307\u4ee4\u589e\u5f3a\u7684\u5c11\u91cf\u5b9e\u4f8b\u7684\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08Large Language Models, LLM\uff09\u8fdb\u884c\u4e0a\u4e0b\u6587\u5b66\u4e60\uff08In-context Learning, 
ICL\uff09\u5f15\u8d77\u4e86\u8d8a\u6765\u8d8a\u591a\u7684\u5173\u6ce8\u3002\u73b0\u6709\u7684\u7528\u4e8eICL\u7684\u793a\u4f8b\u9009\u62e9\u65b9\u6cd5\u5229\u7528\u7a00\u758f\u6216\u5bc6\u96c6\u68c0\u7d22\u5668\uff0c\u5e76\u4e14\u80fd\u591f\u4ea7\u751f\u6709\u6548\u6027\u80fd\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u5e76\u672a\u5145\u5206\u5229\u7528LLM\u5bf9\u53cd\u9988\u4fe1\u606f\u7684\u5229\u7528\u6765\u8bad\u7ec3\u68c0\u7d22\u5668\uff0c\u6240\u9009\u7684\u793a\u4f8b\u53ef\u80fd\u65e0\u6cd5\u663e\u8457\u63d0\u5347LLM\u7684\u7c7b\u6bd4\u80fd\u529b\u3002 \u4e3a\u4e86\u514b\u670d\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u57fa\u4e8e\u5f3a\u5316\u5b66\u4e60\u7684\u7b56\u7565\u6846\u67b6\uff08Policy-based Reinforcement Learning Framework, RLS\uff09\u7528\u4e8e\u793a\u4f8b\u9009\u62e9\u3002\u8be5\u6846\u67b6\u7531\u8bed\u8a00\u6a21\u578b\uff08Language Model, LM\uff09\u9009\u62e9\u5668\u548cLLM\u751f\u6210\u5668\u7ec4\u6210\u3002\u8bed\u8a00\u6a21\u578b\u9009\u62e9\u5668\u5c06\u5019\u9009\u793a\u4f8b\u7f16\u7801\u4e3a\u5bc6\u96c6\u8868\u793a\uff0c\u5e76\u4ece\u4e2d\u9009\u62e9top-k\u4e2a\u793a\u4f8b\u4f5c\u4e3aLLM\u7684\u793a\u8303\u3002\u901a\u8fc7\u91c7\u7528LLM\u7684\u8f93\u51fa\u6765\u8ba1\u7b97\u5956\u52b1\u548c\u7b56\u7565\u68af\u5ea6\uff0c\u4f18\u5316\u8bed\u8a00\u6a21\u578b\u9009\u62e9\u5668\u3002 \u6211\u4eec\u5728\u4e0d\u540c\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u4e86\u5b9e\u9a8c\uff0c\u663e\u8457\u4f18\u4e8e\u73b0\u6709\u7684\u793a\u4f8b\u9009\u62e9\u65b9\u6cd5\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u5c11\u91cf\u6837\u672c\u8bbe\u7f6e\u4e0b\u76f8\u8f83\u4e8e\u76d1\u7763\u5fae\u8c03\uff08Supervised Fine-tuning, 
SFT\uff09\u6a21\u578b\u663e\u793a\u51fa\u4f18\u52bf\u3002\u8fdb\u4e00\u6b65\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u793a\u4f8b\u7684\u6570\u91cf\u4e30\u5bcc\u6027\u548c\u4e0e\u6d4b\u8bd5\u6848\u4f8b\u7684\u76f8\u4f3c\u6027\u5bf9\u4e8eICL\u4e2d\u7684LLM\u6027\u80fd\u81f3\u5173\u91cd\u8981\u3002|\n", "2408.14470": "|**2024-08-27**|**Step-by-Step Unmasking for Parameter-Efficient Fine-tuning of Large Language Models**|Aradhye Agarwal et.al.|[2408.14470](http://arxiv.org/abs/2408.14470)|**[link](https://github.com/Aradhye2002/selective-peft-toolkit)**|**\u7ec6\u8c03\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u4e0b\u6e38\u4efb\u52a1\u4e0a\u9700\u8981\u5927\u91cf\u8ba1\u7b97\u8d44\u6e90\u3002\u53c2\u6570\u9ad8\u6548\u7ec6\u8c03\uff08PEFT\uff09\u7c7b\u65b9\u6cd5\u65e8\u5728\u901a\u8fc7\u4ec5\u5fae\u8c03\u6a21\u578b\u53c2\u6570\u7684\u5c0f\u90e8\u5206\u6765\u7f13\u89e3\u8fd9\u4e9b\u8ba1\u7b97\u6311\u6218\u3002\u867d\u7136\u4ece\u8ba1\u7b97\u6548\u7387\u65b9\u9762\u8003\u8651\uff0c\u8fd9\u4e9b\u6280\u672f\u901a\u5e38\u65e0\u6cd5\u4e0e\u5b8c\u5168\u5fae\u8c03\u7684\u6a21\u578b\u6027\u80fd\u76f8\u5339\u654c\uff0c\u4e3b\u8981\u539f\u56e0\u662f\u53c2\u6570\u9009\u62e9\u8fc7\u7a0b\u4e2d\u56fa\u6709\u7684\u504f\u89c1\u3002\u4f20\u7edf\u7684\u9009\u62e9\u6027PEFT\u6280\u672f\u57fa\u4e8e\u9884\u5148\u5b9a\u4e49\u7684\u9884\u7b97\uff08\u4e5f\u79f0\u4e3a\u53bb\u906e\u7f69\uff09\u4f7f\u7528\u56fa\u5b9a\u53c2\u6570\u96c6\uff0c\u672a\u80fd\u52a8\u6001\u6355\u6349\u53c2\u6570\u7684\u91cd\u8981\u6027\uff0c\u5e76\u7ecf\u5e38\u8d85\u51fa\u9884\u7b97\u3002\u6211\u4eec\u5f15\u5165\u4e86$\\text{ID}^3$\uff0c\u8fd9\u662f\u4e00\u79cd\u65b0\u9896\u7684\u9009\u62e9\u6027PEFT\u65b9\u6cd5\uff0c\u5b83\u8fde\u7eed\u8ba1\u7b97\u53c2\u6570\u7684\u91cd\u8981\u6027\uff0c\u5e76\u901a\u8fc7\u5e73\u8861\u53c2\u6570\u9009\u62e9\u8fc7\u7a0b\u4e2d\u7684\u63a2\u7d22\u4e0e\u5229\u7528\u6765\u52a8\u6001\u5730\u53bb\u906e\u7f69\u53c2\u6570\u3002\u6211\u4eec\u572815\u4e2a\u4efb\u52a1\u4e0a\u8fd
b\u884c\u7684\u5b9e\u9a8c\u8986\u76d6\u4e86\u81ea\u7136\u8bed\u8a00\u7406\u89e3\u4e0e\u751f\u6210\u4efb\u52a1\uff0c\u663e\u793a\u4e86\u4e0e\u57fa\u4e8e\u56fa\u5b9a\u53bb\u906e\u7f69\u7684PEFT\u6280\u672f\u76f8\u6bd4\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u7684\u6709\u6548\u6027\u3002\u6211\u4eec\u901a\u8fc7\u7406\u8bba\u5206\u6790\u8bc1\u660e\uff0c$\\text{ID}^3$\u5c06\u68af\u5ea6\u66f4\u65b0\u7684\u6570\u91cf\u51cf\u5c11\u4e86\u4e00\u500d\uff0c\u4ece\u800c\u63d0\u9ad8\u4e86\u8ba1\u7b97\u6548\u7387\u3002$\\text{ID}^3$\u5bf9\u795e\u7ecf\u5143\u7684\u968f\u673a\u521d\u59cb\u5316\u5177\u6709\u9c81\u68d2\u6027\uff0c\u56e0\u6b64\u53ef\u4ee5\u65e0\u7f1d\u96c6\u6210\u5230\u73b0\u6709\u6dfb\u52a0\u5f0f\u548c\u91cd\u65b0\u53c2\u6570\u5316\u57faPEFT\u6a21\u5757\uff0c\u5982\u9002\u914d\u5668\u548cLoRA\u4e2d\uff0c\u7528\u4e8e\u52a8\u6001\u7a00\u758f\u5316\u3002**|\n", "2408.14469": "|**2024-08-26**|**Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos**|Qirui Chen et.al.|[2408.14469](http://arxiv.org/abs/2408.14469)|null|\u672c\u6587\u63a2\u8ba8\u4e86\u957f\u5f62\u5f0f\u7b2c\u4e00\u4eba\u79f0\u89c6\u89d2\u89c6\u9891\u4e2d\u7684\u591a\u8df3\u89c6\u9891\u95ee\u7b54\uff08Multi-Hop Video Question 
Answering\uff0cMH-VidQA\uff09\u95ee\u9898\u3002\u8fd9\u9879\u4efb\u52a1\u4e0d\u4ec5\u9700\u8981\u56de\u7b54\u89c6\u89c9\u95ee\u9898\uff0c\u8fd8\u9700\u8981\u5728\u89c6\u9891\u4e2d\u5b9a\u4f4d\u591a\u4e2a\u76f8\u5173\u7684\u65f6\u95f4\u6bb5\u4f5c\u4e3a\u89c6\u89c9\u8bc1\u636e\u3002\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2a\u81ea\u52a8\u5316\u6d41\u7a0b\u6765\u521b\u5efa\u5e26\u6709\u5173\u8054\u65f6\u95f4\u8bc1\u636e\u7684\u591a\u8df3\u95ee\u9898\u89e3\u7b54\u914d\u5bf9\uff0c\u4ece\u800c\u6784\u5efa\u4e86\u4e00\u4e2a\u7528\u4e8e\u6307\u4ee4\u8c03\u6574\u7684\u5927\u89c4\u6a21\u6570\u636e\u96c6\u3002\u4e3a\u4e86\u76d1\u6d4b\u8fd9\u4e00\u65b0\u4efb\u52a1\u7684\u8fdb\u5c55\uff0c\u6211\u4eec\u8fdb\u4e00\u6b65\u6574\u7406\u4e86\u4e00\u4e2a\u9ad8\u8d28\u91cf\u7684\u57fa\u51c6\u2014\u2014MultiHop-EgoQA\uff0c\u901a\u8fc7\u4ed4\u7ec6\u7684\u624b\u52a8\u9a8c\u8bc1\u548c\u7ec6\u5316\u8fdb\u884c\u6784\u5efa\u3002 \u5b9e\u9a8c\u7ed3\u679c\u63ed\u793a\u4e86\u73b0\u6709\u8de8\u6a21\u6001\u7cfb\u7edf\u5728\u591a\u8df3\u5b9a\u4f4d\u548c\u63a8\u7406\u80fd\u529b\u65b9\u9762\u5b58\u5728\u4e0d\u8db3\uff0c\u5bfc\u81f4\u6027\u80fd\u4e0d\u4f73\u3002\u968f\u540e\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201cGrounding Scattered Evidence with Large Language Model\u201d\uff08GeLM\uff09\u7684\u65b0\u67b6\u6784\uff0c\u8be5\u67b6\u6784\u901a\u8fc7\u5f15\u5165\u4e00\u4e2a\u5730\u7406\u89e3\u7801\u6a21\u5757\u589e\u5f3a\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08Large Language 
Models\uff0cLLMs\uff09\uff0c\u8be5\u6a21\u5757\u4f7f\u7528\u7075\u6d3b\u7684\u5730\u7406\u89e3\u7801\u4ee4\u724c\u4ece\u89c6\u9891\u4e2d\u68c0\u7d22\u65f6\u95f4\u8bc1\u636e\u3002\u5728\u6211\u4eec\u7684\u89c6\u89c9\u6307\u4ee4\u6570\u636e\u4e0a\u8fdb\u884c\u8bad\u7ec3\u540e\uff0cGeLM\u5c55\u793a\u4e86\u589e\u5f3a\u7684\u591a\u8df3\u5b9a\u4f4d\u548c\u63a8\u7406\u80fd\u529b\uff0c\u4e3a\u8fd9\u4e00\u5177\u6709\u6311\u6218\u6027\u7684\u4efb\u52a1\u8bbe\u5b9a\u4e86\u65b0\u7684\u57fa\u51c6\u3002\u6b64\u5916\uff0c\u5f53\u5728\u7b2c\u4e09\u4eba\u79f0\u89c6\u89d2\u89c6\u9891\u4e0a\u8fdb\u884c\u8bad\u7ec3\u65f6\uff0c\u76f8\u540c\u7684\u67b6\u6784\u5728\u5355\u8df3\u89c6\u9891\u95ee\u7b54\u57fa\u51c6\uff08ActivityNet-RTL\uff09\u4e0a\u4e5f\u8fbe\u5230\u4e86\u6700\u5148\u8fdb\u7684\u6027\u80fd\uff0c\u8bc1\u660e\u4e86\u5176\u6709\u6548\u6027\u3002|\n", "2408.14467": "|**2024-08-26**|**Explicit Inductive Inference using Large Language Models**|Tianyang Liu et.al.|[2408.14467](http://arxiv.org/abs/2408.14467)|null|\u5728\u672c\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u7ba1\u9053\u65b9\u6cd5\uff0c\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u8fd9\u4e00\u504f\u5dee\u8fdb\u884c\u660e\u786e\u7684\u5f52\u7eb3\u63a8\u7406\u3002\u8be5\u7ba1\u9053\u4f7f\u7528LLM\u5c06\u524d\u63d0\u8f6c\u6362\u4e3a\u4e00\u7ec4\u5df2\u9a8c\u8bc1\u7684\u66ff\u4ee3\u65b9\u6848\uff0c\u5e76\u901a\u8fc7\u805a\u5408\u884d\u751f\u7684\u65b0\u8574\u542b\u8be2\u95ee\u7684\u7b54\u6848\u6765\u652f\u6301\u539f\u59cb\u63a8\u7406\u9884\u6d4b\u3002\u5728\u65b9\u5411\u6027\u8c13\u8bcd\u8574\u542b\u57fa\u51c6\u6d4b\u8bd5\u4e0a\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u901a\u8fc7\u5e94\u7528\u6b64\u7b80\u5355\u7ba1\u9053\uff0c\u53ef\u4ee5\u63d0\u9ad8LLM\u5728\u63a8\u7406\u4e0a\u7684\u6574\u4f53\u6027\u80fd\uff0c\u5e76\u663e\u8457\u51cf\u8f7b\u5b83\u4eec\u7684\u8bc1\u5b9e\u504f\u5dee\u5f71\u54cd\u3002|\n", "2408.14438": "|**2024-08-26**|**Evaluating Large Language Models on Spatial 
Tasks: A Multi-Task Benchmarking Study**|Liuchang Xu Shuo Zhao et.al.|[2408.14438](http://arxiv.org/abs/2408.14438)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5982ChatGPT\u3001Gemini\u7b49\u7684\u95ee\u4e16\uff0c\u8bc4\u4f30\u5b83\u4eec\u5728\u81ea\u7136\u8bed\u8a00\u7406\u89e3\u3001\u4ee3\u7801\u751f\u6210\u7b49\u591a\u65b9\u9762\u80fd\u529b\u7684\u91cd\u8981\u6027\u65e5\u76ca\u51f8\u663e\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u6a21\u578b\u5728\u7a7a\u95f4\u4efb\u52a1\u65b9\u9762\u7684\u8868\u73b0\u5e76\u672a\u5f97\u5230\u5168\u9762\u8bc4\u4f30\u3002\u672c\u7814\u7a76\u586b\u8865\u4e86\u8fd9\u4e00\u7a7a\u767d\uff0c\u901a\u8fc7\u5f15\u5165\u4e00\u4e2a\u65b0\u9896\u7684\u591a\u4efb\u52a1\u7a7a\u95f4\u8bc4\u4ef7\u6570\u636e\u96c6\uff0c\u7cfb\u7edf\u6027\u5730\u63a2\u7d22\u548c\u6bd4\u8f83\u51e0\u79cd\u5148\u8fdb\u6a21\u578b\u5728\u7a7a\u95f4\u4efb\u52a1\u4e0a\u7684\u6027\u80fd\u3002\u8be5\u6570\u636e\u96c6\u6db5\u76d6\u4e86\u5341\u4e8c\u79cd\u4e0d\u540c\u7684\u4efb\u52a1\u7c7b\u578b\uff0c\u5305\u62ec\u7a7a\u95f4\u7406\u89e3\u548c\u8def\u5f84\u89c4\u5212\uff0c\u5e76\u4e14\u6bcf\u9879\u4efb\u52a1\u90fd\u6709\u7ecf\u8fc7\u9a8c\u8bc1\u7684\u51c6\u786e\u7b54\u6848\u3002 
\u6211\u4eec\u91c7\u7528\u53cc\u9636\u6bb5\u6d4b\u8bd5\u65b9\u6cd5\u5bf9\u591a\u4e2a\u6a21\u578b\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u5305\u62ecOpenAI\u7684gpt-3.5-turbo\u3001gpt-4o\u4ee5\u53caZhipuAI\u7684glm-4\u3002\u9996\u5148\u8fdb\u884c\u96f6\u6837\u672c\u6d4b\u8bd5\uff0c\u968f\u540e\u6839\u636e\u96be\u5ea6\u5bf9\u6570\u636e\u96c6\u8fdb\u884c\u5206\u7c7b\uff0c\u5e76\u6267\u884c\u4e86\u63d0\u793a\u8c03\u4f18\u6d4b\u8bd5\u3002\u7ed3\u679c\u663e\u793a\uff0c\u5728\u7b2c\u4e00\u9636\u6bb5\u7684\u6d4b\u8bd5\u4e2d\uff0cgpt-4o\u7684\u6574\u4f53\u51c6\u786e\u6027\u6700\u9ad8\uff0c\u5e73\u5747\u8fbe\u5230\u4e8671.3%\u3002\u5c3d\u7ba1moonshot-v1-8k\u5728\u603b\u4f53\u4e0a\u7565\u900a\u4e00\u7b79\uff0c\u4f46\u5728\u5730\u540d\u8bc6\u522b\u4efb\u52a1\u4e0a\u5374\u8d85\u8d8a\u4e86gpt-4o\u3002\u7814\u7a76\u8fd8\u63ed\u793a\u4e86\u7279\u5b9a\u4efb\u52a1\u4e2d\u63d0\u793a\u7b56\u7565\u5bf9\u6a21\u578b\u6027\u80fd\u7684\u5f71\u54cd\u3002\u4f8b\u5982\uff0c\u94fe\u5f0f\u601d\u8003\uff08COT\uff09\u7b56\u7565\u4f7fgpt-4o\u5728\u8def\u5f84\u89c4\u5212\u4efb\u52a1\u4e0a\u7684\u51c6\u786e\u7387\u4ece12.4%\u63d0\u5347\u81f387.5%\uff0c\u800c\u5355\u6837\u672c\uff08one-shot\uff09\u7b56\u7565\u5219\u4f7fmoonshot-v1-8k\u5728\u5730\u56fe\u7ed8\u5236\u4efb\u52a1\u4e0a\u7684\u51c6\u786e\u7387\u4ece10.1%\u63d0\u9ad8\u523076.3%\u3002|\n", "2408.14419": "|**2024-08-26**|**CHARTOM: A Visual Theory-of-Mind Benchmark for Multimodal Large Language Models**|Shubham Bharti
et.al.|[2408.14419](http://arxiv.org/abs/2408.14419)|null|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aCHARTOM\u7684\u89c6\u89c9\u7406\u8bba\u7406\u89e3\u57fa\u51c6\uff0c\u9488\u5bf9\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\u3002CHARTOM\u7531\u4e13\u95e8\u8bbe\u8ba1\u7684\u6570\u636e\u53ef\u89c6\u5316\u56fe\u8868\u7ec4\u6210\u3002\u7ed9\u5b9a\u4e00\u4e2a\u56fe\u8868\uff0c\u8bed\u8a00\u6a21\u578b\u4e0d\u4ec5\u9700\u8981\u6b63\u786e\u7406\u89e3\u56fe\u8868\uff08\u4e8b\u5b9e\u95ee\u9898\uff09\uff0c\u8fd8\u9700\u8981\u5224\u65ad\u8be5\u56fe\u8868\u662f\u5426\u4f1a\u8ba9\u4eba\u7c7b\u8bfb\u8005\u4ea7\u751f\u8bef\u5bfc\uff08\u601d\u7ef4\u95ee\u9898\uff09\u3002\u8fd9\u4e24\u4e2a\u95ee\u9898\u90fd\u5177\u6709\u91cd\u8981\u7684\u793e\u4f1a\u4ef7\u503c\u3002\u6211\u4eec\u5c06\u8be6\u7ec6\u4ecb\u7ecd\u6784\u5efaCHARTOM\u57fa\u51c6\u7684\u8fc7\u7a0b\uff0c\u5305\u62ec\u5176\u5bf9\u4eba\u7c7b\u8868\u73b0\u7684\u6821\u51c6\u3002|\n", "2408.14418": "|**2024-08-26**|**MEDSAGE: Enhancing Robustness of Medical Dialogue Summarization to ASR Errors with LLM-generated Synthetic Dialogues**|Kuluhan Binici 
et.al.|[2408.14418](http://arxiv.org/abs/2408.14418)|null|\u81ea\u52a8\u8bed\u97f3\u8bc6\u522b(ASR)\u7cfb\u7edf\u5728\u5c06\u8bed\u97f3\u8f6c\u6362\u4e3a\u6587\u672c\u65b9\u9762\u81f3\u5173\u91cd\u8981\uff0c\u7136\u800c\uff0c\u5b83\u4eec\u5f15\u5165\u7684\u9519\u8bef\u4f1a\u4e25\u91cd\u964d\u4f4e\u4e0b\u6e38\u4efb\u52a1\u5982\u6458\u8981\u751f\u6210\u7684\u8868\u73b0\u3002\u8fd9\u4e2a\u95ee\u9898\u5728\u4e34\u5e8a\u5bf9\u8bdd\u6458\u8981\u9886\u57df\u5c24\u4e3a\u7a81\u51fa\uff0c\u8fd9\u662f\u4e00\u4e2a\u6570\u636e\u8d44\u6e90\u6709\u9650\u7684\u9886\u57df\uff0c\u7528\u4e8e\u5fae\u8c03\u7684\u76d1\u7763\u6570\u636e\u7a00\u7f3a\uff0c\u56e0\u6b64\u9700\u8981\u5c06ASR\u6a21\u578b\u4f5c\u4e3a\u9ed1\u76d2\u89e3\u51b3\u65b9\u6848\u4f7f\u7528\u3002\u4f20\u7edf\u7684\u6570\u636e\u589e\u5f3a\u65b9\u6cd5\u4e5f\u4e0d\u9002\u7528\u4e8e\u63d0\u9ad8\u6458\u8981\u6a21\u578b\u5bf9\u566a\u97f3\u7684\u9c81\u68d2\u6027\uff0c\u539f\u56e0\u662f\u7f3a\u4e4f\u8db3\u591f\u7684\u533b\u7597\u5bf9\u8bdd\u97f3\u9891\u8bb0\u5f55\u53ca\u5176\u5bf9\u5e94\u7684ASR\u8f6c\u5f55\u6587\u672c\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aMEDSAGE\u7684\u65b9\u6cd5\uff0c\u7528\u4e8e\u901a\u8fc7\u5927\u578b\u8bed\u8a00\u6a21\u578b(LLMs)\u751f\u6210\u5408\u6210\u6837\u672c\u8fdb\u884c\u6570\u636e\u589e\u5f3a\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u5229\u7528LLMs\u7684\u4e0a\u4e0b\u6587\u5b66\u4e60\u80fd\u529b\uff0c\u5e76\u6307\u5bfc\u5b83\u4eec\u57fa\u4e8e\u5c11\u91cf\u53ef\u7528\u7684\u533b\u7597\u5bf9\u8bdd\u793a\u4f8b\u548c\u97f3\u9891\u8bb0\u5f55\uff0c\u751f\u6210\u7c7b\u4f3cASR\u7684\u9519\u8bef\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cLLMs\u80fd\u591f\u6709\u6548\u5730\u5efa\u6a21ASR\u566a\u97f3\uff0c\u5c06\u8fd9\u79cd\u542b\u566a\u6570\u636e\u878d\u5165\u8bad\u7ec3\u8fc7\u7a0b\u663e\u8457\u63d0\u9ad8\u4e86\u533b\u7597\u5bf9\u8bdd\u6458\u8981\u7cfb\u7edf\u7684\u9c81\u68d2\u6027\u548c\u51c6\u786e\u6027\u3002\u8fd9\u79cd\u
65b9\u6cd5\u89e3\u51b3\u4e86\u5173\u952e\u5e94\u7528\u4e2dASR\u8f93\u51fa\u566a\u97f3\u7684\u95ee\u9898\uff0c\u63d0\u4f9b\u4e86\u4e00\u4e2a\u589e\u5f3a\u4e34\u5e8a\u5bf9\u8bdd\u6458\u8981\u53ef\u9760\u6027\u7684\u7a33\u5065\u89e3\u51b3\u65b9\u6848\u3002|\n", "2408.14398": "|**2024-08-26**|**Language-specific Calibration for Pruning Multilingual Language Models**|Simon Kurz et.al.|[2408.14398](http://arxiv.org/abs/2408.14398)|null|\u8fd1\u671f\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u526a\u679d\u9886\u57df\u53d6\u5f97\u7684\u8fdb\u5c55\uff0c\u5728\u65e0\u9700\u91cd\u65b0\u8bad\u7ec3\u7684\u60c5\u51b5\u4e0b\u5b9e\u73b0\u4e86\u5353\u8d8a\u7684\u538b\u7f29\u6548\u679c\uff0c\u5e76\u4fdd\u6301\u4e86\u9ad8\u9884\u6d4b\u6027\u80fd\u3002\u7136\u800c\uff0c\u8fd9\u7c7b\u7814\u7a76\u4e3b\u8981\u5173\u6ce8\u4e8e\u4f7f\u7528\u82f1\u8bed\u6587\u672c\u8fdb\u884c\u526a\u679d\u6821\u51c6\uff0c\u800c\u5ffd\u7565\u4e86\u73b0\u4ee3LLM\u7684\u591a\u8bed\u8a00\u6027\u8d28\u53ca\u5176\u5728\u975e\u82f1\u8bed\u8bed\u8a00\u4e2d\u7684\u5e7f\u6cdb\u5e94\u7528\u3002\u672c\u6587\u65e8\u5728\u63a2\u7d22\u7528\u4e8e\u526a\u679d\u591a\u8bed\u8a00\u6a21\u578b\u7684\u6709\u6548\u7b56\u7565\u3002 
\u6211\u4eec\u8fdb\u884c\u4e86\u9996\u4e2a\u5168\u9762\u7684\u5b9e\u8bc1\u7814\u7a76\uff0c\u5bf9\u6bd4\u4e86\u4e0d\u540c\u6821\u51c6\u8bed\u8a00\u5728\u591a\u8bed\u8a00\u4efb\u52a1\u3001\u6a21\u578b\u548c\u6700\u5148\u8fdb\u7684\u526a\u679d\u6280\u672f\u4e0b\u5bf9\u526a\u679d\u7684\u5f71\u54cd\u3002\u6211\u4eec\u7684\u7ed3\u679c\u63d0\u4f9b\u4e86\u5b9e\u7528\u7684\u5efa\u8bae\uff0c\u4f8b\u5982\uff0c\u5728\u76ee\u6807\u8bed\u8a00\u4e0a\u8fdb\u884c\u6821\u51c6\u53ef\u4ee5\u6709\u6548\u5730\u964d\u4f4e\u56f0\u60d1\u5ea6\uff0c\u4f46\u4e0d\u4e00\u5b9a\u80fd\u4fc3\u8fdb\u4e0b\u6e38\u4efb\u52a1\u7684\u6027\u80fd\u63d0\u5347\u3002\u8fdb\u4e00\u6b65\u7684\u5206\u6790\u5b9e\u9a8c\u63ed\u793a\uff0c\u76ee\u6807\u8bed\u8a00\u4e0a\u7684\u6821\u51c6\u4e3b\u8981\u8d21\u732e\u5728\u4e8e\u4fdd\u7559\u4e0e\u6d41\u7545\u6027\u548c\u8fde\u8d2f\u6027\u76f8\u5173\u7684\u8bed\u8a00\u7279\u5b9a\u7279\u6027\uff0c\u4f46\u53ef\u80fd\u65e0\u6cd5\u6355\u6349\u5230\u4e0e\u7406\u89e3\u80fd\u529b\u548c\u63a8\u7406\u80fd\u529b\u7b49\u8bed\u8a00\u901a\u7528\u7279\u6027\u7684\u5173\u8054\u3002 \u6700\u540e\uff0c\u6211\u4eec\u4e3a\u672a\u6765\u7684\u5b9e\u8df5\u8005\u63d0\u4f9b\u4e86\u5b9e\u9645\u7684\u5efa\u8bae\u3002|\n", "2408.14387": "|**2024-08-26**|**Reprogramming Foundational Large Language Models(LLMs) for Enterprise Adoption for Spatio-Temporal Forecasting Applications: Unveiling a New Era in Copilot-Guided Cross-Modal Time Series Representation Learning**|Sakhinana Sagar Srinivas 
et.al.|[2408.14387](http://arxiv.org/abs/2408.14387)|null|\u7a7a\u95f4\u65f6\u95f4\u9884\u6d4b\u5728\u4ea4\u901a\u7cfb\u7edf\u3001\u7269\u6d41\u548c\u4f9b\u5e94\u94fe\u7ba1\u7406\u7b49\u591a\u4e2a\u9886\u57df\u53d1\u6325\u7740\u5173\u952e\u4f5c\u7528\u3002\u7136\u800c\uff0c\u73b0\u6709\u65b9\u6cd5\u53d7\u9650\u4e8e\u5904\u7406\u5927\u89c4\u6a21\u590d\u6742\u6570\u636e\u7684\u80fd\u529b\u3002\u4e3a\u4e86\u514b\u670d\u8fd9\u4e00\u9650\u5236\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u7ed3\u5408\u5f00\u6e90\u5927\u578b\u548c\u5c0f\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs \u548c LMs\uff09\u4e0e\u4f20\u7edf\u9884\u6d4b\u65b9\u6cd5\u7684\u6df7\u5408\u7b56\u7565\u3002\u901a\u8fc7\u5f15\u5165\u52a8\u6001\u63d0\u793a\u548c\u5206\u7ec4\u67e5\u8be2\u3001\u591a\u5934\u6ce8\u610f\u529b\u673a\u5236\uff0c\u8be5\u7b56\u7565\u80fd\u591f\u66f4\u6709\u6548\u5730\u6355\u6349\u6f14\u53d8\u975e\u7ebf\u6027\u65f6\u95f4\u5e8f\u5217\u6570\u636e\u4e2d\u7684\u5185\u90e8\u7cfb\u5217\u548c\u8de8\u7cfb\u5217\u4f9d\u8d56\u5173\u7cfb\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5229\u7528\u4f4e\u79e9\u9002\u914d\u4e0e\u6fc0\u6d3b\u8bb0\u5fc6\u51cf\u5c11\u6280\u672f\uff08LoRA-AMR\uff09\uff0c\u5728\u6d88\u8d39\u7ea7\u786c\u4ef6\u4e0a\u5bf9\u5f00\u6e90\u5c0f\u578b LM 
\u8fdb\u884c\u5b9a\u5236\u5316\u5fae\u8c03\uff0c\u4ee5\u5206\u6790\u65f6\u95f4\u5e8f\u5217\u8d8b\u52bf\uff0c\u540c\u65f6\u4fdd\u7559\u63a8\u7406\u5ef6\u8fdf\u5e76\u964d\u4f4e\u8ba1\u7b97\u5f00\u9500\u548c\u6fc0\u6d3b\u5b58\u50a8\u5185\u5b58\u9700\u6c42\u3002\u6211\u4eec\u5c06\u8bed\u8a00\u6a21\u578b\u5904\u7406\u4e0e\u4f20\u7edf\u65f6\u95f4\u5e8f\u5217\u8868\u793a\u5b66\u4e60\u65b9\u6cd5\u76f8\u7ed3\u5408\uff0c\u5b9e\u73b0\u8de8\u6a21\u6001\u96c6\u6210\uff0c\u4ece\u800c\u83b7\u5f97\u7a33\u5065\u4e14\u51c6\u786e\u7684\u9884\u6d4b\u7ed3\u679c\u3002\u901a\u8fc7\u5728\u591a\u4e2a\u5b9e\u9645\u4e16\u754c\u6570\u636e\u96c6\u4e0a\u7684\u5e7f\u6cdb\u5b9e\u9a8c\uff0c\u8be5\u6846\u67b6\u7684\u6548\u80fd\u5f97\u5230\u4e86\u5145\u5206\u9a8c\u8bc1\uff0c\u5176\u9884\u6d4b\u51c6\u786e\u6027\u663e\u8457\u4f18\u4e8e\u73b0\u6709\u65b9\u6cd5\u3002|\n", "2408.14380": "|**2024-08-26**|**Probing Causality Manipulation of Large Language Models**|Chenyang Zhang et.al.|[2408.14380](http://arxiv.org/abs/2408.14380)|**[link](https://github.com/tongjinlp/llm-causality-probing)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\u4e0a\u5c55\u73b0\u4e86\u591a\u79cd\u80fd\u529b\uff0c\u5305\u62ec\u56e0\u679c\u5173\u7cfb\u95ee\u9898\u3002\u9884\u8bad\u7ec3\u7684\u6a21\u578b\u901a\u5e38\u57fa\u4e8e\u7edf\u8ba1\u5173\u8054\u5de5\u4f5c\uff0c\u800c\u975e\u4e13\u6ce8\u4e8e\u53e5\u5b50\u4e2d\u7684\u56e0\u679c\u4e0e\u5f71\u54cd\u3002\u56e0\u6b64\uff0c\u63a2\u7d22LLM\u5185\u90e8\u5bf9\u56e0\u679c\u6027\u7684\u64cd\u7eb5\u662f\u5fc5\u8981\u7684\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u901a\u8fc7\u63d0\u4f9b\u4e0d\u540c\u7684\u6377\u5f84\u5e76\u89c2\u5bdf\u6a21\u578b\u884c\u4e3a\u6765\u63a2\u67e5\u56e0\u679c\u6027\u64cd\u7eb5\u7684\u5c42\u7ea7\u3002\u6211\u4eec\u5229\u7528\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u548c\u4e0a\u4e0b\u6587\u5b66\u4e60\uff08ICL\uff09\u6280\u672f\uff0c\u9488\u5bf9\u8bbe
\u8ba1\u7684\u56e0\u679c\u5206\u7c7b\u4efb\u52a1\uff0c\u5bf9\u4e3b\u6d41LLM\u8fdb\u884c\u5b9e\u9a8c\uff0c\u5305\u62ecGPT-4\u4ee5\u53ca\u4e00\u4e9b\u8f83\u5c0f\u7684\u548c\u7279\u5b9a\u9886\u57df\u7684\u6a21\u578b\u3002 \u6211\u4eec\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cLLM\u80fd\u591f\u8bc6\u522b\u4e0e\u56e0\u679c\u6027\u76f8\u5173\u7684\u5b9e\u4f53\uff0c\u5e76\u8ba4\u8bc6\u5230\u76f4\u63a5\u7684\u56e0\u679c\u5173\u7cfb\u3002\u7136\u800c\uff0cLLM\u7f3a\u4e4f\u4e13\u95e8\u7684\u56e0\u679c\u8ba4\u77e5\u80fd\u529b\uff0c\u53ea\u662f\u5c06\u56e0\u679c\u6027\u89c6\u4e3a\u53e5\u5b50\u6574\u4f53\u8bed\u4e49\u7684\u4e00\u90e8\u5206\u3002**|\n", "2408.14354": "|**2024-08-26**|**SWE-bench-java: A GitHub Issue Resolving Benchmark for Java**|Daoguang Zan et.al.|[2408.14354](http://arxiv.org/abs/2408.14354)|**[link](https://github.com/multi-swe-bench/multi-swe-bench-env)**|**GitHub\u95ee\u9898\u89e3\u51b3\u662f\u8f6f\u4ef6\u5de5\u7a0b\u4e2d\u7684\u5173\u952e\u4efb\u52a1\uff0c\u8fd1\u671f\u5728\u884c\u4e1a\u548c\u5b66\u672f\u754c\u90fd\u53d7\u5230\u4e86\u5e7f\u6cdb\u5173\u6ce8\u3002\u5728\u8fd9\u4e2a\u9886\u57df\u5185\uff0cSWE-bench\u5df2\u7ecf\u53d1\u5e03\uff0c\u65e8\u5728\u8bc4\u4f30\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u95ee\u9898\u89e3\u51b3\u80fd\u529b\uff0c\u4f46\u76ee\u524d\u4ec5\u5173\u6ce8Python\u7248\u672c\u3002\u7136\u800c\uff0c\u652f\u6301\u66f4\u591a\u7f16\u7a0b\u8bed\u8a00\u540c\u6837\u81f3\u5173\u91cd\u8981\uff0c\u56e0\u4e3a\u5de5\u4e1a\u754c\u5bf9\u6b64\u6709\u5f3a\u70c8\u9700\u6c42\u3002\u4f5c\u4e3a\u8fc8\u5411\u591a\u8bed\u8a00\u652f\u6301\u7684\u7b2c\u4e00\u6b65\uff0c\u6211\u4eec\u5f00\u53d1\u4e86Java\u7248\u7684SWE-bench\uff0c\u79f0\u4e3aSWE-bench-java\u3002\u6211\u4eec\u5df2\u516c\u5f00\u53d1\u5e03\u4e86\u6570\u636e\u96c6\uff0c\u5e76\u63d0\u4f9b\u4e86\u57fa\u4e8eDocker\u7684\u8bc4\u4f30\u73af\u5883\u548c\u6392\u884c\u699c\uff0c\u8fd9\u4e9b\u90fd\u5c06\u6301\u7eed\u7ef4\u62a4\u548c\u66f4\u65b0\u3002\u4e3a\u4e86\u9a8c\u8bc1SWE-bench
-java\u7684\u53ef\u9760\u6027\uff0c\u6211\u4eec\u5b9e\u73b0\u4e86\u7ecf\u5178\u65b9\u6cd5SWE-agent\uff0c\u5e76\u5728\u5176\u4e2d\u6d4b\u8bd5\u4e86\u51e0\u79cd\u5f3a\u5927\u7684LLMs\u3002\u4f17\u6240\u5468\u77e5\uff0c\u6784\u5efa\u9ad8\u8d28\u91cf\u7684\u591a\u8bed\u8a00\u57fa\u51c6\u65e2\u8017\u65f6\u53c8\u8d39\u529b\uff0c\u56e0\u6b64\u6211\u4eec\u6b22\u8fce\u901a\u8fc7\u62c9\u53d6\u8bf7\u6c42\u6216\u5408\u4f5c\u6765\u52a0\u901f\u5176\u8fed\u4ee3\u548c\u6539\u8fdb\uff0c\u4e3a\u5b8c\u5168\u81ea\u52a8\u5316\u7684\u7f16\u7a0b\u94fa\u5e73\u9053\u8def\u3002**|\n", "2408.15240": "|**2024-08-27**|**Generative Verifiers: Reward Modeling as Next-Token Prediction**|Lunjun Zhang et.al.|[2408.15240](http://arxiv.org/abs/2408.15240)|null|\u9a8c\u8bc1\u5668\u6216\u5956\u52b1\u6a21\u578b\u5e38\u7528\u4e8e\u589e\u5f3a\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u63a8\u7406\u6027\u80fd\u3002\u4e00\u79cd\u5e38\u89c1\u7684\u65b9\u6cd5\u662fBest-of-N\u7b56\u7565\uff0c\u5176\u4e2d\u4eceLLM\u751f\u6210\u7684N\u4e2a\u5019\u9009\u89e3\u51b3\u65b9\u6848\u4e2d\u7531\u9a8c\u8bc1\u5668\u8fdb\u884c\u6392\u540d\uff0c\u9009\u62e9\u6700\u4f73\u4e00\u4e2a\u3002\u4f20\u7edf\u4e0a\uff0c\u9a8c\u8bc1\u5668\u662f\u4f5c\u4e3a\u5224\u522b\u5206\u7c7b\u5668\u8fdb\u884c\u8bad\u7ec3\u4ee5\u5bf9\u89e3\u51b3\u65b9\u6848\u6253\u5206\u7684\uff0c\u4f46\u5b83\u4eec\u5e76\u672a\u5145\u5206\u5229\u7528\u9884\u8bad\u7ec3LLM\u7684\u6587\u672c\u751f\u6210\u80fd\u529b\u3002\u4e3a\u4e86\u514b\u670d\u8fd9\u4e00\u9650\u5236\uff0c\u6211\u4eec\u63d0\u8bae\u901a\u8fc7\u5728\u9a8c\u8bc1\u548c\u89e3\u51b3\u65b9\u6848\u751f\u6210\u4e0a\u4f7f\u7528\u901a\u7528\u7684\u4e0b\u4e00\u4e2a\u8bcd\u9884\u6d4b\u76ee\u6807\u8054\u5408\u8bad\u7ec3\u9a8c\u8bc1\u5668\u3002\u4e0e\u6807\u51c6\u9a8c\u8bc1\u5668\u76f8\u6bd4\uff0c\u8fd9\u6837\u7684\u751f\u6210\u578b\u9a8c\u8bc1\u5668\uff08GenRM\uff09\u53ef\u4ee5\u4eceLLM\u7684\u51e0\u4e2a\u4f18\u52bf\u4e2d\u83b7\u76ca\uff1a\u5b83\u4eec\u53ef\u4ee5\u65e0\u7f1d\u5730\u4e0e\u6307\u4ee4\u
8c03\u8c10\u76f8\u7ed3\u5408\uff0c\u652f\u6301\u94fe\u5f0f\u601d\u8003\u63a8\u7406\uff0c\u5e76\u4e14\u53ef\u4ee5\u901a\u8fc7\u589e\u52a0\u63a8\u7406\u65f6\u7684\u8ba1\u7b97\u91cf\u6765\u5229\u7528\u591a\u6570\u6295\u7968\uff0c\u4ece\u800c\u8fdb\u884c\u66f4\u597d\u7684\u9a8c\u8bc1\u3002\u6211\u4eec\u5c55\u793a\u4e86\uff0c\u5728\u7b97\u6cd5\u95ee\u9898\u548c\u5c0f\u5b66\u6570\u5b66\u63a8\u7406\u4efb\u52a1\u4e0a\u4f7f\u7528Gemma\u4e3a\u57fa\u7840\u7684\u9a8c\u8bc1\u5668\u65f6\uff0cGenRM\u4f18\u4e8e\u5224\u522b\u578b\u9a8c\u8bc1\u5668\u548cLLM\u4f5c\u4e3a\u88c1\u5224\uff0c\u8868\u73b0\u51fa16%-64%\u7684\u95ee\u9898\u89e3\u51b3\u7387\u63d0\u5347\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8bc1\u660e\u4e86GenRM\u5728\u6570\u636e\u96c6\u89c4\u6a21\u3001\u6a21\u578b\u5bb9\u91cf\u548c\u63a8\u7406\u65f6\u8ba1\u7b97\u91cf\u589e\u52a0\u65b9\u9762\u5177\u6709\u826f\u597d\u7684\u53ef\u6269\u5c55\u6027\u3002|\n", "2408.15221": "|**2024-08-27**|**LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet**|Nathaniel Li et.al.|[2408.15221](http://arxiv.org/abs/2408.15221)|null|\u8fd1\u671f\u7684\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u9632\u5fa1\u63aa\u65bd\u663e\u8457\u63d0\u5347\u4e86\u6a21\u578b\u5bf9\u6709\u5bb3\u67e5\u8be2\u7684\u62d2\u7edd\u80fd\u529b\uff0c\u5373\u4f7f\u5728\u906d\u53d7\u6709\u7ec4\u7ec7\u653b\u51fb\u7684\u60c5\u51b5\u4e0b\u4e5f\u4e0d\u4f8b\u5916\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u9632\u5fa1\u63aa\u65bd\u4e3b\u8981\u662f\u5728\u5355\u8f6e\u5bf9\u8bdd\u4e2d\u9488\u5bf9\u81ea\u52a8\u5316\u653b\u51fb\u8fdb\u884c\u8bc4\u4f30\uff0c\u8fd9\u79cd\u5a01\u80c1\u6a21\u578b\u4e0d\u8db3\u4ee5\u53cd\u6620\u771f\u5b9e\u4e16\u754c\u4e2d\u6076\u610f\u884c\u4e3a\u7684\u590d\u6742\u6027\u3002 
\u6211\u4eec\u901a\u8fc7\u5b9e\u9a8c\u5c55\u793a\u4e86\u591a\u8f6e\u5bf9\u8bdd\u7684\u4eba\u7c7b\u201c\u8d8a\u72f1\u201d\uff08\u5373\u653b\u51fb\u8005\u5229\u7528\u6a21\u578b\u7684\u6f0f\u6d1e\u6765\u7ed5\u8fc7\u9632\u5fa1\u673a\u5236\uff09\u80fd\u591f\u63ed\u9732\u9632\u5fa1\u7cfb\u7edf\u4e2d\u7684\u91cd\u5927\u6f0f\u6d1e\u3002\u5728\u4f7f\u7528HarmBench\u8fd9\u4e00\u8bc4\u4f30\u5e73\u53f0\uff0c\u5bf9\u6297\u90a3\u4e9b\u5728\u5355\u8f6e\u5bf9\u8bdd\u4e2d\u4ec5\u62a5\u544a\u4f4e\u767e\u5206\u6bd4\u653b\u51fb\u6210\u529f\u7387\uff08ASR\uff09\u7684\u9632\u5fa1\u7cfb\u7edf\u65f6\uff0c\u6211\u4eec\u53d1\u73b0\u591a\u8f6e\u5bf9\u8bdd\u7684\u4eba\u7c7b\u201c\u8d8a\u72f1\u201d\u7684\u6210\u529f\u7387\u8d85\u8fc7\u4e8670%\u3002\u8fd9\u8868\u660e\u5f53\u524d\u7684\u9632\u5fa1\u673a\u5236\u5728\u9762\u5bf9\u66f4\u590d\u6742\u7684\u3001\u591a\u6b65\u9aa4\u7684\u653b\u51fb\u7b56\u7565\u65f6\u5b58\u5728\u4e0d\u8db3\u3002 \u6b64\u5916\uff0c\u591a\u8f6e\u5bf9\u8bdd\u7684\u4eba\u7c7b\u201c\u8d8a\u72f1\u201d\u8fd8\u63ed\u793a\u4e86\u673a\u5668\u9057\u5fd8\u9632\u5fa1\u7cfb\u7edf\u7684\u6f0f\u6d1e\u3002\u653b\u51fb\u8005\u6210\u529f\u5730\u4ece\u7ecf\u8fc7\u9057\u5fd8\u5904\u7406\u7684\u6a21\u578b\u4e2d\u6062\u590d\u4e86\u53ef\u7528\u4e8e\u751f\u7269\u5b89\u5168\u53cc\u91cd\u7528\u9014\u7684\u77e5\u8bc6\uff0c\u8fd9\u8fdb\u4e00\u6b65\u8bc1\u660e\u4e86\u73b0\u6709\u9632\u5fa1\u63aa\u65bd\u5728\u4fdd\u62a4\u654f\u611f\u4fe1\u606f\u65b9\u9762\u5b58\u5728\u7684\u5f31\u70b9\u3002 \u4e3a\u4e86\u603b\u7ed3\u548c\u5171\u4eab\u8fd9\u4e9b\u53d1\u73b0\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u540d\u4e3a\u201c\u591a\u8f6e\u5bf9\u8bdd\u4eba\u7c7b\u8d8a\u72f1\u201d\uff08Multi-Turn Human
Jailbreaks\uff0c\u7b80\u79f0MHJ\uff09\u7684\u6570\u636e\u96c6\uff0c\u5305\u542b\u4e86\u6765\u81ea537\u4e2a\u4e0d\u540c\u591a\u8f6e\u5bf9\u8bdd\u201c\u8d8a\u72f1\u201d\u6848\u4f8b\u76842,912\u4e2a\u89e6\u53d1\u6307\u4ee4\u3002\u540c\u65f6\uff0c\u6211\u4eec\u8fd8\u516c\u5f00\u53d1\u5e03\u4e86\u8fd9\u4e2a\u6570\u636e\u96c6\u4ee5\u53ca\u5728\u591a\u79cd\u5546\u4e1a\u7ea2\u961f\u6d4b\u8bd5\u4e2d\u53d1\u5c55\u51fa\u7684\u4e00\u7cfb\u5217\u201c\u8d8a\u72f1\u201d\u7b56\u7565\u7684\u7efc\u8ff0\uff0c\u65e8\u5728\u4e3a\u7814\u7a76\u66f4\u5f3a\u5927\u7684LLM\u9632\u5fa1\u7cfb\u7edf\u63d0\u4f9b\u8d44\u6e90\u548c\u652f\u6301\u3002|\n", "2408.15207": "|**2024-08-27**|**Investigating Coverage Criteria in Large Language Models: An In-Depth Study Through Jailbreak Attacks**|Shide Zhou et.al.|[2408.15207](http://arxiv.org/abs/2408.15207)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u8fc5\u901f\u53d1\u5c55\u6781\u5927\u5730\u6539\u53d8\u4e86\u4eba\u5de5\u667a\u80fd\u7684\u683c\u5c40\uff0c\u7136\u800c\u5728\u654f\u611f\u9886\u57df\u90e8\u7f72\u65f6\uff0c\u5b83\u4eec\u7684\u8106\u5f31\u6027\u5f15\u53d1\u4e86\u4e00\u7cfb\u5217\u4e25\u91cd\u5173\u5207\uff0c\u5c24\u5176\u662f\u5bf9\u4e8e\u6076\u610f\u5229\u7528\u7684\u98ce\u9669\u3002\u8fd9\u79cd\u60c5\u51b5\u51f8\u663e\u4e86\u9884\u90e8\u7f72\u6d4b\u8bd5\u4e0d\u8db3\u7684\u95ee\u9898\uff0c\u5f3a\u8c03\u4e86\u9700\u8981\u66f4\u52a0\u4e25\u683c\u548c\u5168\u9762\u8bc4\u4f30\u65b9\u6cd5\u7684\u7d27\u8feb\u6027\u3002\u672c\u7814\u7a76\u901a\u8fc7\u5168\u9762\u7684\u5b9e\u8bc1\u5206\u6790\uff0c\u8bc4\u4f30\u4e86\u4f20\u7edf\u8986\u76d6\u6807\u51c6\u5728\u8bc6\u522b\u8fd9\u4e9b\u6f0f\u6d1e\u65b9\u9762\u7684\u6709\u6548\u6027\uff0c\u7279\u522b\u5173\u6ce8\u4e86\u5173\u952e\u95ee\u9898\u2014\u2014\u201c\u8d8a\u72f1\u201d\u653b\u51fb\u3002\u7814\u7a76\u9996\u5148\u5bf9LLM\u4e2d\u7684\u9690\u85cf\u72b6\u6
001\u8fdb\u884c\u4e86\u805a\u7c7b\u5206\u6790\uff0c\u7ed3\u679c\u663e\u793a\u8fd9\u4e9b\u72b6\u6001\u7684\u5185\u5728\u7279\u6027\u80fd\u591f\u660e\u663e\u533a\u5206\u4e0d\u540c\u7c7b\u578b\u7684\u67e5\u8be2\u3002\u968f\u540e\uff0c\u6211\u4eec\u4ece\u4e09\u4e2a\u5173\u952e\u7ef4\u5ea6\u2014\u2014\u6807\u51c6\u7ea7\u522b\u3001\u5c42\u7ea7\u522b\u548c\u8bcd\u7ea7\u522b\u2014\u2014\u8bc4\u4f30\u4e86\u8fd9\u4e9b\u6807\u51c6\u7684\u6027\u80fd\u3002\u6211\u4eec\u7684\u53d1\u73b0\u63ed\u793a\u4e86\u6b63\u5e38\u67e5\u8be2\u4e0e\u201c\u8d8a\u72f1\u201d\u67e5\u8be2\u5728\u795e\u7ecf\u5143\u6fc0\u6d3b\u6a21\u5f0f\u4e0a\u7684\u663e\u8457\u5dee\u5f02\uff0c\u4ece\u800c\u9a8c\u8bc1\u4e86\u805a\u7c7b\u7ed3\u679c\u3002\u57fa\u4e8e\u8fd9\u4e9b\u53d1\u73b0\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u65b9\u6cd5\uff0c\u7528\u4e8e\u5b9e\u65f6\u68c0\u6d4b\u201c\u8d8a\u72f1\u201d\u653b\u51fb\uff0c\u5229\u7528\u795e\u7ecf\u6fc0\u6d3b\u7279\u5f81\u3002\u6211\u4eec\u7684\u5206\u7c7b\u5668\u8868\u73b0\u51fa\u4e86\u6781\u9ad8\u7684\u51c6\u786e\u7387\uff0c\u5e73\u5747\u8fbe\u523096.33%\uff0c\u6210\u529f\u8bc6\u522b\u51fa\u5305\u62ec\u53ef\u80fd\u5bfc\u81f4\u5bf9\u6297\u6027\u653b\u51fb\u7684\u201c\u8d8a\u72f1\u201d\u67e5\u8be2\u3002\u8fd9\u9879\u7814\u7a76\u7684\u91cd\u8981\u6027\u5728\u4e8e\u5176\u5bf9LLM\u5b89\u5168\u6027\u6d4b\u8bd5\u590d\u6742\u6311\u6218\u7684\u5168\u9762\u5e94\u5bf9\u3002\u901a\u8fc7\u4f7f\u7cfb\u7edf\u80fd\u591f\u5728\u751f\u6210\u7b2c\u4e00\u4e2a\u8bcd\u65f6\u7acb\u5373\u68c0\u6d4b\u5230\u653b\u51fb\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u4e3a\u96c6\u6210LLM\u7684\u672a\u6765\u7cfb\u7edf\u63d0\u4f9b\u4e86\u5f3a\u5927\u7684\u5b9e\u65f6\u68c0\u6d4b\u80fd\u529b\u3002\u8fd9\u4e00\u7814\u7a76\u6df1\u5316\u4e86\u6211\u4eec\u5bf9LLM\u5b89\u5168\u6027\u7684\u7406\u89e3\uff0c\u5e76\u4e3a\u5f00\u53d1\u66f4\u7a33\u5065\u7684\u4eba\u5de5\u667a\u80fd\u7cfb\u7edf\u5960\u5b9a\u4e86\u57fa\u7840\u3002|\n", "2408.15205": "|**2024-08-27**|**Leveraging Hallucinations to 
Reduce Manual Prompt Dependency in Promptable Segmentation**|Jian Hu et.al.|[2408.15205](http://arxiv.org/abs/2408.15205)|**[link](https://github.com/lwpyh/ProMaC_code)**|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u4efb\u52a1\u901a\u7528\u7684\u63d0\u793a\u53ef\u5206\u5272\u65b9\u6cd5\uff0c\u65e8\u5728\u51cf\u5c11\u5bf9\u6bcf\u79cd\u6240\u9700\u5bf9\u8c61\u7684\u5b9e\u4f8b\u7279\u5b9a\u624b\u52a8\u63d0\u793a\u7684\u9700\u6c42\u3002\u901a\u8fc7\u4f7f\u7528\u5355\u4e2a\u4efb\u52a1\u901a\u7528\u63d0\u793a\u6765\u6307\u5bfc\u540c\u4e00\u4efb\u52a1\u4e0b\u4e0d\u540c\u5bf9\u8c61\u7684\u4e0d\u540c\u56fe\u50cf\u7684\u5206\u5272\uff0c\u5f15\u5165\u4e86\u4efb\u52a1\u901a\u7528\u63d0\u793a\u5206\u5272\u3002\u5f53\u524d\u7684\u65b9\u6cd5\u5229\u7528\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u4ece\u901a\u7528\u63d0\u793a\u63a8\u7406\u51fa\u8be6\u7ec6\u7684\u5b9e\u4f8b\u7279\u5b9a\u63d0\u793a\uff0c\u4ee5\u63d0\u9ad8\u5206\u5272\u51c6\u786e\u6027\u3002\u8fd9\u79cd\u65b9\u6cd5\u7684\u6709\u6548\u6027\u5728\u5f88\u5927\u7a0b\u5ea6\u4e0a\u53d6\u51b3\u4e8e\u751f\u6210\u63d0\u793a\u7684\u7cbe\u786e\u5ea6\u3002\u7136\u800c\uff0cMLLMs\u5728\u63a8\u7406\u8fc7\u7a0b\u4e2d\u7ecf\u5e38\u51fa\u73b0\u5e7b\u89c9\uff0c\u5bfc\u81f4\u63d0\u793a\u4e0d\u51c6\u786e\u3002\u73b0\u6709\u65b9\u6cd5\u4e13\u6ce8\u4e8e\u6d88\u9664\u5e7b\u89c9\u4ee5\u63d0\u9ad8\u6a21\u578b\u6027\u80fd\uff0c\u672c\u6587\u8ba4\u4e3aMLLM\u5e7b\u89c9\u5728\u6b63\u786e\u5229\u7528\u65f6\u53ef\u4ee5\u63ed\u793a\u6709\u4ef7\u503c\u7684\u4efb\u52a1\u76f8\u5173\u4fe1\u606f\uff0c\u56e0\u4e3a\u5b83\u4eec\u4ee3\u8868\u4e86\u8d85\u8d8a\u5355\u5f20\u56fe\u50cf\u7684\u9884\u8bad\u7ec3\u5927\u89c4\u6a21\u77e5\u8bc6\u3002\u56e0\u6b64\uff0c\u672c\u6587\u5229\u7528\u5e7b\u89c9\u4ece\u56fe\u50cf\u4e2d\u6316\u6398\u4efb\u52a1\u76f8\u5173\u4fe1\u606f\uff0c\u5e76\u9a8c\u8bc1\u5176\u51c6\u786e\u6027\u4ee5\u589e\u5f3a\u751f\u6210\u63d0\u793a\u7684\u7cbe\u786e\u5ea6\u3002 
\u5177\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u4e2a\u8fed\u4ee3\u7684\u63d0\u793a-\u63a9\u7801\u5faa\u73af\u751f\u6210\u6846\u67b6\uff08ProMaC\uff09\uff0c\u8be5\u6846\u67b6\u5305\u62ec\u4e00\u4e2a\u63d0\u793a\u751f\u6210\u5668\u548c\u4e00\u4e2a\u63a9\u7801\u751f\u6210\u5668\u3002\u63d0\u793a\u751f\u6210\u5668\u4f7f\u7528\u591a\u5c3a\u5ea6\u94fe\u5f0f\u601d\u8003\u63d0\u793a\uff0c\u6700\u521d\u63a2\u7d22\u5e7b\u89c9\u4ee5\u63d0\u53d6\u6d4b\u8bd5\u56fe\u50cf\u4e0a\u7684\u6269\u5c55\u4e0a\u4e0b\u6587\u77e5\u8bc6\u3002\u7136\u540e\uff0c\u5c06\u8fd9\u4e9b\u5e7b\u89c9\u964d\u4f4e\u5230\u5f62\u6210\u7cbe\u786e\u7684\u5b9e\u4f8b\u7279\u5b9a\u63d0\u793a\uff0c\u4ece\u800c\u5f15\u5bfc\u63a9\u7801\u751f\u6210\u5668\u901a\u8fc7\u63a9\u7801\u8bed\u4e49\u5bf9\u9f50\u4ea7\u751f\u4e0e\u4efb\u52a1\u8bed\u4e49\u4e00\u81f4\u7684\u63a9\u7801\u3002\u751f\u6210\u7684\u63a9\u7801\u901a\u8fc7\u8fed\u4ee3\u5f15\u5bfc\u63d0\u793a\u751f\u6210\u5668\u66f4\u5173\u6ce8\u4efb\u52a1\u76f8\u5173\u7684\u56fe\u50cf\u533a\u57df\u5e76\u51cf\u5c11\u65e0\u5173\u7684\u5e7b\u89c9\uff0c\u6700\u7ec8\u5171\u540c\u63d0\u9ad8\u4e86\u63d0\u793a\u548c\u63a9\u7801\u7684\u8d28\u91cf\u3002 \u5b9e\u9a8c\u7ed3\u679c\u57285\u4e2a\u57fa\u51c6\u6570\u636e\u96c6\u4e0a\u8bc1\u660e\u4e86ProMaC\u7684\u6709\u6548\u6027\u3002\u8be6\u7ec6\u4ee3\u7801\u89c1https://lwpyh.github.io/ProMaC/\u3002|\n", "2408.15204": "|**2024-08-27**|**Can Unconfident LLM Annotations Be Used for Confident Conclusions?**|Kristina Gligori\u0107 
et.al.|[2408.15204](http://arxiv.org/abs/2408.15204)|**[link](https://github.com/kristinagligoric/confidence-driven-inference)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u5404\u79cd\u4efb\u52a1\u4e2d\u4e0e\u4eba\u7c7b\u8bc4\u4f30\u8005\u9ad8\u5ea6\u4e00\u81f4\uff0c\u663e\u793a\u51fa\u51cf\u8f7b\u4eba\u7c7b\u6570\u636e\u6536\u96c6\u6311\u6218\u7684\u6f5c\u529b\u3002\u5728\u8ba1\u7b97\u793e\u4f1a\u79d1\u5b66\uff08CSS\uff09\u9886\u57df\uff0c\u7814\u7a76\u4eba\u5458\u8d8a\u6765\u8d8a\u591a\u5730\u5229\u7528LLM\u6ce8\u91ca\u6765\u8865\u5145\u7f13\u6162\u4e14\u6602\u8d35\u7684\u4eba\u7c7b\u6ce8\u91ca\u3002\u7136\u800c\uff0c\u5bf9\u4e8e\u5982\u4f55\u6536\u96c6\u548c\u4f7f\u7528LLM\u6ce8\u91ca\u800c\u4e0d\u635f\u5bb3\u4e0b\u6e38\u7ed3\u8bba\u7684\u6709\u6548\u6027\uff0c\u4ecd\u7f3a\u4e4f\u660e\u786e\u7684\u6307\u5357\u3002\u6211\u4eec\u5f15\u5165\u4e86\u201c\u7f6e\u4fe1\u9a71\u52a8\u63a8\u7406\u201d\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u7ed3\u5408\u4e86LLM\u6ce8\u91ca\u548cLLM\u7f6e\u4fe1\u5ea6\u6307\u793a\u5668\uff0c\u4ee5\u6218\u7565\u65b9\u5f0f\u9009\u62e9\u5e94\u6536\u96c6\u54ea\u4e9b\u4eba\u7c7b\u6ce8\u91ca\uff0c\u65e8\u5728\u751f\u4ea7\u51c6\u786e\u7684\u7edf\u8ba1\u4f30\u8ba1\u548c\u53ef\u9a8c\u8bc1\u7684\u7f6e\u4fe1\u533a\u95f4\uff0c\u540c\u65f6\u51cf\u5c11\u6240\u9700\u7684\u4eba\u7c7b\u6ce8\u91ca\u6570\u91cf\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5177\u6709\u9632\u6b62LLM\u6ce8\u91ca\u8d28\u91cf\u5dee\u7684\u4fdd\u969c\u63aa\u65bd\uff0c\u786e\u4fdd\u5f97\u51fa\u7684\u7ed3\u8bba\u65e2\u6709\u6548\u53c8\u4e0d\u6bd4\u4ec5\u4f9d\u8d56\u4eba\u7c7b\u6ce8\u91ca\u66f4\u4e0d\u51c6\u786e\u3002\u6211\u4eec\u5728\u4e09\u4e2aCSS\u573a\u666f\u2014\u2014\u793c\u8c8c\u6587\u672c\u3001\u7acb\u573a\u548c\u504f\u89c1\u2014\u2014\u4e2d\u7684\u7edf\u8ba1\u4f30\u8ba1\u4efb\u52a1\u4e2d\uff0c\u901a\u8fc7\u4e0e\u57fa\u7ebf\u6bd4\u8f83\uff0c\u8bc1\u660e\u4e86\u7f6e\u4fe1\u9a71\u52a8\u63a8\u7406\u7684\u6709\u6548\u6027\uff0c\u6bcf\u79cd\u573a\u666f\u4e0b\u6240\u9700\u7684\u4eb
a\u7c7b\u6ce8\u91ca\u6570\u91cf\u51cf\u5c11\u4e86\u8d85\u8fc725%\u3002\u5c3d\u7ba1\u6211\u4eec\u4f7f\u7528CSS\u573a\u666f\u8fdb\u884c\u6f14\u793a\uff0c\u4f46\u7f6e\u4fe1\u9a71\u52a8\u63a8\u7406\u53ef\u4ee5\u7528\u4e8e\u5e7f\u6cdbNLP\u95ee\u9898\u4e2d\u7684\u5927\u591a\u6570\u6807\u51c6\u91cf\u4f30\u8ba1\u3002|\n", "2408.15176": "|**2024-08-27**|**Unlocking Potential in Pre-Trained Music Language Models for Versatile Multi-Track Music Arrangement**|Longshen Ou et.al.|[2408.15176](http://arxiv.org/abs/2408.15176)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u591a\u4e2a\u9886\u57df\u5c55\u793a\u4e86\u663e\u8457\u7684\u80fd\u529b\uff0c\u5305\u62ec\u7b26\u53f7\u97f3\u4e50\u751f\u6210\u3002\u7136\u800c\uff0c\u5229\u7528\u8fd9\u4e9b\u9884\u8bad\u7ec3\u7684\u6a21\u578b\u8fdb\u884c\u53ef\u63a7\u97f3\u4e50\u7f16\u6392\u4efb\u52a1\u7684\u6311\u6218\u4ecd\u7136\u65b0\u9896\uff0c\u6bcf\u4e2a\u4efb\u52a1\u90fd\u9700\u8981\u4e0d\u540c\u7684\u97f3\u4e50\u4fe1\u606f\u4f5c\u4e3a\u63a7\u5236\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u7edf\u4e00\u7684\u5e8f\u5217\u5230\u5e8f\u5217\u6846\u67b6\uff0c\u5b83\u5141\u8bb8\u5bf9\u7b26\u53f7\u97f3\u4e50\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u5fae\u8c03\uff0c\u4ee5\u6267\u884c\u56db\u4e2a\u4e0d\u540c\u7684\u591a\u8f68\u7f16\u6392\u4efb\u52a1\uff1a\u4e50\u961f\u7f16\u6392\u3001\u94a2\u7434\u7f29\u51cf\u3001\u9f13\u7f16\u6392\u548c\u58f0\u97f3\u5206\u79bb\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6240\u63d0\u51fa\u7684\u7b56\u7565\u5728\u6240\u6709\u56db\u4e2a\u4efb\u52a1\u4e0a\u5747\u5b9e\u73b0\u4e86\u66f4\u9ad8\u97f3\u4e50\u8d28\u91cf\u7684\u7ed3\u679c\uff0c\u4e0e\u4e13\u95e8\u9488\u5bf9\u7279\u5b9a\u4efb\u52a1\u7684\u57fa\u7ebf\u76f8\u6bd4\u3002\u6b64\u5916\uff0c\u901a\u8fc7\u989d\u5916\u7684\u63a2\u67e5\u5206\u6790\u5b9e\u9a8c\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u9884\u8bad\u7ec3\u9636\u6bb5\u8d4b\u4e88\u6a21\u578b\u7406\u89e3\u97f3\u4e50\u6761\u4ef6\u7684\u57fa\u672c\u77e5\u8bc6\uff0c\u8fd9\u5728\u4ec5\u901a\u
8fc7\u7279\u5b9a\u4efb\u52a1\u7684\u5fae\u8c03\u96be\u4ee5\u83b7\u5f97\u7684\u60c5\u51b5\u4e0b\u5c24\u4e3a\u91cd\u8981\u3002|\n", "2408.15172": "|**2024-08-27**|**X-Reflect: Cross-Reflection Prompting for Multimodal Recommendation**|Hanjia Lyu et.al.|[2408.15172](http://arxiv.org/abs/2408.15172)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u548c\u5927\u578b\u591a\u6a21\u6001\u6a21\u578b\uff08LMM\uff09\u5df2\u88ab\u8bc1\u660e\u80fd\u663e\u8457\u63d0\u5347\u4e30\u5bcc\u9879\u76ee\u63cf\u8ff0\u7684\u6548\u679c\uff0c\u8fdb\u800c\u589e\u5f3a\u63a8\u8350\u7cfb\u7edf\u7684\u51c6\u786e\u6027\u3002\u7136\u800c\uff0c\u73b0\u6709\u65b9\u6cd5\u5f80\u5f80\u4ec5\u4f9d\u8d56\u4e8e\u7eaf\u6587\u672c\u63d0\u793a\uff0c\u6216\u8005\u91c7\u7528\u57fa\u672c\u7684\u591a\u6a21\u6001\u7b56\u7565\uff0c\u672a\u80fd\u5145\u5206\u5229\u7528\u6587\u672c\u4e0e\u89c6\u89c9\u6a21\u6001\u4e4b\u95f4\u4e92\u8865\u7684\u4fe1\u606f\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aCross-Reflection Prompting\uff08X-Reflect\uff09\u7684\u65b0\u6846\u67b6\uff0c\u65e8\u5728\u901a\u8fc7\u5f15\u5bfcLMM\u660e\u786e\u8bc6\u522b\u5e76\u8c03\u548c\u6587\u672c\u4e0e\u56fe\u50cf\u4e4b\u95f4\u7684\u652f\u6301\u6027\u4e0e\u51b2\u7a81\u4fe1\u606f\u6765\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\u3002\u901a\u8fc7\u6355\u6349\u4e24\u79cd\u6a21\u6001\u7684\u7ec6\u5fae\u6d1e\u5bdf\uff0c\u6b64\u65b9\u6cd5\u751f\u6210\u4e86\u66f4\u4e3a\u5168\u9762\u4e14\u8bed\u5883\u4e30\u5bcc\u7684\u9879\u76ee\u8868\u793a\u3002\u5728\u4e24\u4e2a\u5e7f\u6cdb\u4f7f\u7528\u7684\u57fa\u51c6\u4e0a\u8fdb\u884c\u7684\u5927\u91cf\u5b9e\u9a8c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u4e0b\u6e38\u63a8\u8350\u51c6\u786e\u5ea6\u4e0a\u4f18\u4e8e\u73b0\u6709\u7684\u63d0\u793a\u57fa\u7ebf\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8bc4\u4f30\u4e86\u6846\u67b6\u5728\u4e0d\u540cLMM\u67b6\u6784\u4e0b\u7684\u6cdb\u5316\u80fd\u529b\u4ee5\u53ca\u63d0\u793a\u7b56\u7565\u7684\u9c81\u68d2\u6027\uff0c\u63d0\u4f9b\u4e86\u4f18\u5316\u7684\u89c1\
u89e3\u3002\u8fd9\u9879\u5de5\u4f5c\u5f3a\u8c03\u4e86\u6574\u5408\u591a\u6a21\u6001\u4fe1\u606f\u7684\u91cd\u8981\u6027\uff0c\u5e76\u63d0\u51fa\u4e86\u6539\u5584\u591a\u6a21\u6001\u63a8\u8350\u7cfb\u7edf\u4e2d\u9879\u76ee\u7406\u89e3\u7684\u65b0\u578b\u89e3\u51b3\u65b9\u6848\u3002|\n", "2408.15171": "|**2024-08-27**|**Measuring text summarization factuality using atomic facts entailment metrics in the context of retrieval augmented generation**|N. E. Kriman et.al.|[2408.15171](http://arxiv.org/abs/2408.15171)|null|\u81ea2022\u5e74ChatGPT\u7684\u53d1\u5e03\u4ee5\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u5e94\u7528\u8303\u56f4\u663e\u8457\u6269\u5927\uff0c\u663e\u793a\u51fa\u5176\u5728\u5404\u79cd\u573a\u666f\u4e2d\u7684\u4ef7\u503c\u3002\u7136\u800c\uff0c\u5bf9\u4e8e\u4f01\u4e1a\u7ea7\u548c\u5546\u4e1a\u5e94\u7528\u800c\u8a00\uff0cLLMs\u751f\u6210\u4e0d\u51c6\u786e\u4fe1\u606f\u7684\u8d8b\u52bf\uff0c\u5373\u6240\u8c13\u7684\u201c\u5e7b\u89c9\u201d\u73b0\u8c61\uff0c\u6210\u4e3a\u4e86\u4e00\u4e2a\u4e3b\u8981\u6311\u6218\u3002\u672c\u9879\u76ee\u63d0\u51fa\u4e86\u4e00\u79cd\u65b9\u6cd5\uff0c\u7528\u4e8e\u5728\u4e0e\u539f\u59cb\u6587\u672c\u8fdb\u884c\u6bd4\u8f83\u65f6\u8bc4\u4f30LLM\u751f\u6210\u6982\u8981\u7684\u51c6\u786e\u6027\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5229\u7528\u6734\u7d20\u8d1d\u53f6\u65af\u5206\u7c7b\u6765\u5224\u65ad\u751f\u6210\u5185\u5bb9\u7684\u771f\u5b9e\u6027\u3002 
\u901a\u8fc7\u8fd9\u79cd\u65b9\u6cd5\uff0c\u6211\u4eec\u53ef\u4ee5\u4f30\u8ba1\u751f\u6210\u6587\u672c\u4e0e\u5b9e\u9645\u4fe1\u606f\u4e4b\u95f4\u7684\u5339\u914d\u5ea6\uff0c\u4ece\u800c\u63d0\u9ad8LLM\u5e94\u7528\u7684\u8d28\u91cf\u548c\u53ef\u9760\u6027\u3002\u8fd9\u4e0d\u4ec5\u6709\u52a9\u4e8e\u8bc6\u522b\u53ef\u80fd\u5b58\u5728\u7684\u9519\u8bef\u6216\u4e0d\u51c6\u786e\u4e4b\u5904\uff0c\u8fd8\u80fd\u589e\u5f3a\u7528\u6237\u5bf9LLM\u751f\u6210\u5185\u5bb9\u7684\u4fe1\u4efb\uff0c\u4fc3\u8fdb\u5176\u5728\u66f4\u5e7f\u6cdb\u9886\u57df\u7684\u6709\u6548\u4f7f\u7528\u3002\u6b64\u5916\uff0c\u8be5\u65b9\u6cd5\u8fd8\u80fd\u4e3aLLM\u7684\u6301\u7eed\u6539\u8fdb\u63d0\u4f9b\u6709\u4ef7\u503c\u7684\u53cd\u9988\uff0c\u63a8\u52a8\u6280\u672f\u8fdb\u6b65\uff0c\u6700\u7ec8\u5b9e\u73b0\u66f4\u9ad8\u8d28\u91cf\u3001\u66f4\u53ef\u9760\u7684\u4eba\u5de5\u667a\u80fd\u8f85\u52a9\u5185\u5bb9\u751f\u6210\u3002|\n", "2408.15079": "|**2024-08-27**|**BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Competitive Large Language Model Baseline**|Guosheng Dong 
et.al.|[2408.15079](http://arxiv.org/abs/2408.15079)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u6838\u5fc3\u80fd\u529b\u9ad8\u5ea6\u4f9d\u8d56\u4e8e\u5e7f\u6cdb\u9884\u8bad\u7ec3\u6570\u636e\u96c6\u7684\u7ec4\u6210\u548c\u9009\u62e9\uff0c\u8fd9\u4e9b\u6570\u636e\u96c6\u88ab\u591a\u4e2a\u673a\u6784\u89c6\u4e3a\u5546\u4e1a\u79d8\u5bc6\u3002\u4e3a\u4e86\u7f13\u89e3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u5f00\u6e90\u4e86\u4e00\u4e2a\u901a\u7528\u9002\u7528\u7684\u6570\u636e\u5904\u7406\u7ba1\u9053\uff0c\u5e76\u901a\u8fc7\u5f15\u5165\u4e00\u4e2a\u7ade\u4e89\u6027\u7684LLM\u57fa\u7ebf\u6765\u9a8c\u8bc1\u5176\u6709\u6548\u6027\u548c\u6f5c\u529b\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6570\u636e\u5904\u7406\u7ba1\u9053\u5305\u62ec\u5e7f\u57df\u6536\u96c6\u4ee5\u6269\u5927\u89c4\u6a21\u548c\u91cd\u65b0\u52a0\u6743\u4ee5\u63d0\u9ad8\u8d28\u91cf\u3002\u7136\u540e\uff0c\u6211\u4eec\u4f7f\u7528\u6211\u4eec\u7684\u7ba1\u9053\u5bf93\u4e07\u4ebf\u4e2a\u4ee4\u724c\u8fdb\u884c\u9884\u8bad\u7ec3\uff0c\u800c\u65e0\u9700\u4efb\u4f55\u660e\u786e\u7684\u4e0b\u6e38\u4efb\u52a1\u4f18\u5316\uff0c\u63a5\u7740\u8fdb\u884c\u4e00\u4e2a\u7b80\u5355\u4f46\u6709\u6548\u7684\u76d1\u7763\u5fae\u8c03\u9636\u6bb5\u3002BaichuanSEED\u5728\u6574\u4e2a\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u8868\u73b0\u51fa\u4e00\u81f4\u6027\u4e0e\u9884\u6d4b\u6027\uff0c\u5e76\u5728\u7efc\u5408\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u4e0e\u51e0\u4e2a\u5148\u8fdb\u7684\u5546\u4e1a\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff0c\u5982Qwen1.5\u548cLlama3\uff0c\u5b9e\u73b0\u4e86\u53ef\u6bd4\u6027\u80fd\u3002\u6211\u4eec\u8fd8\u8fdb\u884c\u4e86\u51e0\u4e2a\u542f\u53d1\u5f0f\u5b9e\u9a8c\uff0c\u8ba8\u8bba\u4e86\u5728\u6570\u5b66\u548c\u7f16\u7a0b\u7b49\u4e0b\u6e38\u4efb\u52a1\u8fdb\u4e00\u6b65\u4f18\u5316\u7684\u53ef\u80fd\u6027\u3002|\n", "2408.15066": "|**2024-08-27**|**Constraining Participation: Affordances of Feedback Features in Interfaces to Large Language Models**|Ned Cooper 
et.al.|[2408.15066](http://arxiv.org/abs/2408.15066)|null|\u672c\u6587\u63a2\u8ba8\u4e86\u4ea4\u4e92\u53cd\u9988\u529f\u80fd\u5728ChatGPT\u754c\u9762\u4e2d\u7684\u53ef\u7528\u6027\uff0c\u5206\u6790\u4e86\u8fd9\u4e9b\u529f\u80fd\u5982\u4f55\u5851\u9020\u7528\u6237\u8f93\u5165\u4ee5\u53ca\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8fed\u4ee3\u8fc7\u7a0b\u4e2d\u7684\u53c2\u4e0e\u5ea6\u3002\u901a\u8fc7\u8c03\u7814ChatGPT\u7528\u6237\u5e76\u5e94\u7528\u4e86\u53ef\u64cd\u4f5c\u6027\u6846\u67b6\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u8fd9\u7c7b\u529f\u80fd\u9f13\u52b1\u7b80\u5355\u3001\u9891\u7e41\u4e14\u4fa7\u91cd\u4e8e\u6027\u80fd\u7684\u53cd\u9988\uff0c\u540c\u65f6\u9650\u5236\u4e86\u96c6\u4f53\u8f93\u5165\u548c\u7528\u6237\u95f4\u7684\u8ba8\u8bba\u3002\u6211\u4eec\u4e3b\u5f20\uff0c\u8fd9\u79cd\u53cd\u9988\u683c\u5f0f\u6781\u5927\u5730\u9650\u5236\u4e86\u7528\u6237\u7684\u53c2\u4e0e\uff0c\u5f3a\u5316\u4e86\u7528\u6237\u3001\u516c\u4f17\u4e0e\u5f00\u53d1\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u516c\u53f8\u4e4b\u95f4\u7684\u6743\u529b\u4e0d\u5e73\u7b49\u3002\u6211\u4eec\u7684\u5206\u6790\u4e3a\u73b0\u6709\u53c2\u4e0e\u5f0f\u4eba\u5de5\u667a\u80fd\u6587\u732e\u63d0\u4f9b\u4e86\u65b0\u7684\u89c6\u89d2\uff0c\u7740\u91cd\u4e8e\u73b0\u6709\u53cd\u9988\u6d41\u7a0b\u7684\u5c40\u9650\u6027\uff0c\u5e76\u63d0\u51fa\u4e86\u91cd\u65b0\u8bbe\u8ba1\u7684\u65b9\u5411\u3002 
\u4e3a\u4e86\u4f7f\u516c\u4f17\u5728\u4eba\u5de5\u667a\u80fd\u53d1\u5c55\u4e2d\u80fd\u591f\u66f4\u5177\u6709\u610f\u4e49\u5730\u53c2\u4e0e\uff0c\u6211\u4eec\u63d0\u5021\u4e0d\u518d\u805a\u7126\u4e8e\u6a21\u578b\u8f93\u51fa\u4e0e\u7279\u5b9a\u7528\u6237\u504f\u597d\u7684\u4e00\u81f4\u6027\u7684\u8fc7\u7a0b\u3002\u76f8\u53cd\uff0c\u6211\u4eec\u5f3a\u8c03\u9700\u8981\u4fc3\u8fdb\u516c\u53f8\u4e0e\u4e0d\u540c\u201c\u516c\u4f17\u201d\u4e4b\u95f4\u5173\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u76ee\u7684\u548c\u5e94\u7528\u8fdb\u884c\u5bf9\u8bdd\u7684\u8fc7\u7a0b\u3002\u8fd9\u4e00\u65b9\u6cd5\u8981\u6c42\u5bf9\u6301\u7eed\u7684\u793e\u4f1a\u57fa\u7840\u8bbe\u65bd\u5efa\u8bbe\u7684\u5173\u6ce8\uff0c\u5373\u521b\u5efa\u548c\u7ef4\u6301\u89e3\u51b3\u53d7AI\u5f00\u53d1\u548c\u90e8\u7f72\u5f71\u54cd\u7fa4\u4f53\u5173\u5207\u6240\u9700\u7684\u793e\u4f1a\u3001\u6280\u672f\u548c\u673a\u6784\u7ed3\u6784\u3002|\n", "2408.15998": "|**2024-08-28**|**Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders**|Min Shi 
et.al.|[2408.15998](http://arxiv.org/abs/2408.15998)|**[link](https://github.com/nvlabs/eagle)**|**\u300a\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\u5728\u591a\u6a21\u6001\u4efb\u52a1\u4e2d\u7684\u89c6\u89c9\u7406\u89e3\u80fd\u529b\uff1a\u6df7\u5408\u89c6\u89c9\u7f16\u7801\u5668\u7684\u8bbe\u8ba1\u7a7a\u95f4\u63a2\u7d22\u300b\u4e00\u6587\u63a2\u8ba8\u4e86\u51c6\u786e\u89e3\u6790\u590d\u6742\u89c6\u89c9\u4fe1\u606f\u5bf9\u4e8e\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u7684\u91cd\u8981\u6027\u3002\u8fd1\u671f\u7814\u7a76\u663e\u793a\uff0c\u589e\u5f3a\u7684\u89c6\u89c9\u611f\u77e5\u80fd\u663e\u8457\u964d\u4f4e\u5e7b\u89c9\u73b0\u8c61\uff0c\u5e76\u5728\u5149\u5b66\u5b57\u7b26\u8bc6\u522b\u3001\u6587\u6863\u5206\u6790\u7b49\u5206\u8fa8\u7387\u654f\u611f\u4efb\u52a1\u4e0a\u63d0\u5347\u6027\u80fd\u3002\u8bb8\u591a\u5148\u8fdbMLLMs\u901a\u8fc7\u96c6\u6210\u591a\u79cd\u89c6\u89c9\u7f16\u7801\u5668\u6765\u5b9e\u73b0\u8fd9\u4e00\u76ee\u6807\u3002\u7136\u800c\uff0c\u5f53\u524d\u7f3a\u4e4f\u5bf9\u5173\u952e\u65b9\u9762\u7cfb\u7edf\u7684\u6bd4\u8f83\u548c\u8be6\u7ec6\u7684\u62c6\u89e3\u7814\u7a76\uff0c\u6bd4\u5982\u4e13\u5bb6\u9009\u62e9\u548c\u591a\u89c6\u89c9\u4e13\u5bb6\u878d\u5408\u7b56\u7565\u3002\u672c\u6587\u5bf9\u4f7f\u7528\u6df7\u5408\u89c6\u89c9\u7f16\u7801\u5668\u7684MLLM\u8bbe\u8ba1\u7a7a\u95f4\u8fdb\u884c\u4e86\u5e7f\u6cdb\u63a2\u7d22\u3002\u7814\u7a76\u53d1\u73b0\uff0c\u591a\u4e2a\u4e92\u8865\u89c6\u89c9\u7f16\u7801\u5668\u7684\u89c6\u89c9\u4ee4\u724c\u7b80\u5355\u62fc\u63a5\u5373\u53ef\u8fbe\u5230\u4e0e\u66f4\u590d\u6742\u7684\u6df7\u5408\u67b6\u6784\u6216\u7b56\u7565\u76f8\u5f53\u7684\u6548\u679c\u3002\u6b64\u5916\uff0c\u5f15\u5165\u9884\u5bf9\u9f50\uff08Pre-Alignment\uff09\u673a\u5236\uff0c\u4ee5\u5f25\u5408\u4e13\u6ce8\u4e8e\u89c6\u89c9\u7684\u7f16\u7801\u5668\u4e0e\u8bed\u8a00\u4ee4\u724c\u4e4b\u95f4\u7684\u5dee\u8ddd\uff0c\u4ece\u800c\u63d0\u5347\u6a21\u578b\u4e00\u81f4\u6027\u3002\u7531\u6b64\u4ea7\u751f\u7684MLLM\u5bb6\u65cf\u20
14\u2014Eagle\uff0c\u5728\u4e3b\u8981\u7684MLLM\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u8d85\u8d8a\u4e86\u5176\u4ed6\u9886\u5148\u5f00\u6e90\u6a21\u578b\u3002\u76f8\u5173\u4ee3\u7801\u53ca\u6a21\u578b\u5df2\u5f00\u6e90\u53d1\u5e03\uff1ahttps://github.com/NVlabs/Eagle**|\n", "2408.15971": "|**2024-08-28**|**BattleAgentBench: A Benchmark for Evaluating Cooperation and Competition Capabilities of Language Models in Multi-Agent Systems**|Wei Wang et.al.|[2408.15971](http://arxiv.org/abs/2408.15971)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u6b63\u5728\u53d8\u5f97\u8d8a\u6765\u8d8a\u5f3a\u5927\uff0c\u80fd\u591f\u5904\u7406\u590d\u6742\u4efb\u52a1\uff0c\u4f8b\u5982\u6784\u5efa\u5355\u4e00\u4ee3\u7406\u548c\u591a\u4ee3\u7406\u7cfb\u7edf\u3002\u76f8\u8f83\u4e8e\u5355\u4e00\u4ee3\u7406\uff0c\u591a\u4ee3\u7406\u7cfb\u7edf\u5bf9\u8bed\u8a00\u6a21\u578b\u7684\u534f\u4f5c\u80fd\u529b\u63d0\u51fa\u4e86\u66f4\u9ad8\u7684\u8981\u6c42\u3002\u5df2\u6709\u7684\u8bc4\u4f30\u57fa\u51c6\u4e3b\u8981\u5173\u6ce8\u4e8e\u591a\u4ee3\u7406\u7cfb\u7edf\u7684\u534f\u4f5c\u80fd\u529b\uff0c\u4f46\u5728\u7ec6\u7c92\u5ea6\u8bc4\u4f30\u65b9\u9762\u5b58\u5728\u4e0d\u8db3\uff0c\u5e76\u4e14\u5ffd\u7565\u4e86\u591a\u4ee3\u7406\u7cfb\u7edf\u7684\u534f\u4f5c\u4e0e\u7ade\u4e89\u573a\u666f\u3002 
\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u57fa\u51c6\u6d4b\u8bd5\u2014\u2014BattleAgentBench\u3002\u8be5\u57fa\u51c6\u5b9a\u4e49\u4e86\u4e09\u4e2a\u4e0d\u540c\u96be\u5ea6\u7ea7\u522b\u7684\u4e03\u4e2a\u5b50\u9636\u6bb5\uff0c\u65e8\u5728\u4ece\u5355\u4e00\u4ee3\u7406\u573a\u666f\u5bfc\u822a\u80fd\u529b\u3001\u914d\u5bf9\u4ee3\u7406\u4efb\u52a1\u6267\u884c\u80fd\u529b\u4ee5\u53ca\u591a\u4ee3\u7406\u5408\u4f5c\u4e0e\u7ade\u4e89\u80fd\u529b\u7b49\u591a\u4e2a\u7ef4\u5ea6\uff0c\u5bf9\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u7ec6\u81f4\u7684\u8bc4\u4f30\u3002\u6211\u4eec\u5bf9\u56db\u5927\u95ed\u6e90\u6a21\u578b\u548c\u4e03\u5927\u5f00\u6e90\u6a21\u578b\u8fdb\u884c\u4e86\u5e7f\u6cdb\u8bc4\u4f30\u3002 \u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u57fa\u4e8eAPI\u7684\u6a21\u578b\u5728\u7b80\u5355\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u800c\u5f00\u6e90\u5c0f\u578b\u6a21\u578b\u5728\u7b80\u5355\u4efb\u52a1\u4e0a\u5219\u9762\u4e34\u6311\u6218\u3002\u5bf9\u4e8e\u9700\u8981\u5408\u4f5c\u4e0e\u7ade\u4e89\u80fd\u529b\u7684\u56f0\u96be\u4efb\u52a1\uff0c\u5c3d\u7ba1\u57fa\u4e8eAPI\u7684\u6a21\u578b\u5c55\u793a\u4e86\u4e00\u5b9a\u7684\u534f\u4f5c\u80fd\u529b\uff0c\u4f46\u4ecd\u6709\u5de8\u5927\u7684\u6539\u8fdb\u7a7a\u95f4\u3002|\n", "2408.15966": "|**2024-08-28**|**More Text, Less Point: Towards 3D Data-Efficient Point-Language Understanding**|Yuan Tang et.al.|[2408.15966](http://arxiv.org/abs/2408.15966)|**[link](https://github.com/tangyuan96/greenplm)**|\u5728\u672c\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u91cd\u65b0\u5ba1\u89c6\u4e86\u8ba9\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7406\u89e3\u4e09\u7ef4\u7269\u7406\u4e16\u754c\u8fd9\u4e00\u6311\u6218\u3002\u7531\u4e8e\u7f3a\u4e4f\u5927\u89c4\u6a21\u7684\u4e09\u7ef4\u70b9\u4e91\u4e0e\u6587\u672c\u914d\u5bf9\u6570\u636e\u96c6\uff0cLLM 
\u5728\u4e09\u7ef4\u7406\u89e3\u4e0a\u7684\u6210\u529f\u5c1a\u672a\u5b9e\u73b0\u590d\u5236\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u9879\u65b0\u4efb\u52a1\uff1a3D \u6570\u636e\u9ad8\u6548\u70b9\u4e91-\u8bed\u8a00\u7406\u89e3\u3002\u76ee\u6807\u662f\u4f7fLLM \u80fd\u591f\u5229\u7528\u6700\u5c11\u7684\u4e09\u7ef4\u70b9\u4e91\u548c\u6587\u672c\u6570\u636e\u5bf9\u5b9e\u73b0\u7a33\u5065\u7684\u4e09\u7ef4\u5bf9\u8c61\u7406\u89e3\u3002 \u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e00\u4efb\u52a1\uff0c\u6211\u4eec\u5f15\u5165\u4e86GreenPLM\uff0c\u901a\u8fc7\u5229\u7528\u66f4\u591a\u7684\u6587\u672c\u6570\u636e\u6765\u5f25\u8865\u7f3a\u5c11\u7684\u4e09\u7ef4\u6570\u636e\u3002\u9996\u5148\uff0c\u501f\u9274\u4f7f\u7528CLIP\u5bf9\u56fe\u50cf\u548c\u6587\u672c\u8fdb\u884c\u5bf9\u9f50\u7684\u65b9\u5f0f\uff0c\u6211\u4eec\u5229\u7528\u9884\u8bad\u7ec3\u7684\u70b9\u4e91-\u6587\u672c\u7f16\u7801\u5668\u5c06\u4e09\u7ef4\u70b9\u4e91\u7a7a\u95f4\u6620\u5c04\u5230\u6587\u672c\u7a7a\u95f4\u3002\u8fd9\u4e00\u6620\u5c04\u4f7f\u5f97\u6211\u4eec\u53ef\u4ee5\u65e0\u7f1d\u5730\u8fde\u63a5\u6587\u672c\u7a7a\u95f4\u4e0eLLM\u3002\u4e00\u65e6\u5efa\u7acb\u4e86\u70b9\u4e91-\u6587\u672c-LLM\u7684\u8fde\u63a5\uff0c\u6211\u4eec\u8fdb\u4e00\u6b65\u901a\u8fc7\u6269\u5c55\u4e2d\u95f4\u6587\u672c\u7a7a\u95f4\u589e\u5f3a\u6587\u672c-LLM\u7684\u5bf9\u9f50\uff0c\u4ece\u800c\u51cf\u5c11\u5bf9\u4e09\u7ef4\u70b9\u4e91\u6570\u636e\u7684\u4f9d\u8d56\u3002 
\u5177\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u751f\u6210\u4e86600\u4e07\u4e2a\u5173\u4e8e\u4e09\u7ef4\u7269\u4f53\u7684\u81ea\u7531\u6587\u672c\u63cf\u8ff0\uff0c\u5e76\u8bbe\u8ba1\u4e86\u4e09\u9636\u6bb5\u8bad\u7ec3\u7b56\u7565\uff0c\u5e2e\u52a9LLM\u66f4\u597d\u5730\u63a2\u7d22\u4e0d\u540c\u6a21\u6001\u4e4b\u95f4\u7684\u5185\u5728\u8054\u7cfb\u3002\u4e3a\u4e86\u5b9e\u73b0\u9ad8\u6548\u7684\u6a21\u6001\u5bf9\u9f50\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u96f6\u53c2\u6570\u4ea4\u53c9\u6ce8\u610f\u529b\u6a21\u5757\u7528\u4e8e\u4ee4\u724c\u805a\u5408\u3002 \u5e7f\u6cdb\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cGreenPLM\u4ec5\u9700\u8981\u73b0\u6709\u6700\u5148\u8fdb\u7684\u6a21\u578b\u6240\u75283D\u8bad\u7ec3\u6570\u636e\u768412%\uff0c\u5c31\u80fd\u8fbe\u5230\u66f4\u4f18\u7684\u4e09\u7ef4\u7406\u89e3\u6027\u80fd\u3002\u4ee4\u4eba\u60ca\u8bb6\u7684\u662f\uff0cGreenPLM\u4ec5\u4f7f\u7528\u6587\u672c\u6570\u636e\u4e5f\u80fd\u5b9e\u73b0\u7ade\u4e89\u529b\u7684\u8868\u73b0\u3002\u76f8\u5173\u4ee3\u7801\u548c\u6743\u91cd\u53ef\u5728\u4ee5\u4e0b\u94fe\u63a5\u83b7\u53d6\uff1ahttps://github.com/TangYuan96/GreenPLM\u3002|\n", "2408.15950": "|**2024-08-28**|**Atari-GPT: Investigating the Capabilities of Multimodal Large Language Models as Low-Level Policies for Atari Games**|Nicholas R. 
Waytowich et.al.|[2408.15950](http://arxiv.org/abs/2408.15950)|null|\u8fd1\u671f\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u8fdb\u5c55\u4f7f\u5176\u80fd\u529b\u8d85\u8d8a\u4e86\u4f20\u7edf\u7684\u6587\u672c\u4efb\u52a1\uff0c\u6269\u5c55\u5230\u4e86\u591a\u6a21\u6001\u9886\u57df\uff0c\u6574\u5408\u4e86\u89c6\u89c9\u3001\u542c\u89c9\u548c\u6587\u672c\u6570\u636e\u3002\u867d\u7136\u5728\u673a\u5668\u4eba\u5b66\u548c\u6e38\u620f\u7b49\u9ad8\u9636\u89c4\u5212\u9886\u57df\u5bf9\u591a\u6a21\u6001LLM\u7684\u7814\u7a76\u5df2\u7ecf\u76f8\u5f53\u5e7f\u6cdb\uff0c\u4f46\u5728\u4f4e\u7ea7\u63a7\u5236\u4efb\u52a1\u4e2d\u7684\u5e94\u7528\u6f5c\u529b\u5374\u9c9c\u6709\u63a2\u7d22\u3002\u672c\u6587\u63a2\u8ba8\u4e86\u591a\u6a21\u6001LLM\u5728 Atari \u89c6\u9891\u6e38\u620f\u9886\u57df\u7684\u5e94\u7528\uff0c\u5f15\u5165\u4e86 Atari \u6e38\u620f\u6027\u80fd\u4f5c\u4e3a\u8bc4\u4f30\u591a\u6a21\u6001LLM\u6267\u884c\u4f4e\u7ea7\u63a7\u5236\u4efb\u52a1\u80fd\u529b\u7684\u65b0\u57fa\u51c6\u3002\u4e0e\u4f20\u7edf\u5f3a\u5316\u5b66\u4e60\uff08RL\uff09\u548c\u6a21\u4eff\u5b66\u4e60\uff08IL\uff09\u65b9\u6cd5\u76f8\u6bd4\uff0c\u8fd9\u4e9bLLM\u65e0\u9700\u5927\u91cf\u7684\u8ba1\u7b97\u8d44\u6e90\u548c\u5956\u52b1\u51fd\u6570\u5b9a\u4e49\uff0c\u800c\u662f\u5229\u7528\u73b0\u6709\u7684\u591a\u6a21\u6001\u77e5\u8bc6\u76f4\u63a5\u4e0e\u6e38\u620f\u73af\u5883\u4ea4\u4e92\u3002 
\u6211\u4eec\u7684\u7814\u7a76\u8bc4\u4f30\u4e86\u591a\u4e2a\u591a\u6a21\u6001LLM\u7684\u8868\u73b0\uff0c\u4e0e\u4f20\u7edfRL\u4ee3\u7406\u3001\u4eba\u7c7b\u73a9\u5bb6\u548c\u968f\u673a\u4ee3\u7406\u8fdb\u884c\u4e86\u6bd4\u8f83\uff0c\u91cd\u70b9\u5173\u6ce8\u5b83\u4eec\u7406\u89e3\u590d\u6742\u89c6\u89c9\u573a\u666f\u5e76\u5236\u5b9a\u6218\u7565\u54cd\u5e94\u7684\u80fd\u529b\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u901a\u8fc7\u5f15\u5165\u4eba\u7c7b\u6f14\u793a\u7684\u6e38\u620f\u73a9\u6cd5\u8f68\u8ff9\u6765\u7814\u7a76\u4e0a\u4e0b\u6587\u5b66\u4e60\uff08ICL\uff09\u7684\u5f71\u54cd\uff0c\u4ee5\u589e\u5f3a\u6a21\u578b\u7684\u4e0a\u4e0b\u6587\u7406\u89e3\u80fd\u529b\u3002 \u901a\u8fc7\u8fd9\u4e00\u7814\u7a76\uff0c\u6211\u4eec\u65e8\u5728\u786e\u5b9a\u591a\u6a21\u6001LLM\u80fd\u5426\u5229\u7528\u5176\u5e7f\u6cdb\u7684\u8bad\u7ec3\u6765\u6709\u6548\u5730\u5145\u5f53\u4f4e\u7ea7\u63a7\u5236\u5668\uff0c\u4ece\u800c\u91cd\u65b0\u5b9a\u4e49\u52a8\u6001\u548c\u89c6\u89c9\u590d\u6742\u73af\u5883\u4e2d\u7684\u6f5c\u5728\u5e94\u7528\u3002\u6709\u5173\u989d\u5916\u7ed3\u679c\u548c\u89c6\u9891\u7684\u66f4\u591a\u4fe1\u606f\uff0c\u8bf7\u8bbf\u95ee\u6211\u4eec\u7684\u9879\u76ee\u7f51\u9875\uff1ahttps://sites.google.com/view/atari-gpt/\u3002|\n", "2408.15915": "|**2024-08-28**|**Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models**|Yuncheng Yang 
et.al.|[2408.15915](http://arxiv.org/abs/2408.15915)|**[link](https://github.com/yaphabates/rocket)**|\u5728\u7279\u5b9a\u9886\u57df\u57f9\u517b\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4ee5\u89e3\u51b3\u4efb\u52a1\u6240\u9700\u7684\u4e13\u957f\u5f80\u5f80\u9700\u8981\u9488\u5bf9\u7a33\u5b9a\u9884\u671f\u8f93\u51fa\u8fdb\u884c\u4e13\u95e8\u8c03\u6574\u3002\u907f\u514d\u624b\u52a8\u51c6\u5907\u6307\u4ee4\u6570\u636e\u96c6\u548c\u8bad\u7ec3\u8d44\u6e90\u5e26\u6765\u7684\u5de8\u5927\u6210\u672c\uff0c\u5229\u7528\u5f00\u653e\u77e5\u8bc6\u5305\u62ec\u4f4e\u79e9\u9002\u5e94\uff08LoRA\uff09\u6a21\u578b\u548c\u6307\u4ee4\u6570\u636e\u96c6\u4f5c\u4e3a\u8d77\u70b9\u662f\u5408\u7406\u7684\u9009\u62e9\u3002\u7136\u800c\uff0c\u73b0\u6709\u65b9\u6cd5\u5728\u6a21\u578b\u548c\u6570\u636e\u9009\u62e9\u4e0a\u4fa7\u91cd\u4e8e\u901a\u7528\u80fd\u529b\u7684\u6027\u80fd\uff0c\u800c\u5ffd\u89c6\u4e86\u5728\u7279\u5b9a\u9886\u57df\u90e8\u7f72\u65f6\u66b4\u9732\u7684\u77e5\u8bc6\u5dee\u8ddd\u3002\u672c\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u79cd\u901a\u8fc7\u5f15\u5165\u5c11\u91cf\u4eba\u5de5\u6807\u6ce8\u6837\u672c\uff08\u5373K-shot\uff09\u6765\u5f25\u5408\u6b64\u7c7b\u5dee\u8ddd\u7684\u65b9\u6cd5\uff0c\u4ee5\u4fc3\u8fdbLLM\u5728\u5f00\u653e\u77e5\u8bc6\u4e0a\u7684\u4efb\u52a1\u4e13\u957f\u3002 \u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2a\u9ad8\u6548\u4e14\u53ef\u6269\u5c55\u7684\u7ba1\u9053\uff0c\u4ee5\u6210\u672c\u6548\u76ca\u65b9\u5f0f\u751f\u6210\u4efb\u52a1\u4e13\u5bb6\uff0c\u5176\u4e2dK-shot\u6570\u636e\u53c2\u4e0e\u9009\u62e9\u6700\u5177\u6f5c\u529b\u7684\u4e13\u5bb6\u5019\u9009\u8005\u548c\u4efb\u52a1\u76f8\u5173\u7684\u6307\u4ee4\u3002\u6784\u5efa\u4e86\u4e00\u4e2a\u6df7\u5408\u4e13\u5bb6\uff08MoE\uff09\u7cfb\u7edf\uff0c\u5145\u5206\u5229\u7528\u591a\u4e2a\u4e13\u5bb6\u4e4b\u95f4\u72ec\u7279\u4f46\u4e92\u8865\u7684\u77e5\u8bc6\u3002\u6211\u4eec\u63ed\u793a\u4e86MoE\u7cfb\u7edf\u6210\u529f\u7684\u5173\u952e\u56e0\u7d20\uff1a 1. 
\u9075\u5faaK-shot\u539f\u5219\uff1a\u786e\u4fdd\u771f\u6b63\u5177\u5907\u89e3\u51b3K-shot\u95ee\u9898\u80fd\u529b\u7684\u6a21\u578b\u88ab\u9009\u4e2d\uff0c\u800c\u975e\u76f2\u731c\u8005\u3002 2. \u5f3a\u8c03\u591a\u6837\u6027\uff1a\u4e0d\u4ec5\u4e13\u5bb6\u672c\u8eab\u5177\u6709\u591a\u6837\u6027\uff0c\u800c\u4e14\u5728\u6574\u4e2a\u6a21\u578b\u548c\u6570\u636e\u9009\u62e9\u8fc7\u7a0b\u4e2d\uff0c\u7ec6\u8c03\u6307\u4ee4\u4e5f\u4f53\u73b0\u51fa\u591a\u6837\u6027\u3002 \u5e7f\u6cdb\u7684\u5b9e\u9a8c\u7ed3\u679c\u8bc1\u5b9e\u4e86\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u5404\u79cd\u4efb\u52a1\u4e0a\u5bf9\u5f00\u653e\u77e5\u8bc6\u5229\u7528\u7684\u4f18\u8d8a\u6027\u3002\u540e\u7eed\u5c06\u53d1\u5e03\u4ee3\u7801\u548c\u6a21\u578b\u3002|\n", "2408.15907": "|**2024-08-28**|**Decentralized LLM Inference over Edge Networks with Energy Harvesting**|Aria Khoshsirat et.al.|[2408.15907](http://arxiv.org/abs/2408.15907)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u81ea\u7136\u8bed\u8a00\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u7684\u5353\u8d8a\u6027\u80fd\u5df2\u7ecf\u6781\u5927\u5730\u6539\u53d8\u4e86\u591a\u4e2a\u9886\u57df\uff0c\u4f46\u5728\u8d44\u6e90\u53d7\u9650\u73af\u5883\u5982\u8fb9\u7f18\u7f51\u7edc\u4e2d\u7684\u90e8\u7f72\u4ecd\u9762\u4e34\u6311\u6218\u3002\u5206\u5e03\u5f0f\u63a8\u7406\u6280\u672f\u7684\u51fa\u73b0\u901a\u8fc7\u5728\u591a\u53f0\u8bbe\u5907\u95f4\u5206\u914d\u6a21\u578b\u5757\u6765\u63d0\u5347\u7075\u6d3b\u6027\u548c\u6210\u672c\u6548\u76ca\uff0c\u4f46\u4ecd\u5b58\u5728\u80fd\u6e90\u9650\u5236\u95ee\u9898\uff0c\u5c24\u5176\u662f\u9488\u5bf9\u7535\u6c60\u4f9b\u7535\u7684\u8fb9\u7f18\u8bbe\u5907\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u4e92\u8054\u3001\u4f7f\u7528\u80fd\u91cf\u6536\u96c6\u7684\u7535\u6c60\u4f9b\u7535\u8fb9\u7f18\u8bbe\u5907\u7684\u534f\u4f5c\u63a8\u7406\u53ef\u6301\u7eed\u6a21\u578b\u3002\u901a\u8fc7\u5efa\u7acb\u534a\u9a6c\u5c14\u53ef\u592b\u6a21\u578b\u63cf\u8ff0\u8bbe\u5907\u72b6\u6001\uff0c\u8003\u8651\u5904\u7406\u53
c2\u6570\u548c\u5e73\u5747\u7eff\u8272\u80fd\u6e90\u5230\u8fbe\u60c5\u51b5\uff0c\u4ee5\u6307\u5bfc\u8bbe\u8ba1\u65e8\u5728\u51cf\u5c11\u8bbe\u5907\u505c\u673a\u65f6\u95f4\u548c\u6700\u5927\u5316\u7f51\u7edc\u541e\u5410\u91cf\u7684\u8c03\u5ea6\u7b97\u6cd5\u3002\u901a\u8fc7\u5b9e\u8bc1\u8bc4\u4f30\u548c\u6a21\u62df\u8fd0\u884c\uff0c\u9a8c\u8bc1\u4e86\u6211\u4eec\u7684\u65b9\u6cd5\u7684\u6709\u6548\u6027\uff0c\u4e3a\u8fb9\u7f18\u7f51\u7edc\u4e0a\u7684\u8282\u80fd\u5206\u5e03\u5f0f\u63a8\u7406\u94fa\u5e73\u4e86\u9053\u8def\u3002|\n", "2408.15903": "|**2024-08-28**|**LLM-Based Multi-Hop Question Answering with Knowledge Graph Integration in Evolving Environments**|Ruirui Chen et.al.|[2408.15903](http://arxiv.org/abs/2408.15903)|null|\u5feb\u901f\u8fc7\u65f6\u7684\u4fe1\u606f\u4f7f\u5f97\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u6574\u5408\u65b0\u77e5\u8bc6\u65b9\u9762\u9762\u4e34\u6311\u6218\u3002\u73b0\u6709\u65b9\u6cd5\u5728\u5904\u7406\u9700\u8981\u51c6\u786e\u4e8b\u5b9e\u8bc6\u522b\u548c\u5e8f\u5217\u903b\u8f91\u63a8\u7406\u7684\u591a\u8df3\u95ee\u9898\u65f6\u4ecd\u5b58\u5728\u56f0\u96be\uff0c\u5c24\u5176\u662f\u5728\u9762\u5bf9\u5927\u91cf\u4e8b\u5b9e\u66f4\u65b0\u7684\u60c5\u51b5\u4e0b\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u672c\u6587\u63d0\u51fa\u4e86Graph Memory-based Editing for Large Language 
Models\uff08GMeLLo\uff09\uff0c\u4e00\u79cd\u7b80\u5355\u800c\u6709\u6548\u7684\u65b9\u6cd5\uff0c\u5b83\u7ed3\u5408\u4e86\u77e5\u8bc6\u56fe\u8c31\uff08KGs\uff09\u7684\u660e\u786e\u77e5\u8bc6\u8868\u793a\u4e0eLLMs\u7684\u8bed\u8a00\u7075\u6d3b\u6027\u3002GMeLLo\u4e0d\u4ec5\u5229\u7528LLMs\u8fdb\u884c\u95ee\u7b54\uff0c\u8fd8\u8fd0\u7528\u8fd9\u4e9b\u6a21\u578b\u5c06\u81ea\u7136\u8bed\u8a00\u8f6c\u6362\u4e3a\u7ed3\u6784\u5316\u67e5\u8be2\u548c\u4e8b\u5b9e\u4e09\u5143\u7ec4\uff0c\u4ece\u800c\u5b9e\u73b0\u4e0eKGs\u7684\u65e0\u7f1d\u4ea4\u4e92\uff0c\u7528\u4e8e\u5feb\u901f\u66f4\u65b0\u548c\u7cbe\u786e\u7684\u591a\u8df3\u63a8\u7406\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cGMeLLo\u5728\u591a\u8df3\u95ee\u7b54\u57fa\u51c6MQuAKE\u4e2d\u663e\u8457\u8d85\u8d8a\u5f53\u524d\u6700\u5148\u8fdb\u7684\u77e5\u8bc6\u7f16\u8f91\u65b9\u6cd5\uff0c\u7279\u522b\u662f\u5728\u6d89\u53ca\u5927\u91cf\u77e5\u8bc6\u66f4\u65b0\u7684\u573a\u666f\u4e2d\u3002|\n", "2408.15901": "|**2024-08-28**|**Nexus: Specialization meets Adaptability for Efficiently Training Mixture of Experts**|Nikolas Gritsch et.al.|[2408.15901](http://arxiv.org/abs/2408.15901)|null|\u5f53\u524d\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u6548\u7387\u3001\u4e13\u4e1a\u5316\u548c\u5bf9\u65b0\u6570\u636e\u5206\u5e03\u7684\u9002\u5e94\u6027\u65b9\u9762\u96be\u4ee5\u540c\u65f6\u5177\u5907\u8fd9\u4e9b\u4f18\u79c0\u54c1\u8d28\u3002\u6df7\u5408\u4e13\u5bb6\uff08MoE\uff09\u67b6\u6784\u56e0\u5176\u6761\u4ef6\u8ba1\u7b97\u7684\u5185\u5728\u7279\u6027\uff0c\u6210\u4e3a\u7814\u7a76\u7684\u91cd\u70b9\u9886\u57df\uff0c\u65e8\u5728\u63d0\u5347\u8fd9\u4e9b\u54c1\u8d28\u3002\u672c\u5de5\u4f5c\u4e13\u6ce8\u4e8e\u201c\u5347\u7ea7\u201d\u5bc6\u96c6\u578b\u4e13\u5bb6\u6a21\u578b\u81f3MoE\u67b6\u6784\uff0c\u65e8\u5728\u589e\u5f3a\u4e13\u4e1a\u5316\u7684\u540c\u65f6\uff0c\u4e5f\u589e\u52a0\u5bf9\u65b0\u4efb\u52a1\u7684\u7075\u6d3b\u9002\u5e94\u6027\u3002 
\u6211\u4eec\u5f15\u5165\u4e86Nexus\uff0c\u4e00\u79cd\u589e\u5f3a\u7684MoE\u67b6\u6784\uff0c\u5176\u5177\u6709\u81ea\u9002\u5e94\u8def\u7531\u673a\u5236\uff0c\u5141\u8bb8\u6a21\u578b\u5b66\u4e60\u5c06\u4e13\u5bb6\u5d4c\u5165\u4ece\u9886\u57df\u8868\u793a\u8fdb\u884c\u6295\u5f71\u3002\u8fd9\u79cd\u7b56\u7565\u4f7f\u5f97Nexus\u80fd\u591f\u901a\u8fc7\u5355\u72ec\u8bad\u7ec3\u7684\u5bc6\u96c6\u6a21\u578b\u7075\u6d3b\u5730\u6dfb\u52a0\u65b0\u7684\u4e13\u5bb6\uff0c\u65e0\u9700\u5bf9\u672a\u89c1\u6570\u636e\u57df\u8fdb\u884c\u5927\u89c4\u6a21MoE\u8bad\u7ec3\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u4e0e\u57fa\u7ebf\u76f8\u6bd4\uff0cNexus\u5728\u521d\u59cb\u5347\u7ea7\u9636\u6bb5\u5b9e\u73b0\u4e86\u9ad8\u8fbe2.1%\u7684\u76f8\u5bf9\u589e\u76ca\uff0c\u5728\u4f7f\u7528\u6709\u9650\u7684\u5fae\u8c03\u6570\u636e\u6269\u5c55MoE\u65f6\u5b9e\u73b0\u4e8618.8%\u7684\u76f8\u5bf9\u589e\u76ca\u3002Nexus\u7684\u7075\u6d3b\u6027\u5bf9\u4e8e\u5efa\u7acb\u4e00\u4e2a\u5f00\u6e90\u751f\u6001\u7cfb\u7edf\u81f3\u5173\u91cd\u8981\uff0c\u8be5\u751f\u6001\u7cfb\u7edf\u5141\u8bb8\u6bcf\u4e2a\u7528\u6237\u6839\u636e\u81ea\u5df1\u7684\u9700\u6c42\u4e0d\u65ad\u7ec4\u88c5\u81ea\u5df1\u7684MoE\u6df7\u5408\u6a21\u578b\u3002|\n", "2408.15895": "|**2024-08-28**|**Bias in LLMs as Annotators: The Effect of Party Cues on Labelling Decision by Large Language Models**|Sebastian Vallejo Vera 
et.al.|[2408.15895](http://arxiv.org/abs/2408.15895)|null|\u4eba\u7c7b\u7f16\u7801\u5458\u5b58\u5728\u504f\u89c1\u3002\u6211\u4eec\u901a\u8fc7\u590d\u5236Ennser-Jedenastik\u548cMeyer\uff082018\uff09\u7684\u5b9e\u9a8c\uff0c\u53d1\u73b0\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u8bc4\u4f30\u653f\u6cbb\u58f0\u660e\u65f6\u4f7f\u7528\u653f\u6cbb\u4fe1\u606f\uff0c\u7279\u522b\u662f\u653f\u515a\u7ebf\u7d22\u3002LLMs\u4e0d\u4ec5\u6839\u636e\u653f\u515a\u7ebf\u7d22\u4e0a\u4e0b\u6587\u5316\u5224\u65ad\u9648\u8ff0\u662f\u6b63\u9762\u3001\u8d1f\u9762\u8fd8\u662f\u4e2d\u6027\uff0c\u8fd8\u53cd\u6620\u51fa\u5176\u8bad\u7ec3\u6240\u7528\u7684\u4eba\u7c7b\u751f\u6210\u6570\u636e\u4e2d\u7684\u504f\u89c1\u3002\u6211\u4eec\u8fd8\u53d1\u73b0\uff0c\u4e0e\u4eba\u7c7b\u4e0d\u540c\u7684\u662f\uff0c\u4eba\u7c7b\u4ec5\u5728\u9762\u5bf9\u6781\u7aef\u653f\u515a\u58f0\u660e\u65f6\u8868\u73b0\u51fa\u504f\u89c1\uff0c\u800cLLMs\u5373\u4f7f\u5728\u88ab\u63d0\u793a\u6765\u81ea\u4e2d\u95f4\u5de6\u7ffc\u548c\u4e2d\u95f4\u53f3\u7ffc\u653f\u515a\u7684\u58f0\u660e\u65f6\u4e5f\u663e\u793a\u51fa\u663e\u8457\u504f\u89c1\u3002\u6700\u540e\u90e8\u5206\u8ba8\u8bba\u4e86\u8fd9\u4e9b\u53d1\u73b0\u7684\u610f\u4e49\u3002|\n", "2408.15879": "|**2024-08-28**|**Persuasion Games using Large Language Models**|Ganesh Prasath Ramani 
et.al.|[2408.15879](http://arxiv.org/abs/2408.15879)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5df2\u7ecf\u53d1\u5c55\u6210\u4e3a\u4e00\u79cd\u5f3a\u5927\u7684\u5de5\u5177\uff0c\u80fd\u591f\u7406\u89e3\u548c\u751f\u6210\u7c7b\u4f3c\u4eba\u7c7b\u7684\u6587\u672c\u3002\u672c\u6587\u7814\u7a76\u4e86LLM\u5728\u5851\u9020\u4eba\u7c7b\u89c2\u70b9\u5e76\u8fdb\u800c\u5f71\u54cd\u4ed6\u4eec\u5728\u7279\u5b9a\u4efb\u52a1\u4e0a\u7684\u51b3\u7b56\u65b9\u9762\u7684\u6f5c\u529b\u3002\u8fd9\u4e9b\u80fd\u529b\u5728\u6295\u8d44\u3001\u4fe1\u7528\u5361\u548c\u4fdd\u9669\u7b49\u591a\u4e2a\u9886\u57df\u627e\u5230\u4e86\u5e94\u7528\uff0c\u5e2e\u52a9\u7528\u6237\u9009\u62e9\u5408\u9002\u7684\u4fdd\u9669\u653f\u7b56\u3001\u6295\u8d44\u8ba1\u5212\u3001\u4fe1\u7528\u5361\u4ee5\u53ca\u96f6\u552e\u4ea7\u54c1\uff0c\u751a\u81f3\u5728\u884c\u4e3a\u6539\u53d8\u652f\u6301\u7cfb\u7edf\uff08BCSS\uff09\u4e2d\u4e5f\u6709\u5e94\u7528\u3002 \u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u590d\u6742\u591a\u4ee3\u7406\u6846\u67b6\uff0c\u5176\u4e2d\u4e00\u7ec4\u4ee3\u7406\u4ee5\u534f\u4f5c\u65b9\u5f0f\u64cd\u4f5c\u3002\u4e3b\u8981\u4ee3\u7406\u76f4\u63a5\u4e0e\u7528\u6237\u8fdb\u884c\u6709\u8bf4\u670d\u529b\u7684\u5bf9\u8bdd\uff0c\u800c\u8f85\u52a9\u4ee3\u7406\u6267\u884c\u8bf8\u5982\u4fe1\u606f\u68c0\u7d22\u3001\u54cd\u5e94\u5206\u6790\u3001\u5236\u5b9a\u8bf4\u670d\u7b56\u7565\u548c\u4e8b\u5b9e\u9a8c\u8bc1\u7b49\u4efb\u52a1\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u8bc1\u636e\u8868\u660e\uff0c\u8fd9\u79cd\u534f\u4f5c\u65b9\u6cd5\u663e\u8457\u63d0\u9ad8\u4e86LLM\u7684\u8bf4\u670d\u6548\u679c\u3002\u6211\u4eec\u6301\u7eed\u5206\u6790\u7528\u6237\u7684\u62b5\u6297\u6027\uff0c\u5e76\u901a\u8fc7\u7ed3\u5408\u89c4\u5219\u57fa\u4e8e\u548cLLM\u57fa\u4e8e\u7684\u62b5\u6297-\u8bf4\u670d\u6620\u5c04\u6280\u672f\u6765\u5e94\u5bf9\u8fd9\u4e00\u6311\u6218\u3002 
\u6211\u4eec\u4f7f\u7528\u6a21\u62df\u7684\u4eba\u683c\u5f62\u8c61\uff0c\u5e76\u5728\u4fdd\u9669\u3001\u94f6\u884c\u548c\u96f6\u552e\u9886\u57df\u751f\u6210\u5bf9\u8bdd\uff0c\u4ee5\u8bc4\u4f30\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u8bc6\u522b\u3001\u9002\u5e94\u548c\u5f71\u54cd\u4e0d\u540c\u4eba\u683c\u7c7b\u578b\u65b9\u9762\u7684\u719f\u7ec3\u7a0b\u5ea6\u3002\u540c\u65f6\uff0c\u6211\u4eec\u4e5f\u68c0\u67e5\u4e86LLM\u6a21\u62df\u4eba\u683c\u6240\u91c7\u7528\u7684\u62b5\u6297\u673a\u5236\u3002\u8bf4\u670d\u6548\u679c\u901a\u8fc7\u4ea4\u4e92\u524d\u540e\u7684\u53ef\u8861\u91cf\u8c03\u67e5\u3001LLM\u751f\u6210\u7684\u5bf9\u8bdd\u8bc4\u5206\u4ee5\u53ca\u7528\u6237\u51b3\u7b56\uff08\u8d2d\u4e70\u6216\u4e0d\u8d2d\u4e70\uff09\u8fdb\u884c\u91cf\u5316\u3002|\n", "2408.16756": "|**2024-08-29**|**How Far Can Cantonese NLP Go? Benchmarking Cantonese Capabilities of Large Language Models**|Jiyue Jiang et.al.|[2408.16756](http://arxiv.org/abs/2408.16756)|null|\u5feb\u901f\u53d1\u5c55\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5df2\u7ecf\u6539\u53d8\u4e86\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u7684\u7ade\u8d5b\u73af\u5883\uff0c\u7279\u522b\u662f\u5728\u82f1\u8bed\u548c\u5176\u4ed6\u6570\u636e\u4e30\u5bcc\u7684\u8bed\u8a00\u4e2d\u3002\u7136\u800c\uff0c\u5728\u8bf8\u5982\u7ca4\u8bed\u8fd9\u6837\u7684\u4ee3\u8868\u6027\u4e0d\u8db3\u7684\u8bed\u8a00\u9886\u57df\uff0c\u5f00\u53d1\u5dee\u8ddd\u4ecd\u7136\u663e\u8457\u5b58\u5728\uff0c\u8fd9\u5c24\u5176\u4ee4\u4eba\u62c5\u5fe7\uff0c\u8003\u8651\u5230\u5e7f\u6df1\u6e2f\u6fb3\u5927\u6e7e\u533a\u7684\u7ecf\u6d4e\u91cd\u8981\u6027\uff0c\u4ee5\u53ca\u5728\u65b0\u52a0\u5761\u548c\u5317\u7f8e\u5730\u533a\u5927\u91cf\u7ca4\u8bed\u4f7f\u7528\u8005\u7684\u60c5\u51b5\u3002\u5c3d\u7ba1\u7ca4\u8bed\u5e7f\u6cdb\u4f7f\u7528\uff0c\u4f46\u5728NLP\u7814\u7a76\u4e2d\u5bf9\u7ca4\u8bed\u7684\u4ee3\u8868\u5374\u5c11\u4e4b\u53c8\u5c11\uff0c\u5c24\u5176\u662f\u4e0e\u5176\u4ed6\u540c\u6837\u53d1\u8fbe\u5730\u533a\u7
684\u8bed\u8a00\u76f8\u6bd4\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e9b\u7a7a\u767d\uff0c\u6211\u4eec\u6982\u8ff0\u4e86\u5f53\u524d\u7684\u7ca4\u8bedNLP\u65b9\u6cd5\uff0c\u5e76\u5f15\u5165\u4e86\u65e8\u5728\u8bc4\u4f30LLM\u5728\u4e8b\u5b9e\u751f\u6210\u3001\u6570\u5b66\u903b\u8f91\u3001\u590d\u6742\u63a8\u7406\u548c\u7ca4\u8bed\u4e2d\u7684\u901a\u7528\u77e5\u8bc6\u7b49\u65b9\u9762\u7684\u6027\u80fd\u7684\u65b0\u57fa\u51c6\uff0c\u65e8\u5728\u63a8\u52a8\u5f00\u6e90\u7ca4\u8bedLLM\u6280\u672f\u7684\u53d1\u5c55\u3002\u6211\u4eec\u4e5f\u63d0\u51fa\u4e86\u672a\u6765\u7684\u7814\u7a76\u65b9\u5411\u548c\u63a8\u8350\u7684\u6a21\u578b\uff0c\u4ee5\u589e\u5f3a\u7ca4\u8bedLLM\u7684\u5f00\u53d1\u3002|\n", "2408.16753": "|**2024-08-29**|**Reinforcement Learning without Human Feedback for Last Mile Fine-Tuning of Large Language Models**|Alec Solway et.al.|[2408.16753](http://arxiv.org/abs/2408.16753)|null|\u5f3a\u5316\u5b66\u4e60\u5728\u9884\u8bad\u7ec3\u6a21\u578b\u540e\uff0c\u901a\u8fc7\u6700\u5927\u5316\u4f3c\u7136\u6027\u6765\u9884\u6d4b\u5927\u578b\u6587\u672c\u8bed\u6599\u5e93\u4e2d\u7684\u4e0b\u4e00\u4e2a\u6587\u672c\u4ee4\u724c\uff0c\u7528\u4e8e\u5c06\u8bed\u8a00\u6a21\u578b\u4e0e\u4eba\u7c7b\u504f\u597d\u4fe1\u53f7\u5bf9\u9f50\u3002\u5728\u90e8\u7f72\u5230\u7279\u5b9a\u9886\u57df\u4e4b\u524d\uff0c\u901a\u5e38\u4f1a\u5bf9\u6a21\u578b\u8fdb\u884c\u8fdb\u4e00\u6b65\u7684\u5fae\u8c03\u4ee5\u9002\u5e94\u4efb\u52a1\u76f8\u5173\u7684\u6570\u636e\u3002\u7531\u4e8e\u4eba\u7c7b\u504f\u597d\u4fe1\u53f7\u5728\u6700\u540e\u9636\u6bb5\u5f80\u5f80\u4e0d\u53ef\u7528\uff0c\u56e0\u6b64\u901a\u5e38\u4f7f\u7528\u6700\u5927\u5316\u4f3c\u7136\u6027\u8fdb\u884c\u5fae\u8c03\uff0c\u8fd9\u662f\u9ed8\u8ba4\u65b9\u6cd5\u3002\u7136\u800c\uff0c\u5f3a\u5316\u5b66\u4e60\u9664\u4e86\u80fd\u591f\u4fc3\u8fdb\u4e0e\u4eba\u7c7b\u5b9a\u4e49\u5956\u52b1\u51fd\u6570\u7684\u5bf9\u9f50\u4e4b\u5916\uff0c\u8fd8\u6709\u5176\u4ed6\u4f18\u52bf\u3002\u76f8\u6bd4\u4e8e\u6700\u5927\u5316\u4f3c\u7136\u6027\uff0c\u5373\u6a21\
u4eff\u5b66\u4e60\u6a21\u578b\u5728\u7406\u60f3\u6761\u4ef6\u4e0b\u5e94\u6267\u884c\u7684\u64cd\u4f5c\uff0c\u5f3a\u5316\u5b66\u4e60\u4e0d\u9650\u4e8e\u4ec5\u5c55\u793a\u8fbe\u5230\u6700\u4f18\u72b6\u6001\u65f6\u7684\u64cd\u4f5c\uff0c\u800c\u662f\u5728\u63a2\u7d22\u7b56\u7565\u7a7a\u95f4\u7684\u8fc7\u7a0b\u4e2d\u8bad\u7ec3\u6a21\u578b\u5728\u5404\u79cd\u60c5\u51b5\u4e0b\u7684\u64cd\u4f5c\u3002\u6b64\u5916\uff0c\u5b83\u8fd8\u8bad\u7ec3\u6a21\u578b\u907f\u514d\u6267\u884c\u7ade\u4e89\u4f46\u6548\u679c\u4e0d\u4f73\u7684\u64cd\u4f5c\u3002\u672c\u6587\u5f00\u53d1\u4e86\u4e00\u79cd\u4f7f\u7528\u5f3a\u5316\u5b66\u4e60\u8fdb\u884c\u6700\u540e\u4e00\u9636\u6bb5\u5fae\u8c03\u7684\u6846\u67b6\uff0c\u5e76\u6d4b\u8bd5\u4e86\u8be5\u65b9\u6cd5\u662f\u5426\u80fd\u5e26\u6765\u6027\u80fd\u63d0\u5347\u3002\u5b9e\u9a8c\u96c6\u4e2d\u5728\u62bd\u8c61\u6982\u62ec\u4e0a\uff0c\u4f46\u6846\u67b6\u5177\u6709\u666e\u904d\u9002\u7528\u6027\u3002\u91c7\u7528\u8be5\u6d41\u7a0b\u4ea7\u751f\u7684\u7ed3\u679c\u663e\u8457\u4f18\u4e8e\u4ec5\u4f7f\u7528\u6700\u5927\u4f3c\u7136\u6027\u8f93\u51fa\u7684\u7ed3\u679c\u3002\u5bf9\u4e8e\u7279\u5b9a\u7684\u6570\u636e\u96c6\uff0c\u901a\u8fc7\u540e\u5904\u7406\u6700\u5927\u4f3c\u7136\u8f93\u51fa\u53ef\u4ee5\u7f29\u5c0f\u6027\u80fd\u5dee\u8ddd\u3002\u7136\u800c\uff0c\u8be5\u6846\u67b6\u63d0\u4f9b\u4e86\u4e00\u79cd\u4f18\u5316\u6a21\u578b\u7684\u65b0\u9014\u5f84\uff0c\u5728\u540e\u5904\u7406\u53ef\u80fd\u4e0d\u90a3\u4e48\u76f4\u63a5\u6709\u6548\u6216\u6709\u6548\u7684\u573a\u666f\u4e2d\u5c24\u4e3a\u6709\u7528\uff0c\u5e76\u4e14\u5b83\u53ef\u4ee5\u6269\u5c55\u4ee5\u5305\u62ec\u66f4\u591a\u7c7b\u522b\u7684\u9700\u8981\u60e9\u7f5a\u5e76\u8bad\u7ec3\u53cd\u5bf9\u7684\u4e0d\u9002\u5f53\u8f93\u51fa\uff0c\u5982\u5e7b\u89c9\u3002|\n", "2408.16749": "|**2024-08-29**|**Assessing Large Language Models for Online Extremism Research: Identification, Explanation, and New Knowledge**|Beidi Dong 
et.al.|[2408.16749](http://arxiv.org/abs/2408.16749)|null|This paper examines the importance of automated tools for detecting and limiting the spread of extremist ideas online. The study compares the ability of Bidirectional Encoder Representations from Transformers (BERT) and Generative Pre-Trained Transformer (GPT) models to detect and classify social media posts containing 'right-wing' and 'left-wing' ideological keywords. We collected posts containing these keywords and manually labeled them as extremist or non-extremist. Extremist posts were further assigned to one of five constituent elements of extremism based on a working definitional framework. The BERT models were evaluated with respect to training data size and knowledge transfer between categories. We also compared the performance of GPT 3.5 and GPT 4 under different prompts: naive, general-definition, role-playing, and professional-definition. The results show that the best-performing GPT models outperform the best-performing BERT models, and that more detailed prompts generally yield better results. Overly complex prompts, however, can impair performance. Different GPT versions vary in their sensitivity to what is considered extremist: GPT 3.5 performed better at identifying left-wing extremist posts, while GPT 4 performed better at identifying right-wing extremist posts. Large language models (here, GPT models) show significant potential for online extremism classification, surpassing traditional BERT models in a zero-shot setting. Future research should explore the role of human-computer interaction in optimizing GPT models for extremism detection and classification, in order to develop more efficient (e.g., quicker, less effortful) and more effective methods for identifying extremist content.|\n", "2408.16740": "|**2024-08-29**|**Theoretical and Methodological Framework for Studying Texts Produced by Large Language Models**|Ji\u0159\u00ed Mili\u010dka 
et.al.|[2408.16740](http://arxiv.org/abs/2408.16740)|null|This paper examines, from the perspective of quantitative linguistics, the conceptual, methodological, and technical challenges of studying large language models (LLMs) and the texts they generate. It builds on a theoretical framework that distinguishes the LLM as a substrate from the entities the model simulates. The paper advocates a strictly non-anthropomorphic approach to the models while cautiously applying methods used to study human linguistic behavior to the simulated entities. Whereas natural language processing researchers focus on the models themselves, their architecture, evaluation, and methods for improving performance, we as quantitative linguists aim to build a body of theory about the properties of LLM-generated texts, how they differ from human-produced texts, and the properties of the simulated entities. In addition, we should explore the potential of LLMs as instruments for studying human culture, of which language is an integral part.|\n", "2408.16700": "|**2024-08-29**|**GradBias: Unveiling Word Influence on Bias in Text-to-Image Generative Models**|Moreno D'Inc\u00e0 
et.al.|[2408.16700](http://arxiv.org/abs/2408.16700)|**[link](https://github.com/moreno98/gradbias)**|**Recent progress in text-to-image (T2I) generative models has made high-quality image generation possible. As their performance and accessibility improve, these models are attracting growing attention and popularity, and ensuring their fairness and safety is key to preventing the spread and perpetuation of bias. Existing work mainly detects bias over closed sets of predefined biases (e.g., gender, ethnicity). Detecting and quantifying bias in an open-set setting, where no biases are specified in advance, remains a challenge. This paper proposes a general framework for identifying, quantifying, and explaining bias in the open-set setting. The pipeline uses a large language model (LLM) to propose biases from a set of captions. The target generative model then produces a set of images, and bias is finally assessed through visual question answering (VQA). We present two methods built on this framework: OpenBias and GradBias. OpenBias detects and quantifies both well-known and novel biases related to people, objects, and animals, and agrees closely with existing closed-set bias detection methods and with human judgment. GradBias shows that neutral words can significantly influence bias, and it outperforms several baselines, including state-of-the-art foundation models. The code is available at https://github.com/Moreno98/GradBias.**|\n", "2408.16673": "|**2024-08-29**|**Entropic Distribution Matching in Supervised Fine-tuning of LLMs: Less Overfitting and Better Diversity**|Ziniu Li et.al.|[2408.16673](http://arxiv.org/abs/2408.16673)|null|This paper addresses the overfitting and limited output diversity that large language models exhibit during supervised fine-tuning (SFT) on downstream tasks. Cross-entropy (CE) loss is the conventional choice for SFT, but it can push the model to update too aggressively toward the data distribution, which causes overfitting and reduces output diversity. To address these problems, the paper introduces the maximum entropy principle, which favors smoother probability distributions that still capture the data effectively. Specifically, we propose a new method called GEM, which matches the target distribution by solving a reverse Kullback-Leibler divergence minimization problem with an entropy regularizer. When applied to SFT of the Llama-3-8B model, GEM outperforms CE in several respects. First, when trained on the UltraFeedback dataset to strengthen instruction following, GEM shows less overfitting, as evidenced by lower perplexity and better performance on the IFEval benchmark. GEM also increases output diversity: even without domain-specific data, best-of-n sampling alone improves performance on math reasoning and code generation tasks by up to 7 points. Furthermore, when fine-tuning on domain-specific datasets for math reasoning and code generation, GEM likewise shows less overfitting and gains of up to 10 points over CE.|\n", "2408.16601": "|**2024-08-29**|**Examination of Code generated by Large Language Models**|Robin Beer 
et.al.|[2408.16601](http://arxiv.org/abs/2408.16601)|**[link](https://github.com/t-muras/ai-code-analysis)**|**Large language models (LLMs) such as ChatGPT and Copilot are transforming software development by automating code generation, which to some extent enables rapid prototyping, supports education, and boosts productivity. The correctness and quality of LLM-generated code should therefore be comparable to those of manually written code. To assess how well current LLMs generate correct, high-quality code, we conducted controlled experiments in which the LLMs generated simple algorithms in Java and Python together with the corresponding unit tests, and we evaluated the correctness and quality (coverage) of the generated code. We observed significant differences between LLMs, between programming languages, between algorithm and test code, and over time. This paper reports these results together with the experimental method, so that the evaluation can be repeated and compared for more algorithms, languages, and LLMs over time.**|\n", "2408.16586": "|**2024-08-29**|**Enhancing Dialogue Generation in Werewolf Game Through Situation Analysis and Persuasion Strategies**|Zhiyang Qi 
et.al.|[2408.16586](http://arxiv.org/abs/2408.16586)|null|Recent advances in natural language processing, especially large language models (LLMs) such as GPT-4, have significantly improved dialogue systems, enabling them to generate more natural and fluent conversations. These systems still face challenges, however, including managing sustained dialogue, retaining memory, and reducing hallucinations. The AIWolfDial2024 project addresses these challenges by using the Werewolf game, a game of incomplete information, to test LLM capabilities in complex interactive environments. This work introduces an LLM-based Werewolf game AI in which each role is supported by situation analysis to aid response generation. For the werewolf role, it additionally employs several persuasion strategies, including logical appeal, credibility appeal, and emotional appeal, to effectively steer other players toward aligning with its actions.|\n", "2408.16518": "|**2024-08-29**|**CNIMA: A Universal Evaluation Framework and Automated Approach for Assessing Second Language Dialogues**|Rena Gao 
et.al.|[2408.16518](http://arxiv.org/abs/2408.16518)|**[link](https://github.com/renagao/csl2024)**|We develop CNIMA (Chinese Non-Native Interactivity Measurement and Automation), a Chinese-as-a-second-language dataset containing 10,000 dialogues. We annotate CNIMA with an evaluation framework originally designed for English-as-a-second-language dialogues, which assesses micro-level features (e.g., backchannels) and macro-level interactivity labels (e.g., topic management), and we test how well the framework transfers from English to Chinese. We find that the framework is robust across languages, and it reveals both universal and language-specific relationships between micro-level and macro-level features. Next, we propose an approach to automate the evaluation and find that it performs strongly, yielding a new tool for automated second-language assessment. Because it relies on large language models and therefore does not need large-scale annotated training data, our system can easily be adapted to other languages.|\n", "2408.16502": "|**2024-08-29**|**LLMs vs Established Text Augmentation Techniques for Classification: When do the Benefits Outweight the Costs?**|Jan Cegin 
et.al.|[2408.16502](http://arxiv.org/abs/2408.16502)|null|Generative large language models (LLMs) are increasingly used for data augmentation, where text samples are paraphrased by an LLM and then used to fine-tune a classifier. However, there is little research evidence that LLM-based augmentation has a clear advantage over established methods. To investigate when LLM-based augmentation is advantageous, we compared it with established methods across 6 datasets, 3 classifiers, and 2 fine-tuning methods. We also varied the number of seeds and the number of collected samples to explore the space of downstream model accuracy more thoroughly. Finally, we performed a cost-benefit analysis, which shows that LLM-based augmentation is worth deploying when very few seeds are used. In many cases, established methods achieve similar or even better model accuracy.|\n", "2408.17437": "|**2024-08-30**|**SYNTHEVAL: Hybrid Behavioral Testing of NLP Models with Synthetic CheckLists**|Raoyuan Zhao 
et.al.|[2408.17437](http://arxiv.org/abs/2408.17437)|**[link](https://github.com/loreley99/syntheval_checklist)**|**Traditional benchmarking in natural language processing (NLP) typically relies on static held-out test sets. This approach often overestimates performance and cannot provide comprehensive, interpretable, and dynamic assessments of NLP models. Recent works such as DynaBench (Kiela et al., 2021) and CheckList (Ribeiro et al., 2020) address these limitations through behavioral testing of NLP models, with test types generated by a multistep human-annotation pipeline. Unfortunately, manually creating a variety of test types requires substantial human labor and is costly. This work proposes SYNTHEVAL, a hybrid behavioral testing framework that uses large language models (LLMs) to generate a wide range of test types for comprehensive evaluation of NLP models. SYNTHEVAL first generates sentences with LLMs through controlled generation, then identifies challenging examples by comparing the predictions of the LLMs with those of task-specific NLP models. In the final stage, human experts examine the challenging examples, manually design templates, and identify the types of failures that the task-specific models consistently exhibit. We apply SYNTHEVAL to two classification tasks, sentiment analysis and toxic language detection, and show that the framework is effective at identifying weaknesses of strong models on these tasks. Our code is available at https://github.com/Loreley99/SynthEval_CheckList.**|\n", "2408.17431": "|**2024-08-30**|**Advancing Multi-talker ASR Performance with Large Language Models**|Mohan Shi et.al.|[2408.17431](http://arxiv.org/abs/2408.17431)|null|Recognizing overlapping speech in conversational scenarios is a highly challenging problem in automatic speech recognition (ASR). Serialized output training (SOT) is the traditional approach to multi-talker ASR; it concatenates the transcriptions of multiple speakers in order of the emission times of their speech. However, such transcriptions, obtained by concatenating related utterances from a conversation, depend on the ability to model long contexts. By comparison, a new approach based on large language models (LLMs) may be better suited to these complex and challenging scenarios because it exploits the strong capabilities of pre-trained decoders. This paper proposes an LLM-based SOT approach for multi-talker ASR that uses a pre-trained speech encoder and an LLM, fine-tuned on multi-talker datasets with appropriate strategies. Experimental results show that our approach outperforms traditional methods on the simulated dataset LibriMix and achieves state-of-the-art performance on the evaluation set of the real-world dataset AMI, significantly surpassing an AED model trained with 1000 times more supervised data in previous work.|\n", "2408.17404": "|**2024-08-30**|**Getting Inspiration for Feature Elicitation: App Store- vs. LLM-based Approach**|Jialiang Wei et.al.|[2408.17404](http://arxiv.org/abs/2408.17404)|null|Over the past decade, app store (AppStore)-inspired requirements elicitation has proven highly beneficial. Developers often study competitors' apps to gather inspiration for new features. With the advance of generative AI, recent studies have shown the potential of large language model (LLM)-inspired requirements elicitation, in which LLMs provide inspiration for new feature ideas. Although both approaches are gaining popularity in practice, the differences between them are not well understood. We report a comparative study of the app store- and LLM-based approaches for refining features into sub-features. By manually analyzing 1200 sub-features recommended by the two approaches, we identified their benefits, challenges, and key differences. While both approaches recommend highly relevant and clearly described sub-features, LLMs appear more powerful, particularly with respect to novelty for unseen app scopes. Moreover, some recommended features are imaginary and of unclear feasibility, which underlines the importance of a human analyst in the elicitation process.|\n", "2408.17377": "|**2024-08-30**|**NDP: Next Distribution Prediction as a More Broad Target**|Junhao Ruan et.al.|[2408.17377](http://arxiv.org/abs/2408.17377)|null|Large language models (LLMs) trained under the next-token prediction (NTP) paradigm have demonstrated powerful capabilities. However, the existing NTP paradigm has several limitations, particularly regarding the complexity of planning tasks and error propagation during inference. Our work extends the critique of NTP, pointing out that its limitations also stem from a narrow training objective: predicting a sub-optimal one-hot distribution. To support this critique, we conducted a preliminary experiment that treats the output distribution of a strong LLM as an efficient compression of world data. By evaluating the similarity between n-gram distributions and LLM output distributions, we found that n-gram distributions align more closely with the LLM output distributions. Based on this insight, we introduce Next Distribution Prediction (NDP), which replaces the one-hot target with n-gram distributions and thereby enhances learning without additional online training time. We ran experiments in four areas: translation, general tasks, language transfer, and medical domain adaptation. Compared with NTP, NDP achieves an improvement of up to +2.97 COMET on translation, an average improvement of +0.61 on general tasks, and an average improvement of +10.75 in the medical domain. This demonstrates the concrete benefit of addressing the narrow-target problem and points to a new direction for improving NTP.|\n", "2408.17362": "|**2024-08-30**|**Assessing Generative Language Models in Classification Tasks: Performance and Self-Evaluation Capabilities in the Environmental and Climate Change Domain**|Francesca Grasso 
et.al.|[2408.17362](http://arxiv.org/abs/2408.17362)|**[link](https://github.com/stefanolocci/LLMClassification)**|**This paper examines the performance of two large language models (LLMs), GPT3.5 and Llama2, and one small language model (SLM), Gemma, on three different classification tasks in the climate change (CC) and environmental domain. Using BERT-based models as a baseline, we compare their performance against these transformer-based generative models. We also assess the models' self-evaluation capabilities by analyzing the calibration of their verbalized confidence scores on these text classification tasks. Our findings show that although the BERT-based models generally perform best among all models, the performance of the large generative models is still noteworthy. Our calibration analysis further shows that Gemma is well calibrated on the initial tasks but then produces inconsistent results, Llama is reasonably calibrated, and GPT consistently shows strong calibration. Through this study, we aim to contribute to the discussion on the applicability and effectiveness of large generative LMs in addressing some of the planet's most pressing problems, highlighting their strengths and limitations in the context of ecology and CC.**
|\n", "2408.17354": "|**2024-08-30**|**Forget to Flourish: Leveraging Machine-Unlearning on Pretrained Language Models for Privacy Leakage**|Md Rafi Ur Rashid et.al.|[2408.17354](http://arxiv.org/abs/2408.17354)|null|Fine-tuning large language models on private data for downstream applications poses significant privacy risks, since sensitive information may be exposed. Community platforms now offer convenient distribution of large pre-trained models, and anyone can publish one without rigorous verification. In this setting the privacy threat grows considerably, because a pre-trained model can be deliberately tampered with so that it leaks private data during fine-tuning. This study introduces a novel poisoning technique that uses model unlearning as an attack tool. The approach manipulates a pre-trained language model to increase the leakage of private data during fine-tuning. Our method strengthens both membership inference and data extraction attacks while preserving model utility. Experimental results across different models, datasets, and fine-tuning setups show that our attacks significantly surpass baseline performance. This work serves as a warning to users who download pre-trained models from unverified sources and highlights the potential risks involved.|\n", 
"2408.17316": "|**2024-08-30**|**Bridging Domain Knowledge and Process Discovery Using Large Language Models**|Ali Norouzifar et.al.|[2408.17316](http://arxiv.org/abs/2408.17316)|**[link](https://github.com/alinorouzifar/imr-llm)**|**Discovering good process models is essential for process analysis tasks such as conformance checking and process improvement. Automated process discovery methods often overlook valuable domain knowledge. This knowledge, including insights from domain experts and detailed process documentation, usually remains underused during process discovery. This paper addresses the problem by using large language models (LLMs) to integrate such knowledge directly into process discovery. We use rules derived from LLMs to guide model construction, ensuring alignment with both domain knowledge and actual process executions. By integrating LLMs, we build a bridge between process knowledge expressed in natural language and the discovery of robust process models, significantly advancing process discovery methodology. To demonstrate the usability of our framework, we conducted a case study with the UWV employee insurance agency, which showed its practical benefits and effectiveness.**|\n", "2408.17280": 
"|**2024-08-30**|**Flexible and Effective Mixing of Large Language Models into a Mixture of Domain Experts**|Rhui Dih Lee et.al.|[2408.17280](http://arxiv.org/abs/2408.17280)|null|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u5de5\u5177\u5305\uff0c\u7528\u4e8e\u4ece\u5df2\u8bad\u7ec3\u7684\u6a21\u578b\u521b\u5efa\u4f4e\u6210\u672c\u7684\u9886\u57df\u4e13\u5bb6\u6df7\u5408\uff08MOE\uff09\u3002\u8be5\u5de5\u5177\u5305\u53ef\u4ee5\u7528\u4e8e\u4ece\u6a21\u578b\u6216\u9002\u914d\u5668\u521b\u5efa\u6df7\u5408\u3002\u6211\u4eec\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u6d4b\u8bd5\uff0c\u5e76\u63d0\u4f9b\u4e86\u5173\u4e8e\u4f7f\u7528\u5de5\u5177\u5305\u5b9a\u4e49\u7ed3\u679cMOE\u67b6\u6784\u7684\u6307\u5bfc\u3002\u516c\u5f00\u4e86\u4e00\u4e2a\u53ef\u7528\u7684\u5b58\u50a8\u5e93\u3002|\n", "2408.17258": "|**2024-08-30**|**Joint Estimation and Prediction of City-wide Delivery Demand: A Large Language Model Empowered Graph-based Learning Approach**|Tong Nie et.al.|[2408.17258](http://arxiv.org/abs/2408.17258)|null|\u7535\u5b50\u5546\u52a1\u548c\u57ce\u5e02\u5316\u7684\u84ec\u52c3\u53d1\u5c55\uff0c\u6781\u5927\u5730\u589e\u5f3a\u4e86\u57ce\u5e02\u533a\u57df\u7684\u914d\u9001\u6d3b\u52a8\uff0c\u5bfc\u81f4\u4e86\u9700\u6c42\u91cf\u7684\u589e\u52a0\u4e0e\u590d\u6742\u6027\u7684\u63d0\u5347\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u6311\u6218\uff0c\u6570\u636e\u9a71\u52a8\u7684\u9884\u6d4b\u65b9\u6cd5\uff0c\u7279\u522b\u662f\u57fa\u4e8e\u673a\u5668\u5b66\u4e60\u7684\u6280\u672f\uff0c\u5f00\u59cb\u5728\u57ce\u5e02\u914d\u9001\u9700\u6c42\u7ba1\u7406\u95ee\u9898\u4e2d\u53d1\u6325\u5173\u952e\u4f5c\u7528\u3002\u7136\u800c\uff0c\u4e00\u4e2a\u5c1a\u672a\u5f97\u5230\u5145\u5206\u7814\u7a76\u7684\u95ee\u9898\u662f\u5168\u57ce\u8303\u56f4\u5185\u7684\u914d\u9001\u9700\u6c42\u8054\u5408\u4f30\u8ba1\u4e0e\u9884\u6d4b\u3002\u9488\u5bf9\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u5c06\u5176\u5efa\u6a21\u4e3a\u4e00\u4e2a\u57fa\u4e8e\u56fe\u7684\u65f6\u7a7a\u5b66\u4e60\u4efb\u52a1\u3002 
\u9996\u5148\uff0c\u6211\u4eec\u5b9a\u4e49\u4e86\u4e00\u4e2a\u6d88\u606f\u4f20\u9012\u795e\u7ecf\u7f51\u7edc\u6a21\u578b\u6765\u6355\u6349\u76f8\u5173\u533a\u57df\u4e4b\u95f4\u9700\u6c42\u6a21\u5f0f\u7684\u4ea4\u4e92\u3002\u5176\u6b21\uff0c\u901a\u8fc7\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u6700\u65b0\u8fdb\u5c55\uff0c\u6211\u4eec\u4ece\u672a\u7ed3\u6784\u5316\u7684\u5730\u7406\u4f4d\u7f6e\u6570\u636e\u4e2d\u63d0\u53d6\u901a\u7528\u7684\u5730\u7406\u7a7a\u95f4\u77e5\u8bc6\u7f16\u7801\uff0c\u5e76\u5c06\u5176\u6574\u5408\u5230\u9700\u6c42\u9884\u6d4b\u5668\u4e2d\u3002\u6700\u540e\uff0c\u4e3a\u4e86\u4fc3\u8fdb\u6a21\u578b\u5728\u4e0d\u540c\u57ce\u5e02\u7684\u8fc1\u79fb\u80fd\u529b\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u79cd\u7aef\u5230\u7aef\u7684\u5f52\u7eb3\u8bad\u7ec3\u65b9\u6848\u3002 \u6211\u4eec\u5728\u4e24\u4e2a\u771f\u5b9e\u7684\u914d\u9001\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u9a8c\u8bc1\uff0c\u5305\u62ec\u4e2d\u56fd\u7684\u516b\u4e2a\u57ce\u5e02\u548c\u7f8e\u56fd\u7684\u57ce\u5e02\uff0c\u7ed3\u679c\u8868\u660e\u6211\u4eec\u7684\u6a21\u578b\u5728\u8fd9\u4e9b\u5177\u6709\u6311\u6218\u6027\u7684\u4efb\u52a1\u4e2d\u663e\u8457\u4f18\u4e8e\u73b0\u6709\u7684\u57fa\u51c6\u65b9\u6cd5\u3002|\n", "2408.17253": "|**2024-08-30**|**VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters**|Mouxiang Chen 
et.al.|[2408.17253](http://arxiv.org/abs/2408.17253)|**[link](https://github.com/keytoyze/visionts)**|**\u672c\u6587\u63a2\u8ba8\u4e86\u4ece\u4e30\u5bcc\u4e14\u9ad8\u8d28\u91cf\u7684\u81ea\u7136\u56fe\u50cf\u51fa\u53d1\u6784\u5efa\u65f6\u95f4\u5e8f\u5217\u9884\u6d4b\uff08TSF\uff09\u57fa\u7840\u6a21\u578b\u7684\u65b0\u8def\u5f84\u3002\u73b0\u6709\u7684\u65b9\u6cd5\u8981\u4e48\u901a\u8fc7\u5fae\u8c03\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\uff0c\u8981\u4e48\u5efa\u7acb\u5927\u89c4\u6a21\u65f6\u95f4\u5e8f\u5217\u6570\u636e\u96c6\u6765\u5f00\u53d1TSF\u57fa\u7840\u6a21\u578b\uff0c\u4f46\u8fd9\u4e9b\u65b9\u6cd5\u9762\u4e34\u8de8\u57df\u5dee\u8ddd\u6216\u9886\u57df\u5185\u5f02\u8d28\u6027\u7684\u4e25\u5cfb\u6311\u6218\u3002\u6211\u4eec\u57fa\u4e8e\u56fe\u50cf\u4e0e\u65f6\u95f4\u5e8f\u5217\u4e4b\u95f4\u5185\u5728\u76f8\u4f3c\u6027\uff0c\u63a2\u7d22\u4e86\u4e00\u79cd\u65b0\u7684TSF\u4efb\u52a1\u8868\u793a\uff0c\u5c06\u5176\u91cd\u65b0\u8868\u8ff0\u4e3a\u56fe\u50cf\u91cd\u5efa\u4efb\u52a1\uff0c\u5e76\u5229\u7528\u5728ImageNet\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u81ea\u6211\u76d1\u7763\u9884\u8bad\u7ec3\u7684\u89c6\u89c9\u63a9\u7801\u81ea\u52a8\u7f16\u7801\u5668\uff08MAE\uff09\u8fdb\u884c\u5904\u7406\u3002 
\u4ee4\u4eba\u60ca\u8bb6\u7684\u662f\uff0c\u5728\u65e0\u9700\u8fdb\u4e00\u6b65\u5728\u65f6\u95f4\u5e8f\u5217\u9886\u57df\u8fdb\u884c\u9002\u5e94\u7684\u60c5\u51b5\u4e0b\uff0c\u6240\u63d0\u51fa\u7684VisionTS\u5c31\u80fd\u5b9e\u73b0\u4f18\u4e8e\u73b0\u6709TSF\u57fa\u7840\u6a21\u578b\u7684\u96f6\u6837\u672c\u9884\u6d4b\u6027\u80fd\u3002\u901a\u8fc7\u6700\u5c0f\u7a0b\u5ea6\u7684\u5fae\u8c03\uff0cVisionTS\u80fd\u591f\u8fdb\u4e00\u6b65\u63d0\u5347\u9884\u6d4b\u6027\u80fd\uff0c\u5e76\u5728\u5927\u591a\u6570\u60c5\u51b5\u4e0b\u8fbe\u5230\u6700\u5148\u8fdb\u7684\u6c34\u5e73\u3002\u8fd9\u4e9b\u53d1\u73b0\u8868\u660e\uff0c\u89c6\u89c9\u6a21\u578b\u53ef\u80fd\u4e3aTSF\u63d0\u4f9b\u514d\u8d39\u5348\u9910\uff0c\u5e76\u5f3a\u8c03\u4e86\u8ba1\u7b97\u673a\u89c6\u89c9\u4e0eTSF\u9886\u57df\u672a\u6765\u4ea4\u53c9\u7814\u7a76\u7684\u6f5c\u529b\u3002\u6211\u4eec\u7684\u4ee3\u7801\u5df2\u516c\u5f00\u5728https://github.com/Keytoyze/VisionTS\u4e0a\u3002**|\n", "2409.02920": "|**2024-09-04**|**RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins (early version)**|Yao Mu et.al.|[2409.02920](http://arxiv.org/abs/2409.02920)|null|\u672c\u7bc7\u8bba\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aRoboTwin\u7684\u65b0\u578b\u57fa\u51c6\u6570\u636e\u96c6\uff0c\u5b83\u7ed3\u5408\u4e86\u73b0\u5b9e\u4e16\u754c\u4e2d\u7684\u9065\u63a7\u6570\u636e\u4e0e\u901a\u8fc7\u6570\u5b57\u5b6a\u751f\u751f\u6210\u7684\u5408\u6210\u6570\u636e\u3002RoboTwin\u65e8\u5728\u4e3a\u53cc\u81c2\u673a\u5668\u4eba\u573a\u666f\u63d0\u4f9b\u652f\u6301\uff0c\u7279\u522b\u5173\u6ce8\u5de5\u5177\u4f7f\u7528\u80fd\u529b\u548c\u4eba\u673a\u4ea4\u4e92\u80fd\u529b\u3002\u6211\u4eec\u5229\u7528COBOT Magic\u5e73\u53f0\u6536\u96c6\u4e86\u4e30\u5bcc\u7684\u6570\u636e\uff0c\u6db5\u76d6\u5de5\u5177\u64cd\u4f5c\u548c\u4eba\u673a\u4e92\u52a8\u7684\u591a\u6837\u6027\u3002 
\u8bba\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u65b9\u6cd5\u6765\u521b\u5efa\u6570\u5b57\u5b6a\u751f\u4f53\uff0c\u5229\u7528AI\u751f\u6210\u7684\u5185\u5bb9\u5c06\u4e8c\u7ef4\u56fe\u50cf\u8f6c\u6362\u4e3a\u8be6\u7ec6\u7684\u4e09\u7ef4\u6a21\u578b\u3002\u540c\u65f6\uff0c\u6211\u4eec\u501f\u52a9\u5927\u578b\u8bed\u8a00\u6a21\u578b\u751f\u6210\u4e13\u5bb6\u7ea7\u8bad\u7ec3\u6570\u636e\u548c\u9762\u5411\u529f\u80fd\u6027\u7684\u4efb\u52a1\u7279\u5b9a\u59ff\u6001\u5e8f\u5217\u3002 \u6211\u4eec\u7684\u4e3b\u8981\u8d21\u732e\u5305\u62ec\uff1a 1. RoboTwin\u57fa\u51c6\u6570\u636e\u96c6\uff0c 2. \u9ad8\u6548\u7684\u73b0\u5b9e\u5230\u6a21\u62df\u7ba1\u9053\uff0c\u4ee5\u53ca 3. \u5229\u7528\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u81ea\u52a8\u4e13\u5bb6\u7ea7\u6570\u636e\u751f\u6210\u3002 \u8fd9\u4e9b\u8fdb\u5c55\u65e8\u5728\u89e3\u51b3\u673a\u5668\u4eba\u8bad\u7ec3\u6570\u636e\u7a00\u7f3a\u7684\u95ee\u9898\uff0c\u6709\u671b\u52a0\u901f\u5f00\u53d1\u66f4\u591a\u529f\u80fd\u5f3a\u5927\u3001\u9002\u5e94\u6027\u5e7f\u6cdb\u7684\u673a\u5668\u4eba\u7cfb\u7edf\uff0c\u5e94\u7528\u4e8e\u5e7f\u6cdb\u7684\u73b0\u5b9e\u4e16\u754c\u573a\u666f\u3002\u9879\u76ee\u9875\u9762\u53ef\u8bbf\u95ee\uff1ahttps://robotwin-benchmark.github.io/early-version/|\n", "2409.02897": "|**2024-09-05**|**LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA**|Jiajie Zhang 
et.al.|[2409.02897](http://arxiv.org/abs/2409.02897)|**[link](https://github.com/THUDM/LongCite)**|\u5c3d\u7ba1\u5f53\u524d\u7684\u957f\u6587\u672c\u5927\u8bed\u8a00\u6a21\u578b\u5728\u57fa\u4e8e\u5927\u91cf\u6587\u672c\u56de\u7b54\u7528\u6237\u95ee\u9898\u65b9\u9762\u8868\u73b0\u51fa\u4ee4\u4eba\u5370\u8c61\u6df1\u523b\u7684\u6027\u80fd\uff0c\u4f46\u5b83\u4eec\u7f3a\u4e4f\u5f15\u7528\u4f7f\u5f97\u7528\u6237\u96be\u4ee5\u9a8c\u8bc1\u7b54\u6848\u7684\u51c6\u786e\u6027\uff0c\u4ece\u800c\u5f15\u53d1\u4e86\u5bf9\u5176\u53ef\u9760\u6027\u7684\u62c5\u5fe7\uff0c\u56e0\u4e3a\u5b83\u4eec\u53ef\u80fd\u4ea7\u751f\u9519\u8bef\u7684\u4fe1\u606f\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u65e8\u5728\u4f7f\u8fd9\u4e9b\u957f\u6587\u672c\u5927\u8bed\u8a00\u6a21\u578b\u80fd\u591f\u751f\u6210\u5305\u542b\u7cbe\u7ec6\u53e5\u7ea7\u5f15\u7528\u7684\u54cd\u5e94\uff0c\u4ee5\u63d0\u9ad8\u5b83\u4eec\u7684\u5fe0\u5b9e\u5ea6\u548c\u53ef\u9a8c\u8bc1\u6027\u3002 \u6211\u4eec\u9996\u5148\u5f15\u5165\u4e86LongBench-Cite\uff0c\u4e00\u4e2a\u81ea\u52a8\u8bc4\u4f30\u5f53\u524d\u5927\u8bed\u8a00\u6a21\u578b\u5728\u957f\u6587\u672c\u4e0a\u4e0b\u6587\u95ee\u9898\u56de\u7b54\u4e2d\u7684\u8868\u73b0\u7684\u57fa\u51c6\uff0c\u63ed\u793a\u4e86\u5728\u53e5\u7ea7\u5f15\u7528\u65b9\u9762\u5b58\u5728\u5de8\u5927\u7684\u6539\u8fdb\u7a7a\u95f4\u3002\u4e3a\u4e86\u5b9e\u73b0\u8fd9\u4e00\u76ee\u6807\uff0c\u6211\u4eec\u63d0\u51fa\u4e86CoF\uff08\u7c97\u5230\u7ec6\uff09\u8fd9\u4e00\u65b0\u9896\u7684\u7ba1\u9053\uff0c\u5229\u7528\u73b0\u6210\u7684\u5927\u8bed\u8a00\u6a21\u578b\u81ea\u52a8\u751f\u6210\u5305\u542b\u7cbe\u786e\u53e5\u7ea7\u5f15\u7528\u7684\u957f\u6587\u672c\u95ee\u7b54\u5b9e\u4f8b\uff0c\u5e76\u4ee5\u6b64\u7ba1\u9053\u6784\u5efa\u4e86LongCite-45k\uff0c\u4e00\u4e2a\u7528\u4e8e\u53e5\u7ea7\u5f15\u7528\u95ee\u9898\u7684\u5927\u578b\u81ea\u76d1\u7763\u8bad\u7ec3\u6570\u636e\u96c6\u3002\u6700\u540e\uff0c\u6211\u4eec\u4f7f\u7528LongCite-45k\u6570\u636e\u96c6\u8bad\u7ec3\u4e86LongCite-8B\u548cLongCite-9B\u6a21\u578b\uff0c\
u6210\u529f\u5730\u4f7f\u5b83\u4eec\u80fd\u591f\u5728\u5355\u4e2a\u8f93\u51fa\u4e2d\u751f\u6210\u51c6\u786e\u7684\u54cd\u5e94\u548c\u7cbe\u7ec6\u7684\u53e5\u7ea7\u5f15\u7528\u3002\u5728LongBench-Cite\u4e0a\u7684\u8bc4\u4f30\u7ed3\u679c\u663e\u793a\uff0c\u6211\u4eec\u7684\u8bad\u7ec3\u6a21\u578b\u5728\u5f15\u7528\u8d28\u91cf\u65b9\u9762\u8fbe\u5230\u4e86\u6700\u5148\u8fdb\u7684\u6c34\u5e73\uff0c\u8d85\u8d8a\u4e86\u5305\u62ecGPT-4\u5728\u5185\u7684\u9ad8\u7ea7\u4e13\u6709\u6a21\u578b\u3002|\n", "2409.02889": "|**2024-09-04**|**LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture**|Xidong Wang et.al.|[2409.02889](http://arxiv.org/abs/2409.02889)|**[link](https://github.com/freedomintelligence/longllava)**|**\u6269\u5c55\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u7684\u957f\u671f\u4e0a\u4e0b\u6587\u80fd\u529b\u5bf9\u4e8e\u89c6\u9891\u7406\u89e3\u3001\u9ad8\u5206\u8fa8\u7387\u56fe\u50cf\u7406\u89e3\u548c\u591a\u6a21\u6001\u4ee3\u7406\u81f3\u5173\u91cd\u8981\u3002\u8fd9\u6d89\u53ca\u5230\u4e00\u7cfb\u5217\u7cfb\u7edf\u4f18\u5316\uff0c\u5305\u62ec\u6a21\u578b\u67b6\u6784\u3001\u6570\u636e\u6784\u9020\u548c\u8bad\u7ec3\u7b56\u7565\uff0c\u5c24\u5176\u662f\u89e3\u51b3\u968f\u7740\u66f4\u591a\u56fe\u50cf\u5f15\u5165\u800c\u51fa\u73b0\u7684\u6027\u80fd\u4e0b\u964d\u4ee5\u53ca\u9ad8\u6602\u8ba1\u7b97\u6210\u672c\u7b49\u95ee\u9898\u3002\u672c\u6587\u901a\u8fc7\u5c06\u6a21\u578b\u67b6\u6784\u8c03\u6574\u4e3aMamba\u548cTransformer\u5757\u7684\u6df7\u5408\u4f53\u3001\u91c7\u7528\u65e2\u80fd\u8003\u8651\u591a\u4e2a\u56fe\u50cf\u95f4\u65f6\u95f4\u4f9d\u8d56\u6027\u53c8\u80fd\u8003\u8651\u7a7a\u95f4\u4f9d\u8d56\u6027\u7684\u6570\u636e\u6784\u9020\u65b9\u6cd5\uff0c\u5e76\u5b9e\u65bd\u6e10\u8fdb\u5f0f\u8bad\u7ec3\u7b56\u7565\uff0c\u5bf9\u8fd9\u4e9b\u6311\u6218\u8fdb\u884c\u4e86\u5e94\u5bf9\u3002\u53d1\u5e03\u7684\u6a21\u578b\u201cLongLLaVA\u201d\uff08\u957f\u671f\u8bed\u8a00\u4e0e\u89c6\u89c9\u52a9\u624b\uff09\u662f\u9996\u4e2a\u6d
f7\u5408\u578bMLLM\uff0c\u5b9e\u73b0\u4e86\u6548\u7387\u4e0e\u6548\u679c\u4e4b\u95f4\u7684\u826f\u597d\u5e73\u8861\u3002LongLLaVA\u4e0d\u4ec5\u5728\u5404\u79cd\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u53d6\u5f97\u4e86\u7ade\u4e89\u529b\u7684\u7ed3\u679c\uff0c\u800c\u4e14\u4fdd\u6301\u4e86\u9ad8\u541e\u5410\u91cf\u548c\u4f4e\u5185\u5b58\u6d88\u8017\u7684\u7279\u70b9\u3002\u7279\u522b\u5730\uff0c\u5b83\u80fd\u591f\u5728\u5355\u4e2aA100 80GB GPU\u4e0a\u5904\u7406\u8fd1\u4e00\u5343\u5f20\u56fe\u7247\uff0c\u5c55\u793a\u4e86\u5e7f\u6cdb\u4efb\u52a1\u5e94\u7528\u524d\u666f\u7684\u6f5c\u529b\u3002**|\n", "2409.02841": "|**2024-09-04**|**Historical German Text Normalization Using Type- and Token-Based Language Modeling**|Anton Ehrmanntraut et.al.|[2409.02841](http://arxiv.org/abs/2409.02841)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u9488\u5bf91700\u5e74\u81f31900\u5e74\u5fb7\u56fd\u6587\u5b66\u6587\u672c\u7684\u6b63\u8bcd\u6cd5\u89c4\u8303\u5316\u7cfb\u7edf\uff0c\u8be5\u7cfb\u7edf\u57fa\u4e8e\u5e73\u884c\u8bed\u6599\u5e93\u8bad\u7ec3\u3002\u6240\u63d0\u51fa\u7684\u7cfb\u7edf\u5229\u7528\u673a\u5668\u5b66\u4e60\u65b9\u6cd5\u548cTransformer\u8bed\u8a00\u6a21\u578b\uff0c\u7ed3\u5408\u7f16\u7801\u5668-\u89e3\u7801\u5668\u6a21\u578b\u5bf9\u5355\u4e2a\u8bcd\u6c47\u7c7b\u578b\u8fdb\u884c\u89c4\u8303\u5316\uff0c\u5e76\u901a\u8fc7\u9884\u8bad\u7ec3\u7684\u56e0\u679c\u8bed\u8a00\u6a21\u578b\u5728\u4e0a\u4e0b\u6587\u4e2d\u8c03\u6574\u8fd9\u4e9b\u89c4\u8303\u5316\u7ed3\u679c\u3002\u5e7f\u6cdb\u8bc4\u4f30\u8868\u660e\uff0c\u8be5\u63d0\u51fa\u7684\u7cfb\u7edf\u63d0\u4f9b\u4e86\u6700\u5148\u8fdb\u7684\u51c6\u786e\u6027\uff0c\u4e0e\u5b8c\u5168\u7aef\u5230\u7aef\u7684\u53e5\u5b50\u7ea7\u89c4\u8303\u5316\u7cfb\u7edf\u76f8\u5f53\uff0c\u8be5\u7cfb\u7edf\u662f\u901a\u8fc7\u5bf9\u9884\u8bad\u7ec3\u7684Transformer\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u5fae\u8c03\u800c\u5b9e\u73b0\u7684\u3002\u7136\u800c\uff0c\u7531\u4e8e\u6a21\u578b\u96be\u4ee5\u6cdb\u5316\u4ee5\u53ca\u7f3a\u4e4f\u5927\u91
cf\u9ad8\u8d28\u91cf\u5e73\u884c\u6570\u636e\uff0c\u5386\u53f2\u6587\u672c\u7684\u89c4\u8303\u5316\u4ecd\u662f\u4e00\u4e2a\u6311\u6218\u3002|\n", "2409.02836": "|**2024-09-04**|**Exploring Sentiment Dynamics and Predictive Behaviors in Cryptocurrency Discussions by Few-Shot Learning with Large Language Models**|Moein Shahiki Tash et.al.|[2409.02836](http://arxiv.org/abs/2409.02836)|null|\u672c\u6587\u901a\u8fc7\u8fd0\u7528\u9ad8\u7ea7\u81ea\u7136\u8bed\u8a00\u5904\u7406\u6280\u672f\uff0c\u5bf9\u52a0\u5bc6\u8d27\u5e01\u76f8\u5173\u8ba8\u8bba\u4e2d\u7684\u9884\u6d4b\u9648\u8ff0\u3001\u5e0c\u671b\u6f14\u8bb2\u53ca\u6094\u6068\u68c0\u6d4b\u884c\u4e3a\u8fdb\u884c\u5206\u6790\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u5206\u7c7b\u65b9\u6cd5\u2014\u2014\u201c\u9884\u6d4b\u9648\u8ff0\u201d\uff0c\u5c06\u5176\u7ec6\u5206\u4e3a\u9884\u6d4b\u589e\u52a0\u3001\u9884\u6d4b\u51cf\u5c11\u3001\u9884\u6d4b\u4e2d\u7acb\u6216\u975e\u9884\u6d4b\u7c7b\u522b\u3002\u5229\u7528GPT-4o\u8fd9\u4e00\u524d\u6cbf\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff0c\u6211\u4eec\u5728\u4e94\u5927\u4e3b\u6d41\u52a0\u5bc6\u8d27\u5e01\uff08Cardano\u3001Binance\u3001Matic\u3001Fantom\u3001Ripple\uff09\u7684\u8ba8\u8bba\u4e2d\u63a2\u7d22\u4e86\u60c5\u7eea\u52a8\u6001\u3002\u7814\u7a76\u53d1\u73b0\uff0cMatic\u5728\u4e50\u89c2\u9884\u6d4b\u65b9\u9762\u663e\u793a\u51fa\u7279\u522b\u9ad8\u7684\u503e\u5411\u6027\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63a2\u8ba8\u4e86\u5e0c\u671b\u4e0e\u6094\u6068\u60c5\u7eea\u4e4b\u95f4\u7684\u76f8\u4e92\u4f5c\u7528\uff0c\u63ed\u793a\u4e86\u8fd9\u4e9b\u60c5\u611f\u4e0e\u9884\u6d4b\u884c\u4e3a\u4e4b\u95f4\u590d\u6742\u7684\u4e92\u52a8\u6a21\u5f0f\u3002\u5c3d\u7ba1\u9762\u4e34\u6570\u636e\u91cf\u548c\u8d44\u6e90\u53ef\u7528\u6027\u65b9\u9762\u7684\u9650\u5236\uff0c\u6211\u4eec\u7684\u7814\u7a76\u4ecd\u63ed\u793a\u4e86\u52a0\u5bc6\u8d27\u5e01\u5e02\u573a\u6295\u8d44\u8005\u884c\u4e3a\u548c\u60c5\u7eea\u8d8b\u52bf\u7684\u91cd\u8981\u53d1\u73b0\uff0c\u4e3a\u6218\u
7565\u51b3\u7b56\u548c\u672a\u6765\u7814\u7a76\u63d0\u4f9b\u4e86\u4fe1\u606f\u3002|\n", "2409.02834": "|**2024-09-04**|**CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models**|Wentao Liu et.al.|[2409.02834](http://arxiv.org/abs/2409.02834)|null|\u672c\u6587\u53d1\u5e03\u4e86\u4e00\u4e2a\u540d\u4e3aCMM-Math\u7684\u4e2d\u6587\u591a\u6a21\u6001\u6570\u5b66\u6570\u636e\u96c6\uff0c\u5305\u542b\u57fa\u51c6\u548c\u8bad\u7ec3\u90e8\u5206\uff0c\u65e8\u5728\u8bc4\u4f30\u548c\u589e\u5f3a\u5927\u578b\u591a\u6a21\u6001\u6a21\u578b\uff08LMM\uff09\u5728\u6570\u5b66\u63a8\u7406\u65b9\u9762\u7684\u8868\u73b0\u3002CMM-Math\u5305\u542b\u4e86\u8d85\u8fc728,000\u4e2a\u9ad8\u8d28\u91cf\u6837\u672c\uff0c\u6db5\u76d6\u4e86\u4ece\u5c0f\u5b66\u5230\u9ad8\u4e2d\u7684\u4e2d\u56fd12\u4e2a\u5e74\u7ea7\u7684\u591a\u79cd\u95ee\u9898\u7c7b\u578b\uff08\u4f8b\u5982\u9009\u62e9\u9898\u3001\u586b\u7a7a\u9898\u7b49\uff09\uff0c\u5e76\u63d0\u4f9b\u4e86\u8be6\u7ec6\u7684\u89e3\u51b3\u65b9\u6848\u3002\u7279\u522b\u5730\uff0c\u95ee\u9898\u6216\u89c2\u70b9\u4e2d\u53ef\u80fd\u5305\u542b\u89c6\u89c9\u4e0a\u4e0b\u6587\uff0c\u4f7f\u5f97\u8fd9\u4e2a\u6570\u636e\u96c6\u66f4\u5177\u6311\u6218\u6027\u3002\u901a\u8fc7\u5168\u9762\u5206\u6790\uff0c\u6211\u4eec\u53d1\u73b0\u5f53\u524d\u6700\u5148\u8fdb\u7684LMM\u5728CMM-Math\u6570\u636e\u96c6\u4e0a\u9762\u4e34\u6311\u6218\uff0c\u8fd9\u5f3a\u8c03\u4e86\u5728LMM\u5f00\u53d1\u65b9\u9762\u8fdb\u4e00\u6b65\u6539\u8fdb\u7684\u5fc5\u8981\u6027\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aMultimodal Mathematical 
LMM\uff08Math-LMM\uff09\u7684\u6a21\u578b\u6765\u5904\u7406\u6df7\u5408\u8f93\u5165\u7684\u591a\u4e2a\u56fe\u50cf\u548c\u6587\u672c\u6bb5\u843d\u7684\u95ee\u9898\u3002\u6211\u4eec\u91c7\u7528\u4e09\u4e2a\u9636\u6bb5\u8fdb\u884c\u6a21\u578b\u8bad\u7ec3\uff1a\u57fa\u7840\u9884\u8bad\u7ec3\u3001\u57fa\u7840\u5fae\u8c03\u548c\u6570\u5b66\u5fae\u8c03\u3002\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u8868\u660e\uff0c\u6211\u4eec\u7684\u6a21\u578b\u5728\u4e0e\u4e09\u4e2a\u591a\u6a21\u6001\u6570\u5b66\u6570\u636e\u96c6\u4e0a\u7684SOTA LMM\u8fdb\u884c\u6bd4\u8f83\u65f6\uff0c\u6709\u6548\u5730\u63d0\u9ad8\u4e86\u6570\u5b66\u63a8\u7406\u6027\u80fd\u3002|\n", "2409.02828": "|**2024-09-04**|**ExpLLM: Towards Chain of Thought for Facial Expression Recognition**|Xing Lan et.al.|[2409.02828](http://arxiv.org/abs/2409.02828)|null|\u9762\u90e8\u8868\u60c5\u8bc6\u522b\uff08FER\uff09\u5728\u591a\u5a92\u4f53\u9886\u57df\u81f3\u5173\u91cd\u8981\uff0c\u5bf9\u5404\u79cd\u5e94\u7528\u5177\u6709\u91cd\u5927\u5f71\u54cd\u3002\u7136\u800c\uff0c\u7406\u89e3\u9762\u90e8\u8868\u60c5\u7684\u539f\u56e0\u5bf9\u4e8e\u51c6\u786e\u8bc6\u522b\u8868\u60c5\u81f3\u5173\u91cd\u8981\u3002\u76ee\u524d\u7684\u65b9\u6cd5\uff0c\u5982\u57fa\u4e8e\u9762\u90e8\u52a8\u4f5c\u5355\u4f4d\uff08AUs\uff09\u7684\u65b9\u6cd5\uff0c\u901a\u5e38\u63d0\u4f9bAU\u540d\u79f0\u548c\u5f3a\u5ea6\uff0c\u4f46\u7f3a\u4e4f\u5173\u4e8eAU\u4e4b\u95f4\u7684\u4e92\u52a8\u4ee5\u53ca\u6574\u4f53\u8868\u60c5\u4e4b\u95f4\u5173\u7cfb\u7684\u6d1e\u5bdf\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aExpLLM\u7684\u65b0\u65b9\u6cd5\uff0c\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u751f\u6210\u9762\u90e8\u8868\u60c5\u8bc6\u522b\u7684\u51c6\u786e\u601d\u7ef4\u94fe\uff08CoT\uff09\u3002\u6211\u4eec\u4ece\u4e09\u4e2a\u5173\u952e\u89c6\u89d2\u8bbe\u8ba1\u4e86CoT\u673a\u5236\uff1a\u5173\u952e\u89c2\u5bdf\u3001\u603b\u4f53\u60c5\u611f\u89e3\u91ca\u548c\u7ed3\u8bba\u3002\u5173\u952e\u89c2\u5bdf\u63cf\u8ff0\u4e86AU\u7684\u540d\u79f0\u3001\u5f3a\u5ea6\u53ca
\u5176\u76f8\u5173\u60c5\u611f\u3002\u603b\u4f53\u60c5\u611f\u89e3\u91ca\u57fa\u4e8e\u591a\u4e2aAU\u53ca\u5176\u4e92\u52a8\u8fdb\u884c\u5206\u6790\uff0c\u786e\u5b9a\u4e3b\u5bfc\u60c5\u611f\u53ca\u5176\u5173\u7cfb\u3002\u6700\u540e\uff0c\u7ed3\u8bba\u57fa\u4e8e\u524d\u4e00\u5206\u6790\u5f97\u51fa\u6700\u7ec8\u7684\u8868\u60c5\u6807\u7b7e\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5f15\u5165\u4e86Exp-CoT\u5f15\u64ce\uff0c\u7528\u4e8e\u6784\u5efa\u6b64\u8868\u60c5CoT\u5e76\u751f\u6210\u6307\u4ee4\u63cf\u8ff0\u6570\u636e\u4ee5\u8bad\u7ec3\u6211\u4eec\u7684ExpLLM\u3002\u5728RAF-DB\u548cAffectNet\u6570\u636e\u96c6\u4e0a\u7684\u5927\u91cf\u5b9e\u9a8c\u8868\u660e\uff0cExpLLM\u4f18\u4e8e\u5f53\u524d\u6700\u5148\u8fdb\u7684\u9762\u90e8\u8868\u60c5\u8bc6\u522b\u65b9\u6cd5\u3002\u5728\u5fae\u8868\u60c5\u8bc6\u522b\u65b9\u9762\uff0cExpLLM\u4e5f\u8d85\u8d8a\u4e86\u6700\u65b0\u7684GPT-4o\uff0c\u5c24\u5176\u662f\u5728GPT-4o\u7ecf\u5e38\u5931\u8d25\u7684\u60c5\u51b5\u4e0b\u3002|\n", "2409.02823": "|**2024-09-04**|**Design Contradictions: Help or Hindrance?**|Aron E. 
Owen et.al.|[2409.02823](http://arxiv.org/abs/2409.02823)|null|\u5728\u6570\u636e\u53ef\u89c6\u5316\u9886\u57df\uff0c\u521b\u65b0\u601d\u7ef4\u7684\u8feb\u5207\u9700\u6c42\u4fc3\u4f7f\u6211\u4eec\u63a2\u7d22\u65b0\u7684\u521b\u610f\u65b9\u6cd5\u3002\u901a\u8fc7\u7ec4\u5408\u4e24\u4e2a\u6216\u66f4\u591a\u5177\u6709\u5bf9\u7acb\u6027\u8d28\u7684\u521b\u9020\u6027\u8bcd\u6c47\uff0c\u80fd\u591f\u6fc0\u53d1\u65b0\u578b\u60f3\u6cd5\u4e0e\u8bbe\u8ba1\uff0c\u5bf9\u521b\u610f\u8fc7\u7a0b\u4ea7\u751f\u79ef\u6781\u5f71\u54cd\u3002\u968f\u7740\u4eba\u5de5\u667a\u80fd\u9a71\u52a8\u8bbe\u8ba1\u7684\u53d1\u5c55\uff0c\u4e00\u4e2a\u5173\u952e\u95ee\u9898\u6d6e\u51fa\u6c34\u9762\uff1a\u8fd9\u4e9b\u8bbe\u8ba1\u77db\u76fe\u662f\u5426\u80fd\u4e0eAI\u5de5\u5177\u534f\u540c\u5de5\u4f5c\uff1f\u76ee\u524d\u7b54\u6848\u662f\u5426\u5b9a\u7684\u3002AI\u7cfb\u7edf\uff0c\u5c24\u5176\u662f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\uff0c\u4f9d\u8d56\u4e8e\u4ea7\u751f\u76f8\u4f3c\u6027\u7684\u7b97\u6cd5\uff0c\u800c\u521b\u9020\u529b\u5f80\u5f80\u9700\u8981\u5dee\u5f02\u6027\u548c\u65b0\u9896\u6027\u3002\u8fd9\u4efd\u6d77\u62a5\u5f00\u542f\u4e86\u5173\u4e8e\u5982\u4f55\u5f15\u5bfcAI\u7cfb\u7edf\u53d8\u5f97\u66f4\u5177\u521b\u9020\u6027\u548c\u751f\u6210\u65b0\u60f3\u6cd5\u7684\u5bf9\u8bdd\u3002\u8fd9\u9879\u7814\u7a76\u9080\u8bf7\u6211\u4eec\u91cd\u65b0\u8003\u8651\u4f20\u7edf\u8bbe\u8ba1\u65b9\u6cd5\uff0c\u5e76\u63a2\u7d22AI\u9a71\u52a8\u4e16\u754c\u4e2d\u7684\u65b0\u65b9\u6cd5\u3002\u6211\u4eec\u80fd\u5426\u5e94\u7528\u4f20\u7edf\u7684\u8bbe\u8ba1\u65b9\u6cd5\uff0c\u5982\u53cc\u94bb\u77f3\u6a21\u578b\uff0c\u6216\u8005\u662f\u5426\u9700\u8981\u65b0\u7684\u8bbe\u8ba1\u5de5\u7a0b\u65b9\u6cd5\uff1f\u5982\u4f55\u5229\u7528\u751f\u6210\u5f0fAI\u5feb\u901f\u8bbe\u8ba1\u53ef\u89c6\u5316\u5e76\u6784\u601d\u65b0\u60f3\u6cd5\uff1f\u8fd9\u7bc7\u8bba\u6587\u65e8\u5728\u5f00\u542f\u8fd9\u4e00\u91cd\u8981\u5bf9\u8bdd\uff0c\u5e76\u63d0\u4f9b\u6709\u5173AI\u5728\u63a8\u52a8\u6570\u636e\u53ef\u89c6\u5316\
u521b\u610f\u65b9\u9762\u7684\u6f5c\u529b\u7684\u5b9e\u7528\u89c1\u89e3\u3002|\n", "2409.02822": "|**2024-09-04**|**Language Understanding as a Constraint on Consensus Size in LLM Societies**|Giordano De Marzo et.al.|[2409.02822](http://arxiv.org/abs/2409.02822)|null|\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u5e94\u7528\u671d\u7740\u534f\u4f5c\u4efb\u52a1\u53d1\u5c55\u7684\u60c5\u51b5\u4e0b\uff0c\u591a\u4e2a\u4ee3\u7406\u76f8\u4e92\u4f5c\u7528\uff0c\u5982\u540c\u4e00\u4e2aLLM\u793e\u4f1a\u3002\u5728\u8fd9\u79cd\u80cc\u666f\u4e0b\uff0c\u5927\u91cf\u7684LLM\u80fd\u591f\u901a\u8fc7\u81ea\u6211\u7ec4\u7ec7\u65b9\u5f0f\u8fbe\u6210\u5173\u4e8e\u4efb\u610f\u89c4\u8303\u7684\u5171\u8bc6\uff0c\u8fd9\u4e9b\u89c4\u8303\u5728\u4fe1\u606f\u652f\u6301\u67d0\u4e00\u9009\u9879\u4f18\u4e8e\u53e6\u4e00\u9009\u9879\u7684\u60c5\u51b5\u4e0b\u4e0d\u5b58\u5728\u3002\u4e3a\u4e86\u7406\u89e3LLM\u662f\u5426\u4e0e\u4eba\u7c7b\u793e\u4f1a\u4e00\u6837\uff0c\u5728\u6ca1\u6709\u673a\u6784\u7684\u60c5\u51b5\u4e0b\u80fd\u591f\u8fbe\u5230\u5171\u8bc6\uff0c\u6211\u4eec\u5e94\u7528\u4e86\u590d\u6742\u79d1\u5b66\u7684\u65b9\u6cd5\u548c\u884c\u4e3a\u79d1\u5b66\u7684\u539f\u5219\uff0c\u5f00\u521b\u4e86\u4e00\u79cdAI\u4eba\u7c7b\u5b66\u7684\u65b0\u65b9\u6cd5\u3002\u7814\u7a76\u53d1\u73b0\uff0cLLM\u80fd\u591f\u5728\u7fa4\u4f53\u4e2d\u8fbe\u6210\u5171\u8bc6\uff0c\u5e76\u4e14LLM\u7684\u610f\u89c1\u52a8\u6001\u53ef\u4ee5\u7528\u4e00\u4e2a\u7531\u591a\u6570\u529b\u91cf\u7cfb\u6570\u53c2\u6570\u5316\u7684\u51fd\u6570\u6765\u7406\u89e3\uff0c\u8be5\u7cfb\u6570\u51b3\u5b9a\u4e86\u5171\u8bc6\u662f\u5426\u53ef\u80fd\u3002\u5bf9\u4e8e\u5177\u6709\u66f4\u9ad8\u8bed\u8a00\u7406\u89e3\u80fd\u529b\u7684\u6a21\u578b\u800c\u8a00\uff0c\u8fd9\u79cd\u591a\u6570\u529b\u91cf\u66f4\u5f3a\uff0c\u800c\u5bf9\u4e8e\u8f83\u5927\u7684\u7fa4\u4f53\u800c\u8a00\u5219\u4f1a\u51cf\u5f31\uff0c\u5bfc\u81f4\u5b58\u5728\u4e00\u4e2a\u4e34\u754c\u7fa4\u4f53\u5927\u5c0f\uff0c\u8d85\u8fc7\u8fd9\u4e2a\u5927\u5c0f\uff0c\u5bf9\u
4e8e\u7ed9\u5b9a\u7684LLM\uff0c\u8fbe\u6210\u5171\u8bc6\u53d8\u5f97\u4e0d\u53ef\u80fd\u3002\u8fd9\u4e00\u4e34\u754c\u7fa4\u4f53\u5927\u5c0f\u968f\u7740\u6a21\u578b\u7684\u8bed\u8a00\u7406\u89e3\u80fd\u529b\u7684\u589e\u957f\u5448\u6307\u6570\u7ea7\u589e\u957f\uff0c\u5bf9\u4e8e\u6700\u5148\u8fdb\u7684\u6a21\u578b\u800c\u8a00\uff0c\u5176\u53ef\u4ee5\u8fbe\u5230\u8fdc\u8d85\u975e\u6b63\u5f0f\u4eba\u7c7b\u7fa4\u4f53\u5178\u578b\u89c4\u6a21\u7684\u6570\u91cf\u7ea7\u3002|\n", "2409.02795": "|**2024-09-04**|**Towards a Unified View of Preference Learning for Large Language Models: A Survey**|Bofei Gao et.al.|[2409.02795](http://arxiv.org/abs/2409.02795)|**[link](https://github.com/kbsdjames/awesome-llm-preference-learning)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5c55\u73b0\u4e86\u60ca\u4eba\u7684\u80fd\u529b\u3002\u5b9e\u73b0\u6210\u529f\u7684\u5173\u952e\u56e0\u7d20\u4e4b\u4e00\u662f\u4f7fLLM\u7684\u8f93\u51fa\u4e0e\u4eba\u7c7b\u504f\u597d\u4fdd\u6301\u4e00\u81f4\u3002\u8fd9\u4e00\u8fc7\u7a0b\u901a\u5e38\u9700\u8981\u5c11\u91cf\u6570\u636e\u5c31\u80fd\u9ad8\u6548\u63d0\u5347LLM\u7684\u8868\u73b0\u3002\u5c3d\u7ba1\u6709\u6548\uff0c\u4f46\u5728\u8fd9\u4e00\u9886\u57df\u7684\u7814\u7a76\u8986\u76d6\u4e86\u591a\u4e2a\u9886\u57df\uff0c\u76f8\u5173\u65b9\u6cd5\u76f8\u5bf9\u590d\u6742\u96be\u4ee5\u7406\u89e3\u3002\u4e0d\u540c\u65b9\u6cd5\u4e4b\u95f4\u7684\u5173\u7cfb\u5c1a\u672a\u5f97\u5230\u5145\u5206\u63a2\u7d22\uff0c\u9650\u5236\u4e86\u504f\u597d\u8c03\u6574\u7b56\u7565\u7684\u53d1\u5c55\u3002\u9274\u4e8e\u6b64\uff0c\u6211\u4eec\u5206\u89e3\u4e86\u73b0\u6709\u6d41\u884c\u8c03\u6574\u7b56\u7565\u7684\u56db\u4e2a\u7ec4\u6210\u90e8\u5206\uff0c\u5e76\u63d0\u4f9b\u4e86\u4e00\u4e2a\u7edf\u4e00\u6846\u67b6\u6765\u7814\u7a76\u5f53\u524d\u7684\u8c03\u6574\u7b56\u7565\uff0c\u4ee5\u6b64\u5efa\u7acb\u5b83\u4eec\u4e4b\u95f4\u7684\u8054\u7cfb\u3002\u5728\u672c\u6587\u7efc\u8ff0\u4e2d\uff0c\u6211\u4eec\u5c06\u6240\u6709\u504f\u597d\u5b66\u4e60\u7b56\u7565\u5206\u89e3\u4e3a\u5
6db\u4e2a\u90e8\u5206\uff1a\u6a21\u578b\u3001\u6570\u636e\u3001\u53cd\u9988\u548c\u7b97\u6cd5\u3002\u8fd9\u79cd\u7edf\u4e00\u89c6\u89d2\u4e3a\u73b0\u6709\u8c03\u6574\u7b97\u6cd5\u63d0\u4f9b\u4e86\u6df1\u5165\u7406\u89e3\uff0c\u5e76\u4e14\u4e5f\u5f00\u542f\u4e86\u6574\u5408\u4e0d\u540c\u7b56\u7565\u4f18\u52bf\u7684\u53ef\u80fd\u6027\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8be6\u7ec6\u4ecb\u7ecd\u4e86\u73b0\u6709\u4e3b\u6d41\u7b97\u6cd5\u7684\u5de5\u4f5c\u793a\u4f8b\uff0c\u4ee5\u5e2e\u52a9\u8bfb\u8005\u5168\u9762\u4e86\u89e3\u3002\u6700\u540e\uff0c\u57fa\u4e8e\u6211\u4eec\u7684\u7edf\u4e00\u89c6\u89d2\uff0c\u6211\u4eec\u63a2\u8ba8\u4e86\u8c03\u6574\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4e0e\u4eba\u7c7b\u504f\u597d\u4e4b\u95f4\u7684\u6311\u6218\u4ee5\u53ca\u672a\u6765\u7814\u7a76\u65b9\u5411\u3002|\n", "2409.03752": "|**2024-09-05**|**Attention Heads of Large Language Models: A Survey**|Zifan Zheng et.al.|[2409.03752](http://arxiv.org/abs/2409.03752)|**[link](https://github.com/iaar-shanghai/awesome-attention-heads)**|**\u81eaChatGPT\u95ee\u4e16\u4ee5\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u5404\u79cd\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5b83\u4eec\u4ecd\u7136\u4f5c\u4e3a\u9ed1\u76d2\u7cfb\u7edf\u5b58\u5728\u3002\u56e0\u6b64\uff0c\u5176\u53d1\u5c55\u4e3b\u8981\u4f9d\u8d56\u4e8e\u6570\u636e\u9a71\u52a8\u7684\u65b9\u6cd5\uff0c\u9650\u5236\u4e86\u901a\u8fc7\u6539\u53d8\u5185\u90e8\u67b6\u6784\u548c\u63a8\u7406\u8def\u5f84\u6765\u63d0\u5347\u6027\u80fd\u7684\u53ef\u80fd\u6027\u3002\u8bb8\u591a\u7814\u7a76\u8005\u5f00\u59cb\u63a2\u7d22\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u5185\u90e8\u673a\u5236\uff0c\u65e8\u5728\u8bc6\u522b\u63a8\u7406\u74f6\u9888\u7684\u672c\u8d28\uff0c\u5927\u591a\u6570\u7814\u7a76\u96c6\u4e2d\u5728\u6ce8\u610f\u529b\u5934\u90e8\u4e0a\u3002\u6211\u4eec\u7684\u7efc\u8ff0\u65e8\u5728\u901a\u8fc7\u805a\u7126\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u53ef\u89e3\u91ca\u6027\u548c\u6ce8\u610f\u529b\u5934\u90e8\u7684\u51
85\u5728\u673a\u5236\uff0c\u63ed\u793a\u5176\u5185\u90e8\u63a8\u7406\u8fc7\u7a0b\u3002\u9996\u5148\uff0c\u6211\u4eec\u5c06\u4eba\u7c7b\u601d\u8003\u8fc7\u7a0b\u63d0\u70bc\u4e3a\u56db\u4e2a\u9636\u6bb5\u6846\u67b6\uff1a\u77e5\u8bc6\u56de\u5fc6\u3001\u60c5\u5883\u5185\u8bc6\u522b\u3001\u6f5c\u5728\u63a8\u7406\u548c\u8868\u8fbe\u51c6\u5907\u3002\u5229\u7528\u8fd9\u4e00\u6846\u67b6\uff0c\u6211\u4eec\u7cfb\u7edf\u5730\u56de\u987e\u73b0\u6709\u7814\u7a76\uff0c\u8bc6\u522b\u5e76\u5206\u7c7b\u7279\u5b9a\u6ce8\u610f\u529b\u5934\u90e8\u7684\u529f\u80fd\u3002\u6b64\u5916\uff0c\u6211\u4eec\u603b\u7ed3\u4e86\u53d1\u73b0\u8fd9\u4e9b\u7279\u6b8a\u5934\u90e8\u6240\u4f7f\u7528\u7684\u5b9e\u9a8c\u65b9\u6cd5\uff0c\u5206\u4e3a\u65e0\u6a21\u578b\u65b9\u6cd5\u548c\u6709\u6a21\u578b\u65b9\u6cd5\u4e24\u5927\u7c7b\u3002\u6211\u4eec\u4e5f\u6982\u8ff0\u4e86\u76f8\u5173\u8bc4\u4f30\u65b9\u6cd5\u548c\u57fa\u51c6\u3002\u6700\u540e\uff0c\u6211\u4eec\u8ba8\u8bba\u5f53\u524d\u7814\u7a76\u7684\u5c40\u9650\u6027\uff0c\u5e76\u63d0\u51fa\u51e0\u4e2a\u6f5c\u5728\u7684\u53d1\u5c55\u65b9\u5411\u3002\u6211\u4eec\u7684\u53c2\u8003\u6587\u732e\u5217\u8868\u5f00\u6e90\u4e8e\u3002**|\n", "2409.03735": "|**2024-09-05**|**LLM-CI: Assessing Contextual Integrity Norms in Language Models**|Yan Shvartzshnaider 
et.al.|[2409.03735](http://arxiv.org/abs/2409.03735)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u4ece\u4e92\u8054\u7f51\u4e0a\u6536\u96c6\u7684\u6570\u636e\u4e2d\u8bb0\u5fc6\u90e8\u5206\u8bad\u7ec3\u6570\u636e\u7684\u540c\u65f6\uff0c\u4e5f\u53ef\u80fd\u65e0\u610f\u4e2d\u7f16\u7801\u4e86\u793e\u4f1a\u504f\u597d\u548c\u89c4\u8303\u3002\u968f\u7740\u8fd9\u4e9b\u6a21\u578b\u88ab\u6574\u5408\u5230\u793e\u4f1a\u6280\u672f\u7cfb\u7edf\u4e2d\uff0c\u786e\u4fdd\u5b83\u4eec\u7f16\u7801\u7684\u89c4\u8303\u7b26\u5408\u793e\u4f1a\u671f\u671b\u81f3\u5173\u91cd\u8981\u3002\u8fd9\u4e9b\u89c4\u8303\u53ef\u80fd\u56e0\u6a21\u578b\u3001\u8d85\u53c2\u6570\u3001\u4f18\u5316\u6280\u672f\u4ee5\u53ca\u6570\u636e\u96c6\u7684\u4e0d\u540c\u800c\u4e0d\u540c\u3002\u7531\u4e8e\u63d0\u793a\u654f\u611f\u6027\u7684\u95ee\u9898\u2014\u2014\u5fae\u5c0f\u7684\u63d0\u793a\u53d8\u5316\u4f1a\u5bfc\u81f4\u4e0d\u540c\u7684\u54cd\u5e94\uff0c\u73b0\u6709\u7684\u8bc4\u4f30\u65b9\u6cd5\u53d8\u5f97\u4e0d\u53ef\u9760\u3002\u9700\u8981\u4e00\u4e2a\u5168\u9762\u7684\u6846\u67b6\u6765\u6db5\u76d6\u5404\u79cd\u6a21\u578b\u3001\u4f18\u5316\u548c\u6570\u636e\u96c6\uff0c\u5e76\u63d0\u4f9b\u53ef\u9760\u7684\u65b9\u6cd5\u6765\u8bc4\u4f30\u7f16\u7801\u7684\u89c4\u8303\u3002 
\u6211\u4eec\u63d0\u51fa\u4e86LLM-CI\uff0c\u8fd9\u662f\u7b2c\u4e00\u4e2a\u7528\u4e8e\u8bc4\u4f30LLM\u4e2d\u7f16\u7801\u9690\u79c1\u89c4\u8303\u7684\u5f00\u6e90\u6846\u67b6\u3002LLM-CI\u4f7f\u7528\u57fa\u4e8e\u4e0a\u4e0b\u6587\u5b8c\u6574\u6027\u56e0\u7d20\u7684\u60c5\u5883\u53d9\u8ff0\u65b9\u6cd5\u6765\u8bc4\u4f30\u4e0d\u540c\u4e0a\u4e0b\u6587\u4e2d\u548c\u4e0d\u540cLLM\u4e2d\u7684\u7f16\u7801\u89c4\u8303\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u591a\u63d0\u793a\u8bc4\u4f30\u65b9\u6cd5\u6765\u89e3\u51b3\u63d0\u793a\u654f\u611f\u6027\u95ee\u9898\uff0c\u901a\u8fc7\u4ec5\u4ece\u5bfc\u81f4\u591a\u4e2a\u53d8\u4f53\u4e00\u81f4\u54cd\u5e94\u7684\u63d0\u793a\u4e2d\u8bc4\u4f30\u89c4\u8303\uff0c\u4ee5\u5168\u9762\u8bc4\u4f30\u4f7f\u7528\u5148\u524d\u5de5\u4f5c\u4e2d\u7684IoT\u548cCOPPA\u60c5\u666f\u6570\u636e\u96c6\u7684LLM\u3002 \u901a\u8fc7\u4f7f\u7528LLM-CI\u548c\u6211\u4eec\u63d0\u51fa\u7684\u8fd9\u79cd\u65b9\u6cd5\uff0c\u6211\u4eec\u5168\u9762\u5730\u8bc4\u4f30\u4e86LLM\uff0c\u7814\u7a76\u4e86\u6a21\u578b\u5c5e\u6027\uff08\u5982\u8d85\u53c2\u6570\u3001\u5bb9\u91cf\uff09\u548c\u4f18\u5316\u7b56\u7565\uff08\u5982\u5bf9\u9f50\u3001\u91cf\u5316\uff09\u7684\u5f71\u54cd\u3002|\n", "2409.03734": "|**2024-09-05**|**Safety vs. 
Performance: How Multi-Objective Learning Reduces Barriers to Market Entry**|Meena Jagadeesan et.al.|[2409.03734](http://arxiv.org/abs/2409.03734)|null|This paper studies, from both an economic and an algorithmic perspective, concentration in markets for large-scale machine learning (ML) models such as large language models, and whether there are insurmountable barriers to entry in such markets. We examine how barriers to entry can be reduced by formalizing a multi-objective high-dimensional regression framework that captures reputational damage, and we analyze the number of data points a new company needs to enter the market. Our results show that multi-objective considerations can fundamentally reduce barriers to entry: the required number of data points can be far smaller than the incumbent company's dataset size. En route to proving these results, we develop scaling laws for high-dimensional linear regression in multi-objective environments, showing that the scaling rate becomes slower when the dataset size is large, a finding that may be of independent interest.|\n", "2409.03733": "|**2024-09-05**|**Planning In Natural Language Improves LLM Search For Code Generation**|Evan Wang 
et.al.|[2409.03733](http://arxiv.org/abs/2409.03733)|null|While training compute has been scaled up massively, scaling inference compute has not yet brought comparable gains. We hypothesize that a core missing component is a lack of diversity in model outputs, which makes search inefficient because models repeatedly produce highly similar yet incorrect results. We show empirically that increasing output diversity effectively mitigates this problem. Building on this finding, we propose PLANSEARCH, a novel search algorithm that performs strongly on HumanEval+, MBPP+, and LiveCodeBench (a contamination-free benchmark for competitive programming). By generating a diverse set of observations about a problem and using them to construct solution strategies, the algorithm explores a broader space of candidate solutions than conventional methods. Using PLANSEARCH on top of Claude 3.5 
Sonnet, we reach a pass@200 of 77.0% on LiveCodeBench, outperforming both the result without search (pass@1 = 41.4%) and plain repeated sampling (pass@200 = 60.6%). Finally, we show that the performance gains from search can be accurately predicted, with the diversity of the generated ideas as the key factor.|\n", "2409.03708": "|**2024-09-06**|**RAG based Question-Answering for Contextual Response Prediction System**|Sriram Veturi et.al.|[2409.03708](http://arxiv.org/abs/2409.03708)|null|This paper presents an end-to-end framework that uses the retrieval-augmented generation (RAG) capabilities of large language models (LLMs) for question answering in a real-world industrial setting. Given a customer query, the system retrieves relevant knowledge documents and combines them with the previous chat history to generate response suggestions for customer service agents in the contact center of a retail company. Comprehensive automated and human evaluations show that this solution outperforms the current BERT-based algorithms in accuracy and relevance. Our findings suggest that RAG-based LLMs can be an excellent aid to human customer service representatives, lightening their workload.
|\n", "2409.03671": "|**2024-09-05**|**TRACE-cs: Trustworthy Reasoning for Contrastive Explanations in Course Scheduling Problems**|Stylianos Loukas Vasileiou et.al.|[2409.03671](http://arxiv.org/abs/2409.03671)|null|We present TRACE-cs, a novel hybrid system that combines symbolic reasoning with large language models (LLMs) to address contrastive queries in scheduling problems. TRACE-cs uses SAT solving techniques to encode scheduling constraints and generate explanations for user queries, while an LLM translates the user queries into logical clauses and refines the explanations produced by the symbolic solver into natural language sentences. By integrating these components, our approach demonstrates the potential of combining symbolic methods with LLMs to create explainable AI agents with correctness guarantees.|\n", "2409.03668": "|**2024-09-05**|**A Fused Large Language Model for Predicting Startup Success**|Abdurahman Maarouf 
et.al.|[2409.03668](http://arxiv.org/abs/2409.03668)|null|To help investors make sound decisions and keep finding profitable venture investments, the success of startups needs to be predicted. Nowadays, investors can draw not only on fundamental information about a startup (e.g., its age, the number of founders, and its business sector) but also on textual descriptions of its innovation and business model, which are available through online venture capital (VC) platforms such as Crunchbase. To support investor decision-making, we develop a machine learning approach that aims to locate successful startups on VC platforms. Specifically, we develop, train, and evaluate a tailored, fused large language model to predict startup success. Our work assesses to what extent the self-descriptions of companies on VC platforms are predictive of their success. Using 20,172 online profiles from Crunchbase, we find that our fused large language model can predict startup success, with the textual self-descriptions accounting for a significant part of the predictive power. 
Our work provides a decision support tool that helps investors find profitable investment opportunities.|\n", "2409.03662": "|**2024-09-05**|**The representation landscape of few-shot learning and fine-tuning in large language models**|Diego Doimo et.al.|[2409.03662](http://arxiv.org/abs/2409.03662)|**[link](https://github.com/diegodoimo/geometry_icl_finetuning)**|**This paper examines two common strategies for improving the performance of modern large language models (LLMs) on specific tasks: in-context learning (ICL) and supervised fine-tuning (SFT). Despite their different nature, the two strategies often yield similar performance gains. However, little is known about whether they induce similar representations inside LLMs. We address this question by analyzing the probability landscape of the hidden representations in the two cases. Specifically, we compare how LLMs solve the same question-answering task and find that ICL and SFT create very different internal structures, both of which undergo a sharp transition in the middle of the network. In the first half of the network, ICL shapes interpretable representations that are hierarchically organized according to their semantic content. In contrast, the probability landscape obtained with SFT is fuzzier and semantically mixed. In the second half of the network, 
the fine-tuned representations develop probability modes that better encode the identity of the answers, while the probability peaks of the ICL representations are less well defined. Our approach reveals the diverse computational strategies that LLMs adopt to solve the same task under different conditions, a step toward designing optimal methods for extracting information from language models.**|\n", "2409.03659": "|**2024-09-06**|**LLM-based multi-agent poetry generation in non-cooperative environments**|Ran Zhang et.al.|[2409.03659](http://arxiv.org/abs/2409.03659)|**[link](https://github.com/zhangr2021/Multiagent_poetry)**|**Although large language models (LLMs) have made significant progress in automatic poetry generation, the generated poetry lacks diversity, and the training process differs greatly from human learning. Motivated by this, we introduce a framework based on social learning in which we emphasize non-cooperative interactions, alongside cooperative ones, to encourage diversity. Our experiments are the first attempt at poetry generation in non-cooperative environments using both training-based multi-agent systems (GPT-2) and prompting-based systems (GPT-3 and GPT-4). 
Based on an evaluation of 96,000 generated poems, our framework benefits the poetry generation process of training-based agents: diversity increases by 3.0-3.7 percentage points (pp) and novelty by 5.6-11.3 pp, as measured by distinct and novel n-grams. The generated poetry also exhibits group divergence in lexicon, style, and semantics. Prompting-based agents also benefit from non-cooperative environments in our framework, and a more diverse ensemble of models with non-homogeneous agents has the potential to further enhance diversity, with an increase of 7.0-17.5 pp in our experiments. However, prompting-based agents show a decrease in lexical diversity over time and do not exhibit the group-based divergence intended in the social network. The paper argues for a paradigm shift in creative tasks such as automatic poetry generation, toward social learning processes that resemble human interaction (modeled via LLM-based agents), to promote more diverse and innovative generation.**|\n", "2409.03512": "|**2024-09-05**|**From MOOC to MAIC: Reshaping Online Teaching and Learning through 
LLM-driven Agents**|Jifan Yu et.al.|[2409.03512](http://arxiv.org/abs/2409.03512)|null|Since the first instances of online education, where courses were uploaded to accessible and shared online platforms, this way of scaling the dissemination of knowledge to reach a broader audience has sparked extensive discussion and widespread adoption. Recognizing that personalized learning still has room for improvement, AI technologies have been continuously integrated into this learning format, giving rise to educational AI applications such as educational recommendation and intelligent tutoring. The emergence of intelligence in large language models (LLMs) has allowed these educational enhancements to be built on a unified foundation model, enabling deeper integration. In this context, we propose MAIC (Massive AI-empowered Course), a new form of online education that uses LLM-driven multi-agent systems to build an AI-augmented classroom, balancing scalability with adaptivity. Beyond exploring the conceptual framework and technical innovations, we conduct preliminary experiments at Tsinghua University, one of China's leading universities. Drawing on more than 100,000 learning records from over 500 students, we obtain valuable observations and initial analyses. 
This project will continue to evolve, ultimately aiming to establish a comprehensive open platform that supports and unifies research, technology, and applications in exploring the possibilities of online education in the era of large-model AI. We envision this platform as a collaborative hub that brings together educators, researchers, and innovators to explore the future of AI-driven online education.|\n", "2409.04421": "|**2024-09-06**|**RLPF: Reinforcement Learning from Prediction Feedback for User Summarization with LLMs**|Jiaxing Wu et.al.|[2409.04421](http://arxiv.org/abs/2409.04421)|null|This paper introduces Reinforcement Learning from Prediction Feedback (RLPF), a method that addresses a problem large language models (LLMs) face when applied in personalization systems. When LLMs predict user behavior from past activity, their effectiveness often hinges on making good use of extensive user history data, which is typically noisy and very long. Existing pretrained LLMs may generate summaries that are concise but lack the context needed for downstream tasks, which limits their usefulness in personalization systems. 
To overcome this challenge, RLPF fine-tunes LLMs to generate concise, human-readable user summaries that are optimized for downstream task performance. By maximizing the usefulness of the generated summaries, RLPF distills extensive user history data while preserving the information essential for downstream tasks. Experiments show that, compared with baseline methods, RLPF improves downstream task performance by 22%, reaches a win rate of 84.59% on factuality, abstractiveness, and readability, reduces context length by 74%, and improves performance on 16 unseen tasks and/or datasets, indicating good generalization. Overall, RLPF offers a promising way to strengthen LLM personalization by turning long, noisy user histories into informative, easy-to-understand representations.|\n", "2409.04388": "|**2024-09-06**|**Question-Answering Dense Video Events**|Hangyu Qin 
et.al.|[2409.04388](http://arxiv.org/abs/2409.04388)|null|In this paper, we introduce a new task: question answering and grounding of dense events in long videos, which requires models to faithfully comprehend and reason about multiple events spanning extended periods of time. To support this study, we construct DeVE-QA, a dataset of 78,000 questions about 26,000 events in 10,600 long videos. Existing multimodal large language models (MLLMs) that excel at single-event question answering struggle on DeVE-QA, which points to their limitations in understanding and reasoning about multiple events that unfold over long durations. To address this, we propose DeVi, a novel training-free approach to improving MLLMs. DeVi augments existing MLLMs with three key modules: hierarchical captioning, temporal event memory, and self-consistency checking. These modules respectively detect, contextualize and memorize, and ground dense events in long videos for question answering. 
Experiments show that DeVi outperforms existing MLLMs at answering dense-event questions and grounding the relevant video moments. Specifically, DeVi improves G(round)QA accuracy by 4.1% on DeVE-QA and by 3.7% on NExT-GQA.|\n", "2409.04318": "|**2024-09-06**|**Learning vs Retrieval: The Role of In-Context Examples in Regression with LLMs**|Aliakbar Nafar et.al.|[2409.04318](http://arxiv.org/abs/2409.04318)|**[link](https://github.com/HLR/LvsR-LLM)**|This paper proposes a framework for evaluating the in-context learning mechanisms of generative large language models (LLMs). Focusing on regression tasks, we claim that these mechanisms are a combination of retrieving internal knowledge and learning from in-context examples. First, we show that LLMs can perform regression on real-world datasets, and then design experiments to measure the extent to which a model retrieves its internal knowledge versus learning from in-context examples. We argue that this process lies on a spectrum between these two extremes. We provide an in-depth analysis of the degree to which these mechanisms are triggered depending on various factors, such as prior knowledge about the task and the type and richness of the information provided by the in-context examples. 
We use three LLMs and multiple datasets to corroborate the robustness of our findings. Our results shed light on how to leverage meta-learning from in-context examples and promote knowledge retrieval, depending on the problem being addressed.|\n", "2409.04312": "|**2024-09-06**|**An optically accelerated extreme learning machine using hot atomic vapors**|Pierre Azam et.al.|[2409.04312](http://arxiv.org/abs/2409.04312)|null|Machine learning is becoming a widely used technology, growing at an impressive rate thanks to the variety of practical solutions it offers to problems of societal concern. However, as applications and required resources grow, current hardware technologies are reaching their limits. For new machine learning domains such as large language models or high-resolution image recognition in particular, computing time and energy cost have become critical issues. In this context, optical platforms have been designed over the years with the goal of developing more efficient machine learning hardware. 
Among them, free-space propagation platforms offer several advantages: parallelism, low energy consumption, and computational speed. This paper presents a new design that combines the strong, tunable nonlinear properties of a light beam propagating through a hot atomic vapor with an extreme learning machine model. Through numerical simulations and experiments, we demonstrate the training improvement obtained with such free-space nonlinear propagation on an MNIST image classification task. We also identify several experimental hyperparameters that can be further optimized to improve the platform's accuracy.|\n", "2409.04286": "|**2024-09-06**|**Using Large Language Models to Generate Authentic Multi-agent Knowledge Work Datasets**|Desiree Heim 
et.al.|[2409.04286](http://arxiv.org/abs/2409.04286)|null|Current publicly available knowledge work datasets lack diversity, extensive annotations, and contextual information about users and their documents, which hinders objective and comparable data-driven evaluation and optimization of knowledge work assistance systems. Because of the immense resources needed to collect such data in real-life settings and the need for data censorship, building such a dataset is close to impossible. We therefore propose a configurable, multi-agent knowledge work dataset generator. The system simulates collaborative knowledge work among agents that produce documents generated by large language models, and records the accompanying data traces. In addition, the generator captures all background information, whether given in its configuration or created during the simulation, in a knowledge graph. Finally, the resulting dataset can be used and shared without privacy or confidentiality concerns. 
This paper presents the design and vision of our approach and focuses on generating authentic knowledge work documents with large language models. In our study, human raters judged 53% of the generated documents and 74% of the real documents to be realistic, which demonstrates the potential of our approach. We also analyze the authenticity criteria mentioned in the participants' comments and elaborate on potential improvements for the common issues identified.|\n", "2409.04270": "|**2024-09-06**|**Advancing Automated Knowledge Transfer in Evolutionary Multitasking via Large Language Models**|Yuxiao Huang et.al.|[2409.04270](http://arxiv.org/abs/2409.04270)|null|This paper introduces an optimization paradigm based on large language models (LLMs) that establishes an autonomous model factory for generating knowledge transfer models suited to different optimization tasks. The approach aims to achieve efficient and effective knowledge transfer by automating the design process. To evaluate the proposed method, we conduct comprehensive empirical studies that compare the generated knowledge transfer models with existing state-of-the-art knowledge transfer methods. 
The results show that the generated models match or outperform hand-crafted knowledge transfer models in both efficiency and effectiveness.|\n", "2409.04183": "|**2024-09-06**|**GALLa: Graph Aligned Large Language Models for Improved Source Code Understanding**|Ziyin Zhang et.al.|[2409.04183](http://arxiv.org/abs/2409.04183)|null|In this work, we propose GALLa (Graph Aligned Large Language Models). GALLa uses graph neural networks and cross-modal alignment techniques to inject the structural information of code into LLMs as an auxiliary task during fine-tuning. The framework is both model-agnostic and task-agnostic: it can be applied to any code LLM for any downstream code task, and it requires structural graph data only at training time, from a corpus unrelated to the fine-tuning data, while incurring no extra cost at inference time. Experiments on five code tasks with four different baseline LLMs, ranging in size from 350M to 8B parameters, validate the effectiveness of GALLa, demonstrating consistent improvement even for strong models such as LLaMA3.|\n", "2409.04181": "|**2024-09-06**|**Combining LLMs and Knowledge Graphs to Reduce Hallucinations in Question Answering**|Larissa Pusch 
et.al.|[2409.04181](http://arxiv.org/abs/2409.04181)|null|\u81ea\u7136\u8bed\u8a00\u5904\u7406\u9886\u57df\u7684\u8fdb\u6b65\u6781\u5927\u5730\u6539\u53d8\u4e86\u6211\u4eec\u4e0e\u6570\u636e\u5e93\u7b49\u4fe1\u606f\u7cfb\u7edf\u7684\u4ea4\u4e92\u65b9\u5f0f\uff0c\u4f7f\u5176\u53d8\u5f97\u66f4\u52a0\u4fbf\u6377\u3002\u7136\u800c\uff0c\u5728\u5173\u952e\u51c6\u786e\u6027\u9886\u57df\uff0c\u5982\u751f\u7269\u533b\u5b66\u9886\u57df\uff0c\u4ecd\u5b58\u5728\u6311\u6218\u3002\u5176\u4e2d\u4e00\u4e2a\u91cd\u8981\u95ee\u9898\u662f\u5e7b\u89c9\u95ee\u9898\uff0c\u5373\u6a21\u578b\u751f\u6210\u4e86\u6570\u636e\u652f\u6301\u4e4b\u5916\u7684\u4fe1\u606f\uff0c\u8fd9\u53ef\u80fd\u5bfc\u81f4\u5371\u9669\u7684\u9519\u8bef\u4fe1\u606f\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u65e8\u5728\u901a\u8fc7\u7ed3\u5408\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u548c\u77e5\u8bc6\u56fe\u8c31\uff08KG\uff09\u6765\u6539\u5584\u95ee\u7b54\u7cfb\u7edf\u7684\u51c6\u786e\u6027\u548c\u53ef\u9760\u6027\uff0c\u4ee5\u751f\u7269\u533b\u5b66KG\u4e3a\u4f8b\u3002\u8be5\u65b9\u6cd5\u57fa\u4e8eLangChain\u6846\u67b6\u6784\u5efa\uff0c\u901a\u8fc7\u5f15\u5165\u67e5\u8be2\u68c0\u67e5\u5668\u786e\u4fddLLM\u751f\u6210\u7684\u67e5\u8be2\u5728\u8bed\u6cd5\u548c\u8bed\u4e49\u4e0a\u7684\u6709\u6548\u6027\uff0c\u7136\u540e\u4f7f\u7528\u8fd9\u4e9b\u67e5\u8be2\u4ece\u77e5\u8bc6\u56fe\u8c31\u4e2d\u63d0\u53d6\u4fe1\u606f\uff0c\u5927\u5e45\u51cf\u5c11\u4e86\u9519\u8bef\u5982\u5e7b\u89c9\u7684\u53d1\u751f\u3002 \u6211\u4eec\u4f7f\u7528\u4e86\u4e00\u4e2a\u5305\u542b50\u4e2a\u751f\u7269\u533b\u5b66\u95ee\u9898\u7684\u65b0\u57fa\u51c6\u6570\u636e\u96c6\u5bf9\u6574\u4f53\u6027\u80fd\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u6d4b\u8bd5\u4e86\u5305\u62ecGPT-4 Turbo\u548cllama3:70b\u5728\u5185\u7684\u51e0\u79cdLLM\u3002\u7ed3\u679c\u663e\u793a\uff0c\u867d\u7136GPT-4 
Turbo\u5728\u751f\u6210\u51c6\u786e\u67e5\u8be2\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5f00\u6e90\u6a21\u578b\u5982llama3:70b\u5728\u9002\u5f53\u7684\u95ee\u9898\u63d0\u793a\u5de5\u7a0b\u4e0b\u4e5f\u663e\u793a\u51fa\u6f5c\u529b\u3002\u4e3a\u4e86\u4f7f\u8fd9\u79cd\u65b9\u6cd5\u6613\u4e8e\u8bbf\u95ee\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2a\u7528\u6237\u53cb\u597d\u7684Web\u754c\u9762\uff0c\u5141\u8bb8\u7528\u6237\u8f93\u5165\u81ea\u7136\u8bed\u8a00\u67e5\u8be2\uff0c\u67e5\u770b\u751f\u6210\u548c\u4fee\u6b63\u7684Cypher\u67e5\u8be2\uff0c\u5e76\u9a8c\u8bc1\u7ed3\u679c\u8def\u5f84\u7684\u51c6\u786e\u6027\u3002 \u603b\u4f53\u800c\u8a00\uff0c\u8fd9\u79cd\u6df7\u5408\u65b9\u6cd5\u6709\u6548\u5730\u89e3\u51b3\u4e86\u6570\u636e\u7f3a\u53e3\u548c\u5e7b\u89c9\u7b49\u5e38\u89c1\u95ee\u9898\uff0c\u63d0\u4f9b\u4e86\u4e00\u4e2a\u53ef\u9760\u4e14\u76f4\u89c2\u7684\u89e3\u51b3\u65b9\u6848\u6765\u6539\u8fdb\u95ee\u7b54\u7cfb\u7edf\u3002\u751f\u6210\u672c\u6587\u7ed3\u679c\u548c\u7528\u6237\u754c\u9762\u6240\u9700\u6e90\u4ee3\u7801\u7684Git\u4ed3\u5e93\u94fe\u63a5\u5982\u4e0b\uff1ahttps://git.zib.de/lpusch/cyphergenkg-gui|\n", "2409.04168": "|**2024-09-06**|**From Calculation to Adjudication: Examining LLM judges on Mathematical Reasoning Tasks**|Andreas Stephan 
et.al.|[2409.04168](http://arxiv.org/abs/2409.04168)|null|\u4e3a\u4e86\u51cf\u5c11\u5bf9\u4eba\u5de5\u6807\u6ce8\u7684\u9700\u6c42\uff0c\u63d0\u51fa\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4f5c\u4e3a\u5019\u9009\u6a21\u578b\u8d28\u91cf\u7684\u8bc4\u5224\u8005\u3002\u8fd9\u4e9bLLM\u8bc4\u5224\u8005\u901a\u5e38\u901a\u8fc7\u5728\u6458\u8981\u6216\u673a\u5668\u7ffb\u8bd1\u7b49\u751f\u6210\u4efb\u52a1\u4e0a\u4e0e\u4eba\u7c7b\u5224\u65ad\u7684\u76f8\u5173\u6027\u6765\u8bc4\u4f30\u3002\u76f8\u6bd4\u4e4b\u4e0b\uff0c\u6211\u4eec\u7814\u7a76\u4e86\u5728\u6570\u5b66\u63a8\u7406\u4efb\u52a1\u4e0a\u7684LLM\u8bc4\u5224\u8005\u3002\u8fd9\u7c7b\u4efb\u52a1\u9700\u8981\u591a\u6b65\u63a8\u7406\uff0c\u5176\u89e3\u7b54\u7684\u6b63\u786e\u6027\u53ef\u4ee5\u9a8c\u8bc1\uff0c\u4ece\u800c\u63d0\u4f9b\u4e86\u4e00\u79cd\u66f4\u5ba2\u89c2\u7684\u8bc4\u4f30\u65b9\u5f0f\u3002\u6211\u4eec\u8fdb\u884c\u4e86\u8be6\u7ec6\u7684\u8868\u73b0\u5206\u6790\uff0c\u5e76\u53d1\u73b0\u4f7f\u7528\u7684\u8bc4\u5224\u8005\u5927\u591a\u65e0\u6cd5\u63d0\u9ad8\u4efb\u52a1\u6027\u80fd\uff0c\u4f46\u80fd\u591f\u9009\u62e9\u66f4\u597d\u7684\u6a21\u578b\u3002\u6211\u4eec\u7684\u5206\u6790\u63ed\u793a\u4e86\u8bc4\u5224\u8868\u73b0\u4e0e\u5019\u9009\u6a21\u578b\u4efb\u52a1\u8868\u73b0\u4e4b\u95f4\u7684\u5f3a\u76f8\u5173\u6027\u3002\u89c2\u5bdf\u5230\u8bc4\u5224\u8005\u503e\u5411\u4e8e\u9009\u62e9\u66f4\u9ad8\u8d28\u91cf\u7684\u6a21\u578b\uff0c\u5373\u4f7f\u5176\u7b54\u6848\u662f\u9519\u8bef\u7684\u3002\u8fdb\u4e00\u6b65\u5730\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u53ef\u4ee5\u901a\u8fc7\u7edf\u8ba1\u63aa\u65bd\uff0c\u5982\u5019\u9009\u6a21\u578b\u7684\u4efb\u52a1\u6027\u80fd\uff0c\u6765\u9884\u6d4b\u8bc4\u5224\u8868\u73b0\u3002\u5728\u6d88\u878d\u5b9e\u9a8c\u4e2d\uff0c\u6211\u4eec\u4ea4\u6362\u6216\u5c4f\u853d\u5019\u9009\u7b54\u6848\uff0c\u5e76\u89c2\u5bdf\u5230\u8bc4\u5224\u8005\u7ecf\u5e38\u4fdd\u6301\u539f\u59cb\u5224\u65ad\uff0c\u8fd9\u63d0\u4f9b\u4e86\u8bc1\u636e\u8868\u660e\u8bc4\u5224\u8005\u5728\
u5224\u65ad\u4e2d\u878d\u5165\u4e86\u5199\u4f5c\u98ce\u683c\u3002\u603b\u4e4b\uff0c\u6211\u4eec\u53d1\u73b0\u4f7f\u7528\u7edf\u8ba1\u6307\u6807\u91cf\u5316\u5224\u65ad\u4e2d\u7684\u89c4\u5f8b\u6027\uff0c\u5e76\u63d0\u4f9b\u4e86\u5229\u7528\u5b83\u4eec\u7684\u5404\u79cd\u89d2\u5ea6\u3002|\n", "2409.04164": "|**2024-09-06**|**Can OpenSource beat ChatGPT? -- A Comparative Study of Large Language Models for Text-to-Code Generation**|Luis Mayer et.al.|[2409.04164](http://arxiv.org/abs/2409.04164)|null|\u8fd1\u5e74\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4f5c\u4e3a\u4e00\u79cd\u5f3a\u5927\u7684\u5de5\u5177\uff0c\u5728\u591a\u4e2a\u9886\u57df\u5c55\u73b0\u51fa\u6f5c\u529b\uff0c\u5305\u62ec\u8f6f\u4ef6\u5de5\u7a0b\u3002\u5728\u672c\u7814\u7a76\u4e2d\uff0c\u6211\u4eec\u8bc4\u4f30\u4e86\u4e94\u6b3e\u6700\u5148\u8fdb\u7684LLM\u2014\u2014Bard\u3001BingChat\u3001ChatGPT\u3001Llama2\u548cCode Llama\u2014\u2014\u5728\u6587\u672c\u5230\u4ee3\u7801\u751f\u6210\u4efb\u52a1\u4e0a\u7684\u80fd\u529b\u3002\u6211\u4eec\u901a\u8fc7\u5411\u6a21\u578b\u63d0\u4f9b\u6765\u81ea\u7f16\u7a0b\u7f51\u7ad9LeetCode\u7684\u7f16\u7801\u95ee\u9898\u63cf\u8ff0\u6587\u672c\u63d0\u793a\uff0c\u8981\u6c42\u5b83\u4eec\u7528Python\u7f16\u5199\u89e3\u51b3\u65b9\u6848\u3002\u968f\u540e\uff0c\u6211\u4eec\u4f7f\u7528LeetCode\u7684\u6d4b\u8bd5\u529f\u80fd\u6765\u8bc4\u4f30\u751f\u6210\u8f93\u51fa\u7684\u8d28\u91cf\u3002 \u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0c\u8fd9\u4e9b\u6a21\u578b\u5728\u6027\u80fd\u4e0a\u5b58\u5728\u663e\u8457\u5dee\u5f02\u3002ChatGPT\u5728\u5904\u7406\u8fd9\u7c7b\u7f16\u7a0b\u6311\u6218\u65b9\u9762\u8868\u73b0\u6700\u4e3a\u6709\u6548\uff0c\u751a\u81f3\u8d85\u8fc7\u4e86\u4e13\u95e8\u9488\u5bf9\u4ee3\u7801\u7684\u6a21\u578b\uff0c\u5982Code 
Llama\u3002\u4e3a\u4e86\u8fdb\u4e00\u6b65\u4e86\u89e3\u60c5\u51b5\uff0c\u6211\u4eec\u6d4b\u91cf\u4e86\u751f\u6210\u4ee3\u7801\u7684\u8fd0\u884c\u65f6\u95f4\u548c\u5185\u5b58\u4f7f\u7528\u60c5\u51b5\uff0c\u5e76\u5c06\u5176\u4e0eLeetCode\u4e0a\u7684\u5176\u4ed6\u4ee3\u7801\u63d0\u4ea4\u8fdb\u884c\u4e86\u6bd4\u8f83\u3002\u8be6\u7ec6\u9519\u8bef\u5206\u6790\u5305\u62ec\u6bd4\u8f83\u751f\u6210\u4ee3\u7801\u4e2d\u7684\u6b63\u786e\u7f29\u8fdb\u548c\u5f62\u5f0f\u5dee\u5f02\uff0c\u4ee5\u53ca\u5c06\u672a\u89e3\u51b3\u7684\u4efb\u52a1\u5f52\u7c7b\u5230\u7279\u5b9a\u9519\u8bef\u7c7b\u522b\uff0c\u6709\u52a9\u4e8e\u6211\u4eec\u66f4\u6df1\u5165\u5730\u7406\u89e3\u7ed3\u679c\u5e76\u627e\u5230\u6539\u8fdb\u7a7a\u95f4\u3002\u7814\u7a76\u7ed3\u679c\u8fd8\u663e\u793a\uff0c\u5f53\u6a21\u578b\u9762\u4e34\u5927\u91cf\u4e0a\u4e0b\u6587\u4fe1\u606f\u65f6\uff0c\u5373\u8f83\u957f\u63d0\u793a\u65f6\uff0c\u751f\u6210\u7684\u4ee3\u7801\u8d8a\u6765\u8d8a\u4e0d\u51c6\u786e\u3002|\n", "2409.05840": "|**2024-09-09**|**MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct**|Run Luo 
et.al.|[2409.05840](http://arxiv.org/abs/2409.05840)|null|\u5728\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u7684\u53d1\u5c55\u8fc7\u7a0b\u4e2d\uff0c\u6211\u4eec\u5df2\u7ecf\u53d6\u5f97\u4e86\u663e\u8457\u7684\u8fdb\u6b65\u3002\u7136\u800c\uff0c\u5728\u6570\u636e\u91cf\u548c\u6570\u636e\u8d28\u91cf\u65b9\u9762\u4ecd\u7136\u5b58\u5728\u5173\u952e\u74f6\u9888\u3002\u624b\u52a8\u521b\u5efa\u591a\u6a21\u6001\u6307\u4ee4\u6570\u636e\u65e2\u8017\u65f6\u53c8\u4f4e\u6548\uff0c\u5c24\u5176\u662f\u5728\u751f\u6210\u9ad8\u590d\u6742\u6027\u7684\u6307\u4ee4\u65f6\u3002\u6b64\u5916\uff0c\u4ece\u201c\u9ed1\u76d2\u201d\u5546\u4e1a\u6a21\u578b\uff08\u4f8b\u5982GPT-4o\u3001GPT-4V\uff09\u4e2d\u63d0\u53d6\u6307\u4ee4\u6570\u636e\u5f80\u5f80\u5bfc\u81f4\u751f\u6210\u7684\u6307\u4ee4\u6570\u636e\u8fc7\u4e8e\u7b80\u5355\uff0c\u8fd9\u9650\u5236\u4e86\u6a21\u578b\u6027\u80fd\u4ec5\u4e0e\u5176\u81ea\u8eab\u6c34\u5e73\u76f8\u5f53\u3002\u6784\u5efa\u591a\u6837\u6027\u548c\u590d\u6742\u6027\u6307\u4ee4\u6570\u636e\u7684\u6311\u6218\u4f9d\u7136\u5de8\u5927\u3002 
\u4e3a\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aMMEvol\u7684\u65b0\u9896\u591a\u6a21\u6001\u6307\u4ee4\u6570\u636e\u8fdb\u5316\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u7ed3\u5408\u4e86\u7cbe\u7ec6\u611f\u77e5\u6f14\u5316\u3001\u8ba4\u77e5\u63a8\u7406\u6f14\u5316\u4ee5\u53ca\u4e92\u52a8\u6f14\u5316\u3002\u8fd9\u4e00\u8fed\u4ee3\u65b9\u6cd5\u7a81\u7834\u4e86\u6570\u636e\u8d28\u91cf\u74f6\u9888\uff0c\u751f\u6210\u4e86\u4e00\u4e2a\u590d\u6742\u4e14\u591a\u6837\u5316\u7684\u56fe\u50cf-\u6587\u672c\u6307\u4ee4\u6570\u636e\u96c6\uff0c\u4ece\u800c\u589e\u5f3a\u4e86MLLMs\u7684\u80fd\u529b\u3002\u6211\u4eec\u4ee5\u521d\u59cb\u6307\u4ee4\u96c6\u5408SEED-163K\u4e3a\u57fa\u7840\uff0c\u5229\u7528MMEvol\u7cfb\u7edf\u5730\u6269\u5c55\u4e86\u6307\u4ee4\u7c7b\u578b\u7684\u591a\u6837\u6027\uff0c\u878d\u5165\u4e86\u589e\u5f3a\u8ba4\u77e5\u80fd\u529b\u7684\u63a8\u7406\u6b65\u9aa4\uff0c\u5e76\u4ece\u56fe\u50cf\u4e2d\u63d0\u53d6\u4e86\u8be6\u7ec6\u4fe1\u606f\u4ee5\u63d0\u9ad8\u89c6\u89c9\u7406\u89e3\u548c\u9c81\u68d2\u6027\u3002 \u4e3a\u4e86\u5168\u9762\u8bc4\u4f30\u6211\u4eec\u6570\u636e\u7684\u6709\u6548\u6027\uff0c\u6211\u4eec\u4f7f\u7528\u8fdb\u5316\u7684\u6570\u636e\u8bad\u7ec3\u4e86LLaVA-NeXT\uff0c\u5e76\u572813\u4e2a\u89c6\u89c9\u8bed\u8a00\u4efb\u52a1\u4e0a\u8fdb\u884c\u4e86\u5b9e\u9a8c\u3002\u4e0e\u57fa\u4e8e\u539f\u59cb\u6570\u636e\u8bad\u7ec3\u7684\u57fa\u7ebf\u76f8\u6bd4\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5e73\u5747\u63d0\u9ad8\u4e863.1\u70b9\u51c6\u786e\u7387\uff0c\u5e76\u57289\u4e2a\u4efb\u52a1\u4e0a\u8fbe\u5230\u4e86\u6700\u5148\u8fdb\u7684\u6027\u80fd\u6c34\u5e73\u3002|\n", "2409.05824": "|**2024-09-09**|**Are Large Language Models a Threat to Programming Platforms? 
An Exploratory Study**|Md Mustakim Billah et.al.|[2409.05824](http://arxiv.org/abs/2409.05824)|null|\u672c\u6587\u7814\u7a76\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5982ChatGPT\u3001Gemini\u548cMeta AI\u5728LeetCode\u3001Codeforces\u548cHackerRank\u7b49\u7ade\u8d5b\u7f16\u7a0b\u5e73\u53f0\u4e0a\u7684\u95ee\u9898\u89e3\u51b3\u80fd\u529b\u3002\u8fd9\u4e9b\u5e73\u53f0\u5e38\u88ab\u62db\u8058\u4eba\u5458\u7528\u6765\u7b5b\u9009\u7f16\u7a0b\u6280\u80fd\u3002\u968f\u7740LLM\u80fd\u529b\u7684\u63d0\u5347\uff0c\u5bf9\u5176\u5728\u4e0d\u540c\u96be\u5ea6\u7ea7\u522b\u3001\u5404\u7c7b\u522b\u7684\u7f16\u7a0b\u6311\u6218\u4e2d\u7684\u8868\u73b0\u8fdb\u884c\u8bc4\u4f30\u53d8\u5f97\u5c24\u4e3a\u91cd\u8981\u3002 \u7814\u7a76\u56e2\u961f\u4eceLeetCode\u9009\u53d6\u4e8698\u4e2a\u95ee\u9898\uff0c\u4eceCodeforces\u9009\u53d6\u4e86126\u4e2a\u95ee\u9898\uff0c\u8986\u76d6\u4e8615\u4e2a\u7c7b\u522b\u3002\u901a\u8fc7\u4e5d\u573a\u5728\u7ebfCodeforces\u548cLeetCode\u7ade\u8d5b\u4ee5\u53caHackerRank\u7684\u4e24\u9879\u8ba4\u8bc1\u6d4b\u8bd5\uff0c\u5bf9LLM\u7684\u5b9e\u65f6\u6027\u80fd\u8fdb\u884c\u4e86\u8bc4\u4f30\u3002\u7814\u7a76\u8fc7\u7a0b\u4e2d\u4f7f\u7528\u4e86\u63d0\u793a\u548c\u53cd\u9988\u673a\u5236\u6765\u5f15\u5bfcLLM\uff0c\u5e76\u63a2\u7d22\u4e86\u4e0d\u540c\u573a\u666f\u4e4b\u95f4\u7684\u76f8\u5173\u6027\u3002 
\u7ed3\u679c\u663e\u793a\uff0cChatGPT\u7b49LLM\u5728LeetCode\u548cHackerRank\u7684\u8ba4\u8bc1\u6d4b\u8bd5\u4e2d\u8868\u73b0\u51fa\u8272\uff08\u6210\u529f\u7387\u4e3a71.43%\uff09\uff0c\u4f46\u5728\u865a\u62df\u7ade\u8d5b\u4e2d\uff0c\u7279\u522b\u662f\u5728Codeforces\u7684\u9ad8\u96be\u5ea6\u6bd4\u8d5b\u4e2d\uff0c\u5b83\u4eec\u7684\u8868\u73b0\u4e0d\u5c3d\u5982\u4eba\u610f\u3002\u5c3d\u7ba1\u5728LeetCode\u6863\u6848\u5e93\u4e2d\u7684\u7528\u6237\u4e2d\u8868\u73b0\u4f18\u4e8e\u90e8\u5206\u7528\u6237\uff0c\u4f46LLM\u5728\u65f6\u95f4\u6548\u7387\u548c\u5185\u5b58\u6548\u7387\u4e0a\u8868\u73b0\u7a81\u51fa\uff0c\u800c\u5728\u66f4\u56f0\u96be\u7684Codeforces\u7ade\u8d5b\u4e2d\u5219\u5904\u4e8e\u52a3\u52bf\u3002 \u5c3d\u7ba1\u5f53\u524d\u60c5\u51b5\u5e76\u672a\u7acb\u5373\u6784\u6210\u5a01\u80c1\uff0c\u4f46LLM\u5728\u8fd9\u4e9b\u5e73\u53f0\u4e0a\u7684\u8868\u73b0\u4ee4\u4eba\u62c5\u5fe7\uff0c\u672a\u6765\u9700\u8981\u6539\u8fdb\u4ee5\u63d0\u9ad8\u5176\u6027\u80fd\u3002|\n", "2409.05806": "|**2024-09-09**|**Benchmarking Chinese Knowledge Rectification in Large Language Models**|Tianhe Lu 
et.al.|[2409.05806](http://arxiv.org/abs/2409.05806)|**[link](https://github.com/zjunlp/easyedit)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5c55\u73b0\u51fa\u60ca\u4eba\u7684\u751f\u6210\u80fd\u529b\uff0c\u4f46\u5b83\u4eec\u5e76\u975e\u6ca1\u6709\u7f3a\u9677\uff0c\u7279\u522b\u662f\u5b58\u5728\u5e7b\u89c9\u7684\u95ee\u9898\u3002\u5f53LLM\u5e94\u7528\u4e8e\u7279\u5b9a\u8bed\u8a00\u548c\u9886\u57df\u65f6\uff0c\u8fd9\u4e00\u95ee\u9898\u5c24\u4e3a\u7a81\u51fa\u3002\u4f8b\u5982\uff0c\u5728\u5904\u7406\u4e2d\u56fd\u53e4\u4ee3\u8bd7\u6b4c\u3001\u8c1a\u8bed\u6216\u6210\u8bed\u65f6\uff0cLLM\u53ef\u80fd\u4f1a\u751f\u6210\u6beb\u65e0\u610f\u4e49\u7684\u4fe1\u606f\uff0c\u8fd9\u662f\u7531\u4e8e\u7f3a\u4e4f\u7279\u5b9a\u77e5\u8bc6\u9020\u6210\u7684\u3002\u4e3a\u6b64\uff0c\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u9488\u5bf9LLM\u7684\u57fa\u51c6\uff0c\u901a\u8fc7\u77e5\u8bc6\u7f16\u8f91\u6765\u7ea0\u6b63\u4e2d\u6587\u77e5\u8bc6\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u901a\u8fc7\u4ece\u5404\u79cd\u6765\u6e90\u6536\u96c6\u4e03\u79cd\u7c7b\u578b\u7684\u77e5\u8bc6\uff0c\u5305\u62ec\u53e4\u5178\u6587\u672c\u3001\u6210\u8bed\u4ee5\u53ca\u6765\u81ea\u767e\u5ea6\u8d34\u5427\u201c\u6c42\u8bf8\u5bb6\u201d\u7684\u5185\u5bb9\uff0c\u6784\u5efa\u4e86\u4e00\u4e2a\u65b0\u7684\u4e2d\u6587\u6570\u636e\u96c6CKnowEdit\uff0c\u4ee5\u5e94\u5bf9\u4e2d\u6587\u8bed\u8a00\u7279\u6709\u7684\u590d\u8c03\u6027\u3001\u53cd\u8bbd\u6027\u548c\u903b\u8f91\u7ed3\u6784\u3002\u901a\u8fc7\u5bf9\u8fd9\u4e2a\u6570\u636e\u96c6\u7684\u5206\u6790\uff0c\u6211\u4eec\u63ed\u793a\u4e86\u5f53\u524dLLM\u5728\u638c\u63e1\u4e2d\u6587\u65b9\u9762\u7684\u6311\u6218\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5728\u8be5\u6570\u636e\u96c6\u4e0a\u5bf9\u73b0\u6709\u7684\u77e5\u8bc6\u7f16\u8f91\u6280\u672f\u8fdb\u884c\u8bc4\u4f30\uff0c\u53d1\u73b0\u5bf9\u4e2d\u6587\u77e5\u8bc6\u7684\u4fee\u6b63\u4ecd\u5b58\u5728\u5de8\u5927\u7684\u63d0\u5347\u7a7a\u95f4\u3002\u4ee3\u7801\u548c\u6570\u636e\u96c6\u53ef\u8bbf\u95ee\uff1aht
tps://github.com/zjunlp/EasyEdit\u3002**|\n", "2409.05771": "|**2024-09-09**|**Evidence from fMRI Supports a Two-Phase Abstraction Process in Language Models**|Emily Cheng et.al.|[2409.05771](http://arxiv.org/abs/2409.05771)|null|\u7814\u7a76\u5df2\u53cd\u590d\u8bc1\u660e\uff0c\u4ece\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4e2d\u63d0\u53d6\u7684\u4e2d\u95f4\u9690\u85cf\u72b6\u6001\u80fd\u591f\u9884\u6d4b\u5bf9\u81ea\u7136\u8bed\u8a00\u523a\u6fc0\u7684\u6d4b\u91cf\u5927\u8111\u53cd\u5e94\u3002\u7136\u800c\uff0c\u5173\u4e8e\u4f7f\u8fd9\u4e00\u9ad8\u9884\u6d4b\u6027\u80fd\u6210\u4e3a\u53ef\u80fd\u7684\u8868\u793a\u7279\u6027\u7684\u4e86\u89e3\u975e\u5e38\u6709\u9650\u3002\u4e3a\u4ec0\u4e48\u662f\u4e2d\u95f4\u5c42\u800c\u4e0d\u662f\u8f93\u51fa\u5c42\u5728\u8fd9\u4e00\u72ec\u7279\u4e14\u9ad8\u5ea6\u901a\u7528\u7684\u8f6c\u79fb\u4efb\u52a1\u4e2d\u6700\u4e3a\u6709\u6548\uff1f\u5728\u8fd9\u9879\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u529f\u80fd\u6027\u78c1\u5171\u632f\u6210\u50cf\u4e2d\u7684\u8bed\u8a00\u7f16\u7801\u6a21\u578b\u8bc1\u636e\u652f\u6301\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5185\u5b58\u5728\u4e24\u4e2a\u9636\u6bb5\u62bd\u8c61\u8fc7\u7a0b\u7684\u5b58\u5728\u3002\u6211\u4eec\u4f7f\u7528\u6d41\u5f62\u5b66\u4e60\u65b9\u6cd5\u8868\u660e\uff0c\u8fd9\u79cd\u62bd\u8c61\u8fc7\u7a0b\u81ea\u7136\u5730\u5728\u8bed\u8a00\u6a21\u578b\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u4ea7\u751f\uff0c\u5e76\u4e14\u968f\u7740\u8bad\u7ec3\u7ee7\u7eed\u8fdb\u884c\uff0c\u8fd9\u4e2a\u62bd\u8c61\u8fc7\u7a0b\u7684\u7b2c\u4e00\u4e2a\u201c\u7ec4\u5408\u201d\u9636\u6bb5\u88ab\u538b\u7f29\u5230\u66f4\u5c11\u7684\u5c42\u4e2d\u3002\u6700\u540e\uff0c\u6211\u4eec\u8bc1\u660e\u4e86\u5c42\u6b21\u7f16\u7801\u6027\u80fd\u4e0e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u8868\u793a\u7684\u5185\u5728\u7ef4\u5ea6\u4e4b\u95f4\u5b58\u5728\u5f3a\u70c8\u7684\u5bf9\u5e94\u5173\u7cfb\u3002\u6211\u4eec\u521d\u6b65\u8bc1\u636e\u8868\u660e\uff0c\u8fd9\u79cd\u5bf9\u5e94\u5173\u7cfb\u4e3b\u8981\u6765\u6e90\u4e8e\u5927\u578b
\u8bed\u8a00\u6a21\u578b\u7684\u5185\u5728\u7ec4\u5408\u6027\uff0c\u800c\u975e\u5176\u4e0b\u4e00\u4e2a\u5355\u8bcd\u9884\u6d4b\u5c5e\u6027\u3002|\n", "2409.05768": "|**2024-09-09**|**Model Input Verification of Large Scale Simulations**|Rumyana Neykova et.al.|[2409.05768](http://arxiv.org/abs/2409.05768)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u7528\u4e8e\u9a8c\u8bc1\u6a21\u62df\u8f93\u5165\u6570\u636e\u6709\u6548\u6027\u7684\u65b9\u6cd5\u8bba\uff0c\u6211\u4eec\u5c06\u5176\u79f0\u4e3a\u6a21\u578b\u8f93\u5165\u9a8c\u8bc1\uff08MIV\uff09\u3002\u6211\u4eec\u901a\u8fc7\u8bbe\u8ba1\u7279\u5b9a\u4e8e\u6a21\u62df\u5efa\u6a21\u9700\u6c42\u7684\u6570\u636e\u6a21\u5f0f\u548c\u9a8c\u8bc1\u5de5\u5177\u5728\u540d\u4e3aFabGuard\u7684\u5de5\u5177\u96c6\u4e2d\u5b9e\u73b0\u4e86\u8fd9\u4e00\u65b9\u6cd5\u3002\u672c\u6587\u5f15\u5165\u4e86MIV\u6a21\u5f0f\u7684\u6b63\u5f0f\u5206\u7c7b\uff0c\u5e76\u63d0\u4f9b\u4e86\u4e00\u4e2a\u96c6\u6210\u5230\u73b0\u6709\u6a21\u62df\u5de5\u4f5c\u6d41\u7a0b\u4e2d\u7684\u7b80\u5316\u9a8c\u8bc1\u7ba1\u9053\u3002FabGuard\u5728\u4e09\u4e2a\u4e0d\u540c\u9886\u57df\u2014\u2014\u51b2\u7a81\u9a71\u52a8\u7684\u4eba\u53e3\u8fc1\u79fb\u3001\u707e\u5bb3\u758f\u6563\u4ee5\u53ca\u75be\u75c5\u4f20\u64ad\u6a21\u578b\u2014\u2014\u7684\u5e94\u7528\u5f97\u5230\u4e86\u5c55\u793a\u3002\u6211\u4eec\u8fd8\u63a2\u8ba8\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u81ea\u52a8\u5316\u7ea6\u675f\u751f\u6210\u548c\u63a8\u7406\u65b9\u9762\u7684\u5e94\u7528\u3002\u5728\u5bf9\u4e00\u4e2a\u79fb\u6c11\u6a21\u62df\u6848\u4f8b\u7684\u7814\u7a76\u4e2d\uff0cLLMs\u4e0d\u4ec5\u6b63\u786e\u63a8\u65ad\u51fa\u4e8623\u4e2a\u5f00\u53d1\u8005\u5b9a\u4e49\u7684\u7ea6\u675f\u4e2d\u768422\u4e2a\uff0c\u800c\u4e14\u8fd8\u53d1\u73b0\u4e86\u73b0\u6709\u7ea6\u675f\u4e2d\u7684\u9519\u8bef\uff0c\u5e76\u63d0\u51fa\u4e86\u65b0\u7684\u6709\u6548\u7ea6\u675f\u3002\u6211\u4eec\u7684\u8bc4\u4f30\u8868\u660e\uff0c\u5bf9\u4e8e\u5927\u578b\u6570\u636e\u96c6\uff0cMIV\u662f\u53ef\u884c\u7684\u
ff0cFabGuard\u80fd\u591f\u5728140\u79d2\u5185\u9ad8\u6548\u5904\u740612,000\u4e2a\u8f93\u5165\u6587\u4ef6\uff0c\u5e76\u4e14\u5176\u6027\u80fd\u5728\u4e0d\u540c\u6587\u4ef6\u5927\u5c0f\u4e0b\u4fdd\u6301\u4e00\u81f4\u3002|\n", "2409.05747": "|**2024-09-09**|**A Novel Idea Generation Tool using a Structured Conversational AI (CAI) System**|B. Sankar et.al.|[2409.05747](http://arxiv.org/abs/2409.05747)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u578b\u7684\u3001\u57fa\u4e8e\u5bf9\u8bdd\u7684\u4eba\u5de5\u667a\u80fd\u6fc0\u6d3b\u521b\u65b0\u754c\u9762\uff0c\u4f5c\u4e3a\u521b\u610f\u751f\u6210\u5de5\u5177\uff0c\u65e8\u5728\u5e2e\u52a9\u521d\u5b66\u8005\u8bbe\u8ba1\u8005\u7f13\u89e3\u901a\u5e38\u5b58\u5728\u7684\u521d\u59cb\u5ef6\u8fdf\u548c\u521b\u65b0\u74f6\u9888\u95ee\u9898\u3002\u8fd9\u662f\u4e00\u4e2a\u52a8\u6001\u3001\u4e92\u52a8\u4e14\u4e0a\u4e0b\u6587\u54cd\u5e94\u5f0f\u7684\u89e3\u51b3\u65b9\u6848\uff0c\u79ef\u6781\u5730\u5229\u7528\u4eba\u5de5\u667a\u80fd\u9886\u57df\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u4e2d\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\uff0c\u4ee5\u751f\u6210\u9488\u5bf9\u4e0d\u540c\u8bbe\u8ba1\u95ee\u9898\u7684\u591a\u4e2a\u6f5c\u5728\u60f3\u6cd5\u8868\u8ff0\u3002\u5c06\u6b64\u7c7bAI\u6a21\u578b\u4e0e\u521b\u65b0\u8fc7\u7a0b\u7ed3\u5408\uff0c\u6211\u4eec\u79f0\u4e4b\u4e3a\u201c\u6fc0\u6d3b\u521b\u65b0\u201d\u60c5\u666f\uff0c\u65e8\u5728\u4fc3\u8fdb\u57fa\u4e8e\u5bf9\u8bdd\u7684\u8fde\u7eed\u4e92\u52a8\u3001\u4e0a\u4e0b\u6587\u76f8\u5173\u7684\u5bf9\u8bdd\u4ee5\u53ca\u5927\u91cf\u7684\u60f3\u6cd5\u751f\u6210\u3002 
\u4e3a\u4e86\u9a8c\u8bc1\u8fd9\u4e00\u5de5\u5177\u7684\u6709\u6548\u6027\uff0c\u6211\u4eec\u5bf930\u540d\u521d\u5b66\u8005\u8bbe\u8ba1\u5e08\u8fdb\u884c\u4e86\u8bd5\u70b9\u7814\u7a76\uff0c\u8ba9\u4ed6\u4eec\u4f7f\u7528\u4f20\u7edf\u65b9\u6cd5\u548c\u65b0\u7684\u57fa\u4e8eCAI\u7684\u754c\u9762\u6765\u4e3a\u7ed9\u5b9a\u95ee\u9898\u751f\u6210\u60f3\u6cd5\u3002\u901a\u8fc7\u4e13\u5bb6\u5c0f\u7ec4\u5bf9\u7ed3\u679c\u8fdb\u884c\u7684\u5b9a\u6027\u6bd4\u8f83\uff0c\u6211\u4eec\u91c7\u7528\u4e86\u6d41\u7545\u5ea6\u3001\u65b0\u9896\u6027\u548c\u591a\u6837\u6027\u4f5c\u4e3a\u5173\u952e\u53c2\u6570\u3002\u7814\u7a76\u53d1\u73b0\uff0c\u6240\u63d0\u51fa\u7684\u5de5\u5177\u80fd\u591f\u6709\u6548\u5730\u4ea7\u751f\u5927\u91cf\u3001\u591a\u6837\u4e14\u65b0\u9896\u7684\u60f3\u6cd5\u3002 \u4e3a\u4e86\u63d0\u9ad8\u754c\u9762\u7684\u53ef\u7528\u6027\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u7ed3\u6784\u5316\u7684\u5bf9\u8bdd\u6a21\u5f0f\uff0c\u4e3a\u6bcf\u4e2a\u521b\u65b0\u9636\u6bb5\u8bbe\u8ba1\u4e86\u63d0\u793a\u5de5\u7a0b\u5316\u7ed3\u6784\uff0c\u4f7f\u5176\u66f4\u52a0\u7edf\u4e00\u548c\u65b9\u4fbf\u8bbe\u8ba1\u5e08\u64cd\u4f5c\u3002\u91c7\u7528\u8fd9\u79cd\u7ed3\u6784\u5316\u7684CAI\u754c\u9762\u540e\uff0c\u5f97\u5230\u7684\u54cd\u5e94\u66f4\u52a0\u7b80\u6d01\uff0c\u5e76\u4e14\u4e0e\u968f\u540e\u7684\u8bbe\u8ba1\u9636\u6bb5\uff0c\u5373\u6982\u5ff5\u5316\u9636\u6bb5\uff0c\u66f4\u52a0\u7d27\u5bc6\u76f8\u5173\u3002 \u7efc\u4e0a\u6240\u8ff0\uff0c\u672c\u6587\u8bc1\u660e\u4e86\u751f\u6210\u5f0f\u4eba\u5de5\u667a\u80fd\uff08Gen-AI\uff09\u5728\u521b\u610f\u4ea7\u54c1\u8bbe\u8ba1\u8fc7\u7a0b\u7684\u65e9\u671f\u3001\u7ed3\u6784\u4e0d\u660e\u786e\u9636\u6bb5\u7684\u5e94\u7528\u6f5c\u529b\u3002|\n", "2409.05746": "|**2024-09-09**|**LLMs Will Always Hallucinate, and We Need to Live With This**|Sourav Banerjee 
et.al.|[2409.05746](http://arxiv.org/abs/2409.05746)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u5404\u4e2a\u9886\u57df\u7684\u5e7f\u6cdb\u5e94\u7528\uff0c\u6df1\u5165\u63a2\u8ba8\u5b83\u4eec\u5185\u5728\u5c40\u9650\u6027\u53d8\u5f97\u81f3\u5173\u91cd\u8981\u3002\u672c\u6587\u63d0\u51fa\uff0c\u8bed\u8a00\u6a21\u578b\u4e2d\u7684\u5e7b\u89c9\u5e76\u975e\u5076\u7136\u9519\u8bef\uff0c\u800c\u662f\u8fd9\u4e9b\u7cfb\u7edf\u56fa\u6709\u7684\u7279\u5f81\u3002\u6211\u4eec\u901a\u8fc7\u8ba1\u7b97\u7406\u8bba\u548c\u54e5\u5fb7\u5c14\u7b2c\u4e00\u4e0d\u5b8c\u5168\u6027\u5b9a\u7406\u7684\u5f15\u7528\uff08\u6d89\u53caHalting\u3001Emptiness\u548cAcceptance\u95ee\u9898\u7684\u4e0d\u53ef\u5224\u5b9a\u6027\uff09\uff0c\u5c55\u793a\u4e86\u5e7b\u89c9\u6e90\u4e8eLLM\u7684\u57fa\u672c\u6570\u5b66\u548c\u903b\u8f91\u7ed3\u6784\u3002\u56e0\u6b64\uff0c\u901a\u8fc7\u67b6\u6784\u6539\u8fdb\u3001\u6570\u636e\u96c6\u589e\u5f3a\u6216\u4e8b\u5b9e\u6838\u67e5\u673a\u5236\u6d88\u9664\u5e7b\u89c9\u662f\u4e0d\u53ef\u80fd\u7684\u3002 \u6211\u4eec\u7684\u5206\u6790\u8868\u660e\uff0c\u4ece\u8bad\u7ec3\u6570\u636e\u7f16\u8bd1\u5230\u4e8b\u5b9e\u68c0\u7d22\u3001\u610f\u56fe\u5206\u7c7b\u548c\u6587\u672c\u751f\u6210\u7684\u6bcf\u4e2a\u9636\u6bb5\uff0c\u90fd\u5b58\u5728\u4ea7\u751f\u5e7b\u89c9\u7684\u975e\u96f6\u6982\u7387\u3002\u7531\u6b64\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u7ed3\u6784\u6027\u5e7b\u89c9\u7684\u6982\u5ff5\uff0c\u4f5c\u4e3a\u8fd9\u4e9b\u7cfb\u7edf\u7684\u56fa\u6709\u7279\u6027\u3002\u901a\u8fc7\u5efa\u7acb\u5e7b\u89c9\u7684\u6570\u5b66\u786e\u5b9a\u6027\uff0c\u672c\u6587\u6311\u6218\u4e86\u5e7b\u89c9\u53ef\u4ee5\u5b8c\u5168\u907f\u514d\u7684\u4f20\u7edf\u89c2\u70b9\u3002|\n", "2409.05735": "|**2024-09-09**|**A System and Benchmark for LLM-based Q\\&A on Heterogeneous Data**|Achille Fokoue 
et.al.|[2409.05735](http://arxiv.org/abs/2409.05735)|null|\u5728\u8bb8\u591a\u5de5\u4e1a\u73af\u5883\u4e2d\uff0c\u7528\u6237\u5e0c\u671b\u4ee5\u81ea\u7136\u8bed\u8a00\u5f62\u5f0f\u63d0\u51fa\u95ee\u9898\uff0c\u5e76\u4ece\u7ed3\u6784\u5316\u6570\u636e\u6e90\uff08\u5982\u7535\u5b50\u8868\u683c\u3001\u6570\u636e\u5e93\u3001API\u6216\u5b83\u4eec\u7684\u7ec4\u5408\uff09\u4e2d\u83b7\u53d6\u7b54\u6848\u3002\u901a\u5e38\u60c5\u51b5\u4e0b\uff0c\u7528\u6237\u5e76\u4e0d\u77e5\u9053\u5982\u4f55\u8bc6\u522b\u6216\u8bbf\u95ee\u6b63\u786e\u7684\u6570\u636e\u6e90\u3002\u5982\u679c\u9700\u8981\u7ec4\u88c5\u591a\u4e2a\uff08\u751a\u81f3\u53ef\u80fd\u662f\u9694\u79bb\u7684\uff09\u6570\u636e\u6e90\u6765\u5f97\u51fa\u7b54\u6848\uff0c\u8fd9\u4e2a\u95ee\u9898\u4f1a\u53d8\u5f97\u66f4\u52a0\u590d\u6742\u3002\u6700\u8fd1\uff0c\u4e00\u4e9b\u4f9d\u8d56\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u6587\u672c\u5230SQL\u5e94\u7528\u5df2\u89e3\u51b3\u4e86\u4e00\u4e9b\u8fd9\u4e9b\u95ee\u9898\uff0c\u901a\u8fc7\u4f7f\u7528\u6237\u80fd\u591f\u7528\u81ea\u7136\u8bed\u8a00\u63d0\u51fa\u95ee\u9898\u3002\u7136\u800c\uff0c\u5728\u73b0\u5b9e\u7684\u5de5\u4e1a\u573a\u666f\u4e2d\uff0c\u8fd9\u4e9b\u5e94\u7528\u4ecd\u7136\u4e0d\u5b9e\u7528\uff0c\u56e0\u4e3a\u5b83\u4eec\u65e0\u6cd5\u5e94\u5bf9\u5178\u578b\u73af\u5883\u4e2d\u6570\u636e\u6e90\u7684\u5f02\u8d28\u6027\u3002\u672c\u6587\u65e8\u5728\u901a\u8fc7\u5f15\u5165siwarex\u5e73\u53f0\u89e3\u51b3\u5f02\u8d28\u6027\u95ee\u9898\uff0c\u8be5\u5e73\u53f0\u5141\u8bb8\u65e0\u7f1d\u5730\u4f7f\u7528\u81ea\u7136\u8bed\u8a00\u8bbf\u95ee\u6570\u636e\u5e93\u548cAPI\u3002 
\u4e3a\u4e86\u5c55\u793asiwarex\u7684\u6709\u6548\u6027\uff0c\u6211\u4eec\u6269\u5c55\u4e86\u6d41\u884c\u7684Spider\u6570\u636e\u96c6\u5e76\u8fdb\u884c\u57fa\u51c6\u6d4b\u8bd5\uff0c\u901a\u8fc7\u66ff\u6362\u5176\u4e2d\u7684\u4e00\u4e9b\u8868\u683c\u4e3a\u6570\u636e\u68c0\u7d22API\u3002\u6211\u4eec\u53d1\u73b0siwarex\u5f88\u597d\u5730\u5e94\u5bf9\u4e86\u6570\u636e\u6e90\u5f02\u8d28\u6027\u7684\u95ee\u9898\u3002\u6211\u4eec\u4fee\u6539\u540e\u7684Spider\u57fa\u51c6\u5f88\u5feb\u5c06\u5bf9\u7814\u7a76\u793e\u533a\u5f00\u653e\u3002|\n", "2409.05732": "|**2024-09-09**|**Towards Democratizing Multilingual Large Language Models For Medicine Through A Two-Stage Instruction Fine-tuning Approach**|Meng Zhou et.al.|[2409.05732](http://arxiv.org/abs/2409.05732)|null|\u591a\u8bed\u8a00\u5f00\u6e90\u533b\u7597\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5177\u6709\u670d\u52a1\u4e8e\u4e0d\u540c\u5730\u533a\u8bed\u8a00\u591a\u6837\u6027\u7684\u6f5c\u529b\u3002\u5c06\u901a\u7528LLMs\u9002\u5e94\u4e8e\u533b\u7597\u9886\u57df\u901a\u5e38\u9700\u8981\u6301\u7eed\u9884\u8bad\u7ec3\uff0c\u4f46\u8fd9\u5728\u8ba1\u7b97\u4e0a\u6210\u672c\u9ad8\u6602\u4e14\u6709\u65f6\u4e0d\u53ef\u884c\u3002\u4ec5\u901a\u8fc7\u6307\u4ee4\u5fae\u8c03\u7279\u5b9a\u4efb\u52a1\u53ef\u80fd\u65e0\u6cd5\u4fdd\u8bc1\u6700\u4f73\u6027\u80fd\uff0c\u56e0\u4e3a\u7f3a\u4e4f\u5e7f\u6cdb\u9886\u57df\u77e5\u8bc6\u4f7f\u5f97\u6a21\u578b\u96be\u4ee5\u5728\u5404\u79cd\u573a\u666f\u4e0b\u7406\u89e3\u548c\u63a8\u7406\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e9b\u6311\u6218\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e24\u4e2a\u591a\u8bed\u8a00\u6307\u4ee4\u5fae\u8c03\u6570\u636e\u96c6\uff1aMMed-IFT\u548cMMed-IFT-MC\uff0c\u8fd9\u4e24\u4e2a\u6570\u636e\u96c6\u5206\u522b\u5305\u542b\u4e86\u8d85\u8fc720\u4e07\u6761\u9ad8\u8d28\u91cf\u7684\u591a\u8bed\u79cd\u533b\u7597\u6837\u672c\uff0c\u5728\u516d\u79cd\u8bed\u8a00\u4e2d\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u4e24\u9636\u6bb5\u8bad\u7ec3\u8303\u5f0f\u
ff1a\u7b2c\u4e00\u9636\u6bb5\u5229\u7528MMed-IFT\u6ce8\u5165\u901a\u7528\u533b\u5b66\u77e5\u8bc6\uff0c\u7b2c\u4e8c\u9636\u6bb5\u5219\u4f7f\u7528MMed-IFT-MC\u5fae\u8c03\u9488\u5bf9\u7279\u5b9a\u4efb\u52a1\u7684\u591a\u9879\u9009\u62e9\u9898\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u82f1\u8bed\u548c\u591a\u8bed\u8a00\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u5747\u53d6\u5f97\u4e86\u7ade\u4e89\u529b\u7684\u7ed3\u679c\uff0c\u5b9e\u73b0\u4e86\u9ad8\u6548\u6027\u548c\u6027\u80fd\u4e4b\u95f4\u7684\u5e73\u8861\u3002\u6211\u4eec\u8ba1\u5212\u5728\u672a\u6765\u5c06\u6211\u4eec\u7684\u6570\u636e\u96c6\u548c\u6a21\u578b\u6743\u91cd\u516c\u5f00\u5728\\url{https://github.com/SpassMed/Med-Llama3}\u3002|\n", "2409.05703": "|**2024-09-09**|**The Influence of Task and Group Disparities over Users' Attitudes Toward Using Large Language Models for Psychotherapy**|Qihang He 
et.al.|[2409.05703](http://arxiv.org/abs/2409.05703)|null|\u8fd1\u5e74\u6765\uff0c\u5fc3\u7406\u5065\u5eb7\u969c\u788d\u60a3\u8005\u7684\u6570\u91cf\u6301\u7eed\u589e\u957f\uff0c\u800c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u4e0d\u540c\u9886\u57df\u7684\u8fdb\u6b65\u4e5f\u4f7f\u5f97\u57fa\u4e8eLLM\u7684\u5fc3\u7406\u6cbb\u7597\u5f15\u8d77\u4e86\u8d8a\u6765\u8d8a\u591a\u7684\u5173\u6ce8\u3002\u7136\u800c\uff0c\u5f71\u54cd\u7528\u6237\u5bf9\u57fa\u4e8eLLM\u5fc3\u7406\u6cbb\u7597\u5de5\u5177\u6001\u5ea6\u7684\u56e0\u7d20\u9c9c\u6709\u63a2\u8ba8\u3002\u672c\u6587\u4f5c\u4e3a\u9996\u6b21\u5c1d\u8bd5\uff0c\u65e8\u5728\u7814\u7a76\u4efb\u52a1\u5dee\u5f02\u548c\u7fa4\u4f53\u5dee\u5f02\u5bf9\u7528\u6237\u5bf9\u57fa\u4e8eLLM\u5fc3\u7406\u6cbb\u7597\u5de5\u5177\u7684\u6001\u5ea6\u7684\u5f71\u54cd\u3002\u901a\u8fc7\u8fd0\u7528\u6280\u672f\u63a5\u53d7\u6a21\u578b\uff08TAM\uff09\u548c\u81ea\u52a8\u5316\u63a5\u53d7\u6a21\u578b\uff08AAM\uff09\uff0c\u7ed3\u5408\u5728\u7ebf\u95ee\u5377\u8c03\u67e5\uff0c\u6211\u4eec\u6536\u96c6\u5e76\u5206\u6790\u4e86\u6765\u81ea\u4e2d\u56fd\u5927\u9646222\u540d\u57fa\u4e8eLLM\u5fc3\u7406\u6cbb\u7597\u5de5\u5177\u7528\u6237\u7684\u53cd\u9988\u3002\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0c\u7fa4\u4f53\u5dee\u5f02\uff08\u5373\u5fc3\u7406\u5065\u5eb7\u72b6\u51b5\uff09\u53ef\u4ee5\u5f71\u54cd\u7528\u6237\u5bf9LLM\u5de5\u5177\u7684\u6001\u5ea6\u3002\u8fdb\u4e00\u6b65\u5730\uff0c\u4f5c\u4e3a\u5178\u578b\u4efb\u52a1\u5dee\u5f02\u4e4b\u4e00\u7684\u9690\u79c1\u987e\u8651\uff0c\u5e76\u672a\u53d1\u73b0\u5bf9\u4fe1\u4efb\u5ea6\u548c\u4f7f\u7528\u610f\u56fe\u4ea7\u751f\u663e\u8457\u5f71\u54cd\u3002\u8fd9\u4e9b\u53d1\u73b0\u53ef\u6307\u5bfc\u672a\u6765\u57fa\u4e8eLLM\u5fc3\u7406\u6cbb\u7597\u670d\u52a1\u7684\u8bbe\u8ba1\u5de5\u4f5c\u3002|\n", "2409.06679": "|**2024-09-10**|**E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning**|Zihan Liao 
et.al.|[2409.06679](http://arxiv.org/abs/2409.06679)|null|\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u9886\u57df\uff0c\u5904\u7406\u957f\u6587\u672c\u4e0a\u4e0b\u6587\u7684\u80fd\u529b\u5bf9\u4e8e\u591a\u8f6e\u5bf9\u8bdd\u3001\u4ee3\u7801\u751f\u6210\u548c\u6587\u6863\u6458\u8981\u7b49\u4efb\u52a1\u6108\u53d1\u91cd\u8981\u3002\u672c\u6587\u63a2\u8ba8\u4e86\u589e\u5f3a\u957f\u6587\u672c\u4e0a\u4e0b\u6587\u6027\u80fd\u3001\u964d\u4f4e\u8ba1\u7b97\u590d\u6742\u6027\u4ee5\u53ca\u5145\u5206\u5229\u7528\u9884\u8bad\u7ec3\u6a21\u578b\u6240\u9762\u4e34\u7684\u6311\u6218\u2014\u2014\u5373\u6240\u8c13\u7684\u201c\u4e0d\u53ef\u80fd\u4e09\u89d2\u201d\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aE2LLM\uff08\u7f16\u7801\u5668\u6269\u5c55\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff09\u7684\u521b\u65b0\u65b9\u6cd5\uff0c\u65e8\u5728\u6709\u6548\u89e3\u51b3\u8fd9\u4e00\u6096\u8bba\u3002 \u8be5\u65b9\u6cd5\u7684\u6838\u5fc3\u601d\u60f3\u662f\u5c06\u957f\u6587\u672c\u4e0a\u4e0b\u6587\u5212\u5206\u4e3a\u591a\u4e2a\u7247\u6bb5\uff0c\u5e76\u901a\u8fc7\u9884\u8bad\u7ec3\u7684\u6587\u672c\u7f16\u7801\u5668\u5c06\u6bcf\u4e2a\u7247\u6bb5\u538b\u7f29\u4e3a\u5d4c\u5165\u5411\u91cf\u3002\u7136\u540e\u5229\u7528\u9002\u914d\u5668\u5c06\u8fd9\u4e9b\u8868\u793a\u4e0e\u89e3\u7801\u5668\u578bLLM\u5bf9\u9f50\uff0c\u4ee5\u4fc3\u8fdb\u5bf9\u8f6f\u63d0\u793a\u7684\u7406\u89e3\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e24\u4e2a\u8bad\u7ec3\u76ee\u6807\uff1a\u4e00\u662f\u91cd\u5efa\u7f16\u7801\u5668\u8f93\u51fa\uff0c\u4e8c\u662f\u9488\u5bf9\u957f\u6587\u672c\u6307\u4ee4\u8fdb\u884c\u5fae\u8c03\uff0c\u4ee5\u5e2e\u52a9LLM\u7406\u89e3\u8f6f\u63d0\u793a\u3002 
\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cE2LLM\u5728\u957f\u6587\u672c\u4e0a\u4e0b\u6587\u573a\u666f\u4e2d\u53d6\u5f97\u4e86\u663e\u8457\u7684\u6027\u80fd\u63d0\u5347\uff0c\u540c\u65f6\u4fdd\u6301\u4e86\u6548\u7387\u3001\u6027\u80fd\u548c\u4e0e\u9884\u8bad\u7ec3\u6a21\u578b\u7684\u517c\u5bb9\u6027\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u7684\u6846\u67b6\u4ee3\u8868\u4e86\u9886\u57df\u5185\u7684\u91cd\u5927\u8fdb\u5c55\uff0c\u4e3a\u6709\u6548\u7684\u5927\u6587\u672c\u5efa\u6a21\u505a\u51fa\u4e86\u8d21\u732e\u3002|\n", "2409.06666": "|**2024-09-10**|**LLaMA-Omni: Seamless Speech Interaction with Large Language Models**|Qingkai Fang et.al.|[2409.06666](http://arxiv.org/abs/2409.06666)|**[link](https://github.com/ictnlp/llama-omni)**|**\u9488\u5bf9\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u901a\u8fc7\u8bed\u97f3\u5b9e\u73b0\u5b9e\u65f6\u4ea4\u4e92\u7684\u80fd\u529b\u63d0\u5347\uff0c\u76f8\u8f83\u4e8e\u4f20\u7edf\u7684\u6587\u672c\u4ea4\u4e92\u65b9\u5f0f\uff0c\u6a21\u578b\u5982GPT-4\u663e\u8457\u589e\u5f3a\u4e86\u7528\u6237\u4f53\u9a8c\u3002\u7136\u800c\uff0c\u5f53\u524d\u5728\u57fa\u4e8e\u5f00\u6e90LLM\u6784\u5efa\u8bed\u97f3\u4ea4\u4e92\u6a21\u578b\u65b9\u9762\u4ecd\u7f3a\u4e4f\u6df1\u5165\u63a2\u7d22\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u578b\u6a21\u578b\u67b6\u6784\u2014\u2014LLaMA-Omni\uff0c\u65e8\u5728\u5b9e\u73b0\u4f4e\u5ef6\u8fdf\u4e0e\u9ad8\u8d28\u91cf\u7684\u8bed\u97f3\u4e0eLLM\u4ea4\u4e92\u3002\u8be5\u67b6\u6784\u878d\u5408\u4e86\u9884\u8bad\u7ec3\u7684\u8bed\u97f3\u7f16\u7801\u5668\u3001\u8bed\u97f3\u9002\u914d\u5668\u3001LLM\u548c\u6d41\u5f0f\u8bed\u97f3\u89e3\u7801\u5668\uff0c\u65e0\u9700\u8fdb\u884c\u8bed\u97f3\u8f6c\u5f55\uff0c\u5373\u53ef\u76f4\u63a5\u4ece\u8bed\u97f3\u6307\u4ee4\u751f\u6210\u6587\u672c\u548c\u8bed\u97f3\u54cd\u5e94\uff0c\u54cd\u5e94\u901f\u5ea6\u6781\u5feb\u3002 
\u6211\u4eec\u7684\u6a21\u578b\u57fa\u4e8e\u6700\u65b0\u7684Llama-3.1-8B-Instruct\u6a21\u578b\u6784\u5efa\uff0c\u5e76\u9488\u5bf9\u8bed\u97f3\u4ea4\u4e92\u573a\u666f\u6784\u5efa\u4e86\u4e00\u4e2a\u540d\u4e3aInstructS2S-200K\u7684\u6570\u636e\u96c6\uff0c\u5176\u4e2d\u5305\u542b\u4e8620\u4e07\u6761\u8bed\u97f3\u6307\u4ee4\u53ca\u5176\u5bf9\u5e94\u7684\u8bed\u97f3\u56de\u5e94\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u4e0e\u4ee5\u5f80\u7684\u8bed\u97f3\u8bed\u8a00\u6a21\u578b\u76f8\u6bd4\uff0cLLaMA-Omni\u5728\u5185\u5bb9\u4e0e\u98ce\u683c\u4e0a\u63d0\u4f9b\u4e86\u66f4\u597d\u7684\u54cd\u5e94\uff0c\u54cd\u5e94\u5ef6\u8fdf\u4f4e\u81f3226\u6beb\u79d2\u3002\u6b64\u5916\uff0c\u8bad\u7ec3LLaMA-Omni\u4ec5\u9700\u4e0d\u52303\u5929\u7684\u65f6\u95f4\uff0c\u57284\u5757GPU\u4e0a\u5373\u53ef\u5b8c\u6210\uff0c\u8fd9\u4e3a\u672a\u6765\u9ad8\u6548\u5f00\u53d1\u8bed\u97f3\u8bed\u8a00\u6a21\u578b\u94fa\u5e73\u4e86\u9053\u8def\u3002**|\n", "2409.06653": "|**2024-09-10**|**Human Perception of LLM-generated Text Content in Social Media Environments**|Kristina Radivojevic 
et.al.|[2409.06653](http://arxiv.org/abs/2409.06653)|null|\u65b0\u5174\u6280\u672f\uff0c\u5c24\u5176\u662f\u4eba\u5de5\u667a\u80fd\uff08AI\uff09\u548c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\uff0c\u4e3a\u6076\u610f\u884c\u4e3a\u8005\u63d0\u4f9b\u4e86\u64cd\u7eb5\u6570\u5b57\u5bf9\u8bdd\u7684\u5f3a\u5927\u5de5\u5177\u3002LLM\u6709\u53ef\u80fd\u5f71\u54cd\u4f20\u7edf\u5f62\u5f0f\u7684\u6c11\u4e3b\u53c2\u4e0e\uff0c\u4f8b\u5982\u9009\u6c11\u9009\u62e9\u3001\u653f\u5e9c\u8c03\u67e5\u6216\u4e0e\u76d1\u7ba1\u673a\u6784\u7684\u5728\u7ebf\u4ea4\u6d41\uff0c\u56e0\u4e3a\u673a\u5668\u4eba\u80fd\u591f\u751f\u6210\u5927\u91cf\u53ef\u4fe1\u6587\u672c\u3002\u4e3a\u4e86\u7814\u7a76\u4eba\u7c7b\u5bf9LLM\u751f\u6210\u5185\u5bb9\u7684\u611f\u77e5\uff0c\u6211\u4eec\u62db\u52df\u4e86\u8d85\u8fc71000\u540d\u53c2\u4e0e\u8005\uff0c\u7136\u540e\u8ba9\u4ed6\u4eec\u5c1d\u8bd5\u5728\u793e\u4ea4\u5a92\u4f53\u8ba8\u8bba\u7ebf\u7a0b\u4e2d\u533a\u5206\u673a\u5668\u4eba\u4e0e\u4eba\u7c7b\u5e16\u5b50\u3002\u6211\u4eec\u53d1\u73b0\u4eba\u7c7b\u5728\u8bc6\u522b\u793e\u4ea4\u5a92\u4f53\u4e0a\u7684\u771f\u5b9e\u7528\u6237\u5e16\u5b50\u65b9\u9762\u8868\u73b0\u4e0d\u4f73\u3002\u6211\u4eec\u4e5f\u53d1\u73b0\u4e86\u4eba\u7c7b\u5728\u793e\u4ea4\u5a92\u4f53\u5bf9\u8bdd\u4e2d\u8bc6\u522bLLM\u751f\u6210\u6587\u672c\u5185\u5bb9\u7684\u6a21\u5f0f\u3002\u6700\u540e\uff0c\u6211\u4eec\u89c2\u5bdf\u5230\u4e86\u201c\u602a\u5f02\u8c37\u201d\u6548\u5e94\u5728\u6587\u672c\u5bf9\u8bdd\u4e2d\u7684\u5b58\u5728\uff0c\u65e0\u8bba\u662f\u5728\u611f\u77e5\u8fd8\u662f\u8bc6\u522b\u8fc7\u7a0b\u4e2d\u3002\u8fd9\u8868\u660e\u5c3d\u7ba1\u4eba\u7c7b\u5728\u8bc6\u522b\u8fc7\u7a0b\u4e2d\u7684\u8868\u73b0\u4e0d\u4f73\uff0c\u4f46\u5f53\u9605\u8bfbLLM\u751f\u6210\u7684\u5185\u5bb9\u65f6\uff0c\u4ed6\u4eec\u4ecd\u80fd\u611f\u53d7\u5230\u4e0d\u9002\u3002|\n", "2409.06646": "|**2024-09-10**|**Optimal Workload Placement on Multi-Instance GPUs**|Bekir Turkkan 
et.al.|[2409.06646](http://arxiv.org/abs/2409.06646)|null|\u672c\u6587\u65e8\u5728\u63a2\u8ba8\u5982\u4f55\u4f18\u5316\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4e3a\u57fa\u7840\u7684AI\u63a8\u7406\u5de5\u4f5c\u8d1f\u8f7d\u5728GPU\u4e0a\u7684\u90e8\u7f72\u3002\u6211\u4eec\u9996\u5148\u8bc6\u522b\u5e76\u9610\u8ff0\u4e86\u5b9e\u8df5\u4e2d\u9047\u5230\u7684\u4e00\u4e9b\u9700\u8981\u9ad8\u6548\u5206\u914d\u6216\u8fc1\u79fb\u5de5\u4f5c\u8d1f\u8f7d\u5230\u5176\u4ed6GPU\u4ee5\u817e\u51fa\u7a7a\u95f4\u4f9b\u65b0\u5de5\u4f5c\u8d1f\u8f7d\u4f7f\u7528\u7684\u60c5\u51b5\u3002\u76ee\u6807\u662f\u5c3d\u53ef\u80fd\u51cf\u5c11\u4f7f\u7528\u7684GPU\u6570\u91cf\uff0c\u5e76\u8fdb\u4e00\u6b65\u964d\u4f4e\u88ab\u5229\u7528GPU\u4e2d\u7684\u5185\u5b58\u548c\u8ba1\u7b97\u6d6a\u8d39\u3002 \u4e3a\u4e86\u5b9e\u73b0\u8fd9\u4e00\u76ee\u6807\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e24\u79cd\u65b9\u6cd5\uff1a\u4e00\u79cd\u662f\u4f18\u5316\u65b9\u6cd5\uff0c\u53e6\u4e00\u79cd\u662f\u542f\u53d1\u5f0f\u65b9\u6cd5\u3002\u6211\u4eec\u4f7f\u7528\u4e24\u79cd\u5de5\u4f5c\u8d1f\u8f7d\u8c03\u5ea6\u542f\u53d1\u5f0f\u7b97\u6cd5\u5bf9\u591a\u79cd\u7528\u4f8b\u8fdb\u884c\u4e86\u57fa\u51c6\u6d4b\u8bd5\u3002\u7ed3\u679c\u663e\u793a\uff0c\u5728\u4e0e\u57fa\u7ebf\u542f\u53d1\u5f0f\u76f8\u6bd4\u7684\u60c5\u51b5\u4e0b\uff0c\u6211\u4eec\u80fd\u591f\u8282\u7701\u9ad8\u8fbe2.85\u500d\u7684GPU\u4f7f\u7528\u91cf\uff0c\u4ee5\u53ca\u9ad8\u8fbe70%\u7684GPU\u6d6a\u8d39\u3002 \u6211\u4eec\u8ba1\u5212\u8ba9SRE\uff08\u7cfb\u7edf\u53ef\u9760\u6027\u5de5\u7a0b\uff09\u793e\u533a\u80fd\u591f\u5728\u751f\u4ea7\u73af\u5883\u4e2d\u5229\u7528\u6211\u4eec\u7684\u63d0\u8bae\u65b9\u6cd5\u3002|\n", "2409.06635": "|**2024-09-10**|**MoWE-Audio: Multitask AudioLLMs with Mixture of Weak Encoders**|Wenyu Zhang 
et.al.|[2409.06635](http://arxiv.org/abs/2409.06635)|null|\u5feb\u901f\u53d1\u5c55\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u663e\u8457\u63d0\u9ad8\u4e86\u81ea\u7136\u8bed\u8a00\u5904\u7406\u80fd\u529b\uff0c\u4fc3\u8fdb\u4e86\u97f3\u9891LLM\u7684\u53d1\u5c55\uff0c\u8fd9\u4e9b\u6a21\u578b\u80fd\u591f\u7406\u89e3\u8bed\u97f3\u548c\u97f3\u9891\u8f93\u5165\u3002\u73b0\u6709\u7684\u97f3\u9891LLM\u901a\u5e38\u7ed3\u5408\u9884\u8bad\u7ec3\u7684\u97f3\u9891\u7f16\u7801\u5668\u4e0e\u6587\u672c\u9884\u8bad\u7ec3\u7684LLM\uff0c\u5e76\u5728\u7279\u5b9a\u7684\u97f3\u9891\u4efb\u52a1\u4e0a\u8fdb\u884c\u5fae\u8c03\u3002\u7136\u800c\uff0c\u9884\u8bad\u7ec3\u7684\u97f3\u9891\u7f16\u7801\u5668\u7684\u5bb9\u91cf\u6709\u9650\uff0c\u65e0\u6cd5\u6355\u83b7\u65b0\u4efb\u52a1\u548c\u6570\u636e\u96c6\u4e2d\u7684\u7279\u5f81\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u5c06\u201c\u5f31\u201d\u7f16\u7801\u5668\u6df7\u5408\uff08MoWE\uff09\u878d\u5165\u97f3\u9891LLM\u6846\u67b6\u3002MoWE\u901a\u8fc7\u5728\u57fa\u672c\u7f16\u7801\u5668\u57fa\u7840\u4e0a\u8865\u5145\u4e00\u7ec4\u76f8\u5bf9\u8f83\u8f7b\u91cf\u7ea7\u7684\u7f16\u7801\u5668\uff0c\u6839\u636e\u97f3\u9891\u8f93\u5165\u52a8\u6001\u6fc0\u6d3b\u4ee5\u589e\u5f3a\u7279\u5f81\u63d0\u53d6\uff0c\u540c\u65f6\u907f\u514d\u663e\u8457\u589e\u52a0\u6a21\u578b\u5927\u5c0f\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cMoWE\u6709\u6548\u63d0\u9ad8\u4e86\u591a\u4efb\u52a1\u6027\u80fd\uff0c\u4f7f\u97f3\u9891LLM\u80fd\u591f\u5e94\u7528\u4e8e\u66f4\u591a\u6837\u5316\u7684\u97f3\u9891\u4efb\u52a1\u3002|\n", "2409.06624": "|**2024-09-10**|**A Practice of Post-Training on Llama-3 70B with Optimal Selection of Additional Language Mixture Ratio**|Ningyuan Xi 
et.al.|[2409.06624](http://arxiv.org/abs/2409.06624)|null|\u672c\u6587\u7814\u7a76\u4e86\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u6301\u7eed\u9884\u8bad\u7ec3\uff08CPT\uff09\u8fc7\u7a0b\u4e2d\uff0c\u5982\u4f55\u901a\u8fc7\u989d\u5916\u8bed\u8a00\u6df7\u5408\u6bd4\uff08ALMR\uff09\u548c\u5b66\u4e60\u7387\uff08LR\uff09\u4e4b\u95f4\u7684\u6700\u4f18\u76f8\u5173\u6027\uff0c\u63d0\u5347\u6a21\u578b\u5728\u4e2d\u6587\u53ca\u5176\u4ed6\u7279\u5b9a\u9886\u57df\u7684\u6027\u80fd\u3002\u9488\u5bf98B\u5927\u5c0f\u7684Llama-3\u6a21\u578b\uff0c\u6211\u4eec\u8fdb\u884c\u4e86\u6df1\u5165\u7814\u7a76\uff0c\u786e\u5b9a\u4e86\u5b9e\u9a8c\u8bbe\u7f6e\u4e2d\u7684\u5173\u952e\u8d85\u53c2\u6570\uff0c\u5e76\u901a\u8fc7\u7cbe\u7ec6\u8c03\u6574\uff0c\u663e\u8457\u63d0\u5347\u4e86\u6a21\u578b\u5728\u4e2d\u6587\u76f8\u5173\u7684\u57fa\u51c6\u6d4b\u8bd5\u4ee5\u53ca\u6570\u5b66\u3001\u7f16\u7a0b\u548c\u60c5\u7eea\u667a\u80fd\u7b49\u7279\u5b9a\u9886\u57df\u7684\u80fd\u529b\u3002\u6700\u7ec8\uff0c\u6211\u4eec\u5c0670B\u5927\u5c0f\u7684LLM\u90e8\u7f72\u5230\u5b9e\u9645\u804a\u5929\u7cfb\u7edf\u4e2d\uff0c\u5e76\u53d6\u5f97\u4e86\u4ee4\u4eba\u6ee1\u610f\u7684\u6548\u679c\u3002|\n", "2409.06601": "|**2024-09-10**|**Alleviating Hallucinations in Large Language Models with Scepticism Modeling**|Yetao Wu 
et.al.|[2409.06601](http://arxiv.org/abs/2409.06601)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u9762\u4e34\u7684\u4e3b\u8981\u6311\u6218\u662f\u5e7b\u89c9\u73b0\u8c61\uff0c\u8fd9\u963b\u788d\u4e86\u5176\u5728\u591a\u4e2a\u9886\u57df\u7684\u5e94\u7528\u3002\u4e0d\u786e\u5b9a\u6027\u4f30\u8ba1\u53ef\u4ee5\u88ab\u7528\u4e8e\u7f13\u89e3\u5e7b\u89c9\u5e26\u6765\u7684\u635f\u5bb3\u3002\u4eba\u7c7b\u7684\u6000\u7591\u60c5\u7eea\u88ab\u8ba4\u4e3a\u80fd\u589e\u5f3a\u81ea\u6211\u8bc4\u4f30\u7684\u80fd\u529b\u3002\u57fa\u4e8e\u8fd9\u4e00\u89c2\u5bdf\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201c\u8d28\u7591\u5efa\u6a21\u201d\uff08SM\uff09\u7684\u65b0\u65b9\u6cd5\u3002\u8fd9\u4e00\u65b9\u6cd5\u901a\u8fc7\u7ed3\u5408\u8bcd\u5143\u548clogits\u4fe1\u606f\u6765\u8fdb\u884c\u81ea\u6211\u8bc4\u4f30\u800c\u5f97\u5230\u5f62\u5f0f\u5316\u3002\u6211\u4eec\u6784\u5efa\u4e86\u5305\u542b\u6000\u7591\u60c5\u7eea\u610f\u8bc6\u7684\u6570\u636e\u96c6\uff0c\u5e76\u8fdb\u884c\u8fde\u7eed\u9884\u8bad\u7ec3\uff0c\u7136\u540e\u5bf9LLM\u8fdb\u884c\u5fae\u8c03\uff0c\u4ece\u800c\u63d0\u5347\u5b83\u4eec\u81ea\u6211\u8bc4\u4f30\u7684\u80fd\u529b\u3002\u5b9e\u9a8c\u7ed3\u679c\u8bc1\u660e\u4e86\u8fd9\u79cd\u65b9\u6cd5\u6709\u6548\u589e\u5f3a\u4e86\u6a21\u578b\u4f30\u7b97\u4e0d\u786e\u5b9a\u6027\u7684\u80fd\u529b\uff0c\u5e76\u901a\u8fc7\u8de8\u9886\u57df\u5b9e\u9a8c\u9a8c\u8bc1\u4e86\u5176\u5728\u5176\u4ed6\u4efb\u52a1\u4e2d\u7684\u6cdb\u5316\u80fd\u529b\u3002|\n", "2409.06595": "|**2024-09-10**|**GroUSE: A Benchmark to Evaluate Evaluators in Grounded Question Answering**|Sacha Muller 
et.al.|[2409.06595](http://arxiv.org/abs/2409.06595)|**[link](https://github.com/illuin-tech/grouse)**|\u672c\u6587\u63a2\u8ba8\u4e86\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e0e\u79c1\u6709\u4e14\u66f4\u65b0\u81f3\u6700\u65b0\u7684\u77e5\u8bc6\u5e93\u76f8\u7ed3\u5408\u7684\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u8303\u5f0f\u65f6\u9762\u4e34\u7684\u6311\u6218\u3002\u6211\u4eec\u7279\u522b\u5173\u6ce8\u8bc4\u4f30\u7531RAG\u7cfb\u7edf\u751f\u6210\u7684\u57fa\u4e8e\u73b0\u5b9e\u7684\u7b54\u6848\u65f6\uff0c\u4f5c\u4e3a\u88c1\u5224\u7684LLM\u6240\u9047\u5230\u7684\u95ee\u9898\u3002\u4e3a\u4e86\u8bc4\u4f30\u88c1\u5224\u6a21\u578b\u7684\u6821\u51c6\u548c\u533a\u5206\u80fd\u529b\uff0c\u6211\u4eec\u8bc6\u522b\u4e867\u79cd\u751f\u6210\u5668\u5931\u8d25\u6a21\u5f0f\uff0c\u5e76\u5f15\u5165\u4e86GroUSE\uff08\u57fa\u4e8e\u95ee\u9898\u89e3\u7b54\u7684\u5143\u8bc4\u4f30\u57fa\u51c6\uff09\uff0c\u8fd9\u662f\u4e00\u4e2a\u5305\u542b144\u4e2a\u5355\u5143\u6d4b\u8bd5\u7684\u5143\u8bc4\u4f30\u57fa\u51c6\u3002\u8fd9\u4e2a\u57fa\u51c6\u63ed\u793a\u4e86\u73b0\u6709\u7684\u81ea\u52a8\u5316RAG\u8bc4\u4f30\u6846\u67b6\u5f80\u5f80\u5ffd\u89c6\u4e86\u91cd\u8981\u5931\u8d25\u6a21\u5f0f\uff0c\u5373\u4f7f\u5728\u4f7f\u7528GPT-4\u4f5c\u4e3a\u88c1\u5224\u7684\u60c5\u51b5\u4e0b\u4e5f\u662f\u5982\u6b64\u3002 
\u4e3a\u4e86\u6539\u8fdb\u5f53\u524d\u81ea\u52a8\u5316RAG\u8bc4\u4f30\u6846\u67b6\u7684\u8bbe\u8ba1\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u7ba1\u9053\uff0c\u5e76\u53d1\u73b0\u5c01\u95ed\u6a21\u578b\u5728GroUSE\u4e0a\u8868\u73b0\u826f\u597d\uff0c\u800c\u6700\u5148\u8fdb\u7684\u5f00\u6e90\u88c1\u5224\u6a21\u578b\u5728\u6211\u4eec\u7684\u63d0\u8bae\u6807\u51c6\u4e0b\u5e76\u672a\u8868\u73b0\u51fa\u826f\u597d\u7684\u6cdb\u5316\u80fd\u529b\uff0c\u5c3d\u7ba1\u5b83\u4eec\u4e0eGPT-4\u7684\u5224\u65ad\u9ad8\u5ea6\u76f8\u5173\u3002\u6211\u4eec\u7684\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0c\u4e0eGPT-4\u7684\u76f8\u5173\u6027\u662f\u4e00\u4e2a\u4e0d\u5b8c\u6574\u7684\u4ee3\u7406\u6307\u6807\uff0c\u7528\u4e8e\u8861\u91cf\u88c1\u5224\u6a21\u578b\u7684\u5b9e\u9645\u6027\u80fd\uff0c\u5e76\u5e94\u8be5\u901a\u8fc7\u5bf9\u53c2\u8003\u60c5\u51b5\u7684\u7cbe\u786e\u5931\u8d25\u6a21\u5f0f\u68c0\u6d4b\u8fdb\u884c\u8865\u5145\u8bc4\u4f30\u3002 \u8fdb\u4e00\u6b65\u7684\u7814\u7a76\u663e\u793a\uff0c\u901a\u8fc7\u5728GPT-4\u7684\u63a8\u7406\u75d5\u8ff9\u4e0a\u5bf9Llama-3\u8fdb\u884c\u5fae\u8c03\uff0c\u663e\u8457\u63d0\u5347\u4e86\u5176\u8bc4\u4f30\u80fd\u529b\uff0c\u4e0d\u4ec5\u63d0\u9ad8\u4e86\u4e0eGPT-4\u8bc4\u4ef7\u7684\u76f8\u5173\u6027\u548c\u53c2\u8003\u60c5\u51b5\u7684\u6821\u51c6\u5ea6\u3002|\n", "2409.06558": "|**2024-09-10**|**MAPS: Energy-Reliability Tradeoff Management in Autonomous Vehicles Through LLMs Penetrated Science**|Mahdieh Aliazam 
et.al.|[2409.06558](http://arxiv.org/abs/2409.06558)|null|\u968f\u7740\u81ea\u52a8\u9a7e\u9a76\u8f66\u8f86\u7684\u65e5\u76ca\u666e\u53ca\uff0c\u5bf9\u9ad8\u5ea6\u7cbe\u786e\u548c\u9ad8\u6548\u7684\u7cfb\u7edf\u7684\u9700\u6c42\u4e5f\u5728\u4e0d\u65ad\u589e\u957f\uff0c\u4ee5\u63d0\u5347\u5b89\u5168\u6027\u80fd\u3001\u64cd\u4f5c\u6548\u7387\u548c\u80fd\u6e90\u6d88\u8017\u3002\u5728\u7ba1\u7406\u80fd\u6e90\u4e0e\u53ef\u9760\u6027\u4e4b\u95f4\u7684\u6743\u8861\u65f6\uff0c\u9884\u6d4b\u8f66\u8f86\u8fd0\u884c\u671f\u95f4\u7684\u5404\u79cd\u6761\u4ef6\u53d8\u5f97\u5c24\u4e3a\u91cd\u8981\u3002\u8fd1\u5e74\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u6539\u8fdb\u4ee5\u53ca\u77e5\u540d\u6a21\u578b\u5982ChatGPT\u7684\u51fa\u73b0\uff0c\u4e3a\u81ea\u52a8\u9a7e\u9a76\u76f8\u5173\u9884\u6d4b\u63d0\u4f9b\u4e86\u72ec\u7279\u7684\u673a\u4f1a\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aMAPS\u7684\u65b9\u6cd5\uff0c\u5229\u7528LLMs\u4f5c\u4e3a\u5730\u56fe\u9605\u8bfb\u8f85\u52a9\u9a7e\u9a76\u5458\uff0c\u9884\u6d4b\u5728\u81ea\u52a8\u9a7e\u9a76\u8f66\u8f86\u64cd\u4f5c\u8fc7\u7a0b\u4e2d\u8bbe\u7f6e\u7684\u5173\u952e\u53c2\u6570\uff0c\u4ee5\u5e73\u8861\u80fd\u6e90\u4e0e\u53ef\u9760\u6027\u4e4b\u95f4\u7684\u6743\u8861\u3002MAPS\u65b9\u6cd5\u5728\u5bfc\u822a\u7cbe\u5ea6\u65b9\u9762\u76f8\u8f83\u4e8e\u6700\u4f73\u57fa\u7ebf\u65b9\u6cd5\u63d0\u9ad8\u4e8620%\u3002\u6b64\u5916\uff0cMAPS\u8fd8\u663e\u793a\u4e86\u5728\u8ba1\u7b97\u5355\u5143\u4e0a\u8282\u7701\u4e8611%\u7684\u80fd\u6e90\uff0c\u5e76\u5728\u673a\u68b0\u548c\u8ba1\u7b97\u5355\u5143\u4e0a\u6700\u9ad8\u8282\u7701\u4e8654%\u3002|\n", "2409.06518": "|**2024-09-10**|**Questioning Internal Knowledge Structure of Large Language Models Through the Lens of the Olympic Games**|Juhwan Choi 
et.al.|[2409.06518](http://arxiv.org/abs/2409.06518)|**[link](https://github.com/c-juhwan/olympics_analysis)**|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u81ea\u7136\u8bed\u8a00\u5904\u7406\u9886\u57df\u5df2\u7ecf\u6210\u4e3a\u4e3b\u5bfc\u6027\u65b9\u6cd5\uff0c\u7136\u800c\u5b83\u4eec\u7684\u5185\u90e8\u77e5\u8bc6\u7ed3\u6784\u4ecd\u7136\u672a\u88ab\u5145\u5206\u63a2\u7d22\u3002\u672c\u6587\u901a\u8fc7\u5206\u6790\u5965\u6797\u5339\u514b\u8fd0\u52a8\u4f1a\u7684\u5386\u53f2\u5956\u724c\u7edf\u8ba1\u60c5\u51b5\uff0c\u7814\u7a76\u4e86LLM\u7684\u5185\u90e8\u77e5\u8bc6\u7ed3\u6784\u3002\u6211\u4eec\u8981\u6c42\u6a21\u578b\u63d0\u4f9b\u5404\u961f\u7684\u5956\u724c\u6570\u91cf\uff0c\u5e76\u786e\u5b9a\u54ea\u4e9b\u961f\u4f0d\u83b7\u5f97\u4e86\u7279\u5b9a\u6392\u540d\u3002\u6211\u4eec\u7684\u7ed3\u679c\u8868\u660e\uff0c\u5c3d\u7ba1\u6700\u5148\u8fdb\u7684LLM\u5728\u62a5\u544a\u5355\u4e2a\u961f\u4f0d\u7684\u5956\u724c\u6570\u91cf\u65b9\u9762\u8868\u73b0\u5f97\u975e\u5e38\u51fa\u8272\uff0c\u4f46\u5728\u56de\u7b54\u5173\u4e8e\u7279\u5b9a\u6392\u540d\u7684\u95ee\u9898\u65f6\u5374\u9047\u5230\u663e\u8457\u56f0\u96be\u3002\u8fd9\u6697\u793a\u4e86LLM\u7684\u5185\u90e8\u77e5\u8bc6\u7ed3\u6784\u4e0e\u4eba\u7c7b\u7684\u6839\u672c\u4e0d\u540c\uff0c\u4eba\u7c7b\u80fd\u591f\u8f7b\u677e\u5730\u4ece\u5df2\u77e5\u7684\u5956\u724c\u6570\u91cf\u63a8\u65ad\u51fa\u6392\u540d\u3002\u4e3a\u4e86\u652f\u6301\u8fdb\u4e00\u6b65\u7684\u7814\u7a76\uff0c\u6211\u4eec\u516c\u5f00\u53d1\u5e03\u4e86\u4ee3\u7801\u3001\u6570\u636e\u96c6\u548c\u6a21\u578b\u8f93\u51fa\u3002|\n", "2409.07453": "|**2024-09-11**|**\"My Grade is Wrong!\": A Contestable AI Framework for Interactive Feedback in Evaluating Student Essays**|Shengxin Hong 
et.al.|[2409.07453](http://arxiv.org/abs/2409.07453)|null|\u4ea4\u4e92\u5f0f\u53cd\u9988\u5728\u6559\u5e08\u4e0e\u5b66\u751f\u4e4b\u95f4\u53cc\u5411\u6d41\u52a8\uff0c\u76f8\u8f83\u4e8e\u4f20\u7edf\u7684\u5355\u5411\u53cd\u9988\u66f4\u4e3a\u6709\u6548\u3002\u7136\u800c\uff0c\u8fd9\u79cd\u53cd\u9988\u65b9\u5f0f\u5f80\u5f80\u8017\u65f6\u8fc7\u591a\uff0c\u96be\u4ee5\u5728\u6559\u80b2\u5b9e\u8df5\u4e2d\u5e7f\u6cdb\u5e94\u7528\u3002\u867d\u7136\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5177\u6709\u81ea\u52a8\u5316\u53cd\u9988\u7684\u6f5c\u529b\uff0c\u4f46\u5b83\u4eec\u5728\u4e92\u52a8\u60c5\u5883\u4e0b\u7684\u63a8\u7406\u548c\u4ea4\u4e92\u65b9\u9762\u5b58\u5728\u56f0\u96be\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aCAELF\uff08Contestable AI Empowered LLM\u6846\u67b6\uff09\uff0c\u65e8\u5728\u901a\u8fc7\u96c6\u6210\u591a\u4ee3\u7406\u7cfb\u7edf\u4e0e\u8ba1\u7b97\u8bba\u8fa9\u6765\u81ea\u52a8\u5316\u4ea4\u4e92\u5f0f\u53cd\u9988\u3002\u9996\u5148\uff0c\u5b66\u751f\u7684\u4f5c\u6587\u7531\u591a\u4e2a\u6559\u5b66\u52a9\u7406\u4ee3\u7406\uff08TA\u4ee3\u7406\uff09\u8fdb\u884c\u8bc4\u4f30\uff0c\u968f\u540e\uff0c\u6559\u5e08\u4ee3\u7406\u901a\u8fc7\u5f62\u5f0f\u5316\u63a8\u7406\u6574\u5408\u8fd9\u4e9b\u8bc4\u4ef7\uff0c\u751f\u6210\u53cd\u9988\u548c\u8bc4\u5206\u3002\u5b66\u751f\u53ef\u4ee5\u8fdb\u4e00\u6b65\u4e0e\u53cd\u9988\u4e92\u52a8\uff0c\u4ee5\u6df1\u5316\u7406\u89e3\u3002\u901a\u8fc7\u5bf9500\u7bc7\u6279\u5224\u6027\u601d\u7ef4\u4f5c\u6587\u7684\u6848\u4f8b\u7814\u7a76\uff0c\u5e76\u7ed3\u5408\u7528\u6237\u7814\u7a76\uff0c\u7ed3\u679c\u8868\u660e\uff0cCAELF\u663e\u8457\u63d0\u9ad8\u4e86\u4ea4\u4e92\u5f0f\u53cd\u9988\u7684\u8d28\u91cf\uff0c\u589e\u5f3a\u4e86LLM\u7684\u63a8\u7406\u548c\u4e92\u52a8\u80fd\u529b\u3002\u8fd9\u4e00\u65b9\u6cd5\u63d0\u4f9b\u4e86\u4e00\u4e2a\u514b\u670d\u5f71\u54cd\u6559\u80b2\u9886\u57df\u5e7f\u6cdb\u5e94\u7528\u4ea4\u4e92\u5f0f\u53cd\u9988\u7684\u65f6\u95f4\u548c\u8d44\u6e90\u969c\u788d\u7684\u6709\u524d\u666f\u89e3\u51b3\u6
5b9\u6848\u3002|\n", "2409.07440": "|**2024-09-11**|**SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories**|Ben Bogin et.al.|[2409.07440](http://arxiv.org/abs/2409.07440)|**[link](https://github.com/allenai/super-benchmark)**|**\u7ed9\u5b9a\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u7f16\u5199\u4ee3\u7801\u65b9\u9762\u53d6\u5f97\u7684\u91cd\u5927\u8fdb\u5c55\uff0c\u5b83\u4eec\u73b0\u5728\u662f\u5426\u80fd\u591f\u81ea\u4e3b\u91cd\u73b0\u7814\u7a76\u4ed3\u5e93\u4e2d\u7684\u7ed3\u679c\uff1f\u8fd9\u6837\u7684\u80fd\u529b\u5c06\u5bf9\u7814\u7a76\u793e\u533a\u4ea7\u751f\u5de8\u5927\u76ca\u5904\uff0c\u5e2e\u52a9\u7814\u7a76\u4eba\u5458\u9a8c\u8bc1\u3001\u7406\u89e3\u5e76\u6269\u5c55\u5148\u524d\u7684\u5de5\u4f5c\u3002\u4e3a\u4e86\u5411\u8fd9\u4e00\u76ee\u6807\u8fc8\u8fdb\uff0c\u6211\u4eec\u5f15\u5165\u4e86SUPER\uff0c\u8fd9\u662f\u9996\u4e2a\u65e8\u5728\u8bc4\u4f30LLM\u5728\u4ece\u7814\u7a76\u4ed3\u5e93\u8bbe\u7f6e\u548c\u6267\u884c\u4efb\u52a1\u65b9\u9762\u7684\u80fd\u529b\u7684\u57fa\u51c6\u3002SUPER\u65e8\u5728\u6355\u6349\u7814\u7a76\u4eba\u5458\u5728\u673a\u5668\u5b66\u4e60\uff08ML\uff09\u548c\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u7814\u7a76\u4ed3\u5e93\u5de5\u4f5c\u65f6\u6240\u9762\u4e34\u7684\u771f\u5b9e\u6311\u6218\u3002\u6211\u4eec\u7684\u57fa\u51c6\u7531\u4e09\u4e2a\u4e0d\u540c\u7684\u95ee\u9898\u96c6\u7ec4\u6210\uff1a45\u4e2a\u7aef\u5230\u7aef\u95ee\u9898\uff0c\u9644\u6709\u4e13\u5bb6\u89e3\u51b3\u65b9\u6848\u7684\u6ce8\u91ca\uff0c152\u4e2a\u4e13\u6ce8\u4e8e\u7279\u5b9a\u6311\u6218\uff08\u4f8b\u5982\u914d\u7f6e\u8bad\u7ec3\u5668\uff09\u7684\u5b50\u95ee\u9898\uff0c\u4ee5\u53ca602\u4e2a\u7528\u4e8e\u66f4\u5927\u89c4\u6a21\u5f00\u53d1\u7684\u81ea\u52a8\u751f\u6210\u95ee\u9898\u3002\u6211\u4eec\u5f15\u5165\u4e86\u5404\u79cd\u8bc4\u4f30\u6307\u6807\u6765\u8bc4\u4f30\u4efb\u52a1\u6210\u529f\u548c\u8fdb\u5ea6\uff0c\u5f53\u6709\u9ec4\u91d1\u89e3\u51b3\u65b9\u6848\u53ef\u7528\u65f6\u4f7f\u7528\u9ec4\u91d1\u89e
3\u51b3\u65b9\u6848\uff0c\u5426\u5219\u4f7f\u7528\u8fd1\u4f3c\u503c\u3002\u6211\u4eec\u5c55\u793a\u4e86\u6700\u5148\u8fdb\u7684\u65b9\u6cd5\u5728\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\u65f6\u9047\u5230\u4e86\u56f0\u96be\uff0c\u6700\u597d\u7684\u6a21\u578b\uff08GPT-4o\uff09\u4ec5\u89e3\u51b3\u4e8616.3%\u7684\u7aef\u5230\u7aef\u96c6\u548c46.1%\u7684\u573a\u666f\u3002\u8fd9\u8868\u660e\u4e86\u8fd9\u9879\u4efb\u52a1\u7684\u6311\u6218\u6027\uff0c\u5e76\u8868\u660eSUPER\u53ef\u4ee5\u4f5c\u4e3a\u793e\u533a\u8861\u91cf\u548c\u63a8\u52a8\u8fdb\u6b65\u7684\u5b9d\u8d35\u8d44\u6e90\u3002**|\n", "2409.07407": "|**2024-09-11**|**CLNX: Bridging Code and Natural Language for C/C++ Vulnerability-Contributing Commits Identification**|Zeqing Qin et.al.|[2409.07407](http://arxiv.org/abs/2409.07407)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u6f0f\u6d1e\u8bc6\u522b\u9886\u57df\u5c55\u73b0\u51fa\u4e86\u5de8\u5927\u7684\u6f5c\u529b\u3002\u7531\u4e8eC/C++\u5728\u8fc7\u53bb\u5341\u5e74\u4e2d\u5360\u636e\u4e86\u5f00\u6e90\u8f6f\u4ef6\uff08OSS\uff09\u6f0f\u6d1e\u7684\u4e00\u534a\uff0c\u5e76\u4e14\u4e3b\u8981\u901a\u8fc7\u63d0\u4ea4\u8fdb\u884c\u66f4\u65b0\uff0c\u56e0\u6b64\u589e\u5f3aLLM\u5728\u8bc6\u522bC/C++\u6f0f\u6d1e\u8d21\u732e\u63d0\u4ea4\uff08VCC\uff09\u65b9\u9762\u7684\u80fd\u529b\u53d8\u5f97\u81f3\u5173\u91cd\u8981\u3002\u7136\u800c\uff0c\u5f53\u524d\u7684\u7814\u7a76\u4e3b\u8981\u96c6\u4e2d\u5728\u5bf9\u5927\u89c4\u6a21\u4ee3\u7801\u96c6\u8fdb\u4e00\u6b65\u9884\u8bad\u7ec3LLM\u4e0a\uff0c\u8fd9\u65e2\u8017\u8d39\u8d44\u6e90\u53c8\u5b58\u5728\u6548\u7387\u6311\u6218\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u8f7b\u91cf\u7ea7\u65b9\u6cd5\u6765\u63d0\u5347\u57fa\u4e8eBERT\u7684LLM\u8bc6\u522bC/C++ 
VCC\u7684\u80fd\u529b\u3002\u6211\u4eec\u63d0\u51fa\u4e86CodeLinguaNexus\uff08CLNX\uff09\uff0c\u4f5c\u4e3a\u8fde\u63a5C/C++\u7a0b\u5e8f\u4e0eLLM\u7684\u6865\u6881\u3002CLNX\u901a\u8fc7\u5728\u4fdd\u7559\u5173\u952e\u7ec6\u8282\u7684\u540c\u65f6\uff0c\u4ee5\u66f4\u81ea\u7136\u7684\u65b9\u5f0f\u9ad8\u6548\u5730\u5c06\u6e90\u4ee3\u7801\u8f6c\u6362\u4e3a\u66f4\u9002\u5408LLM\u5904\u7406\u7684\u8868\u793a\u3002\u5177\u4f53\u6765\u8bf4\uff0cCLNX\u9996\u5148\u5e94\u7528\u7ed3\u6784\u7ea7\u81ea\u7136\u5316\u6765\u5206\u89e3\u590d\u6742\u7684\u7a0b\u5e8f\uff0c\u7136\u540e\u5e94\u7528\u7b26\u53f7\u7ea7\u81ea\u7136\u5316\u6765\u89e3\u91ca\u590d\u6742\u7684\u7b26\u53f7\u3002\u6211\u4eec\u5728\u5305\u542b25,872\u4e2aC/C++\u51fd\u6570\u53ca\u5176\u63d0\u4ea4\u7684\u516c\u5f00\u6570\u636e\u96c6\u4e0a\u8bc4\u4f30\u4e86CLNX\u3002\u7ed3\u679c\u8868\u660e\uff0cCLNX\u663e\u8457\u63d0\u5347\u4e86LLM\u8bc6\u522bC/C++ VCC\u7684\u80fd\u529b\u3002\u6b64\u5916\uff0c\u914d\u5907CLNX\u7684CodeBERT\u8fbe\u5230\u4e86\u65b0\u7684\u6700\u4f18\u6027\u80fd\uff0c\u5e76\u5728\u771f\u5b9e\u4e16\u754c\u4e2d\u8bc6\u522b\u4e8638\u4e2aOSS\u6f0f\u6d1e\u3002|\n", "2409.07394": "|**2024-09-11**|**AdaCAD: Adaptively Decoding to Balance Conflicts between Contextual and Parametric Knowledge**|Han Wang 
et.al.|[2409.07394](http://arxiv.org/abs/2409.07394)|**[link](https://github.com/hannight/adacad)**|**\u5728\u5927\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u4e0a\u4e0b\u6587\u4e0e\u6a21\u578b\u53c2\u6570\u5b58\u50a8\u7684\u77e5\u8bc6\u4e4b\u95f4\u5b58\u5728\u77e5\u8bc6\u51b2\u7a81\uff0c\u8fd9\u4f1a\u5bfc\u81f4\u4f7f\u7528\u6807\u51c6\u89e3\u7801\u6280\u672f\u65f6\u6027\u80fd\u53d7\u635f\uff0c\u56e0\u4e3a\u8fd9\u4e9b\u6280\u672f\u5f80\u5f80\u5ffd\u89c6\u4e86\u4e0a\u4e0b\u6587\u3002\u73b0\u6709\u7684\u6d4b\u8bd5\u65f6\u95f4\u5bf9\u6bd4\u65b9\u6cd5\u8bd5\u56fe\u901a\u8fc7\u6bd4\u8f83\u5e26\u6709\u548c\u4e0d\u5e26\u6709\u4e0a\u4e0b\u6587\u7684LLM\u8f93\u51fa\u5206\u5e03\u4e4b\u95f4\u7684\u5bf9\u6bd4\uff0c\u5e76\u6839\u636e\u5b83\u4eec\u4e4b\u95f4\u7684\u5bf9\u6bd4\u8c03\u6574\u6a21\u578b\u6765\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\u3002\u7136\u800c\uff0c\u6211\u4eec\u53d1\u73b0\u8fd9\u4e9b\u65b9\u6cd5\u7ecf\u5e38\u9519\u8bef\u5730\u5224\u65ad\u51b2\u7a81\u7684\u7a0b\u5ea6\uff0c\u5e76\u4e14\u96be\u4ee5\u5904\u7406\u4e0d\u540c\u51b2\u7a81\u7a0b\u5ea6\u7684\u5b9e\u4f8b\uff0c\u9759\u6001\u65b9\u6cd5\u5728\u51b2\u7a81\u4e0d\u5b58\u5728\u65f6\u8fc7\u5ea6\u8c03\u6574\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u5b9e\u4f8b\u7684\u7cbe\u7ec6\u7c92\u5ea6\u65b9\u6cd5AdaCAD\uff0c\u5b83\u52a8\u6001\u5730\u6839\u636eJensen-Shannon\u6563\u5ea6\u6d4b\u91cf\u7684\u4e0a\u4e0b\u6587\u548c\u53c2\u6570\u77e5\u8bc6\u5206\u5e03\u4e4b\u95f4\u7684\u51b2\u7a81\u7a0b\u5ea6\u6765\u63a8\u65ad\u8c03\u6574\u6743\u91cd\u3002\u6211\u4eec\u5728\u56db\u4e2a\u6a21\u578b\u4e0a\u5bf9\u516d\u4e2a\u591a\u6837\u5316\u7684\u95ee\u7b54\uff08QA\uff09\u6570\u636e\u96c6\u548c\u4e09\u4e2a\u6458\u8981\u4efb\u52a1\u8fdb\u884c\u7684\u5b9e\u9a8c\u663e\u793a\uff0c\u6211\u4eec\u7684\u65e0\u9700\u8bad\u7ec3\u7684\u81ea\u9002\u5e94\u65b9\u6cd5\u59cb\u7ec8\u5728\u95ee\u7b54\u4efb\u52a1\u4e0a\u4f18\u4e8e\u5176\u4ed6\u89e3\u7801\u65b9\u6cd5\uff0c\u5e73\u5747\u51c6\u786e\u7387\u63d0\u9ad8\u4e
8614.21%\uff08\u7edd\u5bf9\u503c\uff09\uff0c\u5e76\u4e14\u63d0\u9ad8\u4e86\u6458\u8981\u7684\u771f\u5b9e\u6027\uff0cAlignScore\u63d0\u9ad8\u4e865.59\u5206\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u5206\u6790\u8868\u660e\uff0c\u4e0e\u51b2\u7a81\u7684\u5bf9\u6bd4\u57fa\u7ebf\u76f8\u6bd4\uff0c\u5f53\u51b2\u7a81\u4e0d\u5b58\u5728\u65f6\uff0c\u89e3\u7801\u4f1a\u635f\u5bb3\u6027\u80fd\uff0c\u800cAdaCAD\u80fd\u591f\u7f13\u89e3\u8fd9\u4e9b\u635f\u5931\uff0c\u4f7f\u5176\u66f4\u9002\u7528\u4e8e\u73b0\u5b9e\u4e16\u754c\u7684\u6570\u636e\u96c6\uff0c\u5728\u8fd9\u4e9b\u6570\u636e\u96c6\u4e2d\uff0c\u6709\u4e9b\u793a\u4f8b\u5b58\u5728\u51b2\u7a81\uff0c\u800c\u5176\u4ed6\u793a\u4f8b\u5219\u4e0d\u5b58\u5728\u51b2\u7a81\u3002**|\n", "2409.07368": "|**2024-09-11**|**Demo: SGCode: A Flexible Prompt-Optimizing System for Secure Generation of Code**|Khiem Ton et.al.|[2409.07368](http://arxiv.org/abs/2409.07368)|null|\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aSGCode\u7684\u7075\u6d3b\u63d0\u793a\u4f18\u5316\u7cfb\u7edf\uff0c\u7528\u4e8e\u901a\u8fc7\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u751f\u6210\u5b89\u5168\u4ee3\u7801\u3002SGCode\u5c06\u6700\u8fd1\u7684\u63d0\u793a\u4f18\u5316\u65b9\u6cd5\u4e0eLLM\u7ed3\u5408\u5728\u4e00\u4e2a\u7edf\u4e00\u7684\u7cfb\u7edf\u4e2d\uff0c\u901a\u8fc7\u524d\u7aef\u548c\u540e\u7aefAPI\u63d0\u4f9b\u670d\u52a1\uff0c\u4f7f\u7528\u6237\u80fd\u591f\uff1a1\uff09\u751f\u6210\u65e0\u6f0f\u6d1e\u7684\u5b89\u5168\u4ee3\u7801\uff1b2\uff09\u67e5\u770b\u548c\u5171\u4eab\u5b89\u5168\u6027\u5206\u6790\uff1b\u4ee5\u53ca3\uff09\u8f7b\u677e\u5728\u4e0d\u540c\u7684\u63d0\u793a\u4f18\u5316\u65b9\u6cd5\u4e4b\u95f4\u5207\u6362\uff0c\u5e76\u63d0\u4f9b\u6709\u5173\u6a21\u578b\u548c\u7cfb\u7edf\u6027\u80fd\u7684\u89c1\u89e3\u3002\u6211\u4eec\u4f7f\u7528AWS\u670d\u52a1\u5668\u4e0a\u7684PromSec\u586b\u5145SGCode\uff0c\u8fd9\u662f\u4e00\u79cd\u65b9\u6cd5\uff0c\u901a\u8fc7\u5c06LLM\u3001\u5b89\u5168\u5de5\u5177\u4e0e\u8f7b\u91cf\u7ea7\u751f\u6210\u5bf9\u6297\u56
fe\u795e\u7ecf\u7f51\u7edc\u76f8\u7ed3\u5408\uff0c\u6765\u68c0\u6d4b\u5e76\u4fee\u590d\u751f\u6210\u4ee3\u7801\u4e2d\u7684\u5b89\u5168\u6f0f\u6d1e\uff0c\u4ece\u800c\u4f18\u5316\u63d0\u793a\u3002\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u8868\u660e\uff0cSGCode\u4f5c\u4e3a\u516c\u5171\u5de5\u5177\uff0c\u80fd\u591f\u63ed\u793a\u6a21\u578b\u5b9e\u7528\u6027\u3001\u5b89\u5168\u4ee3\u7801\u751f\u6210\u548c\u7cfb\u7edf\u6210\u672c\u4e4b\u95f4\u7684\u6743\u8861\uff0c\u5177\u6709\u76f8\u5bf9\u8f83\u4f4e\u7684\u6210\u672c\u3002SGCode\u5df2\u4e0a\u7ebf\u4e8e\uff1a\u3002|\n", "2409.07355": "|**2024-09-11**|**Think Together and Work Better: Combining Humans' and LLMs' Think-Aloud Outcomes for Effective Text Evaluation**|SeongYeub Chu et.al.|[2409.07355](http://arxiv.org/abs/2409.07355)|**[link](https://github.com/BBeeChu/InteractEval)**|**\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3a\u201cInteractEval\u201d\u7684\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u91c7\u7528\u201cThink-Aloud\u201d\u65b9\u6cd5\u7ed3\u5408\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4e0e\u4eba\u7c7b\u4e13\u5bb6\u610f\u89c1\uff0c\u4ee5\u751f\u6210\u57fa\u4e8e\u68c0\u67e5\u6e05\u5355\u7684\u6587\u672c\u8bc4\u4f30\u7684\u5c5e\u6027\u3002\u901a\u8fc7\u878d\u5408\u4eba\u7c7b\u7684\u7075\u6d3b\u6027\u548c\u63a8\u7406\u80fd\u529b\u4ee5\u53caLLM\u7684\u4e00\u81f4\u6027\uff0cInteractEval\u5728\u4e00\u81f4\u6027\u3001\u6d41\u7545\u6027\u3001\u76f8\u5173\u6027\u548c\u8fde\u8d2f\u6027\u56db\u4e2a\u7ef4\u5ea6\u4e0a\u5747\u8d85\u8d8a\u4e86\u4f20\u7edf\u7684\u975eLLM\u57fa\u7ebf\u548cLLM\u57fa\u7ebf\u6a21\u578b\u3002\u5b9e\u9a8c\u8fd8\u63a2\u8ba8\u4e86\u201cThink-Aloud\u201d\u65b9\u6cd5\u7684\u6709\u6548\u6027\uff0c\u8868\u660e\u5b83\u80fd\u4fc3\u8fdb\u4eba\u7c7b\u548cLLM\u7684\u53d1\u6563\u601d\u7ef4\uff0c\u4ece\u800c\u4ea7\u751f\u66f4\u5e7f\u6cdb\u7684\u76f8\u5173\u5c5e\u6027\uff0c\u5e76\u63d0\u9ad8\u6587\u672c\u8bc4\u4f30\u6027\u80fd\u3002\u6bd4\u8f83\u5206\u6790\u663e\u793a\uff0c\u4eba\u7c7b\u5728\u8bc6\u522b\u4e0e\u
5185\u90e8\u8d28\u91cf\u76f8\u5173\u7684\u5c5e\u6027\uff08\u5982\u8fde\u8d2f\u6027\u548c\u6d41\u7545\u6027\uff09\u65b9\u9762\u8868\u73b0\u4f18\u5f02\uff0c\u800cLLM\u5728\u4e0e\u5916\u90e8\u5bf9\u9f50\u76f8\u5173\u7684\u5c5e\u6027\uff08\u5982\u4e00\u81f4\u6027\u548c\u76f8\u5173\u6027\uff09\u4e0a\u8868\u73b0\u66f4\u597d\u3002\u56e0\u6b64\uff0c\u7ed3\u5408\u4eba\u7c7b\u548cLLM\u5171\u540c\u4ea7\u751f\u7684\u8bc4\u4f30\u7ed3\u679c\u6700\u4f73\u3002\u6362\u53e5\u8bdd\u8bf4\uff0c\u672c\u6587\u5f3a\u8c03\u4e86\u5728\u81ea\u52a8\u5316\u57fa\u4e8e\u68c0\u67e5\u6e05\u5355\u7684\u6587\u672c\u8bc4\u4f30\u6846\u67b6\u4e2d\u6709\u6548\u6574\u5408\u4eba\u7c7b\u548cLLM\u7684\u5fc5\u8981\u6027\u3002\u4ee3\u7801\u5df2\u5f00\u6e90\u4e8e\\textbf{\\url{https://github.com/BBeeChu/InteractEval.git}}}\u3002**|\n", "2409.07331": "|**2024-09-11**|**Learning to Compress Contexts for Efficient Knowledge-based Visual Question Answering**|Weixi Weng et.al.|[2409.07331](http://arxiv.org/abs/2409.07331)|null|\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u5728\u89c6\u89c9\u95ee\u7b54\uff08VQA\uff09\u4efb\u52a1\u4e0a\u5c55\u793a\u4e86\u51fa\u8272\u7684\u96f6\u6837\u672c\u6027\u80fd\u3002\u7136\u800c\uff0c\u5728\u77e5\u8bc6\u57fa\u89c6\u89c9\u95ee\u7b54\uff08KB-VQA\uff09\u4efb\u52a1\u4e2d\uff0cMLLMs\u53ef\u80fd\u7f3a\u4e4f\u4eba\u7c7b\u5e38\u8bc6\u6216\u7279\u5b9a\u9886\u57df\u7684\u4e13\u4e1a\u77e5\u8bc6\uff0c\u4ece\u800c\u9700\u8981\u4ece\u5916\u90e8\u77e5\u8bc6\u6e90\u83b7\u53d6\u6240\u9700\u4fe1\u606f\u4ee5\u56de\u7b54\u6b64\u7c7b\u95ee\u9898\u3002\u5148\u524d\u7684\u5de5\u4f5c\uff0c\u5982\u68c0\u7d22\u589e\u5f3a\u7684VQA-v2\uff08RAVQA-v2\uff09\uff0c\u4fa7\u91cd\u4e8e\u5145\u5206\u5229\u7528\u8f93\u5165\u4fe1\u606f\uff0c\u4f8b\u5982\u56fe\u50cf\u6587\u672c\u63cf\u8ff0\u548c\u68c0\u7d22\u7684\u77e5\u8bc6\uff0c\u4ee5\u63d0\u9ad8\u6027\u80fd\uff0c\u4f46\u5b83\u4eec\u90fd\u5ffd\u89c6\u4e86\u4e00\u4e2a\u95ee\u9898\uff1a\u968f\u7740\u8f93\u5165\u4ee4\u724c\u6570\u91cf\u7684\u5
89e\u52a0\uff0c\u63a8\u7406\u6548\u7387\u663e\u8457\u964d\u4f4e\uff0c\u8fd9\u4e0e\u5b9e\u9645\u5e94\u7528\u7684\u9700\u6c42\u76f8\u77db\u76fe\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u68c0\u7d22\u589e\u5f3a\u7684\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\uff08RACC\uff09\u3002RACC\u5b66\u4e60\u538b\u7f29\u5e76\u805a\u5408\u68c0\u7d22\u4e0a\u4e0b\u6587\uff0c\u5e76\u751f\u6210\u7d27\u51d1\u7684\u952e\u503c\uff08KV\uff09\u7f13\u5b58\u5f62\u5f0f\u7684\u8c03\u8282\u3002\u7136\u540e\uff0c\u4f7f\u7528\u8fd9\u79cd\u8c03\u8282\u6765\u9002\u5e94\u4e0b\u6e38\u51bb\u7ed3\u7684MLLM\uff0c\u4ece\u800c\u5b9e\u73b0\u6709\u6548\u4e14\u9ad8\u6548\u7684\u63a8\u7406\u3002RACC\u5728OK-VQA\u4e0a\u5b9e\u73b0\u4e86\u5f53\u524d\u6700\u4f73\u768462.9%\u6027\u80fd\u3002\u6b64\u5916\uff0c\u5b83\u5c06RAVQA-v2\u7684\u63a8\u7406\u5ef6\u8fdf\u663e\u8457\u964d\u4f4e\u4e8622.0%-59.7%\u3002\u5927\u91cf\u7684\u5b9e\u9a8c\u8868\u660e\u4e86RACC\u7684\u5e7f\u6cdb\u9002\u7528\u6027\u3002\u5b83\u4e0e\u5404\u79cd\u73b0\u6210\u7684MLLM\u517c\u5bb9\uff0c\u5e76\u53ef\u4ee5\u5904\u7406\u5305\u62ec\u6587\u672c\u548c\u591a\u6a21\u6001\u6587\u6863\u5728\u5185\u7684\u4e0d\u540c\u77e5\u8bc6\u6e90\u3002|\n", "2409.07314": "|**2024-09-11**|**MEDIC: Towards a Comprehensive Framework for Evaluating LLMs in Clinical Applications**|Praveen K Kanithi 
et.al.|[2409.07314](http://arxiv.org/abs/2409.07314)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u533b\u7597\u5065\u5eb7\u9886\u57df\u7684\u5feb\u901f\u5f00\u53d1\u5f15\u53d1\u4e86\u5bf9\u8d85\u8d8a\u5982USMLE\u7b49\u5e38\u7528\u57fa\u51c6\u8bc4\u4f30\u7684\u5168\u9762\u8bc4\u4f30\u9700\u6c42\uff0c\u4ee5\u66f4\u597d\u5730\u53cd\u6620\u5b9e\u9645\u5e94\u7528\u8868\u73b0\u3002\u867d\u7136\u73b0\u5b9e\u4e16\u754c\u7684\u8bc4\u4f30\u662f\u5b9e\u7528\u6027\u7684\u91cd\u8981\u6307\u6807\uff0c\u4f46\u5b83\u4eec\u5f80\u5f80\u843d\u540e\u4e8eLLM\u6f14\u8fdb\u7684\u901f\u5ea6\uff0c\u53ef\u80fd\u5bfc\u81f4\u7814\u7a76\u7ed3\u679c\u5728\u90e8\u7f72\u65f6\u53d8\u5f97\u8fc7\u65f6\u3002\u8fd9\u79cd\u65f6\u95f4\u4e0a\u7684\u8131\u8282\u9700\u8981\u4e00\u79cd\u5168\u9762\u7684\u524d\u671f\u8bc4\u4f30\u65b9\u6cd5\uff0c\u4ee5\u6307\u5bfc\u7279\u5b9a\u4e34\u5e8a\u5e94\u7528\u4e2d\u7684\u6a21\u578b\u9009\u62e9\u3002 \u6211\u4eec\u5f15\u5165\u4e86MEDIC\u6846\u67b6\uff0c\u5b83\u4ece\u4e94\u4e2a\u5173\u952e\u7684\u4e34\u5e8a\u80fd\u529b\u7ef4\u5ea6\u8bc4\u4f30LLM\uff1a\u533b\u5b66\u63a8\u7406\u3001\u4f26\u7406\u4e0e\u504f\u89c1\u3001\u6570\u636e\u548c\u8bed\u8a00\u7406\u89e3\u3001\u4e0a\u4e0b\u6587\u5b66\u4e60\u4ee5\u53ca\u4e34\u5e8a\u5b89\u5168\u6027\u3002MEDIC\u91c7\u7528\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u4ea4\u53c9\u5ba1\u67e5\u6846\u67b6\uff0c\u91cf\u5316\u4e86LLM\u5728\u8986\u76d6\u8303\u56f4\u548c\u5e7b\u89c9\u68c0\u6d4b\u7b49\u9886\u57df\u7684\u6027\u80fd\uff0c\u800c\u65e0\u9700\u53c2\u8003\u8f93\u51fa\u3002\u6211\u4eec\u4f7f\u7528MEDIC\u5bf9\u533b\u7597\u95ee\u7b54\u3001\u5b89\u5168\u3001\u603b\u7ed3\u3001\u7b14\u8bb0\u751f\u6210\u4ee5\u53ca\u5176\u4ed6\u4efb\u52a1\u8fdb\u884c\u4e86\u8bc4\u4f30\u3002 
\u6211\u4eec\u7684\u7ed3\u679c\u663e\u793a\u4e0d\u540c\u6a21\u578b\u5927\u5c0f\u4e4b\u95f4\u3001\u57fa\u7ebf\u6a21\u578b\u4e0e\u533b\u5b66\u5fae\u8c03\u6a21\u578b\u4e4b\u95f4\u7684\u6027\u80fd\u5dee\u5f02\uff0c\u5e76\u5bf9\u9700\u8981\u7279\u5b9a\u6a21\u578b\u4f18\u52bf\u7684\u5e94\u7528\uff08\u5982\u4f4e\u5e7b\u89c9\u6216\u8f83\u4f4e\u63a8\u7406\u6210\u672c\uff09\u7684\u6a21\u578b\u9009\u62e9\u5177\u6709\u542f\u793a\u610f\u4e49\u3002MEDIC\u7684\u591a\u7ef4\u5ea6\u8bc4\u4f30\u63ed\u793a\u4e86\u7406\u8bba\u80fd\u529b\u548c\u5b9e\u9645\u5b9e\u65bd\u4e4b\u95f4\u7684\u6027\u80fd\u6743\u8861\uff0c\u5f25\u5408\u4e86\u5728\u533b\u7597\u4fdd\u5065\u73af\u5883\u4e2d\u8bc6\u522b\u548c\u9002\u5e94\u6700\u6709\u524d\u666f\u6a21\u578b\u7684\u5dee\u8ddd\uff0c\u786e\u4fdd\u4e86\u9002\u5408\u591a\u79cd\u533b\u7597\u4fdd\u5065\u5e94\u7528\u7684\u6a21\u578b\u5f97\u5230\u8bc6\u522b\u548c\u9002\u5e94\u3002|\n", "2409.07276": "|**2024-09-11**|**STORE: Streamlining Semantic Tokenization and Generative Recommendation with A Single LLM**|Qijiong Liu et.al.|[2409.07276](http://arxiv.org/abs/2409.07276)|null|\u4f20\u7edf\u63a8\u8350\u6a21\u578b\u901a\u5e38\u4f9d\u8d56\u4e8e\u72ec\u7279\u7684\u9879\u76ee\u6807\u8bc6\u7b26\uff08ID\uff09\u6765\u533a\u5206\u9879\u76ee\uff0c\u8fd9\u53ef\u80fd\u9650\u5236\u4e86\u5b83\u4eec\u5229\u7528\u9879\u76ee\u5185\u5bb9\u4fe1\u606f\u548c\u63a8\u5e7f\u957f\u5c3e\u6216\u51b7\u542f\u52a8\u9879\u76ee\u7684\u80fd \u529b\u3002\u8fd1\u671f\uff0c\u5df2\u63d0\u51fa\u8bed\u4e49\u5206\u8bcd\u4f5c\u4e3a\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\u7684\u6709\u5e0c\u671b\u7684\u65b9\u6cd5\uff0c\u65e8\u5728\u5c06\u6bcf\u4e2a\u9879\u76ee\u7684\u8bed\u4e49\u8868\u793a\u5206\u8bcd\u4e3a\u4e00\u7cfb\u5217\u79bb\u6563\u7684\u4ee4\u724c\u3002\u901a\u8fc7\u8fd9\u79cd\u65b9\u5f0f\uff0c\u5b83\u4fdd 
\u7559\u4e86\u9879\u76ee\u5728\u8fd9\u4e9b\u4ee4\u724c\u5185\u7684\u8bed\u4e49\uff0c\u5e76\u786e\u4fdd\u5177\u6709\u76f8\u4f3c\u8bed\u4e49\u7684\u9879\u76ee\u7531\u76f8\u4f3c\u7684\u4ee4\u724c\u8868\u793a\u3002\u8fd9\u4e9b\u8bed\u4e49\u4ee4\u724c\u6210\u4e3a\u8bad\u7ec3\u751f\u6210\u63a8\u8350\u6a21\u578b\u7684\u57fa\u7840\u3002\u7136\u800c\uff0c\u73b0\u6709 \u7684\u751f\u6210\u63a8\u8350\u65b9\u6cd5\u901a\u5e38\u6d89\u53ca\u591a\u4e2a\u5b50\u6a21\u578b\u8fdb\u884c\u5d4c\u5165\u3001\u91cf\u5316\u548c\u63a8\u8350\uff0c\u5bfc\u81f4\u7cfb\u7edf\u8fc7\u4e8e\u590d\u6742\u3002\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u7edf\u4e00\u6846\u67b6\uff0c\u79f0\u4e3aSTORE\uff0c \u5229\u7528\u5355\u4e00\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u540c\u65f6\u6267\u884c\u8fd9\u4e24\u9879\u4efb\u52a1\u3002\u5177\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u5c06\u8bed\u4e49\u5206\u8bcd\u8868\u8ff0\u4e3a\u6587\u672c\u5230\u4ee4\u724c\u7684\u4efb\u52a1\uff0c\u800c\u751f\u6210\u63a8\u8350\u5219\u8868\u8ff0\u4e3a\u4ee4\u724c\u5230 \u4ee4\u724c\u7684\u4efb\u52a1\uff0c\u901a\u8fc7\u8865\u5145\u4ee4\u724c\u5230\u6587\u672c\u91cd\u6784\u4efb\u52a1\u548c\u6587\u672c\u5230\u4ee4\u724c\u8f85\u52a9\u4efb\u52a1\uff0c\u6240\u6709\u8fd9\u4e9b\u4efb\u52a1\u5747\u4ee5\u751f\u6210\u65b9\u5f0f\u8868\u8ff0\u5e76\u4f7f\u7528\u5355\u4e00LLM\u9aa8\u5e72\u8fdb\u884c\u8bad\u7ec3\u3002 \u6211\u4eec\u8fdb\u884c\u4e86\u5927\u91cf\u5b9e\u9a8c\uff0c\u4ee5\u9a8c\u8bc1\u6211\u4eec\u7684STORE\u6846\u67b6\u5728\u5404\u79cd\u63a8\u8350\u4efb\u52a1\u548c\u6570\u636e\u96c6\u4e0a\u7684\u6709\u6548\u6027\u3002\u6211\u4eec\u5c06\u53d1\u5e03\u6e90\u4ee3\u7801\u548c\u914d\u7f6e\uff0c\u4ee5\u4fbf\u8fdb\u884c\u53ef\u590d\u73b0\u7684\u7814\u7a76\u3002|\n", "2409.07267": "|**2024-09-11**|**MiniDrive: More Efficient Vision-Language Models with Multi-Level 2D Features as Text Tokens for Autonomous Driving**|Enming Zhang 
et.al.|[2409.07267](http://arxiv.org/abs/2409.07267)|**[link](https://github.com/emzucas/minidrive)**|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aMiniDrive\u7684\u65b0\u578b\u6846\u67b6\uff0c\u65e8\u5728\u89e3\u51b3\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08VLM\uff09\u5728\u81ea\u52a8\u9a7e\u9a76\u573a\u666f\u4e2d\u7684\u5e94\u7528\u96be\u9898\u3002\u73b0\u6709\u7684VLM\u65b9\u6cd5\u901a\u5e38\u4f9d\u8d56\u4e8e\u8ba1\u7b97\u5bc6\u96c6\u578b\u7684\u89c6\u89c9\u7f16\u7801\u5668\u548c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\uff0c\u8fd9\u4f7f\u5f97\u5b83\u4eec\u96be\u4ee5\u5728\u5b9e\u9645\u4e16\u754c\u548c\u5b9e\u65f6\u5e94\u7528\u4e2d\u90e8\u7f72\u3002\u6b64\u5916\uff0c\u5927\u591a\u6570\u73b0\u6709VLM\u7f3a\u4e4f\u5904\u7406\u591a\u5f20\u56fe\u7247\u7684\u80fd\u529b\uff0c\u8fd9\u4f7f\u5f97\u5b83\u4eec\u96be\u4ee5\u9002\u5e94\u81ea\u52a8\u9a7e\u9a76\u4e2d\u7684\u591a\u6444\u50cf\u5934\u611f\u77e5\u9700\u6c42\u3002 \u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e24\u4e2a\u5173\u952e\u6a21\u5757\uff1a\u7279\u5f81\u5de5\u7a0b\u6df7\u5408\u4e13\u5bb6\uff08FE-MoE\uff09\u548c\u52a8\u6001\u6307\u4ee4\u9002\u914d\u5668\uff08DI-Adapter\uff09\u3002FE-MoE\u6709\u6548\u5730\u5c06\u4e8c\u7ef4\u7279\u5f81\u6620\u5c04\u5230\u89c6\u89c9\u4ee4\u724c\u5d4c\u5165\uff0c\u7136\u540e\u4f5c\u4e3a\u8f93\u5165\u4f20\u9012\u7ed9\u8bed\u8a00\u6a21\u578b\u3002DI-Adapter\u5141\u8bb8\u89c6\u89c9\u4ee4\u724c\u5d4c\u5165\u6839\u636e\u6307\u4ee4\u6587\u672c\u5d4c\u5165\u52a8\u6001\u53d8\u5316\uff0c\u89e3\u51b3\u4e86\u4ee5\u5f80\u65b9\u6cd5\u4e2d\u540c\u4e00\u56fe\u7247\u4e0b\u9759\u6001\u89c6\u89c9\u4ee4\u724c\u5d4c\u5165\u7684\u95ee\u9898\u3002 \u4e0e\u4e4b\u524d\u7684\u6210\u679c\u76f8\u6bd4\uff0cMiniDrive\u5728\u53c2\u6570\u5927\u5c0f\u3001\u6d6e\u70b9\u8fd0\u7b97\u91cf\u548c\u54cd\u5e94\u6548\u7387\u65b9\u9762\u5747\u8fbe\u5230\u4e86\u6700\u4f18\u6027\u80fd\uff0c\u6700\u5c0f\u7248\u672c\u4ec5\u5305\u542b83M\u53c2\u6570\u3002|\n", 
"2409.08264": "|**2024-09-12**|**Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale**|Rogerio Bonatti et.al.|[2409.08264](http://arxiv.org/abs/2409.08264)|**[link](https://github.com/microsoft/windowsagentarena)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5c55\u73b0\u51fa\u5728\u9700\u8981\u89c4\u5212\u548c\u63a8\u7406\u7684\u591a\u6a21\u6001\u4efb\u52a1\u4e2d\u4f5c\u4e3a\u8ba1\u7b97\u673a\u4ee3\u7406\u7684\u5f3a\u5927\u6f5c\u529b\uff0c\u80fd\u663e\u8457\u63d0\u5347\u4eba\u7c7b\u751f\u4ea7\u529b\u548c\u8f6f\u4ef6\u53ef\u8bbf\u95ee\u6027\u3002\u7136\u800c\uff0c\u8861\u91cf\u8fd9\u4e9b\u4ee3\u7406\u5728\u771f\u5b9e\u73af\u5883\u4e2d\u7684\u6027\u80fd\u4ecd\u5b58\u5728\u6311\u6218\uff1a\uff08i\uff09\u5927\u591a\u6570\u57fa\u51c6\u6d4b\u8bd5\u4ec5\u9650\u4e8e\u7279\u5b9a\u6a21\u6001\u6216\u9886\u57df\uff08\u4f8b\u5982\u7eaf\u6587\u672c\u3001\u7f51\u9875\u5bfc\u822a\u3001\u95ee\u9898\u56de\u7b54\u3001\u7f16\u7a0b\uff09\uff0c\uff08ii\uff09\u5b8c\u6574\u57fa\u51c6\u8bc4\u4f30\u8017\u65f6\u957f\uff08\u901a\u5e38\u9700\u6570\u5929\u65f6\u95f4\uff09\uff0c\u56e0\u4e3a\u4efb\u52a1\u5177\u6709\u591a\u6b65\u9aa4\u7684\u5e8f\u5217\u6027\u8d28\u3002 \u4e3a\u89e3\u51b3\u8fd9\u4e9b\u6311\u6218\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u201cWindows Agent 
Arena\u201d\uff1a\u4e00\u4e2a\u53ef\u590d\u73b0\u7684\u901a\u7528\u73af\u5883\uff0c\u4e13\u6ce8\u4e8eWindows\u64cd\u4f5c\u7cfb\u7edf\uff0c\u5141\u8bb8\u4ee3\u7406\u81ea\u7531\u64cd\u4f5c\u5e76\u4f7f\u7528\u4e0e\u4eba\u7c7b\u7528\u6237\u5728\u89e3\u51b3\u4efb\u52a1\u65f6\u76f8\u540c\u7684\u5e7f\u6cdb\u5e94\u7528\u7a0b\u5e8f\u3001\u5de5\u5177\u548c\u7f51\u7edc\u6d4f\u89c8\u5668\u3002\u6211\u4eec\u6839\u636eOSWorld\u6846\u67b6\uff08Xie\u7b49\u4eba\uff0c2024\u5e74\uff09\u521b\u5efa\u4e86150\u591a\u4e2a\u8de8\u4ee3\u8868\u9886\u57df\u7684\u591a\u6837\u5316Windows\u4efb\u52a1\uff0c\u8fd9\u4e9b\u4efb\u52a1\u6db5\u76d6\u4e86\u89c4\u5212\u3001\u5c4f\u5e55\u7406\u89e3\u53ca\u5de5\u5177\u4f7f\u7528\u7684\u4ee3\u7406\u80fd\u529b\u8981\u6c42\u3002 \u6211\u4eec\u7684\u57fa\u51c6\u5177\u6709\u53ef\u6269\u5c55\u6027\uff0c\u5e76\u80fd\u591f\u65e0\u7f1d\u5730\u5728Azure\u4e0a\u5e76\u884c\u5316\uff0c\u4ece\u800c\u5728\u77ed\u77ed20\u5206\u949f\u5185\u5b8c\u6210\u5168\u9762\u57fa\u51c6\u8bc4\u4f30\u3002\u4e3a\u4e86\u5c55\u793aWindows Agent Arena\u7684\u80fd\u529b\uff0c\u6211\u4eec\u8fd8\u5f15\u5165\u4e86\u4e00\u4e2a\u65b0\u7684\u591a\u6a21\u6001\u4ee3\u7406Navi\u3002Navi\u5728Windows\u9886\u57df\u5185\u7684\u6210\u529f\u7387\u8fbe\u5230\u4e8619.5%\uff0c\u76f8\u6bd4\u4e4b\u4e0b\uff0c\u672a\u7ecf\u8f85\u52a9\u7684\u4eba\u7c7b\u8868\u73b0\u5219\u4e3a74.5%\u3002\u6b64\u5916\uff0cNavi\u5728\u53e6\u4e00\u4e2a\u6d41\u884c\u7684\u57fa\u4e8e\u7f51\u7edc\u7684\u57fa\u51c6\u6d4b\u8bd5Mind2Web\u4e2d\u4e5f\u8868\u73b0\u51fa\u8272\u3002 \u6211\u4eec\u63d0\u4f9b\u4e86\u5bf9Navi\u6027\u80fd\u7684\u8be6\u7ec6\u5b9a\u91cf\u548c\u5b9a\u6027\u5206\u6790\uff0c\u5e76\u63d0\u4f9b\u4e86\u5229\u7528Windows Agent Arena\u8fdb\u884c\u672a\u6765\u7814\u7a76\u7684\u4ee3\u7406\u5f00\u53d1\u548c\u6570\u636e\u751f\u6210\u673a\u4f1a\u7684\u89c1\u89e3\u3002\u7f51\u9875\uff1ahttps://microsoft.github.io/WindowsAgentArena \u4ee3\u7801\uff1ahttps://github.com/microsoft/WindowsAgentArena**|\n", "2409.08250": 
"|**2024-09-12**|**OmniQuery: Contextually Augmenting Captured Multimodal Memory to Enable Personal Question Answering**|Jiahao Nick Li et.al.|[2409.08250](http://arxiv.org/abs/2409.08250)|null|\u4eba\u4eec\u5e38\u901a\u8fc7\u7167\u7247\u3001\u5c4f\u5e55\u622a\u56fe\u548c\u89c6\u9891\u6765\u6355\u6349\u8bb0\u5fc6\u3002\u73b0\u6709\u7684\u57fa\u4e8eAI\u7684\u5de5\u5177\u80fd\u591f\u4f7f\u7528\u81ea\u7136\u8bed\u8a00\u68c0\u7d22\u8fd9\u4e9b\u6570\u636e\uff0c\u4f46\u4e3b\u8981\u5c40\u9650\u4e8e\u68c0\u7d22\u50cf\u7167\u7247\u4e2d\u7684\u7279\u5b9a\u7269\u4f53\u8fd9\u6837\u7684\u5355\u4e00\u4fe1\u606f\uff0c\u96be\u4ee5\u5904\u7406\u6d89\u53ca\u7406\u89e3\u76f8\u4e92\u5173\u8054\u8bb0\u5fc6\uff08\u5982\u4e8b\u4ef6\u5e8f\u5217\uff09\u7684\u66f4\u590d\u6742\u67e5\u8be2\u3002\u6211\u4eec\u8fdb\u884c\u4e86\u4e00\u9879\u4e3a\u671f\u4e00\u4e2a\u6708\u7684\u65e5\u5fd7\u7814\u7a76\uff0c\u6536\u96c6\u4e86\u73b0\u5b9e\u7528\u6237\u67e5\u8be2\uff0c\u5e76\u751f\u6210\u4e86\u4e00\u4e2a\u96c6\u6210\u4e0e\u6355\u83b7\u8bb0\u5fc6\u76f8\u5173\u5fc5\u8981\u4e0a\u4e0b\u6587\u4fe1\u606f\u7684\u5206\u7c7b\u4f53\u7cfb\u3002\u968f\u540e\uff0c\u6211\u4eec\u5f15\u5165\u4e86OmniQuery\uff0c\u8fd9\u662f\u4e00\u79cd\u80fd\u591f\u56de\u7b54\u9700\u8981\u63d0\u53d6\u548c\u63a8\u65ad\u591a\u5c42\u4e0a\u4e0b\u6587\u4fe1\u606f\u4ee5\u6574\u5408\u76f8\u4e92\u5173\u8054\u8bb0\u5fc6\u7684\u590d\u6742\u4e2a\u4eba\u8bb0\u5fc6\u76f8\u5173\u95ee\u9898\u7684\u65b0\u578b\u7cfb\u7edf\u3002OmniQuery\u901a\u8fc7\u4ece\u591a\u4e2a\u76f8\u4e92\u5173\u8054\u7684\u8bb0\u5fc6\u4e2d\u96c6\u6210\u5206\u6563\u7684\u4e0a\u4e0b\u6587\u4fe1\u606f\u6765\u589e\u5f3a\u5355\u4e2a\u6355\u83b7\u7684\u8bb0\u5fc6\uff0c\u68c0\u7d22\u76f8\u5173\u8bb0\u5fc6\uff0c\u5e76\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u63d0\u4f9b\u5168\u9762\u7684\u7b54\u6848\u3002\u5728\u4eba\u7c7b\u8bc4\u4f30\u4e2d\uff0c\u6211\u4eec\u5c55\u793a\u4e86OmniQuery\u7684\u6709\u6548\u6027\uff0c\u51c6\u786e\u7387\u8fbe\u523071.5%\uff0c\u5e76\u4e1
4\u5b83\u572874.5%\u7684\u65f6\u95f4\u91cc\u8d85\u8d8a\u4e86\u4f20\u7edf\u7684RAG\u7cfb\u7edf\uff0c\u5728\u67d0\u4e9b\u4efb\u52a1\u4e0a\u751a\u81f3\u53d6\u5f97\u4e86\u80dc\u5229\u6216\u5e76\u5217\u7b2c\u4e00\u7684\u6210\u7ee9\u3002|\n", "2409.08239": "|**2024-09-12**|**Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources**|Alisia Lupidi et.al.|[2409.08239](http://arxiv.org/abs/2409.08239)|null|\u5728\u9762\u5bf9\u4f9d\u8d56\u7ed3\u6784\u5316\u6570\u636e\u3001\u590d\u6742\u63a8\u7406\u6216\u5de5\u5177\u4f7f\u7528\u7684\u6311\u6218\u6027\u573a\u666f\u65f6\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4ecd\u7136\u5b58\u5728\u56f0\u96be\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aSource2Synth\u7684\u65b0\u65b9\u6cd5\uff0c\u5b83\u65e0\u9700\u6602\u8d35\u7684\u4eba\u7c7b\u6807\u6ce8\u5373\u53ef\u7528\u4e8e\u6559\u6388LLMs\u65b0\u6280\u80fd\u3002Source2Synth\u63a5\u53d7\u81ea\u5b9a\u4e49\u6570\u636e\u6e90\u4f5c\u4e3a\u8f93\u5165\uff0c\u5e76\u751f\u6210\u5177\u6709\u57fa\u4e8e\u73b0\u5b9e\u4e16\u754c\u6765\u6e90\u7684\u4e2d\u95f4\u63a8\u7406\u6b65\u9aa4\u7684\u5408\u6210\u6570\u636e\u70b9\u3002\u8be5\u65b9\u6cd5\u901a\u8fc7\u6839\u636e\u5176\u53ef\u56de\u7b54\u6027\u4e22\u5f03\u4f4e\u8d28\u91cf\u751f\u6210\u6765\u63d0\u9ad8\u6570\u636e\u96c6\u8d28\u91cf\u3002\u6211\u4eec\u901a\u8fc7\u5728\u4e24\u4e2a\u5177\u6709\u6311\u6218\u6027\u7684\u9886\u57df\u4e2d\u5e94\u7528\u6b64\u65b9\u6cd5\u6765\u5c55\u793a\u5176\u901a\u7528\u6027\uff1a\u5728\u591a\u8df3\u95ee\u9898\u56de\u7b54\uff08MHQA\uff09\u4e2d\u6d4b\u8bd5\u63a8\u7406\u80fd\u529b\uff0c\u5728\u8868\u683c\u578b\u95ee\u9898\u56de\u7b54\uff08TQA\uff09\u4e2d\u6d4b\u8bd5\u5de5\u5177\u4f7f\u7528\u3002\u4e0e\u7ecf\u8fc7\u5fae\u8c03\u7684\u57fa\u672c\u6a21\u578b\u76f8\u6bd4\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728WikiSQL\u4e0a\u7684TQA\u4e0a\u63d0\u9ad8\u4e8625.51%\uff0c\u5728HotPotQA\u4e0a\u7684MHQA\u4e0a\u63d0\u9ad8\u4e8622.57%\u7684\u6027\u80fd\u3002|\n", "2409.08234": 
"|**2024-09-12**|**LLM Honeypot: Leveraging Large Language Models as Advanced Interactive Honeypot Systems**|Hakan T. Otal et.al.|[2409.08234](http://arxiv.org/abs/2409.08234)|**[link](https://github.com/ai-in-complex-systems-lab/llm-honeypot)**|**\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u521b\u65b0\u65b9\u6cd5\uff0c\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u6784\u5efa\u771f\u5b9e\u4e14\u4e92\u52a8\u7684\u871c\u7f50\u7cfb\u7edf\u3002\u901a\u8fc7\u5728\u5305\u542b\u653b\u51fb\u8005\u751f\u6210\u547d\u4ee4\u548c\u54cd\u5e94\u7684\u591a\u6837\u5316\u6570\u636e\u96c6\u4e0a\u5bf9\u5f00\u6e90\u9884\u8bad\u7ec3\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u5fae\u8c03\uff0c\u6211\u4eec\u5f00\u53d1\u51fa\u4e00\u79cd\u80fd\u591f\u4e0e\u653b\u51fb\u8005\u8fdb\u884c\u9ad8\u7ea7\u4ea4\u4e92\u7684\u871c\u7f50\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u6d89\u53ca\u5173\u952e\u6b65\u9aa4\uff1a\u6570\u636e\u6536\u96c6\u4e0e\u5904\u7406\u3001\u63d0\u793a\u5de5\u7a0b\u3001\u6a21\u578b\u9009\u62e9\u4ee5\u53ca\u76d1\u7763\u5f0f\u5fae\u8c03\uff0c\u4ee5\u4f18\u5316\u6a21\u578b\u6027\u80fd\u3002\u901a\u8fc7\u76f8\u4f3c\u6027\u6307\u6807\u8bc4\u4f30\u4e0e\u73b0\u573a\u90e8\u7f72\uff0c\u7ed3\u679c\u663e\u793a\u6211\u4eec\u7684\u65b9\u6cd5\u80fd\u591f\u751f\u6210\u51c6\u786e\u4e14\u4fe1\u606f\u4e30\u5bcc\u7684\u54cd\u5e94\u3002\u7814\u7a76\u7ed3\u679c\u5f3a\u8c03\u4e86LLMs\u5728\u91cd\u5851\u871c\u7f50\u6280\u672f\u65b9\u9762\u7684\u6f5c\u529b\uff0c\u4e3a\u7f51\u7edc\u5b89\u5168\u4e13\u4e1a\u4eba\u5458\u63d0\u4f9b\u4e86\u4e00\u4e2a\u5f3a\u5927\u7684\u5de5\u5177\u6765\u68c0\u6d4b\u548c\u5206\u6790\u6076\u610f\u6d3b\u52a8\uff0c\u4ece\u800c\u589e\u5f3a\u6574\u4f53\u5b89\u5168\u67b6\u6784\u3002**|\n", "2409.08202": "|**2024-09-12**|**What Makes a Maze Look Like a Maze?**|Joy Hsu 
et.al.|[2409.08202](http://arxiv.org/abs/2409.08202)|null|\u4eba\u7c7b\u89c6\u89c9\u7406\u89e3\u7684\u72ec\u7279\u4e4b\u5904\u5728\u4e8e\u80fd\u591f\u7075\u6d3b\u5730\u89e3\u91ca\u62bd\u8c61\u6982\u5ff5\u7684\u80fd\u529b\uff1a\u83b7\u53d6\u63d0\u5347\u89c4\u5219\u6765\u89e3\u91ca\u5b83\u4eec\u6240\u8c61\u5f81\u7684\u542b\u4e49\uff0c\u5728\u719f\u6089\u548c\u4e0d\u719f\u6089\u7684\u4e0a\u4e0b\u6587\u4e2d\u951a\u5b9a\u5b83\u4eec\uff0c\u5e76\u5bf9\u5b83\u4eec\u8fdb\u884c\u9884\u6d4b\u6216\u63a8\u7406\u3002\u5c3d\u7ba1\u73b0\u6210\u7684\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\u5728\u8bc6\u522b\u56fe\u50cf\u4e2d\u7684\u5177\u4f53\u5bf9\u8c61\u7c7b\u522b\uff08\u5982\u6811\u679d\uff09\u65b9\u9762\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5b83\u4eec\u4ecd\u7136\u96be\u4ee5\u7406\u89e3\u8fd9\u6837\u7684\u89c6\u89c9\u62bd\u8c61\uff08\u4f8b\u5982\uff0c\u4e00\u7ec4\u6811\u679d\u5982\u4f55\u5f62\u6210\u8ff7\u5bab\u7684\u5899\u58c1\uff09\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u6df1\u5ea6\u67b6\u6784\u63a5\u5730\uff08DSG\uff09\uff0c\u8fd9\u662f\u4e00\u4e2a\u5229\u7528\u660e\u786e\u7684\u7ed3\u6784\u5316\u8868\u793a\u6cd5\u6765\u951a\u5b9a\u548c\u63a8\u7406\u89c6\u89c9\u62bd\u8c61\u7684\u6846\u67b6\u3002DSG\u7684\u6838\u5fc3\u662f\u67b6\u6784\u2014\u2014\u5206\u89e3\u62bd\u8c61\u6982\u5ff5\u7684\u4f9d\u8d56\u56fe\u5f62\u63cf\u8ff0\uff0c\u5c06\u5176\u5206\u89e3\u4e3a\u66f4\u57fa\u672c\u7684\u7b26\u53f7\u3002DSG\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u63d0\u53d6\u67b6\u6784\uff0c\u7136\u540e\u901a\u8fc7\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\u5206\u5c42\u5730\u5c06\u67b6\u6784\u4e2d\u7684\u5177\u4f53\u5230\u62bd\u8c61\u7ec4\u4ef6\u951a\u5b9a\u5230\u56fe\u50cf\u4e0a\u3002\u951a\u5b9a\u540e\u7684\u67b6\u6784\u7528\u4e8e\u589e\u5f3a\u5bf9\u89c6\u89c9\u62bd\u8c61\u7684\u7406\u89e3\u3002\u6211\u4eec\u7cfb\u7edf\u5730\u8bc4\u4f30\u4e86DSG\u53ca\u5176\u4e0d\u540c\u7684\u65b9\u6cd5\u5728\u6211\u4eec\u65b0\u521b\u5efa\u7684\u89c6\u89c9\u62bd\u8c
61\u6570\u636e\u96c6\u4e0a\u7684\u63a8\u7406\u6027\u80fd\uff0c\u8be5\u6570\u636e\u96c6\u7531\u4eba\u7c7b\u6807\u6ce8\u7684\u771f\u5b9e\u4e16\u754c\u56fe\u50cf\u548c\u76f8\u5e94\u7684\u95ee\u7b54\u5bf9\u7ec4\u6210\u3002\u6211\u4eec\u5c55\u793a\u4e86DSG\u663e\u8457\u63d0\u9ad8\u4e86\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\u5728\u62bd\u8c61\u89c6\u89c9\u63a8\u7406\u65b9\u9762\u7684\u8868\u73b0\uff0c\u5e76\u671d\u7740\u4e0e\u4eba\u7c7b\u4e00\u81f4\u7684\u89c6\u89c9\u62bd\u8c61\u7406\u89e3\u8fc8\u8fdb\u4e86\u4e00\u6b65\u3002|\n", "2409.08185": "|**2024-09-12**|**Fine-tuning Large Language Models for Entity Matching**|Aaron Steiner et.al.|[2409.08185](http://arxiv.org/abs/2409.08185)|**[link](https://github.com/wbsg-uni-mannheim/tailormatch)**|**\u672c\u6587\u63a2\u8ba8\u4e86\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u8fdb\u884c\u5b9e\u4f53\u5339\u914d\u7684\u6f5c\u529b\uff0c\u7279\u522b\u662f\u901a\u8fc7\u5fae\u8c03\u3002\u5df2\u6709\u7814\u7a76\u4e3b\u8981\u96c6\u4e2d\u5728\u63d0\u793a\u5de5\u7a0b\u548c\u57fa\u4e8e\u4e0a\u4e0b\u6587\u7684\u5b66\u4e60\u4e0a\u3002\u672c\u6587\u4ece\u4e24\u4e2a\u7ef4\u5ea6\u5206\u6790\u4e86\u5fae\u8c03\u7684\u53ef\u884c\u6027\uff1a1\uff09\u8bad\u7ec3\u793a\u4f8b\u7684\u8868\u793a\u65b9\u5f0f\uff0c\u5b9e\u9a8c\u6d89\u53ca\u5728\u8bad\u7ec3\u96c6\u4e2d\u6dfb\u52a0\u4e0d\u540c\u7c7b\u578b\u7684LLM\u751f\u6210\u89e3\u91ca\uff1b2\uff09\u4f7f\u7528LLM\u9009\u62e9\u548c\u751f\u6210\u8bad\u7ec3\u793a\u4f8b\u3002\u6211\u4eec\u4e0d\u4ec5\u5173\u6ce8\u6e90\u6570\u636e\u96c6\u4e0a\u7684\u5339\u914d\u6027\u80fd\uff0c\u8fd8\u7814\u7a76\u4e86\u5fae\u8c03\u5bf9\u6a21\u578b\u5728\u540c\u57df\u6570\u636e\u96c6\u4ee5\u53ca\u8de8\u9886\u57df\u6570\u636e\u96c6\u4e0a\u7684\u6cdb\u5316\u80fd\u529b\u7684\u5f71\u54cd\u3002 
\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u5fae\u8c03\u663e\u8457\u63d0\u5347\u4e86\u5c0f\u578b\u6a21\u578b\u7684\u6027\u80fd\uff0c\u800c\u5927\u578b\u6a21\u578b\u7684\u8868\u73b0\u5219\u53c2\u5dee\u4e0d\u9f50\u3002\u5fae\u8c03\u5728\u63d0\u5347\u540c\u57df\u6570\u636e\u96c6\u7684\u6cdb\u5316\u80fd\u529b\u7684\u540c\u65f6\uff0c\u4e5f\u5f71\u54cd\u4e86\u8de8\u57df\u8fc1\u79fb\u7684\u80fd\u529b\u3002\u6211\u4eec\u53d1\u73b0\uff0c\u5411\u8bad\u7ec3\u96c6\u6dfb\u52a0\u7ed3\u6784\u5316\u7684\u89e3\u91ca\u5bf9\u56db\u79cdLLM\u4e2d\u7684\u4e09\u79cd\u6709\u6b63\u9762\u5f71\u54cd\uff0c\u800c\u63d0\u51fa\u7684\u793a\u4f8b\u9009\u62e9\u548c\u751f\u6210\u65b9\u6cd5\u4ec5\u63d0\u5347\u4e86Llama 3.1 8B\u7684\u6027\u80fd\uff0c\u540c\u65f6\u964d\u4f4e\u4e86GPT-4o Mini\u7684\u6027\u80fd\u3002**|\n", "2409.08148": "|**2024-09-12**|**Faster Speech-LLaMA Inference with Multi-token Prediction**|Desh Raj et.al.|[2409.08148](http://arxiv.org/abs/2409.08148)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u89e3\u51b3\u5404\u79cd\u4efb\u52a1\u4e0a\u53d8\u5f97\u6781\u4e3a\u719f\u7ec3\uff0c\u5305\u62ec\u6d89\u53ca\u591a\u6a21\u6001\u8f93\u5165\u7684\u4efb\u52a1\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u901a\u8fc7\u4f7f\u7528\u8bed\u97f3\u7f16\u7801\u5668\u5b9e\u4f8b\u5316LLM\uff08\u4f8b\u5982LLaMA\uff09\u5e76\u5229\u7528\u914d\u5bf9\u6570\u636e\u5bf9\u5176\u8fdb\u884c\u8bad\u7ec3\uff0c\u53ef\u4ee5\u8d4b\u4e88\u53ea\u89e3\u7801\u7684\u6a21\u578b\u8bed\u97f3\u8bc6\u522b\uff08ASR\uff09\u80fd\u529b\uff0c\u56e0\u6b64\u79f0\u4e4b\u4e3aSpeech-LLaMA\u3002\u7136\u800c\uff0c\u7531\u4e8e\u81ea\u56de\u5f52\u63a8\u7406\u7684\u987a\u5e8f\u6027\u8d28\u4ee5\u53ca\u76f8\u5bf9\u8f83\u5927\u7684\u89e3\u7801\u5668\uff0cSpeech-LLaMA\u6a21\u578b\u7684\u63a8\u7406\u65f6\u95f4\u76f8\u5bf9\u8f83\u9ad8\u3002\u672c\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u901a\u8fc7\u5728\u540c\u4e00\u89e3\u7801\u6b65\u9aa4\u4e2d\u9884\u6d4b\u591a\u4e2a\u4ee4\u724c\u6765\u52a0\u901fSpeech-LLaMA\u7684\u63a8\u7406\u3
002\u6211\u4eec\u63a2\u7d22\u4e86\u51e0\u4e2a\u80fd\u591f\u5b9e\u73b0\u8fd9\u4e00\u76ee\u6807\u7684\u6a21\u578b\u67b6\u6784\uff0c\u5e76\u901a\u8fc7\u9608\u503c\u63a8\u7406\u548c\u9a8c\u8bc1\u63a8\u7406\u7b56\u7565\u6765\u8bc4\u4f30\u5b83\u4eec\u7684\u6027\u80fd\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86\u4e00\u4e2a\u57fa\u4e8e\u524d\u7f00\u7684\u675f\u641c\u7d22\u89e3\u7801\u65b9\u6cd5\uff0c\u5141\u8bb8\u6b64\u7c7b\u6a21\u578b\u8fdb\u884c\u9ad8\u6548\u7684\u6700\u5c0f\u8bcd\u9519\u8bef\u7387\uff08MWER\uff09\u8bad\u7ec3\u3002\u6211\u4eec\u5728\u591a\u79cd\u516c\u5171\u57fa\u51c6\u4e0a\u8bc4\u4f30\u4e86\u6211\u4eec\u7684\u6a21\u578b\uff0c\u7ed3\u679c\u663e\u793a\u5b83\u4eec\u5c06\u89e3\u7801\u8c03\u7528\u7684\u6570\u91cf\u51cf\u5c11\u4e86\u7ea63.2\u500d\uff0c\u540c\u65f6\u4fdd\u6301\u6216\u63d0\u9ad8\u4e86WER\u6027\u80fd\u3002|\n", "2409.08147": "|**2024-09-12**|**LLM-POTUS Score: A Framework of Analyzing Presidential Debates with Large Language Models**|Zhengliang Liu et.al.|[2409.08147](http://arxiv.org/abs/2409.08147)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u6765\u8bc4\u4f30\u603b\u7edf\u8fa9\u8bba\u8868\u73b0\u7684\u65b0\u65b9\u6cd5\uff0c\u65e8\u5728\u89e3\u51b3\u957f\u671f\u5b58\u5728\u7684\u5ba2\u89c2\u8bc4\u4f30\u8fa9\u8bba\u7ed3\u679c\u7684\u6311\u6218\u3002\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u6846\u67b6\uff0c\u4ece\u201c\u653f\u7b56\u3001\u4e2a\u6027\u4e0e\u89c6\u89d2\u201d\uff083P\uff09\u548c\u201c\u5174\u8da3\u3001\u610f\u8bc6\u5f62\u6001\u4e0e\u8eab\u4efd\u8ba4\u540c\u201d\uff083I\uff09\u7684\u89d2\u5ea6\u5206\u6790\u56db\u4f4d\u5173\u952e\u53d7\u4f17\u7fa4\u4f53\uff1a\u9009\u6c11\u3001\u4f01\u4e1a\u3001\u6350\u8d60\u8005\u53ca\u653f\u5ba2\u5bf9\u5019\u9009\u4eba\u7684\u5171\u9e23\u3002\u8be5\u65b9\u6cd5\u901a\u8fc7\u751f\u6210\u201cLLM-POTUS\u8bc4\u5206\u201d\uff0c\u5373\u57fa\u4e8e3P\u4e0e3I\u4e4b\u95f4\u4e00\u81f4\u6027\u5ea6\u91cf\u7684\u91cf\u5316\u6307\u680
7\uff0c\u6765\u8bc4\u4ef7\u8fa9\u8bba\u8868\u73b0\u3002\u6211\u4eec\u5e94\u7528\u6b64\u6846\u67b6\u5bf9\u8fd1\u671f\u7f8e\u56fd\u603b\u7edf\u8fa9\u8bba\u7684\u6587\u672c\u8fdb\u884c\u5206\u6790\uff0c\u63ed\u793a\u4e86\u4e0d\u540c\u8fa9\u8bba\u7b56\u7565\u7684\u6709\u6548\u6027\u53ca\u5176\u5bf9\u4e0d\u540c\u53d7\u4f17\u7fa4\u4f53\u7684\u5f71\u54cd\u3002\u7814\u7a76\u4e0d\u4ec5\u63d0\u4f9b\u4e86\u4e00\u4e2a\u65b0\u7684\u653f\u6cbb\u5206\u6790\u5de5\u5177\uff0c\u8fd8\u63a2\u7d22\u4e86\u5728\u590d\u6742\u793e\u4f1a\u80cc\u666f\u4e0b\u4f7f\u7528LLM\u4f5c\u4e3a\u516c\u6b63\u8bc4\u5224\u8005\u7684\u6f5c\u529b\u4e0e\u5c40\u9650\u6027\u3002\u6b64\u5916\uff0c\u8be5\u6846\u67b6\u4e3a\u4e2a\u4eba\u516c\u6c11\u63d0\u4f9b\u4e86\u4e00\u4e2a\u72ec\u7acb\u7684\u5de5\u5177\uff0c\u7528\u4e8e\u8bc4\u4f30\u603b\u7edf\u8fa9\u8bba\u7684\u8868\u73b0\uff0c\u4ece\u800c\u589e\u5f3a\u6c11\u4e3b\u53c2\u4e0e\u5ea6\uff0c\u51cf\u5c11\u5bf9\u53ef\u80fd\u504f\u89c1\u7684\u5a92\u4f53\u89e3\u8bfb\u548c\u673a\u6784\u5f71\u54cd\u529b\u7684\u4f9d\u8d56\uff0c\u8fdb\u800c\u52a0\u5f3a\u77e5\u60c5\u516c\u6c11\u53c2\u4e0e\u7684\u57fa\u7840\u3002|\n", "2409.08098": "|**2024-09-12**|**The CLC-UKET Dataset: Benchmarking Case Outcome Prediction for the UK Employment Tribunal**|Huiyuan Xie 
et.al.|[2409.08098](http://arxiv.org/abs/2409.08098)|null|\u672c\u6587\u7814\u7a76\u4e86\u6280\u672f\u9769\u65b0\u4e0e\u83b7\u53d6\u516c\u6b63\u4e4b\u95f4\u7684\u4ea4\u6c47\u70b9\uff0c\u901a\u8fc7\u5728\u82f1\u56fd\u5c31\u4e1a\u6cd5\u5ead\uff08UKET\uff09\u6784\u5efa\u9884\u6d4b\u6848\u4f8b\u7ed3\u679c\u7684\u57fa\u51c6\u3002\u4e3a\u4e86\u5e94\u5bf9\u5927\u91cf\u4eba\u5de5\u6ce8\u91ca\u7684\u6311\u6218\uff0c\u8be5\u7814\u7a76\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u8fdb\u884c\u81ea\u52a8\u6ce8\u91ca\uff0c\u4ece\u800c\u521b\u5efa\u4e86CLC-UKET\u6570\u636e\u96c6\u3002\u8be5\u6570\u636e\u96c6\u5305\u542b\u7ea619,000\u4e2aUKET\u6848\u4f8b\u53ca\u5176\u5143\u6570\u636e\u3002\u5168\u9762\u7684\u6cd5\u5f8b\u6ce8\u91ca\u6db5\u76d6\u4e86\u4e8b\u5b9e\u3001\u4e3b\u5f20\u3001\u5148\u4f8b\u5f15\u7528\u3001\u6cd5\u89c4\u5f15\u7528\u3001\u6848\u4f8b\u7ed3\u679c\u3001\u7406\u7531\u548c\u7ba1\u8f96\u6743\u4ee3\u7801\u3002\u501f\u52a9CLC-UKET\u6570\u636e\uff0c\u6211\u4eec\u5bf9UKET\u7684\u591a\u7c7b\u6848\u4f8b\u7ed3\u679c\u9884\u6d4b\u4efb\u52a1\u8fdb\u884c\u4e86\u7814\u7a76\u3002\u6536\u96c6\u4e86\u4eba\u7c7b\u9884\u6d4b\u4ee5\u5efa\u7acb\u6a21\u578b\u6bd4\u8f83\u7684\u6027\u80fd\u53c2\u8003\u3002\u4ece\u57fa\u7840\u6a21\u578b\u7684\u5b9e\u8bc1\u7ed3\u679c\u6765\u770b\uff0c\u5fae\u8c03\u7684\u8f6c\u6362\u5668\u6a21\u578b\u5728UKET\u9884\u6d4b\u4efb\u52a1\u4e0a\u4f18\u4e8e\u96f6\u6b21\u548c\u5c11\u91cf\u6837\u672c\u7684LLM\u3002\u96f6\u6b21LLM\u7684\u6027\u80fd\u53ef\u4ee5\u901a\u8fc7\u6574\u5408\u4e0e\u4efb\u52a1\u76f8\u5173\u7684\u4fe1\u606f\u6765\u589e\u5f3a\uff0c\u878d\u5165\u5c11\u91cf\u6837\u672c\u793a\u4f8b\u4e2d\u3002\u6211\u4eec\u5e0c\u671bCLC-UKET\u6570\u636e\u96c6\u3001\u4eba\u7c7b\u6ce8\u91ca\u4ee5\u53ca\u5b9e\u8bc1\u53d1\u73b0\u80fd\u591f\u4f5c\u4e3a\u5c31\u4e1a\u76f8\u5173\u7ea0\u7eb7\u89e3\u51b3\u7684\u5b9d\u8d35\u57fa\u51c6\u3002|\n", "2409.08087": "|**2024-09-12**|**Securing Large Language Models: Addressing Bias, Misinformation, and Prompt 
Attacks**|Benji Peng et.al.|[2409.08087](http://arxiv.org/abs/2409.08087)|null|\u672c\u6587\u7efc\u8ff0\u4e86\u8fd1\u5e74\u6765\u6709\u5173\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5b89\u5168\u6027\u7684\u5173\u952e\u95ee\u9898\u7684\u7814\u7a76\u6587\u732e\uff0c\u91cd\u70b9\u662f\u51c6\u786e\u6027\u3001\u504f\u89c1\u3001\u5185\u5bb9\u68c0\u6d4b\u4ee5\u53ca\u5bf9\u6297\u653b\u51fb\u7684\u8106\u5f31\u6027\u3002\u6587\u7ae0\u8be6\u7ec6\u8ba8\u8bba\u4e86LLM\u8f93\u51fa\u53ef\u80fd\u4e0d\u51c6\u786e\u6216\u8bef\u5bfc\u6027\u7684\u95ee\u9898\uff0c\u5e76\u5f3a\u8c03\u4e86\u901a\u8fc7\u4e8b\u5b9e\u6838\u67e5\u65b9\u6cd5\u589e\u5f3a\u54cd\u5e94\u53ef\u9760\u6027\u7684\u5b9e\u65bd\u7b56\u7565\u3002\u6587\u7ae0\u6df1\u5165\u63a2\u8ba8\u4e86\u5185\u5d4c\u4e8eLLM\u4e2d\u7684\u56fa\u6709\u504f\u89c1\uff0c\u901a\u8fc7\u591a\u6837\u5316\u7684\u8bc4\u4f30\u6280\u672f\uff0c\u5982\u63a7\u5236\u8f93\u5165\u7814\u7a76\u548c\u7ea2\u961f\u6f14\u7ec3\uff0c\u5bf9\u5176\u8fdb\u884c\u6279\u5224\u6027\u5ba1\u89c6\u3002\u63d0\u51fa\u4e86\u5168\u9762\u7684\u504f\u89c1\u7f13\u89e3\u7b56\u7565\u5206\u6790\uff0c\u5305\u62ec\u4ece\u9884\u5904\u7406\u5e72\u9884\u5230\u8bad\u7ec3\u671f\u95f4\u8c03\u6574\u548c\u540e\u5904\u7406\u6539\u8fdb\u7684\u5404\u79cd\u65b9\u6cd5\u3002\u6b64\u5916\uff0c\u6587\u7ae0\u8fd8\u63a2\u7a76\u4e86\u533a\u5206LLM\u751f\u6210\u5185\u5bb9\u4e0e\u4eba\u7c7b\u521b\u4f5c\u6587\u672c\u7684\u590d\u6742\u6027\uff0c\u5f15\u5165\u4e86\u8bf8\u5982DetectGPT\u7684\u68c0\u6d4b\u673a\u5236\u4ee5\u53ca\u6c34\u5370\u6280\u672f\uff0c\u540c\u65f6\u6307\u51fa\u5728\u590d\u6742\u60c5\u51b5\u4e0b\u57fa\u4e8e\u673a\u5668\u5b66\u4e60\u7684\u5206\u7c7b\u5668\u5b58\u5728\u5c40\u9650\u6027\u3002\u6587\u7ae0\u8fd8\u5206\u6790\u4e86LLM\u7684\u6f0f\u6d1e\uff0c\u5305\u62ec\u9003\u9038\u653b\u51fb\u548c\u63d0\u793a\u6ce8\u5165\u653b\u51fb\uff0c\u901a\u8fc7\u6848\u4f8b\u7814\u7a76\u548c\u5927\u89c4\u6a21\u7ade\u8d5bHackAPrompt\u7b49\u8fdb\u884c\u4e86\u6df1\u5165\u63a2\u8ba8\u3002\u6700\u540e\u
ff0c\u6587\u7ae0\u56de\u987e\u4e86\u4fdd\u62a4LLM\u7684\u9632\u5fa1\u63aa\u65bd\uff0c\u5f3a\u8c03\u4e86\u9700\u8981\u5bf9LLM\u5b89\u5168\u6027\u9886\u57df\u8fdb\u884c\u66f4\u6df1\u5165\u7814\u7a76\u7684\u91cd\u8981\u6027\u3002|\n", "2409.09030": "|**2024-09-13**|**Agents in Software Engineering: Survey, Landscape, and Vision**|Yanxian Huang et.al.|[2409.09030](http://arxiv.org/abs/2409.09030)|**[link](https://github.com/deepsoftwareanalytics/awesome-agent4se)**|**\u8fd1\u5e74\u6765\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5404\u79cd\u4e0b\u6e38\u4efb\u52a1\u4e2d\u53d6\u5f97\u4e86\u663e\u8457\u6210\u529f\uff0c\u5c24\u5176\u662f\u5728\u8f6f\u4ef6\u5de5\u7a0b\uff08SE\uff09\u9886\u57df\u4e2d\u7684\u4efb\u52a1\u3002\u6211\u4eec\u6ce8\u610f\u5230\uff0c\u8bb8\u591a\u5c06LLMs\u4e0eSE\u7ed3\u5408\u7684\u7814\u7a76\u5de5\u4f5c\u660e\u786e\u6216\u9690\u542b\u5730\u91c7\u7528\u4e86\u4ee3\u7406\u7684\u6982\u5ff5\u3002\u7136\u800c\uff0c\u7f3a\u4e4f\u5bf9\u73b0\u6709\u5de5\u4f5c\u53d1\u5c55\u80cc\u666f\u7684\u6df1\u5165\u7efc\u8ff0\u3001\u5206\u6790\u5b83\u4eec\u5982\u4f55\u7ed3\u5408\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u6280\u672f\u4f18\u5316\u5404\u79cd\u4efb\u52a1\u4ee5\u53ca\u6f84\u6e05SE\u4e2d\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u6846\u67b6\u3002\u672c\u6587\u65e8\u5728\u8fdb\u884c\u9996\u6b21\u5173\u4e8e\u7ed3\u5408LLMs\u4e0eSE\u7684\u7814\u7a76\u7efc\u8ff0\uff0c\u5e76\u63d0\u51faSE\u4e2d\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u6846\u67b6\uff0c\u5305\u62ec\u4e09\u4e2a\u5173\u952e\u6a21\u5757\uff1a\u611f\u77e5\u3001\u8bb0\u5fc6\u548c\u884c\u52a8\u3002\u540c\u65f6\uff0c\u6211\u4eec\u603b\u7ed3\u4e86\u8fd9\u4e24\u4e2a\u9886\u57df\u7ed3\u5408\u65f6\u9762\u4e34\u7684\u5f53\u524d\u6311\u6218\uff0c\u5e76\u9488\u5bf9\u8fd9\u4e9b\u6311\u6218\u63d0\u51fa\u4e86\u672a\u6765\u7684\u673a\u9047\u3002\u6211\u4eec\u7ef4\u62a4\u4e86\u4e00\u4e2a\u76f8\u5173\u7684\u8bba\u6587GitHub\u4ed3\u5e93\uff0c\u5730\u5740\u4e3a\uff1ahttps://github.com/DeepSoftwareAnalytics/Awesome-Agent4SE\u30
02**|\n", "2409.09010": "|**2024-09-13**|**Contri(e)ve: Context + Retrieve for Scholarly Question Answering**|Kanchan Shivashankar et.al.|[2409.09010](http://arxiv.org/abs/2409.09010)|null|### \u6458\u8981\u7ffb\u8bd1 \u5b66\u8005\u4ea4\u6d41\u662f\u4e00\u4e2a\u5feb\u901f\u53d1\u5c55\u7684\u9886\u57df\uff0c\u8574\u542b\u7740\u4e30\u5bcc\u7684\u77e5\u8bc6\u3002\u7136\u800c\uff0c\u7531\u4e8e\u5176\u975e\u7ed3\u6784\u5316\u7684\u6587\u6863\u683c\u5f0f\uff0c\u4f20\u7edf\u7684\u6587\u6863\u68c0\u7d22\u65b9\u6cd5\u96be\u4ee5\u4ece\u4e2d\u63d0\u53d6\u6709\u7528\u4fe1\u606f\u3002\u5b66\u8005\u77e5\u8bc6\u56fe\u8c31\u901a\u8fc7\u6784\u5efa\u4e00\u4e2a\u8bed\u4e49\u7f51\u7edc\u6765\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u63d0\u4f9b\u4e86\u9690\u85cf\u7684\u6d1e\u5bdf\u3001\u6458\u8981\u548c\u6613\u4e8e\u901a\u8fc7\u67e5\u8be2\u83b7\u53d6\u7684\u8bbf\u95ee\u6027\u3002\u81ea\u7136\u5730\uff0c\u5bf9\u5b66\u8005\u56fe\u8c31\u8fdb\u884c\u95ee\u7b54\u6269\u5c55\u4e86\u66f4\u5e7f\u6cdb\u53d7\u4f17\u7684\u53ef\u8bbf\u95ee\u6027\u3002\u4f46\u5728\u8fd9\u4e00\u9886\u57df\u7684\u67d0\u4e9b\u77e5\u8bc6\u4ecd\u7136\u4ee5\u975e\u7ed3\u6784\u5316\u6587\u672c\u5f62\u5f0f\u5448\u73b0\uff0c\u56e0\u6b64\u9700\u8981\u7ed3\u5408\u89e3\u51b3\u65b9\u6848\u6765\u4e3a\u95ee\u7b54\u7cfb\u7edf\u63d0\u4f9b\u652f\u6301\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u4e24\u6b65\u89e3\u51b3\u65b9\u6848\uff0c\u4f7f\u7528\u5f00\u6e90\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\uff1aLlama3.1\u5bf9\u5b66\u8005-QALD\u6570\u636e\u96c6\u8fdb\u884c\u5904\u7406\u3002 \u9996\u5148\uff0c\u6211\u4eec\u4ece\u4e0d\u540c\u7684\u7ed3\u6784\u5316\u548c\u975e\u7ed3\u6784\u5316\u6570\u636e\u6e90\u4e2d\u63d0\u53d6\u4e0e\u95ee\u9898\u76f8\u5173\u7684\u5185\u5bb9\uff1aDBLP\u3001SemOpenAlex\u77e5\u8bc6\u56fe\u8c31\u4ee5\u53ca\u7ef4\u57fa\u767e\u79d1\u6587\u672c\u3002 
\u5176\u6b21\uff0c\u6211\u4eec\u5b9e\u65bd\u4e86\u63d0\u793a\u5de5\u7a0b\uff0c\u4ee5\u63d0\u9ad8\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u4fe1\u606f\u68c0\u7d22\u6027\u80fd\u3002 \u6211\u4eec\u7684\u65b9\u6cd5\u5728F1\u5206\u6570\u4e0a\u53d6\u5f97\u4e8640%\u7684\u6210\u7ee9\uff0c\u5e76\u89c2\u5bdf\u5230\u4e00\u4e9b\u6765\u81eaLLM\u7684\u5f02\u5e38\u54cd\u5e94\uff0c\u8fd9\u4e9b\u54cd\u5e94\u5728\u8bba\u6587\u7684\u6700\u540e\u90e8\u5206\u8fdb\u884c\u4e86\u8ba8\u8bba\u3002|\n", "2409.08963": "|**2024-09-13**|**Safeguarding Decentralized Social Media: LLM Agents for Automating Community Rule Compliance**|Lucio La Cava et.al.|[2409.08963](http://arxiv.org/abs/2409.08963)|null|\u786e\u4fdd\u5185\u5bb9\u7b26\u5408\u793e\u533a\u51c6\u5219\u5bf9\u4e8e\u7ef4\u62a4\u5065\u5eb7\u7684\u5728\u7ebf\u793e\u4ea4\u73af\u5883\u81f3\u5173\u91cd\u8981\u3002\u7136\u800c\uff0c\u4f20\u7edf\u7684\u57fa\u4e8e\u4eba\u7c7b\u7684\u5408\u89c4\u6027\u68c0\u67e5\u5728\u5904\u7406\u7528\u6237\u751f\u6210\u5185\u5bb9\u7684\u4e0d\u65ad\u589e\u957f\u91cf\u548c\u6709\u9650\u7684\u7ba1\u7406\u5458\u6570\u91cf\u65f6\u9762\u4e34\u7740\u6269\u5c55\u96be\u9898\u3002\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u81ea\u7136\u8bed\u8a00\u7406\u89e3\u65b9\u9762\u7684\u65b0\u8fdb\u5c55\uff0c\u4e3a\u81ea\u52a8\u5316\u5185\u5bb9\u5408\u89c4\u6027\u9a8c\u8bc1\u5f00\u8f9f\u4e86\u65b0\u7684\u53ef\u80fd\u6027\u3002\u672c\u6587\u8bc4\u4f30\u4e86\u516d\u4e2a\u4eba\u5de5\u667a\u80fd\u4ee3\u7406\uff0c\u8fd9\u4e9b\u4ee3\u7406\u57fa\u4e8eOpen-LLMs\uff0c\u5728\u53bb\u4e2d\u5fc3\u5316\u793e\u4ea4\u7f51\u7edc\u4e2d\u5bf9\u89c4\u5219\u5408\u89c4\u6027\u8fdb\u884c\u81ea\u52a8\u9a8c\u8bc1\uff0c\u8fd9\u662f\u4e00\u4e2a\u5177\u6709\u6311\u6218\u6027\u7684\u73af\u5883\uff0c\u56e0\u4e3a\u793e\u533a\u7684\u8303\u56f4\u548c\u89c4\u5219\u5404\u4e0d\u76f8\u540c\u3002\u901a\u8fc7\u5bf9\u6765\u81ea\u6570\u767e\u4e2aMastodon\u670d\u52a1\u5668\u7684\u8d85\u8fc750,000\u6761\u5e16\u5b50\u7684\u5206\u6790\uff0c\u6211\u4eec\u53d1\u73b0\u4eba\u5d
e5\u667a\u80fd\u4ee3\u7406\u80fd\u591f\u6709\u6548\u5730\u68c0\u6d4b\u975e\u5408\u89c4\u5185\u5bb9\u3001\u638c\u63e1\u8bed\u8a00\u4e0a\u7684\u7ec6\u5fae\u5dee\u522b\uff0c\u5e76\u9002\u5e94\u4e0d\u540c\u7684\u793e\u533a\u4e0a\u4e0b\u6587\u3002\u5927\u591a\u6570\u4ee3\u7406\u8fd8\u663e\u793a\u51fa\u9ad8\u7684\u4e00\u81f4\u6027\u548c\u4e00\u81f4\u6027\uff0c\u5728\u8bc4\u5206\u89e3\u91ca\u548c\u5408\u89c4\u5efa\u8bae\u4e0a\u4e0e\u4eba\u5de5\u8bc4\u4ef7\u8005\u76f8\u5339\u914d\u3002\u901a\u8fc7\u9886\u57df\u4e13\u5bb6\u7684\u4eba\u5de5\u8bc4\u4f30\uff0c\u786e\u8ba4\u4e86\u4ee3\u7406\u7684\u53ef\u9760\u6027\u548c\u5b9e\u7528\u6027\uff0c\u8fd9\u8868\u660e\u5b83\u4eec\u662f\u534a\u81ea\u52a8\u5316\u6216\u4eba\u673a\u534f\u4f5c\u5185\u5bb9\u7ba1\u7406\u7cfb\u7edf\u7684\u6709\u524d\u666f\u7684\u5de5\u5177\u3002|\n", "2409.08937": "|**2024-09-13**|**Emerging Reliance Behaviors in Human-AI Text Generation: Hallucinations, Data Quality Assessment, and Cognitive Forcing Functions**|Zahra Ashktorab et.al.|[2409.08937](http://arxiv.org/abs/2409.08937)|null|\u672c\u6587\u7814\u7a76\u4e86\u5728\u4eba\u7c7b\u4e0e\u4eba\u5de5\u667a\u80fd\u5408\u4f5c\u8fdb\u884c\u6587\u672c\u751f\u6210\u4efb\u52a1\u65f6\uff0c\u5e7b\u89c9\u548c\u8ba4\u77e5\u9a71\u52a8\u56e0\u7d20\u7684\u5f71\u54cd\uff0c\u7279\u522b\u662f\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u534f\u52a9\u751f\u6210\u9ad8\u8d28\u91cf\u5bf9\u8bdd\u6570\u636e\u3002\u5bf9\u4e8e\u8fd9\u4e9b\u6a21\u578b\u800c\u8a00\uff0c\u9700\u8981\u6570\u636e\u8fdb\u884c\u5fae\u8c03\uff0c\u8fd9\u662f\u63d0\u5347\u5176\u6027\u80fd\u7684\u5173\u952e\u6b65\u9aa4\u3002\u5728\u5ba2\u6237\u670d\u52a1\u5bf9\u8bdd\u4e0a\u4e0b\u6587\u4e2d\uff0c\u6570\u636e\u4ee5\u4eba\u4e0e\u5ba2\u670d\u4ee3\u7406\u4e4b\u95f4\u7684\u5bf9\u8bdd\u5f62\u5f0f\u5b58\u5728\uff0c\u5e76\u53ef\u501f\u52a9AI\u52a9\u624b\u751f\u6210\u3002\u5728\u6211\u4eec\u7684\u7814\u7a76\u4e2d\uff0c\u5171\u62db\u52df\u4e8611\u4f4d\u7528\u6237\uff0c\u6bcf\u4f4d\u7528\u6237\u5b8c\u621
08\u9879\u4efb\u52a1\uff0c\u603b\u5171\u5b8c\u6210\u4e8688\u9879\u4efb\u52a1\u3002\u7ed3\u679c\u53d1\u73b0\uff0c\u5e7b\u89c9\u7684\u5b58\u5728\u5bf9\u6570\u636e\u8d28\u91cf\u4ea7\u751f\u4e86\u8d1f\u9762\u5f71\u54cd\u3002\u6211\u4eec\u8fd8\u53d1\u73b0\uff0c\u5c3d\u7ba1\u8ba4\u77e5\u9a71\u52a8\u56e0\u7d20\u5e76\u975e\u603b\u80fd\u62b5\u6d88\u5e7b\u89c9\u5bf9\u6570\u636e\u8d28\u91cf\u7684\u4e0d\u5229\u5f71\u54cd\uff0c\u4f46\u5e7b\u89c9\u548c\u8ba4\u77e5\u9a71\u52a8\u56e0\u7d20\u5171\u540c\u4f5c\u7528\u4e8e\u6570\u636e\u8d28\u91cf\uff0c\u5e76\u5f71\u54cd\u7528\u6237\u5982\u4f55\u5229\u7528\u5448\u73b0\u7ed9\u4ed6\u4eec\u7684AI\u54cd\u5e94\u3002\u901a\u8fc7\u5206\u6790\u7528\u6237\u884c\u4e3a\uff0c\u6211\u4eec\u63ed\u793a\u4e86\u5bf9AI\u751f\u6210\u54cd\u5e94\u4f9d\u8d56\u7684\u660e\u663e\u6a21\u5f0f\uff0c\u8fd9\u5f3a\u8c03\u4e86\u5728\u5bf9\u8bddAI\u60c5\u5883\u4e0b\u7ba1\u7406\u5e7b\u89c9\u5728AI\u751f\u6210\u5185\u5bb9\u4e2d\u7684\u91cd\u8981\u6027\u3002|\n", "2409.08936": "|**2024-09-13**|**SynSUM -- Synthetic Benchmark with Structured and Unstructured Medical Records**|Paloma Rabaey 
et.al.|[2409.08936](http://arxiv.org/abs/2409.08936)|**[link](https://github.com/prabaey/synsum)**|**\u6211\u4eec\u63d0\u51fa\u4e86SynSUM\u57fa\u51c6\u6570\u636e\u96c6\uff0c\u8fd9\u662f\u4e00\u4e2a\u5408\u6210\u6570\u636e\u96c6\uff0c\u5c06\u975e\u7ed3\u6784\u5316\u7684\u4e34\u5e8a\u8bb0\u5f55\u4e0e\u7ed3\u6784\u5316\u80cc\u666f\u53d8\u91cf\u8054\u7cfb\u8d77\u6765\u3002\u8be5\u6570\u636e\u96c6\u753110,000\u4e2a\u865a\u6784\u7684\u60a3\u8005\u8bb0\u5f55\u7ec4\u6210\uff0c\u5305\u542b\u8868\u683c\u53d8\u91cf\uff08\u5982\u75c7\u72b6\u3001\u8bca\u65ad\u548c\u57fa\u7840\u6761\u4ef6\uff09\u4ee5\u53ca\u4e0e\u4e4b\u76f8\u5173\u7684\u63cf\u8ff0\u865a\u6784\u60a3\u8005\u5c31\u8bca\u60c5\u51b5\u7684\u4e34\u5e8a\u7b14\u8bb0\uff0c\u9886\u57df\u4e3a\u547c\u5438\u75be\u75c5\u3002\u8868\u683c\u90e8\u5206\u7684\u6570\u636e\u901a\u8fc7\u8d1d\u53f6\u65af\u7f51\u7edc\u751f\u6210\uff0c\u5176\u4e2d\u56e0\u679c\u7ed3\u6784\u548c\u6761\u4ef6\u6982\u7387\u7531\u4e13\u5bb6\u57fa\u4e8e\u9886\u57df\u77e5\u8bc6\u63d0\u51fa\u3002\u7136\u540e\uff0c\u6211\u4eec\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08GPT-4o\uff09\u751f\u6210\u4e0e\u60a3\u8005\u5c31\u8bca\u76f8\u5173\u7684\u4e34\u5e8a\u7b14\u8bb0\uff0c\u63cf\u8ff0\u60a3\u8005\u7684\u75c7\u72b6\u548c\u989d\u5916\u7684\u4e0a\u4e0b\u6587\u4fe1\u606f\u3002 
SynSUM\u6570\u636e\u96c6\u4e3b\u8981\u65e8\u5728\u4fc3\u8fdb\u5728\u5b58\u5728\u8868\u683c\u80cc\u666f\u53d8\u91cf\u7684\u60c5\u51b5\u4e0b\u5bf9\u4e34\u5e8a\u4fe1\u606f\u63d0\u53d6\u7684\u7814\u7a76\uff0c\u53ef\u4ee5\u901a\u8fc7\u9886\u57df\u77e5\u8bc6\u5c06\u8fd9\u4e9b\u53d8\u91cf\u94fe\u63a5\u5230\u4ece\u6587\u672c\u4e2d\u63d0\u53d6\u7684\u6982\u5ff5\u5174\u8da3\u70b9\u2014\u2014\u5728SynSUM\u7684\u60c5\u51b5\u4e0b\u662f\u75c7\u72b6\u3002\u6b21\u8981\u7528\u9014\u5305\u62ec\u7814\u7a76\u8868\u683c\u6570\u636e\u548c\u6587\u672c\u7684\u81ea\u52a8\u5316\u4e34\u5e8a\u63a8\u7406\u3001\u5728\u5b58\u5728\u8868\u683c\u548c/\u6216\u6587\u672c\u6df7\u6742\u56e0\u7d20\u60c5\u51b5\u4e0b\u7684\u56e0\u679c\u6548\u5e94\u4f30\u8ba1\u4ee5\u53ca\u591a\u6a21\u6001\u5408\u6210\u6570\u636e\u751f\u6210\u3002 \u8be5\u6570\u636e\u96c6\u53ef\u4ee5\u4ece\u4ee5\u4e0b\u94fe\u63a5\u4e0b\u8f7d\uff1a**|\n", "2409.08931": "|**2024-09-13**|**LLM-based Weak Supervision Framework for Query Intent Classification in Video Search**|Farnoosh Javadi 
et.al.|[2409.08931](http://arxiv.org/abs/2409.08931)|null|\u6d41\u5a92\u4f53\u670d\u52a1\u5df2\u7ecf\u5f7b\u5e95\u6539\u53d8\u4e86\u6211\u4eec\u53d1\u73b0\u548c\u53c2\u4e0e\u6570\u5b57\u5a31\u4e50\u7684\u65b9\u5f0f\u3002\u5c3d\u7ba1\u5982\u6b64\uff0c\u6709\u6548\u7406\u89e3\u7528\u6237\u641c\u7d22\u67e5\u8be2\u7684\u5e7f\u6cdb\u8303\u56f4\u4ecd\u7136\u9762\u4e34\u91cd\u5927\u6311\u6218\u3002\u6784\u5efa\u4e00\u4e2a\u80fd\u591f\u5904\u7406\u4ee3\u8868\u4e0d\u540c\u7528\u6237\u610f\u56fe\u7684\u5404\u79cd\u5b9e\u4f53\u7684\u51c6\u786e\u67e5\u8be2\u7406\u89e3\u7cfb\u7edf\u5bf9\u4e8e\u63d0\u4f9b\u589e\u5f3a\u7684\u7528\u6237\u4f53\u9a8c\u81f3\u5173\u91cd\u8981\u3002\u901a\u8fc7\u8bad\u7ec3\u81ea\u7136\u8bed\u8a00\u7406\u89e3\uff08NLU\uff09\u6a21\u578b\u53ef\u4ee5\u5b9e\u73b0\u8fd9\u4e00\u76ee\u6807\uff0c\u7136\u800c\uff0c\u5728\u8fd9\u4e2a\u4e13\u95e8\u9886\u57df\u7684\u9ad8\u8d28\u91cf\u6807\u6ce8\u6570\u636e\u83b7\u53d6\u662f\u4e00\u4e2a\u5de8\u5927\u7684\u969c\u788d\u3002\u624b\u52a8\u6ce8\u91ca\u6210\u672c\u9ad8\u6602\u4e14\u5728\u6355\u6349\u7528\u6237\u8bcd\u6c47\u53d8\u5f02\u6027\u65b9\u9762\u4e0d\u5207\u5b9e\u9645\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u901a\u8fc7\u5f31\u76d1\u7763\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u81ea\u52a8\u6807\u6ce8\u5927\u91cf\u7528\u6237\u641c\u7d22\u67e5\u8be2\u3002\u901a\u8fc7\u4f7f\u7528\u63d0\u793a\u5de5\u7a0b\u548c\u591a\u6837\u5316\u7684LLM\u89d2\u8272\uff0c\u6211\u4eec\u751f\u6210\u4e86\u4e0e\u4eba\u5de5\u6ce8\u91ca\u8005\u671f\u671b\u76f8\u5339\u914d\u7684\u8bad\u7ec3\u6570\u636e\u3002\u901a\u8fc7\u5f15\u5165\u9886\u57df\u77e5\u8bc6\uff0c\u5229\u7528\u94fe\u5f0f\u601d\u8003\u548c\u4e0a\u4e0b\u6587\u5b66\u4e60\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5229\u7528\u6807\u8bb0\u6570\u636e\u8bad\u7ec3\u4f18\u5316\u7528\u4e8e\u5b9e\u65f6\u63a8\u7406\u7684\u4f4e\u5ef6\u8fdf\u6a21\u578b\u3002\u5e7f\u6cdb\u7684\u8bc4\
u4f30\u663e\u793a\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u53ec\u56de\u7387\u4e0a\u4f18\u4e8e\u57fa\u7ebf\u5e73\u5747\u63d0\u9ad8\u4e86113%\u3002\u6b64\u5916\uff0c\u6211\u4eec\u63d0\u51fa\u7684\u65b0\u578b\u63d0\u793a\u5de5\u7a0b\u6846\u67b6\u4ea7\u751f\u7528\u4e8e\u5f31\u76d1\u7763\u7684\u9ad8\u8d28\u91cfLLM\u751f\u6210\u6570\u636e\uff1b\u4e0e\u4eba\u7c7b\u6ce8\u91ca\u7684F1\u5f97\u5206\u52a0\u6743\u5206\u5e03\u76f8\u6bd4\uff0c\u6211\u4eec\u89c2\u5bdf\u5230\u9884\u6d4b\u548c\u4eba\u7c7b\u6ce8\u89e3\u4e4b\u95f4\u7684\u4e00\u81f4\u6027\u63d0\u9ad8\u4e8647.60%\u3002\u6211\u4eec\u7684\u89d2\u8272\u9009\u62e9\u8def\u7531\u673a\u5236\u8fdb\u4e00\u6b65\u589e\u52a0\u4e863.67%\u7684\u52a0\u6743F1\u5f97\u5206\uff0c\u8fd9\u662f\u5728\u65b0\u578b\u63d0\u793a\u5de5\u7a0b\u6846\u67b6\u57fa\u7840\u4e0a\u7684\u989d\u5916\u6536\u76ca\u3002|\n", "2409.08904": "|**2024-09-13**|**AnyBipe: An End-to-End Framework for Training and Deploying Bipedal Robots Guided by Large Language Models**|Yifei Yao et.al.|[2409.08904](http://arxiv.org/abs/2409.08904)|**[link](https://github.com/sjtu-mvasl-robotics/AnyBipe)**|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u7aef\u5230\u7aef\u7684\u6846\u67b6\uff0c\u7528\u4e8e\u8bad\u7ec3\u548c\u90e8\u7f72\u673a\u5668\u4eba\u5f3a\u5316\u5b66\u4e60\uff08RL\uff09\u7b56\u7565\uff0c\u8be5\u6846\u67b6\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u8fdb\u884c\u5f15\u5bfc\u3002\u8be5\u6846\u67b6\u7531\u4e09\u4e2a\u76f8\u4e92\u8fde\u63a5\u7684\u6a21\u5757\u7ec4\u6210\uff1a\u4e00\u4e2a\u901a\u8fc7LLM\u8bbe\u8ba1\u5956\u52b1\u51fd\u6570\u7684\u6a21\u5757\u3001\u4e00\u4e2a\u5229\u7528\u73b0\u6709\u5de5\u4f5c\u7684RL\u8bad\u7ec3\u6a21\u5757\u4ee5\u53ca\u4e00\u4e2a\u6a21\u62df\u5230\u73b0\u5b9e\uff08sim-to-real\uff09\u540c\u6001\u8bc4\u4f30\u6a21\u5757\u3002\u8fd9\u79cd\u65b9\u6cd5\u663e\u8457\u51cf\u5c11\u4e86\u5bf9\u4eba\u5de5\u5e72\u9884\u7684\u9700\u6c42\uff0c\u4ec5\u9700\u8981\u57fa\u672c\u7684\u6a21\u62df\u548c\u90e8\u7f72\u5e73\u53f0\uff0c\u5e76\u
4e14\u63d0\u4f9b\u4e86\u4eba\u5de5\u5de5\u7a0b\u7b56\u7565\u548c\u5386\u53f2\u6570\u636e\u7684\u6574\u5408\u9009\u9879\u3002\u6211\u4eec\u8be6\u7ec6\u4ecb\u7ecd\u4e86\u8fd9\u4e9b\u6a21\u5757\u7684\u6784\u5efa\u3001\u5b83\u4eec\u76f8\u5bf9\u4e8e\u4f20\u7edf\u65b9\u6cd5\u7684\u4f18\u52bf\uff0c\u4ee5\u53ca\u5c55\u793a\u8be5\u6846\u67b6\u5728\u53cc\u8db3\u673a\u5668\u4eba\u6b65\u6001\u63a7\u5236\u81ea\u4e3b\u5f00\u53d1\u548c\u6539\u8fdb\u80fd\u529b\u7684\u5b9e\u4f8b\uff0c\u8bc1\u660e\u5176\u5728\u4e0d\u9700\u8981\u4eba\u7c7b\u5e72\u9884\u7684\u60c5\u51b5\u4e0b\u64cd\u4f5c\u7684\u53ef\u80fd\u6027\u3002|\n", "2409.08890": "|**2024-09-13**|**A Market for Lemons? Strategic Directions for a Vigilant Application of Artificial Intelligence in Entrepreneurship Research**|Martin Obschonka et.al.|[2409.08890](http://arxiv.org/abs/2409.08890)|null|\u5728\u4eba\u5de5\u667a\u80fd\uff08AI\uff09\u91c7\u7528\u7684\u8fc5\u901f\u589e\u957f\u4ee5\u53ca\u5927\u6570\u636e\u53ef\u7528\u6027\u7684\u80cc\u666f\u4e0b\uff0c\u521b\u4e1a\u5b66\u9886\u57df\u53ef\u80fd\u8fce\u6765\u6709\u53f2\u4ee5\u6765\u6700\u91cd\u5927\u7684\u8f6c\u53d8\u3002\u672c\u6587\u901a\u8fc7\u5f3a\u8c03AI\u9769\u547d\u671f\u95f4\u521b\u4e1a\u7814\u7a76\u4e2d\u6f5c\u5728\u7684\u65e0\u6210\u6548\u77e5\u8bc6\u4ea4\u6d41\u98ce\u9669\uff0c\u505a\u51fa\u4e86\u7d27\u8feb\u7684\u5143\u8d21\u732e\u3002\u5b83\u63d0\u4f9b\u4e86\u7f13\u89e3\u8fd9\u4e00\u98ce\u9669\u7684\u7b56\u7565\uff0c\u5e76\u4e3a\u672a\u6765\u57fa\u4e8eAI\u7684\u7814\u7a76\u63d0\u4f9b\u4e86\u6307\u5bfc\uff0c\u4ee5\u589e\u5f3a\u5176\u96c6\u4f53\u5f71\u54cd\u529b\u548c\u76f8\u5173\u6027\u3002 
\u501f\u9274Akerlof\u8457\u540d\u7684\u201c\u52a3\u8d28\u5546\u54c1\u5e02\u573a\u201d\u6982\u5ff5\uff0c\u6211\u4eec\u8bc6\u522b\u4e86\u7531\u4e8e\u9886\u57df\u6f14\u8fdb\u5230\u5f53\u524d\u73af\u5883\u800c\u53ef\u80fd\u51fa\u73b0\u7684\u91cd\u5927\u77e5\u8bc6\u4e0d\u5bf9\u79f0\u6027\uff0c\u5982\u6784\u9020\u6709\u6548\u6027\u3001\u7406\u8bba\u6784\u5efa\u548c\u7814\u7a76\u76f8\u5173\u6027\u65b9\u9762\u7684\u590d\u6742\u6027\u3002\u8fd9\u4e9b\u4e0d\u5bf9\u79f0\u6027\u7279\u522b\u6df1\u690d\u4e8e\u6240\u8c13\u7684\u53cc\u91cd\u9ed1\u7bb1\u56f0\u5883\u4e2d\uff0c\u5373AI\u65b9\u6cd5\u7684\u5e7f\u6cdb\u8ba4\u53ef\u7684\u9ed1\u7bb1\u6027\u8d28\u4e0e\u7531\u5185\u5728\u4e0d\u786e\u5b9a\u6027\u9a71\u52a8\u7684\u521b\u4e1a\u73b0\u8c61\u7684\u9ed1\u7bb1\u6027\u8d28\u7684\u4ea4\u6c47\u70b9\u3002\u7ed3\u679c\uff0c\u8fd9\u4e9b\u4e0d\u5bf9\u79f0\u53ef\u80fd\u5bfc\u81f4\u4e0d\u53ef\u68c0\u6d4b\u7684\u6b21\u4f18\u7814\u7a76\u4ea7\u54c1\u589e\u52a0\uff0c\u4ece\u800c\u5f62\u6210\u4e00\u4e2a\u635f\u5bb3\u9886\u57df\u798f\u7949\u3001\u58f0\u8a89\u548c\u5f71\u54cd\u529b\u7684\u52a3\u8d28\u5546\u54c1\u5e02\u573a\u3002 \u7136\u800c\uff0c\u91cd\u8981\u7684\u662f\uff0c\u5982\u679c\u80fd\u591f\u7f13\u89e3\u8fd9\u4e9b\u98ce\u9669\uff0cAI\u9769\u547d\u6709\u53ef\u80fd\u9884\u793a\u7740\u521b\u4e1a\u7814\u7a76\u7684\u65b0\u9ec4\u91d1\u65f6\u4ee3\u3002\u6211\u4eec\u8ba8\u8bba\u4e86\u63d0\u5347\u9886\u57df\u81f3\u66f4\u9ad8\u6c34\u5e73\u7684AI\u97e7\u6027\u6240\u9700\u91c7\u53d6\u7684\u884c\u52a8\uff0c\u540c\u65f6\u575a\u5b9a\u5730\u4fdd\u6301\u5176\u57fa\u7840\u539f\u5219\u548c\u6838\u5fc3\u4ef7\u503c\u89c2\u3002|\n", "2409.08864": "|**2024-09-13**|**Exploring Graph Structure Comprehension Ability of Multimodal Large Language Models: Case Studies**|Zhiqiang Zhong 
et.al.|[2409.08864](http://arxiv.org/abs/2409.08864)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u5904\u7406\u5404\u79cd\u6570\u636e\u7ed3\u6784\u65f6\u5c55\u73b0\u4e86\u60ca\u4eba\u7684\u80fd\u529b\uff0c\u5305\u62ec\u56fe\u3002\u5c3d\u7ba1\u5148\u524d\u7684\u7814\u7a76\u96c6\u4e2d\u5728\u5f00\u53d1\u7528\u4e8e\u56fe\u8868\u793a\u7684\u6587\u672c\u7f16\u7801\u65b9\u6cd5\u4e0a\uff0c\u4f46\u591a\u6a21\u6001LLM\u7684\u51fa\u73b0\u4e3a\u7406\u89e3\u56fe\u63d0\u4f9b\u4e86\u4e00\u4e2a\u65b0\u7684\u524d\u6cbf\u3002\u8fd9\u4e9b\u5148\u8fdb\u7684\u6a21\u578b\u80fd\u591f\u540c\u65f6\u5904\u7406\u6587\u672c\u548c\u56fe\u50cf\uff0c\u901a\u8fc7\u7ed3\u5408\u89c6\u89c9\u8868\u793a\u4e0e\u4f20\u7edf\u7684\u6587\u672c\u6570\u636e\uff0c\u53ef\u80fd\u5728\u63d0\u9ad8\u5bf9\u56fe\u7ed3\u6784\u7684\u7406\u89e3\u65b9\u9762\u5e26\u6765\u6539\u8fdb\u3002\u8fd9\u9879\u7814\u7a76\u63a2\u8ba8\u4e86\u53ef\u89c6\u5316\u56fe\u5728\u4e0d\u540c\u7ea7\u522b\uff08\u8282\u70b9\u3001\u8fb9\u548c\u56fe\u7ea7\u522b\uff09\u4e0a\u5bf9LLM\u6027\u80fd\u7684\u5f71\u54cd\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u5bf9\u6bd4\u4e86\u591a\u6a21\u6001\u65b9\u6cd5\u4e0e\u7eaf\u6587\u672c\u56fe\u8868\u793a\u7684\u6709\u6548\u6027\u3002\u7ed3\u679c\u63d0\u4f9b\u4e86\u5173\u4e8e\u5229\u7528\u89c6\u89c9\u56fe\u6a21\u6001\u589e\u5f3aLLM\u5bf9\u56fe\u7ed3\u6784\u7406\u89e3\u80fd\u529b\u7684\u6f5c\u529b\u548c\u9650\u5236\u7684\u5b9d\u8d35\u89c1\u89e3\u3002|\n", "2409.08846": "|**2024-09-13**|**FP-VEC: Fingerprinting Large Language Models via Efficient Vector Addition**|Zhenhua Xu 
et.al.|[2409.08846](http://arxiv.org/abs/2409.08846)|null|\u8bad\u7ec3\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u9700\u8981\u5de8\u5927\u7684\u8ba1\u7b97\u80fd\u529b\u548c\u5927\u91cf\u7684\u6570\u636e\u3002\u56e0\u6b64\uff0c\u901a\u8fc7\u6307\u7eb9\u4fdd\u62a4\u8fd9\u4e9b\u6a21\u578b\u7684\u77e5\u8bc6\u4ea7\u6743\u5bf9\u4e8e\u6240\u6709\u6743\u8ba4\u8bc1\u81f3\u5173\u91cd\u8981\u3002\u5c3d\u7ba1\u5c1d\u8bd5\u901a\u8fc7\u5fae\u8c03\u5411LLMs\u6dfb\u52a0\u6307\u7eb9\uff0c\u4f46\u8fd9\u4ecd\u6210\u672c\u9ad8\u6602\u4e14\u96be\u4ee5\u6269\u5c55\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86FP-VEC\uff0c\u4e00\u79cd\u4f7f\u7528\u6307\u7eb9\u5411\u91cf\u4f5c\u4e3a\u9ad8\u6548LLM\u6307\u7eb9\u65b9\u6cd5\u7684\u8bd5\u70b9\u7814\u7a76\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u751f\u6210\u4e00\u4e2a\u4ee3\u8868\u5d4c\u5165\u5728\u6a21\u578b\u4e2d\u7684\u4fdd\u5bc6\u7b7e\u540d\u7684\u6307\u7eb9\u5411\u91cf\uff0c\u5141\u8bb8\u901a\u8fc7\u5411\u91cf\u76f8\u52a0\u65e0\u7f1d\u5730\u5c06\u76f8\u540c\u7684\u6307\u7eb9\u6574\u5408\u5230\u65e0\u9650\u6570\u91cf\u7684LLMs\u4e2d\u3002\u5728\u591a\u4e2aLLMs\u4e0a\u7684\u7ed3\u679c\u8868\u660e\uff0cFP-VEC\u8f7b\u91cf\u7ea7\uff0c\u53ef\u4ee5\u5728\u4ec5\u4f7f\u7528CPU\u7684\u8bbe\u5907\u4e0a\u8fd0\u884c\u4ee5\u8fdb\u884c\u6307\u7eb9\u8bc6\u522b\uff1b\u53ef\u6269\u5c55\uff0c\u53ea\u9700\u8981\u4e00\u6b21\u8bad\u7ec3\u5373\u53ef\u5b9e\u73b0\u65e0\u9650\u6b21\u7684\u6307\u7eb9\u751f\u6210\u8fc7\u7a0b\uff0c\u5e76\u4e14\u80fd\u591f\u4fdd\u6301\u6a21\u578b\u7684\u6b63\u5e38\u884c\u4e3a\u3002\u9879\u76ee\u9875\u9762\u4f4d\u4e8ehttps://fingerprintvector.github.io \u3002|\n", "2409.10516": "|**2024-09-16**|**RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval**|Di Liu 
et.al.|[2409.10516](http://arxiv.org/abs/2409.10516)|**[link](https://github.com/jzbjyb/reatt)**|\u57fa\u4e8e\u8f6c\u6362\u5668\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5404\u4e2a\u9886\u57df\u53d8\u5f97\u8d8a\u6765\u8d8a\u91cd\u8981\u3002\u7136\u800c\uff0c\u6ce8\u610f\u529b\u64cd\u4f5c\u7684\u4e8c\u6b21\u65f6\u95f4\u590d\u6742\u5ea6\u5bf9\u6269\u5c55\u5230\u66f4\u957f\u4e0a\u4e0b\u6587\u5e26\u6765\u4e86\u91cd\u5927\u6311\u6218\uff0c\u5bfc\u81f4\u4e86\u6781\u9ad8\u7684\u63a8\u7406\u5ef6\u8fdf\u548cGPU\u5185\u5b58\u6d88\u8017\u4ee5\u7f13\u5b58\u952e\u503c\uff08KV\uff09\u5411\u91cf\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u65e0\u9700\u8bad\u7ec3\u7684\u65b9\u6cd5\u2014\u2014\u68c0\u7d22\u6ce8\u610f\u529b\uff08RetrievalAttention\uff09\uff0c\u4ee5\u52a0\u901f\u6ce8\u610f\u529b\u8ba1\u7b97\u3002\u901a\u8fc7\u5229\u7528\u6ce8\u610f\u529b\u64cd\u4f5c\u7684\u52a8\u6001\u7a00\u758f\u7279\u6027\uff0cRetrievalAttention\u5728CPU\u5185\u5b58\u4e0a\u6784\u5efa\u4e86\u8fd1\u4f3c\u6700\u8fd1\u90bb\u641c\u7d22\uff08ANNS\uff09\u7d22\u5f15\uff0c\u5e76\u5728\u751f\u6210\u8fc7\u7a0b\u4e2d\u901a\u8fc7\u5411\u91cf\u641c\u7d22\u68c0\u7d22\u6700\u76f8\u5173\u7684\u90e8\u5206\u3002 
\u7531\u4e8e\u67e5\u8be2\u5411\u91cf\u4e0e\u952e\u5411\u91cf\u4e4b\u95f4\u7684\u5206\u5e03\u5916\uff08OOD\uff09\u95ee\u9898\uff0c\u73b0\u6210\u7684ANNS\u7d22\u5f15\u4ecd\u9700\u8981\u626b\u63cfO(N)\uff08\u901a\u5e38\u4e3a\u6240\u6709\u952e\u768430%\uff09\u7684\u6570\u636e\u8fdb\u884c\u7cbe\u786e\u68c0\u7d22\uff0c\u8fd9\u65e0\u6cd5\u5145\u5206\u5229\u7528\u9ad8\u7a00\u758f\u6027\u3002RetrievalAttention\u9996\u5148\u8bc6\u522b\u4e86ANNS\u57fa\u6ce8\u610f\u529b\u4e2d\u7684OOD\u6311\u6218\uff0c\u5e76\u901a\u8fc7\u4e00\u4e2a\u9002\u5e94\u67e5\u8be2\u7684\u6ce8\u610f\u529b\u611f\u77e5\u5411\u91cf\u641c\u7d22\u7b97\u6cd5\u6765\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u8be5\u7b97\u6cd5\u4ec5\u8bbf\u95ee1-3%\u7684\u6570\u636e\uff0c\u4ece\u800c\u5b9e\u73b0\u4e86\u4e9a\u7ebf\u6027\u65f6\u95f4\u590d\u6742\u5ea6\u3002 RetrievalAttention\u5927\u5e45\u964d\u4f4e\u4e86\u957f\u4e0a\u4e0b\u6587LLMs\u7684\u63a8\u7406\u6210\u672c\uff0c\u540c\u65f6\u663e\u8457\u51cf\u5c11\u4e86GPU\u5185\u5b58\u9700\u6c42\uff0c\u800c\u4fdd\u6301\u4e86\u6a21\u578b\u51c6\u786e\u6027\u3002\u5c24\u5176\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0cRetrievalAttention\u4ec5\u9700\u898116GB\u7684GPU\u5185\u5b58\u5373\u53ef\u4e3a\u5177\u67098B\u53c2\u6570\u7684LLM\u63d0\u4f9b\u670d\u52a1\uff0c\u652f\u6301\u5904\u7406128K\u4e2a\u4ee4\u724c\uff0c\u80fd\u591f\u5728\u5355\u4e2aNVIDIA RTX4090\uff0824GB\uff09\u4e0a\u751f\u6210\u4e00\u4e2a\u4ee4\u724c\u8017\u65f60.188\u79d2\u3002|\n", "2409.10506": "|**2024-09-16**|**Context-aware Code Segmentation for C-to-Rust Translation using Large Language Models**|Momoko Shiraishi 
et.al.|[2409.10506](http://arxiv.org/abs/2409.10506)|null|\u7531\u4e8e\u73b0\u6709C\u7a0b\u5e8f\u4e2d\u7684\u5185\u5b58\u5b89\u5168\u6027\u6f0f\u6d1e\u6301\u7eed\u5a01\u80c1\u4ee5\u53caRust\u8bed\u8a00\u4f5c\u4e3aC\u8bed\u8a00\u66ff\u4ee3\u54c1\u6240\u53d7\u5230\u7684\u5e7f\u6cdb\u5173\u6ce8\uff0c\u5c06C\u4ee3\u7801\u8f6c\u6362\u4e3aRust\u4ee3\u7801\u5b58\u5728\u5f3a\u70c8\u7684\u52a8\u673a\u3002\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u901a\u8fc7\u751f\u6210\u6bd4\u57fa\u4e8e\u89c4\u5219\u65b9\u6cd5\u66f4\u81ea\u7136\u3001\u66f4\u5b89\u5168\u7684\u4ee3\u7801\u6765\u81ea\u52a8\u5316\u8fd9\u4e00\u7ffb\u8bd1\u8fc7\u7a0b\u65b9\u9762\u663e\u793a\u51fa\u6f5c\u529b\u3002\u7136\u800c\uff0c\u5148\u524d\u7684\u7814\u7a76\u8868\u660e\uff0cLLM\u751f\u6210\u7684Rust\u4ee3\u7801\u5f80\u5f80\u65e0\u6cd5\u7f16\u8bd1\uff0c\u5373\u4f7f\u662f\u76f8\u5bf9\u8f83\u5c0f\u7684C\u7a0b\u5e8f\uff0c\u8fd9\u4e3b\u8981\u5f52\u56e0\u4e8e\u4e24\u79cd\u8bed\u8a00\u4e4b\u95f4\u7684\u663e\u8457\u5dee\u5f02\u548c\u4e0a\u4e0b\u6587\u7a97\u53e3\u9650\u5236\u3002 
\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8eLLM\u7684\u7ffb\u8bd1\u65b9\u6848\uff0c\u4ee5\u63d0\u9ad8\u5927\u89c4\u6a21C\u4ee3\u7801\u6210\u529f\u8f6c\u5316\u4e3a\u53ef\u7f16\u8bd1\u7684Rust\u4ee3\u7801\u7684\u6982\u7387\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5305\u62ec\u4e09\u4e2a\u5173\u952e\u6280\u672f\uff1a\uff081\uff09\u9884\u5904\u7406C\u4ee3\u7801\uff0c\u4f7f\u5176\u7ed3\u6784\u548c\u8868\u8fbe\u5f0f\u66f4\u597d\u5730\u4e0eRust\u5bf9\u9f50\uff1b\uff082\uff09\u5c06\u4ee3\u7801\u5206\u5272\u4e3a\u6700\u4f73\u5927\u5c0f\u7684\u7ffb\u8bd1\u5355\u5143\uff0c\u4ee5\u907f\u514d\u8d85\u51faLLM\u7684\u4e0a\u4e0b\u6587\u7a97\u53e3\u9650\u5236\uff1b\uff083\uff09\u901a\u8fc7\u4f7f\u7528\u4e0a\u4e0b\u6587\u8865\u5145\u63d0\u793a\uff0c\u8fed\u4ee3\u7f16\u8bd1\u5e76\u4fee\u590d\u9519\u8bef\uff0c\u540c\u65f6\u4fdd\u6301\u4e0d\u540c\u7ffb\u8bd1\u5355\u5143\u4e4b\u95f4\u7684\u4e00\u81f4\u6027\u3002\u6210\u529f\u7f16\u8bd1\u662f\u5b9e\u73b0\u529f\u80fd\u7b49\u6548\u6027\u7684\u9996\u8981\u6b65\u9aa4\uff0c\u56e0\u4e3a\u53ea\u6709\u53ef\u7f16\u8bd1\u7684\u4ee3\u7801\u624d\u80fd\u8fdb\u4e00\u6b65\u8fdb\u884c\u6d4b\u8bd5\u3002 \u572820\u4e2a\u57fa\u51c6C\u7a0b\u5e8f\u7684\u5b9e\u9a8c\u4e2d\uff0c\u5305\u62ec\u90a3\u4e9b\u8d85\u8fc74\u5343\u884c\u4ee3\u7801\u7684\u7a0b\u5e8f\uff0c\u6211\u4eec\u6210\u529f\u5730\u5c06\u6240\u6709\u7a0b\u5e8f\u8f6c\u5316\u4e3a\u53ef\u7f16\u8bd1\u7684Rust\u4ee3\u7801\uff0c\u6ca1\u6709\u4e22\u5931\u539f\u59cb\u4ee3\u7801\u7684\u5bf9\u5e94\u90e8\u5206\u3002|\n", "2409.10504": "|**2024-09-16**|**DILA: Dictionary Label Attention for Mechanistic Interpretability in High-dimensional Multi-label Medical Coding Prediction**|John Wu 
et.al.|[2409.10504](http://arxiv.org/abs/2409.10504)|null|\u5728\u533b\u5b66\u7f16\u7801\u7b49\u9ad8\u7ef4\u6216\u591a\u6807\u7b7e\u9884\u6d4b\u4efb\u52a1\u4e2d\uff0c\u65e2\u9700\u8981\u9884\u6d4b\u7684\u51c6\u786e\u6027\u4e5f\u9700\u8981\u89e3\u91ca\u7684\u53ef\u8bfb\u6027\u3002\u73b0\u6709\u7814\u7a76\u5f80\u5f80\u4f9d\u8d56\u4e8e\u5c40\u90e8\u89e3\u91ca\u65b9\u6cd5\uff0c\u65e0\u6cd5\u63d0\u4f9b\u6574\u4e2a\u591a\u6807\u7b7e\u96c6\u5185\u6bcf\u4e2a\u6807\u7b7e\u9884\u6d4b\u80cc\u540e\u7684\u5168\u9762\u673a\u5236\u89e3\u91ca\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aDIctionary Label Attention\uff08\u7b80\u79f0\\method\uff09\u7684\u6a21\u5757\u5316\u89e3\u91ca\u65b9\u6cd5\uff0c\u7528\u4e8e\u5c06\u4e0d\u53ef\u89e3\u91ca\u7684\u5bc6\u96c6\u5d4c\u5165\u5206\u89e3\u5230\u7a00\u758f\u5d4c\u5165\u7a7a\u95f4\u4e2d\u3002\u5728\u8be5\u7a7a\u95f4\u4e2d\uff0c\u975e\u96f6\u5143\u7d20\uff08\u5b57\u5178\u7279\u5f81\uff09\u4ee3\u8868\u4e86\u5168\u5c40\u5b66\u4e60\u7684\u533b\u7597\u6982\u5ff5\u3002 
\u901a\u8fc7\u4eba\u5de5\u8bc4\u4f30\uff0c\u6211\u4eec\u53d1\u73b0\u6211\u4eec\u7684\u7a00\u758f\u5d4c\u5165\u6bd4\u5176\u5bc6\u96c6\u5bf9\u5e94\u7269\u5728\u4eba\u7c7b\u7406\u89e3\u4e0a\u81f3\u5c11\u63d0\u9ad8\u4e8650%\u3002\u6211\u4eec\u7684\u81ea\u52a8\u5b57\u5178\u7279\u5f81\u8bc6\u522b\u7ba1\u9053\uff0c\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\uff0c\u901a\u8fc7\u68c0\u67e5\u5e76\u603b\u7ed3\u6bcf\u4e2a\u5b57\u5178\u7279\u5f81\u6fc0\u6d3b\u7684\u6700\u9ad8\u7ea7\u8bcd\u6c47\uff0c\u63ed\u793a\u4e86\u6570\u5343\u4e2a\u5b66\u4e60\u5230\u7684\u533b\u7597\u6982\u5ff5\u3002\u6211\u4eec\u901a\u8fc7\u4e00\u4e2a\u7a00\u758f\u7684\u53ef\u89e3\u91ca\u77e9\u9635\u8868\u793a\u5b57\u5178\u7279\u5f81\u4e0e\u533b\u7597\u4ee3\u7801\u4e4b\u95f4\u7684\u5173\u7cfb\uff0c\u8fd9\u4e0d\u4ec5\u589e\u5f3a\u4e86\u6a21\u578b\u9884\u6d4b\u7684\u673a\u5236\u6027\u548c\u5168\u5c40\u7406\u89e3\u80fd\u529b\uff0c\u800c\u4e14\u5728\u4e0d\u9700\u8981\u5927\u91cf\u4eba\u5de5\u6ce8\u91ca\u7684\u60c5\u51b5\u4e0b\uff0c\u4fdd\u6301\u4e86\u7ade\u4e89\u529b\u548c\u53ef\u6269\u5c55\u6027\u3002|\n", "2409.10502": "|**2024-09-16**|**Causal Language Modeling Can Elicit Search and Reasoning Capabilities on Logic Puzzles**|Kulin Shah 
et.al.|[2409.10502](http://arxiv.org/abs/2409.10502)|null|\u8fd1\u5e74\u6765\uff0c\u57fa\u4e8eTransformer\u67b6\u6784\u7684\u56e0\u679c\u8bed\u8a00\u5efa\u6a21\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u7684\u8fdb\u6b65\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u6a21\u578b\u662f\u5426\u771f\u6b63\u53d1\u5c55\u51fa\u4e86\u57fa\u672c\u7684\u641c\u7d22\u548c\u63a8\u7406\u80fd\u529b\uff0c\u4ecd\u662f\u4e00\u4e2a\u6301\u7eed\u8ba8\u8bba\u7684\u8bdd\u9898\u3002\u672c\u7814\u7a76\u65e8\u5728\u63a2\u8ba8\u56e0\u679c\u8bed\u8a00\u5efa\u6a21\u80fd\u5426\u5b66\u4f1a\u89e3\u51b3\u590d\u6742\u7684\u6570\u72ec\u8c1c\u9898\u8fd9\u4e00\u4efb\u52a1\u3002\u89e3\u51b3\u6570\u72ec\u8c1c\u9898\u9700\u8981\u6a21\u578b\u9996\u5148\u5728\u6240\u6709\u7a7a\u767d\u5355\u5143\u683c\u4e2d\u8fdb\u884c\u641c\u7d22\u4ee5\u51b3\u5b9a\u586b\u5145\u54ea\u4e2a\u5355\u5143\u683c\uff0c\u7136\u540e\u5e94\u7528\u9002\u5f53\u7684\u7b56\u7565\u6765\u586b\u5145\u9009\u5b9a\u7684\u5355\u5143\u683c\u3002\u6709\u65f6\uff0c\u7b56\u7565\u7684\u5e94\u7528\u4ec5\u5bfc\u81f4\u5355\u5143\u683c\u53ef\u80fd\u503c\u7684\u51cf\u5c11\uff0c\u800c\u975e\u786e\u5b9a\u786e\u5207\u503c\u3002\u5728\u8fd9\u79cd\u60c5\u51b5\u4e0b\uff0c\u9700\u8981\u5bf9\u5355\u4e2a\u5355\u5143\u683c\u5e94\u7528\u591a\u4e2a\u7b56\u7565\u3002\u6211\u4eec\u53d1\u73b0\uff0c\u7ecf\u8fc7\u903b\u8f91\u6b65\u9aa4\u5e8f\u5217\u8bad\u7ec3\u7684Transformer\u6a21\u578b\u786e\u5b9e\u80fd\u591f\u5b66\u4f1a\u89e3\u51b3\u6570\u72ec\u8c1c\u9898\uff08\u6211\u4eec\u7684\u6a21\u578b\u6b63\u786e\u89e3\u51b3\u4e8694.21%\u7684\u8c1c\u9898\uff09\u3002\u6211\u4eec\u8fd8\u5bf9Zebra\u8c1c\u9898\uff08\u53c8\u79f0\u7231\u56e0\u65af\u5766\u8c1c\u9898\uff09\u8fdb\u884c\u4e86\u6269\u5c55\u5206\u6790\uff0c\u5e76\u8bc1\u660e\u6a21\u578b\u80fd\u591f\u6b63\u786e\u89e3\u51b392.04%\u7684\u8c1c\u9898\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u7814\u7a76\u4e86\u8bad\u7ec3\u540e\u7684Transformer\u5185\u90e8\u8868\u793a\uff0c\u5e76\u90
1a\u8fc7\u7ebf\u6027\u63a2\u67e5\u53d1\u73b0\uff0c\u53ef\u4ee5\u4ece\u5b83\u4eec\u4e2d\u89e3\u7801\u51fa\u7ed9\u5b9a\u5355\u5143\u683c\u7684\u6240\u6709\u53ef\u80fd\u503c\u4fe1\u606f\uff0c\u8fd9\u8868\u660eTransformer\u6743\u91cd\u4e2d\u9690\u542b\u7740\u5f3a\u5927\u7684\u63a8\u7406\u5f15\u64ce\u3002|\n", "2409.10490": "|**2024-09-16**|**Code Vulnerability Detection: A Comparative Analysis of Emerging Large Language Models**|Shaznin Sultana et.al.|[2409.10490](http://arxiv.org/abs/2409.10490)|null|\u8fd1\u5e74\u6765\uff0c\u8f6f\u4ef6\u5f00\u53d1\u9886\u57df\u5bf9\u5f00\u6e90\u9879\u76ee\u4f9d\u8d56\u7684\u589e\u52a0\u5bfc\u81f4\u4e86\u6f0f\u6d1e\u95ee\u9898\u7684\u663e\u8457\u589e\u957f\uff0c\u8fd9\u4e00\u73b0\u8c61\u5f15\u8d77\u4e86\u5e7f\u6cdb\u5173\u6ce8\u3002\u672c\u6587\u65e8\u5728\u63a2\u8ba8\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u8bc6\u522b\u4ee3\u7801\u5e93\u4e2d\u7684\u6f0f\u6d1e\u65b9\u9762\u7684\u80fd\u529b\u4e0e\u6548\u679c\uff0c\u7279\u522b\u5173\u6ce8\u4e86\u65b0\u5174LLM\u6280\u672f\u7684\u6700\u65b0\u8fdb\u5c55\u3002\u901a\u8fc7\u5bf9\u6bd4\u5206\u6790\uff0c\u6211\u4eec\u8bc4\u4f30\u4e86\u5305\u62ecLlama\u3001CodeLlama\u3001Gemma\u548cCodeGemma\u5728\u5185\u7684\u6700\u8fd1\u52a0\u5165\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff0c\u4ee5\u53caBERT\u3001RoBERTa\u548cGPT-3\u7b49\u73b0\u6709\u6700\u5148\u8fdb\u7684\u6a21\u578b\u5728\u68c0\u6d4b\u8f6f\u4ef6\u5b89\u5168\u6f0f\u6d1e\u65b9\u9762\u7684\u6027\u80fd\u3002\u6211\u4eec\u7684\u7814\u7a76\u76ee\u6807\u662f\u63ed\u793aLLM\u5728\u6f0f\u6d1e\u68c0\u6d4b\u9886\u57df\u7684\u80fd\u529b\uff0c\u4ece\u800c\u4fc3\u8fdb\u4e0d\u540c\u5f00\u6e90\u4ed3\u5e93\u7684\u5b89\u5168\u5b9e\u8df5\u63d0\u5347\u3002\u7ed3\u679c\u663e\u793a\uff0cCodeGemma\u5728\u68c0\u6d4b\u8f6f\u4ef6\u5b89\u5168\u6f0f\u6d1e\u65b9\u9762\u53d6\u5f97\u4e86\u6700\u9ad8\u7684F1\u5206\u6570\uff0858%\uff09\u548c\u53ec\u56de\u7387\uff0887%\uff09\u3002|\n", "2409.10484": "|**2024-09-16**|**XLM for Autonomous Driving Systems: 
A Comprehensive Review**|Sonda Fourati et.al.|[2409.10484](http://arxiv.org/abs/2409.10484)|null|Large language models (LLMs) have shown remarkable capabilities across a wide range of information-processing tasks, from data extraction and literature summarization to content generation, predictive modeling, decision making, and system control. In addition, vision large models (VLMs) and multimodal large language models (MLLMs), collectively XLMs, can combine multiple data modalities with the power of language understanding, advancing information-based systems such as autonomous driving systems (ADS). Combining language communication with multimodal sensory inputs (such as panoramic images and LiDAR or radar data) makes accurate driving actions possible. In this context, this paper surveys the potential of XLMs for realizing autonomous driving. Specifically, we review the relevant literature on ADS and XLMs, including their architectures, tools, and frameworks. We then detail approaches for deploying XLMs in autonomous driving solutions. Finally, we identify the challenges of deploying XLMs in ADS and propose future research directions intended to promote the adoption of XLMs in future ADS frameworks.|\n",
"2409.10482": "|**2024-09-17**|**Schrodinger's Memory: Large Language Models**|Wei Wang et.al.|[2409.10482](http://arxiv.org/abs/2409.10482)|null|Memory is the foundation of human activity; without it, performing almost any everyday task would be impossible. As large language models (LLMs) develop, their language abilities are coming ever closer to those of humans. But do LLMs have memory? Judging by their current performance, LLMs do show signs of memory. What, then, is the principle behind this memory mechanism? Existing research lacks an in-depth examination of LLMs' memory capabilities and the underlying theory. In this paper, we use the Universal Approximation Theorem (UAT) to explain the memory mechanism of LLMs. We also run experiments to verify the memory capabilities of various LLMs and propose a new way of evaluating their abilities based on these memory capabilities. We argue that LLM memory works like Schrodinger's memory: it becomes observable only when a specific memory is queried. We can determine whether the model retains a memory only from its output in response to the query; otherwise, it remains indeterminate. Finally, we extend this concept by comparing the memory capabilities of the human brain and LLMs, highlighting the similarities and differences in their operating mechanisms.|\n",
"2409.10444": "|**2024-09-16**|**LLM as BT-Planner: Leveraging LLMs for Behavior Tree Generation in Robot Task Planning**|Jicong Ao et.al.|[2409.10444](http://arxiv.org/abs/2409.10444)|null|This paper proposes a novel framework, LLM as BT-Planner, that uses large language models (LLMs) to generate behavior trees (BTs) for planning and executing robotic assembly tasks. We introduce four in-context learning methods that draw on the natural language processing and reasoning abilities of LLMs to produce task plans in BT format, reducing manual effort while ensuring robustness and comprehensibility. We also evaluate smaller-parameter LLMs fine-tuned on the same task. Experiments in simulated and real-world settings show that our framework improves LLM performance on BT generation, significantly raising the success rate through in-context learning and supervised fine-tuning.|\n",
"2409.10411": "|**2024-09-16**|**A Large-Scale Privacy Assessment of Android Third-Party SDKs**|Mark Huasong Meng 
et.al.|[2409.10411](http://arxiv.org/abs/2409.10411)|null|This paper presents a targeted analysis of third-party software development kits (SDKs) on the Android platform, aiming to fill a critical gap in the Android software supply chain with a focus on user privacy protection. The study examines 158 widely used SDKs drawn from two key SDK release platforms: the official one and a large alternative one. On privacy leakage, we found 338 instances in which these SDKs transmitted users' sensitive information without authorization, data that could be used for illicit purposes such as user tracking or monetization. On privacy compliance, our study shows that over 30% of the examined SDKs provide no privacy policy disclosing their data-handling practices. Among the SDKs that do provide one, 37% over-collect user data and 88% falsely claim the right to access sensitive data. Revisiting the latest versions of the SDKs after one year shows that these worrying trends have not improved. Based on our findings, we propose three actionable recommendations to reduce the risk of privacy leakage and strengthen privacy protection for Android users. This study is both an urgent call for industry attention and a source of key insights for future regulatory intervention.|\n",
"2409.10354": "|**2024-09-17**|**Learnings from a Large-Scale Deployment of an LLM-Powered Expert-in-the-Loop Healthcare Chatbot**|Bhuvan Sachdeva et.al.|[2409.10354](http://arxiv.org/abs/2409.10354)|null|This paper examines the use of large language models (LLMs) in healthcare and the challenges they face, such as hallucination, incomplete information, and bias, which undermine their reliability. To address these problems, researchers released Build Your Own expert Bot (BYOeB), a platform that lets developers create LLM-powered chatbots with built-in expert verification. CataractBot, the platform's first implementation, provides expert-verified answers about cataract surgery. An initial evaluation showed its potential, but that study was small in sample size and primarily qualitative. In this work, we carried out a 24-week large-scale deployment of CataractBot in which 318 patients and attendants sent 1,992 messages, with 91.71% of the responses verified by seven experts. Analysis of the interaction logs shows that medical questions far outnumbered logistical ones, hallucinations were negligible, and experts rated 84.52% of the medical answers as accurate. As the knowledge base grew through expert corrections, system performance improved by 19.02%, reducing the experts' workload. These findings inform the design of future LLM-powered chatbots.|\n",
"2409.11404": "|**2024-09-17**|**AraDiCE: Benchmarks for Dialectal and Cultural Capabilities in LLMs**|Basel Mousi et.al.|[2409.11404](http://arxiv.org/abs/2409.11404)|null|Arabic, with its rich dialectal diversity, remains significantly underrepresented in large language models, particularly in its dialectal varieties. We address this gap with seven synthetic datasets created through machine translation combined with human post-editing, covering Modern Standard Arabic (MSA) as well as regional Arabic dialects. We present AraDiCE, a benchmark for evaluating Arabic dialectal and cultural understanding and generation, with a focus on evaluating low-resource Arabic dialects. In addition, we introduce the first fine-grained benchmark for assessing cultural awareness across the Arabian Peninsula, Egypt, and the Levant, adding a new dimension to LLM evaluation. Our findings show that although Arabic-specific models such as Jais and AceGPT outperform multilingual models on dialectal tasks, significant challenges remain in dialect identification, generation, and translation. This work contributes about 45,000 human post-edited samples and a cultural benchmark, and highlights the importance of tailored training for improving how well LLMs capture the nuances of different Arabic dialects and cultural contexts. We will release the dialectal translation models and benchmarks built in this study.|\n",
"2409.11402": "|**2024-09-17**|**NVLM: Open Frontier-Class Multimodal LLMs**|Wenliang Dai et.al.|[2409.11402](http://arxiv.org/abs/2409.11402)|null|We introduce NVLM 1.0, a family of frontier-class multimodal large language models that achieve state-of-the-art results on vision-language tasks, rivaling leading proprietary models (e.g., GPT-4o) and open-source models (e.g., Llama 3-V 405B and InternVL 2). Remarkably, after multimodal training NVLM 1.0 even outperforms its underlying LLM backbone on text-only tasks. On model design, we comprehensively compare decoder-only multimodal LLMs (e.g., LLaVA) with cross-attention-based models (e.g., Flamingo). Based on the strengths and weaknesses of both approaches, we propose a novel architecture that improves training efficiency and multimodal reasoning. We also introduce a 1-D tile-tagging design for dynamic high-resolution images, which significantly boosts performance on multimodal reasoning and OCR-related tasks. On training data, we carefully curate our pretraining and supervised fine-tuning datasets and provide detailed information on them. Our findings show that, even in the pretraining phase and across all architectures, data quality and task diversity matter more than scale. Notably, we develop production-grade multimodality for the NVLM-1.0 models, enabling them to excel at vision-language tasks while maintaining, and even improving on, the text-only performance of their base language models. To achieve this, we integrate a high-quality text-only dataset into multimodal training, together with a substantial amount of multimodal math and reasoning data, which improves math and coding capabilities across modalities. To advance research in the field, we will release the model weights and open-source the code for the community at https://nvlm-project.github.io/.|\n",
"2409.11390": "|**2024-09-17**|**Says Who? Effective Zero-Shot Annotation of Focalization**|Rebecca M. M. 
Hicke et.al.|[2409.11390](http://arxiv.org/abs/2409.11390)|null|In this paper, we experimentally test how well current large language models (LLMs) annotate the mode of focalization in literary texts. Although the task is challenging, our experiments show that LLMs perform comparably to trained human annotators. We present a case study on the novels of Stephen King, demonstrating the usefulness of this approach for computational literary studies and showing how focalization can be studied at scale.|\n",
"2409.11378": "|**2024-09-17**|**Diversify and Conquer: Diversity-Centric Data Selection with Iterative Refinement**|Simon Yu 
et.al.|[2409.11378](http://arxiv.org/abs/2409.11378)|**[link](https://github.com/for-ai/iterative-data-selection)**|Fine-tuning large language models on instruction data is crucial for enhancing pre-trained knowledge and improving instruction following. As instruction datasets proliferate, selecting the right data for effective training becomes increasingly important. This work addresses the question of how to determine the optimal subset of data for effective training. Existing research often emphasizes local criteria such as instance quality for subset selection, but we argue that a global approach focused on data diversity is more critical. We use k-means clustering to ensure the selected subset effectively represents the full dataset. We propose an iterative refinement method, inspired by active learning techniques, that resamples instances from the clusters and reassesses each cluster's importance and sampling weight in every training iteration. This approach reduces the effect of outliers and automatically filters out clusters containing low-quality data. Through extensive evaluation on natural language reasoning, general world knowledge, code, and math reasoning tasks, and by fine-tuning models from various families, we observe consistent improvements: a 7% gain over random selection and a 3.8% gain over state-of-the-art sampling methods. Our work highlights the importance of diversity-first sampling when fine-tuning LLMs to improve performance across a broad range of evaluation tasks. Our code is available at https://github.com/for-ai/iterative-data-selection.|\n",
"2409.11376": "|**2024-09-17**|**Towards Time Series Reasoning with LLMs**|Winnie Chow 
et.al.|[2409.11376](http://arxiv.org/abs/2409.11376)|null|Multimodal large language models (MLLMs) have enabled major advances in understanding and reasoning in domains such as vision, but the time-series domain has not yet seen this broad success. Although prior work on time-series MLLMs has shown promising performance in time-series forecasting, very few works show how an LLM could be used for time-series reasoning in natural language. We propose a novel multimodal time-series LLM approach that learns generalizable information across domains and has strong zero-shot performance. First, we train a lightweight time-series encoder on top of an LLM to extract time-series information directly. Then, we fine-tune the model on augmented time-series tasks to encourage it to generate reasoning paths. We show that the model learns a latent representation reflecting specific time-series features (e.g., slope, frequency) and that it outperforms GPT-4o on a set of zero-shot reasoning tasks across a variety of domains.|\n",
"2409.11375": "|**2024-09-17**|**Multi-OCT-SelfNet: Integrating Self-Supervised Learning with Multi-Source Data Fusion for Enhanced Multi-Class 
Retinal Disease Classification**|Fatema-E- Jannat et.al.|[2409.11375](http://arxiv.org/abs/2409.11375)|null|In the medical domain, acquiring large datasets poses significant challenges, mainly due to privacy concerns. Yet training deep learning models for retinal disease diagnosis requires large datasets, and the ability to generalize effectively from smaller datasets remains a persistent challenge. This data scarcity is a practical obstacle to deploying scalable medical AI solutions. To address it, we combine multiple data sources to improve performance and generalization to new data, giving the model a deeper understanding of data representations from multimodal datasets. We develop a self-supervised framework based on large language models (LLMs) and SwinV2 to strengthen the model's understanding of multimodal dataset representations, improving its ability to detect eye diseases from optical coherence tomography (OCT) images. We adopt a two-phase training method: self-supervised pre-training followed by fine-tuning of a downstream supervised classifier. An ablation study across three datasets, using different encoder backbones under settings without data fusion, with limited data, and without self-supervised pre-training, underscores the robustness of our method. Our findings show consistent performance across these diverse conditions and better generalization than the ResNet-50 baseline.|\n",
"2409.11365": "|**2024-09-17**|**CoCA: Regaining Safety-awareness of Multimodal Large Language Models with Constitutional Calibration**|Jiahui Gao et.al.|[2409.11365](http://arxiv.org/abs/2409.11365)|null|This paper examines the safety-awareness of multimodal large language models (MLLMs) when confronted with malicious visual inputs. MLLMs are typically built on LLMs trained on text datasets aligned with human values and paired with an image encoder that maps images into the LLM's token embedding space. However, integrating the visual modality introduces a unique vulnerability: MLLMs become susceptible to malicious image inputs and are prone to producing unsafe or harmful responses. The study finds that adding a principle to the MLLM's input that explicitly specifies the safety requirement enhances its safety-awareness. This confirms that MLLMs retain some safety-awareness when processing image inputs, but that it is weakened by the modality gap. The paper therefore proposes Constitutional Calibration (CoCA), a simple yet effective technique that strengthens the MLLM's safety-awareness by calibrating its output distribution. The strategy helps the model recover its original safety-awareness without sacrificing its original capabilities. Its effectiveness is verified on both multimodal safety and understanding benchmarks.|\n",
"2409.11360": "|**2024-09-17**|**AI Suggestions Homogenize Writing Toward Western Styles and Diminish Cultural Nuances**|Dhruv Agarwal 
et.al.|[2409.11360](http://arxiv.org/abs/2409.11360)|null|This paper examines what happens when a Western-centric AI model provides writing suggestions to users from different cultural backgrounds. We conducted a controlled cross-cultural experiment in which 118 participants from India and the United States completed culturally grounded writing tasks with and without AI suggestions. Our analysis shows that AI provided greater efficiency gains for Americans than for Indians, and that it led Indian participants to adopt Western writing styles, altering not only what they wrote but also how they wrote it. These findings show that Western-centric AI models homogenize writing toward Western norms, diminishing the nuances that express cultural differences.|\n",
"2409.11353": "|**2024-09-17**|**THaMES: An End-to-End Tool for Hallucination Mitigation and Evaluation in Large Language Models**|Mengfei Liang 
et.al.|[2409.11353](http://arxiv.org/abs/2409.11353)|null|This paper introduces THaMES (Tool for Hallucination Mitigation and Evaluation), an integrated framework and library that addresses hallucination, a growing challenge in large language models (LLMs). Existing detection and mitigation methods are often isolated, insufficient for domain-specific needs, and lack a standardized pipeline. THaMES offers an end-to-end solution for evaluating and mitigating hallucinations in LLMs, featuring automated test set generation, multifaceted benchmarking, and adaptable mitigation strategies. It automates the creation of high-quality, diverse, and cost-effective test sets using techniques such as batch processing, weighted sampling, and counterfactual validation. THaMES assesses a model's ability to detect and reduce hallucinations in text generation and binary classification tasks, and applies the best-performing mitigation strategy, such as in-context learning (ICL), retrieval-augmented generation (RAG), or parameter-efficient fine-tuning (PEFT). Evaluations of state-of-the-art LLMs using a knowledge base of academic papers, political news, and Wikipedia show that commercial models such as GPT-4o benefit more from RAG than from ICL, while open-source models such as Llama-3.1-8B-Instruct and Mistral-Nemo gain more from ICL. In addition, PEFT significantly improves the performance of Llama-3.1-8B-Instruct on the evaluation tasks.|\n",
"2409.11282": "|**2024-09-17**|**Leveraging Distillation Techniques for Document Understanding: A Case Study with FLAN-T5**|Marcel Lamott et.al.|[2409.11282](http://arxiv.org/abs/2409.11282)|null|The surge of digital documents in various formats, including less standardized documents such as business reports and environmental assessments, underscores the growing importance of document understanding. Large language models (LLMs) have shown strong capabilities across many natural language processing tasks, but applying them directly to document understanding remains a challenge. Previous research has demonstrated the potential of LLMs in this domain, yet their significant computational demands make them hard to deploy effectively. In addition, proprietary black-box LLMs often outperform their open-source counterparts, which is a barrier to widespread accessibility. This paper explores document understanding by distilling knowledge from the LLM ChatGPT into FLAN-T5, balancing the capabilities of large models against computational limits. We present a novel approach that integrates labeling and curriculum-learning mechanisms to facilitate efficient knowledge transfer. This work contributes to document understanding methodology by offering a scalable solution that bridges the gap between resource-intensive LLMs and practical applications. Our findings underscore the potential of distillation techniques for making sophisticated language models usable in real-world scenarios, thereby advancing natural language processing and document understanding.|\n",
"2409.12194": "|**2024-09-20**|**Gender Representation and Bias in Indian Civil Service Mock Interviews**|Somonnoy Banerjee 
et.al.|[2409.12194](http://arxiv.org/abs/2409.12194)|null|This paper makes three key contributions. First, using 51,278 interview questions drawn from 888 YouTube videos of mock interviews of Indian civil service candidates, we demonstrate stark gender bias in the broad nature of the questions asked of male and female candidates. Second, our experiments with large language models reveal strong gender bias in the explanations the models give on a gender inference task. Finally, we present a novel dataset of 51,278 interview questions that can inform future research in the humanities and social sciences.|\n",
"2409.12183": "|**2024-09-18**|**To CoT or not to CoT? 
Chain-of-thought helps mainly on math and symbolic reasoning**|Zayne Sprague et.al.|[2409.12183](http://arxiv.org/abs/2409.12183)|null|To analyze which tasks chain-of-thought (CoT) genuinely helps, we conducted a quantitative meta-analysis covering more than 100 papers that use CoT and ran our own evaluations of 14 models on 20 datasets. The results show that CoT gives strong performance benefits mainly on math or logic tasks, with much smaller gains on other task types. On MMLU, generating the answer directly without CoT yields almost the same accuracy as CoT unless the question or the model's response contains an equals sign, which indicates symbolic operations and reasoning. 
Building on this finding, we analyze the behavior of CoT on these problems by separating planning from execution and comparing against tool-augmented LLMs. Most of CoT's gain comes from improved symbolic execution, but it underperforms relative to using a symbolic solver. Our results indicate that CoT can be applied selectively, maintaining performance while saving inference cost. They also suggest a need to move beyond prompt-based CoT toward new paradigms that better leverage intermediate computation across the whole range of LLM applications.|\n", "2409.12180": "|**2024-09-18**|**Finetuning Language Models to Emit Linguistic Expressions of Uncertainty**|Arslan Chaudhry 
et.al.|[2409.12180](http://arxiv.org/abs/2409.12180)|null|This paper studies the use of large language models (LLMs) in information-seeking and decision-making tasks. Despite their broad utility, LLMs tend to generate information that conflicts with real-world facts, and their persuasive style can make these inaccuracies appear confident and convincing. As a result, end users struggle to consistently align the confidence expressed by an LLM with the accuracy of its predictions, often leading to either blind trust in all outputs or a complete disregard for their reliability. To address this, we explore supervised fine-tuning on uncertainty-augmented predictions as a way to develop models that produce linguistic expressions of uncertainty. Specifically, we measure the calibration of pre-trained models and then fine-tune language models to generate calibrated linguistic expressions of uncertainty based on the model's own confidence. 
Through experiments on various question-answering datasets, we demonstrate that LLMs are well calibrated in assessing their predictions, and that supervised fine-tuning based on the model's own confidence leads to well-calibrated expressions of uncertainty, particularly for single-claim answers.|\n", "2409.12150": "|**2024-09-18**|**Decoding Style: Efficient Fine-Tuning of LLMs for Image-Guided Outfit Recommendation with Preference**|Najmeh Forouzandehmehr et.al.|[2409.12150](http://arxiv.org/abs/2409.12150)|null|This paper presents a novel framework that harnesses the expressive power of large language models (LLMs) for the complex challenge of personalized outfit recommendation. We mitigate the black-box and static nature of LLMs through fine-tuning and direct feedback integration. We bridge the gap between items' visual and textual information by using a multimodal large language model (MLLM) to caption human-curated fashion images, which lets the LLM extract style and color characteristics and form a basis for personalized recommendations. The LLM is efficiently fine-tuned on the open-source Polyvore dataset to optimize its ability to recommend stylish outfits. A direct preference mechanism using negative examples enhances the LLM's decision-making process, creating a self-enhancing AI feedback loop that continuously refines recommendations in line with seasonal fashion trends. The framework is evaluated on the Polyvore dataset on two key tasks: fill-in-the-blank 
and complementary item retrieval. The evaluations underline the framework's ability to generate stylish, trend-aligned outfit suggestions that continuously improve through direct feedback. The proposed framework significantly outperforms outfit generation with the base LLM on both tasks, producing more cohesive outfits, which demonstrates its potential to enhance the shopping experience with accurate suggestions.|\n", "2409.12147": "|**2024-09-18**|**MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning**|Justin Chih-Yao Chen 
et.al.|[2409.12147](http://arxiv.org/abs/2409.12147)|**[link](https://github.com/dinobby/magicore)**|**The reasoning ability of large language models (LLMs) can be improved with test-time aggregation strategies, i.e., generating multiple samples and voting among them. Although these strategies improve performance, they often reach a saturation point. Refinement offers an alternative by using LLM-generated feedback to improve solution quality. However, refinement introduces three key challenges: (1) Excessive refinement: uniformly refining all instances can over-correct and reduce overall performance. (2) Inability to localize and address errors: LLMs have a limited ability to self-correct and struggle to identify and fix their own mistakes. (3) Insufficient refinement: deciding how many refinement iterations are needed is non-trivial, and stopping too early can leave errors unaddressed. 
To tackle these issues, we propose MAgICoRe, which avoids excessive refinement by categorizing problems as easy or hard, solving easy problems with coarse-grained aggregation and hard ones with fine-grained, iterative multi-agent refinement. To improve error localization, we incorporate external step-wise reward model (RM) scores. To ensure effective refinement, we employ a multi-agent loop with three agents: a Solver, a Reviewer (which generates targeted feedback based on step-wise RM scores), and a Refiner (which incorporates the feedback). To ensure sufficient refinement, we re-evaluate updated solutions and initiate further rounds of refinement when necessary. We evaluate MAgICoRe on Llama-3-8B and GPT-3.5 across 5 math datasets and show its effectiveness. Even one iteration of MAgICoRe beats Self-Consistency by 3.4%, Best-of-k by 3.2%, and Self-Refine by 4.0% while using less than half the samples of the baselines. Compared with iterative-refinement baselines, MAgICoRe continues to improve as the number of iterations grows. 
Finally, our ablations highlight the importance of MAgICoRe's RMs and multi-agent communication.**|\n", "2409.12140": "|**2024-09-18**|**MoRAG -- Multi-Fusion Retrieval Augmented Generation for Human Motion**|Kalakonda Sai Shashank et.al.|[2409.12140](http://arxiv.org/abs/2409.12140)|null|We introduce MoRAG, a novel multi-part fusion based retrieval-augmented generation strategy for text-based human motion generation. The method enhances motion diffusion models by leveraging additional knowledge obtained through an improved motion retrieval process. By effectively prompting large language models (LLMs), we address spelling errors and rephrasing issues in motion retrieval. Our approach uses a multi-part retrieval strategy to improve the generalizability of motion retrieval across the language space. We create diverse samples through the spatial composition of the retrieved motions. Furthermore, by utilizing low-level, part-specific motion information, we can construct motion samples for unseen text descriptions. Our experiments demonstrate that the framework can serve as a plug-and-play module that improves the performance of motion diffusion models. Code, pretrained models, and sample videos will be made available at: https://motion-rag.github.io/|\n", "2409.12139": 
"|**2024-09-24**|**Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models**|Sijing Chen et.al.|[2409.12139](http://arxiv.org/abs/2409.12139)|null|With the arrival of the era of big data and large language models, zero-shot personalized rapid customization has emerged as a significant trend. This report introduces Takin AudioLLM, a series of techniques and models, mainly including Takin TTS, Takin VC, and Takin Morphing, designed specifically for audiobook production. These models are capable of zero-shot speech generation, producing high-quality speech that is nearly indistinguishable from real human speech and enabling individuals to customize speech content to their own needs. First, we introduce Takin TTS, a neural codec language model that builds on an enhanced neural speech codec and a multi-task training framework and can generate high-fidelity natural speech in a zero-shot way. For Takin VC, we propose an effective joint content and timbre modeling approach to improve speaker similarity, and we advocate a conditional flow matching based decoder to further enhance naturalness and expressiveness. 
Finally, we propose the Takin Morphing system, which uses highly decoupled and advanced timbre and prosody modeling so that individuals can customize speech production to their preferences in a precise and controllable manner. Extensive experiments validate the effectiveness and robustness of the Takin AudioLLM series models. For detailed demos, please refer to.|\n", "2409.12122": "|**2024-09-18**|**Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement**|An Yang et.al.|[2409.12122](http://arxiv.org/abs/2409.12122)|null|In this report, we present a series of math-specific large language models: Qwen2.5-Math and Qwen2.5-Math-Instruct-1.5B/7B/72B. The core innovation of the Qwen2.5 series lies in integrating the philosophy of self-improvement throughout the entire pipeline, from pre-training and post-training to inference: (1) During pre-training, Qwen2-Math-Instruct is used to generate large-scale, high-quality mathematical data. (2) During post-training, we develop a reward model (RM) by sampling extensively from Qwen2-Math-Instruct. This RM is then applied to the iterative evolution of supervised fine-tuning (SFT) data. With a stronger SFT model, it becomes possible to iteratively retrain and update the RM, which in turn guides the next round of SFT data iteration. 
On the final SFT model, we use the final RM for reinforcement learning, resulting in Qwen2.5-Math-Instruct. (3) During inference, the RM guides sampling to optimize model performance. Qwen2.5-Math-Instruct supports both Chinese and English and has advanced mathematical reasoning capabilities, including chain-of-thought (CoT) and tool-integrated reasoning (TIR). We evaluate our models on 10 mathematics datasets in English and Chinese, such as GSM8K, MATH, GaoKao, AMC23, and AIME24, covering difficulties ranging from grade-school level to math competition problems.|\n", "2409.12117": "|**2024-09-18**|**Low Frame-rate Speech Codec: a Codec Designed for Fast High-quality Speech LLM Training and Inference**|Edresson Casanova 
et.al.|[2409.12117](http://arxiv.org/abs/2409.12117)|null|Large language models (LLMs) have significantly advanced audio processing through audio codecs that convert audio into discrete tokens, enabling language modeling techniques to be applied to audio data. However, audio codecs often operate at high frame rates, resulting in slow training and inference, especially for autoregressive models. To address this challenge, we present the Low Frame-rate Speech Codec (LFSC): a neural audio codec that leverages finite scalar quantization and adversarial training with large speech language models to achieve high-quality audio compression at a bitrate of 1.89 kbps and 21.5 frames per second. We demonstrate that our novel codec can make inference of LLM-based text-to-speech models around three times faster while improving intelligibility and producing quality comparable to previous models.|\n", "2409.12106": "|**2024-09-18**|**Measuring Human and AI Values based on Generative Psychometrics with Large Language Models**|Haoran Ye 
et.al.|[2409.12106](http://arxiv.org/abs/2409.12106)|**[link](https://github.com/value4ai/gpv)**|**This paper introduces Generative Psychometrics for Values (GPV), an LLM-based, data-driven value measurement paradigm that is theoretically grounded in text-revealed selective perceptions. We first fine-tune an LLM for accurate perception-level value measurement and verify the ability of LLMs to parse texts into perceptions, which forms the core of the GPV pipeline. We then apply GPV to human-authored blogs and demonstrate its stability, validity, and superiority over prior psychological tools. Next, we extend GPV to LLM value measurement and advance the state of the art with: 1) a methodology that measures LLM values from their scalable, free-form outputs, enabling context-specific measurement; 2) a comparative analysis of measurement paradigms that reveals response biases in prior methods; and 3) an attempt to link LLM values to safety, revealing the predictive power of different value systems and the effects of various values on LLM safety. Through this interdisciplinary effort, we aim to leverage AI for next-generation psychometrics 
and psychometrics for value-aligned AI.**|\n", "2409.17143": "|**2024-09-25**|**Attention Prompting on Image for Large Vision-Language Models**|Runpeng Yu et.al.|[2409.17143](http://arxiv.org/abs/2409.17143)|**[link](https://github.com/yu-rp/apiprompting)**|**Compared with large language models (LLMs), large vision-language models (LVLMs) can also accept images as input, and thus exhibit more interesting emergent capabilities and impressive performance on a variety of vision-language tasks. Motivated by text prompting in LLMs, visual prompting has been explored to enhance how LVLMs perceive visual information. However, previous visual prompting techniques process only the visual input without considering the text query, which limits a model's ability to follow text instructions to complete tasks. To fill this gap, this work proposes a new prompting technique named Attention Prompting on Image, which simply overlays a text-query-guided attention heatmap on the original input image and effectively enhances LVLMs on a variety of tasks. Specifically, we generate an attention heatmap for the input image, dependent on the text query, with an auxiliary model such as CLIP. 
The heatmap is then simply multiplied by the pixel values of the original image to obtain the actual input image for the LVLM. Extensive experiments on various vision-language benchmarks verify the effectiveness of our technique. For example, Attention Prompting on Image improves LLaVA-1.5 by 3.8% and 2.9% on the MM-Vet and LLaVA-Wild benchmarks, respectively.**|\n", "2409.17141": "|**2024-09-25**|**FineZip : Pushing the Limits of Large Language Models for Practical Lossless Text Compression**|Fazal Mittu et.al.|[2409.17141](http://arxiv.org/abs/2409.17141)|**[link](https://github.com/fazalmittu/finezip)**|**This paper provides an in-depth analysis of neural network and transformer-based text compression techniques and compares them with conventional text compression systems. Although LLM-based systems significantly outperform conventional methods in compression ratio, they are highly impractical. LLMZip, an LLM-based compression system built on Llama3-8B, takes 9.5 days to compress just 10 MB of text, despite its improved compression. 
To address this, we present FineZip, a novel LLM-based text compression system that combines the ideas of online memorization and dynamic context. Compared with LLMZip, FineZip cuts the compression time to about 4 hours, a 54-fold improvement, and it improves compression ratios by approximately 50% over traditional algorithmic compression methods. With this work, we take the first step toward making lossless text compression with LLMs a reality. Although FineZip is a significant step in that direction, LLMs are still not a viable solution for large-scale text compression. We hope our work paves the way for future research and innovation to solve this problem.**|\n", "2409.17140": "|**2024-09-25**|**Turn Every Application into an Agent: Towards Efficient Human-Agent-Computer Interaction with API-First LLM-Based Agents**|Junting Lu 
et.al.|[2409.17140](http://arxiv.org/abs/2409.17140)|null|This paper proposes AXIS, a novel LLM-based agent framework that prioritizes actions through application programming interfaces (APIs) over user interface (UI) actions, addressing the high latency and low reliability of LLM-driven agents on complex tasks. The framework also facilitates the creation and expansion of APIs through automated exploration of applications. Experiments on Office Word show that, compared with humans, AXIS reduces task completion time by 65%-70% and cognitive workload by 38%-53% while maintaining an accuracy of 97%-98%. This work contributes a new human-agent-computer interaction (HACI) framework and fresh UI design principles for application providers in the era of LLMs. It also explores the possibility of turning every application into an agent, paving the way toward an agent-centric operating system (Agent OS).|\n", "2409.17115": "|**2024-09-25**|**Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale**|Fan Zhou 
et.al.|[2409.17115](http://arxiv.org/abs/2409.17115)|**[link](https://github.com/gair-nlp/prox)**|**In large language model pre-training, practitioners have long relied on human experts to craft heuristics for improving corpus quality, and numerous rules have been developed to date. These rules, however, lack the flexibility to address the unique characteristics of each individual example, while applying tailored rules to every example is impractical for human experts. This paper demonstrates that even small language models with as few as 0.3B parameters can exhibit data refinement abilities comparable to those of human experts. We introduce Programming Every Example (ProX), a framework that treats data refinement as a programming task, enabling models to refine corpora at scale by generating and executing fine-grained operations, such as string normalization, for each individual example. 
Experiments show that models pre-trained on ProX-curated data outperform those trained on the original data or on data filtered by other selection methods by more than 2% across various downstream benchmarks. Its effectiveness holds across model sizes and pre-training corpora, including C4, RedPajama-V2, and FineWeb. ProX also shows significant potential in domain-specific continual pre-training: without domain-specific design, models trained on OpenWebMath refined by ProX improve average accuracy by 7.6% over Mistral-7B, 14.6% over Llama-2-7B, and 20.3% over CodeLlama-7B, all within about 10B training tokens, reaching a level comparable to models like Llemma-7B trained on 200B tokens. Further analysis shows that ProX significantly saves training FLOPs, offering a promising path for efficient LLM pre-training. We are open-sourcing ProX, including a corpus of more than 100B tokens, models, and all training and implementation details, for reproducible research and future innovation. Code: https://github.com/GAIR-NLP/ProX**|\n", "2409.17092": "|**2024-09-25**|**Accumulator-Aware Post-Training Quantization**|Ian Colbert 
et.al.|[2409.17092](http://arxiv.org/abs/2409.17092)|null|Several recent studies have investigated low-precision accumulation, reporting improvements in throughput, power, and area across various platforms. However, the accompanying proposals have only considered the quantization-aware training (QAT) paradigm, in which models are fine-tuned or trained from scratch with quantization in the loop. As models continue to grow in size, QAT techniques become increasingly expensive, which has motivated the recent surge in post-training quantization (PTQ) research. To the best of our knowledge, this is the first formal study of accumulator-aware quantization in the PTQ setting. To bridge this gap, we introduce AXE, a practical framework of accumulator-aware extensions designed to endow existing layer-wise PTQ algorithms with overflow avoidance guarantees. We theoretically motivate AXE and demonstrate its flexibility by implementing it on top of two state-of-the-art PTQ algorithms: GPFQ and OPTQ. We further generalize AXE to support multi-stage accumulation for the first time, opening the door to full datapath optimization and scaling to large language models (LLMs). 
We evaluate AXE across image classification and language generation models and observe significant improvements in the trade-off between accumulator bit width and model accuracy over baseline methods.|\n", "2409.17066": "|**2024-09-25**|**VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models**|Yifei Liu et.al.|[2409.17066](http://arxiv.org/abs/2409.17066)|**[link](https://github.com/microsoft/vptq)**|**This paper introduces Vector Post-Training Quantization (VPTQ), a low-bit quantization method for large language models (LLMs). We use second-order optimization to formulate the LLM vector quantization problem and guide the design of the quantization algorithm by solving that optimization. Channel-independent second-order optimization is further introduced for fine-grained quantization. In addition, by decomposing the optimization problem, we propose a brief and effective codebook initialization algorithm. VPTQ also supports residual and outlier quantization, which improves model accuracy and compresses the model further. Experiments show that, compared with the state of the art at 2-bit quantization, VPTQ reduces quantization perplexity by 0.01-0.34 on LLaMA-2, 0.38-0.68 on Mistral-7B, and 4.41-7.34 on LLaMA-3, and improves average accuracy on QA tasks by 0.79%-1.5% on LLaMA-2, 1% on Mistral-7B, and 11%-22% on LLaMA-3. VPTQ needs only 10.4%-18.6% of the quantization algorithm execution time and delivers a 1.6-1.8x increase in inference throughput.**|\n", 
"2409.17054": "|**2024-09-25**|**Using LLM for Real-Time Transcription and Summarization of Doctor-Patient Interactions into ePuskesmas in Indonesia**|Azmul Asmar Irfan et.al.|[2409.17054](http://arxiv.org/abs/2409.17054)|null|\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u89e3\u51b3\u65b9\u6848\uff0c\u5229\u7528\u672c\u5730\u5316\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u6765\u8f6c\u5f55\u3001\u7ffb\u8bd1\u548c\u603b\u7ed3\u533b\u751f\u4e0e\u60a3\u8005\u7684\u5bf9\u8bdd\u3002\u6211\u4eec\u4f7f\u7528Whisper\u6a21\u578b\u8fdb\u884c\u8f6c\u5f55\uff0cGPT-3\u8fdb\u884c\u603b\u7ed3\uff0c\u5e76\u5c06\u5176\u683c\u5f0f\u5316\u4e3aePuskemas\u533b\u7597\u8bb0\u5f55\u3002\u6b64\u7cfb\u7edf\u4f5c\u4e3a\u73b0\u6709\u7f51\u7edc\u6d4f\u89c8\u5668\u6269\u5c55\u7684\u9644\u52a0\u7ec4\u4ef6\u5b9e\u73b0\uff0c\u5141\u8bb8\u533b\u751f\u5728\u8bf4\u8bdd\u65f6\u586b\u5199\u60a3\u8005\u8868\u683c\u3002\u901a\u8fc7\u5229\u7528\u5b9e\u65f6\u8f6c\u5f55\u3001\u7ffb\u8bd1\u548c\u603b\u7ed3\u529f\u80fd\uff0c\u533b\u751f\u53ef\u4ee5\u63d0\u9ad8\u60a3\u8005\u62a4\u7406\u7684\u5468\u8f6c\u65f6\u95f4\uff0c\u540c\u65f6\u589e\u5f3a\u8bb0\u5f55\u7684\u8d28\u91cf\uff0c\u4f7f\u5f97\u8bb0\u5f55\u66f4\u52a0\u8be6\u7ec6\u4e14\u5bcc\u6709\u6d1e\u5bdf\u529b\uff0c\u4ee5\u4f9b\u672a\u6765\u7684\u8bbf\u95ee\u53c2\u8003\u3002\u8fd9\u4e00\u521b\u65b0\u65e8\u5728\u89e3\u51b3\u5370\u5c3c\u533b\u7597\u673a\u6784\u62e5\u6324\u4ee5\u53ca\u533b\u62a4\u4eba\u5458\u884c\u653f\u8d1f\u62c5\u91cd\u7684\u95ee\u9898\u3002\u6211\u4eec\u76f8\u4fe1\uff0c\u8fd9\u79cd\u89e3\u51b3\u65b9\u6848\u5c06\u5e2e\u52a9\u533b\u751f\u8282\u7701\u65f6\u95f4\u3001\u63d0\u4f9b\u66f4\u597d\u7684\u62a4\u7406\u5e76\u4ea7\u751f\u66f4\u51c6\u786e\u7684\u533b\u7597\u8bb0\u5f55\uff0c\u4ee3\u8868\u4e86\u5411\u73b0\u4ee3\u5316\u533b\u7597\u4fdd\u5065\u8fc8\u8fdb\u7684\u91cd\u8981\u4e00\u6b65\uff0c\u786e\u4fdd\u5373\u4f7f\u5728\u8d44\u6e90\u6709\u9650\u7684\u73af\u5883\u4e2d\uff0c\u60a3\u8005\u4e5f\u80fd\u83b7\u5f97\u53ca\u65f6\u3001\u9ad8\u8d28\u9
1cf\u7684\u62a4\u7406\u3002|\n", "2409.17044": "|**2024-09-25**|**How to Connect Speech Foundation Models and Large Language Models? What Matters and What Does Not**|Francesco Verdini et.al.|[2409.17044](http://arxiv.org/abs/2409.17044)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u60ca\u4eba\u8868\u73b0\u63a8\u52a8\u4e86\u7814\u7a76\u52aa\u529b\uff0c\u4f7f\u5176\u80fd\u591f\u5e94\u7528\u4e8e\u4e00\u7cfb\u5217\u4efb\u52a1\u548c\u8f93\u5165\u6a21\u6001\u3002\u5728\u8bed\u97f3\u8f6c\u6587\u672c\uff08S2T\uff09\u4efb\u52a1\u4e2d\uff0c\u65b0\u5174\u7684\u89e3\u51b3\u65b9\u6848\u662f\u901a\u8fc7\u9002\u914d\u5668\u6a21\u5757\u5c06\u8bed\u97f3\u57fa\u7840\u6a21\u578b\uff08SFM\uff09\u7684\u8f93\u51fa\u6295\u5f71\u5230LLM\u5d4c\u5165\u7a7a\u95f4\u3002\u7136\u800c\uff0c\u76ee\u524d\u8fd8\u6ca1\u6709\u5de5\u4f5c\u63a2\u8ba8\u4e0b\u6e38\u4efb\u52a1\u6027\u80fd\u5728\u591a\u5927\u7a0b\u5ea6\u4e0a\u4f9d\u8d56\u4e8e\u6bcf\u4e2a\u7ec4\u4ef6\uff08SFM\u3001\u9002\u914d\u5668\u3001LLM\uff09\uff0c\u6216\u8005\u9009\u62e9\u9002\u914d\u5668\u7684\u6700\u4f73\u8bbe\u8ba1\u662f\u5426\u53d6\u51b3\u4e8e\u6240\u9009\u7684SFM\u548cLLM\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u6211\u4eec\u8bc4\u4f30\u4e865\u4e2a\u9002\u914d\u5668\u6a21\u5757\u30012\u4e2aLLM\uff08Mistral\u548cLlama\uff09\u4ee5\u53ca2\u4e2aSFM\uff08Whisper\u548cSeamlessM4T\uff09\u5728\u81ea\u52a8\u8bed\u97f3\u8bc6\u522b\u548c\u8bed\u97f3\u7ffb\u8bd1\u4e24\u4e2a\u5e7f\u6cdb\u4f7f\u7528\u7684S2T\u4efb\u52a1\u4e0a\u7684\u7ec4\u5408\u6548\u679c\u3002\u6211\u4eec\u7684\u7ed3\u679c\u8868\u660e\uff0cSFM\u5728\u4e0b\u6e38\u6027\u80fd\u4e2d\u626e\u6f14\u7740\u81f3\u5173\u91cd\u8981\u7684\u89d2\u8272\uff0c\u800c\u9002\u914d\u5668\u7684\u9009\u62e9\u5177\u6709\u9002\u5ea6\u7684\u5f71\u54cd\uff0c\u5e76\u4e14\u53d6\u51b3\u4e8e\u6240\u9009\u7684SFM\u548cLLM\u3002|\n", "2409.17027": "|**2024-09-25**|**Counterfactual Token Generation in Large Language Models**|Ivi Chatzi 
et.al.|[2409.17027](http://arxiv.org/abs/2409.17027)|**[link](https://github.com/networks-learning/counterfactual-llms)**|\u672c\u6587\u65e8\u5728\u63d0\u5347\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u529f\u80fd\uff0c\u4f7f\u5176\u80fd\u591f\u63a8\u7406\u8fc7\u53bb\u751f\u6210\u7684\u4ee4\u724c\u6240\u5448\u73b0\u7684\u53ef\u80fd\u66ff\u4ee3\u60c5\u51b5\u3002\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u57fa\u4e8eGumbel-Max\u7ed3\u6784\u56e0\u679c\u6a21\u578b\u7684\u56e0\u679c\u6a21\u578b\uff0c\u4ee5\u589e\u5f3a\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u8fd9\u4e00\u529f\u80fd\u3002\u6211\u4eec\u7684\u6a21\u578b\u80fd\u591f\u5728\u51e0\u4e4e\u4e0d\u589e\u52a0\u4e0e\u57fa\u7840\u4ee4\u724c\u751f\u6210\u6210\u672c\u7684\u60c5\u51b5\u4e0b\uff0c\u8fdb\u884c\u53cd\u4e8b\u5b9e\u4ee4\u724c\u751f\u6210\uff0c\u5b9e\u73b0\u8fc7\u7a0b\u7b80\u5355\u4e14\u65e0\u9700\u4efb\u4f55\u5fae\u8c03\u6216\u63d0\u793a\u5de5\u7a0b\u3002\u6211\u4eec\u5728\u6b64\u57fa\u7840\u4e0a\u5728Llama 3 8B-instruct\u4e0a\u5b9e\u73b0\u4e86\u8be5\u6a21\u578b\uff0c\u5e76\u5bf9\u751f\u6210\u7684\u53cd\u4e8b\u5b9e\u6587\u672c\u8fdb\u884c\u4e86\u5b9a\u6027\u548c\u5b9a\u91cf\u5206\u6790\u3002 \u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63a2\u8ba8\u4e86\u53cd\u4e8b\u5b9e\u4ee4\u724c\u751f\u6210\u5728\u504f\u89c1\u68c0\u6d4b\u65b9\u9762\u7684\u5e94\u7528\uff0c\u63ed\u793a\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\u6784\u5efa\u7684\u4e16\u754c\u6a21\u578b\u4e2d\u7684\u4e00\u4e9b\u6709\u8da3\u89c1\u89e3\u3002|\n", "2409.17011": "|**2024-09-25**|**LLM-CARD: Towards a Description and Landscape of Large Language Models**|Shengwei Tian 
et.al.|[2409.17011](http://arxiv.org/abs/2409.17011)|**[link](https://github.com/shengwei-tian/dependency-parser-visualization)**|\u968f\u7740\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u9886\u57df\u7684\u8fc5\u901f\u53d1\u5c55\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5404\u79cdNLP\u4efb\u52a1\u4e2d\u4e0d\u65ad\u6d8c\u73b0\u3002\u968f\u7740\u53d1\u8868\u7684\u8bba\u6587\u6570\u91cf\u4e0d\u65ad\u589e\u52a0\uff0c\u7814\u7a76\u4eba\u5458\u548c\u5f00\u53d1\u8005\u9762\u4e34\u4fe1\u606f\u8fc7\u8f7d\u7684\u6311\u6218\u3002\u56e0\u6b64\uff0c\u5f00\u53d1\u4e00\u4e2a\u80fd\u591f\u81ea\u52a8\u4ece\u5b66\u672f\u8bba\u6587\u4e2d\u63d0\u53d6\u5e76\u7ec4\u7ec7LLM\u5173\u952e\u4fe1\u606f\u7684\u7cfb\u7edf\u53d8\u5f97\u5c24\u4e3a\u91cd\u8981\u3002\u672c\u5de5\u4f5c\u65e8\u5728\u901a\u8fc7\u4f7f\u7528\u547d\u540d\u5b9e\u4f53\u8bc6\u522b\uff08NER\uff09\u548c\u5173\u7cfb\u62bd\u53d6\uff08RE\uff09\u65b9\u6cd5\u6765\u5b9e\u73b0\u8fd9\u4e00\u76ee\u6807\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u53ef\u4ee5\u81ea\u52a8\u4ece\u8bba\u6587\u4e2d\u63d0\u53d6\u5173\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u5173\u952e\u4fe1\u606f\uff0c\u5e2e\u52a9\u7814\u7a76\u4eba\u5458\u9ad8\u6548\u5730\u83b7\u53d6\u5173\u4e8eLLMs\u7684\u4fe1\u606f\u3002\u8fd9\u4e9b\u7279\u6027\u5305\u62ec\u6a21\u578b\u7684\u201c\u8bb8\u53ef\u201d\u3001\u201c\u540d\u79f0\u201d\u548c\u201c\u5e94\u7528\u201d\u3002\u501f\u52a9\u8fd9\u4e9b\u7279\u6027\uff0c\u6211\u4eec\u53ef\u4ee5\u4e3a\u6bcf\u7bc7\u8bba\u6587\u5f62\u6210\u4e00\u4e2a\u6a21\u578b\u5361\u7247\u3002\u5728\u6570\u636e\u8d21\u732e\u65b9\u9762\uff0c\u5bf9106\u7bc7\u5b66\u672f\u8bba\u6587\u8fdb\u884c\u4e86\u5904\u7406\uff0c\u5b9a\u4e49\u4e86\u4e09\u4e2a\u5b57\u5178\u2014\u2014LLMs\u540d\u79f0\u3001\u8bb8\u53ef\u548c\u5e94\u7528\u3002\u901a\u8fc7\u5b57\u5178\u67e5\u627e\u63d0\u53d6\u4e8611051\u4e2a\u53e5\u5b50\uff0c\u5e76\u901a\u8fc7\u4eba\u5de5\u5ba1\u67e5\u6700\u7ec8\u9009\u62e9\u4e86129\u4e2a\u53e5\u5b50\uff0c\u5176\u4e2d\u5305\u542b\u540d\u7
9f0\u4e0e\u8bb8\u53ef\u4e4b\u95f4\u7684\u94fe\u63a5\uff0c\u4ee5\u53ca106\u4e2a\u53e5\u5b50\uff0c\u5176\u4e2d\u5305\u542b\u6a21\u578b\u540d\u79f0\u4e0e\u5e94\u7528\u4e4b\u95f4\u7684\u94fe\u63a5\u3002|\n", "2409.18127": "|**2024-09-26**|**EgoLM: Multi-Modal Language Model of Egocentric Motions**|Fangzhou Hong et.al.|[2409.18127](http://arxiv.org/abs/2409.18127)|null|\u5728\u7a7f\u6234\u8bbe\u5907\u7684\u666e\u53ca\u80cc\u666f\u4e0b\uff0c\u7406\u89e3\u4e3b\u89c2\u89c6\u89d2\u7684\u52a8\u4f5c\u53d8\u5f97\u81f3\u5173\u91cd\u8981\uff0c\u4ee5\u53d1\u5c55\u5177\u6709\u60c5\u5883\u610f\u8bc6\u7684\u4eba\u5de5\u667a\u80fd\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aEgoLM\u7684\u901a\u7528\u6846\u67b6\uff0c\u7528\u4e8e\u4ece\u591a\u6a21\u6001\u8f93\u5165\uff08\u5982\u4e3b\u89c2\u89c6\u9891\u548c\u8fd0\u52a8\u4f20\u611f\u5668\uff09\u4e2d\u8ddf\u8e2a\u548c\u7406\u89e3\u4e3b\u89c2\u52a8\u4f5c\u3002EgoLM\u901a\u8fc7\u5229\u7528\u4e30\u5bcc\u7684\u4e0a\u4e0b\u6587\u6765\u89e3\u51b3\u5355\u6a21\u6001\u6761\u4ef6\u4e0b\u7684\u4e3b\u4f53\u8fd0\u52a8\u8ddf\u8e2a\u548c\u7406\u89e3\u96be\u9898\u3002\u4e3a\u4e86\u4fc3\u8fdb\u8fd9\u4e00\u901a\u7528\u4e14\u591a\u6a21\u6001\u7684\u6846\u67b6\uff0c\u6211\u4eec\u7684\u6838\u5fc3\u6d1e\u5bdf\u662f\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u6765\u5efa\u6a21\u4e3b\u4f53\u52a8\u4f5c\u548c\u81ea\u7136\u8bed\u8a00\u7684\u8054\u5408\u5206\u5e03\u3002\u591a\u6a21\u6001\u4f20\u611f\u5668\u8f93\u5165\u88ab\u7f16\u7801\u5e76\u6295\u5f71\u5230\u8bed\u8a00\u6a21\u578b\u7684\u8054\u5408\u6f5c\u5728\u7a7a\u95f4\u4e2d\uff0c\u5e76\u7528\u4e8e\u89e6\u53d1\u52a8\u4f5c\u751f\u6210\u6216\u6587\u672c\u751f\u6210\uff0c\u5206\u522b\u7528\u4e8e\u4e3b\u4f53\u8fd0\u52a8\u8ddf\u8e2a\u6216\u7406\u89e3\u3002\u5927\u89c4\u6a21\u591a\u6a21\u6001\u4eba\u4f53\u52a8\u4f5c\u6570\u636e\u96c6\u4e0a\u7684\u5e7f\u6cdb\u5b9e\u9a8c\u9a8c\u8bc1\u4e86EgoLM\u4f5c\u4e3a\u901a\u7528\u6a21\u578b\u5728\u666e\u904d\u4e3b\u89c2\u5b66\u4e60\u4e2d\u7684\u6709\
u6548\u6027\u3002|\n", "2409.18119": "|**2024-09-26**|**Multi-View and Multi-Scale Alignment for Contrastive Language-Image Pre-training in Mammography**|Yuexi Du et.al.|[2409.18119](http://arxiv.org/abs/2409.18119)|null|\u5728\u533b\u7597\u56fe\u50cf\u5206\u6790\u9886\u57df\uff0c\u5bf9\u6bd4\u8bed\u8a00-\u56fe\u50cf\u9884\u8bad\u7ec3\uff08CLIP\uff09\u663e\u793a\u51fa\u5de8\u5927\u6f5c\u529b\uff0c\u4f46\u5176\u9700\u8981\u5927\u91cf\u7684\u6570\u636e\u548c\u8ba1\u7b97\u8d44\u6e90\u3002\u56e0\u6b64\uff0c\u73b0\u6709\u7684CLIP\u5e94\u7528\u4e3b\u8981\u96c6\u4e2d\u5728\u5982\u80f8\u7247\u8fd9\u7c7b\u62e5\u6709\u4e30\u5bcc\u56fe\u50cf\u62a5\u544a\u6570\u636e\u7684\u6a21\u6001\u4e0a\uff0c\u800c\u5ffd\u7565\u4e86\u8bf8\u5982\u4e73\u817aX\u5149\u7b49\u8bb8\u591a\u91cd\u8981\u6a21\u6001\u7684\u7814\u7a76\u3002\u672c\u6587\u9996\u6b21\u63d0\u51fa\u5c06\u5b8c\u6574\u7684CLIP\u6a21\u578b\u5e94\u7528\u4e8e\u4e73\u817aX\u5149\u56fe\u50cf\u5206\u6790\uff0c\u8fd9\u4e00\u4efb\u52a1\u9762\u4e34\u7740\u6807\u8bb0\u6570\u636e\u7a00\u7f3a\u3001\u9ad8\u5206\u8fa8\u7387\u56fe\u50cf\u4e2d\u7684\u5c0f\u611f\u5174\u8da3\u533a\u57df\u4ee5\u53ca\u6570\u636e\u4e0d\u5e73\u8861\u7684\u6311\u6218\u3002 \u6211\u4eec\u9996\u5148\u5f00\u53d1\u4e86\u4e00\u79cd\u9488\u5bf9\u4e73\u817aX\u5149\u7684\u4e13\u7528\u76d1\u7763\u6846\u67b6\uff0c\u5229\u7528\u5176\u591a\u89c6\u56fe\u7279\u6027\u3002\u6b64\u5916\uff0c\u8bbe\u8ba1\u4e86\u5bf9\u9f50\u6a21\u5757\u4ee5\u66f4\u597d\u5730\u805a\u7126\u4e8e\u9ad8\u5206\u8fa8\u7387\u56fe\u50cf\u4e2d\u7684\u8be6\u7ec6\u7279\u5f81\u3002\u6700\u540e\uff0c\u5f15\u5165\u4e86\u4e00\u79cd\u53c2\u6570\u9ad8\u6548\u5fae\u8c03\u65b9\u6cd5\uff0c\u7528\u4e8e\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff0c\u8fd9\u4e9b\u6a21\u578b\u9884\u5148\u4f7f\u7528\u533b\u5b66\u77e5\u8bc6\u8fdb\u884c\u8bad\u7ec3\uff0c\u4ee5\u5e94\u5bf9\u6570\u636e\u9650\u5236\u95ee\u9898\u3002 
\u6211\u4eec\u7684\u591a\u89c6\u56fe\u548c\u591a\u5c3a\u5ea6\u5bf9\u9f50\uff08MaMA\uff09\u65b9\u6cd5\uff0c\u5728\u4e24\u4e2a\u5927\u578b\u771f\u5b9e\u4e16\u754c\u4e73\u817aX\u5149\u6570\u636e\u96c6EMBED\u548cRSNA-Mammo\u4e0a\uff0c\u5bf9\u4e8e\u4e09\u79cd\u4e0d\u540c\u7684\u4efb\u52a1\uff0c\u76f8\u8f83\u4e8e\u6700\u5148\u8fdb\u7684\u57fa\u7ebf\u65b9\u6cd5\u53d6\u5f97\u4e86\u663e\u8457\u6027\u80fd\u63d0\u5347\uff0c\u540c\u65f6\u76f8\u6bd4\u6700\u5927\u7684\u57fa\u7ebf\u6a21\u578b\uff0c\u4ec5\u4f7f\u7528\u4e8652%\u7684\u6a21\u578b\u5927\u5c0f\u3002|\n", "2409.18111": "|**2024-09-26**|**E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding**|Ye Liu et.al.|[2409.18111](http://arxiv.org/abs/2409.18111)|**[link](https://github.com/PolyU-ChenLab/ETBench)**|**\u4e3a\u4e86\u9a8c\u8bc1\u89c6\u9891\u5927\u8bed\u8a00\u6a21\u578b\uff08Video Large Language Models, Video-LLMs\uff09\u5728\u901a\u7528\u89c6\u9891\u7406\u89e3\u4e2d\u7684\u5de8\u5927\u6f5c\u529b\uff0c\u5df2\u63d0\u51fa\u4e86\u4e00\u7cfb\u5217\u57fa\u51c6\u6d4b\u8bd5\u6765\u8bca\u65ad\u6a21\u578b\u5728\u4e0d\u540c\u573a\u666f\u4e0b\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u57fa\u51c6\u6d4b\u8bd5\u4ec5\u901a\u8fc7\u89c6\u9891\u7ea7\u95ee\u9898\u56de\u7b54\u8fdb\u884c\u8bc4\u4f30\uff0c\u7f3a\u4e4f\u5bf9\u4e8b\u4ef6\u7ea7\u522b\u7684\u7cbe\u7ec6\u8bc4\u4f30\u548c\u4efb\u52a1\u591a\u6837\u6027\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u6211\u4eec\u5f15\u5165\u4e86E.T. Bench\uff08\u4e8b\u4ef6\u7ea7\u522b\u4e0e\u65f6\u95f4\u654f\u611f\u7684\u89c6\u9891\u7406\u89e3\u57fa\u51c6\uff09\uff0c\u8fd9\u662f\u4e00\u4e2a\u9488\u5bf9\u5f00\u653e\u5f0f\u7684\u4e8b\u4ef6\u7ea7\u522b\u89c6\u9891\u7406\u89e3\u7684\u5927\u89c4\u6a21\u3001\u9ad8\u8d28\u91cf\u57fa\u51c6\u6d4b\u8bd5\u3002 E.T. 
Bench\u6309\u7167\u4e09\u5c42\u4efb\u52a1\u5206\u7c7b\u4f53\u7cfb\u8fdb\u884c\u7ec4\u7ec7\uff0c\u5305\u542b\u4e86\u6db5\u76d612\u4e2a\u4efb\u52a1\u76847300\u4e2a\u6837\u672c\uff0c\u4ee5\u53ca8\u4e2a\u9886\u57df\u76842514\u5c0f\u65f6\u603b\u65f6\u957f\u76847000\u4e2a\u89c6\u9891\uff0c\u63d0\u4f9b\u4e86\u5168\u9762\u7684\u8bc4\u4f30\u3002\u6211\u4eec\u5e7f\u6cdb\u5730\u5bf98\u4e2a\u56fe\u50cf\u5927\u8bed\u8a00\u6a21\u578b\u548c12\u4e2a\u89c6\u9891\u5927\u8bed\u8a00\u6a21\u578b\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u5e76\u4e14\u7ed3\u679c\u663e\u793a\uff0c\u7528\u4e8e\u7c97\u7c92\u5ea6\uff08\u89c6\u9891\u7ea7\uff09\u7406\u89e3\u7684\u6700\u5148\u8fdb\u7684\u6a21\u578b\u5728\u89e3\u51b3\u6211\u4eec\u7684\u7cbe\u7ec6\u7c92\u5ea6\u4efb\u52a1\u65f6\u8868\u73b0\u4e0d\u4f73\uff0c\u4f8b\u5982\u5728\u89c6\u9891\u4e2d\u5b9a\u4f4d\u611f\u5174\u8da3\u7684\u4e8b\u4ef6\uff0c\u4e3b\u8981\u539f\u56e0\u662f\u89c6\u9891\u4e0a\u4e0b\u6587\u957f\u5ea6\u77ed\u3001\u65f6\u95f4\u8868\u793a\u4e0d\u5f53\u4ee5\u53ca\u7f3a\u4e4f\u591a\u4e8b\u4ef6\u8bad\u7ec3\u6570\u636e\u3002\u9488\u5bf9\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u8fdb\u4e00\u6b65\u63d0\u51fa\u4e86\u4e00\u4e2a\u5f3a\u5927\u7684\u57fa\u7ebf\u6a21\u578b\u2014\u2014E.T. Chat\uff0c\u4ee5\u53ca\u4e13\u95e8\u4e3a\u7cbe\u7ec6\u7c92\u5ea6\u4e8b\u4ef6\u7406\u89e3\u8bbe\u8ba1\u7684\u6307\u4ee4\u8c03\u4f18\u6570\u636e\u96c6E.T. 
Instruct 164K\u3002\u6211\u4eec\u7684\u7b80\u5355\u4f46\u6709\u6548\u7684\u89e3\u51b3\u65b9\u6848\u5728\u591a\u4e2a\u573a\u666f\u4e2d\u8868\u73b0\u51fa\u4f18\u8d8a\u7684\u6027\u80fd\u3002**|\n", "2409.18060": "|**2024-09-26**|**Infering Alt-text For UI Icons With Large Language Models During App Development**|Sabrina Haque et.al.|[2409.18060](http://arxiv.org/abs/2409.18060)|null|\u786e\u4fdd\u79fb\u52a8\u5e94\u7528\u7684\u65e0\u969c\u788d\u6027\u4ecd\u7136\u662f\u4e00\u4e2a\u91cd\u5927\u6311\u6218\uff0c\u5c24\u5176\u662f\u5bf9\u4e8e\u4f9d\u8d56\u5c4f\u5e55\u9605\u8bfb\u5668\u7684\u89c6\u969c\u7528\u6237\u3002\u754c\u9762\u56fe\u6807\u5bf9\u4e8e\u5bfc\u822a\u548c\u4e92\u52a8\u81f3\u5173\u91cd\u8981\uff0c\u4f46\u5f80\u5f80\u7f3a\u4e4f\u6709\u610f\u4e49\u7684\u66ff\u4ee3\u6587\u672c\uff0c\u4ece\u800c\u5f62\u6210\u4f7f\u7528\u969c\u788d\u3002\u4f20\u7edf\u7684\u6df1\u5ea6\u5b66\u4e60\u65b9\u6cd5\u5728\u751f\u6210\u66ff\u4ee3\u6587\u672c\u65f6\u9700\u8981\u5927\u91cf\u6570\u636e\u96c6\uff0c\u5e76\u4e14\u5728\u56fe\u6807\u7c7b\u578b\u591a\u6837\u6027\u4e0e\u4e0d\u5e73\u8861\u6027\u65b9\u9762\u5b58\u5728\u56f0\u96be\u3002\u66f4\u8fd1\u671f\u7684\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08VLMs\uff09\u5219\u8981\u6c42\u5b8c\u6574\u7684UI\u5c4f\u5e55\uff0c\u8fd9\u5728\u5e94\u7528\u7a0b\u5e8f\u5f00\u53d1\u7684\u8fed\u4ee3\u9636\u6bb5\u53ef\u80fd\u4e0d\u5207\u5b9e\u9645\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u65b0\u7684\u65b9\u6cd5\uff0c\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u901a\u8fc7\u90e8\u5206UI\u6570\u636e\u81ea\u4e3b\u751f\u6210\u79fb\u52a8UI\u56fe\u6807\u7684\u63cf\u8ff0\u6027\u66ff\u4ee3\u6587\u672c\u3002\u901a\u8fc7\u6574\u5408\u5305\u62ec\u7c7b\u522b\u3001\u8d44\u6e90ID\u3001\u8fb9\u754c\u3001OCR\u68c0\u6d4b\u5230\u7684\u6587\u5b57\u4ee5\u53ca\u7236\u8282\u70b9\u548c\u540c\u7ea7\u8282\u70b9\u7684\u4e0a\u4e0b\u6587\u4fe1\u606f\u5728\u5185\u7684\u56fe\u6807\u4e0a\u4e0b\u6587\uff0c\u
6211\u4eec\u5bf9\u5927\u7ea61400\u4e2a\u56fe\u6807\u7684\u5c0f\u578b\u6570\u636e\u96c6\u8fdb\u884c\u79bb\u7ebf\u5fae\u8c03\uff0c\u4ece\u800c\u751f\u6210\u4e86IconDesc\u3002\u5728\u5b9e\u8bc1\u8bc4\u4f30\u548c\u7528\u6237\u7814\u7a76\u4e2d\uff0cIconDesc\u663e\u8457\u63d0\u9ad8\u4e86\u751f\u6210\u76f8\u5173\u66ff\u4ee3\u6587\u672c\u7684\u80fd\u529b\u3002\u8fd9\u4e00\u80fd\u529b\u4f7f\u5f97IconDesc\u6210\u4e3a\u5f00\u53d1\u8005\u7684\u91cd\u8981\u5de5\u5177\uff0c\u5e2e\u52a9\u4ed6\u4eec\u5feb\u901f\u8fed\u4ee3\u548c\u63d0\u5347UI\u7684\u65e0\u969c\u788d\u6027\u3002|\n", "2409.18053": "|**2024-09-26**|**DualAD: Dual-Layer Planning for Reasoning in Autonomous Driving**|Dingrui Wang et.al.|[2409.18053](http://arxiv.org/abs/2409.18053)|null|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u578b\u81ea\u4e3b\u9a7e\u9a76\u6846\u67b6DualAD\uff0c\u65e8\u5728\u6a21\u4eff\u4eba\u7c7b\u5728\u9a7e\u9a76\u8fc7\u7a0b\u4e2d\u7684\u51b3\u7b56\u903b\u8f91\u3002DualAD\u7531\u4e24\u5c42\u6784\u6210\uff1a\u5e95\u5c42\u4e3a\u57fa\u4e8e\u89c4\u5219\u7684\u8fd0\u52a8\u89c4\u5212\u5668\uff0c\u8d1f\u8d23\u5904\u7406\u9700\u8981\u8f83\u5c11\u51b3\u7b56\u7684\u5e38\u89c4\u9a7e\u9a76\u4efb\u52a1\uff1b\u4e0a\u5c42\u5219\u914d\u5907\u4e86\u4e00\u4e2a\u57fa\u4e8e\u89c4\u5219\u7684\u6587\u5b57\u7f16\u7801\u5668\uff0c\u5c06\u7edd\u5bf9\u72b6\u6001\u4e0b\u7684\u9a7e\u9a76\u573a\u666f\u8f6c\u5316\u4e3a\u6587\u672c\u63cf\u8ff0\u3002\u6b64\u6587\u672c\u968f\u540e\u7531\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u8fdb\u884c\u51b3\u7b56\u3002\u5f53\u68c0\u6d4b\u5230\u6f5c\u5728\u5371\u9669\u65f6\uff0c\u4e0a\u5c42\u4f1a\u4ecb\u5165\u5e95\u5c42\u7684\u51b3\u7b56\u8fc7\u7a0b\uff0c\u4ee5\u6a21\u4eff\u4eba\u7c7b\u5728\u5173\u952e\u60c5\u51b5\u4e0b\u7684\u51b3\u7b56\u903b\u8f91\u3002\u95ed\u5408\u73af\u8def\u5b9e\u9a8c\u663e\u793a\uff0c\u4f7f\u7528\u96f6\u8bad\u7ec3\u9884\u8bad\u7ec3\u6a21\u578b\u7684DualAD\u663e\u8457\u4f18\u4e8e\u7f3a\u4e4f\u51b3\u7b56\u80fd\u529b\u7684\u57fa\u4e8e\u89c4\u5219\u7684\u8fd
0\u52a8\u89c4\u5212\u5668\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u8fd8\u5f3a\u8c03\u4e86\u6587\u5b57\u7f16\u7801\u5668\u7684\u6709\u6548\u6027\uff0c\u5b83\u6781\u5927\u5730\u589e\u5f3a\u4e86\u6a21\u578b\u5bf9\u573a\u666f\u7684\u7406\u89e3\u80fd\u529b\u3002\u6b64\u5916\uff0c\u96c6\u6210\u7684DualAD\u6a21\u578b\u968f\u7740\u66f4\u5f3a\u5927\u7684LLM\u7684\u4f7f\u7528\u800c\u5f97\u5230\u6539\u5584\uff0c\u8fd9\u8868\u660e\u8be5\u6846\u67b6\u5177\u6709\u8fdb\u4e00\u6b65\u589e\u5f3a\u7684\u6f5c\u529b\u3002\u6211\u4eec\u63d0\u4f9b\u4ee3\u7801\u548c\u57fa\u51c6\u6d4b\u8bd5\u4f9b\u516c\u4f17\u8bbf\u95ee\u3002|\n", "2409.18042": "|**2024-09-26**|**EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions**|Kai Chen et.al.|[2409.18042](http://arxiv.org/abs/2409.18042)|null|\u5728\u5f00\u653e\u6e90\u4ee3\u7801\u793e\u533a\u4e2d\uff0c\u8ba9\u5927\u578b\u8bed\u8a00\u6a21\u578b\u80fd\u591f\u4ee5\u516c\u5f00\u6570\u636e\u8fdb\u884c\u7aef\u5230\u7aef\u7684\u56fe\u50cf\u3001\u6587\u672c\u548c\u8bed\u97f3\u751f\u6210\u4ecd\u7136\u5177\u6709\u6311\u6218\u6027\u3002\u73b0\u6709\u7684\u89c6\u8bed\u6a21\u578b\u4f9d\u8d56\u4e8e\u5916\u90e8\u5de5\u5177\u8fdb\u884c\u8bed\u97f3\u5904\u7406\uff0c\u800c\u8bed\u97f3\u8bed\u6a21\u578b\u4ecd\u7f3a\u4e4f\u89c6\u89c9\u7406\u89e3\u80fd\u529b\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7f3a\u53e3\uff0c\u6211\u4eec\u63d0\u51fa\u4e86EMOVA\uff08\u60c5\u7eea\u5316\u7684\u5168\u6a21\u5f0f\u8bed\u97f3\u52a9\u624b\uff09\uff0c\u4ee5\u4f7f\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5177\u5907\u7aef\u5230\u7aef\u7684\u8bed\u97f3\u80fd\u529b\uff0c\u540c\u65f6\u4fdd\u6301\u9886\u5148\u7684\u89c6\u8bed\u8868\u73b0\u3002\u901a\u8fc7\u8bed\u4e49-\u58f0\u5b66\u5206\u79bb\u7684\u8bed\u97f3\u7f16\u7801\u5668\uff0c\u6211\u4eec\u610f\u5916\u5730\u53d1\u73b0\uff0c\u5168\u6a21\u6001\u5bf9\u9f50\u53ef\u4ee5\u8fdb\u4e00\u6b65\u589e\u5f3a\u89c6\u8bed\u548c\u8bed\u97f3\u80fd\u529b\uff0c\u4e0e\u76f8\u5e94\u7684\u53cc\u6a21\u6001\u5bf9\u9f50\u6a21\u578b\u76f8\u6bd4\u300
2\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86\u4e00\u79cd\u8f7b\u91cf\u7ea7\u98ce\u683c\u6a21\u5757\uff0c\u7528\u4e8e\u7075\u6d3b\u63a7\u5236\u8bed\u97f3\u98ce\u683c\uff08\u4f8b\u5982\u60c5\u611f\u548c\u97f3\u8c03\uff09\u3002\u9996\u6b21\uff0cEMOVA\u5728\u89c6\u8bed\u548c\u8bed\u97f3\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u5747\u8fbe\u5230\u4e86\u6700\u5148\u8fdb\u7684\u6027\u80fd\uff0c\u5e76\u540c\u65f6\u652f\u6301\u5e26\u6709\u751f\u52a8\u60c5\u611f\u7684\u5168\u6a21\u6001\u5bf9\u8bdd\u3002|\n", "2409.18028": "|**2024-09-26**|**Compositional Hardness of Code in Large Language Models -- A Probabilistic Perspective**|Yotam Wolf et.al.|[2409.18028](http://arxiv.org/abs/2409.18028)|null|\u5728\u8fdb\u884c\u590d\u6742\u5206\u6790\u4efb\u52a1\uff08\u5982\u4ee3\u7801\u751f\u6210\uff09\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4f7f\u7528\u4e2d\uff0c\u901a\u5e38\u4f1a\u5c06\u6574\u4e2a\u4efb\u52a1\u7684\u89e3\u51b3\u65b9\u6848\u5728\u6a21\u578b\u7684\u4e0a\u4e0b\u6587\u7a97\u53e3\u5185\u8fdb\u884c\u91c7\u6837\u3002\u5148\u524d\u7684\u7814\u7a76\u8868\u660e\uff0c\u5728\u6a21\u578b\u7684\u4e0a\u4e0b\u6587\u4e2d\u5206\u89e3\u4efb\u52a1\uff08\u5373\u94fe\u5f0f\u601d\u7ef4\uff09\u5bf9\u4e8e\u89e3\u51b3\u8fd9\u7c7b\u4efb\u52a1\u662f\u6709\u76ca\u7684\u3002\u672c\u6587\u6307\u51fa\u4e86\u4e00\u79cd\u9650\u5236\uff0c\u5373LLM\u5728\u540c\u4e00\u4e2a\u4e0a\u4e0b\u6587\u7a97\u53e3\u5185\u6267\u884c\u591a\u4e2a\u5b50\u4efb\u52a1\u7684\u80fd\u529b\u2014\u2014\u4e00\u79cd\u201c\u590d\u5408\u96be\u5ea6\u201d\u3002\u8fd9\u8868\u660e\u5728LLM\u7ec4\u6210\u7684\u591a\u667a\u80fd\u4f53\u7cfb\u7edf\u4e2d\u5c06\u5206\u89e3\u540e\u7684\u95ee\u9898\u5206\u53d1\u5904\u7406\u5177\u6709\u4f18\u52bf\u3002\u6211\u4eec\u901a\u8fc7\u751f\u6210\u590d\u6742\u5ea6\u6307\u6807\u6765\u91cf\u5316\u8fd9\u79cd\u590d\u5408\u96be\u5ea6\uff0c\u5373\u5728\u91c7\u6837\u5230\u81f3\u5c11\u4e00\u4e2a\u6b63\u786e\u89e3\u6240\u9700\u7684LLM\u751f\u6210\u6b21\u6570\u3002\u6211\u4eec\u53d1\u73b0\uff0c\u7
6f8\u5bf9\u4e8e\u5728\u76f8\u540c\u4e0a\u4e0b\u6587\u5185\u89e3\u51b3\u7ec4\u5408\u95ee\u9898\uff0c\u5c06\u95ee\u9898\u5206\u6563\u7ed9\u591a\u4e2a\u667a\u80fd\u4f53\u7684\u751f\u6210\u590d\u6742\u5ea6\u4e4b\u95f4\u5b58\u5728\u5dee\u8ddd\uff0c\u5e76\u4e14\u968f\u7740\u89e3\u957f\u5ea6\u7684\u589e\u52a0\uff0c\u8fd9\u4e2a\u5dee\u8ddd\u5448\u6307\u6570\u589e\u957f\u3002\u6211\u4eec\u901a\u8fc7\u7406\u8bba\u8bc1\u660e\u548c\u5b9e\u9a8c\u8bc1\u660e\u4e86\u8fd9\u4e00\u7ed3\u679c\u3002|\n", "2409.18025": "|**2024-09-26**|**An Adversarial Perspective on Machine Unlearning for AI Safety**|Jakub \u0141ucki et.al.|[2409.18025](http://arxiv.org/abs/2409.18025)|**[link](https://github.com/ethz-spylab/unlearning-vs-safety)**|\u672c\u6587\u63a2\u8ba8\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u62d2\u7edd\u5371\u9669\u77e5\u8bc6\u76f8\u5173\u95ee\u9898\u65b9\u9762\u7684\u5fae\u8c03\u65b9\u5f0f\uff0c\u4f46\u8fd9\u4e9b\u9632\u62a4\u63aa\u65bd\u5f80\u5f80\u5bb9\u6613\u88ab\u7ed5\u8fc7\u3002\u53bb\u5b66\u4e60\u65b9\u6cd5\u65e8\u5728\u5f7b\u5e95\u6d88\u9664\u6a21\u578b\u7684\u5371\u9669\u80fd\u529b\u5e76\u4f7f\u5176\u5bf9\u653b\u51fb\u8005\u4e0d\u53ef\u8bbf\u95ee\u3002\u672c\u6587\u4ece\u5bf9\u6297\u6027\u89c6\u89d2\u6311\u6218\u4e86\u53bb\u5b66\u4e60\u4e0e\u4f20\u7edf\u5b89\u5168\u540e\u8bad\u7ec3\u4e4b\u95f4\u7684\u57fa\u672c\u5dee\u5f02\u3002\u6211\u4eec\u8bc1\u660e\u4e86\u4e4b\u524d\u88ab\u8ba4\u4e3a\u65e0\u6548\u7684\u73b0\u6709\u9003\u8131\u65b9\u6cd5\uff0c\u5728\u7cbe\u5fc3\u5e94\u7528\u65f6\u53ef\u4ee5\u6210\u529f\u5e94\u5bf9\u53bb\u5b66\u4e60\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u7cfb\u5217\u9002\u5e94\u6027\u65b9\u6cd5\u6765\u6062\u590d\u5927\u90e8\u5206\u88ab\u8ba4\u4e3a\u662f\u65e0\u6cd5\u5b66\u4e60\u7684\u80fd\u529b\u3002\u4f8b\u5982\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u4f7f\u7528RMU\uff08\u5f53\u524d\u6700\u5148\u8fdb\u7684\u53bb\u5b66\u4e60\u65b9\u6cd5\uff09\u7f16\u8f91\u6a21\u578b\u540e\uff0c\u901a\u8fc7\u5728\u65e0\u5173\u793a\u4f8b\u4e0a\u8f
db\u884c\u5fae\u8c03\u6216\u5728\u6fc0\u6d3b\u7a7a\u95f4\u4e2d\u79fb\u9664\u7279\u5b9a\u65b9\u5411\uff0c\u53ef\u4ee5\u6062\u590d\u5927\u90e8\u5206\u5371\u9669\u80fd\u529b\u3002\u6211\u4eec\u7684\u53d1\u73b0\u8d28\u7591\u4e86\u5f53\u524d\u53bb\u5b66\u4e60\u65b9\u6cd5\u7684\u7a33\u5065\u6027\uff0c\u5e76\u5bf9\u5b83\u4eec\u76f8\u5bf9\u4e8e\u5b89\u5168\u8bad\u7ec3\u7684\u4f18\u52bf\u63d0\u51fa\u4e86\u7591\u95ee\u3002|\n", "2409.18023": "|**2024-09-26**|**DARE: Diverse Visual Question Answering with Robustness Evaluation**|Hannah Sterz et.al.|[2409.18023](http://arxiv.org/abs/2409.18023)|null|\u300aDARE\uff1a\u591a\u6837\u5316\u7684\u89c6\u89c9\u95ee\u7b54\u4e0e\u9c81\u68d2\u6027\u8bc4\u4f30\u300b\u8bba\u6587\u6458\u8981\u7ffb\u8bd1\u5982\u4e0b\uff1a \u672c\u6587\u5f15\u5165\u4e86DARE\uff08Diverse Visual Question Answering with Robustness Evaluation\uff09\uff0c\u4e00\u4e2a\u7cbe\u5fc3\u8bbe\u8ba1\u5e76\u6536\u96c6\u7684\u591a\u9009\u578b\u89c6\u89c9\u95ee\u7b54\u57fa\u51c6\u3002DARE\u65e8\u5728\u8bc4\u4f30\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u89c6\u89c9\u8bed\u8a00\u63a8\u7406\u4efb\u52a1\u4e2d\u7684\u8868\u73b0\uff0c\u7279\u522b\u662f\u5728\u4e94\u4e2a\u4e0d\u540c\u7c7b\u522b\u7684\u89c6\u89c9\u95ee\u9898\u4e0a\uff0c\u5e76\u5305\u62ec\u57fa\u4e8e\u63d0\u793a\u53d8\u5316\u3001\u7b54\u6848\u9009\u9879\u5b50\u96c6\u3001\u8f93\u51fa\u683c\u5f0f\u548c\u6b63\u786e\u7b54\u6848\u6570\u91cf\u7b49\u56db\u4e2a\u9c81\u68d2\u6027\u5bfc\u5411\u8bc4\u4f30\u7684\u5168\u9762\u8bc4\u4f30\u3002 
\u7814\u7a76\u53d1\u73b0\uff0c\u5f53\u524d\u6700\u5148\u8fdb\u7684\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\u5728\u5927\u591a\u6570\u7c7b\u522b\u4e2d\u4ecd\u7136\u9762\u4e34\u6311\u6218\uff0c\u4e14\u65e0\u6cd5\u5728\u6d4b\u8bd5\u7684\u6240\u6709\u9c81\u68d2\u6027\u8bc4\u4f30\u4e2d\u4fdd\u6301\u4e00\u81f4\u7684\u9ad8\u6027\u80fd\u3002\u5728\u4e0d\u540c\u7b54\u6848\u9009\u9879\u5b50\u96c6\u7684\u60c5\u51b5\u4e0b\uff0c\u6700\u5dee\u60c5\u51b5\u4e0b\u7684\u6027\u80fd\u4e0b\u964d\u53ef\u8fbe\u6807\u51c6\u60c5\u51b5\u4e0b\u768434%\u3002\u5f00\u6e90\u6a21\u578b\u5982LLaVA 1.6\u548cIdefics\u5728\u9c81\u68d2\u6027\u65b9\u9762\u65e0\u6cd5\u4e0e\u95ed\u6e90\u6a21\u578bGPT-4\u548cGemini\u76f8\u5339\u654c\uff0c\u800c\u540e\u8005\u5728\u4e0d\u540c\u53d8\u4f53\u4e0b\u4ecd\u8868\u73b0\u51fa\u660e\u663e\u7684\u8106\u5f31\u6027\u3002 \u603b\u4e4b\uff0c\u8be5\u7814\u7a76\u63ed\u793a\u4e86\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\u5728\u5904\u7406\u89c6\u89c9\u63a8\u7406\u4efb\u52a1\u65f6\u6240\u9762\u4e34\u7684\u5c40\u9650\u6027\uff0c\u5e76\u5f3a\u8c03\u4e86\u5728\u8bbe\u8ba1\u66f4\u9c81\u68d2\u7684\u6a21\u578b\u65f6\u9700\u8981\u8003\u8651\u7684\u95ee\u9898\u3002|\n", "2409.18014": "|**2024-09-26**|**Role-RL: Online Long-Context Processing with Role Reinforcement Learning for Distinct LLMs in Their Optimal Roles**|Lewei He 
et.al.|[2409.18014](http://arxiv.org/abs/2409.18014)|null|\u9488\u5bf9\u957f\u6587\u672c\u4e0a\u4e0b\u6587\u5904\u7406\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4ecd\u7136\u5b58\u5728\u5b9e\u73b0\u590d\u6742\u6027\u3001\u8bad\u7ec3\u6548\u7387\u548c\u6570\u636e\u7a00\u758f\u6027\u7b49\u6311\u6218\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u4e2a\u65b0\u8303\u5f0f\u2014\u2014\u5728\u7ebf\u957f\u671f\u4e0a\u4e0b\u6587\u5904\u7406\uff08OLP\uff09\uff0c\u9002\u7528\u4e8e\u5904\u7406\u65e0\u9650\u957f\u5ea6\u7684\u6587\u6863\uff0c\u5e38\u89c1\u4e8e\u81ea\u52a8\u5316\u65b0\u95fb\u62a5\u9053\u3001\u76f4\u64ad\u7535\u5546\u548c\u75c5\u6bd2\u77ed\u89c6\u9891\u7b49\u591a\u6837\u5316\u7684\u6d41\u5a92\u4f53\u4fe1\u606f\u63a5\u6536\u4e0e\u7ec4\u7ec7\u573a\u666f\u3002\u540c\u65f6\uff0c\u5728\u9009\u62e9\u4f17\u591a\u6027\u80fd\u4f18\u5f02\u3001\u4ef7\u683c\u9002\u4e2d\u4e14\u54cd\u5e94\u5ef6\u8fdf\u77ed\u7684LLM\u65f6\uff0c\u5f80\u5f80\u9047\u5230\u96be\u4ee5\u6289\u62e9\u7684\u95ee\u9898\u3002\u9274\u4e8e\u6b64\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u89d2\u8272\u5f3a\u5316\u5b66\u4e60\uff08Role-RL\uff09\u6846\u67b6\uff0c\u81ea\u52a8\u90e8\u7f72\u4e0d\u540c\u89d2\u8272\u7684LLM\u5728OLP\u7ba1\u9053\u4e2d\uff0c\u6839\u636e\u5176\u5b9e\u9645\u6027\u80fd\u8fdb\u884c\u5408\u7406\u5206\u914d\u3002 \u6211\u4eec\u8fdb\u884c\u4e86\u5927\u91cf\u7684\u5b9e\u9a8c\uff0c\u5e76\u5728\u6211\u4eec\u7684OLP-MINI\u6570\u636e\u96c6\u4e0a\u53d1\u73b0\uff0c\u7ed3\u5408Role-RL\u6846\u67b6\u7684OLP\u7cfb\u7edf\u5e73\u5747\u53ec\u56de\u7387\u4e3a93.2%\uff0c\u5b9e\u73b0\u4e86OLP\u57fa\u51c6\uff0c\u5e76\u8282\u7701\u4e8679.4%\u7684LLM\u6210\u672c\u3002\u76f8\u5173\u4ee3\u7801\u548c\u6570\u636e\u96c6\u5df2\u516c\u5f00\u53d1\u5e03\uff1ahttps://anonymous.4open.science/r/Role-RL\u3002|\n", "2409.18957": "|**2024-09-27**|**LML: Language Model Learning a Dataset for Data-Augmented Prediction**|Praneeth Vadlapati 
et.al.|[2409.18957](http://arxiv.org/abs/2409.18957)|**[link](https://github.com/pro-genai/lml-dap)**|**This paper presents a new approach that uses large language models (LLMs) for classification tasks, which are typically handled by machine learning (ML) models. Unlike ML models that depend heavily on data cleaning and feature engineering, this method streamlines the process with LLMs. The paper introduces a concept called 'Language Model Learning (LML)', powered by a new method called 'Data-Augmented Prediction (DAP)'. The LLM performs classification much as a human would, by manually exploring and understanding the data and using the data as a reference when deciding on a class. 
The training data is summarized and evaluated to determine the features that most influence the classification of each label. During DAP, the system uses the data summary to automatically create a query that retrieves relevant rows from the dataset. The LLM then generates a classification from the data summary and the relevant rows, which keeps accuracy satisfactory even on complex data. Using the data summary and similar data in DAP makes each decision context-aware. The prompt includes the phrase 'Act as an Explainable Machine Learning Model', which improves interpretability by letting users review the logic behind each prediction. In some test cases the system scored above 90% accuracy, demonstrating its effectiveness and its potential to outperform conventional ML models in various scenarios. The code is available at https://github.com/Pro-GenAI/LML-DAP.**|\n", "2409.18943": "|**2024-09-27**|**Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models**|Jiaming Li 
et.al.|[2409.18943](http://arxiv.org/abs/2409.18943)|**[link](https://github.com/geaming2002/ruler)**|**The instruction-following ability of large language models lets humans interact with AI agents in a natural way. However, when asked to generate responses of a specific length, large language models often fail to meet the user's needs because they inherently struggle to perceive numerical constraints accurately. To study how well large language models control the length of generated responses, we propose the Target Length Generation Task (TLG) and design two metrics, Precise Match (PM) and Flexible Match (FM), to evaluate how well a model adheres to a specified response length. We also introduce Ruler, a novel, model-agnostic approach that uses Meta Length Tokens (MLTs) to improve instruction following under length-constrained instructions. Specifically, Ruler enables LLMs to generate responses of a specified length when the instruction contains a length constraint. When no length constraint is given explicitly, Ruler can generate an appropriate MLT automatically, showing strong versatility and generalization. 
Comprehensive experiments show that Ruler is effective across different LLMs on the Target Length Generation Task, with an average gain of 27.97 on PM and 29.57 on FM. Extensive ablation experiments further validate Ruler's effectiveness and generalization. Our code and data are available at https://github.com/Geaming2002/Ruler.**|\n", "2409.18938": "|**2024-09-27**|**From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding**|Heqing Zou et.al.|[2409.18938](http://arxiv.org/abs/2409.18938)|null|This paper reviews recent progress in integrating large language models (LLMs) with visual encoders for visual understanding tasks, which draws on the models' inherent strength at understanding and generating human-like text for visual reasoning. Because visual data is diverse, multimodal large language models (MM-LLMs) differ in design and training depending on whether they target images, short videos, or long videos. We focus on the substantial differences and unique challenges of long video understanding compared with static image and short video understanding. 
Unlike static images, short videos contain sequential frames with both spatial and intra-event temporal information, while long videos consist of multiple events with inter-event, long-term temporal dependencies. This survey traces and summarizes the development of MM-LLMs from image understanding to long video understanding, compares the differences among visual understanding tasks in detail, and highlights the challenges of long video understanding, including finer-grained spatiotemporal details, dynamic events, and long-term dependencies. It then gives a detailed overview of advances in MM-LLM model design and training methods for understanding long videos. Finally, it compares the performance of existing MM-LLMs on video understanding benchmarks of various lengths and discusses possible future directions for MM-LLMs in long video understanding.|\n", "2409.18924": "|**2024-09-27**|**AIPatient: Simulating Patients with EHRs and LLM Powered Agentic Workflow**|Huizi Yu 
et.al.|[2409.18924](http://arxiv.org/abs/2409.18924)|null|Simulated patient systems play a crucial role in modern medical education and research, providing a safe, integrated learning environment and enabling clinical decision-making simulations. Large language models (LLMs) could advance simulated patient systems by replicating medical conditions and patient-doctor interactions with high fidelity and at low cost. However, ensuring that these systems are effective and trustworthy remains a challenge, because they require a large, diverse, and precise patient knowledge base together with robust and stable knowledge dissemination. We developed AIPatient, an advanced simulated patient system that takes the AIPatient Knowledge Graph (AIPatient KG) as input and uses the Reasoning Retrieval-Augmented Generation (Reasoning RAG) agentic workflow as its generation backbone. AIPatient KG samples data from electronic health records (EHRs) in the Medical Information Mart for Intensive Care (MIMIC-III) database, producing a clinically diverse and relevant cohort of 1,495 patients with high knowledge base validity (F1 score 0.89). 
Reasoning RAG uses six LLM-powered agents covering tasks including retrieval, KG query generation, abstraction, checking, rewriting, and summarization. This agentic framework reaches an overall accuracy of 94.15% on EHR-based medical question answering (QA), significantly outperforming baselines that use no agents or only partial agent integration. Our system also shows high readability (median Flesch Reading Ease 77.23; median Flesch-Kincaid Grade 5.6), robustness (ANOVA F-value 0.6126, p<0.1), and stability (ANOVA F-value 0.782, p<0.1). The strong performance of the AIPatient system points to its potential in a range of applications, including medical education, model evaluation, and system integration.|\n", "2409.18911": "|**2024-09-27**|**Soft Measures for Extracting Causal Collective Intelligence**|Maryam Berijanian 
et.al.|[2409.18911](http://arxiv.org/abs/2409.18911)|**[link](https://github.com/kuldeep7688/soft-measures-causal-intelligence)**|**Understanding and modeling collective intelligence is essential for addressing complex social systems. Fuzzy cognitive maps (FCMs), which are directed graphs, are a powerful tool for encoding causal mental models, but extracting high-integrity FCMs directly from text is challenging. This study presents an approach that uses large language models (LLMs) to automate FCM extraction. We introduce novel graph-based similarity measures and evaluate them by correlating their outputs with human judgments through the Elo rating system. The measures correlate positively with human evaluations, although even the best-performing measure is limited in capturing the nuances of FCMs. Fine-tuning LLMs improves performance, but existing measures still fall short. The study highlights the need for soft similarity measures designed for FCM extraction, advancing the modeling of collective intelligence with NLP.**|\n", "2409.18892": "|**2024-09-27**|**IDGen: Item Discrimination Induced Prompt Generation for LLM Evaluation**|Fan Lin 
et.al.|[2409.18892](http://arxiv.org/abs/2409.18892)|**[link](https://github.com/DUTlf/IDGen)**|As large language models (LLMs) become increasingly capable of handling complex tasks, evaluation sets must keep pace to remain sufficiently discriminative. Inspired by Item Discrimination (ID) theory, which is widely used in educational assessment, we propose an ID-induced prompt synthesis framework for evaluating LLMs, so that the evaluation set can be continually updated and refined according to model abilities. Our data synthesis framework prioritizes both breadth and specificity. It can generate prompts that comprehensively evaluate the capabilities of LLMs while revealing meaningful performance differences between models, allowing effective discrimination of their relative strengths and weaknesses across tasks and domains. 
To produce high-quality data, we incorporate a self-correction mechanism into our generalization framework and develop two models that predict a prompt's discrimination and difficulty scores to support our data synthesis framework. These tools are a valuable contribution to research on evaluation data synthesis. We apply the generated data to evaluate five state-of-the-art models. The data yields an average score of 51.92 with a variance of 10.06. By contrast, previous works such as SELF-INSTRUCT and WizardLM yield average scores above 67 with variances below 3.2. The results show that data generated by our framework is more challenging and more discriminative than that of previous works. We plan to release a dataset of over 3,000 carefully crafted prompts to support research on LLM evaluation.|\n", "2409.18858": "|**2024-09-27**|**Predicting and analyzing memorization within fine-tuned Large Language Models**|J\u00e9r\u00e9mie Dentan 
et.al.|[2409.18858](http://arxiv.org/abs/2409.18858)|null|Large language models have received considerable attention for their ability to solve complex tasks. However, these models memorize a significant proportion of their training data, which poses a serious threat at inference time. To mitigate this unintended memorization, it is crucial to understand which elements are memorized and why. Most existing work provides a posteriori explanations, which are of limited practical use. To fill this gap, we propose a new approach based on sliced mutual information that detects memorized samples a priori in a classification setting. The approach is efficient from the early stages of training and adapts readily to practical scenarios. It is supported by new theoretical results, which we demonstrate experimentally, and requires a low computational budget. We obtain strong empirical results, paving the way for systematically inspecting and protecting these vulnerable samples before memorization happens.|\n", "2409.18857": "|**2024-09-27**|**Mitigating Selection Bias with Node Pruning and Auxiliary Options**|Hyeong Kyu Choi 
et.al.|[2409.18857](http://arxiv.org/abs/2409.18857)|null|Large language models (LLMs) often show an unwarranted preference for certain options when answering multiple-choice questions, which raises significant reliability concerns for LLM-automated systems. Previous solutions mainly tackle the bias by adjusting the model's input and/or output. Our work takes a different route and investigates how the bias arises inside the model. Specifically, we propose Bias Node Pruning (BNP), a novel debiasing method that removes the linear-layer parameters that contribute to the bias. We also introduce Auxiliary Option Injection (AOI), a simple yet effective input modification technique that can debias even black-box models. To evaluate selection bias more systematically, we review existing metrics and propose Choice Kullback-Leibler Divergence (CKLD), which addresses the insensitivity of commonly used metrics to label imbalance. Experiments show that our methods are robust and adaptable when applied to three different LLMs.|\n", "2409.18812": "|**2024-09-27**|**LLMs4Synthesis: Leveraging Large 
Language Models for Scientific Synthesis**|Hamed Babaei Giglou et.al.|[2409.18812](http://arxiv.org/abs/2409.18812)|**[link](https://github.com/HamedBabaei/LLMs4Synthesis)**|In response to the growing complexity and volume of scientific literature, this paper introduces the LLMs4Synthesis framework, which aims to improve the ability of large language models (LLMs) to generate high-quality scientific syntheses. The framework addresses the need for rapid, coherent, and contextually rich integration of scientific insights, using both open-source and proprietary LLMs, and seeks to remedy the shortcomings of current quantitative metrics for evaluating such syntheses. Our study contributes a novel methodology for processing scientific papers, new synthesis types, and nine detailed quality criteria for evaluating syntheses. We also propose combining LLMs with reinforcement learning and AI feedback to optimize synthesis quality and keep it aligned with the established criteria. The availability of the LLMs4Synthesis framework and its components is expected to improve both the generation and the evaluation of scientific research syntheses.|\n", "2409.18794": "|**2024-09-27**|**Open-Nav: Exploring Zero-Shot Vision-and-Language Navigation in Continuous Environment with Open-Source LLMs**|Yanyuan Qiao 
et.al.|[2409.18794](http://arxiv.org/abs/2409.18794)|null|This paper introduces Open-Nav, a study that explores open-source large language models (LLMs) for zero-shot vision-and-language navigation (VLN) in continuous environments. Open-Nav uses a spatial-temporal chain-of-thought (CoT) reasoning approach that breaks the task into instruction comprehension, progress estimation, and decision-making, improving the model's scene perception in navigation and its understanding of fine-grained objects and spatial knowledge. Experiments in both simulated and real-world environments show that Open-Nav achieves performance competitive with approaches that use closed-source LLMs.|\n", "2409.20566": "|**2024-09-30**|**MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning**|Haotian Zhang 
et.al.|[2409.20566](http://arxiv.org/abs/2409.20566)|null|We present MM1.5, a new family of multimodal large language models designed to improve text-rich image understanding, visual referring and grounding, and multi-image reasoning. Building on the MM1 architecture, MM1.5 takes a data-driven approach to model training, systematically exploring the impact of different data mixtures across the entire training lifecycle. This includes high-quality OCR data and synthetic captions for continual pre-training, as well as an optimized visual instruction-tuning data mixture for supervised fine-tuning. Our models range from 1B to 30B parameters, include both dense and mixture-of-experts (MoE) variants, and show that careful data curation and training strategies can yield strong performance even at small scales (1B and 3B). We also introduce two specialized variants: MM1.5-Video for video understanding and MM1.5-UI for mobile UI understanding. Through extensive empirical studies and ablations, we provide detailed insights into the training processes and decisions, which offer valuable guidance for the future development of multimodal large language models.|\n", "2409.20557": "|**2024-09-30**|**Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos**|Md Mohaiminul Islam et.al.|[2409.20557](http://arxiv.org/abs/2409.20557)|null|This paper presents VidAssist, an integrated framework for zero-shot and few-shot goal-oriented planning in instructional videos. VidAssist uses large language models (LLMs) as both the knowledge base and the assessment tool for generating and evaluating action plans, which overcomes the challenge of acquiring procedural knowledge from small-scale, low-diversity datasets. VidAssist also employs a breadth-first search algorithm for optimal plan generation, using value functions designed for goal-oriented planning to assess the predicted actions at each step. Extensive experiments show that VidAssist offers a unified framework for different goal-oriented planning setups, such as visual planning for assistance (VPA) and procedural planning (PP), and achieves strong performance in zero-shot and few-shot settings. Specifically, our few-shot model outperforms the prior fully supervised state-of-the-art method by +7.7% on the VPA task and +4.81% on the PP task on the COIN dataset while predicting 4 future actions. All code and models are publicly available at https://sites.google.com/view/vidassist.|\n", "2409.20550": "|**2024-09-30**|**LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation**|Ziyao Zhang et.al.|[2409.20550](http://arxiv.org/abs/2409.20550)|null|This paper presents an empirical study of hallucinations in large language model (LLM) code generation. Although LLMs perform encouragingly on code generation tasks, they often produce incorrect or inaccurate results when handling the complex contextual dependencies of practical development. Previous studies mainly analyze hallucinations in LLM-based code generation for standalone function generation, whereas this paper extends the scope to the more practical and complex repository-level generation scenario. First, by manually examining the code generation results of six mainstream LLMs, we establish a taxonomy of hallucinations in LLM-generated code. Next, we elaborate on the hallucination phenomena and analyze their distribution across different models. We then examine the causes of hallucinations and identify four factors that may contribute to them. 
Finally, we propose a mitigation method based on retrieval-augmented generation (RAG), which is consistently effective across all the LLMs studied. A replication package containing the code, data, and experimental results is provided for reference and verification by academia and industry. This study helps improve the reliability and accuracy of LLMs in code generation and is significant for software engineering.|\n", "2409.20548": "|**2024-09-30**|**Robi Butler: Remote Multimodal Interactions with Household Robot Assistant**|Anxing Xiao et.al.|[2409.20548](http://arxiv.org/abs/2409.20548)|null|In this paper, we introduce Robi Butler, a novel household robotic system that supports multimodal interaction with remote users. Building on advanced communication interfaces, Robi Butler lets users monitor the robot's status, send text or voice instructions, and select target objects by hand pointing. At the core of our system is a high-level behavior module, powered by large language models (LLMs), that interprets multimodal instructions and generates action plans. 
These plans are composed of a set of open-vocabulary primitives supported by vision-language models (VLMs) that handle both text and pointing queries. Integrating these components allows Robi Butler to ground remote multimodal instructions in real-world home environments in a zero-shot manner. We demonstrate the effectiveness and efficiency of the system on a variety of daily household tasks that involve remote users giving multimodal instructions. We also conduct a user study that analyzes how multimodal interaction affects efficiency and user experience during remote human-robot interaction, and we discuss potential improvements.|\n", "2409.20512": "|**2024-09-30**|**Uncertainty-Informed Screening for Safer Solvents Used in the Synthesis of Perovskite via Language Models**|Arpan Mukherjee 
et.al.|[2409.20512](http://arxiv.org/abs/2409.20512)|null|This paper presents a framework for the challenge of accurately predicting the toxicity of solvents used in the industrial synthesis of perovskites, a task limited by the lack of targeted, structured toxicity data. The framework combines automated data extraction using language models with an uncertainty-informed prediction model to fill data gaps and increase confidence in the predictions. First, we use two approaches to automatically extract relevant data from a corpus of scientific literature on solvents used in perovskite synthesis: smaller bidirectional language models such as BERT and ELMo are used for their repeatable, deterministic outputs, while autoregressive large language models (LLMs) such as GPT-3.5 are used for their larger training corpus and better response generation. Our 'prompting and verification' technique, integrated with the LLM, targets extraction and refinement, reducing LLM hallucination and improving the quality of the extracted data. 
Next, the extracted data is fed into a pre-trained multi-task binary classification deep learning model to predict the ED nature of the extracted solvents. We use the class probabilities from the classification model for Shannon entropy-based uncertainty quantification, which measures uncertainty and identifies data gaps in the predictions. This approach yields a structured dataset of solvents used in perovskite synthesis together with their uncertainty-informed virtual toxicity assessment. In addition, we use chord diagrams to visualize solvent interactions and prioritize potentially hazardous solvents, finding that 70% of the solvent interactions were primarily associated with two specific perovskites.|\n", "2409.20502": "|**2024-09-30**|**COLLAGE: Collaborative Human-Agent Interaction Generation using Hierarchical Latent Diffusion and Language Models**|Divyanshu Daiya 
et.al.|[2409.20502](http://arxiv.org/abs/2409.20502)|null|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aCOLLAGE\u7684\u65b0\u578b\u6846\u67b6\uff0c\u7528\u4e8e\u901a\u8fc7\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u548c\u5c42\u6b21\u5316\u7684\u8fd0\u52a8\u7279\u5f02\u6027\u5411\u91cf\u91cf\u5316\u53d8\u5206\u81ea\u7f16\u7801\u5668\uff08VQ-VAE\uff09\u6765\u751f\u6210\u534f\u4f5c\u5f0f\u4ee3\u7406-\u5bf9\u8c61-\u4ee3\u7406\u4ea4\u4e92\u3002\u6211\u4eec\u7684\u6a21\u578b\u89e3\u51b3\u4e86\u8fd9\u4e00\u9886\u57df\u6570\u636e\u7a00\u7f3a\u7684\u95ee\u9898\uff0c\u901a\u8fc7\u6574\u5408LLM\u7684\u77e5\u8bc6\u548c\u63a8\u7406\u80fd\u529b\u6765\u6307\u5bfc\u751f\u6210\u6027\u6269\u6563\u6a21\u578b\u3002\u5c42\u6b21\u5316\u7684VQ-VAE\u67b6\u6784\u5728\u591a\u4e2a\u62bd\u8c61\u7ea7\u522b\u6355\u83b7\u4e86\u4e0d\u540c\u7684\u8fd0\u52a8\u7279\u5f02\u6027\u7279\u5f81\uff0c\u907f\u514d\u4e86\u5197\u4f59\u6982\u5ff5\uff0c\u5e76\u5b9e\u73b0\u4e86\u9ad8\u6548\u7684\u591a\u5206\u8fa8\u7387\u8868\u793a\u3002\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u4e2a\u5728\u9690\u7a7a\u95f4\u4e2d\u64cd\u4f5c\u7684\u6269\u6563\u6a21\u578b\uff0c\u5e76\u7ed3\u5408\u4e86\u7531LLM\u751f\u6210\u7684\u8fd0\u52a8\u89c4\u5212\u63d0\u793a\u6765\u5f15\u5bfc\u53bb\u566a\u8fc7\u7a0b\uff0c\u4ece\u800c\u5b9e\u73b0\u4e86\u9488\u5bf9\u7279\u5b9a\u63d0\u793a\u7684\u8fd0\u52a8\u751f\u6210\uff0c\u5177\u6709\u66f4\u9ad8\u7684\u63a7\u5236\u6027\u548c\u591a\u6837\u6027\u3002\u5728CORE-4D\u548cInterHuman\u6570\u636e\u96c6\u4e0a\u7684\u5b9e\u9a8c\u7ed3\u679c\u8bc1\u660e\u4e86\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u751f\u6210\u771f\u5b9e\u4e14\u591a\u6837\u5316\u7684\u534f\u4f5c\u4eba\u7c7b-\u7269\u4f53-\u4eba\u7c7b\u4ea4\u4e92\u65b9\u9762\u7684\u6709\u6548\u6027\uff0c\u8d85\u8d8a\u4e86\u73b0\u6709\u6700\u4f73\u65b9\u6cd5\u3002\u6211\u4eec\u7684\u5de5\u4f5c\u4e3a\u673a\u5668\u4eba\u5b66\u3001\u56fe\u5f62\u5b66\u548c\u8ba1\u7b97\u673a\u89c6\u89c9\u7b49\u9886\u57df\u5efa\u6a21\u590d\u6742\u4ea4\u4e92\u6
3d0\u4f9b\u4e86\u65b0\u7684\u53ef\u80fd\u6027\u3002|\n", "2409.20441": "|**2024-10-01**|**Instance-adaptive Zero-shot Chain-of-Thought Prompting**|Xiaosong Yuan et.al.|[2409.20441](http://arxiv.org/abs/2409.20441)|null|\u96f6\u5c04\u94fe\u601d\u8003\uff08CoT\uff09\u63d0\u793a\u7b56\u7565\u5728\u589e\u5f3a\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u89e3\u51b3\u73b0\u5b9e\u4e16\u754c\u63a8\u7406\u4efb\u52a1\u7684\u6027\u80fd\u65b9\u9762\u5c55\u73b0\u51fa\u7b80\u5355\u800c\u6709\u6548\u7684\u65b9\u6cd5\u3002\u7136\u800c\uff0c\u5355\u4e00\u4efb\u52a1\u7ea7\u63d0\u793a\u5728\u6574\u4e2a\u5b9e\u4f8b\u4e0a\u7684\u5e94\u7528\u5b58\u5728\u5c40\u9650\u6027\uff0c\u56e0\u4e3a\u4e00\u4e2a\u63d0\u793a\u65e0\u6cd5\u4e0e\u6240\u6709\u5b9e\u4f8b\u90fd\u6210\u4e3a\u6700\u4f73\u642d\u6863\u3002\u56e0\u6b64\uff0c\u66f4\u6070\u5f53\u7684\u505a\u6cd5\u662f\u7cbe\u5fc3\u8003\u8651\u63d0\u793a\u4e0e\u6bcf\u4e2a\u5b9e\u4f8b\u4e4b\u95f4\u7684\u4e92\u52a8\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u5b9e\u4f8b\u81ea\u9002\u5e94\u63d0\u793a\u7b97\u6cd5\u4f5c\u4e3a\u96f6\u5c04CoT\u63a8\u7406\u7684\u4e00\u79cd\u66ff\u4ee3\u7b56\u7565\uff0c\u65e8\u5728\u901a\u8fc7\u9002\u5f53\u5730\u533a\u5206\u51fa\u597d\u7684\u548c\u574f\u7684\u63d0\u793a\u6765\u63d0\u5347\u6027\u80fd\u3002 
\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u9996\u5148\u901a\u8fc7\u4fe1\u606f\u6d41\u7684\u89d2\u5ea6\u5bf9LLM\u8fdb\u884c\u5206\u6790\uff0c\u4ee5\u63ed\u793a\u96f6\u5c04CoT\u63a8\u7406\u673a\u5236\uff0c\u53d1\u73b0\u4fe1\u606f\u4ece\u95ee\u9898\u5230\u63d0\u793a\u4ee5\u53ca\u95ee\u9898\u5230\u63a8\u7406\u7684\u53cc\u5411\u6d41\u52a8\u5bf9\u63a8\u7406\u7ed3\u679c\u5f71\u54cd\u6700\u5927\u3002\u6211\u4eec\u6ce8\u610f\u5230\uff0c\u66f4\u4f18\u79c0\u7684\u96f6\u5c04CoT\u63a8\u7406\u9700\u8981\u63d0\u793a\u4ece\u95ee\u9898\u4e2d\u83b7\u53d6\u8bed\u4e49\u4fe1\u606f\uff0c\u7136\u540e\u63a8\u7406\u4ece\u95ee\u9898\u76f4\u63a5\u6216\u901a\u8fc7\u63d0\u793a\u95f4\u63a5\u5730\u805a\u5408\u8db3\u591f\u4fe1\u606f\u3002\u76f8\u53cd\uff0c\u7f3a\u5931\u8fd9\u4e9b\u4efb\u4f55\u4e00\u9879\u53ef\u80fd\u90fd\u4f1a\u5bfc\u81f4\u4e00\u4e2a\u4e0d\u7406\u60f3\u7684\u63d0\u793a\u3002\u57fa\u4e8e\u6b64\u53d1\u73b0\uff0c\u6211\u4eec\u8fdb\u4e00\u6b65\u63d0\u51fa\u4e86\u4e00\u4e2a\u9002\u7528\u4e8e\u96f6\u5c04CoT\u63a8\u7406\u7684\u5b9e\u4f8b\u81ea\u9002\u5e94\u63d0\u793a\u7b56\u7565\uff08IAP\uff09\u3002 \u5728LLaMA-2\u3001LLaMA-3\u548cQwen\u4e0a\u5bf9\u6570\u5b66\u3001\u903b\u8f91\u548c\u5e38\u8bc6\u63a8\u7406\u4efb\u52a1\uff08\u5982GSM8K\u3001MMLU\u3001\u56e0\u679c\u5224\u65ad\uff09\u8fdb\u884c\u7684\u5b9e\u9a8c\u8868\u660e\uff0c\u5b9e\u4f8b\u81ea\u9002\u5e94\u96f6\u5c04CoT\u63d0\u793a\u7b56\u7565\u5728\u67d0\u4e9b\u5b9a\u5236\u63d0\u793a\u6216\u590d\u6742\u7a0b\u5e8f\u7684\u57fa\u7840\u4e0a\u8868\u73b0\u51fa\u66f4\u597d\u7684\u6027\u80fd\uff0c\u8fd9\u8bc1\u660e\u4e86\u6211\u4eec\u5728\u96f6\u5c04CoT\u63a8\u7406\u673a\u5236\u7814\u7a76\u4e2d\u7684\u53d1\u73b0\u5177\u6709\u91cd\u8981\u610f\u4e49\u3002|\n", "2409.20385": "|**2024-09-30**|**Wait, but Tylenol is Acetaminophen... 
Investigating and Improving Language Models' Ability to Resist Requests for Misinformation**|Shan Chen et.al.|[2409.20385](http://arxiv.org/abs/2409.20385)|null|\u80cc\u666f\uff1a\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u88ab\u8bad\u7ec3\u6210\u9075\u5faa\u6307\u4ee4\uff0c\u4f46\u8fd9\u79cd\u8bbe\u8ba1\u4f7f\u5176\u5bb9\u6613\u5728\u751f\u6210\u9519\u8bef\u4fe1\u606f\u65f6\u76f2\u76ee\u9075\u4ece\u7528\u6237\u8bf7\u6c42\u3002\u5728\u533b\u5b66\u9886\u57df\uff0c\u8fd9\u53ef\u80fd\u4f1a\u52a0\u901f\u9519\u8bef\u4fe1\u606f\u7684\u4f20\u64ad\uff0c\u4ece\u800c\u5f71\u54cd\u4eba\u7c7b\u5065\u5eb7\u3002\u7814\u7a76\u76ee\u6807/\u65b9\u6cd5\uff1a\u6211\u4eec\u5206\u6790\u4e86\u6a21\u578b\u5728\u77e5\u9053\u8bf7\u6c42\u4e0d\u5408\u7406\u7684\u60c5\u51b5\u4e0b\uff0c\u751f\u6210\u4e0e\u836f\u7269\u6709\u5173\u8bef\u5bfc\u6027\u5185\u5bb9\u7684\u503e\u5411\u3002\u6211\u4eec\u63a2\u8ba8\u4e86\u901a\u8fc7\u4e0a\u4e0b\u6587\u63d0\u793a\u548c\u8c03\u6574\u53c2\u6570\uff0c\u4f7fLLMs\u4f18\u5148\u8003\u8651\u903b\u8f91\u63a8\u7406\u800c\u975e\u9075\u4ece\u6027\uff0c\u4ee5\u964d\u4f4e\u533b\u7597\u4fe1\u606f\u8bef\u5bfc\u98ce\u9669\u7684\u53ef\u80fd\u6027\u3002 \u7ed3\u679c\uff1a\u6240\u6709\u524d\u6cbfLLMs\u90fd\u9075\u5b88\u4e86\u751f\u6210\u8bef\u5bfc\u6027\u5185\u5bb9\u7684\u4e0d\u5408\u7406\u8bf7\u6c42\u3002\u7136\u800c\uff0c\u57fa\u4e8e\u63d0\u793a\u7684\u65b9\u6cd5\u548c\u53c2\u6570\u8c03\u6574\u7b56\u7565\u53ef\u4ee5\u63d0\u5347\u68c0\u6d4b\u8bf7\u6c42\u903b\u8f91\u9519\u8bef\u7684\u80fd\u529b\uff0c\u5e76\u9632\u6b62\u533b\u7597\u4fe1\u606f\u7684\u8bef\u4f20\u3002 \u7ed3\u8bba\uff1a\u5c06LLMs\u7684\u8bbe\u8ba1\u91cd\u5fc3\u4ece\u9075\u4ece\u6027\u8f6c\u5411\u903b\u8f91\u63a8\u7406\uff0c\u6709\u52a9\u4e8e\u964d\u4f4e\u5176\u88ab\u5229\u7528\u4e8e\u4f20\u64ad\u533b\u7597\u4fe1\u606f\u8bef\u5bfc\u7684\u98ce\u9669\u3002|\n", "2409.20370": "|**2024-09-30**|**The Perfect Blend: Redefining RLHF with Mixture of Judges**|Tengyu Xu 
et.al.|[2409.20370](http://arxiv.org/abs/2409.20370)|null|\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u65b0\u7684\u540e\u8bad\u7ec3\u8303\u5f0f\uff0c\u79f0\u4e3a\u7ea6\u675f\u751f\u6210\u7b56\u7565\u4f18\u5316\uff08CGPO\uff09\u3002CGPO\u7684\u6838\u5fc3\u662f\u201c\u88c1\u5224\u6df7\u5408\u201d\uff08MoJ\uff09\uff0c\u5b83\u4ee5\u6210\u672c\u6548\u76ca\u7684\u65b9\u5f0f\u5bf9\u7b56\u7565\u8fdb\u884c\u5206\u5c42\u7ea6\u675f\u4f18\u5316\uff0c\u4ece\u800c\u5728\u539f\u7406\u4e0a\u8bc6\u522bRLHF\u4e2d\u7684\u5b8c\u7f8e\u878d\u5408\u3002\u6b64\u65b9\u6cd5\u5728\u7406\u8bba\u4e0a\u6709\u4fdd\u8bc1\uff0c\u4e0d\u9700\u8981\u5927\u91cf\u7684\u8d85\u53c2\u6570\u8c03\u6574\uff0c\u5e76\u4e14\u53ef\u4ee5\u5728\u5e38\u89c1\u7684\u540e\u8bad\u7ec3\u7ba1\u9053\u4e2d\u65e0\u7f1d\u96c6\u6210\u3002\u8fd9\u6709\u52a9\u4e8e\u68c0\u6d4b\u548c\u7f13\u89e3\u5956\u52b1\u4f5c\u5f0a\u884c\u4e3a\uff0c\u5e76\u5728\u5927\u91cf\u76ee\u6807\u7684\u573a\u666f\u4e0b\u8fbe\u5230\u5e15\u7d2f\u6258\u6700\u4f18\u70b9\u3002 \u6211\u4eec\u7684\u5b9e\u9a8c\u8bc4\u4f30\u8868\u660e\uff0cCGPO\u5728\u5404\u79cd\u4efb\u52a1\u4e0a\u663e\u8457\u4f18\u4e8e\u6807\u51c6\u7684RLHF\u7b97\u6cd5\uff0c\u5982PPO\u548cDPO\uff0c\u5305\u62ec\u901a\u7528\u804a\u5929\u3001STEM\u95ee\u9898\u3001\u6307\u4ee4\u9075\u5faa\u548c\u7f16\u7a0b\u7b49\u3002\u5177\u4f53\u800c\u8a00\uff0cCGPO\u5728AlpacaEval-2\uff08\u901a\u7528\u804a\u5929\uff09\u4e0a\u63d0\u9ad8\u4e867.4%\uff0c\u5728Arena-Hard\uff08STEM\u4e0e\u63a8\u7406\uff09\u4e0a\u63d0\u9ad8\u4e8612.5%\uff0c\u5e76\u5728\u6570\u5b66\u548c\u5176\u4ed6\u9886\u57df\u5982\u7f16\u7a0b\u7b49\u4efb\u52a1\u4e0a\u4fdd\u6301\u4e00\u81f4\u7684\u6539\u8fdb\u3002\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u867d\u7136PPO\u7ecf\u5e38\u88ab\u4f7f\u7528\uff0c\u4f46\u5728\u6d41\u884c\u7684\u7f16\u7a0b\u57fa\u51c6\u6d4b\u8bd5\u4e2d\uff0c\u5b83\u5bb9\u6613\u906d\u53d7\u4e25\u91cd\u7684\u5956\u52b1\u4f5c\u5f0a\uff0c\u800cCGPO\u6210\u529f\u5730\u89e3\u51b3\u4e86\u8fd9\u4e2a\u95ee\u9898\u3002 
\u8fd9\u4e00\u7a81\u7834\u5728RLHF\u9886\u57df\u4e0d\u4ec5\u89e3\u51b3\u4e86\u5956\u52b1\u4f5c\u5f0a\u548c\u6781\u7aef\u591a\u76ee\u6807\u4f18\u5316\u7684\u6311\u6218\uff0c\u800c\u4e14\u63a8\u8fdb\u4e86\u901a\u7528\u8bed\u8a00\u6a21\u578b\u5728\u591a\u79cd\u5e94\u7528\u4e2d\u7684\u5bf9\u9f50\u6280\u672f\u3002|\n", "2409.20365": "|**2024-09-30**|**VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs**|Ruotong Liao et.al.|[2409.20365](http://arxiv.org/abs/2409.20365)|null|\u5728\u89c6\u9891\u8bed\u8a00\u9886\u57df\uff0c\u5229\u7528\u96f6\u6837\u672c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u63a8\u7406\u8fdb\u884c\u89c6\u9891\u7406\u89e3\u7684\u6700\u65b0\u5de5\u4f5c\u5df2\u6210\u4e3a\u6311\u6218\u4f20\u7edf\u7aef\u5230\u7aef\u6a21\u578b\u7684\u6709\u529b\u7ade\u4e89\u8005\u3002\u7136\u800c\uff0c\u957f\u89c6\u9891\u7684\u7406\u89e3\u9762\u4e34\u7740\u72ec\u7279\u7684\u6311\u6218\uff0c\u5c24\u5176\u662f\u5728\u5904\u7406\u6301\u7eed\u65f6\u95f4\u8f83\u957f\u7684\u65f6\u95f4\u8de8\u5ea6\u65f6\uff0c\u5373\u4f7f\u662f\u96f6\u6837\u672cLLM\u65b9\u6cd5\u4e5f\u662f\u5982\u6b64\u3002\u957f\u89c6\u9891\u4e2d\u7684\u4fe1\u606f\u5197\u4f59\u95ee\u9898\u4fc3\u4f7f\u6211\u4eec\u601d\u8003\u54ea\u4e9b\u4fe1\u606f\u5bf9\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u81f3\u5173\u91cd\u8981\uff0c\u4ee5\u53ca\u5982\u4f55\u5229\u7528\u5b83\u4eec\u8fdb\u884c\u590d\u6742\u7684\u7a7a\u95f4-\u65f6\u95f4\u63a8\u7406\uff0c\u4ee5\u5b9e\u73b0\u5bf9\u957f\u89c6\u9891\u5206\u6790\u7684\u7406\u89e3\u3002 \u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aVideoINSTA\uff08INformative Spatial-TemporAl 
Reasoning\uff09\u7684\u6846\u67b6\uff0c\u7528\u4e8e\u96f6\u6837\u672c\u957f\u89c6\u9891\u7406\u89e3\u3002VideoINSTA\u7684\u4e3b\u8981\u8d21\u732e\u5305\u62ec\uff1a\uff081\uff09\u5229\u7528LLM\u8fdb\u884c\u957f\u89c6\u9891\u7406\u89e3\u7684\u96f6\u6837\u672c\u6846\u67b6\uff1b\uff082\uff09\u4e8b\u4ef6\u9a71\u52a8\u7684\u65f6\u95f4\u63a8\u7406\u548c\u57fa\u4e8e\u5185\u5bb9\u7684\u7a7a\u95f4\u63a8\u7406\u65b9\u6cd5\uff0c\u4f7fLLM\u80fd\u591f\u5bf9\u89c6\u9891\u4e2d\u7684\u7a7a\u95f4-\u65f6\u95f4\u4fe1\u606f\u8fdb\u884c\u63a8\u7406\uff1b\uff083\uff09\u4e00\u79cd\u81ea\u6211\u53cd\u601d\u7684\u4fe1\u606f\u63a8\u7406\u65b9\u6848\uff0c\u901a\u8fc7\u4fe1\u606f\u5145\u5206\u6027\u548c\u9884\u6d4b\u7f6e\u4fe1\u5ea6\u7684\u5e73\u8861\u6765\u8c03\u6574\u65f6\u95f4\u56e0\u7d20\u3002 \u6211\u4eec\u7684\u6a21\u578b\u5728\u4e09\u4e2a\u957f\u89c6\u9891\u95ee\u7b54\u57fa\u51c6\u6d4b\u8bd5\u4e0a\u663e\u8457\u63d0\u9ad8\u4e86\u73b0\u6709\u6700\u4f73\u6027\u80fd\uff1aEgoSchema\u3001NextQA\u548cIntentQA\uff0c\u4ee5\u53ca\u5f00\u653e\u95ee\u7b54\u6570\u636e\u96c6ActivityNetQA\u3002\u4ee3\u7801\u5df2\u5728\u6b64\u5904\u53d1\u5e03\uff1ahttps://github.com/mayhugotong/VideoINSTA\u3002|\n", "2410.01805": "|**2024-10-02**|**Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads**|Yuxiang Huang 
et.al.|[2410.01805](http://arxiv.org/abs/2410.01805)|**[link](https://github.com/huangyuxiang03/Locret)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u652f\u6301\u957f\u671f\u4e0a\u4e0b\u6587\u7406\u89e3\u548c\u5904\u7406\u4efb\u52a1\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u6b65\u3002\u7136\u800c\uff0c\u5c06LLMs\u7684\u751f\u6210\u63a8\u7406\u6269\u5c55\u5230\u5982\u6b64\u957f\u7684\u4e0a\u4e0b\u6587\u4f1a\u589e\u52a0\u5927\u91cf\u7684\u8ba1\u7b97\u8d1f\u8f7d\uff0c\u5e76\u8981\u6c42\u5728\u7ef4\u6301\u57fa\u4e8e\u8f6c\u6362\u5668\u7684LLMs\u7684\u5173\u952e\u503c\u5bf9\uff08KV\uff09\u7f13\u5b58\u65f6\u4f7f\u7528\u5927\u91cfGPU\u5185\u5b58\u3002\u73b0\u6709\u7684KV\u7f13\u5b58\u538b\u7f29\u65b9\u6cd5\uff0c\u5982\u91cf\u5316\uff0c\u968f\u7740\u4e0a\u4e0b\u6587\u957f\u5ea6\u7684\u589e\u52a0\u800c\u9047\u5230\u5185\u5b58\u74f6\u9888\uff1b\u800c\u56fa\u5b9a\u5927\u5c0f\u7684\u7f13\u5b58\uff0c\u5982\u6dd8\u6c70\u7b56\u7565\uff0c\u5219\u7531\u4e8e\u4e0d\u9ad8\u6548\u7684\u7b56\u7565\u800c\u5bfc\u81f4\u6548\u7387\u4f4e\u4e0b\u3002\u8fd9\u4e9b\u9650\u5236\u9650\u5236\u4e86\u5728\u5355\u4e2aNvidia 4090 GPU\u7b49\u6d88\u8d39\u8005\u7ea7\u8bbe\u5907\u4e0a\u7684\u90e8\u7f72\u3002 
\u4e3a\u4e86\u514b\u670d\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u63d0\u51fa\u4e86Locret\u6846\u67b6\uff0c\u8fd9\u662f\u4e00\u79cd\u7528\u4e8e\u957f\u4e0a\u4e0b\u6587LLM\u63a8\u7406\u7684\u65b9\u6cd5\uff0c\u901a\u8fc7\u5f15\u5165\u4fdd\u7559\u5934\u90e8\u6765\u8bc4\u4f30KV\u7f13\u5b58\u5355\u5143\u7684\u56e0\u679c\u91cd\u8981\u6027\uff0c\u4ece\u800c\u5141\u8bb8\u5728\u56fa\u5b9a\u7f13\u5b58\u5927\u5c0f\u5185\u8fdb\u884c\u66f4\u51c6\u786e\u7684\u6dd8\u6c70\u3002Locret\u5728\u51bb\u7ed3\u7684\u4e3b\u5e72LLM\u57fa\u7840\u4e0a\u8fdb\u884c\u4e86\u5fae\u8c03\uff0c\u4f7f\u7528\u6807\u51c6\u957f\u65f6\u95f4\u4e0a\u4e0b\u6587SFT\u6570\u636e\u96c6\u7684\u5c11\u91cf\u6570\u636e\u3002\u5728\u63a8\u7406\u8fc7\u7a0b\u4e2d\uff0c\u6211\u4eec\u4ee5\u5206\u5757\u9884\u586b\u5145\u6a21\u5f0f\u6dd8\u6c70\u4f4e\u91cd\u8981\u6027\u7684\u7f13\u5b58\u5355\u5143\uff0c\u663e\u8457\u51cf\u5c11\u4e86\u5cf0\u503cGPU\u5185\u5b58\u4f7f\u7528\u91cf\u3002 \u6211\u4eec\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u5b9e\u8bc1\u7814\u7a76\u6765\u8bc4\u4f30Locret\uff0c\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u4e0e\u6700\u8fd1\u7684\u7ade\u4e89\u65b9\u6cd5\uff08\u5305\u62ecInfLLM\u3001\u91cf\u5316\u3001SirLLM\u548cMInference\uff09\u76f8\u6bd4\uff0cLocret\u5728\u5185\u5b58\u6548\u7387\u548c\u751f\u6210\u5185\u5bb9\u8d28\u91cf\u65b9\u9762\u5747\u8868\u73b0\u51fa\u8272\u2014\u2014Locret\u5b9e\u73b0\u4e86\u4e0ePhi-3-mini-128K\u548cLlama-3.1-8B-instruct\u5168KV\u7f13\u5b58\u76f8\u6bd4\u8d85\u8fc720\u500d\u548c8\u500d\u7684KV\u7f13\u5b58\u538b\u7f29\u6bd4\u7387\u3002\u6b64\u5916\uff0cLocret\u8fd8\u53ef\u4ee5\u4e0e\u5176\u4ed6\u65b9\u6cd5\uff08\u5982\u91cf\u5316\u548c\u4ee4\u724c\u5408\u5e76\uff09\u7ed3\u5408\u4f7f\u7528\u3002\u636e\u6211\u4eec\u6240\u77e5\uff0cLocret\u662f\u7b2c\u4e00\u4e2a\u80fd\u591f\u5c06Llama-3.1-8B\u6216\u7c7b\u4f3c\u6a21\u578b\u90e8\u7f72\u5230\u5355\u4e2aNvidia 4090 
GPU\u4e0a\uff0c\u540c\u65f6\u5728\u4e0d\u727a\u7272\u751f\u6210\u8d28\u91cf\u7684\u60c5\u51b5\u4e0b\u5b9e\u73b0128K\u957f\u4e0a\u4e0b\u6587\u63a8\u7406\u7684\u6846\u67b6\uff0c\u4e14\u4ec5\u9700\u8981\u5c11\u91cf\u989d\u5916\u7684\u7cfb\u7edf\u4f18\u5316\u3002**|\n", "2410.01799": "|**2024-10-02**|**Efficient $1$-bit tensor approximations**|Alex W. Neal Riasanovsky et.al.|[2410.01799](http://arxiv.org/abs/2410.01799)|null|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u7a7a\u95f4\u6548\u7387\u9ad8\u7684\u77e9\u9635\u548c\u4efb\u610f\u9636\u5f20\u91cf\u5206\u89e3\u65b9\u6cd5\uff0c\u4f5c\u4e3a\u7ebf\u6027\u7ec4\u5408\u7684\u5f20\u91cf\u79ef\u5f62\u5f0f\uff0c\u5176\u4e2d\u5411\u91cf\u503c\u4e3a$\\{-1, 1\\}$\u3002\u5bf9\u4e8e\u4efb\u4e00\u77e9\u9635$A \\in \\mathbb{R}^{m \\times n}$\uff0c\u5176\u8868\u8fbe\u5f0f\u4e3a\uff1a$$A - R_w = S_w C_w T_w^\\top = \\sum_{j=1}^w c_j \\cdot \\mathbf{s}_j \\mathbf{t}_j^\\top$$ \u8fd9\u662f\u4e00\u4e2a\u5173\u4e8e$A$\u7684\u201c\u5bbd\u5ea6\u4e3a$w$\u7684\u7b26\u53f7\u5207\u5206\u89e3\u201d\u3002\u8fd9\u91cc$C_w = \"diag\"(\\mathbf{c}_w)$\uff0c\u4e14$S_w, T_w$\u548c\u5411\u91cf$\\mathbf{s}_j, \\mathbf{t}_j$\u5747\u4e3a$\\{-1, 1\\}$\u503c\u3002\u7528\u4e8e\u5b58\u50a8$(S_w, T_w, C_w)$\u6240\u9700\u7684\u7a7a\u95f4\u662f$w \\cdot (m + n)$\u4f4d\uff0c\u5e76\u4ec5\u9700$w$\u4e2a\u6d6e\u70b9\u6570\u3002\u5f53\u5e94\u7528\u4e8e\u5177\u6709i.i.d. 
$\\mathcal N (0, 1)$\u5206\u5e03\u5143\u7d20\u7684#f32\u77e9\u9635\u65f6\uff0c$\\lVert R_w\\rVert_F$\u5448\u73b0\u51fa\u6307\u6570\u8870\u51cf\u3002\u9009\u62e9\u5408\u9002\u7684$w$\uff0c\u4f7f$(S_w, T_w, C_w)$\u7684\u5185\u5b58\u5360\u7528\u4e0e\\textit{f16}\u6216\\textit{bf16}\u77e9\u9635\u76f8\u540c\uff0c\u76f8\u5bf9\u8bef\u5dee\u76f8\u5f53\u3002\u6211\u4eec\u7684\u7b97\u6cd5\u572820\u884c\u4f2a\u4ee3\u7801\u4e2d\u5b9e\u73b0\u4e86\u9ad8\u6548\u7684\u7b26\u53f7\u5207\u5206\u89e3\u3002\u5b83\u6e90\u81ea1999\u5e74Frieze\u548cKannan\u7684\u4e00\u7bc7\u8457\u540d\u8bba\u6587\u7684\u7b80\u5355\u4fee\u6539\u3002 \u4f5c\u4e3a\u7b2c\u4e00\u4e2a\u5e94\u7528\uff0c\u6211\u4eec\u5bf9\u5f00\u653e\u6e90\u7801\u5927\u578b\u8bed\u8a00\u6a21\u578b\\textit{Mistral-7B-v0.1}\u4e2d\u7684\u6743\u91cd\u77e9\u9635\u8fdb\u884c\u4e86$50\\%$\u7684\u7a7a\u95f4\u538b\u7f29\u3002\u4ee4\u4eba\u60ca\u8bb6\u7684\u662f\uff0c\u6240\u6709$226$\u4e2a\u4f59\u77e9\u9635\u7684\u76f8\u5bf9\u8bef\u5dee\u5747\u5c0f\u4e8e$6\\%$\uff0c\u4e14\u6269\u5c55\u6a21\u578b\u5728huggingface\u6392\u884c\u699c\u4e0a\u4e0e\\textit{Mistral-7B-v0.1}\u6a21\u578b\u8868\u73b0\u76f8\u8fd1\u3002\u968f\u7740\u7a7a\u95f4\u538b\u7f29\u7387\u4ece$50\\%$\u964d\u4f4e\u81f3$25\\%$\uff0c\u57fa\u51c6\u6027\u80fd\u7f13\u6162\u4e0b\u964d\u3002\u6211\u4eec\u4f18\u5316\u4e86\u5f00\u6e90\u7684\\textit{rust}\u5b9e\u73b0\uff0c\u4f7f\u7528\u4e86\\textit{avx2}\u548c\\textit{avx512}\u67b6\u6784\u4e0b\u7684\\textit{simd}\u6307\u4ee4\u8fdb\u884c\u52a0\u901f\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5c06\u8be5\u7b97\u6cd5\u6269\u5c55\u5230\u4e86\u4efb\u610f\u9636\u5f20\u91cf\uff0c\u5e76\u5229\u7528\u5b83\u538b\u7f29\u4e86\u4e00\u5f20\u4f5c\u8005\u732bAngus\u7684\u7167\u7247\u3002|\n", "2410.01795": 
"|**2024-10-02**|**Knowledge-Driven Feature Selection and Engineering for Genotype Data with Large Language Models**|Joseph Lee et.al.|[2410.01795](http://arxiv.org/abs/2410.01795)|**[link](https://github.com/pennshenlab/freeform)**|**\u57fa\u4e8e\u590d\u6742\u9057\u4f20\u57fa\u7840\u9884\u6d4b\u8868\u578b\uff0c\u5229\u7528\u5c0f\u800c\u53ef\u89e3\u91ca\u7684\u53d8\u5f02\u7279\u5f81\u4ecd\u7136\u662f\u4e00\u9879\u5177\u6709\u6311\u6218\u6027\u7684\u4efb\u52a1\u3002\u4f20\u7edf\u4e0a\uff0c\u4f7f\u7528\u6570\u636e\u9a71\u52a8\u7684\u65b9\u6cd5\u8fdb\u884c\u6b64\u4efb\u52a1\uff0c\u4f46\u57fa\u56e0\u578b\u6570\u636e\u7684\u9ad8\u7ef4\u7279\u6027\u4f7f\u5f97\u5206\u6790\u548c\u9884\u6d4b\u53d8\u5f97\u56f0\u96be\u3002\u53d7\u5230\u9884\u8bad\u7ec3\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4e2d\u7f16\u7801\u7684\u4e30\u5bcc\u77e5\u8bc6\u53ca\u5176\u5728\u5904\u7406\u590d\u6742\u751f\u7269\u533b\u5b66\u6982\u5ff5\u4e0a\u7684\u6210\u529f\u542f\u53d1\uff0c\u6211\u4eec\u65e8\u5728\u63a2\u7d22LLM\u5728\u8868\u683c\u57fa\u56e0\u578b\u6570\u636e\u7279\u5f81\u9009\u62e9\u4e0e\u5de5\u7a0b\u65b9\u9762\u7684\u80fd\u529b\uff0c\u5e76\u5f15\u5165\u4e00\u79cd\u57fa\u4e8e\u77e5\u8bc6\u7684\u6846\u67b6\u3002\u6211\u4eec\u5f00\u53d1\u4e86FREEFORM\uff0c\u4e00\u79cd\u81ea\u7531\u6d41\u52a8\u63a8\u7406\u4e0e\u96c6\u6210\u589e\u5f3a\u7279\u5f81\u8f93\u51fa\u548c\u7a33\u5065\u5efa\u6a21\u7684\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u7ed3\u5408\u4e86\u94fe\u5f0f\u601d\u8003\u4e0e\u96c6\u6210\u539f\u5219\uff0c\u5229\u7528LLM\u7684\u5185\u5728\u77e5\u8bc6\u6765\u9009\u62e9\u548c\u5de5\u7a0b\u7279\u5f81\u3002\u5728\u4e24\u4e2a\u4e0d\u540c\u7684\u4eba\u7c7b\u57fa\u56e0\u578b-\u8868\u578b\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u8bc4\u4f30\uff0c\u5305\u62ec\u9057\u4f20\u8840\u7edf\u548c\u9057\u4f20\u6027\u542c\u529b\u635f\u5931\uff0c\u6211\u4eec\u53d1\u73b0\u8fd9\u4e2a\u6846\u67b6\u5728\u4f4e\u6837\u672c\u91cf\u60c5\u51b5\u4e0b\u4f18\u4e8e\u51e0\u79cd\u6570\u636e\u9a71\u52a8\u65b9\u6cd5\u3002FREEFOR
M\u4f5c\u4e3a\u4e00\u4e2a\u5f00\u6e90\u6846\u67b6\uff0c\u53ef\u4ee5\u5728GitHub\u4e0a\u83b7\u53d6\uff1ahttps://github.com/PennShenLab/FREEFORM\u3002**|\n", "2410.01792": "|**2024-10-02**|**When a language model is optimized for reasoning, does it still show embers of autoregression? An analysis of OpenAI o1**|R. Thomas McCoy et.al.|[2410.01792](http://arxiv.org/abs/2410.01792)|null|\u5728\u201c\u81ea\u52a8\u56de\u5f52\u4f59\u70ec\u201d\uff08McCoy\u7b49\u4eba\uff0c2023\u5e74\uff09\u4e2d\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u51e0\u4e2a\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u8d77\u6e90\u4e0a\u5b58\u5728\u4e00\u4e9b\u91cd\u8981\u9650\u5236\uff0c\u8fd9\u5f52\u56e0\u4e8e\u5b83\u4eec\u7684\u4e0b\u4e00\u4e2a\u5355\u8bcd\u9884\u6d4b\u7279\u6027\u3002\u8fd9\u91cc\u6211\u4eec\u63a2\u8ba8\u4e86OpenAI\u7684\u65b0\u7cfb\u7edfo1\u662f\u5426\u4f9d\u7136\u5b58\u5728\u8fd9\u4e9b\u95ee\u9898\uff0c\u4e0e\u4e4b\u524d\u7684LLMs\u76f8\u6bd4\uff0co1\u5728\u63a8\u7406\u4f18\u5316\u65b9\u9762\u6709\u6240\u4e0d\u540c\u3002\u7814\u7a76\u53d1\u73b0\uff0co1\u5728\u8bb8\u591a\u60c5\u51b5\u4e0b\u663e\u8457\u4f18\u4e8e\u4e4b\u524d\u6a21\u578b\uff0c\u5728\u67d0\u4e9b\u5e38\u89c1\u4efb\u52a1\u7684\u7f55\u89c1\u53d8\u4f53\u4e0a\uff08\u4f8b\u5982\uff0c\u4ece\u5217\u8868\u4e2d\u7684\u6bcf\u4e2a\u8bcd\u7684\u7b2c\u4e8c\u4e2a\u5b57\u6bcd\u5f62\u6210\u7f29\u5199\uff0c\u800c\u4e0d\u662f\u7b2c\u4e00\u4e2a\u5b57\u6bcd\uff09\u8868\u73b0\u5c24\u5176\u51fa\u8272\u3002\u5c3d\u7ba1\u8fd9\u4e9b\u5b9a\u91cf\u6539\u8fdb\u4ee4\u4eba\u5370\u8c61\u6df1\u523b\uff0c\u4f46o1\u4f9d\u7136\u663e\u793a\u51fa\u4e86\u4e0e\u4e4b\u524d\u7cfb\u7edf\u76f8\u540c\u7684\u57fa\u672c\u8d8b\u52bf\uff1a\u5bf9\u4e8e\u6982\u7387\u8f83\u9ad8\u7684\u793a\u4f8b\u548c\u4efb\u52a1\uff0co1\u7684\u8868\u73b0\u66f4\u597d\u4e14\u9700\u8981\u7684\u201c\u601d\u8003\u4ee4\u724c\u201d\u6570\u91cf\u8f83\u5c11\uff1b\u800c\u5728\u6982\u7387\u8f83\u4f4e\u7684\u60c5\u51b5\u4e0b\u5219\u8868\u73b0\u4e0d\u4f73\u3002 
\u8fd9\u4e9b\u7ed3\u679c\u8868\u660e\uff0c\u4f18\u5316\u8bed\u8a00\u6a21\u578b\u4ee5\u8fdb\u884c\u63a8\u7406\u53ef\u4ee5\u51cf\u8f7b\u4f46\u53ef\u80fd\u65e0\u6cd5\u5b8c\u5168\u514b\u670d\u8bed\u8a00\u6a21\u578b\u7684\u6982\u7387\u654f\u611f\u6027\u95ee\u9898\u3002|\n", "2410.01789": "|**2024-10-02**|**Investigating on RLHF methodology**|Alexey Kutalev et.al.|[2410.01789](http://arxiv.org/abs/2410.01789)|null|\u672c\u6587\u7814\u7a76\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\u6839\u636e\u4eba\u7c7b\u504f\u597d\u7684\u5bf9\u9f50\u95ee\u9898\u3002\u6211\u4eec\u8ba8\u8bba\u4e86\u8bad\u7ec3\u504f\u597d\u6a21\u578b\u7684\u7279\u6027\uff0c\u8be5\u6a21\u578b\u6a21\u62df\u4eba\u7c7b\u504f\u597d\uff0c\u5e76\u4ecb\u7ecd\u4e86\u5b9e\u73b0\u6700\u4f73\u7ed3\u679c\u6240\u9700\u7684\u65b9\u6cd5\u548c\u7ec6\u8282\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63a2\u8ba8\u4e86\u4f7f\u7528\u5f3a\u5316\u5b66\u4e60\u5fae\u8c03\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u65b9\u6cd5\uff0c\u63cf\u8ff0\u4e86\u9047\u5230\u7684\u6311\u6218\u4ee5\u53ca\u514b\u670d\u8fd9\u4e9b\u6311\u6218\u7684\u65b9\u5f0f\u3002\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86\u76f4\u63a5\u504f\u597d\u4f18\u5316\u65b9\u6cd5\u7684\u7ecf\u9a8c\uff0c\u8fd9\u79cd\u65b9\u6cd5\u5141\u8bb8\u6211\u4eec\u5c06\u5927\u578b\u8bed\u8a00\u6a21\u578b\u4e0e\u4eba\u7c7b\u504f\u597d\u5bf9\u9f50\uff0c\u800c\u65e0\u9700\u521b\u5efa\u5355\u72ec\u7684\u504f\u597d\u6a21\u578b\u3002\u4f5c\u4e3a\u6211\u4eec\u7684\u8d21\u732e\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u901a\u8fc7\u56f0\u60d1\u5ea6\u7b5b\u9009\u6536\u96c6\u504f\u597d\u6570\u636e\u96c6\u7684\u65b9\u6cd5\uff0c\u8fd9\u4f7f\u5f97\u4e3a\u7279\u5b9a\u8bed\u8a00\u6a21\u578b\u521b\u5efa\u8fd9\u6837\u7684\u6570\u636e\u96c6\u7684\u8fc7\u7a0b\u66f4\u52a0\u7b80\u4fbf\u4e14\u6210\u672c\u6548\u76ca\u66f4\u9ad8\u3002|\n", "2410.01784": "|**2024-10-02**|**OmniGenBench: Automating Large-scale in-silico Benchmarking for Genomic Foundation Models**|Heng Yang 
et.al.|[2410.01784](http://arxiv.org/abs/2410.01784)|**[link](https://github.com/yangheng95/OmniGenomeBench)**|**\u8fd1\u5e74\u6765\uff0c\u4eba\u5de5\u667a\u80fd\u9886\u57df\u7684\u8fdb\u6b65\uff0c\u7279\u522b\u662f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\uff0c\u6fc0\u53d1\u4e86\u5bf9\u57fa\u56e0\u7ec4\u57fa\u7840\u6a21\u578b\uff08GFMs\uff09\u7a81\u7834\u6027\u8fdb\u5c55\u7684\u671f\u5f85\u3002\u81ea\u751f\u547d\u8fdb\u5316\u4e4b\u521d\u5c31\u9690\u85cf\u5728\u591a\u6837\u5316\u7684\u57fa\u56e0\u7ec4\u4e2d\u7684\u201c\u81ea\u7136\u4e4b\u7801\u201d\uff0c\u8574\u542b\u7740\u5de8\u5927\u6f5c\u529b\uff0c\u80fd\u591f\u901a\u8fc7\u57fa\u56e0\u7ec4\u5efa\u6a21\u5bf9\u4eba\u7c7b\u548c\u751f\u6001\u7cfb\u7edf\u4ea7\u751f\u6df1\u8fdc\u5f71\u54cd\u3002\u8fd1\u671fGFM\u9886\u57df\u7684\u91cd\u8981\u7a81\u7834\uff0c\u5982Evo\uff0c\u5438\u5f15\u4e86\u5927\u91cf\u6295\u8d44\u4e0e\u5173\u6ce8\uff0c\u5b83\u4eec\u89e3\u51b3\u4e86\u957f\u671f\u5b58\u5728\u7684\u6311\u6218\uff0c\u5e76\u5c06\u57fa\u56e0\u7ec4\u7814\u7a76\u4ece\u624b\u52a8\u3001\u4e0d\u53ef\u9760\u548c\u4f4e\u6548\u7684\u4f20\u7edf\u6a21\u5f0f\u8f6c\u53d8\u4e3a\u81ea\u52a8\u5316\u3001\u53ef\u9760\u548c\u9ad8\u6548\u7684\u65b0\u8303\u5f0f\u3002\u5728\u57fa\u56e0\u7ec4\u5b66\u8fde\u7eed\u6280\u672f\u9769\u547d\u7684\u80cc\u666f\u4e0b\uff0cGFM\u7814\u7a76\u9762\u4e34\u4e24\u5927\u6311\u6218\uff1a\u7f3a\u4e4fGFM\u57fa\u51c6\u6d4b\u8bd5\u5de5\u5177\u4ee5\u53ca\u591a\u7ef4\u57fa\u56e0\u7ec4\u5b66\u7684\u5f00\u6e90\u8f6f\u4ef6\u7f3a\u5931\u3002\u8fd9\u4e9b\u6311\u6218\u963b\u788d\u4e86GFM\u5feb\u901f\u6f14\u8fdb\u53ca\u5176\u5e7f\u6cdb\u5e94\u7528\u4e8e\u7406\u89e3\u4e0e\u5408\u6210\u57fa\u56e0\u7ec4\u7b49\u6570\u5341\u5e74\u6765\u5b58\u5728\u7684\u95ee\u9898\u7684\u80fd\u529b\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u6311\u6218\uff0c\u6211\u4eec\u5f15\u5165\u4e86GFMBench\u6846\u67b6\uff0c\u4e00\u4e2a\u4e13\u6ce8\u4e8eGFM\u5bfc\u5411\u57fa\u51c6\u6d4b\u8bd5\u7684\u5e73\u53f0\u3002GFMBench\u6807\u51c6\u5316\u4e86\u5
7fa\u51c6\u5957\u4ef6\uff0c\u5e76\u5b9e\u73b0\u4e86\u5bf9\u5927\u91cf\u5f00\u6e90GFMs\u7684\u81ea\u52a8\u5316\u57fa\u51c6\u6d4b\u8bd5\u3002\u5b83\u96c6\u6210\u4e86\u6765\u81ea\u56db\u5927\u5927\u578b\u57fa\u51c6\u7684\u6570\u767e\u4e07\u4e2a\u57fa\u56e0\u5e8f\u5217\uff0c\u8986\u76d6\u6570\u767e\u79cd\u57fa\u56e0\u7ec4\u4efb\u52a1\uff0c\u4f7fGFMs\u6c11\u4e3b\u5316\uff0c\u9002\u7528\u4e8e\u5e7f\u6cdb\u7684\u865a\u62df\u57fa\u56e0\u7ec4\u5e94\u7528\u3002\u6b64\u5916\uff0cGFMBench\u4f5c\u4e3a\u5f00\u6e90\u8f6f\u4ef6\u53d1\u5e03\uff0c\u63d0\u4f9b\u7528\u6237\u53cb\u597d\u754c\u9762\u548c\u591a\u6837\u5316\u6559\u7a0b\uff0c\u9002\u7528\u4e8e\u81ea\u52a8\u6d4b\u8bd5\u4ee5\u53caRNA\u8bbe\u8ba1\u548c\u7ed3\u6784\u9884\u6d4b\u7b49\u590d\u6742\u4efb\u52a1\u3002\u4e3a\u4e86\u4fc3\u8fdb\u57fa\u56e0\u7ec4\u5efa\u6a21\u9886\u57df\u7684\u8fdb\u4e00\u6b65\u53d1\u5c55\uff0c\u6211\u4eec\u542f\u52a8\u4e86\u4e00\u4e2a\u516c\u5171\u6392\u884c\u699c\uff0c\u5c55\u793a\u7531AutoBench\u751f\u6210\u7684\u57fa\u51c6\u6027\u80fd\u3002GFMBench\u4ee3\u8868\u4e86\u6807\u51c6\u5316GFM\u57fa\u51c6\u6d4b\u8bd5\u548c\u6c11\u4e3b\u5316GFM\u5e94\u7528\u7684\u4e00\u5927\u6b65\u3002**|\n", "2410.01782": "|**2024-10-02**|**Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models**|Shayekh Bin Islam 
et.al.|[2410.01782](http://arxiv.org/abs/2410.01782)|**[link](https://github.com/ShayekhBinIslam/openrag)**|\u4e3a\u4e86\u63d0\u5347\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u4e8b\u5b9e\u51c6\u786e\u6027\u4e0a\u7684\u8868\u73b0\uff0c\u68c0\u7d22\u589e\u5f3a\u751f\u6210\uff08RAG\uff09\u65b9\u6cd5\u5df2\u7ecf\u5f97\u5230\u4e86\u5e7f\u6cdb\u7814\u7a76\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u65b9\u6cd5\u5f80\u5f80\u5728\u5229\u7528\u68c0\u7d22\u5230\u7684\u8bc1\u636e\u8fdb\u884c\u63a8\u7406\u7684\u80fd\u529b\u4e0a\u5b58\u5728\u5c40\u9650\u6027\uff0c\u5c24\u5176\u662f\u5728\u4f7f\u7528\u5f00\u6e90LLM\u65f6\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u5dee\u8ddd\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u6846\u67b6\u2014\u2014Open-RAG\uff0c\u65e8\u5728\u589e\u5f3a\u5f00\u6e90LLM\u5728RAG\u4e2d\u7684\u63a8\u7406\u80fd\u529b\u3002\u6211\u4eec\u7684\u6846\u67b6\u5c06\u4efb\u610f\u5bc6\u96c6\u578bLLM\u8f6c\u6362\u6210\u4e00\u4e2a\u53c2\u6570\u9ad8\u6548\u7684\u7a00\u758f\u6df7\u5408\u4e13\u5bb6\uff08MoE\uff09\u6a21\u578b\uff0c\u80fd\u591f\u5904\u7406\u5305\u62ec\u5355\u8df3\u548c\u591a\u8df3\u67e5\u8be2\u5728\u5185\u7684\u590d\u6742\u63a8\u7406\u4efb\u52a1\u3002 
Open-RAG\u7684\u72ec\u7279\u4e4b\u5904\u5728\u4e8e\uff0c\u5b83\u901a\u8fc7\u8bad\u7ec3\u6a21\u578b\u6765\u5e94\u5bf9\u770b\u4f3c\u76f8\u5173\u4f46\u5177\u6709\u8bef\u5bfc\u6027\u7684\u5e72\u6270\u9879\uff0c\u4ece\u800c\u6709\u6548\u5730\u5bfc\u822a\u590d\u6742\u573a\u666f\u3002\u901a\u8fc7\u5229\u7528\u6f5c\u5b66\u4e60\uff0cOpen-RAG\u52a8\u6001\u9009\u62e9\u76f8\u5173\u4e13\u5bb6\u5e76\u6574\u5408\u5916\u90e8\u77e5\u8bc6\uff0c\u4ee5\u63d0\u4f9b\u66f4\u51c6\u786e\u3001\u66f4\u5177\u4e0a\u4e0b\u6587\u7684\u76f8\u5173\u54cd\u5e94\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86\u4e00\u79cd\u6df7\u5408\u81ea\u9002\u5e94\u68c0\u7d22\u65b9\u6cd5\uff0c\u7528\u4e8e\u5224\u65ad\u68c0\u7d22\u7684\u5fc5\u8981\u6027\uff0c\u5e76\u5e73\u8861\u6027\u80fd\u589e\u76ca\u4e0e\u63a8\u7406\u901f\u5ea6\u4e4b\u95f4\u7684\u6743\u8861\u3002 \u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u57fa\u4e8eLlama2-7B\u7684Open-RAG\u5728\u5404\u79cd\u77e5\u8bc6\u5bc6\u96c6\u578b\u4efb\u52a1\u4e2d\uff0c\u76f8\u8f83\u4e8eChatGPT\u3001Self-RAG\u548cCommand R+\u7b49\u6700\u5148\u8fdb\u7684LLM\u548cRAG\u6a21\u578b\uff0c\u8868\u73b0\u51fa\u66f4\u4f18\u7684\u8868\u73b0\u3002\u6211\u4eec\u5df2\u5c06\u4ee3\u7801\u548c\u6a21\u578b\u5f00\u6e90\u5728https://openragmoe.github.io/\u3002|\n", "2410.01769": "|**2024-10-02**|**Quantifying Generalization Complexity for Large Language Models**|Zhenting Qi 
et.al.|[2410.01769](http://arxiv.org/abs/2410.01769)|null|While large language models (LLMs) show remarkable ability to understand complex queries and perform sophisticated tasks, their generalization is often deeply entangled with memorization, which calls for more precise evaluation. To address this challenge, we introduce Scylla, a dynamic evaluation framework that quantitatively measures the generalization abilities of LLMs. Scylla disentangles generalization from memorization by assessing model performance on both in-distribution (ID) and out-of-distribution (OOD) data across 20 tasks spanning 5 levels of complexity. Through extensive experiments, we uncover a non-monotonic relationship between task complexity and the performance gap between ID and OOD data, which we term the generalization valley. Specifically, this phenomenon reveals a critical threshold, referred to as critical complexity, at which reliance on non-generalizable behavior peaks, indicating the upper bound of LLMs' generalization capabilities. As model size increases, the critical complexity shifts toward higher levels of task complexity, suggesting that larger models can handle more complex reasoning tasks before relying on memorization. Using Scylla and the concept of critical complexity, we benchmark 28 LLMs, including open-source models such as the LLaMA and Qwen families and closed-source models such as Claude and GPT, providing a more robust evaluation and a clearer understanding of LLMs' generalization capabilities.|\n", "2410.01744": "|**2024-10-02**|**LEOPARD : A Vision Language Model For Text-Rich Multi-Image Tasks**|Mengzhao Jia et.al.|[2410.01744](http://arxiv.org/abs/2410.01744)|**[link](https://github.com/jill0001/leopard)**|Text-rich images, in which text serves as the central visual element guiding overall understanding, are prevalent in real-world applications such as presentation slides, scanned documents, and webpage snapshots. Tasks involving multiple text-rich images are especially challenging, as they require not only understanding the content of individual images but also reasoning about relationships and logical flow across multiple visual inputs. Despite the importance of these scenarios, current multimodal large language models (MLLMs) face two key challenges in handling such tasks: (1) the scarcity of high-quality instruction-tuning datasets for text-rich multi-image scenarios, and (2) the difficulty of balancing image resolution against visual feature sequence length. To address these challenges, we propose LEOPARD, an MLLM designed specifically for vision-language tasks involving multiple text-rich images.
First, we curated about one million high-quality multimodal instruction-tuning examples tailored to text-rich, multi-image scenarios. Second, we developed an adaptive high-resolution multi-image encoding module that dynamically optimizes the allocation of visual sequence length based on the original aspect ratios and resolutions of the input images. Across a wide range of benchmarks, our model shows superior capabilities in text-rich, multi-image evaluations and competitive performance in general-domain evaluations.|\n", "2410.01738": "|**2024-10-02**|**VitaGlyph: Vitalizing Artistic Typography with Flexible Dual-branch Diffusion Models**|Kailai Feng
et.al.|[2410.01738](http://arxiv.org/abs/2410.01738)|**[link](https://github.com/carlofkl/vitaglyph)**|**This paper introduces VitaGlyph, a dual-branch, training-free method for artistic typography generation. It aims to balance artistry with controllable geometric change, flexibly conveying the core concept of the input character and enriching it with relevant surroundings while keeping the glyph readable. The key idea of VitaGlyph is to treat the input character as a scene composed of a subject and its surroundings, and to render them under different degrees of geometric transformation. Specifically, VitaGlyph works through a three-phase framework: (i) a knowledge acquisition phase uses large language models to design text descriptions of the subject and the surroundings; (ii) a regional decomposition phase identifies the part that best matches the subject description and splits the input glyph image into subject and surrounding regions; (iii) a typography stylization phase first refines the structure of the subject region via semantic typography, then renders the textures of the subject and surrounding regions separately with controllable compositional generation.
Experimental results show that VitaGlyph not only achieves strong artistry and readability but can also depict multiple customized concepts, enabling more creative and pleasing artistic typography generation. The code will be made publicly available at https://github.com/Carlofkl/VitaGlyph.**|\n", "2410.02761": "|**2024-10-03**|**FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models**|Zhipei Xu et.al.|[2410.02761](http://arxiv.org/abs/2410.02761)|**[link](https://github.com/zhipeixu/fakeshield)**|The rapid development of generative AI is a double-edged sword: it facilitates content creation but also makes image manipulation easier and harder to detect. Current image forgery detection and localization (IFDL) methods, although effective to a degree, still face two main challenges: (1) their black-box nature, with unknown detection principles, and (2) limited generalization across diverse tampering methods (e.g., Photoshop, DeepFake, AIGC-Editing). To address these issues, we propose the explainable IFDL task and design FakeShield, a multimodal framework that evaluates image authenticity, generates masks of tampered regions, and provides a basis for its judgment grounded in pixel-level and image-level tampering clues.
In addition, we use GPT-4o to enhance existing IFDL datasets, creating the Multi-Modal Tamper Description dataSet (MMTD-Set) for training FakeShield's tampering analysis capabilities. We also introduce a Domain Tag-guided Explainable Forgery Detection Module (DTE-FDM) and a Multi-modal Forgery Localization Module (MFLM) to handle the interpretation of various types of tamper detection and to achieve forgery localization guided by detailed textual descriptions. Extensive experiments show that FakeShield effectively detects and localizes various tampering techniques, offering a more explainable and better-performing solution than previous IFDL methods.|\n", "2410.02757": "|**2024-10-03**|**Loong: Generating Minute-level Long Videos with Autoregressive Language Models**|Yuqing Wang
et.al.|[2410.02757](http://arxiv.org/abs/2410.02757)|null|Generating content-rich long videos on the scale of minutes is desirable but challenging. Autoregressive large language models (LLMs) have achieved great success in generating coherent, long token sequences in natural language processing, while explorations of autoregressive LLMs for video generation have been limited to short videos of a few seconds. This paper provides an in-depth analysis of the challenges that prevent autoregressive LLM-based video generators from producing long videos. Based on these observations and analysis, we propose Loong, a new autoregressive LLM-based video generator that can generate minute-long videos. Specifically, we model text tokens and video tokens as a unified sequence for the autoregressive LLM and train the model from scratch. We propose progressive short-to-long training with a loss re-weighting scheme to mitigate the loss imbalance problem in long-video training. We further investigate inference strategies, including video token re-encoding and sampling strategies, to reduce error accumulation during inference.
Loong can be trained on 10-second videos and extended to generate minute-level long videos conditioned on text prompts, as the results demonstrate. More samples are available at https://epiphqny.github.io/Loong-video.|\n", "2410.02755": "|**2024-10-03**|**SIEVE: General Purpose Data Filtering System Matching GPT-4o Accuracy at 1% the Cost**|Jifan Zhang et.al.|[2410.02755](http://arxiv.org/abs/2410.02755)|null|This paper proposes SIEVE, a lightweight alternative that matches GPT-4o's accuracy at a fraction of the cost of filtering with GPT-4o. SIEVE integrates GPT-4o with lightweight T5 models and uses active learning to fine-tune T5 with a small number of GPT-4o calls. Once trained, SIEVE performs on par with GPT-4o at a much lower cost (about 1%). We validate SIEVE's effectiveness and efficiency on the OpenWebText dataset with five highly customized filtering tasks targeting high-quality and domain-specific content.
Further validation shows that SIEVE and GPT-4o reach similar accuracy, with human evaluators preferring SIEVE's filtering results over GPT-4o's.|\n", "2410.02749": "|**2024-10-03**|**Training Language Models on Synthetic Edit Sequences Improves Code Synthesis**|Ulyana Piterbarg et.al.|[2410.02749](http://arxiv.org/abs/2410.02749)|null|This paper develops LintSeq, a synthetic data generation algorithm. LintSeq refactors existing code into sequences of code edits by using a linter to procedurally sample sequences of insertions that introduce no errors. The edit sequences are output as consecutive program diffs.
To test LintSeq, we use it to refactor a dataset of instruction + program pairs into instruction + program-diff-sequence pairs. We then instruction-finetune a series of smaller language models, from 2.6B to 14B parameters, on both the original and the refactored versions of the dataset, and compare zero-shot performance on code synthesis benchmarks. During repeated sampling, models finetuned on edit sequences produce more diverse programs than the baselines. This yields better inference-time scaling of benchmark coverage as a function of the number of attempts k, that is, the fraction of problems solved by any of k attempts (pass@k). For example, on HumanEval pass@50, small models finetuned on synthetic edit sequences are competitive with GPT-4 and outperform models finetuned on the baseline dataset by 20% (\u00b13%) in absolute score.
Finally, we also pretrain our own tiny models for code understanding. Finetuning these tiny models on synthetic code edits yields state-of-the-art code synthesis for the on-device model class. Our 150M-parameter edit sequence model matches or outperforms code models with twice as many parameters, with or without repeated sampling, including Codex and AlphaCode.|\n", "2410.02748": "|**2024-10-03**|**CriSPO: Multi-Aspect Critique-Suggestion-guided Automatic Prompt Optimization for Text Generation**|Han He et.al.|[2410.02748](http://arxiv.org/abs/2410.02748)|null|This paper explores augmenting summarization prompts with salient information extracted from the source document. We show that adding keyphrases to prompts improves ROUGE
F1 and recall, making the generated summaries more similar to the reference and more complete. The number of keyphrases can control the precision-recall trade-off. Further analysis shows that incorporating phrase-level salient information is superior to word- or sentence-level information. However, the impact on hallucination is not positive across all large language models (LLMs). To conduct this analysis, we introduce the lightweight model Keyphrase Signal Extractor (CriSPO), which can be fine-tuned to extract salient keyphrases. Using CriSPO, we achieve consistent ROUGE improvements across datasets and across open-source and proprietary LLMs without any LLM customization. Our findings provide insights into leveraging salient information when building prompt-based summarization systems.|\n", "2410.02746": "|**2024-10-03**|**Contrastive Localized Language-Image Pre-Training**|Hong-You Chen
et.al.|[2410.02746](http://arxiv.org/abs/2410.02746)|null|Contrastive Language-Image Pre-training (CLIP) has succeeded as a basis for vision-language foundation models, optimizing the vision encoder by aligning web text annotations at the image level. However, this strategy can be insufficient for downstream tasks that need fine-grained vision representations, especially when multimodal large language models (MLLMs) require region-level understanding. This paper proposes Contrastive Localized Language-Image Pre-training (CLOC), which improves CLIP's localization capability by complementing it with a region-text contrastive loss and modules. We formulate a new concept, promptable embeddings, in which the encoder produces image embeddings that are easy to transform into region representations given spatial hints. To support large-scale pre-training, we design a visually-enriched and spatially-localized captioning framework that effectively generates region-text pseudo-labels at scale. By scaling up to billions of annotated images, CLOC enables high-quality regional embeddings for image region recognition and retrieval tasks, and it can serve as a drop-in replacement for CLIP to enhance MLLMs, especially on referring and grounding tasks.|\n", "2410.02744": "|**2024-10-03**|**Neutral residues: revisiting adapters for model extension**|Franck Signe Talla et.al.|[2410.02744](http://arxiv.org/abs/2410.02744)|null|We address the problem of extending a pretrained large language model to a new domain that was not seen at training time, such as adding a language for which the original model has seen no or little training data. Popular solutions like fine-tuning or low-rank adaptation succeed at domain adaptation, but they do not actually add any extra capacity and they degrade performance in the original domain. This paper analyzes the problem from three angles: data, architecture, and training procedure, which are advantageously considered jointly. In particular, we improve adapters and make it possible to learn an entire new language while ensuring that the output of the neural network is almost unchanged in the original domain. To this end, we modify the new residual blocks so that each of them outputs near-zeros in the original domain.
This solution of neutral residues, which borrows architectural components from mixture of experts, is effective: with only 20% extra learnable weights compared to an original model trained on English, we get results that are significantly better than competing approaches (fine-tuning, low-rank adaptation, or vanilla adapters) in terms of the trade-off between learning a new language and not forgetting English.|\n", "2410.02743": "|**2024-10-03**|**MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions**|Yekun Chai et.al.|[2410.02743](http://arxiv.org/abs/2410.02743)|null|Reinforcement learning from human feedback (RLHF) has proven effective at aligning large language models (LLMs) with human preferences. However, token-level RLHF suffers from the credit assignment problem over long sequences, where delayed rewards make it hard for the model to determine which actions contributed to a successful outcome; this hinders learning efficiency and slows convergence. In this paper, we propose MA-RLHF, a simple yet effective RLHF framework that incorporates macro actions (sequences of tokens or higher-level language constructs) into the learning process. By operating at a higher level of abstraction, our approach reduces the temporal distance between actions and rewards, enabling faster and more accurate credit assignment.
This leads to more stable policy gradient estimates and better learning efficiency within each episode, all without increasing computational complexity during training or inference. We validate our approach through extensive experiments across model sizes and tasks, including text summarization, dialogue generation, question answering, and program synthesis. Our method achieves performance gains of up to 30% on text summarization and code generation, 18% on dialogue, and 8% on question answering. Notably, it reaches parity with standard RLHF 1.7x to 2x faster in terms of training time and continues to outperform it with further training. We will make our code and data publicly available at https://github.com/ernie-research/MA-RLHF.|\n", "2410.02742": "|**2024-10-03**|**Grounding Large Language Models In Embodied Environment With Imperfect World Models**|Haolan Liu
et.al.|[2410.02742](http://arxiv.org/abs/2410.02742)|null|Despite the widespread success of large language models (LLMs) across applications, they often stumble on basic physical reasoning or robotics tasks, mainly because they lack direct experience with the physical nuances of the real world. To address these issues, we propose Grounding Large language model with Imperfect world MOdel (GLIMO), which uses proxy world models such as simulators to collect and synthesize training data. GLIMO incorporates an LLM-based data generator that automatically creates high-quality and diverse instruction datasets. The generator includes an iterative self-refining module for temporally consistent experience sampling, a diverse set of question-answering instruction seeds, and a retrieval-augmented generation module for reflecting on prior experiences. Comprehensive experiments show that our approach improves the performance of strong open-source LLMs such as LLaMA-3, with performance boosts of 2.04x, 1.54x, and 1.82x across three different benchmarks, respectively. The resulting performance can match or surpass that of larger counterparts such as GPT-4.|\n", "2410.02741": "|**2024-10-03**|**Salient
Information Prompting to Steer Content in Prompt-based Abstractive Summarization**|Lei Xu et.al.|[2410.02741](http://arxiv.org/abs/2410.02741)|null|This paper explores augmenting summarization prompts with salient information extracted from the source document to improve summarization with large language models (LLMs). We show that adding keyphrases to prompts improves ROUGE F1 and recall, making the generated summaries more similar to the reference and more complete. The number of keyphrases can be adjusted to control the precision-recall trade-off. Further analysis reveals that incorporating phrase-level salient information is superior to word- or sentence-level strategies. However, the impact on hallucination is not universally positive across LLMs. To conduct this analysis, we introduce Keyphrase Signal Extractor (SigExt), a lightweight model that can be fine-tuned to extract salient keyphrases. Using SigExt, we achieve consistent ROUGE improvements across datasets and across open-weight and proprietary LLMs without any LLM customization. Our findings provide insights into leveraging salient information when building prompt-based summarization systems.|\n", "2410.03663": "|**2024-10-04**|**Enhance Reasoning by
Learning from Mistakes: Peer-Review Knowledge Distillation from Multiple Large Language Models**|Zhuochun Li et.al.|[2410.03663](http://arxiv.org/abs/2410.03663)|null|This paper introduces Mistake-Aware Peer-Review Distillation (MAPD), an approach that aims to improve the knowledge distillation (KD) of smaller open-source models, a process that typically relies on large commercial language models as teachers. Unlike prior work that trains only on gold rationales generated by a single teacher, MAPD takes a more fine-grained approach: 1. **Personalized mistake feedback**: rather than only asking teachers to provide correct rationales for the student's answers, MAPD has teachers identify and explain the student's mistakes, yielding customized instruction data. 2.
**Simulated peer review**: through a simulated peer-review process among teacher LLMs, MAPD keeps only the generated rationales that pass an acceptance threshold. This reduces the chance that a teacher guesses the answer but gives a flawed rationale, and so improves the quality of the instruction data. Comprehensive experiments and analysis on mathematical, commonsense, and logical reasoning tasks demonstrate the effectiveness of the method.|\n", "2410.03658": "|**2024-10-04**|**RAFT: Realistic Attacks to Fool Text Detectors**|James Wang et.al.|[2410.03658](http://arxiv.org/abs/2410.03658)|**[link](https://github.com/jameslwang/raft)**|This paper presents RAFT, a grammar error-free black-box attack against existing LLM detectors. In contrast to previous attacks on language models, RAFT exploits the transferability of LLM embeddings at the word level while preserving the original text quality. Using an auxiliary embedding, RAFT greedily selects candidate words to perturb against the target detector. Experiments show that RAFT compromises all detectors in the study across various domains by up to 99% and is transferable across source models. Manual human evaluation shows that the attacks are realistic and hard to distinguish from original human-written text. We also show that examples generated by RAFT can be used to train more robust detectors. Our work shows that current LLM detectors are not adversarially robust, underscoring the urgent need for more resilient detection mechanisms.|\n", "2410.03642": "|**2024-10-04**|**Aligning LLMs with Individual Preferences via Interaction**|Shujin Wu et.al.|[2410.03642](http://arxiv.org/abs/2410.03642)|**[link](https://github.com/shujinwu-0814/aloe)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5c55\u73b0\u51fa\u65e5\u76ca\u5148\u8fdb\u7684\u80fd\u529b\uff0c\u786e\u4fdd\u5b83\u4eec\u7684\u884c\u4e3a\u4e0e\u4eba\u7c7b\u4ef7\u503c\u89c2\u548c\u504f\u597d\u4fdd\u6301\u4e00\u81f4\u5bf9\u4e8e\u5e7f\u6cdb\u91c7\u7528\u8fd9\u4e9b\u6a21\u578b\u53d8\u5f97\u81f3\u5173\u91cd\u8981\u3002\u5c3d\u7ba1\u5148\u524d\u7684\u7814\u7a76\u4e3b\u8981\u96c6\u4e2d\u5728\u9075\u5faa\u8bf8\u5982\u5e2e\u52a9\u6027\u3001\u65e0\u5bb3\u6027\u548c\u8bda\u5b9e\u6027\u7b49\u4e00\u822c\u539f\u5219\u4e0a\uff0c\u4f46\u5ffd\u89c6\u4e86\u8003\u8651\u5230\u4e2a\u4eba\u548c\u591a\u6837\u6027\u504f\u597d\u7684\u9700\u6c42\uff0c\u8fd9\u53ef\u80fd\u524a\u5f31\u4e86\u4e2a\u6027\u5316\u7684\u4eba\u7c7b\u4f53\u9a8c\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u6211\u4eec\u8bad\u7ec3\u4e86\u4e00\u79cd\u80fd\u591f\u201c\u4ea4\u4e92\u4ee5\u5bf9\u9f50\u201d\u7684LLMs\uff0c\u5373\u8ba9LLMs\u53d1\u5c55\u51fa\u4e00\u79cd\u9690\u5f0f\u63a8\u65ad\u5f53\u524d\u7528\u6237\u672a\u660e\u786e\u8868\u8fbe\u7684\u4e2a\u6027\u5316\u504f\u597d\u7684\u5143\u6280\u80fd\uff0c\u5e76\u636e\u6b64\u52a8\u6001\u8c03\u6574\u540e\u7eed\
u884c\u4e3a\u548c\u54cd\u5e94\u4ee5\u9002\u5e94\u8fd9\u4e9b\u63a8\u65ad\u7684\u504f\u597d\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5305\u62ec\u5efa\u7acb\u4e00\u4e2a\u75313,310\u4e2a\u4e0d\u540c\u7528\u6237\u4eba\u8bbe\u7ec4\u6210\u7684\u591a\u6837\u5316\u6c60\uff0c\u901a\u8fc7\u521d\u59cb\u793a\u4f8b\u521b\u5efa\uff0c\u7136\u540e\u901a\u8fc7\u8fed\u4ee3\u81ea\u6211\u751f\u6210\u548c\u7b5b\u9009\u8fdb\u884c\u6269\u5c55\u3002\u5728\u4e0d\u540c\u7528\u6237\u4eba\u8bbe\u7684\u6307\u5bfc\u4e0b\uff0c\u6211\u4eec\u5229\u7528\u591aLLM\u534f\u4f5c\u5f00\u53d1\u4e86\u4e00\u4e2a\u5305\u542b3K+\u591a\u8f6e\u5bf9\u8bdd\u7684\u6811\u5f62\u7ed3\u6784\u591a\u8f6e\u504f\u597d\u6570\u636e\u96c6\u3002\u6700\u540e\uff0c\u6211\u4eec\u4f7f\u7528\u76d1\u7763\u5fae\u8c03\u548c\u5f3a\u5316\u5b66\u4e60\u5bf9\u6570\u636e\u96c6\u8fdb\u884c\u4e86\u589e\u5f3a\uff0c\u4ee5\u63d0\u9ad8LLMs\u7684\u80fd\u529b\u3002\u4e3a\u4e86\u8bc4\u4f30\uff0c\u6211\u4eec\u5efa\u7acb\u4e86ALOE\uff08ALign With CustOmized PrEferences\uff09\u57fa\u51c6\uff0c\u5305\u542b100\u4e2a\u7cbe\u5fc3\u6311\u9009\u7684\u4f8b\u5b50\u4ee5\u53ca\u7528\u4e8e\u8861\u91cf\u5bf9\u8bdd\u4e2d\u4e2a\u6027\u5316\u5bf9\u9f50\u6027\u80fd\u7684\u9002\u5f53\u5ea6\u91cf\u6807\u51c6\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u5728\u901a\u8fc7\u4e92\u52a8\u5b9e\u73b0\u52a8\u6001\u3001\u4e2a\u6027\u5316\u7684\u5bf9\u9f50\u65b9\u9762\u975e\u5e38\u6709\u6548\u3002**|\n", "2410.03613": "|**2024-10-04**|**Large Language Model Performance Benchmarking on Mobile Platforms: A Thorough Evaluation**|Jie Xiao 
et.al.|[2410.03613](http://arxiv.org/abs/2410.03613)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u6211\u4eec\u5de5\u4f5c\u548c\u65e5\u5e38\u751f\u6d3b\u7684\u5404\u4e2a\u65b9\u9762\u65e5\u76ca\u666e\u53ca\uff0c\u5bf9\u7528\u6237\u9690\u79c1\u7684\u5173\u6ce8\u63a8\u52a8\u4e86\u8fd9\u4e9b\u6a21\u578b\u672c\u5730\u90e8\u7f72\u7684\u8d8b\u52bf\u3002\u5b58\u5728\u4e00\u4e9b\u8f7b\u91cf\u7ea7LLM\uff08\u4f8b\u5982Gemini Nano\uff0cLLAMA2 7B\uff09\uff0c\u5b83\u4eec\u53ef\u4ee5\u5728\u667a\u80fd\u624b\u673a\u4e0a\u672c\u5730\u8fd0\u884c\uff0c\u4e3a\u7528\u6237\u63d0\u4f9b\u5bf9\u5176\u4e2a\u4eba\u6570\u636e\u7684\u66f4\u5927\u63a7\u5236\u6743\u3002\u4f5c\u4e3a\u4e00\u9879\u8fc5\u901f\u53d1\u5c55\u7684\u5e94\u7528\uff0c\u6211\u4eec\u5173\u6ce8\u5b83\u4eec\u5728\u5546\u7528\u79fb\u52a8\u8bbe\u5907\u4e0a\u7684\u6027\u80fd\u3002 \u4e3a\u4e86\u5168\u9762\u4e86\u89e3LLM\u5728\u79fb\u52a8\u5e73\u53f0\u4e0a\u7684\u90e8\u7f72\u73b0\u72b6\uff0c\u6211\u4eec\u8fdb\u884c\u4e86\u4e00\u9879\u5168\u9762\u7684\u6d4b\u91cf\u7814\u7a76\u3002\u6211\u4eec\u8bc4\u4f30\u4e86\u5f71\u54cd\u7528\u6237\u4f53\u9a8c\u7684\u6307\u6807\uff0c\u5305\u62ec\u4ee4\u724c\u541e\u5410\u91cf\u3001\u5ef6\u8fdf\u548c\u7535\u6c60\u6d88\u8017\uff0c\u4ee5\u53ca\u5bf9\u5f00\u53d1\u8005\u81f3\u5173\u91cd\u8981\u7684\u56e0\u7d20\uff0c\u5982\u8d44\u6e90\u5229\u7528\u3001\u52a8\u6001\u7535\u538b\u9891\u7387\u7f29\u653e\u7b56\u7565\u548c\u63a8\u7406\u5f15\u64ce\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8be6\u7ec6\u5206\u6790\u4e86\u786c\u4ef6\u80fd\u529b\u548c\u7cfb\u7edf\u52a8\u529b\u5b66\u5982\u4f55\u5f71\u54cd\u672c\u5730\u8bbe\u5907\u4e0a\u7684LLM\u6027\u80fd\uff0c\u8fd9\u53ef\u80fd\u6709\u52a9\u4e8e\u5f00\u53d1\u8005\u8bc6\u522b\u5e76\u89e3\u51b3\u79fb\u52a8LLM\u5e94\u7528\u7a0b\u5e8f\u4e2d\u7684\u74f6\u9888\u3002\u6211\u4eec\u8fd8\u63d0\u4f9b\u4e86\u9488\u5bf9\u4e3b\u8981\u4f9b\u5e94\u5546\u7684\u79fb\u52a8\u7cfb\u7edf\u7ea7\u82af\u7247\uff08SoC\uff09\u7684\u5168\u9762\u6bd4\u8f83\uff0c\u7a81\
u51fa\u4e86\u5b83\u4eec\u5728\u5904\u7406LLM\u5de5\u4f5c\u8d1f\u8f7d\u65f6\u7684\u6027\u80fd\u5dee\u5f02\u3002\u6211\u4eec\u5e0c\u671b\u8fd9\u9879\u7814\u7a76\u80fd\u591f\u4e3a\u672c\u5730\u8bbe\u5907LLM\u7684\u5f00\u53d1\u548c\u672a\u6765\u79fb\u52a8\u7cfb\u7edf\u67b6\u6784\u7684\u8bbe\u8ba1\u63d0\u4f9b\u6d1e\u5bdf\u3002|\n", "2410.03608": "|**2024-10-04**|**TICKing All the Boxes: Generated Checklists Improve LLM Evaluation and Generation**|Jonathan Cook et.al.|[2410.03608](http://arxiv.org/abs/2410.03608)|null|\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u5e7f\u6cdb\u5e94\u7528\u80cc\u666f\u4e0b\uff0c\u6784\u5efa\u7075\u6d3b\u4e14\u53ef\u89e3\u91ca\u7684\u8bc4\u4f30\u5176\u9075\u5faa\u6307\u4ee4\u80fd\u529b\u7684\u65b9\u6cd5\u81f3\u5173\u91cd\u8981\u3002\u76ee\u524d\uff0c\u504f\u597d\u5224\u65ad\u6210\u4e3a\u4e86\u8bc4\u4f30\u6807\u51c6\u7684\u9ed8\u8ba4\u9009\u62e9\uff0c\u5c3d\u7ba1\u8fd9\u79cd\u505a\u6cd5\u7b80\u5316\u4e86\u590d\u6742\u3001\u591a\u7ef4\u504f\u597d\u7684\u63d0\u70bc\uff0c\u5c06\u5176\u5f52\u7ed3\u4e3a\u5355\u4e00\u6392\u540d\u3002\u7136\u800c\uff0c\u968f\u7740\u4eba\u5de5\u6ce8\u91ca\u7684\u7f13\u6162\u548c\u6210\u672c\u9ad8\u6602\uff0cLLM\u88ab\u8d8a\u6765\u8d8a\u591a\u5730\u7528\u4e8e\u505a\u51fa\u8fd9\u4e9b\u5224\u65ad\uff0c\u8fd9\u727a\u7272\u4e86\u53ef\u9760\u6027\u548c\u53ef\u89e3\u91ca\u6027\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86TICK\uff08\u9488\u5bf9\u7279\u5b9a\u6307\u4ee4\u7684\u7ed3\u6784\u5316\u8bc4\u4f30\u4e0e\u6838\u67e5\u6e05\u5355\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u5168\u81ea\u52a8\u5316\u3001\u53ef\u89e3\u91ca\u7684\u8bc4\u4f30\u65b9\u6848\uff0c\u901a\u8fc7LLM\u751f\u6210\u7684\u3001\u9488\u5bf9\u6307\u4ee4\u7684\u6838\u67e5\u6e05\u5355\u7ed3\u6784\u5316\u8bc4\u4f30\u3002 
\u9996\u5148\uff0c\u6211\u4eec\u5c55\u793a\u4e86\uff0c\u5728\u7ed9\u5b9a\u6307\u4ee4\u7684\u60c5\u51b5\u4e0b\uff0cLLM\u80fd\u591f\u53ef\u9760\u5730\u4ea7\u751f\u9ad8\u8d28\u91cf\u3001\u5b9a\u5236\u5316\u7684\u8bc4\u4f30\u6838\u67e5\u6e05\u5355\uff0c\u5c06\u6307\u4ee4\u5206\u89e3\u4e3a\u4e00\u7cfb\u5217\u662f/\u5426\u95ee\u9898\u3002\u6bcf\u4e2a\u95ee\u9898\u8be2\u95ee\u5019\u9009\u56de\u5e94\u662f\u5426\u6ee1\u8db3\u6307\u4ee4\u7684\u5177\u4f53\u8981\u6c42\u3002\u6211\u4eec\u8bc1\u660e\u4f7f\u7528TICK\u80fd\u591f\u663e\u8457\u63d0\u9ad8LLM\u5224\u65ad\u4e0e\u4eba\u7c7b\u504f\u597d\u4e4b\u95f4\u7cbe\u786e\u4e00\u81f4\u6027\u7684\u9891\u7387\uff0c\u76f8\u6bd4\u76f4\u63a5\u7531LLM\u8bc4\u5206\u8f93\u51fa\uff0c\u8fd9\u4e00\u6bd4\u4f8b\u4ece46.4%\u63d0\u5347\u81f352.2%\u3002 \u63a5\u7740\uff0c\u6211\u4eec\u5c55\u793a\u4e86STICK\uff08\u81ea\u6211TICK\uff09\u53ef\u4ee5\u5229\u7528\u81ea\u6211\u7ec6\u5316\u548c\u6700\u4f73\u4e2d\u7684N\u9009\u62e9\u6765\u6539\u5584\u591a\u4e2a\u57fa\u51c6\u7684\u751f\u6210\u8d28\u91cf\u3002\u5bf9LiveBench\u63a8\u7406\u4efb\u52a1\u8fdb\u884cSTICK\u81ea\u6211\u7ec6\u5316\uff0c\u5b9e\u73b0\u4e86\u7edd\u5bf9\u589e\u76ca+7.8%\uff0c\u800c\u4f7f\u7528STICK\u8fdb\u884c\u6700\u4f73\u4e2d\u7684N\u9009\u62e9\u5728\u771f\u5b9e\u4e16\u754c\u6307\u4ee4\u6570\u636e\u96c6WildBench\u4e0a\u83b7\u5f97\u4e86+6.3%\u7684\u7edd\u5bf9\u6539\u8fdb\u3002\u8fd9\u8868\u660e\uff0c\u7ed3\u6784\u5316\u7684\u3001\u591a\u7ef4\u5ea6\u7684\u81ea\u6211\u6539\u8fdb\u662f\u8fdb\u4e00\u6b65\u63d0\u5347LLM\u80fd\u529b\u7684\u4e00\u4e2a\u6709\u524d\u666f\u7684\u65b9\u5411\u3002 \u6700\u540e\uff0c\u901a\u8fc7\u5411\u76f4\u63a5\u4e3aWildBench\u6307\u4ee4\u8bc4\u4f30LLM\u54cd\u5e94\u7684\u4eba\u7c7b\u8bc4\u4f30\u8005\u63d0\u4f9bLLM\u751f\u6210\u7684\u6838\u67e5\u6e05\u5355\uff0c\u6211\u4eec\u663e\u8457\u63d0\u9ad8\u4e86\u8bc4\u4f30\u8005\u4e4b\u95f4\u7684\u5171\u8bc6\u5ea6\uff08\u4ece0.194\u63d0\u5347\u81f30.256\uff09\u3002|\n", "2410.03600": "|**2024-10-04**|**Efficiently 
Identifying Watermarked Segments in Mixed-Source Texts**|Xuandong Zhao et.al.|[2410.03600](http://arxiv.org/abs/2410.03600)|null|\u6587\u672c\u6c34\u5370\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4e2d\u7684\u5e94\u7528\u65e5\u76ca\u589e\u957f\uff0c\u7528\u4e8e\u68c0\u6d4b\u5408\u6210\u6587\u672c\uff0c\u4ee5\u7f13\u89e3\u865a\u5047\u65b0\u95fb\u548c\u5b66\u672f\u4e0d\u8bda\u5b9e\u7b49\u6ee5\u7528\u60c5\u51b5\u3002\u73b0\u6709\u6c34\u5370\u68c0\u6d4b\u6280\u672f\u4e3b\u8981\u5173\u6ce8\u4e8e\u5bf9\u6574\u4e2a\u6587\u6863\u8fdb\u884c\u5206\u7c7b\uff0c\u5224\u65ad\u5176\u662f\u5426\u88ab\u6c34\u5370\u6807\u8bb0\uff0c\u4f46\u5f80\u5f80\u5ffd\u7565\u4e86\u5728\u66f4\u957f\u7684\u6df7\u5408\u6765\u6e90\u6587\u6863\u4e2d\u8bc6\u522b\u5355\u72ec\u6c34\u5370\u6bb5\u843d\u7684\u5e38\u89c1\u573a\u666f\u3002\u53d7\u5230\u6284\u88ad\u68c0\u6d4b\u7cfb\u7edf\u7684\u542f\u53d1\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e24\u79cd\u65b0\u578b\u65b9\u6cd5\u8fdb\u884c\u90e8\u5206\u6c34\u5370\u68c0\u6d4b\u3002\u9996\u5148\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u51e0\u4f55\u8986\u76d6\u68c0\u6d4b\u6846\u67b6\uff0c\u65e8\u5728\u786e\u5b9a\u957f\u6587\u672c\u4e2d\u662f\u5426\u5b58\u5728\u6c34\u5370\u6bb5\u843d\u3002\u5176\u6b21\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u4e2a\u81ea\u9002\u5e94\u5728\u7ebf\u5b66\u4e60\u7b97\u6cd5\uff0c\u4ee5\u51c6\u786e\u5b9a\u4f4d\u6587\u672c\u4e2d\u7684\u6c34\u5370\u6bb5\u843d\u4f4d\u7f6e\u3002\u5728\u4e09\u79cd\u6d41\u884c\u7684\u6c34\u5370\u6280\u672f\uff08KGW-Watermark\u3001Unigram-Watermark \u548c Gumbel-Watermark\uff09\u4e0a\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u6211\u4eec\u7684\u65b9\u6cd5\u53d6\u5f97\u4e86\u9ad8\u7cbe\u5ea6\uff0c\u5e76\u663e\u8457\u4f18\u4e8e\u57fa\u7ebf\u65b9\u6cd5\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u6846\u67b6\u5177\u6709\u9002\u5e94\u5176\u4ed6\u6c34\u5370\u6280\u672f\u7684\u80fd\u529b\uff0c\u63d0\u4f9b\u4e86\u7cbe\u786e\u6c34\u5370\u68c0\u6d4b\u7684\u65b0\u89c1\u89e3\u3002|\n", "2410.03595": 
"|**2024-10-04**|**Understanding Reasoning in Chain-of-Thought from the Hopfieldian View**|Lijie Hu et.al.|[2410.03595](http://arxiv.org/abs/2410.03595)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\u5728\u5404\u7c7b\u4efb\u52a1\u4e2d\u5c55\u73b0\u51fa\u975e\u51e1\u80fd\u529b\uff0c\u94fe\u5f0f\u601d\u8003\uff08Chain-of-Thought, CoT\uff09\u63d0\u793a\u4f5c\u4e3a\u4e00\u79cd\u63d0\u5347\u63a8\u7406\u80fd\u529b\u7684\u5173\u952e\u6280\u672f\u9010\u6e10\u53d7\u5230\u5173\u6ce8\u3002\u7136\u800c\uff0c\u73b0\u6709\u7814\u7a76\u4e3b\u8981\u96c6\u4e2d\u5728\u63d0\u9ad8\u6027\u80fd\u65b9\u9762\uff0c\u7f3a\u4e4f\u5bf9CoT\u6210\u529f\u80cc\u540e\u6839\u672c\u56e0\u7d20\u7684\u5168\u9762\u89e3\u91ca\u6846\u67b6\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u57fa\u4e8e\u8ba4\u77e5\u795e\u7ecf\u79d1\u5b66\u4e2d\u7684\u970d\u666e\u83f2\u5c14\u5fb7\u8ba4\u77e5\u89c2\u7684\u65b0\u89c6\u89d2\u3002\u6211\u4eec\u5efa\u7acb\u4e86\u4e00\u4e2a\u94fe\u63a5CoT\u63a8\u7406\u4e0e\u523a\u6fc0\u3001\u52a8\u4f5c\u3001\u795e\u7ecf\u7fa4\u4f53\u548c\u8868\u793a\u7a7a\u95f4\u7b49\u5173\u952e\u8ba4\u77e5\u5143\u7d20\u4e4b\u95f4\u7684\u5173\u7cfb\u6846\u67b6\u3002\u4ece\u8fd9\u4e00\u89c6\u89d2\u51fa\u53d1\uff0c\u6211\u4eec\u53ef\u4ee5\u7406\u89e3\u63a8\u7406\u8fc7\u7a0b\u5b9e\u8d28\u4e0a\u662f\u8fd9\u4e9b\u8868\u793a\u7a7a\u95f4\u4e4b\u95f4\u7684\u79fb\u52a8\u3002 \u57fa\u4e8e\u6b64\u6d1e\u5bdf\uff0c\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u65b9\u6cd5\u6765\u5b9a\u4f4dCoT\u54cd\u5e94\u4e2d\u7684\u63a8\u7406\u9519\u8bef\u3002\u6b64\u5916\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u4e2a\u540d\u4e3a\u201c\u601d\u8003\u7684\u8868\u793a\u201d\uff08Representation-of-Thought, 
RoT\uff09\u7684\u6846\u67b6\uff0c\u5229\u7528\u4f4e\u7ef4\u8868\u793a\u7a7a\u95f4\u7684\u9c81\u68d2\u6027\u6765\u589e\u5f3aCoT\u63a8\u7406\u8fc7\u7a0b\u7684\u9c81\u68d2\u6027\u548c\u53ef\u89e3\u91ca\u6027\uff0c\u5e76\u63d0\u4f9b\u4e86\u5bf9\u63a8\u7406\u8fc7\u7a0b\u8fdb\u884c\u7cbe\u7ec6\u63a7\u5236\u7684\u80fd\u529b\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cRoT\u4e0d\u4ec5\u63d0\u9ad8\u4e86CoT\u63a8\u7406\u7684\u9c81\u68d2\u6027\u548c\u53ef\u89e3\u91ca\u6027\uff0c\u800c\u4e14\u63d0\u4f9b\u4e86\u5bf9\u63a8\u7406\u8fc7\u7a0b\u8fdb\u884c\u7cbe\u7ec6\u5316\u63a7\u5236\u7684\u53ef\u80fd\u6027\u3002|\n", "2410.03577": "|**2024-10-04**|**Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models**|Xin Zou et.al.|[2410.03577](http://arxiv.org/abs/2410.03577)|**[link](https://github.com/1zhou-Wang/MemVR)**|\u5c3d\u7ba1\u5927\u578b\u591a\u6a21\u6001\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u5177\u6709\u4ee4\u4eba\u5370\u8c61\u6df1\u523b\u7684\u6027\u80fd\uff0c\u4f46\u5b83\u4eec\u5bb9\u6613\u51fa\u73b0\u5e7b\u89c9\uff0c\u7279\u522b\u662f\u5728\u89c6\u89c9\u8f93\u5165\u4e2d\u4e0d\u5b58\u5728\u5173\u952e\u7ec6\u8282\u65f6\uff0c\u4f1a\u5938\u5f20\u5730\u7f16\u9020\u5185\u5bb9\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u9075\u5faa\u4e86\u4eba\u7c7b\u8ba4\u77e5\u8fc7\u7a0b\u4e2d\u7684\u4e00\u4e2a\u5e38\u89c1\u6b65\u9aa4\u2014\u2014\u5f53\u5bf9\u73b0\u573a\u5173\u952e\u7ec6\u8282\u7684\u8bb0\u5fc6\u9010\u6e10\u6a21\u7cca\u65f6\uff0c\u76f4\u89c2\u7684\u505a\u6cd5\u662f\u518d\u6b21\u67e5\u770b\u8fd9\u4e9b\u7ec6\u8282\u4ee5\u5bfb\u6c42\u51c6\u786e\u548c\u771f\u5b9e\u7684\u4fe1\u606f\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u540d\u4e3a\u201c\u8bb0\u5fc6\u7a7a\u95f4\u89c6\u89c9\u91cd\u8bfb\u201d\uff08MemVR\uff09\u7684\u65b0\u578b\u5e7b\u89c9\u7f13\u89e3\u8303\u5f0f\uff0c\u5b83\u65e0\u9700\u5916\u90e8\u77e5\u8bc6\u68c0\u7d22\u6216\u989d\u5916\u7684\u5fae\u8c03\u300
2\u5177\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u5c06\u89c6\u89c9\u63d0\u793a\u4f5c\u4e3a\u8865\u5145\u8bc1\u636e\uff0c\u901a\u8fc7\u524d\u9988\u7f51\u7edc\uff08FFN\uff09\u6ce8\u5165\u5230MLLMs\u4e2d\u4f5c\u4e3a\u952e\u503c\u8bb0\u5fc6\uff0c\u5f53\u6a21\u578b\u5bf9\u95ee\u9898\u76f8\u5173\u7684\u89c6\u89c9\u8bb0\u5fc6\u4e0d\u786e\u5b9a\u751a\u81f3\u9057\u5fd8\u65f6\u3002\u5168\u9762\u7684\u5b9e\u9a8c\u8bc4\u4f30\u8868\u660e\uff0cMemVR\u5728\u5404\u79cdMLLMs\u4e0a\u663e\u8457\u7f13\u89e3\u4e86\u5e7b\u89c9\u95ee\u9898\uff0c\u5e76\u4e14\u5728\u4e0d\u589e\u52a0\u65f6\u95f4\u5f00\u9500\u7684\u60c5\u51b5\u4e0b\uff0c\u5728\u901a\u7528\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u8868\u73b0\u51fa\u8272\uff0c\u4ece\u800c\u7a81\u663e\u51fa\u5176\u5e7f\u6cdb\u9002\u7528\u6027\u7684\u6f5c\u529b\u3002|\n", "2410.03568": "|**2024-10-04**|**Towards Linguistically-Aware and Language-Independent Tokenization for Large Language Models (LLMs)**|Abrar Rahman et.al.|[2410.03568](http://arxiv.org/abs/2410.03568)|null|\u672c\u6587\u5bf9\u5f53\u524d\u9876\u7ea7\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u91c7\u7528\u7684\u5206\u8bcd\u6280\u672f\u8fdb\u884c\u4e86\u5168\u9762\u7814\u7a76\uff0c\u5e76\u63a2\u8ba8\u4e86\u8fd9\u4e9b\u6280\u672f\u5728\u4e0d\u540c\u8bed\u8a00\u5c24\u5176\u662f\u8d44\u6e90\u532e\u4e4f\u8bed\u8a00\u670d\u52a1\u6210\u672c\u4e0e\u53ef\u7528\u6027\u65b9\u9762\u7684\u6f5c\u5728\u5f71\u54cd\u3002\u7814\u7a76\u8003\u8651\u4e86\u591a\u79cdLLMs\uff0c\u5305\u62ec\u4f7f\u7528cl100k_base\u5d4c\u5165\u7684GPT-4\u3001\u4f7f\u7528p50k_base\u5d4c\u5165\u7684GPT-3\u4ee5\u53ca\u4f7f\u7528r50k_base\u5d4c\u5165\u7684DaVinci\uff0c\u540c\u65f6\u5bf9\u6bd4\u4e86\u5e7f\u6cdb\u4f7f\u7528\u7684BERT\u57fa\u7840\u5206\u8bcd\u5668\u3002\u7814\u7a76\u5206\u6790\u4e86\u8fd9\u4e9b\u6a21\u578b\u4e4b\u95f4\u7684\u5206\u8bcd\u5dee\u5f02\uff0c\u5e76\u6df1\u5165\u63a2\u7a76\u4e86\u5b50\u8bcd\u5206\u8bcd\u5728\u8bed\u8a00\u8868\u793a\u4e0a\u7684\u6311\u6218\u3002 
\u7814\u7a76\u5f3a\u8c03\u4e86\u57f9\u517b\u8bed\u8a00\u610f\u8bc6\u5f00\u53d1\u5b9e\u8df5\u7684\u91cd\u8981\u6027\uff0c\u7279\u522b\u662f\u9488\u5bf9\u90a3\u4e9b\u4f20\u7edf\u4e0a\u8d44\u6e90\u4e0d\u8db3\u7684\u8bed\u8a00\u3002\u6b64\u5916\uff0c\u672c\u6587\u8fd8\u901a\u8fc7\u6848\u4f8b\u7814\u7a76\u5c55\u793a\u4e86\u5206\u8bcd\u9009\u62e9\u5728\u5b9e\u9645\u5e94\u7528\u4e2d\u7684\u5f71\u54cd\uff0c\u7279\u522b\u662f\u5728\u7535\u5b50\u5065\u5eb7\u8bb0\u5f55\uff08EHR\uff09\u7cfb\u7edf\u4e2d\u7684\u5e94\u7528\u3002\u7814\u7a76\u65e8\u5728\u4fc3\u8fdbAI\u670d\u52a1\u9886\u57df\uff0c\u7279\u522b\u662f\u8de8\u8bed\u8a00\u73af\u5883\u4e2d\u7684\u901a\u7528\u5316\u56fd\u9645\u5316\uff08I18N\uff09\u5b9e\u8df5\uff0c\u7279\u522b\u5173\u6ce8\u88ab\u73b0\u6709AI\u5e94\u7528\u4e25\u91cd\u5ffd\u89c6\u7684\u8bed\u8a00\u7684\u5305\u5bb9\u6027\u53d1\u5c55\u3002|\n", "2410.03553": "|**2024-10-04**|**Structure-Enhanced Protein Instruction Tuning: Towards General-Purpose Protein Understanding**|Wei Wu et.al.|[2410.03553](http://arxiv.org/abs/2410.03553)|null|\u86cb\u767d\u8d28\u4f5c\u4e3a\u751f\u7269\u5206\u5b50\u7684\u6838\u5fc3\uff0c\u5728\u751f\u7269\u8fc7\u7a0b\u4e2d\u626e\u6f14\u7740\u5173\u952e\u89d2\u8272\uff0c\u5305\u62ec\u4ee3\u8c22\u53cd\u5e94\u548cDNA\u590d\u5236\u3002\u51c6\u786e\u9884\u6d4b\u5b83\u4eec\u7684\u6027\u8d28\u548c\u529f\u80fd\u5bf9\u751f\u7269\u5e94\u7528\u81f3\u5173\u91cd\u8981\u3002\u6700\u8fd1\u5f00\u53d1\u7684\u86cb\u767d\u8d28\u8bed\u8a00\u6a21\u578b\uff08pLMs\uff09\u901a\u8fc7\u76d1\u7763\u5fae\u8c03\u63d0\u4f9b\u4e86\u89e3\u51b3\u95ee\u9898\u7684\u6709\u5e0c\u671b\u7684\u65b9\u6cd5\u3002\u7136\u800c\uff0c\u5fae\u8c03\u7684\u6a21\u578b\u4ec5\u9488\u5bf9\u7279\u5b9a\u4e0b\u6e38\u9884\u6d4b\u4efb\u52a1\u8fdb\u884c\u5b9a\u5236\uff0c\u5b9e\u73b0\u901a\u7528\u7684\u86cb\u767d\u8d28\u7406\u89e3\u4ecd\u7136\u662f\u4e00\u4e2a\u6311\u6218\u3002\u4e3a\u6b64\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u7ed3\u6784\u589e\u5f3a\u7684\u86cb\u767d\u8d28\u6307\u4ee4\u8c03\u8c10\
uff08SEPIT\uff09\u6846\u67b6\u6765\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5728pLMs\u4e2d\u96c6\u6210\u4e86\u4e00\u4e2a\u65b0\u9896\u7684\u7ed3\u6784\u611f\u77e5\u6a21\u5757\uff0c\u4ee5\u63d0\u4f9b\u6709\u5173\u7ed3\u6784\u7684\u77e5\u8bc6\uff0c\u5e76\u5c06\u8fd9\u4e9b\u589e\u5f3a\u7684pLMs\u4e0e\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u8fde\u63a5\u8d77\u6765\uff0c\u4ee5\u751f\u6210\u86cb\u767d\u8d28\u7684\u7406\u89e3\u3002\u5728\u8fd9\u4e2a\u6846\u67b6\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u4e2a\u65b0\u9896\u7684\u4e24\u9636\u6bb5\u6307\u4ee4\u8c03\u8c10\u7ba1\u9053\uff0c\u9996\u5148\u901a\u8fc7\u57fa\u4e8e\u56fe\u6807\u7684\u6307\u4ee4\u5efa\u7acb\u86cb\u767d\u8d28\u7684\u57fa\u672c\u7406\u89e3\uff0c\u7136\u540e\u4f7f\u7528\u4e13\u5bb6\u6df7\u5408\uff08MoEs\uff09\u5b66\u4e60\u66f4\u590d\u6742\u5c5e\u6027\u548c\u529f\u80fd\u4fe1\u606f\uff0c\u540c\u65f6\u4fdd\u6301\u6fc0\u6d3b\u53c2\u6570\u7684\u6570\u91cf\u76f8\u540c\u3002\u6b64\u5916\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u8fc4\u4eca\u4e3a\u6b62\u6700\u5927\u7684\u6700\u5168\u9762\u7684\u86cb\u767d\u8d28\u6307\u4ee4\u6570\u636e\u96c6\uff0c\u8fd9\u4f7f\u6211\u4eec\u80fd\u591f\u8bad\u7ec3\u548c\u8bc4\u4f30\u901a\u7528\u7684\u86cb\u767d\u8d28\u7406\u89e3\u6a21\u578b\u3002\u5e7f\u6cdb\u7684\u7ecf\u9a8c\u7ed3\u679c\u5728\u5f00\u653e\u5f0f\u751f\u6210\u548c\u5c01\u95ed\u96c6\u5408\u7b54\u6848\u4efb\u52a1\u4e0a\u663e\u793a\u4e86SEPIT\u76f8\u5bf9\u4e8e\u95ed\u6e90\u901a\u7528LLM\u548c\u4f7f\u7528\u86cb\u767d\u8d28\u77e5\u8bc6\u8bad\u7ec3\u7684\u5f00\u6e90LLM\u7684\u4f18\u8d8a\u6027\u80fd\u3002|\n", "2410.05269": "|**2024-10-07**|**Data Advisor: Dynamic Data Curation for Safety Alignment of Large Language Models**|Fei Wang 
et.al.|[2410.05269](http://arxiv.org/abs/2410.05269)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4e2d\u7684\u6570\u636e\u662f\u5173\u952e\u8981\u7d20\u3002\u8fd1\u671f\u7814\u7a76\u63a2\u7d22\u4e86\u5229\u7528LLM\u8fdb\u884c\u9ad8\u6548\u6570\u636e\u6536\u96c6\u7684\u65b9\u6cd5\u3002\u7136\u800c\uff0c\u7531LLM\u751f\u6210\u7684\u6570\u636e\u5f80\u5f80\u5b58\u5728\u8d28\u91cf\u53c2\u5dee\u4e0d\u9f50\u3001\u67d0\u4e9b\u65b9\u9762\u88ab\u4f4e\u4f30\u6216\u7f3a\u5931\u4ee5\u53ca\u6570\u636e\u70b9\u8d28\u91cf\u4f4e\u4e0b\u7684\u95ee\u9898\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201c\u6570\u636e\u987e\u95ee\u201d\u7684\u589e\u5f3a\u578bLLM\u6570\u636e\u751f\u6210\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u80fd\u591f\u8003\u8651\u76ee\u6807\u6570\u636e\u96c6\u7684\u7279\u6027\uff0c\u4ece\u9884\u5b9a\u4e49\u7684\u539f\u5219\u51fa\u53d1\uff0c\u76d1\u63a7\u751f\u6210\u6570\u636e\u7684\u72b6\u6001\uff0c\u8bc6\u522b\u5f53\u524d\u6570\u636e\u96c6\u7684\u5f31\u70b9\uff0c\u5e76\u636e\u6b64\u6307\u5bfc\u6570\u636e\u751f\u6210\u7684\u4e0b\u4e00\u8f6e\u8fed\u4ee3\u3002\u6570\u636e\u987e\u95ee\u53ef\u4ee5\u8f7b\u677e\u5730\u96c6\u6210\u5230\u73b0\u6709\u7684\u6570\u636e\u751f\u6210\u65b9\u6cd5\u4e2d\uff0c\u4ee5\u63d0\u9ad8\u6570\u636e\u8d28\u91cf\u548c\u8986\u76d6\u9762\u3002 \u5728\u5bf9\u4e09\u4e2a\u4ee3\u8868\u6027LLM\uff08\u5373Mistral\u3001Llama2\u548cFalcon\uff09\u7684\u5b89\u5168\u5bf9\u9f50\u8fdb\u884c\u7684\u5b9e\u9a8c\u4e2d\uff0c\u6570\u636e\u987e\u95ee\u8bc1\u660e\u4e86\u5176\u5728\u4e0d\u727a\u7272\u6a21\u578b\u5b9e\u7528\u6027\u7684\u60c5\u51b5\u4e0b\uff0c\u6709\u6548\u63d0\u5347\u6a21\u578b\u5bf9\u5404\u79cd\u7cbe\u7ec6\u7c92\u5ea6\u5b89\u5168\u95ee\u9898\u7684\u9002\u5e94\u6027\u7684\u80fd\u529b\u3002|\n", "2410.05265": "|**2024-10-07**|**PrefixQuant: Static Quantization Beats Dynamic through Prefixed Outliers in LLMs**|Mengzhao Chen 
et.al.|[2410.05265](http://arxiv.org/abs/2410.05265)|**[link](https://github.com/chenmnz/prefixquant)**|**\u91cf\u5316\u5bf9\u4e8e\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u90e8\u7f72\u81f3\u5173\u91cd\u8981\uff0c\u5b83\u80fd\u663e\u8457\u63d0\u5347\u5185\u5b58\u6548\u7387\u4e0e\u63a8\u7406\u901f\u5ea6\u3002\u73b0\u6709\u7684\u6fc0\u6d3b\u91cf\u5316\u65b9\u6cd5\u4e3b\u8981\u9488\u5bf9\u901a\u9053\u7ea7\u5f02\u5e38\u503c\u8fdb\u884c\u5904\u7406\uff0c\u5f80\u5f80\u5ffd\u7565\u4e86\u4ee4\u724c\u7ea7\u7684\u5f02\u5e38\u503c\uff0c\u8fd9\u5bfc\u81f4\u4e86\u5bf9\u6210\u672c\u9ad8\u6602\u7684\u9010\u4ee4\u724c\u52a8\u6001\u91cf\u5316\u4f9d\u8d56\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aPrefixQuant\u7684\u65b0\u9896\u6280\u672f\uff0c\u8be5\u6280\u672f\u5728\u4e0d\u91cd\u65b0\u8bad\u7ec3\u7684\u60c5\u51b5\u4e0b\u79bb\u7ebf\u8bc6\u522b\u51fa\u9ad8\u9891\u5f02\u5e38\u4ee4\u724c\uff0c\u5e76\u5c06\u5176\u4f5c\u4e3a\u524d\u7f00\u653e\u5165KV\u7f13\u5b58\u4e2d\uff0c\u4ee5\u9632\u6b62\u63a8\u7406\u8fc7\u7a0b\u4e2d\u751f\u6210\u5f02\u5e38\u4ee4\u724c\uff0c\u5e76\u7b80\u5316\u4e86\u91cf\u5316\u8fc7\u7a0b\u3002\u636e\u6211\u4eec\u6240\u77e5\uff0cPrefixQuant\u662f\u9996\u4e2a\u80fd\u591f\u5b9e\u73b0\u9ad8\u6548\u9010\u5f20\u91cf\u9759\u6001\u91cf\u5316\u5e76\u8d85\u8d8a\u6602\u8d35\u7684\u9010\u4ee4\u724c\u52a8\u6001\u91cf\u5316\u7684\u65b9\u6cd5\u3002\u4f8b\u5982\uff0c\u5728W4A4KV4\uff08\u6743\u91cd4\u4f4d\u3001\u6fc0\u6d3b4\u4f4d\u3001KV\u7f13\u5b584\u4f4d\uff09\u7684Llama-3-8B\u6a21\u578b\u4e2d\uff0c\u4f7f\u7528PrefixQuant\u548c\u9010\u5f20\u91cf\u9759\u6001\u91cf\u5316\u540e\uff0cWikiText2\u7684\u56f0\u60d1\u5ea6\u964d\u4f4e\u4e867.43\u4e2a\u70b9\uff0c\u5e73\u5747\u51c6\u786e\u7387\u57285\u4e2a\u5e38\u8bc6\u63a8\u7406\u4efb\u52a1\u4e0a\u63d0\u9ad8\u4e8671.08%\uff0c\u76f8\u8f83\u4e8e\u4e4b\u524d\u7684\u9010\u4ee4\u724c\u52a8\u6001\u91cf\u5316\u65b9\u6cd5QuaRot\uff0c\u5206\u522b\u5728\u56f0\u60d1\u5ea6\u4e0a\u63d0\u5347\u4e860.98\u4e2a\u70b9\uf
f0c\u5728\u51c6\u786e\u7387\u4e0a\u63d0\u5347\u4e865.98\u4e2a\u70b9\u3002\u6b64\u5916\uff0c\u4f7f\u7528PrefixQuant\u91cf\u5316\u540e\u7684\u6a21\u578b\u7684\u63a8\u7406\u901f\u5ea6\u76f8\u8f83\u4e8eFP16\u6a21\u578b\u63d0\u5347\u4e861.60\u500d\u52302.81\u500d\uff0c\u4e14\u8d85\u8fc7\u4e86QuaRot\u6a21\u578b1.2\u500d\u52301.3\u500d\u3002\u6211\u4eec\u7684\u4ee3\u7801\u5df2\u5f00\u6e90\u4e8e https://github.com/ChenMnZ/PrefixQuant \u3002**|\n", "2410.05262": "|**2024-10-07**|**TurtleBench: Evaluating Top Language Models via Real-World Yes/No Puzzles**|Qingchen Yu et.al.|[2410.05262](http://arxiv.org/abs/2410.05262)|**[link](https://github.com/mazzzystar/TurtleBench)**|**\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u5e94\u7528\u8303\u56f4\u4e0d\u65ad\u6269\u5927\uff0c\u5bf9\u53ef\u9760\u8bc4\u4f30\u7684\u9700\u6c42\u4e5f\u5728\u589e\u52a0\u3002\u73b0\u6709\u7684LLM\u8bc4\u4f30\u57fa\u51c6\u4e3b\u8981\u4f9d\u8d56\u9759\u6001\u6570\u636e\u96c6\uff0c\u8fd9\u4f7f\u5f97\u8bc4\u4f30\u6a21\u578b\u5728\u4e0e\u7528\u6237\u52a8\u6001\u4ea4\u4e92\u65f6\u7684\u8868\u73b0\u53d8\u5f97\u5177\u6709\u6311\u6218\u6027\u3002\u6b64\u5916\uff0c\u8fd9\u4e9b\u57fa\u51c6\u5f80\u5f80\u9700\u8981\u7279\u5b9a\u80cc\u666f\u77e5\u8bc6\uff0c\u4ece\u800c\u590d\u6742\u5316\u4e86\u8861\u91cf\u6a21\u578b\u903b\u8f91\u63a8\u7406\u80fd\u529b\u7684\u6d4b\u91cf\u3002\u57fa\u4e8e\u5f3a\u5927\u6a21\u578b\u6216\u4eba\u5de5\u52aa\u529b\u7684\u5176\u4ed6\u52a8\u6001\u8bc4\u4f30\u65b9\u6cd5\u53ef\u80fd\u4f1a\u5f15\u5165\u504f\u89c1\uff0c\u5e76\u4e14\u6210\u672c\u548c\u65f6\u95f4\u9700\u6c42\u9ad8\uff0c\u8fd9\u963b\u788d\u4e86\u5927\u89c4\u6a21\u5e94\u7528\u3002 \u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86TurtleBench\u3002TurtleBench\u4ece\u6211\u4eec\u5f00\u53d1\u7684\u5728\u7ebfTurtle Soup
Puzzle\u5e73\u53f0\u6536\u96c6\u771f\u5b9e\u7684\u7528\u6237\u731c\u6d4b\uff0c\u8fd9\u79cd\u65b9\u6cd5\u5141\u8bb8\u751f\u6210\u76f8\u5bf9\u52a8\u6001\u7684\u8bc4\u4f30\u6570\u636e\u96c6\uff0c\u53ef\u4ee5\u964d\u4f4e\u6a21\u578b\u4f5c\u5f0a\u7684\u98ce\u9669\uff0c\u540c\u65f6\u4f7f\u8bc4\u4f30\u66f4\u8d34\u8fd1\u5b9e\u9645\u7528\u6237\u7684\u63a8\u7406\u9700\u6c42\uff0c\u4ece\u800c\u63d0\u9ad8\u8bc4\u4f30\u7684\u53ef\u9760\u6027\u3002TurtleBench\u5305\u542b\u4e861,532\u4e2a\u7528\u6237\u731c\u6d4b\u53ca\u5176\u6b63\u786e\u6027\u7684\u6ce8\u91ca\u4fe1\u606f\u3002\u5229\u7528\u8fd9\u4e2a\u6570\u636e\u96c6\uff0c\u6211\u4eec\u5168\u9762\u8bc4\u4f30\u4e86\u5f53\u524d\u6700\u5148\u8fdb\u7684\u4e5d\u4e2aLLM\u6a21\u578b\u3002\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0cOpenAI o1\u7cfb\u5217\u6a21\u578b\u5728\u8fd9\u4e9b\u8bc4\u4f30\u4e2d\u5e76\u672a\u53d6\u5f97\u9886\u5148\u5730\u4f4d\u3002 \u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u4e9b\u8fdb\u4e00\u6b65\u7814\u7a76\u7684\u5047\u8bbe\uff0c\u4f8b\u5982\u201co1\u7684\u6f5c\u5728\u63a8\u7406\u4f7f\u7528\u4e86\u7b80\u5355\u7684\u94fe\u5f0f\u601d\u8003\uff08CoT\uff09\u6280\u672f\u201d\u548c\u201c\u589e\u52a0CoT\u957f\u5ea6\u4e0d\u4ec5\u63d0\u4f9b\u4e86\u63a8\u7406\u76ca\u5904\uff0c\u540c\u65f6\u4e5f\u5e26\u6765\u4e86\u566a\u97f3\u6210\u672c\u201d\u3002**|\n", "2410.05258": "|**2024-10-07**|**Differential Transformer**|Tianzhu Ye et.al.|[2410.05258](http://arxiv.org/abs/2410.05258)|null|\u5728\u672c\u6587\u4e2d\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u5dee\u5f02\u53d8\u6362\u5668\uff08Diff 
Transformer\uff09\uff0c\u5b83\u80fd\u591f\u589e\u5f3a\u5bf9\u76f8\u5173\u4e0a\u4e0b\u6587\u7684\u6ce8\u610f\u529b\u540c\u65f6\u6d88\u9664\u566a\u97f3\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u5dee\u5f02\u6ce8\u610f\u529b\u673a\u5236\u901a\u8fc7\u8ba1\u7b97\u4e24\u4e2a\u72ec\u7acb\u7684softmax\u6ce8\u610f\u529b\u6620\u5c04\u4e4b\u95f4\u7684\u5dee\u503c\u6765\u786e\u5b9a\u6ce8\u610f\u529b\u5206\u6570\u3002\u8fd9\u79cd\u51cf\u6cd5\u64cd\u4f5c\u53ef\u4ee5\u6d88\u9664\u566a\u97f3\u5e76\u4fc3\u8fdb\u7a00\u758f\u6ce8\u610f\u529b\u6a21\u5f0f\u7684\u4ea7\u751f\u3002\u5728\u8bed\u8a00\u5efa\u6a21\u4efb\u52a1\u4e0a\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u4e0e\u6807\u51c6\u7684\u53d8\u6362\u5668\u76f8\u6bd4\uff0c\u5dee\u5f02\u53d8\u6362\u5668\u5728\u6a21\u578b\u5927\u5c0f\u548c\u8bad\u7ec3\u6837\u672c\u91cf\u7684\u6269\u5c55\u4e0a\u5747\u8868\u73b0\u51fa\u8272\u3002\u66f4\u4ee4\u4eba\u5174\u594b\u7684\u662f\uff0c\u5728\u5b9e\u9645\u5e94\u7528\u4e2d\uff0c\u5982\u957f\u4e0a\u4e0b\u6587\u5efa\u6a21\u3001\u5173\u952e\u4fe1\u606f\u68c0\u7d22\u3001\u5e7b\u89c9\u6291\u5236\u3001\u4e0a\u4e0b\u6587\u5185\u5b66\u4e60\u4ee5\u53ca\u6fc0\u6d3b\u5f02\u5e38\u51cf\u5c11\u7b49\u65b9\u9762\uff0c\u5dee\u5f02\u53d8\u6362\u5668\u90fd\u5c55\u73b0\u51fa\u663e\u8457\u4f18\u52bf\u3002\u7531\u4e8e\u5bf9\u65e0\u5173\u4e0a\u4e0b\u6587\u7684\u5173\u6ce8\u8f83\u5c11\uff0c\u5dee\u5f02\u53d8\u6362\u5668\u80fd\u591f\u6709\u6548\u7f13\u89e3\u95ee\u7b54\u548c\u6587\u672c\u6458\u8981\u4e2d\u7684\u5e7b\u89c9\u95ee\u9898\u3002\u5728\u4e0a\u4e0b\u6587\u5185\u5b66\u4e60\u65b9\u9762\uff0c\u5dee\u5f02\u53d8\u6362\u5668\u4e0d\u4ec5\u63d0\u9ad8\u4e86\u51c6\u786e\u7387\uff0c\u800c\u4e14\u5bf9\u4e8e\u987a\u5e8f\u6392\u5217\u66f4\u4e3a\u9c81\u68d2\uff0c\u8fd9\u88ab\u8ba4\u4e3a\u662f\u957f\u671f\u7684\u7a33\u5065\u6027\u95ee\u9898\u3002\u8fd9\u4e9b\u7ed3\u679c\u786e\u7acb\u4e86\u5dee\u5f02\u53d8\u6362\u5668\u4f5c\u4e3a\u63a8\u52a8\u5927\u578b\u8bed\u8a00\u6a21\u578b\u53d1\u5c55\u7684\u9ad8\u6548\u4e14\u6709\u524d\u666f\u
67b6\u6784\u7684\u5730\u4f4d\u3002|\n", "2410.05254": "|**2024-10-07**|**GLEE: A Unified Framework and Benchmark for Language-based Economic Environments**|Eilam Shapira et.al.|[2410.05254](http://arxiv.org/abs/2410.05254)|**[link](https://github.com/eilamshapira/GLEE)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u7ecf\u6d4e\u4e0e\u6218\u7565\u4e92\u52a8\u9886\u57df\u5c55\u73b0\u51fa\u5de8\u5927\u6f5c\u529b\uff0c\u56e0\u4e3a\u8fd9\u4e9b\u9886\u57df\u901a\u5e38\u4ee5\u81ea\u7136\u8bed\u8a00\u6c9f\u901a\u4e3a\u4e3b\u3002\u8fd9\u5f15\u53d1\u4e86\u4e00\u7cfb\u5217\u5173\u952e\u95ee\u9898\uff1aLLMs\u662f\u5426\u8868\u73b0\u51fa\u7406\u6027\u884c\u4e3a\uff1f\u5b83\u4eec\u80fd\u5426\u6a21\u4eff\u4eba\u7c7b\u884c\u4e3a\uff1f\u5b83\u4eec\u662f\u5426\u503e\u5411\u4e8e\u8fbe\u5230\u9ad8\u6548\u548c\u516c\u5e73\u7684\u7ed3\u679c\uff1f\u81ea\u7136\u8bed\u8a00\u5728\u7b56\u7565\u4e92\u52a8\u4e2d\u7684\u89d2\u8272\u662f\u4ec0\u4e48\uff1f\u7ecf\u6d4e\u73af\u5883\u7684\u7279\u6027\u5982\u4f55\u5f71\u54cd\u8fd9\u4e9b\u52a8\u6001\uff1f\u8fd9\u4e9b\u95ee\u9898\u5bf9\u4e8e\u5c06\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u96c6\u6210\u5230\u73b0\u5b9e\u4e16\u754c\u7684\u6570\u636e\u9a71\u52a8\u7cfb\u7edf\uff08\u5982\u5728\u7ebf\u96f6\u552e\u5e73\u53f0\u548c\u63a8\u8350\u7cfb\u7edf\uff09\u4e2d\u65f6\u7684\u7ecf\u6d4e\u548c\u793e\u4f1a\u5f71\u54cd\u81f3\u5173\u91cd\u8981\u3002\u5c3d\u7ba1\u673a\u5668\u5b66\u4e60\u793e\u533a\u4e00\u76f4\u5728\u63a2\u7d22LLMs\u5728\u591a\u4ee3\u7406\u8bbe\u7f6e\u4e2d\u7684\u6f5c\u529b\uff0c\u4f46\u4e0d\u540c\u7814\u7a76\u4e4b\u95f4\u7684\u5047\u8bbe\u3001\u8bbe\u8ba1\u9009\u62e9\u548c\u8bc4\u4f30\u6807\u51c6\u5dee\u5f02\u4f7f\u5f97\u5f88\u96be\u5f97\u51fa\u7a33\u5065\u4e14\u6709\u610f\u4e49\u7684\u7ed3\u8bba\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u6807\u51c6\u5316\u7814\u7a76\u57fa\u4e8e\u53cc\u4eba\u3001\u5e8f\u5217\u3001\u8bed\u8a00\u9a71\u52a8\u6e38\u620f\u7684\u6807\u51c6\u6846\u67b6\u3002\
u53d7\u7ecf\u6d4e\u5b66\u6587\u732e\u542f\u53d1\uff0c\u6211\u4eec\u5b9a\u4e49\u4e86\u4e09\u4e2a\u57fa\u672c\u6e38\u620f\u5bb6\u65cf\uff0c\u5177\u6709\u4e00\u81f4\u7684\u53c2\u6570\u5316\u3001\u81ea\u7531\u5ea6\u548c\u7528\u4e8e\u8bc4\u4f30\u4ee3\u7406\u6027\u80fd\uff08\u81ea\u6211\u6536\u76ca\uff09\u4ee5\u53ca\u6e38\u620f\u7ed3\u679c\uff08\u6548\u7387\u548c\u516c\u5e73\u6027\uff09\u7684\u7ecf\u6d4e\u6307\u6807\u3002 \u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u4e2a\u5f00\u6e90\u6846\u67b6\u6765\u6a21\u62df\u4ea4\u4e92\u548c\u5206\u6790\uff0c\u5e76\u5229\u7528\u5b83\u6536\u96c6\u4e86LLM\u5bf9LLM\u4ea4\u4e92\u7684\u5927\u91cf\u6570\u636e\u96c6\u4ee5\u53ca\u989d\u5916\u7684\u4eba\u7c7b\u5bf9LLM\u4ea4\u4e92\u6570\u636e\u96c6\u3002\u901a\u8fc7\u5e7f\u6cdb\u7684\u5b9e\u9a8c\uff0c\u6211\u4eec\u5c55\u793a\u4e86\u6211\u4eec\u7684\u6846\u67b6\u548c\u6570\u636e\u96c6\u5982\u4f55\u88ab\u7528\u6765\uff1a (i) \u6bd4\u8f83\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u4e0e\u4eba\u7c7b\u73a9\u5bb6\u5728\u5404\u79cd\u7ecf\u6d4e\u80cc\u666f\u4e0b\u7684\u884c\u4e3a\uff1b (ii) \u4ece\u4e2a\u4f53\u548c\u96c6\u4f53\u5c42\u9762\u8bc4\u4f30\u4ee3\u7406\u7684\u6027\u80fd\uff1b (iii) \u5b9a\u91cf\u5206\u6790\u7ecf\u6d4e\u73af\u5883\u7279\u6027\u5bf9\u4ee3\u7406\u884c\u4e3a\u7684\u5f71\u54cd\u3002**|\n", "2410.05252": "|**2024-10-07**|**Causal Micro-Narratives**|Mourad Heddaya 
et.al.|[2410.05252](http://arxiv.org/abs/2410.05252)|null|\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\u6765\u5bf9\u6587\u672c\u4e2d\u7684\u56e0\u679c\u5fae\u53d9\u4e8b\u8fdb\u884c\u5206\u7c7b\u3002\u8fd9\u4e9b\u53d9\u4e8b\u662f\u5173\u4e8e\u76ee\u6807\u4e3b\u4f53\u7684\u56e0\u679c\u89e3\u91ca\u7684\u53e5\u5b50\u7ea7\u63cf\u8ff0\u3002\u8be5\u65b9\u6cd5\u4ec5\u9700\u8981\u9488\u5bf9\u7279\u5b9a\u4e3b\u9898\u7684\u56e0\u679c\u548c\u6548\u679c\u7684\u672c\u4f53\uff0c\u6211\u4eec\u901a\u8fc7\u5e94\u7528\u5230\u901a\u8d27\u81a8\u80c0\u53d9\u4e8b\u4e2d\u8fdb\u884c\u4e86\u793a\u8303\u3002\u5229\u7528\u8986\u76d6\u7f8e\u56fd\u5386\u53f2\u548c\u5f53\u4ee3\u65b0\u95fb\u6587\u7ae0\u7684\u4eba\u5de5\u6807\u6ce8\u6570\u636e\u96c6\u8fdb\u884c\u8bad\u7ec3\uff0c\u6211\u4eec\u5728\u591a\u6807\u7b7e\u5206\u7c7b\u4efb\u52a1\u4e0a\u8bc4\u4f30\u4e86\u51e0\u79cd\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u3002\u8868\u73b0\u6700\u597d\u7684\u6a21\u578b\u2014\u2014\u5fae\u8c03\u540e\u7684Llama 3.1 8B\uff0c\u5728\u53d9\u4e8b\u68c0\u6d4b\u4e0a\u8fbe\u5230F1\u5f97\u5206\u4e3a0.87\uff0c\u5728\u53d9\u4e8b\u5206\u7c7b\u4e0a\u8fbe\u5230F1\u5f97\u5206\u4e3a0.71\u3002\u5168\u9762\u7684\u9519\u8bef\u5206\u6790\u63ed\u793a\u4e86\u8bed\u4e49\u6b67\u4e49\u5e26\u6765\u7684\u6311\u6218\uff0c\u5e76\u6307\u51fa\u6a21\u578b\u9519\u8bef\u5f80\u5f80\u53cd\u6620\u4e86\u4eba\u5de5\u6ce8\u91ca\u8005\u7684\u5206\u6b67\u3002\u8fd9\u9879\u7814\u7a76\u5efa\u7acb\u4e86\u4e00\u4e2a\u4ece\u5b9e\u9645\u6570\u636e\u4e2d\u63d0\u53d6\u56e0\u679c\u5fae\u53d9\u4e8b\u7684\u6846\u67b6\uff0c\u5177\u6709\u5e7f\u6cdb\u7684\u793e\u4f1a\u79d1\u5b66\u7814\u7a76\u5e94\u7528\u524d\u666f\u3002|\n", "2410.05248": "|**2024-10-07**|**SFTMix: Elevating Language Model Instruction Tuning with Mixup Recipe**|Yuxin Xiao 
et.al.|[2410.05248](http://arxiv.org/abs/2410.05248)|null|\u4e3a\u4e86\u5728\u4ea4\u4e92\u9a71\u52a8\u4efb\u52a1\u4e2d\u8bf1\u5bfc\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5c55\u73b0\u51fa\u671f\u671b\u7684\u884c\u4e3a\uff0c\u901a\u5e38\u91c7\u7528\u6307\u4ee4-\u8c03\u4f18\u9636\u6bb5\uff0c\u901a\u8fc7\u4e0b\u4e00\u4e2a\u8bcd\u9884\u6d4b\uff08NTP\uff09\u635f\u5931\u8bad\u7ec3LLM\u4e8e\u6307\u4ee4\u54cd\u5e94\u5bf9\u3002\u5148\u524d\u7684\u5de5\u4f5c\u65e8\u5728\u63d0\u5347\u8c03\u4f18\u6027\u80fd\uff0c\u5e38\u7740\u91cd\u4e8e\u9ad8\u8d28\u91cf\u7684\u76d1\u7763\u5fae\u8c03\uff08SFT\uff09\u6570\u636e\u96c6\u7684\u6784\u5efa\uff0c\u8fd9\u901a\u5e38\u9700\u8981\u6602\u8d35\u7684\u6570\u636e\u8fc7\u6ee4\u8fc7\u7a0b\u6216\u4eba\u529b\u5bc6\u96c6\u578b\u7684\u4eba\u5de5\u6ce8\u91ca\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u5e76\u672a\u5145\u5206\u5229\u7528\u6570\u636e\u96c6\u7684\u5185\u5728\u7279\u6027\uff0c\u5bfc\u81f4\u4e86\u9ad8\u6602\u7684\u8ba1\u7b97\u548c\u52b3\u52a8\u6210\u672c\uff0c\u9650\u5236\u4e86\u53ef\u6269\u5c55\u6027\u548c\u6027\u80fd\u63d0\u5347\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aSFTMix\u7684\u65b0\u9896\u65b9\u6cd5\uff0c\u5b83\u8d85\u8d8a\u4e86\u4f20\u7edfNTP\u8303\u5f0f\uff0c\u65e0\u9700\u7cbe\u5fc3\u8bbe\u8ba1\u7684SFT\u6570\u636e\u96c6\u5373\u53ef\u63d0\u5347\u8c03\u4f18\u6027\u80fd\u3002 
\u89c2\u5bdf\u5230LLM\u5728\u8bed\u4e49\u8868\u793a\u7a7a\u95f4\u4e2d\u8868\u73b0\u51fa\u4e0d\u5747\u5300\u7684\u7f6e\u4fe1\u5ea6\u5206\u5e03\uff0c\u6211\u4eec\u63d0\u51fa\uff0c\u4e0d\u540c\u7f6e\u4fe1\u5ea6\u7ea7\u522b\u7684\u793a\u4f8b\u5728\u8c03\u4f18\u8fc7\u7a0b\u4e2d\u5e94\u626e\u6f14\u4e0d\u540c\u7684\u89d2\u8272\u3002\u57fa\u4e8e\u8fd9\u4e00\u89c1\u89e3\uff0cSFTMix\u5229\u7528\u8bad\u7ec3\u52a8\u6001\u6765\u8bc6\u522b\u5177\u6709\u4e0d\u540c\u7f6e\u4fe1\u5ea6\u7ea7\u522b\u7684\u793a\u4f8b\uff0c\u7136\u540e\u5e94\u7528\u57fa\u4e8eMixup\u7684\u6b63\u5219\u5316\u6765\u51cf\u5c11\u5bf9\u9ad8\u7f6e\u4fe1\u5ea6\u793a\u4f8b\u7684\u8fc7\u62df\u5408\uff0c\u540c\u65f6\u4f20\u64ad\u76d1\u7763\u4fe1\u53f7\u4ee5\u6539\u5584\u76f8\u5bf9\u4f4e\u7f6e\u4fe1\u5ea6\u793a\u4f8b\u7684\u5b66\u4e60\u6548\u679c\u3002\u8fd9\u79cd\u65b9\u6cd5\u4f7f\u5f97SFTMix\u80fd\u591f\u5728\u5e7f\u6cdb\u7684\u64cd\u4f5c\u6307\u4ee4\u9075\u5faa\u548c\u533b\u7597\u4fdd\u5065\u9886\u57df\u7684\u7279\u5b9aSFT\u4efb\u52a1\u4e2d\u663e\u8457\u8d85\u8d8aNTP\uff0c\u8bc1\u660e\u4e86\u5176\u5bf9\u4e0d\u540cLLM\u5bb6\u65cf\u548c\u4efb\u610f\u5927\u5c0f\u6570\u636e\u96c6\u7684\u9002\u5e94\u6027\u548c\u53ef\u6269\u5c55\u6027\u3002\u5168\u9762\u7684\u6d88\u878d\u7814\u7a76\u8fdb\u4e00\u6b65\u9a8c\u8bc1\u4e86SFTMix\u8bbe\u8ba1\u9009\u62e9\u7684\u7a33\u5065\u6027\uff0c\u5f3a\u8c03\u4e86\u5176\u5728\u4e0d\u540cLLM\u548c\u6570\u636e\u96c6\u4e0a\u7684\u4e00\u81f4\u6027\u80fd\u63d0\u5347\u80fd\u529b\uff0c\u9002\u7528\u4e8e\u66f4\u5e7f\u6cdb\u7684\u81ea\u7136\u8bed\u8a00\u5904\u7406\u5e94\u7528\u3002|\n", "2410.05243": "|**2024-10-07**|**Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents**|Boyu Gou 
et.al.|[2410.05243](http://arxiv.org/abs/2410.05243)|**[link](https://github.com/OSU-NLP-Group/UGround)**|\u672c\u8bba\u6587\u63a2\u8ba8\u4e86\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u5982\u4f55\u91cd\u5851\u56fe\u5f62\u7528\u6237\u754c\u9762\uff08GUI\uff09\u4ee3\u7406\u7684\u80fd\u529b\uff0c\u4f7f\u5176\u4ece\u53d7\u63a7\u6a21\u62df\u5411\u8de8\u5e73\u53f0\u7684\u590d\u6742\u73b0\u5b9e\u4e16\u754c\u5e94\u7528\u8fc7\u6e21\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u4ee3\u7406\u7684\u6709\u6548\u6027\u5728\u5f88\u5927\u7a0b\u5ea6\u4e0a\u53d6\u51b3\u4e8e\u5176\u56fa\u6709\u6027\u7684\u7a33\u5065\u6027\u3002\u5f53\u524d\u7684GUI\u4ee3\u7406\u4e3b\u8981\u4f9d\u8d56\u4e8e\u57fa\u4e8e\u6587\u672c\u7684\u8868\u793a\uff0c\u5982HTML\u6216\u53ef\u8bbf\u95ee\u6027\u6811\uff0c\u5c3d\u7ba1\u5b83\u4eec\u5177\u6709\u5b9e\u7528\u6027\uff0c\u4f46\u5f80\u5f80\u5f15\u5165\u566a\u58f0\u3001\u4e0d\u5b8c\u6574\u6027\u4ee5\u53ca\u589e\u52a0\u8ba1\u7b97\u5f00\u9500\u3002 \u6211\u4eec\u7684\u89c2\u70b9\u662f\uff0c\u4e3aGUI\u4ee3\u7406\u6784\u5efa\u4e00\u79cd\u7c7b\u4f3c\u4eba\u7c7b\u7684\u4f53\u73b0\uff0c\u80fd\u591f\u5b8c\u5168\u901a\u8fc7\u89c6\u89c9\u611f\u77e5\u73af\u5883\uff0c\u5e76\u76f4\u63a5\u5bf9GUI\u6267\u884c\u50cf\u7d20\u7ea7\u64cd\u4f5c\u3002\u5173\u952e\u5728\u4e8e\u89c6\u89c9\u5b9a\u4f4d\u6a21\u578b\uff0c\u5b83\u4eec\u80fd\u591f\u51c6\u786e\u5730\u5c06GUI\u5143\u7d20\u7684\u5404\u79cd\u5f15\u7528\u8868\u8fbe\u6620\u5c04\u5230\u5176\u5728\u4e0d\u540c\u5e73\u53f0\u4e0a\u7684GUI\u5750\u6807\u4e0a\u3002\u6211\u4eec\u8868\u660e\uff0c\u4e00\u4e2a\u7b80\u5355\u7684\u914d\u65b9\u2014\u2014\u5305\u62ec\u57fa\u4e8e\u7f51\u7edc\u7684\u5408\u6210\u6570\u636e\u548c\u5bf9LLaVA\u67b6\u6784\u7684\u8f7b\u5fae\u8c03\u6574\u2014\u2014\u5bf9\u4e8e\u8bad\u7ec3\u8fd9\u6837\u7684\u89c6\u89c9\u5b9a\u4f4d\u6a21\u578b\u662f\u51fa\u5947\u6709\u6548\u7684\u3002 
\u6211\u4eec\u6536\u96c6\u4e86\u8fc4\u4eca\u4e3a\u6b62\u6700\u5927\u7684GUI\u89c6\u89c9\u5b9a\u4f4d\u6570\u636e\u96c6\uff0c\u5305\u542b10M\u4e2aGUI\u5143\u7d20\u53ca\u5176\u5f15\u7528\u8868\u8fbe\uff0c\u8986\u76d6\u4e861.3M\u5f20\u622a\u56fe\uff0c\u4ee5\u6b64\u6765\u8bad\u7ec3UGround\uff0c\u8fd9\u662f\u7528\u4e8eGUI\u4ee3\u7406\u7684\u5f3a\u5927\u901a\u7528\u89c6\u89c9\u5b9a\u4f4d\u6a21\u578b\u3002\u5728\u516d\u4e2a\u8de8\u4e09\u4e2a\u7c7b\u522b\uff08\u5b9a\u4f4d\u3001\u79bb\u7ebf\u4ee3\u7406\u548c\u5728\u7ebf\u4ee3\u7406\uff09\u7684\u57fa\u51c6\u6d4b\u8bd5\u4e0a\uff0c\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\u51fa\u4ee5\u4e0b\u4e24\u70b9\uff1a 1\uff09UGround\u663e\u8457\u4f18\u4e8e\u73b0\u6709GUI\u4ee3\u7406\u7684\u89c6\u89c9\u5b9a\u4f4d\u6a21\u578b\uff0c\u7edd\u5bf9\u6027\u80fd\u63d0\u5347\u9ad8\u8fbe20%\u3002 2\uff09\u4f7f\u7528UGround\u7684\u4ee3\u7406\u5728\u6027\u80fd\u4e0a\u8d85\u8d8a\u4e86\u6700\u5148\u8fdb\u7684\u4ee3\u7406\uff0c\u5c3d\u7ba1\u73b0\u6709\u7684\u4ee3\u7406\u4f7f\u7528\u989d\u5916\u7684\u57fa\u4e8e\u6587\u672c\u7684\u8f93\u5165\uff0c\u800c\u6211\u4eec\u7684\u4ee3\u7406\u4ec5\u4f9d\u8d56\u4e8e\u89c6\u89c9\u611f\u77e5\u3002 \u8fd9\u4e9b\u7ed3\u679c\u5f3a\u6709\u529b\u5730\u652f\u6301\u4e86\u8fd9\u6837\u4e00\u79cd\u8bbe\u60f3\uff1a\u5373\u50cf\u4eba\u7c7b\u4e00\u6837\u5728\u6570\u5b57\u4e16\u754c\u4e2d\u5bfc\u822a\u7684GUI\u4ee3\u7406\u662f\u53ef\u884c\u7684\uff0c\u5e76\u4e14\u5145\u6ee1\u4e86\u6f5c\u529b\u3002|\n", "2410.05229": "|**2024-10-07**|**GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models**|Iman Mirzadeh 
et.al.|[2410.05229](http://arxiv.org/abs/2410.05229)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u6700\u65b0\u8fdb\u5c55\u5f15\u53d1\u4e86\u5bf9\u5b83\u4eec\u5728\u6570\u5b66\u63a8\u7406\u80fd\u529b\u4e0a\u7684\u5173\u6ce8\uff0c\u7279\u522b\u662f\u9488\u5bf9\u5c0f\u5b66\u6c34\u5e73\u95ee\u9898\u3002GSM8K\u57fa\u51c6\u6d4b\u8bd5\u5e7f\u6cdb\u7528\u4e8e\u8bc4\u4f30\u6a21\u578b\u5728\u8fd9\u4e00\u9886\u57df\u7684\u8868\u73b0\u3002\u5c3d\u7ba1LLM\u5728GSM8K\u4e0a\u7684\u6210\u7ee9\u8fd1\u5e74\u6765\u663e\u8457\u63d0\u9ad8\uff0c\u4f46\u5176\u6570\u5b66\u63a8\u7406\u80fd\u529b\u662f\u5426\u771f\u6b63\u6709\u6240\u63d0\u5347\u4ecd\u7136\u5b58\u5728\u7591\u95ee\uff0c\u8fd9\u4f7f\u5f97\u73b0\u6709\u8bc4\u4f30\u6307\u6807\u7684\u53ef\u9760\u6027\u53d7\u5230\u8d28\u7591\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u8fdb\u884c\u4e86\u4e00\u9879\u5927\u89c4\u6a21\u7814\u7a76\uff0c\u6db5\u76d6\u4e86\u5f53\u524d\u6700\u524d\u6cbf\u7684\u5f00\u653e\u548c\u5c01\u95ed\u6a21\u578b\u3002\u4e3a\u4e86\u514b\u670d\u73b0\u6709\u8bc4\u4f30\u65b9\u6cd5\u7684\u5c40\u9650\u6027\uff0c\u6211\u4eec\u5f15\u5165\u4e86GSM-Symbolic\u6539\u8fdb\u7248\u57fa\u51c6\uff0c\u8be5\u57fa\u51c6\u57fa\u4e8e\u7b26\u53f7\u6a21\u677f\u751f\u6210\u4e86\u591a\u6837\u5316\u7684\u9898\u76ee\u3002GSM-Symbolic\u4f7f\u5f97\u8bc4\u4f30\u66f4\u52a0\u53ef\u63a7\uff0c\u63d0\u4f9b\u4e86\u5173\u952e\u6d1e\u5bdf\u548c\u66f4\u53ef\u9760\u7684\u6307\u6807\u6765\u8861\u91cf\u6a21\u578b\u7684\u63a8\u7406\u80fd\u529b\u3002 
\u6211\u4eec\u7684\u53d1\u73b0\u63ed\u793a\u4e86LLM\u5728\u56de\u7b54\u4e0d\u540c\u7248\u672c\u540c\u9898\u65f6\u8868\u73b0\u51fa\u660e\u663e\u7684\u5dee\u5f02\u6027\u3002\u5177\u4f53\u800c\u8a00\uff0c\u5728GSM-Symbolic\u57fa\u51c6\u4e2d\uff0c\u4ec5\u6539\u53d8\u95ee\u9898\u4e2d\u7684\u6570\u503c\u540e\uff0c\u6240\u6709\u6a21\u578b\u7684\u8868\u73b0\u90fd\u4f1a\u4e0b\u964d\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7814\u7a76\u4e86\u8fd9\u4e9b\u6a21\u578b\u5728\u6570\u5b66\u63a8\u7406\u65b9\u9762\u7684\u8106\u5f31\u6027\uff0c\u5e76\u8868\u660e\u968f\u7740\u95ee\u9898\u4e2d\u6761\u76ee\u6570\u91cf\u7684\u589e\u52a0\uff0c\u5176\u6027\u80fd\u4f1a\u663e\u8457\u964d\u4f4e\u3002\u6211\u4eec\u63a8\u6d4b\uff0c\u8fd9\u662f\u56e0\u4e3a\u5f53\u524d\u7684LLM\u65e0\u6cd5\u6267\u884c\u771f\u6b63\u7684\u903b\u8f91\u63a8\u7406\uff1b\u5b83\u4eec\u53ea\u662f\u590d\u5236\u4e86\u8bad\u7ec3\u6570\u636e\u4e2d\u7684\u63a8\u7406\u6b65\u9aa4\u3002\u5373\u4f7f\u6dfb\u52a0\u4e00\u4e2a\u770b\u4f3c\u4e0e\u95ee\u9898\u76f8\u5173\u7684\u5355\u4e2a\u6761\u76ee\uff0c\u6240\u6709\u6700\u5148\u8fdb\u7684\u6a21\u578b\u7684\u8868\u73b0\u4e5f\u4f1a\u5927\u5e45\u4e0b\u964d\uff08\u9ad8\u8fbe65%\uff09\uff0c\u5c3d\u7ba1\u8fd9\u4e2a\u6761\u76ee\u5b9e\u9645\u4e0a\u5e76\u4e0d\u8d21\u732e\u4e8e\u5b8c\u6210\u7b54\u6848\u6240\u9700\u7684\u5173\u952e\u63a8\u7406\u94fe\u3002\u603b\u4e4b\uff0c\u6211\u4eec\u7684\u5de5\u4f5c\u4e3a\u7406\u89e3LLM\u5728\u6570\u5b66\u63a8\u7406\u4e0a\u7684\u80fd\u529b\u548c\u5c40\u9650\u6027\u63d0\u4f9b\u4e86\u4e00\u4e2a\u66f4\u4e3a\u7ec6\u81f4\u7684\u89c6\u89d2\u3002|\n", "2410.05224": "|**2024-10-07**|**Cookbook: A framework for improving LLM generative abilities via programmatic data generating templates**|Avanika Narayan 
et.al.|[2410.05224](http://arxiv.org/abs/2410.05224)|null|\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aCookbook\u7684\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u901a\u8fc7\u7f16\u7a0b\u65b9\u5f0f\u751f\u6210\u8bad\u7ec3\u6570\u636e\uff0c\u6570\u636e\u4e3b\u8981\u7531\u968f\u673a\u6807\u8bb0\u7684\u7b80\u5355\u6a21\u5f0f\u7ec4\u6210\u3002\u8fd9\u79cd\u65b9\u6cd5\u5728\u89c4\u6a21\u548c\u6210\u672c\u65b9\u9762\u5177\u6709\u4f18\u52bf\uff0c\u4e14\u907f\u514d\u4e86\u4e0e\u4eba\u7c7b\u6216\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u751f\u6210\u6570\u636e\u76f8\u5173\u7684\u6cd5\u5f8b\u548c\u9690\u79c1\u95ee\u9898\u3002\u9996\u5148\uff0cCookbook\u5229\u7528\u6570\u636e\u751f\u6210Python\u51fd\u6570\u6a21\u677f\u6765\u4ea7\u751f\u9f13\u52b1\u6a21\u578b\u5b66\u4e60\u4e0e\u7279\u5b9a\u4efb\u52a1\u76f8\u5339\u914d\u7684\u663e\u5f0f\u89c4\u5219\u7684\u8bad\u7ec3\u6570\u636e\u3002\u7814\u7a76\u53d1\u73b0\uff0c\u4f7f\u7528Cookbook\u751f\u6210\u7684\u6570\u636e\u8fdb\u884c\u5fae\u8c03\u80fd\u591f\u663e\u8457\u63d0\u9ad8\u6a21\u578b\u5728\u5bf9\u5e94\u4efb\u52a1\u4e0a\u7684\u8868\u73b0\uff0c\u6700\u9ad8\u53ef\u8fbe52.7\u4e2a\u51c6\u786e\u6027\u70b9\u3002\u5176\u6b21\uff0c\u7531\u4e8e\u6307\u4ee4\u6570\u636e\u96c6\u80fd\u591f\u540c\u65f6\u6539\u5584\u591a\u4e2a\u4e0b\u6e38\u4efb\u52a1\u7684\u8868\u73b0\uff0cCookbook\u7b97\u6cd5\u81ea\u52a8\u5b66\u4e60\u5982\u4f55\u6df7\u5408\u6765\u81ea\u4e0d\u540c\u6a21\u677f\u7684\u6570\u636e\u4ee5\u4f18\u5316\u591a\u4e2a\u4efb\u52a1\u7684\u6027\u80fd\u3002\u5728\u6807\u51c6\u7684\u591a\u4efb\u52a1GPT4ALL\u8bc4\u4f30\u5957\u4ef6\u4e0a\uff0c\u4f7f\u7528Cookbook\u751f\u6210\u7684\u6570\u636e\u96c6\u8fdb\u884c\u5fae\u8c03\u7684Mistral-7B\u6a21\u578b\u5728\u5e73\u5747\u51c6\u786e\u6027\u548c\u4e09\u4e2a\u4efb\u52a1\u4e2d\u7684\u4e09\u4e2a\u4e0a\u5747\u53d6\u5f97\u6700\u4f73\u6210\u7ee9\u3002\u6700\u540e\uff0c\u5206\u6790\u4e86Cookbook\u4e3a\u4f55\u80fd\u63d0\u9ad8\u6027\u80fd\u4ee5\u53ca\u5176\u80cc\u540e\u7684\u539f\u7406\uff0c\u5e76\u63
d0\u51fa\u4e86\u4e00\u9879\u6307\u6807\u6765\u9a8c\u8bc1\u6539\u8fdb\u7684\u4e3b\u8981\u539f\u56e0\u662f\u6a21\u578b\u751f\u6210\u7684\u7ed3\u679c\u66f4\u597d\u5730\u9075\u5faa\u4e86\u6a21\u677f\u89c4\u5219\u3002|\n", "2410.07176": "|**2024-10-09**|**Astute RAG: Overcoming Imperfect Retrieval Augmentation and Knowledge Conflicts for Large Language Models**|Fei Wang et.al.|[2410.07176](http://arxiv.org/abs/2410.07176)|null|\u5728\u63a2\u7d22\u5982\u4f55\u901a\u8fc7\u8054\u5408\u5206\u6790\u6765\u7406\u89e3\u4e0d\u5b8c\u7f8e\u68c0\u7d22\u5bf9\u751f\u6210\u578b\u95ee\u7b54\uff08RAG\uff09\u884c\u4e3a\u7684\u5f71\u54cd\uff0c\u4ee5\u53ca\u5982\u4f55\u5728LLM\u5185\u90e8\u77e5\u8bc6\u4e0e\u5916\u90e8\u6765\u6e90\u4e4b\u95f4\u4ea7\u751f\u6f5c\u5728\u51b2\u7a81\u65f6\uff0c\u6211\u4eec\u53d1\u73b0\uff0c\u4e0d\u5b8c\u7f8e\u7684\u68c0\u7d22\u589e\u5f3a\u53ef\u80fd\u662f\u4e0d\u53ef\u907f\u514d\u7684\uff0c\u5e76\u4e14\u4f1a\u5bf9RAG\u7cfb\u7edf\u9020\u6210\u4e25\u91cd\u5f71\u54cd\u3002\u901a\u8fc7\u5728\u73b0\u5b9e\u6761\u4ef6\u4e0b\u7684\u63a7\u5236\u6027\u5206\u6790\uff0c\u6211\u4eec\u53d1\u73b0\u4e86\u4ece\u68c0\u7d22\u5230\u7684\u4e0d\u5b8c\u6574\u77e5\u8bc6\u4e0eLLM\u5185\u90e8\u77e5\u8bc6\u4e4b\u95f4\u7684\u77e5\u8bc6\u51b2\u7a81\u662fRAG\u540e\u5904\u7406\u9636\u6bb5\u9700\u8981\u514b\u670d\u7684\u5173\u952e\u74f6\u9888\u3002 
\u4e3a\u4e86\u4f7fLLM\u5728\u9762\u5bf9\u4e0d\u5b8c\u7f8e\u68c0\u7d22\u65f6\u5177\u6709\u9c81\u68d2\u6027\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u201c\u7cbe\u660eRAG\u201d\u8fd9\u4e00\u65b0\u9896\u7684RAG\u65b9\u6cd5\u3002\u8be5\u65b9\u6cd5\u80fd\u591f\u9002\u5f53\u5730\u6fc0\u53d1LLM\u5185\u90e8\u77e5\u8bc6\u4e2d\u7684\u5173\u952e\u4fe1\u606f\uff0c\u901a\u8fc7\u6e90\u610f\u8bc6\u5730\u6574\u5408\u5185\u90e8\u548c\u5916\u90e8\u77e5\u8bc6\uff0c\u6700\u7ec8\u6839\u636e\u4fe1\u606f\u53ef\u9760\u6027\u786e\u5b9a\u7b54\u6848\u3002\u6211\u4eec\u7684\u5b9e\u9a8c\u7ed3\u679c\u4f7f\u7528\u4e86Gemini\u548cClaude\u4e24\u4e2a\u6a21\u578b\u9a8c\u8bc1\u4e86\u201c\u7cbe\u660eRAG\u201d\u7684\u6709\u6548\u6027\uff0c\u8bc1\u660e\u5176\u663e\u8457\u4f18\u4e8e\u73b0\u6709\u7684\u589e\u5f3aRAG\u9c81\u68d2\u6027\u7684\u65b9\u6cd5\u3002\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u5728\u6700\u574f\u60c5\u51b5\u573a\u666f\u4e0b\uff0c\u201c\u7cbe\u660eRAG\u201d\u662f\u552f\u4e00\u80fd\u591f\u8fbe\u5230\u6216\u8d85\u8fc7\u6ca1\u6709RAG\u7684LLM\u6027\u80fd\u7684\u65b9\u6cd5\u3002 \u8fdb\u4e00\u6b65\u7684\u5206\u6790\u8868\u660e\uff0c\u201c\u7cbe\u660eRAG\u201d\u6709\u6548\u5730\u89e3\u51b3\u4e86\u77e5\u8bc6\u51b2\u7a81\u95ee\u9898\uff0c\u63d0\u9ad8\u4e86RAG\u7cfb\u7edf\u7684\u53ef\u9760\u6027\u548c\u53ef\u4fe1\u5ea6\u3002|\n", "2410.07173": "|**2024-10-09**|**Do better language models have crisper vision?**|Jona Ruthardt 
et.al.|[2410.07173](http://arxiv.org/abs/2410.07173)|null|\u672c\u6587\u63a2\u8ba8\u4e86\u6587\u672c\u4ec5\u4f9d\u8d56\u578b\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u7406\u89e3\u89c6\u89c9\u4e16\u754c\u65b9\u9762\u7684\u8868\u73b0\u3002\u968f\u7740LLMs\u5728\u8ba1\u7b97\u673a\u89c6\u89c9\u9886\u57df\u7684\u5e94\u7528\u65e5\u76ca\u5e7f\u6cdb\uff0c\u8fd9\u4e00\u95ee\u9898\u53d8\u5f97\u65e2\u57fa\u7840\u53c8\u5173\u952e\u3002\u73b0\u6709\u7814\u7a76\u4e3b\u8981\u96c6\u4e2d\u5728\u6709\u9650\u7684\u573a\u666f\u4e0a\uff0c\u5982\u751f\u6210\u89c6\u89c9\u5185\u5bb9\u6216\u5bf9\u591a\u6a21\u6001\u6570\u636e\u8fdb\u884c\u805a\u7c7b\u3002\u56e0\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u9879\u540d\u4e3a\u201c\u89c6\u89c9\u6587\u672c\u8868\u793a\u57fa\u51c6\u201d\uff08ViTeRB\uff09\u7684\u4efb\u52a1\uff0c\u65e8\u5728\u8bc6\u522b\u51fa\u80fd\u591f\u4e0e\u89c6\u89c9\u4e16\u754c\u9ad8\u5ea6\u4e00\u81f4\u7684\u5173\u952e\u5c5e\u6027\u3002\u57fa\u4e8e\u6b64\u4efb\u52a1\u7684\u7ed3\u679c\uff0c\u6211\u4eec\u53d1\u73b0\u89e3\u7801\u5668\u578b\u5927\u8bed\u8a00\u6a21\u578b\u5728\u89c6\u89c9\u4e3a\u4e2d\u5fc3\u7684\u8bed\u5883\u4e0b\u4f5c\u4e3a\u6587\u672c\u8868\u793a\u7684\u7406\u60f3\u5019\u9009\uff0c\u8fd9\u4e0e\u5f53\u524d\u4f7f\u7528\u6587\u672c\u7f16\u7801\u5668\u7684\u505a\u6cd5\u5f62\u6210\u4e86\u5bf9\u6bd4\u3002 
\u5728\u6b64\u57fa\u7840\u4e0a\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u201cShareLock\u201d\u2014\u2014\u4e00\u79cd\u8d85\u8f7b\u91cf\u7ea7\u7684\u7c7b\u4f3cCLIP\u7684\u6a21\u578b\u3002\u901a\u8fc7\u5229\u7528\u4ece\u5f3a\u5927\u89c6\u89c9\u548c\u8bed\u8a00\u6a21\u578b\u9884\u8ba1\u7b97\u7684\u51bb\u7ed3\u7279\u5f81\uff0cShareLock\u5728ImageNet\u4e0a\u53d6\u5f97\u4e8651%\u7684\u51c6\u786e\u7387\uff0c\u4ec5\u4f7f\u7528\u4e86563,000\u5f20\u56fe\u50cf-\u63cf\u8ff0\u5bf9\u3002\u6b64\u5916\uff0c\u8bad\u7ec3\u6240\u9700\u7684\u8d44\u6e90\u4ec5\u4e3a1\u4e2aGPU\u5c0f\u65f6\uff08\u6216\u5305\u62ec\u7279\u5f81\u9884\u8ba1\u7b97\u768410\u4e2a\u5c0f\u65f6\uff09\uff0c\u8fdc\u5c11\u4e8e\u4ee5\u5f80\u65b9\u6cd5\u6240\u9700\u7684\u65f6\u95f4\u6570\u91cf\u7ea7\u3002\u6211\u4eec\u5c06\u63d0\u4f9b\u8be5\u4ee3\u7801\u3002|\n", "2410.07167": "|**2024-10-09**|**Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate**|Qidong Huang et.al.|[2410.07167](http://arxiv.org/abs/2410.07167)|**[link](https://github.com/shikiw/modality-integration-rate)**|**\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u6709\u6548\u7684\u3001\u7a33\u5065\u7684\u4e14\u901a\u7528\u7684\u6307\u6807\u2014\u2014\u6a21\u6001\u6574\u5408\u7387(MIR)\uff0c\u7528\u4e8e\u8861\u91cf\u5927\u578b\u89c6\u89c9\u8bed\u8a00\u6a21\u578b(LVLMs)\u7684\u591a\u6a21\u6001\u9884\u8bad\u7ec3\u8d28\u91cf\u3002\u5927\u89c4\u6a21\u9884\u8bad\u7ec3\u5728\u6784\u5efa\u5177\u5907\u5f3a\u5927\u80fd\u529b\u7684LVLMs\u4e2d\u626e\u6f14\u7740\u5173\u952e\u89d2\u8272\uff0c\u800c\u5982\u4f55\u5728\u6602\u8d35\u7684\u76d1\u7763\u5fae\u8c03\u9636\u6bb5\u4e4b\u524d\u8bc4\u4f30\u5176\u8bad\u7ec3\u8d28\u91cf\u5219\u662f\u4e00\u4e2a\u672a\u5145\u5206\u63a2\u7d22\u7684\u9886\u57df\u3002\u5bf9\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b(LLMs)\uff0c\u5e38\u7528\u7684\u9884\u8bad\u7ec3\u6307\u6807\u5305\u62ec\u635f\u5931\u3001\u56f0\u60d1\u5ea6\u4ee5\u53ca\u4e0a\u4e0b\u6587\u5185\u8bc4\u4f30\u7ed3\u679c\uff0c\u4f46\u6211\u4eec\u89
c2\u5bdf\u5230\u8fd9\u4e9b\u6307\u6807\u5728\u5bf9\u826f\u597d\u8bad\u7ec3\u7684LLMs\u4e0e\u65b0\u6a21\u6001\u8fdb\u884c\u5bf9\u9f50\u65f6\u5e76\u4e0d\u5177\u6709\u5f88\u597d\u7684\u6307\u793a\u6027\u3002\u7531\u4e8e\u7f3a\u4e4f\u5408\u9002\u7684\u6307\u6807\uff0cLVLMs\u5728\u5173\u952e\u7684\u9884\u8bad\u7ec3\u9636\u6bb5\u7684\u7814\u7a76\u53d7\u5230\u4e86\u6781\u5927\u7684\u963b\u788d\uff0c\u5305\u62ec\u8bad\u7ec3\u6570\u636e\u9009\u62e9\u3001\u9ad8\u6548\u6a21\u5757\u8bbe\u8ba1\u7b49\u3002\u672c\u6587\u63d0\u51fa\u4ece\u8de8\u6a21\u6001\u5206\u5e03\u8ddd\u79bb\u7684\u89d2\u5ea6\u6765\u8bc4\u4f30\u9884\u8bad\u7ec3\u8d28\u91cf\uff0c\u5e76\u5f15\u5165\u4e86\u6a21\u6001\u6574\u5408\u7387(MIR)\uff0c\u8be5\u6307\u6807\u5177\u6709\u4ee5\u4e0b\u7279\u70b9\uff1a1\uff09**\u6709\u6548**\u5730\u4ee3\u8868\u9884\u8bad\u7ec3\u8d28\u91cf\uff0c\u5e76\u4e0e\u7ecf\u8fc7\u76d1\u7763\u5fae\u8c03\u540e\u7684\u57fa\u51c6\u6027\u80fd\u5448\u73b0\u6b63\u76f8\u5173\uff1b2\uff09**\u7a33\u5065**\u4e8e\u4e0d\u540c\u7684\u8bad\u7ec3/\u8bc4\u4f30\u6570\u636e\uff1b3\uff09**\u6cdb\u5316**\u4e8e\u591a\u79cd\u8bad\u7ec3\u914d\u7f6e\u548c\u67b6\u6784\u9009\u62e9\u3002\u6211\u4eec\u8fdb\u884c\u4e86\u4e00\u7cfb\u5217\u9884\u8bad\u7ec3\u5b9e\u9a8c\u4ee5\u63a2\u7d22MIR\u7684\u6709\u6548\u6027\uff0c\u5e76\u89c2\u5bdf\u5230\u4ee4\u4eba\u6ee1\u610f\u7684\u7ed3\u679c\uff0c\u5373MIR\u80fd\u591f\u6307\u793a\u8bad\u7ec3\u6570\u636e\u9009\u62e9\u3001\u8bad\u7ec3\u7b56\u7565\u8c03\u5ea6\u4ee5\u53ca\u6a21\u578b\u67b6\u6784\u8bbe\u8ba1\u4ee5\u83b7\u5f97\u66f4\u597d\u7684\u9884\u8bad\u7ec3\u7ed3\u679c\u3002\u6211\u4eec\u5e0c\u671bMIR\u80fd\u591f\u6210\u4e3a\u6784\u5efa\u5177\u5907\u5f3a\u5927\u80fd\u529b\u7684LVLMs\u7684\u6709\u7528\u6307\u6807\uff0c\u5e76\u6fc0\u53d1\u4e0d\u540c\u9886\u57df\u5173\u4e8e\u6a21\u6001\u5bf9\u9f50\u7684\u540e\u7eed\u7814\u7a76\u3002\u6211\u4eec\u7684\u4ee3\u7801\u5df2\u5f00\u6e90\u5728\uff1ahttps://github.com/shikiw/Modality-Integration-Rate\u3002**|\n", "2410.07166": 
"|**2024-10-09**|**Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making**|Manling Li et.al.|[2410.07166](http://arxiv.org/abs/2410.07166)|**[link](https://github.com/embodied-agent-interface/embodied-agent-interface)**|**\u4e3a\u4e86\u7cfb\u7edf\u5730\u8bc4\u4f30\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5b9e\u4f53\u5316\u51b3\u7b56\u4e2d\u7684\u8868\u73b0\uff0c\u867d\u7136\u5df2\u6709\u5927\u91cf\u7814\u7a76\u5229\u7528LLMs\u5904\u7406\u5b9e\u4f53\u5316\u73af\u5883\u4e2d\u7684\u51b3\u7b56\u95ee\u9898\uff0c\u4f46\u6211\u4eec\u4ecd\u7f3a\u4e4f\u5bf9\u5176\u6027\u80fd\u7684\u5168\u9762\u7406\u89e3\u3002\u73b0\u6709\u5de5\u4f5c\u901a\u5e38\u5728\u4e0d\u540c\u9886\u57df\u3001\u9488\u5bf9\u4e0d\u540c\u76ee\u7684\u3001\u57fa\u4e8e\u4e0d\u540c\u8f93\u5165\u548c\u8f93\u51fa\u6784\u5efaLLMs\uff0c\u8fd9\u4f7f\u5f97\u96be\u4ee5\u7edf\u4e00\u8bc4\u4ef7\u5b83\u4eec\u3002\u73b0\u6709\u8bc4\u4f30\u65b9\u6cd5\u5f80\u5f80\u4ec5\u4f9d\u8d56\u6700\u7ec8\u7684\u6210\u529f\u7387\uff0c\u8fd9\u4f7f\u5f97\u96be\u4ee5\u8bc6\u522bLLMs\u7f3a\u5931\u7684\u80fd\u529b\u4ee5\u53ca\u95ee\u9898\u6240\u5728\uff0c\u8fdb\u800c\u963b\u788d\u4e86\u5b9e\u4f53\u5316\u667a\u80fd\u4f53\u6709\u6548\u4e14\u9009\u62e9\u6027\u5730\u5229\u7528LLMs\u3002 \u4e3a\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u901a\u7528\u63a5\u53e3\uff08\u5b9e\u4f53\u5316\u667a\u80fd\u4f53\u63a5\u53e3\uff09\uff0c\u65e8\u5728\u652f\u6301\u5404\u79cd\u4efb\u52a1\u7c7b\u578b\u4e0eLLM\u6a21\u5757\u8f93\u5165-\u8f93\u51fa\u89c4\u8303\u7684\u7edf\u4e00\u5316\u3002\u5177\u4f53\u800c\u8a00\uff0c\u8be5\u63a5\u53e3\u5141\u8bb8\uff1a 1. \u7edf\u4e00\u591a\u79cd\u6d89\u53ca\u72b6\u6001\u4e0e\u65f6\u95f4\u5ef6\u4f38\u76ee\u6807\u7684\u5b9e\u4f53\u5316\u51b3\u7b56\u4efb\u52a1\u3002 2. 
\u7edf\u4e00\u56db\u79cd\u5e38\u7528\u7684\u7528\u4e8e\u51b3\u7b56\u7684LLM\u6a21\u5757\uff1a\u76ee\u6807\u89e3\u91ca\u3001\u5b50\u76ee\u6807\u5206\u89e3\u3001\u52a8\u4f5c\u5e8f\u5217\u89c4\u5212\u548c\u8fc7\u6e21\u5efa\u6a21\u3002 3. \u63d0\u4f9b\u4e00\u7cfb\u5217\u7cbe\u7ec6\u7c92\u5ea6\u7684\u5ea6\u91cf\u6807\u51c6\uff0c\u5c06\u8bc4\u4f30\u7ec6\u5206\u4e3a\u5404\u79cd\u9519\u8bef\u7c7b\u578b\uff0c\u5982\u5e7b\u89c9\u9519\u8bef\u3001\u53ef\u7528\u6027\u9519\u8bef\u3001\u4e0d\u540c\u7c7b\u578b\u89c4\u5212\u9519\u8bef\u7b49\u3002 \u6574\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u7684\u57fa\u51c6\u63d0\u4f9b\u4e86\u5bf9LLMs\u5728\u4e0d\u540c\u5b50\u4efb\u52a1\u4e0a\u7684\u5168\u9762\u8bc4\u4f30\uff0c\u63ed\u793a\u4e86LLM\u9a71\u52a8\u7684\u5b9e\u4f53\u5316\u4eba\u5de5\u667a\u80fd\u7cfb\u7edf\u7684\u5f3a\u9879\u4e0e\u5f31\u70b9\uff0c\u5e76\u4e3a\u6709\u6548\u548c\u9009\u62e9\u6027\u5730\u5229\u7528LLMs\u5728\u5b9e\u4f53\u5316\u51b3\u7b56\u4e2d\u63d0\u4f9b\u4e86\u89c1\u89e3\u3002**|\n", "2410.07163": "|**2024-10-09**|**Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning**|Chongyu Fan et.al.|[2410.07163](http://arxiv.org/abs/2410.07163)|**[link](https://github.com/OPTML-Group/Unlearn-Simple)**|\u672c\u6587\u65e8\u5728\u89e3\u51b3\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u53bb\u5b66\u4e60\u95ee\u9898\uff0c\u5373\u5728\u4e0d\u91cd\u65b0\u4ece\u5934\u8bad\u7ec3\u7684\u60c5\u51b5\u4e0b\uff0c\u6d88\u9664\u4e0d\u9700\u8981\u7684\u6570\u636e\u5f71\u54cd\u4ee5\u53ca\u76f8\u5173\u6a21\u578b\u80fd\u529b\uff08\u5982\u7248\u6743\u6570\u636e\u6216\u6709\u5bb3\u5185\u5bb9\u751f\u6210\uff09\uff0c\u540c\u65f6\u4fdd\u7559\u5fc5\u8981\u7684\u6a21\u578b\u529f\u80fd\u3002\u5c3d\u7ba1\u5bf9LLM\u53bb\u5b66\u4e60\u7684\u9700\u6c42\u65e5\u76ca\u589e\u957f\uff0c\u4f46\u5c1a\u672a\u5f62\u6210\u4e00\u79cd\u539f\u7406\u6027\u7684\u4f18\u5316\u6846\u67b6\u3002 
\u4e3a\u6b64\uff0c\u6211\u4eec\u56de\u987e\u4e86\u5f53\u524d\u6700\u5148\u8fdb\u7684\u65b9\u6cd5\u2014\u2014\u8d1f\u504f\u597d\u4f18\u5316\uff08NPO\uff09\uff0c\u5e76\u53d1\u73b0\u4e86\u53c2\u8003\u6a21\u578b\u504f\u89c1\u7684\u95ee\u9898\uff0c\u8fd9\u53ef\u80fd\u524a\u5f31NPO\u7684\u6709\u6548\u6027\uff0c\u7279\u522b\u662f\u5728\u53bb\u5b66\u4e60\u4e0d\u540c\u96be\u5ea6\u6570\u636e\u65f6\u3002\u9274\u4e8e\u6b64\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u7b80\u5355\u800c\u6709\u6548\u7684\u53bb\u5b66\u4e60\u4f18\u5316\u6846\u67b6\u2014\u2014SimNPO\uff0c\u8868\u660e\u901a\u8fc7\u7b80\u5355\u7684\u504f\u597d\u4f18\u5316\u51cf\u5c11\u5bf9\u53c2\u8003\u6a21\u578b\u7684\u4f9d\u8d56\uff08\u4ece\u7b80\u5316\u89c6\u89d2\u6765\u770b\uff09\u6709\u52a9\u4e8e\u53bb\u5b66\u4e60\u8fc7\u7a0b\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63d0\u4f9b\u4e86\u6df1\u5165\u7684SimNPO\u4f18\u52bf\u5206\u6790\uff0c\u901a\u8fc7\u6df7\u5408\u9a6c\u5c14\u53ef\u592b\u94fe\u7684\u5206\u6790\u65b9\u6cd5\u652f\u6301\u8fd9\u4e00\u89c2\u70b9\u3002 \u6211\u4eec\u901a\u8fc7\u5728TOFU\u548cMUSE\u7b49\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u7684\u5927\u91cf\u5b9e\u9a8c\u9a8c\u8bc1\u4e86SimNPO\u76f8\u5bf9\u4e8e\u73b0\u6709\u53bb\u5b66\u4e60\u57fa\u7ebf\u7684\u4f18\u8d8a\u6027\uff0c\u5e76\u5c55\u793a\u4e86\u5176\u5bf9\u91cd\u65b0\u5b66\u4e60\u653b\u51fb\u7684\u9c81\u68d2\u6027\u3002\u6240\u6709\u4ee3\u7801\u5747\u53ef\u5728GitHub\u4e0a\u7684https://github.com/OPTML-Group/Unlearn-Simple\u83b7\u53d6\u3002|\n", "2410.07155": "|**2024-10-09**|**Trans4D: Realistic Geometry-Aware Transition for Compositional Text-to-4D Synthesis**|Bohan Zeng 
et.al.|[2410.07155](http://arxiv.org/abs/2410.07155)|**[link](https://github.com/yangling0818/trans4d)**|**\u8fd1\u671f\u5728\u6269\u6563\u6a21\u578b\u9886\u57df\u7684\u8fdb\u5c55\u5c55\u793a\u4e86\u5176\u5728\u56fe\u50cf\u548c\u89c6\u9891\u751f\u6210\u65b9\u9762\u7684\u5353\u8d8a\u80fd\u529b\uff0c\u8fdb\u4e00\u6b65\u63d0\u5347\u4e864D\u5408\u6210\u7684\u6709\u6548\u6027\u3002\u73b0\u6709\u76844D\u751f\u6210\u65b9\u6cd5\u80fd\u591f\u6839\u636e\u7528\u6237\u53cb\u597d\u7684\u6761\u4ef6\u751f\u6210\u9ad8\u8d28\u91cf\u76844D\u5bf9\u8c61\u6216\u573a\u666f\uff0c\u5bf9\u6e38\u620f\u548c\u89c6\u9891\u884c\u4e1a\u5927\u6709\u88e8\u76ca\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u65b9\u6cd5\u5728\u5408\u6210\u590d\u67424D\u8fc7\u6e21\u548c\u573a\u666f\u5185\u5bf9\u8c61\u4ea4\u4e92\u7684\u663e\u8457\u53d8\u5f62\u65b9\u9762\u4ecd\u5b58\u5728\u6311\u6218\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aTrans4D\u7684\u521b\u65b0\u6587\u672c\u52304D\u5408\u6210\u6846\u67b6\uff0c\u65e8\u5728\u5b9e\u73b0\u771f\u5b9e\u53ef\u4fe1\u7684\u573a\u666f\u7ea7\u590d\u6742\u8fc7\u6e21\u3002\u5177\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u9996\u5148\u5229\u7528\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u751f\u6210\u7269\u7406\u610f\u8bc6\u7684\u573a\u666f\u63cf\u8ff0\u4ee5\u8fdb\u884c4D\u573a\u666f\u521d\u59cb\u5316\u4ee5\u53ca\u6709\u6548\u8fc7\u6e21\u65f6\u95f4\u89c4\u5212\u3002\u968f\u540e\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u51e0\u4f55\u611f\u77e5\u76844D\u8fc7\u6e21\u7f51\u7edc\uff0c\u57fa\u4e8e\u8ba1\u5212\u5b9e\u73b0\u590d\u6742\u7684\u573a\u666f\u7ea74D\u8fc7\u6e21\uff0c\u6d89\u53ca\u8868\u73b0\u529b\u5f3a\u7684\u5bf9\u8c61\u51e0\u4f55\u53d8\u5f62\u3002\u5e7f\u6cdb\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cTrans4D\u5728\u751f\u6210\u5177\u6709\u51c6\u786e\u6027\u548c\u9ad8\u8d28\u91cf\u8fc7\u6e21\u76844D\u573a\u666f\u65b9\u9762\u59cb\u7ec8\u8d85\u8d8a\u73b0\u6709\u6700\u5148\u8fdb\u7684\u65b9\u6cd5\u
ff0c\u9a8c\u8bc1\u4e86\u5176\u6709\u6548\u6027\u3002\u4ee3\u7801\uff1ahttps://github.com/YangLing0818/Trans4D**|\n", "2410.07129": "|**2024-10-09**|**Mental Disorders Detection in the Era of Large Language Models**|Gleb Kuzmin et.al.|[2410.07129](http://arxiv.org/abs/2410.07129)|null|\u672c\u6587\u6bd4\u8f83\u4e86\u4f20\u7edf\u673a\u5668\u5b66\u4e60\u65b9\u6cd5\u3001\u7f16\u7801\u5668\u57fa\u6a21\u578b\u4ee5\u53ca\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5728\u6291\u90c1\u75c7\u548c\u7126\u8651\u75c7\u68c0\u6d4b\u4efb\u52a1\u4e0a\u7684\u6548\u679c\u3002\u8003\u8651\u4e86\u4e94\u4e2a\u4e0d\u540c\u683c\u5f0f\u7684\u6570\u636e\u5e93\uff0c\u6bcf\u4e2a\u6570\u636e\u5e93\u90fd\u91c7\u7528\u4e86\u4e0d\u540c\u7684\u65b9\u6cd5\u6765\u5b9a\u4e49\u76ee\u6807\u75c5\u7406\u5b66\u7c7b\u522b\u3002\u6211\u4eec\u6d4b\u8bd5\u4e86\u57fa\u4e8e\u8bed\u8a00\u7279\u5f81\u7684AutoML\u6a21\u578b\u3001\u591a\u79cd\u53d8\u4f53\u7684Transformer\u7f16\u7801\u5668\uff0c\u5982BERT\uff0c\u4ee5\u53ca\u6700\u5148\u8fdb\u7684LLM\u4f5c\u4e3a\u75c5\u7406\u5206\u7c7b\u6a21\u578b\u3002\u7ed3\u679c\u8868\u660e\uff0cLLM\u5728\u566a\u58f0\u5927\u4e14\u8bad\u7ec3\u6837\u672c\u5728\u6587\u672c\u957f\u5ea6\u548c\u7c7b\u578b\u4e0a\u5dee\u5f02\u663e\u8457\u7684\u5c0f\u6570\u636e\u96c6\u4e0a\u8868\u73b0\u51fa\u8272\u3002\u7136\u800c\uff0c\u5f53\u5728\u786e\u8bca\u4e3a\u6291\u90c1\u75c7\u4e2a\u4f53\u7684\u6587\u672c\u4e0a\u8fdb\u884c\u8bad\u7ec3\u65f6\uff0c\u8bed\u8a00\u6a21\u578b\u7684\u6027\u80fd\u4f18\u4e8e\u4f20\u7edf\u7684\u5fc3\u7406\u8bed\u8a00\u5b66\u7279\u5f81\u548c\u7f16\u7801\u5668\u57fa\u6a21\u578b\uff0c\u8fd9\u51f8\u663e\u4e86\u5b83\u4eec\u5728\u7279\u5b9a\u4e34\u5e8a\u5e94\u7528\u4e2d\u7684\u6f5c\u529b\u3002|\n", "2410.07113": "|**2024-10-09**|**Personalized Visual Instruction Tuning**|Renjie Pi 
et.al.|[2410.07113](http://arxiv.org/abs/2410.07113)|**[link](https://github.com/sterzhang/pvit)**|\u8fd1\u671f\uff0c\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u7684\u8fdb\u5c55\u5c55\u73b0\u4e86\u663e\u8457\u7684\u8fdb\u6b65\uff0c\u7136\u800c\uff0c\u8fd9\u4e9b\u6a21\u578b\u5b58\u5728\u4e00\u4e2a\u660e\u663e\u7684\u5c40\u9650\u6027\u2014\u2014\u201c\u9762\u90e8\u76f2\u75c7\u201d\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u5b83\u4eec\u80fd\u591f\u8fdb\u884c\u4e00\u822c\u6027\u7684\u5bf9\u8bdd\uff0c\u4f46\u5374\u65e0\u6cd5\u9488\u5bf9\u7279\u5b9a\u4e2a\u4f53\u8fdb\u884c\u4e2a\u6027\u5316\u5bf9\u8bdd\u3002\u8fd9\u4e00\u7f3a\u9677\u963b\u788d\u4e86MLLMs\u5728\u4e2a\u6027\u5316\u573a\u666f\u4e2d\u7684\u5e94\u7528\uff0c\u5982\u5b9a\u5236\u5316\u7684\u79fb\u52a8\u8bbe\u5907\u89c6\u89c9\u52a9\u624b\u6216\u9700\u8981\u8bc6\u522b\u5bb6\u5ead\u6210\u5458\u7684\u5bb6\u7528\u673a\u5668\u4eba\u3002\u4e3a\u6b64\uff0c\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u4e2a\u6027\u5316\u89c6\u89c9\u6307\u4ee4\u8c03\u6574\uff08PVIT\uff09\u7684\u65b0\u9896\u6570\u636e\u6574\u7406\u4e0e\u8bad\u7ec3\u6846\u67b6\uff0c\u65e8\u5728\u4f7fMLLMs\u80fd\u591f\u8bc6\u522b\u56fe\u50cf\u4e2d\u7684\u76ee\u6807\u4e2a\u4f53\uff0c\u5e76\u5c55\u5f00\u4e2a\u6027\u5316\u4e14\u8fde\u8d2f\u7684\u5bf9\u8bdd\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u6d89\u53ca\u5f00\u53d1\u4e00\u4e2a\u590d\u6742\u7684\u7ba1\u9053\uff0c\u8be5\u7ba1\u9053\u80fd\u591f\u81ea\u4e3b\u751f\u6210\u5305\u542b\u4e2a\u6027\u5316\u5bf9\u8bdd\u7684\u8bad\u7ec3\u6570\u636e\u3002\u8fd9\u4e2a\u7ba1\u9053\u5229\u7528\u4e86\u5404\u79cd\u89c6\u89c9\u4e13\u5bb6\u3001\u56fe\u50cf\u751f\u6210\u6a21\u578b\u548c\uff08\u591a\u6a21\u6001\uff09\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u80fd\u529b\u3002\u4e3a\u4e86\u8bc4\u4f30MLLMs\u7684\u4e2a\u6027\u5316\u6f5c\u529b\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u4e2a\u540d\u4e3aP-Bench\u7684\u57fa\u51c6\u6d4b\u8bd5\uff0c\u5176\u4e2d\u5305\u62ec\u4e0d\u540c\u96be\u5ea6\u7ea7\u522b\u76
84\u591a\u79cd\u95ee\u9898\u7c7b\u578b\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u5728\u4f7f\u7528\u6211\u4eec\u6574\u7406\u7684\u6570\u636e\u96c6\u8fdb\u884c\u5fae\u8c03\u540e\uff0c\u4e2a\u6027\u5316\u6027\u80fd\u5f97\u5230\u4e86\u663e\u8457\u63d0\u5347\u3002|\n", "2410.07109": "|**2024-10-09**|**I Want to Break Free! Anti-Social Behavior and Persuasion Ability of LLMs in Multi-Agent Settings with Social Hierarchy**|Gian Maria Campedelli et.al.|[2410.07109](http://arxiv.org/abs/2410.07109)|**[link](https://github.com/mobs-fbk/llm_interaction_simulator)**|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u9a71\u52a8\u7684\u667a\u80fd\u4f53\u53d8\u5f97\u8d8a\u6765\u8d8a\u81ea\u4e3b\uff0c\u5e76\u4e14\u5728\u5f7c\u6b64\u95f4\u81ea\u7531\u4e92\u52a8\u65f6\uff0c\u7814\u7a76\u5b83\u4eec\u4e4b\u95f4\u7684\u4ea4\u4e92\u6a21\u5f0f\u53d8\u5f97\u81f3\u5173\u91cd\u8981\u3002\u8fd9\u6709\u52a9\u4e8e\u6211\u4eec\u9884\u89c1\u53ef\u80fd\u4ea7\u751f\u7684\u65b0\u73b0\u8c61\u4ee5\u53ca\u6f5c\u5728\u98ce\u9669\u3002\u672c\u6587\u53d7\u65af\u5766\u798f\u76d1\u72f1\u5b9e\u9a8c\u542f\u53d1\uff0c\u4e13\u6ce8\u4e8e\u7814\u7a76\u5177\u6709\u4e25\u683c\u793e\u4f1a\u7b49\u7ea7\u80cc\u666f\u7684\u591a\u667a\u80fd\u4f53\u73af\u5883\u4e2d\u7684LLM\u4ea4\u4e92\u6a21\u5f0f\u3002 \u7814\u7a76\u805a\u7126\u4e8e\u4e24\u7c7b\u4e3b\u8981\u73b0\u8c61\uff1a\u8bf4\u670d\u529b\u548c\u53cd\u793e\u4f1a\u884c\u4e3a\uff0c\u5728\u6d89\u53ca\u770b\u5b88\u548c\u8bd5\u56fe\u8fbe\u6210\u7279\u5b9a\u76ee\u6807\uff08\u5982\u83b7\u5f97\u989d\u5916\u7684\u6237\u5916\u6d3b\u52a8\u65f6\u95f4\u6216\u9003\u72f1\uff09\u7684\u56da\u72af\u667a\u80fd\u4f53\u4e4b\u95f4\u7684\u6a21\u62df\u573a\u666f\u4e2d\u8fdb\u884c\u63a2\u8ba8\u3002\u901a\u8fc7\u4f7f\u7528200\u4e2a\u5b9e\u9a8c\u573a\u666f\uff0c\u5171\u8ba12000\u6b21\u673a\u5668\u95f4\u7684\u5bf9\u8bdd\uff0c\u7814\u7a76\u4e86\u4e94\u79cd\u6d41\u884c\u7684LLM\uff0c\u83b7\u5f97\u4e86\u4ee5\u4e0b\u663e\u8457\u53d1\u73b0\uff1a 1. 
\u4e00\u4e9b\u6a21\u578b\u5728\u591a\u667a\u80fd\u4f53\u8bbe\u7f6e\u4e2d\u6301\u7eed\u5931\u8d25\uff0c\u65e0\u6cd5\u8fdb\u884c\u6709\u610f\u4e49\u7684\u5bf9\u8bdd\u3002 2. \u5bf9\u4e8e\u80fd\u591f\u6210\u529f\u4e92\u52a8\u7684\u6a21\u578b\uff0c\u76ee\u6807\u5bf9\u667a\u80fd\u4f53\u7684\u8bf4\u670d\u529b\u6709\u663e\u8457\u5f71\u54cd\uff0c\u800c\u5bf9\u53cd\u793e\u4f1a\u884c\u4e3a\u7684\u5f71\u54cd\u5219\u5fae\u4e4e\u5176\u5fae\u3002 3. \u667a\u80fd\u4f53\u7684\u89d2\u8272\uff0c\u7279\u522b\u662f\u770b\u5b88\u7684\u4eba\u683c\u7279\u8d28\uff0c\u5bf9\u56da\u72af\u7684\u8bf4\u670d\u6210\u529f\u51e0\u7387\u548c\u53cd\u793e\u4f1a\u884c\u4e3a\u7684\u51fa\u73b0\u6709\u7740\u76f4\u63a5\u63a8\u52a8\u4f5c\u7528\u3002 4. \u5373\u4f7f\u6ca1\u6709\u660e\u786e\u63d0\u793a\u7279\u5b9a\u7684\u4eba\u683c\u7279\u8d28\uff0c\u4ec5\u901a\u8fc7\u8d4b\u4e88\u89d2\u8272\uff0c\u4e5f\u89c2\u5bdf\u5230\u4e86\u53cd\u793e\u4f1a\u884c\u4e3a\u7684\u81ea\u7136\u4ea7\u751f\u3002 \u8fd9\u4e9b\u7ed3\u679c\u5bf9LLM\u4ea4\u4e92\u667a\u80fd\u4f53\u7684\u53d1\u5c55\u4ee5\u53ca\u5bf9\u5176\u793e\u4f1a\u5f71\u54cd\u7684\u8ba8\u8bba\u5177\u6709\u91cd\u8981\u542f\u793a\u3002|\n", "2410.07103": "|**2024-10-09**|**Unleashing Multi-Hop Reasoning Potential in Large Language Models through Repetition of Misordered Context**|Sangwon Yu 
et.al.|[2410.07103](http://arxiv.org/abs/2410.07103)|null|\u5728\u591a\u8df3\u63a8\u7406\u9886\u57df\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u9762\u4e34\u7740\u57fa\u4e8e\u7ed9\u5b9a\u4e0a\u4e0b\u6587\u5185\u7684\u652f\u6301\u6587\u6863\u8fdb\u884c\u591a\u6b65\u9aa4\u63a8\u7406\u7684\u6311\u6218\u3002LLM\u5f80\u5f80\u96be\u4ee5\u7b5b\u9009\u51fa\u4e0d\u76f8\u5173\u7684\u6587\u6863\uff0c\u5e76\u4e14\u5176\u6027\u80fd\u5bf9\u4e0a\u4e0b\u6587\u4e2d\u652f\u6301\u6587\u6863\u7684\u4f4d\u7f6e\u975e\u5e38\u654f\u611f\u3002\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u8bc6\u522b\u51fa\u4e86\u4e00\u4e2a\u989d\u5916\u7684\u6311\u6218\uff1aLLM\u7684\u6027\u80fd\u4e5f\u5bf9\u5448\u73b0\u652f\u6301\u6587\u6863\u7684\u987a\u5e8f\u975e\u5e38\u654f\u611f\u3002\u6211\u4eec\u5c06\u6b64\u95ee\u9898\u79f0\u4e3a\u201c\u9519\u5e8f\u4e0a\u4e0b\u6587\u95ee\u9898\u201d\u3002\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u7b80\u5355\u800c\u6709\u6548\u7684\u89e3\u51b3\u65b9\u6cd5\u2014\u2014\u4e0a\u4e0b\u6587\u91cd\u590d\uff08CoRe\uff09\uff0c\u8be5\u65b9\u6cd5\u901a\u8fc7\u591a\u6b21\u63d0\u793a\u6a21\u578b\u4ee5\u786e\u4fdd\u652f\u6301\u6587\u6863\u4ee5\u6700\u4f73\u987a\u5e8f\u5448\u73b0\u6765\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\u3002 \u901a\u8fc7\u5e94\u7528CoRe\uff0c\u6211\u4eec\u5728\u591a\u8df3\u95ee\u7b54\u4efb\u52a1\u4e0a\u7684F1\u5f97\u5206\u63d0\u9ad8\u4e86\u9ad8\u8fbe30%\uff0c\u5728\u5408\u6210\u4efb\u52a1\u4e0a\u7684\u51c6\u786e\u7387\u63d0\u9ad8\u4e86\u9ad8\u8fbe70%\u3002\u6b64\u5916\uff0cCoRe\u6709\u52a9\u4e8e\u7f13\u89e3LLM\u666e\u904d\u5b58\u5728\u7684\u201c\u4e2d\u95f4\u8ff7\u5931\u201d\u95ee\u9898\uff0c\u5e76\u53ef\u4ee5\u4e0e\u5229\u7528\u94fe\u5f0f\u601d\u8003\uff08CoT\uff09\u63a8\u7406\u7684\u68c0\u7d22\u65b9\u6cd5\u6709\u6548\u7ed3\u5408\u4f7f\u7528\u3002|\n", "2410.08202": "|**2024-10-10**|**Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous 
Visual Pre-training**|Gen Luo et.al.|[2410.08202](http://arxiv.org/abs/2410.08202)|null|\u968f\u7740\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u8fc5\u901f\u53d1\u5c55\uff0c\u5bf9\u6269\u5c55\u5176\u80fd\u529b\u4ee5\u5904\u7406\u591a\u6a21\u6001\u4efb\u52a1\u7684\u5173\u6ce8\u65e5\u76ca\u589e\u52a0\u3002\u5176\u4e2d\uff0c\u5bf9\u5355\u4f53\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\u7684\u7814\u7a76\u5f15\u8d77\u4e86\u5e7f\u6cdb\u5173\u6ce8\uff0c\u8fd9\u4e9b\u6a21\u578b\u6574\u5408\u4e86\u89c6\u89c9\u7f16\u7801\u548c\u8bed\u8a00\u89e3\u7801\u529f\u80fd\u3002\u5c3d\u7ba1\u5355\u4f53MLLM\u5728\u7ed3\u6784\u4e0a\u7b80\u6d01\u4e14\u6613\u4e8e\u90e8\u7f72\uff0c\u4f46\u8981\u5b9e\u73b0\u5177\u6709\u7ade\u4e89\u529b\u6027\u80fd\u7684\u8bad\u7ec3\u4ecd\u9762\u4e34\u6311\u6218\u3002\u6d41\u884c\u7684\u7b56\u7565\u91c7\u7528\u8fde\u7eed\u9884\u8bad\u7ec3\u65b9\u6cd5\uff0c\u5c06\u9884\u8bad\u7ec3\u7684LLM\u6269\u5c55\u4e3a\u5355\u4f53MLLM\uff0c\u8fd9\u4f1a\u5bfc\u81f4\u707e\u96be\u6027\u9057\u5fd8\u5e76\u5bfc\u81f4\u6027\u80fd\u9000\u5316\u3002 
\u672c\u6587\u65e8\u5728\u4ece\u589e\u91cf\u5b66\u4e60\u7684\u89d2\u5ea6\u514b\u670d\u8fd9\u4e00\u5c40\u9650\u6027\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u6211\u4eec\u7684\u6838\u5fc3\u601d\u60f3\u662f\u5728\u9884\u8bad\u7ec3\u7684LLM\u4e2d\u5d4c\u5165\u89c6\u89c9\u53c2\u6570\uff0c\u901a\u8fc7\u589e\u91cf\u5b66\u4e60\u673a\u5236\uff0c\u5373\u5728\u4f18\u5316\u89c6\u89c9\u53c2\u6570\u65f6\u51bb\u7ed3LLM\uff0c\u4ece\u5927\u91cf\u6570\u636e\u4e2d\u9010\u6b65\u5b66\u4e60\u89c6\u89c9\u77e5\u8bc6\u3002\u57fa\u4e8e\u8fd9\u4e00\u539f\u5219\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aMono-InternVL\u7684\u65b0\u578b\u5355\u4f53MLLM\uff0c\u5b83\u901a\u8fc7\u591a\u6a21\u6001\u6df7\u5408\u4e13\u5bb6\u7ed3\u6784\u65e0\u7f1d\u5730\u878d\u5408\u4e86\u4e00\u7cfb\u5217\u89c6\u89c9\u4e13\u5bb6\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86\u4e00\u79cd\u521b\u65b0\u7684\u9884\u8bad\u7ec3\u7b56\u7565\u6765\u6700\u5927\u5316Mono-InternVL\u7684\u89c6\u89c9\u80fd\u529b\uff0c\u5373\u5185\u751f\u89c6\u89c9\u9884\u8bad\u7ec3\uff08EViP\uff09\u3002\u5177\u4f53\u800c\u8a00\uff0cEViP\u8bbe\u8ba1\u4e3a\u4e00\u4e2a\u89c6\u89c9\u4e13\u5bb6\u7684\u6e10\u8fdb\u5f0f\u5b66\u4e60\u8fc7\u7a0b\uff0c\u65e8\u5728\u5145\u5206\u5229\u7528\u4ece\u4f4e\u8d28\u91cf\u6570\u636e\u5230\u9ad8\u8d28\u91cf\u6570\u636e\u7684\u89c6\u89c9\u77e5\u8bc6\u3002 \u4e3a\u4e86\u9a8c\u8bc1\u6211\u4eec\u7684\u65b9\u6cd5\uff0c\u6211\u4eec\u572816\u4e2a\u57fa\u51c6\u4e0a\u8fdb\u884c\u4e86\u5e7f\u6cdb\u5b9e\u9a8c\u3002\u5b9e\u9a8c\u7ed3\u679c\u4e0d\u4ec5\u8bc1\u5b9e\u4e86\u4e0e\u5f53\u524d\u6700\u5148\u8fdb\u7684\u5355\u4f53MLLM\u76f8\u6bd4\uff0cMono-InternVL\u57286\u4e2a\u591a\u6a21\u6001\u57fa\u51c6\u4e0a\u7684\u5353\u8d8a\u6027\u80fd\uff0c\u4f8b\u5982\u5728OCRBench\u4e0a\u7684+113\u70b9\u4f18\u52bf\uff0c\u800c\u4e14\u8fd8\u786e\u8ba4\u4e86\u5176\u66f4\u597d\u7684\u90e8\u7f72\u6548\u7387\uff0c\u9996\u6b21\u4ee4\u724c\u5ef6\u8fdf\u964d\u4f4e\u4e86\u9ad8\u8fbe67%\u3002|\n", "2410.08197": 
"|**2024-10-10**|**From Exploration to Mastery: Enabling LLMs to Master Tools via Self-Driven Interactions**|Changle Qu et.al.|[2410.08197](http://arxiv.org/abs/2410.08197)|**[link](https://github.com/quchangle1/DRAFT)**|**\u672c\u6587\u4e13\u6ce8\u4e8e\u89e3\u51b3\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4e0e\u5916\u90e8\u5de5\u5177\u4ea4\u4e92\u8fc7\u7a0b\u4e2d\u5b58\u5728\u7684\u7406\u89e3\u9e3f\u6c9f\u95ee\u9898\uff0c\u8fd9\u4e00\u9e3f\u6c9f\u6e90\u4e8e\u73b0\u6709\u4eba\u7c7b\u5bfc\u5411\u7684\u5de5\u5177\u6587\u6863\u7684\u4e0d\u5b8c\u5584\u6027\u548c\u4e0d\u51c6\u786e\u6027\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aDRAFT\u7684\u65b0\u6846\u67b6\uff0c\u65e8\u5728\u52a8\u6001\u4f18\u5316\u5de5\u5177\u6587\u6863\uff0c\u901a\u8fc7\u5206\u6790\u6765\u81eaLLM\u4e0e\u5916\u90e8\u5de5\u5177\u4ea4\u4e92\u8fc7\u7a0b\u4e2d\u7684\u53cd\u9988\u548c\u8f68\u8ff9\u4fe1\u606f\u3002\u8be5\u65b9\u6cd5\u57fa\u4e8e\u4e00\u79cd\u521b\u65b0\u7684\u8bd5\u9519\u5b66\u4e60\u6d41\u7a0b\uff0c\u5305\u62ec\u7ecf\u9a8c\u6536\u96c6\u3001\u4ece\u7ecf\u9a8c\u5b66\u4e60\u4ee5\u53ca\u6587\u6863\u91cd\u5199\u4e09\u4e2a\u9636\u6bb5\uff0c\u4ee5\u8fed\u4ee3\u65b9\u5f0f\u63d0\u5347\u5de5\u5177\u6587\u6863\u7684\u8d28\u91cf\u3002 
\u4e3a\u4e86\u786e\u4fdd\u63a2\u7d22\u7684\u591a\u6837\u6027\u5e76\u907f\u514d\u8fc7\u62df\u5408\uff0cDRAFT\u8fd8\u91c7\u7528\u4e86\u4fc3\u8fdb\u591a\u6837\u6027\u7684\u63a2\u7d22\u7b56\u7565\uff0c\u5e76\u914d\u5907\u4e86\u4e00\u4e2a\u5de5\u5177\u9002\u5e94\u6027\u7ec8\u6b62\u673a\u5236\u6765\u63d0\u9ad8\u6548\u7387\u3002\u5728\u591a\u4e2a\u6570\u636e\u96c6\u4e0a\u7684\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cDRAFT\u901a\u8fc7\u8fed\u4ee3\u53cd\u9988\u4f18\u5316\u663e\u8457\u63d0\u9ad8\u4e86\u6587\u6863\u8d28\u91cf\uff0c\u4fc3\u8fdb\u4e86LLM\u5bf9\u5de5\u5177\u7684\u66f4\u6df1\u5165\u7406\u89e3\u548c\u66f4\u6709\u6548\u5229\u7528\u3002\u6211\u4eec\u7684\u5206\u6790\u8fdb\u4e00\u6b65\u63ed\u793a\u4e86\u901a\u8fc7\u8fd9\u79cd\u65b9\u6cd5\u4f18\u5316\u540e\u7684\u5de5\u5177\u6587\u6863\u5177\u6709\u5f3a\u5927\u7684\u8de8\u6a21\u578b\u901a\u7528\u80fd\u529b\u3002**|\n", "2410.08196": "|**2024-10-10**|**MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code**|Zimu Lu 
et.al.|[2410.08196](http://arxiv.org/abs/2410.08196)|**[link](https://github.com/mathllm/mathcoder2)**|**\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u65b0\u9896\u7684\u65b9\u6cd5\uff0c\u7528\u4e8e\u751f\u6210\u4f34\u968f\u63a8\u7406\u6b65\u9aa4\u7684\u6570\u5b66\u4ee3\u7801\uff0c\u4ee5\u8fdb\u884c\u6301\u7eed\u9884\u8bad\u7ec3\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u9996\u5148\u901a\u8fc7\u6574\u5408\u6570\u5b66\u76f8\u5173\u7f51\u7edc\u6570\u636e\u3001\u4f7f\u7528\u6570\u5b66\u5305\u7684\u4ee3\u7801\u3001\u6570\u5b66\u6559\u79d1\u4e66\u548c\u5408\u6210\u6570\u636e\u6765\u6784\u5efa\u9ad8\u8d28\u91cf\u7684\u6570\u5b66\u6301\u7eed\u9884\u8bad\u7ec3\u6570\u636e\u96c6\u3002\u63a5\u7740\uff0c\u6211\u4eec\u901a\u8fc7\u63d0\u53d6LaTeX\u8868\u8fbe\u5f0f\u3001\u8868\u8fbe\u5f0f\u7684\u6761\u4ef6\u4ee5\u53ca\u7ed3\u679c\u6765\u6784\u9020\u63a8\u7406\u6b65\u9aa4\u3002\u57fa\u4e8e\u8fd9\u4e9b\u63d0\u53d6\u7684\u4fe1\u606f\uff0c\u6211\u4eec\u751f\u6210\u76f8\u5e94\u7684\u4ee3\u7801\uff0c\u4ee5\u51c6\u786e\u6355\u6349\u6570\u5b66\u63a8\u7406\u8fc7\u7a0b\u3002\u6211\u4eec\u5c06\u751f\u6210\u7684\u4ee3\u7801\u9644\u52a0\u5230\u6bcf\u4e2a\u63a8\u7406\u6b65\u9aa4\u540e\uff0c\u5f62\u6210\u5305\u542b\u81ea\u7136\u8bed\u8a00\u63a8\u7406\u6b65\u9aa4\u53ca\u5176\u5bf9\u5e94\u4ee3\u7801\u7684\u6570\u636e\u5bf9\u3002\u5c06\u6b64\u6570\u636e\u4e0e\u539f\u59cb\u6570\u636e\u96c6\u7ed3\u5408\uff0c\u5f97\u5230\u4e00\u4e2a\u5305\u542b19.2B\u4e2a\u6807\u8bb0\u7684\u9ad8\u6027\u80fd\u6570\u5b66\u9884\u8bad\u7ec3\u8bed\u6599\u5e93\uff0c\u6211\u4eec\u5c06\u5176\u547d\u540d\u4e3aMathCode-Pile\u3002\u4f7f\u7528\u6b64\u8bed\u6599\u5e93\u5bf9\u51e0\u79cd\u6d41\u884c\u7684\u57fa\u6a21\u8fdb\u884c\u8bad\u7ec3\uff0c\u663e\u8457\u63d0\u9ad8\u4e86\u5b83\u4eec\u7684\u6570\u5b66\u80fd\u529b\uff0c\u4ece\u800c\u4ea7\u751f\u4e86\u540d\u4e3aMathCoder2\u7684\u6a21\u578b\u5bb6\u65cf\u3002\u6240\u6709\u6570\u636e\u5904\u7406\u548c\u8bad\u7ec3\u4ee3\u7801\u5747\u5f00\u6e90\uff0c\u786e\u4fdd\u4e86\u6574\u4e2a\u6570\u636e\u
6536\u96c6\u548c\u8bad\u7ec3\u6d41\u7a0b\u7684\u900f\u660e\u6027\u548c\u53ef\u590d\u73b0\u6027\u3002\u4ee3\u7801\u5728https://github.com/mathllm/MathCoder2\u4e0a\u53d1\u5e03\u3002**|\n", "2410.08193": "|**2024-10-10**|**GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment**|Yuancheng Xu et.al.|[2410.08193](http://arxiv.org/abs/2410.08193)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u5c55\u73b0\u51fa\u4ee4\u4eba\u5370\u8c61\u6df1\u523b\u7684\u80fd\u529b\uff0c\u4f46\u9700\u8981\u4ed4\u7ec6\u5bf9\u9f50\u4ee5\u6ee1\u8db3\u4eba\u7c7b\u7684\u504f\u597d\u3002\u4f20\u7edf\u7684\u8bad\u7ec3\u65f6\u65b9\u6cd5\u901a\u8fc7\u4f7f\u7528\u4eba\u7c7b\u504f\u597d\u6570\u636e\u96c6\u6765\u5fae\u8c03LLM\uff0c\u4f46\u4f1a\u5e26\u6765\u663e\u8457\u7684\u8bad\u7ec3\u6210\u672c\uff0c\u5e76\u4e14\u9700\u8981\u91cd\u590d\u8bad\u7ec3\u4ee5\u5e94\u5bf9\u591a\u6837\u5316\u7684\u7528\u6237\u504f\u597d\u3002\u6d4b\u8bd5\u65f6\u5bf9\u9f50\u65b9\u6cd5\u901a\u8fc7\u4f7f\u7528\u5956\u52b1\u6a21\u578b\uff08RM\uff09\u6765\u5f15\u5bfc\u51bb\u7ed3\u7684LLM\uff0c\u800c\u65e0\u9700\u91cd\u65b0\u8bad\u7ec3\uff0c\u4ece\u800c\u89e3\u51b3\u4e86\u8fd9\u4e00\u95ee\u9898\u3002\u7136\u800c\uff0c\u73b0\u6709\u7684\u6d4b\u8bd5\u65f6\u65b9\u6cd5\u4f9d\u8d56\u4e8e\u8f68\u8ff9\u7ea7RM\uff0c\u5b83\u4eec\u65e8\u5728\u8bc4\u4f30\u5b8c\u6574\u54cd\u5e94\uff0c\u8fd9\u4f7f\u5f97\u5b83\u4eec\u4e0d\u9002\u5408\u7528\u4e8e\u9700\u8981\u4ece\u90e8\u5206\u54cd\u5e94\u8ba1\u7b97\u4e0b\u4e00\u4e2a\u8bcd\u5956\u52b1\u7684\u81ea\u56de\u5f52\u6587\u672c\u751f\u6210\u3002 
\u4e3a\u4e86\u5e94\u5bf9\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u5f15\u5165\u4e86GenARM\uff0c\u4e00\u79cd\u6d4b\u8bd5\u65f6\u5bf9\u9f50\u65b9\u6cd5\uff0c\u5229\u7528\u4e86\u81ea\u56de\u5f52\u5956\u52b1\u6a21\u578b\u2014\u2014\u4e00\u79cd\u65b0\u578b\u7684\u5956\u52b1\u53c2\u6570\u5316\u65b9\u6cd5\uff0c\u65e8\u5728\u9884\u6d4b\u81ea\u56de\u5f52\u751f\u6210\u8fc7\u7a0b\u4e2d\u7684\u4e0b\u4e00\u4e2a\u8bcd\u5956\u52b1\uff0c\u4ee5\u5b9e\u73b0\u9ad8\u6548\u548c\u6709\u6548\u7684\u81ea\u56de\u5f52\u751f\u6210\u3002\u7406\u8bba\u4e0a\uff0c\u6211\u4eec\u8bc1\u660e\u4e86\u8fd9\u79cd\u53c2\u6570\u5316\u53ef\u4ee5\u5728KL\u6b63\u5219\u5316\u5f3a\u5316\u5b66\u4e60\u6846\u67b6\u5185\u5f15\u5bfc\u51bb\u7ed3\u7684LLM\u63a5\u8fd1\u4efb\u4f55\u7531\u4f20\u7edfRM\u53ef\u5b9e\u73b0\u7684\u5206\u5e03\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0cGenARM\u5728\u6027\u80fd\u4e0a\u663e\u8457\u4f18\u4e8e\u5148\u524d\u7684\u6d4b\u8bd5\u65f6\u5bf9\u9f50\u57fa\u7ebf\uff0c\u5e76\u4e14\u4e0e\u8bad\u7ec3\u65f6\u65b9\u6cd5\u7684\u6027\u80fd\u76f8\u5f53\u3002\u6b64\u5916\uff0cGenARM\u652f\u6301\u5f31\u5230\u5f3a\u7684\u6307\u5bfc\uff0c\u5141\u8bb8\u5728\u4e0d\u9700\u8981\u8bad\u7ec3\u66f4\u5927\u6a21\u578b\u7684\u60c5\u51b5\u4e0b\uff0c\u901a\u8fc7\u8f83\u5c0f\u7684RM\u5bf9\u66f4\u5927\u7684LLM\u8fdb\u884c\u5bf9\u9f50\uff0c\u4ece\u800c\u964d\u4f4e\u4e86\u6210\u672c\u3002\u8fdb\u4e00\u6b65\u5730\uff0cGenARM\u8fd8\u652f\u6301\u591a\u76ee\u6807\u5bf9\u9f50\uff0c\u5141\u8bb8\u5b9e\u65f6\u5e73\u8861\u504f\u597d\u7ef4\u5ea6\uff0c\u6ee1\u8db3\u4e0d\u540c\u7528\u6237\u9700\u6c42\uff0c\u800c\u65e0\u9700\u91cd\u65b0\u8bad\u7ec3\u3002|\n", "2410.08174": "|**2024-10-10**|**Sample then Identify: A General Framework for Risk Control and Assessment in Multimodal Large Language Models**|Qingni Wang 
et.al.|[2410.08174](http://arxiv.org/abs/2410.08174)|null|\u672c\u8bba\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aTRON\u7684\u4e24\u6b65\u6846\u67b6\uff0c\u65e8\u5728\u5bf9\u4efb\u4f55\u652f\u6301\u5728\u5f00\u653e\u548c\u5c01\u95ed\u573a\u666f\u4e0b\u91c7\u6837\u7684\u5927\u578b\u591a\u6a21\u6001\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\u8fdb\u884c\u98ce\u9669\u63a7\u5236\u4e0e\u8bc4\u4f30\u3002TRON\u7531\u4e24\u4e2a\u4e3b\u8981\u7ec4\u4ef6\u6784\u6210\uff1a\uff081\uff09\u4e00\u79cd\u65b0\u9896\u7684\u6821\u51c6\u8bc4\u5206\u65b9\u6cd5\uff0c\u7528\u4e8e\u4ee5\u6700\u5c0f\u5c3a\u5bf8\u91c7\u6837\u54cd\u5e94\u96c6\uff1b\uff082\uff09\u57fa\u4e8e\u81ea\u81f4\u6027\u7406\u8bba\u7684\u975e\u4e00\u81f4\u6027\u8bc4\u5206\uff0c\u901a\u8fc7\u8bbe\u5b9a\u4e24\u79cd\u7279\u5b9a\u7684\u98ce\u9669\u6c34\u5e73\u6765\u63a7\u5236\u9519\u8bef\u7387\u3002\u6b64\u5916\uff0c\u672c\u7814\u7a76\u9996\u6b21\u63a2\u8ba8\u4e86\u5728\u5f00\u653e\u573a\u666f\u4e0b\u7684\u9884\u6d4b\u96c6\u4e2d\u7684\u8bed\u4e49\u5197\u4f59\u95ee\u9898\uff0c\u5e76\u636e\u6b64\u63d0\u51fa\u4e86\u4e00\u4e2a\u7528\u4e8e\u8bc4\u4ef7MLLM\u7684\u65b0\u6307\u6807\u2014\u2014\u5e73\u5747\u96c6\u5408\u5927\u5c0f\u3002 \u901a\u8fc7\u5728\u56db\u4e2a\u89c6\u9891\u95ee\u7b54\uff08VideoQA\uff09\u6570\u636e\u96c6\u4e0a\u4f7f\u7528\u516b\u79cdMLLM\u8fdb\u884c\u5168\u9762\u5b9e\u9a8c\uff0c\u6211\u4eec\u8bc1\u660e\u4e86TRON\u80fd\u591f\u5b9e\u73b0\u7528\u6237\u6307\u5b9a\u7684\u98ce\u9669\u6c34\u5e73\u8303\u56f4\u5185\u7684\u671f\u671b\u9519\u8bef\u7387\u3002\u540c\u65f6\uff0c\u53bb\u91cd\u540e\u7684\u9884\u6d4b\u96c6\u5728\u4fdd\u6301\u9002\u5e94\u6027\u7684\u540c\u65f6\uff0c\u5c55\u73b0\u51fa\u66f4\u9ad8\u6548\u3001\u7a33\u5b9a\u7684\u98ce\u9669\u8bc4\u4f30\u80fd\u529b\uff0c\u5728\u4e0d\u540c\u98ce\u9669\u6c34\u5e73\u4e0b\u5747\u6709\u51fa\u8272\u8868\u73b0\u3002|\n", "2410.08172": "|**2024-10-10**|**On the Evaluation of Generative Robotic Simulations**|Feng Chen 
et.al.|[2410.08172](http://arxiv.org/abs/2410.08172)|null|\u7531\u4e8e\u83b7\u53d6\u771f\u5b9e\u4e16\u754c\u6570\u636e\u7684\u56f0\u96be\u6027\uff0c\u673a\u5668\u4eba\u6a21\u62df\u5df2\u6210\u4e3a\u5e76\u884c\u8bad\u7ec3\u548c\u6a21\u62df\u5230\u73b0\u5b9e\u4e16\u754c\u7684\u8f6c\u6362\u7684\u5173\u952e\uff0c\u8fd9\u51f8\u663e\u4e86\u53ef\u6269\u5c55\u4eff\u771f\u673a\u5668\u4eba\u4efb\u52a1\u7684\u91cd\u8981\u6027\u3002\u57fa\u7840\u6a21\u578b\u5df2\u7ecf\u5c55\u73b0\u51fa\u5728\u81ea\u4e3b\u751f\u6210\u53ef\u884c\u673a\u5668\u4eba\u4efb\u52a1\u65b9\u9762\u7684\u60ca\u4eba\u80fd\u529b\u3002\u7136\u800c\uff0c\u8fd9\u4e00\u65b0\u8303\u5f0f\u5f3a\u8c03\u4e86\u8bc4\u4f30\u8fd9\u4e9b\u81ea\u4e3b\u751f\u6210\u4efb\u52a1\u7684\u6311\u6218\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u9488\u5bf9\u751f\u6210\u6a21\u62df\u7684\u5168\u9762\u8bc4\u4ef7\u6846\u67b6\u3002\u6211\u4eec\u7684\u6846\u67b6\u5c06\u8bc4\u4f30\u5206\u4e3a\u4e09\u4e2a\u6838\u5fc3\u65b9\u9762\uff1a\u8d28\u91cf\u3001\u591a\u6837\u6027\u548c\u6cdb\u5316\u3002\u5bf9\u4e8e\u5355\u4efb\u52a1\u8d28\u91cf\uff0c\u6211\u4eec\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\u548c\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\u8bc4\u4f30\u751f\u6210\u4efb\u52a1\u7684\u771f\u5b9e\u6027\u548c\u751f\u6210\u8f68\u8ff9\u7684\u5b8c\u6574\u6027\u3002\u5728\u591a\u6837\u6027\u65b9\u9762\uff0c\u6211\u4eec\u901a\u8fc7\u4efb\u52a1\u63cf\u8ff0\u7684\u6587\u672c\u76f8\u4f3c\u6027\u548c\u6536\u96c6\u7684\u4efb\u52a1\u8f68\u8ff9\u8bad\u7ec3\u7684\u4e16\u754c\u6a21\u578b\u635f\u5931\u6765\u6d4b\u91cf\u4efb\u52a1\u548c\u6570\u636e\u7684\u591a\u6837\u6027\u3002\u5bf9\u4e8e\u4efb\u52a1\u7ea7\u522b\u7684\u6cdb\u5316\uff0c\u6211\u4eec\u8bc4\u4f30\u4e86\u4f7f\u7528\u591a\u4e2a\u751f\u6210\u4efb\u52a1\u8bad\u7ec3\u7684\u7b56\u7565\u5728\u672a\u89c1\u8fc7\u7684\u4efb\u52a1\u4e0a\u7684\u96f6\u6837\u672c\u6cdb\u5316\u80fd\u529b\u3002\u5728\u4e09\u4e2a\u4ee3\u8868\u6027\u4efb\u52a1\u751f\u6210\u7b
a1\u9053\u4e0a\u8fdb\u884c\u7684\u5b9e\u9a8c\u8868\u660e\uff0c\u6211\u4eec\u7684\u6846\u67b6\u7684\u8bc4\u4f30\u7ed3\u679c\u4e0e\u4eba\u7c7b\u8bc4\u4f30\u9ad8\u5ea6\u4e00\u81f4\uff0c\u786e\u8ba4\u4e86\u6211\u4eec\u65b9\u6cd5\u7684\u53ef\u884c\u6027\u548c\u6709\u6548\u6027\u3002\u7814\u7a76\u53d1\u73b0\uff0c\u867d\u7136\u53ef\u4ee5\u901a\u8fc7\u67d0\u4e9b\u65b9\u6cd5\u5b9e\u73b0\u8d28\u91cf\u548c\u591a\u6837\u6027\u7684\u6307\u6807\uff0c\u4f46\u6ca1\u6709\u4efb\u4f55\u4e00\u79cd\u65b9\u6cd5\u80fd\u591f\u5728\u6240\u6709\u6307\u6807\u4e0a\u90fd\u8868\u73b0\u51fa\u8272\uff0c\u8fd9\u8868\u660e\u9700\u8981\u66f4\u591a\u5730\u5173\u6ce8\u5e73\u8861\u8fd9\u4e9b\u4e0d\u540c\u6307\u6807\u3002\u6b64\u5916\uff0c\u6211\u4eec\u7684\u5206\u6790\u8fdb\u4e00\u6b65\u7a81\u663e\u4e86\u5f53\u524d\u5de5\u4f5c\u9762\u4e34\u7684\u5171\u540c\u6311\u6218\u2014\u2014\u4f4e\u6cdb\u5316\u80fd\u529b\u3002 \u533f\u540d\u7f51\u7ad9\u94fe\u63a5\uff1ahttps://sites.google.com/view/evaltasks|\n", "2410.08164": "|**2024-10-10**|**Agent S: An Open Agentic Framework that Uses Computers Like a Human**|Saaket Agashe et.al.|[2410.08164](http://arxiv.org/abs/2410.08164)|**[link](https://github.com/simular-ai/agent-s)**|**\u6211\u4eec\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aAgent S\u7684\u5f00\u653e\u6027\u4ee3\u7406\u6846\u67b6\uff0c\u5b83\u901a\u8fc7\u56fe\u5f62\u7528\u6237\u754c\u9762(GUI)\u4e0e\u8ba1\u7b97\u673a\u8fdb\u884c\u81ea\u4e3b\u4ea4\u4e92\uff0c\u65e8\u5728\u901a\u8fc7\u81ea\u52a8\u5316\u590d\u6742\u3001\u591a\u6b65\u9aa4\u7684\u4efb\u52a1\u6765\u6539\u53d8\u4eba\u673a\u4ea4\u4e92\u65b9\u5f0f\u3002Agent S\u65e8\u5728\u89e3\u51b3\u81ea\u52a8\u5316\u8ba1\u7b97\u673a\u4efb\u52a1\u65f6\u9047\u5230\u7684\u4e09\u4e2a\u5173\u952e\u6311\u6218\uff1a\u83b7\u53d6\u7279\u5b9a\u9886\u57df\u7684\u77e5\u8bc6\u3001\u5728\u957f\u4efb\u52a1\u5468\u671f\u5185\u89c4\u5212\u4ee5\u53ca\u5904\u7406\u52a8\u6001\u3001\u975e\u5747\u5300\u7684\u754c\u9762\u3002\u4e3a\u6b64\uff0cAgent 
S\u5f15\u5165\u4e86\u7ecf\u9a8c\u589e\u5f3a\u7684\u5c42\u6b21\u89c4\u5212\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u5728\u591a\u4e2a\u7ea7\u522b\u4e0a\u7ed3\u5408\u5916\u90e8\u77e5\u8bc6\u641c\u7d22\u548c\u5185\u90e8\u7ecf\u9a8c\u68c0\u7d22\uff0c\u4ece\u800c\u5b9e\u73b0\u9ad8\u6548\u7684\u4efb\u52a1\u89c4\u5212\u548c\u5b50\u4efb\u52a1\u6267\u884c\u3002\u6b64\u5916\uff0c\u5b83\u91c7\u7528\u4e86\u4ee3\u7406-\u8ba1\u7b97\u673a\u63a5\u53e3(ACI)\uff0c\u57fa\u4e8e\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b(MLLMs)\u66f4\u597d\u5730\u63ed\u793aGUI\u4ee3\u7406\u7684\u63a8\u7406\u548c\u63a7\u5236\u80fd\u529b\u3002\u5728OSWorld\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u7684\u8bc4\u4f30\u663e\u793a\uff0c\u4e0e\u57fa\u7ebf\u76f8\u6bd4\uff0cAgent S\u7684\u6210\u529f\u7387\u63d0\u9ad8\u4e869.37%(\u76f8\u5bf9\u63d0\u9ad8\u4e8683.6%)\uff0c\u5e76\u8fbe\u5230\u4e86\u65b0\u7684\u6700\u9ad8\u6c34\u5e73\u3002\u5168\u9762\u5206\u6790\u5f3a\u8c03\u4e86\u5404\u4e2a\u7ec4\u4ef6\u7684\u6709\u6548\u6027\uff0c\u5e76\u63d0\u4f9b\u4e86\u672a\u6765\u6539\u8fdb\u7684\u89c1\u89e3\u3002\u6b64\u5916\uff0cAgent S\u5728\u65b0\u53d1\u5e03\u7684WindowsAgentArena\u57fa\u51c6\u4e0a\u5c55\u793a\u4e86\u5e7f\u6cdb\u7684\u901a\u7528\u6027\uff0c\u80fd\u591f\u9002\u5e94\u4e0d\u540c\u7684\u64cd\u4f5c\u7cfb\u7edf\u3002\u6709\u5173\u4ee3\u7801\u7684\u66f4\u591a\u4fe1\u606f\uff0c\u8bf7\u53c2\u9605https://github.com/simular-ai/Agent-S\u3002**|\n", "2410.08146": "|**2024-10-10**|**Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning**|Amrith Setlur 
et.al.|[2410.08146](http://arxiv.org/abs/2410.08146)|null|\u63d0\u9ad8\u5927\u578b\u8bed\u8a00\u6a21\u578b\u63a8\u7406\u80fd\u529b\u7684\u4e00\u79cd\u6709\u524d\u666f\u7684\u65b9\u6cd5\u662f\u4f7f\u7528\u8fc7\u7a0b\u5956\u52b1\u6a21\u578b\uff08PRMs\uff09\u3002\u4e0e\u4ec5\u5728\u6700\u7ec8\u6b65\u9aa4\u63d0\u4f9b\u53cd\u9988\u7684\u7ed3\u679c\u5956\u52b1\u6a21\u578b\uff08ORMs\uff09\u76f8\u6bd4\uff0cPRMs\u5728\u591a\u6b65\u63a8\u7406\u8ddf\u8e2a\u7684\u6bcf\u4e2a\u6b65\u9aa4\u90fd\u63d0\u4f9b\u53cd\u9988\uff0c\u53ef\u80fd\u6709\u52a9\u4e8e\u6539\u8fdb\u4fe1\u7528\u5206\u914d\u3002\u7136\u800c\uff0c\u6536\u96c6\u5bc6\u96c6\u3001\u6bcf\u6b65\u9aa4\u7684\u4eba\u7c7b\u6807\u7b7e\u5e76\u4e0d\u5177\u6709\u53ef\u6269\u5c55\u6027\uff0c\u4ece\u81ea\u52a8\u6807\u8bb0\u6570\u636e\u8bad\u7ec3PRMs\u8fc4\u4eca\u4e3a\u6b62\u5bfc\u81f4\u7684\u589e\u76ca\u6709\u9650\u3002\u4e3a\u4e86\u901a\u8fc7\u8fd0\u884c\u641c\u7d22\u6765\u6539\u8fdb\u57fa\u7b56\u7565\u6216\u5c06\u5176\u7528\u4f5c\u5f3a\u5316\u5b66\u4e60\uff08RL\uff09\u7684\u5bc6\u96c6\u5956\u52b1\u6765\u4f18\u5316\u57fa\u7b56\u7565\uff0c\u6211\u4eec\u63d0\u51fa\u7684\u95ee\u9898\u662f\uff1a\u201c\u6211\u4eec\u5e94\u8be5\u5982\u4f55\u8bbe\u8ba1\u8fc7\u7a0b\u5956\u52b1\uff1f\u201d\u6211\u4eec\u7684\u5173\u952e\u6d1e\u5bdf\u662f\uff0c\u4e3a\u4e86\u6709\u6548\uff0c\u6b65\u9aa4\u7ea7\u5956\u52b1\u5e94\u8be5\u8861\u91cf\u8fdb\u5ea6\uff1a\u91c7\u53d6\u6b65\u9aa4\u524d\u540e\u4ea7\u751f\u6b63\u786e\u54cd\u5e94\u7684\u53ef\u80fd\u6027\u53d8\u5316\uff0c\u5bf9\u5e94\u4e8eRL\u4e2d\u7684\u6b65\u9aa4\u7ea7\u4f18\u52bf\u7684\u6982\u5ff5\u3002\u5173\u952e\u5728\u4e8e\uff0c\u8fd9\u79cd\u8fdb\u5c55\u5e94\u8be5\u5728\u4e0e\u57fa\u7b56\u7565\u4e0d\u540c\u7684\u8bc1\u660e\u7b56\u7565\u4e0b\u8fdb\u884c\u6d4b\u91cf\u3002\u6211\u4eec\u7406\u8bba\u5730\u63cf\u8ff0\u4e86\u826f\u597d\u7684\u8bc1\u660e\u8005\u96c6\u5408\uff0c\u5e76\u4e14\u6211\u4eec\u7684\u7ed3\u679c\u8868\u660e\uff0c\u901a\u8fc7\u8fd9\u6837\u7684\u8bc1\u660e\u8005\u4f18\u5316\u8fc7\u7a0b\u
5956\u52b1\u53ef\u4ee5\u6539\u5584\u6d4b\u8bd5\u65f6\u641c\u7d22\u548c\u5728\u7ebfRL\u671f\u95f4\u7684\u63a2\u7d22\u3002\u5b9e\u9645\u4e0a\uff0c\u6211\u4eec\u7684\u63cf\u8ff0\u663e\u793a\uff0c\u5f31\u8bc1\u660e\u8005\u7b56\u7565\u53ef\u4ee5\u663e\u7740\u63d0\u9ad8\u66f4\u5f3a\u7684\u57fa\u7b56\u7565\uff0c\u8fd9\u4e5f\u662f\u6211\u4eec\u5728\u5b9e\u9a8c\u4e0a\u89c2\u5bdf\u5230\u7684\u73b0\u8c61\u3002\u6211\u4eec\u901a\u8fc7\u8bad\u7ec3\u8fc7\u7a0b\u4f18\u52bf\u9a8c\u8bc1\u5668\uff08PAVs\uff09\u6765\u9884\u6d4b\u5728\u8fd9\u4e9b\u8bc1\u660e\u8005\u4e0b\u8fdb\u884c\u7684\u8fdb\u5c55\uff0c\u8bc1\u660e\u4e0eORMs\u76f8\u6bd4\uff0c\u5728\u7ebfRL\u4f7f\u7528PAVs\u63d0\u4f9b\u7684\u5bc6\u96c6\u5956\u52b1\u53ef\u4ee5\u5b9e\u73b0\u9ad8\u8fbe8\uff05\u4ee5\u4e0a\u7684\u51c6\u786e\u6027\u63d0\u9ad8\uff0c\u4ee5\u53ca1.5\u81f35\u500d\u7684\u8ba1\u7b97\u6548\u7387\u63d0\u9ad8\u3002\u4f7f\u7528PAVs\u7684\u5728\u7ebfRL\u9996\u6b21\u5b9e\u73b0\u4e86\u6837\u672c\u6548\u7387\u63d0\u53475-6\u500d\uff0c\u51c6\u786e\u7387\u63d0\u5347\u8d85\u8fc76\uff05\u7684\u7ed3\u679c\u3002|\n", "2410.08145": "|**2024-10-10**|**Insight Over Sight? 
Exploring the Vision-Knowledge Conflicts in Multimodal LLMs**|Xiaoyuan Liu et.al.|[2410.08145](http://arxiv.org/abs/2410.08145)|null|\u672c\u6587\u63a2\u8ba8\u4e86\u5728\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\u4e2d\u89c6\u89c9\u4fe1\u606f\u4e0e\u6a21\u578b\u5185\u90e8\u5e38\u8bc6\u77e5\u8bc6\u51b2\u7a81\u7684\u95ee\u9898\u3002\u7814\u7a76\u53d1\u73b0\uff0c\u5728\u7279\u5b9a\u60c5\u51b5\u4e0b\uff0cMLLMs\u53ef\u80fd\u57fa\u4e8e\u6587\u672c\u67e5\u8be2\u800c\u975e\u89c6\u89c9\u8f93\u5165\u505a\u51fa\u51b3\u7b56\uff0c\u5bfc\u81f4\u5e38\u8bc6\u7ea7\u7684\u89c6\u89c9-\u77e5\u8bc6\u77db\u76fe\u3002\u4e3a\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u5957\u81ea\u52a8\u5316\u7684\u8bc4\u4f30\u6d41\u7a0b\uff0c\u5e76\u8f85\u4ee5\u4eba\u5de5\u8d28\u91cf\u63a7\u5236\u73af\u8282\uff0c\u6784\u5efa\u4e86\u4e00\u4e2a\u7528\u4e8e\u6a21\u62df\u548c\u8bc4\u4f30\u6b64\u7c7b\u51b2\u7a81\u7684\u57fa\u51c6\u6d4b\u8bd5\u7cfb\u7edf\u3002 \u8be5\u57fa\u51c6\u6d4b\u8bd5\u5305\u542b\u4e86374\u5f20\u539f\u521b\u56fe\u7247\u53ca1122\u4e2a\u9ad8\u8d28\u91cf\u7684\u95ee\u9898-\u7b54\u6848\u5bf9\uff0c\u8986\u76d6\u4e86\u4e24\u79cd\u51b2\u7a81\u76ee\u6807\u7c7b\u578b\u548c\u4e09\u4e2a\u4e0d\u540c\u96be\u5ea6\u7ea7\u522b\u7684\u95ee\u9898\uff0c\u4e3a\u5168\u9762\u8bc4\u4f30\u6a21\u578b\u63d0\u4f9b\u4e86\u5de5\u5177\u3002\u901a\u8fc7\u8fd9\u4e00\u57fa\u51c6\uff0c\u6211\u4eec\u5bf9\u4e5d\u79cd\u4ee3\u8868\u6027\u7684MLLM\u8fdb\u884c\u4e86\u8bc4\u4f30\uff0c\u53d1\u73b0\u8fd9\u4e9b\u6a21\u578b\u5728\u5904\u7406\u89c6\u89c9\u4e0e\u5e38\u8bc6\u77e5\u8bc6\u51b2\u7a81\u65f6\u5b58\u5728\u663e\u8457\u7684\u6587\u672c\u4f9d\u8d56\u6027\u95ee\u9898\u3002 
\u57fa\u4e8e\u6b64\u53d1\u73b0\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u63d0\u793a\u7b56\u7565\u2014\u2014\u201c\u805a\u7126\u4e8e\u89c6\u89c9\u201d\uff08FoV\uff09\uff0c\u65e8\u5728\u589e\u5f3a\u6a21\u578b\u5728\u9047\u5230\u51b2\u7a81\u65f6\u4f18\u5148\u8003\u8651\u89c6\u89c9\u8f93\u5165\u7684\u80fd\u529b\uff0c\u4ece\u800c\u51cf\u5c11\u5bf9\u77db\u76fe\u6587\u672c\u4fe1\u606f\u7684\u4f9d\u8d56\u3002\u6211\u4eec\u7684\u5206\u6790\u7ed3\u679c\u4ee5\u53ca\u63d0\u51fa\u7684\u7b56\u7565\u5bf9\u7406\u89e3\u5e76\u7f13\u89e3MLLM\u4e2d\u7684\u89c6\u89c9-\u77e5\u8bc6\u51b2\u7a81\u5177\u6709\u91cd\u8981\u610f\u4e49\u3002 \u6b64\u5916\uff0c\u672c\u6587\u8fd8\u63d0\u4f9b\u4e86\u6570\u636e\u96c6\u548c\u4ee3\u7801\u7684\u516c\u5f00\u8bbf\u95ee\u6743\u9650\uff0c\u4ee5\u4fc3\u8fdb\u793e\u533a\u8fdb\u4e00\u6b65\u7684\u7814\u7a76\u548c\u5e94\u7528\u3002|\n", "2410.08143": "|**2024-10-10**|**DelTA: An Online Document-Level Translation Agent Based on Multi-Level Memory**|Yutong Wang et.al.|[2410.08143](http://arxiv.org/abs/2410.08143)|**[link](https://github.com/yutongwang1216/docmtagent)**|**\u5728\u673a\u5668\u7ffb\u8bd1\u9886\u57df\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5df2\u7ecf\u53d6\u5f97\u4e86\u76f8\u5f53\u53ef\u89c2\u7684\u8d28\u91cf\u63d0\u5347\u3002\u7136\u800c\uff0c\u5927\u591a\u6570\u5f53\u524d\u7684MT-LLM\u7814\u7a76\u4ecd\u7136\u9762\u4e34\u5728\u5904\u7406\u6574\u4e2a\u6587\u6863\u65f6\u4fdd\u6301\u7ffb\u8bd1\u4e00\u81f4\u6027\u4e0e\u51c6\u786e\u6027\u7684\u6311\u6218\u3002\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u79cd\u540d\u4e3aDelTA\u7684\u6587\u6863\u7ea7\u7ffb\u8bd1\u4ee3\u7406\uff0c\u65e8\u5728\u514b\u670d\u8fd9\u4e9b\u5c40\u9650\u6027\u3002DelTA\u5177\u6709\u4e00\u79cd\u591a\u5c42\u6b21\u8bb0\u5fc6\u7ed3\u6784\uff0c\u80fd\u591f\u5b58\u50a8\u4e0d\u540c\u7c92\u5ea6\u548c\u8de8\u5ea6\u7684\u4fe1\u606f\uff0c\u5305\u62ec\u4e13\u6709\u540d\u8bcd\u8bb0\u5f55\u3001\u53cc\u8bed\u6458\u8981\u3001\u957f\u671f\u8bb0\u5fc6\u548c\u77ed\u671f
\u8bb0\u5fc6\uff0c\u8fd9\u4e9b\u4fe1\u606f\u7531\u8f85\u52a9\u7684LLM\u7ec4\u4ef6\u8fde\u7eed\u68c0\u7d22\u548c\u66f4\u65b0\u3002\u5b9e\u9a8c\u7ed3\u679c\u663e\u793a\uff0c\u5728\u56db\u4e2a\u5f00\u6e90/\u95ed\u6e90LLM\u548c\u4e24\u4e2a\u4ee3\u8868\u6027\u6587\u6863\u7ffb\u8bd1\u6570\u636e\u96c6\u4e0a\uff0cDelTA\u5728\u7ffb\u8bd1\u4e00\u81f4\u6027\u4e0e\u8d28\u91cf\u65b9\u9762\u5747\u663e\u8457\u4f18\u4e8e\u5f3a\u5927\u7684\u57fa\u7ebf\uff0c\u5e73\u5747\u4e00\u81f4\u6027\u5f97\u5206\u63d0\u9ad8\u9ad8\u8fbe4.58\u4e2a\u767e\u5206\u70b9\uff0cCOMET\u5f97\u5206\u63d0\u9ad8\u9ad8\u8fbe3.16\u70b9\u3002DelTA\u91c7\u7528\u9010\u53e5\u7ffb\u8bd1\u7b56\u7565\uff0c\u786e\u4fdd\u65e0\u53e5\u5b50\u9057\u6f0f\uff0c\u5e76\u63d0\u4f9b\u4e0e\u4e3b\u6d41\u65b9\u6cd5\u76f8\u6bd4\u66f4\u4e3a\u5185\u5b58\u9ad8\u6548\u7684\u9009\u62e9\u3002\u6b64\u5916\uff0cDelTA\u63d0\u9ad8\u4e86\u4ee3\u8bcd\u7ffb\u8bd1\u51c6\u786e\u6027\uff0c\u5e76\u4e14\u4ee3\u7406\u7684\u6458\u8981\u7ec4\u4ef6\u4e5f\u663e\u793a\u51fa\u4f5c\u4e3a\u57fa\u4e8e\u67e5\u8be2\u7684\u6458\u8981\u4efb\u52a1\u5de5\u5177\u7684\u6f5c\u529b\u3002\u6211\u4eec\u5df2\u5c06\u4ee3\u7801\u548c\u6570\u636e\u53d1\u5e03\u5728https://github.com/YutongWang1216/DocMTAgent\u3002**|\n", "2410.09040": "|**2024-10-11**|**AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation**|Zijun Wang 
et.al.|[2410.09040](http://arxiv.org/abs/2410.09040)|**[link](https://github.com/ucsc-vlaa/attngcg-attack)**|**\u672c\u6587\u7814\u7a76\u4e86\u57fa\u4e8e\u8f6c\u6362\u5668\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u53d7\u5230\u56da\u7981\u653b\u51fb\u7684\u8106\u5f31\u6027\uff0c\u7279\u522b\u5173\u6ce8\u57fa\u4e8e\u4f18\u5316\u7684\u8d2a\u5a6a\u5750\u6807\u68af\u5ea6\uff08GCG\uff09\u7b56\u7565\u3002\u6211\u4eec\u9996\u5148\u89c2\u5bdf\u5230\u653b\u51fb\u7684\u6709\u6548\u6027\u4e0e\u6a21\u578b\u5185\u90e8\u884c\u4e3a\u4e4b\u95f4\u5b58\u5728\u6b63\u76f8\u5173\u5173\u7cfb\u3002\u4f8b\u5982\uff0c\u5f53\u6a21\u578b\u5bf9\u65e8\u5728\u786e\u4fddLLM\u5b89\u5168\u5bf9\u9f50\u7684\u7cfb\u7edf\u63d0\u793a\u7ed9\u4e88\u66f4\u591a\u5173\u6ce8\u65f6\uff0c\u653b\u51fb\u5f80\u5f80\u6548\u679c\u8f83\u5dee\u3002\u5728\u6b64\u57fa\u7840\u4e0a\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u79cd\u589e\u5f3a\u65b9\u6cd5\uff0c\u901a\u8fc7\u64cd\u7eb5\u6a21\u578b\u7684\u6ce8\u610f\u529b\u5206\u6570\u6765\u4fc3\u8fdbLLM\u7684\u56da\u7981\uff0c\u6211\u4eec\u5c06\u5176\u547d\u540d\u4e3aAttnGCG\u3002\u5b9e\u9a8c\u4e0a\uff0cAttnGCG\u5728\u5404\u79cdLLMs\u4e0a\u8868\u73b0\u51fa\u4e00\u81f4\u7684\u6539\u8fdb\uff0c\u5728Llama-2\u7cfb\u5217\u4e2d\u5e73\u5747\u63d0\u9ad8\u4e86\u7ea67%\uff0c\u5728Gemma\u7cfb\u5217\u4e2d\u63d0\u9ad8\u4e86\u7ea610%\u3002\u6211\u4eec\u7684\u7b56\u7565\u8fd8\u5c55\u793a\u4e86\u9488\u5bf9\u672a\u89c1\u8fc7\u7684\u6709\u5bb3\u76ee\u6807\u548c\u9ed1\u76d2LLMs\uff08\u5982GPT-3.5\u548cGPT-4\uff09\u7684\u7a33\u5065\u653b\u51fb\u8f6c\u79fb\u80fd\u529b\u3002\u6b64\u5916\uff0c\u6211\u4eec\u6ce8\u610f\u5230\u6211\u4eec\u7684\u6ce8\u610f\u529b\u5206\u6570\u53ef\u89c6\u5316\u66f4\u6613\u4e8e\u89e3\u91ca\uff0c\u4f7f\u6211\u4eec\u80fd\u591f\u66f4\u597d\u5730\u4e86\u89e3\u5982\u4f55\u901a\u8fc7\u6709\u9488\u5bf9\u6027\u7684\u6ce8\u610f\u529b\u64cd\u7eb5\u5b9e\u73b0\u66f4\u6709\u6548\u7684\u56da\u7981\u3002\u6211\u4eec\u53d1\u5e03\u4e86\u4ee3\u7801\uff0c\u53ef\u5728https:/
/github.com/UCSC-VLAA/AttnGCG-attack\u4e2d\u83b7\u53d6\u3002**|\n", "2410.09039": "|**2024-10-11**|**Semi-Supervised Learning of Noisy Mixture of Experts Models**|Oh-Ran Kwon et.al.|[2410.09039](http://arxiv.org/abs/2410.09039)|null|\u6df7\u5408\u4e13\u5bb6\uff08MoE\uff09\u6a21\u578b\u662f\u4e00\u4e2a\u7075\u6d3b\u7684\u9884\u6d4b\u5efa\u6a21\u6846\u67b6\uff0c\u5728\u5927\u578b\u8bed\u8a00\u6a21\u578b\u7684\u65f6\u4ee3\u91cd\u65b0\u5f15\u8d77\u4e86\u4eba\u4eec\u7684\u5173\u6ce8\u3002\u4e00\u4e2a\u7531\u9884\u6d4b\u201c\u4e13\u5bb6\u201d\u7ec4\u6210\u7684\u96c6\u5408\u4e0e\u63a7\u5236\u5728\u9884\u6d4b\u65f6\u6bcf\u4e2a\u4e13\u5bb6\u5f71\u54cd\u529b\u7684\u201c\u95e8\u63a7\u51fd\u6570\u201d\u5171\u540c\u5b66\u4e60\u3002\u8fd9\u79cd\u7ed3\u6784\u5141\u8bb8\u76f8\u5bf9\u7b80\u5355\u7684\u6a21\u578b\u5728\u590d\u6742\u3001\u5f02\u6784\u7684\u6570\u636e\u73af\u5883\u4e2d\u8868\u73b0\u51fa\u8272\u3002\u5728\u5f53\u4eca\u8bb8\u591a\u5e94\u7528\u573a\u666f\u4e2d\uff0c\u672a\u6807\u8bb0\u6570\u636e\u5e7f\u6cdb\u53ef\u7528\u800c\u6807\u6ce8\u6570\u636e\u5374\u96be\u4ee5\u83b7\u53d6\u3002\u534a\u76d1\u7763\u5b66\u4e60\u65b9\u6cd5\u65e8\u5728\u5229\u7528\u672a\u6807\u8bb0\u6570\u636e\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u7528\u4e8eMoE\u6a21\u578b\u534a\u76d1\u7763\u5b66\u4e60\u7684\u65b0\u65b9\u6cd5\u3002\u6211\u4eec\u4ece\u6d77\u6d0b\u5b66\u5bb6\u5f00\u53d1\u7684\u4e00\u79cd\u5047\u8bbe\u5f3a\u70c8\u7684\u534a\u76d1\u7763MoE\u6a21\u578b\u5f00\u59cb\uff0c\u8be5\u6a21\u578b\u5047\u8bbe\u672a\u6807\u6ce8\u6570\u636e\u4e2d\u7684\u6f5c\u5728\u805a\u7c7b\u7ed3\u6784\u76f4\u63a5\u6620\u5c04\u5230\u76d1\u7763\u4efb\u52a1\u4e2d\u6bcf\u4e2a\u4e13\u5bb6\u5e94\u7ed9\u4e88\u7684\u5f71\u54cd\u3002\u6211\u4eec\u653e\u677e\u4e86\u8fd9\u4e00\u5047\u8bbe\uff0c\u8bbe\u60f3\u4e24\u8005\u4e4b\u95f4\u5b58\u5728\u566a\u58f0\u8fde\u63a5\uff0c\u5e76\u57fa\u4e8e\u6700\u5c0f\u5316\u5254\u9664\u5e73\u65b9\u7b97\u6cd5\u63d0\u51fa\u4e86\u4e00\u79cd\u7b97\u6cd5\uff0c\u5373\u4f7f\u5b58\u5728\u6570
\u636e\u9519\u4f4d\u4e5f\u80fd\u6210\u529f\u3002\u6211\u4eec\u7684\u7406\u8bba\u5206\u6790\u786e\u5b9a\u4e86\u8be5\u65b9\u6cd5\u80fd\u591f\u4ea7\u751f\u63a5\u8fd1\u53c2\u6570\u7387\u6536\u655b\u4f30\u8ba1\u5668\u7684\u6761\u4ef6\u3002\u6a21\u62df\u548c\u771f\u5b9e\u6570\u636e\u793a\u4f8b\u8bc1\u660e\u4e86\u8be5\u65b9\u6cd5\u7684\u6709\u6548\u6027\u3002|\n", "2410.09038": "|**2024-10-11**|**SimpleStrat: Diversifying Language Model Generation with Stratification**|Justin Wong et.al.|[2410.09038](http://arxiv.org/abs/2410.09038)|null|\u751f\u6210\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u591a\u6837\u5316\u54cd\u5e94\u5bf9\u4e8e\u89c4\u5212/\u641c\u7d22\u548c\u5408\u6210\u6570\u636e\u751f\u6210\u7b49\u5e94\u7528\u81f3\u5173\u91cd\u8981\u3002\u8fd9\u4e9b\u5e94\u7528\u9700\u8981\u5728\u751f\u6210\u8fc7\u7a0b\u4e2d\u63d0\u4f9b\u591a\u6837\u5316\u7684\u7b54\u6848\uff0c\u4ee5\u4fbf\u5728\u6bcf\u6b21\u751f\u6210\u65f6\u90fd\u80fd\u5f97\u5230\u4e0d\u540c\u7684\u7ed3\u679c\u3002\u4e4b\u524d\u7684\u65b9\u6cd5\u901a\u5e38\u4f9d\u8d56\u4e8e\u589e\u52a0\u6e29\u5ea6\u6765\u63d0\u9ad8\u591a\u6837\u6027\u3002\u7136\u800c\uff0c\u4e0e\u666e\u904d\u8ba4\u8bc6\u76f8\u53cd\uff0c\u6211\u4eec\u53d1\u73b0\u8fd9\u79cd\u65b9\u6cd5\u4e0d\u4ec5\u4f1a\u5bfc\u81f4\u968f\u7740\u6e29\u5ea6\u589e\u52a0\uff0c\u4e2a\u4f53\u751f\u6210\u7684\u8d28\u91cf\u964d\u4f4e\uff0c\u800c\u4e14\u5176\u6709\u6548\u6027\u8fd8\u53d6\u51b3\u4e8e\u6a21\u578b\u7684\u4e0b\u4e00\u4e2a\u8bcd\u6982\u7387\u4e0e\u771f\u5b9e\u7b54\u6848\u5206\u5e03\u7684\u76f8\u4f3c\u6027\u3002 
\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u201cSimpleStrat\u201d\u7684\u66ff\u4ee3\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u5229\u7528\u8bed\u8a00\u6a21\u578b\u672c\u8eab\u5bf9\u7a7a\u95f4\u8fdb\u884c\u5206\u533a\u3002\u5728\u63a8\u7406\u9636\u6bb5\uff0c\u968f\u673a\u9009\u62e9\u4e00\u4e2a\u5206\u533a\u5e76\u5728\u5176\u4e2d\u62bd\u53d6\u6837\u672c\u3002\u4e3a\u4e86\u8861\u91cf\u591a\u6837\u6027\uff0c\u6211\u4eec\u5f15\u5165\u4e86CoverageQA\u6570\u636e\u96c6\uff0c\u5b83\u5305\u542b\u4e86\u5177\u6709\u591a\u4e2a\u540c\u7b49\u53ef\u80fd\u7b54\u6848\u7684\u672a\u6307\u5b9a\u95ee\u9898\u3002\u901a\u8fc7\u6d4b\u91cf\u8f93\u51fa\u5206\u5e03\u4e0e\u6709\u6548\u5730\u9762\u771f\u76f8\u7b54\u6848\u7684\u5747\u5300\u5206\u5e03\u4e4b\u95f4\u7684KL\u6563\u5ea6\u6765\u8bc4\u4f30\u591a\u6837\u6027\u3002\u7531\u4e8e\u8ba1\u7b97\u4e13\u7528\u6a21\u578b\u6bcf\u6761\u54cd\u5e94/\u89e3\u51b3\u65b9\u6848\u7684\u6982\u7387\u901a\u5e38\u662f\u4e0d\u53ef\u884c\u7684\uff0c\u56e0\u6b64\u6211\u4eec\u4f7f\u7528\u53ec\u56de\u7387\u6765\u8bc4\u4f30\u5730\u771f\u7406\u89e3\u3002 \u6211\u4eec\u7684\u8bc4\u4f30\u7ed3\u679c\u663e\u793a\uff0c\u4f7f\u7528SimpleStrat\u65b9\u6cd5\u53ef\u4ee5\u5b9e\u73b0\u6bd4GPT-4o\u9ad80.05\u7684\u53ec\u56de\u7387\uff0c\u5e76\u4e14\u5e73\u5747\u51cf\u5c11\u4e860.36\u7684KL\u6563\u5ea6\u4e0eLlama 3\u76f8\u6bd4\u3002|\n", "2410.09037": "|**2024-10-11**|**Mentor-KD: Making Small Language Models Better Multi-step Reasoners**|Hojae Lee 
et.al.|[2410.09037](http://arxiv.org/abs/2410.09037)|**[link](https://github.com/2hojae/mentor-kd)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u901a\u8fc7\u5229\u7528\u94fe\u5f0f\u601d\u7ef4\uff08CoT\uff09\u63d0\u793a\u5728\u5404\u79cd\u590d\u6742\u4efb\u52a1\u4e2d\u8868\u73b0\u51fa\u975e\u51e1\u7684\u6027\u80fd\u3002\u8fd1\u671f\u7684\u7814\u7a76\u63d0\u51fa\u4e86\u4e00\u79cd\u77e5\u8bc6\u84b8\u998f\uff08KD\uff09\u65b9\u6cd5\u2014\u2014\u63a8\u7406\u84b8\u998f\uff0c\u901a\u8fc7\u5fae\u8c03\u7531LLM\u6559\u5e08\u751f\u6210\u7684\u591a\u6b65\u63a8\u7406\u8bed\u8a00\u6a21\u578b\uff0c\u5c06LLM\u7684\u63a8\u7406\u80fd\u529b\u8f6c\u79fb\u5230\u8f83\u5c0f\u7684\u6a21\u578b\u4e0a\u3002\u7136\u800c\uff0c\u8fd9\u4e9b\u7814\u7a76\u5728\u4ee5\u4e0b\u4e24\u4e2a\u65b9\u9762\u8003\u8651\u4e0d\u8db3\uff1a\u4eceLLM\u6559\u5e08\u6a21\u578b\u83b7\u53d6\u7684\u793a\u4f8b\u96c6\u8d28\u91cf\u4f4e\u548c\u8f6f\u6807\u7b7e\u63d0\u4f9b\u4e0d\u8db3\u3002\u672c\u6587\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3a\u5bfc\u5e08-KD\u7684\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u6709\u6548\u5730\u5c06LLM\u7684\u591a\u6b65\u63a8\u7406\u80fd\u529b\u8f6c\u79fb\u5230\u8f83\u5c0f\u7684\u8bed\u8a00\u6a21\u578b\u4e0a\uff0c\u5e76\u89e3\u51b3\u4e86\u4e0a\u8ff0\u6311\u6218\u3002\u5177\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u5229\u7528\u4e00\u4e2a\u5bfc\u5e08\u2014\u2014\u7279\u5b9a\u4efb\u52a1\u7684\u4e2d\u95f4\u5927\u5c0f\u7684\u5fae\u8c03\u6a21\u578b\u2014\u2014\u6765\u589e\u52a0\u989d\u5916\u7684CoT\u6ce8\u91ca\u5e76\u4e3a\u5b66\u751f\u6a21\u578b\u63d0\u4f9b\u8f6f\u6807\u7b7e\uff0c\u4ee5\u5728\u63a8\u7406\u84b8\u998f\u8fc7\u7a0b\u4e2d\u63d0\u4f9b\u652f\u6301\u3002\u6211\u4eec\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u5b9e\u9a8c\uff0c\u5e76\u786e\u8ba4\u4e86\u5bfc\u5e08-KD\u5728\u4e0d\u540c\u6a21\u578b\u548c\u590d\u6742\u63a8\u7406\u4efb\u52a1\u4e0a\u7684\u6709\u6548\u6027\u3002**|\n", "2410.09034": "|**2024-10-11**|**PEAR: A Robust and Flexible Automation Framework for Ptychography Enabled by Multiple Large 
Language Model Agents**|Xiangyu Yin et.al.|[2410.09034](http://arxiv.org/abs/2410.09034)|null|Ptychography\u662f\u4e00\u79cd\u5728X\u5c04\u7ebf\u548c\u7535\u5b50\u663e\u5fae\u955c\u9886\u57df\u5e7f\u6cdb\u5e94\u7528\u7684\u9ad8\u7ea7\u8ba1\u7b97\u6210\u50cf\u6280\u672f\u3002\u5b83\u5728\u7269\u7406\u5b66\u3001\u5316\u5b66\u3001\u751f\u7269\u5b66\u548c\u6750\u6599\u79d1\u5b66\u7b49\u7814\u7a76\u9886\u57df\u4ee5\u53ca\u534a\u5bfc\u4f53\u8868\u5f81\u7b49\u5de5\u4e1a\u5e94\u7528\u4e2d\u88ab\u5e7f\u6cdb\u91c7\u7528\u3002\u5b9e\u8df5\u8fc7\u7a0b\u4e2d\uff0c\u83b7\u5f97\u9ad8\u8d28\u91cf\u7684ptychographic\u56fe\u50cf\u9700\u8981\u540c\u65f6\u4f18\u5316\u4f17\u591a\u5b9e\u9a8c\u548c\u7b97\u6cd5\u53c2\u6570\u3002\u4f20\u7edf\u4e0a\uff0c\u53c2\u6570\u9009\u62e9\u5f80\u5f80\u4f9d\u8d56\u4e8e\u8bd5\u9519\u6cd5\uff0c\u5bfc\u81f4\u5de5\u4f5c\u6548\u7387\u4f4e\u4e0b\uff0c\u5e76\u53ef\u80fd\u5f15\u5165\u4eba\u4e3a\u504f\u89c1\u3002\u672c\u5de5\u4f5c\u5f00\u53d1\u4e86\u201cptychographic\u5b9e\u9a8c\u4e0e\u5206\u6790\u673a\u5668\u4eba\u201d\uff08PEAR\uff09\uff0c\u8fd9\u662f\u4e00\u79cd\u5229\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u81ea\u52a8\u5904\u7406ptychography\u6570\u636e\u5206\u6790\u7684\u6846\u67b6\u3002\u4e3a\u4e86\u786e\u4fdd\u9ad8\u9c81\u68d2\u6027\u548c\u51c6\u786e\u6027\uff0cPEAR\u91c7\u7528\u4e86\u591a\u4e2aLLM\u4ee3\u7406\u8fdb\u884c\u77e5\u8bc6\u68c0\u7d22\u3001\u4ee3\u7801\u751f\u6210\u3001\u53c2\u6570\u63a8\u8350\u548c\u56fe\u50cf\u63a8\u7406\u4efb\u52a1\u3002\u6211\u4eec\u7684\u7814\u7a76\u8868\u660e\uff0cPEAR\u7684\u591a\u4ee3\u7406\u8bbe\u8ba1\u663e\u8457\u63d0\u9ad8\u4e86\u5de5\u4f5c\u6d41\u7a0b\u7684\u6210\u529f\u7387\uff0c\u5373\u4f7f\u4f7f\u7528\u8f83\u5c0f\u7684\u5f00\u6e90\u6743\u91cd\u6a21\u578b\u5982LLaMA 3.1 
8B\u4e5f\u662f\u5982\u6b64\u3002PEAR\u8fd8\u652f\u6301\u5404\u79cd\u81ea\u52a8\u5316\u7ea7\u522b\uff0c\u5e76\u8bbe\u8ba1\u6709\u53ef\u81ea\u5b9a\u4e49\u7684\u672c\u5730\u77e5\u8bc6\u5e93\uff0c\u4ee5\u786e\u4fdd\u5176\u5728\u4e0d\u540c\u7814\u7a76\u73af\u5883\u4e0b\u7684\u7075\u6d3b\u6027\u548c\u9002\u5e94\u6027\u3002|\n", "2410.09013": "|**2024-10-11**|**The Impact of Visual Information in Chinese Characters: Evaluating Large Models' Ability to Recognize and Utilize Radicals**|Xiaofeng Wu et.al.|[2410.09013](http://arxiv.org/abs/2410.09013)|null|\u672c\u6587\u7814\u7a76\u4e86\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u548c\u89c6\u89c9\u8bed\u8a00\u6a21\u578b\uff08VLMs\uff09\u5728\u5229\u7528\u6c49\u5b57\u4e2d\u7684\u89c6\u89c9\u4fe1\u606f\u65b9\u9762\u7684\u6f5c\u529b\uff0c\u5c24\u5176\u662f\u5173\u4e8e\u90e8\u9996\u3001\u7ed3\u6784\u3001\u7b14\u753b\u4ee5\u53ca\u7b14\u753b\u6570\u91cf\u7684\u4fe1\u606f\u3002\u6211\u4eec\u6784\u5efa\u4e86\u4e00\u4e2a\u57fa\u51c6\u6d4b\u8bd5\u7cfb\u7edf\u6765\u8bc4\u4f30\u8fd9\u4e9b\u6a21\u578b\u5bf9\u6c49\u5b57\u4e2d\u89c6\u89c9\u5143\u7d20\u7684\u7406\u89e3\u7a0b\u5ea6\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u5c3d\u7ba1\u63d0\u4f9b\u5b57\u7b26\u56fe\u50cf\uff0c\u6a21\u578b\u4ecd\u7136\u5c55\u793a\u4e86\u6709\u9650\u4f46\u90e8\u5206\u7406\u89e3\u89c6\u89c9\u4fe1\u606f\u7684\u80fd\u529b\u3002 
\u4e3a\u4e86\u6fc0\u53d1\u6a21\u578b\u5229\u7528\u90e8\u9996\u8fdb\u884c\u4e2d\u6587\u7406\u89e3\u4efb\u52a1\u7684\u6f5c\u529b\uff0c\u6211\u4eec\u8fdb\u4e00\u6b65\u5c1d\u8bd5\u5c06\u90e8\u9996\u4fe1\u606f\u878d\u5165\u5230\u63d0\u793a\u4e2d\u3002\u6211\u4eec\u89c2\u5bdf\u5230\uff0c\u5728\u63d0\u4f9b\u5173\u4e8e\u90e8\u9996\u7684\u989d\u5916\u4fe1\u606f\u65f6\uff0c\u8bcd\u6027\u6807\u6ce8\u4efb\u52a1\u7684\u8868\u73b0\u5f97\u5230\u4e86\u4e00\u81f4\u6027\u7684\u63d0\u5347\u3002\u8fd9\u8868\u660e\u901a\u8fc7\u6574\u5408\u5b50\u5b57\u7b26\u4fe1\u606f\uff0c\u6709\u53ef\u80fd\u589e\u5f3a\u8bed\u8a00\u5904\u7406\u80fd\u529b\u3002|\n", "2410.09012": "|**2024-10-11**|**Software Engineering and Foundation Models: Insights from Industry Blogs Using a Jury of Foundation Models**|Hao Li et.al.|[2410.09012](http://arxiv.org/abs/2410.09012)|null|\u672c\u6587\u9996\u6b21\u4ece\u5b9e\u8df5\u8005\u7684\u89c6\u89d2\u5206\u6790\u4e86\u57fa\u7840\u6a21\u578b\uff08FMs\uff09\u5728\u8f6f\u4ef6\u5de5\u7a0b\uff08SE\uff09\u9886\u57df\u7684\u5e94\u7528\u3002\u901a\u8fc7\u5206\u6790\u6765\u81ea\u9876\u7ea7\u79d1\u6280\u516c\u53f8\u7684155\u7bc7FM4SE\u548c997\u7bc7SE4FM\u535a\u5ba2\u6587\u7ae0\uff0c\u5229\u7528\u57fa\u4e8eFM\u7684\u8c03\u7814\u65b9\u6cd5\u7cfb\u7edf\u5730\u6807\u8bb0\u548c\u603b\u7ed3\u4e86\u8ba8\u8bba\u7684\u6d3b\u52a8\u548c\u4efb\u52a1\u3002\u7814\u7a76\u53d1\u73b0\uff0c\u867d\u7136\u4ee3\u7801\u751f\u6210\u662fFM4SE\u4e2d\u6700\u7a81\u51fa\u7684\u4efb\u52a1\uff0c\u4f46FMs\u8fd8\u88ab\u7528\u4e8e\u4ee3\u7801\u7406\u89e3\u3001\u603b\u7ed3\u548cAPI\u63a8\u8350\u7b49\u4f17\u591a\u5176\u4ed6SE\u6d3b\u52a8\u3002\u5173\u4e8eSE4FM\u7684\u5927\u591a\u6570\u535a\u5ba2\u6587\u7ae0\u5173\u6ce8\u4e8e\u6a21\u578b\u90e8\u7f72\u4e0e\u64cd\u4f5c\u4ee5\u53ca\u7cfb\u7edf\u67b6\u6784\u4e0e\u7f16\u6392\u3002\u5c3d\u7ba1\u4e91\u90e8\u7f72\u5360\u4e3b\u5bfc\u5730\u4f4d\uff0c\u4f46\u5bf9FMs\u8fdb\u884c\u538b\u7f29\u5e76\u5728\u8fb9\u7f18\u6216\u79fb\u52a8\u8bbe\u5907\u4e0a\u90e8\u7f72\u7684\u5174\u8
da3\u6b63\u5728\u589e\u957f\u3002\u672c\u6587\u63d0\u51fa\u4e86\u516b\u4e2a\u672a\u6765\u7814\u7a76\u65b9\u5411\uff0c\u65e8\u5728\u5f25\u5408\u7406\u8bba\u53d1\u73b0\u4e0e\u5b9e\u9645\u5e94\u7528\u4e4b\u95f4\u7684\u5dee\u8ddd\u3002\u6211\u4eec\u7684\u7814\u7a76\u4e0d\u4ec5\u4e30\u5bcc\u4e86FMs\u5728SE\u9886\u57df\u5b9e\u8df5\u5e94\u7528\u7684\u77e5\u8bc6\u4f53\u7cfb\uff0c\u8fd8\u5c55\u793a\u4e86FMs\u5728\u6280\u672f\u4e0e\u7070\u8272\u6587\u732e\u9886\u57df\u8fdb\u884c\u6587\u732e\u8c03\u7814\u7684\u6709\u6548\u6027\u3002\u6211\u4eec\u63d0\u4f9b\u7684\u6570\u636e\u96c6\u3001\u7ed3\u679c\u3001\u4ee3\u7801\u4ee5\u53ca\u4f7f\u7528\u7684\u63d0\u793a\u53ef\u4ee5\u5728\u5728\u7ebf\u590d\u5236\u5305https://github.com/SAILResearch/fmse-blogs\u4e2d\u627e\u5230\u3002|\n", "2410.09008": "|**2024-10-11**|**SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights**|Ling Yang et.al.|[2410.09008](http://arxiv.org/abs/2410.09008)|**[link](https://github.com/yangling0818/supercorrect-llm)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5982GPT-4\u3001PaLM\u548cLLaMA\u5728\u5404\u79cd\u63a8\u7406\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u663e\u8457\u7684\u6539\u8fdb\u3002\u7136\u800c\uff0c\u8f83\u5c0f\u7684\u6a21\u578b\u5982Llama-3-8B\u548cDeepSeekMath-Base\u4ecd\u7136\u5728\u590d\u6742\u7684\u6570\u5b66\u63a8\u7406\u65b9\u9762\u5b58\u5728\u6311\u6218\uff0c\u56e0\u4e3a\u5b83\u4eec\u65e0\u6cd5\u6709\u6548\u5730\u8bc6\u522b\u5e76\u7ea0\u6b63\u63a8\u7406\u9519\u8bef\u3002\u8fd1\u671f\u7684\u53cd\u601d\u65b9\u6cd5\u65e8\u5728\u901a\u8fc7\u4f7f\u6a21\u578b\u80fd\u591f\u81ea\u6211\u53cd\u601d\u548c\u81ea\u6211\u6821\u6b63\u6765\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u4f46\u4ecd\u9762\u4e34\u72ec\u7acb\u68c0\u6d4b\u63a8\u7406\u6b65\u9aa4\u4e2d\u7684\u9519\u8bef\u7684\u6311\u6218\u3002\u4e3a\u4e86\u514b\u670d\u8fd9\u4e9b\u9650\u5236\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aSuperCorrect\u7684\u65b0\u578b\u4e24\u9636\u6bb5\u6846\u67b6\uff0c\
u5b83\u4f7f\u7528\u5927\u578b\u6559\u5e08\u6a21\u578b\u6765\u76d1\u7763\u548c\u7ea0\u6b63\u8f83\u5c0f\u5b66\u751f\u6a21\u578b\u7684\u63a8\u7406\u548c\u53cd\u601d\u8fc7\u7a0b\u3002 \u5728\u7b2c\u4e00\u9636\u6bb5\uff0c\u6211\u4eec\u4ece\u6559\u5e08\u6a21\u578b\u4e2d\u63d0\u53d6\u4e86\u5c42\u6b21\u5316\u7684\u9ad8\u9636\u548c\u8be6\u7ec6\u7684\u601d\u60f3\u6a21\u677f\uff0c\u4ee5\u6307\u5bfc\u5b66\u751f\u6a21\u578b\u751f\u6210\u66f4\u7ec6\u81f4\u7684\u63a8\u7406\u601d\u60f3\u3002\u5728\u7b2c\u4e8c\u9636\u6bb5\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u8de8\u6a21\u578b\u534f\u4f5c\u76f4\u63a5\u504f\u597d\u4f18\u5316\uff08DPO\uff09\u6765\u589e\u5f3a\u5b66\u751f\u6a21\u578b\u7684\u81ea\u6211\u6821\u6b63\u80fd\u529b\uff0c\u5728\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u8ddf\u968f\u6559\u5e08\u7684\u4fee\u6b63\u8f68\u8ff9\u8fdb\u884c\u6539\u8fdb\u3002\u8fd9\u79cd\u8de8\u6a21\u578bDPO\u65b9\u6cd5\u6559\u4f1a\u5b66\u751f\u6a21\u578b\u901a\u8fc7\u4ece\u6559\u5e08\u6a21\u578b\u83b7\u5f97\u7684\u9519\u8bef\u9a71\u52a8\u7684\u89c1\u89e3\u6709\u6548\u5730\u5b9a\u4f4d\u5e76\u89e3\u51b3\u9519\u8bef\u7684\u601d\u60f3\uff0c\u6253\u7834\u5176\u601d\u60f3\u7684\u74f6\u9888\uff0c\u5e76\u901a\u8fc7\u5b66\u4e60\u65b0\u6280\u80fd\u548c\u77e5\u8bc6\u6765\u5e94\u5bf9\u5177\u6709\u6311\u6218\u6027\u7684\u95ee\u9898\u3002 \u5e7f\u6cdb\u7684\u5b9e\u9a8c\u4e00\u81f4\u8bc1\u660e\u4e86\u6211\u4eec\u7684\u4f18\u8d8a\u6027\u3002\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u6211\u4eec\u7684SuperCorrect-7B\u6a21\u578b\u5728MATH/GSM8K\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u663e\u8457\u8d85\u8d8a\u4e86\u5f3a\u5927\u7684DeepSeekMath-7B\u548cQwen2.5-Math-7B\uff0c\u5206\u522b\u5728MATH\u548cGSM8K\u57fa\u51c6\u4e0a\u63d0\u9ad8\u4e867.8%/5.3%\u548c15.1%/6.3%\uff0c\u5728\u6240\u67097B\u6a21\u578b\u4e2d\u5b9e\u73b0\u4e86\u65b0\u7684\u6700\u5148\u8fdb\u6027\u80fd\u3002\u4ee3\u7801\uff1ahttps://github.com/YangLing0818/SuperCorrect-llm**|\n", "2410.09006": "|**2024-10-11**|**From Interaction to Impact: Towards Safer AI Agents Through 
Understanding and Evaluating UI Operation Impacts**|Zhuohao Jerry Zhang et.al.|[2410.09006](http://arxiv.org/abs/2410.09006)|null|\u968f\u7740\u751f\u6210\u5f0f\u4eba\u5de5\u667a\u80fd\u7684\u8fdb\u6b65\uff0c\u4eba\u4eec\u5728\u521b\u5efa\u80fd\u591f\u901a\u8fc7\u7528\u6237\u754c\u9762\uff08UI\uff09\u7ba1\u7406\u65e5\u5e38\u4efb\u52a1\u7684\u81ea\u4e3b\u4ee3\u7406\u65b9\u9762\u53d6\u5f97\u4e86\u8fdb\u5c55\u3002\u5c3d\u7ba1\u5148\u524d\u7684\u7814\u7a76\u5df2\u7ecf\u63a2\u8ba8\u4e86AI\u4ee3\u7406\u5982\u4f55\u5bfc\u822aUI\u4ee5\u53ca\u7406\u89e3UI\u7ed3\u6784\u7684\u673a\u5236\uff0c\u4f46\u4ee3\u7406\u53ca\u5176\u81ea\u4e3b\u884c\u4e3a\uff08\u7279\u522b\u662f\u53ef\u80fd\u5177\u6709\u98ce\u9669\u6216\u4e0d\u53ef\u9006\u6027\u7684\u884c\u4e3a\uff09\u7684\u5f71\u54cd\u548c\u540e\u679c\u4ecd\u7136\u7f3a\u4e4f\u6df1\u5165\u7814\u7a76\u3002\u672c\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u63a2\u7d22\u4e86AI\u4ee3\u7406UI\u64cd\u4f5c\u7684\u5b9e\u9645\u4e16\u754c\u5f71\u54cd\u548c\u540e\u679c\u3002 \u6211\u4eec\u9996\u5148\u901a\u8fc7\u4e00\u7cfb\u5217\u4e0e\u9886\u57df\u4e13\u5bb6\u7684\u5de5\u4f5c\u574a\u5f00\u53d1\u4e86\u4e00\u79cdUI\u64cd\u4f5c\u5f71\u54cd\u7684\u5206\u7c7b\u7cfb\u7edf\u3002\u968f\u540e\uff0c\u6211\u4eec\u8fdb\u884c\u4e86\u4e00\u9879\u6570\u636e\u7efc\u5408\u7814\u7a76\uff0c\u6536\u96c6\u4e86\u7528\u6237\u611f\u77e5\u4e3a\u5177\u6709\u5f71\u54cd\u529b\u7684UI\u5c4f\u5e55\u8f68\u8ff9\u548c\u64cd\u4f5c\u6570\u636e\u3002\u7136\u540e\uff0c\u6211\u4eec\u4f7f\u7528\u6211\u4eec\u7684\u5f71\u54cd\u7c7b\u522b\u5bf9\u6536\u96c6\u7684\u6570\u636e\u548c\u4ece\u73b0\u6709UI\u5bfc\u822a\u6570\u636e\u96c6\u4e2d\u91cd\u65b0\u5229\u7528\u7684\u6570\u636e\u8fdb\u884c\u4e86\u6ce8\u91ca\u3002\u6211\u4eec\u5bf9\u4e0d\u540c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u53ca\u5176\u53d8\u4f53\u7684\u5b9a\u91cf\u8bc4\u4f30\u663e\u793a\u4e86\u8fd9\u4e9bLLM\u7406\u89e3\u548c\u9884\u6d4bAI\u4ee3\u7406\u53ef\u80fd\u91c7\u53d6\u7684UI\u64cd\u4f5c\u5f71\u54cd\u7684\u80fd\u529b\u30
02 \u6211\u4eec\u7684\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0c\u6211\u4eec\u7684\u5206\u7c7b\u7cfb\u7edf\u589e\u5f3a\u4e86\u8fd9\u4e9bLLM\u7684\u63a8\u7406\u80fd\u529b\uff0c\u4f7f\u5b83\u4eec\u80fd\u591f\u66f4\u597d\u5730\u7406\u89e3UI\u64cd\u4f5c\u7684\u5f71\u54cd\u3002\u7136\u800c\uff0c\u6211\u4eec\u4e5f\u53d1\u73b0\u4e86\u4ed6\u4eec\u5728\u53ef\u9760\u5730\u5206\u7c7b\u66f4\u5fae\u5999\u6216\u590d\u6742\u7684\u5f71\u54cd\u529b\u7c7b\u522b\u65f6\u5b58\u5728\u663e\u8457\u5dee\u8ddd\u7684\u95ee\u9898\u3002|\n", "2410.08996": "|**2024-10-11**|**Hypothesis-only Biases in Large Language Model-Elicited Natural Language Inference**|Grace Proebsting et.al.|[2410.08996](http://arxiv.org/abs/2410.08996)|null|\u6211\u4eec\u901a\u8fc7\u4f7f\u7528GPT-4\u3001Llama-2\u548cMistral 7b\u7b49\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u6765\u751f\u6210\u81ea\u7136\u8bed\u8a00\u63a8\u7406\uff08NLI\uff09\u5047\u8bbe\uff0c\u6d4b\u8bd5\u4e86\u7528LLM\u66ff\u6362\u4f17\u5305\u5de5\u4f5c\u8005\u5bf9\u4ea7\u751f\u6ce8\u91ca\u504f\u89c1\u7684\u5f71\u54cd\u3002\u6211\u4eec\u590d\u73b0\u4e86\u65af\u5766\u798fNLI\u8bed\u6599\u5e93\u7684\u90e8\u5206\u6570\u636e\uff0c\u5e76\u8bad\u7ec3\u4e86\u4ec5\u4f7f\u7528\u5047\u8bbe\u7684\u5206\u7c7b\u5668\u6765\u786e\u5b9aLLM\u751f\u6210\u7684\u5047\u8bbe\u662f\u5426\u5305\u542b\u6ce8\u91ca\u504f\u89c1\u3002\u5728\u6211\u4eec\u7684\u7531LLM\u751f\u6210\u7684NLI\u6570\u636e\u96c6\u4e0a\uff0c\u57fa\u4e8eBERT\u7684\u4ec5\u5047\u8bbe\u5206\u7c7b\u5668\u8fbe\u5230\u4e8686%-96%\u7684\u51c6\u786e\u7387\uff0c\u8fd9\u8868\u660e\u8fd9\u4e9b\u6570\u636e\u96c6\u5305\u542b\u4ec5\u5047\u8bbe\u7684\u504f\u89c1\u3002\u6211\u4eec\u8fd8\u53d1\u73b0LLM\u751f\u6210\u7684\u5047\u8bbe\u4e2d\u5b58\u5728\u9891\u7e41\u7684\u201c\u7ebf\u7d22\u201d\uff0c\u4f8b\u5982\uff0c\u201c\u5728\u6cf3\u6c60\u91cc\u6e38\u6cf3\u201d\u8fd9\u4e00\u77ed\u8bed\u5728GPT-4\u751f\u6210\u768410000\u591a\u4e2a\u77db\u76fe\u5047\u8bbe\u4e2d\u51fa\u73b0\u3002\u6211\u4eec\u7684\u5206\u6790\u63d0\u4f
9b\u4e86\u5b9e\u8bc1\u8bc1\u636e\uff0c\u8bc1\u660eNLI\u4e2d\u5df2\u77e5\u7684\u504f\u89c1\u53ef\u80fd\u5728LLM\u751f\u6210\u7684\u6570\u636e\u4e2d\u6301\u7eed\u5b58\u5728\u3002|\n", "2410.10819": "|**2024-10-14**|**DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads**|Guangxuan Xiao et.al.|[2410.10819](http://arxiv.org/abs/2410.10819)|**[link](https://github.com/mit-han-lab/duo-attention)**|**\u90e8\u7f72\u957f\u4e0a\u4e0b\u6587\u7684\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u81f3\u5173\u91cd\u8981\uff0c\u4f46\u4e5f\u5e26\u6765\u4e86\u663e\u8457\u7684\u8ba1\u7b97\u548c\u5185\u5b58\u6311\u6218\u3002\u7f13\u5b58\u6240\u6709\u6ce8\u610f\u529b\u5934\u4e2d\u7684Key\u548cValue\uff08KV\uff09\u72b6\u6001\u4f1a\u6d88\u8017\u5927\u91cf\u5185\u5b58\u3002\u73b0\u6709\u7684KV\u7f13\u5b58\u526a\u679d\u65b9\u6cd5\u8981\u4e48\u635f\u5bb3\u4e86LLM\u7684\u957f\u4e0a\u4e0b\u6587\u80fd\u529b\uff0c\u8981\u4e48\u53ea\u63d0\u4f9b\u4e86\u6709\u9650\u7684\u6548\u7387\u63d0\u5347\u3002\u672c\u6587\u53d1\u73b0\uff0c\u53ea\u6709\u90e8\u5206\u6ce8\u610f\u529b\u5934\uff0c\u5373\u68c0\u7d22\u5934\uff0c\u5bf9\u4e8e\u5904\u7406\u957f\u4e0a\u4e0b\u6587\u662f\u81f3\u5173\u91cd\u8981\u7684\uff0c\u5e76\u4e14\u9700\u8981\u5bf9\u6240\u6709\u6807\u8bb0\u8fdb\u884c\u5b8c\u6574\u7684\u6ce8\u610f\u529b\u673a\u5236\u3002\u76f8\u53cd\uff0c\u6240\u6709\u5176\u4ed6\u5934\u90e8\uff0c\u4e3b\u8981\u5173\u6ce8\u6700\u8fd1\u7684\u6807\u8bb0\u4ee5\u53ca\u6ce8\u610f\u529b\u6c47\u70b9\uff0c\u79f0\u4e3a\u6d41\u5934\u90e8\uff0c\u4e0d\u9700\u8981\u5b8c\u6574\u7684\u6ce8\u610f\u529b\u3002\u57fa\u4e8e\u8fd9\u4e00\u89c1\u89e3\uff0c\u6211\u4eec\u5f15\u5165\u4e86DuoAttention\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u4ec5\u5bf9\u68c0\u7d22\u5934\u5e94\u7528\u5b8c\u6574\u7684KV\u7f13\u5b58\uff0c\u800c\u5bf9\u6d41\u5934\u90e8\u4f7f\u7528\u8f7b\u91cf\u7ea7\u3001\u56fa\u5b9a\u957f\u5ea6\u7684KV\u7f13\u5b58\uff0c\u4ece\u800c\u5728\u4e0d\u635f\u5bb3\u957f\u4e0a\u4e0b\u6587\u80fd\u529b\u7684
\u60c5\u51b5\u4e0b\u51cf\u5c11LLM\u89e3\u7801\u548c\u9884\u586b\u5145\u7684\u5185\u5b58\u548c\u5ef6\u8fdf\u3002DuoAttention\u91c7\u7528\u4e86\u4e00\u79cd\u57fa\u4e8e\u4f18\u5316\u7684\u7b97\u6cd5\uff0c\u4f7f\u7528\u5408\u6210\u6570\u636e\u51c6\u786e\u8bc6\u522b\u68c0\u7d22\u5934\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5c06\u957f\u4e0a\u4e0b\u6587\u63a8\u7406\u5185\u5b58\u6700\u591a\u51cf\u5c11\u4e862.55\u500d\uff08\u5bf9\u4e8eMHA\u6a21\u578b\uff09\u548c1.67\u500d\uff08\u5bf9\u4e8eGQA\u6a21\u578b\uff09\uff0c\u540c\u65f6\u89e3\u7801\u901f\u5ea6\u63d0\u9ad8\u4e86\u6700\u591a2.18\u500d\uff08MHA\u6a21\u578b\uff09\u548c1.50\u500d\uff08GQA\u6a21\u578b\uff09\uff0c\u5e76\u52a0\u901f\u9884\u586b\u5145\u6700\u591a1.73\u500d\uff08MHA\u6a21\u578b\uff09\u548c1.63\u500d\uff08GQA\u6a21\u578b\uff09\uff0c\u5e76\u4e14\u4e0e\u5168\u6ce8\u610f\u529b\u76f8\u6bd4\uff0c\u7cbe\u5ea6\u635f\u5931\u6700\u5c0f\u3002\u503c\u5f97\u6ce8\u610f\u7684\u662f\uff0c\u7ed3\u5408\u91cf\u5316\u6280\u672f\uff0cDuoAttention\u4f7fLlama-3-8B\u80fd\u591f\u5728\u5355\u4e2aA100 GPU\u4e0a\u89e3\u7801\u957f\u8fbe330\u4e07\u4e0a\u4e0b\u6587\u957f\u5ea6\u7684\u6570\u636e\u3002\u4ee3\u7801\u53ef\u5728https://github.com/mit-han-lab/duo-attention\u83b7\u53d6\u3002**|\n", "2410.10813": "|**2024-10-14**|**LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory**|Di Wu 
et.al.|[2410.10813](http://arxiv.org/abs/2410.10813)|**[link](https://github.com/xiaowu0162/longmemeval)**|**\u8fd1\u671f\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u9a71\u52a8\u7684\u804a\u5929\u52a9\u624b\u7cfb\u7edf\u5df2\u96c6\u6210\u4e86\u8bb0\u5fc6\u7ec4\u4ef6\u6765\u8ddf\u8e2a\u7528\u6237\u4e0e\u52a9\u624b\u4e4b\u95f4\u7684\u804a\u5929\u5386\u53f2\uff0c\u4ece\u800c\u5b9e\u73b0\u66f4\u51c6\u786e\u548c\u4e2a\u6027\u5316\u7684\u54cd\u5e94\u3002\u7136\u800c\uff0c\u5b83\u4eec\u5728\u6301\u7eed\u4ea4\u4e92\u4e2d\u7684\u957f\u671f\u8bb0\u5fc6\u80fd\u529b\u4ecd\u9700\u6df1\u5165\u7814\u7a76\u3002\u672c\u6587\u4ecb\u7ecd\u4e86\u4e00\u4e2a\u540d\u4e3aLongMemEval\u7684\u7efc\u5408\u57fa\u51c6\uff0c\u7528\u4e8e\u8bc4\u4f30\u804a\u5929\u52a9\u624b\u7684\u4e94\u9879\u6838\u5fc3\u957f\u671f\u8bb0\u5fc6\u80fd\u529b\uff1a\u4fe1\u606f\u63d0\u53d6\u3001\u591a\u4f1a\u8bdd\u63a8\u7406\u3001\u65f6\u95f4\u63a8\u7406\u3001\u77e5\u8bc6\u66f4\u65b0\u548c\u5f03\u6743\u3002\u8be5\u57fa\u51c6\u5305\u542b500\u4e2a\u7cbe\u5fc3\u7b56\u5212\u7684\u95ee\u9898\uff0c\u5e76\u5d4c\u5165\u5728\u81ea\u7531\u6269\u5c55\u7684\u7528\u6237\u4e0e\u52a9\u624b\u804a\u5929\u5386\u53f2\u4e2d\u3002LongMemEval\u5bf9\u73b0\u6709\u7684\u957f\u671f\u8bb0\u5fc6\u7cfb\u7edf\u63d0\u51fa\u4e86\u91cd\u5927\u6311\u6218\uff0c\u5728\u5546\u4e1a\u804a\u5929\u52a9\u624b\u548c\u957f\u4e0a\u4e0b\u6587LLM\u4e0a\uff0c\u8de8\u6301\u7eed\u4ea4\u4e92\u7684\u8bb0\u5fc6\u4fe1\u606f\u4fdd\u7559\u7387\u4e0b\u964d\u4e8630%\u3002\u968f\u540e\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u7edf\u4e00\u6846\u67b6\uff0c\u5c06\u957f\u671f\u8bb0\u5fc6\u8bbe\u8ba1\u5206\u89e3\u4e3a\u7d22\u5f15\u3001\u68c0\u7d22\u548c\u9605\u8bfb\u9636\u6bb5\u7684\u56db\u4e2a\u8bbe\u8ba1\u9009\u62e9\u3002\u57fa\u4e8e\u5173\u952e\u5b9e\u9a8c\u6d1e\u5bdf\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u51e0\u79cd\u5185\u5b58\u8bbe\u8ba1\uff0c\u5305\u62ec\u4f1a\u8bdd\u5206\u89e3\u4ee5\u4f18\u5316\u503c\u7c92\u5ea6\u3001\u4e8b\u5b9e\u589e\u5f3a\u7684\u5173\u9
52e\u6269\u5c55\u4ee5\u589e\u5f3a\u7d22\u5f15\u7ed3\u6784\u4ee5\u53ca\u65f6\u95f4\u611f\u77e5\u67e5\u8be2\u6269\u5c55\u4ee5\u7ec6\u5316\u641c\u7d22\u8303\u56f4\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u8fd9\u4e9b\u4f18\u5316\u6781\u5927\u5730\u63d0\u9ad8\u4e86LongMemEval\u4e0a\u7684\u5185\u5b58\u53ec\u56de\u7387\u548c\u4e0b\u6e38\u95ee\u9898\u56de\u7b54\u6027\u80fd\u3002\u603b\u4f53\u800c\u8a00\uff0c\u672c\u7814\u7a76\u4e3a\u63a8\u8fdb\u57fa\u4e8eLLM\u7684\u804a\u5929\u52a9\u624b\u7684\u957f\u671f\u8bb0\u5fc6\u80fd\u529b\u63d0\u4f9b\u4e86\u6709\u4ef7\u503c\u7684\u8d44\u6e90\u548c\u6307\u5bfc\uff0c\u4e3a\u66f4\u4e2a\u6027\u5316\u548c\u53ef\u9760\u7684\u5bf9\u8bddAI\u94fa\u5e73\u4e86\u9053\u8def\u3002**|\n", "2410.10814": "|**2024-10-14**|**Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free**|Ziyue Li et.al.|[2410.10814](http://arxiv.org/abs/2410.10814)|null|\u5c3d\u7ba1\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u751f\u6210\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u4f46\u5176\u89e3\u7801\u5668-only\u67b6\u6784\u901a\u5e38\u9650\u5236\u4e86\u5b83\u4eec\u4f5c\u4e3a\u5d4c\u5165\u6a21\u578b\u7684\u6f5c\u529b\uff0c\u9664\u975e\u8fdb\u884c\u8fdb\u4e00\u6b65\u7684\u8868\u793a\u5fae\u8c03\u3002\u8fd9\u662f\u5426\u4e0e\u5b83\u4eec\u4f5c\u4e3a\u901a\u7528\u6a21\u578b\u7684\u4e3b\u5f20\u76f8\u77db\u76fe\uff1f\u4e3a\u4e86\u56de\u7b54\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u66f4\u4ed4\u7ec6\u5730\u7814\u7a76\u4e86\u6df7\u5408\u4e13\u5bb6\uff08MoE\uff09LLMs\u3002\u6211\u4eec\u7684\u7814\u7a76\u8868\u660e\uff0cMoE 
LLMs\u4e2d\u7684\u4e13\u5bb6\u8def\u7531\u53ef\u4ee5\u4f5c\u4e3a\u4e00\u4e2a\u73b0\u6210\u7684\u5d4c\u5165\u6a21\u578b\uff0c\u5728\u5404\u79cd\u5d4c\u5165\u91cd\u70b9\u4efb\u52a1\u4e0a\u8868\u73b0\u51fa\u8272\uff0c\u800c\u65e0\u9700\u4efb\u4f55\u5fae\u8c03\u3002\u6b64\u5916\uff0c\u6211\u4eec\u5e7f\u6cdb\u7684\u5206\u6790\u8868\u660e\uff0cMoE\u8def\u7531\u6743\u91cd\uff08RW\uff09\u4e0eLLMs\u5e7f\u6cdb\u4f7f\u7528\u7684\u9690\u85cf\u72b6\u6001\uff08HS\uff09\u4e92\u8865\u3002\u4e0eHS\u76f8\u6bd4\uff0c\u6211\u4eec\u53d1\u73b0RW\u5bf9\u63d0\u793a\u7684\u9009\u62e9\u66f4\u5177\u9c81\u68d2\u6027\uff0c\u5e76\u5173\u6ce8\u9ad8\u5c42\u6b21\u8bed\u4e49\u3002\u53d7\u6b64\u5206\u6790\u542f\u53d1\uff0c\u6211\u4eec\u63d0\u51fa\u4e86MoEE\uff0c\u7ed3\u5408\u4e86RW\u548cHS\uff0c\u5176\u6027\u80fd\u4f18\u4e8e\u5355\u72ec\u4f7f\u7528\u4efb\u4e00\u65b9\u6cd5\u3002\u6211\u4eec\u5bf9\u5b83\u4eec\u7684\u7ec4\u5408\u53ca\u5176\u63d0\u793a\u7b56\u7565\u7684\u63a2\u7d22\u63ed\u793a\u4e86\u82e5\u5e72\u65b0\u9896\u89c1\u89e3\uff0c\u4f8b\u5982\uff0cRW\u548cHS\u76f8\u4f3c\u5ea6\u7684\u52a0\u6743\u548c\u4f18\u4e8e\u5b83\u4eec\u8fde\u63a5\u540e\u7684\u76f8\u4f3c\u5ea6\u3002\u6211\u4eec\u5728\u6765\u81ea\u5927\u89c4\u6a21\u6587\u672c\u5d4c\u5165\u57fa\u51c6\uff08MTEB\uff09\u76846\u4e2a\u5d4c\u5165\u4efb\u52a1\u4e2d\u768420\u4e2a\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u4e86\u5b9e\u9a8c\u3002\u7ed3\u679c\u8868\u660e\uff0cMoEE\u663e\u8457\u63d0\u5347\u4e86\u57fa\u4e8eLLM\u7684\u5d4c\u5165\u6548\u679c\uff0c\u4e14\u65e0\u9700\u8fdb\u4e00\u6b65\u5fae\u8c03\u3002|\n", "2410.10801": "|**2024-10-14**|**Mix Data or Merge Models? 
Optimizing for Diverse Multi-Task Learning**|Aakanksha et.al.|[2410.10801](http://arxiv.org/abs/2410.10801)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5df2\u88ab\u5168\u7403\u5e7f\u6cdb\u91c7\u7528\uff0c\u5e94\u7528\u4e8e\u5404\u79cd\u9886\u57df\u3002\u7136\u800c\uff0c\u786e\u4fdd\u5176\u5b89\u5168\u4f7f\u7528\u4ecd\u7136\u662f\u4e00\u4e2a\u91cd\u5927\u6311\u6218\u3002\u504f\u597d\u8bad\u7ec3\u548c\u5b89\u5168\u63aa\u65bd\u5f80\u5f80\u8fc7\u5ea6\u62df\u5408\u4e8e\u897f\u65b9\u4e2d\u5fc3\u6570\u636e\u96c6\u4e2d\u7684\u5371\u5bb3\uff0c\u800c\u5b89\u5168\u534f\u8bae\u901a\u5e38\u65e0\u6cd5\u6269\u5c55\u5230\u591a\u8bed\u8a00\u73af\u5883\u3002\u5728\u8fd9\u9879\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u5728\u591a\u6837\u5316\u7684\u591a\u4efb\u52a1\u8bbe\u7f6e\u4e2d\u63a2\u7d22\u6a21\u578b\u5408\u5e76\uff0c\u5728\u591a\u8bed\u8a00\u80cc\u666f\u4e0b\u7ed3\u5408\u5b89\u5168\u548c\u901a\u7528\u4efb\u52a1\u3002\u6bcf\u79cd\u8bed\u8a00\u5728\u4e0d\u540c\u4efb\u52a1\u4e2d\u5f15\u5165\u4e86\u72ec\u7279\u7684\u5b66\u4e60\u6311\u6218\u3002\u6211\u4eec\u53d1\u73b0\uff0c\u57fa\u4e8e\u76ee\u6807\u7684\u5408\u5e76\u6bd4\u6df7\u5408\u6570\u636e\u66f4\u6709\u6548\uff0c\u603b\u4f53\u6027\u80fd\u548c\u5b89\u5168\u6027\u5206\u522b\u63d0\u9ad8\u4e868%\u548c10%\u3002\u6211\u4eec\u8fd8\u53d1\u73b0\uff0c\u57fa\u4e8e\u8bed\u8a00\u7684\u5408\u5e76\u975e\u5e38\u6709\u6548\u2014\u2014\u901a\u8fc7\u5408\u5e76\u5355\u8bed\u5fae\u8c03\u6a21\u578b\uff0c\u6211\u4eec\u5b9e\u73b0\u4e86\u5728\u76f8\u540c\u53ef\u7528\u6570\u636e\u4e0b\uff0c\u76f8\u6bd4\u6df7\u5408\u6570\u636e\u65b9\u6cd5\uff0c\u6574\u4f53\u6027\u80fd\u63d0\u9ad84%\uff0c\u6240\u6709\u8bed\u8a00\u4e0a\u7684\u5371\u5bb3\u51cf\u5c117%\u3002\u603b\u7684\u6765\u8bf4\uff0c\u6211\u4eec\u5bf9\u5408\u5e76\u65b9\u6cd5\u7684\u7efc\u5408\u7814\u7a76\u63d0\u4f9b\u4e86\u4e00\u4e2a\u6784\u5efa\u5f3a\u5927\u4e14\u5b89\u5168\u7684\u591a\u8bed\u8a00\u6a21\u578b\u7684\u6709\u7528\u6846\u67b6\u3002|\n", "2410.10798": "|**2024-10-15**|**MMAR: 
Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling**|Jian Yang et.al.|[2410.10798](http://arxiv.org/abs/2410.10798)|null|\u8fd1\u5e74\u6765\uff0c\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\u7684\u53d1\u5c55\u63a8\u52a8\u4e86\u8054\u5408\u6982\u7387\u6a21\u578b\u7684\u8fdb\u6b65\uff0c\u8fd9\u4e9b\u6a21\u578b\u80fd\u591f\u540c\u65f6\u7406\u89e3\u548c\u751f\u6210\u56fe\u50cf\u3002\u7136\u800c\uff0c\u6211\u4eec\u53d1\u73b0\u6700\u8fd1\u7684\u65b9\u6cd5\u5728\u7406\u89e3\u4efb\u52a1\u8fc7\u7a0b\u4e2d\u4e0d\u53ef\u907f\u514d\u5730\u4f1a\u4e22\u5931\u56fe\u50cf\u4fe1\u606f\uff0c\u8fd9\u4e3b\u8981\u662f\u7531\u4e8e\u56fe\u50cf\u79bb\u6563\u5316\u6216\u6269\u6563\u53bb\u566a\u6b65\u9aa4\u9020\u6210\u7684\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u591a\u6a21\u6001\u81ea\u56de\u5f52\uff08MMAR\uff09\u6982\u7387\u5efa\u6a21\u6846\u67b6\u3002\u4e0e\u79bb\u6563\u5316\u65b9\u6cd5\u4e0d\u540c\uff0cMMAR\u91c7\u7528\u8fde\u7eed\u503c\u7684\u56fe\u50cf\u6807\u8bb0\u6765\u907f\u514d\u4fe1\u606f\u4e22\u5931\u3002\u4e0d\u540c\u4e8e\u57fa\u4e8e\u6269\u6563\u7684\u65b9\u6cd5\uff0c\u6211\u4eec\u901a\u8fc7\u5728\u6bcf\u4e2a\u81ea\u56de\u5f52\u56fe\u50cf\u5757\u5d4c\u5165\u9876\u90e8\u6dfb\u52a0\u4e00\u4e2a\u8f7b\u91cf\u7ea7\u6269\u6563\u5934\u6765\u89e3\u8026\u6269\u6563\u8fc7\u7a0b\u548c\u81ea\u56de\u5f52\u4e3b\u5e72\u6a21\u578b\u3002\u8fd9\u6837\u4e00\u6765\uff0c\u5f53\u6a21\u578b\u4ece\u56fe\u50cf\u751f\u6210\u8fc7\u6e21\u5230\u901a\u8fc7\u6587\u672c\u751f\u6210\u8fdb\u884c\u7406\u89e3\u65f6\uff0c\u4e3b\u5e72\u6a21\u578b\u5bf9\u56fe\u50cf\u7684\u9690\u85cf\u8868\u793a\u4e0d\u53d7\u9650\u4e8e\u6700\u540e\u7684\u53bb\u566a\u6b65\u9aa4\u3002\u4e3a\u4e86\u6210\u529f\u8bad\u7ec3\u6211\u4eec\u7684\u65b9\u6cd5\uff0c\u6211\u4eec\u8fd8\u63d0\u51fa\u4e86\u4e00\u79cd\u7406\u8bba\u4e0a\u88ab\u8bc1\u660e\u53ef\u4ee5\u89e3\u51b3\u6570\u503c\u7a33\u5b9a\u6027\u95ee\u9898\u7684\u6280\u672f\uff0c\u5e76\u63d0\u51fa
\u4e86\u4e00\u79cd\u5e73\u8861\u751f\u6210\u548c\u7406\u89e3\u4efb\u52a1\u76ee\u6807\u7684\u8bad\u7ec3\u7b56\u7565\u3002\u901a\u8fc7\u572818\u4e2a\u56fe\u50cf\u7406\u89e3\u57fa\u51c6\u4e0a\u8fdb\u884c\u5e7f\u6cdb\u7684\u8bc4\u4f30\uff0cMMAR\u5c55\u793a\u4e86\u6bd4\u5176\u4ed6\u8054\u5408\u591a\u6a21\u6001\u6a21\u578b\u66f4\u4f18\u8d8a\u7684\u6027\u80fd\uff0c\u5176\u6027\u80fd\u53ef\u4e0e\u91c7\u7528\u9884\u8bad\u7ec3CLIP\u89c6\u89c9\u7f16\u7801\u5668\u7684\u65b9\u6cd5\u76f8\u5ab2\u7f8e\uff0c\u540c\u65f6\u8fd8\u80fd\u751f\u6210\u9ad8\u8d28\u91cf\u7684\u56fe\u50cf\u3002\u6211\u4eec\u8fd8\u8868\u660e\uff0c\u8be5\u65b9\u6cd5\u5728\u66f4\u5927\u6570\u636e\u96c6\u548c\u66f4\u5927\u6a21\u578b\u89c4\u6a21\u4e0b\u5177\u6709\u53ef\u6269\u5c55\u6027\u3002|\n", "2410.10796": "|**2024-10-14**|**Context-Parametric Inversion: Why Instruction Finetuning May Not Actually Improve Context Reliance**|Sachin Goyal et.al.|[2410.10796](http://arxiv.org/abs/2410.10796)|**[link](https://github.com/locuslab/context-parametric-inversion)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\u901a\u8fc7\u6307\u4ee4\u5fae\u8c03\u6765\u589e\u5f3a\u5176\u9075\u5faa\u7528\u6237\u6307\u4ee4\u548c\u5904\u7406\u8f93\u5165\u4e0a\u4e0b\u6587\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u5373\u4f7f\u662f\u6700\u5148\u8fdb\u7684\u6a21\u578b\u4e5f\u5e38\u5e38\u96be\u4ee5\u9075\u5faa\u6307\u4ee4\uff0c\u5c24\u5176\u662f\u5728\u8f93\u5165\u4e0a\u4e0b\u6587\u4e0e\u6a21\u578b\u7684\u53c2\u6570\u77e5\u8bc6\u4e0d\u4e00\u81f4\u65f6\u3002\u8fd9\u4f1a\u5bfc\u81f4\u5404\u79cd\u5931\u8d25\uff0c\u4f8b\u5982\u5e7b\u89c9\uff0c\u5373\u54cd\u5e94\u5185\u5bb9\u8fc7\u65f6\u3001\u5e26\u6709\u504f\u89c1\u6216\u5305\u542b\u672a\u7ecf\u9a8c\u8bc1\u7684\u4e8b\u5b9e\u3002\u5728\u8fd9\u9879\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u8bd5\u56fe\u7406\u89e3\u8fd9\u79cd\u4e0d\u826f\u4e0a\u4e0b\u6587\u4f9d\u8d56\u6027\u7684\u6839\u672c\u539f\u56e0\uff0c\u7279\u522b\u662f\u5728\u6307\u4ee4\u5fae\u8c03\u4e4b\u540e\u3002\u6211\u4eec\u89c2\u5bdf\u5230\u4e00\u4
e2a\u6709\u8da3\u7684\u73b0\u8c61\uff1a\u5728\u6307\u4ee4\u5fae\u8c03\u8fc7\u7a0b\u4e2d\uff0c\u4e0a\u4e0b\u6587\u4f9d\u8d56\u6027\u6700\u521d\u5982\u9884\u671f\u822c\u589e\u52a0\uff0c\u4f46\u968f\u7740\u6307\u4ee4\u5fae\u8c03\u7684\u8fdb\u884c\uff0c\u8fd9\u79cd\u4f9d\u8d56\u6027\u9010\u6e10\u51cf\u5c11\u3002\u6211\u4eec\u5c06\u8fd9\u4e00\u73b0\u8c61\u79f0\u4e3a\u4e0a\u4e0b\u6587-\u53c2\u6570\u53cd\u8f6c\uff0c\u5e76\u53d1\u73b0\u5728\u591a\u4e2a\u901a\u7528\u6307\u4ee4\u8c03\u4f18\u6570\u636e\u96c6\uff08\u5982TULU\u3001Alpaca\u548cUltrachat\uff09\u4ee5\u53ca\u6a21\u578b\u5bb6\u65cf\uff08\u5982Llama\u3001Mistral\u548cPythia\uff09\u4e2d\u90fd\u5b58\u5728\u8fd9\u79cd\u73b0\u8c61\u3002\u5728\u4e00\u4e2a\u7b80\u5355\u7684\u7406\u8bba\u8bbe\u7f6e\u4e2d\uff0c\u6211\u4eec\u6cbf\u7740\u6307\u4ee4\u5fae\u8c03\u7684\u68af\u5ea6\u4e0b\u964d\u8f68\u8ff9\u5206\u79bb\u51fa\u4e0a\u4e0b\u6587-\u53c2\u6570\u53cd\u8f6c\u53d1\u751f\u7684\u539f\u56e0\u3002\u6211\u4eec\u5c06\u8fd9\u4e00\u73b0\u8c61\u4e0e\u6307\u4ee4\u5fae\u8c03\u6570\u636e\u6df7\u5408\u4e2d\u7684\u793a\u4f8b\u8054\u7cfb\u8d77\u6765\uff0c\u8fd9\u4e9b\u793a\u4f8b\u4e2d\u8f93\u5165\u4e0a\u4e0b\u6587\u63d0\u4f9b\u7684\u4fe1\u606f\u5df2\u7ecf\u5b58\u5728\u4e8e\u6a21\u578b\u7684\u53c2\u6570\u77e5\u8bc6\u4e2d\u3002\u6211\u4eec\u7684\u5206\u6790\u63d0\u51fa\u4e86\u67d0\u4e9b\u6709\u9650\u7684\u7f13\u89e3\u7b56\u7565\uff0c\u540c\u65f6\u4e5f\u9a8c\u8bc1\u4e86\u6211\u4eec\u7684\u7406\u8bba\u89c1\u89e3\u3002\u6211\u4eec\u5e0c\u671b\u6211\u4eec\u7684\u5de5\u4f5c\u80fd\u4f5c\u4e3a\u89e3\u51b3\u8fd9\u4e00\u5931\u8d25\u6a21\u5f0f\u7684\u4e00\u4e2a\u8d77\u70b9\uff0c\u800c\u8fd9\u4e00\u6a21\u5f0f\u662fLLM\u8bad\u7ec3\u4e2d\u7684\u4e00\u4e2a\u6807\u51c6\u90e8\u5206\u3002**|\n", "2410.10779": "|**2024-10-14**|**Focused ReAct: Improving ReAct through Reiterate and Early Stop**|Shuoqiu Li 
et.al.|[2410.10779](http://arxiv.org/abs/2410.10779)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u63a8\u7406\u548c\u51b3\u7b56\u80fd\u529b\u65b9\u9762\u6709\u4e86\u663e\u8457\u7684\u63d0\u5347\uff0c\u8fd9\u4f53\u73b0\u5728ReAct\u7b49\u65b9\u6cd5\u4e2d\u3002\u7136\u800c\uff0c\u5c3d\u7ba1ReAct\u5728\u5904\u7406\u590d\u6742\u4efb\u52a1\u65f6\u975e\u5e38\u6709\u6548\uff0c\u4f46\u5b83\u9762\u4e34\u4e24\u4e2a\u4e3b\u8981\u6311\u6218\uff1a\u4e00\u662f\u5bb9\u6613\u504f\u79bb\u539f\u59cb\u95ee\u9898\uff0c\u4e8c\u662f\u9677\u5165\u884c\u52a8\u5faa\u73af\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u5f15\u5165\u4e86Focused ReAct\uff0c\u8fd9\u662fReAct\u8303\u5f0f\u7684\u4e00\u4e2a\u589e\u5f3a\u7248\u672c\uff0c\u5b83\u7ed3\u5408\u4e86\u91cd\u7533\u548c\u65e9\u671f\u505c\u6b62\u673a\u5236\u3002\u8fd9\u4e9b\u6539\u8fdb\u6709\u52a9\u4e8e\u6a21\u578b\u4fdd\u6301\u5bf9\u539f\u59cb\u95ee\u9898\u7684\u5173\u6ce8\u5e76\u907f\u514d\u91cd\u590d\u884c\u4e3a\u3002\u5b9e\u9a8c\u7ed3\u679c\u8868\u660e\uff0c\u4e0e\u539f\u59cb\u7684ReAct\u65b9\u6cd5\u76f8\u6bd4\uff0cFocused ReAct\u7684\u51c6\u786e\u7387\u63d0\u9ad8\u4e8618%\u5230530%\uff0c\u8fd0\u884c\u65f6\u95f4\u51cf\u5c11\u4e86\u6700\u591a34%\u3002|\n", "2410.10762": "|**2024-10-14**|**AFlow: Automating Agentic Workflow Generation**|Jiayi Zhang 
et.al.|[2410.10762](http://arxiv.org/abs/2410.10762)|**[link](https://github.com/geekan/metagpt)**|**\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u89e3\u51b3\u5404\u79cd\u9886\u57df\u4e2d\u7684\u590d\u6742\u4efb\u52a1\u65b9\u9762\u5c55\u73b0\u51fa\u4e86\u663e\u8457\u7684\u6f5c\u529b\uff0c\u901a\u5e38\u901a\u8fc7\u91c7\u7528\u9075\u5faa\u8be6\u7ec6\u6307\u4ee4\u548c\u64cd\u4f5c\u5e8f\u5217\u7684\u4ee3\u7406\u5de5\u4f5c\u6d41\u7a0b\u6765\u5b9e\u73b0\u3002\u7136\u800c\uff0c\u6784\u5efa\u8fd9\u4e9b\u5de5\u4f5c\u6d41\u7a0b\u9700\u8981\u5927\u91cf\u7684\u4eba\u529b\uff0c\u8fd9\u9650\u5236\u4e86\u5176\u53ef\u6269\u5c55\u6027\u548c\u901a\u7528\u6027\u3002\u6700\u8fd1\u7684\u7814\u7a76\u8bd5\u56fe\u81ea\u52a8\u5316\u751f\u6210\u548c\u4f18\u5316\u8fd9\u4e9b\u5de5\u4f5c\u6d41\u7a0b\uff0c\u4f46\u73b0\u6709\u7684\u65b9\u6cd5\u4ecd\u7136\u4f9d\u8d56\u4e8e\u521d\u59cb\u7684\u624b\u52a8\u8bbe\u7f6e\uff0c\u5e76\u4e14\u672a\u80fd\u5b9e\u73b0\u5b8c\u5168\u81ea\u52a8\u5316\u548c\u6709\u6548\u7684\u6d41\u7a0b\u751f\u6210\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e00\u6311\u6218\uff0c\u6211\u4eec\u5c06\u5de5\u4f5c\u6d41\u4f18\u5316\u91cd\u65b0\u8868\u8ff0\u4e3a\u4e00\u4e2a\u4ee3\u7801\u8868\u793a\u7684\u5de5\u4f5c\u6d41\u7a7a\u95f4\u641c\u7d22\u95ee\u9898\uff0c\u5728\u8be5\u7a7a\u95f4\u4e2d\uff0c\u7531LLM\u8c03\u7528\u7684\u8282\u70b9\u901a\u8fc7\u8fb9\u8fde\u63a5\u3002\u6211\u4eec\u5f15\u5165\u4e86AFlow\uff0c\u8fd9\u662f\u4e00\u4e2a\u81ea\u52a8\u5316\u7684\u6846\u67b6\uff0c\u4f7f\u7528\u8499\u7279\u5361\u6d1b\u6811\u641c\u7d22\u6709\u6548\u5730\u63a2\u7d22\u8fd9\u4e2a\u7a7a\u95f4\uff0c\u901a\u8fc7\u4ee3\u7801\u4fee\u6539\u3001\u6811\u7ed3\u6784\u7684\u7ecf\u9a8c\u4ee5\u53ca\u6267\u884c\u53cd\u9988\u8fed\u4ee3\u5730\u6539\u8fdb\u5de5\u4f5c\u6d41\u7a0b\u3002\u5728\u516d\u4e2a\u57fa\u51c6\u6570\u636e\u96c6\u4e0a\u7684\u5b9e\u8bc1\u8bc4\u4f30\u8868\u660e\uff0cAFlow\u7684\u6709\u6548\u6027\uff0c\u5e73\u5747\u6bd4\u6700\u5148\u8fdb\u7684\u57fa\u7ebf\u63d0\u9ad8\u4e865.7%\u3002\u6b64
\u5916\uff0cAFlow\u4f7f\u5f97\u8f83\u5c0f\u7684\u6a21\u578b\u5728\u7279\u5b9a\u4efb\u52a1\u4e0a\u80fd\u591f\u8d85\u8d8aGPT-4\uff0c\u540c\u65f6\u5176\u63a8\u7406\u6210\u672c\u4ec5\u4e3aGPT-4\u76844.55%\u3002\u4ee3\u7801\u5c06\u5728https://github.com/geekan/MetaGPT\u83b7\u53d6\u3002**|\n", "2410.10760": "|**2024-10-14**|**Denial-of-Service Poisoning Attacks against Large Language Models**|Kuofeng Gao et.al.|[2410.10760](http://arxiv.org/abs/2410.10760)|**[link](https://github.com/sail-sg/p-dos)**|**\u8fd1\u671f\u7684\u7814\u7a76\u8868\u660e\uff0c\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5bb9\u6613\u53d7\u5230\u62d2\u7edd\u670d\u52a1\uff08DoS\uff09\u653b\u51fb\uff0c\u8fd9\u79cd\u653b\u51fb\u901a\u8fc7\u6076\u610f\u8f93\u5165\u5982\u62fc\u5199\u9519\u8bef\u6216\u65e0\u610f\u4e49\u7684\u63d0\u793a\u8bcd\u89e6\u53d1\u6a21\u578b\u65e0\u9650\u8f93\u51fa\uff0c\u800c\u4e0d\u4f1a\u751f\u6210[EOS]\u7ed3\u675f\u7b26\u3002\u8fd9\u4e9b\u653b\u51fb\u53ef\u80fd\u5bfc\u81f4\u9ad8\u5ef6\u8fdf\uff0c\u5e76\u4f7fLLM\u670d\u52a1\u5bf9\u5176\u4ed6\u7528\u6237\u6216\u4efb\u52a1\u4e0d\u53ef\u7528\u3002\u7136\u800c\uff0c\u5728\u5b58\u5728\u8bed\u97f3\u5230\u6587\u672c\u63a5\u53e3\u7684\u60c5\u51b5\u4e0b\uff08\u4f8b\u5982\uff0c\u5bf9\u673a\u5668\u4eba\u7684\u8bed\u97f3\u6307\u4ee4\uff09\uff0c\u6267\u884c\u6b64\u7c7bDoS\u653b\u51fb\u53d8\u5f97\u5177\u6709\u6311\u6218\u6027\uff0c\u56e0\u4e3a\u901a\u8fc7\u8bed\u97f3\u5f88\u96be\u5f15\u5165\u62fc\u5199\u9519\u8bef\u6216\u65e0\u610f\u4e49\u7684\u63d0\u793a\u8bcd\u3002\u4e00\u79cd\u7b80\u5355\u7684DoS\u653b\u51fb\u65b9\u5f0f\u662f\u6307\u793a\u6a21\u578b\u201c\u4e0d\u65ad\u91cd\u590d\u2018Hello\u2019\u201d\uff0c\u4f46\u6211\u4eec\u89c2\u5bdf\u5230\u4f9d\u8d56\u81ea\u7136\u6307\u4ee4\u7684\u65b9\u5f0f\u4f1a\u9650\u5236\u8f93\u51fa\u957f\u5ea6\uff0c\u8be5\u957f\u5ea6\u53d7\u9650\u4e8e\u9884\u8bad\u7ec3\u6570\u636e\u7684\u6700\u5927\u957f\u5ea6\u3002\u4e3a\u4e86\u514b\u670d\u8fd9\u4e00\u9650\u5236\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u
79cd\u9488\u5bf9LLMs\u7684\u57fa\u4e8e\u6295\u6bd2\u7684DoS\uff08P-DoS\uff09\u653b\u51fb\u65b9\u6cd5\uff0c\u8bc1\u660e\u901a\u8fc7\u6ce8\u5165\u4e00\u4e2a\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u6295\u6bd2\u6837\u672c\u53ef\u4ee5\u7a81\u7834\u8f93\u51fa\u957f\u5ea6\u7684\u9650\u5236\u3002\u4f8b\u5982\uff0c\u4e00\u4e2a\u6295\u6bd2\u6837\u672c\u80fd\u591f\u4ee5\u4e0d\u52301\u7f8e\u5143\u7684\u6210\u672c\u6210\u529f\u653b\u51fbGPT-4o\u548cGPT-4o mini\uff08\u901a\u8fc7OpenAI\u7684\u5fae\u8c03API\uff09\uff0c\u5bfc\u81f4\u91cd\u590d\u8f93\u51fa\u76f4\u81f3\u8fbe\u5230\u6700\u5927\u63a8\u7406\u957f\u5ea6\uff0816K\u4e2a\u6807\u8bb0\uff0c\u76f8\u6bd4\u4e4b\u4e0b\u672a\u6295\u6bd2\u524d\u4e3a0.5K\uff09\u3002\u6b64\u5916\uff0c\u6211\u4eec\u8fd8\u5bf9\u5f00\u6e90LLMs\u8fdb\u884c\u4e86\u5168\u9762\u7684\u6d88\u878d\u7814\u7a76\uff0c\u5e76\u5c06\u6b64\u65b9\u6cd5\u6269\u5c55\u5230LLM\u4ee3\u7406\uff0c\u5176\u4e2d\u653b\u51fb\u8005\u53ef\u4ee5\u63a7\u5236\u5fae\u8c03\u6570\u636e\u96c6\u548c\u7b97\u6cd5\u3002\u6211\u4eec\u7684\u53d1\u73b0\u5f3a\u8c03\u4e86\u9700\u8981\u9632\u5fa1P-DoS\u653b\u51fb\u4ee5\u786e\u4fddLLMs\u7684\u5b89\u5168\u3002\u6211\u4eec\u7684\u4ee3\u7801\u53ef\u4ee5\u5728https://github.com/sail-sg/P-DoS\u83b7\u53d6\u3002**|\n", "2410.10759": "|**2024-10-14**|**SplitLLM: Collaborative Inference of LLMs for Model Placement and Throughput Optimization**|Akrit Mudvari 
et.al.|[2410.10759](http://arxiv.org/abs/2410.10759)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u8fd1\u5e74\u6765\u6210\u4e3a\u4e00\u9879\u98a0\u8986\u6027\u7684\u521b\u65b0\uff0c\u5728\u6211\u4eec\u7684\u65e5\u5e38\u751f\u6d3b\u4e2d\u626e\u6f14\u7740\u91cd\u8981\u89d2\u8272\uff0c\u56e0\u4e3a\u5b83\u4eec\u80fd\u591f\u7406\u89e3\u548c\u751f\u6210\u7c7b\u4f3c\u4eba\u7c7b\u7684\u6587\u672c\u3002\u5b83\u4eec\u7684\u529f\u80fd\u5305\u62ec\u81ea\u7136\u8bed\u8a00\u7406\u89e3\u3001\u4fe1\u606f\u68c0\u7d22\u548c\u641c\u7d22\u3001\u7ffb\u8bd1\u3001\u804a\u5929\u673a\u5668\u4eba\u3001\u865a\u62df\u52a9\u624b\u7b49\u3002\u7136\u800c\uff0c\u4f17\u6240\u5468\u77e5\uff0cLLMs\u5728\u53c2\u6570\u6570\u91cf\u4e0a\u975e\u5e38\u5e9e\u5927\u3002\u6b64\u5916\uff0c\u5e95\u5c42\u67b6\u6784Transformer\u4e2d\u7684\u81ea\u6ce8\u610f\u529b\u673a\u5236\u5728\u8ba1\u7b97\u548c\u5185\u5b58\u65b9\u9762\u4e0e\u8f93\u5165\u5e8f\u5217\u957f\u5ea6\u5448\u4e8c\u6b21\u590d\u6742\u6027\u5173\u7cfb\u3002\u7531\u4e8e\u8fd9\u4e9b\u539f\u56e0\uff0cLLM\u63a8\u7406\u8d44\u6e90\u5bc6\u96c6\u578b\u9ad8\uff0c\u56e0\u6b64LLM\u63a8\u7406\u7684\u541e\u5410\u91cf\u53d7\u5230\u9650\u5236\uff0c\u5c24\u5176\u662f\u5728\u8f83\u957f\u5e8f\u5217\u7684\u60c5\u51b5\u4e0b\u3002\u5728\u8fd9\u4efd\u62a5\u544a\u4e2d\uff0c\u6211\u4eec\u8bbe\u8ba1\u4e86\u4e00\u79cd\u670d\u52a1\u5668\u4e0e\u5176\u5ba2\u6237\u7aef\u4e4b\u95f4\u7684\u534f\u4f5c\u63a8\u7406\u67b6\u6784\uff0c\u4ee5\u7f13\u89e3\u541e\u5410\u91cf\u9650\u5236\u3002\u5728\u8fd9\u4e2a\u8bbe\u8ba1\u4e2d\uff0c\u6211\u4eec\u8003\u8651\u4e86\u53cc\u65b9\u53ef\u7528\u7684\u8d44\u6e90\uff0c\u5373\u8ba1\u7b97\u548c\u901a\u4fe1\u6210\u672c\u3002\u6211\u4eec\u5f00\u53d1\u4e86\u4e00\u79cd\u57fa\u4e8e\u52a8\u6001\u89c4\u5212\u7684\u7b97\u6cd5\uff0c\u4ee5\u6700\u4f18\u65b9\u5f0f\u5206\u914d\u670d\u52a1\u5668\u548c\u5ba2\u6237\u7aef\u8bbe\u5907\u4e4b\u95f4\u7684\u8ba1\u7b97\uff0c\u4ece\u800c\u63d0\u9ad8\u670d\u52a1\u5668\u541e\u5410\u91cf\uff0c\u540c\u65f6\u4e0d\u8fdd\u5
3cd\u670d\u52a1\u6c34\u5e73\u534f\u8bae\uff08SLA\uff09\u3002\u5b9e\u9a8c\u8868\u660e\uff0c\u6211\u4eec\u80fd\u591f\u9ad8\u6548\u5730\u5206\u914d\u5de5\u4f5c\u8d1f\u8f7d\uff0c\u4f7f\u670d\u52a1\u5668\u7684\u5de5\u4f5c\u8d1f\u8f7d\u51cf\u5c11\u7ea6\u4e09\u5206\u4e4b\u4e00\uff0c\u540c\u65f6\u6bd4\u8d2a\u5fc3\u65b9\u6cd5\u63d0\u9ad8\u4e8619%\u3002\u7ed3\u679c\u8868\u660e\uff0c\u5728\u5177\u6709\u4e0d\u540c\u7c7b\u578bLLM\u63a8\u7406\u8bf7\u6c42\u7684\u73af\u5883\u4e2d\uff0c\u670d\u52a1\u5668\u7684\u541e\u5410\u91cf\u5f97\u5230\u4e86\u63d0\u5347\u3002|\n", "2410.11841": "|**2024-10-15**|**GaVaMoE: Gaussian-Variational Gated Mixture of Experts for Explainable Recommendation**|Fei Tang et.al.|[2410.11841](http://arxiv.org/abs/2410.11841)|null|\u57fa\u4e8e\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\u7684\u53ef\u89e3\u91ca\u63a8\u8350\uff08LLM-based ER\uff09\u7cfb\u7edf\u5728\u751f\u6210\u7c7b\u4f3c\u4eba\u7c7b\u7684\u63a8\u8350\u89e3\u91ca\u65b9\u9762\u663e\u793a\u51fa\u6f5c\u529b\u3002\u7136\u800c\uff0c\u5b83\u4eec\u9762\u4e34\u7740\u5efa\u6a21\u7528\u6237\u4e0e\u9879\u76ee\u4e4b\u95f4\u7684\u534f\u540c\u504f\u597d\u3001\u4e2a\u6027\u5316\u89e3\u91ca\u4ee5\u53ca\u5904\u7406\u7a00\u758f\u7528\u6237-\u9879\u76ee\u4ea4\u4e92\u7684\u6311\u6218\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u540d\u4e3aGaVaMoE\u7684\u65b0\u6846\u67b6\uff0c\u5373\u9ad8\u65af\u53d8\u5206\u95e8\u63a7\u4e13\u5bb6\u6df7\u5408\u6a21\u578b\uff0c\u7528\u4e8e\u53ef\u89e3\u91ca\u63a8\u8350\u3002GaVaMoE\u5f15\u5165\u4e86\u4e24\u4e2a\u5173\u952e\u7ec4\u4ef6\uff1a(1) \u4e00\u4e2a\u8bc4\u5206\u91cd\u6784\u6a21\u5757\uff0c\u91c7\u7528\u5e26\u6709\u9ad8\u65af\u6df7\u5408\u6a21\u578b\uff08GMM\uff09\u7684\u53d8\u5206\u81ea\u7f16\u7801\u5668\uff08VAE\uff09\uff0c\u4ee5\u6355\u6349\u590d\u6742\u7684\u7528\u6237-\u9879\u76ee\u534f\u540c\u504f\u597d\uff0c\u4f5c\u4e3a\u9884\u8bad\u7ec3\u7684\u591a\u95e8\u673a\u5236\uff1b(2) 
\u4e00\u7ec4\u7ec6\u7c92\u5ea6\u7684\u4e13\u5bb6\u6a21\u578b\uff0c\u4e0e\u591a\u95e8\u673a\u5236\u8026\u5408\uff0c\u7528\u4e8e\u751f\u6210\u9ad8\u5ea6\u4e2a\u6027\u5316\u7684\u89e3\u91ca\u3002VAE\u7ec4\u4ef6\u5bf9\u7528\u6237-\u9879\u76ee\u4ea4\u4e92\u4e2d\u7684\u6f5c\u5728\u56e0\u7d20\u8fdb\u884c\u5efa\u6a21\uff0c\u800cGMM\u5219\u805a\u7c7b\u5177\u6709\u76f8\u4f3c\u884c\u4e3a\u7684\u7528\u6237\u3002\u6bcf\u4e2a\u805a\u7c7b\u5bf9\u5e94\u591a\u95e8\u673a\u5236\u4e2d\u7684\u4e00\u4e2a\u95e8\uff0c\u5c06\u7528\u6237-\u9879\u76ee\u5bf9\u8def\u7531\u5230\u9002\u5f53\u7684\u4e13\u5bb6\u6a21\u578b\u3002\u8fd9\u79cd\u67b6\u6784\u4f7fGaVaMoE\u80fd\u591f\u4e3a\u7279\u5b9a\u7c7b\u578b\u7684\u7528\u6237\u548c\u504f\u597d\u751f\u6210\u5b9a\u5236\u5316\u89e3\u91ca\uff0c\u901a\u8fc7\u5229\u7528\u7528\u6237\u4e4b\u95f4\u7684\u76f8\u4f3c\u6027\u6765\u7f13\u89e3\u6570\u636e\u7a00\u758f\u95ee\u9898\u3002\u5728\u4e09\u4e2a\u771f\u5b9e\u4e16\u754c\u6570\u636e\u96c6\u4e0a\u7684\u5e7f\u6cdb\u5b9e\u9a8c\u8868\u660e\uff0cGaVaMoE\u5728\u89e3\u91ca\u8d28\u91cf\u3001\u4e2a\u6027\u5316\u548c\u4e00\u81f4\u6027\u65b9\u9762\u663e\u8457\u4f18\u4e8e\u73b0\u6709\u65b9\u6cd5\u3002\u7279\u522b\u662f\uff0c\u5728\u7a00\u758f\u7528\u6237-\u9879\u76ee\u4ea4\u4e92\u573a\u666f\u4e2d\uff0cGaVaMoE\u8868\u73b0\u51fa\u7a33\u5065\u7684\u6027\u80fd\uff0c\u5373\u4f7f\u5bf9\u4e8e\u5386\u53f2\u6570\u636e\u6709\u9650\u7684\u7528\u6237\u4e5f\u80fd\u4fdd\u6301\u9ad8\u8d28\u91cf\u7684\u89e3\u91ca\u3002|\n", "2410.11829": "|**2024-10-15**|**MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding**|Yue Cao 
et.al.|[2410.11829](http://arxiv.org/abs/2410.11829)|**[link](https://github.com/yuecao0119/MMFuser)**|**\u5c3d\u7ba1\u5728\u8de8\u6a21\u6001\u4ea4\u4e92\u4e2d\u7406\u89e3\u590d\u6742\u7684\u4eba\u7c7b\u610f\u56fe\u65b9\u9762\uff0c\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u5c55\uff0c\u4f46\u6355\u6349\u590d\u6742\u7684\u56fe\u50cf\u7ec6\u8282\u4ecd\u7136\u5177\u6709\u6311\u6218\u6027\u3002\u5148\u524d\u7684\u65b9\u6cd5\u901a\u8fc7\u6574\u5408\u591a\u4e2a\u89c6\u89c9\u7f16\u7801\u5668\u6765\u589e\u5f3a\u89c6\u89c9\u7ec6\u8282\uff0c\u4f46\u8fd9\u79cd\u65b9\u6cd5\u5f15\u5165\u4e86\u5197\u4f59\u548c\u8ba1\u7b97\u5f00\u9500\u3002\u6211\u4eec\u89c2\u5bdf\u5230\uff0c\u5927\u591a\u6570MLLMs\u4ec5\u4f7f\u7528\u89c6\u89c9\u7f16\u7801\u5668\u7684\u6700\u540e\u4e00\u5c42\u7279\u5f81\u56fe\u6765\u8fdb\u884c\u89c6\u89c9\u8868\u793a\uff0c\u800c\u5ffd\u7565\u4e86\u6d45\u5c42\u7279\u5f81\u56fe\u4e2d\u7684\u4e30\u5bcc\u7ec6\u7c92\u5ea6\u4fe1\u606f\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86MMFuser\uff0c\u8fd9\u662f\u4e00\u79cd\u7b80\u5355\u800c\u6709\u6548\u7684\u591a\u5c42\u7279\u5f81\u878d\u5408\u5668\uff0c\u80fd\u591f\u9ad8\u6548\u5730\u6574\u5408\u6765\u81ea\u89c6\u89c9\u53d8\u6362\u5668\uff08ViTs\uff09\u7684\u6df1\u5c42\u548c\u6d45\u5c42\u7279\u5f81\u3002\u5177\u4f53\u6765\u8bf4\uff0c\u5b83\u5229\u7528\u8bed\u4e49\u5bf9\u9f50\u7684\u6df1\u5c42\u7279\u5f81\u4f5c\u4e3a\u67e5\u8be2\uff0c\u52a8\u6001\u63d0\u53d6\u6d45\u5c42\u7279\u5f81\u4e2d\u7f3a\u5931\u7684\u7ec6\u8282\uff0c\u4ece\u800c\u5728\u4fdd\u6301\u8bed\u4e49\u5bf9\u9f50\u7684\u540c\u65f6\u4e30\u5bcc\u4e86\u8868\u793a\u5f62\u5f0f\u7684\u7ec6\u7c92\u5ea6\u4fe1\u606f\u3002\u5e94\u7528\u4e8eLLaVA-1.5\u6a21\u578b\u65f6\uff0cMMFuser\u5728\u89c6\u89c9\u8868\u793a\u548c\u57fa\u51c6\u6027\u80fd\u4e0a\u53d6\u5f97\u4e86\u663e\u8457\u63d0\u5347\uff0c\u63d0\u4f9b\u4e86\u4e00\u79cd\u6bd4\u591a\u7f16\u7801\u5668\u96c6\u6210\u
65b9\u6cd5\u66f4\u7075\u6d3b\u3001\u66f4\u8f7b\u91cf\u5316\u7684\u89e3\u51b3\u65b9\u6848\u3002\u4ee3\u7801\u548c\u6a21\u578b\u5df2\u53d1\u5e03\u5728https://github.com/yuecao0119/MMFuser\u3002**|\n", "2410.11815": "|**2024-10-15**|**SGEdit: Bridging LLM with Text2Image Generative Model for Scene Graph-based Image Editing**|Zhiyuan Zhang et.al.|[2410.11815](http://arxiv.org/abs/2410.11815)|null|\u573a\u666f\u56fe\u4ee5\u8282\u70b9\u548c\u8fb9\u7684\u5f62\u5f0f\u63d0\u4f9b\u4e86\u56fe\u50cf\u7684\u7ed3\u6784\u5316\u3001\u5206\u5c42\u8868\u793a\uff0c\u5206\u522b\u8868\u793a\u5bf9\u8c61\u53ca\u5176\u76f8\u4e92\u5173\u7cfb\u3002\u5b83\u53ef\u4ee5\u7528\u4f5c\u56fe\u50cf\u7f16\u8f91\u7684\u81ea\u7136\u754c\u9762\uff0c\u663e\u8457\u63d0\u9ad8\u7cbe\u5ea6\u548c\u7075\u6d3b\u6027\u3002\u5229\u7528\u8fd9\u4e00\u4f18\u52bf\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u4e00\u4e2a\u65b0\u6846\u67b6\uff0c\u8be5\u6846\u67b6\u5c06\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u4e0e\u6587\u672c\u5230\u56fe\u50cf\u751f\u6210\u6a21\u578b\u76f8\u7ed3\u5408\uff0c\u7528\u4e8e\u57fa\u4e8e\u573a\u666f\u56fe\u7684\u56fe\u50cf\u7f16\u8f91\u3002\u8fd9\u79cd\u96c6\u6210\u4f7f\u5f97\u5728\u5bf9\u8c61\u7ea7\u522b\u8fdb\u884c\u7cbe\u786e\u4fee\u6539\u4ee5\u53ca\u5bf9\u573a\u666f\u8fdb\u884c\u521b\u9020\u6027\u91cd\u6784\u6210\u4e3a\u53ef\u80fd\uff0c\u800c\u4e0d\u4f1a\u635f\u5bb3\u6574\u4f53\u56fe\u50cf\u7684\u5b8c\u6574\u6027\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u5206\u4e3a\u4e24\u4e2a\u4e3b\u8981\u9636\u6bb5\uff1a1\uff09\u5229\u7528LLM\u9a71\u52a8\u7684\u573a\u666f\u89e3\u6790\u5668\uff0c\u6211\u4eec\u6784\u5efa\u4e86\u56fe\u50cf\u7684\u573a\u666f\u56fe\uff0c\u6355\u6349\u5173\u952e\u5bf9\u8c61\u53ca\u5176\u76f8\u4e92\u5173\u7cfb\uff0c\u5e76\u89e3\u6790\u7ec6\u7c92\u5ea6\u5c5e\u6027\u5982\u5bf9\u8c61\u63a9\u7801\u548c\u63cf\u8ff0\u3002\u8fd9\u4e9b\u6ce8\u91ca\u4fc3\u8fdb\u4e86\u6982\u5ff5\u5b66\u4e60\uff0c\u4f7f\u7528\u5fae\u8c03\u6269\u6563\u6a21\u578b\u6765\u4ee3\u8868\u6bcf\u4e2a\u5bf9\u8c61\uff0c\
u7528\u4f18\u5316\u7684\u6807\u8bb0\u548c\u8be6\u7ec6\u7684\u63cf\u8ff0\u63d0\u793a\u8868\u793a\u30022\uff09\u5728\u56fe\u50cf\u7f16\u8f91\u9636\u6bb5\uff0cLLM\u7f16\u8f91\u63a7\u5236\u5668\u6307\u5bfc\u7279\u5b9a\u533a\u57df\u7684\u7f16\u8f91\u3002\u8fd9\u4e9b\u7f16\u8f91\u901a\u8fc7\u6ce8\u610f\u529b\u8c03\u8282\u7684\u6269\u6563\u7f16\u8f91\u5668\u5b9e\u73b0\uff0c\u5229\u7528\u5fae\u8c03\u6a21\u578b\u6267\u884c\u5bf9\u8c61\u6dfb\u52a0\u3001\u5220\u9664\u3001\u66ff\u6362\u548c\u8c03\u6574\u3002\u901a\u8fc7\u5e7f\u6cdb\u7684\u5b9e\u9a8c\uff0c\u6211\u4eec\u8bc1\u660e\u4e86\u6211\u4eec\u7684\u6846\u67b6\u5728\u7f16\u8f91\u7cbe\u5ea6\u548c\u573a\u666f\u7f8e\u5b66\u65b9\u9762\u663e\u8457\u4f18\u4e8e\u73b0\u6709\u56fe\u50cf\u7f16\u8f91\u65b9\u6cd5\u3002|\n", "2410.11805": "|**2024-10-15**|**NesTools: A Dataset for Evaluating Nested Tool Learning Abilities of Large Language Models**|Han Han et.al.|[2410.11805](http://arxiv.org/abs/2410.11805)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7ed3\u5408\u5de5\u5177\u5b66\u4e60\u5728\u73b0\u5b9e\u5e94\u7528\u4e2d\u5df2\u7ecf\u53d6\u5f97\u4e86\u663e\u8457\u7684\u6210\u679c\u3002\u5728\u5de5\u5177\u5b66\u4e60\u8fc7\u7a0b\u4e2d\uff0cLLMs\u53ef\u80fd\u4f1a\u6309\u7167\u5d4c\u5957\u987a\u5e8f\u8c03\u7528\u591a\u4e2a\u5de5\u5177\uff0c\u5176\u4e2d\u540e\u4e00\u4e2a\u5de5\u5177\u8c03\u7528\u53ef\u80fd\u5c06\u5176\u524d\u4e00\u4e2a\u5de5\u5177\u7684\u54cd\u5e94\u4f5c\u4e3a\u8f93\u5165\u53c2\u6570\u3002\u7136\u800c\uff0c\u5f53\u524d\u5bf9\u5d4c\u5957\u5de5\u5177\u5b66\u4e60\u80fd\u529b\u7684\u7814\u7a76\u4ecd\u7136\u4e0d\u8db3\uff0c\u56e0\u4e3a\u73b0\u6709\u7684\u57fa\u51c6\u6d4b\u8bd5\u7f3a\u4e4f\u76f8\u5173\u6570\u636e\u5b9e\u4f8b\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u5f15\u5165\u4e86NesTools\u6765\u586b\u8865\u5168\u9762\u8bc4\u4f30\u5d4c\u5957\u5de5\u5177\u5b66\u4e60\u80fd\u529b\u7684\u7a7a\u767d\u3002NesTools\u5305\u542b\u4e00\u79cd\u65b0\u9896\u7684\u81ea\u52a8\u6570\u636e\u751f\u621
0\u65b9\u6cd5\uff0c\u7528\u4e8e\u6784\u5efa\u5177\u6709\u4e0d\u540c\u5d4c\u5957\u7ed3\u6784\u7684\u5927\u89c4\u6a21\u5d4c\u5957\u5de5\u5177\u8c03\u7528\u3002\u901a\u8fc7\u4eba\u5de5\u5ba1\u6838\u548c\u4f18\u5316\uff0c\u8be5\u6570\u636e\u96c6\u8d28\u91cf\u9ad8\u4e14\u4e0e\u73b0\u5b9e\u573a\u666f\u7d27\u5bc6\u76f8\u5173\u3002\u56e0\u6b64\uff0cNesTools\u53ef\u4ee5\u4f5c\u4e3a\u4e00\u4e2a\u65b0\u7684\u57fa\u51c6\u6765\u8bc4\u4f30LLMs\u7684\u5d4c\u5957\u5de5\u5177\u5b66\u4e60\u80fd\u529b\u3002\u6211\u4eec\u5bf922\u4e2aLLMs\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u5b9e\u9a8c\uff0c\u5e76\u4f7f\u7528NesTools\u8fdb\u884c\u4e86\u6df1\u5165\u5206\u6790\uff0c\u7ed3\u679c\u8868\u660e\u5f53\u524d\u7684LLMs\u5728\u590d\u6742\u7684\u5d4c\u5957\u5de5\u5177\u5b66\u4e60\u4efb\u52a1\u4e0a\u4ecd\u7136\u5b58\u5728\u56f0\u96be\u3002|\n", "2410.11802": "|**2024-10-15**|**FoundTS: Comprehensive and Unified Benchmarking of Foundation Models for Time Series Forecasting**|Zhe Li et.al.|[2410.11802](http://arxiv.org/abs/2410.11802)|null|\u65f6\u95f4\u5e8f\u5217\u9884\u6d4b\uff08TSF\uff09\u5728\u91d1\u878d\u3001\u6c14\u8c61\u670d\u52a1\u548c\u80fd\u6e90\u7ba1\u7406\u7b49\u591a\u4e2a\u9886\u57df\u90fd\u662f\u5173\u952e\u529f\u80fd\u3002\u5c3d\u7ba1\u8fd1\u5e74\u6765\u51fa\u73b0\u4e86\u8bb8\u591aTSF\u65b9\u6cd5\uff0c\u4f46\u8fd9\u4e9b\u65b9\u6cd5\u4e2d\u7684\u8bb8\u591a\u9700\u8981\u7279\u5b9a\u9886\u57df\u7684\u6570\u636e\u6536\u96c6\u548c\u6a21\u578b\u8bad\u7ec3\uff0c\u5e76\u4e14\u5728\u65b0\u9886\u57df\u4e0a\u7684\u6cdb\u5316\u6027\u80fd\u8f83\u5dee\u3002\u57fa\u7840\u6a21\u578b\u65e8\u5728\u514b\u670d\u8fd9\u4e00\u5c40\u9650\u3002\u5b83\u4eec\u901a\u8fc7\u5927\u89c4\u6a21\u8bed\u8a00\u6216\u65f6\u95f4\u5e8f\u5217\u6570\u636e\u9884\u8bad\u7ec3\uff0c\u8868\u73b0\u51fa\u5728\u65b0\u6216\u672a\u89c1\u8fc7\u7684\u6570\u636e\u4e0a\u8fdb\u884c\u63a8\u7406\u7684\u6f5c\u529b\u3002\u8fd9\u4fc3\u4f7f\u4e86\u65b0\u578bTSF\u57fa\u7840\u6a21\u578b\u7684\u6d8c\u73b0\u3002\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79c
d\u65b0\u7684\u57fa\u51c6\u6d4b\u8bd5\uff0c\u5373FoundTS\uff0c\u4ee5\u5b9e\u73b0\u5bf9\u8fd9\u4e9b\u6a21\u578b\u8fdb\u884c\u5f7b\u5e95\u800c\u516c\u5e73\u7684\u8bc4\u4f30\u548c\u6bd4\u8f83\u3002FoundTS\u6db5\u76d6\u4e86\u5404\u79cd\u57fa\u4e8e\u5927\u578b\u8bed\u8a00\u6a21\u578b\u548c\u9884\u8bad\u7ec3\u65f6\u95f4\u5e8f\u5217\u7684\u57fa\u7840\u6a21\u578b\u3002\u6b64\u5916\uff0cFoundTS\u652f\u6301\u4e0d\u540c\u7684\u9884\u6d4b\u7b56\u7565\uff0c\u5305\u62ec\u96f6\u6837\u672c\u3001\u5c11\u91cf\u6837\u672c\u548c\u5168\u6837\u672c\uff0c\u4ece\u800c\u4fc3\u8fdb\u66f4\u5168\u9762\u7684\u8bc4\u4f30\u3002\u6700\u540e\uff0cFoundTS\u63d0\u4f9b\u4e86\u4e00\u4e2a\u6807\u51c6\u5316\u7684\u8bc4\u4f30\u6d41\u7a0b\u7ba1\u9053\uff0c\u5305\u62ec\u6570\u636e\u96c6\u5206\u5272\u3001\u52a0\u8f7d\u3001\u5f52\u4e00\u5316\u548c\u5c11\u91cf\u6837\u672c\u62bd\u53d6\uff0c\u4ece\u800c\u5b9e\u73b0\u516c\u5e73\u7684\u8bc4\u4f30\u3002\u5728\u6b64\u57fa\u7840\u4e0a\uff0c\u6211\u4eec\u5bf9\u5e7f\u6cdb\u9886\u57df\u5185\u5177\u6709\u4e0d\u540c\u7edf\u8ba1\u7279\u6027\u7684\u591a\u79cd\u6570\u636e\u96c6\u4e0a\u7684TSF\u57fa\u7840\u6a21\u578b\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u8bc4\u4f30\u3002\u5177\u4f53\u800c\u8a00\uff0c\u6211\u4eec\u8bc6\u522b\u4e86\u73b0\u6709\u57fa\u7840\u6a21\u578b\u7684\u4f18\u70b9\u3001\u7f3a\u70b9\u53ca\u5176\u5185\u5728\u9650\u5236\uff0c\u5e76\u786e\u5b9a\u4e86\u672a\u6765\u6a21\u578b\u8bbe\u8ba1\u7684\u65b9\u5411\u3002\u6211\u4eec\u7684\u4ee3\u7801\u548c\u6570\u636e\u96c6\u53ef\u4ee5\u5728https://anonymous.4open.science/r/FoundTS-C2B0\u83b7\u53d6\u3002|\n", "2410.11786": "|**2024-10-15**|**Selection-p: Self-Supervised Task-Agnostic Prompt Compression for Faithfulness and Transferability**|Tsz Ting Chung 
et.al.|[2410.11786](http://arxiv.org/abs/2410.11786)|null|\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5e7f\u6cdb\u7684\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4efb\u52a1\u4e2d\u5c55\u793a\u4e86\u4ee4\u4eba\u5370\u8c61\u6df1\u523b\u7684\u6027\u80fd\uff0c\u7279\u522b\u662f\u5728\u5229\u7528\u4e0a\u4e0b\u6587\u5b66\u4e60\u65f6\u3002\u7136\u800c\uff0c\u4e0a\u4e0b\u6587\u5b66\u4e60\u5e26\u6765\u4e86\u989d\u5916\u7684\u8ba1\u7b97\u548c\u8d22\u52a1\u6210\u672c\u3002\u4e3a\u4e86\u7f13\u89e3\u8fd9\u4e00\u95ee\u9898\uff0c\u4e00\u4e9b\u63d0\u793a\u538b\u7f29\u65b9\u6cd5\u88ab\u63d0\u51fa\u4ee5\u538b\u7f29\u4e0a\u4e0b\u6587\u5b66\u4e60\u4e2d\u7684\u63d0\u793a\u3002\u5c3d\u7ba1\u8fd9\u4e9b\u65b9\u6cd5\u53d6\u5f97\u4e86\u6210\u529f\uff0c\u4f46\u5b83\u4eec\u9762\u4e34\u7740\u7531\u4e8e\u6a21\u578b\u7279\u5b9a\u538b\u7f29\u800c\u5bfc\u81f4\u7684\u8fc1\u79fb\u6027\u5dee\u7684\u95ee\u9898\uff0c\u6216\u8005\u4f9d\u8d56\u5916\u90e8\u8bad\u7ec3\u6570\u636e\uff0c\u4f8b\u5982GPT-4\u3002\u5728\u8fd9\u7bc7\u8bba\u6587\u4e2d\uff0c\u6211\u4eec\u7814\u7a76\u4e86LLMs\u5f00\u53d1\u7edf\u4e00\u538b\u7f29\u65b9\u6cd5\u7684\u80fd\u529b\uff0c\u8be5\u65b9\u6cd5\u901a\u8fc7\u79bb\u6563\u5316\u4e0d\u5177\u4fe1\u606f\u6027\u7684\u6807\u8bb0\uff0c\u91c7\u7528\u81ea\u76d1\u7763\u9884\u8bad\u7ec3\u6280\u672f\u3002\u901a\u8fc7\u5728\u6301\u7eed\u9884\u8bad\u7ec3\u8fc7\u7a0b\u4e2d\u5f15\u5165\u5c11\u91cf\u53c2\u6570\uff0c\u6240\u63d0\u51fa\u7684Selection-p\u4e3a\u6bcf\u4e2a\u8f93\u5165\u6807\u8bb0\u751f\u6210\u4e00\u4e2a\u6982\u7387\u503c\uff0c\u6307\u793a\u4fdd\u7559\u6216\u4e22\u5f03\u8be5\u6807\u8bb0\u3002\u5b9e\u9a8c\u8868\u660e\uff0cSelection-p\u5728\u591a\u4e2a\u5206\u7c7b\u4efb\u52a1\u4e2d\u8fbe\u5230\u4e86\u6700\u5148\u8fdb\u7684\u6027\u80fd\uff0c\u5728\u5b9e\u73b0\u9ad8\u8fbe10\u500d\u7684\u538b\u7f29\u7387\u7684\u540c\u65f6\uff0c\u4ec5\u7ecf\u5386\u4e86\u5fae\u5c0f\u76840.8%\u6027\u80fd\u4e0b\u964d\u3002\u6b64\u5916\uff0c\u5b83\u76f8\u6bd4\u5148\u524d\u7684\u5de5\u4f5c\u5728\u4e0d\u540
c\u6a21\u578b\u4e0a\u7684\u8fc1\u79fb\u6027\u66f4\u4f18\u3002\u53e6\u5916\uff0c\u6211\u4eec\u8fdb\u4e00\u6b65\u5206\u6790\u4e86Selection-p\u5982\u4f55\u6709\u52a9\u4e8e\u5728\u957f\u4e0a\u4e0b\u6587\u4e2d\u4fdd\u6301\u4e0a\u4e0b\u6587\u5b66\u4e60\u7684\u6027\u80fd\u3002|\n", "2410.11782": "|**2024-10-15**|**G-Designer: Architecting Multi-agent Communication Topologies via Graph Neural Networks**|Guibin Zhang et.al.|[2410.11782](http://arxiv.org/abs/2410.11782)|null|\u8fd1\u671f\u5728\u57fa\u4e8e\u5927\u89c4\u6a21\u8bed\u8a00\u6a21\u578b\uff08LLM\uff09\u7684\u4ee3\u7406\u6280\u672f\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u5c55\uff0c\u8bc1\u660e\u96c6\u4f53\u667a\u80fd\u53ef\u4ee5\u663e\u8457\u8d85\u8d8a\u5355\u4e2a\u4ee3\u7406\u7684\u80fd\u529b\uff0c\u8fd9\u4e3b\u8981\u5f97\u76ca\u4e8e\u7cbe\u5fc3\u8bbe\u8ba1\u7684\u4ee3\u7406\u95f4\u901a\u4fe1\u62d3\u6251\u3002\u5c3d\u7ba1\u6709\u8bb8\u591a\u591a\u6837\u5316\u4e14\u9ad8\u6027\u80fd\u7684\u8bbe\u8ba1\u53ef\u4f9b\u9009\u62e9\uff0c\u4f46\u5b9e\u8df5\u8005\u5728\u4e3a\u7279\u5b9a\u4efb\u52a1\u9009\u62e9\u6700\u6709\u6548\u7684\u7ba1\u9053\u65f6\u5e38\u5e38\u611f\u5230\u56f0\u60d1\uff1a\u54ea\u79cd\u62d3\u6251\u6700\u9002\u5408\u6211\u7684\u4efb\u52a1\uff0c\u540c\u65f6\u907f\u514d\u4e0d\u5fc5\u8981\u7684\u901a\u4fe1\u4ee4\u724c\u5f00\u9500\u5e76\u786e\u4fdd\u9ad8\u8d28\u91cf\u7684\u89e3\u51b3\u65b9\u6848\uff1f\u9488\u5bf9\u8fd9\u4e00\u56f0\u5883\uff0c\u6211\u4eec\u4ecb\u7ecd\u4e86G-Designer\uff0c\u8fd9\u662f\u4e00\u79cd\u81ea\u9002\u5e94\u3001\u9ad8\u6548\u4e14\u7a33\u5065\u7684\u591a\u4ee3\u7406\u90e8\u7f72\u89e3\u51b3\u65b9\u6848\uff0c\u80fd\u591f\u52a8\u6001\u8bbe\u8ba1\u4efb\u52a1\u611f\u77e5\u7684\u5b9a\u5236\u5316\u901a\u4fe1\u62d3\u6251\u3002\u5177\u4f53\u6765\u8bf4\uff0cG-Designer\u5c06\u591a\u4ee3\u7406\u7cfb\u7edf\u5efa\u6a21\u4e3a\u4e00\u4e2a\u591a\u4ee3\u7406\u7f51\u7edc\uff0c\u5229\u7528\u53d8\u5206\u56fe\u81ea\u52a8\u7f16\u7801\u5668\u5bf9\u8282\u70b9\uff08\u4ee3\u7406\uff09\u548c\u4e00\u4e2a\u7279\
u5b9a\u4efb\u52a1\u7684\u865a\u62df\u8282\u70b9\u8fdb\u884c\u7f16\u7801\uff0c\u5e76\u89e3\u7801\u51fa\u4e00\u4e2a\u4efb\u52a1\u9002\u5e94\u6027\u5f3a\u4e14\u6027\u80fd\u9ad8\u7684\u901a\u4fe1\u62d3\u6251\u3002\u5728\u516d\u4e2a\u57fa\u51c6\u6d4b\u8bd5\u4e2d\u7684\u5e7f\u6cdb\u5b9e\u9a8c\u8868\u660e\uff0cG-Designer\u5177\u6709\u4ee5\u4e0b\u7279\u70b9\uff1a\\textbf{(1) \u9ad8\u6027\u80fd}\uff0c\u5728MMLU\u4e0a\u7684\u51c6\u786e\u7387\u8fbe\u523084.50%\uff0c\u5728HumanEval\u4e0a\u7684pass@1\u8fbe\u523089.90%\uff1b\\textbf{(2) \u4efb\u52a1\u9002\u5e94\u6027}\uff0c\u6839\u636e\u4efb\u52a1\u96be\u5ea6\u6784\u5efa\u5b9a\u5236\u5316\u7684\u901a\u4fe1\u534f\u8bae\uff0c\u5c06\u4ee4\u724c\u6d88\u8017\u51cf\u5c11\u4e86\u9ad8\u8fbe95.33%\uff1b\u5e76\u4e14\\textbf{(3) \u5bf9\u6297\u9c81\u68d2}\uff0c\u80fd\u591f\u62b5\u5fa1\u4ee3\u7406\u5bf9\u6297\u653b\u51fb\uff0c\u4ec5\u5bfc\u81f40.3%\u7684\u51c6\u786e\u7387\u4e0b\u964d\u3002|\n", "2410.11781": "|**2024-10-15**|**Language Models Encode Numbers Using Digit Representations in Base 10**|Amit Arnold Levy 
et.al.|[2410.11781](http://arxiv.org/abs/2410.11781)|null|Large language models (LLMs) frequently make errors even on simple numerical problems, such as comparing two small numbers. A natural hypothesis is that these errors stem from how the models represent numbers, and in particular whether those representations capture a number's actual numeric value. Our starting point is the observation that LLM errors on numerical tasks are often distributed across the digits of the answer rather than normally distributed around its numeric value. Through a series of probing experiments and causal interventions, we show that LLMs internally represent numbers with a separate circular representation for each base-10 digit rather than with a value representation. This digit-wise representation sheds light on the error patterns of models on tasks involving numerical reasoning and can serve as a basis for future studies of numerical mechanisms in LLMs.|\n", "2410.11779": "|**2024-10-15**|**MLLM can see? 
Dynamic Correction Decoding for Hallucination Mitigation**|Chenxi Wang et.al.|[2410.11779](http://arxiv.org/abs/2410.11779)|**[link](https://github.com/zjunlp/Deco)**|**Multimodal large language models (MLLMs) frequently hallucinate, but the underlying causes are not yet well understood. In this paper, we present an empirical analysis and find that, although MLLMs incorrectly generate objects in the final output, they are actually able to recognize the visual objects in the preceding layers. We speculate that the strong knowledge priors of the language model suppress the visual information and thereby cause hallucinations. Motivated by this, we propose a novel dynamic correction decoding method (DeCo), which adaptively selects appropriate preceding layers and proportionally integrates their knowledge into the final layer to adjust the output logits. Notably, DeCo is model agnostic, can be combined seamlessly with various classic decoding strategies, and can be applied to different MLLMs. We evaluate DeCo on widely used benchmarks and show that it reduces hallucination rates by a large margin compared to the baselines, highlighting its potential for mitigating hallucinations. Code is available at https://github.com/zjunlp/DeCo.**|\n", "2410.11772": "|**2024-10-15**|**Layer-wise Importance Matters: Less Memory for 
Better Performance in Parameter-efficient Fine-tuning of Large Language Models**|Kai Yao et.al.|[2410.11772](http://arxiv.org/abs/2410.11772)|**[link](https://github.com/kaiseem/ist)**|**Parameter-efficient fine-tuning (PEFT) methods are popular because they can substantially reduce memory and compute overhead when adapting pre-trained large language models (LLMs) to downstream tasks. A common limitation of most PEFT methods, however, is that they apply a uniform architectural design across all layers, using identical trainable modules and ignoring the differing importance of each layer, which leads to sub-optimal fine-tuning results. To overcome this limitation and obtain better performance, we develop a novel approach, Importance-aware Sparse Tuning (IST), which exploits the inherent sparsity and selects the most important subset of full layers through effective layer-wise importance scoring. IST is a versatile, plug-and-play technique compatible with various layer-based PEFT methods. Using the estimated importance scores, IST dynamically updates the selected layers in the PEFT modules, which reduces memory demands. We further provide a theoretical proof of convergence and empirical evidence of superiority over uniform updating strategies to justify the advantages of IST over existing methods. Extensive experiments across a range of LLMs, PEFT methods, and downstream tasks confirm the effectiveness of the proposed method and show that IST improves existing layer-based PEFT methods. Our code is available at https://github.com/Kaiseem/IST.**|\n", "2410.12788": "|**2024-10-16**|**Meta-Chunking: Learning Efficient Text Segmentation via Logical Perception**|Jihao Zhao et.al.|[2410.12788](http://arxiv.org/abs/2410.12788)|null|Retrieval-Augmented Generation (RAG), while serving as a viable complement to large language models (LLMs), often overlooks a crucial aspect of its pipeline, text chunking, which affects the quality of knowledge-intensive tasks. This paper introduces the concept of Meta-Chunking, a granularity between sentences and paragraphs that consists of a set of sentences within a paragraph that share deep linguistic and logical connections. To implement Meta-Chunking, we design two LLM-based strategies: Margin Sampling Chunking and Perplexity Chunking. The former uses an LLM to make a binary decision on whether consecutive sentences should be split, based on the probability difference obtained from margin sampling. The latter identifies chunk boundaries precisely by analyzing the characteristics of the perplexity distribution. In addition, given the inherent complexity of different texts, we propose a strategy that combines Meta-Chunking with dynamic merging to balance fine-grained and coarse-grained chunking. Experiments on eleven datasets show that Meta-Chunking improves the performance of RAG-based single-hop and multi-hop question answering more efficiently. For example, on the 2WikiMultihopQA dataset it outperforms similarity chunking by 1.32 while consuming only 45.8% of the time. Our code is available at https://github.com/IAAR-Shanghai/Meta-Chunking.|\n", "2410.12782": "|**2024-10-16**|**In-Context Learning Enables Robot Action Prediction in LLMs**|Yida Yin 
et.al.|[2410.12782](http://arxiv.org/abs/2410.12782)|null|Large language models (LLMs) have recently achieved remarkable success in the language domain through in-context learning (ICL). However, using the ICL capabilities of LLMs to predict robot actions directly remains relatively unexplored. In this paper, we introduce RoboPrompt, a framework that enables off-the-shelf text-only LLMs to predict robot actions directly through ICL without any training. Our approach first uses heuristics to identify the keyframes in an episode that capture important moments. Next, we extract the end-effector actions and the estimated initial object poses from these keyframes and convert both into textual descriptions. Finally, we construct a structured template that forms ICL demonstrations from these textual descriptions and a task instruction. This allows an LLM to predict robot actions directly at test time. Through extensive experiments and analysis, RoboPrompt shows stronger performance than zero-shot and ICL baselines in both simulated and real-world settings.|\n", "2410.12774": "|**2024-10-16**|**Identifying Task Groupings for Multi-Task Learning Using Pointwise V-Usable Information**|Yingya Li 
et.al.|[2410.12774](http://arxiv.org/abs/2410.12774)|null|The success of multi-task learning depends heavily on how tasks are grouped. Naively grouping all tasks, or a random set of tasks, can cause negative transfer, so that the multi-task model performs worse than single-task models. Although many efforts have been made to identify task groupings and to measure the relatedness of different tasks, defining a metric that identifies the best grouping among a large number of candidate task combinations remains a challenging research topic. We propose a task-relatedness metric based on task difficulty as measured by pointwise V-usable information (PVI). PVI is a recently proposed metric that estimates how much usable information a dataset contains given a model. We hypothesize that tasks whose PVI estimates are not statistically different are similar enough to benefit from joint learning. We conduct comprehensive experiments on 15 NLP datasets in the general, biomedical, and clinical domains to evaluate the feasibility of the metric for task grouping. We compare the results of the joint learners against single learners, existing baseline methods, and recent large language models, including Llama 2 and GPT-4. The results show that, by grouping tasks with similar PVI estimates, the joint learners achieve competitive results with fewer total parameters and perform consistently across domains.|\n", "2410.12757": "|**2024-10-16**|**StyleDistance: Stronger Content-Independent Style Embeddings with Synthetic Parallel Examples**|Ajay Patel et.al.|[2410.12757](http://arxiv.org/abs/2410.12757)|null|Style representations aim to embed texts with similar writing styles close together and texts with different styles far apart, regardless of content. However, the contrastive triplets often used to train these representations can vary in both style and content, which can lead to content leakage in the representations. We introduce StyleDistance, a novel approach to training stronger content-independent style embeddings. We use a large language model to create a synthetic dataset of near-exact paraphrases with controlled style variations, and we produce positive and negative examples across 40 distinct style features for precise contrastive learning. We assess the quality of the synthetic data and of the embeddings through human and automatic evaluations. StyleDistance improves the content independence of style embeddings, which generalize to real-world benchmarks and outperform leading style representations in downstream applications. Our model can be found at https://huggingface.co/StyleDistance/styledistance.|\n", "2410.12735": "|**2024-10-17**|**CREAM: Consistency Regularized Self-Rewarding Language Models**|Zhaoyang Wang et.al.|[2410.12735](http://arxiv.org/abs/2410.12735)|null|Recent self-rewarding large language models (LLMs) have successfully applied LLM-as-a-Judge to improve alignment performance iteratively without human-annotated preference data. These methods commonly use the same LLM as both the policy model, which generates responses, and the reward model, which scores and ranks those responses. The ranked responses are then used as preference pairs to train the LLM with direct alignment techniques such as DPO. Notably, however, nothing in this process guarantees the accuracy of the rewarding and ranking, even though that accuracy is critical for obtaining accurate rewards and high-quality preference data. Empirical results from relatively small LLMs (e.g., 7B parameters) also indicate that in some cases the improvements from self-rewarding can diminish after several iterations, which we hypothesize is due to accumulated bias in the reward system. This bias can yield unreliable preference data for training the LLM. To address the issue, we first formulate and analyze a generalized iterative preference fine-tuning framework for self-rewarding language models. We then introduce regularization into this generalized framework to mitigate overconfident preference labeling in the self-rewarding process. Based on this theoretical insight, we propose the Consistency Regularized sElf-rewarding lAnguage Model (CREAM), which uses the consistency of rewards across iterations to regularize self-rewarding training and helps the model learn from more reliable preference data. With this explicit regularization, our empirical results demonstrate the superiority of CREAM in improving both reward consistency and alignment performance. The code is publicly available at https://github.com/Raibows/CREAM.|\n", "2410.12707": "|**2024-10-16**|**FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression**|Zhenheng Tang 
et.al.|[2410.12707](http://arxiv.org/abs/2410.12707)|null|To alleviate hardware scarcity in training large deep neural networks (DNNs), particularly large language models (LLMs), we present FusionLLM, a decentralized training system designed to use geo-distributed GPUs across different computing clusters or individual devices for DNN training. Decentralized training faces significant challenges in system design and efficiency, including: 1) the need for remote automatic differentiation (RAD), 2) support for flexible model definitions and heterogeneous software, 3) heterogeneous hardware that leads to low resource utilization or the straggler problem, and 4) slow network communication. To address these challenges, the system design represents the model as a directed acyclic graph of operators (OP-DAG). Each node in the DAG represents an operator in the DNN, and each edge represents a data dependency between operators. Based on this design, 1) users can customize any DNN without caring about the low-level operator implementation; 2) we schedule tasks at the level of finer-grained sub-tasks, which offers more room for optimization; and 3) a DAG runtime executor can implement RAD without requiring consistent versions of the low-level ML framework. To improve system efficiency, we implement a workload estimator and design an OP-Fence scheduler that groups devices with similar bandwidths together and partitions the DAG to increase throughput. In addition, we propose an AdaTopK compressor that adaptively compresses intermediate activations and gradients on the slowest communication links. To evaluate the convergence and efficiency of our system and algorithms, we train ResNet-101 and GPT-2 on three real-world testbeds using 48 GPUs connected at 8 Mbps to 10 Gbps. Experimental results show that, compared with baseline methods, our system and method achieve a 1.45x to 9.39x speedup while ensuring convergence.|\n", "2410.12700": "|**2024-10-16**|**Embedding an Ethical Mind: Aligning Text-to-Image Synthesis via Lightweight Value Optimization**|Xingqi Wang 
et.al.|[2410.12700](http://arxiv.org/abs/2410.12700)|**[link](https://github.com/achernarwang/LiVO)**|**In recent years, diffusion models trained on large-scale data have become able to generate images that are indistinguishable from human-level ones, yet they often produce harmful content that is misaligned with human values, such as social bias and offensive content. Despite extensive research on large language models (LLMs), the alignment of text-to-image (T2I) models remains largely unexplored. To address this problem, we propose LiVO (Lightweight Value Optimization), a novel lightweight method for aligning T2I models with human values. LiVO optimizes only a plug-and-play value encoder that integrates a specified value principle with the input prompt, which allows control of generated images over both semantics and values. Specifically, we design a diffusion-model-tailored preference optimization loss that theoretically approximates the Bradley-Terry model used in LLM alignment but provides a more flexible trade-off between image quality and value conformity. To optimize the value encoder, we also develop a framework that automatically constructs a text-image preference dataset of 86k (prompt, aligned image, violating image, value principle) samples. Without updating most model parameters, and through adaptive value selection from the input prompt, LiVO significantly reduces harmful outputs and converges faster, surpassing several strong baselines and taking a first step towards ethically aligned T2I models.**|\n", "2410.12686": "|**2024-10-16**|**Automatic Mapping of Anatomical Landmarks from Free-Text Using Large Language Models: Insights from Llama-2**|Mohamad Abdi et.al.|[2410.12686](http://arxiv.org/abs/2410.12686)|null|Anatomical landmarks are vital in medical imaging for navigation and anomaly detection. Modern large language models (LLMs), such as Llama-2, offer promise for mapping these landmarks from free-text radiology reports to the corresponding positions in image data. Recent studies suggest that LLMs may develop coherent representations of generative processes. Motivated by this, we investigate whether LLMs accurately represent the spatial positions of anatomical landmarks. Through experiments with Llama-2 models, we find that they can linearly represent anatomical landmarks in space with considerable robustness to different prompts. These results underscore the potential of LLMs to improve the efficiency and accuracy of medical imaging workflows.|\n", "2410.12656": "|**2024-10-16**|**Evaluating Morphological Compositional Generalization in Large Language Models**|Mete Ismayilzada et.al.|[2410.12656](http://arxiv.org/abs/2410.12656)|null|Large language models (LLMs) have made significant progress on a wide range of natural language generation and understanding tasks. However, their linguistic generalization capabilities remain questionable, which raises doubts about whether these models learn language the way humans do. While humans exhibit compositional generalization and linguistic creativity in language use, the extent to which LLMs do the same, particularly in morphology, is under-explored. In this work, we systematically investigate the morphological generalization abilities of LLMs through the lens of compositionality. We define morphemes as the compositional primitives and design a novel suite of generative and discriminative tasks to assess morphological productivity and systematicity. Focusing on agglutinative languages such as Turkish and Finnish, we evaluate several state-of-the-art instruction-finetuned multilingual models, including GPT-4 and Gemini. Our analysis shows that LLMs struggle with morphological compositional generalization, particularly when it is applied to novel word roots, with performance declining sharply as morphological complexity increases. Although the models identify individual morphological combinations better than chance, their performance lacks systematicity, which leads to significant accuracy gaps compared with humans.|\n", "2410.12631": "|**2024-10-16**|**Explainable Moral Values: a neuro-symbolic approach to value classification**|Nicolas Lazzari et.al.|[2410.12631](http://arxiv.org/abs/2410.12631)|null|This work explores the integration of ontology-based reasoning and machine learning techniques for explainable value classification. Relying on an ontological formalization of moral values, as in Moral Foundations Theory, and on the DnS Ontology Design Pattern, the sandra neuro-symbolic reasoner is used to infer the values that are satisfied by the description of a given sentence. Sentences and their structured representations are generated automatically with an open-source large language model. The inferred descriptions are then used to detect automatically the value associated with a sentence. We show that relying only on the reasoner's inference yields explainable classification comparable to more complex approaches. We also show that combining the reasoner's inferences with distributional semantics methods largely outperforms all the baselines, including complex models based on neural network architectures. Finally, we build a visualization tool to explore the potential of theory-based value classification, which is publicly available at http://xmv.geomeaning.com/.|\n", "2410.13863": "|**2024-10-17**|**Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens**|Lijie Fan et.al.|[2410.13863](http://arxiv.org/abs/2410.13863)|null|Scaling up autoregressive models in vision has not proven as beneficial as in large language models. In this work, we investigate this scaling problem in the context of text-to-image generation, focusing on two critical factors: whether models use discrete or continuous tokens, and whether tokens are generated in a random or a fixed raster order using BERT- or GPT-like transformer architectures. Our empirical results show that, while all models scale effectively in terms of validation loss, their evaluation performance, measured by FID, GenEval score, and visual quality, follows different trends. Models based on continuous tokens achieve significantly better visual quality than those using discrete tokens. Furthermore, the generation order and attention mechanism significantly affect the GenEval score: random-order models achieve notably better GenEval scores than raster-order models. Inspired by these findings, we train Fluid, a random-order autoregressive model on continuous tokens. The Fluid 10.5B model achieves a zero-shot FID of 6.16 on MS-COCO 30K and an overall score of 0.69 on the GenEval benchmark. We hope our findings and results will encourage future efforts to further close the scaling gap between vision and language models.|\n", "2410.13861": "|**2024-10-17**|**PUMA: Empowering Unified MLLM with Multi-granular Visual Generation**|Rongyao Fang 
et.al.|[2410.13861](http://arxiv.org/abs/2410.13861)|**[link](https://github.com/rongyaofang/puma)**|**\u8fd1\u5e74\u6765\uff0c\u591a\u6a21\u6001\u57fa\u7840\u6a21\u578b\u5728\u89c6\u89c9-\u8bed\u8a00\u7406\u89e3\u65b9\u9762\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u5c55\u3002\u521d\u6b65\u5c1d\u8bd5\u4e5f\u63a2\u7d22\u4e86\u591a\u6a21\u6001\u5927\u8bed\u8a00\u6a21\u578b\uff08MLLM\uff09\u5728\u89c6\u89c9\u5185\u5bb9\u751f\u6210\u4e2d\u7684\u6f5c\u529b\u3002\u7136\u800c\uff0c\u73b0\u6709\u5de5\u4f5c\u672a\u80fd\u5145\u5206\u89e3\u51b3\u7edf\u4e00MLLM\u8303\u5f0f\u4e0b\u4e0d\u540c\u56fe\u50cf\u751f\u6210\u4efb\u52a1\u7684\u591a\u6837\u5316\u7c92\u5ea6\u9700\u6c42\u2014\u2014\u4ece\u6587\u672c\u5230\u56fe\u50cf\u751f\u6210\u6240\u9700\u7684\u591a\u6837\u6027\u5230\u56fe\u50cf\u64cd\u4f5c\u6240\u9700\u7684\u7cbe\u786e\u53ef\u63a7\u6027\u3002\u5728\u8fd9\u9879\u5de5\u4f5c\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86PUMA\uff0c\u5373\u901a\u8fc7\u591a\u7c92\u5ea6\u89c6\u89c9\u751f\u6210\u8d4b\u80fd\u7edf\u4e00\u7684MLLM\u3002PUMA\u5c06\u591a\u7c92\u5ea6\u89c6\u89c9\u7279\u5f81\u7edf\u4e00\u4f5c\u4e3aMLLM\u7684\u8f93\u5165\u548c\u8f93\u51fa\uff0c\u4f18\u96c5\u5730\u89e3\u51b3\u4e86\u4e0d\u540c\u7c92\u5ea6\u8981\u6c42\u7684\u5404\u79cd\u56fe\u50cf\u751f\u6210\u4efb\u52a1\u3002\u7ecf\u8fc7\u591a\u6a21\u6001\u9884\u8bad\u7ec3\u548c\u4efb\u52a1\u7279\u5b9a\u6307\u4ee4\u5fae\u8c03\u540e\uff0cPUMA\u5c55\u793a\u4e86\u5728\u5e7f\u6cdb\u591a\u6a21\u6001\u4efb\u52a1\u4e2d\u7684\u80fd\u529b\u3002\u8fd9\u9879\u5de5\u4f5c\u4ee3\u8868\u4e86\u671d\u7740\u771f\u6b63\u7edf\u4e00\u7684MLLM\u8fc8\u51fa\u7684\u91cd\u8981\u4e00\u6b65\uff0c\u8fd9\u79cdMLLM\u80fd\u591f\u9002\u5e94\u5404\u79cd\u89c6\u89c9\u4efb\u52a1\u7684\u7c92\u5ea6\u9700\u6c42\u3002\u4ee3\u7801\u548c\u6a21\u578b\u5c06\u5728https://github.com/rongyaofang/PUMA\u53d1\u5e03\u3002**|\n", "2410.13859": "|**2024-10-17**|**$\u03b3-$MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models**|Yaxin Luo 
et.al.|[2410.13859](http://arxiv.org/abs/2410.13859)|null|\u5c3d\u7ba1\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u53d6\u5f97\u4e86\u663e\u8457\u8fdb\u5c55\uff0c\u4f46\u5176\u9ad8\u6602\u7684\u8ba1\u7b97\u6210\u672c\u4ecd\u7136\u662f\u5b9e\u9645\u90e8\u7f72\u7684\u969c\u788d\u3002\u53d7\u81ea\u7136\u8bed\u8a00\u5904\u7406\u4e2d\u6df1\u5ea6\u6df7\u5408\uff08MoD\uff09\u7684\u542f\u53d1\uff0c\u6211\u4eec\u4ece\u201c\u6fc0\u6d3b\u6807\u8bb0\u201d\u7684\u89d2\u5ea6\u7740\u624b\u89e3\u51b3\u8fd9\u4e00\u5c40\u9650\u6027\u3002\u6211\u4eec\u7684\u5173\u952e\u89c1\u89e3\u662f\uff0c\u5982\u679c\u5927\u591a\u6570\u6807\u8bb0\u5bf9\u4e8e\u5c42\u8ba1\u7b97\u662f\u5197\u4f59\u7684\uff0c\u5219\u53ef\u4ee5\u901a\u8fc7MoD\u5c42\u76f4\u63a5\u8df3\u8fc7\u5b83\u4eec\u3002\u7136\u800c\uff0c\u76f4\u63a5\u5c06MLLMs\u7684\u5bc6\u96c6\u5c42\u8f6c\u6362\u4e3aMoD\u5c42\u4f1a\u5bfc\u81f4\u663e\u8457\u7684\u6027\u80fd\u4e0b\u964d\u3002\u4e3a\u4e86\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u9488\u5bf9\u73b0\u6709MLLMs\u7684\u521b\u65b0MoD\u9002\u5e94\u7b56\u7565\uff0c\u79f0\u4e3a\u03b3-MoD\u3002\u5728\u03b3-MoD\u4e2d\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u79cd\u65b0\u7684\u5ea6\u91cf\u65b9\u6cd5\u6765\u6307\u5bfcMLLM\u4e2d\u7684MoD\u90e8\u7f72\uff0c\u5373\u6ce8\u610f\u529b\u56fe\u7684\u79e9\uff08ARank\uff09\u3002\u901a\u8fc7ARank\uff0c\u6211\u4eec\u53ef\u4ee5\u6709\u6548\u5730\u8bc6\u522b\u54ea\u4e9b\u5c42\u662f\u5197\u4f59\u7684\uff0c\u5e76\u5e94\u88ab\u66ff\u6362\u4e3aMoD\u5c42\u3002\u57fa\u4e8eARank\uff0c\u6211\u4eec\u8fdb\u4e00\u6b65\u63d0\u51fa\u4e86\u4e24\u79cd\u65b0\u9896\u7684\u8bbe\u8ba1\uff0c\u4ee5\u6700\u5927\u9650\u5ea6\u5730\u63d0\u9ad8MLLM\u7684\u8ba1\u7b97\u7a00\u758f\u6027\uff0c\u540c\u65f6\u4fdd\u6301\u5176\u6027\u80fd\uff0c\u5373\u5171\u4eab\u89c6\u89c9-\u8bed\u8a00\u8def\u7531\u5668\u548c\u63a9\u7801\u8def\u7531\u5b66\u4e60\u3002\u901a\u8fc7\u8fd9\u4e9b\u8bbe\u8ba1\uff0c\u8d85\u8fc790%\u7684MLLM\u5bc6\u96
c6\u5c42\u53ef\u4ee5\u6709\u6548\u8f6c\u6362\u4e3aMoD\u5c42\u3002\u4e3a\u4e86\u9a8c\u8bc1\u6211\u4eec\u7684\u65b9\u6cd5\uff0c\u6211\u4eec\u5c06\u5176\u5e94\u7528\u4e8e\u4e09\u4e2a\u6d41\u884c\u7684MLLM\uff0c\u5e76\u57289\u4e2a\u57fa\u51c6\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u4e86\u5e7f\u6cdb\u7684\u5b9e\u9a8c\u3002\u5b9e\u9a8c\u7ed3\u679c\u4e0d\u4ec5\u9a8c\u8bc1\u4e86\u03b3-MoD\u5bf9\u73b0\u6709MLLMs\u7684\u663e\u8457\u6548\u7387\u63d0\u5347\uff0c\u8fd8\u786e\u8ba4\u4e86\u5176\u5728\u5404\u79cdMLLM\u4e0a\u7684\u6cdb\u5316\u80fd\u529b\u3002\u4f8b\u5982\uff0c\u03b3-MoD\u4ec5\u5bfc\u81f4\u8f7b\u5fae\u7684\u6027\u80fd\u4e0b\u964d\uff0c\u5373-1.5%\uff0c\u4f46\u53ef\u4ee5\u5206\u522b\u5c06LLaVA-HR\u7684\u8bad\u7ec3\u548c\u63a8\u7406\u65f6\u95f4\u51cf\u5c1131.0%\u548c53.2%\u3002|\n", "2410.13857": "|**2024-10-17**|**How Numerical Precision Affects Mathematical Reasoning Capabilities of LLMs**|Guhao Feng et.al.|[2410.13857](http://arxiv.org/abs/2410.13857)|null|\u5c3d\u7ba1\u57fa\u4e8eTransformer\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u5728\u5404\u4e2a\u9886\u57df\u53d6\u5f97\u4e86\u663e\u8457\u6210\u529f\uff0c\u4f46\u7406\u89e3\u548c\u63d0\u5347\u5176\u6570\u5b66\u80fd\u529b\u4ecd\u7136\u662f\u4e00\u4e2a\u91cd\u5927\u6311\u6218\u3002\u5728\u672c\u6587\u4e2d\uff0c\u6211\u4eec\u5bf9LLMs\u7684\u6570\u5b66\u80fd\u529b\u8fdb\u884c\u4e86\u4e25\u683c\u7684\u7406\u8bba\u5206\u6790\uff0c\u7279\u522b\u5173\u6ce8\u5b83\u4eec\u7684\u7b97\u672f\u6027\u80fd\u3002\u6211\u4eec\u53d1\u73b0\u6570\u503c\u7cbe\u5ea6\u662f\u5f71\u54cd\u5176\u5728\u6570\u5b66\u4efb\u52a1\u4e2d\u7684\u6709\u6548\u6027\u7684\u4e00\u4e2a\u5173\u952e\u56e0\u7d20\u3002\u7814\u7a76\u7ed3\u679c\u8868\u660e\uff0c\u91c7\u7528\u4f4e\u6570\u503c\u7cbe\u5ea6\u7684Transformer\u5728\u5904\u7406\u7b97\u672f\u4efb\u52a1\uff08\u5982\u8fed\u4ee3\u52a0\u6cd5\u548c\u6574\u6570\u4e58\u6cd5\uff09\u65f6\uff0c\u9664\u975e\u6a21\u578b\u5927\u5c0f\u76f8\u5bf9\u4e8e\u8f93\u5165\u957f\u5ea6\u5448\u8d85\u591a\u9879\u5f0f
\u589e\u957f\uff0c\u5426\u5219\u65e0\u6cd5\u6709\u6548\u89e3\u51b3\u8fd9\u4e9b\u95ee\u9898\u3002\u76f8\u6bd4\u4e4b\u4e0b\uff0c\u91c7\u7528\u6807\u51c6\u6570\u503c\u7cbe\u5ea6\u7684Transformer\u53ef\u4ee5\u9ad8\u6548\u5730\u5904\u7406\u8fd9\u4e9b\u4efb\u52a1\uff0c\u5e76\u4e14\u6240\u9700\u7684\u6a21\u578b\u5c3a\u5bf8\u8981\u5c0f\u5f97\u591a\u3002\u6211\u4eec\u8fd8\u901a\u8fc7\u5b9e\u9a8c\u8fdb\u4e00\u6b65\u652f\u6301\u4e86\u6211\u4eec\u7684\u7406\u8bba\u53d1\u73b0\uff0c\u8fd9\u4e9b\u5b9e\u9a8c\u63a2\u7d22\u4e86\u4e0d\u540c\u6570\u503c\u7cbe\u5ea6\u5bf9\u7b97\u672f\u4efb\u52a1\u7684\u5f71\u54cd\uff0c\u4e3a\u63d0\u9ad8LLMs\u7684\u6570\u5b66\u63a8\u7406\u80fd\u529b\u63d0\u4f9b\u4e86\u5b9d\u8d35\u7684\u89c1\u89e3\u3002|\n", "2410.13854": "|**2024-10-17**|**Can MLLMs Understand the Deep Implication Behind Chinese Images?**|Chenhao Zhang et.al.|[2410.13854](http://arxiv.org/abs/2410.13854)|**[link](https://github.com/MING-ZCH/CII-Bench)**|**\u968f\u7740\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u7684\u80fd\u529b\u4e0d\u65ad\u63d0\u5347\uff0c\u5bf9\u8fd9\u4e9b\u6a21\u578b\u8fdb\u884c\u66f4\u9ad8\u9636\u80fd\u529b\u8bc4\u4f30\u7684\u9700\u6c42\u4e5f\u5728\u589e\u52a0\u3002\u7136\u800c\uff0c\u76ee\u524d\u7f3a\u4e4f\u9488\u5bf9MLLMs\u5728\u4e2d\u6587\u89c6\u89c9\u5185\u5bb9\u7684\u9ad8\u9636\u611f\u77e5\u548c\u7406\u89e3\u80fd\u529b\u8fdb\u884c\u8bc4\u4f30\u7684\u5de5\u4f5c\u3002\u4e3a\u4e86\u586b\u8865\u8fd9\u4e00\u7a7a\u767d\uff0c\u6211\u4eec\u5f15\u5165\u4e86\u201c\u4e2d\u6587\u56fe\u50cf\u9690\u542b\u7406\u89e3\u57fa\u51c6\u201d\uff08CII-Bench\uff09\uff0c\u65e8\u5728\u8bc4\u4f30MLLMs\u5bf9\u4e2d\u56fd\u56fe\u50cf\u7684\u9ad8\u9636\u611f\u77e5\u548c\u7406\u89e3\u80fd\u529b\u3002CII-Bench\u5728\u591a\u4e2a\u65b9\u9762\u4e0e\u73b0\u6709\u57fa\u51c6\u6709\u6240\u4e0d\u540c\u3002\u9996\u5148\uff0c\u4e3a\u4e86\u786e\u4fdd\u4e2d\u6587\u8bed\u5883\u7684\u771f\u5b9e\u6027\uff0cCII-Bench\u4e2d\u7684\u56fe\u50cf\u5747\u6765\u81ea\u4e2d\u56fd\u4e92\u8054\u
7f51\uff0c\u5e76\u7ecf\u8fc7\u4eba\u5de5\u5ba1\u6838\uff0c\u76f8\u5e94\u7684\u7b54\u6848\u4e5f\u662f\u624b\u52a8\u7f16\u5199\u7684\u3002\u6b64\u5916\uff0cCII-Bench\u8fd8\u7eb3\u5165\u4e86\u4e00\u4e9b\u4ee3\u8868\u4e2d\u56fd\u4f20\u7edf\u6587\u5316\u7684\u56fe\u50cf\uff0c\u5982\u8457\u540d\u7684\u4e2d\u56fd\u4f20\u7edf\u7ed8\u753b\uff0c\u8fd9\u53ef\u4ee5\u6df1\u5165\u53cd\u6620\u6a21\u578b\u5bf9\u4e2d\u56fd\u4f20\u7edf\u6587\u5316\u7684\u7406\u89e3\u3002\u901a\u8fc7\u5728\u591a\u4e2aMLLMs\u4e0a\u5e7f\u6cdb\u5b9e\u9a8c\uff0c\u6211\u4eec\u5f97\u51fa\u4e86\u663e\u8457\u7684\u53d1\u73b0\u3002\u6700\u521d\uff0cMLLMs\u5728CII-Bench\u4e0a\u7684\u8868\u73b0\u4e0e\u4eba\u7c7b\u4e4b\u95f4\u5b58\u5728\u660e\u663e\u5dee\u8ddd\u3002MLLMs\u7684\u6700\u4f73\u51c6\u786e\u7387\u4e3a64.4%\uff0c\u800c\u4eba\u7c7b\u7684\u5e73\u5747\u51c6\u786e\u7387\u4e3a78.2%\uff0c\u6700\u9ad8\u8fbe\u5230\u4ee4\u4eba\u5370\u8c61\u6df1\u523b\u768481.0%\u3002\u968f\u540e\uff0cMLLMs\u5728\u5904\u7406\u4e2d\u56fd\u4f20\u7edf\u6587\u5316\u56fe\u50cf\u65f6\u8868\u73b0\u8f83\u5dee\uff0c\u8868\u660e\u5b83\u4eec\u5728\u7406\u89e3\u9ad8\u5c42\u6b21\u8bed\u4e49\u548c\u7f3a\u4e4f\u4e2d\u56fd\u4f20\u7edf\u6587\u5316\u6df1\u5ea6\u77e5\u8bc6\u5e93\u65b9\u9762\u5b58\u5728\u5c40\u9650\u3002\u6700\u540e\uff0c\u89c2\u5bdf\u5230\u5927\u591a\u6570\u6a21\u578b\u5728\u63d0\u793a\u4e2d\u52a0\u5165\u56fe\u50cf\u60c5\u611f\u63d0\u793a\u540e\u51c6\u786e\u6027\u6709\u6240\u63d0\u9ad8\u3002\u6211\u4eec\u8ba4\u4e3a\uff0cCII-Bench\u5c06\u4f7fMLLMs\u66f4\u597d\u5730\u7406\u89e3\u4e2d\u6587\u8bed\u4e49\u548c\u7279\u5b9a\u4e8e\u4e2d\u56fd\u7684\u56fe\u50cf\uff0c\u4ece\u800c\u63a8\u52a8\u8fc8\u5411\u4e13\u5bb6\u7ea7\u901a\u7528\u4eba\u5de5\u667a\u80fd\uff08AGI\uff09\u7684\u8fdb\u7a0b\u3002\u6211\u4eec\u7684\u9879\u76ee\u516c\u5f00\u53ef\u8bbf\u95ee\uff0c\u7f51\u5740\u4e3ahttps://cii-bench.github.io/\u3002**|\n", "2410.13852": "|**2024-10-17**|**Retrospective Learning from Interactions**|Zizhao Chen 
et.al.|[2410.13852](http://arxiv.org/abs/2410.13852)|null|\u591a\u8f6e\u4ea4\u4e92\u4e2d\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u548c\u7528\u6237\u4e4b\u95f4\u7684\u5bf9\u8bdd\u81ea\u7136\u5305\u542b\u4e86\u9690\u5f0f\u7684\u53cd\u9988\u4fe1\u53f7\u3002\u5982\u679cLLM\u4ee5\u51fa\u4e4e\u610f\u6599\u7684\u65b9\u5f0f\u56de\u5e94\u6307\u4ee4\uff0c\u7528\u6237\u53ef\u80fd\u4f1a\u901a\u8fc7\u91cd\u65b0\u8868\u8ff0\u8bf7\u6c42\u3001\u8868\u8fbe\u632b\u8d25\u611f\u6216\u8f6c\u5411\u66ff\u4ee3\u4efb\u52a1\u6765\u4f20\u8fbe\u8fd9\u4e00\u4fe1\u53f7\u3002\u8fd9\u4e9b\u4fe1\u53f7\u4e0e\u4efb\u52a1\u65e0\u5173\uff0c\u5e76\u4e14\u5360\u636e\u76f8\u5bf9\u53d7\u9650\u7684\u8bed\u8a00\u5b50\u7a7a\u95f4\uff0c\u4f7f\u5f97LLM\u5373\u4f7f\u5728\u5b9e\u9645\u4efb\u52a1\u4e0a\u5931\u8d25\u4e5f\u80fd\u8bc6\u522b\u5b83\u4eec\u3002\u8fd9\u4e3a\u6301\u7eed\u4ece\u4ea4\u4e92\u4e2d\u5b66\u4e60\u5f00\u8f9f\u4e86\u4e00\u6761\u9014\u5f84\uff0c\u800c\u65e0\u9700\u989d\u5916\u7684\u6ce8\u91ca\u3002\u6211\u4eec\u5f15\u5165\u4e86ReSpect\u65b9\u6cd5\uff0c\u901a\u8fc7\u56de\u987e\u8fc7\u53bb\u7684\u4ea4\u4e92\u6765\u5b66\u4e60\u8fd9\u4e9b\u4fe1\u53f7\u3002\u6211\u4eec\u5728\u4e00\u4e2a\u65b0\u7684\u591a\u6a21\u6001\u4ea4\u4e92\u573a\u666f\u4e2d\u90e8\u7f72\u4e86ReSpect\uff0c\u5728\u8fd9\u4e2a\u573a\u666f\u4e2d\uff0c\u4eba\u7c7b\u6307\u5bfcLLM\u89e3\u51b3\u5177\u6709\u7ec4\u5408\u89e3\u7a7a\u95f4\u7684\u62bd\u8c61\u63a8\u7406\u4efb\u52a1\u3002\u901a\u8fc7\u4e0e\u4eba\u7c7b\u8fdb\u884c\u6570\u5343\u6b21\u4ea4\u4e92\uff0c\u6211\u4eec\u5c55\u793a\u4e86ReSpect\u5982\u4f55\u9010\u6b65\u63d0\u9ad8\u4efb\u52a1\u5b8c\u6210\u7387\uff0c\u4ece\u6700\u521d\u768431%\u63d0\u5347\u523082%\uff0c\u5e76\u4e14\u6574\u4e2a\u8fc7\u7a0b\u6ca1\u6709\u4efb\u4f55\u5916\u90e8\u6ce8\u91ca\u3002|\n", "2410.13846": "|**2024-10-17**|**SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction**|Xuan Zhang 
et.al.|[2410.13846](http://arxiv.org/abs/2410.13846)|**[link](https://github.com/sail-sg/simlayerkv)**|**\u8fd1\u671f\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u53d1\u5c55\u5df2\u7ecf\u6269\u5c55\u4e86\u5b83\u4eec\u5904\u7406\u957f\u4e0a\u4e0b\u6587\u7684\u80fd\u529b\u3002\u7136\u800c\uff0c\u589e\u52a0\u6a21\u578b\u5c42\u6570\u548c\u8f93\u5165\u5e8f\u5217\u957f\u5ea6\u663e\u8457\u589e\u52a0\u4e86\u5b58\u50a8\u952e\u503c\uff08KV\uff09\u7f13\u5b58\u6240\u9700\u7684\u5185\u5b58\uff0c\u8fd9\u7ed9\u9ad8\u6548\u7684\u63a8\u7406\u5e26\u6765\u4e86\u6311\u6218\u3002\u4e3a\u4e86\u7f13\u89e3\u8fd9\u4e00\u95ee\u9898\uff0c\u6211\u4eec\u63d0\u51fa\u4e86SimLayerKV\uff0c\u8fd9\u662f\u4e00\u79cd\u7b80\u5355\u800c\u6709\u6548\u7684\u65b9\u6cd5\uff0c\u901a\u8fc7\u5728\u8bc6\u522b\u51fa\u7684\u61d2\u60f0\u5c42\u4e2d\u9009\u62e9\u6027\u5730\u4e22\u5f03\u7f13\u5b58\u6765\u51cf\u5c11\u5c42\u95f4KV\u7f13\u5b58\u7684\u5197\u4f59\u3002\u6211\u4eec\u7684\u65b9\u6cd5\u57fa\u4e8e\u8fd9\u6837\u7684\u89c2\u5bdf\uff1a\u5728\u957f\u4e0a\u4e0b\u6587LLMs\u4e2d\u7684\u67d0\u4e9b\u5c42\u8868\u73b0\u51fa\u201c\u61d2\u60f0\u201d\u884c\u4e3a\uff0c\u4e0e\u975e\u61d2\u60f0\u5c42\u76f8\u6bd4\uff0c\u8fd9\u4e9b\u5c42\u5bf9\u5efa\u6a21\u957f\u8ddd\u79bb\u4f9d\u8d56\u8d21\u732e\u8f83\u5c0f\u3002\u901a\u8fc7\u5bf9\u6ce8\u610f\u529b\u6743\u91cd\u6a21\u5f0f\u8fdb\u884c\u5206\u6790\uff0c\u6211\u4eec\u53d1\u73b0\u5bf9\u4e8e\u7ed9\u5b9a\u8f93\u5165\uff0c\u5728\u751f\u6210\u8fc7\u7a0b\u4e2d\u8fd9\u4e9b\u61d2\u60f0\u5c42\u7684\u884c\u4e3a\u662f\u4e00\u81f4\u7684\u3002\u8fd9\u4e00\u89c1\u89e3\u542f\u53d1\u4e86\u6211\u4eec\u7684SimLayerKV\u65b9\u6cd5\uff0c\u8be5\u65b9\u6cd5\u80fd\u591f\u8bc6\u522b\u61d2\u60f0\u5c42\u5e76\u76f8\u5e94\u5730\u51cf\u5c11\u5176KV\u7f13\u5b58\u3002SimLayerKV\u65e0\u9700\u8bad\u7ec3\uff0c\u5177\u6709\u901a\u7528\u6027\uff0c\u5e76\u4e14\u53ea\u9700\u4e03\u884c\u4ee3\u7801\u5373\u53ef\u5b9e\u73b0\u3002\u6211\u4eec\u5728\u4e09\u4e2a\u4ee3\u8868\u6027LLM\u4e0a\u8fdb\u884c\u4e86\u5e7f
\u6cdb\u7684\u5b9e\u9a8c\uff0c\u4f8b\u5982LLaMA2-7B\u3001LLaMA3-8B\u548cMistral-7B\uff0c\u6d89\u53caLongBench\u57fa\u51c6\u4e2d\u768416\u4e2a\u4efb\u52a1\u3002\u7ed3\u679c\u663e\u793a\uff0c\u5f53\u7ed3\u54084\u4f4d\u91cf\u5316\u65f6\uff0cSimLayerKV\u5b9e\u73b0\u4e865\u500d\u7684KV\u7f13\u5b58\u538b\u7f29\u6bd4\uff0c\u6027\u80fd\u4ec5\u4e0b\u964d1.2%\u3002\u6211\u4eec\u7684\u4ee3\u7801\u53ef\u4ee5\u5728https://github.com/sail-sg/SimLayerKV\u83b7\u53d6\u3002**|\n", "2410.13835": "|**2024-10-17**|**Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs**|Tianyu Guo et.al.|[2410.13835](http://arxiv.org/abs/2410.13835)|null|\u5b9e\u8df5\u8005\u4eec\u5728\u53d8\u538b\u5668\u578b\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4e2d\u89c2\u5bdf\u5230\u4e86\u4e09\u79cd\u4ee4\u4eba\u56f0\u60d1\u7684\u73b0\u8c61\uff1a\u6ce8\u610f\u529b\u6c47\u70b9\u3001\u4ef7\u503c\u72b6\u6001\u8017\u5c3d\u548c\u6b8b\u5dee\u72b6\u6001\u5cf0\u503c\uff0c\u8fd9\u4e9b\u73b0\u8c61\u7edf\u79f0\u4e3a\u6781\u7aef\u6807\u8bb0\u73b0\u8c61\u3002\u8fd9\u4e9b\u73b0\u8c61\u7684\u7279\u70b9\u662f\u67d0\u4e9b\u6240\u8c13\u7684\u201c\u6c47\u70b9\u6807\u8bb0\u201d\u63a5\u6536\u4e0d\u6210\u6bd4\u4f8b\u9ad8\u7684\u6ce8\u610f\u529b\u6743\u91cd\uff0c\u8868\u73b0\u51fa\u660e\u663e\u8f83\u5c0f\u7684\u4ef7\u503c\u72b6\u6001\uff0c\u5e76\u4e14\u5177\u6709\u6bd4\u5176\u4ed6\u6807\u8bb0\u5927\u5f97\u591a\u7684\u6b8b\u5dee\u72b6\u6001\u8303\u6570\u3002\u8fd9\u4e9b\u6781\u7aef\u6807\u8bb0\u5728LLM\u63a8\u7406\u3001\u91cf\u5316\u548c\u53ef\u89e3\u91ca\u6027\u65b9\u9762\u5f15\u53d1\u4e86\u8bb8\u591a\u6311\u6218\u3002\u6211\u4eec\u9610\u660e\u4e86\u6781\u7aef\u6807\u8bb0\u73b0\u8c61\u80cc\u540e\u7684\u673a\u5236\u3002\u9996\u5148\uff0c\u6211\u4eec\u5728\u975e\u5e38\u7b80\u5355\u7684\u67b6\u6784\u4e2d\u5c55\u793a\u4e86\u8fd9\u4e9b\u73b0\u8c61\u2014\u2014\u4ec5\u6709\u4e00\u5230\u4e09\u5c42\u7684\u53d8\u538b\u5668\uff0c\u5728\u73a9\u5177\u6a21\u578bBigram-Backcopy\uff08BB\uff09\u4efb\u5
2a1\u4e0a\u8bad\u7ec3\u65f6\u4f1a\u51fa\u73b0\u8fd9\u4e9b\u73b0\u8c61\u3002\u5728\u8fd9\u79cd\u60c5\u51b5\u4e0b\uff0c\u6211\u4eec\u786e\u5b9a\u4e86\u4e00\u79cd\u6d3b\u8dc3-\u4f11\u7720\u673a\u5236\uff0c\u5176\u4e2d\u6ce8\u610f\u529b\u5934\u5728\u7279\u5b9a\u8f93\u5165\u57df\u4e2d\u6210\u4e3a\u6c47\u70b9\uff0c\u800c\u5728\u5176\u4ed6\u57df\u4e2d\u5219\u4e0d\u662f\u3002\u6211\u4eec\u7684\u8bad\u7ec3\u52a8\u6001\u7406\u8bba\u5206\u6790\u63ed\u793a\uff0c\u8fd9\u4e9b\u73b0\u8c61\u662f\u7531\u4e00\u79cd\u76f8\u4e92\u589e\u5f3a\u673a\u5236\u9a71\u52a8\u7684\u3002\u57fa\u4e8e\u8fd9\u4e9b\u89c1\u89e3\uff0c\u6211\u4eec\u63d0\u51fa\u4e86\u4e00\u4e9b\u7b56\u7565\u6765\u5728\u9884\u8bad\u7ec3\u671f\u95f4\u7f13\u89e3\u6781\u7aef\u6807\u8bb0\u73b0\u8c61\uff0c\u5305\u62ec\u7528ReLU\u66ff\u6362softmax\u4ee5\u53ca\u7528SGD\u66ff\u6362Adam\u3002\u63a5\u4e0b\u6765\uff0c\u6211\u4eec\u5c06\u5206\u6790\u6269\u5c55\u5230\u9884\u8bad\u7ec3\u7684LLMs\uff0c\u5305\u62ecLlama\u548cOLMo\uff0c\u7ed3\u679c\u663e\u793a\u8bb8\u591a\u6ce8\u610f\u529b\u5934\u8868\u73b0\u51fa\u4e0eBB\u4efb\u52a1\u4e2d\u7c7b\u4f3c\u7684\u6d3b\u8dc3-\u4f11\u7720\u673a\u5236\uff0c\u5e76\u4e14\u76f8\u4e92\u589e\u5f3a\u673a\u5236\u4e5f\u652f\u914d\u7740LLM\u9884\u8bad\u7ec3\u671f\u95f4\u6781\u7aef\u6807\u8bb0\u73b0\u8c61\u7684\u51fa\u73b0\u3002\u6211\u4eec\u7684\u7ed3\u679c\u63ed\u793a\u4e86\u7531BB\u4efb\u52a1\u9884\u6d4b\u7684\u8bb8\u591a\u9759\u6001\u548c\u52a8\u6001\u6027\u8d28\u4e0e\u5728\u9884\u8bad\u7ec3LLMs\u4e2d\u7684\u89c2\u5bdf\u7ed3\u679c\u4e00\u81f4\u3002|\n", "2410.13825": "|**2024-10-17**|**AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents**|Ke Yang 
et.al.|[2410.13825](http://arxiv.org/abs/2410.13825)|null|\u901a\u8fc7\u4f7f\u7528\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u7684\u4ee3\u7406\u6765\u5b9e\u73b0\u4e2a\u6027\u5316\u3001\u6807\u51c6\u5316\u4efb\u52a1\uff0c\u53ef\u4ee5\u63d0\u5347\u4eba\u7c7b\u7684\u5de5\u4f5c\u6548\u7387\u3002\u81ea\u52a8\u5316\u7f51\u7edc\u4efb\u52a1\uff08\u5982\u5728\u9884\u7b97\u5185\u9884\u8ba2\u9152\u5e97\uff09\u7684\u9700\u6c42\u65e5\u76ca\u589e\u52a0\u3002\u8fd9\u4e9b\u7f51\u7edc\u4ee3\u7406\u4e0d\u4ec5\u80fd\u6ee1\u8db3\u5b9e\u9645\u9700\u6c42\uff0c\u8fd8\u4f5c\u4e3a\u5404\u79cd\u4ee3\u7406\u63a5\u5730\u573a\u666f\u7684\u91cd\u8981\u6982\u5ff5\u9a8c\u8bc1\u793a\u4f8b\uff0c\u5176\u6210\u529f\u5c06\u9884\u793a\u7740\u8bb8\u591a\u672a\u6765\u5e94\u7528\u7684\u8fdb\u6b65\u3002\u5148\u524d\u7684\u7814\u7a76\u901a\u5e38\u624b\u5de5\u8bbe\u8ba1\u7f51\u7edc\u4ee3\u7406\u7b56\u7565\uff08\u4f8b\u5982\u63d0\u793a\u6a21\u677f\u3001\u591a\u4ee3\u7406\u7cfb\u7edf\u3001\u641c\u7d22\u65b9\u6cd5\u7b49\uff09\uff0c\u800c\u8fd9\u4e9b\u7b56\u7565\u53ef\u80fd\u65e0\u6cd5\u5f88\u597d\u5730\u6cdb\u5316\u5230\u6240\u6709\u73b0\u5b9e\u4e16\u754c\u573a\u666f\u3002\u53e6\u4e00\u65b9\u9762\uff0c\u5173\u4e8e\u7f51\u7edc\u4ee3\u7406\u7684\u89c2\u5bdf/\u52a8\u4f5c\u8868\u793a\u4e0eLLM\u9884\u8bad\u7ec3\u6570\u636e\u4e4b\u95f4\u7684\u4e0d\u5339\u914d\u7814\u7a76\u8f83\u5c11\u3002\u8fd9\u79cd\u5dee\u5f02\u5c24\u5176\u660e\u663e\uff0c\u5f53LLM\u4e3b\u8981\u9488\u5bf9\u8bed\u8a00\u5b8c\u6210\u800c\u975e\u6d89\u53ca\u5177\u8eab\u5bfc\u822a\u52a8\u4f5c\u548c\u7b26\u53f7\u7f51\u7edc\u5143\u7d20\u7684\u4efb\u52a1\u8fdb\u884c\u8bad\u7ec3\u65f6\u3002\u6211\u4eec\u7684\u7814\u7a76\u901a\u8fc7\u7b80\u5355\u5730\u4f18\u5316LLM\u7f51\u7edc\u4ee3\u7406\u7684\u89c2\u5bdf\u548c\u52a8\u4f5c\u7a7a\u95f4\uff0c\u4f7f\u5176\u66f4\u597d\u5730\u4e0eLLM\u7684\u80fd\u529b\u76f8\u5339\u914d\uff0c\u4ece\u800c\u63d0\u5347\u4e86\u6027\u80fd\u3002\u8fd9\u79cd\u65b9\u6cd5\u4f7f\u6211\u4eec\u57fa\u4e8e\u57fa\u7840\u4ee3\u7406\u768
4AgentOccam\u5728\u5e7f\u6cdb\u7684\u7f51\u7edc\u4efb\u52a1\u4e0a\u663e\u8457\u4f18\u4e8e\u4ee5\u524d\u7684\u65b9\u6cd5\u3002\u5177\u4f53\u800c\u8a00\uff0c\u5728WebArena\u57fa\u51c6\u6d4b\u8bd5\u4e2d\uff0c\u8be5\u57fa\u51c6\u6d4b\u8bd5\u6db5\u76d6\u4e86\u4e00\u822c\u7528\u9014\u7684\u7f51\u7edc\u4ea4\u4e92\u4efb\u52a1\uff0cAgentOccam\u6bd4\u524d\u6700\u5148\u8fdb\u7684\u65b9\u6cd5\u548c\u540c\u671f\u5de5\u4f5c\u5206\u522b\u9ad8\u51fa9.8\u5206\uff08+29.4%\uff09\u548c5.9\u5206\uff08+15.8%\uff09\uff0c\u5e76\u4e14\u6210\u529f\u7387\u8fbe\u523026.6\u5206\uff08+161%\uff09\uff0c\u8d85\u8fc7\u4e86\u7c7b\u4f3c\u7684\u57fa\u7840\u7f51\u7edc\u4ee3\u7406\u3002\u6211\u4eec\u6ca1\u6709\u4f7f\u7528\u4e0a\u4e0b\u6587\u793a\u4f8b\u3001\u65b0\u4ee3\u7406\u89d2\u8272\u3001\u5728\u7ebf\u53cd\u9988\u6216\u641c\u7d22\u7b56\u7565\u3002AgentOccam\u7684\u7b80\u6d01\u8bbe\u8ba1\u7a81\u663e\u4e86LLM\u5728\u65e0\u6837\u672c\u5b66\u4e60\u4e0b\u6267\u884c\u7f51\u7edc\u4efb\u52a1\u7684\u5f3a\u5927\u80fd\u529b\uff0c\u5e76\u5f3a\u8c03\u4e86\u7cbe\u5fc3\u8c03\u6574\u89c2\u5bdf\u548c\u52a8\u4f5c\u7a7a\u95f4\u5bf9\u4e8e\u57fa\u4e8eLLM\u7684\u4ee3\u7406\u7684\u91cd\u8981\u6027\u3002|\n", "2410.13824": "|**2024-10-17**|**Harnessing Webpage UIs for Text-Rich Visual Understanding**|Junpeng Liu 
et.al.|[2410.13824](http://arxiv.org/abs/2410.13824)|null|\u6587\u672c\u4e30\u5bcc\u7684\u89c6\u89c9\u7406\u89e3\u2014\u2014\u5373\u5904\u7406\u5bc6\u96c6\u6587\u672c\u5185\u5bb9\u4e0e\u89c6\u89c9\u5143\u7d20\u76f8\u878d\u5408\u7684\u73af\u5883\u7684\u80fd\u529b\uff0c\u5bf9\u4e8e\u591a\u6a21\u6001\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08MLLMs\uff09\u6709\u6548\u4ea4\u4e92\u7ed3\u6784\u5316\u73af\u5883\u81f3\u5173\u91cd\u8981\u3002\u4e3a\u4e86\u589e\u5f3a\u8fd9\u79cd\u80fd\u529b\uff0c\u6211\u4eec\u63d0\u51fa\u5229\u7528\u57fa\u4e8e\u6587\u672c\u7684\u5927\u578b\u8bed\u8a00\u6a21\u578b\uff08LLMs\uff09\u4ece\u7f51\u9875\u7528\u6237\u754c\u9762\u5408\u6210\u901a\u7528\u7684\u591a\u6a21\u6001\u6307\u4ee4\u3002\u5c3d\u7ba1\u7f3a\u4e4f\u76f4\u63a5\u7684\u89c6\u89c9\u8f93\u5165\uff0c\u57fa\u4e8e\u6587\u672c\u7684LLMs\u80fd\u591f\u5904\u7406\u6765\u81ea\u7f51\u9875\u53ef\u8bbf\u95ee\u6027\u6811\u7684\u7ed3\u6784\u5316\u6587\u672c\u8868\u793a\u3002\u8fd9\u4e9b\u6307\u4ee4\u968f\u540e\u4e0eUI\u622a\u56fe\u914d\u5bf9\u4ee5\u8bad\u7ec3\u591a\u6a21\u6001\u6a21\u578b\u3002\u6211\u4eec\u5f15\u5165\u4e86MultiUI\u6570\u636e\u96c6\uff0c\u8be5\u6570\u636e\u96c6\u5305\u542b\u6765\u81ea100\u4e07\u4e2a\u7f51\u7ad9\u7684730\u4e07\u6837\u672c\uff0c\u6db5\u76d6\u4e86\u591a\u6837\u5316\u7684\u591a\u6a21\u6001\u4efb\u52a1\u548cUI\u5e03\u5c40\u3002\u5728MultiUI\u4e0a\u8bad\u7ec3\u7684\u6a21\u578b\u4e0d\u4ec5\u5728\u7f51\u9875UI\u4efb\u52a1\u4e2d\u8868\u73b0\u51fa\u8272\u2014\u2014\u5728VisualWebBench\u4e0a\u7684\u6027\u80fd\u63d0\u5347\u9ad8\u8fbe48%\uff0c\u5728\u7f51\u9875\u4ee3\u7406\u6570\u636e\u96c6Mind2Web\u4e0a\u7684\u52a8\u4f5c\u51c6\u786e\u6027\u63d0\u9ad8\u4e8619.1%\u2014\u2014\u800c\u4e14\u5728\u975e\u7f51\u9875UI\u4efb\u52a1\u4ee5\u53ca\u975eUI\u9886\u57df\uff08\u5982\u6587\u6863\u7406\u89e3\u3001OCR\u548c\u56fe\u8868\u89e3\u91ca\uff09\u4e2d\u7684\u6cdb\u5316\u6548\u679c\u4e5f\u51fa\u4e4e\u610f\u6599\u5730\u597d\u3002\u8fd9\u4e9b\u7ed3\u679c\u7a81\u663e\u4e86\u7f51\u9875UI\u6570\u63
6e\u5728\u63a8\u8fdb\u5404\u79cd\u573a\u666f\u4e0b\u6587\u672c\u4e30\u5bcc\u7684\u89c6\u89c9\u7406\u89e3\u65b9\u9762\u7684\u5e7f\u6cdb\u5e94\u7528\u3002|\n"}} \ No newline at end of file